Strategies for optimizing your BigQuery queries

Поделиться
HTML-код
  • Опубликовано: 22 авг 2024

Комментарии • 24

  • @premmohantyagi
    @premmohantyagi 3 года назад +9

    Perfect and easy examples to describe the impact of optimization.

  • @AlesNajmann
    @AlesNajmann Год назад +3

    This is the simply most important video to watch by the users of BigQuery 🔥

  • @divertechnology
    @divertechnology Год назад +1

    great video i thought it gonna be light but is very deep in knowledge

  • @sevanajemian
    @sevanajemian 3 года назад +2

    amazing video for our new joiners :))

  • @xiaoyangshawnhuang1251
    @xiaoyangshawnhuang1251 2 года назад +1

    thanks for the great explanation. I like "stay curious"! :)

  • @abhishekjena9581
    @abhishekjena9581 2 года назад +1

    nice way of presentation .....good stuff

  • @dheer211
    @dheer211 3 года назад +2

    Great video thanks Leigha

  • @ahmedhelal174
    @ahmedhelal174 2 года назад +2

    Thank you

  • @majorcemp3612
    @majorcemp3612 3 года назад +3

    How do you know the "where" clause is performed before the join (5:17) ? Don't you have to do a sub query on the left joined (t2) table ? (also does this sub query be less performant ?)

    • @leighajarett221
      @leighajarett221 3 года назад +5

      You should be able to tell from the execution details :) In the first stage of the query, the workers will read from distributed storage and filter the data before shuffling and joining (which usually happens in the next stage). You can see the WHERE clause and the JOIN applied in the details of the stage to understand the order. You don't need the where clause in a subquery to filter the data before joining, but subqueries can be helpful if you want to pre-aggregate the data before joining to reduce the amount of data joined.

    • @majorcemp3612
      @majorcemp3612 3 года назад +1

      @@leighajarett221 so I don't need a where clause in subquery because everytime the where clause is being done at the distributed level ? Like everytime ? Or should I put a subquery to be sure ? (and if I put this subquery to be sure will I get bader performances ?)

  • @SasawatChanate
    @SasawatChanate Год назад

    5:50
    Does the common practice of putting WHERE 1=1 as the first expression has any affect on this?

  • @Code-ie8ub
    @Code-ie8ub 3 года назад +1

    thanks!

  • @michelkant
    @michelkant 3 года назад +1

    Nice one! At how much data processing do you recommend to start working on optimizing the queries? As in when will the effect really start to show an impact? 1gb of processing, 10, 1000 or 1tb?

    • @leighajarett221
      @leighajarett221 3 года назад +1

      That's a great question! I think it depends on your pricing model. For example, if you're paying on-demand for each query you may be inclined to start optimizing queries that are scanning whatever #GB translates to a reasonable cost savings. This will also depend on how often the query is being run - for example if a query scanning 1TB of data is only run once that's $5, but if you run it everyday that's a lot!
      I would try to pinpoint queries that are run on a regular basis and are consuming the most resources, but set some threshold based on the total estimated ROI. For example, (the number of times the query will be run each year) x (the total cost of the query) > $1000 (or whatever is significant to your business)
      Check out the pricing calculator if you're estimating query prices: cloud.google.com/products/calculator/

  • @Alex315101
    @Alex315101 Год назад

    really curious why such simple examples can not be automatically handled by query optimizer (which is already done by other databases)

  • @jaysyanshar
    @jaysyanshar 2 года назад +1

    is the concept can also be applied to other SQL databases?

    • @yellowsubmarine7059
      @yellowsubmarine7059 Год назад

      Some these concepts could apply to other SQL database management systems (DBMS). Also some DBMS' are smart enough to apply some of these optimizations when generating the execution plan without having the user making changes to their SQL query.

  • @jackylee6839
    @jackylee6839 2 года назад

    At 5:25, why does clustering based on common join keys improve the efficiency more than partitioning?

    • @samhithnamburu
      @samhithnamburu Год назад

      Clustering on join keys leads to least number of shuffles to perform the join

  • @gangsterrapmix8570
    @gangsterrapmix8570 3 года назад +1

    Google Google many businesses are always liked by people all over the world Thank you Google

  • @viacomsspy5439
    @viacomsspy5439 3 года назад

    Recommended is broken

  • @hritvikmohan4048
    @hritvikmohan4048 3 года назад +3

    Maal ho yrr tum toh....