Working with Skewed Data: The Iterative Broadcast - Rob Keevil & Fokko Driesprong

  • Published: 17 Oct 2024

Comments • 10

  • @raviiit6415
    @raviiit6415 1 year ago +2

    Great talk, both of you.

  • @LuisFelipe-qe2pj
    @LuisFelipe-qe2pj 3 years ago +1

    Very nice presentation!! 👏👏👏

  • @rishigc
    @rishigc 4 years ago +2

    @22:13 - Where can I find an example of the implementation with the SQL API?

  • @vishakhrameshan9932
    @vishakhrameshan9932 5 years ago +2

    Hi, I am facing a skewed data issue in my Spark application. I have two tables of the same size (the same number of rows but different numbers of columns), and I am checking for rows in table A that are not in table B. This Spark SQL query is taking a lot of time.
    I have allocated 100 executors in the production environment, and I also tried writing both tables to files (to avoid in-memory processing of such large data) and reading them back to run the SQL operation.
    My application contains many Spark SQL operations, and this query falls somewhere in the middle of the whole pipeline. When I run the application, it runs up to this query and then takes more than 6 hours to process 2M records.
    How can I achieve a faster result with repartitioning or an iterative broadcast? Please help.

    • @arpangrwl
      @arpangrwl 5 years ago

      Hi Vishakh, did you find a solution to the problem you mentioned?

    • @shankarravi749
      @shankarravi749 5 years ago

      @@arpangrwl May I know the solution? What needed to be done?

    • @JoHeN1990
      @JoHeN1990 4 years ago

      Try bucketing the tables before writing. The write might take longer, but joins will be faster afterwards (see the sketch after this thread).

    • @TechWithViresh
      @TechWithViresh 4 years ago +1

      Check this: ruclips.net/video/HIlfO1pGo0w/видео.html
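
    Editor's note: a minimal sketch of the bucketing suggestion above, assuming Spark 2.x+ with Hive support. The table names, bucket count, and join key ("id") are illustrative placeholders, not taken from the talk or the thread; the anti join expresses the "A not in B" check without a subquery.

    ```scala
    import org.apache.spark.sql.SparkSession

    object BucketedAntiJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("bucketed-anti-join-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical source tables; replace with your own.
        val tableA = spark.table("db.table_a")
        val tableB = spark.table("db.table_b")

        // Write both sides bucketed (and sorted) on the join key. This costs
        // extra time at write, but later joins on the same key can avoid a
        // full shuffle of both sides.
        tableA.write
          .bucketBy(200, "id")
          .sortBy("id")
          .mode("overwrite")
          .saveAsTable("db.table_a_bucketed")

        tableB.write
          .bucketBy(200, "id")
          .sortBy("id")
          .mode("overwrite")
          .saveAsTable("db.table_b_bucketed")

        // "Rows of A not in B" as a left anti join on the bucketed tables.
        val a = spark.table("db.table_a_bucketed")
        val b = spark.table("db.table_b_bucketed")
        val aNotInB = a.join(b, Seq("id"), "left_anti")

        aNotInB.write.mode("overwrite").parquet("/tmp/a_not_in_b")
      }
    }
    ```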

  • @bikashpatra119
    @bikashpatra119 4 years ago +1

    Can you please provide the link to the benchmark on GitHub?

    • @JimRohn-u8c
      @JimRohn-u8c 5 months ago +1

      Go to 23:25 in the video; he shows the GitHub URL at that point.