DataExpert.io - Apache Spark Week 3 Homework Spark Fundamentals - Data Engineering Bootcamp

Published: Jan 8, 2025

Comments • 3

  • @Jade-Codes  23 days ago +1

    Figured out why the files weren't changing size!
    To change the file sizes, you need to repartition on those columns too.
    Sorting within partitions only reorders rows inside the partition they are already in, and by that point the data has already been split across the files!
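
    A toy sketch of the point above, in plain Python rather than Spark. It only models the idea: a sort inside each partition never moves a row to another partition, while repartitioning on a column regroups the rows before they are written. The column names, row values, partition count, and the use of crc32 in place of Spark's hash partitioning are all assumptions made for illustration. In PySpark the corresponding DataFrame calls would be `repartition` and `sortWithinPartitions`.

```python
import zlib

# Hypothetical rows: column names and values are made up for illustration.
rows = [{"match_id": i, "playlist_id": "abc"[i % 3]} for i in range(12)]
NUM_PARTITIONS = 4


def round_robin(rows, n):
    """Spread rows over n partitions with no regard to their values."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def sort_within_partitions(parts, key):
    """Reorder rows inside each partition; no row changes partition."""
    return [sorted(p, key=lambda r: r[key]) for p in parts]


def repartition_by(rows, n, key):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is a deterministic stand-in for Spark's hash partitioning.
        parts[zlib.crc32(str(row[key]).encode()) % n].append(row)
    return parts


def keys_per_partition(parts, key):
    """List the distinct key values found in each partition."""
    return [sorted({r[key] for r in p}) for p in parts]


initial = round_robin(rows, NUM_PARTITIONS)
# Sorting alone: same rows in the same partitions, just reordered.
sorted_only = sort_within_partitions(initial, "playlist_id")
# Repartition first, then sort: rows are regrouped by key before sorting.
repartitioned = sort_within_partitions(
    repartition_by(rows, NUM_PARTITIONS, "playlist_id"), "playlist_id"
)

print(keys_per_partition(initial, "playlist_id"))
print(keys_per_partition(sorted_only, "playlist_id"))
print(keys_per_partition(repartitioned, "playlist_id"))
```

    In this model every partition still holds a mix of all key values after the sort-only step, so each output file would compress about the same as before. Only the repartitioned version puts each key value in a single partition.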

    • @Jade-Codes  23 days ago

      The updated file sizes following this method:
      Table: matches_bucketed, File sizes: [0, 8, 576, 584, 588, 596, 600, 600, 600, 600, 604, 604, 612, 612, 616, 616, 616, 628, 72476, 73241, 74057, 75156, 75323, 75331, 75336, 75428, 75784, 75857, 76855, 77181, 77631, 77682, 77791, 79171]
      Table: matches_bucketed_v2, File sizes: [0, 8, 576, 588, 588, 588, 592, 592, 596, 596, 600, 600, 604, 608, 608, 612, 612, 620, 72301, 73968, 74150, 74160, 74252, 74570, 74978, 75146, 75365, 75577, 75904, 76656, 76766, 77168, 77192, 78227]
      Table: matches_bucketed_v3, File sizes: [0, 8, 580, 584, 584, 584, 588, 588, 596, 600, 600, 604, 608, 608, 612, 616, 616, 616, 73043, 73419, 73604, 73682, 73807, 74235, 74775, 75737, 75750, 75976, 76375, 76587, 77081, 77402, 77410, 77615]
      Table: matches_bucketed_v4, File sizes: [0, 8, 572, 580, 584, 592, 592, 592, 592, 592, 596, 604, 608, 612, 612, 616, 624, 632, 71864, 72773, 73313, 74461, 74461, 74530, 74591, 74604, 74999, 75852, 76768, 76849, 76967, 77635, 78460, 79361]

  • @KARNAMONAKALYANREDDY  16 days ago

    Can you share the code?