Figured out why the files weren't changing size!
To change the sizes, you need to repartition on those columns too.
Sorting within partitions (sortWithinPartitions) only sorts rows inside the partition they're already in, and by that point the data has already been split into the files!
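To illustrate the point, here's a minimal pure-Python sketch. It is not the actual Spark job: the column names (`match_id`, `map_id`), the helper functions, and the partition count are all made up for illustration, and Python's `hash` stands in for whatever hash Spark uses. It only shows that sorting inside partitions never moves rows between them, while repartitioning on a column does.

```python
def round_robin_partition(rows, num_partitions):
    """Spread rows evenly across partitions, ignoring their contents."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def hash_partition(rows, key, num_partitions):
    """Send each row to the partition chosen by hashing one column."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts


def sort_within_partitions(parts, key):
    """Sort each partition on its own; rows never change partition."""
    return [sorted(part, key=lambda row: row[key]) for part in parts]


# Hypothetical data: 12 matches spread over 3 maps.
rows = [{"match_id": i, "map_id": i % 3} for i in range(12)]

# Case 1: partition first, then sort inside each partition.
before = round_robin_partition(rows, 4)
after_sort = sort_within_partitions(before, "map_id")
sizes_before = [len(p) for p in before]
sizes_after_sort = [len(p) for p in after_sort]

# Case 2: repartition on the column, then sort inside each partition.
repartitioned = sort_within_partitions(hash_partition(rows, "map_id", 4), "map_id")
sizes_repartitioned = [len(p) for p in repartitioned]

print(sizes_before, sizes_after_sort, sizes_repartitioned)
```

In case 1 every partition holds the same rows before and after the sort, so the row counts per "file" cannot change. In case 2 all rows with the same `map_id` end up together, which changes what each file contains. In Spark the corresponding DataFrame calls would be `repartition(...)` on the columns followed by `sortWithinPartitions(...)`.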
The updated file sizes following this method:
Table: matches_bucketed, File sizes: [0, 8, 576, 584, 588, 596, 600, 600, 600, 600, 604, 604, 612, 612, 616, 616, 616, 628, 72476, 73241, 74057, 75156, 75323, 75331, 75336, 75428, 75784, 75857, 76855, 77181, 77631, 77682, 77791, 79171]
Table: matches_bucketed_v2, File sizes: [0, 8, 576, 588, 588, 588, 592, 592, 596, 596, 600, 600, 604, 608, 608, 612, 612, 620, 72301, 73968, 74150, 74160, 74252, 74570, 74978, 75146, 75365, 75577, 75904, 76656, 76766, 77168, 77192, 78227]
Table: matches_bucketed_v3, File sizes: [0, 8, 580, 584, 584, 584, 588, 588, 596, 600, 600, 604, 608, 608, 612, 616, 616, 616, 73043, 73419, 73604, 73682, 73807, 74235, 74775, 75737, 75750, 75976, 76375, 76587, 77081, 77402, 77410, 77615]
Table: matches_bucketed_v4, File sizes: [0, 8, 572, 580, 584, 592, 592, 592, 592, 592, 596, 604, 608, 612, 612, 616, 624, 632, 71864, 72773, 73313, 74461, 74461, 74530, 74591, 74604, 74999, 75852, 76768, 76849, 76967, 77635, 78460, 79361]
Can you share the code?