As you said, with bucketing, all rows with the same value in the bucketed column go into the same bucket. I think partitioning does the same thing, right? If I have a country column and partition on it, all rows with the same country go into the same partition. So what is the difference between partitioning and bucketing?
The difference is that bucketing is more flexible. In his example he simply bucketed the data on one column, but another option is to bucket on ranges of values in a column to get better-sized buckets. With bucketing you control the number of buckets in the configuration, while partitioning on a column won't let you do that.
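To make the difference concrete, here is a minimal Python sketch (not Hive itself). It assumes a toy hash (the sum of character codes) in place of Hive's real hash function, and the helper names `partition_rows` and `bucket_rows` are made up for illustration. Partitioning creates one group per distinct value, so the number of groups depends on the data; bucketing spreads rows over a fixed number of buckets that you choose.

```python
from collections import defaultdict


def partition_rows(rows, column):
    """Group rows by exact column value: one partition per distinct value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


def bucket_rows(rows, column, num_buckets):
    """Assign rows to a fixed number of buckets via hash(value) mod num_buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        # Toy deterministic hash (assumption): Hive uses its own hash function.
        h = sum(ord(ch) for ch in str(row[column]))
        buckets[h % num_buckets].append(row)
    return buckets


countries = ["IN", "US", "UK", "IN", "DE", "US"]
rows = [{"id": i, "country": c} for i, c in enumerate(countries)]

partitions = partition_rows(rows, "country")  # one group per distinct country
buckets = bucket_rows(rows, "country", 2)     # always exactly 2 groups
```

Rows with the same country still land in the same bucket, but several countries can share one bucket, and the bucket count stays at whatever you configured no matter how many countries show up.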
@0:18 The basic "Funda" of bucketing... Loved it! Good coverage.
Great explanation with a real example. Thank you!
Thanks for the real-time explanation with transformations.
Very well explained! Great job!
The best thing is that you are providing new information.
Please make some videos on Hive SerDe, regex, user access management, and Spark job execution.
Thanks for the appreciation
Very well explained. Thank you very much.
Thanks
Nice explanation, thanks :)
For a bucketed join, the joined tables should have an equal number of buckets, or the bucket count of one table should be a multiple of the other's.
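Why the multiple matters can be sketched in a few lines of Python. This assumes integer join keys that hash to themselves, which is a simplification, and the function names `bucket_of` and `matching_buckets` are hypothetical. When the larger bucket count is a multiple of the smaller one, each bucket of the smaller table only needs to be compared against a known subset of buckets in the larger table.

```python
def bucket_of(key, num_buckets):
    """Toy stand-in for the bucketing hash: integer keys hash to themselves."""
    return key % num_buckets


def matching_buckets(small_bucket, small_count, large_count):
    """Buckets of the larger table that can hold keys from one smaller-table bucket."""
    if large_count % small_count != 0:
        raise ValueError("large_count must be a multiple of small_count")
    # Keys with key % small_count == small_bucket can only land in these buckets.
    return list(range(small_bucket, large_count, small_count))


# Table A has 4 buckets, table B has 8 buckets.
candidates = matching_buckets(1, 4, 8)

# Every key in bucket 1 of A falls into one of the candidate buckets of B.
all_keys_covered = all(
    bucket_of(k, 8) in matching_buckets(bucket_of(k, 4), 4, 8)
    for k in range(100)
)
```

If the counts were, say, 4 and 6, no such fixed mapping exists and the join would have to look at every bucket.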
The concept that all identical column values go into a single bucket is wrong. That is partitioning, not bucketing.
Very nice
In an interview I was asked: if Hive shows 12 partitions but HDFS only has 6, what would be the issue? Anyone?
The partition directories might have been deleted from HDFS manually, and the Hive metastore was not updated with the changes.
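The mismatch can be illustrated with a small Python sketch that compares the two lists. The partition names (`month=01` and so on) and the function `find_partition_drift` are made up for the example; in practice the lists would come from the metastore and from an HDFS directory listing.

```python
def find_partition_drift(metastore_partitions, hdfs_directories):
    """Compare partitions known to the metastore with directories present on HDFS."""
    metastore = set(metastore_partitions)
    hdfs = set(hdfs_directories)
    return {
        # Registered in the metastore, but the directory is gone from HDFS.
        "missing_on_hdfs": sorted(metastore - hdfs),
        # Present on HDFS, but never registered in the metastore.
        "unregistered_in_metastore": sorted(hdfs - metastore),
    }


# Hypothetical data: 12 partitions registered, only 6 directories left on HDFS.
metastore_partitions = [f"month={m:02d}" for m in range(1, 13)]
hdfs_directories = [f"month={m:02d}" for m in range(1, 7)]

drift = find_partition_drift(metastore_partitions, hdfs_directories)
```

In Hive itself, `MSCK REPAIR TABLE` is the usual command for reconciling the metastore with what is on HDFS; whether it also drops stale entries depends on the Hive version and options, so check the docs for your release.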
Good