Hi Raja, can you explain the difference among partitionBy, repartition, and the shuffle partitions parameter? I remember from the previous videos that we use repartition while reading and writing a dataframe to disk, and that shuffle partitions is used to increase or decrease the number of partitions while shuffling data in transformations. Can you please clarify? Thanks
Best creator on pyspark. Continue doing this
Thank you!
@rajasdataengineering7585 Hi, I have a doubt. Which operations can make an EMR cluster go OOM?
If possible, can you also explain whether we can update only a certain range of partition data? For example, if the data is partitioned by month and I want to update only the last 3 months of partition data, how can we achieve that?
Hi Raja, thanks for posting all the concepts! Have you shared the datasets you refer to in the lectures? Can we have these datasets, please?
Very useful videos, can you please do more videos?
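One possible approach to the question above is Spark's dynamic partition overwrite mode, sketched below. This is not from the videos; it assumes the data is written as Parquet, that the partition column is named `month`, and that it holds strings like `2024-02`. The function names `last_n_months` and `overwrite_recent_months` are hypothetical.

```python
from datetime import date


def last_n_months(today, n):
    """Return the n most recent month keys (YYYY-MM), newest first,
    starting with the month that contains `today`."""
    year, month = today.year, today.month
    keys = []
    for _ in range(n):
        keys.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:  # roll back from January to December of the previous year
            year, month = year - 1, 12
    return keys


def overwrite_recent_months(spark, updated_df, path, today, n=3):
    """Rewrite only the last n monthly partitions under `path`.

    `spark` is an existing SparkSession and `updated_df` a DataFrame with a
    string `month` column (both are assumptions of this sketch).
    """
    # With "dynamic" mode, mode("overwrite") replaces only the partitions
    # present in the DataFrame being written; other folders are left as-is.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
    recent = updated_df.filter(updated_df["month"].isin(last_n_months(today, n)))
    recent.write.mode("overwrite").partitionBy("month").parquet(path)
```

Without the dynamic setting, `mode("overwrite")` would replace everything under `path`, so the configuration line is the important part of the sketch.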
Very useful content
Thank you!
Crisp and clear
Please make a detailed video on salting techniques and how to do salting.
Sure, will create one
Hi Raja, while writing the dataframe to DBFS or blob storage, is there a way to write only the part file and not the system files?
Nice sir!
Thanks! Keep watching
If I read this partitioned data, the columns on which the data was partitioned come last, and thereby the schema changes. Is there a way to preserve the schema?
Hi Sir, may I know the difference between partitionBy and repartition? It's a bit confusing.
Hi Kamini, partitionBy and repartition are completely different. partitionBy is used while writing a dataframe to a storage system: for each key value, a new folder is created in the storage location.
repartition is used to reduce or increase the number of partitions in Spark memory while applying transformations.
@rajasdataengineering7585 Thanks for the reply, much needed :)
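To make the distinction concrete, here is a minimal sketch that puts repartition, the `spark.sql.shuffle.partitions` setting, and partitionBy side by side. The sample rows, the column names `month` and `amount`, and the function names are illustrative assumptions, not taken from the videos.

```python
# Illustrative sample data (hypothetical values).
ROWS = [
    {"month": "2024-01", "amount": 10},
    {"month": "2024-01", "amount": 20},
    {"month": "2024-02", "amount": 5},
]


def expected_partition_folders(rows, key):
    """Folder names a partitionBy(key) write would create:
    one `key=value` folder per distinct key value."""
    return sorted({f"{key}={row[key]}" for row in rows})


def demo(spark, output_path):
    """Contrast the three settings; `spark` is an existing SparkSession."""
    data = [(r["month"], r["amount"]) for r in ROWS]

    # 1. repartition: changes how many in-memory partitions the DataFrame has.
    df = spark.createDataFrame(data, ["month", "amount"]).repartition(4)
    print(df.rdd.getNumPartitions())

    # 2. spark.sql.shuffle.partitions: how many partitions a shuffle produces
    #    for wide transformations such as groupBy or join (adaptive query
    #    execution may coalesce them further at runtime).
    spark.conf.set("spark.sql.shuffle.partitions", "8")
    totals = df.groupBy("month").sum("amount")

    # 3. partitionBy: controls the folder layout on storage at write time.
    totals.write.mode("overwrite").partitionBy("month").parquet(output_path)
```

For the sample rows, `expected_partition_folders(ROWS, "month")` lists the two folders (`month=2024-01` and `month=2024-02`) that the write in step 3 would create under `output_path`.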
Welcome!
Good work Raja!
Thanks Alfonso!
How to create weekly partitions?
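One way to get weekly partitions, as a rough sketch, is to derive a week-start column and pass it to partitionBy. It assumes the dataframe has a date column named `event_date` and is written as Parquet; the names `week_start` and `write_weekly` are hypothetical.

```python
from datetime import date, timedelta


def week_start(d):
    """Monday of the week containing d; the same bucket that Spark's
    date_trunc('week', ...) assigns."""
    return d - timedelta(days=d.weekday())


def write_weekly(df, path):
    """Write `df` with one folder per week (sketch; needs a Spark DataFrame
    that has an `event_date` column)."""
    # date_trunc('week', ...) truncates to the Monday of that week.
    with_week = df.selectExpr(
        "*", "to_date(date_trunc('week', event_date)) AS week_start"
    )
    with_week.write.mode("overwrite").partitionBy("week_start").parquet(path)
```

Partitioning on the week's start date rather than on a separate year and week number avoids mismatches for weeks that span a year boundary.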