Been binge-watching your playlists!! Trust me, I have been through tens of DE channels and none comes close to yours. Your teachings are precise and very easy to understand. Thanks for making these playlists!! And please don't stop; the views will skyrocket, it's just a matter of time. Godspeed to you, Subham :)
Thank you so much for your lovely comment! ❤️ I hope my playlist made it easier for you to learn PySpark.
To help me grow, please make sure to share with your network over LinkedIn 👍 And make sure to follow me over LinkedIn. Thanks!!
@easewithdata Yes yes!! Already did that.
Ease With Data is the ultimate, man!
What a gold mine of information. This is the only channel where Spark optimization techniques are taught so eloquently. I have seen Raja's DE channel, Learn with Data, and many more; no one is even 1% close to this content. Kudos to you, man. Would love to see you do a project on a full, huge dataset so we can learn the complexities and how to overcome them! Thanks again for your phenomenal efforts! You deserve all the praise.
Thank you so much for such praise 💓 I can't explain how thrilled I am feeling after such comparisons with those huge channels. Please make sure to share your experience with your network over LinkedIn 👍 This helps a lot 💓 Don't forget to tag @Subham Khandelwal 😊
This series is definitely killing it and is the best content on learning Spark so far! Thanks for making everything free and sharing the knowledge. Do you have any links where we can send you some credits? Your hard work and efforts deserve it!
Thank you so much for such a lovely comment 😊 I really don't want any credits in exchange for sharing my knowledge. If you really want to help, then share this playlist with all your friends and network over LinkedIn ❤️
Can't wait for the next video!
New videos in this series will be added on an ad-hoc basis. Make sure to follow so that you don't miss anything.
Currently we are working on the Databricks Zero to Hero series. Check it out on the same channel.
Hello Sir, your teaching is top notch, salute!!! My understanding after watching this video: partitioning a delta table is nothing but a data-file reorg; in the demo you showed us multiple repartitions of the same delta table based on different columns. My question: is re-partitioning considered a common housekeeping task in a Prod environment (considering a delta table could contain thousands of data files)? Thanks in advance.
No, repartitioning is not a housekeeping task. It is rather performed to optimize data reads.
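(For anyone following along, here is a minimal PySpark sketch of what repartitioning before a write can look like; the table path and column name are hypothetical, not from the video.)

```python
from pyspark.sql import SparkSession

# On Databricks a SparkSession already exists as `spark`;
# this line is only needed to run the sketch standalone.
spark = SparkSession.builder.appName("repartition-demo").getOrCreate()

# Hypothetical Delta table location and column, for illustration only.
df = spark.read.format("delta").load("/mnt/datalake/sales_delta")

# Repartitioning by a commonly filtered column groups related rows into
# the same files, so reads filtering on that column touch fewer files.
(df.repartition("country")
   .write.format("delta")
   .mode("overwrite")
   .save("/mnt/datalake/sales_delta"))
```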
Nice explanation 👍🏼
Thanks 👍 Please make sure to share with your network over LinkedIn ❤️
Z-order is a mathematical formulation; it is very tough to understand, but in terms of what it is doing, the video did a good job.
Thanks 👍 Please make sure to share this with your network over LinkedIn 🙂
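(Side note for readers: on Databricks, Z-ordering is applied through the OPTIMIZE command. A minimal sketch, assuming a hypothetical table and column name:)

```python
# Z-ordering co-locates rows with similar values in the Z-ordered columns
# into the same files, improving data skipping for filters on those columns.
# `sales_delta` and `customer_id` are hypothetical names.
spark.sql("OPTIMIZE sales_delta ZORDER BY (customer_id)")
```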
Why did you repartition the table into 16 files?
Hello again, Subham. I am very new to Databricks and the Azure cloud; can you please help clarify a question? From your Databricks series, it looks like all delta tables are stored as parquet files in the backend (Azure storage). Does it mean every table in Azure is like this, i.e., behind the scenes every table is lots of data files? The reason I ask is that I was comparing with Google Cloud Platform, where this type of table is called an External Table and the query performance is very bad. Thank you again!
A Delta table stores its data as Parquet files in the backend. The reason for the large number of data files is that the data is partitioned and written in parallel, and sometimes different versions of files are lying at the location. You cannot directly read the Parquet files of a Delta table using external tables; first you need to remove all other versions using VACUUM.
Please make sure to share this with your network over LinkedIn 💓
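(For reference, a minimal sketch of the VACUUM step mentioned above, assuming a hypothetical table name and the default 7-day retention:)

```python
# VACUUM deletes data files that are no longer referenced by the current
# table version and are older than the retention threshold.
# 168 hours = 7 days (the default retention); `sales_delta` is hypothetical.
spark.sql("VACUUM sales_delta RETAIN 168 HOURS")
```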
Nice :)