Want to learn more Big Data technologies? You can get lifetime access to our courses on the Udemy platform. Visit the link below for discounts and a coupon code.
www.learningjournal.guru/courses/
No words! Simply excellent. Your explanation made the definitions crystal clear. I have watched all your playlists. Two years ago I became super confident after watching your Spark videos, and I got a job just because of you....
Superb way of making things clear
great job brother. beautifully explained. thanks from Pakistan.
You are awesome dheivame! Fantastic video
Superb!! Excellent way of describing and summarizing.. Thanks
I was hooked since the first minute . Thank you for uploading this. Please push for more . Cheers !
Great job breaking down the content and explaining it clearly. I understood very well how everything fit together, on the first try too!
Hello Sir, I really love your videos; the explanations are really detailed. Thank you :)
Great video, this really laid out the problem space perfectly 👍
Nice explanation.. the way you articulate is good. Thanks, YouTube, for suggesting this video :)
Now I have some basic knowledge about Delta Lake.
Very nicely explained
Excellent video for delta. It helps a lot and I love it.
This is just great to me! He explained the facts just the way I wanted. Thanks for this great presentation.
Excellent video and well done with actually showing the issues with demos!
Superb video, all points are valid, we do face such issues in our projects
The classic Johnny Depp quote, "if nobody sees it, it didn't happen," in the context of Spark writes
Great video, very well explained.
Excellent explanation for ACID Concept.
Could you please make a video on SCD implementation Using Databricks Spark
This is an excellent video .. MUST WATCH ... do you have your own tutorial site I would love to learn from those ... you are too good
Thank you so much.. excellent, excellent explanation.. of the real need for Delta Lake in a data lake
Dear Prashanth, your videos are addictive.. I wonder how you explain things with such deep dives into the concepts so easily.. Thanks for all your explanations.. 🙏 Do you have any plans for a similar explanation of Apache Hudi, or at least a comparison of features?
Hello sir, brilliant video with nicely articulated content. One small suggestion: the practical code flashes by quite fast; it would be good if you could zoom in on it or slow it down. Looking forward to the next part.
Thank you so much ..excellent excellent explanation ..
Great detailed video, thanks for sharing .
Really nice and demystifying . Thank you so much. Subscribed.
Well explained mate
Excellent explanation and clear examples.
Great video. Please keep more such videos coming. I have a question. Is this an accurate summarization: if I use append mode for my DataFrame writes, Spark in a way complies with atomicity using HDFS file committer version 1, but not when I use overwrite mode?
As per the documentation, even append is not atomic. But it just works in practice due to the simplicity of the append operation and the job-level commit protocol.
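The difference between the two modes can be sketched with a toy model in plain Python (this is not actual Spark or Hadoop committer code; the function names and file layout are made up for illustration). Append stages new files and then "commits" them into the table directory, so a mid-job failure leaves existing data intact, while overwrite deletes the old files before writing new ones, so a failure in between can lose data:

```python
import os
import shutil
import tempfile

def append_write(table_dir, new_files, fail_before_commit=False):
    # Stage new files in a scratch directory, then "commit" by moving them
    # into the table. Existing files are never touched, so a failure before
    # the commit leaves the old data fully readable.
    staging = tempfile.mkdtemp()
    for name, data in new_files.items():
        with open(os.path.join(staging, name), "w") as f:
            f.write(data)
    if fail_before_commit:
        shutil.rmtree(staging)
        raise RuntimeError("job failed before commit")
    for name in new_files:
        shutil.move(os.path.join(staging, name), os.path.join(table_dir, name))
    shutil.rmtree(staging, ignore_errors=True)

def overwrite_write(table_dir, new_files, fail_after_delete=False):
    # Delete the old files first, then write the new ones. A failure between
    # the delete and the write loses the old data.
    for name in os.listdir(table_dir):
        os.remove(os.path.join(table_dir, name))
    if fail_after_delete:
        raise RuntimeError("job failed after deleting old data")
    for name, data in new_files.items():
        with open(os.path.join(table_dir, name), "w") as f:
            f.write(data)

table = tempfile.mkdtemp()
overwrite_write(table, {"part-0": "v1"})  # seed the table with initial data

try:
    append_write(table, {"part-1": "v2"}, fail_before_commit=True)
except RuntimeError:
    pass
after_failed_append = sorted(os.listdir(table))
print(after_failed_append)  # ['part-0'] -- old data survives

try:
    overwrite_write(table, {"part-0": "v2"}, fail_after_delete=True)
except RuntimeError:
    pass
after_failed_overwrite = sorted(os.listdir(table))
print(after_failed_overwrite)  # [] -- old data is gone
```

This is why a failed overwrite can leave a data lake table empty, and it is one of the gaps that Delta Lake's transaction log closes.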
Bawa, you rock 🙌
Very Helpful ! Thank you :)
where can we find more videos on delta lake with spark?
Excellent Video !!
Thanks Prashanth for your good videos. I have observed a few things: the DataFrame writer is consistent in a Databricks Spark cluster, whereas it is not consistent in a local Spark cluster. As you said, the older data is not being deleted in the Databricks Spark cluster, but it is in the local Spark cluster. What might be the reason? Keen to know the answer.
Thanks,
Mallik.
Are those Spark Architecture videos deleted :(
Very useful content... Thanks for your effort
Really great Video!
Thank you
Thank you, must-watch content.
Great videos
Could you share the slide deck and the sample code used for testing?
Could you please do complete KSQL tutorial with a project
Hi Sir, thank you for such a nice video. Your videos are really helpful. :)
Would bucketing help with the problem of reading all files in a partition for a filtered query? Kindly clarify.
Eagerly waiting for part 2.
Thank you :)
Bucketing in Spark has a different purpose. I will do a video on bucketing.
A partition's sole purpose is filtering.
Similarly, bucketing is for efficient joining.
Example: joining two tables with 10^6 and 10^7 (10,000,000) rows will check 10^(6+7) = 10^13 combinations.
Now if we bucket both these tables into 10^3, i.e. 1000 buckets, each bucket on the left side is only compared with a single bucket on the right, and each bucket holds close to 10^3 and 10^4 rows respectively for the two tables.
The number of combinations per bucket pair is then 10^3 * 10^4 = 10^7, and multiplying by the number of buckets, 10^3, gives a total of 10^(3+4+3) = 10^10 combinations.
That is 10^3x, or 1000x, fewer combinations, so the join will be that much faster.
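The arithmetic above can be double-checked with a few lines of plain Python (just the comparison counts, not an actual Spark join):

```python
rows_left, rows_right = 10**6, 10**7

# Naive join: every row on the left is compared with every row on the right.
naive = rows_left * rows_right
print(naive)  # 10^13

# Bucketed join: with 1000 buckets on the join key, each left bucket is
# compared only with its matching right bucket.
buckets = 10**3
per_bucket_left = rows_left // buckets    # ~10^3 rows per left bucket
per_bucket_right = rows_right // buckets  # ~10^4 rows per right bucket
bucketed = buckets * per_bucket_left * per_bucket_right
print(bucketed)           # 10^10
print(naive // bucketed)  # 1000x fewer comparisons
```

The same reasoning holds for any bucket count b: the comparison count drops from N*M to roughly N*M/b, assuming rows are spread evenly across buckets.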
But that has never happened so far! Spark writes in overwrite mode have failed many times but never lost data on the Cloudera clusters at any of the companies I've worked for.
Can anyone suggest a few pet projects on Spark?
Doing pet projects / hands-on work is a good way to face all the different problems that happen in real projects, and it also builds confidence.