Don't forget to hit that Subscribe Button for more amazing content :)
Get ready with project.
Please also upload a GCP data engineering end-to-end project
You deserve much more than 1000 buddy. I learn so much from your channel
Let's get that.
Are you from Gujarat?
Bro cooked. From the history, to the technical design and demo! Hats off!
Thanks for actually explaining spark, instead of making general comments or assuming we know the basics. Great video. Thumbs up, subscribed.
Agreed. I watched like 5 videos prior to this one that made wild assumptions about what I knew
00:00 Big Data and Hadoop
01:25 Hadoop processed data in batches and was slower due to disk storage; Apache Spark solves these limitations.
02:43 Apache Spark is a fast and efficient data processing framework.
04:11 Apache Spark is a powerful tool for processing and analyzing Big Data.
05:42 Apache Spark application consists of a driver process and executor processes.
07:02 Spark data frames are distributed across multiple computers and require partitioning for parallel execution.
08:24 The Spark transformation block gives the final output.
09:40 Spark lets you register DataFrames as views and run SQL queries on top of them (see the sketch below).
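For anyone who wants to try the 07:02 and 09:40 points hands-on, here's a minimal PySpark sketch. The file name tips.csv and the day/tip columns are placeholders I'm assuming, not taken from the video:

```python
from pyspark.sql import SparkSession

# Start a local session -- this is the driver process from 05:42
spark = SparkSession.builder.appName("spark-in-10-min").getOrCreate()

# "tips.csv" is a placeholder file name
tips = spark.read.csv("tips.csv", header=True, inferSchema=True)

# 07:02 -- explicitly repartition so work can run in parallel across executors
tips = tips.repartition(4)

# 09:40 -- register the DataFrame as a view and run SQL on top of it
tips.createOrReplaceTempView("tips")
spark.sql("SELECT day, AVG(tip) AS avg_tip FROM tips GROUP BY day").show()
```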
That was an extremely good explanation. Not only explained the theory but also practical examples.
best explanation on spark in 10 minutes. its like feynman explaining physics. excellent job!
I understood the concept clearly within 10 min. Now I have a great understanding and knowledge of Apache Spark. This is the best Spark video I have gone through. It's a clear and top-notch explanation of each of the topics.
I never knew I could recall so much in just under 10min...
Wonderful content and well explained keeping it simple...
Glad you liked it
This was insanely good. Thanks for explaining the basics so clearly. Now I can learn deeper more comfortably.
Such nice content!
What a man you are!
You have covered everything in Spark in just 10 mins. I wonder how you made this video, and the effort you put in to make it is wonderful. Thank you for sharing nice content in such a simple manner!!
I usually steer clear of content titled learn/master/excel X in Y minutes, and I would definitely have done the same had I come across this by myself. I watched it only because my friend shared it with me. Now I feel lucky after watching this, as I could wrap my head around Spark.
Subscribed.
You are doing a fabulous job of making Data analytics so easy for everyone. Thank you so very much. God bless you!
I was waiting for this. Please share an end to end project using Spark.
Yes
Waiting for the same... right from Spark installation locally as well as on a cloud platform
Please upload ASAP.
Yes, if possible can you please also share it using PySpark as well..
Excellent explanation: clear, concise, and straight to the point.
Apache Spark's core concepts explained in such simple language..
Wonderful job 👍👍👍
The best Spark tutorial I have ever gone through. Thanks a lot Darshil.
Wow, thanks!
As many others already said, fantastic and informative video on Spark. Nice context by providing the history of Hadoop. Nice pace too, not too fast, not too slow!
The first very clear video about spark that I have seen.
Saw a bunch of your roadmap videos back in my freshman year, and now back here prepping for my DS internship, thanks !
The job description had Spark/MapReduce, which brings me here : )
I tried to replicate the code block at 10:13.
Can we use tips.filter(filterA & filterB)? This applies both filters at the same time and does not create intermediate results, whereas tips.filter(filterA) will create some DataFrame, which will then be filtered by filterB.
Please correct me if I'm wrong, thanks!
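For anyone curious about this question: both forms are lazy transformations, and Catalyst typically collapses chained filters into a single predicate, so neither version should materialize an intermediate DataFrame. A minimal sketch to check it yourself; the column names total_bill and tip are assumptions from the usual tips dataset, not confirmed from the video:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("filter-check").getOrCreate()
tips = spark.read.csv("tips.csv", header=True, inferSchema=True)  # placeholder path

# Hypothetical predicates, just for illustration
filterA = F.col("total_bill") > 10
filterB = F.col("tip") > 2

combined = tips.filter(filterA & filterB)        # one combined predicate
chained = tips.filter(filterA).filter(filterB)   # two lazy transformations

# Compare the physical plans -- on recent Spark versions they usually
# show the same single, collapsed Filter node
combined.explain()
chained.explain()
```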
Amazing, you explained everything in detail with examples. Best video on YouTube to know about Spark.👏
What a video! I really understood Apache Spark in a way that I couldn't at my university.
Thanks for this. Currently reading the Spark Definitive Guide. Looking forward to the full tutorial
Coming soon!
Thanks For Explaining in 10 Min 🙌
One of the best videos! You really explained it in a very precise and easy way. Love it!
crystal clear explanation! loved it❤
Thanks for this video , much informative and easy to understand using the examples you gave.
I'm just getting started with creating a group CNN project with friends, and we are dealing with a huge dataset of MRI scans, so I was thinking about platforms that could deal with lots of data without having to deplete my disk lol. Thank you so much for breaking down how Apache Spark works compared to Hadoop, I really appreciate it! 😊
I understood the software really quickly, thanks man
You nailed it man! Amazing information that I am using for my DE interviews
Very good intro to Spark. I've started my data science journey and it really helps.
Superb man.. didn't waste any time.. great explanation..
I hadn't understood Apache Spark since my undergraduate days, until I found this gem.
Well explained, and the presentation is good.
amazing explanation!! Thank you!
Thank you very much; it's a very nice primer to refresh the concepts. Thank you for your contributions 👍
You explained the content simple and clear. Thank you for this video.
So in just 10 mins, I got to know about Big Data, Hadoop, Spark, PySpark, and how I can write code in PySpark.
Wow, that's what a good explanation should be like!
Brief and informative . Thanks 👍
An excellent video on Apache Spark. Covered almost everything. Very helpful for beginners like me.
Alright, but need a full tutorial on this topic, if you can.
Working on it!
@DarshilParmar thank you, please upload it ASAP
@DarshilParmar please upload
@DarshilParmar this is what heroes do. Kudos to you Darshil
Thank you so much for this explanation, you've outlined it quite clearly before I've even had any experience using Spark, so thank you! If you could slow down your explanation a bit, though, that would be helpful
It's a 10-min series; you can check out my courses for a more in-depth guide
Wonderfully explained in just 10 mins.
Thank you!! So helpful
Clear and concise explanation
Impressive explanation of spark. Making it easy for every beginner to understand.
Glad it was helpful!
Hello Darshil,
This is great content! A little bit too much information, hehe. Now it needs to be digested :)
Amazingly concise, detailed explanation with great editing. Such a great way of presenting a hard topic in an easy manner. Love your comparisons with teamwork, puzzles, etc. So impressed. Big thumbs up and subscribe from me. Eager to see your other videos. Thanks!
Great introduction. Thank you so much.
Good job Darshil. Appreciate the work.
Super explanation bro, I got many answers in one video 🥳🥳
You nailed it Bro in just 10 mins 😊
Nice Video, Thank You.
Thank you for this video, I liked it: simple, clear, and short! Perfect :)
Superb one! Can we expect a full tutorial on Spark!?
Yes, coming soon!
@@DarshilParmar Ah nice then 😍
Really very nice explanations..
Understood the video very well, without any prior knowledge of Apache Spark
Glad it was helpful
To the point, quick, simple and comprehensive knowledge sharing!
I appreciate that!
Very nice video. Thank you!
Thank you, I got the basics
Excellent video Darshil. Clear and concise! Subscribed!
Wonderful summary!
Fantastic explanation… 👏👏 the way you take your audience through the flow of explaining these concepts is very effective👌
Thanks a lot 😊
Nice job - short, to the point, great info. I really appreciate you sharing this. Will like and subscribe.
Nice video
Great content buddy 💯💯 Any specific resources to go with Spark? I am reading the Definitive Guide and find it a bit overwhelming. Any course??
Very well explained! Thank you!
Waiting for a full Apache Spark course from you
Darshil Sir, I had a query regarding the memory management concept of Spark.
As per my understanding, Spark uses its execution memory to store intermediate data, and it shares that memory with storage memory too, if needed. It can also utilize off-heap memory for storing extra data.
1) Does it access the off-heap memory after filling up storage memory?
2) What if it fills up off-heap memory too? Does it wait till GC clears up the on-heap part, or spill the extra data to disk?
Now, in a wide transformation, Spark either sends the data back to disk or transfers it over the network, say for a join operation.
Is that part about sending data back to disk the same as above, where Spark has the option to spill data to disk on filling up on-heap memory?
Please do clarify my above queries, sir. I feel like breaking my head as I couldn't make headway through it yet, even after referring to a few materials.
In Spark, memory management involves both on-heap memory and off-heap memory. Let me address your queries regarding Spark's memory management:
1. Off-heap memory usage: By default, Spark primarily uses on-heap memory for storing data and execution metadata. However, Spark can also utilize off-heap memory for certain purposes, such as caching and data serialization. Off-heap memory is typically used when the data size exceeds the available on-heap memory or when explicit off-heap memory is configured. It is not used as an overflow for storage memory.
2. Filling up off-heap memory: If off-heap memory fills up, Spark does not automatically spill the data to disk. Instead, it relies on garbage collection (GC) to free up memory. Spark's memory management relies on the JVM's garbage collector to reclaim memory when it becomes necessary. When off-heap memory is full, Spark waits for the JVM's garbage collector to reclaim memory by cleaning up unused objects. Therefore, if off-heap memory fills up, Spark may experience performance degradation or even out-of-memory errors if the garbage collector cannot free enough memory.
Thanks,
ChatGPT
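One note on the reply above: the off-heap pool that Spark itself manages lives outside the JVM heap, so the garbage collector does not reclaim it; Spark's memory manager tracks it directly, and it is only used at all when explicitly enabled. A minimal sketch of turning it on; the sizes here are illustrative examples, not recommendations:

```python
from pyspark.sql import SparkSession

# Off-heap execution/storage memory is disabled by default; when enabled,
# spark.memory.offHeap.size must also be set. The values below are examples.
spark = (
    SparkSession.builder
    .appName("offheap-demo")
    .config("spark.memory.offHeap.enabled", "true")
    .config("spark.memory.offHeap.size", "2g")
    .getOrCreate()
)
```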
really good explanation
Very well explained, thank you very much
As simple as that.. Liked
This explanation is very good!
Thank u
you are welcome mate!
very nicely explained
Can you make one of these vids on LakeSail's pysail?
Amazing content, keep up the good work, and thank you for the brilliant presentation. You really present topics precisely and in a simple-to-understand way.
Awesome video mate! well done.
A very very good video. Thanks, you are doing a really great job!
Just Amazing😇Thank you
Great video bro
Wonderful video, you explained everything perfectly
Nice Explanation, Thank you
Thank you sir 👍
Very good video
Really good content .
Very well explained😊
Great video! Thank you
Best tutorial ❤❤ all in one
Super🎉
Waiting for full tutorial
Very soon
Excellent Explanation...
Really productive video.
Nice video!
Very brief and informative video
Nice explanation.. please do a series on Spark.
I have a course on Spark, please check description
This is a great explanation
Nice overview.
So is pandas similar to Spark, where pandas is more suitable for single-node data processing vs. Spark for distributed data processing?
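Roughly, yes: pandas keeps the whole dataset in one machine's memory, while Spark partitions it across executors, and the APIs even look similar. A small sketch of the same aggregation in both; tips.csv and the day/tip columns are placeholder assumptions:

```python
import pandas as pd
from pyspark.sql import SparkSession

# pandas: single-node, whole dataset loaded into local memory
pdf = pd.read_csv("tips.csv")                     # placeholder path
print(pdf.groupby("day")["tip"].mean())

# PySpark: same logic, but the data is partitioned across executors
spark = SparkSession.builder.appName("pandas-vs-spark").getOrCreate()
sdf = spark.read.csv("tips.csv", header=True, inferSchema=True)
sdf.groupBy("day").avg("tip").show()
```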
Darshil, I want to learn data engineering from scratch. I don't know anything about these things, so where do I start? Which course should I take?

My Python & SQL for Data Engineering course is a good place to start: learn.datawithdarshil.com/
Nicely presented and explained.
You explained so many things in 10 minutes 🫡🫡🫡
Thank you man
such a clear and crisp video
Thanks a lot Darshil for this
Please share an end to end project using Spark.
Thank you, I will