Advanced Apache Spark Training - Sameer Farooqui (Databricks)
- Published: Apr 9, 2015
- Live Big Data Training from Spark Summit 2015 in New York City.
"Today I'll cover Spark core in depth and get you prepared to use Spark in your own prototypes. We'll start by learning about the big data ecosystem, then jump into RDDs (Resilient Distributed Datasets). Then we'll talk about integrating Spark with resource managers like YARN and Standalone mode. After a peek into some Spark Internals, we touch base upon Accumulators and Broadcast Variables. Finally, we end with Spark Streaming and a technical explanation of how the 100 TB sort competition was won in 2014." - Sameer
Slides:
spark-summit.org/wp-content/u...
Want to learn more about Spark?
Check out my new class, "Exploring Wikipedia with Apache Spark", recorded June 2016:
• "Exploring Wikipedia W...
// About the Presenter //
Sameer Farooqui is a Technology Evangelist at Databricks where he helps promote the adoption of Apache Spark. As a founding member of the training team, he created and taught advanced Spark classes at private clients, meetups and conferences globally.
Follow Sameer on -
Twitter: / blueplastic
LinkedIn: / blueplastic
1:30 Agenda
5:14 History of Spark
27:40 RDD fundamentals
1:20:23 Spark Runtime architecture and resource managers
2:49:24 Memory and Persistence
3:15:30 Serialization
3:19:50 Staging
3:42:00 Shuffle
3:55:00 Broadcast and accumulators
4:31:25 PySpark
4:49:00 Next Gen Shuffle
5:32:00 Spark Streaming
Thanks for the info!
Very helpful breakdown of the video. Thanks.
Thank you for the time offsets
This is really helpful, thanks.
Probably the best Spark video on the Internet right now.
Even to this day
It still is
This is the best tutorial I've seen. I admire you, Sameer, for your patience while you answered all the questions...
I wish they made a sequel in 2020
Can we get the entire deck with all the technical slides?
Sameer, thank you for putting up a professional video that finally explains Spark at the pro level. Much appreciated.
Excellent presentation of core spark, among the best I've ever watched, despite the older version it covers. Presenter's knowledge is very deep and he delivers it very clearly. Excellent job!!
Great work Sameer,
So far the best detailed Spark presentation I have seen online.
Appreciate it a bunch.
Thank you,
Tushar Kale
Best Spark tutorial I have ever come across... Thanks, Sameer Farooqui...
Such a sincere presentation.
Sameer, you have done us all a great service here, appreciate having this posted....very deep coverage of the core architecture, helpful from any number of aspects. Look forward to seeing more in the future as the platform evolves.
The ultimate video I've ever seen on Spark internals!
The best tutorial for Spark, really.
Excellent presentation! It really walks through all aspects in detail. Thanks
complex concepts explained nicely in diagrams, easy to grasp when Sameer explains :)
The best presenter ever. An expert in Spark as well.
Excellent video. Great starting point for Databricks/Spark
Really fantastic presentation, Sameer! Keep posting more videos.
Thanks Sameer!! This is the best video on Spark internals I've come across.
One of the best presentations on Spark.
Thank you, Sameer. I learned a lot about Spark after watching your videos... Will be waiting for your next 5-hour hands-on video at the next Summit.
Thank you so much! I had a lot of my fundamental doubts cleared (as an engineer who likes to know what goes on underneath).
What an introduction and overview. Great session.
This is one of the best free videos ever available in the YouTube community.
Well, it can't compete with 3Blue1Brown's educational videos. Those are on another level.
Excellent presentation Sameer. Thank you.
Very nice video. Best online tutorial for Spark. Sameer has superb presentation skill. Thanks:)
Good tutorial for gaining an in-depth understanding of Spark Core. It also helps with production setup.
Wow, fantastic presentation Sameer! The topics you cover about Spark Core are awesomely explained. Great work!
Excellent content on Spark Architecture
Excellent Session Sameer !
The best ever on spark!!
Awesome Sameer. Thank you.
A Masterpiece, thanks Sameer & Databricks
Awesome explanation. Thanks a lot Sameer.
Joining others, it's a must watch video
Best video on spark
Very good presentation. Thank you.
Best Spark talk ever!!
Very good Presentation. Thank you!!
You are doing a great job, bro... your sessions are very useful... please keep posting.
Just want to share: I came across this video back in 2016, when Spark was mostly a buzzword. I did not understand most of it back then and did not watch it. Now I'm watching it again in 2022. It's a true gem.
Is this video still relevant? I am new to Spark and came across this video. Should I watch it?
Definitely. It will help you understand the core fundamentals of Spark and many other things. Some of the points might be outdated now, but that is not a deal breaker.
Aww, my goal with it was to onboard completely new folks to Spark. Sorry if it was confusing the first time you watched it.
Great stuff Sameer!!
very good presentation
Great video!
link to slides: www.slideshare.net/databricks/spark-summit-east-2015-advdevopsstudentslides?from_action=save
Just amazing..
Thank you so much Mr. Farooqui!
best stuff ever.
Nice detailed explanation
Thank you so much Sameer..
seriously good
Great talk - got a lot out of this.
The best video. Any chance of an updated one with the latest changes? Like support for multiple executors. Is anything else out of date for Spark 2.x?
Best Spark Material.
Thank you this is very helpful!
One of the best detailed Spark sessions. Thank you
Where can I find the slides?
Just loved it.
It's an excellent session.
great vid!
Hi Sameer, you are a good presenter, man. Not so sure I need any sparks or Apache, but well done.
very good presentation :)
Excellent job Sameer.... thank you!!!
As a new spark learner I can't ask for more :) This is real developer talk and help in designing and modelling any initial spark projects. Thanks a ton Sameer!!!
Here are more Spark videos, if you are interested: Spark Interview Questions: ruclips.net/p/PL9sbKmQTkW05mXqnq1vrrT8pCsEa53std
@harjeetkumar4632 Hi bro, I'm new to Spark and want to learn. Can you please share the path? Thank you 😊
Loved it! Thank you, Sameer, for such a nice, very informative presentation!
Thanks, Pravin. Glad you found it helpful!
Great presentation!
Great lecture, Sameer. Can we have access to DevOps labs 101 and 102 too?
Hi Sameer,
You mentioned that in sort-based shuffle, the map side keeps one file handle open. So in the example above, does that mean each file would be 1200 MB (1.2 GB), since the total size of the RDD partitions is 3.6 GB and there are 3 files, one for each map task, adding up to 3.6 GB?
Thanks
Rahul
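To make Rahul's arithmetic concrete, here is a back-of-the-envelope sketch in Python. It assumes the 3.6 GB is the total map output (no map-side combining shrinks it) and that it splits evenly across the 3 map tasks; real partitions are rarely that even. The class and function names are hypothetical helpers, not Spark APIs. The file-count rules follow Spark's hash-based shuffle (one file per map task per reduce partition, without file consolidation) and sort-based shuffle (one data file plus one index file per map task).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShuffleFileEstimate:
    data_files: int              # shuffle data files written by all map tasks
    index_files: int             # index files (sort-based shuffle only)
    avg_mb_per_data_file: float  # assumes output is spread evenly


def hash_shuffle_estimate(num_map_tasks: int, num_reduce_partitions: int,
                          total_map_output_mb: float) -> ShuffleFileEstimate:
    """Hash-based shuffle without file consolidation: each map task opens one
    file per reduce partition, so M x R files are written in total."""
    files = num_map_tasks * num_reduce_partitions
    return ShuffleFileEstimate(files, 0, total_map_output_mb / files)


def sort_shuffle_estimate(num_map_tasks: int,
                          total_map_output_mb: float) -> ShuffleFileEstimate:
    """Sort-based shuffle: each map task writes a single data file, ordered by
    reduce partition id, plus one index file holding the partition offsets.
    The number of reduce partitions does not change the file count."""
    return ShuffleFileEstimate(num_map_tasks, num_map_tasks,
                               total_map_output_mb / num_map_tasks)


if __name__ == "__main__":
    # 3.6 GB of map output spread evenly over 3 map tasks (Rahul's example).
    print(sort_shuffle_estimate(3, 3600))
    # Same output with a hypothetical 4 reduce partitions under hash shuffle.
    print(hash_shuffle_estimate(3, 4, 3600))
```

Under those assumptions the sort-based case gives 3 data files averaging 1200 MB each, which matches the 1.2 GB figure in the question, while the hash-based case with a hypothetical 4 reducers would produce 12 smaller files. The point of the sort-based design is that the file count grows with M rather than M x R.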
Sameer, as a beginner I found this talk very useful, and by the end of it I was confident talking to people about Spark. BTW, I loved the standalone flamingo logo you chose.
Hah! I was able to sneak that in somehow. When making the slides, I was looking for an icon that could visually remind the students of Standalone mode... so I searched Google Images for "standalone" and found that flamingo standing alone on one leg...
@blueplasticvideos How can I download the slides?
The link is not working.
Fantastic
Sameer, you are awesome... very good presentation. Thanks, bro.
Of course.
It would be great if you could share link to the labs.
In YARN client or cluster mode, does one executor per application per node hold true, as in Spark Standalone?
Hi Sameer,
Can I get access to those labs to play with? Maybe just the DevOps notebooks.
Could someone post link to the slides as it is no longer available? Thank you.
I have a question: on what basis are the partitions of an RDD decided?
Where can I get the latest Spark Summit 2019 videos?
Really great explanation of Spark Core. I've followed your Hadoop tutorials as well; this one seems to be the best one (an improved one). Your voice is very clear, Sameer.
Can you share the link to his Hadoop tutorials?
ruclips.net/video/ziqx2hJY8Hg/видео.html
Can anyone tell me how to use a notebook with my local Apache Spark instead of the command line (shell)?
about cluster mode (standalone mode) @1:39:12
Note that it is now possible for a worker to spawn multiple executors for the same application, in standalone mode. See PR github.com/apache/spark/pull/731
Is the code and data available from this session?
Session starts at @5:20
Do you provide online training for Hadoop?
Great presentation Sameer. Thanks to you and your team for putting it together. It really helped me solidify some fuzzy concepts. I'm looking forward to more in depth learning with the goal of becoming a contributor. :)
Great work Sameer, the depth and clarity with which you explain are just outstanding. Could you please help me with the URL for the PPTs? I am not able to find it at the link in the description.
Great
great
Wow, I jumped straight to the part I was looking for, 4:47:30, which is benchmarking. What are the odds :D
Today Kubernetes has become the go-to cluster manager for Spark cluster computing. Correct me if I am wrong.
Hi, this is one of the best presentations about Spark. One question: Spark has evolved a lot since then. Are these concepts still relevant today? Is any content in this video changed or obsolete? Can anyone tell me, please?
Thanks! I'm surprised to see that this video is still being watched since it's 8 years old 😳 I would say that like 75% of it is still accurate. Even if it's not accurate, watch it for the fancy graphics and jokes man.
about cluster mode (local mode) @1:29:44
Does anyone have resources or source code for a deep-learning-based RLScheduler for single-node task scheduling?
Can anyone share the slides?
Is there a way to add subtitles?
Who disliked this video? This is the spark bible. Thanks Sameer
haters gonna hate.
Nice session, thanks.
Jokes apart, the water level in the glass was not going down even though you drank from it multiple times... lol. Also, not a single smile on your face... so serious, lol. Anyway, it was a great session, Sameer.
Andy Kaufman never smiled either.
The slides link doesn't work?
Hi Sameer, I like your way of presenting and the useful information. Personally, I learned a lot from your presentation. I hope that one day you will agree to work with me as a team to publish a paper. I have a Ph.D. in data replication; I would like to see more videos, and I have some misunderstandings about RDDs. Could you please point me to a link?
Dr. Ziyad Al-Khinalie No answer; nobody cares. This is the hypocrisy of tech gurus.
What is the hardware configuration of each worker node? How should we decide that?
Typically in production Spark deployments I'm seeing machines with like 30-60 GB of RAM and maybe 2 TB SSDs. Each Executor JVM is typically ~30 GB and the Driver JVM is also around 30 GB. For the Worker JVM or Spark Master JVM (in Standalone mode) maybe 4 GB of RAM for each should be fine. You'll want to experiment with different hardware profiles for your specific workloads and use case though.
If I am running spark local mode, should the number of cores be equal to number of logical cpus?
local[*]
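The reply local[*] is the short answer: a local[*] master runs one worker thread per logical core on the machine, while local[K] uses exactly K threads and plain local uses a single thread. The Python sketch below restates those rules. local_worker_threads is a hypothetical helper, not part of Spark, and os.cpu_count() only approximates what the JVM reports as available processors (container or CPU-affinity limits can make them differ).

```python
import os
import re

# Local-mode master URLs: "local", "local[K]", or "local[*]".
_LOCAL_N = re.compile(r"local\[(\*|[0-9]+)\]")


def local_worker_threads(master: str) -> int:
    """Hypothetical helper: number of worker threads a local-mode master URL
    requests, following the documented meanings of local, local[K], local[*]."""
    if master == "local":
        return 1  # a single worker thread, no parallelism
    match = _LOCAL_N.fullmatch(master)
    if match is None:
        raise ValueError(f"not a local[K] or local[*] master URL: {master!r}")
    if match.group(1) == "*":
        # local[*] uses as many threads as there are logical cores.
        # os.cpu_count() counts logical CPUs and may return None if unknown.
        return os.cpu_count() or 1
    threads = int(match.group(1))
    if threads <= 0:
        raise ValueError("a local master needs at least one thread")
    return threads


if __name__ == "__main__":
    for url in ("local", "local[4]", "local[*]"):
        print(url, "->", local_worker_threads(url))
```

In PySpark the same string is passed to SparkSession.builder.master("local[*]"); using local[*] instead of a hard-coded number keeps the thread count matched to whatever machine the job runs on.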
Can you please elaborate on a scenario where shuffling data is good?
before you play poker in a data center.
about cluster mode @1:21:06