Advanced Apache Spark Training - Sameer Farooqui (Databricks)

  • Published: Apr 9, 2015
  • Live Big Data Training from Spark Summit 2015 in New York City.
    "Today I'll cover Spark core in depth and get you prepared to use Spark in your own prototypes. We'll start by learning about the big data ecosystem, then jump into RDDs (Resilient Distributed Datasets). Then we'll talk about integrating Spark with resource managers like YARN and Standalone mode. After a peek into some Spark internals, we'll touch upon Accumulators and Broadcast Variables. Finally, we'll end with Spark Streaming and a technical explanation of how the 100 TB sort competition was won in 2014." - Sameer
    Slides:
    spark-summit.org/wp-content/u...
    Want to learn more about Spark?
    Check out my new class, "Exploring Wikipedia with Apache Spark", recorded June 2016:
    • "Exploring Wikipedia W...
    // About the Presenter //
    Sameer Farooqui is a Technology Evangelist at Databricks, where he helps promote the adoption of Apache Spark. As a founding member of the training team, he created and taught advanced Spark classes for private clients and at meetups and conferences around the world.
    Follow Sameer on:
    Twitter: / blueplastic
    LinkedIn: / blueplastic
  • Science

Comments • 195

  • @MrTulufan
    5 years ago +361

    1:30 Agenda
    5:14 History of Spark
    27:40 RDD fundamentals
    1:20:23 Spark Runtime architecture and resource managers
    2:49:24 Memory and Persistence
    3:15:30 Serialization
    3:19:50 Staging
    3:42:00 Shuffle
    3:55:00 Broadcast and accumulators
    4:31:25 PySpark
    4:49:00 Next Gen Shuffle
    5:32:00 Spark Streaming

  • @skipperkongen
    9 years ago +92

    Probably the best Spark video on the Internet right now.

  • @arunbm123
    8 years ago +14

    These are the best tutorials I've seen.. I admire you, Sameer, for your patience while you answered all the questions...

  • @pats8589
    4 years ago +68

    I wish they made a sequel in 2020

  • @christianlira1259
    5 years ago +5

    Sameer, thank you for putting up a professional video that finally explains Spark at the pro level. Much appreciated.

  • @singalong9962
    5 years ago +3

    Excellent presentation of core spark, among the best I've ever watched, despite the older version it covers. Presenter's knowledge is very deep and he delivers it very clearly. Excellent job!!

  • @TusharKale9
    9 years ago +4

    Great work Sameer,
    So far the best detailed Spark presentation I have seen online.
    Appreciate a bunch.
    Thank you,
    Tushar Kale

  • @jahartyagi9695
    7 years ago +2

    Best Spark tutorial I have ever come across.... Thanks Sameer Farooqui....

  • @dharmendrabhojwani
    8 years ago +14

    such a sincere presentation.

  • @craigholley3287
    8 years ago +3

    Sameer, you have done us all a great service here, appreciate having this posted....very deep coverage of the core architecture, helpful from any number of aspects. Look forward to seeing more in the future as the platform evolves.

  • @SandeshMendan
    6 years ago +2

    Ultimate video ever seen on Spark internals!

  • @yjwoo1131
    6 years ago +1

    The best tutorials for spark, really.

  • @tomkoshy
    9 years ago +1

    Excellent presentation! It really walks through all aspects in detail. thanks

  • @arunsundar3739
    5 years ago +1

    complex concepts explained nicely in diagrams, easy to grasp when Sameer explains :)

  • @arada123
    7 years ago +2

    the best presenter ever. Expert in spark as well.

  • @lackshubalasubramaniam7311
    5 years ago

    Excellent video. Great starting point for Databricks/Spark

  • @rajeshsurpur
    6 years ago +1

    Really fantastic presentation, Sameer! Keep posting more videos.

  • @com2ram
    4 years ago

    Thanks Sameer!! This is the best video on Spark internals I've come across.

  • @HolmesPatrick
    9 years ago

    One of the best presentations on Spark

  • @jubinsoni4694
    5 years ago +1

    Thank you Sameer. I learned a lot about Spark after watching your videos.... Will be waiting for your next 5-hour hands-on video at the next Summit

  • @sarnathk1946
    5 years ago +2

    Thank you so much! I had a lot of my fundamental doubts cleared (as an engineer who likes to know what goes on underneath)

  • @kiraninam
    4 years ago

    What an introduction and overview. Great session

  • @javaidmir9831
    1 year ago

    This is one of the best free videos ever made available to the YouTube community.

    • @blueplasticvideos
      1 year ago

      Well, it can't compete with 3Blue1Brown's educational videos. Those are on another level.

  • @SohelKhan-tr6jr
    9 years ago

    Excellent presentation Sameer. Thank you.

  • @AbhaySingh-ir4fy
    7 years ago +1

    Very nice video. Best online tutorial for Spark. Sameer has superb presentation skills. Thanks :)

  • @provashdowari2926
    7 years ago

    Good tutorial for gaining in-depth knowledge of Spark core. It also helps with production setup.

  • @bpriorb
    9 years ago +16

    Wow, fantastic presentation Sameer! The topics you cover about Spark Core are awesomely explained. Great work!

  • @aiSuresh
    5 years ago

    Excellent content on Spark Architecture

  • @krishna079
    7 years ago

    Excellent Session Sameer !

  • @hafizca
    8 years ago +1

    The best ever on spark!!

  • @NB-xc6qq
    8 years ago +1

    Awesome Sameer. Thank you.

  • @debmalyapanday
    3 years ago

    A Masterpiece, thanks Sameer & Databricks

  • @nirmalagra
    6 years ago

    Awesome explanation. Thanks a lot Sameer.

  • @aleksandrivanov4345
    8 years ago +10

    Joining the others: it's a must-watch video

  • @kaleemahmadkhan5764
    4 years ago

    Best video on spark

  • @uma_mataji
    8 years ago +1

    Very good presentation.. Thank you.

  • @viren8577
    4 years ago

    best spark talk ever !!

  • @vinodpatil3497
    7 years ago

    Very good Presentation. Thank you!!

  • @virenderdeswal
    7 years ago

    you are doing a great job Bro.....your sessions are very useful...please keep posting

  • @adrishpal8713
    1 year ago +1

    Just want to share: I came across this video back in 2016, when Spark was mostly a buzzword. I did not understand most of it back then and did not watch it. Now I'm watching it again in 2022. It's a true gem.

    • @PatelMahendra
      1 year ago

      Is this video still relevant? I am new to Spark and came across this video. Should I watch it?

    • @adrishpal8713
      1 year ago +2

      Definitely. It will help you understand the core fundamentals of Spark and many other things. Some of the points might be irrelevant now, but that is not a deal breaker.

    • @blueplasticvideos
      1 year ago +1

      Aww, my goal with it was to onboard complete newcomers to Spark. Sorry if it was confusing the first time you watched it.

  • @sureshsindhwani6317
    7 years ago

    Great stuff Sameer!!

  • @mdfurqan3487
    8 years ago +1

    very good presentation

  • @dionwang
    8 years ago +1

    Great video!

  • @AkeelAhamedInsights
    4 years ago +4

    link to slides: www.slideshare.net/databricks/spark-summit-east-2015-advdevopsstudentslides?from_action=save

  • @Yash94888
    8 years ago +2

    Just amazing..

  • @BlackHermit
    4 years ago

    Thank you so much Mr. Farooqui!

  • @meditating010
    8 years ago +3

    best stuff ever.

  • @sarrae100
    6 years ago

    Nice detailed explanation

  • @girishjangannavar7827
    5 years ago

    Thank you so much Sameer..

  • @smagadi124
    8 years ago +2

    seriously good

  • @thomasswann1800
    9 years ago

    Great talk - got a lot out of this.

  • @petrnovak9271
    7 years ago

    The best video. Any chance of an updated one with the latest changes, like support for multiple executors? Is anything else out of date for Spark 2.x?

  • @rmyou
    6 years ago

    Best Spark Material.

  • @aidenzhang5959
    1 year ago

    Thank you this is very helpful!

  • @NishaKumari-op2ek
    4 years ago +1

    One of the best, most detailed Spark sessions. Thank you.
    Where can I find the slides?

  • @sasmitapanigrahi3520
    8 years ago +2

    Just loved it!

  • @2007selvam
    7 years ago

    It is an excellent session.

  • @hazdazzler
    9 years ago

    great vid!

  • @Nerky7654
    9 years ago +1

    Hi Sameer you are a good presenter man, not so sure i need any sparks or Apache but well done

  • @maouhoubmouchtaq1688
    7 years ago

    very good presentation :)

  • @madhu.badiginchala
    7 years ago

    Excellent job Sameer.... thank you!!!

  • @surendratiwari7980
    8 years ago +9

    As a new Spark learner I can't ask for more :) This is real developer talk and helps in designing and modelling any initial Spark project. Thanks a ton Sameer!!!

    • @harjeetkumar4632
      5 years ago +2

      Here are more Spark videos, if you are interested: Spark Interview Questions: ruclips.net/p/PL9sbKmQTkW05mXqnq1vrrT8pCsEa53std

    • @passions9730
      2 years ago

      @@harjeetkumar4632 Hi bro, I'm a newbie to Spark and want to learn. Can you please share the path? Thank you 😊

  • @pravinpathak7934
    6 years ago

    Loved it! Thank you Sameer for such a nice, very very informative presentation!!

  • @sribaddela
    9 years ago

    Great presentation!

    • @rishiagr
      9 years ago

      Great lecture, Sameer. Can we have access to DevOps labs 101 and 102 too??

  • @rahulgulati890
    8 years ago +1

    Hi Sameer,
    You mentioned that in sort-based shuffle the map side keeps one file handle open. So in the example above, does that mean each file would be 1200 MB (1.2 GB), since the total size of the RDD partitions is 3.6 GB and there are 3 files, one per map task, adding up to 3.6 GB?
    Thanks
    Rahul
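A simplified sketch of the sort-based shuffle write discussed in the question above, in plain Python rather than Spark's actual Scala implementation: each map task writes a single data file in which its output records are grouped by reduce-partition ID, plus an index of byte offsets telling each reducer which slice to fetch. Assuming each of three map tasks emits about 1.2 GB of shuffle output, each would write one data file of roughly 1.2 GB (ignoring compression and serialization overhead). The helper names `write_map_output` and `read_partition`, the use of `pickle`, and in-memory `bytes` standing in for files are all illustrative assumptions.

```python
import pickle
from typing import Callable, List, Tuple

Record = Tuple[int, str]


def write_map_output(records: List[Record], num_partitions: int,
                     partitioner: Callable[[int, int], int]) -> Tuple[bytes, List[int]]:
    """Simulate one map task writing a single shuffle "file".

    Records are grouped by target reduce partition (the effect of sorting by
    partition ID). offsets[i] is where partition i starts, and
    offsets[num_partitions] == len(data), playing the role of the index file.
    """
    buckets: List[List[Record]] = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[partitioner(key, num_partitions)].append((key, value))

    data = bytearray()
    offsets = [0]
    for bucket in buckets:
        if bucket:  # an empty partition occupies zero bytes in the data file
            data += pickle.dumps(bucket)
        offsets.append(len(data))
    return bytes(data), offsets


def read_partition(data: bytes, offsets: List[int], reduce_id: int) -> List[Record]:
    """Simulate a reducer fetching only its own slice of one map task's output."""
    start, end = offsets[reduce_id], offsets[reduce_id + 1]
    return pickle.loads(data[start:end]) if end > start else []


if __name__ == "__main__":
    records = [(i, f"v{i}") for i in range(10)]
    data, offsets = write_map_output(records, 3, lambda k, n: k % n)
    for r in range(3):
        print(r, read_partition(data, offsets, r))
```

Real Spark also spills sorted runs to disk when memory fills up and merges them into the final data file, which this sketch leaves out.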

  • @surajmon123
    8 years ago +2

    Sameer, as a beginner I found this talk very useful, and by the end of it I am confident talking to people about Spark. BTW, I loved the Standalone flamingo logo you chose

    • @blueplasticvideos
      7 years ago +3

      Hah! I was able to somehow sneak that in. When making the slides, I was looking for an icon that could visually remind the students of Standalone mode... so I searched Google Images for "standalone" and found that flamingo standing alone on one leg...

    • @madhavkondapalli785
      4 years ago

      @@blueplasticvideos How can I download the slides?
      The link is not working.

  • @TalhaAsifRahim
    8 years ago

    Fantastic

  • @vsandeep06
    4 years ago +1

    Sameer, you are awesome ... very good presentation. Thanks bro.

  • @rpiitkgpian
    6 years ago +1

    It would be great if you could share link to the labs.

  • @soravgulati100
    7 years ago

    In YARN client or cluster mode, does one executor per application per node hold true, as in Spark Standalone?

  • @arastuece04
    7 years ago

    Hi Sameer,
    Can I get access to those labs to play with? Maybe just the DevOps notebooks

  • @tadastadux
    4 years ago

    Could someone post link to the slides as it is no longer available? Thank you.

  • @mdfurqan3487
    8 years ago

    I have a question: on what basis are the partitions of an RDD decided?
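For the partitioning question above, one part of the answer can be sketched. Input RDDs usually get one partition per input split (for HDFS files, roughly one per block), while key-value RDDs shuffled with Spark's default `HashPartitioner` place each record in partition `nonNegativeMod(key.hashCode, numPartitions)`, with null keys going to partition 0. The Python below mimics that JVM rule for string and int keys by reimplementing Java's `String.hashCode`; the function names are hypothetical, and PySpark's own `partitionBy` uses a Python-side `portable_hash` instead of this JVM hash.

```python
def java_string_hashcode(s: str) -> int:
    """Java's String.hashCode: s[0]*31^(n-1) + ... + s[n-1] in 32-bit signed ints.

    Assumes characters in the Basic Multilingual Plane (one UTF-16 unit each).
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key, num_partitions: int) -> int:
    """Hypothetical helper mimicking HashPartitioner.getPartition for str/int keys."""
    if key is None:
        return 0  # HashPartitioner sends null keys to partition 0
    if isinstance(key, str):
        code = java_string_hashcode(key)
    elif isinstance(key, int):
        code = key  # Java Integer.hashCode is the value itself (32-bit range assumed)
    else:
        raise TypeError("this sketch only handles str, int, or None keys")
    # Python's % with a positive modulus is already non-negative,
    # which is what Spark's Utils.nonNegativeMod computes on the JVM.
    return code % num_partitions
```

For example, `"Aa"` and `"BB"` share the Java hash code 2112, so `hash_partition` always puts them in the same partition no matter how many partitions there are; equal keys landing together is exactly what a shuffle needs for `reduceByKey` or `join`.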

  • @rajturani4721
    4 years ago

    Where can I find the latest Spark Summit 2019 videos?

  • @uday264
    8 years ago +5

    Really great explanation of Spark Core.. I've followed your Hadoop tutorials as well; this one seems to be the best (an improved one). Your voice is very clear, Sameer

    • @syedtahaaziz240
      7 years ago

      Can you share the link to his Hadoop tutorials?

    • @PMestry007
      6 years ago

      ruclips.net/video/ziqx2hJY8Hg/видео.html

    • @harjeetkumar4632
      5 years ago

      Here are more videos, if you are interested: Spark Interview Questions: ruclips.net/p/PL9sbKmQTkW05mXqnq1vrrT8pCsEa53std

  • @shakkur07
    7 years ago

    Can anyone tell me how to use a notebook with my local Apache Spark instead of the command line (shell)?

  • @user-ts6fn3lj9v
    6 years ago

    about cluster mode (standalone mode) @1:39:12

  • @maxdemoulin5245
    9 years ago +3

    Note that it is now possible for a worker to spawn multiple executors for the same application, in standalone mode. See PR github.com/apache/spark/pull/731
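To make the note above concrete, here is a simplified model of how many executors one standalone Worker can host for a single application. Per the Spark configuration docs, if `spark.executor.cores` is set in standalone mode, an application can run multiple executors on the same Worker as long as that Worker has enough cores; if it is unset, each executor takes all available cores on the Worker, so there is one executor per application per Worker. This is an assumption-laden sketch, not the Master's actual scheduling code: the function name and GB units are hypothetical, and it ignores `spark.cores.max`, the spread-out placement policy, and resources already used by other applications.

```python
from typing import Optional


def executors_per_worker(worker_cores: int, worker_mem_gb: float,
                         executor_mem_gb: float,
                         executor_cores: Optional[int] = None) -> int:
    """Simplified count of executors one standalone Worker could host for one app.

    executor_cores models spark.executor.cores; executor_mem_gb models
    spark.executor.memory. Both limits (cores and memory) must be satisfied.
    """
    if worker_cores < 1 or executor_mem_gb > worker_mem_gb:
        return 0  # not even one executor fits on this Worker
    if executor_cores is None:
        # spark.executor.cores unset: one executor grabs all the Worker's cores.
        return 1
    by_cores = worker_cores // executor_cores
    by_memory = int(worker_mem_gb // executor_mem_gb)
    return min(by_cores, by_memory)
```

On a hypothetical 16-core, 64 GB Worker, `executors_per_worker(16, 64, 8, executor_cores=4)` is limited by cores to 4, while `executors_per_worker(16, 64, 30, executor_cores=4)` is limited by memory to 2, so it is worth checking both `spark.executor.cores` and `spark.executor.memory` when sizing.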

  • @hatrixyesa
    7 years ago

    Is the code and data available from this session?

  • @user-sw9kd9pv4n
    7 years ago +1

    Session starts at @5:20

  • @JavaHomeCloud
    9 years ago

    Do you provide online training for hadoop?

  • @JohnMcCullough97
    9 years ago +1

    Great presentation Sameer. Thanks to you and your team for putting it together. It really helped me solidify some fuzzy concepts. I'm looking forward to more in depth learning with the goal of becoming a contributor. :)

  • @vandanac3098
    1 year ago

    Great work Sameer, the depth and clarity with which you explain is just outstanding. Could you please help me with the URL for the PPTs? I am not able to find it at the link in the description.

  • @SujeetKumarSinghlive
    2 years ago

    Great

  • @bernardndayishimiye2482
    5 years ago

    great

  • @lols1503
    6 years ago

    Wow, I jumped straight to the part I was looking for, 4:47:30, which is benchmarking. What are the odds :D

  • @avsbharadwaj8190
    3 years ago +1

    Today Kubernetes has become the go-to cluster manager for Spark cluster computing. Correct me if I am wrong.

  • @seenu0104
    1 year ago +2

    Hi.. this is one of the best presentations about Spark. One question: Spark has evolved a lot since then. Are these concepts still relevant today? Is any content in this video changed or obsolete? Can anyone tell me, please?

    • @blueplasticvideos
      1 year ago

      Thanks! I'm surprised to see that this video is still being watched since it's 8 years old 😳 I would say that like 75% of it is still accurate. Even if it's not accurate, watch it for the fancy graphics and jokes man.

  • @user-ts6fn3lj9v
    6 years ago

    about cluster mode (local mode) @1:29:44

  • @1UniverseGames
    2 years ago

    Does anyone have resources or source code for a deep-learning-based RLScheduler for single-node task scheduling?

  • @watchmanling
    5 years ago +2

    Can anyone share the slides?

  • @Luchox5006
    3 days ago

    Is there a way to add subtitles?

  • @DeepakRajak05
    5 years ago +1

    Who disliked this video? This is the spark bible. Thanks Sameer

  • @user-co8oc1rm5w
    3 years ago +1

    Nice session, thanks.
    Jokes apart, the water level in the glass was not going down even though you drank multiple times...lol. Also, not a single time did I find a smile on your face...so serious. lol...anyway, it was a great session, Sameer.

  • @CyberAussie
    4 years ago

    The slides link doesn't work?

  • @dr.ziyadal-khinalie4326
    7 years ago

    Hi Sameer, I like your way of presenting and the useful information. For my part, I learned a lot from your presentation. I also hope that one day you would agree to work together as a team to publish a paper. I have a Ph.D. in data replication, I would like to see more videos, and I still misunderstand some things about RDDs. So, could you please recommend a link?

    • @danielt.p.3692
      6 years ago

      Dr. Ziyad Al-Khinalie No answer; nobody cares. This is the hypocrisy of tech gurus.

  • @dharmendrabhojwani
    8 years ago

    What is the hardware configuration of each worker node? How should we decide that?

    • @blueplasticvideos
      7 years ago +1

      Typically in production Spark deployments I'm seeing machines with like 30-60 GB of RAM and maybe 2 TB SSDs. Each Executor JVM is typically ~30 GB and the Driver JVM is also around 30 GB. For the Worker JVM or Spark Master JVM (in Standalone mode) maybe 4 GB of RAM for each should be fine. You'll want to experiment with different hardware profiles for your specific workloads and use case though.

  • @sharu8080
    8 years ago

    If I am running Spark in local mode, should the number of cores equal the number of logical CPUs?

  • @jumperankur
    3 years ago

    Can you please elaborate on a scenario where shuffling data is good?

  • @user-ts6fn3lj9v
    6 years ago

    about cluster mode @1:21:06