A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets - Jules Damji

  • Published: 26 Jun 2024
  • "Of all the developers' delights, none is more attractive than a set of APIs that make developers productive, that are easy to use, and that are intuitive and expressive. Apache Spark offers such APIs across components such as Spark SQL, Streaming, Machine Learning, and Graph Processing to operate on large data sets in languages such as Scala, Java, Python, and R for distributed big data processing at scale. In this talk, I will explore the evolution of the three sets of APIs (RDDs, DataFrames, and Datasets) available in Apache Spark 2.x. In particular, I will emphasize three takeaways: 1) why and when you should use each set, as a matter of best practice; 2) their performance and optimization benefits; and 3) the scenarios in which to use DataFrames and Datasets instead of RDDs for your distributed big data processing. Through simple notebook demonstrations with API code examples, you'll learn how to process big data using RDDs, DataFrames, and Datasets and how to interoperate among them. (This talk is a vocalization of the blog post, along with the latest developments in the Apache Spark 2.x DataFrame/Dataset and Spark SQL APIs: databricks.com/blog/2016/07/1... databricks.com/glossary/what-...)
    Session hashtag: #EUdev12"
    About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
    Read more here: databricks.com/product/unifie...
    Connect with us:
    Website: databricks.com
    Facebook: / databricksinc
    Twitter: / databricks
    LinkedIn: / databricks
    Instagram: / databricksinc
    Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here: databricks.com/databricks-nam...
  • Science

Comments • 38

  • @chiranjibghorai6950 · 5 years ago · +1

    Such a pleasure to hear him talk!

  • @techoral2261 · 2 years ago

    Now I know about RDDs, DataFrames and Datasets. Thanks for explaining it so precisely. Appreciated.

  • @tdkboxster · 5 years ago · +9

    What an amazing talk! Crisp and clear! Truly impressed.

  • @rahultiwari2860 · 5 years ago · +1

    Thanks for the in-depth explanation of RDD, DF, and DS...

  • @ctriz76 · 5 years ago · +3

    This is a brilliant and fluid explanation.

  • @mjrajeshmj · 3 years ago

    Excellent talk. Thanks Jules Damji.

  • @sayandbhattacharya1100 · 6 months ago

    I had a nice learning time, thanks for the talk!

  • @daviduzumaki · 8 months ago

    this guy is such a good speaker

  • @harshtiku3240 · 3 years ago

    An excellent talk by a clear master.

  • @puja9689 · 4 years ago · +1

    Amazing presentation. Very intuitive. Thanks, boss!

  • @lbasavaraj · 5 years ago · +1

    What a brilliant talk!! Thanks

  • @aliwaheed906 · 3 years ago · +9

    Amazing talk. I left Spark to move into ML when there was only the RDD API; I came back, saw DataFrames in Spark, and was totally confused. Your video helped a lot, thank you.

  • @abhiganta · 5 years ago

    This is the best and clearest talk on the 3 APIs.

  • @soufianebenkhaldoun7765 · 5 years ago · +2

    Very well explained!! Thanks

  • @AllForLove3 · 4 years ago · +3

    Amazing talk! very well explained indeed.

  • @jijotitus1755 · 4 years ago · +1

    Amazing Talk. Thank you!

  • @TheTambourinist · 2 years ago

    Thanks for the video. Very understandable!

  • @shemantkr · 4 years ago

    It was very insightful; such talks really help developers understand why and how one should use the structured APIs.

  • @pauliewalnuts6734 · 4 years ago · +1

    so good!!! thanks for this

  • @Dyslexic_Neuron · 5 years ago

    excellent explanation!! :D

  • @shannithssachin · 1 year ago

    Great Talk

  • @BasemKhalaf-uj7cc · 27 days ago

    Thank you!

  • @anibaldk · 5 years ago · +1

    Only 300 likes for such an informative, crystal clear talk??

  • @Blobonat · 4 years ago

    Very good talk!

  • @vipultyagi1369 · 3 years ago

    Brilliant talk!

  • @shankarsr1 · 5 years ago

    awwwwesome talk thanks!

  • @tableauvizwithvineet148 · 3 years ago

    Nice and informative video

  • @nareshgb1 · 6 years ago

    I am wondering how the "type safe" feature combines with the "unstructured data" that is the nature of data in the systems that Spark would be used in.
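
    A hedged sketch of the usual answer, in plain Scala so it runs without a Spark cluster (the `LogRecord` case class and the line format are illustrative, not from the talk): type safety applies after a schema is imposed. Raw, semi-structured lines are parsed into a case class at the boundary; from then on, field access is compile-time checked, which is what a Spark `Dataset[T]` of case classes provides, while malformed rows are rejected during parsing.

    ```scala
    // Hypothetical record type, playing the role of T in a Spark Dataset[T].
    case class LogRecord(project: String, lang: String, requests: Long)

    object TypedBoundary {
      // Unstructured text goes in; typed records (or None) come out.
      // Assumes lines shaped like "project language requests".
      def parse(line: String): Option[LogRecord] = line.trim.split("\\s+") match {
        case Array(p, l, n) => n.toLongOption.map(LogRecord(p, l, _))
        case _              => None // malformed input is filtered at the boundary
      }

      def main(args: Array[String]): Unit = {
        val lines = Seq("en.wikipedia en 4", "garbage-row", "fr.wikipedia fr 2")
        // Only well-formed rows survive; r.project and r.requests are now
        // compile-time-checked accesses, not string lookups.
        lines.flatMap(parse).foreach(r => println(s"${r.project}: ${r.requests}"))
      }
    }
    ```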

  • @varundosapati7148 · 4 years ago

    I was trying out the example you mentioned at 10:46 and, as I am getting a compile-time error, I had to rewrite the final statement as below.
    parsedRdd.filter( content => content._2 == "en").map(filteredContent => filteredContent._3).reduce(_+_).take(100).foreach(reducedContent => printf(s"$reducedContent._1: $reducedContent._2"))
    I would really appreciate it if you could review the above statement.
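
    For what it's worth, a hedged reading: the per-project sum needs `reduceByKey` (a pair-RDD operation) rather than `reduce`, which collapses everything to a single value and has no `take` afterwards; also, `s"$reducedContent._1"` interpolates only `reducedContent` and appends "._1" literally, so `${reducedContent._1}` is needed. A sketch of the intended pipeline over a plain Scala collection (no Spark cluster required; the sample tuples are illustrative), where `groupBy` plus `sum` stands in for `reduceByKey(_ + _)`:

    ```scala
    object PipelineSketch {
      // Plain-collection analogue of the RDD pipeline over
      // (project, language, numRequests) tuples.
      def englishTotals(parsed: Seq[(String, String, Long)]): Map[String, Long] =
        parsed
          .filter { case (_, lang, _) => lang == "en" }
          .map { case (project, _, n) => (project, n) }
          .groupBy(_._1)                                   // ~ reduceByKey(_ + _)
          .map { case (project, rows) => (project, rows.map(_._2).sum) }

      def main(args: Array[String]): Unit = {
        val parsed = Seq(
          ("en.wikipedia", "en", 4L),
          ("fr.wikipedia", "fr", 2L),
          ("en.wiktionary", "en", 1L)
        )
        // The Spark form would be:
        //   parsedRDD.filter { case (_, lang, _) => lang == "en" }
        //            .map { case (project, _, n) => (project, n) }
        //            .reduceByKey(_ + _)   // per-key sum, not reduce(_ + _)
        //            .take(100)            // local Array; foreach runs on the driver
        //            .foreach { case (p, n) => println(s"$p: $n") }
        englishTotals(parsed).toSeq.sortBy(_._1)
          .foreach { case (p, n) => println(s"$p: $n") }
      }
    }
    ```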

  • @meravchkroun4197 · 6 years ago

    Thanks!
    Can you attach the links here?

  • @nithints302 · 3 years ago

    wow

  • @ernesthert1898 · 6 years ago

    No SS

  • @ernesthert1898 · 6 years ago

    Hibud

  • @Chris_zacas · 4 years ago · +2

    This was amazing! Pretty well explained!
    Thanks!

  • @goodyoyo0214 · 4 years ago

    Amazing Talk