Making Apache Spark™ Better with Delta Lake

  • Published: Oct 18, 2024

Comments • 16

  • @sonagy23 2 years ago +16

    28:32 How does Delta Lake work?
    28:50 Delta On Disk
    29:59 Table = result of a set of actions
    31:31 Implementing Atomicity
    32:48 Ensuring Serializability
    33:33 Solving Conflicts Optimistically
    35:08 Handling Massive Metadata
    36:32 Roadmap
    38:20 QnA

    • @kbkonatham1701 2 years ago

      Hi Kim, thanks for the support. Where are you from? I am from India.

  • @rakshithvenkatesh2773 4 years ago +6

    I see this whole "Hierarchical Data Pipeline" strategy being talked about quite a bit these days. We established it as part of a ready-made solution we built for a manufacturing use case using Confluent Kafka + KSQL. But I believe the data lake will continue to exist as a depot for long-term data retention, from which AI/DA platforms pull data for batch processing. I see this story from Databricks as data warehouses converging toward data lakes!

  • @meryplays8952 4 years ago +9

    The architecture comes with a nice VLDB 2020 paper (which the presenter did not mention).

  • @RossittoS 3 years ago +1

    Excellent features!!

  • @hanssylvest8390 3 years ago +22

    Please give all employees a better microphone for audio recording.

    • @jacekb4057 1 year ago

      Or use some AI audio cleaner :D

  • @Sangeethsasidharanak 3 years ago +2

    27:28, on automating data quality: isn't it the same as the quality checks we do with custom code before we save? Will there be any additional benefits?

    • @gustavemuhoza4212 3 years ago +1

      It's probably the same, but I'm not sure how you could do that consistently on a data lake. As described here, Delta appears to make it easier, letting you do it as if you were working on a relational database.

  • @srh80 1 year ago +2

    Wait, people still use Comcast and watch TV?

  • @moebakry3203 3 years ago +3

    What is the best way to load data from SQL Server to Delta Lake every 5 seconds?

  • @hidemisuzuki965 3 years ago

    Where can I download the slides? Thanks!

  • @rahulpathak3161 4 years ago +2

    Thank you. Can you please share the PPT?

    • @張博凱-p7z 4 years ago +10

      www.slideshare.net/databricks/making-apache-spark-better-with-delta-lake

    • @hanmuster 4 years ago +1

      @張博凱-p7z Many thanks!