Advancing Spark - Getting hands-on with Delta Cloning

  • Published: 7 Sep 2024
  • Last week we looked at the announcements for Databricks Runtime 7.2 and got all excited about the notes for Delta Cloning - but we had some really good questions raised about exactly what happens under the hood. So this week join Simon as he takes a bit of a dive into DEEP and SHALLOW cloning with Delta on Databricks.
    For more info on the Clone functionality and the other syntax available, take a look at the notes here: docs.databrick...
    As always, for more tasty blogs, or info about our hands-on training courses, come visit us at: www.advancinga...
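
For quick reference, the clone syntax discussed in the video looks roughly like this - a minimal sketch run from a Databricks notebook, using hypothetical table names (`sales_source`, `sales_clone_shallow`, `sales_clone_deep`):

```python
# Minimal sketch of the two clone flavours covered in the video.
# Table names are hypothetical - substitute your own schema/table names.

# SHALLOW CLONE: copies only the Delta metadata; the new table points at the
# source table's existing data files, so it completes in seconds.
spark.sql("""
    CREATE OR REPLACE TABLE sales_clone_shallow
    SHALLOW CLONE sales_source
""")

# DEEP CLONE (the default): copies the data files as well as the metadata,
# so the new table is fully independent of the source.
spark.sql("""
    CREATE OR REPLACE TABLE sales_clone_deep
    DEEP CLONE sales_source
""")
```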

Comments • 13

  • @lucian1511
    @lucian1511 1 year ago +1

    Very nice video! Keep up the good work!
    I am in the process of learning and your videos are excellent. I can only hope you will continue to upload new interesting stuff.
    Thank you!

  • @tanushreenagar3116
    @tanushreenagar3116 11 months ago

    nice sir

  • @the.activist.nightingale
    @the.activist.nightingale 4 years ago +1

    Nice video yet again, Simon!
    I really appreciate how you take the time to show all the manipulations and even the bugs ;)
    Seems like a cool feature, but I'm wondering how it would fare if I am cloning a huge table of 70-140M rows? Maybe some stress-testing would be needed on my side :)
    On the light side, please don't zoom on your face too often I get mesmerized by your eyes (are they blue-green) and I need to replay the parts multiple times :D HAHAHAHA#GirlProblems

    • @AdvancingAnalytics
      @AdvancingAnalytics  4 years ago

      Hey! Glad the videos are still useful!
      So the shallow clone for ~140m rows will take a couple of seconds, as it's just a bit of metadata. The deep clone will depend on your cluster, but that's not a huge amount of data for Spark - you could easily have it cloned in 5-20 minutes depending on the size of the cluster!
      Simon
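
One rough way to see that difference for yourself is simply to time the two statements - a sketch reusing the hypothetical `sales_source` table from the description above:

```python
import time

# Hypothetical table names. The shallow clone should return in seconds because it
# only writes metadata; the deep clone's duration depends on data volume and cluster size.
for mode in ("SHALLOW", "DEEP"):
    start = time.time()
    spark.sql(f"CREATE OR REPLACE TABLE sales_timed_{mode.lower()} {mode} CLONE sales_source")
    print(f"{mode} clone took {time.time() - start:.1f}s")
```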

  • @nikkaz5639
    @nikkaz5639 1 year ago

    Hey Simon, thanks a lot for this video. A question: how would you then make the cloned version live, so that it becomes the original one? Thanks

    • @AdvancingAnalytics
      @AdvancingAnalytics  1 year ago

      Hrm, not sure that's possible - unless you update all files within the delta table, it will still be pointing to some files from the original! I'd say to treat clones like temporary entities, then re-do the operation if you want to make it permanent?
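
Simon's advice is to treat clones as temporary. If you did want to materialise a clone's current state back over the original, one possible sketch - not shown in the video, with hypothetical table names, and it rewrites every file, in line with the "update all files" caveat above - is a deep clone in the other direction:

```python
# Hypothetical names: 'orders' is the production table, 'orders_experiment' is a
# clone that has been modified. Deep cloning it back over the original replaces
# the original's contents with fully copied, independent data files.
spark.sql("""
    CREATE OR REPLACE TABLE orders
    DEEP CLONE orders_experiment
""")
```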

  • @bhaveshpatelaus
    @bhaveshpatelaus 4 years ago

    Thanks Simon. I can see the use case for this in a DR scenario where the primary and secondary regions in ADLS or Blob Storage are doing an asynchronous copy of the data and can thus leave Delta tables corrupted! Does DEEP CLONE happen with ACID guarantees? What if you are CLONING big tables and there is an interruption to the cloning operation - does it land incomplete data?

  • @prashanthxavierchinnappa9457
    @prashanthxavierchinnappa9457 3 years ago

    Hey Simon, thanks for a great video. Just the kind of channel I was looking for. A quick question: I'm wondering what the best way is to copy only certain partitions of a Delta table and create a new Delta table without having to copy all the contents. I assumed cloning would help somehow, but that does not seem to be the case.

    • @AdvancingAnalytics
      @AdvancingAnalytics  3 years ago +1

      Afraid cloning doesn't support partition-scoping that I know of. You would likely need to write a quick dataframe that reads your source, filters to your desired partitions and writes to the new table - you wouldn't get table settings, transaction history etc copied across though! There are some workarounds with cloning, deleting partitions etc, but it'll be more work than just writing a quick dataframe!
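
A sketch of that "quick dataframe" approach, assuming a hypothetical source table `events` partitioned by an `event_date` column:

```python
# Copy only the partitions you need into a new Delta table.
# Note: table properties, constraints and transaction history are NOT carried over.
(
    spark.read.table("events")
        .where("event_date >= '2024-01-01'")      # keep only the desired partitions
        .write.format("delta")
        .mode("overwrite")
        .saveAsTable("events_subset")
)
```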

  • @sid0000009
    @sid0000009 4 years ago

    Shallow Clone: what happens to the cloned table if we update the original table? As we understand it, the cloned table initially points to the original table's data files. Thanks

    • @AdvancingAnalytics
      @AdvancingAnalytics  4 years ago +2

      So the original table will see the new files as "replaced" in the transaction log. The cloned table will point at the old files and work as expected. The only problem will come if you run a VACUUM on the original table after updating - then the shallow clone will no longer function. So not great for the long term, but fantastic for short-term testing/experimentation!
      Simon
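
A sketch of the sequence described above, with hypothetical table and column names - note that vacuuming with a zero-hour retention requires disabling the retention check and is only sensible for a throwaway demo:

```python
# 1. Shallow clone: points at the original table's current data files.
spark.sql("CREATE OR REPLACE TABLE sales_shallow SHALLOW CLONE sales_source")

# 2. Update the original (hypothetical 'amount' column). Delta writes new files and
#    marks the old ones as removed in the transaction log; the shallow clone keeps
#    reading the old files and still works.
spark.sql("UPDATE sales_source SET amount = amount * 1.1")
spark.sql("SELECT COUNT(*) FROM sales_shallow").show()   # still returns results

# 3. Vacuum the original aggressively (demo only!) - this physically deletes the
#    old data files that the shallow clone depends on.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
spark.sql("VACUUM sales_source RETAIN 0 HOURS")

# 4. The shallow clone now fails with missing-file errors when queried.
spark.sql("SELECT COUNT(*) FROM sales_shallow").show()
```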

    • @sid0000009
      @sid0000009 4 years ago

      @AdvancingAnalytics you're a genius!

    • @nishu2u85
      @nishu2u85 2 years ago

      Thanks much for clarifying :)