Advancing Spark - Delta Deletion Vectors

  • Published: 7 Sep 2024
  • Whenever we explain how Delta works with parquet, performing redundant copies of "unchanged" data whenever a record is updated or deleted, people are understandably shocked - it's a huge amount of unnecessary work. With Delta Deletion Vectors, we finally have a better answer - deleting records is now a quick, simple metadata operation!
    In this video Simon walks through the concept of deletion vectors, looking at how they are implemented and walking through a simple example - following what happens at the file & transaction log level.
    To learn more about deletion vectors, check out: docs.databrick...
    And if you need help on your Data & AI journey, give Advancing Analytics a call!
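The contrast described above can be sketched as a toy model in plain Python. This is only an illustration of the idea, not Delta's implementation: `DataFile`, `delete_copy_on_write`, and `delete_with_vector` are hypothetical names, a Python set stands in for the deletion vector, and a list of dicts stands in for a parquet file.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Toy stand-in for an immutable parquet file (hypothetical, not a Delta API)."""
    name: str
    rows: list
    # Row positions marked as deleted; a plain set standing in for a deletion vector.
    deleted_positions: set = field(default_factory=set)

    def live_rows(self):
        """Readers skip soft-deleted positions at read time."""
        return [r for i, r in enumerate(self.rows) if i not in self.deleted_positions]


def delete_copy_on_write(file, predicate):
    """Without deletion vectors: copy every surviving row into a new file."""
    survivors = [r for r in file.live_rows() if not predicate(r)]
    new_file = DataFile(name=file.name + "-rewritten", rows=survivors)
    return new_file, len(survivors)  # second value: rows physically copied


def delete_with_vector(file, predicate):
    """With deletion vectors: leave the data untouched and record positions only."""
    hits = {i for i, r in enumerate(file.rows) if predicate(r)}
    file.deleted_positions |= hits
    return file, 0  # no data rows copied


rows = [{"id": i} for i in range(1_000)]


def is_target(row):
    return row["id"] == 42


cow_file, cow_copied = delete_copy_on_write(DataFile("part-0", list(rows)), is_target)
dv_file, dv_copied = delete_with_vector(DataFile("part-0", list(rows)), is_target)

print(cow_copied, dv_copied)                        # 999 0
print(cow_file.live_rows() == dv_file.live_rows())  # True
```

Both approaches return the same rows to a reader. The difference is how much data has to be rewritten to delete a single record: 999 rows in the copy-on-write case, none in the deletion vector case.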

Comments • 7

  • @alexischicoine2072 3 months ago

    Deletion vectors are amazing. They also improve concurrency, which is detailed on the page about isolation and serialization. If you need to delete customer data for compliance, it's great. Also, if you need to replicate your data to another region, you won't be creating as many extra files that need to be transferred and stored, so you can get good savings from that as well. Imagine you have big gigabyte parquet files in a huge table and you need to delete a record here and there: it will make a massive difference.

  • @riteshsharma344 10 months ago

    Thanks for the great video, as always 🙂

  • @malebeauty 5 months ago

    You're so cool

  • @2307Leito 10 months ago

    Awesome! Love your videos! Nice feature. Quick question: for doing upserts in Delta, what would be the best way to implement it? Let's say you have a fact table by day, and the daily run loads the 3 days closest to getdate() (it reloads some data and inserts new data, i.e. an upsert).

  • @jeanchindeko5477 10 months ago

    Thanks for this great video. Is this like Merge on Read in Iceberg and Hudi?

  • @SladeFlash 9 months ago

    Hi, can we set this property on a streaming table?

  • @NeumsFor9 10 months ago

    Pretty soon we will be at the old SSAS .deleted store, and all those .store files 😂😂😂....