Advancing Spark - Bloom Filter Indexes in Databricks Delta

  • Published: 29 Jul 2024
  • Data Lakes are notoriously bad at single-record lookups, the kind of query where you are looking for one specific ID among millions of records. We have ways of organising our data, such as partitioning and Z-ordering, which help a little... but wouldn't it be great if we could just pop an index over the top?
    Turns out we can! In this video Simon runs through a quick introduction to using Bloom Filter Indexes with Databricks Delta. We look briefly at the available documentation, before digging into a notebook example and how the files are managed underneath!
    To get started with Bloom indexes, see the documentation here: docs.databricks.com/delta/opt...
    And the demo notebook is available from: docs.databricks.com/_static/n...
    As always, don't forget to like & subscribe, and let us know what you think in the comments!
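To make the idea concrete, here is a minimal Python sketch (not Databricks' implementation) of what a Bloom filter index does for a point lookup: keep one small filter per data file, built from that file's IDs, and check it before reading the file. A Bloom filter can give false positives but never false negatives, so skipping a file whose filter says "no" is always safe. The file names, the helper functions, and the `fpp` sizing parameter below are hypothetical illustrations; the Databricks documentation describes its own per-column options (`fpp` and `numItems`), which are assumed here to play a similar role.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k hash positions (illustrative only)."""

    def __init__(self, num_items: int, fpp: float):
        # Textbook sizing for a target false-positive probability p:
        # m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.
        self.m = max(1, math.ceil(-num_items * math.log(fpp) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / num_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by double hashing over one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_file_indexes(files: dict, fpp: float = 0.1) -> dict:
    """Build one Bloom filter per (hypothetical) data file, keyed by file name."""
    indexes = {}
    for name, ids in files.items():
        bf = BloomFilter(num_items=max(1, len(ids)), fpp=fpp)
        for record_id in ids:
            bf.add(record_id)
        indexes[name] = bf
    return indexes


def files_to_scan(indexes: dict, record_id: str) -> list:
    """Return only the files whose filter says the ID might be present."""
    return [name for name, bf in indexes.items() if bf.might_contain(record_id)]


# Hypothetical table split across three data files.
files = {
    "part-000.parquet": [f"id-{i}" for i in range(0, 1000)],
    "part-001.parquet": [f"id-{i}" for i in range(1000, 2000)],
    "part-002.parquet": [f"id-{i}" for i in range(2000, 3000)],
}
indexes = build_file_indexes(files, fpp=0.01)
# Always includes "part-001.parquet"; other files appear only on a false positive.
print(files_to_scan(indexes, "id-1500"))
```

The trade-off sits in `fpp`: a lower target false-positive rate costs more bits per file, but means fewer files are read needlessly for an ID that is not there.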
