Delta Lake Deep Dive: Liquid Clustering

  • Published: Dec 6, 2023
  • Join us on Thursday, December 7 at 10AM PST for an enlightening session on Delta Lake's Liquid Clustering, a transformative approach in data management and optimization with Vítor Teixeira, Senior Data Engineer at Veeva Systems.
    Liquid Clustering is Delta Lake's answer to the complex challenges of Big Data. Traditionally, partitioning and Z-Order clustering have been used to improve query performance by managing large datasets effectively. However, these methods come with limitations such as complexity in implementation, rigidity in data layout, and the need for frequent data rewrites. Delta Lake’s Liquid Clustering offers a dynamic solution. It allows for flexible redefinition of clustering keys without the need to rewrite existing data, adapting effortlessly to evolving analytic needs.
    This session will cover how Liquid Clustering simplifies data layout decisions and optimizes query performance, marking a significant advancement over traditional partitioning and Z-Order clustering methods. Don’t miss this opportunity to learn about Liquid Clustering and how it can revolutionize your data management strategy.
    Quick Links
    Join us on Slack: go.delta.io/slack
    GitHub: github.com/delta-io
    Join Google Groups: groups.google.com/forum/#!for...
  • Science

Comments • 7

  • @alexischicoine2072 3 months ago

    It's a great combo with deletion vectors, as you don't have to rewrite the data. Without deletion vectors, it could make deletes more expensive, since the affected rows would be spread and mixed across files.

  • @alexischicoine2072 3 months ago

    Very interesting. For Z-ordering, you can store the columns in table properties at table creation and then retrieve them when optimizing; it's not that much code.
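
A minimal sketch of the idea in the comment above, written as plain Python helpers that only build the SQL text. The property key `zorder.columns` and both function names are hypothetical; Delta Lake does not define them, and any custom key would do.

```python
from typing import Mapping, Sequence

# Hypothetical custom property key (an assumption, not defined by Delta Lake).
ZORDER_PROPERTY = "zorder.columns"


def create_table_sql(table: str, schema_ddl: str, zorder_columns: Sequence[str]) -> str:
    """Build a CREATE TABLE statement that records the Z-order columns as a table property."""
    columns = ",".join(zorder_columns)
    return (
        f"CREATE TABLE {table} ({schema_ddl}) USING DELTA "
        f"TBLPROPERTIES ('{ZORDER_PROPERTY}' = '{columns}')"
    )


def optimize_sql(table: str, properties: Mapping[str, str]) -> str:
    """Build an OPTIMIZE statement from previously stored table properties."""
    stored = properties.get(ZORDER_PROPERTY, "")
    # Split the stored comma-separated list and drop empty entries.
    columns = [c.strip() for c in stored.split(",") if c.strip()]
    if not columns:
        # No columns recorded: fall back to plain compaction.
        return f"OPTIMIZE {table}"
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"
```

With a Spark session, the `properties` mapping could be filled from `SHOW TBLPROPERTIES <table>`, which returns `key` and `value` columns, and the resulting strings passed to `spark.sql(...)`.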

  • @chrisstephenson9890 6 months ago

    Thanks for sharing this talk. Would you be so kind as to share a link to the slide deck presented by Vítor?

  • @luisriveros1119 8 months ago

    Hi! I have a question: is it possible to implement liquid clustering for DataFrames saved directly to Delta files (df.write.format("delta").save("path")), rather than through the conventional approach involving table creation?

  • @k.saibhargav8072 4 months ago

    What is the difference between bucketBy and Liquid Clustering?

  • @raviv5109 5 months ago

    One question: is it a wise decision to apply partitioning to a liquid clustering table?

    • @paulfunigga 5 months ago

      Partitioning is not compatible with liquid clustering.