How to Build Metadata-Driven Data Pipelines with Delta Live Tables

  • Published: Jan 13, 2025

Comments • 10

  • @brads2041 • 1 year ago +1

    It's not clear how this process would handle a source query that uses something like an aggregate (for silver in this context, though it might be more relevant to gold). DLT streaming doesn't like aggregates, so you may have to fully materialize a table instead of streaming.
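
    Editor's note: a minimal sketch of the distinction the comment raises. The `_DltStub` class below is a stand-in so the sketch runs outside Databricks and is not the real `dlt` API; in a real pipeline you would `import dlt` and return Spark DataFrames. The point shown: a streaming silver table sticks to row-wise transforms, while an aggregate is defined as a fully recomputed (materialized) table.

    ```python
    # Stand-in for the Databricks `dlt` module (assumption, not the real API surface),
    # just enough to register decorated table definitions by name.
    class _DltStub:
        def __init__(self):
            self.tables = {}

        def table(self, name=None):
            def decorator(fn):
                self.tables[name or fn.__name__] = fn
                return fn
            return decorator

    dlt = _DltStub()

    # Streaming silver table: row-wise filter only, safe for an append-only stream.
    @dlt.table(name="silver_orders")
    def silver_orders():
        # In Databricks this would be:
        #   return spark.readStream.table("bronze_orders").where("amount > 0")
        return "streaming: row-wise filter over bronze_orders"

    # Gold aggregate: a GROUP BY doesn't fit a plain append-only stream, so it is
    # defined as a fully recomputed table instead of a streaming one.
    @dlt.table(name="gold_daily_totals")
    def gold_daily_totals():
        # In Databricks this would be:
        #   return spark.read.table("silver_orders").groupBy("day").sum("amount")
        return "batch: aggregate over silver_orders"

    print(sorted(dlt.tables))  # -> ['gold_daily_totals', 'silver_orders']
    ```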

  • @garethbayvel8374 • 1 year ago +1

    Does this support Unity Catalog?

    • @ganeshchand • 11 months ago

      Yes, the recent release supports UC.

  • @rishabhruwatia6201 • 1 year ago

    Can we have a video on loading multiple tables using a single pipeline?

    • @rishabhruwatia6201 • 1 year ago

      I mean something like a for-each activity.

    • @RaviGawai-db • 1 year ago

      @rishabhruwatia6201 You can check the dlt-meta repo and the dlt-meta-demo, or run the integration tests.

    • @brads2041 • 1 year ago

      We tried that just recently. Depending on how you approach this, it may not work. In our case, we did not always call the DLT pipeline with the same tables to be processed. Any table that was processed previously but not in a subsequent run would be removed from Unity (though the parquet files still exist, i.e. behavior like an external table). This was of course not acceptable, so we switched to metadata-driven Structured Streaming. To put this a different way: if you call the pipeline with table a, then call it again with table b, table a is dropped. You'd have to always execute the pipeline with all tables relevant to the pipeline.
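
      Editor's note: a minimal sketch of the "for-each over metadata" pattern discussed in this thread. `register_flow`, `TABLE_METADATA`, and the row fields are hypothetical stand-ins for the real `@dlt.table` decorator and onboarding rows. Two points are shown: one table definition is generated per metadata row via a factory function (a bare lambda in the loop would late-bind to the last row), and the loop is driven from the full metadata list, since, per the comment above, tables missing from a DLT run get dropped.

      ```python
      # Hypothetical onboarding rows; in dlt-meta these come from an onboarding file/table.
      TABLE_METADATA = [
          {"source": "raw.orders",    "target": "silver_orders"},
          {"source": "raw.customers", "target": "silver_customers"},
      ]

      registered = {}

      def register_flow(target, build):
          # Stand-in for @dlt.table(name=target); records the table definition.
          registered[target] = build

      def make_builder(source):
          # Factory so each closure captures its own `source`; a plain
          # `lambda: read(row["source"])` inside the loop would see only the last row.
          def build():
              # In Databricks: return spark.readStream.table(source)
              return f"readStream from {source}"
          return build

      # Always iterate the FULL metadata list so no previously created table is dropped.
      for row in TABLE_METADATA:
          register_flow(row["target"], make_builder(row["source"]))

      print(sorted(registered))  # -> ['silver_customers', 'silver_orders']
      ```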

    • @RaviGawai-db • 1 year ago

      @brads2041 You reload the onboarding before each run to add or remove tables from the group. So the workflow might be: onboarding (can refresh each row, adding or removing tables a, b) -> DLT pipeline.
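
      Editor's note: a minimal sketch of the workflow suggested above, with an illustrative `refresh_onboarding` helper (not part of dlt-meta). The idea: recompute the complete onboarded table list before every run, making removals explicit, so the pipeline is always launched with every relevant table rather than a partial list.

      ```python
      def refresh_onboarding(current, additions, removals):
          """Return the full, sorted set of onboarded tables for the next pipeline run."""
          return sorted((set(current) | set(additions)) - set(removals))

      # Run 1: onboard tables a and b, then launch the DLT pipeline with both.
      tables = refresh_onboarding([], additions=["table_a", "table_b"], removals=[])
      print(tables)  # -> ['table_a', 'table_b']

      # Run 2: remove table_a explicitly instead of merely omitting it,
      # so the drop is an intentional onboarding change, not an accident.
      tables = refresh_onboarding(tables, additions=[], removals=["table_a"])
      print(tables)  # -> ['table_b']
      ```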

  • @AnjaliH-wo4hm • 9 months ago +2

    Would appreciate it if Databricks came out with a proper explanation... neither tutor's explanation is clear.