Efficient CI/CD for dbt, Clearcover

  • Published: 13 Sep 2024

Comments • 8

  • @kshitijpathak8646 • 2 years ago • +1

    Does the cloning time also depend on the number of tables/records? If so, can you give an approximate figure? I am interested in understanding whether this approach (of cloning prod and then dropping) is valid for tables that would potentially hold 1B+ records.

  • @chrism3790 • 3 years ago

    I wonder if the manifest feature was available back when they solved this?

  • @tomhallett7439 • 3 years ago • +1

    At 22:00, Mark mentions the "dbt clone" takes 10 minutes (out of the 20 minutes). I was under the impression that Snowflake zero-copy cloning was "instant". Is this wrong? Or is the Snowflake clone part instant, but then you are spending 9 minutes doing other cleaning/transforms on the cloned data to get it ready for the automated tests?

    • @simianinc • 2 years ago • +2

      It's not instant. We've seen times of 40 minutes and more. I'm guessing it's setting up pointers to all the micro-partitions in the original DB, but that's speculation. We have logged this with Snowflake, and the response is it takes what it takes.
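
For context on the clone step being discussed: the clone-then-drop CI pattern described in the talk presumably reduces to something like the Snowflake SQL below. The database names are illustrative, not taken from the talk, and the comments reflect the (speculative) explanation above that the duration comes from the metadata operation itself.

```sql
-- A minimal sketch of the clone-then-drop CI pattern discussed above.
-- Database names are illustrative, not taken from the talk.

-- Zero-copy clone: copies metadata pointers to existing micro-partitions
-- rather than the data itself, so storage cost is near zero, but the
-- metadata operation is not instantaneous for large databases.
CREATE OR REPLACE DATABASE ci_build_123 CLONE analytics_prod;

-- ... dbt builds and automated tests run against ci_build_123 here ...

-- Tear down the clone once CI finishes, leaving prod untouched.
DROP DATABASE IF EXISTS ci_build_123;
```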

  • @franckleveneur676 • 3 years ago • +1

    800 models!? That does not make sense. It reminds me of PeriscopeData reports built by each analyst. You probably need to look into building fact and dimension tables.

    • @arcadia485 • 3 years ago

      If you have hundreds of source tables and you build staging models for each of them, it's not implausible to end up with 800 models.

    • @chrism3790 • 3 years ago • +2

      They probably already do; they just break the processing up into sub-models. That's what we do to keep bigger models understandable. Fact and dimension models commonly have tens of supporting models (see the sketch after this thread).

    • @franckleveneur676 • 3 years ago

      @chrism3790 I don't remember the speaker mentioning facts and dimensions. Maybe he's using dbt to join raw tables and create tables, or maybe each analyst can create their own models. By doing so, they (the analysts) won't be able to unify the data properly and come up with standardized KPIs at the company level.
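
To illustrate the exchange above about model counts: a staging layer that wraps each source table in a thin model, plus fact/dimension marts built on top of it, can plausibly add up to hundreds of models. The file names, sources, and columns below are hypothetical, a minimal dbt/SQL sketch rather than Clearcover's actual project.

```sql
-- models/staging/stg_claims.sql (hypothetical)
-- One thin staging model per source table; with hundreds of sources,
-- this layer alone can account for most of an 800-model count.
select
    id         as claim_id,
    policy_id,
    status,
    created_at
from {{ source('claims_app', 'claims') }}
```

```sql
-- models/marts/fct_claims.sql (hypothetical)
-- A downstream fact model composed from several staging models via ref()
-- (stg_policies is assumed to be another staging model like the one above).
select
    c.claim_id,
    c.status,
    p.policy_number,
    c.created_at
from {{ ref('stg_claims') }} as c
left join {{ ref('stg_policies') }} as p
    on c.policy_id = p.policy_id
```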