Does the cloning time also depend on the number of tables/records? If so, can you give an approximate figure? I am interested in understanding whether this approach (cloning prod and then dropping it) is valid for tables that would potentially hold 1B+ records.
I wonder if the manifest feature was available back when they solved this?
At 22:00, Mark mentions the "dbt clone" takes 10 minutes (out of the 20 minutes). I was under the impression that Snowflake zero-copy cloning was "instant". Is this wrong? Or is the Snowflake clone part instant, and you are then spending 9 minutes doing other cleaning/transforms on the cloned data to get it ready for the automated tests?
It's not instant. We've seen times of 40 minutes and more. I'm guessing it's setting up pointers to all the micro-partitions in the original database, but that's speculation. We have logged this with Snowflake, and the response is that it takes what it takes.
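For what it's worth, that "pointers to micro-partitions" guess would explain why clone time scales with table size even though no data is copied. A toy sketch of the speculated mechanism (this is just an illustration of the comment's theory, not Snowflake internals):

```python
# Toy model of the speculated zero-copy clone: cloning copies one
# pointer per micro-partition, never the partition data itself.
# Illustration only -- not Snowflake's actual implementation.

class MicroPartition:
    def __init__(self, data: bytes):
        self.data = data  # immutable chunk of table data


class Table:
    def __init__(self, partitions):
        # The table holds references to shared, immutable partitions.
        self.partitions = list(partitions)

    def clone(self) -> "Table":
        # "Zero-copy": the clone references the same partition objects.
        return Table(self.partitions)


big = Table(MicroPartition(b"x" * 1024) for _ in range(10_000))
copy = big.clone()

# No data was duplicated: every partition object is shared...
assert all(a is b for a, b in zip(big.partitions, copy.partitions))
# ...but the clone still materialized 10,000 pointers. If clone cost
# is per-partition metadata work, it grows with table size even though
# zero bytes of data move.
print(len(copy.partitions))
```

Under this model, "zero-copy" means zero data copied, not zero work: a 1B-row table with hundreds of thousands of micro-partitions still implies a proportional amount of metadata bookkeeping.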
800 models?! That does not make sense. It reminds me of PeriscopeData reports built separately by each analyst. You probably need to look into building fact and dimension tables.
If you have hundreds of source tables and you build staging tables for them, it's not implausible to have 800 models.
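To put a rough number on it: with the common convention of one staging model per source table, the model count grows linearly with sources. A hypothetical dbt `sources.yml` fragment (all names invented):

```yaml
# Each declared table typically gets its own staging model
# (stg_stripe__charges.sql, stg_stripe__refunds.sql, ...),
# so a few hundred source tables alone imply a few hundred models,
# before any intermediate, fact, or dimension models are counted.
sources:
  - name: stripe
    tables:
      - name: charges
      - name: refunds
  - name: salesforce
    tables:
      - name: accounts
      - name: opportunities
```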
They probably already do; they just break the processing up into sub-models. That's what we do to keep things understandable for bigger models. Fact and dimension models commonly have tens of models supporting them.
@chrism3790 I don't remember the speaker mentioning fact and dimension tables. Maybe he's using dbt to join raw tables and create tables, and maybe each analyst can create their own models. By doing so, the analysts won't be able to unify the data properly and come up with standardized KPIs at the company level.