Great video. A couple of comments:
Where you have very complex transformation logic combined with many millions of rows, I have found that materializing intermediate models as tables (instead of views), and incrementally, is much more performant. I have also found that you can indeed benefit from sourcing more than one mart table from the same (high-performing) intermediate table (or an intermediate view over that intermediate table). To give you an example: NetSuite data ingested into Snowflake. We build several NetSuite fact tables, which all source from the same highly transformed "core" intermediate table. If we did that from an intermediate view, it would just be way too slow.
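For anyone curious what that looks like in practice, here's a minimal sketch of an incremental intermediate model in dbt (model, column, and staging names are hypothetical):

```sql
-- models/intermediate/int_netsuite_core.sql (hypothetical name)
-- Materialize the heavy transformation as an incremental table so the
-- downstream fact models read from a persisted table instead of a view.
{{ config(
    materialized='incremental',
    unique_key='transaction_id'
) }}

select
    transaction_id,
    -- ...complex transformation logic goes here...
    updated_at
from {{ ref('stg_netsuite__transactions') }}

{% if is_incremental() %}
  -- on incremental runs, only process rows newer than what's already loaded
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```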
Great video!
Funny thing: we ran into the exact same problem you explained, and we landed on the exact same solution by creating an intermediate models folder. I didn't know it was a thing in dbt. To be honest, I came expecting a special feature that dbt offers; in our case we had to use an intermediate model for optimization, materialized as a table. I wish dbt had a special kind of intermediate model that drops the table when the run is done, more like a temp table.
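For what it's worth, dbt's ephemeral materialization is the closest built-in to that wish: the model is never created in the warehouse at all; dbt inlines it as a CTE into whatever selects from it, so there's nothing to drop after the run. It won't help when you need a persisted table for performance (as in your case), but it covers the "leave nothing behind" part. A minimal sketch, with hypothetical names:

```sql
-- models/intermediate/int_orders_enriched.sql (hypothetical name)
-- Ephemeral models never hit the warehouse; dbt compiles them into
-- downstream models as CTEs, so there is nothing to clean up after a run.
{{ config(materialized='ephemeral') }}

select
    order_id,
    customer_id,
    order_total
from {{ ref('stg_orders') }}
```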
What would be the right way if I have to use an intermediate model in two different marts? Great tutorial btw.
There is an antipattern in your graph: the same staging model is referenced twice, once inside an intermediate model and again directly inside its mart.
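In dbt terms, the usual fix is to have the mart select from the intermediate model instead of reaching back to staging, so the logic flows through a single path in the DAG. A rough sketch (all names hypothetical):

```sql
-- models/marts/fct_orders.sql (hypothetical names)
-- Reference the intermediate model rather than the staging model directly,
-- so stg_orders feeds this mart through one path in the DAG.
select
    order_id,
    customer_id,
    order_total
from {{ ref('int_orders_joined') }}  -- not: {{ ref('stg_orders') }}
```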
At what stage (staging, intermediate, marts) do we create the fact/dimension tables from the source data?
Your modeling pipeline could go Staging > Warehouse > Marts, but then you can use Intermediate models to support either the Warehouse and/or the Marts.
Basically, you can use an Intermediate model to offload some of the complex logic (if you want) rather than doing it all in one file. Hope that helps!
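As a rough illustration of that offloading idea (names are hypothetical), the intermediate model can absorb a messy pivot so the mart stays a simple select on top of it:

```sql
-- models/intermediate/int_payments_pivoted.sql (hypothetical name)
-- Keep the pivot logic here; the mart that selects from this model
-- only has to join and present the result.
select
    order_id,
    sum(case when payment_method = 'credit_card' then amount end) as credit_card_amount,
    sum(case when payment_method = 'gift_card' then amount end) as gift_card_amount,
    sum(amount) as total_amount
from {{ ref('stg_payments') }}
group by order_id
```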
@KahanDataSolutions Ah, thanks.
I've understood that staging should be 1-to-1 wrappers over sources. But what if your source holds JSON in a column that needs flattening? Would it be okay to have another schema called 'tabular', where you use SQL to flatten the JSON into a table, sitting between the source and staging? Or should this be done between staging and warehouse?
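For reference, in Snowflake the flattening itself is typically a LATERAL FLATTEN over the VARIANT column, which can live directly in the staging model rather than in a separate schema. A sketch with hypothetical source and column names:

```sql
-- models/staging/stg_events.sql (hypothetical names)
-- Flatten a JSON (VARIANT) payload column while staying 1-to-1 with the
-- source table: one source in, one flattened staging model out.
select
    raw.event_id,
    item.value:sku::string       as sku,
    item.value:quantity::integer as quantity
from {{ source('shop', 'raw_events') }} as raw,
     lateral flatten(input => raw.payload:line_items) as item
```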
How is it different from ephemeral models? I do not see that much difference.
Intermediate = A naming convention for a directory of models
Ephemeral = A dbt Materialization (aka a config)
Therefore, you can create models in the Intermediate directory AS Ephemeral models. But in theory these could alternatively be View or Table materializations.
Hope that helps clear it up!
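To make that concrete, the two choices can be combined at the directory level in dbt_project.yml: every model under models/intermediate/ gets the ephemeral materialization by default, and any single model can still override it with its own config(). An illustrative snippet (the project name is a placeholder):

```yaml
# dbt_project.yml (illustrative snippet; 'my_project' is a placeholder)
models:
  my_project:
    intermediate:
      # the directory name is just a convention; this config is what
      # actually makes the models ephemeral
      +materialized: ephemeral
```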