This is great and to the point. But please add more on this topic, such as challenges and real-world examples. It would be a great help to a lot of people in the DE community @Zach
Now I definitely understand SCD
Thank you Zach !
Great one. I would say that learning normalization is a good start.
de-normalization too
I’ve always wanted to model data, but I’ve never had the right dimensions for it. 🤓
Good one, keep making these please
Hi Zach, thanks for sharing your thoughts. I just wanted to see if there was a mistyped last name in your CC (0:16) -- "... go read the Kimball book, some people say, go read the Eman book ..." Should this be -- "... go read the Kimball book, some people say, go read the Inmon book ..." (Bill Inmon)
You’re totally right! Nice catch!
Inmon is unreadable. Makes Kimball look like Shakespeare.
Why have the end date be in the future instead of just null?
BETWEEN syntax doesn’t work if end date is NULL
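To make the reply above concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table `dim_user`, its columns, and the sentinel date `9999-12-31` are hypothetical, chosen only for illustration. It shows that a `BETWEEN` filter silently drops the current row when `end_date` is NULL, and that either a far-future sentinel or a `COALESCE` brings it back.

```python
import sqlite3

# Hypothetical SCD type 2 table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_user ("
    "user_id INTEGER, favorite_food TEXT, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO dim_user VALUES (?, ?, ?, ?)",
    [
        (1, "lasagna", "2020-01-01", "2022-12-31"),  # closed historical row
        (1, "biryani", "2023-01-01", None),          # current row, end_date is NULL
    ],
)

as_of = "2024-06-01"
between_sql = (
    "SELECT favorite_food FROM dim_user "
    "WHERE ? BETWEEN start_date AND end_date"
)

# With a NULL end date, `x <= NULL` evaluates to NULL, so the row is filtered out.
with_null = conn.execute(between_sql, (as_of,)).fetchall()

# Option A: keep NULL in storage and substitute a sentinel at query time.
with_coalesce = conn.execute(
    "SELECT favorite_food FROM dim_user "
    "WHERE ? BETWEEN start_date AND COALESCE(end_date, '9999-12-31')",
    (as_of,),
).fetchall()

# Option B: store a far-future sentinel so the plain BETWEEN query works.
conn.execute("UPDATE dim_user SET end_date = '9999-12-31' WHERE end_date IS NULL")
with_sentinel = conn.execute(between_sql, (as_of,)).fetchall()

print(with_null, with_coalesce, with_sentinel)
```

Storing the sentinel keeps every downstream query simple, at the cost of a magic date that consumers need to know about. Keeping NULL is arguably more honest, but every reader then has to remember the `COALESCE`.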
please try Biryani. It will become your no. 1 and it will be a permanent dimension then
Why would you not just have the end-date null for the current dimension?
I like the metaphorical explanation. Why don't you write your own platform agnostic data modeling book?
I know you mentioned learning by doing which is my preferred approach as well however do you have any resources on learning about scd type 1, type 2 etc?
Kimball's The Data Warehouse Toolkit has in-depth explanations of slowly changing dimensions (all types)
I want to learn data modeling from you. Do you offer a course on it? I want in-depth knowledge of this concept.
What about modeling in MPPs like Redshift? Traditional dimensions/facts do not match the architecture of MPPs
Those are more denormalized, you’re right!
@EcZachly_ Most companies are using MPPs these days because of the sheer speed/efficiency-to-cost ratio, so why are companies still testing fact/dimension/PK-FK-based data modeling knowledge?
Wouldn't storing age be a bad idea? Just store the year
Dimension Snapshots are all you need to know. SCDs are outdated and inefficient now that storage and compute have gotten cheaper.
SCDs are still very worth it once your dimensions hit a certain scale.
Airbnb still considers them the gold standard. Maxime wrote that article and the other Airbnb data architects decided he was wrong.
@EcZachly_ I see. I haven't seen any SCDs at Meta, though maybe I didn't explore enough, and I'm not sure about the Airbnb use cases. It's a bit challenging to read SCDs from reporting tools (Unidash or Tableau), and we access detailed dimensions often for many metrics. Besides, the MapReduce architecture doesn't directly support updates: you'd have to recompute the whole dataset (omitting the older version and adding the newer one) for a single update. I felt the Dimension Snapshot approach made a lot of sense for many practical use cases, even at the cost of storage and compute. Yes, there are drawbacks. For example, you can't keep unlimited history when capturing full snapshots, but that can be addressed with Hist, first/last events, or datelist fields. I would like to know the use case where SCDs are optimal.
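As a rough illustration of the trade-off being debated here, the sketch below collapses daily dimension snapshots into SCD type 2 style date ranges in plain Python. The sample rows, the `country` attribute, and the function name `snapshots_to_scd2` are all made up for this example. It is not how Airbnb or Meta implement this, only a way to see why date ranges get smaller than snapshots when attributes rarely change.

```python
from datetime import date, timedelta
from itertools import groupby

# Hypothetical daily snapshots: (snapshot_date, user_id, country).
snapshots = [
    (date(2024, 1, 1), 1, "US"),
    (date(2024, 1, 2), 1, "US"),
    (date(2024, 1, 3), 1, "CA"),
    (date(2024, 1, 1), 2, "IN"),
    (date(2024, 1, 2), 2, "IN"),
    (date(2024, 1, 3), 2, "IN"),
]

def snapshots_to_scd2(rows):
    """Merge consecutive days with the same value into (id, value, start, end)."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # by user, then by date
    for user_id, user_rows in groupby(ordered, key=lambda r: r[1]):
        current = None  # mutable [user_id, value, start_date, end_date]
        for snap_date, _, value in user_rows:
            extends = (
                current is not None
                and current[1] == value
                and snap_date == current[3] + timedelta(days=1)
            )
            if extends:
                current[3] = snap_date  # same value on the next day: extend the range
            else:
                if current is not None:
                    out.append(tuple(current))
                current = [user_id, value, snap_date, snap_date]
        out.append(tuple(current))  # groupby never yields an empty group
    return out

scd = snapshots_to_scd2(snapshots)
for row in scd:
    print(row)
```

Here six snapshot rows become three range rows. The full recompute mentioned in the comment still happens in this sketch, since it reprocesses every snapshot rather than updating in place, so the saving is in what gets stored and scanned afterwards, not in the build step.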
Std or SCD.😂😂😂😂😂 thanks!
I was going to comment same 😝😂😂😂😂😂😂.
He said that’s how you kind of get the STD. Lol