The moment you reverse the business key to reverse the relation... it is a pragmatic way to show the evolution. Loved this.
Short, straightforward and objective, as every great piece of work should be. Congrats!
This is one of the most important videos for anyone coming from the Kimball world...
Brilliant! So helpful in helping me to understand *both* modeling techniques.
Crisp, clear and short introductory explanation of the how Data Vault has evolved from Dimensional Modeling. Personal thanks to the engaging and excellent presenter.
Dimensional modelling was SAP: a complete company without any idea how data models work at all.
absolutely stunning explanation. Thank you a lot
Data Vault is typically used for the main DW layer. This layer is integrated, non-volatile, time-variant, and subject-oriented (per Inmon), as well as auditable, agile, atomic, etc. The business meaning implied by the "conformed" dimension in this example represents the common business-rule transformations. These are intended to be completed within the DW layer as part of the target or BDW area. In the same layer there is also a RAW area (for traceability).
The Data Marts remain Dimensional Models.
Thank you. This is the best explanation of the problem that Data Vault actually solves.
Very good! Very clear story on DV modeling - thanks a lot for sharing.
Wow this was so easy to understand. Great story telling and explanation. Thank you!!
Best video on adding new source attributes
Really well explained, great example!
Fantastic! It just solved the DV-to-DM mapping for the dimensions! The dimension is simply the satellites joined together on the hub!
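For anyone who wants to see that idea concretely, here is a minimal sketch (all table and column names are invented for illustration, not taken from the video) of a dimension assembled by joining two satellites back onto their hub:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# The hub holds only the business key; each satellite holds one group of attributes.
cur.executescript("""
CREATE TABLE hub_customer (customer_hk INTEGER PRIMARY KEY, customer_bk TEXT);
CREATE TABLE sat_customer_name (customer_hk INTEGER, name TEXT);
CREATE TABLE sat_customer_city (customer_hk INTEGER, city TEXT);

INSERT INTO hub_customer VALUES (1, 'C-001'), (2, 'C-002');
INSERT INTO sat_customer_name VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO sat_customer_city VALUES (1, 'Melbourne'), (2, 'Sydney');
""")

# The "dimension" is just the hub joined to each of its satellites.
rows = cur.execute("""
SELECT h.customer_bk, n.name, c.city
FROM hub_customer h
JOIN sat_customer_name n ON n.customer_hk = h.customer_hk
JOIN sat_customer_city c ON c.customer_hk = h.customer_hk
ORDER BY h.customer_bk
""").fetchall()

print(rows)  # [('C-001', 'Alice', 'Melbourne'), ('C-002', 'Bob', 'Sydney')]
```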
Great explanation!
Very well said. I really enjoyed this session and learned a lot.
Thanks for the great introduction to Data Vault. The videos in the academy are great. I would appreciate it if you could provide a logical ordering of the videos so they can be watched sequentially.
Great and easy to watch video, big thanks
wowow!!! simple and perfect explanation
Thank you. This made it real and practical for me.
Brilliant explanation !!!! Thank you very much
I love this great presentation.
Awesome explanation. Great. Thank you
wonderfully explained!!
Thanks for sharing the concept. My queries are:
How is this different from a snowflake schema?
The devil lies in the implementation: more joins, and hence a performance cost on fast reads.
Very good. Thank you.
Thank You
great explanation, thanks a lot
Awesome!
This is great: quickly digestible.
Thanks very much for this clear explanation. Could you please also create a video explaining how FACTS are derived from DV or vice versa? Thanks
Well done.
Very good. I have just started reading up on Data Vault.
THANKS!!!
Good one
It is really a great lecture
It's very similar to the EAV (Entity-Attribute-Value) model.
How can we design dimension tables that are already connected to the fact table in Data Vault? You were only explaining how to extend dimensions with the vault.
Very clear explanation, thanks! Simple question: if I aggregate sales upwards, via the link tables, to a unique customer contained within the hub, how do I know which version of a satellite record to use? A unique hub record can now have multiple satellite records with different load dates.
An SCD2 is painful to maintain and use, but very effective at tracking history, because new versions have unique surrogate keys that can be associated with transactions. The hub-satellite model does not associate different satellite versions of the same hub record with a transaction, only with the unique hub record.
So if I use a satellite table to track changes to a specific customer contained in the hub, I cannot see which transactions occurred when during the life of the customer.
Is my understanding correct?
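One common answer to this question (not covered in the video itself) is a point-in-time lookup: for each transaction, pick the satellite row with the latest load date at or before the transaction date. A minimal sketch, with all names invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# One hub record, several satellite versions distinguished by load_date,
# and two sales transactions at different points in time.
cur.executescript("""
CREATE TABLE sat_customer (customer_hk INTEGER, load_date TEXT, city TEXT);
CREATE TABLE sales (customer_hk INTEGER, sale_date TEXT, amount REAL);

INSERT INTO sat_customer VALUES
  (1, '2020-01-01', 'Sydney'),
  (1, '2021-06-01', 'Melbourne');
INSERT INTO sales VALUES
  (1, '2020-03-15', 100.0),
  (1, '2021-09-01', 250.0);
""")

# For each sale, use the satellite version effective on the sale date:
# the row with the greatest load_date that is <= sale_date.
rows = cur.execute("""
SELECT s.sale_date, s.amount,
       (SELECT city FROM sat_customer v
        WHERE v.customer_hk = s.customer_hk
          AND v.load_date <= s.sale_date
        ORDER BY v.load_date DESC LIMIT 1) AS city_at_sale
FROM sales s ORDER BY s.sale_date
""").fetchall()

print(rows)  # first sale resolves to 'Sydney', second to 'Melbourne'
```

So the satellite version is not stamped on the transaction as an SCD2 surrogate would be; instead it is resolved at query time from the load dates.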
best 11.23 mins i spent today
Pretty good explanation of DV. Though it talks about hubs and satellites, it doesn't seem to talk about the links. Is there a separate video on that? Covering links would make this topic complete, in my opinion.
This particular topic didn't concern links; however, there are other videos on this channel that do. You can also find more on the Genesee Academy YouTube channel.
Can you please help me with these? It would be great if you could explain with some examples.
1. Why is a link-to-link relationship not recommended in the Raw Data Vault (RDV)?
2. In a Business Data Vault (BDV) bridge table, if we store only hash keys (not natural keys), how do we get the natural keys into the fact/dimension tables?
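On the second question, the usual answer is that the hub itself is the lookup that maps a hash key back to its natural (business) key, so the bridge only needs to carry hash keys. A sketch under that assumption, with invented names and toy hash values:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_bk TEXT);
CREATE TABLE bridge_sales (customer_hk TEXT, amount REAL);

INSERT INTO hub_customer VALUES ('a1b2', 'C-001'), ('c3d4', 'C-002');
INSERT INTO bridge_sales VALUES ('a1b2', 100.0), ('c3d4', 250.0);
""")

# The bridge stores only hash keys; joining back to the hub
# recovers the natural key for the fact/dimension layer.
rows = cur.execute("""
SELECT h.customer_bk, b.amount
FROM bridge_sales b JOIN hub_customer h ON h.customer_hk = b.customer_hk
ORDER BY h.customer_bk
""").fetchall()
print(rows)  # [('C-001', 100.0), ('C-002', 250.0)]
```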
I could see this becoming a performance nightmare ...
Are there any modeling tools that support Data Vault?
I didn't catch whether all, or any, of the smaller customer dimensions are Type 2 dimensions?
They are all type 2. :-)
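To make that concrete: a satellite behaves like an insert-only Type 2 structure, where a new row (with a new load date) is written only when a tracked attribute actually changes. A minimal sketch with invented names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE sat_customer_city "
            "(customer_hk INTEGER, load_date TEXT, city TEXT)")

def load_city(hk, load_date, city):
    """Insert-only Type 2 load: add a new version only if the value changed."""
    current = cur.execute(
        "SELECT city FROM sat_customer_city WHERE customer_hk = ? "
        "ORDER BY load_date DESC LIMIT 1", (hk,)).fetchone()
    if current is None or current[0] != city:
        cur.execute("INSERT INTO sat_customer_city VALUES (?, ?, ?)",
                    (hk, load_date, city))

load_city(1, '2020-01-01', 'Sydney')
load_city(1, '2020-06-01', 'Sydney')     # unchanged -> no new version
load_city(1, '2021-06-01', 'Melbourne')  # changed   -> new version

versions = cur.execute(
    "SELECT load_date, city FROM sat_customer_city ORDER BY load_date").fetchall()
print(versions)  # [('2020-01-01', 'Sydney'), ('2021-06-01', 'Melbourne')]
```

Real implementations typically compare a hash of all satellite attributes rather than each column, but the change-detection idea is the same.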
Perfect, but it seems like a 3NF-style hybrid approach.
This approach is in no way fundamentally different from dimensional modeling, just a variation. If a new hub comes along, you still have to change all the link tables. Small changes may be easier, but Ralph never said you must pile everything into one giant dimension. This is more or less what DV improves: isolating attributes that change at the same frequency (satellites) and linking them to a dimension-like entity (hub) instead of linking directly to the fact tables as classical MDM does. Makes sense, but packaging this as a new modeling paradigm and calling everything by different names 😂. The presentation would benefit from discussing grouping by frequency of change; that is the essential part.
I am not convinced that Data Vault modeling is the evolution of dimensional modeling. The main purpose of dimensional modeling (denormalizing) is to reduce the number of joins, make queries less expensive, and hence gain performance when querying reports in real time. With the Data Vault approach above, you are again increasing the number of joins (it seems like you are re-normalizing). Please let me know if my logic is correct.
Is this really the evolution of star- or snowflake-schema DWH design? A Data Vault modeled DWH would require far more joins and would likely perform poorly. What about that? A single dimensional model, by contrast, has fewer tables, is easier to partition, and is faster to query.
No, no, no, no... No on so many levels. And people cannot see that the lecturer is making mistakes because he gives no examples. The final structure you are building turns the dimension table into a fact table, and you would have a record there for every customer. So if you have a million customers, you would have a million records there. If half a million customers are from Melbourne, you would end up writing "Melbourne" half a million times. Good luck aggregating data by value.
But that is only half the problem.
Say you want to add a "gender" attribute to your customer. In the initial structure you would need to go through a million customer records and add a value to a new field, and you say that is difficult. The easier way you offer is to create a new table and add the gender there. But creating a new table with a key for every customer and the value of their gender takes exactly the same amount of computation: one million records processed. Where is the simplification?
"Melbourne" would not be written a million times. What this video did not mention is that there would be an intermediate link table. That is, there would be a customer-location link table with a million records linking those customers to the same (integer) id. Then a final location satellite table contains a single record associating that id with "Melbourne".
As for your second point, the benefits come from modularity. Instead of having to modify a "dim_customers" file every time you want to add new attributes, you can write brand-new "sat_customer_location" and "link_customer_location" files without modifying the original files. In a fast-paced agile environment, writing new files is much easier than modifying existing files.
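The structure described in this reply can be sketched as follows (scaled down to a thousand customers, with invented names): the city string is stored once in the location satellite, while the link table carries one key pair per customer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE hub_customer (customer_hk INTEGER PRIMARY KEY);
CREATE TABLE hub_location (location_hk INTEGER PRIMARY KEY);
CREATE TABLE link_customer_location (customer_hk INTEGER, location_hk INTEGER);
CREATE TABLE sat_location (location_hk INTEGER, city TEXT);
""")

cur.execute("INSERT INTO hub_location VALUES (10)")
cur.execute("INSERT INTO sat_location VALUES (10, 'Melbourne')")  # stored once

# Many customers, all linked to the same location id.
for hk in range(1, 1001):
    cur.execute("INSERT INTO hub_customer VALUES (?)", (hk,))
    cur.execute("INSERT INTO link_customer_location VALUES (?, 10)", (hk,))

melbourne_rows = cur.execute(
    "SELECT COUNT(*) FROM sat_location WHERE city = 'Melbourne'").fetchone()[0]
link_rows = cur.execute(
    "SELECT COUNT(*) FROM link_customer_location").fetchone()[0]
print(melbourne_rows, link_rows)  # 1 1000
```

The per-customer cost does not disappear (the link table still has one row per customer), but the repeated attribute value does.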
Don't you understand how ridiculous dimensional modelling is compared to 3NF? Dimensional modelling is simply nonsense. Data Vault is a fairly good concept, but closely related to 3NF. I like Data Vault because you can do things in parallel, but dimensional modelling is pure nonsense, one of those unprecedented failures in IT.