- Videos: 77
- Views: 314,661
Datavault
United Kingdom
Joined 1 Feb 2018
Learn more about Data Vault and Information Governance from our RUclips channel - subscribe today!
We are passionate about helping organisations make the most of their data. We believe that your data should be your company's greatest asset. Don't waste your data!
Our channel seeks to help you learn more about how to make the most of your data with Data Warehouse modernisation using the Data Vault method and practical Data Governance.
We are a consultancy specialising in helping clients maximise their investment in Data Warehousing, Business Intelligence, Analytics, Data Science and Information Governance solutions.
We are also sponsors of the Data Vault User Group and we host the recordings of their Meetups on our site.
Links in Data Vault - The Datavault Podcast E8
Welcome to The Datavault Podcast, your go-to source for everything Data Vault!
In this episode, Alex Higgs (AutomateDV Product Manager) and Neil Strange (CEO and Founder of Datavault) explore Links in Data Vault.
For more expert insights, check out our services designed to help organizations like yours unlock the full potential of Data Vault.
👉 Explore our training courses to level up your data management skills: bit.ly/4eNkczq
👉 Download our free Data Vault resources: bit.ly/4f9AYs9
👉 Book a free consultation with our team: bit.ly/3BT6VXC
Subscribe to never miss an episode and let us know in the comments if you have any questions or topics you'd like us to cover in the future!
Views: 48
Videos
Data Mesh & Data Vault on Snowflake
Views: 339 • 1 day ago
Join Patrick Cuba, Senior Solutions Architect at Snowflake, as he shares his expertise on combining Data Mesh, Data Vault, and Domain-Driven Design. With over 20 years of experience, Patrick is a leading expert in Data Vault 2.0 and the author of 'The Data Vault Guru.' In this session, Patrick will explain how Data Mesh decentralises data ownership and promotes a data-driven culture. He'll disc...
PII Data in Data Vault - The Datavault Podcast E7
Views: 54 • 14 days ago
Welcome to The Datavault Podcast, your go-to source for everything Data Vault! In this episode, Alex Higgs (AutomateDV Product Manager) and Neil Strange (CEO and Founder of Datavault) describe how to get started with Data Vault. Whether you're just getting started with Data Vault or are looking to optimize your current implementation, we’ve got you covered. For more expert insights, check o...
Data Vault Hubs Explained - The Datavault Podcast E5
Views: 67 • 14 days ago
Welcome to The Datavault Podcast, your go-to source for everything Data Vault! In this episode, Alex Higgs (AutomateDV Product Manager) and Neil Strange (CEO and Founder of Datavault) describe how to get started with Data Vault. Whether you're just getting started with Data Vault or are looking to optimize your current implementation, we’ve got you covered. For more expert insights, check o...
The Data Vault Conference - The Datavault Podcast E6
Views: 38 • 1 month ago
Welcome to The Datavault Podcast, your go-to source for everything Data Vault! In this episode, Neil Strange and Alex Higgs debrief from the 2024 Data Vault User Group Conference at Royal Holloway, University of London. For more expert insights, check out our services designed to help organizations like yours unlock the full potential of Data Vault. 👉 Explore our training courses to level up yo...
Data Engineering with dbt - a pragmatic approach
Views: 177 • 1 month ago
Catch up on our recent online meetup where Roberto Zagni introduced the Pragmatic Data Platform (PDP), a practical solution that combines the best of Software Engineering and modern data platform architectures. Learn how to leverage Data Vault with an easier learning curve and make the most of your existing skills. In this dynamic session, Roberto shared invaluable insights on efficient data st...
Understanding Business Keys in Data Vault - The Datavault Podcast E4
Views: 119 • 1 month ago
In this episode of the Data Vault Podcast, Neil Strange and Alex Higgs dive into the essential concept of business keys: those unique identifiers that ensure seamless integration and interoperability within Data Vault hubs. For more insights, check out our services designed to help organizations unlock the full potential of Data Vault: 👉 Explore our training courses: bit.ly/4eNkczq 👉 Download fr...
Perfect Harmony: Modeling Data with Ellie & Haley for the Willibald Team
Views: 108 • 1 month ago
Join Andreas Heitmann, a seasoned Business Consultant from Alligator-Company in Germany, as he shares his expertise in automated Data Vault solutions. With 18 years of consulting experience, Andreas specializes in tools like dbt, Vaultspeed, Data Vault Builder, AutomateDV, Snowflake, and Exasol. In this insightful virtual meetup, Andreas will cover the importance of defining the right scope and...
The Data Vault Q&A Forum - Datavault Podcast E3
Views: 39 • 1 month ago
Welcome to The Datavault Podcast, your go-to source for everything Data Vault! In this episode, Alex Higgs (AutomateDV Product Manager) and Neil Strange (CEO and Founder of Datavault) explore the Data Vault Q&A Forum. What you'll learn in this episode: 0:00 - Introduction 0:43 - What is the Data Vault Q&A Forum? 1:30 - Who is on the forum? 2:18 - Exploring the forum 4:23 - Thanks for watching! ...
Is Data Vault right for you? - The Datavault Podcast E2
Views: 99 • 2 months ago
Welcome to The Datavault Podcast, your go-to source for everything Data Vault! In this episode, Alex Higgs (AutomateDV Product Manager) and Neil Strange (CEO and Founder of Datavault) tackle the key question: Is Data Vault right for your business? They dive into the benefits of Data Vault, from rapid iteration and seamless integration of multiple data sources to enhancing audit, compliance, and...
How to get started with Data Vault - The Datavault Podcast E1
Views: 231 • 2 months ago
Welcome to The Datavault Podcast, your go-to source for everything Data Vault! In this episode, Alex Higgs (AutomateDV Product Manager) and Neil Strange (CEO and Founder of Datavault) describe how to get started with Data Vault. Whether you're just getting started with Data Vault or are looking to optimize your current implementation, we’ve got you covered. What you'll learn in this episode...
Automating your Data Platform for Self-Service
Views: 74 • 2 months ago
This is part 4 of our 4-part webinar series with erwin on practical Data Mesh implementations. Discover the power of self-service in data management with our final webinar. Learn how Erwin Data Intelligence can automate your Data Vault data platform. The approach not only simplifies access but also ensures that data is reliable and governance is maintained, to unlock the potential of self-service for yo...
McDonald’s Nordics: Enabling improved focus on modelling and the business
Views: 85 • 2 months ago
Christian Ivanoff discusses how McDonald’s Nordics has embraced Data Vault to unify their complex reporting and business intelligence requirements. He explains how the selected architecture made their technical work easier through its flexibility and automation, enabling them to focus more on answering business questions and on data modelling. McDonald’s master franchisee Food Folk (McD...
Decentralising - getting the balance right with Federated Governance
Views: 69 • 2 months ago
Removing the Barriers to Delivery, Domain-Orientated Architecture
Views: 82 • 3 months ago
Migration challenges in Data Warehousing
Views: 83 • 3 months ago
Data Vault User Group Conference 2024
Views: 169 • 6 months ago
JOINing your data teams with dbt Mesh
Views: 308 • 6 months ago
Model-Driven Data Vault Construction
Views: 340 • 6 months ago
4 Rules for Successful Data Vault Projects
Views: 227 • 7 months ago
How Twine can efficiently move data from Data Vault to Data Mart
Views: 333 • 7 months ago
How supercharged CI/CD & Data Vault ensures data quality and development agility
Views: 167 • 9 months ago
Clearing Skies for Cloud Data Warehousing
Views: 197 • 10 months ago
Agile building of Information using Data Vault 2.0
Views: 1.2K • 1 year ago
Data Vault Performance & Constraints on Snowflake
Views: 2.1K • 1 year ago
Like the presentation, gives insights on practical approaches
Brand name and booking date together cannot ensure a unique booking key.
The voice is a bit low... or is it just me? :(
Very informative video!
Thank you!
Useful and great intro. Covers essentials
This was a great presentation. This part in particular is what made data vault click. There is no "right model". The right model is the one that works for the needs of your project.
The Booking Details satellite should have the Booking hash key, but the Customer hash key was mentioned.
Very useful series. Thank you!
Glad it was helpful!
❤
♥♥
Information does not exist in this "physical realm" Only structured data. Go Datavault.🤩
Hi, I also have a doubt about how the satellite handles the SCD (slowly changing data) case; could you help me clarify it? Thanks! If a satellite stores address information against a hash key and an address is updated when incremental data arrives, will there be two hash keys related to the two address versions, or only one hash key, with the difference being the load_date?
Can you please help me with these . Would be great if you can explain with some examples. 1. Why link to link relationship is not recommended in RDV? 2. In BDV bridge table, if we are storing only hash keys( not natural keys), then how in fact/dimension we are going to get natural keys?
Well, they are not recommended for several reasons.

First: loss of modularity. Data Vault's modular design separates hubs, links, and satellites to create a clean architecture. Linking two links directly breaks this modularity, because links are supposed to represent associations between hubs, not between other links. If links connect links, the model becomes harder to maintain and extend. It also adds redundancy and complexity: a link-to-link relationship introduces redundancy, since the original hubs already contain the necessary relationships, and querying becomes complex, requiring additional joins or custom logic to trace back to the original hubs.

Second: semantic clarity. Links are designed to represent business relationships, not relationships between relationships. If two links have a dependency, it may indicate a missing business concept or hub.

Best practice is to use a link that directly connects hubs. Instead of linking links, create a new link table that directly connects the relevant hubs. For example, if you have a SalesOrderLink (connecting CustomerHub and OrderHub) and a DeliveryLink (connecting OrderHub and DeliveryHub), and you need to relate customers to deliveries, create a new CustomerDeliveryLink. Let's say HubCustomer has a hash key for customers, HubOrder has a hash key for orders, and HubDelivery has a hash key for deliveries. Instead of SalesOrderLink → DeliveryLink → some indirect connection, create a new link like CustomerDeliveryLink, which directly connects CustomerHub and DeliveryHub with their hash keys.
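The convention the reply describes can be sketched in a few lines of Python. The business keys (CUST-001, DEL-9001), the `||` delimiter, and the MD5-over-uppercase-concatenation rule are illustrative assumptions; real implementations vary in delimiter, casing, and hash function:

```python
import hashlib

def hash_key(*business_keys: str) -> str:
    """Concatenate business keys with a delimiter and MD5-hash the result,
    a common (though not the only) Data Vault hash-key convention."""
    concat = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(concat.encode("utf-8")).hexdigest()

# Hub hash keys, each derived from a single business key.
customer_hk = hash_key("CUST-001")
delivery_hk = hash_key("DEL-9001")

# Instead of chaining SalesOrderLink -> DeliveryLink, build a direct
# CustomerDeliveryLink whose hash key is derived from BOTH business keys.
customer_delivery_link_hk = hash_key("CUST-001", "DEL-9001")
```

A row in the new link table would then carry `customer_delivery_link_hk` alongside the two hub hash keys, so queries relating customers to deliveries need no detour through the order link.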
Around 2m37s: for the link table, isn't the natural key the concatenation of the customer hash and the booking hash? If so, shouldn't the "customer booking hash" be calculated as the md5 of those hashes instead of the md5 of the natural keys extracted from the staging table? The md5 of the concatenated natural keys shouldn't happen to be identical to the md5 of the concatenated hashes of the natural keys, right?
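The commenter's distinction can be checked directly: hashing the concatenated natural keys gives a different result from hashing the concatenated hub hash keys, so a model must pick one convention and apply it consistently. A minimal sketch with hypothetical keys and a plain MD5-over-`||` convention (not necessarily the exact rules used in the video):

```python
import hashlib

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode("utf-8")).hexdigest()

customer_key, booking_key = "CUST-001", "BOOK-42"

# Option A: hash the concatenated *natural* keys.
link_hk_from_naturals = md5_hex(customer_key + "||" + booking_key)

# Option B: hash the concatenation of the two hub *hash keys*.
customer_hk = md5_hex(customer_key)
booking_hk = md5_hex(booking_key)
link_hk_from_hashes = md5_hex(customer_hk + "||" + booking_hk)

# The two results differ, which is the commenter's point: the inputs to
# the link hash must be the same kind of key everywhere.
print(link_hk_from_naturals != link_hk_from_hashes)  # True
```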
Helpful. Thank you.
This is a fantastic talk from Patrick going into great detail
Thanks for watching, we hope you enjoyed!
vgood content as always with PCuba 😊
What's the difference between the hash_diff column and the other hash column at 7:42? Aren't both the same?
Both look the same, but they have a different purpose. Hashdiff is used to detect changes in the payload of a satellite. Rather than checking each column individually it combines all the columns to check together in one hash and then that's checked. If the hashdiff changes it will only be because the value of one of the payload columns has changed, therefore we can assume the data has changed. We hope this helps. Why not join in the Data Vault User Group Forum? It's full of industry experts to answer whatever questions you may have - forum.ukdatavaultusergroup.co.uk/
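The hashdiff mechanism described in the reply above can be sketched as follows. The column names, sample values, and MD5 convention are illustrative assumptions, not taken from the video:

```python
import hashlib

def hashdiff(payload: dict) -> str:
    """Combine all payload columns (in a fixed column order) into one hash.
    If the value of any payload column changes, the hashdiff changes."""
    concat = "||".join(str(payload[col]).strip().upper() for col in sorted(payload))
    return hashlib.md5(concat.encode("utf-8")).hexdigest()

old_row = {"name": "Ada", "city": "London", "tier": "gold"}
new_row = {"name": "Ada", "city": "Bristol", "tier": "gold"}  # only city changed

# One comparison replaces a column-by-column check: a changed hashdiff
# signals that a new satellite row should be loaded.
print(hashdiff(old_row) == hashdiff(new_row))  # False
```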
Genius!! Thnx a lot!!
LIKE the technique discussions, but DO NOT like the political examples.
You can't avoid reality; if your job is to mitigate these factors, you wanted a real-life example and you will get real-life problems.
data vault what form of data?
Very good presentation and content on streaming into a data vault. Thank you!
highly informative and was latched on to it all the time. Excellent.
Very good explanation!
Thank you!
Can you explain why you do not recommend implementing a DV on SQL Server? Why do you think Snowflake is a better choice?
Thank you. I was also interested in looking into the satellite-loading SQL. Which book did you say we should refer to?
Thank you! This slide was the most important slide for me. Showing the architecture enabled me to understand how Data vault works and how it can be used in my organization.
We are glad it helped!
Great presentation, Neil. You display a lovely style of communication. I recommend your videos to those coming to terms with DV for the first time.
Thank you for your comment! We are glad they are useful!
thank you so much
You're welcome!
Is there a follow-up session continuing this video? This is really good 👍
We are working on other sessions soon!
Can you provide a link to the white paper mentioned in the beginning of the video?
Hi Pasi, www.data-vault.co.uk/what-is-data-vault/ If you follow this link and scroll halfway down the page, you will find all of our white papers, including the one mentioned at the beginning of this video. I hope this helps :)
I love it .. :) There is always a simple solution to a hard problem. You are making the data modeler's job easier ..
Thank you very much!
In my entire life I was waiting for the Puppini Bridge.
Hi Neil, I like this training a lot : ) One question though - why does it mention the effective dates in the satellites? I'm quite sure that DV 2.0 is about insert-only architecture, so getting rid of any updates, which means we cannot have the effective / end dates in the raw vault. Please comment : )
Hi Piotr, thank you for your comment. It's rare that Neil looks at these comments, so it's best to post this question on the Data Vault Q&A Forum - forum.ukdatavaultusergroup.co.uk/ - where industry experts, including Neil, will see and answer this or any other questions you may have! :)
Interesting. Look like a great solution and I feel like I need to go deeper... I have a first question... In this model... What is the convenient way to implement more than one relationship to the same table? EG: Email -> to:Contact from:Contact
My concern here is that introduces a new ETL layer. You have to maintain the bridge table. What if my fact tables are very large, with several billion rows? That can be expensive to maintain. If I insert or delete from the fact table I have to do the same on the bridge. It can get problematic very quickly
Hi Fernando, thanks for your comment. You are right: there is one additional step of maintenance to do. But then, you don't have to maintain the dozens (or hundreds) of ad hoc reports, which sometimes are outdated, but no one takes the responsibility to make obsolete. With the USS, you only maintain a live self service environment, useful to everyone. As for the system effort, I recommend saving the Puppini Bridge as a set of physical tables (one for each stage). Imagine your fact table of Sales has 100 million rows (lucky you, BTW!!). If you have an incremental load of that table, once it is finished, it will have 101 million rows. Then you need to update your table Sales_PBS (Puppini Bridge Stage). That table will also need to have 101 million rows. The Puppini Bridge as a full table, with all the stages, should be virtualized: a query that creates on the fly the UNION ALL of all the _PBS physical tables. I recommend maintaining the script of the view with DBT, because it makes it easier when a brand new table gets added to the USS, and some new key columns need to be added to the UNION ALL. It's hard, I agree. But self service BI, until now, was only a very limited success. With this approach, self service becomes really possible. I hope it helps. You can contact me on LinkedIn and we can talk further! 🙂
40:19 A wild cat appears! Just needed to timestamp that for reference. Great video! Learned a lot!
Excellent presentation. This guy solves real world problems. I liked the way the talked about unit of work for the links.
Thank you for your comment!
If each stack of colors represents a dataset, then the model is incorrect. The model contains just one link (red, purple and blue) and the link is coloured yellow. That link is correct only if the values of the yellow fields are non-hubs, making the link peg-legged. I am missing the link from stack 2 (purple, light purple, red, green and blue) and the link from stack 3 (yellow and blue). Why didn't you model them? This model is not Jedi-safe: data cannot be recreated from the model as it was in the sources.
You got it right, Andreas. I think he just wanted to demonstrate how to draw hubs/links/sats and use them as a star schema. My assumption is that creating a complete DV model wasn't the intention.
Data vault! Seriously! Cheap and fast? Hmmm. Quick and easy? Hmmm.
Can we have multiple satellites for a hub? If yes, do we keep the natural keys from both satellites in the hub?
Hi Ashok, thank you for your comment. It's best to ask questions like this on the Data Vault Q&A Forum; follow this link - forum.ukdatavaultusergroup.co.uk/ There are many similar questions currently being answered and it's a great place to learn about Data Vault.
Nice presentation. I was wondering about the last use case, the snapshots requested on demand by users: would they get loaded into independent tables, or would it simply be something like a Time Travel postfix on the SQL command (at <timestamp>), as featured in Snowflake?
Before you continue to publish this blunder any further, I'd recommend reading Sorting and Searching by Donald Knuth. You can't cover anything and everything with hash keys, particularly not in databases. You simply do not know the hash function for arbitrary data. Hash keys as indices in compilers are perfect, but not in databases. You will not get a structured data model. Without defining cardinality, you won't get anything useful with hash keys.
Hi Neil. This was a very interesting session. On the DevOps slide, the diagram showed Kanban flowing into GIT and Wiki having some icon. What system are you using there?
The green hub should have 2 satellites instead of one?
Yes, but only if the green field in stack 2 represents more than just a reference/business key to the green hub.
I still struggle to accept satellites using the system load_date as the PK instead of a more business-oriented date, just because "we don't control" source dates. In my opinion it would be better to use the extract date, what I call the "real data date". I've done this in several data lake ingestion processes that grab one extra column on each table from an RDBMS like Oracle. Instead of: SELECT * FROM <some table> do: SELECT sysdate extract_ts, * FROM <some table>. Then no matter if the ingestion of data takes a week and is eventually loaded after more recent data, we always have a proper SCD2 "real data date". Dates are one of the most important business aspects, and as such I congratulate you on making a video calling attention to this.
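The late-arrival scenario the commenter describes can be illustrated with a small sketch. The rows and dates are hypothetical; `extract_ts` plays the role of the "real data date" captured at the source, while `load_date` is when the row landed in the warehouse:

```python
from datetime import datetime

# Each row: (value, extract_ts at the source, load_date in the warehouse).
# The first extract arrived a week late, *after* a more recent one.
rows = [
    ("addr=London",  datetime(2024, 5, 10), datetime(2024, 5, 18)),  # late arrival
    ("addr=Bristol", datetime(2024, 5, 12), datetime(2024, 5, 13)),
]

# Ordering by load_date puts Bristol before London, inverting history.
by_load = sorted(rows, key=lambda r: r[2])

# Ordering by the extract timestamp restores the real business sequence.
by_extract = sorted(rows, key=lambda r: r[1])

print([r[0] for r in by_load])     # ['addr=Bristol', 'addr=London']
print([r[0] for r in by_extract])  # ['addr=London', 'addr=Bristol']
```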