I have watched or read many explanations of the differences among these 3 terms, but so far this video is the simplest yet clearest and easiest to understand. Thanks a lot!!!
Wow.. thank you for that 😀
Exactly, this is how I feel. Thanks Chandoo.
I came to the comments to say the same thing! Thank you for this simple, illustrative explanation.
I very much agree
There is no other video on youtube that explains DB/DW/DL this easy. Really appreciate the time and effort you put into making these videos.
Wise men can explain sophisticated things in a way that a 5-year-old kid can easily learn! Congrats, Wise Man!
😍 That is a beautiful compliment. Thanks Amir.
I have seen many videos but this explanation is very nice and clear
As a person in this industry, this is the best video ever. Exceptionally clear.
In a typical database there will be transactions taking place, like the insert, update or read of a table row, that are in line with a set of business cases.
In a data warehouse there will be analysis taking place across multiple rows from multiple tables.
A data lake is where data goes to get drowned.
"A data lake is where data goes to get drowned." 😂😂😂
Simple to start with. No PPT slides, just notepad is enough to explain ❤️ Thank you bro. Keep up your good work 👍
One more comment for me.
The best, most simple, laconic, yet rich, explanation about the diffs of the terms.
I don't think any other video on the internet explains this difference as clearly as this video. Thank you brother. Keep posting more videos to educate us.
Super! In just 8 minutes, you have painted such a clear picture of database, data warehouse and data lake that I can never forget it, and in future, any time I deal with this terminology, I will have a crystal clear idea of what I am dealing with! You are a GREAT teacher Chandoo and I really appreciate your effort!
I love this. This explains perfectly what I've been trying to explain at work. Instead of continuing to argue, I am just going to show this video
Little correction: a data warehouse is a system and/or DB where hundreds of heterogeneous DBs (e.g. chocolate DB, biscuits DB, candy and ice cream DBs) or file-based sources like Excel and XML are modelled, stored and streamed together using ETL tools, for data analytics and downstream applications, and also for data science and AI builds.
@@ChrisSmithFW Yeah, but he forgot to mention so.
I believe that's what he said. His explanation is just a lot more understandable than yours.
Your answer sounds like it was quoted from an NCERT textbook, and his is more like a next-door tuition teacher's
Wtf
I think because of your clear and concise points and humor, I learn more from you than other Excel tutorial channels.
Keep up the great work.
Aww.. that means a lot Patrick :)
This video did a great job of helping me learn the distinction between these 3 things. Love it!
Thank you Demetri... 😍
Even a person who is at the earliest stages of his data career would understand this. Thanks a lot.
Without any flattery- the BEST explanation of this topic I ever encountered on youtube!
Very good, thanks!
What I got is that it is more of a conceptual difference than a technical one, though I understand there must be some infrastructure nuances.
There was no PPT, just a sweet and crisp explanation of the topic using a notebook. 👌🏽 Loved it.
Thank you Martin 😀
I love how you indirectly went into explaining facts and dimensions for the database.
Suggestion: it would be very helpful for these 3 concepts if you explained NoSQL and file-based vs table-based storage.
Of course there's a lot more to it, but a simple summary and the benefits will do.
It’s magic that I found you. Thank you so much for explaining in simple words things that seem difficult at first glance.
I love how this man explains things
This was the best explanatory video that I came across!!
Glad it was helpful!
Very nice for non data specialists. I was searching for basic explanation and that's what you gave me!
Data gods have smiled. Thank you Chandu!!!
Thank you Venwhen.
Awesome video of comparison/differences! The explanation was very easy to follow! Thank you! Also the puns are hilarious! Keep the content coming.
Chandoo...you have a divine gift for explaining things so clearly. The drawings help so much too. I wish I had found your channel sooner.
Nice illustration. Since I entered IT, I have struggled a lot to understand the difference between these 3 components. You have cleared all my doubts... Thank you very much
Thank you so very much, sir. Now I can comfortably create my web app.
The irony is, about 2 months ago I searched to learn the difference, but to no avail. Now YouTube just suggests this to me when I am busy watching comedies.
Eventually, everything falls into place. It just takes time.
Wow, this video explanation is very easy to understand & really helpful. Just by watching this video 2-3 times, I already have a big picture about the data lake, database, data warehouse in my mind. Thank you so much for making the video!!!
Glad it was helpful!
OMG!!. Never knew this thing could be made this interesting. Great Humour! Subscribed !!
FANTASTIC explanation!
Now I realize I have created all three of these over the years. I wish I had understood these concepts better back in the day.
Glad it was helpful!
Best Explanation on the internet!
Just wanted to say I love you so much and I appreciate your effort to educate us on this type of stuff.
Thanks Chandoo.
RUclips algo was brilliant today suggesting me this goldmine!
Thanks Chandu for making these concepts so simple to understand. Whenever I get confused I just refer to your videos for quick and accurate understanding of the concepts.
This is the clearest and most understandable explanation in layperson’s terms. Thank you!
AWESOME, SIMPLIFIED EXPLANATION THANK YOU SO MUCH.
I really appreciate the effort and time you put into your video. TBH your video is on point and very interesting; I never thought that someone could explain these topics so easily. May God bless you and give you more power so you can make more videos for us.
Excellent video, with great and user friendly explanation. Loved it
The best video, which explains what goes on behind the scenes... simple language and a simple example... All the best for your future endeavors...
I have always struggled to understand what a data warehouse is, but this video made it so simple to understand. Thank you.
With your explanation, I am confident that I can get an A in this course. 😊
You got this!
Simplicity is the utmost form of sophistication.
Subscribed
Welcome Shahnawaz... 😀
One of the finest explanations. 👍
Loved it ❤️
Glad you liked it!
Thanks, Chandoo, for this humorous primer on these database buzzwords! Keep posting such conceptual nuggets in your signature style! 👍🏻
I had no idea about data warehouse or data lakes. Thanks Chandoo for sharing your knowledge and the great breakdown of each.
Best video there is about the topic! 🎉
Thanks, man
Glad you liked it!
Short, sweet and right on point to help quick learning, you got a new sub!
I love how you explained these terms so simply! On a side note, BigQuery is more of a Data Warehouse than a Data Lake.
Thanks Ophir :)
A super simple and understandable explanation! Thanks man! Thumbs up!
Just found your channel. I’m sharing your videos with my team that is a bit behind on these concepts. Thanks!!
You just won another fan and subscriber. Nice content, Chandoo. Your humor is well dosed too.
Brilliant.
Well explained, clear, simple and concise.
Thank you very much
Clear, concise, and to the point. Thank you so much for sharing your knowledge!
Simply superb... Great explanation with a general example.
Hi, your video explains the basics very clearly... but I still have a question, please clarify.
1) Why can't ETL take the source table and target table within the database itself to create a reporting or historical data table? Why do we need to load into another database and call it a data warehouse? Is there any significant difference, like performance or something? Please explain this part... You explained why we use each type, but I want you to cover why we can't use one instead of the other. E.g., why can't we create a historical table in the database itself, and why do we need a separate data warehouse? What is the special reason to go to a DW instead of a DB... and also, why a DL, and why can't a DW do that job?
The answer is more technical.
While you "CAN" keep both DB & DW kind of tables in the same place, normally people don't do it. Because,
1) Databases are "designed" so that they can add, change or delete data very quickly and efficiently. They also ensure that your data integrity is maintained (for example, if a customer is deleted, they can no longer transact).
2) Data warehouses are "designed" so that they can add data and generate reports (or summaries) quickly. As there are usually no delete or edit operations in a DW, the system is instead optimized for piling up data and doing massive calculations quickly.
3) Hence, internally the architecture and the software / hardware design are different for these two systems. So it makes sense to keep them separate.
Think of it like this. While both a car & tractor can drive, you won't use them interchangeably as they have their strengths & weaknesses.
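The three points above can be sketched in a few lines of Python using the standard-library sqlite3 module. This is only a minimal illustration, not how a real warehouse is built: the table names (customers, orders, sales_history), the sample rows and the snapshot date are all made up for the example, and a single in-memory SQLite database stands in for both kinds of system. It shows the transactional side (row-level changes plus an integrity check) and the warehouse side (append a snapshot, then summarize), but it does not demonstrate the performance difference that separate engines are designed for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Database-style (transactional) tables: small, frequent row-level changes.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " amount REAL NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Asha'), (2, 'Ben')")
conn.execute(
    "INSERT INTO orders (customer_id, amount) VALUES (1, 20.0), (1, 5.0), (2, 12.5)"
)

# A typical transaction: change one existing row.
conn.execute("UPDATE orders SET amount = 6.0 WHERE id = 2")

# Integrity: an order for a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (99, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Warehouse-style table: rows are only appended, then summarized in bulk.
conn.execute(
    "CREATE TABLE sales_history (sale_date TEXT, customer_name TEXT, amount REAL)"
)
conn.execute(
    "INSERT INTO sales_history "
    "SELECT '2024-01-01', c.name, o.amount "  # hypothetical snapshot date
    "FROM orders o JOIN customers c ON c.id = o.customer_id"
)
summary = conn.execute(
    "SELECT customer_name, SUM(amount) FROM sales_history "
    "GROUP BY customer_name ORDER BY customer_name"
).fetchall()

print("bad order rejected:", rejected)
print("sales summary:", summary)
```

The orders table is where edits and integrity rules live; sales_history is never updated in this sketch, only appended to and aggregated, which is the usage pattern the reply describes for a DW.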
@@chandoo_ Thanks for your reply. I wanted to know the "design" behind the strengths and weaknesses of each type. I understand it is very technical and cannot be covered in a text reply. Thank you so much for explaining this and helping us understand the differences.
Very easy and clear explanation of DB/DW/DL :)
I liked your video because of your clear and concise points and humor. Keep up the great work.
Very simple yet effective articulation!!
Your teaching style is simple and superb, thank you.
Good explanation! Expecting more like this..
Thanks a lot! That was one of the best explanations I ever heard. Sometimes I think people want things to be difficult so they seem more intelligent...
Hey Chandoo. Great video, but also I’m so happy to have found your channel and to see you speak, as for so many years your website has been solving my Excel queries when I Google them.
This is my first time watching your video. I am unlucky not to have found you until now. You are doing a great job, thank you so much. Keep doing more...
Wow. It's easy to understand. You are a genius
Wow, I didn't think I'd learn anything, but I learned some more about OLAP (DW) vs OLTP (DB).
Omg u were that guy who owns that website which helped me in my early career.
Easy and simple explanation which makes us clear about the concept. Great video..
Love this! Thanks for explaining it in the easiest manner and choice of words.
Great job, clear explanation and I also enjoy your humor. Would be great if you could create a video describing the difference between data scientist, engineer, analyst and architect. Kudos on your excellent work!
If you're starting in I.T. doing analysis-type work, you'll start as an Analyst. This can be anything from reporting and automated feed maintenance/RCA to even development. Most of the roles above (maybe save for Data Scientist) start here.
Data Engineer is probably the most logical next step from Analyst. You'll definitely be doing more development and analytical work as an Analyst prior to this. This shifts your scope from retrieving data from a data warehouse/db/lake (a lake is quite rare for a run-of-the-mill analyst) to actually designing, and possibly doing some light architecting of, the table/schema structures that data is imported into from other sources (typically starting as transactional information written to a database by an app, or maybe coming from an external source of some sort). Typically as an Engineer you won't start on data warehouse modelling until you've had some experience with general transactional architecting/engineering, since the data within a warehouse shouldn't be updated or deleted, only inserted. It may possibly be deleted once you've archived it (like data that's over x years old, based on specific policies), but even then it probably wouldn't be. If the architecture allows, you may just duplicate the tables, or partition them in some way and then archive the older pages. Engineers may also determine certain structural recommendations (rowstore vs columnstore table structures, for example, or using NoSQL vs relational databases), but usually that's in concert with an Architect if the process being designed is large enough or has significant impact, especially in terms of performance. However, after discussions between Engineers and Architects, the Engineers (and to a lesser extent, Analysts) will IMPLEMENT the requirements of the agreed Architecture. Engineers are typically more hands-on than Architects, but Architects may get their hands dirty if something is largely conceptual and they want to start plugging away earlier in the phase to ensure design solidity.
A Data Architect does anything from designing the schema for your transactional infrastructure (your primary database), data warehouse, or even data lake, to helping navigate and determine how to import data into those repositories, as well as even more expansive things such as CI/CD pipelines, *maybe* networking tasks if you're familiar enough with that (usually system administrators do that, though), or even helping implement connection strings/authentication against your cloud resource targets from nearly any source caller (an on-premises machine like a developer computer, a VM hosting an app service, a CI/CD agent, or a completely separate cloud service not native to your cloud service, even on a completely different domain or client server).
An Architect is going to be responsible for HOW disparate system objects interact with each other and for any potential issues given certain implementations or design sequences. Typically Architects have some knowledge of what different approaches are available and determine which makes sense given the need or problem that requires resolution. As an Architect you're not expected to know how to implement everything as if you were doing all the work yourself. However, having a basic understanding of the limitations of each element in the design will definitely help you determine earlier in the design phase what is possible and what may not be, which helps reduce wasted developer time later during spikes (Proof of Concept phases) and helps with further engineering alignment tasks.
Most people consider Scientists the babies in the room because they expect the data to be perfect, needing no changes to its representation beyond what the algorithmic modelling itself requires. It's entirely possible a Scientist will ask the Engineer to modify schema and data to accommodate some sort of analysis or data modelling they're trying to complete. It's not atypical for an Engineer to work closely with a Scientist, but it's not typical for the Scientist to work with the Architect, aside from the initial standing up of a new Data Warehouse or Data Lake. Typically the Engineer maintains those structures, or makes the everyday changes to them, once the inputs/outputs/transformational processes have been established. Scientists are typically Statisticians or people from anything having to do with applied mathematics. They will also typically work with tools that aren't strictly SQL, such as Python, R, Power BI, DAX (maybe MDX, but I think that's largely fallen by the wayside), etc. Scientists are tasked with supplying the answers to complex problems for the business using quantitative analysis. These are the people that determine what ads you may see given your previous and most recent search history. Something you searched for 3 years ago may not be as relevant as something you searched for yesterday. That would be a typical example of what a Scientist may do. Also, things like Google Translate will be developed by the Scientist, but the Architect will design the bridges to source that data, whereas the Engineer will make that design a reality. The Analyst will make sure the data makes sense as it starts trickling through the design process, and if there are any issues, the Analyst, maybe working with the Engineer, will troubleshoot the why/how and determine a fix, which either of them may implement to ensure it works as intended.
If you look at it as a decision tree, it may look something like:
Analyst > Engineer > Architect
Analyst > Engineer > Scientist
Analyst > Scientist (again, typically short cut by a Masters in Statistics or similar)
Hope that helps!
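The rowstore vs columnstore choice mentioned in the Data Engineer paragraph above can be pictured with plain Python data structures. This is a toy sketch under stated assumptions: the field names and sample orders are hypothetical, and real engines add compression, indexing and on-disk layouts that are not modelled here.

```python
# Row-oriented layout: each record is kept together,
# which suits reading or changing one whole row at a time.
row_store = [
    {"order_id": 1, "customer": "Asha", "amount": 20.0},
    {"order_id": 2, "customer": "Ben", "amount": 12.5},
    {"order_id": 3, "customer": "Asha", "amount": 6.0},
]

# Column-oriented layout: each column is kept together,
# which suits scanning one field across many rows.
column_store = {key: [row[key] for row in row_store] for key in row_store[0]}


def fetch_order(rows, order_id):
    """Transactional-style access: return the one complete record, or None."""
    for row in rows:
        if row["order_id"] == order_id:
            return row
    return None


def total_amount(columns):
    """Analytical-style access: touch only the 'amount' column."""
    return sum(columns["amount"])


one_order = fetch_order(row_store, 2)
total = total_amount(column_store)

print(one_order)
print(total)
```

The lookup needs every field of a single order, while the total only needs one field from every order. That difference in access pattern is roughly why an Engineer might recommend one structure for a transactional table and the other for a reporting table.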
Thank you very much for this illustrative explanation! Very easy to comprehend indeed!
Your video is very helpful, it cleared up my confusion about DB, DW, DL. Thank you very much!
I don't think there are any other videos on the internet that explain database/data warehouse/data lake like you did. Thanks for your explanations
The best explanation that I heard. Thanks!
Your video is the only one that made this clear to me.. thank you teacher!
This was a great explanation in a simple, clear, and concise manner.
This video is a piece of ART! Awesome work :) :)
Thank you so much 😀
Just the way I like it, Barney style! Amazing job!!!
Thank you so much!
DL explained so simply - Thank you Chandoo
Perfect explanation. I immediately subscribed 👊👊👊
my best mentor ever
Awesome.. understood data lake for the first time
Glad it helped
very well explained, crystal clear, and superb way of explanation in between videos and images. Amazing.
You should add: data hub, delta lake, lake house, data virtualization.... A neverending story :)
Thanks Man, I started learning big data concepts and this video is very useful for me
Excellent Thank you sir - I am non tech from product side but this makes it clear for me - THANKS Again
Great explanation. I've personally found that data warehouses rarely service multiple areas of an organisation. They are built with the influence of one area in mind and are therefore deemed not fit for purpose by other areas. That means that other areas are left running their own jobs on the databases. If data lakes are meant to provide a partial solution for that issue then great. But they do run into the issue of huge tech debt at that point.
Thanks Carleast. Many organizations also implement "data marts", kind of like topic / theme specific data warehouses. This opens another can of worms where data is duplicated and often inconsistent.
Man, that was a good way to explain stuff. I like it and understood it in a better way.
Great explanation and very easy to understand example.
Great explanation and awesome video! Very helpful!
This is the first video I saw on your channel and it made me instantly subscribe. Brilliant explanation.
Thank you and welcome aboard Abhishek.
Solid explanation. We can always count on Indian uncles for STEM
Excellent explanation... Can't resist myself to appreciate your efforts publicly...
Thank you so much 😀
Your videos are wonderful and soo easy to understand. Also your sense of humor 😂😂...loving it.
So well explained..!! Every second worth watching.!! Thanks a lot.
Thanks for this very clean and simple explanation - really helpful.
Excellent. Thank you. You are a great teacher Sir.
Thanks a lot.. well depicted.. now it’s clear to me about the grey area.
Correct me if I'm wrong, but I see the database as the source of live dashboards? It is the representation of data that is currently being used.
The data warehouse is like a storehouse for the historical data that the database has produced. It's like a backup copy of the data?
And the data lake is like cloud storage where you just store all kinds of data randomly, just for the sake of storage? It may contain data warehouse data, tables, isolated tables, reports etc.?
Super clear. Very very well explained.
Thank you so much for the time put in your videos! extremely helpful!