Difference between Database vs Data lake vs Warehouse
- Published: Jun 30, 2024
- Want to learn Big Data by Sumit Sir?
Check out the Big Data course details here: trendytech.in/?referrer=youtu...
𝗝𝗼𝗶𝗻 𝗺𝗲 𝗼𝗻 𝗦𝗼𝗰𝗶𝗮𝗹 𝗠𝗲𝗱𝗶𝗮:🔥
🔅Sumit LinkedIn - / bigdatabysumit
🔅Sumit Instagram - / bigdatabysumit
Database
=========
Transactional data
OLTP (online transaction processing)
Structured data
Recent data - day-to-day data.
Example - online banking transactions.
Examples of databases - Oracle, MySQL
Schema on Write
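
A minimal sketch of what "schema on write" means in practice, using Python's built-in sqlite3 as a stand-in for an OLTP database such as Oracle or MySQL. The table name, columns, and the record_transaction helper are hypothetical and exist only for illustration. The structure is declared before any data arrives, and a row that does not fit is rejected at write time.

```python
import sqlite3

# In-memory SQLite database standing in for an OLTP system (assumption for the demo).
conn = sqlite3.connect(":memory:")

# Schema on write: structure and constraints are declared before any data is stored.
conn.execute(
    """
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        account TEXT NOT NULL,
        amount REAL NOT NULL CHECK (amount > 0)
    )
    """
)


def record_transaction(account, amount):
    """Insert one row; return False if the row violates the schema."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO transactions (account, amount) VALUES (?, ?)",
                (account, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = record_transaction("ACC-1", 250.0)
rejected = record_transaction(None, 100.0)  # missing account violates NOT NULL
row_count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
print(accepted, rejected, row_count)
```

Only the valid row is stored; the invalid one never reaches the table.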
Data Warehouse - DWH
====================
Analytical processing, where we require a lot of historical data to find insights.
Running complex analytical queries directly on the database makes the day-to-day transactions slow.
So we take the data from the databases and migrate it to the data warehouse for analytical processing.
We get the data from multiple sources.
Structured data - Schema on Write.
Example - Teradata
Storage cost is high, but lower than for a database.
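ETL process -
Suppose your data is in a database.
Extract the data.
Transform it (this is a complex process).
Load it into the data warehouse.
This approach reduces flexibility, because the data has to be transformed to fit the warehouse schema before it is loaded.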
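
The ETL steps above can be sketched roughly as follows. This is an illustration only: two in-memory SQLite databases stand in for the source database and the warehouse, and the table names, sample rows, and the extract/transform/load helpers are all hypothetical. The point is that the transform step runs before the load, so the warehouse only ever sees data in its fixed shape.

```python
import sqlite3
from collections import defaultdict

# Source: operational database (hypothetical table and sample rows).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE transactions (account TEXT, day TEXT, amount REAL)")
source.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("ACC-1", "2024-06-29", 100.0),
        ("ACC-1", "2024-06-29", 50.0),
        ("ACC-2", "2024-06-30", 75.0),
    ],
)

# Target: warehouse table whose schema is fixed up front (schema on write).
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE daily_totals (account TEXT, day TEXT, total REAL, txn_count INTEGER)"
)


def extract(conn):
    """Extract: pull the raw transaction rows out of the source database."""
    return conn.execute("SELECT account, day, amount FROM transactions").fetchall()


def transform(rows):
    """Transform: aggregate to one row per account and day."""
    totals = defaultdict(lambda: [0.0, 0])
    for account, day, amount in rows:
        totals[(account, day)][0] += amount
        totals[(account, day)][1] += 1
    return [
        (account, day, total, count)
        for (account, day), (total, count) in sorted(totals.items())
    ]


def load(conn, rows):
    """Load: write the already-transformed rows into the warehouse."""
    with conn:
        conn.executemany("INSERT INTO daily_totals VALUES (?, ?, ?, ?)", rows)


load(warehouse, transform(extract(source)))
result = warehouse.execute(
    "SELECT account, day, total, txn_count FROM daily_totals ORDER BY account, day"
).fetchall()
print(result)
```

Any analysis that needs a detail dropped by transform (here, the individual transactions) cannot be answered from the warehouse alone, which is the loss of flexibility mentioned above.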
Data Lake
==========
Used to get insights from a huge amount of data.
The data is present in its raw form. It can be structured or unstructured.
Log file - we can keep this file directly in raw form in the data lake.
ELT process - Extract, Load & Transform.
Examples - HDFS, Amazon S3
Cost effective.
Schema on Read.
The structure is created only when the data is read, to visualize or query it.
This gives you a lot of flexibility.
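
A small sketch of ELT with schema on read, under stated assumptions: a local temporary directory stands in for the data lake (HDFS or Amazon S3 in practice), and the log format, the file name app.log, and the read_with_schema helper are made up for the example. The raw file is loaded unchanged, and a structure is applied only when someone reads it.

```python
import re
import tempfile
from pathlib import Path

# A local temp directory stands in for the data lake (assumption for the demo).
lake = Path(tempfile.mkdtemp())

# Extract + Load: the raw log file lands in the lake exactly as it was produced.
raw_log = (
    "2024-06-30 10:00:01 INFO login user=alice\n"
    "2024-06-30 10:00:05 ERROR payment failed user=bob\n"
    "malformed line without structure\n"
)
(lake / "app.log").write_text(raw_log)

# Transform at read time: the "schema" is just a pattern chosen by the reader.
LINE = re.compile(r"^(?P<date>\S+) (?P<time>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def read_with_schema(path):
    """Apply the pattern while reading; lines that do not fit are skipped."""
    records = []
    for line in path.read_text().splitlines():
        match = LINE.match(line)
        if match:
            records.append(match.groupdict())
    return records


records = read_with_schema(lake / "app.log")
errors = [r for r in records if r["level"] == "ERROR"]
print(len(records), errors)
```

The malformed line is still stored in the lake; a different reader could apply a different pattern to the same file later without reloading anything.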
#bigdata #dataengineering
Check out the Big Data course details here: trendytech.in/?referrer=youtube_bd12
Got this question in Amazon DE interview round 1
Tried looking for a definition in many videos, but the way you explained it really made it easy to understand. Thanks for sharing knowledge 🙏
Thanks for this video! In our project we were loading data from an Oracle DB to Teradata stg tables using Informatica one-to-one mapping, and after loading to the stg tables we were doing transformations and loading the dimension and fact tables, so it was an ELT process.
Thank you for the wonderful explanation. It was simple and crisp
Perfect explanation! Thanks!!!
You changed my mind about this subject.
Extremely useful. Thanks for this video and all the valuable information
Very thorough and helpful! Easy to understand. Thanks
Great explanation and very easy to follow!👍 Thanks!
Sumit sir, really, hats off to you.
Your explanation is 100% clear and understandable.
You made this easy to understand, Thank you sir.
Happy to hear that!
Superb & Informative! nicely articulated.
It was simple and clear, thank you
What an amazing explanation sir!!! Superb!! Love it!!! You are great sir!!!
Thanks a ton. Keep watching
Excellent, got deep knowledge!
Now I understand better
it was a much needed session
Happy that you found the session useful!
Bro your video helped me a lot. Thanks man
Good explanation..!
Very well explained thanks a lot
Loved the explanation
Thanks for the amazing content :)
Nice one!!
Thanks!
Awesome
Nice explanation
Super awesome
Can MongoDB, with its analytical engine and Time Series collections, be considered a hybrid DB/Warehouse? Would it be inherently wrong to store e.g. historical sales transactions of a shop and current transactions within a Time Series collection in MongoDB?
If not, which alternative would you consider for it?
Thank you so much for your video 👏🏻👏🏻
Can we use ELT for a data warehouse?
Can we store a log file as it is in a data warehouse? If yes, then what exactly is the difference between a data warehouse and a data lake?
No, in a data warehouse we cannot store a log file as it is, because it is unstructured data. In a data lake we can have both unstructured and structured data. That is exactly the difference.
Can we also use Oracle as a data warehouse? As far as I know, Teradata is also just a database, so
any database available in the market can be used as a data warehouse. Am I correct?
Yup. We have used Oracle as a DWH.
While technically you could, this would come at the cost of performance.
Please note that traditional databases work on rows, whereas a typical DWH database is configured to operate in a columnar fashion.
Are the data warehouse and the data lake different sections of the same setup? For example, if we have a Snowflake DW, can the data lake also be within the same Snowflake setup but in an isolated environment?
Snowflake is hosted on any of the three clouds we have. So as per my understanding, Snowflake is a data warehouse, but it can be used as a data lake as well. When used as a data warehouse, it stores the data in Snowflake itself, but when you use it as a data lake, it makes use of external storage, for example an Amazon S3 bucket.
Please correct for better understanding
I have a doubt: does a data warehouse support ACID properties? Also, is only historical data stored in a data warehouse?
The idea behind developing data warehouses is to keep historical data. You can relate it to a normal warehouse that people use to store goods, so it is a way of storing a huge amount of data.
If we cannot store historical data in a database, then how will we copy that data to a data warehouse to do analysis?
What happens is that when you have data in a database, we move this data from the database to a data warehouse at regular intervals using an ETL tool. So the data flows from your database -> ETL tool -> data warehouse.
The frequency at which you move data from a database to a data warehouse is generally suggested by the clients or the business users
Is Google Cloud Storage a database or a data lake?
Can a database become a data warehouse and vice versa?
Both have certain limitations, as discussed in the video.
Why is the cost of storing data in a database high?
Maybe because of on-premises storage. Now, due to the cloud, the storage cost has come down. Please comment on whether this is correct or not. Thanks.
Wonderful explanation