Bro, this video is gold ❤
With zero knowledge of AWS, it was easy for me to understand, bro.
Can you do more cloud-related videos, please!
Super, bro... do more like this. Hats off. 👏👏👍
Banger anna💥✨!!
Hi Gowtham bro, I enjoyed your latest video on the Data Lakehouse! I just wanted to clarify a couple of points, so please correct me if I’m wrong.
From my understanding, traditional data lakes, which are loosely coupled in terms of storage and compute, don’t natively support ACID transactions. The Data Lakehouse architecture seems to address this by introducing ACID transactions, schema enforcement, and versioning through open table formats like Delta, Hudi, and Iceberg. However, I didn’t notice these aspects discussed in the video.
Also, while Amazon Athena’s integration with S3 fits well within a data lake architecture (it allows SQL queries on S3 data without a database), it doesn’t seem to support the main lakehouse features like ACID transactions and schema enforcement.
Would love to hear your thoughts on this. Thanks again for the insightful content!
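For anyone following this thread, here is a minimal sketch in plain Python of what schema enforcement plus all-or-nothing writes mean in practice. It is a toy in-memory model, not how Delta, Hudi, or Iceberg are actually implemented, and the names ToyTable and append_batch are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy in-memory table (hypothetical); not a real table format."""
    schema: dict                      # column name -> expected Python type
    rows: list = field(default_factory=list)

    def append_batch(self, batch):
        """Validate the whole batch first, then append it in one step."""
        for row in batch:
            if set(row) != set(self.schema):
                raise ValueError(f"columns do not match schema: {sorted(row)}")
            for col, expected in self.schema.items():
                if not isinstance(row[col], expected):
                    raise TypeError(f"{col!r} should be {expected.__name__}")
        # Only reached if every row passed, so a bad batch writes nothing.
        self.rows.extend(batch)


table = ToyTable(schema={"id": int, "city": str})
table.append_batch([{"id": 1, "city": "Chennai"}])

try:
    # The second row has the wrong type for "id", so the whole batch is rejected.
    table.append_batch([{"id": 2, "city": "Madurai"}, {"id": "3", "city": "Salem"}])
except TypeError as err:
    print("rejected:", err)

print(len(table.rows))  # still only the first, valid batch
```

A plain data lake would happily accept the second batch as just another file; the point of the sketch is that the table layer checks it before anything becomes visible.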
Hey bro
The main motto of the data lakehouse is to be a mix of a data lake and a warehouse, whether or not it has ACID or the other stuff. That is how it started :P Later, some companies started using the ACID and schema features as the identity of the lakehouse :P
A data lakehouse is not just about ACID transactions and schema enforcement, although these are key features. The concept combines the flexibility of a data lake with the performance and management features of a data warehouse. While ACID transactions and schema enforcement are essential for reliability and consistency, the data lakehouse architecture includes several other features and goals.
Key Features and Benefits of a Data Lakehouse:
ACID Transactions
Schema Enforcement and Evolution
Unified Storage for Structured and Unstructured Data
Support for BI and Machine Learning
Separation of Compute and Storage
Low-Cost Storage with High-Performance Queries
Data Governance and Security
Simplified Data Management
Versioning and Time Travel
Efficient Data Processing with Unified Workflows
Since there is this much to cover, people get confused trying to understand the lakehouse concept, so I didn't make the video complex.
Happy to answer such questions :) :) :)
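To make the "Versioning and Time Travel" item in the list above a bit more concrete, here is a rough sketch in plain Python, assuming a simple append-only commit log. The class VersionedTable and its methods are hypothetical names for illustration; real table formats track versions through metadata and log files rather than in-memory lists.

```python
class VersionedTable:
    """Toy append-only commit log (hypothetical); each commit is a new version."""

    def __init__(self):
        self._commits = []            # list of row batches, one per commit

    def commit(self, rows):
        """Record a batch as a new version and return its version number."""
        self._commits.append(list(rows))
        return len(self._commits) - 1

    def read(self, version=None):
        """Return all rows visible at `version` (latest if None)."""
        if version is None:
            version = len(self._commits) - 1
        if not 0 <= version < len(self._commits):
            raise IndexError(f"no such version: {version}")
        visible = []
        for batch in self._commits[: version + 1]:
            visible.extend(batch)
        return visible


t = VersionedTable()
v0 = t.commit([{"id": 1}])
v1 = t.commit([{"id": 2}, {"id": 3}])

print(len(t.read()))            # rows at the latest version
print(len(t.read(version=v0)))  # "time travel" back to the first commit
```

Because old commits are never overwritten, reading an earlier version is just a matter of replaying fewer commits.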
@@dataengineeringvideos Cool! Thanks bro
Brooo ❤
Bro, which product are you using for writing like this??
I guess Redshift is the data warehouse. Athena itself cannot be categorized as a data warehouse, as it is more of a managed service for building data pipelines, while EMR is an unmanaged service which provides more flexibility for optimization purposes.
Exactly, as mentioned in the video, we can't rely on Athena for complex queries, and on the performance side Redshift is good.
Also, EMR is a fully managed service :) but yes, EMR is flexible for optimization purposes. People also use a data lakehouse in AWS even without EMR.
Hi anna
In the future, data engineers may be replaced by AI. What is your opinion on this?
No chance, raja
Bro, in the Hadoop single-node installation video, I got an error: ResourceManager and NodeManager are not showing when I run jps. Please make a video on that.
One more video on Hadoop installation with the latest version, please.
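For the jps check mentioned in this thread, a small helper like the one below can show which Hadoop daemons are missing from the jps output. This is only a sketch: it assumes a typical single-node (pseudo-distributed) setup where the five daemons listed are expected, and the function name missing_daemons and the sample output are made up for illustration.

```python
EXPECTED_DAEMONS = {"NameNode", "DataNode", "SecondaryNameNode",
                    "ResourceManager", "NodeManager"}


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:          # typical line: "<pid> <ClassName>"
            running.add(parts[1])
    return sorted(set(expected) - running)


# Made-up sample output for illustration, not taken from anyone's machine.
sample = """4321 NameNode
4455 DataNode
4678 SecondaryNameNode
5120 Jps
"""
print(missing_daemons(sample))
```

If ResourceManager and NodeManager come back as missing, the YARN daemons were probably not started or failed at startup, and their log files are usually the next place to look.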