Set Up and Use Apache Iceberg Tables on Your Data Lake - AWS Virtual Workshop
- Published: Jul 30, 2024
- Data lakes are critical to an organization's success and it's important to pick a data lake table format to give you the right capabilities and performance to get the most out of your data. Many customers are turning to Apache Iceberg, a data lake table format, to improve the performance of their data lake and to adopt enhanced capabilities such as time-travel queries and concurrent updates. In this workshop, we will introduce you to Apache Iceberg and show you how to get started with Apache Iceberg on AWS using Amazon EMR and Amazon Athena. We will go through step-by-step demonstrations of reading data, writing data, and more using the Apache Iceberg format.
Learning Objectives:
* Objective 1: Learn about Apache Iceberg and key fundamentals of transactional data lakes.
* Objective 2: Read, write, update, and delete data using the Apache Iceberg format in both Amazon EMR and Amazon Athena.
* Objective 3: Explore concepts such as ACID transactions and time-travel queries.
***To learn more about the services featured in this talk, please visit: aws.amazon.com/emr/
****To download a copy of the slide deck from this webinar visit: pages.awscloud.com/Analytics-...
Subscribe to AWS Online Tech Talks On AWS:
www.youtube.com/@AWSOnlineTec...
Follow Amazon Web Services:
Official Website: aws.amazon.com/what-is-aws
Twitch: / aws
Twitter: / awsdevelopers
Facebook: / amazonwebservices
Instagram: / amazonwebservices
☁️ AWS Online Tech Talks cover a wide range of topics and expertise levels through technical deep dives, demos, customer examples, and live Q&A with AWS experts. Builders can choose from bite-sized 15-minute sessions, insightful fireside chats, immersive virtual workshops, interactive office hours, or watch on-demand tech talks at your own pace. Join us to fuel your learning journey with AWS.
#AWS Science
Thanks for this very clear presentation on more of the details of Iceberg. Whilst there are a lot of talks about Iceberg, they gloss over the details, which are quite important for those who need them.
You're welcome, Riza 😊 ☁️
Thanks. Easy to understand and follow.
that saved my day
thank you, very clear and easy to understand
Lovely presentation. Thanks for sharing!
You're welcome! 😀 🙌
great information. very clear demo. thanks
It's our pleasure, Hari! 😁 Glad you liked it! 😀
excellent presentation !
We are glad you liked it, Amir! 😀 🤝
When you add a new column, for instance, it creates a new snapshot and you can query whichever snapshot you want. But how performant is it? Let's say our team uses Iceberg and over a year 1,000 snapshots were created, some of them adding a new column and others deleting one.
If the snapshots store the transactions, does that mean that when we query the first snapshot, it reapplies all 1,000 modifications and then queries that version of the table? Or does it create new data files each time, copying our table with the modification?
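A toy sketch may help with the question above. As far as Iceberg's design goes, each snapshot records the complete set of data files that make up the table at that point, so reading an old snapshot is a direct lookup rather than a replay of every later change, and data files that did not change are shared between snapshots instead of being copied. The `ToyTable` class and the file names below are hypothetical and only model that idea in plain Python; this is not Iceberg's API or its real metadata layout.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class ToyTable:
    """Toy model only: each snapshot is the full set of data file names."""

    snapshots: List[FrozenSet[str]] = field(default_factory=list)

    def current_files(self) -> FrozenSet[str]:
        # The latest snapshot, or an empty table if nothing was written yet.
        return self.snapshots[-1] if self.snapshots else frozenset()

    def append(self, new_file: str) -> int:
        # New snapshot = previous file set + one new file.
        # Existing files are referenced again, not copied.
        self.snapshots.append(self.current_files() | {new_file})
        return len(self.snapshots) - 1

    def overwrite(self, old_file: str, new_file: str) -> int:
        # An update or delete swaps only the affected file.
        self.snapshots.append((self.current_files() - {old_file}) | {new_file})
        return len(self.snapshots) - 1

    def files_at(self, snapshot_id: int) -> FrozenSet[str]:
        # "Time travel": a direct lookup, with no replay of later changes.
        return self.snapshots[snapshot_id]


table = ToyTable()
s0 = table.append("data-000.parquet")
s1 = table.append("data-001.parquet")
s2 = table.overwrite("data-000.parquet", "data-002.parquet")

print(sorted(table.files_at(s0)))
print(sorted(table.files_at(s2)))
```

In this model, the cost of reading snapshot `s0` does not depend on how many snapshots came after it. Adding a column is, in Iceberg, a metadata-only schema change that does not rewrite existing data files, so it would not add entries to a file set like the one above.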
Hi, excellent video about Iceberg. I have a question: I have a data lake with many Parquet files and I want to use Iceberg tables. What is the correct way to deal with these Parquet files? Do I read all the Parquet files and insert the data into an Iceberg table, or is there any way to link an Iceberg table to the existing Parquet files without copying them into the Iceberg table?
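One option worth checking for the question above is Iceberg's `add_files` Spark procedure, which registers existing Parquet files in an Iceberg table that already exists, without rewriting the data. The sketch below only builds the SQL statement. The catalog name `glue_catalog`, the table `db.events`, and the S3 path are hypothetical placeholders, and the helper `build_add_files_call` is made up for illustration. Running the statement needs a Spark session configured with the Iceberg runtime, which is assumed here and not shown.

```python
def build_add_files_call(catalog: str, table: str, parquet_path: str) -> str:
    """Build a Spark SQL CALL statement for Iceberg's add_files procedure.

    All three arguments are placeholders supplied by the caller.
    """
    return (
        f"CALL {catalog}.system.add_files("
        f"table => '{table}', "
        f"source_table => '`parquet`.`{parquet_path}`')"
    )


# Hypothetical names: replace with your own catalog, table, and location.
sql = build_add_files_call("glue_catalog", "db.events", "s3://my-bucket/events/")
print(sql)

# With an Iceberg-enabled Spark session available as `spark` (assumed):
# spark.sql(sql)
```

The target Iceberg table has to be created first with a schema that matches the Parquet files. For tables already registered in a Hive-style catalog, Iceberg also documents `snapshot` and `migrate` procedures as alternatives.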
I created iceberg tables inside an EMR notebook, and while they do show up in Athena, the columns do not load. When I went to view the table in Glue, well the columns are also not there. Why does this happen? I can only interact with the table within the Spark session, but in Athena or in Glue, it's just an empty table with the name but no columns nor the data.
Sorry about this inconvenience you've faced here. I recommend reaching out via our re:Post forum and posting your question there for more visibility & insight from our tech community. You can do that via this link: go.aws/aws-repost. ^BG