Set Up and Use Apache Iceberg Tables on Your Data Lake - AWS Virtual Workshop

  • Published: 30 Jul 2024
  • Data lakes are critical to an organization's success, and picking the right table format is key to getting the capabilities and performance you need from your data. Many customers are turning to Apache Iceberg, a data lake table format, to improve the performance of their data lake and to adopt enhanced capabilities such as time-travel queries and concurrent updates. In this workshop, we will introduce you to Apache Iceberg and show you how to get started with it on AWS using Amazon EMR and Amazon Athena. We will go through step-by-step demonstrations of reading data, writing data, and more using the Apache Iceberg format (a minimal PySpark sketch of these operations appears at the end of this description).
    Learning Objectives:
    * Objective 1: Learn about Apache Iceberg and key fundamentals of transactional data lakes.
    * Objective 2: Read, write, update, and delete data using the Apache Iceberg format in both Amazon EMR and Amazon Athena.
    * Objective 3: Explore concepts such as ACID transactions and time-travel queries.
    To learn more about the services featured in this talk, please visit: aws.amazon.com/emr/
    To download a copy of the slide deck from this webinar, visit: pages.awscloud.com/Analytics-...
    Subscribe to AWS Online Tech Talks on AWS:
    www.youtube.com/@AWSOnlineTec...
    Follow Amazon Web Services:
    Official Website: aws.amazon.com/what-is-aws
    Twitch: / aws
    Twitter: / awsdevelopers
    Facebook: / amazonwebservices
    Instagram: / amazonwebservices
    ☁️ AWS Online Tech Talks cover a wide range of topics and expertise levels through technical deep dives, demos, customer examples, and live Q&A with AWS experts. Builders can choose from bite-sized 15-minute sessions, insightful fireside chats, immersive virtual workshops, and interactive office hours, or watch on-demand tech talks at their own pace. Join us to fuel your learning journey with AWS.
    #AWS
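
    A minimal PySpark sketch of the operations described above (create, insert, update, delete, and a time-travel read), assuming an EMR cluster with Iceberg enabled and an Iceberg catalog backed by the AWS Glue Data Catalog. The catalog name "glue_catalog" and the bucket, database, and table names are placeholders rather than the workshop's own; once registered in Glue, the same table can typically be queried by name from Athena as well.

    # Sketch only: catalog, bucket, database, and table names are placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("iceberg-workshop-sketch")
        # Register an Iceberg catalog backed by the AWS Glue Data Catalog.
        .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
        .config("spark.sql.catalog.glue_catalog.warehouse", "s3://YOUR-BUCKET/iceberg/")
        .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
        .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .getOrCreate()
    )

    # Create an Iceberg table and modify it with ACID-transactional writes.
    spark.sql("CREATE TABLE IF NOT EXISTS glue_catalog.demo_db.orders (id BIGINT, status STRING) USING iceberg")
    spark.sql("INSERT INTO glue_catalog.demo_db.orders VALUES (1, 'NEW'), (2, 'NEW')")
    spark.sql("UPDATE glue_catalog.demo_db.orders SET status = 'SHIPPED' WHERE id = 1")
    spark.sql("DELETE FROM glue_catalog.demo_db.orders WHERE id = 2")

    # Time travel: list the table's snapshots, then read it as of the earliest one.
    snapshots = spark.sql(
        "SELECT snapshot_id, committed_at FROM glue_catalog.demo_db.orders.snapshots ORDER BY committed_at"
    )
    snapshots.show()
    first_snapshot_id = snapshots.first()["snapshot_id"]
    spark.read.option("snapshot-id", first_snapshot_id).table("glue_catalog.demo_db.orders").show()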
  • Science

Comments • 14

  • @che5ari
    @che5ari A year ago +1

    Thanks for this very clear presentation on more of the details of Iceberg. While there are a lot of talks about Iceberg, they gloss over the details, which are quite important for those who need them.

  • @AnNguyen-en3tz
    @AnNguyen-en3tz 3 months ago +1

    Thanks, easy to understand and follow.
    That saved my day.

  • @tranminhhaifet
    @tranminhhaifet A year ago

    thank you, very clear and easy to understand

  • @anandsharma213
    @anandsharma213 A year ago

    Lovely presentation. Thanks for sharing!

  • @hariporandla8044
    @hariporandla8044 A year ago

    great information. very clear demo. thanks

    • @amazonwebservices
      @amazonwebservices A year ago

      It's our pleasure, Hari! 😁 Glad you liked it! 😀

  • @amirabraham100
    @amirabraham100 A year ago

    excellent presentation!

  • @senro3960
    @senro3960 A year ago

    When you add a new column, for instance, it creates a new snapshot, and you can query whichever snapshot you want. But how performant is that? Let's say our team uses Iceberg and, over a year, 1000 snapshots are created, sometimes with a new column added or another one deleted.
    If the snapshots store the transactions, does that mean that when we query the first snapshot, it reapplies all 1000 modifications and then queries that version of the table? Or does it create new data files each time, copying our table with the modification?
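
    For context on the snapshot question above: an Iceberg snapshot is not a log of changes to replay. Each snapshot points, through a manifest list, at the concrete data files that make up that version of the table, so reading an old snapshot reads those files directly rather than reapplying earlier modifications. The main cost of keeping many snapshots is the metadata and the old data files they keep reachable, which is why expiring old snapshots is routine maintenance. A small sketch below inspects a table's snapshots and files and expires old ones; the catalog and table names are placeholders and assume the Glue-backed Iceberg catalog configured in the description sketch.

    from pyspark.sql import SparkSession

    # Assumes the glue_catalog Iceberg configuration shown in the description sketch.
    spark = SparkSession.builder.getOrCreate()

    # Each snapshot references the data files that make up that version of the table.
    spark.sql("SELECT snapshot_id, operation, committed_at FROM glue_catalog.demo_db.orders.snapshots").show()
    spark.sql("SELECT file_path, record_count FROM glue_catalog.demo_db.orders.files").show(truncate=False)

    # Expire old snapshots so metadata and unreferenced data files do not pile up;
    # time travel to expired snapshots is no longer possible afterwards.
    spark.sql("""
        CALL glue_catalog.system.expire_snapshots(
            table => 'demo_db.orders',
            older_than => TIMESTAMP '2024-01-01 00:00:00',
            retain_last => 10
        )
    """)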

  • @user-uf7ie5pt9e
    @user-uf7ie5pt9e 4 months ago

    Hi, excellent video about Iceberg. I have a question: I have a data lake with many Parquet files and I want to use Iceberg tables. What is the correct way to deal with these Parquet files? Do I read all the Parquet files and insert the data into an Iceberg table, or is there a way to link an Iceberg table to the existing Parquet files without copying them into the Iceberg table?
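
    On the question above about reusing existing Parquet files: Iceberg ships Spark procedures for this, so the files do not have to be rewritten. A hedged sketch, assuming a Spark session with Iceberg extensions and the Glue-backed catalog from the earlier sketch; every path, database, and table name is a placeholder.

    from pyspark.sql import SparkSession

    # Assumes the glue_catalog Iceberg configuration shown in the description sketch;
    # names and paths below are placeholders.
    spark = SparkSession.builder.getOrCreate()

    # Option 1: register Parquet files from an S3 prefix into an existing Iceberg table.
    spark.sql("""
        CALL glue_catalog.system.add_files(
            table => 'demo_db.orders',
            source_table => '`parquet`.`s3://YOUR-BUCKET/raw/orders/`'
        )
    """)

    # Option 2: create an Iceberg table that snapshots an existing catalog Parquet table;
    # the new table references the original data files instead of copying them.
    spark.sql("""
        CALL glue_catalog.system.snapshot(
            source_table => 'demo_db.orders_parquet',
            table => 'demo_db.orders_iceberg'
        )
    """)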

  • @nagusameta366
    @nagusameta366 9 months ago

    I created Iceberg tables inside an EMR notebook, and while they do show up in Athena, the columns do not load. When I went to view the table in Glue, the columns were not there either. Why does this happen? I can only interact with the table within the Spark session; in Athena or Glue, it's just an empty table with the name but no columns or data.

    • @awssupport
      @awssupport 9 months ago +1

      Sorry about this inconvenience you've faced here. I recommend reaching out via our re:Post forum and posting your question there for more visibility & insight from our tech community. You can do that via this link: go.aws/aws-repost. ^BG
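
    For readers hitting the same empty-columns issue as the question above, one thing worth checking is what the Glue Data Catalog actually recorded for the table. When a table is registered through Iceberg's Glue catalog integration, the Glue entry carries a metadata_location parameter pointing at the Iceberg metadata file in S3; if that pointer is missing, the table was likely created only in the Spark session catalog, so Athena has nothing to resolve. A small boto3 sketch follows; the database and table names are placeholders.

    import boto3

    # Inspect the Glue table entry; names below are placeholders.
    glue = boto3.client("glue")
    table = glue.get_table(DatabaseName="demo_db", Name="orders")["Table"]

    params = table.get("Parameters", {})
    print(params.get("table_type"))         # expected "ICEBERG" for an Iceberg table
    print(params.get("metadata_location"))  # should point at s3://.../metadata/...metadata.json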