AWS Tutorials - Create Data Lake with Amazon S3, Lake Formation and Glue

  • Published: Jan 1, 2025

Comments • 59

  • @vivekjacobalex 3 years ago +1

    Super dojo. Before I stepped into this video, the data lake was a different planet for me. Now you have taken me there and shown me its highlights. Thank you.

  • @user-lz2no5mp9i 3 years ago +2

    Thank you so much for the very detailed tutorial! Was searching around for tutorials and noticed that the tutorials out there were either outdated or missing some information, so glad that I found yours!

  • @maryo1134 2 years ago +1

    Fantastic. I had been looking for a detailed explanation of how to create a data lake and only found one in your video. This is well explained; now I can follow your steps to create my own data lake in the AWS console before I automate it with Terraform.

  • @sujitbehera3446 4 years ago +1

    Awesome presentation and tutorials. Keep up the good work for the AWS user community.

  • @GlowGineer 8 months ago

    Great, great tutorial!
    I have one request:
    where can I download the sales and customer data sets?

  • @1414Akash 3 years ago +1

    Exactly what I was looking for, thanks.
    Would like to see more such videos.

  • @muneeswarana5550 1 year ago +1

    Can we implement upsert logic in a Lake Formation data lake using Glue for large data sets?

    • @AWSTutorialsOnline 1 year ago

      You can, if you are using an incremental data framework such as Hudi, Delta Lake or Iceberg.
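
To make "upsert" concrete, here is a minimal sketch of the merge-by-key semantics that frameworks such as Hudi, Delta Lake or Iceberg provide at scale. It is plain in-memory Python, not the API of any of those frameworks; the function name `upsert`, the `id` key and the sample rows are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def upsert(existing: Iterable[Record], incoming: Iterable[Record], key: str) -> List[Record]:
    """Merge incoming rows into existing rows, matching on `key`.

    Rows whose key already exists are updated field by field;
    rows with a new key are appended (inserted).
    """
    # Index the current rows by key; dicts keep insertion order.
    merged: Dict[object, Record] = {row[key]: dict(row) for row in existing}
    for row in incoming:
        # Update when the key is known, insert otherwise.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())


# Hypothetical sample data.
customers = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}]
changes = [{"id": 2, "city": "Mumbai"}, {"id": 3, "city": "Goa"}]
result = upsert(customers, changes, key="id")
```

In a real data lake the table format performs this merge on files in S3 rather than in memory, which is why one of those frameworks is needed for large data.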

  • @niluparupasinghe7307 3 years ago

    Great video. It saved me time and I learned what is essentially relevant. Thanks a lot, and please keep up the good work!

  • @gentlecraftsman 2 years ago

    Very neat tutorial, I was able to practice and learn the functionality and use case. Thank you!

  • @nestonalex7350 2 years ago

    This is exactly what I was looking for. Thank you so much.

  • @shayankabasi160 2 years ago +1

    @AWS Tutorials, thanks for the amazing video. Please advise when the crash course on Glue and Lake Formation is coming. Eagerly waiting for it.

    • @AWSTutorialsOnline 2 years ago

      The course will be released in the first week of Feb, if not earlier.

    • @shayankabasi160 2 years ago

      @AWSTutorialsOnline So does that mean it is still to come, or is it already published?

    • @AWSTutorialsOnline 2 years ago +1

      @shayankabasi160 It will come in the first week of Feb.

  • @sonavive007 4 years ago +2

    A good basic end-to-end overview.

    • @AWSTutorialsOnline 4 years ago

      Thanks. Please let us know if there is any specific data lake topic in AWS you would like covered. Happy to create an exercise or workshop for it.

  • @ankitatiwari4568 1 year ago

    It was a really wonderful video. You provided detailed steps, which helped me understand well. Although the video is outdated now, it helped a lot. If possible, please make a video on Lake Formation using Glue blueprints (instead of a crawler).

  • @michellesantos435 3 years ago +1

    After running my crawler, the 2 tables were created and are viewable under AWS Glue but not under Cloud Formation. May I ask why?

    • @AWSTutorialsOnline 3 years ago

      I think you mean Lake Formation, not CloudFormation. Have you configured Lake Formation as per the instructions in the lab? Are you checking Glue and Lake Formation in the same region? Please check and confirm. If possible, share pictures.

  • @bhatiamadhurful 1 year ago

    Great, great explanation. I have two questions:
    1) In a data lake with a producer and a consumer, how does data encryption (KMS) work along with Lake Formation? I want to ensure that, apart from IAM access, the data is secured with proper encryption.
    2) In a data lake where the producer and consumer are two different AWS accounts (two VPCs), how do these two VPCs communicate?

    • @AWSTutorialsOnline 1 year ago

      1) It is hard to explain in comments here, but this link should help: docs.aws.amazon.com/lake-formation/latest/dg/register-encrypted.html
      2) You need to create VPC peering if a network connection is required.

  • @muralidharanmurugan2373 2 years ago

    Thank you very much. Got a good video to understand the AWS data lake.

  • @gayathrichakravarthy1056 3 years ago +1

    Thanks for the tutorial. Would it be possible to import JSON files instead of csv? Thanks

    • @AWSTutorialsOnline 3 years ago

      Apologies for the late response due to my summer break.
      Yes, along with CSV, other file formats like JSON, Parquet, ORC etc. are also supported by the crawlers.
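
As a rough illustration of the reply above, the sketch below builds the arguments for a Glue crawler pointed at a folder of JSON files. The request has the same shape as one for CSV, since the crawler works out the format from the files it finds. The crawler name, role ARN, database name and S3 path are hypothetical placeholders, and the boto3 call is left commented out so the snippet runs without AWS credentials.

```python
from typing import Any, Dict


def build_crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> Dict[str, Any]:
    """Build keyword arguments for the Glue `create_crawler` call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # No format is declared here: the crawler classifies the files itself.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


# All values below are assumed placeholders, not names from the video.
request = build_crawler_request(
    name="orders-json-crawler",
    role_arn="arn:aws:iam::111122223333:role/ExampleGlueCrawlerRole",
    database="example_database",
    s3_path="s3://example-data-lake/orders-json/",
)

# With boto3 installed and credentials configured, the request could be sent as:
# import boto3
# boto3.client("glue").create_crawler(**request)
```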

  • @leonidb8205 3 years ago +1

    Awesome material, thanks a lot for your effort.

  • @krishnat1302 3 years ago +1

    You did not talk about securing the underlying S3 buckets. Does Lake Formation have options to tighten security at the S3 level as well? Otherwise it is not 100% secure, IMHO.

    • @AWSTutorialsOnline 3 years ago

      Hi, yes, I did not talk about tightening security for S3; I missed it somehow. I will see if I can re-record this video with some improvements. Meanwhile, you should restrict access in S3 while giving access via Lake Formation. That way, you can secure the data end to end and make the Glue catalog the only way to access it.
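
One possible way to restrict access in S3, sketched under assumptions: a bucket policy that denies object reads to every principal except a short allow-list of roles, such as the role registered with Lake Formation and the crawler role. The bucket name and role ARNs are hypothetical, and this is not the configuration used in the lab. A deny statement like this can also lock out administrators, so it should be tried in a test account first.

```python
import json
from typing import Any, Dict, List


def build_deny_policy(bucket: str, allowed_role_arns: List[str]) -> Dict[str, Any]:
    """Return a bucket policy denying s3:GetObject to all but the listed roles."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyObjectReadsOutsideAllowList",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # The deny applies only when the caller matches none of the listed ARNs.
                "Condition": {"ArnNotEquals": {"aws:PrincipalArn": allowed_role_arns}},
            }
        ],
    }


# Hypothetical bucket and role names.
policy = build_deny_policy(
    bucket="example-data-lake",
    allowed_role_arns=[
        "arn:aws:iam::111122223333:role/ExampleLakeFormationDataAccessRole",
        "arn:aws:iam::111122223333:role/ExampleGlueCrawlerRole",
    ],
)
policy_json = json.dumps(policy, indent=2)
```

The simpler alternative, which a commenter further down confirms works, is to not grant users any direct S3 permissions and let them reach the data only through Lake Formation grants.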

    • @krishnat1302 3 years ago

      A few comments about the Glue catalog upgrade (to use Lake Formation only) and its potential impact on existing (pre-upgrade) Athena/Glue/Spectrum resources would be appreciated. The AWS documentation is not clear about it.
      Thanks for your community contributions.

  • @purabization 1 year ago

    I did the same thing, and when I tried to query the S3 data using Athena, it said: "This query ran against the "purab-db" database, unless qualified by the query". Why is this happening?

  • @patricktan7575 3 years ago +1

    Great presentation and workshop. I have a question. Even if data is protected from salesuser and consumersuser via Glue database, it is not protected via S3. salesuser and consumersuser can download files from S3 since they have full S3 access. How do you protect against this?

    • @AWSTutorialsOnline 3 years ago

      Hi Patrick, apologies for the late reply. Just for the lab, we kept it simple and gave full access to salesuser and consumersuser. In an actual implementation, salesuser and consumersuser access to S3 would be restricted based on which objects (bucket, folder or file) they are allowed to see. Hope that clarifies it, and really sorry for the confusion.

    • @patricktan7575 3 years ago +1

      @AWSTutorialsOnline Thanks for your reply. I was able to remove S3 access from the user and can still access the data through the Glue catalog (querying it with Athena).

    • @AWSTutorialsOnline 3 years ago

      Sounds good :)

  • @shreyashlavate6448 4 years ago +1

    I'm having a problem at step 8: "resource doesn't exist or requester is not authorised to access requested permissions".

    • @AWSTutorialsOnline 4 years ago +1

      I understand you have a problem at the "User Permission to the Catalog" step. Is your logged-in user a Lake Formation administrator? This is something you configure at Step 6, "Configure Data Lake". Please also share a screenshot if the problem persists.

  • @alladamk 3 years ago +1

    Hi, thanks for providing detailed steps on data lake and Lake Formation configuration and usage. It would also be helpful if you could do a specific video on the ingestion process (file-based and relational DB).

    • @AWSTutorialsOnline 3 years ago

      Great that you liked it. I have many tutorials (on YouTube and the aws-dojo website) using Glue with RDS; those can help you with relational DBs. For the file-based case, can you please share details of the use case?

  • @bc210192 4 years ago +1

    Hello, thank you very much for the tutorial, which is quite interesting. I have a problem in step 7, Configure and Run Crawler: when I run the crawler, it tells me that it created the two tables, but they do not appear in the AWS Lake Formation console. I followed all the steps but I don't see them. Can you help me or give me any recommendations? Thank you.

    • @AWSTutorialsOnline 4 years ago

      Hi, it seems your logged-in AWS user is not a lake administrator in Lake Formation. Otherwise, you should be able to see the database and tables. Also check that no filter is applied, and make sure you are checking in the same region. Can you see the tables in Glue? Please let me know if this helped; otherwise I will see how else I can help.

  • @coldstone87 3 years ago +1

    Awesome. Thanks for the effort

  • @fanig1458 3 years ago +1

    Nice tutorial, great job. It did help me.

  • @vjnt1star 3 years ago +1

    good content

  • @sunilpawar4195 4 years ago +1

    good for beginners