Super, Dojo! Before I stepped into this video, the data lake was a different planet for me. Now you have taken me to that place and shown me its highlights. Thank you!
Glad you enjoyed it!
Thank you so much for the very detailed tutorial! Was searching around for tutorials and noticed that the tutorials out there were either outdated or missing some information, so glad that I found yours!
You're very welcome!
Fantastic! I had been looking for a detailed explanation of how to create a data lake until I found your video. This is well explained; now I can follow your steps to create my own data lake in the AWS console before I automate it with Terraform.
You are welcome!
Awesome presentation and tutorials. Keep up the good work for the AWS user community.
Thanks a lot!
Great, great tutorial! I have one request: where can I download the sales and customer data sets?
Exactly what I was looking for, thanks. Would like to see more such videos.
Glad it was helpful!
Can we implement upsert logic in Lake Formation using Glue for large data volumes?
You can if you use an incremental data framework such as Hudi, Delta Lake, or Iceberg.
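For illustration, here is a minimal PySpark sketch of an Iceberg upsert in a Glue job. It assumes a Glue 4.0 job started with the --datalake-formats parameter set to "iceberg"; the bucket, database, table, and key column names are placeholders, and in a real Glue job the Spark settings below are usually supplied as --conf job parameters rather than in code.

```python
from pyspark.sql import SparkSession

# Wire the Iceberg catalog to the Glue Data Catalog (placeholder warehouse path).
spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-datalake-bucket/warehouse/")
    .getOrCreate()
)

# Stage the incremental records (CSV here; JSON/Parquet work the same way).
updates = spark.read.option("header", "true").csv("s3://my-datalake-bucket/incoming/sales/")
updates.createOrReplaceTempView("sales_updates")

# MERGE provides the upsert semantics: update matched rows, insert the rest.
spark.sql("""
    MERGE INTO glue_catalog.sales_db.sales AS t
    USING sales_updates AS s
    ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```

Hudi and Delta Lake offer equivalent upsert operations; only the table format configuration changes.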
Great video. It saved me time and taught me what is essentially relevant. Thanks a lot, and please keep up the good work!
Glad it was helpful!
Very neat tutorial, I was able to practice and learn the functionality and use case. Thank you!
This is exactly what I was looking for... Thank you so much.
@AWS Tutorials - Thanks for the amazing video. Please advise when the crash course on Glue and Lake Formation is coming; eagerly waiting for it.
The course will be released in the first week of Feb, if not earlier.
@@AWSTutorialsOnline So does that mean it is still coming, or is it already published?
@@shayankabasi160 It will come in the first week of Feb
A good, basic end-to-end overview.
Thanks. Please let us know if there is any specific AWS data lake topic you want covered. Happy to create an exercise / workshop for it.
It was a really wonderful video; you provided detailed steps which helped my understanding. Though the video is outdated now, it helped a lot. If possible, do make a video on Lake Formation using Glue blueprints (instead of a crawler).
Upon running my crawler, the 2 tables were created and are viewable under AWS Glue, but not under Cloud Formation... may I ask why?
I think you mean Lake Formation, not CloudFormation. Have you configured Lake Formation as per the instructions in the lab? Are you checking Glue and Lake Formation in the same region? Please check and confirm; if possible, share pictures.
Great, great explanation. I have two questions:
1) In a data lake with a producer and a consumer, how does data encryption (KMS) work along with Lake Formation? Just to ensure that, apart from IAM access, the data is secured with proper encryption.
2) When the producer and consumer are two different AWS accounts (2 VPCs), how do these two VPCs communicate?
1) It is hard to explain fully in the comments here, but this link should help - docs.aws.amazon.com/lake-formation/latest/dg/register-encrypted.html
2) You need to create VPC peering if a network connection is required.
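On point 1, here is a hypothetical boto3 sketch of registering a KMS-encrypted S3 location with Lake Formation. The ARNs are placeholders; per the linked doc, an encrypted location is typically registered with a custom role rather than the service-linked role, so the role can be granted kms:Decrypt and kms:GenerateDataKey on the bucket's key.

```python
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

# Register the encrypted bucket with a custom role (placeholder ARNs).
# The role needs S3 access to the location plus permissions on the KMS key.
lf.register_resource(
    ResourceArn="arn:aws:s3:::my-encrypted-datalake-bucket",
    UseServiceLinkedRole=False,
    RoleArn="arn:aws:iam::111122223333:role/LFRegistrationRole",
)
```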
Thank you so much. A good video for understanding AWS data lakes.
Glad it was helpful!
Thanks for the tutorial. Would it be possible to import JSON files instead of CSV? Thanks!
Apologies for the late response due to my summer break.
Yes - along with CSV, other file formats such as JSON, Parquet, and ORC are also supported by the crawlers.
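For anyone who wants the same thing in code, a small boto3 sketch of a crawler pointed at JSON and Parquet folders; the crawler's built-in classifiers detect the format per path. The bucket, role, and database names are placeholders.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# One crawler can cover multiple prefixes, each in a different format.
glue.create_crawler(
    Name="datalake-multiformat-crawler",
    Role="arn:aws:iam::111122223333:role/GlueCrawlerRole",
    DatabaseName="sales-db",
    Targets={
        "S3Targets": [
            {"Path": "s3://my-datalake-bucket/raw/customers-json/"},
            {"Path": "s3://my-datalake-bucket/raw/sales-parquet/"},
        ]
    },
)
glue.start_crawler(Name="datalake-multiformat-crawler")
```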
Awesome material, thanks a lot for your effort.
Glad you liked it!
You did not talk about securing the underlying S3 buckets. Does Lake Formation have options to tighten security at the S3 level as well? Otherwise it is not 100% secure, IMHO.
Hi, yes, I did not talk about tightening security for S3; I missed it somehow. I will see if I can re-record this video with some improvements. Meanwhile, you should restrict access in S3 while granting access via Lake Formation. That way you can secure the data end to end and make the Glue catalog the only way to access it.
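As a rough illustration of that pattern, here is a hedged sketch of a bucket policy that denies direct object reads to everyone except the role registered with Lake Formation, so reads must go through the catalog (e.g. via Athena). All ARNs are placeholders, and in a real setup you would also exempt admin or break-glass principals from the deny.

```python
import json
import boto3

s3 = boto3.client("s3")

# Deny s3:GetObject to every principal except the Lake Formation role
# (both its IAM role ARN and its assumed-role session ARNs).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyDirectObjectReads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-datalake-bucket/*",
            "Condition": {
                "StringNotLike": {
                    "aws:PrincipalArn": [
                        "arn:aws:iam::111122223333:role/LFRegistrationRole",
                        "arn:aws:sts::111122223333:assumed-role/LFRegistrationRole/*",
                    ]
                }
            },
        }
    ],
}

s3.put_bucket_policy(Bucket="my-datalake-bucket", Policy=json.dumps(policy))
```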
A few comments about the Glue catalog upgrade (to use Lake Formation permissions only) and the potential impact on existing (pre-upgrade) Athena/Glue/Spectrum resources would be appreciated. The AWS documentation is not clear about it. Thanks for your community contributions.
I did the same thing, but when I try to query the data using Athena it says: "This query ran against the "purab-db" database, unless qualified by the query." I was trying to query S3 data using Athena; why is this happening?
Great presentation and workshop. I have a question. Even if data is protected from salesuser and consumersuser via Glue database, it is not protected via S3. salesuser and consumersuser can download files from S3 since they have full S3 access. How do you protect against this?
Hi Patrick, apologies for the late reply. Just for the lab, we kept it simple and gave full access to salesuser and consumersuser. But in an actual implementation, salesuser's and consumersuser's access to S3 would be restricted based on which objects (bucket, folder, or file) they can see. Hope that clarifies, and really sorry for creating the confusion.
@@AWSTutorialsOnline Thanks for your reply. I was able to remove S3 access from the user and still access the data through the Glue catalog (querying it with Athena).
Sounds good :)
I'm having a problem at step 8: "Resource does not exist or requester is not authorized to access requested permissions."
I understand you have a problem at the "User Permission to the Catalog" step. Is your logged-in user a Lake Formation administrator? This is something you configure at Step 6, "Configure Data Lake". Please also share a screenshot if the problem persists.
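If it helps, here is a small boto3 sketch for checking, and if needed setting, the data lake administrators. It assumes your credentials can call these Lake Formation APIs; note that for an assumed role the caller ARN returned by STS is a session ARN, not the role ARN itself.

```python
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

# Read the current data lake settings and list the administrators.
settings = lf.get_data_lake_settings()["DataLakeSettings"]
admins = [a["DataLakePrincipalIdentifier"] for a in settings.get("DataLakeAdmins", [])]
print("Current admins:", admins)

# Add your own principal if it is missing, then write the settings back.
me = boto3.client("sts").get_caller_identity()["Arn"]
if me not in admins:
    settings["DataLakeAdmins"] = settings.get("DataLakeAdmins", []) + [
        {"DataLakePrincipalIdentifier": me}
    ]
    lf.put_data_lake_settings(DataLakeSettings=settings)
```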
Hi, thanks for providing detailed steps on the data lake and on Lake Formation configuration and usage. It would also be helpful if you could do a specific video on the ingestion process (file-based and relational DB).
Glad you liked it. I have many tutorials (on YouTube and the aws-dojo website) using Glue with RDS; those can help you with the relational DB side. For the file-based process, could you please share details of the use case?
Hello, thank you very much for the tutorial, which is quite interesting. I have a problem at Step 7, "Configure and Run Crawler": when I run the crawler, it tells me it created the two tables, but they do not appear in the AWS Lake Formation console. I followed all the steps but I don't see them. Can you help me or give me any recommendations? Thank you.
Hi, it seems your logged-in AWS user is not a data lake administrator in Lake Formation; otherwise, you should be able to see the database and tables. Also check that no filter is applied and that you are looking in the same region. Can you see the tables in Glue? Please let me know if this helped; otherwise I will see how else I can help.
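A quick way to rule out the Glue/region side is to list the tables directly. This sketch assumes the database name from the lab ("sales-db" is a placeholder here) and that the region matches the console you are checking.

```python
import boto3

# Use the same region you are viewing in the Lake Formation console.
glue = boto3.client("glue", region_name="us-east-1")

# If the crawler worked, the tables show up here with their S3 locations.
for table in glue.get_tables(DatabaseName="sales-db")["TableList"]:
    print(table["Name"], table["StorageDescriptor"]["Location"])
```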
Awesome. Thanks for the effort
My pleasure
Nice tutorial, great job. It did help me.
Awesome, thank you!
Good content.
Thanks
Good for beginners.
Yes, thanks
Happy to cover advanced topics. Please let me know if you have something in mind.