AWS Tutorials - Build Enterprise Scale Python ETL Jobs using AWS Glue on Ray
- Published: Oct 4, 2024
- AWS Glue comes with a new engine option called Ray. Ray lets you process large amounts of data using Python scripts and Python libraries. It is based on an open-source compute framework and helps build enterprise-scale jobs because it distributes the processing of the data.
Great video.
I'm currently refactoring some Python shell jobs into Ray because most of them were getting a bit too big for just 1 DPU.
But I'm having problems importing job parameters into the script. I usually import the getResolvedOptions function from the awsglue.utils library, but Ray doesn't support awsglue, which is odd. Should I add it manually or just use another method for importing job parameters?
The approach to retrieving parameters is different in Ray jobs. Check this link: docs.aws.amazon.com/glue/latest/dg/author-job-ray-job-parameters.html
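For anyone hitting the same problem, here is a minimal sketch of one way to read parameters without awsglue. It assumes the job parameters reach the script as ordinary command-line arguments (check the linked docs page for the exact behavior) and parses them with the standard-library argparse module. The helper name resolve_job_params and the parameter names input_path and output_path are made up for illustration.

```python
import argparse


def resolve_job_params(argv, names):
    """Return a dict of the requested parameters parsed from argv.

    Hypothetical stand-in for getResolvedOptions, built only on argparse.
    Assumes each parameter is passed as "--name value".
    """
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for name in names:
        parser.add_argument(f"--{name}", required=True)
    # parse_known_args ignores arguments we did not ask for instead of failing
    known, _unknown = parser.parse_known_args(argv)
    return vars(known)


# Example argument list; in a real job you would pass sys.argv[1:] instead.
example_argv = [
    "--input_path", "s3://example-bucket/in/",
    "--output_path", "s3://example-bucket/out/",
    "--some_other_flag", "x",
]
params = resolve_job_params(example_argv, ["input_path", "output_path"])
print(params["input_path"], params["output_path"])
```

Using parse_known_args rather than parse_args is a deliberate choice here: any extra arguments passed to the script are skipped rather than causing the parser to exit.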
Hi, I watched some of your videos and liked them, but I didn't see any videos that cover AWS Amplify. Do you want to cover that as well?
Hi, unfortunately, I don't have expertise in AWS Amplify. Sorry about that.
@AWSTutorialsOnline No problem. Thanks
Brother, thanks for the video. When should I use Glue and when should I use EMR?
Glue can be used only with Apache Spark and Python, while EMR supports additional frameworks such as Hadoop, Hive, Presto, etc. So use EMR when you want to work with frameworks other than Apache Spark and Python. Hope it helps.
@AWSTutorialsOnline Thanks, it makes sense now.
Is there a way to add a VPC?