AWS Tutorials - Using AWS Glue Workflow
- Published: 2 Dec 2020
- The Workshop URL - aws-dojo.com/workshoplists/wo...
AWS Glue Workflow helps create complex ETL activities involving multiple crawlers, jobs, and triggers. Each workflow manages the execution and monitoring of the components it orchestrates. The workflow records execution progress and status of its components, providing an overview of the larger task and the details of each step. The AWS Glue console also provides a visual representation of the workflow as a graph.
In this workshop, you create a workflow which orchestrates a Glue Crawler and a Glue Job.
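For readers who prefer the API to the console, a workflow like the one built in this workshop can be sketched with boto3. This is only a sketch: all resource names below are hypothetical, and the actual boto3 calls are left commented out.

```python
def build_workflow_calls():
    """Payloads for a minimal workflow: an on-demand trigger starts a
    crawler, then a conditional trigger runs a job once the crawler
    succeeds. All names are hypothetical."""
    return [
        ("create_workflow", {"Name": "dojo-workflow"}),
        ("create_trigger", {
            "Name": "start-crawl",
            "WorkflowName": "dojo-workflow",
            "Type": "ON_DEMAND",
            "Actions": [{"CrawlerName": "dojo-crawler"}],
        }),
        ("create_trigger", {
            "Name": "run-job-after-crawl",
            "WorkflowName": "dojo-workflow",
            "Type": "CONDITIONAL",
            "StartOnCreation": True,
            "Predicate": {"Conditions": [{
                "LogicalOperator": "EQUALS",
                "CrawlerName": "dojo-crawler",
                "CrawlState": "SUCCEEDED",
            }]},
            "Actions": [{"JobName": "dojo-job"}],
        }),
    ]

# To actually create the workflow (requires AWS credentials):
# import boto3
# glue = boto3.client("glue")
# for method, kwargs in build_workflow_calls():
#     getattr(glue, method)(**kwargs)
```

The console builds the same objects visually; the API form just makes the trigger/predicate structure explicit.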
Best and to the point, your channel should be the official AWS learning channel.
Many thanks for the appreciation
I just want to say thank you for all the tutorials you have done
Glad you like them!
Thanks a ton for all your effort in making this video
Thanks for the appreciation
Really like your videos, they are simple which helps us easily understand the concept.
Thanks for appreciation
I really like your video. Thank you so much for a wonderful video.
Glad you liked it
Thank you very much!! Amazing videos
Thanks for the appreciation
Very good explanation in detail...
Thanks
Nice video.. very easy steps thanks!
Glad it helped
You Are Awesome 🙂
You are doing great work. Please keep making videos on Glue. Your content is the best. Can you make a video on reading from RDS with a secure SSL connection using Glue?
sure - I will put it to the backlog.
Thank you so much for such a wonderful tutorial, really appreciate.
Can you please tell us how we can set a global variable in glue job. Thank you
Apologies for the late response due to my summer break.
There is no concept of global variables. But jobs can maintain state between them in the workflow - here is a video about it - ruclips.net/video/G6d6-abiQno/видео.html
Hope it helps,
Hello Sir, thanks for the wonderful session. I have a quick question: I was able to create two different data loads in the same Glue job and it successfully loaded two targets. But I would like to know how we can configure the target load plan (similar to Informatica) in an AWS Glue Studio job.
Glue jobs support parameters. You can parameterize the target location when running the Glue job.
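As a sketch of that parameterization: inside the job script, `awsglue.utils.getResolvedOptions` reads named arguments passed at run time. Since `awsglue` is only available inside Glue, the helper below is a minimal stand-in for the same lookup; the `TARGET_PATH` argument name and job name are assumptions, not anything from the video.

```python
def get_resolved_options(argv, names):
    """Minimal stand-in for awsglue.utils.getResolvedOptions:
    returns a dict of the values following each --NAME flag."""
    opts = {}
    for name in names:
        flag = "--" + name
        i = argv.index(flag)  # raises ValueError if the flag is missing
        opts[name] = argv[i + 1]
    return opts

# Inside the Glue job script you would read the parameter like:
# import sys
# target = get_resolved_options(sys.argv, ["TARGET_PATH"])["TARGET_PATH"]

# And when starting the job, pass the parameter (hypothetical names):
# import boto3
# glue = boto3.client("glue")
# glue.start_job_run(
#     JobName="load-job",
#     Arguments={"--TARGET_PATH": "s3://my-bucket/processed/"},
# )
```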
It was a good tutorial but I would recommend a better mic as it is hard to hear you at some times.
Good tutorial, but the audio fades in and out.
AWS Glue has been updated enough to make some of this information irrelevant. I would update with the latest UI and correct the audio issues.
Thank you.
Please, how do you make use of the properties? Is there another tutorial on that? Thanks
yes there is - ruclips.net/video/G6d6-abiQno/видео.html
You had better use a headset or earphones while speaking; otherwise the session is very good.
How can we add DPUs to a Glue job using a Glue workflow?
Not sure why you want to add DPUs to a Glue job from the workflow. When you configure the Glue job, you can configure its default DPUs there.
Hi, is it possible to move an S3 file (CSV), after it has been imported into an RDS MySQL table by a Glue job, to a processed S3 folder? Great content as always.
Sure, it is possible. I created a workshop for this scenario which might help you. aws-dojo.com/workshoplists/workshoplist33
Hope it helps,
@@AWSTutorialsOnline Thank you and much appreciated.
22:40 AWS Glue Workflows
Hi sir, my question is: whenever a push happens in S3, how can I make my workflow run automatically? Please help.
Configure an event on the S3 bucket. The event will call a Lambda function, and the Lambda function will start the Glue workflow using an SDK such as Python boto3.
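A minimal sketch of such a Lambda function, assuming a hypothetical workflow name. The pure helper just parses the S3 notification event; the boto3 import is deferred into the handler so the helper can be exercised without AWS.

```python
def extract_s3_object(event):
    """Pull bucket and key from an S3 put/post notification event."""
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], record["object"]["key"]

def lambda_handler(event, context):
    import boto3  # deferred so extract_s3_object stays testable offline
    bucket, key = extract_s3_object(event)
    glue = boto3.client("glue")
    # Workflow name is a hypothetical placeholder:
    run = glue.start_workflow_run(Name="my-etl-workflow")
    return {"runId": run["RunId"], "source": f"s3://{bucket}/{key}"}
```

The Lambda's execution role would need `glue:StartWorkflowRun` permission.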
Sir, is there any way we can set a trigger between S3 and a Glue job?
What I mean is: whenever a new file is uploaded to S3, a trigger should activate and run the Glue job, and the same thing for the crawler.
So whenever a new file is uploaded to S3, it activates a trigger for the crawler and the job. Thank you
You can do it. Configure an event on the S3 bucket that triggers on put and post events. When the event is raised, call a Lambda function. In the Lambda function, use the Python boto3 API to start the Glue job and crawler.
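A hedged sketch of that Lambda, with hypothetical crawler and job names. Note that boto3 only starts the crawler and job; it does not make the job wait for the crawler to finish (for ordering, a workflow or polling would be needed).

```python
def planned_calls(crawler_name, job_name):
    """Boto3 method names and payloads to start a crawler and a job.
    Both names are hypothetical placeholders."""
    return [
        ("start_crawler", {"Name": crawler_name}),
        ("start_job_run", {"JobName": job_name}),
    ]

def lambda_handler(event, context):
    import boto3  # deferred so planned_calls stays testable offline
    glue = boto3.client("glue")
    for method, kwargs in planned_calls("dojo-crawler", "dojo-job"):
        getattr(glue, method)(**kwargs)
```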
Thanks for sharing knowledge. I am not sure why we should use a workflow instead of a Step Function; we have better control in Step Functions. Can you please advise?
You raised a very good question. The simple answer is: use Glue Workflow only when you are orchestrating jobs and crawlers. If you need to orchestrate other AWS services, Step Functions is better suited. I personally believe that, over time, Step Functions will become the main orchestration service for Glue as well.
@@AWSTutorialsOnline Thank you.
Nice and clear explanation. I have a query: how can we run one workflow after another (not jobs/crawlers), i.e. one workflow for the dimension and another for the fact? Once the dimension is loaded, it should start the other workflow for the fact.
Nested workflows are not available. The best approach is: at the end of the dimension workflow, run a job (using Python Shell) which simply starts the workflow for the fact.
You can also use other mechanisms, such as orchestration with Lambda-based business logic or Step Functions, but it will be a little complicated: between the dimension and fact workflows you need to make an API call to check that the dimension workflow ended successfully before you start the fact workflow.
So the first approach I talked about is probably the best way.
@@AWSTutorialsOnline Thanks for your time, I really appreciate it. You answered my query and I have an idea what to do; let me try to create one specific job that calls the fact workflow at the end of the dimension workflow using Python scripts.
Hi @@AWSTutorialsOnline, I tried some blogs and Google, but I can't find code to call an AWS workflow using Python Shell. Is it possible to share any blog or Git repo where I can find some info on executing the workflow using Python? Thanks in advance.
@@venkateshanganesan2606 Hi, basically you need to use the boto3 Python SDK in a Python Shell based job. You can google plenty of examples for that; if not, let me know. In this job, you use the Glue API to start the workflow. The API for this method is here - docs.aws.amazon.com/glue/latest/dg/aws-glue-api-workflow.html#aws-glue-api-workflow-StartWorkflowRun
Hope it helps. Otherwise - let me know,
@@AWSTutorialsOnline Thanks a lot, it works as you suggested. I used the piece of code below at the end of my dimension job to invoke the fact workflow. I really appreciate you sharing your knowledge.
import boto3

# Note: hardcoding credentials is not recommended; in a Glue job the
# attached IAM role supplies them automatically.
glueClient = boto3.client(
    service_name='glue',
    region_name='eu-west-1',
    aws_access_key_id='access_key',
    aws_secret_access_key='secret_access_key',
)
response = glueClient.start_workflow_run(Name='wfl_load_fact')
Thanks again for sharing your knowledge.
Thanks for sharing knowledge. Can you create a video on reading data from S3 and writing to a database, where we handle bad records while reading: insert only good records into the RDS table and write bad records to an S3 location?
How do you differentiate between good and bad records?
@@AWSTutorialsOnline If a record does not match the schema - I mean the data type: the column is int, so values like 1, 2, 3 are expected, but sometimes they come as "four", "five". I will share an example link.
Basically, whenever a corrupt record is found, I want to write it to an S3 path and write the normal records to the database. I don't want my job to stop when a corrupt record is found; the Glue job must continue running.
I need to see some examples of the corrupt data in order to understand how to check for it. But once you know whether a record is corrupt or not, you can use the dynamic frame write method to write to the S3 bucket or the database.
@@bhuneshwarsingh630 I am publishing a video in 1/2 days about doing data quality check. Please have a look. I think it might help you.
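The good/bad split described in this thread can be sketched in plain Python. The column name is hypothetical, and in the actual Glue job the two lists would be routed to the database and the S3 error path via DynamicFrame write methods rather than returned.

```python
def is_valid_int(value):
    """A record is 'good' here if the field parses as an integer;
    words like 'four' fail and go to the bad-records path."""
    try:
        int(value)
        return True
    except (TypeError, ValueError):
        return False

def split_records(records, field):
    """Partition records into (good, bad) without stopping the job
    when a corrupt record is found."""
    good = [r for r in records if is_valid_int(r.get(field))]
    bad = [r for r in records if not is_valid_int(r.get(field))]
    return good, bad
```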
This video is really helpful but the audio is not good, please fix the audio if possible
Thanks for the feedback. I have improved audio in the later videos. Need to find time to fix these old ones.
Unfortunately, the audio is not good in this video
Please fix audio
Thanks for the feedback. I did it in the later videos.
I am getting this error: GlueArgumentError: the following arguments are required: --WORKFLOW_NAME, --WORKFLOW_RUN_ID.
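This error typically appears when a script calls getResolvedOptions(sys.argv, ['WORKFLOW_NAME', 'WORKFLOW_RUN_ID']) but is run outside a workflow: Glue injects those two arguments only into jobs that a workflow starts. A guarded read, sketched as a stand-in for getResolvedOptions, lets the same script run both inside and outside a workflow:

```python
def read_workflow_args(argv):
    """Return WORKFLOW_NAME / WORKFLOW_RUN_ID if present, else an
    empty dict, instead of raising when run outside a workflow."""
    found = {}
    for name in ("WORKFLOW_NAME", "WORKFLOW_RUN_ID"):
        flag = "--" + name
        if flag in argv:
            found[name] = argv[argv.index(flag) + 1]
    return found

# In the job script:
# import sys
# wf_args = read_workflow_args(sys.argv)
# if wf_args:
#     ...  # workflow-specific logic, e.g. get_workflow_run_properties
```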