Orchestrate Glue Jobs With Step Functions

  • Published: Jul 4, 2022
  • This is a step-by-step tutorial on how to create a Step Functions state machine to orchestrate one or more Glue jobs and configure the IAM role.
    #aws #awsglue #stepfunctions
    IAM Permission Link: docs.aws.amazon.com/step-func...

Comments • 40

  • @PatrickPoplawska
    @PatrickPoplawska 8 months ago +1

    Excellent video. To the point, called out common failure points. Well done all around.

  • @khandoor7228
    @khandoor7228 2 years ago +2

    I am really interested in Step Functions as well. Thanks for this, hope you do more!

  • @julioarenas7150
    @julioarenas7150 1 year ago

    Thank you very much, very well explained and very precise. Greetings from Chile.

  • @bhumisounds5107
    @bhumisounds5107 1 year ago

    The additional policies that you mentioned helped a lot. My state machine was hanging.

  • @user-hv9wx2md3c
    @user-hv9wx2md3c 5 months ago

    Could you please upload the complete AWS data engineering playlist?
    It would be helpful for us.
    Your tutorials are easy to watch and help us pick things up faster.
    Thank you.

    • @DataEngUncomplicated
      @DataEngUncomplicated  5 months ago

      Hey, that's a good idea, I can put them all into one playlist. It will be a lot of videos though, since I broke them down by AWS service.

  • @NehalVerma-zr4mq
    @NehalVerma-zr4mq 1 year ago

    Thanks, brother! You're great!

  • @jaffarahamed6089
    @jaffarahamed6089 2 years ago +2

    Well explained... Thanks 👍🏻

  • @cringe6006
    @cringe6006 1 year ago +2

    Really great video
    Thank you for posting
    Hope you don't get demotivated by view count 😭
    Your videos are really good.

    • @DataEngUncomplicated
      @DataEngUncomplicated  1 year ago +2

      Thanks! Much appreciated!

    • @felixa4705
      @felixa4705 1 year ago

      As of today, there are about 6k views! That's a lot more people than you could reach through normal means. I think they're doing a great job!

  • @claytonvanderhaar3772
    @claytonvanderhaar3772 1 year ago

    Hi, great tutorial as usual, but I am struggling to get a Choice state working. I am not sure how to get the result input path from the Glue job and then pass it on to the Choice state. If you know how to do this, I would really appreciate it.

  • @joegenshlea6827
    @joegenshlea6827 1 year ago

    Thank you so much for this video. It was a huge help to show the IAM permissions for the Glue job. Is there anything about the "permission_to_glue_topic" permission that we should know?
    Also, in my Lambda invocation I'm pasting the Lambda "event" JSON object into the payload options, which seems to work beautifully. Is there a way to reference the event configuration in Lambda from the step function directly without having to copy and paste?

    • @DataEngUncomplicated
      @DataEngUncomplicated  1 year ago

      Hi Joe, you're welcome! If you are trying to pass your event payload to your Lambda function through Step Functions, you can paste your test payload as the execution input when you start an execution manually in the console. You should set up your state machine so the payload gets passed directly to your Lambda function with the parameters your Lambda needs. I hope this is what you are looking for.
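
A minimal sketch of what "passing the payload directly" could look like, written as a Python dict that serializes to an Amazon States Language definition. The function ARN and the state name are hypothetical placeholders, not values from the video; the `"Payload.$": "$"` entry forwards the whole execution input to the Lambda function as its event.

```python
import json

# Hypothetical function ARN, used only for illustration.
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:my-function"


def build_lambda_definition(function_arn: str) -> dict:
    """Return a one-state definition that forwards the execution input to Lambda."""
    return {
        "StartAt": "InvokeLambda",
        "States": {
            "InvokeLambda": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                    "FunctionName": function_arn,
                    # The ".$" suffix marks the value as a JSONPath;
                    # "$" selects the entire state input.
                    "Payload.$": "$",
                },
                "End": True,
            }
        },
    }


lambda_definition = build_lambda_definition(LAMBDA_ARN)
print(json.dumps(lambda_definition, indent=2))
```

With a definition along these lines, whatever JSON is supplied as execution input arrives in the Lambda handler as `event`, so the test payload only has to be entered once, when the execution is started.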

  • @theroadbacktonature
    @theroadbacktonature 1 year ago +1

    Thanks for the demo. Can you provide more details on what Glue publishes to SNS? So we don't have to write any custom JSON message to SNS from Glue, and Glue automatically writes success or failure depending on the run state?

    • @DataEngUncomplicated
      @DataEngUncomplicated  1 year ago

      Hi Pradeep, if you configure an EventBridge rule using the Glue sample event, it will show you what the general payload passed to SNS looks like.
      For example:
      {
        "version": "0",
        "id": "66fbc5e1-aac3-5e85-63d0-856ec669a050",
        "detail-type": "Glue Job Run Status",
        "source": "aws.glue",
        "account": "123456789012",
        "time": "2018-04-24T20:57:34Z",
        "region": "us-east-1",
        "resources": [],
        "detail": {
          "jobName": "MyJob",
          "severity": "INFO",
          "notificationCondition": {
            "NotifyDelayAfter": 1
          },
          "state": "STARTING",
          "jobRunId": "jr_6aa58e7a3aa44e2e4c7db2c50e2f7396cb57901729e4b702dcb2cfbbeb3f7a86",
          "message": "Job is in STARTING state",
          "startedOn": "2018-04-24T20:55:47.941Z"
        }
      }
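
To make the payload above concrete, here is a small sketch of how a consumer (for example a Lambda function subscribed to the topic) might turn such an event into a readable line. The function name `summarize_glue_event` and the shortened run ID are illustrative assumptions; only the field names come from the sample event.

```python
def summarize_glue_event(event: dict) -> str:
    """Build a one-line summary from an EventBridge-style Glue event."""
    detail = event.get("detail", {})
    job_name = detail.get("jobName", "unknown job")
    state = detail.get("state", "UNKNOWN")
    run_id = detail.get("jobRunId", "unknown run")
    timestamp = event.get("time", "unknown time")
    return f"Glue job {job_name} (run {run_id}) is {state} as of {timestamp}"


# Trimmed copy of the sample event, with a shortened, made-up run ID.
sample_event = {
    "detail-type": "Glue Job Run Status",
    "source": "aws.glue",
    "time": "2018-04-24T20:57:34Z",
    "detail": {
        "jobName": "MyJob",
        "state": "STARTING",
        "jobRunId": "jr_example",
    },
}

print(summarize_glue_event(sample_event))
```

The `.get` calls with defaults keep the function from raising if a field is missing, which is a design choice for this sketch rather than something the event format requires.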

  • @GiorgosBastoulis
    @GiorgosBastoulis 4 months ago

    Excellent video, thanks for sharing!
    I have a question, I want to run a bash script and trigger it via Lambda with Step Functions. Is that possible?

    • @DataEngUncomplicated
      @DataEngUncomplicated  4 months ago +1

      Yes, you can “wrap” your bash script within a supported language like Node.js or Python. For example, in Node.js, you can use the child_process module to execute a bash script.
      Remember to package your bash script and any other necessary files into a ZIP file and upload it to AWS Lambda. Also, ensure that your bash script has the appropriate permissions to be executable.
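
The reply mentions Node.js and `child_process`; the same wrapping idea in Python could be sketched with the standard `subprocess` module, as below. This assumes `bash` is available in the Lambda runtime and that a script named `script.sh` (a hypothetical name) is packaged next to the handler. Invoking the script through `bash <path>` also sidesteps the need for the executable bit, although setting it remains a reasonable precaution.

```python
import subprocess

# Hypothetical script name; it would be packaged in the same ZIP as this handler.
SCRIPT_PATH = "./script.sh"


def run_script(path: str) -> dict:
    """Run a bash script and return its exit code and captured output."""
    completed = subprocess.run(
        ["bash", path],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output as str rather than bytes
        check=False,          # inspect the return code ourselves
    )
    return {
        "returncode": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


def lambda_handler(event, context):
    """Lambda entry point: run the packaged script and fail loudly on error."""
    result = run_script(SCRIPT_PATH)
    if result["returncode"] != 0:
        # Raising makes the Step Functions task fail so Retry/Catch can react.
        raise RuntimeError(f"script failed: {result['stderr']}")
    return result
```

Raising on a non-zero exit code is one possible design: it lets the state machine's error handling decide what happens next instead of hiding the failure in the return value.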

  • @STEVEN4841
    @STEVEN4841 4 months ago

    Very useful, thanks, but, if I need to call 5 glue have bs for example, I can tell crate a workflow an then call whit workflow from this same way?

    • @DataEngUncomplicated
      @DataEngUncomplicated  4 months ago

      Hi Steven, can you edit your sentence? I don't understand what you are trying to do.

  • @Kaisean
    @Kaisean 3 months ago

    What would be the rationale for using Glue in Step Functions vs. Glue Orchestration?
    If you're doing more than using GlueJob and GlueCrawler, Step Functions make sense, but is that all?

    • @DataEngUncomplicated
      @DataEngUncomplicated  3 months ago +1

      The choice between using AWS Glue in Step Functions vs. Glue Orchestration (Glue Workflows) depends on the complexity of your data pipeline and the services you’re using.
      AWS Glue Workflows are beneficial when you’re chaining together multiple Glue jobs and/or crawlers. They are particularly useful for batch processing, where you can schedule workflows directly. However, Glue Workflows lack several features common in flow control tools, such as conditional branching, loops, dynamic maps, and custom steps.
      On the other hand, AWS Step Functions are more suitable when the complexity exceeds simple triggers and the services used extend beyond Glue. Step Functions provide more advanced orchestration capabilities, including support for error handling, parallel execution, and conditional logic. They also integrate with over 220 AWS services, making them a more flexible choice for complex, multi-service workflows.
      In addition, Step Functions executions start and stop quickly, which helps sustain a reasonable throughput. They also give you explicit control over parallel jobs through the Parallel and Map states, whereas Glue Workflows only run jobs concurrently when a single trigger starts several of them.

  • @Velben
    @Velben 1 year ago +1

    I'm curious. How did you learn data engineering?

    • @DataEngUncomplicated
      @DataEngUncomplicated  1 year ago

      Working as a data engineer and in the data analytics field for 10 years. Also doing Udemy courses, AWS certifications and side projects to continue to learn as the field is changing so fast with new services coming out all the time.

  • @oscarnegrete486
    @oscarnegrete486 2 years ago +1

    What are the permissions for the publish_to_glue_topic?

    • @DataEngUncomplicated
      @DataEngUncomplicated  2 years ago +1

      Hi Oscar, it just had the sns:Publish action. The full statement looks like this:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sns:Publish",
            "Resource": "arn:aws:sns:us-east-1:account#:glue_jobs"
          }
        ]
      }

  • @InvestorKiddd
    @InvestorKiddd 1 year ago

    Is there any way to give an S3 path and database as input to the JobRun S3 step function?

    • @DataEngUncomplicated
      @DataEngUncomplicated  1 year ago +1

      Yes, you can pass the S3 path and database as input parameters to an AWS Step Functions State Machine that includes an AWS Glue JobRun S3 Step.
      When you define your Step Function state machine, you can include an input parameter section that specifies the input data that will be passed to the state machine when it is executed. You can define the input parameters as key-value pairs in JSON format.
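
As a rough illustration of the key-value input described above, the sketch below builds a definition in which a Glue task reads `s3_path` and `database` from the execution input and forwards them as job arguments. The job name, bucket, database, and argument names are all hypothetical; the `glue:startJobRun.sync` resource is the Step Functions integration that starts a job run and waits for it to finish.

```python
import json


def build_glue_definition(job_name: str) -> dict:
    """Return a definition whose Glue task takes its arguments from the input."""
    return {
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {
                    "JobName": job_name,
                    "Arguments": {
                        # Glue job arguments are prefixed with "--"; the ".$"
                        # suffix tells Step Functions to resolve the JSONPath.
                        "--s3_path.$": "$.s3_path",
                        "--database.$": "$.database",
                    },
                },
                "End": True,
            }
        },
    }


# Hypothetical execution input supplied when the state machine is started.
execution_input = {"s3_path": "s3://my-bucket/input/", "database": "my_database"}

glue_definition = build_glue_definition("my-glue-job")
print(json.dumps(glue_definition, indent=2))
print(json.dumps(execution_input))
```

Inside the Glue script, arguments passed this way would typically be read with `getResolvedOptions(sys.argv, ["s3_path", "database"])` from `awsglue.utils`.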

    • @InvestorKiddd
      @InvestorKiddd 1 year ago

      @DataEngUncomplicated thanks

  • @mallikarjunsangannavar907
    @mallikarjunsangannavar907 1 year ago

    How do I enable the step function to run the jobs in parallel?

    • @DataEngUncomplicated
      @DataEngUncomplicated  1 year ago

      Hi Mallikarjun, there is a Parallel state that lets you run any set of jobs in parallel.
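
A minimal sketch of a Parallel state with one branch per Glue job, again expressed as a Python dict that serializes to Amazon States Language. The job names are hypothetical, and each branch here contains a single task, although a branch can hold any sequence of states.

```python
import json


def glue_branch(job_name: str) -> dict:
    """Return one Parallel branch that runs a single Glue job to completion."""
    state_name = f"Run {job_name}"
    return {
        "StartAt": state_name,
        "States": {
            state_name: {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "End": True,
            }
        },
    }


def build_parallel_definition(job_names: list) -> dict:
    """Return a definition that runs every listed Glue job in its own branch."""
    return {
        "StartAt": "RunJobsInParallel",
        "States": {
            "RunJobsInParallel": {
                "Type": "Parallel",
                "Branches": [glue_branch(name) for name in job_names],
                "End": True,
            }
        },
    }


# Hypothetical job names.
parallel_definition = build_parallel_definition(["job_a", "job_b"])
print(json.dumps(parallel_definition, indent=2))
```

The Parallel state completes only after every branch has finished, and its output is an array with one element per branch, which is worth keeping in mind when a later state needs the individual job results.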

  • @SimonLopez-hj2cj
    @SimonLopez-hj2cj 1 month ago

    How do I customize the message that SNS sends?

    • @DataEngUncomplicated
      @DataEngUncomplicated  1 month ago

      In the SNS step there should be a box where you can customize the message.

    • @SimonLopez-hj2cj
      @SimonLopez-hj2cj 1 month ago

      @DataEngUncomplicated Then how do I use the parameters of the job? For example, if I want to send "The job state is (~SUCCEEDED~ or ~FAILED~). At this time ~endtime~". Thanks.
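
One possible approach, offered as a sketch and not as the author's answer, is the `States.Format` intrinsic function in the SNS publish state. It assumes the state's input is the output of a preceding `glue:startJobRun.sync` task and that this output carries `JobRunState` and `CompletedOn` fields, as in Glue's job run structure; inspect an actual execution to confirm the field names. The topic ARN is a hypothetical placeholder.

```python
import json

# Hypothetical topic ARN, used only for illustration.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:glue_jobs"


def build_notify_state(topic_arn: str) -> dict:
    """Return an SNS publish state whose message is built from the state input."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::sns:publish",
        "Parameters": {
            "TopicArn": topic_arn,
            # States.Format fills each {} with the value at the given JSONPath.
            # The field names below are assumptions about the Glue task output.
            "Message.$": (
                "States.Format('The job state is {}. Completed on: {}', "
                "$.JobRunState, $.CompletedOn)"
            ),
        },
        "End": True,
    }


notify_state = build_notify_state(TOPIC_ARN)
print(json.dumps(notify_state, indent=2))
```

Note that with the `.sync` integration a failed job raises an error instead of flowing to the next state, so a failure message would normally be published from a state reached through a `Catch` on the Glue task.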