Develop AWS Glue Jobs Locally Using PyCharm and Docker on Windows - step by step

  • Published: Jul 2, 2024
  • This video is a step-by-step tutorial on configuring a Windows computer to run AWS Glue jobs locally with PyCharm Professional and Docker. It walks through the setup for AWS Glue 4.0.
    Timeline:
    00:00 Introduction
    01:11 Pull aws glue docker image
    02:46 Configure Python interpreter with Docker
    04:16 Set up AWS Glue-related code completion suggestions
    07:06 Configure Environment Variables
    08:26 Update Docker Container Settings
    10:32 Test the setup
    Buy me a Coffee - www.buymeacoffee.com/dataengu
    Tutorial Links:
    AWS Documentation - aws.amazon.com/blogs/big-data...
    AWS Glue 4.0 Image on Docker Hub - hub.docker.com/layers/amazon/...
    aws-glue-libs: github.com/awslabs/aws-glue-libs
    sample glue script: github.com/AdrianoNicolucci/d...
    #aws #dataengineering #awsglue

Comments • 76

  • @bartoszturkowyd3608 10 months ago +1

    Oh, such a great timing for such a great tutorial! Thank you very much! ❤

    • @DataEngUncomplicated 10 months ago

      Thanks for your kind words! I'm glad it was helpful! I recommend this way to develop glue jobs.

  • @IWasBoredSo 9 months ago +2

    you just saved me a few bucks that I was spending on Glue during some experiments and learning! Good to have that kind of content on youtube and possibility to support you :)

    • @DataEngUncomplicated 9 months ago

      Wow, Thanks for the direct support through buying me a coffee, I really appreciate the support Hubert! I'm happy that you were able to save on compute costs!

  • @kandikondakarthik1432 10 months ago +2

    Wow, simply amazing video. Very well explained and detailed information. Please keep doing the great work!

  • @herleyshaori 5 months ago +1

    This video helps me.

  • @prabhathkota107 2 months ago +1

    Very much helpful. Thanks

  • @maximilianrausch5193 8 months ago

    Amazing video

  • @dougkfarrell 27 days ago +1

    This is fantastic! I'm new to AWS Glue and was really struggling to get traction developing an ETL script. Being able to develop locally, I don't really care about the costs, but the ability to debug, get feedback, and just the turnaround time to try things is amazing. Again, thanks.
    I'd like to ask you more questions, how can I do that?

    • @DataEngUncomplicated 27 days ago +1

      Thanks, feel free to post your questions here. Me or someone else might be able to help you out!

    • @dougkfarrell 27 days ago

      @DataEngUncomplicated Thanks! I'm using Glue ETL to read two different CSV files into DynamicFrames, normalize them, and union them together. I need to run some SQL against an existing RDS MySQL database to query records and figure out whether I need to update or insert data. Is there a good (as in fast) way to iterate over the normalized, unioned DynamicFrame and read from and write to an RDS MySQL database?
      Thanks in advance for any help!
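
One possible direction for the update-or-insert question above, offered only as a sketch and not as an answer from the video author: MySQL can decide between insert and update itself through `INSERT ... ON DUPLICATE KEY UPDATE`, which avoids a separate lookup query per record. The helper names (`build_upsert_sql`, `batched`) and the `members` table are hypothetical, and the approach assumes the target table has a primary or unique key on the key columns.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def build_upsert_sql(table: str, columns: Sequence[str], key_columns: Sequence[str]) -> str:
    """Build a MySQL upsert statement with DB-API style %s placeholders.

    Rows whose key already exists are updated; all other rows are inserted.
    """
    update_cols = [c for c in columns if c not in key_columns]
    if not update_cols:
        raise ValueError("need at least one non-key column to update")
    col_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in update_cols)
    return (
        f"INSERT INTO `{table}` ({col_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )


def batched(rows: Iterable[Tuple], size: int) -> Iterator[List[Tuple]]:
    """Group rows into lists of at most `size` so each batch is one round trip."""
    batch: List[Tuple] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Hypothetical table and columns, for illustration only.
sql = build_upsert_sql("members", ["id", "name", "email"], ["id"])
print(sql)
```

In a Glue script, one way to use this would be to convert the DynamicFrame with `toDF()`, call `foreachPartition` on the resulting DataFrame, and inside each partition open one connection and pass every batch to the cursor's `executemany(sql, batch)`. That part assumes a MySQL DB-API driver is installed in the container and is not shown here.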

  • @ahm_mask5161 10 months ago +1

    Loved the video. I would have loved it more if it was in VS Code. Also, if you could make an ETL tutorial using Glue locally, that would be awesome.

    • @DataEngUncomplicated 10 months ago +1

      Thanks, I will make a video with vs code since there seems to be some demand for this! Yup I'm also working on some tutorials using glue locally in my next couple of videos

  • @mackfarshi8289 6 months ago

    Thank you so much for this video. Very well explained and helpful. I was wondering if there is a way to also resolve the "SparkContext" error in the import, or if you could link to a video where you explain it. I'd really appreciate it.

  • @Fight3211 10 months ago +3

    Would love a similar tutorial for VScode :)

    • @DataEngUncomplicated 10 months ago +3

      You're the second person that has requested this! Do you think more folks use vs code? I am considering making a video soon.

    • @ahm_mask5161 10 months ago +2

      I was literally thinking the same thing

    • @waleayeni 9 months ago

      yes please

    • @user-cj4ug8pv3z 9 months ago

      please >

    • @DataEngUncomplicated 7 months ago

      I have just uploaded the video configuring aws glue with VS Code and docker. Thanks for the suggestion! ruclips.net/video/__j-SyopVBs/видео.html

  • @Patrick-ig3cn 10 months ago +1

    Amazing tutorial! You explained the whole process extremely clearly.
    Quick question, do you know if this is also possible to set up in VSCode?

    • @DataEngUncomplicated 10 months ago +5

      Thanks Patrick! Great I'm glad it made sense. Yes! It is also possible to set up in vs code! I don't use vscode but I could make a video if enough people think it would be useful.

    • @Patrick-ig3cn 10 months ago +1

      Thanks for the reply, if there is demand for it I'd be extremely grateful!
      Otherwise thanks so much again for the tutorial, it's extremely enlightening on the whole process!

    • @DataEngUncomplicated 10 months ago

      @Patrick-ig3cn You're welcome! I highly recommend developing Glue jobs locally. I have another video coming out tomorrow briefly explaining the benefits.

    • @DataEngUncomplicated 7 months ago

      I have just uploaded the video for setting it up with VS Code. Thanks for the suggestion! ruclips.net/video/__j-SyopVBs/видео.html

  • @kkos 7 months ago

    Great Video! All works, however we cannot use Docker API you're using in tutorial. I've tried to connect to Docker daemon using SSH. I can run Glue Job, but cannot run debugger. Getting ConnectionRefusedError: [Errno 111] Connection refused. Did you manage to make debugger work for Docker SSH?

  • @user-rh1xc4qp6u 7 months ago

    Great video! Really nice!
    I am struggling to find out how to set "--additional-python-modules". Anyone else? I can't find anything related to it for a local run :(

  • @aabbassp 8 months ago

    Thanks for the video! Amazing.
    Can you deploy this to AWS somehow automatically or you need to do it manually?

    • @DataEngUncomplicated 8 months ago

      Yes! You can deploy this automatically in many ways: using Terraform, the CDK, or a CloudFormation template.
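
To make the CloudFormation option mentioned above more concrete, here is a minimal sketch that generates a template containing a single `AWS::Glue::Job` resource. The function name `glue_job_template`, the job name, the role ARN, and the S3 script path are all placeholders invented for illustration; only the resource type and its property names come from CloudFormation itself.

```python
import json
from typing import Dict, Optional, Sequence


def glue_job_template(
    job_name: str,
    role_arn: str,
    script_s3_path: str,
    extra_modules: Optional[Sequence[str]] = None,
) -> Dict:
    """Return a CloudFormation template (as a dict) that defines one Glue 4.0 job."""
    properties: Dict = {
        "Name": job_name,
        "Role": role_arn,
        "GlueVersion": "4.0",
        "Command": {
            "Name": "glueetl",                 # Spark ETL job type
            "ScriptLocation": script_s3_path,  # the script must already be uploaded to S3
            "PythonVersion": "3",
        },
    }
    if extra_modules:
        # Glue job parameter for extra pip packages on the deployed job.
        properties["DefaultArguments"] = {
            "--additional-python-modules": ",".join(extra_modules)
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {"GlueJob": {"Type": "AWS::Glue::Job", "Properties": properties}},
    }


# Placeholder values, not taken from the video.
template = glue_job_template(
    "sample-glue-job",
    "arn:aws:iam::123456789012:role/ExampleGlueRole",
    "s3://example-bucket/scripts/sample_job.py",
    extra_modules=["pyarrow"],
)
print(json.dumps(template, indent=2))
```

The printed JSON could be saved to a file and deployed with whatever CloudFormation workflow you already use. Note that `--additional-python-modules` here configures the job running in the AWS Glue service; it says nothing about how packages get into the local Docker container.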

  • @nguyentonggiang1994 9 months ago

    Nice video. You've got a thumbs up from me. However, I got trouble when installing extra python libraries to the glue container. Could you please guide me how to install external python library to this glue container? Thanks a lot.

    • @DataEngUncomplicated 9 months ago

      Thanks, that's a good question, I will have to get back to you. I'm sure you can do it by going into the docker container and installing them directly in there but I wonder if there is an easier way to do this.
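
The "go into the docker container and install them directly" idea from the reply above can be sketched as a small helper that builds the matching `docker exec` command. This is an illustration under assumptions, not a tested recipe for the Glue image: the container name `glue_dev` is hypothetical, and the sketch assumes `pip3` is available inside the container.

```python
from typing import List, Sequence


def pip_install_command(container: str, packages: Sequence[str]) -> List[str]:
    """Build the argument list for installing packages inside a running container."""
    if not packages:
        raise ValueError("no packages given")
    # Equivalent to typing: docker exec <container> pip3 install <packages...>
    return ["docker", "exec", container, "pip3", "install", *packages]


# "glue_dev" is a placeholder container name.
cmd = pip_install_command("glue_dev", ["pyarrow", "requests"])
print(" ".join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` while the container is running. Packages installed this way live only in that container and are not written back to the image, so a container started fresh from the original image will not have them.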

  • @yashsrivastava14 5 months ago

    Can you also show how to setup default credentials in the docker container?

    • @DataEngUncomplicated 5 months ago

      Hi, yes I cover this in the video. You have to set the credential path in the docker image.

  • @user-qm3lq4dv6g 7 months ago

    Nice video. I want to know how to access tables of the glue catalog that belongs to the related aws account? if run spark.sql('show databases') in your script, will all databases of the online catalog be shown?

    • @DataEngUncomplicated 7 months ago

      It should as long as your profile has the permission to see the databases
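
The `spark.sql('show databases')` check discussed above can be wrapped in a small helper. This is a sketch: `list_databases` is a hypothetical name, and it reads the first column of each row by position so it does not depend on what a given Spark version calls that column. Whether the local container actually reaches the online Glue catalog depends on the credentials and permissions described in the reply.

```python
from typing import Any, List


def list_databases(spark: Any) -> List[str]:
    """Return the database names visible to a Spark session, sorted by name.

    `spark` is expected to be a SparkSession (for example, the one a Glue
    script gets from its GlueContext), but any object with a compatible
    sql(...).collect() interface works.
    """
    rows = spark.sql("SHOW DATABASES").collect()
    return sorted(row[0] for row in rows)
```

Inside a Glue script this would be called as `list_databases(spark)` with the session the script already created; an empty list or an access error would point to the profile's permissions, as the reply suggests.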

  • @giorgosstamatakis7144 9 months ago +1

    Great video, I would like to ask if anyone else experienced the following issue.
    When I add the glue-libs repo as a new content root, PyCharm stops recognising the pyspark imports as valid. Moreover, the window visible in 6:10 (showing the available python packages) is empty. Any ideas on what could have gone wrong?

    • @maximilianrausch5193 4 months ago

      I am having the same issue (no packages shown as available). Any ideas how to fix it?

    • @DataEngUncomplicated 4 months ago

      Thanks, is your docker container running? That's the first thing I would check to make sure it's not a problem finding the docker container on your machine

    • @maximilianrausch5193 4 months ago +1

      @DataEngUncomplicated I updated PyCharm to the newest version and it resolved the issue.

  • @abhishekgarg6301 8 months ago +1

    great tutorial, will you be creating the same with visual studio code?

    • @DataEngUncomplicated 8 months ago +1

      I will be as soon as I come back from vacation!

    • @DataEngUncomplicated 7 months ago

      I have just uploaded the video configuring aws glue with VS Code and docker. Thanks for the suggestion! ruclips.net/video/__j-SyopVBs/видео.html

  • @ahkamnaseek2850 9 months ago

    Hi, can you please tell me your exact PyCharm version? Docker is not working correctly with the new PyCharm version. I tried with 2023.1.4 and it worked.

    • @DataEngUncomplicated 9 months ago

      Sure! My version is 2023.1. Strange, hopefully they fix the issue in the latest version. I wonder if anyone else is experiencing the same issue you have encountered?

  • @guyfridman4426 7 months ago

    Thank you, any chance to do the same tutorial on Mac ?

  • @asishb 9 months ago +1

    Hello ! I dont have Professional version of PyCharm. Is there any way that you can explain how to configure using VS Code or free version of PyCharm ?

    • @DataEngUncomplicated 9 months ago +3

      Hey! Sorry you need the professional version of pycharm for it to work. I plan on making a tutorial for vs code soon.

    • @DataEngUncomplicated 7 months ago +1

      I have just uploaded the video configuring aws glue with VS Code and docker. Thanks for the suggestion! ruclips.net/video/__j-SyopVBs/видео.html

  • @ahkamnaseek2850 8 months ago

    Did you try to install additional python packages to the image?? From the IDE it’s not allowing.

    • @DataEngUncomplicated 8 months ago

      No, I did not. You probably need to go into the Docker image and install it that way rather than through the UI. Have you tried that?

    • @ahkamnaseek2850 8 months ago

      @DataEngUncomplicated We can't log into the image directly, right? What I did was run the default image as a container, log in to it, install the library, and save the container as a new image. Then I pointed PyCharm to it. Now the library is visible from PyCharm, but the import statement fails when running the code. I don't know why 😌

  • @prabhathkota107 2 months ago

    Docker option not available in PyCharm community edition I guess

  • @errrbrrr3821 9 months ago

    please make also for vs code

    • @DataEngUncomplicated 7 months ago

      I have just uploaded the video configuring aws glue with VS Code and docker. Thanks for the suggestion! ruclips.net/video/__j-SyopVBs/видео.html

  • @LearningNewThings0407 4 months ago

    The sample data used in the script should be present in my personal aws accounts s3 bucket ?

    • @DataEngUncomplicated 4 months ago

      For testing locally or deployment to use on the AWS Glue Service? For testing locally you do not need it in your personal aws account s3 bucket if you are running docker locally.

    • @LearningNewThings0407 4 months ago

      @DataEngUncomplicated I am trying to test Glue locally. I have Docker running locally. I am not sure about the "Update Docker Container Settings" step. Why do we need to provide AWS credentials, and why are IAM permissions required specifically for this testing? My understanding is that these credentials and permissions are used to connect to services on AWS, but since we are running it locally, do we still need to provide AWS credentials? Also, if I don't have an AWS account set up yet, does that mean I cannot run AWS Glue locally either?

    • @DataEngUncomplicated 4 months ago

      Good question. If you need to connect to data in an S3 bucket for testing, then you need to pass in credentials. If not, you don't need to pass in any profile and can skip this step. It's not a requirement.

    • @LearningNewThings0407 4 months ago

      @DataEngUncomplicated Thank you so much for confirming this. So is the data file "memberships.json" used in this example located in the Docker image running locally? In the code the path points to an S3 location. Please let me know if this assumption is correct.

    • @DataEngUncomplicated 4 months ago

      No, the memberships.json file is coming from S3, and I needed an IAM role with permission to access that S3 bucket, which is why I had to pass the credentials file into the Docker image. The data is moved from S3 into the Docker container when I run the code. Hopefully this helps clarify things.
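
The rule from this thread (credentials are needed only when the script reads from S3) can be sketched as a helper that builds a `docker run` argument list and mounts the AWS credentials folder only when at least one data path is an S3 URI. Everything here is illustrative: the function names, the image name `my-glue-image`, the Windows credentials path, and the container-side path `/home/glue_user/.aws` are assumptions rather than values confirmed by the video.

```python
import os
from typing import List, Optional, Sequence

S3_SCHEMES = ("s3://", "s3a://", "s3n://")


def needs_aws_credentials(data_paths: Sequence[str]) -> bool:
    """True if any input or output path points at S3 rather than a local file."""
    return any(p.lower().startswith(S3_SCHEMES) for p in data_paths)


def docker_run_args(
    image: str,
    data_paths: Sequence[str],
    aws_dir: Optional[str] = None,
    profile: str = "default",
) -> List[str]:
    """Build a `docker run` argument list for a local Glue container."""
    args = ["docker", "run", "--rm"]
    if needs_aws_credentials(data_paths):
        # Default to the current user's credentials folder (expanded here
        # because no shell will expand "~" for us).
        host_dir = aws_dir or os.path.expanduser("~/.aws")
        # Assumed container-side location of the credentials folder.
        args += ["-v", f"{host_dir}:/home/glue_user/.aws:ro"]
        args += ["-e", f"AWS_PROFILE={profile}"]
    args.append(image)
    return args


# Local file only: no credentials are mounted.
print(docker_run_args("my-glue-image", ["/data/memberships.json"]))
# S3 path: the credentials folder and profile are passed in.
print(docker_run_args("my-glue-image", ["s3://example-bucket/memberships.json"],
                      aws_dir="C:/Users/me/.aws"))
```

With only local files, the container starts without any AWS account involved, which matches the answer that passing credentials is not a requirement.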

  • @ricardoroa5874 9 months ago

    I dont have the AWS Connection window, I need to install something additional on pycharm?

    • @ricardoroa5874 9 months ago

      I just installed the AWS CLI and it worked! Thanks, great tutorial!

    • @DataEngUncomplicated 9 months ago +1

      Hi Ricardo, Sorry I must have missed that pre-requisite. Thanks for flagging this for others! I'm glad you got it working! It's going to make development much better

    • @gouravroy4573 1 month ago

      @DataEngUncomplicated I am not getting the AWS connection window even after installing the AWS CLI. I am using PyCharm Professional edition.

  • @brunoniello2019 8 months ago

    i use vs code :(

    • @DataEngUncomplicated 8 months ago

      I'll make a video setting it up with vs code

    • @DataEngUncomplicated 7 months ago

      I have just uploaded the video configuring aws glue with VS Code and docker. Thanks for the suggestion! ruclips.net/video/__j-SyopVBs/видео.html

  • @Dickandsongs 3 months ago

    Great tutorial. Would be even more useful if you would explain, how to add additional libraries to the run.

    • @Dickandsongs 3 months ago

      and the AWS configuration didn't work...