Author AWS Glue jobs with PyCharm Using AWS Glue Interactive Sessions

  • Published: Jul 2, 2024
  • This video is based on an AWS blog post tutorial on getting started with AWS Glue interactive sessions.
    AWS Blog Article: aws.amazon.com/blogs/big-data...
    timeline
    00:00 tutorial overview
    01:33 AWS permission configuration
    03:53 PyCharm configuration
    08:31 Run Jupyter notebook
    #awsglue #aws

Comments • 54

  • @tahayusufkomur1710 3 months ago +1

    That was so great man, thank you for sharing this.
    Subscribed!

  • @ericknavadel 1 year ago +2

    Excellent video, thank you very much for sharing your knowledge with us.

  • @ericsalesdeandrade9420 1 year ago +1

    Hi thanks for the tutorial. Super helpful.
    Would you know if it's possible (+ how) to use an external library within these interactive shell notebooks? I would like to use Pydantic and Pandera to validate the Dataframe schemas.

  • @DataEngUncomplicated 1 year ago

    Hi everyone, for anyone recently getting errors when trying to start an interactive session: there appears to be a bug on Windows machines. Reverting aws-glue-sessions to 0.32 seemed to make it work again. You can revert by running the command pip install aws-glue-sessions==0.32 in the environment that has it installed.
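
A minimal sketch, assuming the package is installed under the PyPI name aws-glue-sessions used in the comment above, of how one might check which version the active environment has before or after pinning. The helper names `installed_version` and `matches_pin` are hypothetical, chosen only for illustration.

```python
from importlib import metadata
from typing import Optional

# Version the comment above reports as working on Windows.
PINNED_VERSION = "0.32"


def installed_version(package: str = "aws-glue-sessions") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(version: Optional[str], pinned: str = PINNED_VERSION) -> bool:
    """True only when a version is installed and equals the pinned one."""
    return version is not None and version == pinned


if __name__ == "__main__":
    found = installed_version()
    if matches_pin(found):
        print(f"aws-glue-sessions {found} matches the pin")
    else:
        print(f"found {found!r}; to pin, run: pip install aws-glue-sessions=={PINNED_VERSION}")
```

Run it with the same interpreter PyCharm uses for the project, since each virtual environment has its own set of installed packages.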

  • @RenevanDuren 1 year ago

    Finally got it to work. The comment from DataEng Uncomplicated solved my nightmare of not getting this to work on a Windows machine.

    • @DataEngUncomplicated 11 months ago

      No problem! I felt your pain as I was reading this. I encountered the exact same issue. But when I first made the video, the issue didn't exist. I'm surprised they haven't fixed the issue yet with the later glue interactive session version.

  • @tejaswinirao4873 1 year ago

    Hi, I am not able to connect to the kernel. I can see that all the Glue libraries are installed correctly, but the Jupyter terminal says the kernel died. Please help me resolve this issue.

    • @DataEngUncomplicated 1 year ago

      Strange. I'm not sure what the root of the problem is, but perhaps restarting the kernel might help?

  • @AmritAgarwal07 1 year ago

    Is the AWS CLI required to run the jupyter-kernelspec command?

    • @DataEngUncomplicated 1 year ago

      Not that I'm aware of, since I believe this is not an AWS CLI command.

  • @SoumilShah 1 year ago +1

    You can just run Glue locally on Docker; that way you don't pay a dime.

    • @DataEngUncomplicated 10 months ago

      This is a great point! I'm working on a tutorial to set this up in my next video. I learned that the documentation is not up to date and is missing important steps for setting up the AWS Glue 4.0 Docker container! I will share a link to my video here when it's done.

    • @DataEngUncomplicated 10 months ago

      ruclips.net/video/-4ZnJkM-QDk/видео.htmlsi=GfAggr80BlN5eDp3

  • @jeancarlovallejos2464 1 year ago +1

    Hi! I have this error message: raise Exception(f"Valid Glue versions are {VALID_GLUE_VERSIONS}")
    Exception: Valid Glue versions are {'3.0', '2.0'}

    • @DataEngUncomplicated 1 year ago

      Hi Jean, it sounds like you have not entered a valid Glue version.

    • @jeancarlovallejos2464 1 year ago

      @DataEngUncomplicated Thanks for the answer! How can I ensure the correct version of Glue? I followed the same steps as you, installing Python 3.10, Spark and PySpark.

    • @DataEngUncomplicated 1 year ago

      There is a Glue magic you can use to set it to 3.0.

    • @zhouhaozqq 1 year ago

      I had the same problem, have you solved it? And how?
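
The traceback quoted in this thread suggests a simple membership check against a set of accepted version strings. The sketch below is a hypothetical reconstruction of that kind of check, not the library's actual source: only the message text and the two version strings come from the traceback, and the function name `validate_glue_version` is made up for illustration. In the AWS Glue interactive sessions documentation, the magic for choosing the version is `%glue_version`, run in a notebook cell before the session starts (for example `%glue_version 3.0`).

```python
# Hypothetical reconstruction: only the message text and the two version
# strings are taken from the traceback quoted in the thread.
VALID_GLUE_VERSIONS = {"2.0", "3.0"}


def validate_glue_version(version: str) -> str:
    """Return the version unchanged if it is accepted, otherwise raise."""
    if version not in VALID_GLUE_VERSIONS:
        raise Exception(f"Valid Glue versions are {VALID_GLUE_VERSIONS}")
    return version


if __name__ == "__main__":
    print(validate_glue_version("3.0"))  # accepted
    try:
        validate_glue_version("4.0")     # not in the set, so it raises
    except Exception as exc:
        print(exc)
```

Under this reading, the error means the requested version string is not one the installed kernel release accepts, so the options are to request an accepted version or to install a kernel release that accepts the one you want.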

  • @prabhathkota107 2 months ago

    I didn't understand why a cost is incurred, since it's running locally. And why keep the number of nodes set to 2?

    • @DataEngUncomplicated 2 months ago

      Your UI is local, but there is still a Spark cluster running in AWS.
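
Because the cluster runs in AWS, the bill scales with the number of workers and with how long the session stays up. The sketch below is a rough estimate only. The price of 0.44 USD per DPU-hour and the ratio of 1 DPU per worker are assumptions used as placeholders, not values taken from the video or the thread, so check the current AWS Glue pricing for your region and worker type. The function name `estimate_session_cost` is hypothetical.

```python
def estimate_session_cost(
    workers: int,
    hours: float,
    dpu_per_worker: float = 1.0,       # assumption: 1 DPU per worker
    price_per_dpu_hour: float = 0.44,  # assumption: placeholder USD price
) -> float:
    """Rough cost estimate: workers x DPUs per worker x hours x price."""
    if workers < 0 or hours < 0:
        raise ValueError("workers and hours must be non-negative")
    return workers * dpu_per_worker * hours * price_per_dpu_hour


if __name__ == "__main__":
    # Two workers kept running for half an hour, under the assumptions above.
    print(round(estimate_session_cost(workers=2, hours=0.5), 2))
```

Keeping the worker count low and stopping idle sessions are the two levers this formula exposes.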

  • @js3860 2 years ago +1

    Nice video. If, like me, you are using federated access and an assumed role, this whole process will fail. Sadly, AWS hasn't built out their service for SSO customers :(

  • @jinmina 1 year ago

    How do we revert glue_kernel_version to 0.32? Is there a command you can share?
    I get the following error message:
    C:\Users\***\PycharmProjects\GlueInteractiveSession1\venv\Scripts\python.exe: Error while finding module specification for 'aws_glue_interactive_sessions_kernel.glue_pyspark.GlueKernel' (ModuleNotFoundError: No module named 'aws_glue_interactive_sessions_kernel')

    • @jinmina 1 year ago

      I learned that there is a bug in the current aws-glue-sessions. Please run the following command. That will do the trick.
      pip install aws-glue-sessions==0.32

    • @DataEngUncomplicated 1 year ago

      If you are on a Windows machine, you can use pip to revert to 0.32.
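
The error quoted at the top of this thread says the interpreter cannot find the module `aws_glue_interactive_sessions_kernel`. A minimal sketch, assuming you run it with the same venv interpreter shown in that error, of how one might confirm whether the module is visible after reinstalling. The helper name `module_available` is hypothetical.

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if the current interpreter can locate the module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A dotted name whose parent package is missing raises instead of
        # returning None, so treat that as "not available" as well.
        return False


if __name__ == "__main__":
    target = "aws_glue_interactive_sessions_kernel"
    print(f"{target} available: {module_available(target)}")
```

If this prints False inside the project's virtual environment, the package was probably installed into a different environment than the one the kernel is launched from.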

  • @mujadidkhalid3826 2 years ago

    Can you please show an example of how we can put functions in a simple Python file and then use them in a notebook for a Glue interactive session? Thanks.

    • @DataEngUncomplicated 2 years ago

      Hi Mujadid, it would be the same as developing a function in any other Jupyter notebook with PySpark. Sorry, the video was based on a tutorial by AWS.

    • @maximilianrausch5193 2 years ago

      I have a similar question. Is it possible to run a Python script, or can you only use Jupyter?

    • @DataEngUncomplicated 2 years ago

      @maximilianrausch5193 Hi Max, their documentation doesn't say it's only for Jupyter notebooks, but I haven't tested it with just a Python script. In their press release they say "Interactive Sessions let them process data interactively using the Jupyter-based notebook or IDE of their choice."

    • @maximilianrausch5193 2 years ago

      @DataEngUncomplicated If you could make a follow-up video where you test a script, that would be really helpful! I'll also put in a ticket with AWS support.

    • @DataEngUncomplicated 2 years ago +2

      Sure, I have added this to my future video list. I'll try to play around with this feature to see if I can get it to work. Are you OK if I give you a shout-out in this future video? "This video was requested by Maximilian as a follow-up to my previous video."
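
For the question at the top of this thread, a minimal sketch of what a plain Python helper file might look like. The file name helpers.py and both function names are hypothetical. The functions are pure Python, so they can be tested locally without Spark. How the file is made available to the remote Glue session is not covered in the thread and is left as an assumption here.

```python
# Contents of a hypothetical helpers.py kept next to the notebook.
import re
from typing import List


def to_snake_case(name: str) -> str:
    """Lower-case a column name and replace runs of other characters with '_'."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def clean_column_names(columns: List[str]) -> List[str]:
    """Apply to_snake_case to every column name, preserving order."""
    return [to_snake_case(column) for column in columns]


if __name__ == "__main__":
    print(clean_column_names([" Order ID ", "Total-Amount (USD)"]))
```

In a notebook cell, assuming the module is importable in the session, usage would look like `from helpers import clean_column_names` followed by `df = df.toDF(*clean_column_names(df.columns))` on a PySpark DataFrame.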

  • @CHANTI8947 2 years ago

    Is it possible to do the same using VS Code?

    • @DataEngUncomplicated 2 years ago

      Yes, I believe so. I haven't tried it though.

    • @ryanyue5159 1 year ago

      @DataEngUncomplicated Can you please make a video or write instructions for doing this with VS Code? I have a problem where VS Code cannot find the PySpark kernel.

  • @thevijayraj34 2 years ago

    Can we do this with Python Community edition?

    • @DataEngUncomplicated 2 years ago +1

      I don't think Community Edition supports Jupyter notebooks, so it won't be possible as far as I know.

    • @thevijayraj34 2 years ago

      @DataEngUncomplicated ok

    • @DataEngUncomplicated 2 years ago +1

      Actually, it says Glue interactive sessions are supported by other IDEs, so it might be possible that we don't need Jupyter notebooks for this to work. I'm going to test this out; I don't want to give you the wrong answer.

    • @thevijayraj34 2 years ago

      @DataEngUncomplicated Thanks mate. Actually I'm stuck with office credentials, so I don't have free access to many things. 🥴

    • @trinath89 1 year ago

      @DataEngUncomplicated Hi, I am stuck in a similar situation. I am using the latest PyCharm Community Edition. I configured everything mentioned up to 8:28, but I cannot see the option to create a Jupyter notebook and am stuck here. Please help me.

  • @shashankreddy8390 1 year ago +1

    Hi buddy, this is a nice video, but everyone creates videos on reading from and writing to S3.
    1. Can you create a video on how to use a Glue Studio notebook to read data from the AWS Glue catalog and write the results to S3?
    2. Can you please include every step, i.e. what kind of permissions we need to create in order to read and write?
    I'd also recommend doing a video on the Athena notebook editor reading data from the Glue catalog using PySpark.
    (Please also include detailed permission steps.)

  • @AmritAgarwal07 1 year ago

    Jupiter-kernelspec: command not found

    • @DataEngUncomplicated 1 year ago

      Hi Amrit, check out the blog post I included in the description for the code. It sounds like you might have missed a step.

    • @AmritAgarwal07 1 year ago

      @DataEngUncomplicated I have gone through the blog. I am following the steps but still facing the same issue.

    • @AmritAgarwal07 1 year ago

      @DataEngUncomplicated In the blog you are using EC2, but EC2 is not in the video.

    • @DataEngUncomplicated 1 year ago

      Hmm, did you make sure your AWS CLI is up to date? I'm not sure which version it says you need for this to work.

    • @AmritAgarwal07 1 year ago

      @DataEngUncomplicated Can you help me resolve this error?
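
A "command not found" message generally means the shell could not locate an executable with that exact name on PATH. The command that ships with Jupyter is spelled `jupyter-kernelspec`, so one possibility worth ruling out is the spelling; another is that the virtual environment containing Jupyter is not activated. A minimal sketch of checking this from Python follows; the helper name `find_command` is hypothetical.

```python
import shutil
from typing import Optional


def find_command(name: str = "jupyter-kernelspec") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_command()
    if path is None:
        print("jupyter-kernelspec is not on PATH in this environment")
    else:
        print(f"jupyter-kernelspec found at {path}")
```

Running this from the terminal inside PyCharm shows what the project's environment can see, which may differ from a system-wide terminal.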