Author AWS Glue jobs with PyCharm Using AWS Glue Interactive Sessions

DataEng Uncomplicated

13K views


Published: Jul 2, 2024
This video is based on an AWS blog post tutorial on getting started with AWS Glue interactive sessions.
AWS Blog Article: aws.amazon.com/blogs/big-data...
timeline
00:00 tutorial overview
01:33 AWS permission configuration
03:53 PyCharm configuration
08:31 Run Jupyter notebook
#awsglue #aws

Comments • 54

@tahayusufkomur1710 3 months ago ⁺¹
That was so great man, thank you for sharing this.
Subscribed!
@ericknavadel 1 year ago ⁺²
Excellent video, thank you very much for sharing your knowledge with us.
@DataEngUncomplicated 1 year ago
Thanks Erick! I'm glad you found this helpful!
@ericsalesdeandrade9420 1 year ago ⁺¹
Hi thanks for the tutorial. Super helpful.
Would you know if it's possible (+ how) to use an external library within these interactive shell notebooks? I would like to use Pydantic and Pandera to validate the Dataframe schemas.
@DataEngUncomplicated 1 year ago
Hi everyone, for anyone recently getting errors when trying to start an interactive session: there appears to be a bug on Windows machines. Reverting aws-glue-sessions to 0.32 seemed to make it work again. You can revert by running the command pip install aws-glue-sessions==0.32 in the environment that has it installed.
@RenevanDuren 1 year ago
Finally got it to work, thanks to the comment from DataEng Uncomplicated, which solved my nightmare of not getting this to work on a Windows machine.
@DataEngUncomplicated 11 months ago
No problem! I felt your pain as I was reading this. I encountered the exact same issue. But when I first made the video, the issue didn't exist. I'm surprised they haven't fixed the issue yet with the later glue interactive session version.
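For the Windows workaround discussed in this thread, here is a minimal sketch for checking which aws-glue-sessions version is installed in the active environment. The helper names `installed_version` and `is_pinned` are hypothetical; only the package name and the 0.32 pin come from the comments above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="aws-glue-sessions"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_pinned(found, wanted="0.32"):
    """True if the found version equals the wanted pin (also accepts e.g. '0.32.0')."""
    if found is None:
        return False
    return found == wanted or found.startswith(wanted + ".")


if __name__ == "__main__":
    found = installed_version()
    if is_pinned(found):
        print(f"aws-glue-sessions {found} matches the 0.32 pin")
    else:
        # Reverting is done with pip, as described in the comment above:
        #   pip install aws-glue-sessions==0.32
        print(f"aws-glue-sessions is {found!r}, not 0.32")
```

Run it with the same interpreter PyCharm uses for the project, since each virtual environment has its own installed packages.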
@tejaswinirao4873 1 year ago
Hi, I am not able to connect to the kernel. I can see that all the Glue libraries are installed correctly, but the Jupyter terminal says the kernel died. Please help me resolve this issue.
@DataEngUncomplicated 1 year ago
Strange, I'm not sure what the root of the problem is, but perhaps restarting the kernel might help?
@AmritAgarwal07 1 year ago
Is the AWS CLI required to run the Jupiter-kernelspec command?
@DataEngUncomplicated 1 year ago
Not that I'm aware of, since I believe this is not an AWS CLI command.
@SoumilShah 1 year ago ⁺¹
You can just run Glue locally on Docker; that way you don't pay a dime.
@DataEngUncomplicated 10 months ago
This is a great point! I'm working on a tutorial to set this up in my next video. I learned that the documentation is not up to date and is missing important steps for getting the AWS Glue 4.0 Docker container set up! I will share a link to my video here when it's done.
@DataEngUncomplicated 10 months ago
ruclips.net/video/-4ZnJkM-QDk/видео.htmlsi=GfAggr80BlN5eDp3
@jeancarlovallejos2464 1 year ago ⁺¹
Hi! I have this error message: raise Exception(f"Valid Glue versions are {VALID_GLUE_VERSIONS}")
Exception: Valid Glue versions are {'3.0', '2.0'}
@DataEngUncomplicated 1 year ago
Hi Jean, it sounds like you have not entered a valid Glue version.
@jeancarlovallejos2464 1 year ago
@DataEngUncomplicated Thanks for the answer! How can I make sure I'm using the correct version of Glue? I followed the same steps as you, installing Python 3.10, Spark and PySpark.
@DataEngUncomplicated 1 year ago
There is a glue magic you can use to assign it to be 3.0
@zhouhaozqq 1 year ago
I had the same problem, have you solved it? And how?
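The error quoted in this thread suggests the kernel compares the requested Glue version against a fixed set. A rough sketch of that kind of check follows, assuming the set shown in the error message; `check_glue_version` is a hypothetical helper, not the kernel's actual code. In a notebook, the magic mentioned above is typically written as `%glue_version 3.0` in a cell run before the session starts.

```python
# Values copied from the error message above; other kernel versions may accept more.
VALID_GLUE_VERSIONS = {"2.0", "3.0"}


def check_glue_version(requested):
    """Return the requested version if it is in the valid set, else raise ValueError."""
    if requested not in VALID_GLUE_VERSIONS:
        raise ValueError(
            f"Valid Glue versions are {sorted(VALID_GLUE_VERSIONS)}, got {requested!r}"
        )
    return requested


if __name__ == "__main__":
    print(check_glue_version("3.0"))
    try:
        check_glue_version("4.0")
    except ValueError as err:
        print(err)
```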
@prabhathkota107 2 months ago
I didn't understand why a cost is incurred, since it's running locally. And why keep the nodes set to 2?
@DataEngUncomplicated 2 months ago
Your UI is local, but there is still a Spark cluster running in AWS.
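To make the cost point concrete, here is a back-of-the-envelope sketch of what a session might cost. The rate of 0.44 USD per DPU-hour and the 1 DPU per worker figure are assumptions for illustration only; check current AWS Glue pricing for your region and worker type. `estimate_session_cost` is a hypothetical helper.

```python
def estimate_session_cost(workers, hours, dpu_per_worker=1.0, price_per_dpu_hour=0.44):
    """Estimate session cost in USD as workers * DPUs per worker * hours * hourly rate.

    The default rate and DPU-per-worker values are assumed example numbers,
    not authoritative pricing.
    """
    if workers < 0 or hours < 0:
        raise ValueError("workers and hours must be non-negative")
    return workers * dpu_per_worker * hours * price_per_dpu_hour


if __name__ == "__main__":
    # Two workers left running for half an hour, under the assumed rate.
    print(round(estimate_session_cost(workers=2, hours=0.5), 2))
```

Under these assumptions the cost scales linearly with both the worker count and how long the session stays alive, which is why a small worker count and stopping idle sessions keep the bill down.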
@js3860 2 years ago ⁺¹
Nice video. If like me you are using federated access and an assumed role this whole process will fail. Sadly AWS hasn't built out their service for SSO customers :(
@DataEngUncomplicated 2 years ago
Thanks. Ah, that's a shame! Thanks for posting this point.
@jinmina 1 year ago
How do we revert glue_kernel_version to 0.32? Is there a command you can share?
I do have the following error message:
C:\Users\***\PycharmProjects\GlueInteractiveSession1\venv\Scripts\python.exe: Error while finding module specification for 'aws_glue_interactive_sessions_kernel.glue_pyspark.GlueKernel' (ModuleNotFoundError: No module named 'aws_glue_interactive_sessions_kernel')
@jinmina 1 year ago
I learned that there is a bug in the current aws-glue-sessions. Running the following command does the trick:
pip install aws-glue-sessions==0.32
@DataEngUncomplicated 1 year ago
If you are on a Windows machine, you can use pip to revert to 0.32.
@mujadidkhalid3826 2 years ago
Can you please show an example of how to put functions in a simple Python file and then use them in a notebook for a Glue interactive session? Thanks
@DataEngUncomplicated 2 years ago
Hi Mujadid, it would be the same as developing a function in any other Jupyter notebook with PySpark. Sorry, the video was based on a tutorial by AWS.
@maximilianrausch5193 2 years ago
I have a similar question. Is it possible to run a python script or can you only use jupyter?
@DataEngUncomplicated 2 years ago
@maximilianrausch5193 Hi Max, their documentation doesn't say it's only for Jupyter notebooks, but I haven't tested it with just a Python script. In their press release they say "Interactive Sessions let them process data interactively using the Jupyter-based notebook or IDE of their choice."
@maximilianrausch5193 2 years ago
@DataEngUncomplicated If you could make a follow-up video where you test a script, that would be really helpful! I'll also put in a ticket with AWS support.
@DataEngUncomplicated 2 years ago ⁺²
Sure, I have added this to my future video list. I'll try to play around with this feature to see if I can get it to work. Are you OK if I give you a shout-out in this future video: "this video was requested by Maximilian as a follow-up to my previous video"?
@CHANTI8947 2 years ago
Is it possible to do the same using VS Code?
@DataEngUncomplicated 2 years ago
Yes, I believe so. I haven't tried it though.
@ryanyue5159 1 year ago
@DataEngUncomplicated Can you please make a video or write instructions for doing this with VS Code? My problem is that VS Code cannot find the PySpark kernel.
@thevijayraj34 2 years ago
Can we do this with Python Community edition?
@DataEngUncomplicated 2 years ago ⁺¹
I don't think Community Edition supports Jupyter notebooks, so it won't be possible as far as I know.
@thevijayraj34 2 years ago
@DataEngUncomplicated ok
@DataEngUncomplicated 2 years ago ⁺¹
Actually, it says Glue interactive sessions are supported by other IDEs, so it might be that we don't need Jupyter notebooks for this to work. I'm going to test this out; I don't want to give you the wrong answer.
@thevijayraj34 2 years ago
@DataEngUncomplicated Thanks mate. Actually I'm stuck with office credentials, so I don't have free access to many things. 🥴
@trinath89 1 year ago
@DataEngUncomplicated Hi, I am stuck in a similar situation. I am using the latest PyCharm Community Edition. I configured everything mentioned up to 8:28, but I cannot see the option to create a Jupyter notebook, so I am stuck here. Please help me.
@shashankreddy8390 1 year ago ⁺¹
Hi buddy, this is a nice video, but everyone creates videos on reading from and writing to S3.
1. Can you create a video on how to use a Glue Studio notebook to read data from the Glue Data Catalog and write the results to S3?
2. Can you please include every step, i.e. what permissions we need to create in order to read and write?
I'd also recommend a video on the Athena notebook editor reading data from the Glue Data Catalog using PySpark.
(Please also include detailed permission steps.)
@AmritAgarwal07 1 year ago
Jupiter-kernelspec: command not found
@DataEngUncomplicated 1 year ago
Hi Amrit, check out the blog post I included in the description for the code. It sounds like you might have missed a step.
@AmritAgarwal07 1 year ago
@DataEngUncomplicated I have gone through the blog post and followed the steps, but I am still facing the same issue.
@AmritAgarwal07 1 year ago
@DataEngUncomplicated The blog post uses EC2, but there is no EC2 in the video.
@DataEngUncomplicated 1 year ago
Hmm, did you make sure your awscli is up to date? I'm not sure which version it says you need for this to work.
@AmritAgarwal07 1 year ago
@DataEngUncomplicated Can you help me resolve this error?
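For a "command not found" error like the one in this thread, one thing worth checking is whether the executable is on the PATH of the active environment. Below is a minimal sketch, assuming the command is spelled `jupyter-kernelspec` (the name installed with Jupyter); `find_command` is a hypothetical helper.

```python
import shutil


def find_command(candidates=("jupyter-kernelspec", "jupyter")):
    """Return (name, path) for the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return name, path
    return None


if __name__ == "__main__":
    found = find_command()
    if found is None:
        # Likely causes: Jupyter is not installed in this environment,
        # or the environment is not activated in this shell.
        print("No Jupyter executable found on PATH")
    else:
        print(f"Found {found[0]} at {found[1]}")
```

If only `jupyter` is found, the subcommand form `jupyter kernelspec list` is the usual equivalent.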
