Author AWS Glue jobs with PyCharm Using AWS Glue Interactive Sessions
- Published: 2 Jul 2024
- This video is based on an AWS blog post tutorial on getting started with AWS Glue interactive sessions.
AWS Blog Article: aws.amazon.com/blogs/big-data...
Timeline:
00:00 Tutorial overview
01:33 AWS permission configuration
03:53 PyCharm configuration
08:31 Run Jupyter notebook
#awsglue #aws
That was so great man, thank you for sharing this.
Subscribed!
Excellent video, thank you very much for sharing your knowledge with us.
Thanks Erick! I'm glad you found this helpful!
Hi, thanks for the tutorial. Super helpful.
Do you know if it's possible to use an external library within these interactive session notebooks, and if so, how? I would like to use Pydantic and Pandera to validate the DataFrame schemas.
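One option, assuming a session on Glue 2.0 or later, is the `%additional_python_modules` magic. It goes in a cell that runs before any code starts the session and asks the session to pip-install the listed packages, given as one comma-separated argument. The helper below is a hypothetical convenience (`additional_modules_magic` is not part of any AWS library) that only builds that magic line from a list of package specifiers. Whether a particular package such as Pandera installs cleanly on a particular Glue version is worth testing first.

```python
def additional_modules_magic(packages: list[str]) -> str:
    """Return a %additional_python_modules line for a Glue interactive session.

    Hypothetical helper: the magic takes one comma-separated argument, so each
    specifier is stripped and rejected if it is empty or contains a comma.
    """
    cleaned = [p.strip() for p in packages]
    if not cleaned or any(not p or "," in p for p in cleaned):
        raise ValueError(f"invalid package list: {packages!r}")
    return "%additional_python_modules " + ",".join(cleaned)


if __name__ == "__main__":
    # Paste the printed line into the first notebook cell, before any PySpark code.
    print(additional_modules_magic(["pydantic", "pandera"]))
```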
Hi everyone: if you have recently been getting errors when trying to start an interactive session, there appears to be a bug on Windows machines. Reverting aws-glue-sessions to 0.32 seems to fix it. You can revert by running the command pip install aws-glue-sessions==0.32 in the environment where it is installed.
Finally got it to work! The comment from DataEng Uncomplicated solved my nightmare of not getting this to work on a Windows machine.
No problem! I felt your pain as I was reading this; I encountered the exact same issue. When I first made the video, the issue didn't exist. I'm surprised they haven't fixed it yet in later versions of Glue interactive sessions.
Hi, I am not able to connect to the kernel, even though I can see that all the Glue libraries are installed correctly. The Jupyter terminal says the kernel died. Please help me resolve this issue.
Strange, I'm not sure what the root of the problem is, but perhaps restarting the kernel might help?
Is the AWS CLI required to run the jupyter-kernelspec command?
Not that I'm aware of, since I believe this is not an AWS CLI command.
You can just run Glue locally in Docker; that way you don't pay a dime.
This is a great point! I'm working on a tutorial for setting this up in my next video. I learned that the documentation is not up to date and is missing important steps for setting up the AWS Glue 4.0 Docker container! I will share a link to the video here when it's done.
ruclips.net/video/-4ZnJkM-QDk/видео.htmlsi=GfAggr80BlN5eDp3
Hi! I get this error message:
raise Exception(f"Valid Glue versions are {VALID_GLUE_VERSIONS}")
Exception: Valid Glue versions are {'3.0', '2.0'}
Hi Jean, it sounds like you have not entered a valid Glue version.
@DataEngUncomplicated thanks for the answer! How can I make sure I'm using the correct version of Glue? I followed the same steps as you, installing Python 3.10, Spark, and PySpark.
There is a Glue magic, %glue_version, that you can use to set it to 3.0.
I had the same problem. Have you solved it, and if so, how?
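The check behind this error can be pictured as a set-membership test: the kernel compares the requested Glue version with the versions its release supports (here {'2.0', '3.0'}; newer aws-glue-sessions releases may also accept '4.0') and raises when there is no match. The fix is to request a supported version with the `%glue_version` magic before the session starts. The sketch below is a hypothetical pre-flight check, not the kernel's actual code; `VALID_GLUE_VERSIONS` simply copies the set from the error message.

```python
# Copied from the error message above; other aws-glue-sessions releases
# may support a different set of versions.
VALID_GLUE_VERSIONS = {"2.0", "3.0"}


def glue_version_magic(version, valid=VALID_GLUE_VERSIONS) -> str:
    """Return a %glue_version line, or raise if the version is unsupported.

    Hypothetical sketch; str() lets both 3.0 and "3.0" be accepted.
    """
    requested = str(version).strip()
    if requested not in valid:
        raise ValueError(f"Valid Glue versions are {sorted(valid)}, got {requested!r}")
    return f"%glue_version {requested}"


if __name__ == "__main__":
    print(glue_version_magic(3.0))  # %glue_version 3.0
```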
I didn't understand why costs are incurred, since it's running locally. And why keep the number of nodes set to 2?
Your UI is local, but there is still a Spark cluster running in AWS.
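To put rough numbers on this: the notebook UI runs locally, but each session provisions Spark workers in AWS that, as I understand it, are billed by DPU-hour for as long as the session is alive, including idle time until it times out or is stopped (the `%idle_timeout` and `%stop_session` magics help here). As far as I know, 2 is also the smallest `%number_of_workers` a Spark session accepts, which is likely why it is set to 2. The sketch assumes per-second billing with a one-minute minimum and a G.1X worker counting as 1 DPU. The rate is a parameter because pricing varies by region and changes over time, so check the current AWS Glue pricing page rather than relying on the example value.

```python
def session_cost(workers: int, dpu_per_worker: float, seconds: float,
                 rate_per_dpu_hour: float, minimum_seconds: float = 60.0) -> float:
    """Estimate the cost of one interactive session.

    Assumptions (verify against current AWS pricing): billing is per second
    with a minimum billed duration, and cost scales linearly with DPUs.
    """
    if workers < 1 or dpu_per_worker <= 0 or seconds < 0 or rate_per_dpu_hour < 0:
        raise ValueError("invalid input")
    billed_hours = max(seconds, minimum_seconds) / 3600
    return workers * dpu_per_worker * billed_hours * rate_per_dpu_hour


if __name__ == "__main__":
    # 2 G.1X workers (1 DPU each) for 30 minutes at an example rate of 0.44 USD per DPU-hour
    print(round(session_cost(2, 1.0, 30 * 60, 0.44), 2))  # 0.44
```

Leaving the same two workers idle for a day would, under these assumptions, cost 48 times as much as the 30-minute example, which is why stopping sessions explicitly matters more than the worker count.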
Nice video. If, like me, you are using federated access and an assumed role, this whole process will fail. Sadly, AWS hasn't built out their service for SSO customers :(
Thanks, ah, that's a shame! Thanks for pointing this out.
How do we revert glue_kernel_version to 0.32? Is there a command you can share?
I get the following error message:
C:\Users\***\PycharmProjects\GlueInteractiveSession1\venv\Scripts\python.exe: Error while finding module specification for 'aws_glue_interactive_sessions_kernel.glue_pyspark.GlueKernel' (ModuleNotFoundError: No module named 'aws_glue_interactive_sessions_kernel')
I learned that there is a bug in the current aws-glue-sessions release. Please run the following command; that will do the trick.
pip install aws-glue-sessions==0.32
If you are on a Windows machine, you can use pip to revert to 0.32.
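Before and after downgrading, it can help to confirm which interpreter is actually running and whether the package is visible to it: a ModuleNotFoundError like the one above often means aws-glue-sessions was installed into a different environment than the venv PyCharm points at. The diagnostic sketch below uses only the standard library; the module and distribution names are taken from the error message and the pip command above, and it should be run with the venv's own python.exe.

```python
import importlib.metadata
import importlib.util
import sys


def glue_kernel_status(module: str = "aws_glue_interactive_sessions_kernel",
                       dist: str = "aws-glue-sessions") -> dict:
    """Report the running interpreter, whether `module` is importable,
    and the installed version of the `dist` package (None if absent)."""
    try:
        version = importlib.metadata.version(dist)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {
        "python": sys.executable,
        # find_spec returns None for a missing top-level module instead of raising
        "module_found": importlib.util.find_spec(module) is not None,
        "version": version,
    }


if __name__ == "__main__":
    status = glue_kernel_status()
    print(status)
    if status["version"] != "0.32":
        print("To pin the release mentioned above: pip install aws-glue-sessions==0.32")
```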
Can you please show an example of how to put functions in a simple Python file and then use them in a notebook for a Glue interactive session? Thanks!
Hi Mujadid, it would be the same as developing a function in any other Jupyter notebook with PySpark. Sorry, the video was based on a tutorial by AWS.
I have a similar question. Is it possible to run a Python script, or can you only use Jupyter?
@maximilianrausch5193 Hi Max, their documentation doesn't say it's only for Jupyter notebooks, but I haven't tested it with just a Python script. In their press release they say "Interactive Sessions let them process data interactively using the Jupyter-based notebook or IDE of their choice."
@DataEngUncomplicated if you could make a follow-up video where you test a script, that would be really helpful! I'll also open a ticket with AWS Support.
Sure, I have added this to my list of future videos. I'll try to play around with this feature to see if I can get it to work. Are you OK with me giving you a shout-out in that video? Something like "this video was requested by Maximilian as a follow-up to my previous video."
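For reusing functions kept in a plain .py file, one approach, assuming the file has been uploaded to S3, is the `%extra_py_files` magic. It takes a comma-separated list of S3 paths, and the listed files should then be importable in the session, so a later cell can run `import my_utils`. The bucket and module names below are hypothetical, and the helper only builds the magic line with a basic sanity check on the paths; it is not AWS's own validation.

```python
def extra_py_files_magic(s3_paths: list[str]) -> str:
    """Build a %extra_py_files line from S3 paths to .py or .zip files.

    Hypothetical helper; the checks are a sanity filter, not AWS's validation.
    """
    paths = [p.strip() for p in s3_paths]
    if not paths:
        raise ValueError("at least one S3 path is required")
    for p in paths:
        if not p.startswith("s3://") or not p.endswith((".py", ".zip")) or "," in p:
            raise ValueError(f"expected an s3://... .py or .zip path, got {p!r}")
    return "%extra_py_files " + ",".join(paths)


if __name__ == "__main__":
    # Hypothetical bucket and module name.
    print(extra_py_files_magic(["s3://my-bucket/libs/my_utils.py"]))
```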
Is it possible to do the same thing using VS Code?
Yes, I believe so, though I haven't tried it.
@DataEngUncomplicated can you please make a video or write instructions for doing this with VS Code? I have a problem: VS Code cannot find the PySpark kernel.
Can we do this with PyCharm Community Edition?
I don't think Community Edition supports Jupyter notebooks, so as far as I know it won't be possible.
@DataEngUncomplicated OK
Actually, it says Glue interactive sessions are supported by other IDEs, so it might be possible that we don't need Jupyter notebooks for this to work. I'm going to test this out; I don't want to give you the wrong answer.
@DataEngUncomplicated Thanks mate. Actually, I'm stuck with office credentials; I don't have free access to many things. 🥴
@DataEngUncomplicated Hi, I am stuck in a similar situation. I am using the latest PyCharm Community Edition. I configured everything mentioned up to 8:28, but then I cannot see the option to create a Jupyter notebook, and I'm stuck here. Please help me.
Hi buddy, this is a nice video, but everyone makes videos on reading from and writing to S3.
1. Can you create a video on how to use a Glue Studio notebook to read data from the AWS Glue Data Catalog and write the results to S3?
2. Could you please include every step, i.e. what permissions we need to create to read and write?
I'd also recommend a video on using the Athena notebook editor to read data from the Glue Data Catalog using PySpark.
(Please also include detailed steps for the permissions.)
jupyter-kernelspec: command not found
Hi Armit, check out the blog post linked in the description for the code. It sounds like you might have missed a step.
@DataEngUncomplicated I have gone through the blog. I am following the steps but am still facing the same issue.
@DataEngUncomplicated In the blog you are using EC2, but in the video EC2 is not there.
Hmm, did you make sure your AWS CLI is up to date? I'm not sure which version it says you need for this to work.
@DataEngUncomplicated Can you help me resolve this error?
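jupyter-kernelspec is not an AWS CLI command. It is a console script installed by the jupyter_client package into the environment's scripts directory (Scripts\ on Windows, bin/ elsewhere), so "command not found" often means Jupyter is not installed in the active environment or that directory is not on PATH. The standard-library sketch below checks both from the same interpreter; the returned keys are arbitrary names chosen for this example.

```python
import shutil
import sysconfig
from pathlib import Path


def find_kernelspec_command(name: str = "jupyter-kernelspec") -> dict:
    """Look for `name` on PATH and in the current interpreter's scripts dir."""
    scripts_dir = Path(sysconfig.get_path("scripts"))
    # Windows console scripts carry an .exe suffix; POSIX ones have none.
    candidates = [scripts_dir / name, scripts_dir / f"{name}.exe"]
    in_env = next((str(p) for p in candidates if p.is_file()), None)
    return {
        "on_path": shutil.which(name),
        "in_current_env": in_env,
        "scripts_dir": str(scripts_dir),
    }


if __name__ == "__main__":
    result = find_kernelspec_command()
    print(result)
    if result["on_path"] is None and result["in_current_env"]:
        print("Installed but not on PATH; call it by its full path or add scripts_dir to PATH.")
    elif result["in_current_env"] is None:
        print("Not in this environment; installing jupyter into it should provide the command.")
```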