AWS EMR Big Data Processing with Spark and Hadoop | Python, PySpark, Step by Step Instructions
- Published: Jul 30, 2024
- In this video, I gave an overview of what EMR is and its benefits in the big data and machine learning world. I then provided step-by-step instructions on how to spin up an EMR cluster and run a spark-submit job on it to process data from a Stack Overflow survey.
Support the channel plz 😊: www.buymeacoffee.com/felixyu
Instructions on how to create a key pair on AWS: docs.aws.amazon.com/AWSEC2/la...
Good stuff! Your videos are always so detailed and informative 👍🏻
Wow, Felix, you are a fantastic teacher. Really happy to find your channel. Thanks
Wow, so informative!!! Thanks so much for teaching me how to do this!!!
Crisp and to the point, thanks Felix please keep it up.
tyty
Thanks! Your video is far, far better than most out there!
Note to other viewers: If you decide to try this yourself, you should download the 2020 survey results, and not something newer. The fields are different in newer versions of the survey.
very good and undervalued content keep up with the good work man :) you are helping a lot! I bet your channel will start growing soon.
Thank you!! This means a lot :)
awesome, one of the few 30min tutorials for big data on aws that actually worked!
glad that u found it helpful!! 👍
This is what I was looking for. Thanks for the video.
glad that u found it helpful!! 👍
Very informative! 🙂Thank you so much!
thank you sir for this session....love from india🙏
Very well explained 👏👌
Excellent!!
Great content!
Awesome! Thanks!
Sir, your explanation is very clear. I request you to make end-to-end project videos on AWS ETL.
Thank you!
Wow, That's amazing
Thank you!!
Thank you so much, sir, for the detailed explanation. It will be very useful to us ❤❤ Thanks a lot
Glad that u found it helpful!!
Outstanding . Thanks dude
Glad u like it
Awesome video!
glad that u found it helpful!!
Good stuff sir!
Really great stuff... excellent presentation !!
thank you..glad that it's helpful
Great tutorial man keep it up
The best...Awesome.
Thanks a lot!
Thanks for the AWS EMR configuration details. How does the underlying S3 or HDFS storage distribute data blocks for parallel processing? How can redundancy and parallelism be configured? I have logs from airline equipment for the last 30 years, equivalent to 1 PB. I want to use all of it to identify failures with indicators.
Great Tutorial
Thank you
well done Felix :-)
Thanks for sharing!
Glad that u found it helpful
Very nicely explained
thank you!! glad that u found it helpful!!
Very informative. Can you do a deep dive on AWS EMR?
Awesome tutorial brother
tyty 😄
Yu-demy! thank you!
Haha glad that u found it helpful!!
Very good to start with EMR hands-on
Glad that u found it helpful!!
Clean...👍
Thank you!!
good tute
Good stuff, Felix. Do you have a video on migrating an on-premises big data cluster to the cloud?
Do you have any document on mapping the property graph model to Hadoop?
Hi, what is the difference between the 4 applications? When you create the cluster there are 4 options; how do you know which one to select?
good job mate
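A minimal sketch of how the application choice could be expressed if the cluster were created from Python instead of the console. Per the reply elsewhere in this thread, the combination to pick is the one that includes Spark and Hadoop; Spark is what makes spark-submit available on the cluster. The dictionary follows the keyword arguments of boto3's EMR `run_job_flow` call. The cluster name, release label, key pair name, log bucket, instance types and instance counts are all placeholder assumptions, not values from the video.

```python
def build_cluster_request(name, release_label, key_name, log_uri):
    """Build keyword arguments for boto3's EMR run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        # Listing Spark here is what installs spark-submit on the cluster.
        "Applications": [{"Name": "Spark"}, {"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",  # assumed instance type
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",  # assumed instance type
                    "InstanceCount": 2,           # assumed worker count
                },
            ],
            "Ec2KeyName": key_name,
            # Keep the cluster running so you can SSH in and run spark-submit.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default role names that EMR can create for the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# All four argument values below are hypothetical placeholders.
request = build_cluster_request(
    name="emr-tutorial-cluster",
    release_label="emr-6.15.0",
    key_name="my-key-pair",
    log_uri="s3://my-example-bucket/emr-logs/",
)

# To actually launch the cluster (needs boto3 and AWS credentials):
# import boto3
# response = boto3.client("emr").run_job_flow(**request)
```

Remember to terminate the cluster when you are done, since a cluster kept alive this way keeps running until you stop it.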
Thanks mate!!
thanks!
Glad that u found it helpful
Very good
thanks mate
Great video!! However, I think you now have to set up an IAM role for accessing your S3 bucket, don't you?
Thanks for the video. Can you help me with dependency files, e.g. when the Python script uses other modules? How do I handle that when submitting with spark-submit on EMR?
excellent
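One common way to handle this, sketched here under assumptions: zip your own helper modules and pass the archive to spark-submit with its `--py-files` option, which ships the files to the executors and puts them on the Python path. The package name `helpers`, the script name `main.py`, and both helper functions are hypothetical names for illustration; nothing here comes from the video.

```python
import shlex
import tempfile
import zipfile
from pathlib import Path


def zip_package(package_dir, zip_path):
    """Zip a local Python package so spark-submit can ship it to executors."""
    package_dir = Path(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for py_file in sorted(package_dir.rglob("*.py")):
            # Keep the package folder in the archive path so that
            # "import helpers.cleaning" still resolves on the cluster.
            arcname = py_file.relative_to(package_dir.parent).as_posix()
            zf.write(py_file, arcname)
    return str(zip_path)


def spark_submit_command(main_script, py_files=(), app_args=()):
    """Build the spark-submit argument list, adding --py-files if needed."""
    cmd = ["spark-submit"]
    if py_files:
        # --py-files takes a comma-separated list of .py, .zip or .egg files.
        cmd += ["--py-files", ",".join(py_files)]
    cmd.append(main_script)
    cmd += list(app_args)
    return cmd


# Demo with a throwaway package so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "helpers"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "cleaning.py").write_text("def strip(s):\n    return s.strip()\n")
    archive = zip_package(pkg, Path(tmp) / "helpers.zip")
    with zipfile.ZipFile(archive) as zf:
        archived_names = sorted(zf.namelist())

cmd = spark_submit_command("main.py", py_files=["helpers.zip"])
print(shlex.join(cmd))
```

This covers pure-Python modules that you wrote yourself. Third-party packages, especially ones with compiled code, usually need to be installed on the cluster nodes instead, for example through an EMR bootstrap action.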
Thank you!!
How to set application with database ?
When I do big data, I am really using B I G data. What are the challenges at the petabyte level?
Felix, thank you for sharing, but the application interface has changed, so I can't follow your steps. Will you share an updated version of your video?
I have a question: do you have any course explaining PySpark, or any recommendations maybe?
If we terminate the cluster, is Amazon still going to charge money? What is the policy for using it for free for practice purposes?
Hi, EMR writes many output files (part_0000, ... and so on) to S3, but I want just one output file after preprocessing the data in Spark on EMR. How do I do that?
U can do something like
df.repartition(1).write.mode("overwrite").parquet("locationPath")
@FelixYu Thanks a lot
When I try to use the "spark-submit" command I get "-bash: spark-submit: command not found"
Is there any solution to this? I'm using PuTTY on Windows.
On the EMR creation (4:10 of the video), did u choose the last combination that has spark and Hadoop??
Friend, how do I save Jupyter notebooks in my EMR?
I guess there's an option to enable jupyterhub right when you are initializing the EMR cluster
I don't have vs code installed. What should I do?
U don’t have to use vs code to write the code. U can use any IDE for it. Or u can just google vscode to download it
Excellent demo! What was the AWS cost of running this demo?
I didn’t check but prob less than $1
Good stuff, however it seems like someone is sleeping and snoring in the background 😴
Who is snoring in the background?