AWS EMR Big Data Processing with Spark and Hadoop | Python, PySpark, Step by Step Instructions

  • Published: Jul 30, 2024
  • In this video, I give an overview of what EMR is and its benefits in the big data and machine learning world. I then walk step by step through spinning up an EMR cluster and running a spark-submit job on it to process data from a Stack Overflow survey.
    Support the channel plz 😊: www.buymeacoffee.com/felixyu
    Instructions on how to create a key pair on AWS: docs.aws.amazon.com/AWSEC2/la...

Comments • 85

  • @aimeeyu1991
    @aimeeyu1991 2 years ago +2

    Good stuffs! Your videos are always so detailed and informative 👍🏻

  • @jasper5016
    @jasper5016 10 months ago

    Wow, Felix, you are a fantastic teacher. Really happy to find your channel. Thanks

  • @maulikpatel7116
    @maulikpatel7116 2 years ago +6

    Wow, so informative!!! Thanks so much for teaching me how to do this!!!

  • @kkb_now_i_have_a_handle
    @kkb_now_i_have_a_handle 2 years ago +2

    Crisp and to the point, thanks Felix please keep it up.

  • @privatestuff4432
    @privatestuff4432 1 year ago +9

    Thanks! Your video is far, far better than most out there!
    Note to other viewers: If you decide to try this yourself, you should download the 2020 survey results, and not something newer. The fields are different in newer versions of the survey.

  • @BlazinEdit
    @BlazinEdit 2 years ago +5

    very good and undervalued content keep up with the good work man :) you are helping a lot! I bet your channel will start growing soon.

    • @FelixYu
      @FelixYu 2 years ago

      Thank you!! This means a lot :)

  • @brothermalcolm
    @brothermalcolm 2 years ago +3

    awesome, one of the few 30min tutorials for big data on aws that actually worked!

    • @FelixYu
      @FelixYu 1 year ago

      glad that u found it helpful!! 👍

  • @mrmen556
    @mrmen556 1 year ago +2

    This is what I was looking for. Thanks for the video.

    • @FelixYu
      @FelixYu 1 year ago

      glad that u found it helpful!! 👍

  • @shashikantkrishna4156
    @shashikantkrishna4156 1 year ago

    Very informative! 🙂Thank you so much!

  • @sakshi0070
    @sakshi0070 10 months ago +1

    thank you sir for this session....love from india🙏

  • @rajach6636
    @rajach6636 2 years ago

    Very well explained 👏👌

  • @sudippandit9855
    @sudippandit9855 2 years ago

    Excellent!!

  • @khandoor7228
    @khandoor7228 2 years ago

    Great content!

  • @orpat007
    @orpat007 1 year ago

    Awesome! Thanks!

  • @WolfmaninKannada
    @WolfmaninKannada 1 month ago

    sir your explanation is very clear ...i request you to make end to end project videos on aws etl

  • @krishnabisen2666
    @krishnabisen2666 2 years ago

    Thank you!

  • @joaovitor12full
    @joaovitor12full 2 years ago

    Wow, That's amazing

  • @electricalsir
    @electricalsir 1 year ago +1

    Thank you so much sir for detail explanation it will be very useful to us ❤❤ thanks a lot

    • @FelixYu
      @FelixYu 1 year ago +1

      Glad that u found it helpful!!

  • @Chennai59
    @Chennai59 1 year ago

    Outstanding . Thanks dude

  • @simpledataengineer5231
    @simpledataengineer5231 1 year ago

    Awesome video!

    • @FelixYu
      @FelixYu 1 year ago

      glad that u found it helpful!!

  • @abhishekchandrashukla3814
    @abhishekchandrashukla3814 3 months ago

    Good stuff sir!

  • @VamsiKrishna-vf5gm
    @VamsiKrishna-vf5gm 2 years ago +1

    Really great stuff... excellent presentation !!

    • @FelixYu
      @FelixYu 2 years ago

      thank you..glad that it's helpful

    • @amadeeyo8181
      @amadeeyo8181 1 year ago

      Great tutorial man keep it up

  • @PraveenKumar-ic5zo
    @PraveenKumar-ic5zo 1 year ago

    The best...Awesome.

  • @ParijatKar
    @ParijatKar 1 year ago

    Thanks for the AWS EMR configuration details. How does the underlying S3 or HDFS distribute data blocks for parallel processing? How can redundancy and parallelism be configured? I have logs from airline equipment for the last 30 years, equivalent to 1 PB. I want to use all of it to identify failures with indicators.

  • @letscodewithvivek5191
    @letscodewithvivek5191 2 years ago

    Great Tutorial

  • @AndrewPa
    @AndrewPa 1 year ago

    well done Felix :-)

  • @xiaocuizhang6879
    @xiaocuizhang6879 1 year ago

    thanks for sharing!

    • @FelixYu
      @FelixYu 1 year ago

      Glad that u found it helpful

  • @ajinkyahatolkar294
    @ajinkyahatolkar294 1 year ago

    Very nicely explained

    • @FelixYu
      @FelixYu 1 year ago

      thank you!! glad that u found it helpful!!

  • @santoshraju3546
    @santoshraju3546 2 years ago

    Very informative. Can you do a deep dive of aws emr?

  • @ririraman7
    @ririraman7 2 years ago

    Awesome tutorial brother

  • @johnmcgettigan4882
    @johnmcgettigan4882 1 year ago

    Yu-demy! thank you!

    • @FelixYu
      @FelixYu 1 year ago

      Haha glad that u found it helpful!!

  • @amitchaurasia2691
    @amitchaurasia2691 1 year ago

    Very good to start with EMR hands-on

    • @FelixYu
      @FelixYu 1 year ago

      Glad that u found it helpful!!

  • @sayedaweshrahman4608
    @sayedaweshrahman4608 2 years ago

    Clean...👍

  • @SahilP-yv4fv
    @SahilP-yv4fv 1 year ago

    good tute

  • @chaithanyamannem5040
    @chaithanyamannem5040 2 years ago

    good stuff Felix ..do u have video of migrating on premises to cloud bigdata cluster ?

  • @chandnimirchandani3040
    @chandnimirchandani3040 2 years ago

    Do you have any document to map the property graph model to Hadoop

  • @hikariuchiha977
    @hikariuchiha977 2 years ago

    hi, what is the difference between the 4 applications? when u create the cluster there are 4 options, how do you know which one to select?

  • @dimitris_k2841
    @dimitris_k2841 1 year ago

    good job mate

  • @andresnicolasrodriguezsanc4586

    thanks!

    • @FelixYu
      @FelixYu 1 year ago

      Glad that u found it helpful

  • @shbhmara4915
    @shbhmara4915 2 years ago

    Very good

  • @user-nk6ov5rk5h
    @user-nk6ov5rk5h 1 year ago

    Great video!! However, right now I think you do have to set up an IAM role for accessing your S3 bucket, don't you?

  • @user-tf4il6gs9w
    @user-tf4il6gs9w 1 year ago

    Thanks for the video... can you help me with dependency files, e.g. when the Python script uses other modules? How do I go about that when I want to submit with spark-submit on EMR?
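
One common way to ship extra Python modules with a job is the `--py-files` option of `spark-submit`, which accepts a comma-separated list of `.py`, `.zip`, or `.egg` files. The sketch below only assembles such a command line; the helper name `build_spark_submit` and the file names `main.py` and `deps.zip` are hypothetical and not from the video.

```python
import shlex


def build_spark_submit(script, py_files=(), app_args=()):
    """Assemble a spark-submit command as a list of arguments.

    script, py_files and app_args are placeholders for your own files.
    """
    cmd = ["spark-submit"]
    if py_files:
        # --py-files takes a comma-separated list of .py, .zip or .egg files
        cmd += ["--py-files", ",".join(py_files)]
    cmd.append(script)
    # Anything after the script is passed to the application itself
    cmd += list(app_args)
    return cmd


# Hypothetical example: main.py imports modules packaged in deps.zip
cmd = build_spark_submit("main.py", py_files=["deps.zip"])
print(shlex.join(cmd))
```

The printed command can be run on the master node once the listed files are available there.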

  • @deenadayalmuli3364
    @deenadayalmuli3364 1 year ago

    excellent

  • @poolput1
    @poolput1 2 years ago

    How to set application with database ?

  • @ParijatKar
    @ParijatKar 1 year ago

    When I am doing big data, I am really using B I G data. What are the challenges at the petabyte level?

  • @burakevrentug
    @burakevrentug 8 months ago

    Felix, thank you for sharing, but the application interface has changed, so I can't follow your steps. Will you share an updated version of your video?

  • @josemanuelgutierrez4095
    @josemanuelgutierrez4095 1 year ago

    I have a question: do you have any course explaining PySpark? Or any recommendations, maybe?

  • @elitedecor8510
    @elitedecor8510 2 years ago

    If we terminate, is Amazon going to charge money? What is the policy for using it for free for practice purposes?

  • @surajthallapalli4227
    @surajthallapalli4227 2 years ago

    Hi, the EMR outputs many files as part_0000,....and so on in S3. But i want just one output file after pre-processing it in spark EMR. How to do that?

    • @FelixYu
      @FelixYu 2 years ago +2

      U can do something like
      df.repartition(1).write.mode("overwrite").parquet("locationPath")

    • @surajthallapalli4227
      @surajthallapalli4227 2 years ago

      @@FelixYu Thanks a lot
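
The reply above can be wrapped in a small helper, sketched here under the assumption that `df` is a PySpark DataFrame; the function name `write_single_file` and the example path are hypothetical. Note that Spark still writes a directory at the given path, containing one `part-*` file plus marker files such as `_SUCCESS`, and that pushing all data through a single partition can be slow for large outputs.

```python
def write_single_file(df, path, fmt="parquet"):
    """Write df so that the output directory holds a single part file.

    df is assumed to be a pyspark.sql.DataFrame; path and fmt are placeholders.
    """
    # repartition(1) shuffles all rows into one partition,
    # so exactly one task writes exactly one part file
    (
        df.repartition(1)
        .write
        .mode("overwrite")  # replace any existing output at path
        .format(fmt)
        .save(path)
    )


# Hypothetical usage inside a Spark job:
# write_single_file(df, "s3://your-bucket/output/", fmt="parquet")
```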

  • @blosun641
    @blosun641 2 years ago

    When I try to use the command "spark-submit" I get "-bash: spark-submit: command not found".
    Is there any solution to this? I'm using PuTTY on Windows.

    • @FelixYu
      @FelixYu 2 years ago

      On the EMR creation (4:10 of the video), did u choose the last combination that has spark and Hadoop??
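
A quick way to confirm whether `spark-submit` is actually available in the shell you are logged into is to look it up on `PATH`. This is only a diagnostic sketch using the Python standard library; the helper name `find_spark_submit` is hypothetical.

```python
import shutil


def find_spark_submit(command="spark-submit"):
    """Return the full path of the command, or None if it is not on PATH."""
    return shutil.which(command)


path = find_spark_submit()
if path is None:
    # Matches the "command not found" symptom from the comment above
    print("spark-submit not found on PATH; was the cluster created with Spark?")
else:
    print(f"spark-submit found at {path}")
```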

  • @viniciusfigueiredo6740
    @viniciusfigueiredo6740 1 year ago

    Friend, how do I save Jupyter notebooks in my EMR?

    • @eshanpandey4186
      @eshanpandey4186 1 year ago

      I guess there's an option to enable jupyterhub right when you are initializing the EMR cluster

  • @simrangulani4299
    @simrangulani4299 1 year ago

    I don't have vs code installed. What should I do?

    • @FelixYu
      @FelixYu 1 year ago

      U don’t have to use vs code to write the code. U can use any IDE for it. Or u can just google vscode to download it

  • @krishnasanagavarapu4858
    @krishnasanagavarapu4858 2 years ago

  • @raulgutierrez5862
    @raulgutierrez5862 2 years ago

    Excellent demo! What was the AWS cost of running this demo?

    • @FelixYu
      @FelixYu 2 years ago

      I didn’t check but prob less than $1

  • @chaitanyarishabh1857
    @chaitanyarishabh1857 2 years ago

    Good stuff, however seems like someone is sleeping and snoring in background 😴

  • @SahilP-yv4fv
    @SahilP-yv4fv 1 year ago

    who is snoring in the background