AWS Tutorials - Absolute Beginners Tutorial for Amazon EMR

  • Published: 19 Oct 2024

Comments • 91

  • @RaguM-k5z
    @RaguM-k5z 2 years ago +3

    Very crisp & clear!! easy to understand :) “If you can’t explain it simply, you don’t understand it well enough.” - Albert Einstein

  • @bugfacedog44
    @bugfacedog44 1 year ago +3

    My god dude..... This was fantastic! Explained on high-level but then you actually followed through and covered specific concrete examples

  • @akkijaiho
    @akkijaiho 3 years ago +5

    Many Thanks.. you are simply superb... one of the best resources available on internet...best part of all workshops you share is its always having practical content... truly appreciable...Many Thanks...

  • @saaransh01
    @saaransh01 10 months ago

    wow wow wow ... just awesome Sir... Thank you so much for this beautiful time consuming job for all the beginners to learn from your knowledge... Thank you once again🙏🙏

  • @evanwang2514
    @evanwang2514 2 years ago +1

    This is the best tutorial I have seen

  • @smog1980jr
    @smog1980jr 2 years ago

    Great introductory tutorial to AWS EMR. After watching your tutorial I now have some knowledge about EMR. Thanks a lot.

  • @HareshRCPatel
    @HareshRCPatel 2 years ago +1

    Excellent presentation described in simple language. Really appreciate your effort.

  • @sukanyaraja816
    @sukanyaraja816 2 years ago +1

    It's a really good session. As a beginner I learned many things. Thank you so much.

  • @sspk1973
    @sspk1973 2 years ago +1

    Very good tutorials and demos.

  • @arjunaare4544
    @arjunaare4544 3 years ago +2

    Awesome😊... Really helped a lot... Looking forward to one more session on reading and writing HBase tables from Spark in EMR, along with version compatibility...

  • @ARUNKUMAR-gf3zv
    @ARUNKUMAR-gf3zv 3 years ago +2

    Great job. Exactly what I needed. Thanks a ton

  • @ankan1627
    @ankan1627 3 years ago

    great content. focuses on the basics and gets into the right level of details. amazing job !

    • @ankan1627
      @ankan1627 3 years ago

      I would love for you to do a pyspark tutorial.

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 3 years ago +1

      I already have PySpark tutorials. Please check my channel.

  • @abhisheknagappanavar2290
    @abhisheknagappanavar2290 1 year ago +1

    Masterpiece content

  • @dorinxtg
    @dorinxtg 1 year ago

    Great video!
    One small correction: it's Jupyter Notebook

  • @ghay3
    @ghay3 3 years ago +2

    Amazing, thanks for the great introduction!

  • @hsz7338
    @hsz7338 3 years ago +5

    Once again, this is a great tutorial. Thank you. I was wondering what your view is on running Spark ETL on AWS Glue versus an Amazon EMR Spark cluster. Which of these two services would you prefer, assuming the AWS cost isn't a concern?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 3 years ago +12

      If you keep cost aside, the primary differences are:
      1. Glue is serverless. EMR is IaaS.
      2. Glue has scheduling and workflow mechanisms in place. EMR needs support from other services like CloudWatch and Step Functions.
      3. Glue supports Scala, PySpark and Python shell only. EMR supports a wider range of frameworks such as Hive, Pig and HBase.
      So, my recommendation is to use Glue if you are working with Scala, Python and PySpark. But if you are using Hive or Pig programs, EMR is the choice. Hope it helps.

    • @hsz7338
      @hsz7338 3 years ago +1

      @AWSTutorialsOnline Agree 100%.

    • @Mustafa-yk8lk
      @Mustafa-yk8lk 2 years ago

      @AWSTutorialsOnline How would you choose between Glue and EMR now? Because they are both serverless now.

  • @sakinafakhri1320
    @sakinafakhri1320 3 years ago +2

    Very informative video, please do a tutorial for Glue and Athena as well

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 3 years ago +2

      There are many videos on Glue and Athena on my channel. If you want any specific topic which is not there, please let me know.

  • @michellesantos435
    @michellesantos435 3 years ago +1

    Very helpful and informative

  • @KS-ni7vv
    @KS-ni7vv 2 years ago

    good job! i liked it a lot, keep doing an awesome job!

  • @sumitkumarsah8782
    @sumitkumarsah8782 2 years ago

    Really Useful. Thanks for sharing the knowledge😃

  • @rupeshdeoria1
    @rupeshdeoria1 3 years ago +2

    Thanks, sir, for making such a video.

  • @sathyajithputtaiah
    @sathyajithputtaiah 3 years ago

    Awesome tutorial! great work! thank you

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 3 years ago

      Glad you like it!

    • @sathyajithputtaiah
      @sathyajithputtaiah 2 years ago

      @AWSTutorialsOnline Any idea how I can download a Jupyter notebook running in EMR as a .py file so that it can be uploaded to S3?
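
For the question above, one possible approach is sketched below. It is an illustration only, not something confirmed in the thread: it assumes the notebook has already been saved locally as an .ipynb file, and the function names `notebook_to_script`, `export_notebook` and `upload_to_s3`, as well as any bucket and key you pass in, are hypothetical. The conversion uses only the standard library; the upload uses boto3's `upload_file`.

```python
import json
from pathlib import Path


def notebook_to_script(notebook: dict) -> str:
    """Join the source of every code cell into one Python script."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


def export_notebook(ipynb_path: str, py_path: str) -> str:
    """Read an .ipynb file and write its code cells to a .py file."""
    notebook = json.loads(Path(ipynb_path).read_text(encoding="utf-8"))
    Path(py_path).write_text(notebook_to_script(notebook), encoding="utf-8")
    return py_path


def upload_to_s3(py_path: str, bucket: str, key: str) -> None:
    """Upload the script; bucket and key are placeholders chosen by the caller."""
    import boto3  # imported lazily so the conversion works without boto3

    boto3.client("s3").upload_file(py_path, bucket, key)
```

Cells that contain notebook magics (lines starting with `%` or `%%`) are copied as-is and would need to be removed by hand before the script can run outside the notebook.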

  • @alltheblessingswithbusisos2576
    @alltheblessingswithbusisos2576 2 years ago +1

    This is amazing 👏🏽

  • @lakshmanmanu2965
    @lakshmanmanu2965 2 years ago +1

    Sir, can we get a dedicated playlist to master EMR, or any other open source resources that help to learn from scratch, following the same pattern you used here of teaching new things and implementing them at the same time? If possible, please prepare a dedicated EMR playlist.
    Jai Hind

  • @dheerajsharma1036
    @dheerajsharma1036 3 years ago +1

    It was quite informative 👍

  • @user-sw6cg1de5g
    @user-sw6cg1de5g 2 years ago +1

    Hi, first of all thank you for this video.
    My question is: I successfully created the cluster and the notebook, but my Jupyter notebook says kernel error and I am unable to solve it. My cluster is ready to use.

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 2 years ago

      Try restarting the notebook kernel. It generally fixes any issue.

  • @scriptbeesdem
    @scriptbeesdem 1 year ago +1

    Great content. But can someone tell me how to fetch input parameters in the notebook when an EMR notebook is invoked through boto3 or any backend language?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 1 year ago

      Not sure I get the question. Why would you call a notebook using boto3 to do the job? If you want some data processing, simply create an EMR task and submit it. Hope it helps.
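
The reply above suggests submitting an EMR task rather than invoking the notebook. A minimal sketch of that idea follows, under stated assumptions: the PySpark script is already stored in S3, the cluster is running, and the helper names `build_spark_step` and `submit_step`, the step name, and every S3 path are hypothetical. It uses boto3's `add_job_flow_steps` with `command-runner.jar` to run `spark-submit`. Arguments placed after the script URI reach the script as ordinary command-line arguments (for example via `sys.argv` or `argparse`), which is one way to pass input parameters.

```python
from typing import Sequence


def build_spark_step(script_s3_uri: str, args: Sequence[str],
                     name: str = "pyspark-step") -> dict:
    """Build a step definition that runs spark-submit on the cluster."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # keep the cluster running if the step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # Everything after the script URI is handed to the script itself.
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_s3_uri, *args],
        },
    }


def submit_step(cluster_id: str, step: dict) -> str:
    """Submit the step to a running cluster and return the new step ID."""
    import boto3  # imported lazily so the step can be built without boto3

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

A call such as `submit_step(cluster_id, build_spark_step(script_uri, ["--input", input_uri]))` would then queue the job, where `cluster_id`, `script_uri` and `input_uri` are values you supply.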

  • @pokeshoot
    @pokeshoot 3 years ago +1

    Excellent👍👍👏

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 3 years ago

      Thank you very much

    • @pokeshoot
      @pokeshoot 3 years ago

      @AWSTutorialsOnline Do you also teach big data on cloud? Is any such program available? If so, please message me.

  • @marian6040
    @marian6040 2 years ago

    How can I create a cluster with EMRFS instead of HDFS? Great video btw.

  • @dhanraj429
    @dhanraj429 2 years ago +1

    Awesome video. Where can we download the jar file from?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 2 years ago +1

      I don't think you can. It is located on the Amazon EMR AMI for your cluster.

  • @jovelynobias5422
    @jovelynobias5422 7 months ago

    What should I choose under the "New" option if I will be writing Scala code in Spark instead of Python?

  • @billykovalsky8149
    @billykovalsky8149 3 years ago

    this is brilliant, thank you

  • @catchritesh2007
    @catchritesh2007 3 years ago +1

    Good work

  • @udayb6171
    @udayb6171 3 years ago +1

    It's very helpful

  • @vivekkumargoel2676
    @vivekkumargoel2676 2 years ago

    How can we use Presto with EMR? Could you share a document or tutorial I can refer to?

  • @gaurav___18
    @gaurav___18 3 years ago +1

    thanks

  • @emraanpathan767
    @emraanpathan767 3 years ago +2

    Can you please give a workshop on AWS EMR with Hadoop and Presto?

  • @neelchandarana8122
    @neelchandarana8122 3 years ago +1

    I tried the workshop by myself. I followed all the steps carefully. When I tried PySpark programming for running tasks using the notebook, I clicked on run and nothing happened. I do not see anything in the output folder. Please help.

    • @neelchandarana8122
      @neelchandarana8122 3 years ago

      I tried using the EMR task too, and I am getting the status as failed.

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 3 years ago

      For step 5, when you write code in the Jupyter notebook, can you please share the output of each of the code statements you are running? That might give me some clue.

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 3 years ago

      Also send me a screenshot of the customers.csv file stored in the S3 bucket.

    • @neelchandarana8122
      @neelchandarana8122 3 years ago

      @AWSTutorialsOnline I tried again. I tried the first line of code (to import the library). I copied the code and clicked run (as per the steps in the tutorial). It does not give me any output and jumps directly to the new line.

    • @neelchandarana8122
      @neelchandarana8122 3 years ago

      @AWSTutorialsOnline I am not able to share the screenshot here.

  • @arjunaare4544
    @arjunaare4544 3 years ago +1

    What is the best way to load AWS Glue catalog data into RDS (PostgreSQL)?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 3 years ago

      Please help me understand: you have data in S3 which you have cataloged in Glue, and you want to move that data to RDS (PostgreSQL). Is this the requirement?

    • @arjunaare4544
      @arjunaare4544 3 years ago +1

      @AWSTutorialsOnline Yes, my requirement is that I need to insert the data into my RDS table with the catalog table as the source...

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 3 years ago

      Hi - I published a workshop which can help you. Here is the link -
      aws-glue-pyspark-lab.s3-website-eu-west-1.amazonaws.com/labs/
      It talks about working with the Glue Data Catalog and a Redshift cluster. But the same code can be used with PostgreSQL as well. Hope it helps.

    • @arjunaare4544
      @arjunaare4544 3 years ago

      @AWSTutorialsOnline Thanks for the inputs. But if we use a JDBC connection in a dynamic frame to write the data into RDS, we will get performance issues. Is there any way to avoid this?

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 3 years ago

      Why would you get a performance issue? Have you noticed anything like that?

  • @kameshn1297
    @kameshn1297 2 years ago

    Fi

  • @rishi3311
    @rishi3311 3 years ago +1

    Suppose I have 1 Master and 1 Core Node in EMR. [ df = spark.read.csv("s3://...../demo.csv") ] I submit this task in EMR. After executing this line of code I should have data in the dataframe. But is that demo.csv data getting saved in HDFS also? If yes, then how can I find that demo.csv data in HDFS. And if no, then where does the data store after reading from S3.

    • @AWSTutorialsOnline
      @AWSTutorialsOnline 3 years ago

      Sorry Rishi, I somehow missed your comment. Apologies for that. The dataframe data is stored in HDFS and dataframe is a way to access the data. Dataframe provides a lazy load mechanism to access and process data stored in HDFS.