Spark Performance Tuning | EXECUTOR Tuning | Interview Question

  • Published: 29 Sep 2019
  • #Spark #Persist #Broadcast #Performance #Optimization
    Please join as a member of my channel to get additional benefits like materials on Big Data and Data Science, live streaming for members, and many more.
    Click here to subscribe : / @techwithviresh
    About us:
    We are a technology consulting and training provider, specializing in areas such as Machine Learning, AI, Spark, Big Data, NoSQL, graph databases, Cassandra, and the Hadoop ecosystem.
    Mastering Spark : • Spark Scenario Based I...
    Mastering Hive : • Mastering Hive Tutoria...
    Spark Interview Questions : • Cache vs Persist | Spa...
    Mastering Hadoop : • Hadoop Tutorial | Map ...
    Visit us :
    Email: techwithviresh@gmail.com
    Facebook : / tech-greens
    Twitter :
    Thanks for watching
    Please Subscribe!!! Like, share and comment!!!!
  • Science

Comments • 40

  • @RohanKumar-mh3pt
    @RohanKumar-mh3pt 11 months ago +1

    Very nice and clear explanation. Before this video I was very confused about the executor tuning part; now it is crystal clear.

  • @TheFaso1964
    @TheFaso1964 3 years ago +3

    Dude. I feel like I knew nothing about Spark in particular before I got my hands dirty with your performance improvement solutions.
    Appreciate it a lot, got my subscription. Cheers from Germany!

  • @nivedita5639
    @nivedita5639 3 years ago +1

    Very very helpful. Thanks

  • @ranju184
    @ranju184 3 years ago

    Excellent explanation. Thanks

  • @aneksingh4496
    @aneksingh4496 4 years ago +1

    As always, the best!!! Please include some real simulation examples.

  • @sankarn6016
    @sankarn6016 3 years ago +2

    Nice explanation!!
    Can we use this approach for tuning/triggering multiple jobs in a cluster?

  • @fahad_ishaqwala
    @fahad_ishaqwala 4 years ago +2

    Excellent videos, brother. Much appreciated. Can you do a video on performance tuning for Spark Structured Streaming jobs as well?

    • @TechWithViresh
      @TechWithViresh  3 years ago

      Surely, working on a video for the same.

  • @whatever-genuine7945
    @whatever-genuine7945 2 years ago

    How do we allocate executors, cores, and memory if there are multiple jobs running on the cluster?

  • @DilipDiwakarAricent
    @DilipDiwakarAricent 3 years ago

    If not configured, what will be the default number chosen by Spark?

  • @KNOW-HOW-HUB
    @KNOW-HOW-HUB 2 years ago

    To process 1 TB of data, what would be the best approach to follow?

  • @giyama
    @giyama 4 years ago +2

    This calculation is for just one job; what would the calculation be for multiple jobs running simultaneously?
    And how do you calculate based on the data volume?
    (Great job btw, thanks!)

    • @SidharthanPV
      @SidharthanPV 4 years ago +1

      Dynamic allocation is currently supported. You can set the max limit, and YARN takes care of managing it when multiple instances run in parallel.
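
      A minimal sketch of the dynamic-allocation settings this reply refers to; the property names are standard Spark configs, but the values and app name are illustrative assumptions only:

          // Hedged sketch: capped dynamic allocation on YARN (illustrative values).
          import org.apache.spark.sql.SparkSession

          val spark = SparkSession.builder()
            .appName("dynamic-allocation-sketch")                 // hypothetical app name
            .config("spark.dynamicAllocation.enabled", "true")
            .config("spark.dynamicAllocation.minExecutors", "2")  // assumed floor
            .config("spark.dynamicAllocation.maxExecutors", "20") // the "max limit" the reply mentions
            .config("spark.shuffle.service.enabled", "true")      // required for dynamic allocation on YARN
            .getOrCreate()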

  • @SpiritOfIndiaaa
    @SpiritOfIndiaaa 4 years ago +1

    Thanks bro, really wonderful explanation. Can you make a video on how to analyze stages, physical plans, etc. in the Spark UI, and, based on that, how to fix optimization issues? It's always confusing to interpret these SQL explain plans.

    • @TechWithViresh
      @TechWithViresh  4 years ago

      Thanks very much, check out the video on stage details.

    • @SpiritOfIndiaaa
      @SpiritOfIndiaaa 4 years ago

      @@TechWithViresh I can't find it, any URL please?

  • @manisekhar4446
    @manisekhar4446 3 years ago

    According to your example, how many GB of data can be processed by a Spark job?

  • @snehakavinkar2240
    @snehakavinkar2240 3 years ago

    How do you decide these configurations for a certain volume of data? Thank you.

    • @TechWithViresh
      @TechWithViresh  3 years ago

      The idea is to make sure there are at most 5 tasks per executor, and that the partition size fits within the memory allocated to the executor.
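
      A minimal sketch of that heuristic worked through in Scala, assuming a hypothetical cluster of 10 nodes with 16 cores and 64 GB RAM each (illustrative figures, not necessarily the video's exact numbers):

          // Hedged sizing sketch for an assumed 10-node, 16-core, 64 GB/node cluster.
          val nodes = 10
          val usableCoresPerNode = 16 - 1                              // leave 1 core per node for OS/daemons
          val coresPerExecutor = 5                                     // heuristic: at most 5 concurrent tasks
          val executorsPerNode = usableCoresPerNode / coresPerExecutor // 3
          val numExecutors = executorsPerNode * nodes - 1              // 29, leaving 1 slot for the YARN AM
          val memPerExecutorGb = (64 - 1) / executorsPerNode           // 21 GB of node RAM per executor slot
          val overheadGb = math.max(0.384, memPerExecutorGb * 0.07)    // ~1.47 GB off-heap memory overhead
          val heapGb = memPerExecutorGb - math.ceil(overheadGb)        // ~19 GB for spark.executor.memory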

  • @mdmoniruzzaman703
    @mdmoniruzzaman703 1 year ago

    Hi, does 10 nodes mean including the master node?
    I have a configuration like this:

    "Instances": {
        "InstanceGroups": [
            {
                "Name": "Master nodes",
                "Market": "SPOT",
                "InstanceRole": "MASTER",
                "InstanceType": "m5.4xlarge",
                "InstanceCount": 1
            },
            {
                "Name": "Worker nodes",
                "Market": "SPOT",
                "InstanceRole": "CORE",
                "InstanceType": "m5.4xlarge",
                "InstanceCount": 9
            }
        ],
        "KeepJobFlowAliveWhenNoSteps": false,
        "TerminationProtected": false
    },

  • @anusha0504
    @anusha0504 4 years ago

    What are advanced Spark technologies?

  • @user-vl1ld3be3n
    @user-vl1ld3be3n 8 months ago

    What if I have multiple Spark jobs running in parallel in one Spark session?

  • @umeshkatighar3635
    @umeshkatighar3635 1 year ago

    What if each node has only 8 cores? How does Spark allocate 5 cores per JVM?

  • @snehakavinkar2240
    @snehakavinkar2240 4 years ago

    Is there any upper or lower limit to the amount of memory per executor?

    • @TechWithViresh
      @TechWithViresh  4 years ago

      It depends on the total memory resources available in your cluster.

  • @inferno9004
    @inferno9004 4 years ago +2

    @5:10 can you explain how 20 GB + 7% of 20 GB is 23 GB and not 21.4 GB?

    • @rockngelement
      @rockngelement 3 years ago

      Calculation mistake, bhai; anyway, it doesn't affect the info in this video.
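
      For reference, the arithmetic in question works out as: 7% of 20 GB = 0.07 × 20 = 1.4 GB, so 20 GB + 1.4 GB = 21.4 GB (and 7% of 21 GB ≈ 1.47 GB); the commenter's 21.4 GB figure is the consistent total, not 23 GB.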

  • @sivavulli7487
    @sivavulli7487 3 years ago +1

    Hi sir, thank you for your nice explanation, but it is most meaningful and understandable when only one job is running on the cluster. What if there are many jobs running on the same cluster?

    • @TechWithViresh
      @TechWithViresh  3 years ago

      The executor params passed for each job define the container boundaries, or running scope, for that job. If there are not enough resources available to be allocated, the job(s) would wait in the queue.

    • @sivavulli7487
      @sivavulli7487 3 years ago

      @@TechWithViresh So an executor core can run only one task at a time. If that is the case, in your examples, if there are 2 jobs on the same cluster, should we take half of the resources mentioned in the video, or is it better to take whatever you mentioned, so that once the first job completes it picks up the second (i.e., until the first job completes, the second will be queued)? Could you please suggest the best approach? Altogether, before setting Spark resource configurations for a job, is looking at the cluster configuration enough, or do we also need to look at how many other jobs are running on the same cluster?

    • @TechWithViresh
      @TechWithViresh  3 years ago

      @@sivavulli7487 Yes, we should take into account how many concurrent jobs need to run. A better approach followed these days is to have interactive clusters for each job.

    • @sivavulli7487
      @sivavulli7487 3 years ago +1

      @@TechWithViresh Okay, thank you sir. If possible, please make a video on how to allocate resources when there are multiple concurrent jobs running on the same cluster.

  • @rikuntri
    @rikuntri 4 years ago

    One executor has four cores, so can it handle one task at a time or 4?
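
    For context, a minimal sketch of how task concurrency follows from the core settings; the property names are standard Spark configs, while the values and app name are illustrative assumptions:

        // Hedged sketch: concurrent tasks per executor = executor cores / CPUs per task.
        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder()
          .appName("task-concurrency-sketch")   // hypothetical app name
          .config("spark.executor.cores", "4")  // 4 cores per executor, as in the question
          .config("spark.task.cpus", "1")       // default: 1 CPU per task
          .getOrCreate()
        // With these settings, each executor runs up to 4 / 1 = 4 tasks simultaneously.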

  • @girijapanda1306
    @girijapanda1306 2 years ago

    7% of 21 GB ≈ 1.47 GB; am I missing something here?

  • @KiranKumar-cg3yg
    @KiranKumar-cg3yg 1 year ago +1

    Means what I know is nothing.

  • @RAB-fu4rw
    @RAB-fu4rw 3 years ago

    7% of 21 GB is 3 GB????? It comes to 1.47 GB; how did you arrive at 3 GB???

    • @komalkarnam1429
      @komalkarnam1429 3 years ago

      Yes, I had the same question.

    • @divyar7991
      @divyar7991 1 year ago

      For YARN, you can choose between 6 and 10 per