Processing 25GB of data in Spark | How many Executors and how much Memory per Executor is required.

  • Published: 26 Sep 2024
  • #pyspark #azuredataengineer #databricks #spark
    Use the link below to enroll for our free materials and other courses.
    www.cleverstud...
    You can talk to me directly on Topmate by using the below link:
    topmate.io/nar...
    Follow me on LinkedIn
    / nareshkumarboddupally
    -----------------------------------------------------------------------------
    Clever Studies Official WhatsApp Group joining link:
    Clever Studies 2.0: chat.whatsapp....
    Clever Studies: chat.whatsapp.... (Full)
    --------------------------------------------------
    Follow this link to join 'Clever Studies' official telegram channel:
    t.me/+eMaiZNWT...
    --------------------------------------------------
    Facebook: www.facebook.c...
    Instagram: / cleverstudiesindia
    PySpark by Naresh playlist:
    • PYSPARK BY NARESH
    --------------------------------------------------
    Realtime Interview playlist:
    • How To Explain Project...
    --------------------------------------------------
    Apache Spark playlist:
    • How Spark Executes A P...
    --------------------------------------------------
    PySpark playlist:
    • PySpark | Tutorial-9 |...
    Hello Viewers,
    We, the ‘Clever Studies’ YouTube channel, are a group of experienced software professionals formed to fill the gap in the industry by providing free content: software tutorials, mock interviews, study materials, interview tips, knowledge sharing by real-time working professionals, and much more, to help freshers, working professionals, and software aspirants get a job.
    If you like our videos, please do subscribe and share within your circle.
    Contact us: cleverstudies.edu@gmail.com
    Thank you!

Comments • 33

  • @shafimahmed7711
    @shafimahmed7711 9 days ago

    Your explanation of the Spark cluster and memory configurations was excellent. I really appreciate it!

  • @anirudh2704
    @anirudh2704 1 month ago +2

    Good explanation. Spark is all about good resource allocation, usage, and optimization.

  • @sureshpujari2510
    @sureshpujari2510 2 months ago +1

    Awesome explanation

  • @anubhavsingh2290
    @anubhavsingh2290 6 months ago +1

    Simple explanation
    Great sir 🙌

  • @shivamchandan50
    @shivamchandan50 6 months ago +7

    Please make a video on PySpark unit testing.

  • @sravankumar1767
    @sravankumar1767 1 month ago

    Superb explanation 👌 👏 👍

  • @arindamnath1233
    @arindamnath1233 5 months ago

    Wonderful Explanation.

  • @yadi4diamond
    @yadi4diamond 6 months ago

    You are simply superb.

  • @tanushreenagar3116
    @tanushreenagar3116 3 months ago

    perfect video sir

  • @aditya9c
    @aditya9c 6 months ago +1

    If the number of partitions is 200, and so is the number of cores required, then the core size is 128 MB, right?
    Then how, in the 3rd block, does the core size turn into 512 MB, and thus the executor into 4*512?

    • @PravinUser
      @PravinUser 3 months ago

      In each core, the memory should be a minimum of 4 times the data it is going to process (128 MB), so roughly a minimum of 512 MB of memory.
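
      A minimal sketch of this arithmetic in plain Python, assuming the 128 MB partition size and the 4x-per-core rule of thumb stated above (the variable names are mine, for illustration only):

        # Rough sizing arithmetic for 25 GB of input (a rule-of-thumb sketch, not an official formula)
        data_size_mb = 25 * 1024            # 25 GB expressed in MB
        partition_size_mb = 128             # default block/partition size discussed in the video
        cores_per_executor = 4              # 2-5 cores per executor is the usual guidance

        num_partitions = data_size_mb // partition_size_mb         # 200 partitions
        total_cores = num_partitions                                # one core per partition for full parallelism
        num_executors = total_cores // cores_per_executor           # 200 / 4 = 50 executors

        memory_per_core_mb = 4 * partition_size_mb                  # ~512 MB per core (the 4x rule of thumb)
        memory_per_executor_mb = cores_per_executor * memory_per_core_mb   # ~2048 MB, i.e. ~2 GB

        print(num_partitions, num_executors, memory_per_executor_mb)       # 200 50 2048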

  • @VikasChavan-v1c
    @VikasChavan-v1c 6 months ago

    For example, if you assign 25 executors instead of 50, then each executor will have 8 cores and parallel tasks will run (25*8). Then it should also take only 5 minutes to complete the job, so how is it 10 minutes? Can you please explain this point once again?

    • @vamshi878
      @vamshi878 6 months ago

      Each executor should have 2-5 cores, so he is saying he is going to take 4; this number stays fixed even if the data size increases or decreases.
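
      One hedged way to see where the doubled runtime comes from, assuming cores per executor stays fixed at 4 as the reply suggests:

        import math

        num_partitions = 200           # one task per 128 MB partition
        cores_per_executor = 4         # kept fixed per the 2-5 core guidance

        for num_executors in (50, 25):
            task_slots = num_executors * cores_per_executor    # tasks that can run concurrently
            waves = math.ceil(num_partitions / task_slots)     # rounds of tasks needed
            print(num_executors, "executors ->", task_slots, "slots ->", waves, "wave(s)")
        # 50 executors -> 200 slots -> 1 wave  (~5 min in the video's example)
        # 25 executors -> 100 slots -> 2 waves (~twice as long, hence ~10 min)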

  • @shibhamalik1274
    @shibhamalik1274 5 months ago

    There are 200 cores in total. Each core will use one partition at a time, so it will use 128 MB.
    Each executor has 4 cores, so each executor requires 4*128 MB, which is 512 MB. Where did the extra 4x multiplier come from? 😊

    • @bhanuprakashtadepalli7248
      @bhanuprakashtadepalli7248 5 months ago

      By default, to process a file in one core, we need 4 times the file size in memory.

    • @anirudh2704
      @anirudh2704 1 month ago

      Spark does in-memory processing, so it requires a minimum of 512 MB of memory to perform cache, persist, shuffling, and overhead tasks. 1 core handles 1 block of data.
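
      For reference, a sketch of how the numbers from this thread might be wired into a session; the property names are standard Spark settings, but the values are just the ones discussed above and depend on the cluster manager:

        from pyspark.sql import SparkSession

        # Sketch: 50 executors x 4 cores, ~2 GB heap each (values from the discussion above)
        spark = (
            SparkSession.builder
            .appName("process-25gb")
            .config("spark.executor.instances", "50")   # 200 cores / 4 cores per executor
            .config("spark.executor.cores", "4")        # the 2-5 cores-per-executor guideline
            .config("spark.executor.memory", "2g")      # 4 cores x ~512 MB per core
            .getOrCreate()
        )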

  • @kingoyster3246
    @kingoyster3246 5 months ago

    What if we have limited resources? What configuration would you recommend to process 25 GB? (16 cores and 32 GB)

    • @paulinaadamski8233
      @paulinaadamski8233 4 months ago +1

      You would have to choose between an increased partition size and lowered parallelism with an increased number of partitions.
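
      A rough sketch of those two options for a 16-core / 32 GB cluster; the numbers are illustrative, not a recommendation:

        import math

        total_cores = 16

        # Option A: keep the default 128 MB partitions and accept lower parallelism,
        # letting the 200 tasks run in multiple waves on the 16 cores.
        num_partitions = 200
        print(math.ceil(num_partitions / total_cores))   # ~13 waves of tasks

        # Option B: raise the partition size so fewer, larger tasks are created
        # (each task then needs proportionally more memory), e.g. 256 MB partitions:
        # spark.conf.set("spark.sql.files.maxPartitionBytes", str(256 * 1024 * 1024))
        print(math.ceil((25 * 1024) / 256))              # ~100 partitions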

  • @dineshughade6741
    @dineshughade6741 5 months ago

    Zuper

  • @Rakesh-q7m8r
    @Rakesh-q7m8r 6 months ago

    Hi,
    Does the same approach apply if we are working in Databricks?

  • @kamatchiprabu
    @kamatchiprabu 5 months ago

    Sir, I want to join the Job Ready Program. How do I join? The link is not enabled. Please help.

    • @cleverstudies
      @cleverstudies 5 months ago

      Sorry, we are not conducting CSJRP sessions at present. Please check our website www.cleverstudies.in for more details.

  • @shibhamalik1274
    @shibhamalik1274 5 months ago

    Is it that each core would take 4 * the partition size in memory?

    • @anirudh2704
      @anirudh2704 1 month ago

      The best configuration is 1 executor = 4 cores, 512 MB.
      There's a concept of fat and thin executors.
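
      For context on the fat vs. thin terminology, a rough comparison using the ~200-core scenario from the video (the shapes and trade-off notes are general guidance, not figures from the video):

        # Three ways to slice roughly 200 cores into executors (illustrative numbers only)
        thin     = {"executors": 200, "cores_each": 1}   # many tiny JVMs: high per-executor overhead
        balanced = {"executors": 50,  "cores_each": 4}   # the 2-5 cores-per-executor sweet spot
        fat      = {"executors": 12,  "cores_each": 16}  # few huge JVMs: heavy GC, poor HDFS throughput

        for shape in (thin, balanced, fat):
            print(shape["executors"], "executors x", shape["cores_each"], "cores =",
                  shape["executors"] * shape["cores_each"], "cores total")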

  • @Amarjeet-fb3lk
    @Amarjeet-fb3lk 4 months ago

    What is the use of giving each core 512 MB if the block size is 128 MB?
    Each block is processed on a single core, so if each block is 128 MB, why should we give 512 MB to each core?
    There will be wastage of memory, am I right?
    Please explain this.
    Thanks

    • @debayanmitter
      @debayanmitter 1 month ago

      The memory is for processing, not for storage.

    • @anirudh2704
      @anirudh2704 1 month ago

      The minimum requirement for an executor is 4-5 cores and 512 MB of memory. 1 core can handle 1 block of data. And since Spark does in-memory processing, it requires memory space for cache, persist, shuffling, etc.
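
      A sketch of why a core needs more than the raw 128 MB block, using Spark's documented default memory settings (spark.memory.fraction = 0.6, 300 MB reserved, ~10% off-heap overhead) as assumptions:

        # Where a ~2 GB executor heap roughly goes under the unified memory model (defaults)
        executor_memory_mb = 2048
        reserved_mb = 300                              # fixed reserved memory
        memory_fraction = 0.6                          # spark.memory.fraction default
        storage_fraction = 0.5                         # spark.memory.storageFraction default

        unified_mb = (executor_memory_mb - reserved_mb) * memory_fraction   # ~1049 MB for execution + storage
        storage_mb = unified_mb * storage_fraction                          # ~524 MB for cache/persist
        execution_mb = unified_mb - storage_mb                              # ~524 MB for shuffles, joins, sorts
        overhead_mb = max(384, 0.10 * executor_memory_mb)                   # off-heap overhead (YARN/K8s)

        print(round(unified_mb), round(storage_mb), round(execution_mb), round(overhead_mb))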

  • @Fresh-sh2gc
    @Fresh-sh2gc 6 months ago

    In my company, the CPUs per executor are 5 minimum and 8 maximum.

    • @cleverstudies
      @cleverstudies 6 months ago

      It depends on the use case and resource availability.

    • @Fresh-sh2gc
      @Fresh-sh2gc 6 months ago

      @cleverstudies It depends on the cluster. We have a state-of-the-art one, an over-$1B data center, that can support high CPU counts per executor.