Tuning Apache Spark for Large Scale Workloads - Sital Kedia & Gaoxiang Liu

  • Published: Jul 26, 2024
  • Apache Spark is a fast and flexible compute engine for a variety of diverse workloads. Optimizing performance for different applications often requires an understanding of Spark internals and can be challenging for Spark application developers. In this session, learn how Facebook tunes Spark to run large-scale workloads reliably and efficiently. The speakers will begin by explaining the various tools and techniques they use to discover performance bottlenecks in Spark jobs. Next, you'll hear about important configuration parameters and the speakers' experiments tuning these parameters on large-scale production workloads.
    You'll also learn about Facebook's new efforts towards automatically tuning several important configurations based on the nature of the workload. The speakers will conclude by sharing their results with automatic tuning and future directions for the project.
    Session hashtag: #SFexp1
    Session overview:
    - Apache Spark at Facebook
    - Spark Architecture
    - Scaling Spark Driver
    - Dynamic Executor Allocation
    - Multi-threaded event processor
    - Better fetch failure handling
    - Scaling Spark Driver
    - Executor memory layout
    - Tuning memory configurations
    - Eliminating disk I/O bottleneck
    - Scaling external shuffle service
    - Cache index files on shuffle server
    - Scaling external shuffle service
    - Application tuning
    - Motivation
    - Auto tuning of mapper and reducer
    - Tools
    - Resources
    - Questions?
    Sign up for a 1-day course on Apache Spark Tuning and Best Practices: bit.ly/2I0KMcj
    About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
    Read more here: databricks.com/product/unifie...
  • Science

Comments • 8

  • @JoHeN1990
    4 years ago +8

    FYI, when you hear executor “um”, he means executor OOM (out of memory).

  • @kushagraverma7855
    6 years ago +3

    Slides: www.slideshare.net/databricks/tuning-apache-spark-for-largescale-workloads-gaoxiang-liu-and-sital-kedia
    Thanks guys, wonderfully helpful talk!!

  • @oricoil
    5 years ago +1

    Wow, awesome! Thank you!!

  • @mnbvcxzzxcvbnm
    6 years ago

    Thank you. It helps.

  • @nelsonjma
    6 years ago +1

    Nice presentation, mate. Thanks for the information.

  • @VishwajeetPol
    5 years ago +11

    It would have been better if he had explained how they arrived at the numbers, with before-and-after scenarios for each parameter value they set.
    A demo would also have helped, showing how the cluster behaves with before and after values for spark.executor.cores, spark.executor.memory, spark.driver.memory, spark.driver.cores and spark.executor.instances, rather than only dynamic allocation set to true with min and max values for executor instances.
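
The two setups this comment contrasts can be put side by side in a small sketch. The configuration keys are real Spark settings; every value, the helper names `to_submit_args` and `diff_configs`, and the `app.py` file name are hypothetical placeholders chosen for illustration, not numbers or code from the talk. The block is plain Python and only builds `spark-submit` style arguments, so it runs without a Spark installation.

```python
# Illustrative only: the keys are real Spark settings, the values are made up.

# Fixed sizing: the executor count is pinned up front.
STATIC_ALLOCATION = {
    "spark.executor.instances": "50",
    "spark.executor.cores": "4",
    "spark.executor.memory": "8g",
    "spark.driver.cores": "2",
    "spark.driver.memory": "4g",
    "spark.dynamicAllocation.enabled": "false",
}

# Dynamic sizing: Spark scales executors between a min and a max.
# Dynamic allocation has traditionally relied on the external shuffle
# service, so it is enabled here as an assumption about the deployment.
DYNAMIC_ALLOCATION = {
    "spark.executor.cores": "4",
    "spark.executor.memory": "8g",
    "spark.driver.cores": "2",
    "spark.driver.memory": "4g",
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "10",
    "spark.dynamicAllocation.maxExecutors": "200",
    "spark.shuffle.service.enabled": "true",
}


def to_submit_args(conf):
    """Flatten a config dict into spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


def diff_configs(before, after):
    """Return {key: (before_value, after_value)} for every key that differs."""
    keys = sorted(set(before) | set(after))
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


if __name__ == "__main__":
    command = ["spark-submit"] + to_submit_args(DYNAMIC_ALLOCATION) + ["app.py"]
    print(" ".join(command))
    for key, (old, new) in diff_configs(STATIC_ALLOCATION, DYNAMIC_ALLOCATION).items():
        print(f"{key}: {old} -> {new}")
```

Printing the diff makes the before/after comparison explicit: the per-executor and driver sizes stay the same in this sketch, and only the way the executor count is chosen changes.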

  • @sandycheeks6001
    @sandycheeks6001 3 года назад +5

    we really stalking him aren’t we…