How to implement multi-threading in Databricks Notebook | Pyspark Tutorial | Step-By-Step Approach

  • Published: Feb 1, 2025

Comments • 2

  • @Manikanta-n5v · 17 days ago

    I have a question about thread pools in Spark. When we use a ThreadPoolExecutor, do all threads run on the same node, i.e. only on the driver node? Or will it utilize all the workers in the cluster? Can you please clarify?

    • @CognitiveCoders · 11 days ago

      When you use a ThreadPoolExecutor, all threads run on the same node (the driver), and you might run out of memory as well. To tackle your problem, try running each notebook as a separate process and creating a Spark context within that process. You can use the "subprocess" module in Python to spawn a new process for each notebook.
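
A minimal sketch of the pattern the question describes: a ThreadPoolExecutor launching several notebooks concurrently from the driver. The notebook paths and the `run_notebook` helper are illustrative; in a real Databricks notebook the body of `run_notebook` would be a call like `dbutils.notebook.run(path, timeout_seconds=600)` (here it is simulated so the sketch is self-contained). Note that only the *launching* threads live on the driver; each notebook's Spark jobs are still distributed across the workers by the cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

def run_notebook(path):
    # In Databricks this would be:
    #   dbutils.notebook.run(path, timeout_seconds=600)
    # Simulated here so the sketch runs anywhere.
    return f"finished {path}"

# Hypothetical notebook paths, for illustration only
notebooks = ["/jobs/ingest", "/jobs/transform", "/jobs/report"]

# All three submissions happen in parallel threads on the driver;
# pool.map preserves the input order of the results.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_notebook, notebooks))

print(results)
```

Because every thread shares the driver's memory, launching many heavy notebooks this way can exhaust driver memory, which is the limitation the reply points out.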
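
The subprocess-based alternative suggested in the reply could be sketched as follows. The inline snippets stand in for exported notebook scripts and are purely illustrative; in practice each child process would run a driver script that creates its own SparkSession so the processes do not share the parent's memory.

```python
import subprocess
import sys

# Hypothetical stand-ins for per-notebook driver scripts; a real script
# would build its own SparkSession before doing any work.
snippets = [
    "print('job A done')",
    "print('job B done')",
]

# Spawn one OS process per "notebook" so each gets its own memory space,
# then collect each process's stdout.
procs = [
    subprocess.Popen([sys.executable, "-c", code],
                     stdout=subprocess.PIPE, text=True)
    for code in snippets
]
outputs = [p.communicate()[0].strip() for p in procs]

print(outputs)
```

Unlike threads, separate processes isolate memory failures: one notebook running out of memory kills only its own process, not the shared driver.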