Machine Learning in Production with Airflow

  • Published: 15 Sep 2024

Comments • 13

  • @chrisogonas
    @chrisogonas 1 year ago +3

    That was an excellent illustration. Superb!

  • @ministryNoiz
    @ministryNoiz 2 years ago

    Thanks. Exciting topic. How reasonable is it to run long-running ML processes in Airflow?

    • @Astronomer
      @Astronomer  2 years ago

      Hi, Oleg! If and when possible, the process should be broken up into separate tasks. The crucial aspect of the decision is where the compute will actually happen. It is best for Airflow to submit the ML process to a compute service and then retrieve the results (see the sketch below). That way the Airflow task doesn't have to be "always-on," and you aren't limited by Airflow's constraints. If you would like to discuss the matter further, you can set up a meeting with us and get the support you need: www.astronomer.io/office-hours
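
      A minimal sketch of that submit-and-retrieve pattern using Airflow's TaskFlow API. The job ID, status check, and result path are hypothetical placeholders for whatever compute service you use, not a real client API:

      import time

      from airflow.decorators import dag, task
      from pendulum import datetime


      def _job_is_done(job_id: str) -> bool:
          # Placeholder status check; a real client would query the
          # external service's API for the job's state.
          return True


      @dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
      def ml_submit_pattern():

          @task
          def submit_training_job() -> str:
              # The Airflow worker only makes a lightweight API call here;
              # the heavy training happens on the external compute service.
              return "job-1234"  # placeholder for the service's job ID

          @task
          def fetch_results(job_id: str) -> str:
              # Poll until the external job finishes. In production, a
              # sensor or deferrable operator is a better fit so that no
              # worker slot is held while waiting.
              while not _job_is_done(job_id):
                  time.sleep(60)
              return f"s3://my-bucket/results/{job_id}"  # hypothetical path

          fetch_results(submit_training_job())


      ml_submit_pattern()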

  • @JosephRivera517
    @JosephRivera517 1 year ago

    I love the presentation. Would you mind sharing it with me? Thanks.

    • @Astronomer
      @Astronomer  1 year ago

      Hey Joseph, definitely! Would you mind emailing me, and I'll send it over that way? My email is george.yates@astronomer.io

  • @ryank8463
    @ryank8463 5 months ago

    Hi, this video is really beneficial. I have some questions about the best practice for handling data transmission between tasks. I am building MLOps pipelines using Airflow. My model-training DAG contains data preprocessing -> model training, so there is massive data transmission between these two tasks. I am using XCom to pass data between them, but there's roughly a 2 GB limitation in XCom. So what's the best practice to deal with this problem? Using S3 to send/pull data from tasks? Or should I simply combine these two tasks (data preprocessing -> model training)? Thank you.

    • @Astronomer
      @Astronomer  5 months ago

      Thank you! For passing larger amounts of data between tasks you have two main options: a custom XCom backend, or writing to intermediary storage (such as S3) directly from within the tasks.
      In general we recommend a custom XCom backend as a best practice in these situations, because your DAG code stays the same; only how data sent to and retrieved from XCom is processed changes. You can find a tutorial on how to set up a custom XCom backend here: docs.astronomer.io/learn/xcom-backend-tutorial. A minimal sketch of the pattern follows below.
      Merging the tasks is generally not recommended because it makes it harder to observe and rerun individual steps.
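
      A minimal sketch of such a backend, following the pattern from that tutorial: values are written to S3 and only a small reference string is stored in Airflow's metadata database. The bucket name and key prefix are placeholders, and the exact serialize_value/deserialize_value signatures vary across Airflow versions, so treat this as illustrative:

      import json
      import uuid

      from airflow.models.xcom import BaseXCom
      from airflow.providers.amazon.aws.hooks.s3 import S3Hook


      class S3XComBackend(BaseXCom):
          PREFIX = "xcom_s3://"
          BUCKET = "my-xcom-bucket"  # placeholder bucket name

          @staticmethod
          def serialize_value(value, **kwargs):
              # Upload the payload to S3 and store only a short
              # reference string in the metadata database.
              hook = S3Hook()
              key = f"xcom/{uuid.uuid4()}.json"
              hook.load_string(
                  json.dumps(value), key=key,
                  bucket_name=S3XComBackend.BUCKET,
              )
              return BaseXCom.serialize_value(f"{S3XComBackend.PREFIX}{key}")

          @staticmethod
          def deserialize_value(result):
              # Resolve the stored reference back into the actual payload.
              value = BaseXCom.deserialize_value(result)
              if isinstance(value, str) and value.startswith(S3XComBackend.PREFIX):
                  key = value[len(S3XComBackend.PREFIX):]
                  return json.loads(
                      S3Hook().read_key(key=key, bucket_name=S3XComBackend.BUCKET)
                  )
              return value

      You then point Airflow at the class with the xcom_backend option in the [core] config section (or the AIRFLOW__CORE__XCOM_BACKEND environment variable).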

    • @ryank8463
      @ryank8463 5 months ago

      @@Astronomer Hi, thanks for your valuable reply. I would also like to ask what level of granularity we should aim for when splitting work into tasks. The more tasks there are, the more pushing and pulling of data from external storage happens, and when the data is large, that brings some network overhead.

  • @mohamedchafiq7793
    @mohamedchafiq7793 1 year ago

    Could you share the presentation with us?

    • @Astronomer
      @Astronomer  1 year ago

      Sure, just email me at george.yates@astronomer.io and I'll send it over!