Building an ML Platform from Scratch: Live Coding Session // Alon Gubkin // MLOps Meetup #67

  • Published: Aug 4, 2024
  • MLOps community tutorial on how to build an MLOps platform. MLOps Community Meetup #67! Last Wednesday we talked to Alon Gubkin, VP R&D at Aporia.
    //Abstract
    This workshop will teach you how to set up an ML platform based on open-source tools like Cookiecutter, DVC, MLFlow, FastAPI, Pulumi, GitHub Actions, and more.
    Alon explains each tool in an intuitive way and the problem it solves and then builds a useful platform that combines all of them.
    All the code is available on GitHub, so you can easily integrate it into your existing work.
    In this live coding session, Alon demonstrates how you can easily build your own custom ML Platform using open source tools such as Pulumi, MLFlow, KFServing, and more.
    All the code is available here:
    github.com/aporia-ai/mlplatfo...
    //Bio
    Alon Gubkin is the VP R&D at Aporia. Writing code from the age of 7, Alon has an unhealthy obsession with programming and a huge love for mixing engineering with data science. When he isn't coding, Alon enjoys playing video games and musing about biology.
    -------------- ✌️Connect With Us ✌️ ------------
    Join our Slack community: go.mlops.community/slack
    Follow us on Twitter: @mlopscommunity
    Sign up for the next meetup: go.mlops.community/register
    Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: mlops.community/
    Connect with Demetrios on LinkedIn: / dpbrinkm
    Connect with Alon on LinkedIn: / alongubkin
    Timestamps:
    [00:00] Introduction to Alon Gubkin
    [03:10] Introduction to Building an ML Platform
    [04:33] Blueprint for an ML Platform
    [05:26] What are we going to build today?
    [06:25] "It's really important not to take the tools for granted and evaluate the alternatives. It's important to evaluate your problems and your needs to choose the best solutions and tools for the job."
    [07:05] MLOps.toys
    [07:40] On Kubernetes...
    [10:50] AWS services we're going to use
    [12:35] Architecture of what we will build
    [17:06] Infrastructure
    [18:00] Infrastructure as Code - Pulumi
    [20:30] Let's get started!
    [21:14] ML-Infra
    [22:53] Pre-requisites for AWS
    [27:20] Model Metadata - Pulumi
    [32:15] Alon's favorite plug-ins for code editor
    [33:20] More than one cluster via Pulumi and auto-scaling
    [36:30] Creating Artifact Storage
    [37:52] Installing MLFlow
    [40:25] Connecting MLFlow to Artifact Storage
    [45:36] Configuring service account
    [48:00] Installing Traefik Helm Chart
    [49:00] Traefik Route
    [50:06] MLFlow Route
    [50:36] Route 53
    [53:44] Pulumi keeping track of the current state
    [55:05] Recap
    [55:50] Model template that can be cloned
    [56:55] Python
    [57:00] Poetry
    [58:17] Poetry supporting dependencies
    [1:02:21] Copy the training code data sets
    [1:02:49] Divide the model package into 2 packages - Training Package/Code and Serving Package/Code
    [1:03:09] "When creating the model packages, it's important to think of the problems and consider the best structure for the models."
    [1:04:04] Creating Scripts
    [1:04:55] Integrating MLFlow into the training code
    [1:05:20] Enabling autologging
    [1:06:05] MLFlow Run
    [1:07:02] Train the Model
    [1:08:07] Recap
    [1:10:08] Serving the Model
    [1:12:08] Uvicorn
    [1:12:38] Load model from MLFlow
    [1:15:20] Prediction
    [1:15:56] Probability
    [1:16:32] Write the processing code
    [1:17:23] Script for Serving
    [1:17:46] Run the Serving Code
    [1:18:16] Testing
    [1:19:41] Recap
    [1:21:17] Creating CI/CD pipeline
    [1:22:00] 3 Commands - Install Dependencies, Train the Model, Deploy the Model Using Pulumi
    [1:24:45] Pulumi Infrastructure for the Model Template
    [1:25:54] Deploy the FastAPI service with Pulumi
    [1:26:51] Build and Push Docker Image
    [1:27:30] Connect to Kubernetes
    [1:28:17] Reference to the previous shared infrastructure
    [1:28:55] Connect Kubernetes to MLInfra - Export Kubeconfig
    [1:29:58] Triton inference server
    [1:32:34] Output of Pulumi
    [1:34:20] Traefik Route
    [1:35:38] Recap
    [1:36:29] Create Service Account for models
    [1:37:27] Github Action
    [1:38:52] Commit then Deploy to master branch
    [1:40:38] Recap
    [1:40:58] Use DVC to create storage for data versioning
    [1:44:24] Export data storage
    [1:45:00] Deploy model to Kubernetes
    [1:46:16] Compile docker image
    [1:46:45] Add remote to DVC
    [1:47:27] Add data set to DVC
    [1:49:05] Kubernetes deployment
    [1:50:19] Convert model to a Cookiecutter template
    [1:53:50] What's missing?
    [1:54:54] Use of DVC with MLFlow
    [1:55:23] "The use of DVC with MLFlow depends on your needs and what your problems are."
    [1:56:12] "Don't be afraid to use a tool and replace it to something else. It happens a lot."
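
The three CI/CD commands listed at [1:22:00] (install dependencies, train the model, deploy the model using Pulumi) could be chained roughly as in the sketch below. This is only an illustration, not the code from the session: the Poetry script name `train` and the assumption that the Pulumi project lives in the current directory are hypothetical.

```python
import shlex
import subprocess
from typing import List, Sequence

# Hypothetical commands: the script name "train" and a Pulumi project in the
# current working directory are assumptions, not taken from the session's repo.
PIPELINE_STEPS: List[List[str]] = [
    ["poetry", "install"],       # 1. install dependencies
    ["poetry", "run", "train"],  # 2. train the model (assumed Poetry script)
    ["pulumi", "up", "--yes"],   # 3. deploy the model using Pulumi
]


def run_pipeline(
    steps: Sequence[Sequence[str]] = PIPELINE_STEPS,
    dry_run: bool = True,
) -> List[str]:
    """Process each step in order and return the rendered command lines.

    With dry_run=True the commands are only rendered, not executed.
    """
    rendered = []
    for step in steps:
        rendered.append(shlex.join(step))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps never run after a failed one.
            subprocess.run(list(step), check=True)
    return rendered


if __name__ == "__main__":
    for command in run_pipeline(dry_run=True):
        print(command)
```

Running with `dry_run=True` only prints the commands, which is a cheap way to check the order before wiring the same three steps into a GitHub Actions workflow.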

Comments • 18

  • @animadurkar179 1 year ago +1

    Oh man, this is exactly what I was looking for recently. Awesome walkthrough!

  • @eitkocat 3 years ago +3

    This is great!

  • @polarstar3498 2 years ago +4

    Awesome platform! And nice solution for infrastructure!

    • @MLOps 2 years ago

      Yesss!

  • @francismumbi49 5 months ago +1

    The intro was awesome....

    • @MLOps 5 months ago

      yaaaaaah!

  • @prabhakarray3925 2 years ago +1

    Great workshop! Thank you for arranging this. I was wondering whether the reason behind the 502 Bad Gateway error (1:49:56) was ever narrowed down. I am stuck at that point.

  • @SergioTobal 3 years ago +3

    Very nice indeed. Is it possible to arrange a second part to build the missing components?

    • @MLOps 3 years ago +7

      Yep, we are on it! Happening on August 17th!

    • @vulnerablegrowth3774 3 years ago

      @MLOps Can't wait!

  • @phox8047 2 years ago +2

    Great video, and thanks a lot. I followed along and ran the code, and I have two questions.
    1. How do I find my Zone ID? In the video he just pastes an ID, but I couldn't catch where to look it up.
    2. I'm on Windows and installed Poetry following the docs, but the poetry command doesn't work in cmd.

    • @alongubkin 2 years ago +1

      You're welcome! :)
      1. The Zone ID can be copied from the Route53 service in AWS console. Note that this is only relevant if you actually use Route53 as your DNS provider.
      2. Can you elaborate?

  • @cryptohorizon5970 1 year ago

    How do we test if MLFlow has been installed without using a DNS server?

  • @shabiehsaeed8633 1 year ago

    Since AWS is not allowing Kubernetes 1.21, some of the code is breaking. I think we need to change the traefik helm chart to reflect the incompatibility of the graph. I was able to run this with V1.21, but not any longer with v1.24. If anyone figures out what changes are needed to get this to run again, can you please let me know?

    • @shabiehsaeed8633 1 year ago +1

      I have added the resolution for the above as a response to the issue "Unable to deploy" on the GitHub repo. Please take a look if you're facing similar issues.

  • @liucharles2973 2 years ago +1

    Can you share the slides?

    • @alongubkin 2 years ago

      Will do :) In the meantime, check out the blog post version of this workshop: www.aporia.com/blog/building-an-ml-platform-from-scratch/

    • @liucharles2973 2 years ago

      @alongubkin thx! excellent work.