How to create a Kubeflow pipeline from scratch | Live Demo | Machine Learning | Ashutosh Tripathi

  • Published: 17 Oct 2024
    End-to-end Jupyter notebook walkthrough for building and executing a Kubeflow pipeline
    Topics covered:
    1. Python functions needed to train and predict
    2. Creating components from Python functions
    3. Initializing the Kubeflow pipeline
    4. Defining the pipeline function and putting all the components together
    5. Mounting a volume for the components' output storage
    6. Compiling the pipeline and generating YAML, which can be uploaded directly to Kubeflow to create experiments and runs from the UI
    7. Creating a run from the pipeline function in code
    8. How to disable caching so you can see each step's output on second and subsequent runs
    Notebook link:
    github.dev/Tri...
    Part 2: Passing CSV files between Kubeflow components: • How to pass csv and da...
    If you find this video helpful, don't forget to like, share, and subscribe. This is how you can support me.
    Connect with me:
    LinkedIn: / ashutoshtripathiai
    Instagram: / ashutoshtripathi_ai
    Twitter: / ashutosh_ai
    Website: ashutoshtripat...
    If you want to message me directly, connect with me on LinkedIn and send a DM.
    #machinelearning #kubeflow #mlops

Comments • 119

  • @AshutoshTripathi_AI · 1 year ago · +3

    Video on installing Kubeflow Pipelines on Windows:
    ruclips.net/video/LSvvIt2m1Jo/видео.html

  • @akshaykotawar5816 · 7 months ago · +1

    Thank you sir, I have been looking for this topic for a very long time.

  • @BIZSURESH · 1 year ago · +2

    EXCELLENT ..YOUR TUTORIAL IS VERY HELPFUL FOR LEARNING ABOUT MLOPS.......BRO.....👌👌🙌🙌🙌🙌🙏🙏🙏🙏🙏

  • @kanakorn · 6 months ago

    Great job, I was able to run my first pipeline with this tutorial. Thanks.

  • @pradipkarad6837 · 1 year ago · +2

    Thanks @AshutoshTripathi_AI! Your content is very exciting and full of knowledge. Can you please make a video on running the full set of Kubeflow components locally?

  • @praveenkuthuru7439 · 2 months ago

    Your work is really impressive. I have been following your videos and gaining a lot of knowledge. Excellent work... keep it up!!!

  • @mbmathematicsacademic7038 · 1 month ago · +1

    Walking into the new week with MLOps as a new skill in my skill set 😎

  • @MsRAJDIP · 1 year ago · +1

    Your way of explaining is really good.😊

  • @AmitYadav-ig8yt · 1 year ago · +2

    Thanks a lot, brother. One of the best videos on this concept. Could you please do the same steps on GCP?

  • @KSANTOSHKUMAR-ge5xr · 1 year ago · +2

    Excellent tutorial... Please make a video on Kubeflow installation.

    • @AshutoshTripathi_AI · 1 year ago · +1

      Here is the video link on installing Kubeflow locally on Windows: ruclips.net/video/LSvvIt2m1Jo/видео.html

  • @sajadsafarveisi4512 · 1 year ago · +1

    Thanks a lot for the tutorial (this one turned on my engine). One question: what if we want to create a component not from a function but from an instance of a custom resource? Assume the instance kind is SparkApplication (with the associated operator already created in some namespace).

    • @Veer1516 · 10 months ago

      If you have something in a Spark app, why not just create a Spark pipeline?
      I'm actually asking; I want to know the scenario in which you would use both.

  • @tushitdave9795 · 1 year ago

    Good one, thanks. However, can you tell me about the kfp module? I have installed Kubeflow in my base environment, but when I open a notebook and import kfp, it is not recognized. I did try pip install kfp and kubeflow, both in my Jupyter notebook. Please shed some light on this.

  • @mdowais4322 · 4 months ago

    Hi Ashutosh, thanks for the masterpiece video. Can you help me understand the storage? I want to use PostgreSQL or another relational database; how can I interact with a relational database?

  • @nissarahmad8545 · 1 year ago · +1

    Nicely explained E2E flow

  • @RAKESHKUMARSINGH-tp7mk · 1 year ago · +1

    Great way to get introduced to Kubeflow Pipelines.
    Where can I get the source code for the example you demonstrated? Kindly let us know. I would like to try it on my Kubeflow deployment.

    • @AshutoshTripathi_AI · 1 year ago · +1

      Hi, I have updated the video description with the notebook link. Use the link to download the kf-pipeline notebook. Let me know if you have any difficulty downloading it.

  • @Dr.SureshPanchal · 10 days ago

    Do we need Kubernetes preinstalled?

  • @shivaprasad1277 · 1 year ago · +1

    Hi @Ashutosh. Every time I run the pipeline in Kubeflow, I get the log message "This step output is taken from cache." Can you please help me?

    • @AshutoshTripathi_AI · 1 year ago

      You need to disable caching for the step while creating the pipeline (KFP SDK v1):
      def some_pipeline():
          # task is the target step in the pipeline
          task = some_op()
          # "P0D" (zero days) means a cached result is never reused, so the step always re-runs
          task.execution_options.caching_strategy.max_cache_staleness = "P0D"

    • @AshutoshTripathi_AI · 1 year ago

      You can also refer to this document:
      www.kubeflow.org/docs/components/pipelines/v1/overview/caching/

  • @astrovedics · 1 year ago · +1

    Hello, I am new to this whole data science concept, so my questions may be silly. Can I set up a model registry and model tracking UI on JFrog Artifactory?

    • @AshutoshTripathi_AI · 1 year ago

      As far as I know, JFrog is a repository manager where you can store Docker images and handle CI/CD. But I'm not sure we can use it as a model registry. As far as I know it is not used for model registry and tracking purposes, but that needs to be double-checked.

  • @keerthigavenkatesh3806 · 1 year ago

    Can you please make a video on how you manage data (for an image dataset) in the bucket and access it from the program and Kubeflow, please!

  • @RaushanKumar-ut2ke · 1 year ago · +1

    Hi Ashutosh, you are reading the CSV file from Git. But when I try to read it from a local directory, I get a "no such directory" error. I am using Xeroflow for this. Is there a different way to read from a local directory?

    • @AshutoshTripathi_AI · 1 year ago

      Actually, while the pipeline is running, your local directory is not accessible from inside the pod. So just keep the CSV in some online repository and read it from there.

  • @Sam-nn3en · 1 year ago

    Hello, in terms of comparison, which did you find better to use: Kubeflow or MLflow? It seemed like Kubeflow was hanging and using extra resources. We haven't done heavy pipeline runs and were curious to know.

    • @AshutoshTripathi_AI · 1 year ago · +1

      I used Kubeflow for pipeline creation and MLflow for the model registry. Kubeflow provides a registry with MinIO, but MLflow seems more user-friendly and feature-rich.

    • @Sam-nn3en · 1 year ago · +1

      @AshutoshTripathi_AI Thank you for sharing. Yes, that was very relevant to the other MLflow video you made. It does model serving with the registry very nicely.

  • @camiloperez2376 · 10 months ago

    Thanks for sharing! Where is the 'IRIS_Classifier_pipeline1.yaml' file for download?

  • @adilshaikh9123 · 1 year ago

    Sir, as of now I have created the MLflow UI, which logs all the metrics and artifacts exactly as shown in your previous MLflow video. Separately, I have written Kubeflow pipeline code as in this video, and my pipeline is created successfully. But how can I integrate MLflow as part of Kubeflow, since both are separate right now???

  • @datasciencewitharbaaz5221 · 1 year ago · +1

    Hello sir, very nice explanation. I have one doubt: can't we use .py files rather than .ipynb files? I have an entire project with different functionalities depending on the dataset.

    • @AshutoshTripathi_AI · 1 year ago · +2

      Yes, you can use a .py file. Even in an .ipynb file, every chunk can be considered a separate .py file.

    • @datasciencewitharbaaz5221 · 1 year ago · +1

      @AshutoshTripathi_AI Can we do model versioning in Kubeflow? If yes, how, sir? Can you give an idea or any possible solution?

    • @mateopolancec8478 · 1 year ago

      @datasciencewitharbaaz5221 Use MLflow for that; look at my previous answer on how to use MLflow with Kubeflow Pipelines.

  • @ShailendraMishra26 · 1 year ago · +1

    Hi Ashutosh,
    This video was very helpful. I am stuck on one point; please help.
    What is the process if we want to execute a task after multiple tasks have executed? Is there any option in the .after method to add more tasks? Any help would be greatly appreciated.

    • @AshutoshTripathi_AI · 1 year ago

      Do you mean running tasks sequentially?

    • @ShailendraMishra26 · 1 year ago

      Yes

    • @ShailendraMishra26 · 1 year ago

      @AshutoshTripathi_AI Could you please help with the question above?

    • @AshutoshTripathi_AI · 1 year ago

      @ShailendraMishra26 Hi Shailendra, I replied to your question above. I did not understand exactly what you mean. Do you mean running your tasks sequentially, one after another? For example, if task 2 depends on the output of task 1, then task 2 should wait for task 1 to finish. Is that what you are expecting?

    • @ShailendraMishra26 · 1 year ago · +1

      @AshutoshTripathi_AI Yes, I want to run them sequentially. My question is: I have 3 tasks, and the third should execute once the other two have executed. The outputs of those two tasks are required to run the third one. I want to check whether there is any way to pass multiple output parameters in the after method.

  • @purvijain-j1g · 1 year ago

    Hi, thanks for the video.
    However, I am not able to execute the code because the pipeline cannot access the data file. I have tried giving an absolute path as well, but no luck. Can you help me?

  • @madhavilatha716 · 9 months ago

    This code no longer works with the latest version, 2.4.0. Any help?

  • @reddyvarinaresh7924 · 1 year ago · +1

    Nice, Ashutosh!

  • @yasshhh-y1u · 1 year ago

    Hi Ashutosh, thanks for your session, but for me, when I started the pipeline, t-vol shows: this step is in a pending state with the message ContainerCreating.

  • @kirancrazy393 · 10 months ago

    I was trying to replicate your code but am getting this error: AttributeError: module 'kfp.components' has no attribute 'create_component_from_fun'. My kfp version is 2.4.0.
    How do I fix this?

    • @AshutoshTripathi_AI · 10 months ago

      In this case, please refer to the official Kubeflow documentation for version 2.4.0 to see whether they have changed the method name.

  • @jilanikashif · 1 year ago

    Thanks for sharing valuable information; I had been looking for a Kubeflow tutorial for a long time. One thing I am not clear on is how to set up the dashboard for Kubeflow.

    • @AshutoshTripathi_AI · 1 year ago

      OK. So do you mean the central dashboard for Kubeflow, where we see all the Kubeflow components like notebook servers, volumes, experiments, contributors, etc.?
      If yes, then you need to deploy the complete Kubeflow on a Kubernetes cluster.
      It requires a lot of memory, which is why I set up only Kubeflow Pipelines locally; that covers the main work for data scientists.

    • @jilanikashif · 1 year ago

      @AshutoshTripathi_AI How can we set it up locally? I have followed the tutorial and created the YAML file. Now I am stuck uploading the YAML file locally and viewing the pipeline.

    • @jilanikashif · 1 year ago

      @AshutoshTripathi_AI Please help with installing it locally and viewing the pipeline.

    • @AshutoshTripathi_AI · 1 year ago

      @jilanikashif OK. I will create a video on installation soon, but until then you can follow the steps below to install Kubeflow Pipelines locally:
      1. Install Docker Desktop.
      2. Install minikube: search Google for "minikube installation", open the official site, and follow the steps there.
      3. Start minikube using the minikube start command.
      4. Search Google for "Kubeflow Pipelines local installation", open the Kubeflow page, and scroll down. There you will find two commands that you need to execute, followed by the port forwarding.
      5. Once you have done all of this, Kubeflow Pipelines will be installed locally.

    • @jilanikashif · 1 year ago · +1

      @AshutoshTripathi_AI Thanks for replying and sharing your knowledge. I have followed the steps up to minikube start and it is working; however, the Kubeflow Pipelines installation has not worked. Could you please share the page that shows the commands to set it up locally and do the port forwarding?

  • @chandrashekhartiwari508 · 1 year ago

    Hi sir, can we use both MLflow and Kubeflow in a project?

    • @AshutoshTripathi_AI · 1 year ago

      Kubeflow has its own artifact registry, which uses MinIO for storage.
      However, if you want to use MLflow with Kubeflow, then you have to integrate MLflow with Kubeflow.
      You can use Terraform to do this. I have not done this, as it mainly needs DevOps knowledge. Please refer to the Kubeflow documentation; they have some documents you can refer to.

  • @geetatripathi9335 · 1 year ago · +1

    Good 👍

  • @placementandjobs4102 · 1 year ago

    Sir, if a component fails in a Kubeflow pipeline, how can I skip it and start the next component? For example, I have 3 components: a, b, c.
    b fails, and I want to run c whether b fails or succeeds. How do I achieve this? When b fails, execution does not move on to the next component c, so how can we do it?

    • @AshutoshTripathi_AI · 1 year ago

      If components depend on others, they have to run sequentially; otherwise they run in parallel without depending on each other.
      For sequential execution, you can't skip.

  • @ramanjulubodisetty3665 · 1 year ago

    Hi Ashutosh, I am getting an error while running the Kubeflow pipeline... It says there is no such file or directory path. Without using an S3 bucket, do you have any suggestion for reading the dataset, please?

    • @AshutoshTripathi_AI · 1 year ago · +1

      You can read it from a GitHub repository, a GCS (gs://) bucket, etc.

    • @ramanjulubodisetty3665 · 1 year ago

      Sir, I want to become an MLOps expert. Can you please suggest a good course or institute?

  • @sumitchauhan8245 · 1 year ago · +1

    How can I find the session cookie? Could you please share the steps to get it? Thanks.

    • @AshutoshTripathi_AI · 1 year ago

      In the browser, right-click and choose the Inspect option. Then open the Application tab; on the left side you will see the Cookies option. Expand it and you will see the URL. Click the URL, and in the body section you will find auth_session. Copy that long string: it is your browser's auth session cookie ID.

    • @sumitchauhan8245 · 1 year ago

      I did the same thing, but after running my pipeline on the server I am getting this error:
          702 # Make the request on the httplib connection object.
      --> 703 httplib_response = self._make_request(
          704     conn,
          705     method,
          706     url,
          707     timeout=timeout_obj,
          708     body=body,
          709     headers=headers,
          710     chunked=chunked,
          711 )
          713 # If we're going to release the connection in finally:, then
          714 # the response doesn't need to know about the connection. Otherwise
          715 # it will also try to release it and we'll have a double-release
          716 # mess.

  • @sumitchauhan8245 · 1 year ago · +1

    What should the namespace parameter be? The notebook name??

    • @AshutoshTripathi_AI · 1 year ago

      No, not the notebook name. By default the namespace is kubeflow. But if you are working on a server deployment, the ops team might have created multiple accounts for different users, so you need to check. If it appears in the URL, you can also find the namespace parameter there.
      Conceptually, Kubeflow is multi-tenant, so user accounts are segregated by namespace.
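
      Putting the last two answers together: on a multi-user Kubeflow deployment, both the browser session cookie and the profile namespace are passed to `kfp.Client`, whose `host`, `cookies` and `namespace` parameters take them. The helper below only builds those keyword arguments; it is a hypothetical convenience, not part of the kfp API. It assumes the Pipelines API is served under `<endpoint>/pipeline` and that the cookie is named `authservice_session`, the name Kubeflow's docs use for the oidc-authservice setup; the reply above calls it auth_session, so use whatever name your browser shows.

```python
def kfp_client_kwargs(
    endpoint: str,
    session_cookie: str,
    namespace: str = "kubeflow",
    cookie_name: str = "authservice_session",  # assumption: check your browser's dev tools
) -> dict:
    """Build keyword arguments for kfp.Client on a multi-user Kubeflow deployment.

    Assumes the Pipelines API is served under <endpoint>/pipeline.
    """
    return {
        "host": endpoint.rstrip("/") + "/pipeline",
        "cookies": f"{cookie_name}={session_cookie}",
        "namespace": namespace,  # your profile namespace, not the notebook name
    }


if __name__ == "__main__":
    kwargs = kfp_client_kwargs(
        "http://localhost:8080/",
        "<long-string-copied-from-dev-tools>",
        namespace="my-profile-namespace",  # hypothetical namespace
    )
    print(kwargs)
    # With kfp installed and the endpoint reachable:
    # import kfp
    # client = kfp.Client(**kwargs)
```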

  • @devanshumishra6430 · 1 year ago

    How do we integrate it with KServe?

  • @lug__aman · 1 year ago

    Brother, it's not working: module 'kfp.components' has no attribute 'create_component_from_func'

    • @AshutoshTripathi_AI · 1 year ago

      Please check the version you are installing. It may be that in the upgraded version they renamed it or replaced it with a new method.

    • @lug__aman · 1 year ago

      @AshutoshTripathi_AI I am using 2.0.1. Maybe some function names changed, but there is no up-to-date documentation out there. Is any current documentation available for this problem?? I checked the Kubeflow documentation, but it's not updated.
      And you are on version 1.8.18; I am not able to install that specific 1.8 version using pip.

    • @AshutoshTripathi_AI · 1 year ago

      @lug__aman github.com/kubeflow/pipelines/issues/7794#issuecomment-1164986300
      For KFP v2, the docs suggest using @component as a decorator. The function above is deprecated.

    • @AshutoshTripathi_AI · 1 year ago

      @lug__aman Please refer to this URL:
      www.kubeflow.org/docs/components/pipelines/v2/pipelines/pipeline-basics/

    • @lug__aman · 1 year ago

      @AshutoshTripathi_AI I followed the document. The docs already contain some test code with functions and pipelines defined. I copy-pasted all of it as a test and compiled it, which created a YAML file, and then simply uploaded that file in the Kubeflow UI, which is installed in a cluster.
      But I am getting this error:
      Cannot get MLMD object from meta store

  • @geetatripathi9335 · 4 months ago

    Very good beta

  • @kirancrazy393 · 10 months ago · +1

    Can I have your GitHub repo link, please?

  • @datasciencewitharbaaz5221 · 1 year ago · +1

    Why is it not creating visualizations for metrics such as the confusion matrix?

    • @AshutoshTripathi_AI · 1 year ago

      I have not covered visualization in Kubeflow Pipelines in this video. Are you getting any error?

    • @datasciencewitharbaaz5221 · 1 year ago

      @AshutoshTripathi_AI I went through the documentation but didn't find anything. I am not getting any visualizations; it says "No Visualizations generated, create manually." But it should create them automatically, right?

    • @AshutoshTripathi_AI · 1 year ago

      @datasciencewitharbaaz5221 I am not sure what you are doing to generate the visualization. Let me check the visualization part of Kubeflow Pipelines, and I will let you know how to generate and store it.

  • @satyam70 · 3 months ago

    Do you take any classes?

    • @AshutoshTripathi_AI · 3 months ago

      I used to. I stopped for a while due to other work.

  • @pankajjaiswal3907 · 9 months ago

    This code is outdated for the current version; there are many, many errors in it. Please update the code for the new version.

    • @AshutoshTripathi_AI · 2 months ago · +1

      Please refer to the official documentation for the changes in the newer version.

  • @vishalwaghmare3130 · 1 year ago

    What is @ds1?

    • @AshutoshTripathi_AI · 1 year ago

      Not sure. Have I mentioned it anywhere in the video? Let me know. Thanks.

    • @vishalwaghmare3130 · 1 year ago

      At 12:47.

    • @AshutoshTripathi_AI · 1 year ago · +1

      kfp.dsl contains the domain-specific language (DSL) that you can use to define and interact with pipelines and components.
      You can read about it here:
      www.kubeflow.org/docs/components/pipelines/v1/sdk/sdk-overview/

  • @keerthigavenkatesh3806 · 1 year ago

    I am facing the following error. Does anyone know how to solve it?
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    Cell In[5], line 1
    ----> 1 create_step_prepare_data = kfp.components.create_component_from_func(
          2     func=prepare_data,
          3     base_image='python:3.7',
          4     packages_to_install=['pandas==1.2.4','numpy==1.21.0']
          5 )
    AttributeError: module 'kfp.components' has no attribute 'create_component_from_func'

    • @AshutoshTripathi_AI · 1 year ago

      Just check which version of Kubeflow Pipelines you are using. In older versions it was not there. Try referring to the Kubeflow documentation.

    • @keerthigavenkatesh3806 · 1 year ago · +1

      @AshutoshTripathi_AI I was using the newer version, and now the error is resolved. Thanks a lot, Ashutosh!

    • @AshutoshTripathi_AI · 1 year ago

      @keerthigavenkatesh3806 Good to hear.

  • @pankajjaiswal3907 · 1 month ago

    You did not clear the data/ path.

  • @saadnajar2858 · 1 year ago

    First of all, thanks for the video. I have a problem while creating kfp.Client(); it prints: Failed to load kube config. MaxRetryError: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /apis/v1beta1/healthz (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

    • @adilshaikh9123 · 1 year ago

      Hey, did you find any solution? I'm also facing the same issue!!!
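
      The error shows the client fell back to `localhost:80` after failing to load a kube config, and nothing answered there. With a local install like the one described earlier in the thread, the UI service is typically exposed with `kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80`, and the client is then created with `kfp.Client(host="http://localhost:8080")`. A quick standard-library probe of the health endpoint named in the error can confirm the port-forward is up before creating the client; `is_kfp_reachable` is a hypothetical helper, and newer backends may serve a different API version path.

```python
import urllib.request


def is_kfp_reachable(host: str, timeout: float = 3.0) -> bool:
    """Return True if the KFP health endpoint under `host` answers with HTTP 200."""
    url = host.rstrip("/") + "/apis/v1beta1/healthz"  # path taken from the error message
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses:
        # connection refused, DNS failure, timeout, or a non-2xx status.
        return False


if __name__ == "__main__":
    host = "http://localhost:8080"  # assumed port-forward target
    if is_kfp_reachable(host):
        print(host, "is reachable")
    else:
        print(host, "is not reachable: is the port-forward running?")
```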

  • @placementandjobs4102 · 1 year ago · +1

    Sir, how do I add a Jupyter notebook in Kubeflow?

    • @AshutoshTripathi_AI · 1 year ago · +1

      For that you need to install the complete Kubeflow with all its components, which requires a lot of resources. So what I suggest is that you can still install Jupyter Notebook with Anaconda, use it to build the pipeline, and then connect to Kubeflow Pipelines as shown in the tutorial.