Introducing Amazon SageMaker Pipelines - AWS re:Invent 2020

  • Published: 5 Oct 2024

Comments • 77

  • @PaulPrae
    @PaulPrae 3 years ago +1

    So excited about all of these features!

  • @fatihbicer7353
    @fatihbicer7353 4 months ago

    Thank you Julien.

  • @poojankothari2440
    @poojankothari2440 3 years ago +1

    Julien, thank you for providing good content. It would be very helpful if you could provide some insights on model registration and on linking the project with custom Git repos. Kudos!!

    • @juliensimonfr
      @juliensimonfr  3 years ago

      Thanks Poojan. You can build your own custom templates with your own repos. See docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-templates.html

  • @caiyu538
    @caiyu538 1 year ago

    Great lectures. Great teacher.

  • @sgaseretto
    @sgaseretto 3 years ago +1

    What if I have a SaaS with multiple tenants and a similar model per tenant (for example, a model that performs client segmentation for each tenant), and I want to do continuous training and deployment for these models? How can this be achieved, since at minute 16:47 you state that this can't be done with SageMaker Studio?

  • @carbita1
    @carbita1 3 years ago +2

    Hi Julien, as a data engineer I find it difficult to test PySpark workflows without a Jupyter notebook. Is there any way to "replace" the common AWS Glue workflows by calling Jupyter notebooks? Thanks in advance.

  • @joaosalero9797
    @joaosalero9797 2 years ago

    Thanks for sharing it!

  • @herleyshaori
    @herleyshaori 1 year ago

    Thank you for the video.

  • @gaboceron100
    @gaboceron100 1 year ago

    Thank you! Very instructive.

  • @ZackJW91
    @ZackJW91 3 years ago

    Interesting to see how TFX will integrate with this.

  • @Ramyavenkat-r3y
    @Ramyavenkat-r3y 1 year ago

    How do I retrieve a built-in SageMaker image URI? Kindly help me with the command.

  • @abhijeetkabra8525
    @abhijeetkabra8525 1 year ago

    Hello Julien. Great video. I followed your steps to create a project and create pipelines and an endpoint. Can you please answer some questions?
    1. We already have a developed training model that we want to use in SageMaker Pipelines and then deploy to create endpoints. How do we do that?
    2. Also, are there IAM roles and policies involved in working with SageMaker Pipelines?
    3. We have a notebook containing the training code used to train the model, but the problem is that when a new user or team member comes in, he isn't able to see the whole code; he has to download the whole code offline and upload it back to the notebook. Is there a way we can collaborate like we do with a Git or Azure DevOps repo?

  • @AnkitSingh-rv2dq
    @AnkitSingh-rv2dq 2 years ago

    Hi Julien, I got an error in preprocessing script. Can you please confirm that the script is correct?

  • @priteshjain0310
    @priteshjain0310 3 years ago

    Hi Julien, it's not clear how I should do inference on this. I have a custom processing container, and then I train a TF model. Is it possible to have these two combined for inference? I want to give the S3 location of raw data during inference, have it go through processing, and then predict on it. Can you please let me know if this is possible and how to go about it?

  • @SambitTripathy
    @SambitTripathy 3 years ago

    Great content Julien!

  • @denzilstudios7072
    @denzilstudios7072 2 years ago

    Thank you Julien; I also love your Packt book a lot. Question: for our startup we want to set up this SageMaker pipeline for dev and prod in separate accounts. Where can I find guidelines on how to set this up?

    • @juliensimonfr
      @juliensimonfr  2 years ago +1

      Thanks Denzil! Here's a nice multi-account example: aws.amazon.com/blogs/machine-learning/multi-account-model-deployment-with-amazon-sagemaker-pipelines/

  • @narijami
    @narijami 3 years ago

    I have another request: may I ask you to please make a video on how to join several tables, including some aggregation functions, in SQL? I want to join 3 different tables that live in 2 different schemas in Redshift. The output of two joined tables will have some aggregation functions in its SQL query. Since the schemas of the two tables are different, I cannot write the SQL query directly in Data Wrangler. It would be great if you could help.

  • @sivaprasanth5961
    @sivaprasanth5961 2 years ago

    It's a really awesome session, Julien. I have one doubt: if I want to send 70% of requests to Version 1 and 30% to Version 2, how can I do that?

    • @juliensimonfr
      @juliensimonfr  2 years ago

      You can deploy multiple variants on the same endpoint: docs.aws.amazon.com/sagemaker/latest/dg/model-ab-testing.html

  • @11eagleye
    @11eagleye 3 years ago

    Hi Julien, where are the models getting deployed? I am not seeing any container or Docker details in the demo.

    • @juliensimonfr
      @juliensimonfr  3 years ago +1

      They're deployed on SageMaker endpoints, as usual. It all takes place in the CloudFormation template stored in the 'model-deploy' repository.

  • @samnman1
    @samnman1 2 years ago

    Hi Julien, really appreciate the explanation. Could you do a video, or point to a demo, showing how to use SageMaker Pipelines for scheduled batch jobs? Say I have a 10 GB dataset loaded into S3 every day; how can I schedule a pipeline to transform and run inference on it?

    • @juliensimonfr
      @juliensimonfr  2 years ago +1

      You can easily run batch transform in your pipelines, see docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-transform. You can also schedule execution with a Lambda function firing up your pipeline, see docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html

  • @fantasyapart787
    @fantasyapart787 2 months ago

    I could see that this video was uploaded 3 years back. Is it still valid? I mean the features and navigation.

    • @juliensimonfr
      @juliensimonfr  2 months ago

      The UI has changed, the SDK is probably still very very similar.

  • @anubhabjoardar9321
    @anubhabjoardar9321 2 years ago

    Hello Julien, thank you for the video and the channel! Makes understanding AWS SageMaker easier for newbies like me :) I wanted to ask if there is a way to list all resources/components (Models, endpoints, TrainingJobs, ProcessingJobs etc) associated with an (or created in an) AWS SageMaker Notebook/ Studio Project? Thanks a lot for any information on this task!

    • @juliensimonfr
      @juliensimonfr  2 years ago

      Thank you! This is a really good question, and the answer is "kind of". You can track model lineage and see all artifacts that led to a particular model, see docs.aws.amazon.com/sagemaker/latest/dg/lineage-tracking.html, but th

  • @samuelmathias794
    @samuelmathias794 2 years ago

    Hello Julien, I would like to ask a question. I'm a bit new to SageMaker and its functionalities. How would one go about creating their own project template, assuming I want to start a new project? Or do I modify the existing abalone template to suit my needs?

    • @juliensimonfr
      @juliensimonfr  2 years ago +1

      Here's an example aws.amazon.com/fr/blogs/machine-learning/build-mlops-workflows-with-amazon-sagemaker-projects-gitlab-and-gitlab-pipelines/

    • @samuelmathias794
      @samuelmathias794 2 years ago

      @@juliensimonfr thank you

  • @vickyshrestha
    @vickyshrestha 3 years ago

    Hi Julien, can we Terraform the SageMaker pipelines?

    • @juliensimonfr
      @juliensimonfr  3 years ago +1

      Hi Vicky, according to registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sagemaker_model, it's not supported yet.

  • @dampeel2000
    @dampeel2000 3 years ago

    Bonjour Julien, thanks for the video! I would be interested in the final step, the one where you actually run an inference against the endpoint. I don't see this in the demo. In particular, I'm curious to know how you can propagate the fitted preprocessing "model" (for instance, the one-hot encoder) to the model hosted on the endpoint. Thank you very much for any information on this step!
    Have a great day

    • @juliensimonfr
      @juliensimonfr  3 years ago

      Hi Damien, regarding preprocessing, you would have to apply it to the data sent to the endpoint. A clever way to do this is to use an Inference Pipeline, i.e. a sequence of models invoked as a single unit. Here's an example: github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.ipynb

    • @dampeel2000
      @dampeel2000 3 years ago +1

      @@juliensimonfr Thanks Julien !

  • @juliocardenas4485
    @juliocardenas4485 2 years ago

    My layout for Studio is quite different; the launcher only shows notebooks. Also, I do not get the add-on with the triangle on the left.

    • @juliensimonfr
      @juliensimonfr  2 years ago

      The Studio UI frequently changes, sometimes for the better ;)

    • @juliocardenas4485
      @juliocardenas4485 2 years ago

      @@juliensimonfr thank you. I figured out what I was doing wrong 😑. I need to launch the app rather than just the notebooks

  • @bhujithmadav1481
    @bhujithmadav1481 11 months ago

    Hello Julien. Thanks for your videos; they were helpful. I have a requirement: I want to create training jobs within a SageMaker pipeline. How do I achieve this?

    • @juliensimonfr
      @juliensimonfr  11 months ago

      sagemaker.workflow.steps.TrainingStep ?

    • @bhujithmadav1481
      @bhujithmadav1481 11 months ago

      @@juliensimonfr Thanks for the reply. In the video you used a scikit-learn estimator to train. I will have to create a training job. My doubt is how to integrate training jobs within the pipeline. Please guide.

  • @Flopyboy
    @Flopyboy 2 years ago

    Hi Julien, I'm trying to create a pipeline and I'm experiencing significant overhead for each individual step (~10 min). Is there any way to test individual steps without running the entire pipeline and having to wait for earlier steps to finish?

    • @juliensimonfr
      @juliensimonfr  2 years ago +1

      Not that I know of. I guess you could test each step in its own mini-pipeline if you have all the intermediate artifacts, and then put them together?

    • @Flopyboy
      @Flopyboy 2 years ago

      @@juliensimonfr Thanks for responding! Sagemaker recently made it possible to execute pipelines in local mode which almost eliminates the overhead I was experiencing :)

  • @vinayakdhruv6457
    @vinayakdhruv6457 2 years ago

    How do I trigger this complete pipeline using Lambdas or cron jobs? Is there any such option?

    • @juliensimonfr
      @juliensimonfr  2 years ago

      docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html

  • @narijami
    @narijami 3 years ago

    Hi Julien. Thanks for the explanation. I work at a company in Germany where we use AWS tools. My question: I have to run millions of rows daily through SQL queries from Redshift, but in SageMaker I have a memory limitation. Is it possible to make this easier with SageMaker Pipelines?

    • @juliensimonfr
      @juliensimonfr  3 years ago +1

      SageMaker Processing is probably what you're looking for. It's easy to automate and you can pick very large instances. Of course you just pay for the duration of the job.

    • @narijami
      @narijami 3 years ago

      @@juliensimonfr Thanks for your reply. I am currently working on a normal SageMaker instance. I am running an SQL query with some joins and aggregation functions, reading some very large tables from Redshift. The query takes very long if I fetch data for a period of more than 6 days. I heard that in Data Wrangler it is possible to speed up importing tables. Would that also be the case for joined tables like mine?
      Thanks in advance

  • @sumeshmr9130
    @sumeshmr9130 3 years ago

    Hello Julien, Can we use the AWS Step Functions Data Science SDK along with the Pipeline? Or are these two different things?

    • @juliensimonfr
      @juliensimonfr  3 years ago

      Hi Sumesh, SageMaker Pipelines has two sides: 1) a Python SDK to build ML workflows (similar to the Data Science SDK) and 2) an MLOps capability based on CodePipeline. I think the integration with SageMaker Studio is really interesting, and a more productive option than the Data Science SDK.

    • @sumeshmr9130
      @sumeshmr9130 3 years ago

      @@juliensimonfr Is there a document/link with details on creating a custom project template (organization template)? In case I wanted to call a Lambda function or Glue job as a workflow step in the pipeline, do you think I would be able to customize it using this?

    • @Koningbob
      @Koningbob 3 years ago

      Hi Julien, thanks for this clear demo. My team uses GitLab for CI/CD; would this be a possibility instead of CodePipeline? Thx

  • @elmirach4706
    @elmirach4706 3 years ago

    Hello, how do I customize the abalone pipeline into a custom model pipeline?

    • @juliensimonfr
      @juliensimonfr  3 years ago +1

      docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects.html :)

  • @lucieackley7432
    @lucieackley7432 3 years ago

    Thanks for the video, Julien; it gave a very good overview. I am wondering if there is a good way to learn more about the deploy step. Additionally, I have a model that I want to retrain daily, as we get new data daily. What is the best pattern for this?

    • @juliensimonfr
      @juliensimonfr  3 years ago

      Thanks Lucie. You can deploy the "usual" way by grabbing the model in S3 and creating an endpoint. For full automation, you can use MLOps as described in docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough.html, but this requires diving a bit into CloudFormation. We're covering this topic in SageMaker Fridays S03E04, so make sure to catch that episode at amazonsagemakerfridays.splashthat.com/ :)

  • @MrChristian331
    @MrChristian331 3 years ago

    Where does the model deploy to when you approve it?

    • @juliensimonfr
      @juliensimonfr  3 years ago +1

      A SageMaker endpoint, configured in the template.

    • @MrChristian331
      @MrChristian331 3 years ago

      @@juliensimonfr Sorry, it's been a while; what is the endpoint? Just part of the container?

    • @kanishkmair2920
      @kanishkmair2920 3 years ago

      @@MrChristian331 the model endpoint exposes a URL for inference once the model is trained

  • @JulianaPassos
    @JulianaPassos 3 years ago

    Salut Julien! I see the pipeline templates are not available for us-east-1 in SageMaker Studio (only us-west-1). Is there a reason for that? Any chance they could be made available for N. Virginia? Thanks for the tutorial :-) It came in handy with a project delivery.

    • @juliensimonfr
      @juliensimonfr  3 years ago +2

      They should be available there. Please make sure that your Studio user has the appropriate permissions. There's a slider setting in the user details ("Enable SageMaker Projects").

  • @dasgupta0885
    @dasgupta0885 3 years ago

    Can you please provide the GitHub link to the Python notebook? Thanks!

    • @juliensimonfr
      @juliensimonfr  3 years ago +1

      If you're only interested in the Python SDK, this one is very close: github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-pipelines. If you're interested in the full example with MLOps support, it's part of the repos I clone in the video.

    • @dasgupta0885
      @dasgupta0885 3 years ago

      @@juliensimonfr Yes, I am interested in the SDK, so this is perfect. Thanks a bunch!!