So excited about all of these features!
Me too :)
Thank you Julien.
You're welcome !
Julien, thank you for providing good content. It would be very helpful if you could share some insights on model registration and on linking the project with custom Git repos. Kudos!!
Thanks Poojan. You can build your own custom templates with your own repos. See docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-templates.html
Great lectures. Great teacher.
What if I have a multi-tenant SaaS with similar models per tenant (for example, a model that performs client segmentation for each tenant), and I want to do continuous training and deployment for these models? How can this be achieved, since at 16:47 you state that this can't be done with SageMaker Studio?
Hi Julien, as a data engineer it's difficult to test PySpark workflows without a Jupyter notebook. Is there any way to "replace" the common AWS Glue workflows by calling Jupyter notebooks? Thanks in advance.
Thanks for sharing it!
My pleasure!
Thank you for the video.
You're welcome
Thank you! Very instructive.
Glad you enjoyed it!
Interesting to see how TFX will integrate with this.
How do I retrieve the URI of a built-in SageMaker image? Kindly help me with the command.
Hello Julien. Great video. I followed your steps to create a project, pipelines, and an endpoint. Can you please answer some questions?
1. We already have a trained model that we want to use in SageMaker Pipelines and then deploy to create endpoints. How do we do that?
2. Are there IAM roles and policies involved in working with SageMaker Pipelines?
3. We have a notebook containing the training code, but when a new user or team member comes in, they can't see the whole code; they have to download it offline and upload it back into their notebook. Is there a way to collaborate like we do with Git or an Azure DevOps repo?
Hi Julien, I got an error in the preprocessing script. Can you please confirm that the script is correct?
Hi Julien, it's not clear how I should do inference on this. I have a custom processing container, and then I train a TF model. Is it possible to chain these two together for inference? I want to give the S3 location of raw data at inference time, have it go through processing, and then predict on it. Can you please let me know if this is possible and how to go about it?
Great content Julien!
Thank you, glad you like it!
Thank you Julien, and I also love your Packt book a lot. Question: for our startup we want to set up this SageMaker pipeline for dev and prod in separate accounts. Where can I find guidelines on how to set this up?
Thanks Denzil! Here's a nice multi-account example: aws.amazon.com/blogs/machine-learning/multi-account-model-deployment-with-amazon-sagemaker-pipelines/
I have another request: may I ask you to please make a video on how to join several tables, including some aggregation functions, in SQL? I want to join three tables that live in two different schemas in Redshift, and the query over the two joined tables will include aggregation functions. Since the schemas of the tables differ, I can't write the SQL query directly in Data Wrangler. It would be great if you could help.
It's a really awesome session, Julien. I have one doubt: if I want to send 70% of requests to Version 1 and 30% to Version 2, how can I do that?
You can deploy multiple variants on the same endpoint: docs.aws.amazon.com/sagemaker/latest/dg/model-ab-testing.html
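To sketch what the A/B testing docs describe: an endpoint config lists several production variants, and SageMaker routes traffic in proportion to each variant's `InitialVariantWeight`. The helper and model names below are hypothetical, but the variant dictionary keys match the `create_endpoint_config` API:

```python
def weighted_variants(models_and_weights, instance_type="ml.m5.large"):
    """Build the ProductionVariants list for create_endpoint_config().

    Each variant's traffic share is its weight divided by the sum of all
    weights, so (0.7, 0.3) and (7, 3) produce the same 70/30 split.
    """
    return [
        {
            "VariantName": f"variant-{i + 1}",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": float(weight),
        }
        for i, (model_name, weight) in enumerate(models_and_weights)
    ]

# 70% of requests go to v1, 30% to v2 (model names are placeholders):
variants = weighted_variants([("my-model-v1", 0.7), ("my-model-v2", 0.3)])
# Then, with real models in your account:
# boto3.client("sagemaker").create_endpoint_config(
#     EndpointConfigName="ab-test-config", ProductionVariants=variants)
```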
Hi Julien, where do the models get deployed? I'm not seeing any container or Docker details in the demo.
They're deployed on SageMaker endpoints, as usual. It all takes place in the CloudFormation template stored in the 'model-deploy' repository.
Hi Julien, I really appreciate the explanation. Could you do a video, or point to a demo, showing how to use SageMaker Pipelines for scheduled batch jobs? Say I have a 10 GB dataset loaded into S3 every day; how can I schedule a pipeline to transform it and run inference on it?
You can easily run batch transform in your pipelines, see docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-transform. You can also schedule execution with a Lambda function firing up your pipeline, see docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html
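The "Lambda firing up your pipeline" part can be sketched as a small handler that calls `start_pipeline_execution` (a real SageMaker API call via boto3). The pipeline name and event key below are hypothetical; the client is injectable purely so the handler can be exercised locally without AWS:

```python
import json

def lambda_handler(event, context, sm_client=None):
    """Scheduled entry point: start a SageMaker pipeline execution.

    `sm_client` is injectable so the handler can be unit-tested locally;
    inside Lambda it defaults to the real boto3 SageMaker client.
    """
    if sm_client is None:
        import boto3  # available in the AWS Lambda runtime
        sm_client = boto3.client("sagemaker")

    # Pipeline name comes from the scheduled event (key/name are examples).
    pipeline_name = event.get("pipeline_name", "my-daily-pipeline")
    response = sm_client.start_pipeline_execution(PipelineName=pipeline_name)
    return {
        "statusCode": 200,
        "body": json.dumps({"executionArn": response["PipelineExecutionArn"]}),
    }
```

Point an EventBridge/CloudWatch Events schedule rule at this function to get a daily run.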
I see that this video was uploaded three years ago. Is it still valid? I mean the features and navigation.
The UI has changed, the SDK is probably still very very similar.
Hello Julien, thank you for the video and the channel! It makes understanding AWS SageMaker easier for newbies like me :) I wanted to ask if there is a way to list all resources/components (models, endpoints, training jobs, processing jobs, etc.) associated with (or created in) an AWS SageMaker Notebook/Studio project? Thanks a lot for any information on this!
Thank you! This is a really good question, and the answer is "kind of". You can track model lineage and see all artifacts that led to a particular model, see docs.aws.amazon.com/sagemaker/latest/dg/lineage-tracking.html
Hello Julien, I would like to ask a question. I'm a bit new to SageMaker and its functionalities. How would one go about creating their own project template for a new project? Or should I modify the existing abalone template to suit my needs?
Here's an example aws.amazon.com/fr/blogs/machine-learning/build-mlops-workflows-with-amazon-sagemaker-projects-gitlab-and-gitlab-pipelines/
@@juliensimonfr thank you
Hi Julien, can we manage SageMaker pipelines with Terraform?
Hi Vicky, according to registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sagemaker_model, it's not supported yet.
Bonjour Julien, thanks for the video! I would be interested in the final step, the one where you actually process an inference on the endpoint; I don't see it in the demo. In particular, I'm curious how you can propagate the fitted preprocessing "model" (for instance, the one-hot encoder) to the model hosted on the endpoint. Thank you very much for any information on this step!
Have a great day
Hi Damien, regarding preprocessing, you would have to apply it to the data sent to the endpoint. A clever way to do this is to use an Inference Pipeline, i.e. a sequence of models invoked as a single unit. Here's an example: github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.ipynb
@@juliensimonfr Thanks Julien !
My Studio layout is quite different; the launcher only shows notebooks. Also, I don't get the add-on with the triangle on the left.
The Studio UI frequently changes, sometimes for the better ;)
@@juliensimonfr thank you. I figured out what I was doing wrong 😑. I need to launch the app rather than just the notebooks
Hello Julien. Thanks for your videos, they were helpful. I have a requirement: I want to create training jobs within a SageMaker pipeline. How can I achieve this?
sagemaker.workflow.steps.TrainingStep ?
@@juliensimonfr Thanks for the reply. In the video you used a scikit-learn estimator to train. I will have to create a training job; my doubt is how to integrate training jobs within the pipeline. Please guide.
Hi Julien, I'm trying to create a pipeline and I'm experiencing significant overhead for each individual step (~10 min). Is there any way to test individual steps without running the entire pipeline and having to wait for earlier steps to finish?
Not that I know of. I guess you could test each step in its own mini-pipeline if you have all the intermediate artifacts, and then put them together?
@@juliensimonfr Thanks for responding! Sagemaker recently made it possible to execute pipelines in local mode which almost eliminates the overhead I was experiencing :)
How can I trigger this complete pipeline using Lambdas or cron jobs? Is there any such option?
docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html
Hi Julien. Thanks for the explanation. I work at a company in Germany where we use AWS tools. My question: I have to run millions of rows daily through SQL queries against Redshift, but in SageMaker I have a memory limitation. Is it possible to make this easier with SageMaker Pipelines?
SageMaker Processing is probably what you're looking for. It's easy to automate and you can pick very large instances. Of course you just pay for the duration of the job.
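Beyond picking a larger instance, a Processing job script can also keep memory flat by streaming over the result set instead of loading it whole. A generic sketch of that chunked pattern (plain Python, not SageMaker-specific; file paths and column names are hypothetical):

```python
import csv

def stream_aggregate(csv_rows, key_col, value_col):
    """Aggregate a (potentially huge) CSV row by row, keeping only the
    running totals in memory instead of materializing the full dataset."""
    totals = {}
    for row in csv.DictReader(csv_rows):
        key = row[key_col]
        totals[key] = totals.get(key, 0.0) + float(row[value_col])
    return totals

# In a Processing job, the input would typically be a file the job mounts,
# e.g. open("/opt/ml/processing/input/data.csv") after unloading from Redshift.
```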
@@juliensimonfr Thanks for your reply. I'm currently working on a normal SageMaker instance, running a SQL query with some joins and aggregation functions over some very large tables in Redshift. The query takes very long if I fetch data for a period longer than 6 days. I heard that Data Wrangler can speed up importing tables. Would that also be the case for joined tables, as in my setup? Thanks in advance.
Hello Julien, Can we use the AWS Step Functions Data Science SDK along with the Pipeline? Or are these two different things?
Hi Sumesh, SageMaker Pipelines has two sides: 1) a Python SDK to build ML workflows (similar to the Data Science SDK), and 2) an MLOps capability based on CodePipeline. I think the integration with SageMaker Studio is really interesting, and a more productive option than the Data Science SDK.
@@juliensimonfr Is there a document/link with details on creating a custom project template (organization template)? If I wanted to call a Lambda function or a Glue job as a workflow step in the pipeline, do you think I could customize it that way?
Hi Julien, thanks for this clear demo. My team uses GitLab for CI/CD; would that be a possibility instead of CodePipeline? Thx
Hello, how do I customize the abalone pipeline into a pipeline for my own model?
docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects.html :)
Thanks for the video, Julien, it gave a very good overview. I'm wondering if there is a good way to learn more about the deploy step. Additionally, I have a model that I want to retrain daily, as we get new data every day. What is the best pattern for this?
Thanks Lucie. You can deploy the "usual" way by grabbing the model in S3 and creating an endpoint. For full automation, you can use MLOps as described in docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough.html, but this requires diving a bit into CloudFormation. We're covering this topic in SageMaker Fridays S03E04, so make sure to catch that episode at amazonsagemakerfridays.splashthat.com/ :)
Where does the model deploy to when you approve it?
A SageMaker endpoint, configured in the template.
@@juliensimonfr Sorry, it's been a while. What is the endpoint? Just part of the container?
@@MrChristian331 The model endpoint provides a URL for inference once the model is trained.
Salut Julien! I see the pipeline templates are not available in us-east-1 in SageMaker Studio (only us-west-1). Is there a reason for that? Any chance they could be made available in N. Virginia? Thanks for the tutorial :-) It came in handy with a project delivery.
They should be available there. Please make sure that your Studio user has the appropriate permissions. There's a slider setting in the user details ("Enable SageMaker Projects").
Can you please provide the GitHub link to the Python notebook? Thanks!
If you're only interested in the Python SDK, this one is very close: github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-pipelines. If you're interested in the full example with MLOps support, it's part of the repos I clone in the video.
@@juliensimonfr Yes, I'm interested in the SDK, so this is perfect. Thanks a bunch!!