Wow, this video was amazing, so clear and interesting. Thank you very much, Kevin!
Kevin!
I really appreciate your effort in making this valuable material.
I'm very surprised that you only have 2k subscribers for such detailed, clear, and relevant information.
Please don't be discouraged, if that is the case.
This video is great!
Hi Kevin. Thank you for this. You helped me.
Thank you for this tutorial. I would be even more interested in how you generate your file dataset. Indeed, I have a problem using the dataset I export as an MLTable after the labeling phase in Azure ML.
To generate the file dataset, grab the file from the video description and save it locally. Then, upload it into Azure ML as a v1 Tabular type, not a v2 MLTable. In the Data menu, make sure you are on the Data assets tab. Then, select +Create and name the data asset something like ChicagoParkingTickets. In the Type menu, select "Tabular" from the v1 section. On the second page, choose to create your data asset From local files; that will give you a few more pages covering where to store the data, which file to upload, and additional settings. Those steps should be pretty straightforward, as I tried to ensure that there would be no complications with this dataset.
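If you'd rather script the upload than click through the Studio UI, here is a minimal sketch using the v1 Python SDK (azureml-core). The local file name and datastore path are my assumptions, not anything from the video:

```python
# Minimal sketch, v1 Python SDK (azureml-core). The file name and
# datastore path are illustrative assumptions.
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()  # reads the config.json downloaded from your workspace
datastore = ws.get_default_datastore()

# Upload the local file to the workspace's default datastore
datastore.upload_files(
    files=["./ChicagoParkingTickets.csv"],
    target_path="chicago-parking-tickets/",
    overwrite=True,
)

# Create a v1 Tabular dataset from the uploaded file and register it
dataset = Dataset.Tabular.from_delimited_files(
    path=(datastore, "chicago-parking-tickets/ChicagoParkingTickets.csv")
)
dataset = dataset.register(workspace=ws, name="ChicagoParkingTickets")
```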
Uploading an MLTable asset is quite a bit more difficult than uploading a v1 Tabular dataset: learn.microsoft.com/en-us/azure/machine-learning/how-to-mltable?view=azureml-api-2&tabs=cli
There's some work behind the scenes to add the MLTable metadata, so that when I look at the ChicagoParkingTickets dataset in the Azure ML UI, I see Dataset type = Tabular and Type = Table (mltable). That's why the output node for the Azure ML Designer says MLTable even though I never explicitly generated any MLTable metadata. Azure ML did the work for me after I uploaded the text file as a v1 Tabular dataset.
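For completeness: if you ever did want to generate the MLTable metadata yourself, the mltable Python package can do it. A hedged sketch, with an illustrative file path:

```python
# Hedged sketch using the mltable package; the path is illustrative.
import mltable

# Build an MLTable definition from a delimited file and save the
# MLTable metadata file alongside the data.
tbl = mltable.from_delimited_files(paths=[{"file": "./ChicagoParkingTickets.csv"}])
tbl.save("./chicago-parking-tickets")
```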
@@KevinFeasel Hi Kevin, your work is really helpful! If I create a model following your steps, will I be able to deploy it?
I have experienced some issues when deploying some models, and it has been suggested that this may be caused by my going with V1 datasets, Tabular datasets to be more specific. However, I have read the Microsoft documentation and have not found any evidence of that. Thanks
@@ivancasanagallen I've not had problems with deploying models based on data in V1 Tabular datasets in the past. There may be some combination of factors that led to an issue, as Azure ML can be pretty finicky if you leave the happy path scenario, but I can't think of a reason off-hand why using a V1 Tabular dataset would be a problem for later model deployment.
Thanks @@KevinFeasel, I may need to read some more documentation. This is the message I get when trying to deploy a real-time endpoint:
"V1 deployment testing not supported. This deployment is based on v1 API and doesn't support testing on the Studio. To get the key/token and invoke, please use CLI/SDK/REST v1 API. Consider migrating to v2 managed online endpoint. Learn more about CLI/SDK/REST. Learn more about v2 managed online endpoint."
Thank you, I needed this video.
Thank you for the video, but I encountered an error. I did everything exactly as in the video and selected the same compute cluster, but every block fails with an error: UserError: The specified DSI cluster type is not supported for job submission.
There is no information about this on the Internet. I tried running the built-in samples and got the same thing; even clearing the data does not work. I am using a free trial subscription.
Hmm, that is a good question. It could very well be related to your using a trial subscription, but I'm not positive about that because I've not used the free trial subscription for any Azure ML testing.
There's a GitHub post that does walk you through how you can see which VM classes you can use with the free trial: github.com/MicrosoftDocs/azure-docs/issues/56032. These commands use the Azure cloud shell and PowerShell. The idea would be that you could see which VM classes are enabled and what the quotas look like. Then, change the cpu-cluster to use one of the allowed classes and try again. The work I show in this video isn't particularly compute-heavy, so it should still work okay on a single instance of a smaller VM class.
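As an alternative to the PowerShell commands in that GitHub post, here is a minimal sketch using the v1 Python SDK to list the VM sizes your workspace's region supports; from there you can pick an allowed size for the cluster:

```python
# Minimal sketch, v1 Python SDK: list the VM sizes available to the workspace.
from azureml.core import Workspace
from azureml.core.compute import AmlCompute

ws = Workspace.from_config()

# Each entry is a dict describing a VM size (name, vCPUs, memory, etc.)
for vm_size in AmlCompute.supported_vmsizes(workspace=ws):
    print(vm_size)
```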
Hi, I read through the Microsoft Learn documentation, but I'm still confused about the difference between Azure ML Designer and Azure ML Studio. I saw some comments saying Studio is the newer version of the Designer; is this valid? The explanation says both are GUIs for creating workflows...
The confusion here is in naming. Yes, "the Azure Machine Learning Designer" is an old web UI for Azure Machine Learning. Azure Machine Learning Studio replaced this old Designer web UI in its entirety, and we only look at Azure Machine Learning Studio (ml.azure.com) in this series.
But within Azure Machine Learning Studio is a component known as the designer (the topic of this video). It does many of the things the old UI did, but it is entirely different code.
Hopefully that helps clarify things a little bit here.
Can you help me, please? Should we only upload the dataset without the Python code, or what should we upload? Please explain it to me; I have a project!
You upload the dataset. To generate the file dataset, grab the file from the video description and save it locally. Then, upload it into Azure ML as a v1 Tabular type, not a v2 MLTable. In the Data menu, make sure you are on the Data assets tab. Then, select +Create and name the data asset something like ChicagoParkingTickets. In the Type menu, select "Tabular" from the v1 section. On the second page, choose to create your data asset From local files; that will give you a few more pages covering where to store the data, which file to upload, and additional settings. Those steps should be pretty straightforward, as I tried to ensure that there would be no complications with this dataset.
The Python code is something you submit via an API call, and we do that in the next video in the series.
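As a preview, and only a sketch (the next video covers this properly), submitting code with the v1 SDK looks roughly like this; the experiment name, script name, and cluster name are placeholders:

```python
# Hedged preview sketch, v1 Python SDK; names are placeholders.
from azureml.core import Experiment, ScriptRunConfig, Workspace

ws = Workspace.from_config()

# Point at a folder containing your training script and pick a compute target
src = ScriptRunConfig(
    source_directory="./src",
    script="train.py",
    compute_target="cpu-cluster",
)

run = Experiment(ws, "chicago-parking-tickets").submit(src)
run.wait_for_completion(show_output=True)
```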
Does Azure have components to one-hot encode or scale features?
The scaling component in the Azure ML Designer is called Normalize Data, so there is a built-in scaling capability. I believe there is no built-in one-hot encoding capability; to do that, I'd recommend adding an R or Python component and having it do the work.
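For example, the Execute Python Script component in the Designer expects an azureml_main entry point; a minimal one-hot encoding sketch with pandas might look like this (the column name is an assumption about your schema):

```python
# Sketch for the Designer's Execute Python Script component.
# The column name "payment_method" is an illustrative assumption.
import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # One-hot encode a categorical column with pandas
    encoded = pd.get_dummies(dataframe1, columns=["payment_method"])
    return encoded,
```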
And is this a classification model? What about the concept of the best model generated?
If you're talking about the best model generated from my prior AutoML video, we're training a new model from scratch to see how to do it.
If you're asking in general, you can do this in a couple of ways. One is to train separate models as different experiment runs, saving each in the Azure ML model registry and comparing model results; for classification, you might check measures like accuracy and F1 score. A second option would be to train separate models as runs in an experiment but tagged under the same model type, saving different versions of a model in the registry. Then, after comparing, you could delete the versions that don't perform as well. A third option would be to perform comparative model analysis as part of your initial training: you can incorporate hyperparameter sweeping, and even the use of different algorithms, in the training steps and then save the best model of the bunch to the registry. I don't have an example of doing this in a video, but Microsoft does have a code example of using a hyperparameter sweep: github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/pipelines/1c_pipeline_with_hyperparameter_sweep
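To give a feel for that third option, here is a hedged sketch of a hyperparameter sweep with the v2 SDK (azure-ai-ml), in the spirit of the linked Microsoft example; the script, inputs, metric, environment, and cluster names are all placeholders:

```python
# Hedged sketch, v2 Python SDK (azure-ai-ml); all names are placeholders.
from azure.ai.ml import MLClient, command
from azure.ai.ml.sweep import Choice, Uniform
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# A command job with fixed hyperparameter inputs
job = command(
    code="./src",
    command="python train.py --learning_rate ${{inputs.learning_rate}} --boosting ${{inputs.boosting}}",
    inputs={"learning_rate": 0.05, "boosting": "gbdt"},
    environment="my-training-env@latest",  # placeholder environment
)

# Replace the fixed inputs with search spaces, then wrap the job in a sweep
job_for_sweep = job(
    learning_rate=Uniform(min_value=0.01, max_value=0.9),
    boosting=Choice(values=["gbdt", "dart"]),
)
sweep_job = job_for_sweep.sweep(
    compute="cpu-cluster",
    sampling_algorithm="random",
    primary_metric="accuracy",  # must be a metric your train.py logs
    goal="Maximize",
)
sweep_job.set_limits(max_total_trials=20, max_concurrent_trials=4)

# Submit the sweep; the best trial's model can then be registered
ml_client.jobs.create_or_update(sweep_job)
```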