QUESTION OF THE DAY: Prior to watching this video, have you built a machine learning model before or intend to build one soon? If so, what interesting problem do you intend to tackle with data science? Comments down below! 😃
Never, but I went to a course in Buenos Aires and saw some algorithms applied to an eye-disease database. Loved the conclusions. After that experience I'm trying to learn how to apply this to predicting effective treatments for patients, and to other fields such as marketing and business reports. I work at a small clinic and I began building a small database for each case. But I don't know how much data would be enough or which algorithm to choose for each case. Thank you for helping us start.
Thanks @@walterv2769 for your comment and support, and for sharing your start in data science. I will make a future video on "how much data would be enough or which algorithm to choose". Wish you success in your data science endeavors.
Good morning. How do I install R version 3.6.2 (Dark and Stormy Night, released 2019-12-12) on Ubuntu 14.04 using the terminal? My computer is old and has a 32-bit system.
@Master Stroke Have you tried installing it using:
sudo apt-get install r-base
@@DataProfessor Yes I did, and it gives a version from 2015 even though I use the update and upgrade commands.
The concept is very well explained and easily understood even for the people like me who are new to the programming world.
Glad to hear that, thanks for watching! 😊
The first ever ML code that I ran successfully.
Alhamdulillah
Hi Data Professor! I am new to R and just trying to learn how to do analyses with R, thank you for the video!
Glad it was helpful!
Hello from Peru! I really liked this video, it was very simple to understand. I'll wait for more videos! thanks
Alfredo Walter Tinco Domínguez Thanks Alfredo for your support 😄
Thanks Sir. I'm reviewing my knowledge. The explanation is very lucid and it is easy to match my understanding.
Thanks for the kind words, glad it was helpful! :)
@@DataProfessor is there a twitter Id or social media Id to contact, Sir. I just need the code that you used to create the box plot in video that's there in the plot window of R..
@@curiouswanderer793 He literally went over it word by word. Just copy it.
Really enjoy your videos; very helpful for a newbie just getting into the field.
WallStreetNewscast Thanks for your support and comment, more practical tutorials coming up 😃
Thanks sir. Your explanation at every step of the code is very good. You have great teaching skills.
I found your channel from facebook! Thank you for sharing this knowledge!
Thanks for your comment, glad to have your support!
I prefer splitting the data and doing much of the pre-processing without additional packages, and performing traditional exploratory data analysis (EDA) and statistical tests before building ML models. It teaches students a lot about the concepts, especially when it involves cross-validation, and also builds programming muscle memory.
Thanks for sharing the knowledge. Greetings from Brazil!
Glad it was helpful!
Thanks for the video :) The author of caret is now transitioning to full-time development on parsnip, which is designed to be a more robust and tidy meta engine. Parsnip, along with other ML-related packages, forms a group of packages called "tidymodels" - much like the "tidyverse". Would love if you could dedicate a future video to tidymodels :)
Thanks for the comment and suggestion! I will definitely look into parsnip and tidymodels and make a video about them.
Wow very informative video, thank you Data Professor
Glad it was helpful!
@@DataProfessor I'm trying to apply this analytical concept to my data, but I wasn't sure how applicable it could be. I am trying to characterize the performance of a set of maize hybrids under organic farming systems. I collected categorical variables for different agronomic management practices, for example weed control, planting density, cover crop type, type of manure applied, rate of manure application, etc. So I am trying to regress the performance of each hybrid on the different management practices. Is there a way I can use CART algorithms to predict the yield of a hybrid given a set of management practices?
Hi! I have a question regarding the training model: when you write the code "Species ~ . ,", why do you use the tilde and the comma after Species? Thank you for this video.
It's a formula: the tilde separates the outcome from the predictors, and the dot means "use all the other variables for prediction". You can write Species ~ Sepal.Length if you want to use only that variable. The comma simply separates the formula from the next argument of train().
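As a small base-R sketch (using the built-in iris data, no extra packages), you can see what each formula expands to:

```r
data(iris)

# "Species ~ ." uses every other column as a predictor
f_all <- Species ~ .
# "Species ~ Sepal.Length" uses a single predictor
f_one <- Species ~ Sepal.Length

# terms() shows which predictors each formula expands to
attr(terms(f_all, data = iris), "term.labels")
# "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
attr(terms(f_one, data = iris), "term.labels")
# "Sepal.Length"
```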
These videos really are so informative; thank you!
You're so welcome!
@@DataProfessor Also, for the homework around the ~5min mark, I think the distribution is roughly the same between the 80% and 20% subsets!
I use this for my homework:
# Create plots of TrainingSet and TestingSet
p1
What an absolutely amazing video!
Glad you enjoyed it! 😊
@@DataProfessor All your videos are great to watch! On this one, I'm however getting "Error: `data` and `reference` should be factors with the same levels." when I reach your "Model performance (Displays confusion matrix and statistics)" section :( Would you know why? Thanks!
Thanks Sir for your explanation. Very good pedagogy. I want to know which books you read or which courses you took to master all these concepts. I really appreciate how comfortable you are with this terminology, and I want to have that confidence too.
Thank you so much, professor. This helped my project a lot.
You're welcome.
Many thanks for the video!
How can I work with "prob" and calculate class probabilities with the SVM model?
Thanks
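A hedged sketch of one common approach: in caret, per-class probabilities are requested with predict(..., type = "prob"), which requires the model to have been trained with classProbs = TRUE in trainControl. (The single-row tuneGrid values below are arbitrary example settings, not the video's tuned values.)

```r
library(caret)

# Train an svmPoly model with class probabilities enabled
ctrl  <- trainControl(method = "none", classProbs = TRUE)
model <- train(Species ~ ., data = iris, method = "svmPoly",
               trControl = ctrl,
               tuneGrid = data.frame(degree = 1, scale = 1, C = 1))

# One probability column per class; each row sums to 1
probs <- predict(model, newdata = iris, type = "prob")
head(probs)
```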
Hello, I got an error. It said that R could not find the function train. I already loaded the caret package. What could be the problem?
Clear explanation, professor.
Thanks so much, professor.
Thanks for this great video. But how do I actually use my classifier now on data that is unknown to my computer, without labels to compare against?
Thanks, you can do that by treating the new data as unseen input. Apply the trained model to make predictions with predict(model, newdata = new_data), where model is the caret model fitted with train() and new_data has the same predictor columns as the training set.
@@DataProfessor thank you very much!!
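A minimal sketch of scoring unlabeled data in R (new_data here is a hypothetical stand-in for your unlabeled records; the tuneGrid values are arbitrary example settings):

```r
library(caret)

# Train on iris as in the video (svmPoly, no resampling for brevity)
model <- train(Species ~ ., data = iris, method = "svmPoly",
               trControl = trainControl(method = "none"),
               tuneGrid = data.frame(degree = 1, scale = 1, C = 1))

# new_data stands in for your unlabeled records; here we just drop the label
new_data <- iris[, -5]

# Fill an empty class column with the model's predictions
new_data$PredictedSpecies <- predict(model, newdata = new_data)
head(new_data)
```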
Thank you so much!!
You’re welcome!
Great content very helpful 🙏
Glad to hear that!
This is great. Is there any other video discussing the parameters and tuning as a continuation of this video? In R, of course.
Professor, would you give the code to scatterplot the training and testing sets?
Love your video, but your shirt as well haha! :) Greetings from Germany
Dear professor! Can I use this algorithm to classify a continuous dependent variable? And generate a feature importance figure?
For the homework:
I used this code:
plot(TrainingSet, main="TrainingSet")
plot(TestingSet, main="TestingSet")
The distribution was roughly the same. I was wondering, though: how do you put two plots next to each other? With this approach I used the arrows to switch between plots. Is there a way to put them side by side?
Hi, you can use the cowplot library to create a multi-plot figure from ggplot objects.
You can also use the gridExtra package, whose grid.arrange function arranges plots that you have assigned to variables.
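For base-graphics plot() calls like the ones in the homework comment, a package-free sketch is to set the layout with par(mfrow = ...). (The TrainingSet/TestingSet split below is an assumed stand-in for the video's createDataPartition split.) Note one caveat: plot() on a whole data frame draws a pairs() matrix, which manages its own layout and ignores mfrow, so this works for single-panel plots of chosen variables.

```r
# Stand-ins for the video's split: first 120 rows train, last 30 test
TrainingSet <- iris[1:120, ]
TestingSet  <- iris[121:150, ]

old_par <- par(mfrow = c(1, 2))   # layout: 1 row, 2 columns
plot(TrainingSet$Sepal.Length, TrainingSet$Sepal.Width, main = "TrainingSet")
plot(TestingSet$Sepal.Length,  TestingSet$Sepal.Width,  main = "TestingSet")
par(old_par)                      # restore the previous layout
```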
Is the cross-validation model never applied to the testing set?
Is it just coincidence that Model.training and Model.cv give the same accuracy? I might have misunderstood, but why do we predict the TrainingSet with Model.cv? In this case we don't work with the TestingSet at all, right?
Hi there... I'm stuck developing an SVM for a college admission dataset in R... will you help me with this?
Hi, thanks for your great explanation. When I try to split the dataset, it says that it could not find the function "creatDataPartition". Which package should I install?
By the way, I have downloaded the caret package; are there any other packages I should download?
Thank you in advance
Hi, it's part of caret, and to split the dataset that's all you need. Also make sure the function name is spelled createDataPartition (note the "e" in "create") and that you've run library(caret) first.
@@DataProfessor Ok I will try again, thanks a lot
You can also use sample.split from the caTools package.
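A minimal sketch of the split with caret (assuming caret is installed; the seed value is arbitrary):

```r
library(caret)

set.seed(100)  # make the random split reproducible

# 80% of rows for training, stratified by Species; the rest for testing
TrainingIndex <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
TrainingSet <- iris[TrainingIndex, ]
TestingSet  <- iris[-TrainingIndex, ]

nrow(TrainingSet)  # 120 rows (40 per class)
nrow(TestingSet)   # 30 rows
```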
I have problems installing "caret". At the end of the installation process it says: "The downloaded source packages are in
‘/private/var/folders/y1/0vypd8ps5pz3vtz4pctv7xf80000gn/T/RtmpB0tlRX/downloaded_packages’", but it then fails to load with library: Error in library("caret") : there is no package called ‘caret’. I am using RStudio Version 1.3.1093 on macOS 10.12.6.
Hi Chanin, I have several questions about the SVM polynomial kernel model:
1. Did you choose the svmPoly model because it suits the iris dataset, or did you just pick it as an example?
2. Since I am not familiar with this model, what kind of dataset works well with it?
Good video by the way, thanks
Thanks for your support and comment.
1. The polynomial kernel of SVM was chosen simply as an example.
2. As every dataset is unique in its own right, it is difficult to suggest the best ML algorithm in advance, prior to model building. Furthermore, to yield the best possible performance, it is recommended that hyperparameter optimization be performed for the selected ML algorithm.
That said, a paper published by one of our colleagues suggests a generally good starting point for the parameter values when using SVM. Please check it out at pubs.acs.org/doi/full/10.1021/ci500344v
@@DataProfessor Thanks a lot for the answer and the additional resource, surely it will be helpful for me one day.
Hope to see your next content soon.
:)
@@kanintarntira6450 It's a pleasure, next video should be out in the next 1-2 days. Please don't forget to subscribe and hit the notification bell and also smash the like button. Thanks again for your comment.
Hello! Amazing video! Unfortunately I have an error: "Error in match.arg(norm, c("none", "overall", "average")) :
'arg' must be NULL or a character vector" when I try to run "fit.training.confusion
Also, please write interpretation guidelines for the results in the comments so that beginners can understand and remember them while running the code on their end. It would help newbies.
Thanks for the comment. I discussed ML interpretation in another video ruclips.net/video/34yBgah8Uyg/видео.html and also made an infographic about it here github.com/dataprofessor/infographic/blob/master/05-Interpretability-of-Data-Science-Models.JPG
@@DataProfessor Man, that's brilliant! Thank you so much! Do you have a Patreon or something?
Is anyone else getting this error when running the code under "Displays confusion matrix and statistics"? "Error: 'data' and 'reference' should be factors with the same levels"
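Since several comments hit this error, here is a hedged sketch of the usual cause and fix: confusionMatrix() requires both arguments to be factors with an identical level set, which breaks if one side was read in as character strings (or the factors ended up with different levels). The tiny pred/ref vectors below are made-up stand-ins for the video's predictions and test labels.

```r
library(caret)

# Simulate the usual failure mode: reference labels arrived as characters
pred <- factor(c("setosa", "versicolor", "virginica"))
ref  <- c("setosa", "versicolor", "versicolor")   # character, not factor

# Fix: coerce both sides to factors sharing one level set before comparing
levels_all <- levels(pred)
cm <- confusionMatrix(data      = factor(pred, levels = levels_all),
                      reference = factor(ref,  levels = levels_all))
cm$overall["Accuracy"]  # 2 of 3 correct here
```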
Many thanks for this very informative video! I have a question about sensitivity. How can I fix sensitivity at 100%? I want a model with a sensitivity of 100%; is there a way to force it?
This was a wonderful tutorial! Does this model assume independence between variables (i.e. petal length, petal width, etc.)? If so, how might I go about building a non-parametric model similarly?
Hi, I am from India. I have been learning machine learning in R for the past 4 months. I want to know how a machine learning model gets implemented in the real world. Please explain or make a video.
seshendra vemuri Hi Seshendra, thanks for your comment; that makes a great idea for a video. Let me answer this in a future video.
Hi,
I am super new but I am getting an error that says 'TrainingIndex' not found. Any idea what I am doing wrong?
Never mind, I just didn't load the caret package, I think. Got it! Thanks!
Thanks for the thorough explanation
I have 2 questions please:
- How would you calculate the ROC curve for the classification?
- To present the model (for example in a publication), we mainly need the ROC curve for the test set, correct? I mean, the AUC for the training set can be misleadingly high?
Thanks again for taking the time to prepare this useful material
To calculate the ROC curve, check out this video ruclips.net/video/uVJXPPrWRJ0/видео.html
The ROC AUC for training is expected to be high, but it can be used for comparison against the cross-validation set and a test set.
@@DataProfessor excellent video. I am planning on learning python (hopefully soon). when you have time, can you create a video explaining the ROC curves using R? also there is a new package called MLeval that appears to do this job in a convenient way but not sure how accurate it is. if you can also comment on it in your video, will be great. Thanks again!
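To the ROC question above, a hedged sketch using the pROC package (binary classification only; using a single feature as the score is a toy stand-in for your model's predicted class probabilities):

```r
library(pROC)

# Binary toy example: two iris species, scored by one numeric feature
df <- subset(iris, Species != "setosa")
df$Species <- droplevels(df$Species)

# roc() takes the true labels and a numeric score per observation
roc_obj <- roc(response = df$Species, predictor = df$Petal.Width)
auc(roc_obj)       # area under the curve
plot(roc_obj)      # draws the ROC curve
```

In practice you would pass the positive-class probability column from predict(model, newdata = TestingSet, type = "prob") as the predictor, computed on the test set for the reasons raised in the question above.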
Great video, as always. I just want to ask: I've created a random forest model in R and I am happy with the error rate when predicting both training and test data. What I want to know is how do I now apply my model to the classification of non-classified data? Essentially, I want my model to look at new data that resembles the data it was trained on, predict, and fill an empty class column for me with the correct classification.
You could try applying k-nearest neighbors or PCA to see whether the new data is similar to the original training data; if it is, then the trained model should be applicable to the new data.
Where are you from? Thailand? Is there tonal variation in Thai names also?
didn't catch your name
Hi, it's Chanin
Are you from Thailand?
Yes I am, sawatdee krub 😃