I'm doing kf.split, but when printing the score it says data cardinality is ambiguous. Since I did a train test split earlier, I believe there is an unequal number of samples in x and y. I would like some help to understand why the error is coming up.
You imported StratifiedKFold, but where did you use it on the above dataset? Also, in KFold I think they are just dividing the array based on index number, so how can StratifiedKFold, without seeing the values in the target class, divide the data so that every fold gets an equal number of each class?
Kindly tell me how cross validation is used to find error values like MSE, RMSE, MAE and RMAE for a regression algorithm like linear regression on a dataset with continuous values.
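On the second part of the question: StratifiedKFold does see the target values, because its split method takes y as well as X. A minimal sketch, assuming the iris dataset from the exercise (the variable names are only for illustration), that counts the classes in each test fold:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, StratifiedKFold

iris = load_iris()
X, y = iris.data, iris.target  # iris is ordered by class: 50 + 50 + 50

# Plain KFold without shuffling cuts by index, so each test fold is one class.
kfold_counts = [np.bincount(y[test], minlength=3)
                for _, test in KFold(n_splits=3).split(X)]

# StratifiedKFold uses y to keep the class proportions in every fold.
strat_counts = [np.bincount(y[test], minlength=3)
                for _, test in StratifiedKFold(n_splits=3).split(X, y)]

print(kfold_counts)
print(strat_counts)
```

On an ordered dataset like iris, the plain KFold folds are badly unbalanced unless shuffle=True is passed, which is one reason to prefer the stratified version for classification.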
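One way to do this, sketched under the assumption that scikit-learn's built-in diabetes dataset stands in for your own regression data: cross_val_score accepts a scoring argument, and the error metrics are exposed as negated scores, so the sign has to be flipped.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Assumption: load_diabetes is only a placeholder regression dataset.
X, y = load_diabetes(return_X_y=True)
model = LinearRegression()

# scikit-learn returns negative errors so that "higher is better" still holds.
mse = -cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
mae = -cross_val_score(model, X, y, cv=5, scoring='neg_mean_absolute_error')
rmse = np.sqrt(mse)

print('MSE per fold: ', mse)
print('RMSE per fold:', rmse)
print('MAE per fold: ', mae)
```

Taking the mean of each array gives one number per metric to compare regressors.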
if we use 'cross_val_score' function, how can we set the training & testing data ratio? I checked the parameter and didn't see any parameter that can set the ratio.
Thank you very much for the nice explanation. I have one question in this context: Isn't it necessary to use in train_test_split method the 'random_state' to get the same score for any model?
that is to escape to the next line. Since Python uses white space, if you don't use that backslash, Python would read that as the next line of code. You use that backslash so it tells Python that even when you're typing in the next line, it is still part of the code in the above line. He only did that so the code doesn't extend all the way to the right and pass the right screen.
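A tiny illustration of that backslash, using made-up numbers rather than the code from the video:

```python
# The backslash tells Python the statement continues on the next line.
total = 1 + 2 + \
        3 + 4

# Wrapping the expression in parentheses allows the same break without a backslash.
total_parens = (1 + 2 +
                3 + 4)

print(total, total_parens)
```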
At 15:05 you used kf.split in the for loop, is that correct? Because now we are using StratifiedKFold, the instance for that is folds. Should it be folds.split?
I have a little problem. In the original dataset the values were from 1-9. But in the train and test splits there were 0s in them. How does the k-fold generate them when they are not in the original dataset?
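A likely explanation, sketched with the small 1 to 9 list: kf.split returns positions (indices) into the data, not the values themselves, and positions start at 0.

```python
from sklearn.model_selection import KFold

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kf = KFold(n_splits=3)

splits = list(kf.split(data))
for train_index, test_index in splits:
    # These are index arrays; use them to look up the actual values.
    test_values = [data[i] for i in test_index]
    print(train_index, test_index, test_values)
```

So the 0 in the output refers to the first element of the list (the value 1), not to a new data point.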
Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced
He did folds = StratifiedKFold(), and said that he will use it because it is better than KFold
but at 14:20, he used kf.split, where kf is KFold.
I think he forgot to use StratifiedKFold.
yeah, i noticed that
After watching so many different ML tutorial videos, and literally so many, I have just one thing to say: the way you teach is literally the best among all of them.
Name any famous one like Andrew Ng or sentdex, but you literally need to have prerequisites to understand their videos, while yours are a treat to the viewers, explained from the very basics and slowly going up and up. And those exercises are like the cherry on top.
Never change your teaching style sir, yours is the best one.👍🏻
14:15 - Here instead of kf.split() we should use folds.split(). Am I correct??
Yes. My notebook has a correction. Check that on GitHub link I have provided in video description
Yes, and just to add to it, StratifiedKFold requires both X and the y labels in its split method. Stratification is done based on the y labels.
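For anyone who wants to try the corrected loop, here is a minimal sketch of what it might look like. It assumes the digits dataset from the video and uses a default SVC as a stand-in model; it is not a copy of the corrected notebook.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

digits = load_digits()
folds = StratifiedKFold(n_splits=3)

scores = []
# folds.split needs the labels too, because it balances the classes per fold.
for train_index, test_index in folds.split(digits.data, digits.target):
    X_train, X_test = digits.data[train_index], digits.data[test_index]
    y_train, y_test = digits.target[train_index], digits.target[test_index]
    model = SVC()  # stand-in model; swap in any classifier
    model.fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

print(scores)
```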
what an amazing explanation. Finally! I understood cross validation concept so clearly. Thank You so much.
Glad it was helpful!
After Parameter Tuning Using Cross Validation = 10 and taking average
Logistic Regression = 95.34%
SVM = 97.34%
Decision Tree = 95.34 %
Random Forest Classifier = 96.67 %
Performance = SVM > Random Forest > Logistic ~ Decision
after taking cv=5 and C=6 svm is 98.67%
@@manu-prakash-choudhary After 50 splits 😎😎
Score of Logistic Regression is 0.961111111111111
Score of SVM is 0.9888888888888888
Score of RandomForestClassifier is 0.973111111111111
Sir
You used KFold(kf) instead of StratifiedKFold(folds) in the video
Will there be any difference in the scores if we use stratified KFold?
There is slight difference in the scores
for me, SVM's score is almost 99 every time
Hey bro how are you?
good to see you.
@@computingpanda1629 bro, you are here too 😂🤣🤣 Did you come to study machine learning? 😂
Then maybe ur overfitting the data 😂😂
Same here😅
I love that you go through the example the hard way and introduce the cross validation after
Exercise solution: github.com/codebasics/py/blob/master/ML/12_KFold_Cross_Validation/Exercise/exercise_kfold_validation.ipynb
Complete machine learning tutorial playlist: ruclips.net/video/gmvvaobm7eQ/видео.html
We needed to use mean() with cross validation to get the average accuracy score. I'm guessing you forgot to add it. Anyway, the video is pretty good and in depth. Keep producing such videos.
@20:39 of the video, noticed something interesting, by default "cross_val_score()" method generates 3 kfolds... but the default has now changed from 3 to 5 :))
thanks man, i was worried when mine was showing 5 folds results. i thought something was wrong w my code.
@@gandharvsaxena8841 Me too lol, I was wondering why I am getting 5
Thankyou man!!
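A quick way to check this on your own install, assuming scikit-learn 0.22 or newer and the digits dataset; passing cv explicitly reproduces the 3 scores from the video:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

digits = load_digits()

# No cv argument: recent scikit-learn versions use 5 folds.
default_scores = cross_val_score(SVC(), digits.data, digits.target)

# Explicit cv=3 gives three scores, as in the video.
three_fold_scores = cross_val_score(SVC(), digits.data, digits.target, cv=3)

print(len(default_scores), len(three_fold_scores))
print(three_fold_scores.mean())  # average accuracy over the folds
```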
Great video, as usual. Quick question: How were you able to get such low scores for svm? I ran it a couple of times and was getting in the upper 90's. So, I set up a for loop, ran 1000 different train_test_split iterations through svm and recorded the lowest score. It came back 97.2%!
My results (with final average):
L. Regression --> 97.33%
Decision Tree --> 96.66%
SVM --> 98.00% [THE WINNER]
Random Forest --> 96.66%
right same here
Same, but I tuned svm with kernel = linear and got 99.33%
@@jaihind5092 Pretty good, man!! 👏🏻
Couldn't ask for a better teacher to teach machine learning. Truly exceptional !!!!Thank You so much for all your efforts.
Hi, I'm from Malaysia. I came across your video and I am glad I did. It is super easy to understand and I'm currently preparing to learn deep learning. I have already watched your Python and Pandas videos, and am currently on the ML videos. Thank you for making all these videos. You are making our life easier Sir.
Sincerely, your student from Malaysia.
In line [33] of your code, the "kf" should be replaced with "folds" since "folds" is the object from the StratifiedKFold() class :) Am I right?
I have the same question. I tried using folds object to split meaning, instead of kf.split(digits.data), I tried folds.split(digits.data) to compare both the results for all the models but it gave me an error. "split() missing 1 required positional argument: 'y' ". To rectify this, I gave digits.target and it worked!
Your videos are AMAZING man!!! I have already recommended these videos to my colleagues in my University who are taking a Machine Learning course. They are also loving it...!!! Keep it up champ!
Mast pelluri, I am glad you liked it and thanks for recommending it to your friends 🙏👍
Is K-Fold Cross Validation an optimization technique or not?
Thank you Sir for this awesome explanation. Iris Dataset Assignment Score
Logistic Regression [96.07% , 92.15% , 95.83%]
SVM [100% , 96.07% , 97.91%] (Kernel='linear')
Decision Tree [98.03 %, 92.15% , 100%]
Random Forest [98.03% , 92.15% , 97.91%]
Conclusion: SVM works best for me.
pretty ironic and yet amusing at the same time
Sir, after running from sklearn.model_selection import train_test_split
X_train, X_tetrst, y_train, y_test = train_test_split(digits.data,digits.target,test_size=0.3)
for the second time gives me an error:
ValueError: Found input variables with inconsistent numbers of samples: [540, 599]
How do they deal with this error????
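One possible cause, assuming the earlier KFold cells from the video were run in the same notebook: the pasted line says X_tetrst instead of X_test, so X_test still holds the last 3-fold test set (599 rows) while y_test is replaced by the new split (540 rows). A sketch that reproduces the mismatch:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, train_test_split

digits = load_digits()

# Leftovers from an earlier 3-fold loop: the last test fold has 599 rows.
for train_index, test_index in KFold(n_splits=3).split(digits.data):
    X_test = digits.data[test_index]
    y_test = digits.target[test_index]

# The typo "X_tetrst" means X_test is NOT overwritten here, but y_test is.
X_train, X_tetrst, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3)

print(len(X_test), len(y_test))  # 599 540
```

If that is what happened, fixing the variable name to X_test and re-running the cell should make the two lengths agree again.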
In the exercise the maximum score is obtained by SVM with gamma=auto and kernel=linear, and the score is array([1. , 1. , 0.98]) 😀
You import StratifiedKFold but for splitting use KFold only?
On line numbers 32-33, for StratifiedKFold it will be written as
for i, j in sf.split(X, y):
    pass
It takes 2 arguments, not 1.
Please provide another video on this topic and pin it.
I don't understand this method on the kid that has to take the test on the very start. How would that apply to that particular real life example?
Also not sure if shuffling KFold or getting StratifiedKFold is better?
Sir, SVM performance is high as compared to other algo after changing parameter gamma='scale' for the given example of digits dataset
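This probably also explains the many "why is your SVM score so low" comments: the video passes gamma='auto', while gamma='scale' has been the default since scikit-learn 0.22. A small sketch to compare the two on the digits dataset (cv=3 is an arbitrary choice here):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

digits = load_digits()

# gamma='auto' uses 1 / n_features and ignores the spread of the pixel values.
auto_score = cross_val_score(
    SVC(gamma='auto'), digits.data, digits.target, cv=3).mean()

# gamma='scale' also divides by the variance of X.
scale_score = cross_val_score(
    SVC(gamma='scale'), digits.data, digits.target, cv=3).mean()

print('gamma=auto :', auto_score)
print('gamma=scale:', scale_score)
```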
After watching the video for 25 mins, I realized that the last 5mins were the most important😄
I have never seen anyone who can explain Machine Learning and Data Science so easily..
I used to be scared in Machine Learning and Data science, then after seeing your videos, I am now confident that I can do it by myself. Thank you so much for all these videos....
👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏
Happy to help
Useful for identifying many different types of categories.
Great stuff indeed. I'm learning machine learning from scratch and this was very helpful. Keep up the good work, kudos!
Hi Sir, your explanation is very good. I need a small clarification: you created an object for StratifiedKFold as folds and did not use it in that example. That's fine, I will do it by myself. But let me know how cross_val_score got a split size of 3? Was it just because we assigned it earlier?
Nope, cross_val_score used 3 folds by default in older scikit-learn versions (the default became 5 in version 0.22); you can check it in the documentation. If you want to change the number of folds, just pass the parameter: cross_val_score(model, X, y, cv=n_folds_you_want)
@@thiagomunich Thanks Thiago for helping Mahesh with his question
@@codebasics How does StratifiedKFold come into action in this case?
@@avesharora yeah he forgot to use Stratified Fold in this case.
@@avesharora use cv = StratifiedKFold(n_splits = 4) as a hyperparameter in Cross_val_score
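To add to the replies above: for a classifier, an integer cv already makes cross_val_score use StratifiedKFold internally, so passing the splitter object explicitly should give the same scores. A minimal check, assuming the digits dataset and a default SVC:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

digits = load_digits()
X, y = digits.data, digits.target

# An integer cv with a classifier is stratified behind the scenes.
int_cv_scores = cross_val_score(SVC(), X, y, cv=4)

# Passing the splitter object explicitly.
splitter_scores = cross_val_score(SVC(), X, y, cv=StratifiedKFold(n_splits=4))

print(int_cv_scores)
print(np.allclose(int_cv_scores, splitter_scores))
```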
Is the accuracy we are getting for the train part or the test part? If it is for the test part, then how can we check the train accuracy?
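The scores from cross_val_score are computed on the held-out (test) fold of each split. If you also want the accuracy on the training folds, one option is cross_validate with return_train_score=True; a sketch assuming the digits dataset and a default SVC:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

digits = load_digits()

results = cross_validate(
    SVC(), digits.data, digits.target, cv=3, return_train_score=True)

print('test accuracy per fold: ', results['test_score'])
print('train accuracy per fold:', results['train_score'])
```

A train score far above the test score is a hint of overfitting.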
My svm was giving 99.25% accuracy
I checked again and again and don't know how it's happening
Your svm gives just 45%
That’s the way to go varun, good job working on that exercise
Thanks man! You're really helping me out finishing my university project in machine learning.
Christian I am glad to hear you are making a progress on your University project 😊 I wish you all the best 👍
Thank you very much. Very nice explanation. My scores, after taking averages, are as follow:
LogisticRegression (max_iter=200) = 97.33%
SVC (kernel = poly) = 98.00%
DecisionTreeClassifier = 96%
RandomForestClassifier (n_estimators=300) = 96.67%
mine too...
For the parameter tuning this helps. Just play a bit with the indexes, since lists start from 0 and n_estimators from 1, to match them up.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

digits = load_digits()
scores = []
avg_scores = []
n_est = range(1, 5)  # example
for i in n_est:
    model = RandomForestClassifier(n_estimators=i)
    score = cross_val_score(model, digits.data, digits.target, cv=10)
    scores.append(score)
    avg_scores.append(np.average(score))
    print('avg score:{}, n_estimator:{}'.format(avg_scores[-1], i))
avg_scores = np.asarray(avg_scores)  # convert the list to array
best = np.argmax(avg_scores)  # list index; n_estimators is best + 1
print('Average accuracy score is {} for n_estimators={} calculated from following accuracy scores:\n{}'.format(np.amax(avg_scores), best + 1, scores[best]))
plt.plot(n_est, avg_scores)
plt.xlabel('number of estimators')
plt.ylabel('average accuracy')
44 was the best for me
Thanks sir! Your tutorials are really helpful for me. Hope I'm gonna see all of them and make my transition from mechanical to AI successful 😊.
Why is your SVM score so low? I have checked many times and it comes out greater than 90%. Can anyone please check?
Good job ajmal, that’s a pretty good score. Thanks for working on the exercise
What is the score ?
Cross validation is about validation of ONE model.
After validating the model and getting its parameters, you should choose a method to compare it with other models and select the appropriate model.
- Training set: A set of examples used for learning, that is to fit the parameters of the classifier.
- Validation set: A set of examples used to tune the parameters of a classifier, for example to choose the number of hidden units in a neural network.
- Test set: A set of examples used only to assess the performance of a fully-specified classifier.
You can use cross validation to compare multiple models too. Basically just run kfold on multiple models, or on the same model with different parameters, and compare the scores.
Dear Sir
Another great explanation as always.
Thank you very much for that.
By adding the following code svm started showing very good scores!
X_train = preprocessing.scale(X_train)
X_test = preprocessing.scale(X_test)
Have I done the correct thing?
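Scaling does help SVC here, but calling preprocessing.scale separately on X_train and X_test standardizes each with its own mean and variance. A common alternative, sketched below with assumed settings (random_state=0, gamma='auto'), is to fit a StandardScaler on the training data only and reuse it for the test data, which a pipeline does for you:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0)

# The scaler learns mean/variance from X_train during fit,
# then applies the same transformation to X_test inside score().
model = make_pipeline(StandardScaler(), SVC(gamma='auto'))
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print(score)
```

The same pipeline object can be passed straight to cross_val_score, so the scaler is refitted on each training fold.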
I have one question here: why is this cross_val_score function returning 3 scores? We haven't passed any number of folds and I cannot find any default number of folds.
I came here in comment section to find this question. Please tell me if you already know the reason for that.
@@Natur_Deutschland U can control the number of scores by passing 'cross_val_score(lasso, X, y, cv=3)' cv like this
In line 37, do we need to specify split(digits.data, digits.target) instead of only split(digits.data)? Trying the latter yields an error in my case.
Totally agree with you, can we just split(digits) ?
KFold() needs only the X as an argument but StratifiedKFold() needs both X and y as arguments. it will be like
split(digits.data)====> in case of KFold
split(digits.data, digits.target) ====> in case of StratifiedKFold()
cheers!
Hi, one question though, can't we use in train_test_split method the 'random_state' to get the same score for any model? Could you please be kind to confirm why the accuracy of SVM got changed after executing the second time from 40% to 62% since we gave the same data to other models. Thank you
Bump, I've used random state on train_test_split and I always get consistent results.
Andrés Espinal random state maintains the state of train and test split, so you will get constant results
If you specify any number for the random_state option you will get the same result every time. Set it to None or simply don't include it if you want to randomize your training process.
Really good explanation. You are an expert. I have a question: is it possible to select the test_size in cross-validation? When I use, for example, KFold with 3 splits, it splits the whole data into three parts. But is it possible to make these three splits using 2 test samples and 7 training samples?
In my case Logistic regression won with 2 hundred percent in it!
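A short check of that behaviour, assuming the digits dataset and an arbitrary seed of 10:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()

# Same random_state twice: the two splits contain exactly the same rows.
split_a = train_test_split(digits.data, digits.target,
                           test_size=0.3, random_state=10)
split_b = train_test_split(digits.data, digits.target,
                           test_size=0.3, random_state=10)

same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same_split)
```

With identical splits, a deterministic model such as SVC gives the same score on every run.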
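KFold itself does not take a test size, since the fold size follows from n_splits. One option that does is ShuffleSplit, which draws random train/test splits of a chosen size. A minimal sketch with 9 samples, 2 for testing and 7 for training (random_state=0 is an arbitrary choice):

```python
from sklearn.model_selection import ShuffleSplit

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Three random splits, each with 7 training and 2 test samples.
ss = ShuffleSplit(n_splits=3, train_size=7, test_size=2, random_state=0)
splits = list(ss.split(data))

for train_index, test_index in splits:
    print(train_index, test_index)
```

Unlike KFold, the test sets of different splits may overlap, and some samples may never be used for testing. The ss object can be passed as cv to cross_val_score.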
your tutorial are saving my life
This is the most accurate SVC model I get, tell me if it is right?
scores_svm=cross_val_score(SVC(gamma='auto',kernel='linear'), iris.data, iris.target,cv=3)
print(scores_svm)
np.average(scores_svm)
result:
[1. 1. 0.98]
0.9933333333333333
17:29 Sir, I think you did not use stratified k-fold CV. You actually used k-fold cross validation instead.
Why is your SVC() showing a ~40% score, while in my code it always shows ~99% and the prediction is also best?
kernel checked = linear as well as rbf and C = 1.0
In my case:-
#Using Logistic Regression
lr=LogisticRegression()
lr.fit(X_train,y_train)
lr.score(X_test,y_test)
Accuracy:- 0.9703703703703703
#Using SVC
sm=SVC()
sm.fit(X_train,y_train)
sm.score(X_test,y_test)
Accuracy:- 0.9907407407407407
#USing Random Forest
rf=RandomForestClassifier()
rf.fit(X_train,y_train)
rf.score(X_test,y_test)
Accuracy:- 0.9851851851851852
by making df method:
mean(cross_val_score(LogisticRegression(max_iter=200), X,y))
0.9733
mean(cross_val_score(SVC(kernel='linear'),X,y))
0.98
mean(cross_val_score(RandomForestClassifier(n_estimators=40), X, y))
0.96
by using iris.data and iris.target directly:
np.average(score_lr)
0.95333
np.average(score_svm)
0.98000001
np.average(score_rf)
0.95333333
LogisticRegressionClassifier =100%
SVC (kernel="poly") =97%
DecisionTreeClassifier = 97%
RandomForestClassifier(n_estimators=30) =97% for every increase in n_estimators
ValueError: y should be a 1d array, got an array of shape (599, 64) instead. when i run this code (scores_logistic.append(get_score(LogisticRegression(solver='liblinear',multi_class='ovr'), X_train, X_test, y_train, y_test))
scores_svm.append(get_score(SVC(gamma='auto'), X_train, X_test, y_train, y_test))
scores_rf.append(get_score(RandomForestClassifier(n_estimators=40), X_train, X_test, y_train, y_test)))
Hello I have a doubt. Suppose my KNN algo works best, and scores are .98/.99/.95. Then how do I select train and test of which i got .99???
In my iris exercise the LogisticRegression model has the highest performance (0.98, 0.96, 0.98),
ahead of SVC, RandomForestClassifier and DecisionTreeClassifier.
RandomForestClassifier is in 2nd place.
DecisionTreeClassifier is in 3rd place.
SVC is in 4th place.
Logistic Regression = Tuning Regularization C=0.1 = 94%
SVC = 96.66%
RandomForestClassifier = Tuning n_estimators=15 = 96%
SVC~RandomForest > Logistic Regression
So kf.split() is not splitting the data as I thought, it is just splitting the index numbers
that approach of doing the manual method of what cross_val_score is doing in the background and then introducing the method! God send! Brilliant. Brilliant I say!
Sir, I have performed cross_val_score on (LogisticRegression, SVC, tree.DecisionTreeClassifier, RandomForestClassifier) and I got almost the same values from every one of the models
The mean of the score of Linear Regression =0.32256072489000853, Random Forest Classifier =0.96, Support Vector Machine 0.9666666666666666
This is my result of iris datasets using cross k fold
LogisticRegression=97.3%(MAX score)
SVM=96%
RandomForest=94%
DecisionTree=96%
Hi, thank you for the tutorial. I believe it was very helpful, but I believe you made a mistake at minute 13:09. When you copied the code from the original KFold kf.split([]) you forgot to switch the "kf" variable for "folds". I believe that's why the Stratified K-Fold results were very similar to KFold.
Bro, I got a 73% accuracy score for the diabetes dataset using AdaBoost with a Decision tree as base learner, but without AdaBoost I am getting 75% accuracy. Which should I use? If I am wrong, what should be done?
00:02 K fold cross validation helps determine the best machine learning model for a given problem.
02:20 K-fold cross validation provides a more robust evaluation of machine learning models.
04:36 Classifying handwritten characters into ten categories using different algorithms and evaluating performance using k-fold cross validation.
07:06 K-fold cross validation helps in more robust model evaluation.
09:43 K-fold cross validation divides data into training and testing sets for iterative model evaluation.
12:35 Stratified k-fold ensures uniform distribution of categories for better model training.
15:42 Measuring the performance of models in each iteration
18:29 Parameter tuning in random forest classifier improves scores.
20:46 K Fold Cross Validation helps measure the performance of machine learning models.
23:18 Cross-validation helps in comparing algorithms and finding the best parameters for a given problem.
25:18 K Fold Cross Validation helps in assessing the model's performance.
Crafted by Merlin AI.
Finally a video explaining X_train, X_test, y_train, y_test. Thank you!
hi, I got SVC>DECISION_TREE>RANDOM_FOREST>LR, --SVC =97%,DT=96%,,RF=95%,LR = 94%,
When I'm performing the KFold it is showing this error
Found input variables with inconsistent numbers of samples: [1198, 599]
Using the K Fold Method, the data was split multiple times into X_train s and y_train s but remained constant for each method for each split.
Is it the same case in the cross_val_score method? Isn't the splitting taking place differently for each method? So basically the models are trained on different X_train s and y_train s
Thank you so much for the clear explanation.
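As far as I know, with an integer cv and no shuffling the folds are deterministic, so every model sees the same splits. To make this explicit, one option is to build a single splitter with a fixed random_state and pass it to every cross_val_score call. A sketch assuming the iris dataset and three example models (the seeds and parameters are arbitrary):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One splitter with a fixed seed, shared by all models.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Calling split twice yields identical test folds.
first = [test for _, test in cv.split(X, y)]
second = [test for _, test in cv.split(X, y)]
same_folds = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same_folds)

models = {
    'logistic': LogisticRegression(max_iter=1000),
    'svm': SVC(),
    'random_forest': RandomForestClassifier(n_estimators=40, random_state=0),
}
mean_scores = {name: cross_val_score(model, X, y, cv=cv).mean()
               for name, model in models.items()}
print(mean_scores)
```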
@codebasics after applying cross_val_score why did it give only 3 results. why not more?
Useful for identifying many different types of categories.
Thank for the very useful and free tutorial series. Salute to you sir!
🌹 You are way way... way better than all of my Machine learning professor at school!
when i tried for svm i got score around 98.43 but sir got 40 ... why its happened please someone clarify it.
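A possible explanation, stated as an assumption because the library version used in the video is not given here: older scikit-learn releases defaulted SVC to gamma='auto', while releases from 0.22 onward default to gamma='scale', and on the unscaled digits pixels the two behave very differently. A sketch that compares them on one split:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0
)

# gamma="auto" uses 1 / n_features; gamma="scale" also accounts for feature variance.
auto_score = SVC(gamma="auto").fit(X_train, y_train).score(X_test, y_test)
scale_score = SVC(gamma="scale").fit(X_train, y_train).score(X_test, y_test)

print("gamma='auto': ", auto_score)
print("gamma='scale':", scale_score)
```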
LogisticRegression was the best model in the Iris dataset
I got an accuracy of 97.3%, compared to other models such as SVM and RandomForestClassifier.
At 14:21 you forgot to change kf.split() to folds.split().
And thus I think you didn't demonstrate StratifiedKFold.
Just wanted to let you know.
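For reference, a sketch of the loop with the stratified splitter actually in use. Unlike KFold.split, StratifiedKFold.split needs the labels as a second argument so it can balance the classes in each fold. The RandomForestClassifier settings are illustrative, not taken from the video:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

digits = load_digits()
folds = StratifiedKFold(n_splits=3)

scores = []
# Pass both features and labels: the splitter uses the labels to stratify.
for train_index, test_index in folds.split(digits.data, digits.target):
    X_train, X_test = digits.data[train_index], digits.data[test_index]
    y_train, y_test = digits.target[train_index], digits.target[test_index]
    model = RandomForestClassifier(n_estimators=40, random_state=0)
    model.fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

print(scores)
```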
best explanation... i like the way u give examples using small data to explain how it actually works. 10:20
no one explains like this... keep doing great work
Glad you liked it
@@codebasics
I have applied K fold on the linear regression's dataset
I used different activation functions & then I get mean & se values
How to pick the best model from the k folds?
Sir, I am performing cross validation on the iris dataset from Kaggle, but when I use the cross-validation technique there is a ValueError: found input variables with inconsistent numbers of samples. Sir, how do I deal with this issue? Please explain.
def avg(nums):
    # Add up every score, then divide by how many scores there are.
    total = 0
    for score in nums:
        total = total + score
    return total / len(nums)
# This is the code if you want to get the average of the list. To use it, just call:
avg(scores_l)
Using the same datasets makes it less interesting, but your tutorials are awesome. Every tutorial has pluses and minuses: yours are more structured, but the minus point is the reuse of the same dataset, which reduces the interest in going on to the next one.
My n_splits value is 3, but the cross_val_score function gives me 5 outputs. Why?
Why have we not used train_test_split with cross_val_score? Can anyone explain?
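A sketch of what may be happening, assuming the KFold object was created but never handed to cross_val_score: the function does not know about splitters it is not given and falls back to its own default for cv (5 folds in scikit-learn 0.22 and later, 3 in earlier releases). Passing cv explicitly controls the number of scores. DecisionTreeClassifier and the iris data are stand-ins:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# No cv argument: the library default decides how many scores come back.
default_scores = cross_val_score(model, X, y)

# Either an integer or a splitter object sets the number of folds explicitly.
three_scores = cross_val_score(model, X, y, cv=3)
kf = KFold(n_splits=3, shuffle=True, random_state=0)
kf_scores = cross_val_score(model, X, y, cv=kf)

print(len(default_scores), len(three_scores), len(kf_scores))
```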
You make exquisite content, I'd love to see more!
Thanks for the video! I have a question: when you do the cross validation inside the for loop, you use the same folds for all the methods. Does cross_val_score do the same? If not, is it possible to use the same folds in order to get a more accurate comparison?
Thanks in advance
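As far as I understand it, when cv is an integer cross_val_score builds an unshuffled KFold (or StratifiedKFold for classifiers), so the folds depend only on the data and come out the same for every model. To make this explicit, one splitter object with a fixed random_state can be passed to each call. A sketch with two illustrative models:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One splitter with a fixed seed, reused for every model.
shared_folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}
mean_scores = {
    name: cross_val_score(model, X, y, cv=shared_folds).mean()
    for name, model in models.items()
}
print(mean_scores)
```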
I'm doing kf.split, but when printing the score it says data cardinality is ambiguous. Since I did a train test split earlier, I believe there is an unequal number of samples in x and y. I would like some help to understand why the error is coming up.
You imported StratifiedKFold, but where did you use it on the above dataset?
And in KFold, I think they are just dividing the array based on index number. So how can StratifiedKFold, without seeing the values in the target class, divide the data so that every fold gets an equal number of each class?
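On the second question: StratifiedKFold does see the target values, because they are passed to split() alongside the features. A small sketch on the iris data (which is stored sorted by class) that counts the classes in each test fold for both splitters:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, StratifiedKFold

X, y = load_iris(return_X_y=True)  # 150 samples, sorted by class: 50 / 50 / 50

# KFold only sees positions, so on sorted data a test fold can hold one class.
kfold_counts = [np.bincount(y[test], minlength=3).tolist()
                for _, test in KFold(n_splits=3).split(X)]

# StratifiedKFold receives y in split(), so it can balance every fold.
strat_counts = [np.bincount(y[test], minlength=3).tolist()
                for _, test in StratifiedKFold(n_splits=5).split(X, y)]

print(kfold_counts)  # [[50, 0, 0], [0, 50, 0], [0, 0, 50]]
print(strat_counts)  # five folds of [10, 10, 10]
```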
Support vector classifier works well for me, why does it perform weak in your classification? Any clarification?
love indian accent
Kindly tell me how cross validation is used to find error values like MSE, RMSE, MAE, and RMAE for a regression algorithm like linear regression on a dataset with continuous values.
Every time I use this code I get negative values. I'm using an SVR model. Can anyone please help me?
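A sketch for the two regression questions above. scikit-learn's scorers follow a "higher is better" convention, so error metrics are exposed as negated values such as neg_mean_squared_error, and negative numbers are expected; flip the sign to read them as errors. (The default score for regressors is R², which can also be negative for a poor fit.) LinearRegression and the bundled diabetes dataset are used here only for illustration:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
model = LinearRegression()

# The scorer returns NEGATED errors, one value per fold.
neg_mse = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
mse = -neg_mse                 # flip the sign to get the usual MSE
rmse = np.sqrt(mse)            # RMSE per fold

mae = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")

print("MSE: ", mse.mean())
print("RMSE:", rmse.mean())
print("MAE: ", mae.mean())
```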
My SVM is 0.98 without k-fold cross-validation. why?
Thanks for all the lectures, but you wanted to use StratifiedKFold yet used normal KFold.
0.966667 accuracy with 50-fold CV for SVM; it's better compared to the others on the iris dataset.
if we use 'cross_val_score' function, how can we set the training & testing data ratio? I checked the parameter and didn't see any parameter that can set the ratio.
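As far as I know there is no ratio parameter as such; the split is determined by cv. With cv=k, each fold tests on roughly 1/k of the data and trains on the rest (cv=5 gives about 80/20). For an arbitrary ratio, one option is to pass a ShuffleSplit object, sketched below with an illustrative 70/30 split and a stand-in model:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # 150 samples

# 5 random splits, each holding out 30% of the data for testing.
splitter = ShuffleSplit(n_splits=5, test_size=0.3, random_state=0)
train_index, test_index = next(splitter.split(X))
print(len(train_index), len(test_index))

scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=splitter)
print(scores)
```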
A very informative video on ML but in my execution, I am getting a warning of iteration reached the limit. Can you help me out with this?
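Assuming the warning is LogisticRegression's ConvergenceWarning about reaching the iteration limit (the comment does not say which model raised it), two common remedies are raising max_iter and scaling the features. A sketch with illustrative settings:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = load_digits()

# Scaling helps the solver converge; max_iter raises the iteration budget.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, digits.data, digits.target, cv=3)
print(scores.mean())
```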
If we set cross validation to 150 folds, then how many samples are used for training and testing?
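Assuming the dataset has 150 samples (as iris does), 150 folds means each round trains on 149 samples and tests on the remaining one, which is leave-one-out cross-validation. A sketch with KFold; note that cross_val_score(..., cv=150) on a classifier would use stratified folds and, with only 50 samples per class, would be expected to raise an error:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)  # 150 samples

kf = KFold(n_splits=150)
# Collect the distinct (train size, test size) pairs over all 150 folds.
fold_sizes = {(len(train), len(test)) for train, test in kf.split(X)}
print(fold_sizes)  # {(149, 1)}
```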
Thank you very much for the nice explanation. I have one question in this context: Isn't it necessary to use in train_test_split method the 'random_state' to get the same score for any model?
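A small check of this: with a fixed random_state, train_test_split returns the same split on every call, so a deterministic model gets the same score; without it, each run shuffles differently. The seed values below are arbitrary:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same random_state -> identical split on every call.
_, X_test_a, _, y_test_a = train_test_split(X, y, test_size=0.3, random_state=10)
_, X_test_b, _, y_test_b = train_test_split(X, y, test_size=0.3, random_state=10)
same_split = np.array_equal(X_test_a, X_test_b)

# A different seed will almost certainly give a different split.
_, X_test_c, _, _ = train_test_split(X, y, test_size=0.3, random_state=11)
different_split = not np.array_equal(X_test_a, X_test_c)

print(same_split, different_split)
```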
How does KFold solve underfitting and overfitting? Can you please explain briefly?
Sir, in cell 33, what is that red \ backslash?
That is a line-continuation character. Python normally treats the end of a line as the end of a statement, so without the backslash it would read the next line as a separate line of code. The backslash tells Python that what you type on the next line is still part of the line above. He only did that so the code doesn't run off the right edge of the screen.
Good question! I tried using this backslash but it gave me an error. I am on Python 3. Any ideas, @codebasics?
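One common cause of that error, offered as a guess since the failing code is not shown: the backslash must be the very last character on the line, and even a trailing space after it is a syntax error. A sketch in plain Python:

```python
# A backslash as the very last character joins this line with the next one.
total = 1 + 2 + \
        3 + 4

# Parentheses allow the same line break without any backslash.
total_in_parens = (1 + 2 +
                   3 + 4)

# A space (or anything else) after the backslash is a syntax error.
bad_source = "x = 1 + \\ \n    2\n"
try:
    compile(bad_source, "example", "exec")
    error_text = ""
except SyntaxError as err:
    error_text = err.msg

print(total, total_in_parens)
print(error_text)
```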
So load_digits is a function, right? I'm confused as to how it's a dataset. And where are the features and target being differentiated?
At 15:05 you used kf.split in the for loop. Is that correct? Because now we are using StratifiedKFold, the instance for that is folds. Should it be folds.split?
I have a little problem. In the original dataset the values were from 1-9, but in the train and test splits there were 0s in them. How does the k-fold generate them when they are not in the original dataset?
Your videos on machine learning are way better than any paid online videos, so keep growing.
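A short sketch of what the call returns: load_digits is a function, and calling it gives back a dictionary-like object whose data attribute holds the features and whose target attribute holds the labels:

```python
from sklearn.datasets import load_digits

digits = load_digits()          # calling the function returns the dataset object
print(type(digits).__name__)    # Bunch
print(digits.data.shape)        # (1797, 64): one row of 64 pixel features per image
print(digits.target.shape)      # (1797,): one label (the digit) per image
print(digits.target[:10])
```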
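A sketch of the likely explanation: KFold.split yields positions (0-based indices) into the data, not the values themselves, so a 0 means "the first element". Indexing the array with those positions gives back the original values:

```python
import numpy as np
from sklearn.model_selection import KFold

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
kf = KFold(n_splits=3)

splits = list(kf.split(data))
for train_index, test_index in splits:
    # These are positions (0-based), not the values themselves.
    print("indices:", train_index, test_index)
    print("values: ", data[train_index], data[test_index])
```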