Machine Learning Tutorial Python 12 - K Fold Cross Validation

  • Published: Oct 1, 2024

Comments • 597

  • @codebasics
    @codebasics  2 years ago +3

    Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced

  • @The_TusharMishra
    @The_TusharMishra 8 months ago +6

    He created folds = StratifiedKFold() and said he would use it because it is better than KFold,
    but at 14:20 he used kf.split, where kf is the KFold object.
    I think he forgot to use StratifiedKFold.

  • @MrSparshtiwari
    @MrSparshtiwari 3 years ago +120

    After watching so many different ML tutorial videos, I have just one thing to say: the way you teach is the best among all of them.
    Other famous ones like Andrew Ng or sentdex require prerequisites to understand their videos, while yours are a treat for viewers, explained from the basics and slowly building up. And those exercises are the cherry on top.
    Never change your teaching style, sir; yours is the best one.👍🏻

  • @anujvyas9493
    @anujvyas9493 4 years ago +43

    14:15 - Here instead of kf.split() we should use folds.split(). Am I correct??

    • @codebasics
      @codebasics  4 years ago +14

      Yes. My notebook has a correction. Check the GitHub link I have provided in the video description.

    • @Thedevineforce
      @Thedevineforce 4 years ago +12

      Yes, and just to add to it: StratifiedKFold requires both X and y (the labels) in its split method. Stratification is done based on the y labels.
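
A minimal sketch of the difference discussed in this thread, assuming the iris dataset (chosen here because its labels are stored sorted by class, which makes the effect easy to see; the video itself uses digits). The names `kf` and `folds` mirror the ones mentioned in the comments.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, StratifiedKFold

X, y = load_iris(return_X_y=True)  # 150 samples, 3 classes of 50, sorted by class

kf = KFold(n_splits=3)               # splits by position only
folds = StratifiedKFold(n_splits=3)  # splits using the class labels

# KFold.split needs only X; on label-sorted data each test fold holds one class.
kf_counts = [np.bincount(y[test_idx], minlength=3) for _, test_idx in kf.split(X)]

# StratifiedKFold.split needs X and y; each test fold keeps the class proportions.
skf_counts = [np.bincount(y[test_idx], minlength=3) for _, test_idx in folds.split(X, y)]

print("KFold test-fold class counts:", kf_counts)
print("StratifiedKFold test-fold class counts:", skf_counts)
```

Calling `folds.split(X)` without `y` raises the "missing 1 required positional argument: 'y'" error that other commenters report.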

  • @rajnichauhan1286
    @rajnichauhan1286 4 years ago +64

    what an amazing explanation. Finally! I understood cross validation concept so clearly. Thank You so much.

  • @pablu_7
    @pablu_7 4 years ago +22

    After Parameter Tuning Using Cross Validation = 10 and taking average
    Logistic Regression = 95.34%
    SVM = 97.34%
    Decision Tree = 95.34 %
    Random Forest Classifier = 96.67 %
    Performance = SVM > Random Forest > Logistic ~ Decision

    • @manu-prakash-choudhary
      @manu-prakash-choudhary 3 years ago

      after taking cv=5 and C=6 svm is 98.67%

    • @sriram_cyber5696
      @sriram_cyber5696 1 year ago

      @@manu-prakash-choudhary After 50 splits 😎😎
      Score of Logistic Regression is 0.961111111111111
      Score of SVM is 0.9888888888888888
      Score of RandomForestClassifier is 0.973111111111111

  • @AjayKumar-uy3tp
    @AjayKumar-uy3tp 3 years ago +10

    Sir
    You used KFold(kf) instead of StratifiedKFold(folds) in the video
    Will there be any difference in the scores if we use stratified KFold?

    • @Zencreate
      @Zencreate 1 year ago

      There is slight difference in the scores

  • @shashankkkk
    @shashankkkk 3 years ago +22

    for me, SVM's score is almost 99 everytime

  • @beansgoya
    @beansgoya 5 years ago +30

    I love that you go through the example the hard way and introduce the cross validation after

  • @codebasics
    @codebasics  4 years ago +4

    Exercise solution: github.com/codebasics/py/blob/master/ML/12_KFold_Cross_Validation/Exercise/exercise_kfold_validation.ipynb
    Complete machine learning tutorial playlist: ruclips.net/video/gmvvaobm7eQ/видео.html

    • @hemenboro4313
      @hemenboro4313 4 years ago

      We need to call mean() on the cross-validation scores to get the average accuracy. I'm guessing you forgot to add it. Anyway, the video is pretty good and in-depth. Keep producing such videos.

  • @naveenkalhan95
    @naveenkalhan95 4 years ago +7

    At 20:39 in the video I noticed something interesting: the cross_val_score() method used 3 folds by default, but the default has now changed from 3 to 5 :))

    • @gandharvsaxena8841
      @gandharvsaxena8841 3 years ago +2

      thanks man, i was worried when mine was showing 5 folds results. i thought something was wrong w my code.

    • @khalidalghamdi6303
      @khalidalghamdi6303 2 years ago

      @@gandharvsaxena8841 Me too lol, I was wondering why I am getting 5

    • @aadilsstatus8895
      @aadilsstatus8895 2 years ago

      Thankyou man!!
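
To avoid depending on the library default (3 folds in older scikit-learn releases, 5 in newer ones), the number of folds can be passed explicitly. A small sketch, assuming the digits dataset from the video and an `SVC(gamma="scale")` model picked only for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

digits = load_digits()
model = SVC(gamma="scale")  # illustrative model choice, not the video's exact setup

# One score is returned per fold, so cv controls the length of the result.
scores_3 = cross_val_score(model, digits.data, digits.target, cv=3)
scores_5 = cross_val_score(model, digits.data, digits.target, cv=5)

print(len(scores_3), scores_3.mean())
print(len(scores_5), scores_5.mean())
```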

  • @carpingnyland8518
    @carpingnyland8518 2 years ago +6

    Great video, as usual. Quick question: How were you able to get such low scores for SVM? I ran it a couple of times and was getting in the upper 90's. So, I set up a for loop, ran 1000 different train_test_split iterations through SVM and recorded the lowest score. It came back 97.2%!

  • @cindinishimoto9528
    @cindinishimoto9528 4 years ago +9

    My results (with final average):
    L. Regression --> 97.33%
    Decision Tree --> 96.66%
    SVM --> 98.00% [THE WINNER]
    Random Forest --> 96.66%

  • @AltafAnsari-tf9nl
    @AltafAnsari-tf9nl 3 years ago +12

    Couldn't ask for a better teacher to teach machine learning. Truly exceptional !!!!Thank You so much for all your efforts.

  • @nuraishahzainal1660
    @nuraishahzainal1660 2 years ago +7

    Hi, I'm from Malaysia. I came across your video and I am glad I did it. super easy to understand and I'm currently preparing to learn deep learning. already watch your Python, Pandas, and currently ML videos. thank you for making all these videos. you making our life easier Sir.
    Sincerely, your student from Malaysia.

  • @parisapouya6716
    @parisapouya6716 2 years ago +3

    In line [33] of your code, the "kf" should be replaced with "folds" since "folds" is the object from the StratifiedKFold() class :) Am I right?

    • @Zencreate
      @Zencreate 1 year ago +1

      I have the same question. I tried using folds object to split meaning, instead of kf.split(digits.data), I tried folds.split(digits.data) to compare both the results for all the models but it gave me an error. "split() missing 1 required positional argument: 'y' ". To rectify this, I gave digits.target and it worked!

  • @mastijjiv
    @mastijjiv 4 years ago +10

    Your videos are AMAZING man!!! I have already recommended these videos to my colleagues in my University who is taking Machine Learning course. They are also loving it...!!! Keep it up champ!

    • @codebasics
      @codebasics  4 years ago +2

      Mast pelluri, I am glad you liked it and thanks for recommending it to your friends 🙏👍

  • @shehzadahmad9419
    @shehzadahmad9419 2 years ago +1

    Is K-fold cross-validation an optimization technique or not?

  • @pablu_7
    @pablu_7 4 years ago +6

    Thank you Sir for this awesome explanation. Iris Dataset Assignment Score
    Logistic Regression [96.07% , 92.15% , 95.83%]
    SVM [100% , 96.07% , 97.91%] (Kernel='linear')
    Decision Tree [98.03 %, 92.15% , 100%]
    Random Forest [98.03% , 92.15% , 97.91%]
    Conclusion: SVM works the best model for me .

    • @pranjaysingh4161
      @pranjaysingh4161 9 months ago

      pretty ironic and yet amusing at the same time

  • @cedrictchounkeu5219
    @cedrictchounkeu5219 1 year ago +1

    Sir, running
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(digits.data,digits.target,test_size=0.3)
    for the second time gives me an error:
    ValueError: Found input variables with inconsistent numbers of samples: [540, 599]
    How do I deal with this error?

  • @21_koustavbanerjee69
    @21_koustavbanerjee69 1 year ago +1

    In the exercise, the maximum score was achieved by SVM with gamma='auto' and kernel='linear'; the scores are array([1. , 1. , 0.98]) 😀

  • @kishantiwari7739
    @kishantiwari7739 1 year ago +1

    You imported StratifiedKFold but used only KFold for splitting?
    On lines 32-33, for StratifiedKFold it should be written as
    for i, j in sf.split(X, y):
        pass
    It takes 2 arguments, not 1.
    Please provide another video on this topic and pin it.

  • @panagiotisgoulas8539
    @panagiotisgoulas8539 2 years ago +1

    I don't understand this method on the kid that has to take the test on the very start. How would that apply to that particular real life example?
    Also not sure if shuffling KFold or getting StratifiedKFold is better?

  • @ramandeepbains862
    @ramandeepbains862 2 years ago +1

    Sir, SVM performance is high as compared to other algo after changing parameter gamma='scale' for the given example of digits dataset
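
A likely explanation for the gap between the video's low SVM scores and the high scores reported in the comments is the `gamma` setting: newer scikit-learn releases default to `gamma='scale'`, while the video's version behaved like `gamma='auto'`. The sketch below compares the two on the digits dataset; the use of 3 folds is an assumption made to keep it quick.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

digits = load_digits()

mean_scores = {}
for gamma in ("auto", "scale"):
    # Same data and folds; only the RBF kernel width setting changes.
    scores = cross_val_score(SVC(gamma=gamma), digits.data, digits.target, cv=3)
    mean_scores[gamma] = scores.mean()
    print(gamma, scores)
```

On unscaled pixel values, `'auto'` (1 / n_features) tends to give a much weaker model than `'scale'`, which also accounts for the variance of the features.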

  • @arfazkhankhan74
    @arfazkhankhan74 1 year ago +1

    After watching the video for 25 mins, I realized that the last 5mins were the most important😄

  • @beerusreal6
    @beerusreal6 3 years ago +3

    I have never seen anyone who can explain Machine Learning and Data Science so easily..
    I used to be scared in Machine Learning and Data science, then after seeing your videos, I am now confident that I can do it by myself. Thank you so much for all these videos....
    👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏

  • @Gamesational1
    @Gamesational1 2 years ago +1

    Useful for identifying many different types of categories.

  • @knbharath5947
    @knbharath5947 5 years ago +6

    Great stuff indeed. I'm learning machine learning from scratch and this was very helpful. Keep up the good work, kudos!

  • @venkatamaheshvanguru2124
    @venkatamaheshvanguru2124 5 years ago +5

    Hi Sir, your explanation is very good. I need a small clarification: you created a StratifiedKFold object named folds but did not use it in the example. That's fine, I will do it myself. But
    let me know how cross_val_score got a split size of 3. Was it just because we assigned it earlier?

    • @thiagomunich
      @thiagomunich 5 years ago +9

      Nope, cross_val_score get 3 folds by default, you can check it at documentation. If you want to increase the numbers of folds, just pass the parameter: cross_val_score(model, X, y, cv=n_folds_you_want)

    • @codebasics
      @codebasics  5 years ago

      @@thiagomunich Thanks Thiago for helping Mahesh with his question

    • @avesharora
      @avesharora 4 years ago

      @@codebasics How does StratifiedKFold come into action in this case?

    • @late_nights
      @late_nights 4 years ago

      @@avesharora yeah he forgot to use Stratified Fold in this case.

    • @shubhamsd100
      @shubhamsd100 2 years ago

      @@avesharora use cv = StratifiedKFold(n_splits = 4) as an argument in cross_val_score
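
The suggestion above can be sketched as follows, assuming the digits dataset and a random forest with 40 trees (the `random_state=0` is added here only to make the run repeatable):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

digits = load_digits()

# An explicit splitter object can be passed wherever cv accepts an integer.
folds = StratifiedKFold(n_splits=4)
model = RandomForestClassifier(n_estimators=40, random_state=0)

scores = cross_val_score(model, digits.data, digits.target, cv=folds)
print(scores, scores.mean())
```

According to the scikit-learn documentation, passing an integer `cv` with a classifier already uses stratified folds, so the explicit object mainly matters when you want shuffling or a fixed seed.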

  • @hamzanaeem4838
    @hamzanaeem4838 4 years ago +1

    Is the accuracy we are getting for the training part or the test part? If it is for the test part, how can we check the training accuracy?

  • @Enem_Verse
    @Enem_Verse 3 years ago +1

    My SVM was giving 99.25% accuracy.
    I checked again and again and don't know how it's happening.
    Your SVM gives just 45%.

    • @codebasics
      @codebasics  3 years ago

      That’s the way to go varun, good job working on that exercise

  • @christiansinger2497
    @christiansinger2497 4 years ago +4

    Thanks man! You're really helping me out finishing my university project in machine learning.

    • @codebasics
      @codebasics  4 years ago +3

      Christian I am glad to hear you are making a progress on your University project 😊 I wish you all the best 👍

  • @pappering
    @pappering 4 years ago +10

    Thank you very much. Very nice explanation. My scores, after taking averages, are as follow:
    LogisticRegression (max_iter=200) = 97.33%
    SVC (kernel = poly) = 98.00%
    DecisionTreeClassifier = 96%
    RandomForestClassifier (n_estimators=300) = 96.67%

  • @panagiotisgoulas8539
    @panagiotisgoulas8539 2 years ago +2

    For the parameter tuning this helps. Just play a bit with the indexes, since lists start from 0 and n_estimators from 1, to match up the indexes.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    digits = load_digits()
    scores = []
    avg_scores = []
    n_est = range(1, 5)  # example
    for i in n_est:
        model = RandomForestClassifier(n_estimators=i)
        score = cross_val_score(model, digits.data, digits.target, cv=10)
        scores.append(score)
        avg_scores.append(np.average(score))
        # i starts at 1 while list indexes start at 0, hence i-1
        print('avg score:{}, n_estimator:{}'.format(avg_scores[i-1], i))
    avg_scores = np.asarray(avg_scores)  # convert the list to an array
    print('\nAverage accuracy score is {} for n_estimators={} calculated from following accuracy scores:\n{}'.format(np.amax(avg_scores), np.argmax(avg_scores)+1, scores[np.argmax(avg_scores)]))
    plt.plot(n_est, avg_scores)
    plt.xlabel('number of estimators')
    plt.ylabel('average accuracy')
    44 was the best for me

  • @zunairnoor2745
    @zunairnoor2745 1 year ago +2

    Thanks sir! Your tutorials are really helpful for me. Hope I'm gonna see all of them and make my transition from mechanical to AI successful 😊.

  • @ajmalrasheed5412
    @ajmalrasheed5412 3 years ago +1

    Why is your SVM score so low? I have checked many times and it comes out greater than 90%. Could anyone please check?

    • @codebasics
      @codebasics  3 years ago +1

      Good job ajmal, that’s a pretty good score. Thanks for working on the exercise

  • @WahranRai
    @WahranRai 5 years ago +2

    What is the score?
    Cross validation is about validation of ONE model.
    After validating the model and getting its parameters, you should choose a method to compare it with other models and select the appropriate model.
    - Training set: A set of examples used for learning, that is to fit the parameters of the classifier.
    - Validation set: A set of examples used to tune the parameters of a classifier, for example to choose the number of hidden units in a neural network.
    - Test set: A set of examples used only to assess the performance of a fully-specified classifier.

    • @codebasics
      @codebasics  5 years ago

      You can use cross validation to compare multiple models too. Basically, just run k-fold on multiple models, or on the same model with different parameters, and compare the scores.

  • @nilupulperera
    @nilupulperera 4 years ago +2

    Dear Sir
    Another great explanation as always.
    Thank you very much for that.
    By adding the following code svm started showing very good scores!
    X_train = preprocessing.scale(X_train)
    X_test = preprocessing.scale(X_test)
    Have I done the correct thing?
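
Scaling does help an RBF SVM on this data, but scaling the train and test sets separately means the test set is standardized with its own statistics. A common alternative, sketched here under the assumption of a 70/30 split of the digits dataset, is to fit the scaler on the training data only by wrapping it in a pipeline:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0
)

# The pipeline learns the mean and variance from X_train only,
# then reuses them when transforming X_test inside score().
model = make_pipeline(StandardScaler(), SVC(gamma="auto"))
model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print(score)
```

The same pipeline object can be handed to `cross_val_score`, in which case the scaler is refit on the training part of every fold.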

  • @vishalvanpariya1466
    @vishalvanpariya1466 3 years ago +1

    I have one question: why is the cross_val_score function returning 3 scores? We haven't passed any number of folds, and I cannot find any default number of folds.

    • @Natur_Deutschland
      @Natur_Deutschland 3 years ago

      I came here in comment section to find this question. Please tell me if you already know the reason for that.

    • @phantheduy4522
      @phantheduy4522 3 years ago

      @@Natur_Deutschland You can control the number of scores by passing cv, like this: cross_val_score(lasso, X, y, cv=3)

  • @soumyabanerjee6860
    @soumyabanerjee6860 5 years ago +4

    In line 37, do we need to specify split(digits.data, digits.target) instead of only split(digits.data)? Trying the latter yields an error in my case.

    • @weicao4101
      @weicao4101 4 years ago

      Totally agree with you, can we just split(digits) ?

    • @adnanax
      @adnanax 4 years ago

      KFold() needs only the X as an argument but StratifiedKFold() needs both X and y as arguments. it will be like
      split(digits.data)====> in case of KFold
      split(digits.data, digits.target) ====> in case of StratifiedKFold()
      cheers!

  • @yusufsafdar3789
    @yusufsafdar3789 4 years ago +3

    Hi, one question though, can't we use in train_test_split method the 'random_state' to get the same score for any model? Could you please be kind to confirm why the accuracy of SVM got changed after executing the second time from 40% to 62% since we gave the same data to other models. Thank you

    • @seasonz367
      @seasonz367 4 years ago +1

      Bump, I've used random state on train_test_split and I always get consistent results.

    • @anand.prasad502
      @anand.prasad502 4 years ago +1

      Andrés Espinal random state maintains the state of train and test split, so you will get constant results

    • @harryfeng4199
      @harryfeng4199 2 years ago

      If you specify any number for the random_state option, you will get the same result every time. Set it to None or simply leave it out if you want to randomize your training process.
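
The effect of `random_state` described in this thread can be checked with a short sketch. The helper `holdout_score` and the seed value 42 are hypothetical names and values chosen for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()

def holdout_score(seed):
    """Split with a fixed seed, train an SVC, and return the test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, test_size=0.3, random_state=seed
    )
    return SVC(gamma="scale").fit(X_train, y_train).score(X_test, y_test)

first = holdout_score(42)
second = holdout_score(42)   # same seed, same split, same score
print(first, second)
```

Without a fixed seed, each call draws a different split, which is why a single train/test score can jump around between runs.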

  • @hansvasquezm
    @hansvasquezm 1 year ago +1

    Really good explanation. You are an expert. I have a question: is it possible to select the test_size in cross-validation? For example, when I use KFold with 3 splits, it splits the whole data into three parts. Is it possible to make these three splits but with 2 data points for testing and 7 for training?
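
With KFold the test share is fixed at 1/k, but scikit-learn's `ShuffleSplit` lets you choose the number of splits and the test size independently. A minimal sketch, assuming the iris dataset and a 20% test share picked only as an example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Three random splits, each holding out 20% of the samples for testing.
splitter = ShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
sizes = [(len(train_idx), len(test_idx)) for train_idx, test_idx in splitter.split(X)]
print(sizes)

# The splitter can be passed as cv just like a KFold object.
scores = cross_val_score(SVC(), X, y, cv=splitter)
print(scores)
```

Unlike KFold, the test sets of different splits may overlap, so this is repeated random sub-sampling rather than a strict partition.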

  • @muhammedrajab2301
    @muhammedrajab2301 4 years ago +2

    In my case, Logistic Regression won with 2 hundred percent in it!

  • @Hiyori___
    @Hiyori___ 3 years ago +2

    your tutorial are saving my life

  • @jadhavAkshay0701
    @jadhavAkshay0701 4 years ago

    This is the most accurate SVC model I got. Tell me if it is right?
    scores_svm=cross_val_score(SVC(gamma='auto',kernel='linear'), iris.data, iris.target,cv=3)
    print(scores_svm)
    np.average(scores_svm)
    result:
    [1. 1. 0.98]
    0.9933333333333333

  • @adipurnomo5683
    @adipurnomo5683 3 years ago +1

    17:29 Sir, I think you did not use stratified k-fold CV. You actually used plain k-fold cross-validation instead.

  • @sudhiranjangupta7517
    @sudhiranjangupta7517 3 years ago +1

    Why does your SVC() show a ~40% score? In my code it always shows ~99%, and the predictions are also very good.
    Kernels checked: linear as well as rbf, with C = 1.0

  • @beerusreal6
    @beerusreal6 3 years ago

    In my case:-
    #Using Logistic Regression
    lr=LogisticRegression()
    lr.fit(X_train,y_train)
    lr.score(X_test,y_test)
    Accuracy:- 0.9703703703703703
    #Using SVC
    sm=SVC()
    sm.fit(X_train,y_train)
    sm.score(X_test,y_test)
    Accuracy:- 0.9907407407407407
    #USing Random Forest
    rf=RandomForestClassifier()
    rf.fit(X_train,y_train)
    rf.score(X_test,y_test)
    Accuracy:- 0.9851851851851852

  • @adnanax
    @adnanax 4 years ago

    by making df method:
    mean(cross_val_score(LogisticRegression(max_iter=200), X,y))
    0.9733
    mean(cross_val_score(SVC(kernel='linear'),X,y))
    0.98
    mean(cross_val_score(RandomForestClassifier(n_estimators=40), X, y))
    0.96
    by using iris.data and iris.target directly:
    np.average(score_lr)
    0.95333
    np.average(score_svm)
    0.98000001
    np.average(score_rf)
    0.95333333

  • @karishmasewraj6437
    @karishmasewraj6437 2 years ago

    LogisticRegressionClassifier =100%
    SVC (kernel="poly") =97%
    DecisionTreeClassifier = 97%
    RandomForestClassifier(n_estimators=30) =97% for every increase in n_estimators

  • @EmohGame
    @EmohGame 3 years ago

    ValueError: y should be a 1d array, got an array of shape (599, 64) instead. I get this when I run the following code:
    scores_logistic.append(get_score(LogisticRegression(solver='liblinear',multi_class='ovr'), X_train, X_test, y_train, y_test))
    scores_svm.append(get_score(SVC(gamma='auto'), X_train, X_test, y_train, y_test))
    scores_rf.append(get_score(RandomForestClassifier(n_estimators=40), X_train, X_test, y_train, y_test))

  • @Jacked_Gaming
    @Jacked_Gaming 2 years ago

    Hello I have a doubt. Suppose my KNN algo works best, and scores are .98/.99/.95. Then how do I select train and test of which i got .99???

  • @engineerpython4812
    @engineerpython4812 3 years ago

    In my iris exercise, the LogisticRegression model has the highest performance (0.98, 0.96, 0.98),
    ahead of SVC, RandomForestClassifier and DecisionTreeClassifier.
    RandomForestClassifier comes second,
    DecisionTreeClassifier third,
    and SVC fourth.

  • @shubhamkanwal8977
    @shubhamkanwal8977 4 years ago

    Logistic Regression = Tuning Regularization C=0.1 = 94%
    SVC = 96.66%
    RandomForestClassifier = Tuning n_estimators=15 = 96%
    SVC~RandomForest > Logistic Regression

  • @Solankinileshchalala
    @Solankinileshchalala 1 day ago

    So kf.split() is not splitting the data as I thought; it is just splitting the index numbers.
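
That observation can be seen directly on a toy list: `KFold.split` yields arrays of positions, and it is up to the caller to use them to pick rows from the data. The list `data` below is a made-up example.

```python
from sklearn.model_selection import KFold

data = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # toy data, values are arbitrary

kf = KFold(n_splits=3)
# split() returns positions (0-based indices), not the values themselves.
splits = [(train_idx.tolist(), test_idx.tolist()) for train_idx, test_idx in kf.split(data)]

for train_idx, test_idx in splits:
    test_values = [data[i] for i in test_idx]  # look the values up ourselves
    print("train positions:", train_idx, "test positions:", test_idx, "test values:", test_values)
```

This is also why index 0 shows up in the output even when the data values start at 1.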

  • @tatendaVIDZ90
    @tatendaVIDZ90 2 years ago +1

    that approach of doing the manual method of what cross_val_score is doing in the background and then introducing the method! God send! Brilliant. Brilliant I say!

  • @saurabh7943
    @saurabh7943 9 months ago

    Sir, I have performed cross_val_score on (LogisticRegression, SVC, tree.DecisionTreeClassifier, RandomForestClassifier) and I got almost the same values from each of the models.

  • @rajkalashtiwari
    @rajkalashtiwari 2 years ago

    The mean of the score of Linear Regression =0.32256072489000853, Random forest Classifier =0.96, Standard Vector Model 0.9666666666666666
    This is my result of iris datasets using cross k fold

  • @shylashreedev2685
    @shylashreedev2685 2 years ago

    LogisticRegression=97.3%(MAX score)
    SVM=96%
    RandomForest=94%
    DecisionTree=96%

  • @uva2805
    @uva2805 1 year ago

    Hi, thank you for the tutorial. It was very helpful, but I believe you made a mistake at minute 13:09. When you copied the code from the original KFold kf.split([]), you forgot to switch the "kf" variable for "folds". I believe that's why the Stratified K-Fold results were very similar to KFold.

  • @hemanthvokkaliga
    @hemanthvokkaliga 2 years ago

    Bro, I got a 73% accuracy score on the diabetes dataset using AdaBoost with a decision tree as the base learner, but without AdaBoost I am getting 75% accuracy. Which should I use? If I am wrong, what should be done?

  • @RAKESHKUMAR-rb8dv
    @RAKESHKUMAR-rb8dv 8 months ago

    00:02 K fold cross validation helps determine the best machine learning model for a given problem.
    02:20 K-fold cross validation provides a more robust evaluation of machine learning models.
    04:36 Classifying handwritten characters into ten categories using different algorithms and evaluating performance using k-fold cross validation.
    07:06 K-fold cross validation helps in more robust model evaluation.
    09:43 K-fold cross validation divides data into training and testing sets for iterative model evaluation.
    12:35 Stratified k-fold ensures uniform distribution of categories for better model training.
    15:42 Measuring the performance of models in each iteration
    18:29 Parameter tuning in random forest classifier improves scores.
    20:46 K Fold Cross Validation helps measure the performance of machine learning models.
    23:18 Cross-validation helps in comparing algorithms and finding the best parameters for a given problem.
    25:18 K Fold Cross Validation helps in assessing the model's performance.
    Crafted by Merlin AI.

  • @ricardogomes9528
    @ricardogomes9528 3 years ago +1

    Finally a video explaining X_train, X_test, y_train, y_test. Thank you!

  • @sukantithakur4225
    @sukantithakur4225 4 years ago

    hi, I got SVC>DECISION_TREE>RANDOM_FOREST>LR, --SVC =97%,DT=96%,,RF=95%,LR = 94%,

  • @beerusreal6
    @beerusreal6 3 years ago

    When I'm performing the KFold it is showing this error:
    Found input variables with inconsistent numbers of samples: [1198, 599]

  • @ramezhabib320
    @ramezhabib320 1 year ago

    Using the K Fold Method, the data was split multiple times into X_train s and y_train s but remained constant for each method for each split.
    Is it the same case in the cross_val_score method? Isn't the splitting taking place differently for each method? So basically the models are trained on different X_train s and y_train s
    Thank you so much for the clear explanation.

  • @manojkuna3962
    @manojkuna3962 1 year ago

    @codebasics after applying cross_val_score why did it give only 3 results. why not more?

  • @Gamesational1
    @Gamesational1 2 years ago +1

    Useful for identifying many different types of categories.

  • @kmchentw
    @kmchentw 3 years ago +2

    Thank for the very useful and free tutorial series. Salute to you sir!

  • @nicoleluo6692
    @nicoleluo6692 1 year ago +1

    🌹 You are way way... way better than all of my Machine learning professor at school!

  • @naveennitt5420
    @naveennitt5420 1 year ago

    When I tried SVM I got a score of around 98.43, but sir got 40. Why did this happen? Could someone please clarify?

  • @rajadurai7336
    @rajadurai7336 11 months ago

    LogisticRegression was the best model in the Iris dataset
    I got an accuracy of 97.3% compared to other models such as svm and randomforestclassifier

  • @sumitprajapati821
    @sumitprajapati821 2 years ago

    At 14:21 you forgot to change kf.split() to folds.split(),
    and thus I think you didn't demonstrate StratifiedKFold.
    Just wanted to let you know.

  • @sarangabbasi2560
    @sarangabbasi2560 2 years ago +2

    best explanation... i like the way u give examples using small data to explain how it actually works. 10:20
    no one explains like this... keep doing great work

    • @codebasics
      @codebasics  2 years ago

      Glad you liked it

    • @strongsyedaa7378
      @strongsyedaa7378 2 years ago

      @@codebasics
      I have applied K fold on the linear regression's dataset
      I used different activation functions & then I get mean & se values
      How to pick the best model from the k folds?

  • @rishabhmehta6204
    @rishabhmehta6204 4 years ago

    Sir, I am performing cross validation on the iris dataset from Kaggle, but when I use the cross-validation technique there is a ValueError: found input variables with inconsistent numbers of samples. Sir, how do I deal with this issue? Please explain.

  • @aadilgoyal9286
    @aadilgoyal9286 3 years ago

    def avg(nums):
        # add up every score in the list
        num_avg = 0
        for i in range(len(nums)):
            num_avg = num_avg + nums[i]

        # divide the total by the number of scores
        num_avg = num_avg / len(nums)

        return num_avg
    # This is the code if you want to get the average of a list. To use it, just call
    avg(scores_l)

  • @Kishor_D7
    @Kishor_D7 1 month ago

    Using the same datasets makes it less interesting, but your tutorials are awesome. Every tutorial has its pluses and minuses: yours are more structured, but the minus point is the reuse of the same dataset, which reduces the interest to move on to the next one.

  • @aubdurrobanik4036
    @aubdurrobanik4036 3 years ago

    My n_splits value is 3, but the cross_val_score function gives me 5 outputs. Why??

  • @kandarppatel5349
    @kandarppatel5349 1 year ago

    Why have we not used train_test_split with cross_val_score? Can anyone explain?

  • @apeculiargentleman6925
    @apeculiargentleman6925 5 years ago +5

    You make exquisite content, I'd love to see more!

  • @ignaciozamanillo9659
    @ignaciozamanillo9659 3 years ago +1

    Thanks for the video! I have a question: when you do the cross validation inside the for loop, you use the same folds for all the methods. Does cross_val_score do the same? If not, is it possible to use the same folds in order to get a more accurate comparison?
    Thanks in advance
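
One way to make sure every model sees identical folds, sketched here under the assumption of the iris dataset and three illustrative models, is to build a single splitter with a fixed `random_state` and pass it as `cv` to each `cross_val_score` call:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A seeded splitter produces the same folds every time split() is called.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
first = [test_idx.tolist() for _, test_idx in folds.split(X, y)]
second = [test_idx.tolist() for _, test_idx in folds.split(X, y)]
same_folds = first == second

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "tree": DecisionTreeClassifier(random_state=0),
}
mean_scores = {
    name: cross_val_score(model, X, y, cv=folds).mean()
    for name, model in models.items()
}
print(same_folds, mean_scores)
```

With an integer `cv` and no shuffling the folds are also deterministic, so the comparison is already like-for-like in that case; the explicit splitter is needed once shuffling is involved.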

  • @sivanit3150
    @sivanit3150 2 years ago

    I'm doing kf.split, but when printing the score it says the data cardinality is ambiguous. Since I did a train test split earlier, I believe there is an unequal number of samples in x and y. I would like some help understanding why the error is coming up.

  • @rahulranjan8682
    @rahulranjan8682 2 years ago

    You imported StratifiedKFold, but where did you use it on the above dataset?
    And in KFold I think they are just dividing the array based on index number, so how can StratifiedKFold, without seeing the values of the target class, divide the data while ensuring every fold gets an equal number of each class?

  • @abhisarangan2264
    @abhisarangan2264 2 years ago

    Support vector classifier works well for me, why does it perform weak in your classification? Any clarification?

  • @andre__442
    @andre__442 5 years ago +1

    love indian accent

  • @adnanakhter4066
    @adnanakhter4066 4 years ago +1

    Kindly tell me how cross validation is used to find error values like MSE, RMSE, MAE, and RMAE for a regression algorithm like linear regression on a dataset with continuous values.
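
For regression, the `scoring` argument selects the metric. A sketch assuming scikit-learn's bundled diabetes dataset and a plain `LinearRegression` model, both chosen only for illustration:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

X, y = load_diabetes(return_X_y=True)

# scikit-learn reports error metrics negated, so that "greater is better" holds.
results = cross_validate(
    LinearRegression(), X, y, cv=5,
    scoring=("neg_mean_squared_error", "neg_mean_absolute_error"),
)

mse = -results["test_neg_mean_squared_error"]   # flip the sign back
mae = -results["test_neg_mean_absolute_error"]
rmse = np.sqrt(mse)                             # per-fold RMSE

print("MSE per fold:", mse)
print("RMSE per fold:", rmse)
print("MAE per fold:", mae)
```

Negative numbers from cross-validation on a regressor are therefore not necessarily a bug: they can be negated errors, or a default R² score, which itself can drop below zero for a poorly fitting model.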

  • @suthaisahamed9742
    @suthaisahamed9742 1 year ago

    Every time I use this code I get negative values. I'm using an SVR model. Can anyone please help me?

  • @manu93ize
    @manu93ize 4 years ago

    My SVM is 0.98 without k-fold cross-validation. why?

  • @ganeshravula2303
    @ganeshravula2303 1 year ago

    Thanks for all the lectures, but you wanted to use StratifiedKFold yet used normal KFold.

  • @ramandeepbains862
    @ramandeepbains862 2 years ago

    0.966667 accuracy with cv=50 for SVM; it's better compared to the others for the iris dataset.

  • @AnInquiringMind404
    @AnInquiringMind404 9 months ago

    if we use 'cross_val_score' function, how can we set the training & testing data ratio? I checked the parameter and didn't see any parameter that can set the ratio.
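
With k-fold splitting there is no separate ratio parameter because the ratio follows from the number of folds: each round tests on roughly 1/k of the data and trains on the rest. A quick sketch on the 150-sample iris dataset (an assumed example):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)  # 150 samples

sizes = {}
for k in (3, 5, 10):
    # Look at the first split only; the others have the same sizes here.
    train_idx, test_idx = next(KFold(n_splits=k).split(X))
    sizes[k] = (len(train_idx), len(test_idx))
    print(f"cv={k}: train={len(train_idx)}, test={len(test_idx)}")
```

So a larger `cv` means a larger training share per round, at the cost of more model fits.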

  • @vishalrana9594
    @vishalrana9594 2 years ago

    A very informative video on ML but in my execution, I am getting a warning of iteration reached the limit. Can you help me out with this?
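
The comment does not say which model raised the warning. Assuming it is the `ConvergenceWarning` that `LogisticRegression` often emits on the digits data, two common remedies are to scale the features and to raise `max_iter`, as in this sketch:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = load_digits()

# Standardized inputs usually let the solver converge in fewer iterations;
# max_iter=1000 (an arbitrary generous limit) gives it extra headroom.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(model, digits.data, digits.target, cv=3)
print(scores, scores.mean())
```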

  • @BeautyQueenANA
    @BeautyQueenANA 3 years ago

    If we set cross-validation to 150, then how many samples are used for training and testing?

  • @someshkb
    @someshkb 4 years ago +2

    Thank you very much for the nice explanation. I have one question in this context: Isn't it necessary to use in train_test_split method the 'random_state' to get the same score for any model?

  • @vasigaransenthilkumar3731
    @vasigaransenthilkumar3731 2 years ago

    How does k-fold solve underfitting and overfitting? Can you please explain briefly?

  • @pranitbhisade3174
    @pranitbhisade3174 5 years ago +3

    Sir, in cell 33, what is that red \ backslash?

    • @beansgoya
      @beansgoya 5 years ago +4

      That is to continue onto the next line. Since Python is sensitive to line breaks, if you don't use that backslash, Python would read the following line as a new line of code. The backslash tells Python that what you type on the next line is still part of the line above. He only did that so the code doesn't extend all the way to the right, past the edge of the screen.

    • @zakiasalod891
      @zakiasalod891 5 years ago

      Good question! I tried using this backslash but it gave me an error. I am on Python version 3. Any ideas, @codebasics?
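
A small illustration of line continuation, using made-up score values. One common cause of an error is a space or any other character after the backslash; the backslash has to be the very last character on the line.

```python
scores = [0.95, 0.97, 0.96]  # made-up values

# A trailing backslash joins this line with the next one.
total = scores[0] + \
        scores[1] + \
        scores[2]

# Inside parentheses, line breaks are allowed without any backslash.
average = (
    total
    / len(scores)
)

print(total, average)
```

Wrapping the expression in parentheses is often the less fragile option, since it does not break when trailing whitespace sneaks in.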

  • @ovighosh
    @ovighosh 1 year ago

    So load_digits is a function, right? I'm confused as to how its a dataset. And where is features and target being differentiated?
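
`load_digits` is indeed a function: calling it returns a dictionary-like object that bundles the dataset. The features live under `data` and the labels under `target`, as this short sketch shows (the variable names `X` and `y` are just a convention):

```python
from sklearn.datasets import load_digits

digits = load_digits()   # returns a Bunch, a dict-like container

X = digits.data          # features: one row of 64 pixel values per image
y = digits.target        # labels: the digit (0-9) each image represents

print(X.shape, y.shape)
classes = sorted(set(y.tolist()))
print(classes)
```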

  • @ssawant0202
    @ssawant0202 2 years ago

    At 15:05 you used kf.split in the for loop. Is that correct? Since we are now using StratifiedKFold, the instance for that is folds. Should it be folds.split?

  • @dennisaddo2609
    @dennisaddo2609 1 year ago

    I have a little problem. In the original dataset the values where from 1-9. But in the train and test splits there were 0s in them. How does the k-fold generate them when they are not in the original dataset?

  • @mohammadpatel2569
    @mohammadpatel2569 5 years ago +1

    Your videos on machine learning are way better than any paid online videos. So keep growing.