299 - Evaluating sklearn model using KFold cross validation in Python

  • Published: 28 Feb 2023
  • Code generated in the video can be downloaded from here:
    github.com/bnsreenu/python_fo...
    Let us start with binary classification the way most of us normally approach the problem, using sklearn (SVM). In this example, we will split our dataset the usual way into train and test groups.
    We will then learn to divide the data using KFold splits and iterate through each split to train and evaluate our model.
    Finally, we will use the cross_val_score() function to perform the evaluation: it takes the estimator, the dataset, and the cross-validation configuration, and returns a list of scores, one per fold.
    Minimal code sketches of each of these steps appear below, after the dataset link.
    KFold is a model validation technique: cross-validating across multiple folds lets us evaluate model performance more reliably than a single split.
    The KFold class in sklearn provides train/test indices to split the data into train/test sets. It splits the dataset into k consecutive folds (without shuffling by default), and each fold is then used once as the validation set while the remaining k - 1 folds form the training set.
    The split() method within KFold generates the indices that divide the data into n_splits groups of roughly n_samples/n_splits samples each. In each round, one group is used for testing and the remaining data is used for training, so every combination of n_splits - 1 folds serves as a training set and every fold serves as the test set exactly once.
    Wisconsin breast cancer example
    Dataset link: www.kaggle.com/datasets/uciml...
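    A minimal sketch of the single train/test split approach described above (not the exact code from the video; it loads sklearn's built-in copy of the Wisconsin breast cancer dataset instead of the Kaggle CSV):

      # Baseline: one train/test split with an SVM classifier.
      from sklearn.datasets import load_breast_cancer
      from sklearn.model_selection import train_test_split
      from sklearn.preprocessing import MinMaxScaler
      from sklearn.svm import SVC

      X, y = load_breast_cancer(return_X_y=True)
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

      scaler = MinMaxScaler().fit(X_train)                   # fit the scaler on training data only
      model = SVC().fit(scaler.transform(X_train), y_train)
      print("Single-split accuracy:", model.score(scaler.transform(X_test), y_test))
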
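    Next, a sketch of the manual KFold loop (same assumed dataset): KFold.split() yields train/test index arrays, and each fold serves once as the test set.

      # Manual K-Fold cross-validation: train on k-1 folds, test on the remaining one.
      import numpy as np
      from sklearn.datasets import load_breast_cancer
      from sklearn.model_selection import KFold
      from sklearn.preprocessing import MinMaxScaler
      from sklearn.svm import SVC

      X, y = load_breast_cancer(return_X_y=True)
      kf = KFold(n_splits=5, shuffle=True, random_state=42)

      scores = []
      for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
          scaler = MinMaxScaler().fit(X[train_idx])          # refit the scaler on each training fold
          model = SVC().fit(scaler.transform(X[train_idx]), y[train_idx])
          acc = model.score(scaler.transform(X[test_idx]), y[test_idx])
          scores.append(acc)
          print(f"Fold {fold}: accuracy = {acc:.3f}")

      print("Mean accuracy:", np.mean(scores))
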
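    Finally, a sketch of the cross_val_score() shortcut, which runs the whole loop in one call and returns one score per fold; wrapping the scaler and the SVM in a pipeline ensures the scaling is refit inside every fold.

      # cross_val_score with a pipeline: scaling + SVM evaluated across 5 folds.
      from sklearn.datasets import load_breast_cancer
      from sklearn.model_selection import KFold, cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import MinMaxScaler
      from sklearn.svm import SVC

      X, y = load_breast_cancer(return_X_y=True)
      pipe = make_pipeline(MinMaxScaler(), SVC())

      scores = cross_val_score(pipe, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=42))
      print("Per-fold scores:", scores)
      print("Mean accuracy:", scores.mean(), "+/-", scores.std())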

Comments • 20

  • @Master_of_Chess_Shorts, 1 year ago (+1)

    You are one of the best data science teachers out there. Thanks for your good work and approach. You explain a wide range of topics very well.

  • @newcooldiscoveries5711, 1 year ago

    Been enjoying this KFold series. Looking forward to the next one. Thanks.

  • @caiyu538, 1 year ago

    I used this module a lot in my work. Thanks for these great free libraries; they make data scientists' lives easier. Most of the work is gluing the data to these libraries.

  • @DmitriiTarakanov, 1 year ago

    Dear Sreeni, thank you so much for your work! Have a good one!

  • @joebi-den4761, 1 year ago (+3)

    Hi, thanks for doing all this and providing it for free. I'm a final-year EE student, not doing great academically, but I hope the future could be better.

    • @hannukoistinen5329, 8 months ago (+1)

      Hi!! Don't be ashamed!! You are probably on a very demanding curriculum. My advice: learn R!! You can do everything Python can and much more!! And you don't have to punch out code you don't necessarily even need!! Python is just "fashion"! You can do all the research, all the math, all the visualization with R. Success and God bless you with your studies!!!

    • @joebi-den4761, 7 months ago

      @hannukoistinen5329 Duly noted, very practical advice, thanks. So I should be using RStudio, correct? Or do you have more to add?

  • @maheshmaskey4592, 1 month ago

    Good post. By the way, how do we select the best model after cross-validation? I am more interested in regression than classification. Have you tried using a multivariate polynomial regression model so that we could establish an empirical relation?
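
    (Not from the video: a minimal sketch of one common answer, comparing mean cross-validation scores across candidate models and keeping the best one, here for a multivariate polynomial regression on sklearn's diabetes dataset, used purely for illustration.)

      # Pick the polynomial degree with the best mean cross-validation R^2.
      from sklearn.datasets import load_diabetes
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import PolynomialFeatures, StandardScaler

      X, y = load_diabetes(return_X_y=True)

      results = {}
      for degree in (1, 2, 3):
          pipe = make_pipeline(StandardScaler(), PolynomialFeatures(degree), LinearRegression())
          results[degree] = cross_val_score(pipe, X, y, cv=5).mean()   # mean R^2 across 5 folds

      best_degree = max(results, key=results.get)
      print(results, "-> best degree:", best_degree)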

  • @malithabasuri4491, 1 year ago

    Hi, great video series. Could you start a video series about medical image processing and ML, such as 3D MRI processing, preventing leaky validation, etc.? It would be really useful because there aren't many resources.

  • @guiomoff2438, 1 year ago

    Before doing cross-validation, shouldn't you use a dimensionality reduction technique to determine whether all features are necessary for your training? Thanks in advance if you take the time to answer me!
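
    (Not from the video: a minimal sketch of how a dimensionality reduction step such as PCA can be placed inside the pipeline, so it is refit on each training fold during cross-validation rather than being fit on the full dataset beforehand.)

      # PCA inside the pipeline: the reduction is learned from each training fold only.
      from sklearn.datasets import load_breast_cancer
      from sklearn.decomposition import PCA
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      X, y = load_breast_cancer(return_X_y=True)
      pipe = make_pipeline(StandardScaler(), PCA(n_components=10), SVC())
      print("Mean CV accuracy with PCA:", cross_val_score(pipe, X, y, cv=5).mean())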

  • @11111653, 1 year ago

    How do I plot a ROC curve for the overall cross-validation?
    I have been trying to plot one, but it throws an error, apparently because I get different counts of TPRs/FPRs on each fold, which prevents the curve from being drawn.
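
    (Not an answer from the video: a common workaround for the mismatched tpr/fpr counts is to interpolate each fold's ROC onto a shared FPR grid and average, roughly as in sklearn's ROC-with-cross-validation example; the sketch below assumes a probabilistic classifier.)

      # Average ROC across folds by interpolating each fold onto a common FPR grid.
      import numpy as np
      from sklearn.datasets import load_breast_cancer
      from sklearn.metrics import auc, roc_curve
      from sklearn.model_selection import StratifiedKFold
      from sklearn.svm import SVC

      X, y = load_breast_cancer(return_X_y=True)
      cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

      mean_fpr = np.linspace(0, 1, 100)       # shared grid so folds can be averaged
      tprs = []
      for train_idx, test_idx in cv.split(X, y):
          model = SVC(probability=True).fit(X[train_idx], y[train_idx])
          probs = model.predict_proba(X[test_idx])[:, 1]
          fpr, tpr, _ = roc_curve(y[test_idx], probs)
          interp_tpr = np.interp(mean_fpr, fpr, tpr)        # lengths now match across folds
          interp_tpr[0] = 0.0
          tprs.append(interp_tpr)

      mean_tpr = np.mean(tprs, axis=0)
      mean_tpr[-1] = 1.0
      print("Mean AUC across folds:", auc(mean_fpr, mean_tpr))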

  • @ajay0909, 1 year ago

    Hi sir, I have been trying to implement video classification using a CNN. All the tutorials out there are quite hard to follow, or maybe I have just gotten used to your detailed explanations. Please do a tutorial on how to load video data. Thanks for all the high-quality content.

  • @Gingeey23, 1 year ago

    Great video. Just to clarify, is the purpose of cross-validation to tune the hyperparameters of models on a variety of different train/test splits to avoid overfitting? Cheers!

    • @DigitalSreeni, 1 year ago (+1)

      Yes, the main purpose of cross-validation is to estimate the performance of a model on an independent dataset and to tune the hyperparameters of the model to avoid overfitting.

  • @marcinmaleszewski2023, 1 year ago

    Thanks!

  • @Athens1992, 1 year ago

    Nice video. One silly question: you are using MinMaxScaler in a pipeline, so how does cross_val_score know to apply the scaling to X_array? I ask because you never explicitly transform X_array with the pipeline.
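
    (Not a reply from the author: cross_val_score clones the whole pipeline for each fold; fitting the pipeline on the training fold runs MinMaxScaler.fit_transform before SVC.fit, and scoring the test fold runs MinMaxScaler.transform automatically, so no manual transform of X_array is needed. A minimal sketch of what happens on one fold:)

      # What cross_val_score does with a pipeline on a single fold (conceptually).
      from sklearn.base import clone
      from sklearn.datasets import load_breast_cancer
      from sklearn.model_selection import train_test_split
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import MinMaxScaler
      from sklearn.svm import SVC

      X, y = load_breast_cancer(return_X_y=True)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      pipe = make_pipeline(MinMaxScaler(), SVC())
      fold_model = clone(pipe).fit(X_tr, y_tr)   # scaler is fitted on the training fold only
      print(fold_model.score(X_te, y_te))        # scaler.transform runs before prediction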

  • @maryamshehu8842, 1 year ago

    Hi, thanks for the video. The generated code is not in the GitHub repository you shared.

  • @DineshSereno, 10 months ago

    Thanks!