Hands-on Class Imbalance Treatment in Python | Oversampling | Undersampling | SMOTE | Data Science

  • Published: 24 Jan 2025

Comments • 12

  • @sunilrajput1307
    @sunilrajput1307 4 months ago

    Awesome Explanation...
    I suggest you attach the Jupyter notebook code with your video.

    • @prosmartanalytics
      @prosmartanalytics  4 months ago

      Thank you! 😊 We used to provide notebooks too but stopped due to IP infringement.

  • @Zeeshanoca
    @Zeeshanoca 10 months ago

    Awesome presentation. Kindly also make a presentation on Hybrid Sampling / Ensemble Systems. Thanks

    • @prosmartanalytics
      @prosmartanalytics  10 months ago

      Thank you! We'll keep these suggestions in mind.

  • @amazing_performances
    @amazing_performances 11 months ago

    Great videos! Thank you for sharing!!!

  • @DanielTok-bs5mn
    @DanielTok-bs5mn 8 months ago

    Awesome, but what about stratify when splitting?

    • @prosmartanalytics
      @prosmartanalytics  8 months ago

      Thank you! Stratify maintains the same proportion of 0s and 1s in the train and val/test sets as in the overall data, but it won't resolve the class imbalance itself. We stratify at the time of the split to preserve whatever imbalance we have, and then apply imbalance treatment only to the train set.
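      As a reference, here is a minimal sketch of the stratified split described above, using scikit-learn's `train_test_split` on a synthetic imbalanced dataset (the data and class ratio are invented for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 10% positives
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.10).astype(int)

# stratify=y keeps the minority-class share in both train and
# test close to its share in the full dataset
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(round(float(y.mean()), 3),
      round(float(y_tr.mean()), 3),
      round(float(y_te.mean()), 3))
```

      Any oversampling or SMOTE step would then be applied to `X_tr`/`y_tr` only, leaving the test set at its natural class ratio.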

  • @younesgasmi8518
    @younesgasmi8518 1 year ago

    Thanks for the presentation. Can I use SMOTE before splitting the dataset into training and testing sets?

    • @prosmartanalytics
      @prosmartanalytics  1 year ago +1

      Welcome! Good question. Any imbalance treatment should be applied only to the train data, i.e. for training the model. Because the test data represents future data, it should not be treated for imbalance.

    • @younesgasmi8518
      @younesgasmi8518 1 year ago

      @@prosmartanalytics I mean when we use oversampling on the whole dataset (before splitting), because when I used it that way I got a good confusion matrix and better metrics (accuracy, recall, F1, precision), and there was no overfitting problem.

    • @prosmartanalytics
      @prosmartanalytics  1 year ago +2

      Yes, but there is a leakage problem. The results obtained that way won't be considered reliable. Test data is supposed to represent the future. So if we are predicting defaults for a bank where the historical default rate is only 2%, the test data should reflect that 2%, not 50%. If we use the entire dataset for imbalance treatment, the data we later use as test has already participated in the training process, because the synthetic samples were generated using it too.