How to impute missing data in categorical features (using MICE)

  • Published: Oct 27, 2024

Comments • 14

  • @machinelearningplus
    @machinelearningplus  5 months ago

    I teach a complete ML Mastery Roadmap (self-paced courses) for mastering Data Science from scratch: edu.machinelearningplus.com/s/pages/ds-career-path

  • @machinelearningplus
    @machinelearningplus  1 year ago

    1. It uses the MICE algorithm to impute the missing data. Consider checking out the previous video on MICE: ruclips.net/video/BjyUbk258o4/видео.html
    2. Mostly, yes.

  • @ketanbutte3497
    @ketanbutte3497 1 year ago

    Great video.
    Some questions:
    1. What's the intuition behind the Bayesian approach? How were the genders assigned to the missing entries?
    2. Can this be safely used for any categorical data that has missing values?
    Again, great work.

  • @thezenithanalysis7541
    @thezenithanalysis7541 3 months ago

    Thank you for this video. It helped.

  • @sabalunax
    @sabalunax 9 months ago

    You are a genius! Your video helped me a lot! :)

  • @ekpenyongokpo4900
    @ekpenyongokpo4900 7 months ago

    Thank you for the great tutorial.

  • @MaximoTartaglia
    @MaximoTartaglia 9 months ago

    Thank you, great video! But shouldn't you use one-hot encoding instead of label encoding, since it is a nominal categorical variable?

    • @machinelearningplus
      @machinelearningplus  8 months ago

      The idea is to convert it to a numeric column, which can be done with LabelEncoder itself. Since there are only 2 categories in gender, it shouldn't really matter whether you use one-hot encoding instead.
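
      The two-category point can be checked with a small sketch. This uses a made-up toy column and plain Python rather than the video's code, so the values and variable names are assumptions for illustration only.

```python
# Hypothetical toy column; the values are illustrative, not taken from the video.
gender = ["male", "female", "female", "male"]

# Label encoding: map each category to an integer code (alphabetical order).
categories = sorted(set(gender))  # ['female', 'male']
code_of = {cat: i for i, cat in enumerate(categories)}
label_encoded = [code_of[g] for g in gender]

# One-hot encoding: one 0/1 indicator column per category.
one_hot = {cat: [int(g == cat) for g in gender] for cat in categories}

# With only two categories, the label-encoded column is identical to the
# "male" indicator, and the "female" indicator is simply its complement.
same_as_indicator = label_encoded == one_hot["male"]
print(label_encoded, one_hot, same_as_indicator)
```

      With a binary column, one-hot encoding adds no information beyond the single 0/1 column, which is why the choice should not matter much here.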

    • @ssffyy9401
      @ssffyy9401 8 months ago

      @machinelearningplus Thank you for the clarification. I have a question regarding the use of encoding techniques in machine learning. Typically, we opt for one-hot encoding over label encoding for nominal categorical variables to avoid implying any inherent order or hierarchy to the model during prediction tasks. However, in a scenario where the dataset includes a nominal categorical column with more than two classes and the purpose is not for ML prediction but for imputation to address missing values, would employing label encoding to prepare the dataset for the imputer potentially mislead the imputation process?

    • @machinelearningplus
      @machinelearningplus  8 months ago

      Thanks for the question. Yes, I would think so, especially since more than 2 categories are involved.
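
      A rough way to see why label encoding could mislead an imputer once there are more than two nominal classes: the integer codes impose an order and a spacing that the categories do not have. The sketch below uses a hypothetical three-class column (`colors`) and plain Python; it is not the video's code.

```python
import math

# Hypothetical nominal categories with no natural order.
colors = ["blue", "green", "red"]

# Label encoding assigns 0, 1, 2; one-hot assigns one indicator vector each.
label = {c: float(i) for i, c in enumerate(colors)}
one_hot = {c: [float(c == other) for other in colors] for c in colors}

def label_distance(a, b):
    """Distance between two categories as seen through their integer codes."""
    return abs(label[a] - label[b])

def one_hot_distance(a, b):
    """Euclidean distance between the one-hot vectors of two categories."""
    return math.dist(one_hot[a], one_hot[b])

# Under label encoding, "blue" looks twice as far from "red" as from "green".
print(label_distance("blue", "green"), label_distance("blue", "red"))

# Under one-hot encoding, every pair of categories is equally far apart.
print(one_hot_distance("blue", "green"), one_hot_distance("blue", "red"))

# A numeric model that averages "blue" and "red" lands exactly on "green".
midpoint = (label["blue"] + label["red"]) / 2
print(midpoint == label["green"])
```

      An imputer that treats the codes as numbers can therefore produce an in-between value that maps to an unrelated category, which is the kind of distortion the question describes.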

  • @aminvahdati9476
    @aminvahdati9476 1 year ago

    What is the difference between the approach in this video and mode imputation? It did the same thing with longer code.

    • @shankars4384
      @shankars4384 1 year ago +1

      Simulation studies suggest that mean imputation is possibly the worst missing-data handling method available; this comes from research papers. The MICE method is more of a one-size-fits-all approach and much better. Mode imputation distorts bias and variance and hurts the model.
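
      To make the difference concrete: mode imputation fills every gap with the same overall category, while MICE predicts each gap from the other columns. The sketch below is not MICE; it swaps in a much simpler stand-in (the most frequent value within a group) just to show what using another column changes. The toy rows and the `title` column are assumptions for illustration.

```python
from collections import Counter

# Hypothetical toy rows: (title, gender); None marks a missing gender.
rows = [
    ("Mr", "male"), ("Mr", "male"), ("Mr", "male"),
    ("Mrs", "female"), ("Mrs", "female"),
    ("Mr", None), ("Mrs", None),
]

def mode(values):
    """Return the most frequent value."""
    return Counter(values).most_common(1)[0][0]

# Mode imputation: every gap gets the overall most frequent category.
overall = mode(g for _, g in rows if g is not None)
mode_filled = [(t, g if g is not None else overall) for t, g in rows]

# Conditional stand-in: fill each gap with the most frequent gender among
# rows that share the same title, so the other column informs the guess.
by_title = {}
for t, g in rows:
    if g is not None:
        by_title.setdefault(t, []).append(g)
conditional_filled = [
    (t, g if g is not None else mode(by_title[t])) for t, g in rows
]

print(mode_filled[-2:])         # both gaps receive the overall mode
print(conditional_filled[-2:])  # each gap follows its own title group
```

      On data where gender is unrelated to every other column, the two approaches would tend to agree, which may be why the results in the video looked similar to mode imputation.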

  • @saurabhsonawane7110
    @saurabhsonawane7110 7 months ago

    Life saver!