I teach complete ML Mastery Roadmap (self paced courses) to master Data Science from scratch: edu.machinelearningplus.com/s/pages/ds-career-path
1. It uses the MICE algorithm to impute missing data. Consider checking out the previous video on MICE: ruclips.net/video/BjyUbk258o4/видео.html
2. Mostly yes
Great video! Some doubts:
1. What's the intuition behind the Bayesian approach? How were the genders assigned for the missing places?
2. Can this be safely used for any categorical data that has missing values?
Again, great work!
Thank you for this video. It helped.
You are a genius! Your video helped me a lot! :)
Happy to help :)
Thank you, for the great tutorial
Thank you, great video!! But shouldn't you use one-hot encoding instead of label encoding, since it is a nominal categorical variable?
The idea is to convert it to a numeric column, which can be done with LabelEncoder itself. Since there are only 2 categories in gender, it shouldn't really matter whether you use one-hot encoding.
@machinelearningplus Thank you for the clarification. I have a question regarding the use of encoding techniques in machine learning. Typically, we opt for one-hot encoding over label encoding for nominal categorical variables to avoid implying any inherent order or hierarchy to the model during prediction tasks. However, in a scenario where the dataset includes a nominal categorical column with more than two classes and the purpose is not ML prediction but imputation to address missing values, would employing label encoding to prepare the dataset for the imputer potentially mislead the imputation process?
Thanks for the question. Yes, I would think so, especially since more than 2 categories are involved.
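To make the encode-then-impute idea from this thread concrete, here is a minimal sketch using scikit-learn's IterativeImputer (a MICE-style imputer). The column names and data are made up for illustration, and the 0/1 mapping is built by hand so that missing entries stay as NaN (LabelEncoder itself does not handle NaN):

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical dataset: a binary 'gender' column with missing values,
# plus a numeric feature the imputer can learn from.
df = pd.DataFrame({
    "gender": ["M", "F", None, "F", None, "M"],
    "height": [178.0, 163.0, 180.0, 160.0, 175.0, 182.0],
})

# Encode gender as 0/1 manually, keeping NaN for the missing entries.
mapping = {"M": 0, "F": 1}
df["gender_num"] = df["gender"].map(mapping)

# Each feature with missing values is modeled from the other features.
imputer = IterativeImputer(random_state=0)
imputed = imputer.fit_transform(df[["gender_num", "height"]])

# Round the imputed gender back to 0/1 and map it back to labels.
inverse = {v: k for k, v in mapping.items()}
codes = np.rint(imputed[:, 0]).clip(0, 1).astype(int)
df["gender_filled"] = pd.Series(codes).map(inverse)
```

Since the imputer's regression output is continuous, the rounding step is what turns it back into a category; with more than 2 classes this rounding is exactly where label encoding can mislead the imputation, as discussed above.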
What is the difference between this video's method and mode imputation? It seems to do the same thing, just with longer code.
Simulation studies suggest that mean imputation is possibly the worst missing-data handling method available; this is from research papers. The MICE method is more of a one-size-fits-all approach and is much better. Mode imputation distorts bias and variance and degrades the model.
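The variance point above can be illustrated with a quick sketch on synthetic data (all values here are made up): mean imputation collapses every missing entry to a single value and shrinks the spread, while a MICE-style IterativeImputer predicts each missing value from a correlated feature and preserves it much better.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(50, 10, n)
y = 2 * x + rng.normal(0, 5, n)      # y is strongly correlated with x
data = np.column_stack([x, y])
data[rng.random(n) < 0.3, 1] = np.nan  # 30% of y missing at random

mean_filled = SimpleImputer(strategy="mean").fit_transform(data)
mice_filled = IterativeImputer(random_state=0).fit_transform(data)

# Mean imputation shrinks the std of y; MICE keeps it close to the
# std of the observed values.
print(np.nanstd(data[:, 1]), mean_filled[:, 1].std(), mice_filled[:, 1].std())
```

The mean-imputed column's standard deviation is visibly smaller than both the observed and the MICE-imputed one, which is the bias the comment above is referring to.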
Life saver!