Handling Missing Values using R

Dr. Bharatendra Rai

45K views

Published: Nov 23, 2024

Comments • 194

@samsontan1141 4 years ago ⁺²
You are a life saver
Dr. Bharatendra Rai. Thank you.
@bkrai 4 years ago ⁺¹
Thanks for comments!
@rajlatte9131 4 years ago ⁺³
This was the best explanation that I have heard since starting my DS journey. Now I can confidently deal with missing values in R. Kudos to you Bharat Sir, much appreciated :)
@bkrai 4 years ago
Thanks for comments!
@flamboyantperson5936 6 years ago ⁺¹¹
Such a nice explanation Sir. This was one of the most awaited lectures. Thank you so much for such a nice explanation.
@bkrai 6 years ago
Thanks for comments!
@dileep3549 6 years ago ⁺³
Thanks a ton sir, your videos are very helpful. You teach the subject very nicely.
@bkrai 6 years ago
Thanks for comments!
@bassamal-kaaki3253 4 years ago ⁺¹
You are such a wonderful prof. I love the way you handle things with ease and without confusion. You are the best.
@bkrai 4 years ago
Thank you! 😃
@niteshranjan5033 2 years ago ⁺¹
Really sir, it's very helpful to us. No one can explain these things like you.
@bkrai 2 years ago
Thanks for comments!
@abhishek894 3 years ago ⁺¹
What an awesome video. You make everything so easy. Thank you once again Dr. Rai.
@bkrai 3 years ago
You are welcome!
@elenafumagalli9044 4 years ago ⁺¹
Very nicely explained, thank you. Can you suggest references that we could use in a paper to justify imputing NAs before running a mixed ANOVA analysis rather than just using an lmer function that does listwise deletion? Our missing values are 30% of the data and I think that is too much information to lose...
@bkrai 3 years ago ⁺¹
You can go through the documentation of the package, it should provide some references.
@asterlookanalytics9853 2 years ago ⁺²
You have taught me most of the things. Actually you introduced me to machine learning. Greatly fantastic videos. Be blessed
@bkrai 2 years ago
Great to hear!
@fatimasadjadpour4845 4 years ago ⁺¹
Dear Professor Rai,
You have super useful videos for every subject!
Many Thanks
@bkrai 4 years ago
Glad to hear that!
@delt19 6 years ago ⁺⁴
I've been wondering how to impute data and as always you make it seem very easy. Would be interested in seeing a tutorial on how to handle outliers in a data set prior to training a model.
@bkrai 6 years ago
Thanks for feedback and suggestion. I've added it to my list.
@sebastianvarela2190 5 years ago
@@bkrai Where is your video on handling outliers? I can't find it in your list... thanks in advance!
@atulsaurabh8 4 years ago
@@bkrai I cannot find the tutorial on outlier treatment in R, could you please share the link Sir?
@vishalialahappan9069 5 years ago ⁺²
Thank you so much Sir! Best tutorial channel for learning data science with R
@bkrai 5 years ago
Thanks for your comments!
@MinecraftPhil72 3 years ago
First of all - thank you for still answering questions two years after the release of this video!
My question is - where is the original data file taken from, as I would like to use it in a paper and have to cite the original source.
Thank you sir!
@sapirelmaliah9561 5 years ago ⁺¹
Thank you for a very clear and helpful explanation! I used your code on my data and it worked!!
@bkrai 5 years ago
Thanks for comments!
@ravikumar-rz8uu 6 years ago ⁺²
Sir, you are explaining data science concepts very well. Thank you.
@bkrai 6 years ago
Thanks for comments!
@akd9977 6 years ago ⁺⁴
Thank you. Can you please create a video on handling outliers in data?
@bkrai 6 years ago
Thanks for feedback and suggestion. I've added it to my list.
@sanjayursal5330 4 years ago ⁺¹
Very nicely explained and pretty in depth too.
@bkrai 4 years ago
Glad it was helpful!
@larbihouichi8942 5 years ago ⁺²
Dear Bharatendra Rai
In multiple imputation, how do I decide which is the best of the 3 or 5 proposed imputations?
@bkrai 5 years ago
When you do 3 imputations, you can separately try them with your prediction model and choose the one that works best.
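A minimal sketch of the comparison described in this reply. It assumes the `mice` package and uses its built-in `nhanes` data as a stand-in for the video's data; the linear model and the use of residual standard error as the yardstick are illustrative choices, not something the video specifies.

```r
library(mice)

# Three imputed versions of the built-in nhanes data (stand-in data, an assumption)
imp <- mice(nhanes, m = 3, seed = 123, printFlag = FALSE)

# Fit the same illustrative model on each completed data set
fits <- lapply(1:3, function(i) lm(chl ~ age + bmi, data = complete(imp, i)))

# Compare the fits, here by residual standard error (lower is better)
rse <- sapply(fits, sigma)
print(rse)
best <- which.min(rse)
```

`mice` also provides `with()` and `pool()` for analysing all imputed data sets together instead of picking one, which may be worth a look if the results differ noticeably between imputations.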
@gauravyewale3986 6 years ago ⁺²
It helped me a lot. Thanks for the video sir. Would like to see more such videos from you.
@bkrai 6 years ago
Thanks for the comments! Here are some links that you may find useful:
Machine Learning videos: goo.gl/WHHqWP
Introductory R Videos: goo.gl/NZ55SJ
Deep Learning with TensorFlow: goo.gl/5VtSuC
Image Analysis & Classification: goo.gl/Md3fMi
Text mining: goo.gl/7FJGmd
Data Visualization: goo.gl/Q7Q2A8
@maksim0933 4 years ago ⁺¹
Such a nice music) Thank you for your lesson) Well done! Very appreciated!
@bkrai 4 years ago
Many thanks!
@sharanyamahapatra7563 3 years ago ⁺¹
Beautifully explained sir, thank you for this video !!
@bkrai 3 years ago ⁺¹
Most welcome!
@NikhilKumar-hv6dv 6 years ago ⁺¹
Explanation part is very good. I have a question, does this package perform swiftly when it comes to big data sets with multiple rows and lots of NA's? What are the other options?
@bkrai 6 years ago
It should work fine with bigger data sets. If your computer is reasonably fast with at least 16 GB RAM, I don't foresee any issue. You can also save time by reducing the number of imputations: the default is 5, but you can go lower too.
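A short sketch of lowering the number of imputations through the `m` argument of `mice()`. The built-in `nhanes` data is used here only as a stand-in, and the value 2 is an arbitrary illustration.

```r
library(mice)

# Default run: m = 5 imputed data sets
imp_default <- mice(nhanes, seed = 123, printFlag = FALSE)

# Fewer imputations means less computation on large data
imp_small <- mice(nhanes, m = 2, seed = 123, printFlag = FALSE)

imp_default$m  # 5
imp_small$m    # 2
```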
@santosacosta4645 6 years ago ⁺¹
Thank you. Could you please elaborate on how you make your decision on which of the 3 imputation methods to use?
@bkrai 6 years ago
Any of the 3 imputations should be fine. Many methods do not allow you to proceed with model building unless missing data is addressed. You can also run a model with each of the 3 imputations and choose one that gives the best results.
@MKmadhurima 3 years ago
Can you please give a detailed explanation of how to interpret md.pairs?
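A brief sketch of what `md.pairs()` returns, using the `nhanes` data that ships with `mice` as a stand-in. The function gives four matrices of pairwise counts; the `bmi`/`chl` pair below is just an example.

```r
library(mice)

pat <- md.pairs(nhanes)
names(pat)   # "rr" "rm" "mr" "mm"

# For the pair (row = bmi, column = chl), each matrix counts rows where:
pat$rr["bmi", "chl"]  # both bmi and chl are observed
pat$rm["bmi", "chl"]  # bmi is observed, chl is missing
pat$mr["bmi", "chl"]  # bmi is missing, chl is observed
pat$mm["bmi", "chl"]  # both are missing
```

For any pair of variables the four counts add up to the number of rows in the data.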
@pralhadkalkundre2651 5 years ago ⁺¹
Nice Tutorial... Thoroughly understood.. Please make one on outliers as well👍
@bkrai 5 years ago
Thanks for comments and suggestion!
@pitrodapiyush 5 years ago ⁺¹
Utmost respect for sharing the knowledge with simple and effective presentation.
@bkrai 5 years ago
Thanks for your comments!
@haraldurkarlsson1147 3 years ago
A very nice presentation of how to impute missing data. However, I was a bit disappointed in the data set you chose (vehicleMiss.csv). It lacked information. What was the source? How long a time span did it cover? What was the currency - $? Although these things seem clear, it helps to state them nevertheless. A brief introduction of the data and what it means would have been nice. Finally, with less than 1% NAs few would bother spending a lot of time or effort imputing such data since the effect is essentially null on any analysis outcome. Another dataset - even the ones already baked into some of the packages (such as naniar or mice) - would have been more appropriate. Don't get me wrong, I appreciate the time and effort you put into this and it is a very nice introduction to the mice package. Thanks.
@skfunnext 6 years ago ⁺¹
Simple and easy explanation. Requesting you to please upload one video with different methods of imputation with a majority of categorical predictors if possible. Thanks Sunil
@bkrai 6 years ago
Thanks for comments and suggestion. I've added it to my list.
@skfunnext 6 years ago
Hi Sir - I have a dataset from a competition. If you allow, can I send it to you to make a video on imputation with categorical predictors? Please share your email id - sangasunil@gmail.com
@skfunnext 6 years ago
Found your email ID and sent you data set - Thanks for help - Sunil
@JamesSmith-kk1yc 4 years ago ⁺¹
This is a very clear explanation and demonstration of the mice package. I will use this package from now on. Thanks. What dataset did you use in your demonstration?
@bkrai 4 years ago
Thanks, and link to data is available in the description area.
@seant7907 4 years ago ⁺¹
Sir, you just explained the topic very well and understandably. I automatically pressed the subscribe button. Please do continue your work.
@bkrai 4 years ago
Thanks for comments!
@felipeparra2365 4 years ago ⁺¹
Sir, thank you very much for that fantastic explanation, and thank you again for sharing your knowledge
@bkrai 4 years ago
Thanks for your comments!
@danish9135 3 years ago ⁺¹
Great explanation. Appreciation from Pakistan
@bkrai 3 years ago
Thanks for comments!
@PM-st6vu 6 years ago ⁺¹
Thanks for creating such an intuitive video once again. Very helpful.
Was keen to know, what is the best way to research these techniques and end up writing such succinct code?
@bkrai 6 years ago
There are several books and research papers available on each topic. Probably Google itself is a good starting point to search for relevant information.
@mukeshchoudhary2842 4 years ago ⁺¹
Nice explanation. Clear and to the point. I have one query regarding multi-year data. I have data on maize hybrids belonging to three maturity levels (early, medium and late) and tested for three years. The problem is that the data is unbalanced as the number of hybrids tested every year (and for each maturity) varies, with some being common across all three years. Can you help me with how to proceed? I applied the lme4 package for variance components estimation but it gives an error for model convergence.
@bkrai 4 years ago
For imbalance problem, you can try this:
ruclips.net/video/Ho2Klvzjegg/видео.html
@sureshm73 6 years ago ⁺²
Great explanation in an easier way, thank you so much Sir. Could you please also create a video on the best way to impute outliers?
@bkrai 6 years ago
Thanks for feedback and suggestion. I've added it to my list.
@p3drito 3 years ago ⁺¹
Is there a way to impute only specific columns? Say, I don't want to impute columns 2-7 with the command [,2:7] but columns 2, 4, 8, 10 etc. Can I specify these in the mice command?
Thanks in advance!
@bkrai 3 years ago
You can use a subset of data before using mice. Once done, you can combine columns back.
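The subset-then-combine approach from this reply could look roughly like the sketch below. The `nhanes` data from `mice` stands in for the real data, and `cols` is a hypothetical choice of columns.

```r
library(mice)

data <- nhanes              # stand-in data (assumption)
cols <- c("bmi", "chl")     # hypothetical columns to impute

# Run mice on the chosen columns only
imp_sub <- mice(data[, cols], m = 3, seed = 123, printFlag = FALSE)

# Write the first completed version back into the full data
data[, cols] <- complete(imp_sub, 1)

# bmi and chl are now complete; hyp keeps its NAs
colSums(is.na(data))
```

One trade-off of subsetting is that the left-out columns no longer serve as predictors. The `method` argument of `mice()` also accepts an empty string `""` for variables that should not be imputed, which may be an alternative when those columns should still inform the imputation.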
@vandanaarya431 3 years ago
It is wonderful sir. You have provided it for the best of the research. I am thankful to you.
@bkrai 3 years ago
You are welcome!
@royodama7689 2 years ago ⁺¹
Thank you so much, Prof.
@bkrai 2 years ago
You are very welcome!
@DanielKanyata 3 years ago
Thank you so so much for this very helpful video. I want to find out though: is there a way of saving the complete data after imputation as a time series (xts object) rather than a data.frame? I am dealing with monthly returns
@KayYesYouTuber 4 years ago ⁺¹
This is beautiful. Thank you very much.
@bkrai 4 years ago
Thanks for comments!
@ahmetklnc6347 4 years ago ⁺¹
Thanks for the explanation! My question is how can I apply the mice function to a large data set (for example: the data set that I work on has 105000 observations and 226 variables)? I tried what you applied in the video but first I got an error like "system is computationally singular: reciprocal condition number". After that I changed the method parameter to "cart" inside the mice function (but I am not sure, because I have both categorical and continuous features in my data). It did not give any error but now it takes too much time and does not end. So, I could not make any imputation. Do you have any suggestion? Thank you.
@bkrai 4 years ago
In such situations I use appropriate sample from the original data to build models.
@abubakarmehran6052 4 years ago ⁺¹
Can't explain how much I respect and love you sir! ❤
@bkrai 4 years ago ⁺¹
Thanks a ton!
@abubakarmehran6052 4 years ago
@@bkrai Sir, when I ran the md.pattern() code, my plot did not come out like yours! Please help sir
@adeelahmadnadeem1265 2 years ago
Nice tutorial. I have a 1-year time series of MODIS vegetation indices like NDVI, EVI etc. with 16-day temporal resolution. I want to fill the time gaps in the datasets. How can I do this in RStudio? Any suggestion?
@WhiteGhost13 5 years ago ⁺¹
Thank you so much for this video! I really appreciate it!
@bkrai 5 years ago
Thanks for comments!
@ozozan7895 4 years ago ⁺¹
Dear Prof Rai, I found this error in the package
marginplot(data[,c('Mileage', 'lc')])
Error in marginplot(data[, c("Mileage", "lc")]) :
could not find function "marginplot"
@bkrai 4 years ago
Make sure to run the libraries at the beginning.
@ozozan7895 4 years ago ⁺¹
@@bkrai Thank you for your advice, Sir. I solved the issue. However, another issue comes up: after imputation a warning message appears, "Warning message: number of logged events: XX."
XX is a number. Do you have any clue about this issue?
@bkrai 4 years ago
In R warnings are ok. It's not an error.
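For anyone who wants to see what the warning refers to, the object returned by `mice()` carries a `loggedEvents` table. The sketch below deliberately adds a constant column to the built-in `nhanes` data to provoke one such event; the column name `const` is made up for illustration.

```r
library(mice)

# A constant column carries no information, so mice drops it as a predictor
# and records that decision as a logged event (illustrative set-up)
data <- cbind(nhanes, const = 1)

imp <- suppressWarnings(mice(data, m = 2, seed = 123, printFlag = FALSE))

# One row per event: which variable was affected and why
print(imp$loggedEvents)
```

The imputation still completes; the log simply lists variables that were left out or adjusted, typically constant or collinear columns, so it is worth a glance to confirm nothing important was dropped.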
@ozozan7895 4 years ago
@@bkrai Ok. Would you recommend this MICE imputation method for gene expression analysis and mass spectrometer based data, since these data can have a large number of variables?
@abdulwaheedshaikh3745 6 years ago ⁺¹
Sir, when imputing missing values in my dataset using MICE, it shows this error: "Warning message:
Number of logged events: 243." My dataset has 120 columns. How can I change my code? My current code is: impute
@bkrai 6 years ago
Note that "warning message" is not error.
@thomaspgumpel8543 4 years ago ⁺¹
Thanks for an excellent video. As per your instructions:
impute
@bkrai 4 years ago
There is no need to exclude categorical variables.
@thomaspgumpel8543 4 years ago
@@bkrai Categorical data (such as gender) must be an integer.
@kinisuni 9 months ago ⁺¹
When we run the summary of the data, State is showing as chr; in your case it is showing as Factor. Kindly help us with how to address this.
@bkrai 9 months ago
You can use this line to change it to factor:
data$State
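The line above appears to be cut off in this copy of the thread. A plausible reading, offered as an assumption, is a plain `as.factor()` conversion; the small data frame below is made up for illustration.

```r
# Stand-in data frame; since R 4.0, read.csv() keeps text columns as chr by default
data <- data.frame(State   = c("CA", "TX", NA, "CA"),
                   Mileage = c(120, NA, 300, 90),
                   stringsAsFactors = FALSE)

str(data$State)   # chr

# Presumed form of the truncated line: convert chr to factor
data$State <- as.factor(data$State)

str(data$State)   # Factor with 2 levels; the NA stays NA
```

Reading the file with `read.csv(..., stringsAsFactors = TRUE)` is another way to get factors from the start.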
@TinaHelen 4 years ago ⁺¹
Thank you for that great explanation. The thing I still don't get is, what criteria should I use to decide which imputation to use? And is it always good to choose one (e.g. the first imputation) for ALL variables?
@bkrai 4 years ago
If there are 3 and all of them provide consistent outcome for the model, then any of the 3 can be chosen. But if due to random chance one of them behaves very differently in terms of model results, then having another option is always good.
@Bupchiieee 4 years ago ⁺¹
Sir, can I perform imputation after converting a whole data set which includes character values to numeric values?
@bkrai 4 years ago
It will depend on the type of variable. Some chr variables may not be meaningful for converting to numeric.
@irmafatmawati2764 4 years ago ⁺¹
Great Sir. I like your explanation.
@bkrai 4 years ago
Thanks and welcome!
@karyichia2366 4 years ago ⁺¹
Thanks a lot for the explanation, sir.
i have a question, p
@bkrai 4 years ago
Yes, it's the length of the whole data and 100 is used to convert it into %.
@karyichia2366 4 years ago ⁺¹
Alright, I got it. Thank you sir!
@bkrai 4 years ago
Welcome!
@abdulwaheedshaikh3745 6 years ago ⁺⁴
Sir, I am a great fan of yours.
@bkrai 6 years ago
Thanks for comments!
@Dr_Rod_Rizzo 5 years ago
Thanks!! Very useful! Do you know why my R cannot find "md.pattern" and "md.pairs"?
@prerittrajputt7351 4 years ago
I had the same issue and I rectified it by using library(mice) and library(VIM)
@melodicguitarist 6 years ago ⁺²
Very nicely explained Sir. If I want to understand how the imputation is calculated, then I need to read the documentation to see what calculations are done in this function. Can you also name some companies working in data science and analysis in R, Python etc.?
@bkrai 6 years ago ⁺¹
For each R package there is very detailed publicly available documentation that provides various functions, their details, and examples. All leading companies such as Google, Facebook, Apple, Microsoft, Twitter, etc., use data science and freely available tools such as R and Python.
@melodicguitarist 6 years ago
Thank you Sir. Do you know any startups where a fresher can apply to build a career in this stream?
@bkrai 6 years ago ⁺¹
You will have to find that out in your area. I live near Boston and here there are a lot of such companies.
@kinisuni 9 months ago ⁺¹
Dear sir, impute is not working for State, it still shows NA; however in your video it shows as polyreg. Please help
@bkrai 9 months ago
You can use this line to change it to factor:
data$State
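A sketch of why the conversion matters for `mice()`: once `State` is an unordered factor with more than two levels, the default method assigned to it is `polyreg`. The data below is simulated purely for illustration and is not the video's file.

```r
library(mice)

set.seed(1)
n <- 60
data <- data.frame(
  State   = sample(c("CA", "NY", "TX"), n, replace = TRUE),
  Mileage = round(rnorm(n, 1000, 150)),
  Cost    = round(rnorm(n, 50, 10), 1),
  stringsAsFactors = FALSE
)
data$State[c(3, 17, 42)] <- NA
data$Mileage[c(5, 20)]   <- NA

# Convert the chr column to a factor before imputing
data$State <- as.factor(data$State)

imp <- suppressWarnings(mice(data, m = 2, seed = 123, printFlag = FALSE))

# Method chosen per column; columns without NAs get ""
imp$method

# Missing State values in the first completed data set
sum(is.na(complete(imp, 1)$State))
```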
@haroonkhan4u 5 years ago
In this function under impute (impute$imp$Mileage), what is "imp" and where did it come from? Great video on missing values. Is there any other way we can treat missing values?
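On the `imp` part of this question: the object returned by `mice()` is a list, and its `imp` element holds the imputed values per variable. A small sketch with the built-in `nhanes` data as a stand-in, where `bmi` plays the role of `Mileage`:

```r
library(mice)

imp <- mice(nhanes, m = 3, seed = 123, printFlag = FALSE)

# imp$imp is a list with one data frame per variable
names(imp$imp)

# Rows: observations where bmi was missing; columns: the 3 imputations
imp$imp$bmi
dim(imp$imp$bmi)
```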
@stefansms 5 years ago ⁺²
That's such a wonderful explanation about missing data! I have bought several ML courses on Udemy, but none of them were as detailed as your video. Thank you!
Please let me know if you have a donation method available!
@bkrai 5 years ago ⁺²
Thanks for your comments! After your comment I've added donate button, however it is not necessary.
@akshitbhalla874 5 years ago ⁺¹
With the same code, I got a different marginplot. It does not show 13 (but shows 8) and no numbers on axes. Also, you are a wonderful teacher.
@bkrai 5 years ago
You may not be seeing complete plot if the area of the 4th window is too small.
@yousfoss4367 5 years ago ⁺¹
Thanks for the video. Please, how can I get the dataset used for this video, so as to follow along properly?
Thanks
@bkrai 5 years ago
Link to data file is in the description area below video.
@ravindarmadishetty736 6 years ago ⁺¹
Nice video sir....Very reliable for missing data. What is the use of the VIM package?
@bkrai 6 years ago
Thanks for feedback! I used VIM for some of the plots.
@poojabiradar600 5 years ago ⁺¹
Thank you very much for creating videos...so helpful and easily understandable.
@bkrai 5 years ago
Thanks for comments!
@tanushreebubna2312 4 years ago ⁺¹
Sir, how do we decide how many imputations we want and which of the 3 imputations to choose?
@bkrai 4 years ago
Usually 3 is sufficient. You can choose one that gives better results.
@aniruddhaghosh9823 2 years ago
Sir, I have one query: if we need to replace missing values of a variable with its group mean, how do we do it? Also, most variables often have skewed data, i.e., we cannot use the mean and should use the median instead, so how do we replace the missing values then? Please help sir!!
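One way to do group-wise replacement in base R is `ave()`. This is a sketch on made-up data, not something shown in the video; `fill_median` is a hypothetical helper, and swapping `median()` for `mean()` gives group means instead.

```r
data <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  value = c(10, NA, 14, 100, 300, NA)
)

# Replace NAs with the median of the observed values in the same group
fill_median <- function(v) {
  v[is.na(v)] <- median(v, na.rm = TRUE)
  v
}
data$value <- ave(data$value, data$group, FUN = fill_median)

data$value   # 10 12 14 100 300 200
```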
@patrickduhirwenzivugira4729 4 years ago ⁺¹
Thank you for this video. You are amazing!
@bkrai 4 years ago
Thanks for comments!
@coruscated 5 years ago ⁺¹
Can you please mention which packages to install while running the code?
@bkrai 5 years ago
Any package for which I used library() needs to be installed first.
@parasrai145 6 years ago ⁺¹
Nicely explained!
@bkrai 6 years ago
Thanks!
@ahiqtidar 3 years ago
Such a great Explanation
I need help with my ML problem. I am a chemical engineer with 4 years of manufacturing background. I am new to DS and learning by myself from RUclips and other sources. I am predicting the efficiency of a chemical reactor that is measured on 3 different days a week by the laboratory. This efficiency is indirectly related to some other variables whose values are continuous.
In short, I have 7 predictors/input variables and each variable has one value per day. That means for each input variable I have 750 values (almost two years), but my outcome variable has only 230 values in the two years. I want to fill the missing values for my outcome variable. Should I use imputation?
@Bupchiieee 4 years ago ⁺¹
Unbeatable
@bkrai 4 years ago
Thanks for your comment!
@annmariyageorge3346 4 years ago ⁺¹
If a column can only have yes or no and some values are missing, how can I impute?
@bkrai 4 years ago ⁺¹
You may go with the most frequent class as one option.
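A minimal base R sketch of filling a yes/no column with its most frequent class, on made-up data:

```r
x <- factor(c("yes", "no", NA, "yes", "yes", NA, "no"))

# Most frequent observed class (table() leaves NA out by default)
tab <- table(x)
most_frequent <- names(tab)[which.max(tab)]

# Fill the gaps with that class
x[is.na(x)] <- most_frequent

table(x)   # no: 2, yes: 5
```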
@atifdai313 2 years ago
How to fill the missing values in panel data?
@dimplekashyap1 6 years ago ⁺¹
Great video
@bkrai 6 years ago
Thanks!
@MrSworob 5 years ago
Thank you so much for the great video, it really helped me!!
@bkrai 5 years ago ⁺¹
Thanks for comments!
@jayashriraghunath3210 4 years ago ⁺¹
Sir, can we replace NA values in string type?
@bkrai 4 years ago
Yes
@SanjayKNayak-to3nw 5 years ago ⁺¹
Nice video sir. I tried this in a situation where there is only one column with missing values, and I am getting the error "Data should be a matrix or data frame". How do I handle it?
@bkrai 5 years ago
You can change your data format to data.frame using following:
data
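The line above appears truncated in this copy. A plausible reading, stated as an assumption, is a call to `as.data.frame()`. The sketch also shows a common cause of that error: picking a single column with `[, j]` silently returns a plain vector.

```r
m <- matrix(c(1, NA, 3, 4, 5, NA), ncol = 2,
            dimnames = list(NULL, c("a", "b")))

# Presumed form of the truncated line
data <- as.data.frame(m)
class(data)                         # "data.frame"

# Selecting one column with [, j] drops to a vector...
class(data[, "a"])                  # "numeric"

# ...unless drop = FALSE keeps the data frame structure
class(data[, "a", drop = FALSE])    # "data.frame"
```

Note that mice builds each imputation from the other columns, so a single column on its own leaves it nothing to predict from.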
@hardabrahmadave6173 5 years ago
@@bkrai It still shows the same error in my case. Any hints as to what I could be doing wrong?
@lokesh542 4 years ago ⁺¹
Hello sir, thanks for such a beautiful explanation, but while imputing missing values for both categorical and numerical variables using mice, my categorical values are still NA:
1 2 3
68 NA NA NA
I am using the same data file vehicleMiss. Can you please help with why this is happening?
Below is the code:
p
@bkrai 4 years ago ⁺¹
Note that after running the last line that you mentioned, there is still no change in the original file. If there were missing values to start with, it still has missing values.
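A small sketch of the point in this reply, using the built-in `nhanes` data as a stand-in: `mice()` never modifies the original data frame, and a filled-in copy has to be extracted with `complete()`. The name `newDATA` is just an illustrative choice.

```r
library(mice)

data <- nhanes
imp  <- mice(data, m = 3, seed = 123, printFlag = FALSE)

anyNA(data)                   # TRUE: the original is untouched

# Extract the first completed data set into a new object
newDATA <- complete(imp, 1)
anyNA(newDATA)                # FALSE
```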
@sk93359 4 years ago ⁺¹
Hi Sir,
this code and technique are not working on address-type data, i.e. categorical data. Could you please make a video on only categorical variables, without any numerical variables?
@bkrai 4 years ago ⁺¹
In my data, 'state' is a categorical variable.
@sk93359 4 years ago
@@bkrai Yes, I saw, but in my case all the data is categorical, with 500 obs.
@sk93359 4 years ago
@@bkrai
Could you give me your mail id? I will send the data.
@darshitsolanki7352 5 years ago ⁺¹
Great sir 😍
@bkrai 5 years ago
Thanks!
@isaibassene1331 4 years ago ⁺¹
Waouh! Thank you so much.
@bkrai 4 years ago
You're welcome!
@anandacharya9919 4 years ago ⁺¹
How to handle missing values in a categorical variable is not mentioned
@bkrai 4 years ago
You may use one of the classification methods to predict missing category.
ruclips.net/p/PL34t5iLfZddu8M0jd7pjSVUjvjBOBdYZ1
@swapnilpatil5882 5 years ago ⁺¹
Hello sir
nice video!!!
Please help me
How to impute the mode in NA values??
Thank you!
@bkrai 5 years ago ⁺¹
This video has all the steps.
@swapnilpatil5882 5 years ago
@@bkrai mode have categorical variable ?
@Rehan1824 3 years ago ⁺¹
awesome...
@bkrai 3 years ago
Thanks!
@MAHENDRAKUMAR-ct8jl 2 years ago ⁺¹
Thanks a lot !!!
@bkrai 2 years ago
You're welcome!
@comredesigns6328 5 years ago ⁺¹
Sir, thank you for all your videos. They have helped my learning of R on a scale beyond what words can explain. I am thankful to you in every step I take in learning these. Blessings!
@bkrai 5 years ago
Thanks for your feedback and comments!
@shivamkrathghara3340 3 years ago ⁺¹
👌👌👌 Thank you
@bkrai 3 years ago
You are welcome!
@alaaalrawajfeh153 2 years ago ⁺¹
when mice is discoverd ?
@bkrai 2 years ago
I didn't understand your question.
@earlymorningcodes6100 4 years ago ⁺¹
"observed & Imputed Values" 14:01
@bkrai 4 years ago
Thx
@ArunKumar-sg6jf 4 years ago ⁺¹
sir why u using length in code give me explanation
@bkrai 4 years ago
I didn't understand your question. Can you be more specific?
@ArunKumar-sg6jf 4 years ago ⁺¹
@@bkrai In #missing data
Sir, you used length(x). What did you use it for?
@bkrai 4 years ago
That's to find percentage of missing values. So numerator tells number of missing values and length(x) is total number of values.
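The percentage calculation described in this reply can be sketched as follows. The function name `p` matches the thread, while its exact body and the toy data frame are assumptions.

```r
data <- data.frame(a = c(1, NA, 3, 4),
                   b = c(NA, NA, 7, 8),
                   d = 1:4)

# Missing values divided by total values, times 100 to get a percentage
p <- function(x) sum(is.na(x)) / length(x) * 100

# Apply to every column (margin 2); margin 1 would give the figure per row
apply(data, 2, p)   # a: 25, b: 50, d: 0
```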
@me3jab1 5 years ago ⁺¹
Thank you boss
@bkrai 5 years ago
Welcome!
@pratibhabajpai377 4 years ago ⁺¹
Thank you sir for a wonderful video on imputation of missing values. Sir, I am working on a covid dataset which has about 33 columns. I am able to impute missing values for some columns, but for other columns like 'city' or 'date on which the patient was admitted to hospital' the NAs are not replaced by imputed values. I don't even get any error message. Kindly guide me
@bkrai 4 years ago
For those type of columns you may have to find some other way.
@earlymorningcodes6100 4 years ago ⁺¹
"complete Data Set" 12:53
@bkrai 4 years ago
Thx
@earlymorningcodes6100 4 years ago ⁺¹
"Impute" 8:39
@bkrai 4 years ago
Thx
