Handle Missing Values: Imputation using R ("mice") Explained

  • Published: 21 Mar 2020
  • Data cleaning and missing-data handling are very important in any data analytics effort. In this video, we discuss simple substitution approaches and Multiple Imputation by Chained Equations (MICE) in R (a short sketch of the workflow follows this description).
    R installation steps:
    Please install R on your system. It is available for Linux, Windows, and Mac at the link below.
    cran.utstat.utoronto.ca/
    Also, after you install R, install the IDE (Integrated Development Environment), i.e. RStudio Desktop, from the link below.
    www.rstudio.com/products/rstu...
  • Science
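    For orientation, a minimal sketch of the MICE workflow covered in the video, assuming the nhanes example dataset that ships with the mice package (object names are illustrative, and the single "pmm" method is a simplification of the per-column methods used in the video):

    library(mice)
    data(nhanes)                  # 25 rows: age, bmi, hyp, chl, with NAs
    md.pattern(nhanes)            # inspect the missing-data pattern

    # generate m = 5 imputed datasets with predictive mean matching
    imp <- mice(nhanes, m = 5, method = "pmm", maxit = 20, seed = 123)

    imp$imp$bmi                   # the five sets of imputed bmi values
    completed <- complete(imp, 1) # extract one completed dataset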

Comments • 157

  • @asukatakazawa9967
    @asukatakazawa9967 2 years ago +4

    I am totally new to MICE imputation and searched for clues on the internet without luck. However, your video was PERFECT and now I totally understand how it works. Love the work you've done here 👍

  • @willykitheka7618
    @willykitheka7618 3 years ago +1

    You've done a super job explaining the content so well that I subscribed! Thanks for sharing!

  • @rajujha5225
    @rajujha5225 3 years ago +4

    Channels don't need to have a thousand subscribers, good content like this is sufficient. Thanks!

    • @dataexplained7305
      @dataexplained7305  3 years ago

      Thank you my friend. Subscribers are important for making more content like this.. 😆

  • @TheSeynana
    @TheSeynana 3 years ago +2

    Very good video with in-depth explanations!

  • @krishnaswamyc9034
    @krishnaswamyc9034 3 years ago +2

    This is great content. Thanks for patiently explaining.

  • @aish_waryaaa
    @aish_waryaaa 2 years ago +1

    Thank you so much, sir, for a very great explanation. I have a project and was worried about imputing the missing values; this has really helped me a lot. God bless you.

  • @KiyoPenspinning
    @KiyoPenspinning 3 years ago +2

    That is so valuable. Thank you for creating this video!

  • @haraldurkarlsson1147
    @haraldurkarlsson1147 3 years ago +1

    The information on the nhanes variables is readily available in the package section of RStudio. There it states that hyp = 1 is for 'no' and hyp = 2 is for 'yes'. Might have been better to convert them to 0 (for no) and 1 (for yes).

  • @erixyz
    @erixyz 3 years ago +1

    this was so well explained and to the point. thank you for your knowledge.

  • @raihankhan4374
    @raihankhan4374 4 years ago +3

    The best explanation of this I've ever come across..please make more R videos!

    • @dataexplained7305
      @dataexplained7305  4 years ago

      Thanks a lot.. more coming soon....

    • @raihankhan4374
      @raihankhan4374 4 years ago +1

      @@dataexplained7305 would you happen to know two different ways to compute the upper quartile of the variable BMI? Like what is the specific syntax?

    • @dataexplained7305
      @dataexplained7305  4 years ago

      @@raihankhan4374 Try this...
      # fill the missing bmi values with the mean first
      nhanes$bmi[which(is.na(nhanes$bmi))] = mean(nhanes$bmi, na.rm = T)
      # upper (third) quartile, two ways
      Uppr_Quartile_A = summary(nhanes$bmi)[5]   # element 5 of summary() is "3rd Qu."
      Uppr_Quartile_B = quantile(nhanes$bmi)[4]  # default quantile() returns 0/25/50/75/100%
      Hope this helps!
      Cheers.

    • @raihankhan4374
      @raihankhan4374 4 years ago +1

      Thanks! i'll give it a go! I tried a different way earlier, it worked but feels super cheap lol check it out:
      First Method:
      step1

    • @dataexplained7305
      @dataexplained7305  4 years ago +1

      Looks great..
      Cheers

  • @aminusuleiman7924
    @aminusuleiman7924 3 months ago

    Thank you, very well explained. Appreciate it

  • @mattm1152
    @mattm1152 1 year ago +1

    Great tutorial! Thank you!

  • @vg7181
    @vg7181 4 years ago +2

    Incredible brother!! Very well explained 👍

  • @datascientist2958
    @datascientist2958 3 years ago +1

    Have you compared the imputed dataset with the mean of the raw data, or with the pooled data distribution?

  • @ilaydavelioglu3677
    @ilaydavelioglu3677 3 years ago +1

    Thank you so much for your effort, love this explanation

  • @fabianoborges
    @fabianoborges 3 years ago +1

    Thank you for this material!

  • @v.m.3748
    @v.m.3748 4 years ago +3

    Ohhh, now I got it. Thank you!!

  • @anushpetrosyan2535
    @anushpetrosyan2535 3 years ago +1

    Great explanation, thank you!

  • @GGlessGo
    @GGlessGo 4 years ago +1

    22:14 saved the experiment for my paper! I still do not understand how to fit all imputed datasets in one model. But at least I got something! Much love

    • @dataexplained7305
      @dataexplained7305  4 years ago +1

      Thanks so much for the support.
      You can choose the best dataset in a statistical sense, e.g. by regression or by comparing means across the imputed datasets. That gives you one complete set of values to substitute for the missing ones. Or, with a few lines of code, you can pick and choose values from all 5 imputed sets, whichever you think are closest to the missing values.
      Check this out..
      ruclips.net/video/_ymR-FFG44c/видео.html
      Stay tuned... more coming soon!

  • @camillesantos4953
    @camillesantos4953 3 years ago +2

    Thank you for the in-depth explanation of MICE. One question, though: I understand the first step of MICE is a simple imputation, but what is the point of doing so if the mice imputation command uses the original dataset (input_dt2 = nhanes), in addition to checking it against its mean?

    • @dataexplained7305
      @dataexplained7305  3 years ago

      Sorry for the delay... I'm just copying the nhanes dataset to another variable because I don't want to alter the original dataset, in which case I would need to reload the library.. just being lazy.. lol

  • @richardstj999
    @richardstj999 2 years ago +5

    The variable age is also categorical, with categories 1 = "20-39", 2 = "40-59", and 3 = "60-99", which are already coded in the data.frame nhanes2, also in the mice package. This does not detract from your very helpful explanations! I am really concerned about imputation of mixed datasets, with both continuous and categorical variables. There are many journal papers in medicine where the authors say they used multivariate normal imputation, and I always wonder how they could possibly handle missing data in categorical variables using that method. The point is that they cannot, and they did not.

  • @rauceliovaldes1495
    @rauceliovaldes1495 3 years ago +5

    Thank you very much.
    Even without speaking English, I understood!!!! Congratulations.

  • @divyasukumar7324
    @divyasukumar7324 3 years ago +1

    Thanks for the detailed explanation...

  • @annap9782
    @annap9782 3 years ago +13

    The point of multiple imputation is to perform the analyses in each imputed dataset and then to pool the results. If you just want to work with one dataset, it would be better to use VIM or a similar package to perform single imputation.

    • @dataexplained7305
      @dataexplained7305  3 years ago

      Thanks for your comment. You can do that in mice as well, like below:
      1) mice(): this gives you a mids object.
      2) with(): applies a statistical model to the mids object above; the result is a mira object.
      3) pool(mira): combines the results together.
      Having said that, I still find the approach in the video simple and very effective... and I agree that VIM can be used as well..
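      For reference, a minimal sketch of that mice() -> with() -> pool() workflow, assuming the nhanes dataset bundled with the mice package (object names here are illustrative, not taken from the video):

      library(mice)
      data(nhanes)
      imp <- mice(nhanes, m = 5, method = "pmm", maxit = 20, seed = 123)  # mids object

      fit <- with(imp, lm(chl ~ bmi + age))  # fit the model in each imputed dataset (mira object)
      pooled <- pool(fit)                    # pool the results with Rubin's rules (mipo object)
      summary(pooled)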

  • @heralvyas27
    @heralvyas27 3 years ago +1

    Very well explained, Senthil!!
    I have a question though. When we do the mice imputation, we get 5 datasets, and in this case you chose the one that is closest to the mean value of BMI. In this case, the number of NAs was very small, so you could look through the data and decide which dataset to use.
    What happens in a real-life situation when the number of NAs to be replaced is high? How do we decide which dataset to use then? If the number of NAs is huge, manually going through the datasets and deciding which one to use would be a cumbersome task, right?

    • @dataexplained7305
      @dataexplained7305  3 years ago +1

      Thanks, Heral.. you can get the distribution of each of these imputed sets as well, for example:
      summary(my_imp$imp$bmi[, 1])
      Let me know how it goes..
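      A small sketch of summarizing all of the imputed columns at once instead of one at a time, assuming my_imp is the mids object from the video and bmi is the variable of interest:

      sapply(my_imp$imp$bmi, summary)   # my_imp$imp$bmi has one column per imputation (m = 5)
      colMeans(my_imp$imp$bmi)          # mean of each imputed column
      mean(nhanes$bmi, na.rm = TRUE)    # observed (non-missing) mean for comparison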

  • @enriquecguerra
    @enriquecguerra 2 years ago +1

    Thank you very much!

  • @brazilfootball
    @brazilfootball 3 years ago

    Is it still ok to match the 5th choice with our original dataset's mean if we have non-normal data? Why?

  • @jimpul1001
    @jimpul1001 3 years ago +1

    Thanks man, this was very useful

  • @eyadha1
    @eyadha1 1 year ago +1

    great video. thank you very much.

  • @sahelmoein4377
    @sahelmoein4377 1 year ago +1

    Thank you for this tutorial video. It was really helpful. Is there any specific rule for the number of datasets (in your case: 5) we choose?

    • @dataexplained7305
      @dataexplained7305  1 year ago

      Thanks. I selected based on the correlation and closeness of the data to the original set.. there are a couple of ways to do this. Some don't choose one dataset for substitution at all but just use them all for the analysis…

  • @seanmain5279
    @seanmain5279 3 years ago +1

    This was great, thank you

  • @ranu9376
    @ranu9376 2 years ago +1

    Amazing!

  • @p3drito
    @p3drito 3 years ago +1

    Is there a way to only impute certain variables with the "mice"-command? I'm looking for a way to specifically include certain predictor and auxiliary variables to my imputed model, since I am only working with a subset of variables of a bigger dataset. Thanks in advance.

    • @dataexplained7305
      @dataexplained7305  3 years ago

      Thanks for your comment.
      The answer to your question is YES.
      Go to the 14:50 mark of this video. For whichever column you don't want to impute, you simply leave its entry empty ("") in the method argument. For the others, you specify whatever imputation method you would like to use.
      Hope this helps

    • @christiaanscholten64
      @christiaanscholten64 3 years ago +1

      Yes, you skip variables by choosing method "" for the feature you don't want to be imputed (like in the video, in the case of the age feature).
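      A minimal sketch of that per-column method vector, using the nhanes example from the video (assumptions: hyp converted to a factor, pmm for the numeric columns, and "" to leave age un-imputed; entries are in column order age, bmi, hyp, chl):

      library(mice)
      data(nhanes)
      nhanes$hyp <- as.factor(nhanes$hyp)   # 2-level factor, so logreg applies

      meth <- c(age = "", bmi = "pmm", hyp = "logreg", chl = "pmm")
      my_imp <- mice(nhanes, m = 5, method = meth, maxit = 20, seed = 123)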

  • @datascientist2958
    @datascientist2958 3 years ago +1

    Sir, can we use pmm for nominal features which have 4 categories, or do we have to reduce the cardinality?

    • @dataexplained7305
      @dataexplained7305  3 years ago

      Yes, pmm would be a good choice.. make sure to have them converted to a factor before you apply pmm..

  • @robertjones3727
    @robertjones3727 3 years ago +2

    I really enjoyed your video! You selected the 5th column by comparing the original mean to the column of imputed values, which is practical for a small data set. In cases where there are 100s or 1000s of imputed values, what are the steps for calculating the mean for each of the columns of imputed values, and comparing those results to original mean to then select the best column of imputed values?

    • @dataexplained7305
      @dataexplained7305  3 years ago

      You can do summary(imputedDS[, 1]) and compare it to summary(source$column), then summary(imputedDS[, 2]) and compare with the source column, and so on.. or simply use the mean() function (see the sketch after this thread)...

    • @robertjones3727
      @robertjones3727 3 years ago +1

      @@dataexplained7305 Excellent - Thank you !!!
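      A small sketch of doing that comparison programmatically rather than by eye, assuming my_imp is the mids object from the video and bmi is the column being imputed (the which.min step is just one way to pick the closest column):

      obs_mean  <- mean(nhanes$bmi, na.rm = TRUE)    # mean of the observed values
      imp_means <- colMeans(my_imp$imp$bmi)          # mean of each of the m imputed columns

      best <- which.min(abs(imp_means - obs_mean))   # column whose mean is closest to the observed mean
      nhanes$bmi[is.na(nhanes$bmi)] <- my_imp$imp$bmi[, best]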

  • @Jas-ti7hr
    @Jas-ti7hr 3 years ago

    Hey, thank you for the video! It was very informative!
    Is it possible that the table (my_imp$imp$bmi) could show different results because of the random multiple iterations?

    • @dataexplained7305
      @dataexplained7305  3 years ago +1

      This shows you the number of imputed datasets you specified. For example, if you selected m = 5 like in the video, you will see five different imputed sets, and yes, the values can differ between runs unless you fix the seed. You can also check my_imp$chainMean, which shows the chained means computed across the imputation iterations you specify (see the sketch after this thread)... let me know if this helps..

    • @Jas-ti7hr
      @Jas-ti7hr 3 years ago

      @@dataexplained7305 thank you for the prompt reply! If I’m not mistaken, the chained mean is identical!
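      A minimal sketch of inspecting those chained means and checking convergence, assuming the nhanes data from the mice package and a fixed seed for reproducibility:

      library(mice)
      data(nhanes)
      my_imp <- mice(nhanes, m = 5, method = "pmm", maxit = 20, seed = 123)

      my_imp$chainMean   # mean of the imputed values per variable, iteration, and imputation
      plot(my_imp)       # trace plots of the chained means and variances across iterations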

  • @camillesantos4953
    @camillesantos4953 3 years ago +1

    also, what method do you suggest for ordinal variables??

    • @dataexplained7305
      @dataexplained7305  3 years ago

      My apologies for the delay.. Check this published paper, when you get a chance..
      www.researchgate.net/publication/326435546_Missing_Data_Imputation_for_Ordinal_Data

  • @Pancho96albo
    @Pancho96albo 2 years ago +1

    thx mate

  • @manojthanu
    @manojthanu 3 years ago +1

    nice explanation .. thank you

  • @dielikerin
    @dielikerin 1 year ago +1

    Do you know which author this multiple imputation method comes from? I have to cite the method used in my article.

  • @bic5004
    @bic5004 3 years ago +1

    hey! thanks for the video, I am getting this error:
    error in str2lang(x) : :1:5: unexpected Symbol
    1: 5531Atrialappendectomy
    ^
    I don't know what to do. I tried to delete the column, but then it gives the same error for the header of the next column! Anyone know how to solve it?

    • @dataexplained7305
      @dataexplained7305  3 years ago

      Thanks for your comment. It looks like a simple syntax issue. Hope you have fixed it by now..

  • @anuradhasaikia9305
    @anuradhasaikia9305 2 years ago +1

    The missing values in my dataset are not marked as NA but left blank. Will there be an issue?

  • @Icecube88
    @Icecube88 2 years ago +1

    Is there a better way to choose which column is closest to your mean instead of eyeballing it? Like, what if you had over 100 rows?

    • @dataexplained7305
      @dataexplained7305  2 years ago +1

      Summarize your output set... that is the best way..

    • @Icecube88
      @Icecube88 2 years ago +1

      @@dataexplained7305 Thanks, that worked.

  • @ShreyasMeher
    @ShreyasMeher 3 years ago +1

    I am getting this error - Error: $ operator is invalid for atomic vectors. I am using a panel data set.
    Any idea on what I should do?

    • @dataexplained7305
      @dataexplained7305  3 years ago

      Hi,
      Not sure if you have figured it out yet, but I found something real quick for you..
      stackoverflow.com/questions/23299684/r-error-in-xed-operator-is-invalid-for-atomic-vectors

  • @GuisseppeVasquezV
    @GuisseppeVasquezV 4 years ago +1

    Thanks for your video!! But I don't know if you could explain the other parameters of the mice package.. greetings

    • @dataexplained7305
      @dataexplained7305  4 years ago +1

      Anything specific that you are looking for ?

    • @GuisseppeVasquezV
      @GuisseppeVasquezV 4 years ago +1

      Yes, the "seed" parameter: is it important to write a specific number, and if so, what number do I have to write? Also, you said to pick the imputed set that looks closest in mean, but is there another method to choose the correct one, something more statistical? And also, thank you so much. Greetings.

    • @dataexplained7305
      @dataexplained7305  4 years ago +1

      Thanks for the Q. Highly appreciate it.
      SEED: I suggest you tune your iterations (ideally 20 to 40) and also your "m" value. The seed parameter is not required by mice() as per the R documentation. Having said that, you can always leave seed as NA for random number generation, or set a specific number yourself if that gives a better (and reproducible) imputation.
      OUTPUT_PICK: The one I showed above is a mean-based pick. You can also regress (linear/logistic, for example) your variables for a closer pick. However, considering the length of explanation involved, that should probably be a separate video.
      Cheers.

    • @GuisseppeVasquezV
      @GuisseppeVasquezV 4 years ago

      @@dataexplained7305 Thanks so much for the explanation; I hope you will bring us a new video with that explanation. Greetings.

    • @dataexplained7305
      @dataexplained7305  4 years ago +1

      Stay tuned ! I will do one soon..

  • @jaygomez2320
    @jaygomez2320 3 years ago +1

    Hello! I have a dataset with 475 rows where the NA's are in categorical values (factors with 4-13 levels). These variables have 2-3 missing values; can I use the mode (most frequent) for imputation?
    Also, there are 2 variables with categorical data (factors with 12 levels) where the percentages of missing values are 18% and 25%. I think these variables are important; how can I fill them in? Thank you!

    • @dataexplained7305
      @dataexplained7305  3 years ago

      Hi Jay, thanks for your comment. My quick comments below.
      First case: with only 2-3 missing values, either the mode or a categorical prediction method like logreg will not make a big difference, so feel free to use either.
      Second case: use polyreg for these, since they have more than 2 levels; it is also mice's default method for unordered factors with more than two levels. See the sketch after this thread.

    • @jaygomez2320
      @jaygomez2320 3 years ago +1

      @@dataexplained7305 Hello sir, thank you for your response. I have filled the missing categorical variables with their mode just now.
      On my second question: my dataset (475 observations) has 11 variables, and 3 of those have a lot of missing data (17%, 25%, 27%). In using mice (polyreg), should I include those 11 variables in one go to fill all the missing data, or should I include only the variables that I think have an effect on the missing values?
      CODE: try

    • @dataexplained7305
      @dataexplained7305  3 years ago

      @@jaygomez2320 Not a problem. Handle only the ones which have missing values; leave the others as is...

    • @jaygomez2320
      @jaygomez2320 3 years ago +1

      @@dataexplained7305 so that code will do?

    • @dataexplained7305
      @dataexplained7305  3 years ago

      @@jaygomez2320 Yes. That's right.
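      A small, self-contained sketch of the polyreg approach discussed in this thread (the toy data frame below is illustrative, not the commenter's data; make.method() builds mice's default per-column method vector, assuming mice 3.x):

      library(mice)

      set.seed(1)
      df <- data.frame(
        age    = sample(20:80, 100, replace = TRUE),
        region = factor(sample(LETTERS[1:12], 100, replace = TRUE)),   # 12-level factor
        group  = factor(sample(c("a", "b", "c", "d"), 100, replace = TRUE))
      )
      df$region[sample(100, 25)] <- NA   # ~25% missing
      df$group[sample(100, 3)]   <- NA   # a few missing

      meth <- make.method(df)            # "" for complete columns, polyreg for >2-level factors with NAs
      meth                               # check: region and group get "polyreg"

      imp <- mice(df, m = 5, method = meth, maxit = 10, seed = 123)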

  • @siblings794
    @siblings794 3 years ago +2

    Really very good content, but I have a doubt: here we have taken the summary of bmi, and based on the mean value we have selected the 5th column, because its values are near the mean. Why are we considering only the bmi summary and not also the chl summary to select the nearest column values? Thanks in advance

    • @dataexplained7305
      @dataexplained7305  3 years ago +1

      Sorry for the delay, in case you still have this question. The baseline is that the NA values need to be replaced with the best possible values in our judgment. The distribution of the target variable can be compared to that of the imputed datasets, and, like you said, each column can be filled from a separately chosen imputed dataset rather than one common dataset. Whatever you choose as the output variable for your analysis is the one whose distribution you can compare with the final datasets. Hope this helps

    • @paigecox347
      @paigecox347 2 years ago

      @@dataexplained7305 Hi, was wondering if you could help me with a similar query.
      I have 20 continuous variables in my dataset which I need to impute missing data for.
      In this case how do I impute each column using separate datasets? Do I need to do these steps separately for 20 different datasets? And how would I then combine all those columns at the end into one complete data set?

  • @tsehayenegash8394
    @tsehayenegash8394 2 years ago +1

    If you know of MATLAB code for MICE, please let me know.

  • @gatechmjm39
    @gatechmjm39 2 years ago +1

    Do you have your syntax for this uploaded anywhere that I could just review your entire syntax for this video and compare it to my code?

    • @dataexplained7305
      @dataexplained7305  2 years ago

      Can you check the video itself? I don't have the code saved locally..

    • @gatechmjm39
      @gatechmjm39 2 years ago

      @@dataexplained7305 Yes, thank you. I just didn't know if you have it exported on a notepad file or something.

  • @zoiyaehtisham818
    @zoiyaehtisham818 2 years ago +1

    Can you explain Hmisc in R? Or can we do imputation through mice when analysing categorical variables?

    • @dataexplained7305
      @dataexplained7305  2 years ago +1

      Re: Hmisc, I will upload a video.. but yes, you can impute categorical variables..

    • @zoiyaehtisham818
      @zoiyaehtisham818 2 years ago +1

      @@dataexplained7305 What do we have to do for categorical variables: simply follow the steps, or do we have to add another step?

    • @dataexplained7305
      @dataexplained7305  2 years ago +1

      @@zoiyaehtisham818 Yeah.. you will just need to apply a categorical method to that feature/column.. if you watch the video, you will see I applied logreg to the hyp column... hope this helps..

    • @zoiyaehtisham818
      @zoiyaehtisham818 2 years ago

      @@dataexplained7305 Thank you so much. I have data with 528 obs of 165 variables, all binary (categorical variables), imported into R from SPSS. I have to do the analysis after the imputation and I have little time; I have been learning R for 3 months and have still failed to impute the data. I applied the algorithm to my data but got this error:
      Error in formula.character(object, env = baseenv()) :
      invalid formula "scale#1 ~ 0+gender+residence+age+religion+mothertounge+cultralbackground+academicbackground+occupation+income+ty( and so on)
      Can you advise me on what I should do? When I was doing the imputation in SPSS, I got the MAXMODELPARAM error, so I switched to R, and I am still unable to do it. I will be very thankful if you have any idea and would tell me.

  • @anuradhasaikia9305
    @anuradhasaikia9305 2 years ago +1

    I am getting this error after entering the mice function: Error in str2lang(x) : :1:27: unexpected symbol
    1: Company ~ Year+Contingent liabilities

    • @dataexplained7305
      @dataexplained7305  2 years ago

      Sorry, I missed this somehow.. hope you have fixed the code by now. That error usually points to a column name that is not a syntactically valid R name (e.g. the space in "Contingent liabilities"); renaming the columns, for example with make.names(), should help. If not, please send your code to dataandyou@gmail.com

  • @christiansniper2
    @christiansniper2 3 years ago +1

    Can anyone explain why he chose these methods in the mice function?

    • @dataexplained7305
      @dataexplained7305  3 years ago +1

      Depending on the variable type, there are a bunch of method options to choose from.

  • @anamoreno8406
    @anamoreno8406 28 days ago

    What if you have over 50,000 variables? When I try to use these functions, it's not possible to call out a specific variable.

  • @datascientist2958
    @datascientist2958 3 years ago +1

    How can we extract the pooled imputed dataset from RStudio?

    • @dataexplained7305
      @dataexplained7305  3 years ago

      Yes.. you can try..
      pool(with(imp_function, lm(chl ~ bmi + age)))
      (wrap it in summary() to see the pooled estimates), and to understand the results..
      www.rdocumentation.org/packages/mice/versions/2.8/topics/pool

  • @MianAShah
    @MianAShah 3 years ago +1

    Good Job.

  • @BenjaminLiraLuttges
    @BenjaminLiraLuttges 3 years ago +1

    Up to what percentage of the data can be imputed? Any references?

    • @dataexplained7305
      @dataexplained7305  3 years ago

      Thanks for your comment. I suggest keeping the outcome variable at least 75% complete for better predictions. Other auxiliary variables can be a bit less, depending on whether they are continuous, categorical, ordinal, etc. Bottom line: if any variable is less than 60% complete, consider deletion, e.g. pairwise or listwise...

    • @BenjaminLiraLuttges
      @BenjaminLiraLuttges 3 years ago +1

      @@dataexplained7305 Thanks!

  • @lifeexperiment4119
    @lifeexperiment4119 2 years ago +1

    Hi. Can you please let me know how to impute the missing values if I have 172 variables and need to impute all of them?

    • @dataexplained7305
      @dataexplained7305  2 years ago

      It depends on the percentage of missing values.. if it is high, it might not make sense to impute, due to the loss of natural information in the data.

    • @lifeexperiment4119
      @lifeexperiment4119 2 years ago +1

      @@dataexplained7305 Thanks.
      But the data I have is only 4% missing, spread across different areas, and not in the first variable.

    • @dataexplained7305
      @dataexplained7305  2 years ago

      Sounds possible.. 👍

  • @josephogle6449
    @josephogle6449 1 year ago +1

    Good explainer re: using mice. However, in addition to @annap9782's comment, I'd also flag that finding that your imputed data is distributed similarly to the observed data is not a meaningful test of imputation performance. If we had a (conditionally) ignorable missingness mechanism (i.e., MAR), it may even be expected that these distributions will differ, without this saying anything about the performance of a given imputation.

  • @MrAli2200
    @MrAli2200 3 years ago +1

    Why did you choose the fifth column, please?

    • @dataexplained7305
      @dataexplained7305  3 years ago

      If you look, the mean of the original set is closest to the distribution of the 5th output set.

  • @kinleytshering1934
    @kinleytshering1934 3 years ago +1

    Substituting the mean for NA will accumulate error... I still cannot agree... say I have data 1, 10, 10000, 50, 20, NA, NA, NA, 3, etc.... please explain, is it justified?

    • @dataexplained7305
      @dataexplained7305  3 years ago

      Thanks for your comment... you are correct. Mean/mode substitution only works in certain cases, when the distribution is tight rather than loose like the one you specified.. that's why we get estimates using mice.. watch the second half of the video as well and let me know what you think..

  • @pabitrapradhan721
    @pabitrapradhan721 4 years ago +1

    Bro, what about copula-based imputation?

  • @Icecube88
    @Icecube88 2 years ago

    What if you have categorical variables?

    • @Icecube88
      @Icecube88 2 years ago +1

      Never mind, this handles categorical as well as numeric.

    • @dataexplained7305
      @dataexplained7305  2 years ago +1

      Yes.. you just need to choose the appropriate functions to handle the categorical columns...

    • @Icecube88
      @Icecube88 2 years ago

      @@dataexplained7305 Yeah, I didn't realize that in the first part of the video you weren't using mice, just a simple imputation. In the second part of the video you use mice. Thanks for the vid.

  • @rohitnath5545
    @rohitnath5545 3 years ago

    How can you just pick one imputation? It must be pooled.

  • @waelhussein4606
    @waelhussein4606 3 years ago +3

    Nicely explained. However, you should not just pick one set after your multiple imputation. You run the analysis on all generated datasets and produce a pooled analysis.

    • @lesliezhen4256
      @lesliezhen4256 3 years ago +2

      Agreed. Something like this:
      my_imp = mice(input_dt2,m=5,method=c("","pmm","logreg","pmm"),maxit=20) # create several sets of imputed data as seen in video
      my_analysis_model

    • @paigecox347
      @paigecox347 2 years ago

      @@lesliezhen4256 Hi, I wonder if you could expand on that second line of code there for me? The one which fits the model to each set of imputed data. What do I put in the brackets of model( ) ?

    • @lesliezhen4256
      @lesliezhen4256 2 years ago +1

      @@paigecox347 The part of the 2nd line that reads model(...) is a placeholder for whatever is the actual model that you are fitting. For example, if you are fitting a mixed effects model, you would replace model(...) with something like lmer(outcome ~ 1 + variable1 * variable2 + (1|subject), data=name_of_dataset)

    • @paigecox347
      @paigecox347 2 years ago

      @@lesliezhen4256 Thanks Leslie, I think for my data I won't be able to do this, as it's only my dependent variables, in a separate dataset, that need to be imputed and then averaged.
      My independent variables are in a different format (so I can't build the model from the imputed dataset in the traditional way).
      Thanks for the help though 👍

  • @idealtube281
    @idealtube281 3 years ago

    It's a great presentation, but it keeps saying "my_imp" is not found... how do I get past this step?
    I've put the error I faced below. Thanks a lot.
    summary(input_dta3$faminc)
    Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
    4.0 50.0 72.0 134.4 150.0 3000.0 494
    > my_imp$imp$faminc
    Error: object 'my_imp' not found
    > my_imp=imp$faminc
    Error: object 'imp' not found
    > my_imp$imp$faminc
    Error: object 'my_imp' not found...... this is the problem with me..

    • @dataexplained7305
      @dataexplained7305  3 years ago

      Thanks for your comment.
      Two things here. 1) Make sure you have executed the line my_imp = mice(....) before referring to my_imp.
      2) If that still doesn't help, paste the code below with the sensitive details taken out.. just the code flow..

    • @idealtube281
      @idealtube281 3 years ago +1

      @@dataexplained7305 I also executed it before, like below:
      my_imp=mice(input_dta3,m=5,method=c("","pmm","logreg","pmm"),maxit=20)
      Error: Length of method differs from number of blocks
      > summary(input_dta3$faminc)
      Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
      4.0 50.0 72.0 134.4 150.0 3000.0 494
      > my_imp$imp$faminc
      Error: object 'my_imp' not found
      > my_imp$imp$faminc
      Error: object 'my_imp' not found
      This is what I executed.

    • @dataexplained7305
      @dataexplained7305  3 years ago

      @@idealtube281 Check that the length of the method argument matches the number of columns in your data; the error says they differ. Still not working? Then email me the code at dataandyou@gmail.com and I will check.. (see the sketch below)
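      A small sketch of one way to avoid that length mismatch, assuming the commenter's data frame input_dta3 (only four method entries were supplied above, but mice wants one entry per column/block; make.method() builds that default vector, assuming mice 3.x):

      library(mice)

      ncol(input_dta3)                  # the method vector needs exactly this many entries

      meth <- make.method(input_dta3)   # mice's defaults, one entry per column
      meth["faminc"] <- "pmm"           # override individual columns as needed; "" skips a column

      my_imp <- mice(input_dta3, m = 5, method = meth, maxit = 20, seed = 123)
      my_imp$imp$faminc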

  • @user-qh6vz6cx8n
    @user-qh6vz6cx8n 1 month ago

    the jeet imputatoor

  • @djangoworldwide7925
    @djangoworldwide7925 3 years ago +1

    22 minutes that could have been summed up in 5.

  • @kaushikdujo
    @kaushikdujo 3 years ago +1

    unnecessarily stretched

  • @kaushikdujo
    @kaushikdujo 3 years ago

    very basic .. this video could have been made in 5 mins

  • @Hari-888
    @Hari-888 4 years ago +1

    stop saying 'right' all the time. I literally had to mute the audio and use CC captions because of that.

    • @dataexplained7305
      @dataexplained7305  4 years ago

      Lol... will try to fix it next time. Stay tuned though..

  • @domivan2581
    @domivan2581 3 years ago +1

    Well explained. Thanks for this!