Thanks for this thorough demonstration! I wonder what percentage of missing values you think is acceptable for imputation. The number of available complete cases might also matter: e.g., if I have 3,000 complete cases, is it okay to impute 12,000 missing values in the remaining cases? Information on these considerations is rarely to be found.
Thank you for your informative video!
At 15:03, I was wondering if you could give the reason(s) why the data need to be normalised before applying KNN imputation. What would the consequence(s) be if the actual values were used for KNN imputation directly?
Are there quantitative method(s) that could be used to assess the accuracy of the imputation, rather than visualisation? My data contains more than three thousand rows, so it is hard to assess accuracy using the three types of plots described in the video.
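On the second question, one quantitative option is a hold-out check: mask values you actually know, impute them, and score the result. A minimal sketch, assuming numeric data; `airquality$Temp` is just a convenient fully observed base-R column, and mean imputation stands in for whichever method you are evaluating:

```r
# Hedged sketch: hold out known values, impute, score with RMSE.
set.seed(42)
truth <- airquality$Temp
mask  <- sample(length(truth), size = round(0.1 * length(truth)))

x <- truth
x[mask] <- NA                                  # artificially remove 10%

x[mask] <- mean(x, na.rm = TRUE)               # swap in your imputation method here

rmse <- sqrt(mean((x[mask] - truth[mask])^2))  # accuracy on the held-out values
rmse
```

Repeating this over several random masks gives a more stable estimate than a single split.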
I believe that if you have variables with different ranges (say, 0 to 1 and 0 to 100), then you need to scale or normalize them before running kNN, or one variable might dominate the other. A sketch of this is below.
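To make the scale effect concrete, here is a minimal sketch; the toy columns are hypothetical, and the reference to `VIM::kNN()` is an assumption rather than something confirmed by the video:

```r
# Toy data (hypothetical): x spans [0, 1], y spans [0, 100].
df <- data.frame(x = c(0.10, 0.90, 0.50, 0.85),
                 y = c(10, 90, 55, NA))

# On raw values, Euclidean distance is dominated by y's scale,
# so x has almost no say in which rows count as neighbours.

# Min-max normalization rescales every column to [0, 1]:
normalize <- function(v) (v - min(v, na.rm = TRUE)) /
  (max(v, na.rm = TRUE) - min(v, na.rm = TRUE))
df_norm <- as.data.frame(lapply(df, normalize))

# Then run kNN imputation on the normalized data, e.g. with VIM.
# (Note: VIM::kNN() uses Gower distance, which already accounts for
# mixed scales, so manual scaling matters most for plain Euclidean kNN.)
# library(VIM)
# df_imp <- kNN(df_norm, k = 2)
```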
If I have panel data from 2000 to 2019 with health indices as predictors, and data for these indices is missing for some years due to the frequency of data reporting, what type of missingness is that?
Nice presentation. However, I find it difficult to find a good account of the differences between the classes of missingness (MCAR, MAR, MNAR). After reading the descriptions of these classes by different YouTubers, I am just left at a loss. Perhaps no one can explain these things?
It would be nice to know which packages the functions you are using come from (without having to visit GitHub). I cannot find locf, nobc, or forbak anywhere. I checked the zoo package; it does not have those, but it has similar ones (na.locf covers both LOCF and NOCB).
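For reference, zoo's na.locf() does cover both directions, and a forward-then-backward fill (presumably what forbak does; that function's exact behaviour is an assumption here) can be chained from two calls:

```r
library(zoo)

v <- c(NA, 1, NA, NA, 4, NA)

na.locf(v, na.rm = FALSE)                   # LOCF: NA 1 1 1 4 4
na.locf(v, fromLast = TRUE, na.rm = FALSE)  # NOCB:  1 1 4 4 4 NA

# Forward fill first, then backward fill the leading NA that remains:
na.locf(na.locf(v, na.rm = FALSE), fromLast = TRUE, na.rm = FALSE)
# 1 1 1 1 4 4
```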
Fantastic video! Really, really helpful and informative! I recommend it! Thanks for your video!
Wow, this is wonderful! Thank you for creating and sharing informative videos.
How do you check the quality of the imputation with mice?
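A minimal sketch of mice's built-in diagnostics, using the nhanes example data that ships with the package; densityplot(), stripplot(), and plot() are the standard methods for mids objects:

```r
library(mice)

# Impute the package's example data
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)

# Distributional checks: imputed values (red) vs. observed (blue)
densityplot(imp)          # imputed densities should resemble observed ones
stripplot(imp, pch = 20)  # individual imputed points per variable

# Convergence check: trace plots of chain means and SDs across iterations
plot(imp)
```

If the imputed distributions diverge sharply from the observed ones without a substantive explanation, that is usually a sign the imputation model needs revisiting.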