Simple techniques for dealing with missing data

  • Published: 23 Aug 2024

Comments • 28

  • @duckcluck123
    @duckcluck123 1 year ago +1

    I loved that you added those simulation results. That was very interesting and helped my understanding

    • @mronkko
      @mronkko 1 year ago

      You are welcome!

  • @MaverickRam
    @MaverickRam 1 year ago

    Very helpful and simplified explanation. Thanks for the video!

    • @mronkko
      @mronkko 1 year ago

      You are welcome!

  • @danielvoss2483
    @danielvoss2483 5 months ago

    Good Job 👍👍👍

    • @mronkko
      @mronkko 5 months ago

      Thanks!

  • @surya-td4dg
    @surya-td4dg 3 years ago +1

    Strong Finnish accent :).. Thank you for the awesome content

    • @mronkko
      @mronkko 3 years ago +3

      I take the comment about my accent as a compliment ;) Funny thing: I used to live in the US and part of the accent was lost during that time. Even though that was about 20 years ago, I still notice my accent diminish when I spend a couple of days there. But now that we cannot travel, the accent is as strong as ever!

  • @mohamadmatinhavaei9859
    @mohamadmatinhavaei9859 6 months ago

    Great job, but what about missingness that exists in a single column and affects more than 50% of the values? Would deep models like GANs be useful for imputation (in time-series prediction)? Many thanks🙏

    • @mronkko
      @mronkko 5 months ago

      I assume GAN refers to some kind of neural network. Imputation works regardless of the amount of missing data, under these three conditions:
      1) You are doing multiple imputation and not single imputation so that you can quantify the uncertainty introduced by the imputation process.
      2) The imputation model contains all features of your data that are relevant for the analysis.
      3) The missingness does not depend on the missing value itself. (i.e. data are MAR or MCAR)
      I do not really see what neural nets would add over a thoughtfully developed imputation model, but they are likely to increase sample size requirements.
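
The three conditions in the reply above can be illustrated with a small simulation. This is a minimal sketch in plain Python, not the procedure used in the video: the variable names (`iq`, `perf`), the bootstrap step that varies the imputation model between imputations, and the 70% missingness rate are all assumptions made for illustration. It imputes a MAR variable several times from a regression on a fully observed variable and pools the estimates with Rubin's rules, so the reported standard error includes the uncertainty added by imputation.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def pooled_mean(xs, ys, m=20, seed=0):
    """Estimate mean(y) from m imputed data sets; None marks a missing y."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    estimates, variances = [], []
    for _ in range(m):
        # Resample the complete cases so that each imputation uses slightly
        # different regression parameters (uncertainty in the imputation model).
        boot = [rng.choice(observed) for _ in observed]
        a, b, sd = fit_line([p[0] for p in boot], [p[1] for p in boot])
        # Replace each missing y with a prediction plus random residual noise.
        completed = [y if y is not None else a + b * x + rng.gauss(0, sd)
                     for x, y in zip(xs, ys)]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    total = within + (1 + 1 / m) * between
    return q_bar, within, between, total


# Hypothetical data: performance depends on IQ, and performance is missing
# more often for low-IQ cases (MAR, because IQ itself is always observed).
rng = random.Random(1)
iq = [rng.gauss(100, 15) for _ in range(500)]
perf_full = [0.5 * x + rng.gauss(0, 10) for x in iq]
perf = [None if (x < 100 and rng.random() < 0.7) else y
        for x, y in zip(iq, perf_full)]

estimate, within, between, total = pooled_mean(iq, perf)
print(f"pooled mean = {estimate:.2f}, pooled SE = {total ** 0.5:.2f}")
print(f"full-data mean = {statistics.fmean(perf_full):.2f}")
```

The `between` component is what single imputation leaves out: with only one imputed data set there is no way to estimate it, so the standard error would be too small.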

    • @mohamadmatinhavaei9859
      @mohamadmatinhavaei9859 5 months ago

      @@mronkko "Hi again Mikko, I'm tackling a unique challenge with my dataset and believe your insights could greatly help. Could you share any contact info for a brief discussion? Thanks!"

    • @mronkko
      @mronkko 5 months ago

      @@mohamadmatinhavaei9859 I take consulting orders through instats.org/expert/mikko--rönkkö-829.

  • @zhaowu3193
    @zhaowu3193 2 years ago

    Hi, thank you for the content. I would like to know how to choose the reference variable. For example, in your case IQ is taken as a reference when imputing job performance. I have a lot of variables in my data set, and some of them have a lot of missing values. How can I identify which variable to refer to when I want to impute another one?

    • @mronkko
      @mronkko 2 years ago

      Your imputation model needs to use all variables and model all relationships that you have in your main model. In addition, you can use auxiliary variables (I have a video about that). The rule with auxiliary variables is that you should be liberal in including them. However, if your sample size is small you can start to get bias and computational difficulties if you include too many.

  • @ashayagarwal
    @ashayagarwal 1 year ago

    I found your channel recently, and started liking your teaching approach. I want to ask if pairwise deletion is possible in regression y = X*beta + e, beta = inv(X'X)X'y. It is possible to calculate a pairwise version of X'X. Would love to hear your thoughts. Thx

    • @mronkko
      @mronkko 1 year ago

      In pairwise X'X you would need to adjust for sample size for each cell. But in principle you can estimate pairwise covariances of all the variables and then estimate regression from that covariance matrix. The resulting estimator should be consistent under MCAR but getting the standard errors right would require adjustments to the complete data standard error formulas. I have not seen any paper discussing how to do this and therefore I would not be comfortable using this approach. That being said, that I have not read something does not mean that it does not exist. I have just come to the conclusion that because FIML and multiple imputation exist already and I know how to do both, there is little reason for me to learn about other approaches to adjusting for missing data in estimation.
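
The idea in the reply above, estimating each covariance from the cases available for that pair and then computing the regression from the covariance matrix, can be sketched as follows. This is a plain-Python illustration under assumptions: the function names, the two-predictor setup, and the simulated MCAR data are hypothetical, and the sketch returns point estimates only. It makes no attempt at the standard-error adjustments that the reply says would be needed.

```python
import random


def pairwise_cov(a, b):
    """Covariance of a and b from the rows where both are observed (not None)."""
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    n = len(pairs)  # each cell of the matrix gets its own sample size
    mu = sum(u for u, _ in pairs) / n
    mv = sum(v for _, v in pairs) / n
    return sum((u - mu) * (v - mv) for u, v in pairs) / (n - 1)


def regress_from_pairwise(x1, x2, y):
    """Slopes of y on x1 and x2, solved from pairwise covariances."""
    s11, s22 = pairwise_cov(x1, x1), pairwise_cov(x2, x2)
    s12 = pairwise_cov(x1, x2)
    s1y, s2y = pairwise_cov(x1, y), pairwise_cov(x2, y)
    # A pairwise covariance matrix is not guaranteed to be positive definite,
    # so this determinant can be zero or negative in unlucky samples.
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Simulated data with true slopes 1.0 and 2.0.
rng = random.Random(7)
n = 20000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]
y = [1.0 * a + 2.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]


def punch_holes(values, rate):
    """Set each value to None with the same probability (MCAR)."""
    return [None if rng.random() < rate else v for v in values]


b1, b2 = regress_from_pairwise(
    punch_holes(x1, 0.2), punch_holes(x2, 0.2), punch_holes(y, 0.2))
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")
```

With holes punched completely at random the slopes should land near their true values, which is consistent with the point about MCAR; the sketch says nothing about how the estimator behaves under other missingness mechanisms.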

  • @PavanKumar-ef1yy
    @PavanKumar-ef1yy 6 months ago

    Thanks a lot sir

    • @mronkko
      @mronkko 5 months ago

      Most welcome

  • @bandungmee
    @bandungmee 1 year ago

    Hi
    It was mentioned that "the imputed data can only be used within the pooling testing and cannot be used for the model testing".
    Does it mean the data is only imputed/simulated for the purpose of analysing its reliability? If it cannot be used for model testing, does it mean we still need to use the actual data and perform the deletion of missing data?
    Correct me if I'm wrong
    Thank you

    • @mronkko
      @mronkko 1 year ago

      I need more context. Can you give me a timestamp from the video?

  • @couragee1
    @couragee1 1 year ago

    thank you

  • @Stelnice
    @Stelnice 3 years ago

    Hi! In what types of research can I use pairwise/listwise deletion?

    • @mronkko
      @mronkko 3 years ago +2

      Deleting observations is never ideal if you only consider it from a statistical perspective. However, simplicity is also a virtue in applied research (for example, you are less likely to make mistakes if you keep things simple), and simple techniques should be preferred over complex ones if the difference in outcomes is small. Deleting observations is OK if a) your sample size is sufficient after deletion and b) your missing data are MCAR. I would not use pairwise deletion because using a different sample size for different analyses complicates things, but this depends on how the data are missing.
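
Condition b) in the reply above can be checked with a quick simulation. The sketch below is illustrative only: the variable names, the sample size, and the missingness rates are assumptions, not values from the video. It applies listwise deletion to the same simulated data under two missingness mechanisms and compares the resulting mean with the full-data mean.

```python
import random
import statistics


def listwise_delete(rows):
    """Keep only the rows that have no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r)]


rng = random.Random(42)
n = 5000
iq = [rng.gauss(100, 15) for _ in range(n)]
perf = [0.5 * x + rng.gauss(0, 10) for x in iq]

# MCAR: every performance score has the same 40% chance of being missing.
mcar = [(x, None if rng.random() < 0.4 else y) for x, y in zip(iq, perf)]
# Not MCAR: performance is mostly missing for cases with below-average IQ.
mar = [(x, None if (x < 100 and rng.random() < 0.8) else y)
       for x, y in zip(iq, perf)]

full_mean = statistics.fmean(perf)
mcar_mean = statistics.fmean([y for _, y in listwise_delete(mcar)])
mar_mean = statistics.fmean([y for _, y in listwise_delete(mar)])

print(f"full data:           {full_mean:.2f}")
print(f"listwise, MCAR:      {mcar_mean:.2f}")
print(f"listwise, not MCAR:  {mar_mean:.2f}")
```

Under MCAR the deleted cases are a random subset, so the mean changes little and only precision is lost. When missingness depends on IQ, the remaining cases over-represent high-IQ people and the mean performance is pushed upward.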

    • @Stelnice
      @Stelnice 3 years ago

      got this, thank you!

  • @saimakanwal1863
    @saimakanwal1863 1 year ago

    Sir... Which book are you using?

    • @mronkko
      @mronkko 1 year ago

      Enders 2010. It is cited in the video.