How to Use SPSS-Replacing Missing Data Using Multiple Imputation (Regression Method)

  • Published: 14 Jul 2024
  • Technique for replacing missing data using the regression method. Appropriate for data that may be missing randomly or non-randomly, and for data that will be used in inferential analysis. The randomness of the missing data can be confirmed with Little's MCAR Test ( • How to Use SPSS: Littl... ).
    Resources:
    FAQ- sites.stat.psu.edu/~jls/mifaq....
    Schafer, Joseph L. "Multiple imputation: a primer." Statistical methods in medical research 8.1 (1999): 3-15.
    Sterne, Jonathan AC, et al. "Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls." BMJ: British Medical Journal 338 (2009).
    McKnight, Patrick E., Katherine M. McKnight, and Aurelio Jose Figueredo. Missing data: A gentle introduction. Guilford Press, 2007.
    Haukoos, Jason S., and Craig D. Newgard. "Advanced statistics: missing data in clinical research-part 1: an introduction and conceptual framework." Academic Emergency Medicine 14.7 (2007): 662-668.
    Newgard, Craig D., and Jason S. Haukoos. "Advanced statistics: missing data in clinical research-part 2: multiple imputation." Academic Emergency Medicine 14.7 (2007): 669-678.
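The regression-based multiple imputation shown in the video can be sketched outside SPSS as well. The snippet below uses scikit-learn's IterativeImputer as an illustrative stand-in (it is not the SPSS implementation, and the data here are synthetic): each of the m = 5 draws fills the missing cells by regressing each variable on the others, with posterior sampling so the draws differ, as multiple imputation requires.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[rng.random(X.shape) < 0.1] = np.nan  # knock out ~10% of values

# Draw m = 5 completed datasets. sample_posterior=True adds sampling
# noise so the imputed values vary across draws, which is what makes
# the imputation "multiple" rather than single.
imputations = [
    IterativeImputer(sample_posterior=True, random_state=i).fit_transform(X)
    for i in range(5)
]
```

Any downstream analysis would then be run on each completed dataset and the results pooled, which is what SPSS does automatically for the procedures that support it.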

Comments • 148

  • @Flaya12
    @Flaya12 10 years ago +11

    A simple thank you might not be appropriate for this great work you did and shared with the public.
    So I want to tell you: if you ever feel down or even worthless, remember that somewhere in Austria you made someone really happy by making this tutorial!!! Thanks a lot.
    At first I thought it might be a bit long, but it was worth every second and you did a really good job.

  • @stephaniesmith6047
    @stephaniesmith6047 11 years ago +2

    This was a very informative video. I am currently examining some longitudinal data and of course there is a significant amount of attrition. I initially ran a regression analysis using exclude cases listwise but I didn't feel this was the best way to analyze the data. This technique definitely helps address some of those issues. Thank you so much for posting this!

  • @singularity00001
    @singularity00001 10 years ago

    Excellent work you did here! Thank you.

  • @TheUgly0duckling
    @TheUgly0duckling 10 years ago +10

    Thank you! Saved me and my thesis.

  • @duallumni369
    @duallumni369 9 years ago

    The video explains the concept in easy-to-follow steps. A great video on the multiple imputation technique.

  • @yad-c3662
    @yad-c3662 10 years ago

    Thanks for doing this! Very clear and helpful

  • @janecooper358
    @janecooper358 10 years ago

    Thanks so much for your reply - sorry you misunderstood me, I've got 570 participants so I'll do an EM and see how I go. Thanks again, and thanks for doing the videos - I've just started my PhD and I'm sure I'll be tuning in quite a bit!

  • @yaldaamir2571
    @yaldaamir2571 10 years ago

    Thank you for the clear response. very helpful, Thanks.

  • @georginamartin9337
    @georginamartin9337 8 years ago +1

    this is really awesome!

  • @seanicusvideo
    @seanicusvideo 9 years ago

    this is helpful. the use and purpose of the extra imputation history file might be better elaborated. was very nice to include some references! thanks!

  • @marina7181
    @marina7181 10 years ago

    great video!!thank you!

  • @yaldaamir2571
    @yaldaamir2571 10 years ago

    Many thanks, I will try.

  • @tacappaert
    @tacappaert  9 years ago +1

    +Shirley anynameIwant, you should report descriptive statistics for pre and post imputation.

  • @barakatunnisakmohdyusof9938
    @barakatunnisakmohdyusof9938 8 years ago

    thank you. Great presentation. I have one question: can the imputation be focused only on primary outcomes?

  • @gregl4740
    @gregl4740 9 years ago +16

    Thank you for the tutorial. I just ran this on my dataset successfully. However, I was wondering if there is a way to obtain pooled means and 95% CI's across iterations. For inferential analyses (e.g., correlation), I am able to obtain the pooled statistics. However, when I use Analyze -> Descriptive Statistics -> Explore, it will only give me the descriptive for the original data and each iteration *individually*. Is there a way to obtain the pooled descriptive for variables? Also, is there a way for SPSS to generate a dataset that only contains the imputed data after the final iteration?
    Thanks!

  • @TokenFun105
    @TokenFun105 11 years ago

    Great thanks! I never trust my 'subjective judgement', so I like to rely on both :)

  • @shirleyanynameiwant5883
    @shirleyanynameiwant5883 9 years ago +1

    Thanks for this great video. I found it easy to understand. I now have a data file containing multiple imputations (5 imputations). My question is when reporting the univariate statistics and normality statistics, which results should I report given I have results from the original data set and results from the 5 separate imputations? Thank you in advance.

  • @masumarahim
    @masumarahim 10 years ago +5

    This video was very useful; thank you. However, even when splitting the file by imputation, I cannot get pooled analyses. SPSS will perform the analysis for the original data and each of the five imputations but will then only give me the means and standard deviations for the pooled data, not, for example, chi-square or t-test values; nor will it give me a p-value. Why might this be?

  • @chetanm12
    @chetanm12 11 years ago +1

    First off, thank you so much for posting this video...it was very well made and I look forward to exploring other videos you have. As a follow up question to enemenoff's question...what are the differences for MI for random vs. non-random patterns? Did I miss that part in the video? Do you have a source I could visit? Thank you in advance!

  • @yaldaamir2571
    @yaldaamir2571 10 years ago

    Many thanks. Regards

  • @carilynne1
    @carilynne1 10 years ago +1

    Thanks for this video--your RUclips channel is saving my life!
    My question is similar to ones asked previously but I could not make sense of the reply about merging data in SPSS.
    I have completed multiple imputation for missing data (went great!) but I want to move this dataset into LISREL for structural equation modelling. How can I get a single data set with the pooled information, rather than having the individual datasets for the imputation displayed and then SPSS pooling them during any further analysis?
    Thanks, Carilynne

  • @eligardner5436
    @eligardner5436 9 years ago +3

    Hi, thank you for the very helpful video. I followed all the steps, but my output after running my first ANOVA only showed the 5 imputations, not the pooled figures. How do I get the pooled figures?

  • @seanicusvideo
    @seanicusvideo 9 years ago +3

    Question: if results and parameters are "pooled" (and not averaged) what is the specific calculation? e.g. for bivariate correlations, or linear regression outputs, for example?
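For reference, pooling across imputations is conventionally done with Rubin's rules (Schafer, 1999, cited above): the pooled point estimate is the average of the m per-imputation estimates, and its variance combines the within-imputation variance with the between-imputation variance. A minimal sketch with made-up numbers (illustrative only, not the SPSS internals):

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool m per-imputation estimates and their squared standard
    errors using Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    qbar = q.mean()               # pooled point estimate
    w = u.mean()                  # within-imputation variance
    b = q.var(ddof=1)             # between-imputation variance
    t = w + (1 + 1 / m) * b       # total variance of the pooled estimate
    return qbar, t

# e.g. five per-imputation regression coefficients and their SE^2 values
est, var = rubin_pool([0.50, 0.55, 0.45, 0.52, 0.48], [0.01] * 5)
```

The same rule applies to any approximately normal parameter estimate (a correlation after Fisher transformation, a regression coefficient, a mean difference); test statistics themselves are not averaged directly.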

  • @sabrinadickey2205
    @sabrinadickey2205 11 years ago +1

    Hello, the video was very helpful. I have a question regarding the use of the iterations. I had 5 iterations and the pooled iteration was not significant p > .05, but I noticed some of the others were significant. Do you ever use one of the iterations or are you only supposed to use the pooled results?

  • @rhissarobinson970
    @rhissarobinson970 9 years ago +1

    Hello, I really appreciate you sharing this video. It has helped me tremendously to figure out how to understand and implement this method for my data. Would it be possible for you to share the syntax? For some reason, my output for percentage missing (the first output you show us) does not show the mean and standard deviation of the variables in my output. I'm sure it's a just a line I missed in the syntax. Thank you!

  • @jessicabarton93
    @jessicabarton93 9 years ago +7

    What happens if your data is missing not at random? I did Little's test and it was significant. I can't figure out which MI to do in that case.

  • @yaldaamir2571
    @yaldaamir2571 10 years ago

    I understand your point. But by outcome variables I mean Dependent Variable(s)!

  • @jamie10157
    @jamie10157 11 years ago

    Hi thanks for the video it's really useful! Can I just check what exactly the y axis represents when you are looking at the patterns in the diagrame with grey and red squares?

  • @katy8791
    @katy8791 2 years ago

    Hello! Nice video! Any idea how to calculate the reliability of a questionnaire in SPSS if we have missing data in some questions? Is it using the usual Cronbach's alpha? And what should be put in the cells of the missing data in SPSS?

  • @Zanthias
    @Zanthias 10 years ago +1

    Hello,
    Thank you very much for showing this video. My question is once you get all the five imputed values, is there any rule of thumb as to which of the five you should use for your analysis? Also, I realize that in your t-test example, the pooled values did not have standard deviations. How about if you want to report Std Deviations in your study? If you can kindly let me know, I will appreciate this. How about if I want to create composite, which one of the 5 imputed values should I use? Tx

  • @2Luhna7
    @2Luhna7 11 years ago +2

    hello! thank you so much for the video.
    I have a question however. From what I understood you don't get one single database with missing values replaced; you should work with the pooled results. So, my question is whether there is any way to create a new single database to import into other programs (for instance Mplus or LISREL) and work on. I need to do that for CFA on my data...

  • @mrflowers1234
    @mrflowers1234 8 years ago +2

    When writing a manuscript for a trial that has used multiple imputation to address missing data, what additional reporting should I include? Data pre and post imputation? Anything else?

  • @yaldaamirkiaie5303
    @yaldaamirkiaie5303 10 years ago

    Hi Thanks for The video, It is very helpful! A question that I have so far by just watching the video is that when applying "Constrains" min 27:19 there are 2 other options saying "maximum case draws" and "maximum parameters draws". could you please let me know what are those?

  • @sameeral-abdi6870
    @sameeral-abdi6870 11 years ago

    Thanks for this wonderful demonstration. I am facing a problem when I run this test. The number of missing values entered into the multiple imputation analysis was less than the number of missing values across all the variables with missing data. Consequently, the completed data after imputation were fewer than my original data (valid plus missing). So, how can I fix this problem?

  • @onlyificanloveyou
    @onlyificanloveyou 11 years ago

    Thank you for making this great video!
    I have actually done multiple imputation in Mplus and it generated 10 imputed datasets (all were .dat files). Is there a way to read these files as imputed data sets in SPSS? I need to do matched-pair t-tests by using these values. My stats consultant suggested that I ask SPSS to read these 10 imputed datasets individually, do 10 t-tests, and then average the t-value. However, I like how SPSS pooled the datasets first. Thank you!

  • @Zisis21r
    @Zisis21r 11 years ago +1

    Thanks, very helpful! I have a question - under the Analyze > Impute Missing Data > Constraints tab, in the lower "define constraints" table, SPSS won't allow me to set Min and Max values for my variables - and I notice the table rows are coloured blue and not white as in the tutorial. Could anybody help me work out how I can define my min-max values?

  • @TokenFun105
    @TokenFun105 11 years ago

    Thank you very much for this tutorial. However, I notice you did not mention Little's Missing at Random test. Should this not be done prior to all imputation methods? Or is it sufficient to look at the Missing Pattern Values Graph? many thanks

  • @sameeral-abdi6870
    @sameeral-abdi6870 11 years ago

    The observed discrepancy was because some cases had missing values in all three of the variables included in the multiple imputation. The problem was resolved by adding variable(s) in which these cases had values.

  • @TokenFun105
    @TokenFun105 11 years ago

    Would you use the same process to determine the mean and standard deviation of the 'pooled data'? I would imagine you could use these estimates to standardise all variables and then re-run the regression on those to obtain the standardised regression coefficients (which SPSS also does not provide)?

  • @chrislittle9839
    @chrislittle9839 11 years ago

    For a scale score, would you calculate the aggregated variable from the pooled imputation iterations?

  • @chavianddavid
    @chavianddavid 10 years ago

    The raw data was a dummy variable regression so there are only 1 and 0. Also, the experimental design was such that each respondent had their own design where they saw either all or just a subset of the variables. So I am looking to fill in the coefficients for the variables they did not see.

  • @sylviaherbozo5811
    @sylviaherbozo5811 10 years ago

    Thanks, I was able to get it to work. But I had another question. After running my analyses (t-tests and chi-squares) with the imputed data, I noticed that the sample sizes for each variable on the output are still uneven which normally means some cases weren't used due to missing values. Are these sample sizes supposed to still be uneven? And I just report the total sample size?

  • @lirpatex
    @lirpatex 11 years ago

    I would like to do a MANOVA using my imputed dataset, however, when I run the analysis, there isn't any pooled output. Is it okay to report the output from the 5th imputation? Thank you for your great video and help.

  • @janecooper358
    @janecooper358 10 years ago

    Hi - thanks for the video - it was really informative. Just a question though... my data has a small amount of missing data - 4 variables, 32 cases, 104 values, with .5% missing data overall (104 cases). MCAR was non-significant, p=.052 - just! I am running a CFA using AMOS, therefore I cannot have missing data. Do you recommend conducting an EM or a Multiple Imputation, or neither? Plus, how can I get AMOS to look at the pooled data when conducting the CFA? Thanks!!!!

  • @yaldaamir2571
    @yaldaamir2571 10 years ago

    Could you please let me know how to calculate R-squared for the model in a General Linear Regression model for the pooled data, and probably its significance? Is there any way?

  • @AldoAguirreC
    @AldoAguirreC 9 years ago +1

    How are degrees of freedom reported after a t-test is performed using multiple imputation? I see that the number of df for the pooled data can be in the thousands, and it does not feel right to report such a high number when N = 50, for example. Any advice or a paper that discusses this issue? Thanks!
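The very large df comes from Rubin's degrees-of-freedom formula, which grows without bound as the between-imputation variance shrinks relative to the within-imputation variance (i.e., when the five imputations barely disagree). A sketch with illustrative, made-up values:

```python
import numpy as np

def rubin_df(estimates, variances):
    """Rubin's degrees of freedom for a pooled estimate. It is large
    when the between-imputation variance is small relative to the
    within-imputation variance."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    w = u.mean()                  # within-imputation variance
    b = q.var(ddof=1)             # between-imputation variance
    r = (1 + 1 / m) * b / w       # relative increase in variance
    return (m - 1) * (1 + 1 / r) ** 2

# Nearly identical per-imputation estimates -> enormous df, as the
# comment above observes.
df = rubin_df([1.00, 1.01, 0.99, 1.00, 1.00], [0.04] * 5)
```

Small-sample corrections (e.g. Barnard-Rubin) cap the df at something sensible for the actual N; the references in the description discuss this.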

  • @sputaccount6139
    @sputaccount6139 10 years ago

    Is there a way to get a pooled R-squared value in multiple regression with MI data?

  • @yaldaamir2571
    @yaldaamir2571 10 years ago

    I want to compare this result to a less advanced way of dealing with missing data. Do you know a reference that says which of the following techniques produces less biased results: leaving it for listwise deletion, mean imputation (sample mean), or imputing the mean of the subject's scores on the scale or sub-scale items (within subject)? So many thanks.

  • @kausarkaus7247
    @kausarkaus7247 11 years ago +1

    I saw your earlier comment about what to do on this issue, but I was not able to set a min or max value. However, I found out that you can adjust the parameter in the syntax. It did work out, as I saw all the imputed values in the output. Unfortunately, in the Data View tab I couldn't see any imputed variables, nor the upper-right option to switch between data files. So, what went wrong?
    Could you help me out? Thanks in advance!

  • @colanfrost3518
    @colanfrost3518 11 years ago

    In your video you said you could only use imputed data for the analyses that have a swirl on it. Is there any possibility to use imputed data with repeated measures analyses in SPSS and how might that work?

  • @yoox0047
    @yoox0047 10 years ago

    When you run a hierarchical regression with an MI dataset, the output does not provide R, R2, adjusted R square, or the F value of the pooled imputation (it only provides those calculations for the original and each imputed dataset). It also doesn't provide beta (standardized coefficients) for the pooled imputation (there are only unstandardized coefficients: B and Std. Error). Given these are typical calculations reported in results, how do we obtain this information from the pooled data?

  • @jaishrik8691
    @jaishrik8691 10 years ago

    The missing variables in my data file have a value of '9'. How do I remove these dummy variables? Thank you.

  • @Anika-ze1bh
    @Anika-ze1bh 9 years ago

    Hey,
    thank you for the helpful tutorial. Still, I have a huge problem with my imputation. After running, it imputes values that are way too high or even negative. So I defined the range, which leads to an error that says something like (mine is in German): "After 200 draws SPSS can't find the imputed value for the variable xxx with its defined constraints. Please check whether the defined min and max are appropriate, or choose a higher maximum case draws." So it stops the imputation.
    Can you help me with that?
    Anika

  • @yaldaamir2571
    @yaldaamir2571 10 years ago

    Is there any pooled data(set) in the file that SPSS creates?? I just see the original data file identified by 0, and 1 to 10 representing each imputation.

  • @sabrinadickey2205
    @sabrinadickey2205 9 years ago

    Great video, and very easy to understand! If I wanted to remove multiple imputation from a data set is it possible and how would it be done?
    Thank you!

    • @tacappaert
      @tacappaert  9 years ago +2

      Do you mean reversing the process, so that the missing values become missing again? I don't know if that is possible but as long as you saved the original data set, you can always revert to that.

  • @SamanthaBalemba
    @SamanthaBalemba 11 years ago

    Will it automatically use the pooled estimates even for more advanced later techniques, like SEM? I'm using AMOS to run my SEM, but I want to make sure the MI results will automatically get used for this (seeing as how it's an addon to SPSS). I was recently informed that you can't run a proper SEM if you have ANY missing data, so I wanted to make sure I fixed that problem...

  • @guanlin6123
    @guanlin6123 9 years ago

    Thanks for the helpful video. If we need to remove outliers, should we do so before or after imputing data? If it should be done after imputing, I wonder how to do that when we have 5 imputed datasets.

    • @tacappaert
      @tacappaert  9 years ago

      Look for and remove outliers before imputation. These videos may help:
      How to Use SPSS: Identifying Outliers
      How to Use SPSS:Dealing with Outliers

  • @SamanthaBalemba
    @SamanthaBalemba 11 years ago

    Is there a cut-off for using this method in terms of the percentage of cases missing for specific variables? All of my var's are missing

  • @MoCowbell
    @MoCowbell 11 years ago

    I have missing item level data (from a scale with some missing items) and variable level missing data. Should I first impute the missing items so that everyone has a score on the variable with the items or should I just ignore the fact that I have items and just estimate the missing variable that is composed of the items? Thanks!

  • @jackcannon3359
    @jackcannon3359 9 years ago

    This is brilliant! Thanks for posting.... What is the minimum total number of observations (including missing obs) that this technique will work with? I have a dataset with 18 observations from 10 cases (should have 180 points in total) and I am missing 10 data points... Would multiple imputation be appropriate for this two-way repeated measures design? Thanks.

    • @tacappaert
      @tacappaert  9 years ago

      That should work fine. I don't know that there is a minimum per se, but as long as your missing data is not the majority of the possible observations, it should work for you.

  • @yaldaamir2571
    @yaldaamir2571 10 years ago

    I did, but I want to replace them as well! MI should work with dummy-coded variables too, shouldn't it? By the way, how can I round the results of imputation for the rest? Thank you very much for your help. Gratefully

  • @chavianddavid
    @chavianddavid 10 years ago

    What if I have a conjoint study where I have 36 variables and 300 respondents but each respondent only saw a subset of the 36. So I now have a table where each row is a respondent with a constant and then coefficients for only 25 (or more) of the 35 variables. What would be the approach for replacing the missing values (i.e. the missing coefficients for those variables for that specific respondent)?

  • @yaldaamir2571
    @yaldaamir2571 10 years ago

    Do we replace the outcome variables as well? If we do, it seems a little bit awkward, because we actually want to see whether we can predict the outcome variables from the other variables (our research question/hypothesis). If we have already replaced them with values predicted from other existing variables, aren't we just increasing the probability of a Type I error (the probability that the statistical analysis would support our alternative hypothesis even if it is not true)?

  • @skincare2010
    @skincare2010 11 years ago

    hello, I have a few issues with my dataset: first of all, my dependent variable has a whopping 20% missing values (the question is rather sensitive, so I am considering running two models, one that uses this variable and another that uses a similar one, asked in a different way). Is this ok? Also, many of my variables are categorical or nominal (yes/no, agree/disagree, etc.). Can I still use this imputation method, or is it just for numerical variables? Thanks.

  • @benjaminucr6636
    @benjaminucr6636 10 years ago

    I have run an analysis like the one shown in SPSS 19, and the output provided neither the pooled results nor the fraction of missing information. Under Edit > Options > Multiple Imputation, the option "results for imputed and observed data" is chosen. Any idea how I can get the pooled results and the fraction of missing information in my output?

  • @kausarkaus7247
    @kausarkaus7247 11 years ago

    I got the same problem, but I managed to run the multiple imputation by adjusting MAXMODELPARAM (in syntax), because I was unable to change the min and max values. However, I did not see the imputed variable in the Data View table, yet I did see the results of the imputed values in the output file.
    How do I get to see the imputed variable in the Data View? Thanks in advance

  • @neema1506
    @neema1506 9 years ago

    great

  • @alipolat5393
    @alipolat5393 10 years ago

    I have MNAR-type data with sometimes 60 percent missing. What I understand is that if my data is NOT random and I choose Automatic from the imputation method tab, then SPSS will take care of the non-randomness of the data. Is that correct?

  • @yaldaamir2571
    @yaldaamir2571 10 years ago

    I see that your variables are not the items within the scale; they look more like the sub-scales or the final variable. I mean, it looks like you are working with missing values in final scale/subscale scores. May I use MI to impute values for missing data within my scales? And may I have all the different scales/measures together in one file and use all the completed items across the scales to impute the missing values? Thanks in advance.

  • @kimconsultants
    @kimconsultants 11 years ago +1

    I have a large number of variables and SPSS does not seem to be able to do the imputation with all the variables at once. So, I did groups of variable separately. However, I get multiple imputed data files. How do you recommend combining the data files?

  • @yaldaamir2571
    @yaldaamir2571 10 years ago

    I am trying to run MI, but it gives me an error message (warning) like: after 100 draws it couldn't replace this or that (it changes as I change the number of draws), and I need to raise this number or check the min and max in "Constraints", and then it stops running. I checked both but it still didn't work. One of the variables that kept coming up was a dummy-coded variable. I took out all my dummy-coded variables and it worked. What should I do with my dummy-coded variables?

  • @sivabalaji30
    @sivabalaji30 9 years ago

    I have a query: I am currently working in SPSS on survey data. It contains many missing values, and it is not missing at random (MNAR). What method should I use to replace the missing data?

  • @kimconsultants
    @kimconsultants 11 years ago

    So you can impute data only for the variables where > 5% of the data are missing? Or, if you impute for one, must you impute for all variables that have any missing data? I ask because I have many variables and SPSS doesn't seem to be able to handle all of them at once. This means I have to create multiple imputed data sets, and I'm not sure how to combine them all.

  • @Bardiyaz
    @Bardiyaz 10 years ago

    Hi thanks for this helpful video.
    I have two questions:
    1- do we have to include non missing variables in order to get a better prediction for the missing variables?
    2- I need to do a propensity score matching after doing multiple imputation on the dataset with generated data, so I actually need a "pooled only" dataset which is the average of all as you said. is there a way to save the pooled only dataset or do I have to calculate the average for each variable and save it separately?
    thank you,

    • @tacappaert
      @tacappaert  10 years ago

      1. Yes, you should use as many variables as you can to improve the estimation of the missing values.
      2. To the best of my knowledge you will have to calculate the mean for each variable.
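The elementwise averaging the reply describes can be sketched as below (assuming a list of m completed datasets of equal shape; the tiny arrays are made-up). Note the caveat: a single averaged dataset discards the between-imputation variance, so standard errors computed from it will be too small.

```python
import numpy as np

# Two completed (imputed) copies of the same 2x2 dataset, for illustration.
imputations = [np.array([[1.0, 2.0], [3.0, 4.0]]),
               np.array([[1.2, 2.0], [2.8, 4.0]])]

# Average cell by cell to get one "pooled-only" dataset to export
# (e.g. for propensity score matching or SEM software).
pooled = np.mean(imputations, axis=0)
```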

  • @efi225
    @efi225 10 years ago

    thank you so much for this helpful video! I ran multiple imputation on my data, but I would like to ask: where are the pooled values you mentioned? I can only see the values in each imputation.
    Also, I have used many different questionnaires in my research. Do you think it's better to run multiple imputation for each questionnaire separately? Some of them are multidimensional; does this affect multiple imputation? Maybe I should run multiple imputation on the items of each factor separately?

    • @tacappaert
      @tacappaert  10 years ago

      The pooled data should be found in the output as demonstrated in the video. Be sure the data is categorized properly in the Variable View. Be sure it is set up as "Scale/Numeric" data.

    • @tacappaert
      @tacappaert  10 years ago

      I would run separate imputations for each questionnaire if they are measuring different constructs.

  • @Ulli0664
    @Ulli0664 9 years ago

    Thanks for the great video, helps a lot!
    I have two types of missing data in my dataset (working with a questionnaire that has several versions), and I've coded them as:
    -9 for actually missing (the respondent didn't know / didn't want to give an answer)
    -99 for n.a. (the respondent didn't see this question and therefore couldn't answer it)
    Therefore, I need to somehow exclude the -99 data points from the replacement.
    Any idea how to do this?
    Many, many thanks in advance!

    • @tacappaert
      @tacappaert  9 years ago

      You can exclude certain data points by using the Select Cases function and then run the analysis.

  • @yaldaamir2571
    @yaldaamir2571 10 years ago

    MI is very complicated to implement for my data, but I cannot convince myself not to use it! May I average the imputed values (e.g., 5 imputed values from 5 imputations) and enter these averages into the missing values in my original file? I actually tried it with one of my analyses, and the results (e.g., B weighting coefficients) are slightly different from the pooled data, but it gives me the model summary, which I need! Do you know if this approach is appropriate? Thanks.

  • @missyp017
    @missyp017 11 years ago

    Paul, did you figure this out? I need to do the same thing...

  • @mandyruth9954
    @mandyruth9954 10 years ago

    Great video! I have run multiple imputation for 2 variables (missing categorical data for 12% of values), however, I notice after 5 iterations, I still have some missing values. Is this normal?

    • @tacappaert
      @tacappaert  9 years ago

      Not usually. Be sure that you designated those variables to be imputed.

  • @sylviaherbozo5811
    @sylviaherbozo5811 10 years ago

    I keep getting a warning message such as "The imputation model for EDEQ14.1 contains more than 100 parameters. No missing values will be imputed..." Any advice on how to resolve this problem? I tried changing the measurement level but it didn't help. I wasn't sure how to follow the other suggestions, including reducing the number of effects in the imputation model, removing two-way interactions, or specifying constraints on the roles of some variables.

  • @yaldaamir2571
    @yaldaamir2571 10 years ago

    I already ran Little's MCAR test and I got: Chi-Square = 17193.367, df = 26009, and sig = 1.000. So I believe it means my data are missing completely at random. I chose to do Multiple Imputation. I am wondering if I can choose the method for MI instead of setting it to Automatic, to be able to change the number of iterations, since I think I need about 91 iterations to get convergence (I found this when I did Little's MCAR test). Does iteration in MI and EM indicate the same function?
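As a side note, the non-significant result reported above can be checked directly from the chi-square distribution: a test statistic far below its df gives a p-value of essentially 1, so MCAR is not rejected. A quick verification with SciPy (outside SPSS):

```python
from scipy.stats import chi2

# Little's MCAR test statistic from the comment above:
# chi-square = 17193.367 with df = 26009.
p = chi2.sf(17193.367, 26009)  # survival function = upper-tail p-value
```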

  • @ruskamihkas9723
    @ruskamihkas9723 9 years ago +2

    Not sure if you mentioned these, but I didn't succeed until I changed my missing value codes from 99 to blank (.) and changed ordinal variables into scales. Otherwise it wouldn't do the imputations and didn't even let me specify constraints.
    I had a Likert scale of 1-5.

    • @tacappaert
      @tacappaert  9 years ago +1

      Glad that worked. That is a pretty common issue.

    • @iskyrisky1969
      @iskyrisky1969 9 years ago

      +TheRMUoHP Biostatistics Resource Channel
      I had to change my nominal variables to scale.

  • @khushbeensohi4364
    @khushbeensohi4364 9 years ago

    Can I use this method to replace missing data if my data is not normally distributed and hence, I use non-parametric methods?

  • @Jbalisasa
    @Jbalisasa 9 years ago +5

    This is a great presentation. I really enjoyed it. Unfortunately, as I tried to follow it to impute my missing data, I kept receiving a warning saying that the imputation model for some variable contains more than 100 parameters. Below is an example of such a warning: "An iteration history output dataset is requested, but cannot be written.
    The imputation model for SYNC2 contains more than 100 parameters. No missing values will be imputed. Reducing the number of effects in the imputation model, by merging sparse categories of categorical variables, changing the measurement level of ordinal variables to scale, removing two-way interactions, or specifying constraints on the roles of some variables, may resolve the problem. Alternatively increase the maximum number of parameters allowed on the MAXMODELPARAM keyword of the IMPUTE subcommand.
    Execution of this command stops."
    This is repeated for quite a number of variables. Can someone help me understand how to handle this problem? Thank you. Juvenal Balisasa

    • @tacappaert
      @tacappaert  9 years ago

      Juvenal Balisasa This is likely happening because there is data that SPSS cannot categorize or that falls outside of the expected range you specified. Be sure all categorical data has a coding value and be sure there are no numeric values outside the specified range.
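
      The warning's own suggestion can also be tried in syntax by raising MAXMODELPARAM on the IMPUTE subcommand; a hedged sketch with hypothetical variable and dataset names:

      MULTIPLE IMPUTATION var1 var2 var3
        /IMPUTE METHOD=AUTO NIMPUTATIONS=5 MAXMODELPARAM=200
        /OUTFILE IMPUTATIONS=imputedData.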

    • @iskyrisky1969
      @iskyrisky1969 9 years ago

      +TheRMUoHP Biostatistics Resource Channel
      I have the same problem.

    • @iskyrisky1969
      @iskyrisky1969 9 years ago

      +Juvenal Balisasa
      Have you solved your problem? I have the same problem.

  • @fazlihaleem6603
    @fazlihaleem6603 8 years ago

    How would I know whether the missing data are MAR, MCAR, or missing systematically? If the data are not MAR or MCAR, do we have a solution for the systematically missing case?

    • @tacappaert
      @tacappaert  8 years ago

      +Fazal Haleem If your data is missing systematically, then that typically means there is some kind of response bias (e.g., questions asking for sensitive information or questions that are unclear). You should try to figure out why that might be happening so you can address it as a possible validity issue. The technique I demonstrate can be used with data that are missing systematically.

  • @anastemi
    @anastemi 10 years ago

    Hello, thank you for the video; it was very helpful. However, when I ran multiple imputation on my data set, I got this message: "The imputation model for sex contains more than 100 parameters and no missing values will be added." So a new data set was not created. Others have cited this as a problem; what should we do? I also have a large amount of missing data, about 50%.

    • @tacappaert
      @tacappaert  10 years ago +1

      Try this solution directly from IBM: www-304.ibm.com/support/docview.wss?uid=swg21482103

  • @suzanneveger7148
    @suzanneveger7148 9 years ago

    I have a question. I have imputed the data, and I want to conduct an ANOVA test. In order to interpret the data, how do I need to read the ANOVA table? There is the original solution and 5 other solutions; however, I do not find a pooled solution.
    What do I need to do here?

    • @tacappaert
      @tacappaert  8 years ago

      +Suzanne Veger Unfortunately, not all inferential techniques pool the results as we saw in the t-test example.

  • @ilmamufidah6272
    @ilmamufidah6272 10 years ago +1

    Then, which imputed value is to be used? The fifth one? Or do we have to average all 5 imputed data sets? That would be exhausting, right?
    I also have another question.
    My data are mostly ordinal (Likert scale). But when I tried to run the multiple imputation, the imputed values were beyond the allowable range; some of the imputed values were negative, and some others were not integers. When I changed the "measure" to "scale" instead of ordinal, then set the max and min range as well as the rounding, I got much more sensible values. Was my approach right?
    The last question is the same as anastemi's. But then I tried to solve it by specifying the role of each variable, and it worked. The problem was, I actually didn't know the role of each variable. I just predicted what the roles might be (it was actually my hypothesis for a model I tried to investigate). What do you think? I am afraid that my approach is wrong and the imputed values are not valid or something like that.

    • @tacappaert
      @tacappaert  10 years ago

      There should be a pooled data value that you can use that aggregates all the imputed attempts.
      In regards to the ordinal data, that was the correct approach.

    • @ilmamufidah6272
      @ilmamufidah6272 10 years ago

      ***** Is it OK if I just average it? I have heard that calculating the average is the simplest way to get the pooled data. Where can I find the pooled data? I didn't find any of it.

    • @tacappaert
      @tacappaert  10 years ago

      Ilma Mufidah
      The pooled data should be found in the output as demonstrated in the video. Be sure the data is categorized properly in Variable View, and be sure it is set up as "Scale/Numeric" data.
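
      The min/max/rounding settings described in this thread correspond to the /CONSTRAINTS subcommand; a sketch assuming hypothetical 1-5 Likert items likert1 and likert2:

      MULTIPLE IMPUTATION likert1 likert2
        /IMPUTE METHOD=FCS NIMPUTATIONS=5
        /CONSTRAINTS likert1 likert2 (MIN=1 MAX=5 RND=1)
        /OUTFILE IMPUTATIONS=imputedData.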

  • @georginamartin9337
    @georginamartin9337 8 years ago

    I have a dataset in which some missing values are represented only by a . and others are coded as -1 or -9. When I do the imputation, the . values are imputed, but the assigned missing values remain the same. How do you rectify this?

    • @tacappaert
      @tacappaert  8 years ago

      +Georgina Martin If the -1 or -9 values are not actual outcome possibilities, then those values should be cleared, and then you can run the imputation.

    • @georginamartin9337
      @georginamartin9337 8 years ago

      Excellent, that's what I did and it works!

  • @ertugrulsahn
    @ertugrulsahn 10 years ago

    Do we estimate missing nominal and ordinal values too? If not, what can we do for missing nominal and ordinal values (for example, nominal: gender; ordinal: perceived income categorized as low/medium/high)?

    • @tacappaert
      @tacappaert  10 years ago +1

      Yes, the procedure can estimate those values as well.

    • @ertugrulsahn
      @ertugrulsahn 10 years ago

      ***** Thanks for your help.

  • @annapease
    @annapease 9 years ago

    Hello. I am using longitudinal survey responses with biased drop-out - so there is a great big red patch at the bottom right of my missing value patterns graph! Can you tell me the best multiple imputation method to use? If I delete cases, I am also biasing the dataset. I have analysed the raw data so I know what I'm comparing it with, but I am struggling with the method of imputation. It's also saved across 5 different datasheets - I need to combine it into one, don't I?! Thanks!

    • @tacappaert
      @tacappaert  9 years ago

      I would use the regression method of imputation.

    • @carolineroth2710
      @carolineroth2710 8 years ago

      +Anna Pease I used the AGGREGATE command to get all the pooled datasets back into one (I needed to do further imputations on my data, which I couldn't do, or couldn't figure out how to do, once I did one imputation). The thing to note, though, is that you won't be able to see which cases/variables have imputed data, like you can when they're not pooled. The syntax I used was this:
      AGGREGATE
      /OUTFILE='[location on my computer]\[newfilename.sav]'
      /BREAK=[variable to break by, which for me was survey participant ID]
      /[string variable 1]=FIRST([string variable 1])
      /[string variable 2]=FIRST([string variable 2])
      ...
      /[string variable x]=FIRST([string variable x])
      /[imputed scale variable 1]=MEAN([imputed scale variable 1])
      /[imputed scale variable 2]=MEAN([imputed scale variable 2])
      ...
      /[imputed scale variable x]=MEAN([imputed scale variable x]).

  • @timw.5528
    @timw.5528 9 years ago

    I have a large number of variables in the imputation model (most of them are nominal), and I keep getting the same error message mentioned by Juvenal below: "...The imputation model for MODEL contains more than 100 parameters. No missing values will be imputed...." I checked all of the variables and they look fine (the nominal variables have values and the numeric variables are within the expected range). If I change one or more of the variables from nominal to scale it seems to work, but then it seems as though the imputations will not be accurate, as they will be based on linear rather than logistic regression. Any suggestions?

    • @tacappaert
      @tacappaert  8 years ago

      +Tim Wadsworth Variables that are ordinal in scale should be categorized as Scale.

    • @nohadarwish3053
      @nohadarwish3053 8 years ago +1

      +TheRMUoHP Biostatistics Resource Channel: Thank you so much for the helpful tutorial. I get this same warning message every time I try to do multiple imputation: "The imputation model for Q2_3_TO contains more than 100 parameters. No missing values will be imputed. Reducing the number of effects in the imputation model, by merging sparse categories of categorical variables, changing the measurement level of ordinal variables to scale, removing two-way interactions, or specifying constraints on the roles of some variables, may resolve the problem. Alternatively increase the maximum number of parameters allowed on the MAXMODELPARAM keyword of the IMPUTE subcommand." I have checked that all variables are either scale or nominal, and I have something like 85 variables. What should I do?

  • @mariakrista100
    @mariakrista100 9 years ago

    What do you do after you get results from 5 imputations?

    • @tacappaert
      @tacappaert  9 years ago

      You use that data to replace the missing data points and then run your additional analyses (e.g., a t-test).
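
      As a hedged sketch (the grouping variable group and the outcome score are hypothetical), SPSS pools supported procedures such as T-TEST when the imputed dataset is active and the file is split by the Imputation_ variable:

      SORT CASES BY Imputation_.
      SPLIT FILE LAYERED BY Imputation_.
      T-TEST GROUPS=group(1 2) /VARIABLES=score.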

  • @d3llikz
    @d3llikz 9 years ago

    Is this also possible for panel data?

    • @tacappaert
      @tacappaert  8 years ago

      +Morten Fjerritslev Can you explain what you mean by panel data?

  • @antimandril2281
    @antimandril2281 9 years ago

    When I push OK (pattern), the computer freezes. Does anyone know what happened?

  • @tombailey4262
    @tombailey4262 10 years ago

    Hi, thank you for this excellent video. My question appears to be a bit more basic than those below, but I was wondering whether there is any way to store the pooled data set in a separate file. You see, I would like to use it with an SPSS plug-in, e.g. PROCESS, which I don't think will recognise the pooled values as SPSS did with the t-test above.
    Regards,
    Tom

    • @tacappaert
      @tacappaert  9 years ago

      I don't know if that is possible. I would suggest contacting IBM SPSS technical support.

    • @sajatorunczyk6195
      @sajatorunczyk6195 9 years ago

      Tom Bailey Tom, I am looking for exactly the same thing - a way to use PROCESS with data that have been imputed. Did you figure this out?

    • @tombailey4262
      @tombailey4262 9 years ago

      saja torunczyk Not very simply, although I think you could do it in R and port it back in. One option (not as good) might be to use expectation maximisation in SPSS?

  • @ia1167
    @ia1167 10 years ago

    Hello: first I would like to thank you for this awesome video! It is super clear and super well explained!
    I have a question for you. Procedure: According to IBM, once one runs MI following the "Fully Conditional Specification" method (FCS; in the output SPSS tells you what method it used), one should verify whether FCS convergence was achieved. Problem: This is the part where I am terribly stuck, because I am getting a lot of flat lines in my chart when I test whether FCS convergence was achieved (please look at this link for more info about how to do this: pic.dhe.ibm.com/infocenter/spssstat/v22r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.cs%2Fspss%2Ftutorials%2Fmi_fcs-convergence_telco_howto.htm). When I looked at my iteration history, for every set of imputed data I am getting the same value no matter the number of the iteration. For instance: in my dataset #1, I get the same imputed value for the total score of a questionnaire from iteration 1 to 10000, and so on until the last imputed dataset (these values remain the same within datasets but are different across datasets). Finally, my question: Why do you think FCS convergence is not achieved in this case, or why are my values not changing from iteration to iteration? I have been looking on the internet for what to do about this besides increasing the number of iterations, but there is almost no info about it. Please, would you mind giving me your thoughts about this? I would be so grateful.

    • @tacappaert
      @tacappaert  10 years ago

      My guess is that the predicted values do not have any variance, or have little ability to vary, so your iterated values don't change. Generally, this is a good thing, indicating that the predicted values are quite accurate since they don't change between iterations.

  • @kloveinn
    @kloveinn 10 years ago

    Thanks! Good stuff, but the video zooms awkwardly at times; it needs better video editing.

  • @haliltokay3689
    @haliltokay3689 9 years ago

    Thanks for some great videos.
    I get a warning message that says after 100 draws, the imputation algorithm cannot find an imputed value under the constraints for variable [X]. This is strange, because the variable is just a 7-point Likert scale. All "I don't know" responses are coded as 999 and as missing values. So, I tried to change the MAXCASEDRAWS. After a few attempts, it accepted 1000000000. I know.
    So, it ran the imputations. However, I was met with yet another warning message: "Some missing values cannot be imputed because a factor in the model has a value that does not appear in the data used to build the model."
    Does anyone have any good suggestions for how I can solve this problem?
    Just FYI:
    - My data is Missing Not at Random (MNAR)
    - I have 55 variables
    - Sample size of 317
    - Measurement scales: 7-point Likert scale + 10-point evaluation scale
    I hope someone will be able to help ASAP.
    Thank you.
    Halil

    • @tacappaert
      @tacappaert  9 years ago

      I think you need to take a close look at the data codes you have used in the variables with missing data. For some reason SPSS cannot recognize those codes and cannot perform the imputation. If I read your post correctly, you have coded both missing values and "I don't know" responses as "999". That might be the issue.

    • @haliltokay3689
      @haliltokay3689 9 years ago

      *****
      Well, in the software I used for collecting data, all "I don't know" responses received the value 999 so they could easily be identified during data analysis. Then, there are also some system-missing data which just do not have any value at all.
      But you believe this may be the source of the problem? I have now tried to recode all variables so the only type of missing value is 999. After doing so, I still get the same kind of warnings.
      I had to change MAXMODELPARAM=500 and MAXCASEDRAWS=400000, and still, SPSS does not want to impute the data properly. It says that 'some missing values cannot be imputed' AND 'after 800000 draws, the imputation algorithm cannot find an imputed value ...'
      So, any good ideas for how to solve this problem?
      Btw, thank you guys for such a quick response time!!!

    • @haliltokay3689
      @haliltokay3689 9 years ago

      *****
      I also see that others have encountered a problem related to having ONE data set with the pooled values.
      When I run the imputations, SPSS creates a new data file with the original data and the 1-5 imputations, and that's it. In your video, it is the same: in the upper-right-hand section of the screen you can choose between the original and the five imputations, but there is no option called pooled data.
      Are there any operations in SPSS you can do to get one data file with the pooled values WITHOUT the original data and the five imputations? I just want the pooled data for further analysis. How can I do that?

    • @tacappaert
      @tacappaert  9 years ago

      Halil Tokay Any missing values should have cells without a code. The cells should be empty.

    • @haliltokay3689
      @haliltokay3689 9 years ago

      *****
      Thanks. That worked. Thank you so much!
      Now, how do I transform my data so that I only have the pooled variables, without all five imputations and the original data set?
      I just want one dataset without missing values. I do not want all the imputations, only the pooled variables.
      I need to do a PCA followed by regression analysis, so I need a dataset without missing values.

  • @yaldaamir2571
    @yaldaamir2571 10 years ago

    I did, and I left another comment about how it went! Not good!