A simple thank you hardly seems enough for the great work you did and shared with the public. So I want to tell you: if you ever feel down or even worthless, remember that somewhere in Austria you made someone really happy with this tutorial!!! Thanks a lot. At first I thought it might be a bit long, but it was worth every second, and you did a really good job.
This was a very informative video. I am currently examining some longitudinal data and of course there is a significant amount of attrition. I initially ran a regression analysis using exclude cases listwise but I didn't feel this was the best way to analyze the data. This technique definitely helps address some of those issues. Thank you so much for posting this!
Thanks so much for your reply. Sorry, I think you misunderstood me: I've got 570 participants, so I'll run an EM and see how I go. Thanks again, and thanks for doing the videos. I've just started my PhD and I'm sure I'll be tuning in quite a bit!
Hello! Thank you so much for the video. I have a question, however. From what I understood, you don't get one single database with the missing values replaced; you should work with the pooled results. So my question is whether there is any way to create a new single database to import into other programs (for instance Mplus or LISREL) and work on. I need to do that for a CFA on my data...
Thank you for the tutorial. I just ran this on my dataset successfully. However, I was wondering if there is a way to obtain pooled means and 95% CIs across imputations. For inferential analyses (e.g., correlation), I am able to obtain the pooled statistics. However, when I use Analyze -> Descriptive Statistics -> Explore, it only gives me the descriptives for the original data and each imputation *individually*. Is there a way to obtain pooled descriptives for variables? Also, is there a way for SPSS to generate a dataset that only contains the imputed data after the final iteration? Thanks!
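One way to get a pooled mean and 95% CI by hand is to apply Rubin's rules to the mean: average the per-imputation means, then combine the average within-imputation variance (the squared standard error) with the between-imputation variance. A minimal Python sketch follows; the thread contains no code, so the language, the function name `pool_mean_rubin`, and the use of a normal 1.96 critical value in place of Rubin's t reference distribution are all assumptions, and this is not a description of what SPSS does internally.

```python
import math
import statistics


def pool_mean_rubin(imputed_samples, z_crit=1.96):
    """Pool one variable's mean across m imputed datasets with Rubin's rules.

    imputed_samples: list of m lists, each holding the completed values of the
    variable in one imputed dataset.
    Assumption: a normal critical value (z_crit) stands in for Rubin's t
    reference distribution, which is only reasonable when the df are large.
    """
    m = len(imputed_samples)
    # Per-imputation estimate (the mean) and its squared standard error
    estimates = [statistics.fmean(s) for s in imputed_samples]
    within = [statistics.variance(s) / len(s) for s in imputed_samples]

    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w_bar = statistics.fmean(within)      # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    total = w_bar + (1 + 1 / m) * b       # Rubin's total variance
    se = math.sqrt(total)
    return q_bar, (q_bar - z_crit * se, q_bar + z_crit * se)


# Usage with three tiny, made-up imputed versions of one variable
est, ci = pool_mean_rubin([[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]])
print(est, ci)
```

The pooled mean is just the average of the imputation means; the extra (1 + 1/m)·B term is what widens the interval to reflect uncertainty about the missing values.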
I saw your earlier comment on what to do about this issue, but I was not able to set the min or max value. However, I found out that you can adjust the parameter in the syntax. It did work: I saw all the imputed values in the output. Unfortunately, in the Data View tab I couldn't see any imputed variable, nor the option in the upper right for switching between data files. So, what went wrong? Could you help me out? Thanks in advance!
The observed discrepancy was because some cases had missing values on all three variables included in the multiple imputation. The problem was resolved by adding variable(s) on which these cases had values.
Hello, thank you very much for this video. My question is: once you get all five imputed values, is there any rule of thumb as to which of the five you should use for your analysis? Also, I noticed that in your t-test example, the pooled values did not have standard deviations. What if you want to report standard deviations in your study? And if I want to create a composite, which of the 5 imputed values should I use? I would appreciate it if you could kindly let me know. Tx
This is helpful. The use and purpose of the extra imputation history file could be explained in more detail. It was very nice to include some references! Thanks!
First off, thank you so much for posting this video... it was very well made and I look forward to exploring your other videos. As a follow-up to enemenoff's question: how does MI differ for random vs. non-random missingness patterns? Did I miss that part in the video? Do you have a source I could consult? Thank you in advance!
Hi, thank you for the very helpful video. I followed all the steps, but the output from my first ANOVA only showed the 5 imputations, not the pooled figures. How do I get the pooled figures?
Hello, the video was very helpful. I have a question regarding the use of the imputations. I had 5 imputations, and the pooled result was not significant (p > .05), but I noticed that some of the individual imputations were significant. Do you ever use one of the imputations, or are you only supposed to use the pooled results?
Thanks for this wonderful demonstration. I am facing a problem when I run this procedure. The number of missing values entered in the multiple imputation analysis was less than the number of missing values across all the variables with missing data. Consequently, the completed data after imputation were fewer than my original data (valid plus missing). How can I fix this problem?
This video was very useful; thank you. However, even when splitting the file by imputation, I cannot get pooled analyses. SPSS performs the analysis for the original data and each of the five imputations but then only gives me the means and standard deviations for the pooled data, not, for example, chi-square or t-test values; nor does it give me a p-value. Why might this be?
In your video you said you could only use imputed data for the analyses that have a swirl on it. Is there any possibility to use imputed data with repeated measures analyses in SPSS and how might that work?
Hi, thanks for the video; it was really informative. Just a question though... my data has a small amount of missing data: 4 variables, 32 cases, 104 values, with .5% missing data overall (104 cases). The MCAR test was non-significant, p = .052 (just!). I am running a CFA using AMOS, so I cannot have missing data. Do you recommend EM, multiple imputation, or neither? Also, how can I get AMOS to use the pooled data when conducting the CFA? Thanks!!!!
Will it automatically use the pooled estimates even for more advanced techniques later on, like SEM? I'm using AMOS to run my SEM, but I want to make sure the MI results will automatically be used there (since it's an add-on to SPSS). I was recently told that you can't run a proper SEM if you have ANY missing data, so I wanted to make sure I fixed that problem...
Thank you very much for this tutorial. However, I notice you did not mention Little's MCAR (Missing Completely at Random) test. Should this not be done prior to all imputation methods? Or is it sufficient to look at the Missing Value Patterns chart? Many thanks
Hi, thanks for the video, it's really useful! Can I just check what exactly the y-axis represents when you are looking at the patterns in the diagram with grey and red squares?
Thanks for this great video. I found it easy to understand. I now have a data file containing multiple imputations (5 imputations). My question is when reporting the univariate statistics and normality statistics, which results should I report given I have results from the original data set and results from the 5 separate imputations? Thank you in advance.
Thanks for this video; your YouTube channel is saving my life! My question is similar to ones asked previously, but I could not make sense of the reply about merging data in SPSS. I have completed multiple imputation for missing data (went great!), but I want to move this dataset into LISREL for structural equation modelling. How can I get a single dataset with the pooled information, rather than having the individual imputed datasets displayed and SPSS pooling them during any further analysis? Thanks, Carilynne
The raw data was a dummy variable regression so there are only 1 and 0. Also, the experimental design was such that each respondent had their own design where they saw either all or just a subset of the variables. So I am looking to fill in the coefficients for the variables they did not see.
This is a great presentation. I really enjoyed it. Unfortunately for me, as I tried to follow it to impute my missing data, I keep receiving a warning which says that the imputation model for some variable contains more than 100 parameters. Below is an example of such warnings: "An iteration history output dataset is requested, but cannot be written. The imputation model for SYNC2 contains more than 100 parameters. No missing values will be imputed. Reducing the number of effects in the imputation model, by merging sparse categories of categorical variables, changing the measurement level of ordinal variables to scale, removing two-way interactions, or specifying constraints on the roles of some variables, may resolve the problem. Alternatively increase the maximum number of parameters allowed on the MAXMODELPARAM keyword of the IMPUTE subcommand. Execution of this command stops." This is repeated for quite a number of variables. Can someone help me understand how to handle this problem? Thank you. Juvenal Balisasa
Juvenal Balisasa This is likely happening because there is data that SPSS cannot categorize or that falls outside of the expected range you specified. Be sure all categorical data have a coding value and that there are no numeric values outside the specified range.
How are degrees of freedom reported after a t-test is performed using multiple imputation? I see that the df for the pooled data can be in the thousands, and it does not feel right to report such a high number when N = 50, for example. Any advice, or a paper that discusses this issue? Thanks!
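Rubin's original df formula, (m - 1)/λ², where λ is the share of total variance due to missingness, grows without bound as the between-imputation variance shrinks, which is why pooled df can run into the thousands even with a small N. Barnard and Rubin (1999) proposed a small-sample adjustment that keeps the df below the complete-data value. A minimal Python sketch of both formulas; the function name and example inputs are made up for illustration, and whether a given SPSS version reports the adjusted df is not established here.

```python
import math


def mi_degrees_of_freedom(w_bar, b, m, nu_com):
    """Return (Rubin 1987 df, Barnard-Rubin 1999 small-sample df).

    w_bar  : average within-imputation variance of the estimate
    b      : between-imputation variance of the estimate
    m      : number of imputations
    nu_com : complete-data df (e.g. n1 + n2 - 2 for a pooled-variance t-test)
    """
    total = w_bar + (1 + 1 / m) * b
    lam = (1 + 1 / m) * b / total           # fraction of variance due to missingness
    # Observed-data df from Barnard & Rubin (1999)
    nu_obs = (nu_com + 1) / (nu_com + 3) * nu_com * (1 - lam)
    if lam == 0:
        # No between-imputation variance: Rubin's df is infinite
        return math.inf, nu_obs
    nu_old = (m - 1) / lam ** 2             # Rubin's large-sample df
    nu_adj = nu_old * nu_obs / (nu_old + nu_obs)
    return nu_old, nu_adj


# Hypothetical t-test with N = 50 (nu_com = 48) and little between-imputation variance
old, adj = mi_degrees_of_freedom(w_bar=0.04, b=0.001, m=5, nu_com=48)
print(round(old), round(adj, 1))
```

With these made-up inputs, Rubin's df lands in the thousands, while the adjusted df stays just under 48.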
So you can impute data only for the variables where > 5% of data are missing? Or, if you impute for one you must impute for all variables that have any missing data? I ask because I have many variables and SPSS doesn't seem to be able to handle all of them at once. This means I have to create multiple imputed data sets and I'm not sure how to combine them all.
When writing a manuscript for a trial that has used multiple imputation to address missing data, what additional reporting should I include? Data pre and post imputation? Anything else?
I have a large number of variables, and SPSS does not seem to be able to do the imputation with all the variables at once. So I did groups of variables separately. However, I get multiple imputed data files. How do you recommend combining the data files?
Thanks, I was able to get it to work. But I had another question. After running my analyses (t-tests and chi-squares) with the imputed data, I noticed that the sample sizes for each variable on the output are still uneven which normally means some cases weren't used due to missing values. Are these sample sizes supposed to still be uneven? And I just report the total sample size?
I would like to do a MANOVA using my imputed dataset, however, when I run the analysis, there isn't any pooled output. Is it okay to report the output from the 5th imputation? Thank you for your great video and help.
Thanks, very helpful! I have a question: under Analyze > Impute Missing Data > Constraints tab, in the lower "Define constraints" table, SPSS won't allow me to set Min and Max values for my variables, and I notice the table rows are coloured blue and not white as in the tutorial. Could anybody help me work out how to define the min-max values for my variables?
Question: if results and parameters are "pooled" (and not averaged) what is the specific calculation? e.g. for bivariate correlations, or linear regression outputs, for example?
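"Pooled" here refers to Rubin's rules: the pooled estimate is the average of the m estimates, and its variance combines the average within-imputation variance with the between-imputation variance, T = W + (1 + 1/m)B. For a regression coefficient, the rules apply directly to B and its squared standard error. For a correlation, a common approach (assumed here; SPSS's exact internals are not shown in this thread) is to apply the rules after Fisher's z transformation. A minimal Python sketch, with a hypothetical function name and a normal 1.96 critical value:

```python
import math
import statistics


def pool_correlation(rs, n, z_crit=1.96):
    """Pool Pearson correlations from m imputed datasets (each with n cases).

    Each r is mapped to Fisher's z, whose sampling variance is about 1/(n - 3);
    Rubin's rules are applied on the z scale and the result is mapped back to r.
    """
    m = len(rs)
    zs = [math.atanh(r) for r in rs]
    z_bar = statistics.fmean(zs)            # pooled estimate on the z scale
    w = 1 / (n - 3)                         # within-imputation variance (same n in every dataset)
    b = statistics.variance(zs)             # between-imputation variance
    se = math.sqrt(w + (1 + 1 / m) * b)     # square root of Rubin's total variance
    lo, hi = z_bar - z_crit * se, z_bar + z_crit * se
    return math.tanh(z_bar), (math.tanh(lo), math.tanh(hi))


# Five hypothetical correlations, one per imputed dataset, with n = 50
r_pooled, ci = pool_correlation([0.31, 0.28, 0.35, 0.30, 0.33], n=50)
print(round(r_pooled, 3), ci)
```

Averaging on the z scale rather than averaging the raw r values is the design choice here, because z has an approximately normal sampling distribution.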
Hello! Nice video! Any idea how to calculate the reliability of a questionnaire in SPSS if we have missing data on some questions? Is it the usual Cronbach's alpha? And what should be put in the cells of the missing data in SPSS?
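Once each imputed dataset is complete, the usual Cronbach's alpha formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score), can be computed on each one. A minimal Python sketch; the function name is hypothetical, and averaging alpha across imputations is an assumed, informal summary rather than a formal Rubin pooling rule.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for a complete data matrix (rows = respondents, columns = items)."""
    k = len(rows[0])
    items = list(zip(*rows))                                  # transpose into item columns
    item_var_sum = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance([sum(r) for r in rows])   # variance of the scale total
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical: two completed (imputed) versions of a 3-item scale
imputed = [
    [[1, 2, 2], [3, 3, 4], [4, 5, 4], [2, 2, 3]],
    [[1, 2, 1], [3, 3, 4], [4, 5, 4], [2, 3, 3]],
]
# Informal summary across imputations (assumption, not a formal pooling rule)
mean_alpha = statistics.fmean(cronbach_alpha(d) for d in imputed)
print(round(mean_alpha, 3))
```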
Thank you for making this great video! I have actually done multiple imputation in Mplus and it generated 10 imputed datasets (all were .dat files). Is there a way to read these files as imputed data sets in SPSS? I need to do matched-pair t-tests by using these values. My stats consultant suggested that I ask SPSS to read these 10 imputed datasets individually, do 10 t-tests, and then average the t-value. However, I like how SPSS pooled the datasets first. Thank you!
Would you use the same process to determine the mean and standard deviation of the 'pooled data'? I would imagine you could use these estimates to standardise all variables and then re-run the regression on them to obtain the standardised regression coefficients (which SPSS also does not provide).
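If every imputed dataset is standardised with the same constants (a pooled mean and SD), each imputation's slope B is simply multiplied by SD_x / SD_y, so the pooled B can be rescaled directly instead of re-running the regression. A minimal Python sketch, assuming the pooled SD is taken as the average of the per-imputation SDs (one reasonable choice among several, not an established SPSS rule); the function names are hypothetical.

```python
import statistics


def pooled_sd(imputed_columns):
    """Average one variable's standard deviation across imputed datasets (assumed pooling choice)."""
    return statistics.fmean(statistics.stdev(col) for col in imputed_columns)


def standardized_beta(b_pooled, x_columns, y_columns):
    """Rescale a pooled unstandardized slope B into a standardized beta.

    Standardizing x and y with fixed constants multiplies each imputation's B
    by sd_x / sd_y, so the same factor applies to the pooled (averaged) B.
    """
    return b_pooled * pooled_sd(x_columns) / pooled_sd(y_columns)


# Hypothetical: predictor and outcome values from two imputed datasets
x_imps = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.5, 3.0, 4.0]]
y_imps = [[2.0, 4.5, 6.0, 8.5], [2.0, 5.0, 6.0, 8.5]]
print(round(standardized_beta(2.1, x_imps, y_imps), 3))
```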
Hi, thanks for the video, it is very helpful! A question I have so far from watching the video: when applying "Constraints" (at 27:19), there are 2 other options, "Maximum case draws" and "Maximum parameter draws". Could you please tell me what those are?
Not sure if you mentioned these, but I didn't succeed until I changed my missing value codes from 99 to blank (.) and changed ordinal variables to scale. Otherwise it wouldn't do the imputations and didn't even let me specify constraints. I had a 1-5 Likert scale.
I had the same problem, but I managed to run the multiple imputation by adjusting MAXMODELPARAM (in syntax), because I was unable to change the min and max values. However, I did not see the imputed variables in the Data View table, yet I did see the imputed values in the output file. How do I get to see the imputed variables in the Data View? Thanks in advance
I have missing item level data (from a scale with some missing items) and variable level missing data. Should I first impute the missing items so that everyone has a score on the variable with the items or should I just ignore the fact that I have items and just estimate the missing variable that is composed of the items? Thanks!
Thanks for the helpful video. If we need to remove outliers, should that be done before or after imputing the data? If it should be done after imputing, I wonder how to do that when we have 5 imputed datasets.
What if I have a conjoint study where I have 36 variables and 300 respondents but each respondent only saw a subset of the 36. So I now have a table where each row is a respondent with a constant and then coefficients for only 25 (or more) of the 35 variables. What would be the approach for replacing the missing values (i.e. the missing coefficients for those variables for that specific respondent)?
I have run an analysis like the one shown, in SPSS 19, and the output provided neither the pooled results nor the fraction of missing information. Under "Edit > Options > Multiple Imputations", the option "results for imputed and observed data" is chosen. Any idea how I can get the pooled results and the fraction of missing information in my output?
This is brilliant! Thanks for posting.... What is the minimum total number of observations (including missing obs) that this technique will work with? I have a dataset with 18 observations from 10 cases (should have 180 points in total) and I am missing 10 data points... Would multiple imputation be appropriate for this two-way repeated measures design? Thanks.
That should work fine. I don't know that there is a minimum per se, but as long as your missing data is not the majority of the possible observations, it should work for you.
Hello, I really appreciate you sharing this video. It has helped me tremendously to understand and implement this method for my data. Would it be possible for you to share the syntax? For some reason, my output for percentage missing (the first output you show us) does not show the mean and standard deviation of the variables. I'm sure it's just a line I missed in the syntax. Thank you!
I keep getting a warning message such as "The imputation model for EDEQ14.1 contains more than 100 parameters. No missing values will be imputed..." Any advice on how to resolve this problem? I tried changing the measurement level but it didn't help. I wasn't sure how to do the other suggestions, including: reducing the number of effects in the imputation model, removing two-way interactions, or specifying constraints on the roles of some variables.
Hello, I have a few issues with my dataset. First of all, my dependent variable has a whopping 20% missing values (the question is rather sensitive), so I am considering running two models: one that uses this variable and another that uses a similar one asked in a different way. Is this OK? Also, many of my variables are categorical or nominal (yes/no, agree/disagree, etc.). Can I still use this imputation method, or is it just for numerical variables? Thanks.
I have MNAR-type data, sometimes with 60 percent missing. What I understand is that if my data is NOT missing at random and I choose Automatic on the imputation method tab, SPSS will take care of the non-randomness problem in the data. Is that correct?
Great video, and very easy to understand! If I wanted to remove multiple imputation from a data set is it possible and how would it be done? Thank you!
Do you mean reversing the process, so that the missing values become missing again? I don't know if that is possible but as long as you saved the original data set, you can always revert to that.
When you run a hierarchical regression with an MI dataset, the output does not provide R, R², adjusted R², or the F value for the pooled imputation (it only provides them for the original and each imputed dataset). It also doesn't provide pooled betas (standardized coefficients); there are only unstandardized coefficients (B and Std. Error). Given that these statistics are typically reported in our results, how do we obtain this information from the pooled data?
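For R², one approach described by Harel (2009) is to average the Fisher z transform of R = sqrt(R²) across imputations and then back-transform. A minimal Python sketch under that assumption; the function name is hypothetical, and the sketch covers only R², not a pooled F test, which needs its own combining rule.

```python
import math
import statistics


def pool_r_squared(r2_values):
    """Pool R-squared across imputations via Fisher's z of R (approach after Harel, 2009).

    r2_values: one R-squared per imputed dataset, each strictly below 1.
    """
    zs = [math.atanh(math.sqrt(r2)) for r2 in r2_values]   # z-transform of R
    return math.tanh(statistics.fmean(zs)) ** 2            # back-transform, then square


# Hypothetical R-squared values from five imputed datasets
print(round(pool_r_squared([0.21, 0.24, 0.19, 0.23, 0.22]), 4))
```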
Hi, thanks for this helpful video. I have two questions: 1) Do we have to include non-missing variables in order to get a better prediction of the missing variables? 2) I need to do propensity score matching after doing multiple imputation on the dataset with generated data, so I actually need a "pooled only" dataset, which is the average of all, as you said. Is there a way to save the pooled-only dataset, or do I have to calculate the average for each variable and save it separately? Thank you.
1. Yes, you should use as many variables as you can to improve the estimation of the missing values. 2. To the best of my knowledge you will have to calculate the mean for each variable.
Then which imputed value should be used? The fifth one? Or do we have to average all 5 imputed datasets? That would be exhausting, right? I also have another question. My data are mostly ordinal (Likert scale). But when I tried to run the multiple imputation, the imputed values were outside the allowable range: some were negative and others were not integers. When I changed the measurement level to "scale" instead of ordinal and then set the min and max range as well as the rounding, I got much cleaner values. Was my approach right? My last question is the same as anastemi's. I tried to solve it by specifying the role of each variable, and it worked. The problem is that I didn't actually know the role of each variable; I just guessed what the role might be (it was actually my hypothesis for a model I was trying to investigate). What do you think? I'm afraid my approach is wrong, so the imputed values are not valid or something like that.
There should be a pooled data value that you can use that aggregates all the imputed attempts. In regards to the ordinal data that was the correct approach.
***** Is it OK if I just average them? I heard that calculating the average is the simplest way to get the pooled data. Where can I find the pooled data? I didn't find any of it.
Ilma Mufidah The pooled data should be found in the output as demonstrated in the video. Be sure the data is categorized properly in the Variable View. Be sure it is set up as "Scale/Numeric" data.
I have a query... I am currently using SPSS on survey data. It contains many missing values, and they are not missing at random (MNAR). What method should I use to replace the missing data?
I have a question. I have imputed the data, and I want to conduct an ANOVA. How do I need to read the ANOVA table to interpret the data? There is the original solution and 5 other solutions. However, I do not find a pooled solution. What do I need to do here?
Thank you so much for this helpful video! I ran multiple imputation on my data, but I would like to ask: where are the pooled values you mentioned? I can only see the values for each imputation. Also, I have used many different questionnaires in my research. Do you think it's better to run multiple imputation on each questionnaire separately? Some of them are multidimensional; does this affect multiple imputation? Maybe I should run multiple imputation on the items of each factor separately?
The pooled data should be found in the output as demonstrated in the video. Be sure the data is categorized properly in the Variable View. Be sure it is set up as "Scale/Numeric" data.
I don't understand what values to use. I want a full table instead of the original one that has gaps where the missing data is. Can I use the New Imputation table? But what imputation do I use? Or do I fill in all the gaps with missing data with the same pooled mean from the t-test analysis?
I also get an empty Group Statistics table when doing the t-test. The mean is put out as zero and I get a warning saying "No statistics are computed for a split file in the Independent Samples table. The split file is: Imputation number=... The Independent Samples table is not produced." What am I doing wrong?
I have a dataset that has some missing values represented only by a . and others that have it as a -1 or -9. When I do the imputation the . values are imputed but the assigned missing values remain the same. How do you rectify this?
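The general fix is to turn every user-missing code into a genuine missing value before imputing, so the imputation treats those cells as gaps rather than as real scores. A minimal Python illustration of that recoding step; the codes -1 and -9 come from the comment, while the function name and the list-of-values layout are assumptions for illustration.

```python
# User-missing codes taken from the comment above
MISSING_CODES = frozenset({-1, -9})


def recode_missing(values, codes=MISSING_CODES):
    """Replace user-missing codes with None so they are treated as truly missing."""
    return [None if v in codes else v for v in values]


# Hypothetical column mixing real scores, system-missing (None), and user-missing codes
print(recode_missing([3, -1, None, -9, 5]))
```

A code that should stay out of the imputation on purpose (for example a "not applicable" value) would simply be left out of `codes` and handled separately.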
Hello, thank you for the video; it was very helpful. However, when I ran multiple imputation on my dataset, I got this message: 'the imputation model for sex contains more than 100 parameters and no missing values will be added'. So a new dataset was not created. Others have cited this as a problem; what should we do? I also have a large amount of missing data, about 50%.
Do we estimate missing nominal and ordinal values too? If not what we can do for missing nominal and ordinal values (For example nominal: gender, ordinal: perceived income as categorized by low medium high)?
Hello. I am using longitudinal survey responses with a biased drop out - so there is a great big red patch at the bottom right of my missing value patterns graph! Can you tell me the best multiple imputation method to use? If I delete cases I am also biasing the dataset. I have analysed the raw data so I know what I'm comparing it with but I am struggling with the method of imputation. It's also saved across 5 different datasheets - I need to combine it into one, don't I?! Thanks!
+Anna Pease I used the AGGREGATE command to get all the imputed datasets back into one. (I needed to do further imputations on my data, which I couldn't do, or couldn't figure out how to do, once I had done one imputation.) The thing to note with this, though, is that you won't be able to see which cases/variables have imputed data, as you can when they're not pooled. The syntax I used was:

AGGREGATE
  /OUTFILE='[location on my computer]\[newfilename.sav]'
  /BREAK=[variable to break by, which for me was survey participant ID]
  /[string variable 1]=FIRST([string variable 1])
  /[string variable 2]=FIRST([string variable 2])
  ...
  /[string variable x]=FIRST([string variable x])
  /[imputed scale variable 1]=MEAN([imputed scale variable 1])
  /[imputed scale variable 2]=MEAN([imputed scale variable 2])
  ...
  /[imputed scale variable x]=MEAN([imputed scale variable x]).
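The same collapse-by-case idea, sketched in Python for readers working outside SPSS. The stacked layout (one row per case per imputation, as dicts), the key names, and the function name are assumptions for illustration. Like the AGGREGATE syntax above, it averages scale variables (ignoring missing values) and keeps the first non-missing value of the others.

```python
import statistics
from collections import defaultdict


def average_imputations(rows, id_key, scale_keys, first_keys):
    """Collapse stacked imputed rows into one row per case.

    rows: list of dicts, one per (case, imputation), as in a stacked file.
    Scale variables are averaged, skipping None (like MEAN);
    other variables keep the first non-missing value (like FIRST).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[id_key]].append(row)

    result = []
    for case_id, case_rows in groups.items():
        out = {id_key: case_id}
        for key in first_keys:
            out[key] = next((r[key] for r in case_rows if r[key] is not None), None)
        for key in scale_keys:
            vals = [r[key] for r in case_rows if r[key] is not None]
            out[key] = statistics.fmean(vals) if vals else None
        result.append(out)
    return result
```

One caution: a single averaged dataset discards the between-imputation variability, so standard errors computed from it will tend to be too small compared with properly pooled results.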
Great video! I have run multiple imputation for 2 variables (missing categorical data for 12% of values), however, I notice after 5 iterations, I still have some missing values. Is this normal?
Thanks for the great video, it helps a lot! I have 2 types of missing data in my dataset (I'm working with a questionnaire that has several versions), and I've coded them as: -9 for actually missing (respondent didn't know / didn't want to give an answer) and -99 for n.a. (respondent didn't see this question and therefore couldn't answer it). So I need to somehow exclude the -99 data points from the replacement. Any idea how to do this? Many, many thanks in advance!
I have a large number of variables in the imputation model (most of them are nominal), and I keep getting the same error message mentioned by Juvenal below: "...The imputation model for MODEL contains more than 100 parameters. No missing values will be imputed...." I checked all of the variables and they look fine (the nominal variables have values and the numeric variables are within the expected range). If I change one or more of the variables from nominal to scale it seems to work, but then it seems as though the imputations are not going to be accurate, as they will be based on linear rather than logistic regression. Any suggestions?
+TheRMUoHP Biostatistics Resource Channel; Thank you so much for the helpful tutorial. I have this same Warning message every time I am trying to do multiple imputation "The imputation model for Q2_3_TO contains more than 100 parameters. No missing values will be imputed. Reducing the number of effects in the imputation model, by merging sparse categories of categorical variables, changing the measurement level of ordinal variables to scale, removing two-way interactions, or specifying constraints on the roles of some variables, may resolve the problem. Alternatively increase the maximum number of parameters allowed on the MAXMODELPARAM keyword of the IMPUTE subcommand"; although I have checked that all variables are either scale or nominal ones. I have something like 85 variables. What should I do?
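Why nominal variables hit the 100-parameter limit so quickly can be seen with some rough arithmetic: each categorical predictor with k categories enters as k - 1 dummy columns, and a categorical target with c categories is modelled with c - 1 logistic equations. A minimal Python sketch of that count; it is an assumed approximation with a hypothetical function name, and SPSS's exact counting (for example for ordinal variables or interactions) may differ.

```python
def approx_model_params(target_levels, predictor_levels):
    """Rough parameter count for one imputation model (assumed counting rule).

    target_levels   : number of categories of the variable being imputed,
                      or None for a scale target (linear regression)
    predictor_levels: None for each scale predictor, or the number of
                      categories for each categorical predictor
    """
    # Intercept + one slope per scale predictor + (k - 1) dummies per categorical predictor
    per_equation = 1 + sum(1 if k is None else k - 1 for k in predictor_levels)
    if target_levels is None:
        return per_equation
    # Multinomial logistic regression: one equation per non-reference category
    return (target_levels - 1) * per_equation


# 85 variables: impute one from the other 84
print(approx_model_params(None, [None] * 84))  # all scale predictors
print(approx_model_params(5, [5] * 84))        # 5-category nominal target and predictors
```

Under this rough count, switching variables to scale shrinks each predictor from k - 1 parameters to one, which matches the observation above that changing measurement levels makes the warning go away.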
How would I know whether the missing data are MAR, MCAR, or missing systematically? And if the data are neither MAR nor MCAR, is there some solution for systematically missing data?
+Fazal Haleem If your data is missing systematically, then that typically means there is some response bias of some kind (e.g. questions asking for sensitive information or questions that are unclear). You should try and figure why that might be happening so you can address that as a possible validity issue. This technique that I demonstrate can be used with data that are missing systematically.
Hey, thank you for the helpful tutorial. Still, I have a huge problem with my imputation. After running, it imputes values that are way too high or even negative. So I defined the range, which leads to an error that says something like this (mine is in German): "after 200 draws SPSS can't find the imputed value for the variable xxx within its defined constraints. Please check if the defined min and max are appropriate or choose a higher maximum case draws". So it stops the imputation. Can you help me with that? anika
Hi, thank you for this excellent video. My question appears to be a bit more basic than those below, but I was wondering whether there is any way to store the pooled dataset in a separate file. You see, I would like to use it with an SPSS plug-in, e.g. PROCESS, which I don't think will recognise the pooled values the way SPSS did with the t-test above. Regards, Tom
saja torunczyk Not very simply, although I think you could do it in R and port it back in. One option (not as good) might be to use expectation maximisation in SPSS?
Thanks for some great videos. I get a warning message that says after 100 draws, the imputation algorithm cannot find an imputed value under the constraints for variable [X]. This is strange, because the variable is just a 7-point likert-scale. All "I don't know" responses are coded as 999, and as missing values. So, I tried to change the MAXCASEDRAWS. After a few attempts, it accepted 1000000000. I know. So, it ran the imputations. However, I was met with yet another Warning message. "Some missing values cannot be imputed because a factor in the model has a value that does not appear in the data used to build the model." Does anyone have any good suggestions for how I can solve this problem? Just FYI: - My data is Missing Not at Random (MNAR) - I have 55 variables - Sample size of 317. - Measurement scales: 7-point likert scale + 10-point evaluation scale. I hope someone will be able to help ASAP. Thank you. Halil
I think you need to take a close look at the data codes you have used in the variables with missing data. For some reason SPSS cannot recognize those codes and cannot perform the imputation. If I read your post correctly you have coded both missing values and "I don't know" responses as "999". That might be the issue.
***** Well, in the software I used for collecting data, all "I don't know" responses received the value 999 so they could easily be identified during data analysis. Then there are also some system-missing data that just do not have any value at all. But you believe this may be the source of the problem? I have now tried to recode all variables so that the only type of missing value is 999. After doing so, I still get the same kind of warnings. I had to change MAXMODELPARAM=500 and MAXCASEDRAWS=400000, and still SPSS does not want to impute the data properly. It says that 'some missing values cannot be imputed' AND 'after 800000 draws, the imputation algorithm cannot find an imputed value ...' So, any good ideas for how to solve this problem? Btw, thank you guys for such a quick response time!!!
***** I also see that others have encountered a problem related to having ONE dataset with the pooled values. When I run the imputations, SPSS creates a new data file with the original data and imputations 1-5, and then that's it. In your video, it is the same: in the upper-right section of the screen you can choose between the original and the five imputations, but there is no option called pooled data. Are there any operations in SPSS that give you one data file with the pooled values, WITHOUT the original data and the five imputations? I just want the pooled data for further analysis. How can I do that?
***** Thanks. That worked. Thank you so much! Now, how do I transform my data so that I only have the pooled variables without all five imputations, and the original data set? I just want one dataset without missing values. I do not want all the imputations. Only the pooled variables. I need to do a PCA followed by regression analysis, so I need a dataset without missing values.
Hello! First I would like to thank you for this awesome video! It is super clear and super well explained! I have a question for you. Procedure: according to IBM, once one runs MI following the Fully Conditional Specification method (FCS; in the output SPSS tells you which method it used), one should verify whether FCS convergence was achieved. Problem: this is where I am terribly stuck, because I am getting a lot of flat lines in my chart when I test whether FCS convergence was achieved (please see this link for more about how to do this: pic.dhe.ibm.com/infocenter/spssstat/v22r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.cs%2Fspss%2Ftutorials%2Fmi_fcs-convergence_telco_howto.htm). When I looked at my iteration history, for every imputed dataset I get the same value no matter what the iteration number is. For instance, in imputed dataset #1 I get the same imputed value for the total score of a questionnaire from iteration 1 to 10000, and so on until the last imputed dataset (these values remain the same within datasets but differ across datasets). Finally, my question: why do you think FCS convergence is not achieved in this case, or why are my values not changing from iteration to iteration? I have been searching the internet for what to do about this besides increasing the number of iterations, but there is almost no information about it. Would you mind giving me your thoughts on this? I would be so grateful.
My guess is that the predicted values do not have any variance or have little ability to vary and your iterated values don't change. Generally, this is a good thing indicating that the predicted values are quite accurate being that they don't change between each iteration.
The observed discrepancy was because some cases had missing values on all three variables included in the multiple imputation. The problem was resolved by adding variable(s) on which these cases had values.
A simple thank you might not be enough for the great work you did and shared with the public, and with me.
So I want to tell you: if you ever feel down or even worthless, remember that somewhere in Austria you made someone really happy with this tutorial!!! Thanks a lot.
At first I thought it might be a bit long but it was worth every second and you did a really good job.
This was a very informative video. I am currently examining some longitudinal data and of course there is a significant amount of attrition. I initially ran a regression analysis using exclude cases listwise but I didn't feel this was the best way to analyze the data. This technique definitely helps address some of those issues. Thank you so much for posting this!
Thank you! Saved me and my thesis.
Thanks so much for your reply. Sorry, you misunderstood me: I've got 570 participants, so I'll do an EM and see how I go. Thanks again, and thanks for doing the videos. I've just started my PhD and I'm sure I'll be tuning in quite a bit!
The video explains the concept in such easy-to-follow steps. A great video on the multiple imputation technique.
Hello! Thank you so much for the video.
I have a question, however. From what I understood, you don't get one single database with the missing values replaced; you should work with the pooled results. So my question is whether there is any way to create a new single database to import into other programs (for instance Mplus or LISREL) and work on. I need to do that for a CFA on my data...
Excellent work you did here! Thank you.
Thanks for doing this! Very clear and helpful
What happens if your data are missing not at random? I did Little's test and it was significant. I can't figure out which MI method to use in that case.
Thank you for the tutorial. I just ran this on my dataset successfully. However, I was wondering if there is a way to obtain pooled means and 95% CIs across iterations. For inferential analyses (e.g., correlation), I am able to obtain the pooled statistics. However, when I use Analyze -> Descriptive Statistics -> Explore, it only gives me the descriptives for the original data and each iteration *individually*. Is there a way to obtain the pooled descriptives for variables? Also, is there a way for SPSS to generate a dataset that only contains the imputed data after the final iteration?
Thanks!
I saw your earlier comment about what to do with this issue, but I was not able to set min or max values. However, I found out that you can adjust the parameter in the syntax. It worked, as I saw all the imputed values in the output. Unfortunately, in the Data View tab I couldn't see any imputed variables, nor the option in the upper right to switch between data files. So, what went wrong?
Could you help me out? Thanks in advance!
Hello,
Thank you very much for showing this video. My question is: once you get all five imputed values, is there any rule of thumb as to which of the five you should use for your analysis? Also, I noticed that in your t-test example, the pooled values did not have standard deviations. What if you want to report standard deviations in your study? If you can kindly let me know, I would appreciate it. And if I want to create a composite, which of the 5 imputed values should I use? Thanks
This is helpful. The use and purpose of the extra imputation history file could be explained in more detail. It was very nice to include some references! Thanks!
First off, thank you so much for posting this video...it was very well made and I look forward to exploring other videos you have. As a follow up question to enemenoff's question...what are the differences for MI for random vs. non-random patterns? Did I miss that part in the video? Do you have a source I could visit? Thank you in advance!
Hi, thank you for the very helpful video. I followed all the steps, but my output after running my first ANOVA only showed the 5 imputations, not the pooled figures. How do I get the pooled figures?
Hello, the video was very helpful. I have a question regarding the use of the iterations. I had 5 iterations and the pooled iteration was not significant p > .05, but I noticed some of the others were significant. Do you ever use one of the iterations or are you only supposed to use the pooled results?
Thanks for this wonderful demonstration. I am facing a problem when I run this test. The number of missing values entered in the multiple imputation analysis was smaller than the number of missing values across all the variables with missing data. Consequently, the completed data after imputation were fewer than my original data (valid plus missing). So, how can I fix this problem?
This video was very useful; thank you. However, even when splitting the file by imputation, I cannot get pooled analyses. SPSS performs the analysis for the original data and each of the five imputations, but then only gives me the means and standard deviations for the pooled data, not, for example, chi-square or t-test values; nor does it give me a p-value. Why might this be?
In your video you said you could only use imputed data for the analyses that have a swirl on it. Is there any possibility to use imputed data with repeated measures analyses in SPSS and how might that work?
This is really awesome!
Hi, thanks for the video, it was really informative. Just a question though: my data has a small amount of missing data (4 variables, 32 cases, 104 values, with .5% missing data overall (104 cases)). MCAR was non-significant, p = .052 (just!). I am running a CFA using AMOS, therefore I cannot have missing data. Do you recommend conducting an EM or a Multiple Imputation, or neither? Also, how can I get AMOS to look at the pooled data when conducting the CFA? Thanks!
Will it automatically use the pooled estimates even for more advanced later techniques, like SEM? I'm using AMOS to run my SEM, but I want to make sure the MI results will automatically get used for this (seeing as how it's an addon to SPSS). I was recently informed that you can't run a proper SEM if you have ANY missing data, so I wanted to make sure I fixed that problem...
Thank you very much for this tutorial. However, I notice you did not mention Little's Missing Completely at Random (MCAR) test. Should this not be done prior to all imputation methods? Or is it sufficient to look at the Missing Value Patterns graph? Many thanks
Hi, thanks for the video, it's really useful! Can I just check what exactly the y-axis represents when you are looking at the patterns in the diagram with grey and red squares?
Is there a cut-off for using this method in terms of the percentage of cases missing for specific variables? All of my variables have missing values.
Thanks for this great video. I found it easy to understand. I now have a data file containing multiple imputations (5 imputations). My question is when reporting the univariate statistics and normality statistics, which results should I report given I have results from the original data set and results from the 5 separate imputations? Thank you in advance.
Thanks for this video. Your YouTube channel is saving my life!
My question is similar to ones asked previously but I could not make sense of the reply about merging data in SPSS.
I have completed multiple imputation for missing data (went great!), but I want to move this dataset into LISREL for structural equation modelling. How can I get a single data set with the pooled information, rather than having the individual imputation datasets displayed and then SPSS pooling them during any further analysis?
Thanks, Carilynne
The raw data was a dummy variable regression so there are only 1 and 0. Also, the experimental design was such that each respondent had their own design where they saw either all or just a subset of the variables. So I am looking to fill in the coefficients for the variables they did not see.
Great video!! Thank you!
This is a great presentation. I really enjoyed it. Unfortunately for me, as I tried to follow it to impute my missing data, I keep receiving a warning which says that the imputation model for some variable contains more than 100 parameters. Below is an example of such warnings: "An iteration history output dataset is requested, but cannot be written.
The imputation model for SYNC2 contains more than 100 parameters. No missing values will be imputed. Reducing the number of effects in the imputation model, by merging sparse categories of categorical variables, changing the measurement level of ordinal variables to scale, removing two-way interactions, or specifying constraints on the roles of some variables, may resolve the problem. Alternatively increase the maximum number of parameters allowed on the MAXMODELPARAM keyword of the IMPUTE subcommand.
Execution of this command stops."
This is repeated for quite a number of variables. Can someone help me understand how to handle this trouble? Thank you. Juvenal Balisasa
Juvenal Balisasa This is likely happening because there are data that SPSS cannot categorize or that fall outside of the expected range you specified. Be sure all categorical data have a coding value and that there are no numeric values outside of the specified range.
+TheRMUoHP Biostatistics Resource Channel
I have the same problem.
+Juvenal Balisasa
Have you solved your problem? I have the same problem.
How are degrees of freedom reported after a t-test is performed using multiple imputation? I see that the df for the pooled data can be in the thousands, and it does not feel right to report such a high number when N = 50, for example. Any advice, or a paper that discusses this issue? Thanks!
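The very large df come from Rubin's original formula, df = (m - 1)(1 + 1/r)^2, which grows without bound as the between-imputation variance shrinks. Barnard and Rubin (1999) proposed a small-sample adjustment that keeps the df at or below the complete-data df (for a t-test with N = 50 that would be 48 for two independent groups, or 49 for paired data). The sketch below computes that adjustment from per-imputation estimates and standard errors; the function name is hypothetical, and whether SPSS reports this adjusted value is not something the video confirms.

```python
from statistics import mean, variance


def barnard_rubin_df(estimates, std_errors, df_complete):
    """Small-sample df for a pooled estimate (Barnard & Rubin, 1999); hypothetical helper."""
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need m >= 2 estimates with matching standard errors")
    u_bar = mean([se ** 2 for se in std_errors])   # within-imputation variance
    b = variance(estimates)                        # between-imputation variance
    t = u_bar + (1 + 1 / m) * b                    # total variance
    lam = (1 + 1 / m) * b / t                      # share of variance due to missingness
    # observed-data df: complete-data df shrunk by the information lost to missingness
    df_obs = (df_complete + 1) / (df_complete + 3) * df_complete * (1 - lam)
    if lam == 0:
        return df_obs                              # Rubin's df is infinite in this case
    df_rubin = (m - 1) / lam ** 2                  # Rubin (1987) large-sample df, can be huge
    # combine the two like resistors in parallel, so the result never exceeds either
    return df_rubin * df_obs / (df_rubin + df_obs)
```

When the imputations barely differ, Rubin's unadjusted df can run into the millions, while the adjusted value stays just under the complete-data df.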
So you can impute data only for the variables where > 5% of data are missing? Or, if you impute for one you must impute for all variables that have any missing data? I ask because I have many variables and SPSS doesn't seem to be able to handle all of them at once. This means I have to create multiple imputed data sets and I'm not sure how to combine them all.
When writing a manuscript for a trial that has used multiple imputation to address missing data, what additional reporting should I include? Data pre and post imputation? Anything else?
I have a large number of variables and SPSS does not seem to be able to do the imputation with all the variables at once. So I imputed groups of variables separately. However, I get multiple imputed data files. How do you recommend combining the data files?
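If each group of variables was imputed in a separate run with the same number of imputations, the files can be joined row by row on the imputation number and the case ID. A minimal sketch, assuming both files were exported in long format with SPSS's `Imputation_` column and a case identifier; the `ID` key and the function name are hypothetical. Keep in mind that imputing groups separately means each imputation model ignored the variables in the other groups, so associations across groups may be weakened.

```python
def merge_imputed_files(file_a, file_b, id_key="ID", imp_key="Imputation_"):
    """Merge two long-format imputed datasets case by case within each imputation.

    file_a, file_b: lists of dicts, one dict per (imputation, case) row.
    The key names are assumptions; adjust them to your exported columns.
    """
    # index the second file by (imputation number, case ID) for fast lookup
    index_b = {(row[imp_key], row[id_key]): row for row in file_b}
    merged = []
    for row in file_a:
        key = (row[imp_key], row[id_key])
        if key not in index_b:
            raise KeyError(f"no matching row for imputation/case {key}")
        combined = dict(row)
        # copy the other file's variables, skipping the shared key columns
        for name, value in index_b[key].items():
            if name not in (id_key, imp_key):
                combined[name] = value
        merged.append(combined)
    return merged
```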
Thanks, I was able to get it to work. But I had another question. After running my analyses (t-tests and chi-squares) with the imputed data, I noticed that the sample sizes for each variable on the output are still uneven which normally means some cases weren't used due to missing values. Are these sample sizes supposed to still be uneven? And I just report the total sample size?
I would like to do a MANOVA using my imputed dataset, however, when I run the analysis, there isn't any pooled output. Is it okay to report the output from the 5th imputation? Thank you for your great video and help.
Thanks, very helpful! I have a question: under the Analyse > Impute Missing Data > Constraints tab, in the lower "Define constraints" table, SPSS won't allow me to set Min and Max values for my variables, and I notice the table rows are coloured blue and not white as in the tutorial. Could anybody help me work out how I can define my min-max values?
Paul, did you figure this out? I need to do the same thing...
Question: if results and parameters are "pooled" (and not averaged) what is the specific calculation? e.g. for bivariate correlations, or linear regression outputs, for example?
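Pooling here means Rubin's rules, the standard combining rules for multiply imputed results. The point estimate is in fact the plain average across imputations; what differs from naive averaging is the standard error, which adds the between-imputation spread to the average within-imputation variance. A minimal sketch, assuming you have copied one coefficient and its standard error from each imputed dataset (the function name is hypothetical, and this is not SPSS's own code):

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need m >= 2 estimates with matching standard errors")
    q_bar = mean(estimates)                         # pooled estimate: plain average
    u_bar = mean([se ** 2 for se in std_errors])    # within-imputation variance
    b = variance(estimates)                         # between-imputation variance
    t = u_bar + (1 + 1 / m) * b                     # total variance
    riv = (1 + 1 / m) * b / u_bar                   # relative increase in variance
    # Rubin's (1987) degrees of freedom; infinite when imputations agree exactly
    df = math.inf if b == 0 else (m - 1) * (1 + 1 / riv) ** 2
    return {"estimate": q_bar, "se": math.sqrt(t), "within": u_bar,
            "between": b, "total": t, "df": df}
```

For a correlation, the usual practice is to pool Fisher's z = atanh(r), whose standard error is 1/sqrt(n - 3), and back-transform the pooled z, rather than pooling r directly.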
Hello! Nice video! Any idea how to calculate the reliability of a questionnaire in SPSS if we have missing data on some questions? Is it the usual Cronbach's alpha? And what should be put in the cells with missing data in SPSS?
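Cronbach's alpha itself needs complete item scores, so one option is to compute it within each imputed dataset and report the average or the range across imputations; that reporting choice is an assumption here, not a rule from the video. The formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score), can be sketched as follows (the function name is hypothetical):

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent, with no missing values."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*rows))                        # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in rows])    # variance of the scale total
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

With five imputed datasets, `mean(cronbach_alpha(d) for d in datasets)` (using `statistics.mean`) gives a single summary value.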
For a scale score, would you calculate the aggregated variable from the pooled imputation iterations?
Thank you for making this great video!
I have actually done multiple imputation in Mplus and it generated 10 imputed datasets (all were .dat files). Is there a way to read these files as imputed data sets in SPSS? I need to do matched-pair t-tests by using these values. My stats consultant suggested that I ask SPSS to read these 10 imputed datasets individually, do 10 t-tests, and then average the t-value. However, I like how SPSS pooled the datasets first. Thank you!
Would you use the same process to determine the mean and standard deviation of 'pooled data'? I would imagine you could use these estimates to standardise all variables and then re-run the regression on those to obtain the standardised regression coefficients (which SPSS also does not provide)?
Hi, thanks for the video, it is very helpful! One question I have so far, just from watching the video: when applying "Constraints" (at 27:19) there are 2 other options, "maximum case draws" and "maximum parameter draws". Could you please let me know what those are?
Not sure if you mentioned these, but I didn't succeed until I changed my missing value codes from 99 to blank (.) and changed ordinal variables into scales. Otherwise it wouldn't do the imputations and didn't even let me specify constraints.
I had a Likert-scale of 1-5
Glad that worked. That is a pretty common issue.
+TheRMUoHP Biostatistics Resource Channel
I had to change my nominal variables to scale.
I had the same problem, but I managed to run the multiple imputation by adjusting MAXMODELPARAM (in syntax), because I was unable to change the min and max values. However, I did not see the imputed variable in the Data View table, yet I did see the results of the imputed values in the output file.
How do I get to see the imputed variable in the Data View? Thanks in advance
I have missing item level data (from a scale with some missing items) and variable level missing data. Should I first impute the missing items so that everyone has a score on the variable with the items or should I just ignore the fact that I have items and just estimate the missing variable that is composed of the items? Thanks!
Thanks for the helpful video. If we need to remove outliers, should that be done before or after imputing data? If after, I wonder how to do that when we have 5 imputed datasets.
Look for and remove outliers before imputation. These videos may help:
How to Use SPSS: Identifying Outliers
How to Use SPSS:Dealing with Outliers
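The referenced videos may use a different rule; as one common example, Tukey's fences flag values more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch, run on the observed values only (missing values as `None`) before imputation; the function name is hypothetical:

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return observed values outside [Q1 - k*IQR, Q3 + k*IQR]; None means missing."""
    observed = [v for v in values if v is not None]   # ignore missing cells
    q1, _, q3 = quantiles(observed, n=4)              # quartiles of observed data
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in observed if v < low or v > high]
```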
What if I have a conjoint study where I have 36 variables and 300 respondents but each respondent only saw a subset of the 36. So I now have a table where each row is a respondent with a constant and then coefficients for only 25 (or more) of the 35 variables. What would be the approach for replacing the missing values (i.e. the missing coefficients for those variables for that specific respondent)?
Thank you. Great presentation. I have one question: can the imputation only be focused on primary outcomes?
I have run an analysis like the one shown in SPSS 19, and the output provided neither the pooled results nor the fraction of missing information. Under "Edit > Options > multiple imputation" the option "results for imputed and observed data" is chosen. Any idea how I can get the pooled results and the fraction of missing information in my output?
This is brilliant! Thanks for posting.... What is the minimum total number of observations (including missing obs) that this technique will work with? I have a dataset with 18 observations from 10 cases (should have 180 points in total) and I am missing 10 data points... Would multiple imputation be appropriate for this two-way repeated measures design? Thanks.
That should work fine. I don't know that there is a minimum per se, but as long as your missing data is not the majority of the possible observations, it should work for you.
Hello, I really appreciate you sharing this video. It has helped me tremendously to figure out how to understand and implement this method for my data. Would it be possible for you to share the syntax? For some reason, my output for percentage missing (the first output you show us) does not show the mean and standard deviation of the variables in my output. I'm sure it's a just a line I missed in the syntax. Thank you!
I keep getting a warning message such as "The imputation model for EDEQ14.1 contains more than 100 parameters. No missing values will be imputed..." Any advice on how to resolve this problem? I tried changing the measurement level but it didn't help. I wasn't sure how to do the other suggestions, including: reducing the number of effects in the imputation model, removing two-way interactions, or specifying constraints on the roles of some variables.
Hello, I have a few issues with my dataset. First of all, my dependent variable has a whopping 20% missing values (the question is rather sensitive, so I am considering running two models, one that uses this variable and another that uses a similar one asked in a different way). Is this OK? Also, many of my variables are categorical or nominal (yes/no, agree/disagree, etc.). Can I still use this imputation method, or is it just for numerical variables? Thanks.
I have MNAR data with sometimes 60 percent missing. What I understand is that if my data are NOT random and I choose Automatic on the imputation method tab, then SPSS will take care of the non-randomness problem. Is that correct?
Great video, and very easy to understand! If I wanted to remove multiple imputation from a data set is it possible and how would it be done?
Thank you!
Do you mean reversing the process, so that the missing values become missing again? I don't know if that is possible but as long as you saved the original data set, you can always revert to that.
When you run a hierarchical regression with MI dataset, the output does not provide R, R2, adjusted R square, and F value of pooled imputation (it only provides the calculations for the original and each imputed dataset). It also doesn’t provide beta (standardized coefficients) of pooled imputation (there were only unstandardized coefficients: B and Std. Error) either. However, given these are typical calculations reported in our results, how do we obtain these information from the pooled data?
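Since SPSS does not report a pooled R² here, a separate rule is needed. One approach from the literature (Harel, 2009) averages the Fisher z transform of R = sqrt(R²) across imputations and back-transforms the result. A minimal sketch, assuming you have copied R² from each imputation's Model Summary table; the function name is hypothetical, and standardized coefficients are not covered:

```python
import math
from statistics import mean


def pool_r_squared(r_squared_values):
    """Pool R^2 across imputations via Fisher's z of R (requires 0 <= R^2 < 1)."""
    z_values = [math.atanh(math.sqrt(r2)) for r2 in r_squared_values]
    pooled_r = math.tanh(mean(z_values))   # back-transform the average z
    return pooled_r ** 2
```

Because the transform is monotonic, the pooled value always lies between the smallest and largest R² across imputations.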
Hi thanks for this helpful video.
I have two questions:
1- do we have to include non missing variables in order to get a better prediction for the missing variables?
2- I need to do a propensity score matching after doing multiple imputation on the dataset with generated data, so I actually need a "pooled only" dataset which is the average of all as you said. is there a way to save the pooled only dataset or do I have to calculate the average for each variable and save it separately?
thank you,
1. Yes, you should use as many variables as you can to improve the estimation of the missing values.
2. To the best of my knowledge you will have to calculate the mean for each variable.
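The per-case average described in point 2 can be scripted outside SPSS. A minimal sketch, assuming the imputed file was exported in long format with SPSS's imputation index column (`Imputation_`, where 0 is the original data) and a case identifier; the `ID` key and the function name are hypothetical. Keep in mind that a single averaged dataset treats imputed values as if they were observed, so standard errors computed from it will be too small compared with pooled results; for nominal variables a mode is more sensible than a mean.

```python
from collections import defaultdict
from statistics import mean


def average_imputations(rows, variables, id_key="ID", imp_key="Imputation_"):
    """Collapse m imputed datasets to one row per case by averaging each variable.

    rows: long-format records (dicts); imputation 0, the original data, is skipped.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        if row[imp_key] == 0:
            continue                      # skip the original, incomplete data
        for var in variables:
            grouped[row[id_key]][var].append(row[var])
    # one dict of averaged values per case ID
    return {case: {var: mean(vals) for var, vals in per_var.items()}
            for case, per_var in grouped.items()}
```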
Then which imputed value should be used? The fifth one? Or do we have to average all 5 imputed datasets? That would be exhausting, right?
I also have other question.
My data are mostly ordinal (Likert scale). But when I tried to run the multiple imputation, the imputed values were beyond the allowable range: some were negative, others were not integers. When I changed the "measure" to "scale" instead of ordinal and set the max and min range as well as the rounding, I got much more sensible values. Was my approach right?
The last question is the same as anastemi's. I tried to solve it by specifying the role of each variable, and it worked. The problem was that I didn't actually know the role of each variable. I just predicted what the role might be (it was actually my hypothesis for a model I tried to investigate). What do you think? I am afraid my approach is wrong and the imputed values are not valid.
There should be a pooled data value that you can use that aggregates all the imputed attempts.
In regards to the ordinal data that was the correct approach.
***** Is it OK if I just average them? I heard that calculating the average is the simplest way to get the pooled data. Where can I find the pooled data? I didn't find any.
Ilma Mufidah
The pooled data should be found in the output as demonstrated in the video. Be sure the data is categorized properly in the Variable View. Be sure it is set up as "Scale/Numeric" data.
Great thanks! I never trust my 'subjective judgement', so I like to rely on both :)
I have a query. I am currently working in SPSS on survey data. It contains many missing values. It is not a random sample (MNAR). What method should I use to replace the missing data?
I have a question. I have imputed the data, and I want to conduct an ANOVA. To interpret the results, how should I read this ANOVA table? There is the original solution and 5 other solutions, but I cannot find a pooled solution.
What do I need to do here?
+Suzanne Veger Unfortunately, not all inferential techniques pool the results the way we saw in the t-test example.
Thank you so much for this helpful video! I ran multiple imputation on my data, but I would like to ask: where are the pooled values you mentioned? I can only see the values in each imputation.
Also, I have used many different questionnaires in my research. Do you think it's better to run multiple imputation on each questionnaire separately? Some of them are multidimensional; does this affect multiple imputation? Maybe I should run multiple imputation on the items of each factor separately?
The pooled data should be found in the output as demonstrated in the video. Be sure the data is categorized properly in the Variable View. Be sure it is set up as "Scale/Numeric" data.
I would run separate imputations for each questionnaire if they are measuring different constructs.
I don't understand what values to use. I want a full table instead of the original one that has gaps where the missing data is. Can I use the New Imputation table? But what imputation do I use? Or do I fill in all the gaps with missing data with the same pooled mean from the t-test analysis?
I also get an empty Group Statistics table when doing the t-test. The mean is put out as zero and I get a warning saying "No statistics are computed for a split file in the Independent Samples table. The split file is: Imputation number=... The Independent Samples table is not produced." What am I doing wrong?
I have a dataset that has some missing values represented only by a . and others that have it as a -1 or -9. When I do the imputation the . values are imputed but the assigned missing values remain the same. How do you rectify this?
+Georgina Martin If the -1 or -9 values are not actual outcome possibilities, then the values should be cleared and then you can run the imputation.
Excellent, that's what I did and it works!
Hello, thank you for the video, it was very helpful. However, when I ran multiple imputation on my data set, I got this message: 'the imputation model for sex contains more than 100 parameters and no missing values will be added'. So a new data set was not created. Others have cited this as a problem; what should we do? I also have a large amount of missing data, about 50%.
Try this solution direct from IBM: www-304.ibm.com/support/docview.wss?uid=swg21482103
Can I use this method to replace missing data if my data is not normally distributed and hence, I use non-parametric methods?
Yes, you can.
Do we estimate missing nominal and ordinal values too? If not what we can do for missing nominal and ordinal values (For example nominal: gender, ordinal: perceived income as categorized by low medium high)?
Yes, the procedure can estimate those values as well.
***** thanks for your care
Hello. I am using longitudinal survey responses with a biased drop out - so there is a great big red patch at the bottom right of my missing value patterns graph! Can you tell me the best multiple imputation method to use? If I delete cases I am also biasing the dataset. I have analysed the raw data so I know what I'm comparing it with but I am struggling with the method of imputation. It's also saved across 5 different datasheets - I need to combine it into one, don't I?! Thanks!
I would use the regression method of imputation.
+Anna Pease I used the AGGREGATE command to get all the pooled datasets back into one (I needed to do further imputations on my data, which I couldn't do, or couldn't figure out how to do, once I did one imputation). The thing to note with this, though, is that you won't be able to see which cases/variables have imputed data, as you can when they're not pooled. The syntax I used was this:
AGGREGATE
/OUTFILE='[location on my computer]\[newfilename.sav]'
/BREAK=[variable to break by, which for me was survey participant ID]
/[string variable 1]=FIRST([string variable 1])
/[string variable 2]=FIRST([string variable 2])
...
/[string variable x]=FIRST([string variable x])
/[imputed scale variable 1]=MEAN([imputed scale variable 1])
/[imputed scale variable 2]=MEAN([imputed scale variable 2])
...
/[imputed scale variable x]=MEAN([imputed scale variable x]).
Great video! I have run multiple imputation for 2 variables (missing categorical data for 12% of values), however, I notice after 5 iterations, I still have some missing values. Is this normal?
Not usually. Be sure that you designated those variables to be imputed.
Thanks for the great video, helps a lot!
I have 2 types of missing data in my dataset (I'm working with a questionnaire that has several versions), and I've coded them as:
-9 for actually missing (respondent didn't know / didn't want to give an answer)
-99 for n.a. (respondent didn't see this question and therefore couldn't answer it)
Therefore, I need to somehow exclude the -99 datapoints from the replacement.
Any idea how to do this?
Many, many thanks in advance!
You can exclude certain data points by using the Select Cases function and then run the analysis.
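The recoding step can be sketched as follows, assuming the data are rows of variable-to-value mappings: -9 ("don't know / refused") becomes a blank so it is treated as missing and imputed, while cases with -99 ("not shown this question") are set aside, which is roughly what filtering them out with Select Cases would do. The function and constant names are hypothetical.

```python
IMPUTABLE_CODE = -9         # "don't know / refused": treat as missing and impute
NOT_APPLICABLE_CODE = -99   # question not shown: structurally missing, do not impute


def prepare_for_imputation(rows, variables):
    """Recode -9 to None for imputation and set aside rows containing any -99."""
    to_impute, not_applicable = [], []
    for row in rows:
        if any(row[v] == NOT_APPLICABLE_CODE for v in variables):
            not_applicable.append(row)      # excluded, like a Select Cases filter
            continue
        clean = dict(row)
        for v in variables:
            if clean[v] == IMPUTABLE_CODE:
                clean[v] = None             # blank cell, so it counts as missing
        to_impute.append(clean)
    return to_impute, not_applicable
```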
What do you do after you get results from 5 imputations?
You use that data to replace the missing data points and then run your additional analyses (e.g. t-test).
Is this also possible for panel data?
+Morten Fjerritslev Can you explain what you mean by panel data?
+Shirley anynameIwant, you should report descriptive statistics for pre and post imputation.
I have a large number of variables in the imputation model (most of them are nominal) and I keep getting the same error message mentioned by Juvenal below, "...The imputation model for MODEL contains more than 100 parameters. No missing values will be imputed...." I checked all of the variables and they look fine (the nominal variables have values and the numeric variables are within the expected range). If I change one or more of the variables from nominal to scale it seems to work, but then it seems as though the imputations are not going to be accurate, as they will be based on linear rather than logistic regression. Any suggestions?
+Tim Wadsworth Variables that are ordinal in scale should be categorized as Scale.
+TheRMUoHP Biostatistics Resource Channel; Thank you so much for the helpful tutorial. I have this same Warning message every time I am trying to do multiple imputation "The imputation model for Q2_3_TO contains more than 100 parameters. No missing values will be imputed. Reducing the number of effects in the imputation model, by merging sparse categories of categorical variables, changing the measurement level of ordinal variables to scale, removing two-way interactions, or specifying constraints on the roles of some variables, may resolve the problem. Alternatively increase the maximum number of parameters allowed on the MAXMODELPARAM keyword of the IMPUTE subcommand"; although I have checked that all variables are either scale or nominal ones. I have something like 85 variables. What should I do?
Is there a way to get a pooled R-squared value in multiple regression with MI data?
How would I know whether the missing data are MAR, MCAR, or missing systematically? And if the data are not MAR or MCAR, is there a solution for systematically missing data?
+Fazal Haleem If your data is missing systematically, then that typically means there is some response bias of some kind (e.g. questions asking for sensitive information or questions that are unclear). You should try and figure why that might be happening so you can address that as a possible validity issue. This technique that I demonstrate can be used with data that are missing systematically.
The missing variables in my data file have a value of '9'. How do I remove these dummy variables? Thank you.
When I click OK (Patterns), the computer freezes. Does anyone know what happened?
Hey,
Thank you for the helpful tutorial. Still, I have a huge problem with my imputation. After running, it imputes values that are way too high or even negative. So I defined the range, which leads to an error that says something like (mine is in German): "After 200 draws, SPSS can't find an imputed value for the variable xxx within its defined constraints. Please check whether the defined min and max are appropriate, or choose a higher maximum case draws." So it stops the imputation.
Can you help me with that?
anika
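The "after N draws" warnings in this thread (here, and the MAXCASEDRAWS messages further down) make sense if you picture constrained imputation as redrawing until a value lands inside [min, max]. The sketch below illustrates that idea with a normal predictive distribution; it is a conceptual illustration, not SPSS's actual algorithm, and the function and parameter names are hypothetical. If the model's predicted value sits far outside the allowed range (for example because a code like 999 was left in the data), almost no draw lands inside the range, and raising the draw limit only delays the failure.

```python
import random


def constrained_draw(pred_mean, pred_sd, low, high, max_draws=200, rng=None):
    """Redraw from N(pred_mean, pred_sd) until the value falls in [low, high].

    Returns None when max_draws is exhausted, analogous to the
    "cannot find an imputed value" warning quoted in the thread.
    """
    rng = rng or random.Random()
    for _ in range(max_draws):
        value = rng.gauss(pred_mean, pred_sd)
        if low <= value <= high:
            return value
    return None
```

A Likert item imputed this way can then be rounded to the nearest integer if whole-number values are needed.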
great
Hi, thank you for this excellent video. My question appears to be a bit more basic than those below, but I was wondering whether there is any way to store the pooled data set in a separate file. You see I would like to use with an SPSS plug-in e.g. PROCESS, which I don't think will recognise the pooled values as SPSS did with the t-test above?
Regards
Tom
I don't know if that is possible. I would suggest contacting IBM SPSS technical support.
Tom Bailey Tom, I am looking exactly for the same - a way to use PROCESS with data that have been imputed. Did you figure this out?
saja torunczyk Not very simply, although I think you could do it in R and port it back in. One option (not as good) might be to use expectation maximisation in SPSS?
Thanks for some great videos.
I get a warning message that says after 100 draws, the imputation algorithm cannot find an imputed value under the constraints for variable [X]. This is strange, because the variable is just a 7-point Likert scale. All "I don't know" responses are coded as 999 and defined as missing values. So I tried changing MAXCASEDRAWS. After a few attempts, it accepted 1000000000. I know.
So, it ran the imputations. However, I was met with yet another Warning message. "Some missing values cannot be imputed because a factor in the model has a value that does not appear in the data used to build the model."
Does anyone have any good suggestions for how I can solve this problem?
Just FYI:
- My data is Missing Not at Random (MNAR)
- I have 55 variables
- Sample size of 317.
- Measurement scales: 7-point Likert scale + 10-point evaluation scale.
I hope someone will be able to help ASAP.
Thank you.
Halil
I think you need to take a close look at the data codes you have used in the variables with missing data. For some reason SPSS cannot recognize those codes and cannot perform the imputation. If I read your post correctly you have coded both missing values and "I don't know" responses as "999". That might be the issue.
*****
Well, in the software I used for collecting data, all "I don't know" responses received the value 999 so they could easily be identified during data analysis. Then there are also some system-missing data, which just do not have any value at all.
But do you believe this may be the source of the problem? I have now tried recoding all variables so that the only type of missing value is 999. After doing so, I still get the same kind of warnings.
I have had to change MAXMODELPARAM=500 and MAXCASEDRAWS=400000, and still SPSS does not want to impute the data properly. It says that 'some missing values cannot be imputed' AND 'after 800000 draws, the imputation algorithm cannot find an imputed value ...'
So... any good ideas for how to solve this problem?
Btw, thank you guys for such a quick response time!!!
*****
I also see that others have encountered a problem related to having ONE data-set with the pooled values.
When I run the imputations, SPSS creates a new data file with the original data and imputations 1-5, and that's it. In your video, it is the same: in the upper-right-hand section of the screen you can choose between the original data and the five imputations, but there is no option called pooled data.
Are there any operations in SPSS that would give me one data file with the pooled values, WITHOUT the original data and the five imputations? I just want the pooled data for further analysis. How can I do that?
Halil Tokay Any missing values should have cells without a code. The cells should be empty.
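If the data are prepared outside SPSS, a small pre-processing step can turn the 999 codes into genuinely empty cells before import. A minimal Python sketch, assuming rows are dictionaries keyed by variable name and that 999 is the only code to blank out (both assumptions; `blank_out_codes` is a hypothetical helper):

```python
MISSING_CODES = {999}  # assumption: the "I don't know" code used in this thread


def blank_out_codes(rows, columns, codes=MISSING_CODES):
    """Return copies of the rows with any listed code replaced by None (an empty cell).

    The input rows are left untouched, so the coded version stays available
    if the "I don't know" responses need to be identified later.
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            if new_row.get(col) in codes:
                new_row[col] = None
        cleaned.append(new_row)
    return cleaned


rows = [{"q1": 3, "q2": 999}, {"q1": 999, "q2": 5}]
print(blank_out_codes(rows, ["q1", "q2"]))
```

Inside SPSS the same effect should be achievable with a command along the lines of `RECODE q1 q2 (999=SYSMIS).`, where q1 and q2 are placeholder variable names.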
*****
Thanks. That worked. Thank you so much!
Now, how do I transform my data so that I only have the pooled variables, without all five imputations and the original data set?
I just want one dataset without missing values. I do not want all the imputations, only the pooled variables.
I need to do a PCA followed by regression analysis, so I need a dataset without missing values.
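There is no pooled dataset to export because SPSS pools the results of each analysis rather than the data. If a single complete file is still needed, one workaround sometimes used is to average each imputed value across the imputations. The Python sketch below assumes the stacked layout SPSS produces, with an imputation-number column (named `Imputation_` in SPSS output, 0 for the original data; check your own file) and a case identifier column (hypothetical name `id`). Be aware that analysing such a file treats imputed values as if they had been observed, so PCA and regression standard errors will come out too small; this is not what SPSS's pooled output does.

```python
from collections import defaultdict
from statistics import mean


def average_imputations(rows, value_cols, id_col="id", imp_col="Imputation_"):
    """Collapse stacked imputed data to one row per case by averaging each variable.

    Rows whose imputation number is 0 (the original data, still containing
    missing values) are skipped; every other row is one imputed copy of a case.
    """
    values = defaultdict(lambda: defaultdict(list))
    for row in rows:
        if row[imp_col] == 0:
            continue
        for col in value_cols:
            values[row[id_col]][col].append(row[col])
    return {
        case_id: {col: mean(vals) for col, vals in cols.items()}
        for case_id, cols in values.items()
    }


stacked = [
    {"Imputation_": 0, "id": 1, "q1": None},
    {"Imputation_": 1, "id": 1, "q1": 3},
    {"Imputation_": 2, "id": 1, "q1": 5},
    {"Imputation_": 0, "id": 2, "q1": 4},
    {"Imputation_": 1, "id": 2, "q1": 4},
    {"Imputation_": 2, "id": 2, "q1": 4},
]
print(average_imputations(stacked, ["q1"]))
```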
Hello: first I would like to thank you for this awesome video! It is super clear and super well explained!
I have a question for you. Procedure: According to IBM, once one runs MI using the "Fully Conditional Specification" method (FCS; in the output SPSS tells you which method it used), one should verify whether FCS convergence was achieved.
Problem: This is the part where I am terribly stuck, because I am getting a lot of flat lines in my chart when I test whether FCS convergence was achieved (please look at this link for more info about how to do this: pic.dhe.ibm.com/infocenter/spssstat/v22r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.cs%2Fspss%2Ftutorials%2Fmi_fcs-convergence_telco_howto.htm). When I looked at my iteration history, for every imputed dataset I get the same value regardless of the iteration number. For instance, in dataset #1 I get the same imputed value for the total score of a questionnaire from iteration 1 to 10000, and so on until the last imputed dataset (these values remain the same within datasets but differ across datasets).
Finally, my question: Why do you think FCS convergence is not achieved in this case, or why are my values not changing from iteration to iteration? I have been searching the internet for what to do about this besides increasing the number of iterations, but there is almost no information about it. Would you mind giving me your thoughts? I would be very grateful.
My guess is that the predicted values do not have any variance, or have little ability to vary, so your iterated values don't change. Generally, this is a good thing: it indicates that the predicted values are quite accurate, since they don't change between iterations.
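A quick way to confirm which imputations are flat is to check the range of each chain in the iteration history that the linked IBM tutorial has you save. A minimal Python sketch, assuming the history has already been read into one list per imputed dataset holding a summary statistic (such as the mean of the imputed values) at each iteration; the function name `flat_chains` and the tolerance are our own choices:

```python
def flat_chains(history, tol=1e-12):
    """Return the indices of imputation chains whose values never move across iterations.

    history: one list per imputed dataset, each holding a summary of the imputed
    values (for example their mean) at successive FCS iterations.
    """
    return [
        i for i, chain in enumerate(history)
        if max(chain) - min(chain) <= tol
    ]


history = [
    [21.0, 21.0, 21.0, 21.0],  # flat, like the charts described above
    [19.4, 20.1, 19.8, 20.3],  # moves from iteration to iteration
    [23.0, 23.0, 23.0, 23.0],  # flat, but at a different level than chain 0
]
print(flat_chains(history))  # -> [0, 2]
```

Flat chains sitting at different levels across imputations match the pattern in the question: each imputation settles on one value from the first iteration onward, which is consistent with the idea that the predicted values have little room to vary.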
Thanks, good stuff, but the video zooms in stupidly at times... needs a better video editor.
The observed discrepancy occurred because some cases had missing values on all three variables included in the multiple imputation. The problem was resolved by adding variable(s) on which these cases had values.
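Cases like these are easy to find before running the imputation. A minimal Python sketch, assuming rows are dictionaries with None marking an empty cell; the variable names are placeholders:

```python
def all_missing_cases(rows, columns):
    """Return the indices of rows that are missing (None) on every listed column."""
    return [
        i for i, row in enumerate(rows)
        if all(row.get(col) is None for col in columns)
    ]


rows = [
    {"a": None, "b": None, "c": None, "aux": 2},
    {"a": 1, "b": None, "c": None, "aux": 5},
    {"a": None, "b": None, "c": None, "aux": None},
]
print(all_missing_cases(rows, ["a", "b", "c"]))         # -> [0, 2]
print(all_missing_cases(rows, ["a", "b", "c", "aux"]))  # -> [2]
```

Adding an auxiliary variable on which a case has a value (here `aux`) takes that case off the list, which mirrors the fix described above; a case that is empty on everything, like row 2, would still need attention.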