Here's a fun pet project I've been working on: udreamed.com/. It is a dream analytics app. Here is the RUclips channel where we post a new video almost three times per week: ruclips.net/channel/UCiujxblFduQz8V4xHjMzyzQ Also available on iOS: apps.apple.com/us/app/udreamed/id1054428074 And Android: play.google.com/store/apps/details?id=com.unconsciouscognitioninc.unconsciouscognition&hl=en Check it out! Thanks!
I just wanna say thank you for making these videos. They are short, you keep them to the point, and they're super helpful. When searching for tutorial/help videos on YouTube there's a lot of lesser material, and it's hard to tell the good from the bad. So I'm grateful that you take the time to make these vids. Many thanks!
Thank you once again! I read some suggestions saying that we should run the logistic regression as a linear regression (with the independent variables, both continuous and dummy) and use only the collinearity statistics. So that is what I finally did. I guess you guessed right! Thank you!
Hi James, I'm not sure about this method. According to IBM, this particular diagnostic ignores the DV (I have a web address as a reference, but RUclips won't let me post it here). I've tested this by keeping all my IVs in place and substituting in any random DV, and the values don't change at all. The reason your values are changing is that you're changing the mix of IVs, not the DV.
Dear James Gaskin, I have seen most of your videos on SEM, AMOS, and SPSS on YouTube. They were great and amazing. I learned SEM and AMOS and completed chapter 4 of my PhD just by watching your videos. Books and other materials were so confusing, and your videos were a great help. Thank you so much.
That's interesting. I just tried it as well and I experience the same thing: the IVs are independent of the DV when it comes to the VIF. I think the method will still work however. By removing one at a time and then replacing it, you are sort of "rotating" the set of IVs so that you can get a good picture of the inflation factors. I don't have my Hair et al 2010 book with me, but I bet they have something good to say on multicollinearity.
@belfagor80 Thanks for your feedback. My main goal with these videos (I have several) is not to explain the purpose (which is actually the more important aspect - I agree), but to demonstrate the mechanics. It is my hope that researchers know why they want to do the particular test, and then they just need help with HOW to do it.
@richmondody If you used mean-centered cross products, then yes, you can include these. Theoretically, mean-centering them should remove multicollinearity. Also, you should DEFINITELY mean center before creating cross products.
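A minimal sketch of that centering step in Python/pandas (the predictor names X1 and X2 and their values are invented for illustration, not from the video):

```python
import pandas as pd

# Hypothetical predictors; names and values are made up for this sketch.
df = pd.DataFrame({"X1": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "X2": [2.0, 1.0, 4.0, 3.0, 5.0]})

# Mean-center each predictor BEFORE forming the cross product.
df["X1_c"] = df["X1"] - df["X1"].mean()
df["X2_c"] = df["X2"] - df["X2"].mean()

# Interaction (cross product) built from the centered terms. Centering
# reduces the correlation between the product and its components, which
# is what tempers the multicollinearity.
df["X1xX2"] = df["X1_c"] * df["X2_c"]
print(df[["X1_c", "X2_c", "X1xX2"]].head())
```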
Hi James. I have 70 variables in total, on a 5-point Likert scale. How do I check multicollinearity for all of them? When I do it the way you demonstrated in the video, I get very large VIF values. Kindly help.
@Yooogggeee Do it for ordinal variables just as I've described in this video. For nominal variables you do not need to do it because you should not be including nominal variables in a multiple regression analysis.
Always conduct an EFA unless you are using secondary data. You can clean your data using the data screening videos and wiki pages I've posted. As for combining the data, you'll need to do a Levene's homogeneity of variance test. You should also conduct an EFA separately for each dataset to see if the items factor in the same way. That is further evidence that the two groups are not significantly different at the measurement level. If the EFAs match, then run them together.
@Yooogggeee No. Please do not include categorical variables in a regression equation unless you convert them to a set of dummy variables (but then it gets tricky...). To overcome multicollinearity, remove one of the overlapping variables. It has nothing to do with PCA vs. PAF.
VIF is the best way I think, but you could also just look at the correlation matrix. If you have any correlations above 0.90 then you run the risk of multicollinearity.
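For readers who want to automate that correlation check outside SPSS, here is a small sketch in Python/pandas; the column names and data are synthetic, with x2 deliberately built as a near-copy of x1:

```python
import numpy as np
import pandas as pd

# Synthetic data; names are made up for this sketch.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=200),  # near-duplicate of x1
    "x3": rng.normal(size=200),
})

corr = df.corr().abs()
# List predictor pairs whose absolute correlation exceeds 0.90.
risky = [(a, b, round(corr.loc[a, b], 3))
         for i, a in enumerate(corr.columns)
         for b in corr.columns[i + 1:]
         if corr.loc[a, b] > 0.90]
print(risky)  # the x1/x2 pair should be flagged
```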
No, I don't think there is an issue. Typically VIF tests are done with three or more IVs. If you only have two IVs, then the VIF should actually be 1.000
Not necessarily. It is more likely a tautology issue. This means you are predicting something with itself. For example, if your IV is computer self-efficacy and your DV is confidence in using computers. These are the same essentially, so your R-square will be very high. To determine if these are tautological, do a discriminant validity analysis to see if they are truly distinct constructs.
Hello James, I would like to know about checking collinearity between a continuous and a categorical variable, and also between two categorical variables. Is there a cut-off value for Cohen's d when running a t-test (given the p-value is significant) to check collinearity between one continuous and one categorical variable? And is there any cut-off value for Cramér's V when running a chi-square test (given the p-value is significant) to check collinearity between two categorical variables?
I suppose it is possible, but highly improbable, that you would find collinearity between categorical variables, or between a continuous and a categorical variable. I'm not aware of any cut-offs for Cohen's d for collinearity (I've never used it for that) or for Cramér's V (never used Cramér's V before). Best of luck to you!
I've never heard of an R-Matrix... If you want to do a multicollinearity test, then create Latent Factor Scores for each of the 1st order dimensions of the 2nd order construct. Then do a multicollinearity test as shown in the video. To get Latent Variable Scores, watch my video about handling 2nd order formative constructs in SmartPLS.
So it means the numbers in the VIF column represent how collinear the particular IV (in that row) is with the chosen "DV", and not how collinear the IVs are with each other? Thank you for your help and your time.
I would run them together (but without the others) and see which has the lowest tolerance after switching them out one at a time. Then remove that one if I am not too emotionally attached to it. If so, then maybe remove the one I care least about.
Thanks for the reply, James. What tests for outlier detection should one use in such a scenario? I have a survey for two sets of respondents, in which certain common questions are answered by both sets. One scale has 70 items/variables and the other has 100. The items have been taken from established scales. So do I need to run an EFA separately for both questionnaires? Which tests should I conduct first to clean my data?
I'm not sure I understand the problem. If I might guess though, you are probably using fewer than 3 variables in the regressions. You need at least three.
Hello James, thanks for the video. Somewhere around 1:30 you say that the independent variables in the model are multicollinear with the dependent one. I don't feel I'm in a position to question you, but just to make sure: isn't it that in regression analysis the inputs (IVs) should ideally be uncorrelated with each other, but somehow related to the dependent variable? Meaning the collinearity test should address the overlap (stolen variability) among the independents, not between IV and DV?
Hi James, thank you for the useful video. I have an unclear point about the VIF. I have 5 independent variables: A, B, C, D, E. When I put A into the dependent variable box, the VIF between A and B is X, but when I put B into the dependent variable box, the VIF between B and A is not the same X. Is that correct?
@@Gaskination Could you please tell me which VIF value I should take in this case? I mean, if I need to report VIF values for each IV, which value should I write? Because every time I put an IV into the DV box, the VIF values for the remaining IVs change! I would be so happy if you could tell me what to do in this case. Thank you :)
@@lavieee6354 One approach to solve this is to just use a random variable (i.e., a new variable with random values) as the DV. Then you only need to conduct the test once.
@@Gaskination I appreciate your kind reply ~ in my case I have 4 IV (ABCD) and I have to write in a table the VIF value of each variable, if I use A as a dependent variable and I conduct the test once then I’ll get VIF values of only 3 variables (BCD) so how about A ?
@@lavieee6354 As noted, you could instead generate a new variable with a random set of values (you can do this pretty easily in Excel) and then use that new variable as the DV so that you can use all four IVs in the VIF analysis.
Depends how you interpret the VIFs. Knowing where multicollinearity exists, and in what quantities, is definitely good for a few discussion points (my work is predominantly in geospatial analysis, where it comes up all the time and is fairly unavoidable). Read this; it deflates the whole issue of VIFs and argues against the arbitrary rules people have put on them over the years: O'Brien, R. M. 2007. A Caution Regarding Rules of Thumb for Variance Inflation Factors. Quality & Quantity, 41, 673-690.
Hi James! Thank you so much! It is a very clear video. I have one maybe silly question - if I test my variables of logistic regression using this method, would that also work?
Hi James; I have a second-order formative construct with 16 reflective first-order indicators. Do I have to do the multicollinearity test? If yes, do I have to do it 16 times? What is the difference between VIF and an R-matrix?
Hi James, when you are running the collinearity diagnostics, do you include the outcome variable you are interested in in the model while testing iteratively for collinearity?
Hi Thanks for your interesting explanation. I have a question. Do you need to add the control variables also in as independent variables when you are testing multicollinearity and homoscedasticity? Or do you do these test only with the dependent variable, direct independent variable and for example a moderator?
Two-stage least squares: I think there is a process we have to do beforehand, and the outputs have to be used as instrumental variables. I found what I think is a Japanese video, but I don't know the language.
Hi James and thanks for all these amazing videos. What happens in the case you have only one independent variable and 4 dependent ones in your model? Can/should you still perform this test? Or shall I just perform CFA on AMOS. Thank you again!
Or, to test for multicollinearity in logistic regression, do I look at the standard errors of the coefficients, and if they are greater than 2, take that as an indication of the problem? (If I understood the ppt correctly.)
Hi Mr. James Gaskin. Thank you very much for the very informative tutorials on research. I have one question, please: how is a "lateral collinearity assessment" determined? Thanks.
Sorry for asking again. I read the ppt but still do not understand. My model is: a dummy dependent variable (1, 0), and independent variables that are both dummy and continuous/numerical. My question is: to test for multicollinearity, could I run linear regressions of the continuous variables (as dependent variables) against the rest of the independent variables and get the VIFs? Would that be correct? In that case, how should I treat the independent dummy variables? I am sorry if this is a silly question.
If you google "multicollinearity in logistic regression spss" (without quotes) and then select the third result -- a powerpoint called "Solving Stepwise Logistic Regression Problems.ppt" -- it will tell you how to do it.
Hi, James. I have two IVs and one DV. But during the rotations for testing multicollinearity, when one of the IVs went into the DV position, the resulting VIF was 5.2. The other two scenarios are fine. Do you think there is a multicollinearity problem in my case? Thanks so much!
Hello James, when trying to conduct a hierarchical regression in SPSS, the output continues to provide excluded variables. However, when I run collinearity diagnostics, all of the VIFs are under 2. What else can be causing this to occur?
Dear James, thanks a lot for all your great videos. We have a question about multicollinearity. We tried to test multicollinearity for a formative construct with two indicators: we treated one as the independent variable and one as the dependent, and then the other way round. In both scenarios the VIF turns out to be 1.00. Does this mean that there is no multicollinearity at all (which would be good), or is testing for multicollinearity simply not possible if we only have 2 indicators of a formative construct? Many thanks in advance! Anja
Thank you. I have a multicollinearity problem. The correlation between two IV is .72. However, its VIF is about 2.4. And when I look at its condition indices, its condition index is about 38. In my understanding, the condition index above 30 indicates severe multicollinearity problems. What do you think?
+Mweene Mimba Different thresholds are provided by many different sources. For example, Field (2009) says a VIF of 5 is fine: --Field, A. 2009. Discovering statistics using SPSS: Sage publications. These others also suggest a VIF of 5: --Cenfetelli, R.T., Bassellier, G. (2009). Interpretation of formative measurement in information systems research. MIS Quarterly. 33(4) 689-707. --Diamantopoulos, A., Siguaw, J.A. (2006). Formative versus reflective indicators in organizational measure development: A comparison and empirical illustration. British Journal of Management. 17(2006) 263-282. --Petter, S., Straub, D., Rai, A. (2007). Specifying formative constructs in information systems research. MIS Quarterly. 31(4) 623-656.
Hi James, I'm not sure that this is correct. I don't think you have to do all that clicking. The measure of tolerance for an independent variable X that SPSS gives you already takes into account all the other independent variables in your model. Tolerance is equal to 1 - R^2, where R^2 is the proportional reduction in error in a model that regresses your independent variable X on all the rest of the independent variables. I hope this makes sense. Paz y amor. Adam
I think I need to make a new video for this. This is a video I made after learning from my professor how to do it. But now that I've been around the block a bit more, I can see the weaknesses in the approach. I'll need to make a new video that focuses on tolerance. Thanks for the input!
+James Gaskin Great, but could you be a bit more specific about the tolerance concept? Tolerance = 1 - R^2, and close to 0 is bad, if I am correct. In that case, is there any other way to check this besides repeatedly swapping out my IV X?
+Udayan Goswami Tolerance greater than 0.100 is best, approaching 1.0 is ideal. If you cannot achieve this, then one of your IVs is too similar to another one. You might consider combining them into a single 2nd order factor.
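The tolerance Adam describes above (Tolerance = 1 - R^2, from regressing one IV on all the others) can be reproduced by hand. A sketch in Python/numpy with synthetic data, where x1 is deliberately built to overlap heavily with x2:

```python
import numpy as np

# Synthetic predictors; x1 shares most of its variance with x2.
rng = np.random.default_rng(2)
n = 500
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.9 * x2 + rng.normal(scale=0.4, size=n)

# Regress x1 on the OTHER predictors (with an intercept) to get R^2.
others = np.column_stack([np.ones(n), x2, x3])
beta, *_ = np.linalg.lstsq(others, x1, rcond=None)
resid = x1 - others @ beta
r2 = 1.0 - resid.var() / x1.var()

tolerance = 1.0 - r2   # near 0 signals collinearity
vif = 1.0 / tolerance  # VIF is just the reciprocal of tolerance
print(round(tolerance, 3), round(vif, 3))
```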
Thanks, James Gaskin. Could you kindly guide me: is there any other way to check this besides the process you explained in this video? Do you have any plans to publish a new tutorial video?
Love your videos - they are so helpful. I was just wondering if you can help me. I have 20 independent variables and have run univariable analyses to see which variables I should use to build a multivariable binary logistic model of predictors for my outcome. I am including all variables with a p0.9 - I am not sure what this means!! I found your video and thought I could use the test you demonstrate here... however, your output provides a VIF. Is it the same? Please help!! Thank you :)
Hello Mr. Gaskin, I have a regression model with around 40-50 variables and I want to test for multicollinearity. Of course, if I did this the way you did, I would spend a lot of time clicking variables in and out to check each one. Do you know another method where I can get it immediately for each variable?
Just stick the actual DV in the DV spot and do it once. This will give you a close enough approximation of the VIF to detect multicollinearity. However, a regression with 40-50 exogenous variables seems pretty atheoretical...
Ahmad Usman, if you find this with your factors, you can try an EFA with just the two factors that have high VIF. Check the cross-loadings to see which items should probably be removed in order to increase the distance between those factors.
And what if both of these factors load as one factor in the EFA? Is there anything we can do other than deleting one whole factor from the model? I saw your EFA (SEM Series 2016) video, but you did not encounter this problem there. Cumulative variance is 58.79 percent, which is a case of multicollinearity. Any solution in mind? *This is for PhD research, not for a paper.
Hi James, I am using a 5-point Likert scale for my questionnaire. If I want to check collinearity, do I need to "compute" variables first out of the questionnaire item scores, or do I just use the items directly? (When you say "variable", do you mean a questionnaire item or not?)
+Arif Abdu Multicollinearity is assessed for predictor variables. So, if you have multiple antecedents, then you should test for multicollinearity. To do this, you would first have to collapse them each into a single factor score as shown in this video: ruclips.net/video/dsOS9tQjxW8/видео.html
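The collapsing step can be as simple as averaging each construct's items into one composite score before running the regression. A sketch in Python/pandas (the item names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical Likert items: three for ATTITUDE, two for INTENTION TO BUY.
df = pd.DataFrame({
    "att1": [4, 5, 3, 4], "att2": [5, 5, 2, 4], "att3": [4, 4, 3, 5],
    "int1": [3, 5, 2, 4], "int2": [4, 5, 3, 4],
})

# One composite score per construct: the mean of its items.
df["ATTITUDE"] = df[["att1", "att2", "att3"]].mean(axis=1)
df["INTENTION"] = df[["int1", "int2"]].mean(axis=1)
print(df[["ATTITUDE", "INTENTION"]])
```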
Hello James. When I put my predictor variables into the regression equation one by one, I get quite decent results in terms of explaining the outcome variable. However, when I enter all of them together (five predictor variables in total, btw), one variable's regression coefficient turns out to be negative (which definitely makes no sense at all). I looked into it, and one possible explanation is multicollinearity; however, there were no multicollinearity issues in the data either. Is this Simpson's Paradox? If so, how do I report these results? Thank you so much!
This is quite common in multiple regression. Regression (unlike correlation) accounts for the variance explained by other variables. Therefore, if a predictor has an effect when examined alone, it may still disappear, or even reverse, when examined while accounting for other predictors. I recommend just reporting what you found (just as you said here).
For continuous, it will work the same. For categorical, you will need to break up the categories into dummy variables first, and then do it the same as in this video.
Hi again, how can I do the 2SLS regression model in SPSS? They ask for as many instrumental variables as explanatory variables! Where do I get these from?
I don't know what a 2sls is. As for instrumental variables, maybe they are referring to the variables in your dataset that came from some sort of data collection instrument (such as a survey).
+Kim Rodriguez That might help, but there are some downsides. Here is a good reference: learnitdaily.com/six-ways-to-address-collinearity-in-regression-models/
Dear James, I have poor explained variance on the DVs in my model and I suspect a multicollinearity issue. My DVs are latent, each consisting of 4 or 5 indicators. How can I perform the steps for detecting multicollinearity as you show above, since I am not using a composite model? Moreover, can you kindly advise me on some factors that could cause the explained variance to be very low? Thanks.
I'm not aware of a way to test for multicollinearity with latent variables. The reason you have low R2 is because the IVs simply are not related to the DV very well.
Hi James! I am wondering why, when I test multicollinearity between only two predictors, I get a VIF value of 1. I know that my two predictors are significantly correlated at 0.7, so I did not expect their VIF to equal 1. So, I have two predictors that I want to compare: I entered one as the DV and the other as the IV, and the predictors were correlated at r = 0.7.
In this case, use the actual DV as the dependent variable. VIFs are compared between predictors. So, if there is only one predictor, then it will always give you a value of 1.
VIF < 3: not a problem
VIF > 3: potential problem
VIF > 5: very likely a problem
VIF > 10: definitely a problem
O'Brien, R. M. 2007. A Caution Regarding Rules of Thumb for Variance Inflation Factors. Quality & Quantity, 41, 673-690. (link.springer.com/article/10.1007/s11135-006-9018-6)
Hi James, are there any guidelines on how to troubleshoot a "The following covariance matrix is not positive definite" error in AMOS? The problem occurs when running my measurement model. The error remains even when I delete variables with VIF > 3.
I have a question: how do I check multicollinearity for a questionnaire? The questionnaire consists of 6 independent variables and 1 dependent variable, but each variable has many items, and SPSS treats each item as a single variable. Suppose I have ATTITUDE as an independent variable with 6 items, and INTENTION TO BUY as a dependent variable with 2 items. SPSS now treats this as 6 independent variables and 2 dependent variables. How do I check multicollinearity? In linear regression I can only enter 1 dependent variable!
Sorry if my question is off topic. What if I have 5 latent variables, and 2 of them use a 7-point Likert scale while the other 3 use a 5-point Likert scale? Can I just run a regression?
Hi James, is it possible for the collinearity results on 2 independent variables to be the same, both VIF and tolerance? FYI, I only have 2 independent variables and 1 dependent variable. Thank you :)
Hi James, I was wondering if you know how to do this test with panel data? I really like your method for testing this with "normal" data with only one observation per variable. At the moment I have 17 different variables and I want to check which ones are too correlated (and are pretty much the same) so that I can reduce the number of variables. My problem is that I work with panel data, so I don't know how to handle this in SPSS or even Stata. Can you help me? Kind regards
Hi Mr. Gaskin, so I ran the test and it seems I'm actually dealing with a high level of multicollinearity (up to 6). Now that I know this, what should I do about it? Can I still interpret my regressions correctly?
Cassy Storms Sorry for the delay. I was hosting an SEM Boot Camp last week and then yesterday was my wife’s birthday. Today is my first day back… To answer your question, the less strict threshold is actually 10 for the VIF and less than 0.100 for the tolerance value. So you should be fine.
Hi Mr. James, I would like to ask: if I have 1 dependent variable and 7 independent variables, and two of the IVs turn out to be above 7, is that okay? And we don't have to test the dependent variable for multicollinearity?
7 is high, but not beyond the maximum limit of 10. The DV does not need to be tested for multicollinearity because it can't be multicollinear with anything if it is not predicting anything.
If only two IVs, then no need to test against each other. Instead, create a random variable (a variable consisting of random numbers) and make that the DV.
You can do the same as in this video since the outcome variable is just binary. If it were multinomial, then there would be a problem requiring a different approach.
James Gaskin, sir, in my study there is one binary dependent variable and a couple of independent variables of ordinal, categorical, and continuous nature. What is the best way to test collinearity?
You can include the continuous and ordinal IVs in the regression, with the binary DV, but do not include the categorical variable unless you split it first into dummy variables (binary values to represent each category - except the reference category).
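The dummy-coding step can be sketched in Python/pandas (the category labels and data here are invented; drop_first leaves out the reference category so the dummy set itself is not perfectly collinear):

```python
import pandas as pd

# Hypothetical categorical IV alongside a continuous one.
df = pd.DataFrame({"region": ["north", "south", "east", "north", "east", "south"],
                   "age": [25, 34, 41, 29, 52, 38]})

# drop_first=True omits the reference category ("east" here, alphabetically
# first), which prevents the dummies from summing to a constant column.
dummies = pd.get_dummies(df["region"], prefix="region", drop_first=True, dtype=float)
X = pd.concat([df[["age"]], dummies], axis=1)
print(X.columns.tolist())
```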
Hi, I would like to ask whether it's possible to put all the items of a latent variable (9 items) in as independents and a nominal variable (like sex) in as the dependent, so I can do the multicollinearity test in one pass for each latent variable. Second question: this test isn't for exogenous variables, did I understand you correctly?
Manar Ibraheem For nominal dependent variables, use logistic regression. Unfortunately, there is no VIF calculation in logistic regression. Thus, I suppose you could try it in linear regression just for the VIF, but then switch to logistic for the hypothesis testing... The test is for determining if exogenous variables explain overlapping variance in the dependent variable. If you have multicollinearity problems, then you need to remove one of the exogenous variables with the problem. Or you can list it as a limitation.
Thank you for your amazing tutorial! I'm so happy now that I've confirmed I have no multicollinearity issue. Have a great day! :-)
Thanks for making this available! Excellent work.
+behrang samadi So glad to have been helpful.
Hope you have graduated by now 🙂
Dear James, Thanks for the amazing videos.Indeed they are super helpful and far better than the books , which sometimes are too confusing .Many thanks
Man you've saved my essays more than once thank you
Very interesting and well organized video...Big up!
You are exactly correct. However, the "DV" in this test is simply one of the IVs that is rotated in and out. That is how you do the VIF test.
Thank you for uploading it! :)
Still useful for students. Thank you!
Nicely and clearly explained, thank you.
thank you for the so quick reply!
This is great. Thank you!
Thanks for your instructive video. I was wondering whether dummy variables can also be checked for multicollinearity in this way.
Correct. You must have 2 variables at least in the IV spot.
Not necessarily. It is more likely a tautology issue. This means you are predicting something with itself. For example, if your IV is computer self-efficacy and your DV is confidence in using computers. These are the same essentially, so your R-square will be very high. To determine if these are tautological, do a discriminant validity analysis to see if they are truly distinct constructs.
Hello James, I would like to know about checking collinearity between a continuous and a categorical variable, and also between 2 categorical variables.
Is there a Cut off value for cohens D while running T -test (given P value is significant) : to check collinearity between one independent and one categorical variable?
and is there any Cut off value for Cramers V while running chi square test (given P value is significant): to check collinearity between 2 categorical variables?
I suppose it is possible, but highly improbable that you would find collinearity between categorical variables, or between a continuous and a categorical variable. I'm not aware of any cuttoffs for Cohen's D for collinearity (I've never used it for that) or for Cramers V (never used Cramers V before....). Best of luck to you!
I've never heard of an R-Matrix... If you want to do a multicollinearity test, then create Latent Factor Scores for each of the 1st order dimensions of the 2nd order construct. Then do a multicollinearity test as shown in the video. To get Latent Variable Scores, watch my video about handling 2nd order formative constructs in SmartPLS.
So it means, the numbers in VIF column represent how the particular "IV" (in a row) is collinear/related with (chosen) "DV". Not that the numbers under VIF represents collinearity between each other? Thank you for help and your time to answer.
I would run them together (but without the others) and see which has the lowest tolerance after switching them out one at a time. Then remove that one if I am not too emotionally attached to it. If so, then maybe remove the one I care least about.
Thanks for the reply James. What tests for outlier detection should one go for in such a scenario? I have a survey form for two sets of respondents, in which certain common questions are answered by both sets of respondents. One scale has 70 items/variables and the other one has 100 items/variables. The items have been taken from established scales. So do I need to go for an EFA separately for both sets of questionnaires? What tests should I conduct first to clean my data?
Thank you James
I'm not sure I understand the problem. If I might guess though, you are probably using fewer than 3 variables in the regressions. You need at least three.
Hello James, thanks for the video. Somewhere around 1:30 in the video you say that the independent variables in the model are multicollinear with the dependent one. I don't feel I'm in a position to question you, but just to make sure: isn't it that in regression analysis the inputs (IVs) should be uncorrelated among each other (in the ideal case), but somehow related to the dependent variable? Meaning that the collinearity test should address the overlapping (stolen variability) between the independents (not IV to DV)?
Hi James, thank you for the useful video. I have a point I'm unsure about regarding the VIF. I have 5 independent variables: A, B, C, D, E. When I put A into the dependent variable box, the VIF between A and B is X, but when I put B into the dependent variable box, the VIF between B and A is not the same X. Is that correct?
Yes, that is correct. The VIF is calculated with respect to the other variables in the IV box.
@@Gaskination Could you please tell me in this case which value of VIF I should take? I mean, if I need to report VIF values for each IV, which value should I write? Because every time I put a different IV into the DV box, the VIF values for the remaining IVs change! I would be so happy if you could tell me what to do in this case. Thank you :)
@@lavieee6354 One approach to solve this is to just use a random variable (i.e., a new variable with random values) as the DV. Then you only need to conduct the test once.
@@Gaskination I appreciate your kind reply ~ in my case I have 4 IVs (ABCD) and I have to report the VIF value of each variable in a table. If I use A as the dependent variable and conduct the test once, then I'll get VIF values for only 3 variables (BCD), so what about A?
@@lavieee6354 As noted, you could instead generate a new variable with a random set of values (you can do this pretty easily in Excel) and then use that new variable as the DV so that you can use all four IVs in the VIF analysis.
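To make the random-DV trick concrete: the VIF for each IV depends only on the other IVs, never on whatever sits in the DV slot. Below is a minimal pure-Python sketch (with made-up data, not from SPSS) that computes every IV's VIF in one pass by regressing each IV on the rest:

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

def r_squared(y, predictors):
    """R^2 of regressing y on the predictor columns (intercept included)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    Xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    b = solve(XtX, Xty)
    yhat = [sum(b[p] * X[i][p] for p in range(k)) for i in range(n)]
    ybar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vifs(columns):
    """VIF of each IV: regress it on all the OTHER IVs; VIF = 1 / (1 - R^2).
    Note that the DV never appears anywhere in this calculation."""
    return [1.0 / (1.0 - r_squared(col, columns[:j] + columns[j + 1:]))
            for j, col in enumerate(columns)]

x1 = [1, 2, 3, 4, 5, 6]   # hypothetical IV
x2 = [1, 2, 3, 4, 5, 7]   # nearly a copy of x1 -> collinear with it
x3 = [3, 1, 4, 1, 5, 9]   # carries mostly independent information
v = vifs([x1, x2, x3])    # v[0] and v[1] are huge; v[2] is much smaller
```

Putting a random variable (or the real DV) in SPSS's dependent slot, as suggested above, simply lets SPSS report all of these VIFs in a single regression run.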
It depends how you interpret the VIFs; knowing where multicollinearity exists, and in what quantities, is definitely good for a few discussion points (my work is predominantly geospatial, where it comes up all the time and is fairly much unavoidable).
Read this, it deflates the whole issue of VIFs and argues against the arbitrary rules that people have put on it over the years.
O’brien, R. M. 2007. A Caution Regarding Rules of Thumb for Variance Inflation Factors. Quality & Quantity, 41, 673-690.
Hi James! Thank you so much! It is a very clear video. I have one maybe silly question - if I test my variables of logistic regression using this method, would that also work?
+Alina Pavlova I'm not sure, but I think the rule should still apply.
Thanks James.
If a variable was showing multicollinearity, then you would consider dropping it or one of the variables it was overlapping.
Hi James; I have a second-order formative construct with 16 reflective first-order indicators. Do I have to do the multicollinearity test? If yes, do I have to do it 16 times? What is the difference between VIF and R-Matrix?
Thanks a lot!!!
Hi James, when you are running the collinearity diagnostics, do you include the outcome variable that you are interested in into the model whilst testing iteratively for collinearity?
Thanks!
Hi, thanks for your interesting explanation. I have a question: do you need to add the control variables as independent variables when you are testing multicollinearity and homoscedasticity? Or do you run these tests only with the dependent variable, the direct independent variables, and, for example, a moderator?
It is good practice to include the control variables.
For two-stage least squares, I think there is a process you have to do beforehand, and the outputs are then used as instrumental variables. I found a video, I think in Japanese, but I don't know the language.
Hi James, and thanks for all these amazing videos. What happens in the case where you have only one independent variable and 4 dependent ones in your model? Can/should you still perform this test? Or shall I just perform CFA in AMOS? Thank you again!
+Marios Pournaris No need to do this test if you have only one IV.
Or, to test for multicollinearity in logistic regression, do I look at the standard errors of the coefficients, and if they are greater than 2, is that an indication of such a problem? (If I understood the ppt correctly.)
Hi Mr. James Gaskin. Thank you very much for your very informative tutorials on research. I have one question, please: how is a "lateral collinearity assessment" determined? Thanks.
I'm not sure, but here is an article that talks all about it: aisel.aisnet.org/cgi/viewcontent.cgi?article=1615&context=jais
Sorry for asking again. I read the ppt, but still do not understand. My model is: the dependent variable is a dummy (1, 0), and the independent variables are both dummy and continuous/numerical variables. My question is: to test for multicollinearity, could I run linear regressions of the continuous variables (as dependent variables) against the rest of the independent variables and get the VIFs?
Would that be correct? In that case, how should I treat the independent dummy variables? I am sorry if this is a silly question.
If you google "multicollinearity in logistic regression spss" (without quotes) and then select the third result -- a powerpoint called "Solving Stepwise Logistic Regression Problems.ppt" -- it will tell you how to do it.
Thank you!
Hi, James. I have two IVs and one DV. But during the rotations of testing multicollinearity, when one of the IVs went into the position of the DV, the resulting VIF was 5.2. The other two scenarios are fine. Do you think there is a multicollinearity problem in my case? Thanks so much!
Thank you :)
Hello James, when trying to conduct a hierarchical regression in SPSS, the output continues to provide excluded variables. However, when I run collinearity diagnostics, all of the VIFs are under 2. What else can be causing this to occur?
Dear James,
thanks a lot for all your great videos.
We have a question about multicollinearity. We tried to test multicollinearity for a formative construct with two indicators. To test it, we treated one as the independent variable and one as the dependent, and then the other way round. In both scenarios the VIF turns out to be 1.00. Does this mean that there is no multicollinearity at all (so this is good), or is testing for multicollinearity simply not possible if we only have 2 indicators of a formative construct?
Many thanks in advance!
Anja
Use the DV in the dependent slot in SPSS instead. This way you have both IVs in the independent slots. This will show multicollinearity.
How do I construct a better model and interpret the output based on the outcome of the multicollinearity test?
Does multicollinearity need to be tested in Cox regression? I have categorical independent variables. Thanks.
I'm not sure. You can test it and see if it is a problem. My guess is that it should be fine with categorical IVs.
Hey guys, you can also check it with the tolerance value. If it's around 0.2 or less, you have multicollinearity. That's all.
Fernando Alvarado Blohm Thanks!
how do i test for multicollinearity in logistic regression? Is there a test similar to VIF in SPSS?
Thank you. I have a multicollinearity problem. The correlation between two IVs is .72. However, the VIF is about 2.4. And when I look at the condition indices, the condition index is about 38. In my understanding, a condition index above 30 indicates severe multicollinearity problems. What do you think?
+KYUNG EUN JAHNG I've never heard of the condition index. So, I'm not sure. Usually a VIF less than 3.0 is fine.
+Mweene Mimba Different thresholds are provided by many different sources. For example, Field (2009) says a VIF of 5 is fine:
--Field, A. 2009. Discovering statistics using SPSS: Sage publications.
These others also suggest a VIF of 5:
--Cenfetelli, R.T., Bassellier, G. (2009). Interpretation of formative measurement in information systems research. MIS Quarterly. 33(4) 689-707.
--Diamantopoulos, A., Siguaw, J.A. (2006). Formative versus reflective indicators in organizational measure development: A comparison and empirical illustration. British Journal of Management. 17(2006) 263-282.
--Petter, S., Straub, D., Rai, A. (2007). Specifying formative constructs in information systems research. MIS Quarterly. 31(4) 623-656.
Hi James, I'm not sure that this is correct. I don't think that you have to do all that clicking. The measure of tolerance for an independent variable X that SPSS gives you already takes into account all the other independent variables in your model. Tolerance is equal to 1 - R^2, where R^2 is the proportional reduction in error in a model which regresses your independent variable X on all the rest of the independent variables. I hope this makes sense. Peace and love. Adam
I think I need to make a new video for this. This is a video I made after learning from my professor how to do it. But now that I've been around the block a bit more, I can see the weaknesses in the approach. I'll need to make a new video that focuses on tolerance. Thanks for the input!
I wait the new video
+James Gaskin Great, but could you be a bit more specific about the tolerance concept? Tolerance = 1 - R^2, and values close to 0 are bad, if I am correct. In that case, is there any other way to check this other than repeatedly swapping out my IV, i.e., X?
+Udayan Goswami Tolerance greater than 0.100 is best, approaching 1.0 is ideal. If you cannot achieve this, then one of your IVs is too similar to another one. You might consider combining them into a single 2nd order factor.
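For reference, tolerance and VIF are just reciprocal views of the same regression: tolerance = 1 - R^2 of one IV regressed on the others, and VIF = 1 / tolerance. A tiny illustration with a hypothetical R^2 value:

```python
r2 = 0.80             # hypothetical R^2 of one IV regressed on the rest
tolerance = 1.0 - r2  # 0.20 -- a commonly cited warning level
vif = 1.0 / tolerance # 5.0 -- so tolerance below 0.2 and VIF above 5 flag the same IVs
```

This is why a tolerance approaching 1.0 (R^2 near 0, VIF near 1) is ideal: it means the other IVs explain almost none of that IV's variance.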
Thanks, James Gaskin. Could you kindly guide me: is there any other way to check this other than the process you explained in this video? Is there any plan to publish a new tutorial video?
Love your videos - they are so helpful. I was just wondering if you can help me. I have 20 independent variables and have run univariable analysis to see which variables I should use to build a multivariable binary logistic model of predictors for my outcome. I included all variables with a p0.9 - I am not sure what this means! I found your video and thought I could use the test you demonstrate here... however your output provides a VIF. Is it the same? Please help! Thank you :)
Hello Mr. Gaskin,
I have a regression model with around 40-50 variables. I want to test for multicollinearity. Of course, if I did this the way you did, I would spend a lot of time clicking variables in and out to check every one. Do you know another method where I can get it immediately for each variable?
Just stick the actual DV in the DV spot and do it once. This will give you a close enough approximation of the VIF to detect multicollinearity. However, a regression with 40-50 exogenous variables seems pretty atheoretical...
Is it multicollinearity because more than one IV exceeds the threshold? If it was only one IV with a VIF above 3, would it then be collinearity?
The name means that multiple variables share the same linear path.
This is very useful for detecting it; can you please explain how to address the issue using SPSS or AMOS? What do we do if there is such an issue?
Ahmad Usman if you find this with your factors, you can try to do an EFA with just the two factors with high VIF. Check the cross loadings to see which items should probably be removed in order to increase the distance between those factors
And what if both of these factors were found to be one factor in the EFA? Is there anything we can do instead of just deleting one whole factor from the model? I saw your EFA (SEM series 2016) video but you did not encounter this problem there. Cumulative variance is 58.79 percent, which is a case of multicollinearity. Any solution in mind?
*This is for PhD Research and not for Paper.
Try to force it to two factors (in the extraction menu, choose the other option, not eigenvalues). Hopefully this is for research and a paper.
Hi James,
I am using a 5-point Likert scale for my questionnaire. If I want to check collinearity, do I need to "compute" variables first out of the questionnaire item scores, or do I just use these items directly? (When you say variable, do you mean a questionnaire item or not?)
+Arif Abdu Multicollinearity is assessed for predictor variables. So, if you have multiple antecedents, then you should test for multicollinearity. To do this, you would first have to collapse them each into a single factor score as shown in this video: ruclips.net/video/dsOS9tQjxW8/видео.html
Hello James. When I put my predictor variables into the regression equation (one by one), I got quite decent results in terms of explaining the outcome variable. However, when I enter all of them together (five predictor variables in total, btw), one of the variables' regression coefficients turns out to be negative (which definitely makes no sense at all).
I've checked, and what I found is that there might be a multicollinearity problem. However, there were no multicollinearity issues with the data either. Is that Simpson's paradox? If so, how do I report those results? Thank you so much!
This is quite common in multiple regression. Regression (unlike correlation) accounts for the variance explained by other variables. Therefore, if a predictor has an effect when examined alone, it may still disappear, or even reverse, when examined while accounting for other predictors. I recommend just reporting what you found (just as you said here).
How should this be done in the case of categorical and continuous variables?
For continuous, it will work the same. For categorical, you will need to break up the categories into dummy variables first, and then do it the same as in this video.
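A sketch of that dummy-coding step (with a hypothetical 'region' variable; any category can serve as the reference, and k categories yield k-1 dummy columns):

```python
region = ["north", "south", "east", "south", "north", "east"]
reference = "north"  # the omitted reference category
# Build one 0/1 column per non-reference category
dummies = {cat: [1 if r == cat else 0 for r in region]
           for cat in sorted(set(region)) if cat != reference}
```

The resulting 0/1 columns then enter the regression (and the VIF check) like any other IV.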
Hi again, how can I run a 2SLS regression model in SPSS? It asks for as many instrumental variables as explanatory variables! Where do I get these from?
I don't know what a 2sls is. As for instrumental variables, maybe they are referring to the variables in your dataset that came from some sort of data collection instrument (such as a survey).
Would I be able to run a stepwise regression method in the linear regression analysis to reduce collinearity?
+Kim Rodriguez That might help, but there are some downsides. Here is a good reference: learnitdaily.com/six-ways-to-address-collinearity-in-regression-models/
No, just the predictors of the outcome variable.
Hi James, is it true that when you have a large R-square in your model, like 84%, you likely have a collinearity problem?
Dear James,
I have poor explained variance on the DVs in my model and I suspect a multicollinearity issue. My DVs are latent, each consisting of 4 or 5 indicators. How can I perform the steps for detecting multicollinearity as you show above, since I am not using a composite model?
Moreover, can you kindly advise me on some factors that could cause the explained variance to be very low?
Thanks
I'm not aware of a way to test for multicollinearity with latent variables. The reason you have low R2 is because the IVs simply are not related to the DV very well.
Hi James!
I am wondering, why is it that when I am testing multicollinearity between only two predictors, I get a VIF value of 1? I know that my two predictors are significantly correlated at 0.7, so I do not expect their VIF value to be equal to 1.
So, I have two predictors that I want to compare; I entered one as the DV and the other as the IV. The predictors were correlated at r = 0.7.
In this case, use the actual DV as the dependent variable. VIFs are compared between predictors. So, if there is only one predictor, then it will always give you a value of 1.
So high VIF values indicate multicollinearity? And that is of course bad and what we wanna avoid? Correct?
correct
what do you mean by "IVs" for a particular model??
IV stands for Independent Variables (the variables doing the predicting).
what is the VIF cutoff point suggested by SPSS?
VIF < 3: not a problem
VIF > 3: potential problem
VIF > 5: very likely a problem
VIF > 10: definitely a problem
O’brien, R. M. 2007. A Caution Regarding Rules of Thumb for Variance Inflation Factors. Quality & Quantity, 41, 673-690. (link.springer.com/article/10.1007/s11135-006-9018-6)
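Those rules of thumb could be sketched as a simple lookup (a hypothetical helper, not from SPSS; and note that O'Brien's point in the cited paper is precisely that such cut-offs should not be applied mechanically):

```python
def interpret_vif(vif):
    """Map a VIF value to the rule-of-thumb labels listed above."""
    if vif > 10:
        return "definitely a problem"
    if vif > 5:
        return "very likely a problem"
    if vif > 3:
        return "potential problem"
    return "not a problem"
```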
Hi James,
Are there any guidelines on how to troubleshoot a "The following covariance matrix is not positive definite" error in AMOS? The problem occurs when running my measurement model. This error remains even when I delete variables with VIF > 3.
Check the notes for the model. It might give you some clues. Some of these tips might also help: ruclips.net/video/B7YOv7hSohY/видео.html
Multicollinearity is only an issue with formative variables or with endogenous variables. So, don't do it for reflective latent variables.
Hi why didn't you use your Collinearity Diagnostic table when you were running MR?
Because the VIF gives me the information I need. However, you can use the other table as well.
thank you.
I have a question: how do I check multicollinearity in a questionnaire? The questionnaire consists of 6 independent variables and 1 dependent variable, but there are many items for each variable, and SPSS treats each item as a single variable. Suppose I have ATTITUDE as an independent variable with 6 items, and INTENTION TO BUY as a dependent variable with 2 items; SPSS now treats these as 6 independent variables and 2 dependent variables. How do I check multicollinearity? In linear regression I can only enter 1 dependent variable!
jehanzeb aslam you would have to use composite scores or averages for each set of items.
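A sketch of that composite step (with hypothetical responses; the mean of each respondent's ATTITUDE items becomes the single ATTITUDE variable that enters the regression):

```python
# Each inner list: one respondent's answers to the 6 ATTITUDE items (5-point scale)
attitude_items = [
    [4, 5, 4, 3, 4, 5],
    [2, 2, 3, 2, 1, 2],
    [5, 4, 4, 5, 5, 4],
]
# One averaged score per respondent
attitude = [sum(resp) / len(resp) for resp in attitude_items]
# Repeat for INTENTION TO BUY; the regression then sees one column per construct.
```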
Sorry if my question is off topic, but what if I have 5 latent variables and 2 of them use a 7-point Likert scale while the other 3 use a 5-point Likert scale? Can I just run a regression?
+shinx2ran The difference in scales is not a big deal.
Great
Hi James, is it possible for the collinearity results on 2 independent variables to be the same, both VIF and tolerance? FYI, I only have 2 independent variables and 1 dependent variable.
Thank you :)
Desi Rosita Yes. it is possible.
Hi James, I was wondering if you know how to do this test with panel data? I really like your method for testing this with "normal" data with only one observation per variable.
At this moment, I have 17 different variables and I want to check which ones are too correlated (and are pretty much the same) so that I can reduce the number of variables. But my problem is that I work with panel data, so I don't know how I can handle this in SPSS or even Stata. Can you help me?
Kind regards
+Ophélie Anné Sorry, I've never worked with panel data before...
My name is Riky. I want to ask you: what if there are mediating variables? Please teach me step by step, thanks.
Here is a video that may help: ruclips.net/video/ICnh3s2FG14/видео.html
Yes
Hi Mr. Gaskin,
So I ran the test and it seems I'm actually dealing with a high level of multicollinearity (up to 6). Now that I know this, what should I do about it? Can I still interpret my regressions correctly?
Cassy Storms Sorry for the delay. I was hosting an SEM Boot Camp last week and then yesterday was my wife’s birthday. Today is my first day back… To answer your question, the less strict threshold is actually 10 for the VIF and less than 0.100 for the tolerance value. So you should be fine.
Hi Mr. James, I would like to ask: if I have 1 dependent variable and 7 independent variables, and two of the IVs turn out to be above 7, is that okay? And do we not have to test the dependent variable for multicollinearity?
7 is high, but not beyond the maximum limit of 10. The DV does not need to be tested for multicollinearity because it can't be multicollinear with anything if it is not predicting anything.
James Gaskin Thank you very much! Have a nice day.
If I only have 2 IVs, do I have to do the test twice?
If only two IVs, then no need to test against each other. Instead, create a random variable (a variable consisting of random numbers) and make that the DV.
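Generating such a random-number column is easy outside SPSS as well (a sketch; inside SPSS, I believe a COMPUTE command with a uniform random-value function would do the same thing, but verify against your SPSS version's documentation):

```python
import random

random.seed(0)    # fixed seed so the column is reproducible
n = 150           # hypothetical sample size -- match your own data set
random_dv = [random.random() for _ in range(n)]
# Paste this column into the data set and use it as the DV; since VIFs
# ignore the DV, one regression run then reports a VIF for every IV.
```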
yes.
How do I check collinearity in the case of binary logistic regression?
You can do the same as in this video since the outcome variable is just binary. If it were multinomial, then there would be a problem requiring a different approach.
James Gaskin sir, in my study there is one binary dependent variable and a couple of independent variables of ordinal, categorical, and continuous nature... what is the best way to test collinearity?
You can include the continuous and ordinal IVs in the regression, with the binary DV, but do not include the categorical variable unless you split it first into dummy variables (binary values to represent each category - except the reference category).
James Gaskin sir thanks a lot!
Is there a reference for this method?
Probably any statistics book will suffice. Hair et al 2010 (or any newer edition) talks about VIFs to test multicollinearity.
Q: Is multicollinearity only a time-series problem, or a cross-sectional one as well? Thank you.
cross-sectional as well. It is any time the IVs might be explaining overlapping variance in the DV.
Hi James, great videos. Thank you. Does the VIF show the correlation between each IV, OR each IV's correlation with the DV?
Can you also do this for multiple moderator variables?
for interactions, yes, but not for multigroup.
Hi, I would like to ask if it's possible to put all the items of the latent variable (9 items) in as independents and a nominal variable (like sex) in as the dependent, so I can do the multicollinearity test in one pass for each latent variable. Second question: this test isn't for exogenous variables; did I understand you correctly?
And how do we solve the multicollinearity problem?
Manar Ibraheem For nominal dependent variables, use logistic regression. Unfortunately, there is no VIF calculation in logistic regression. Thus, I suppose you could try it in linear regression just for the VIF, but then switch to logistic for the hypothesis testing...
The test is for determining if exogenous variables explain overlapping variance in the dependent variable.
If you have multicollinearity problems, then you need to remove one of the exogenous variables with the problem. Or you can list it as a limitation.
James Gaskin thank you
James Gaskin I did the test. The VIF is always about 3, but the tolerance is 0.5 or above for all the variables!
I love you
I'm not sure. I am not very familiar with hierarchical regression... good luck. sorry to not be more help.