Don't know how I would show gratitude for this kind of great work you have done. Take a prostrated bow, Sir.
Best practical video for PCA, Millions of thanks!
This is insane! Feels like going to class with a great lecturer. Thank you so much! Helped me with my final year project!
This video helped me get through my stats hw. Thank you so much for your time in crafting it I'm sure it took a long time!
I don't know why, but I am loving your interpretation
Very good and clear video explaining how to run a PCA analysis! Thank you very much!!
Wow... the greatest lecture I have ever found on YouTube. Thank you so much, sir.
The best video. thank you very much and love from sri lanka!!!
Thank you so much for this helpful video ✨
That was a great video!! learnt and understood a lot. Thanks!
Thank you so much for this useful video. I have a question. After doing the final step and creating the factors, for one of the subjects that I have, all factors become zero. Could you explain why this happens? Thank you.
Thank you very much for this video. It helped me a lot!
Thank you for this informative video. 😇😇
Love your video man...
Thank you for this amazing explanation. Cheers.
Hello, may I know the variables exactly? And what is the data about?
Thanks for your video. It was very helpful.
Sir, I did the same, but unfortunately the KMO box didn't appear in the results! What could be the reason, even though I clicked on KMO?
Where are the principal components in this video?
Very helpful. Great detail!
Hi Mike, what is the best way to decide on which rotation method to use on my data?
Is it important to show a 95% confidence ellipse in PCA, and if so, why? If my data does not produce one, what should I do? Can I use a PCA score plot without the 95% confidence ellipse?
What should we do when our independent variables are on mixed scales? For example, I have Likert-scale responses and also discrete (2- or 3-point scale) responses. Is there any way to perform PCA/EFA on such mixed data?
Hello Mike, great video! A small question. If we perform an EFA using principal axis factoring and the extracted components (4 of them) account for 56% of the total variance, is it advisable to retain those 4?
Does the concept of cross-loading exist in PCA like it does in EFA? If it does, what are the criteria for determining cross-loading?
Hi there. When you are performing PCA or EFA and look into the tables containing component/factor loadings, you will see manifest variables have loadings on all retained components/factors. Since most of the loadings for a manifest variable will be non-zero across components/factors, the loadings can all be considered cross-loadings. The issue you are speaking to is the question of which component/factor a manifest variable should be used to define. To decide that, you should be concerned not only with meeting a minimum loading criterion for definition, but also with whether the difference in the magnitude of cross-loadings across components/factors is sufficiently great that it makes sense to treat the manifest variable as an indicator of one component/factor as opposed to others. I actually cover this topic in a more recent presentation at ruclips.net/video/6Ycw3_iQHD0/видео.html. Also, the Powerpoint associated with that video can be downloaded at drive.google.com/file/d/16-arUCU95wrU2G-aAykmiVbMWAzFEsJF/view . I believe in the example, I use a loading differential of .30 as a partial basis in addressing the issue of dealing with cross-loadings. I hope this helps! (By the way, my use of 'component/factor' in the above comment is not meant to indicate they are the same. There are mathematical differences between them. But I used that for shorthand since I didn't want to keep saying 'component or factor' over and over again.) Cheers!
@@mikecrowson2462 I truly appreciate your comment and I will definitely check out those materials. To summarize, if there is a differential of less than .30, let us say .29, then it would be considered a cross-loading for PCA, other things being equal, correct?
@@hectorponce2012 If you are using a .30 difference threshold as a basis for deciding whether an item exhibits non-trivial cross-loadings on multiple factors, then any number less (such as .29) would be considered an indication of no real issue. Of course, the use of cutpoints has its own limitations.
So when deciding on whether an item is defining one factor and not others, you should really be thinking about loadings in two ways: (a) is there a substantial enough loading (e.g., .40 or >) on a given factor that it is appropriate to treat it as an indicator of that factor? And (b) is there a large enough difference between the loading on the factor the item is most closely associated with and the loadings on the remaining factors to justify a claim that the item is best treated as an indicator of one factor and not others (even though it will likely have non-zero loadings on the other factors)? Hope this helps!
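A minimal Python sketch of the two-part check described above, for readers who want to apply it to a loading table. The function name, the example loadings, and the reading of the .30 differential as the minimum gap between an item's largest and second-largest absolute loading are all assumptions made for illustration; they are not taken from the video or the Powerpoint.

```python
import numpy as np

def assign_items(loadings, min_loading=0.40, min_gap=0.30):
    """For each item (row), return the index of the factor it defines, or None.

    Hypothetical helper: the cutoffs are the example values mentioned in the
    thread, and how the .30 gap is applied is an assumption.
    """
    abs_l = np.abs(np.asarray(loadings, dtype=float))
    assignments = []
    for row in abs_l:
        order = np.argsort(row)[::-1]            # factors sorted by |loading|, largest first
        primary = row[order[0]]
        runner_up = row[order[1]] if row.size > 1 else 0.0
        meets_minimum = primary >= min_loading            # criterion (a)
        clear_gap = (primary - runner_up) >= min_gap      # criterion (b)
        assignments.append(int(order[0]) if (meets_minimum and clear_gap) else None)
    return assignments

# Made-up loadings for four items on two factors (not from any real data set)
example = [[0.72, 0.10],
           [0.55, 0.41],
           [0.15, -0.68],
           [0.35, 0.05]]
result = assign_items(example)
print(result)
```

Items that come back as None either fail the minimum loading or load too similarly on two factors; as noted above, cutpoints like these have their own limitations and should be treated as a screening aid rather than a rule.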
@@mikecrowson2462 Thanks again! One more thing, what book or article should I cite to explain cross loadings in PCA?
@@hectorponce2012 Check out the Powerpoint associated with the video I posted links to in my earlier reply. You should see a reference in there.
Best wishes!
Hi Mike! Can I give my questionnaire to my experimental and control groups even BEFORE conducting EFA?
Due to lack of time to learn and run EFA, the questionnaire I gave to my experimental and control groups is not yet reduced and NO identified factors are determined as of yet.
My plan is to conduct EFA later on. When reduced items are already available, that's the time that I will be selecting appropriate responses from my study groups for my analysis. Please comment. Thank you
I don't know how to set up the data for the KMO and PCA tests. Can anyone help?
Hi there, Munib. This is a somewhat older video on PCA. I have a newer one where I talk about KMO here: ruclips.net/video/6Ycw3_iQHD0/видео.html
Thank you very much, from Salvador, Bahia, here in Brazil!
like the loading plot easy explanation :-)
I have a large number of scale items and need to reduce them to factors in order to test my hypothesis, so I used factor analysis. It suggested 3 components; however, when I ran a reliability test on one of them, the value was very low (0.4). That is understandable since it only has 2 items, but it means I can't use it. I decided to do what you did and limit the solution to 2 components. In the component transformation matrix, the first component has a value of 0.682 with the second (correlated, I think, is the word?), the second has the negative of that (-0.682) with the first, and each component with itself (1 with 1 and 2 with 2) is 0.731. Are these values acceptable, and can I continue using 2 components? If yes, how should I interpret the component transformation matrix (the one whose values I just mentioned) and the total variance explained table (since it shows 3 components, not 2) in my thesis?
Hi, thank you for your video. I have a question, concerning the extraction of variables and observations on the same plane! How can I do this please?
Hi. I'm a bit unclear about exactly what you are asking. By the way, you might consider viewing my most recent video on PCA in SPSS (you can also download a Powerpoint underneath the video description): ruclips.net/video/6Ycw3_iQHD0/видео.html
I have another video on exploratory factor analysis here too: ruclips.net/video/M6FUT0h-bhY/видео.html
I hope you find these helpful and they clarify things for you. Best wishes.
Thank you for the very informative video, Sir. I have a question. Taking this video as an example, the results suggest that 2 PCs are significant (eigenvalues greater than 1), and this creates two variables in the data sheet. How can I transform these two components into one variable?
Yes bro, this is an important question. I am also looking for the answer!!
@@barhom11eriqat98 Please see the articles listed below:
Malik, A. H., bin Md Isa, A. H., bin Jais, M., Rehman, A. U., & Khan, M. A. (2021). Financial stability of Asian Nations: Governance quality and financial inclusion. Borsa Istanbul Review.
Malik, A. H., Jais, M. B., Isa, A. H. M., & Rehman, A. U. (2022). Role of social sustainability for financial inclusion and stability among Asian countries. International Journal of Social Economics.
Thank you, but I don't understand the difference between factor analysis and principal component analysis. You use varimax here.
Thanks for your question. You have the same question that perplexes many folks learning about these techniques! In the past (and I dare say even today), these two procedures have been used for the same purpose - that is, exploratory factor analysis (EFA). A number of authors (e.g., Russell, 2002) have argued against the use of PCA for performing EFA for mathematical, conceptual, and interpretative reasons. EFA is based on the common factor model and the analyst attempts to explain only the associations among a set of measured variables - factoring out the unique variance (i.e., specific variance) associated with those variables. This means that with EFA, the analyst is attempting to explain the covariation (not the total variation) by identifying factors that produce that association. PCA centers on breaking down the total variation in a set of measured variables and does not make any demarcation between the variation that is shared and unshared among a set of measured variables. So you can say that EFA is aimed at explaining covariation among a set of variables, whereas PCA is aimed at breaking down the total variation. The common factor model assumes that unobserved factors "produce" the observed covariation among a set of measured variables. PCA does not make this assumption, and is instead used for creating new linear composites of the original variables. Although the two approaches look and behave similarly, their assumptions and aims are quite different. Probably what makes this more confusing is the fact that PCA is sometimes used as a preliminary step (using scree plot or parallel analysis) when attempting to identify the number of factors. I could keep going but this is getting long.
So, think of it this way: EFA generally is used to explore underlying factors, whereas PCA (as a stand-alone procedure) would be oriented towards data reduction (i.e., reducing a larger number of variables down to a smaller set, which can be useful for addressing multicollinearity when using other procedures). I have a couple of other videos on PCA and EFA with additional learning materials you might check out at ruclips.net/video/6Ycw3_iQHD0/видео.html (PCA) and ruclips.net/video/M6FUT0h-bhY/видео.html (EFA using principal axis factoring). There are also references in the videos. [One other thing: varimax rotation is a strategy for increasing interpretability of factors in EFA or components in PCA. It has nothing to do with the choice of carrying out EFA or PCA, though.] I hope this helps. Cheers!
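The total-versus-shared variance distinction above can be seen numerically. The following is a rough Python sketch on simulated data, not the SPSS procedure from the video: it assumes the correlation matrix is the thing being factored, and it shows only the starting point of principal axis factoring (squared multiple correlations placed on the diagonal), whereas a real principal axis routine would go on to iterate the communality estimates. All variable names and the simulated loadings are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): 6 variables driven by 2 latent factors plus unique noise
n = 500
factors = rng.normal(size=(n, 2))
pattern = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
X = factors @ pattern.T + rng.normal(scale=0.6, size=(n, 6))

R = np.corrcoef(X, rowvar=False)    # correlation matrix with 1s on the diagonal

# PCA: decompose R as-is, so the eigenvalues partition the TOTAL variance
pca_eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

# Principal axis factoring, first step only: put initial communality estimates
# (squared multiple correlations) on the diagonal so only SHARED variance is analyzed
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
paf_eigvals = np.sort(np.linalg.eigvalsh(R_reduced))[::-1]

print("PCA eigenvalues:", np.round(pca_eigvals, 2))
print("PAF eigenvalues:", np.round(paf_eigvals, 2))
print("Sum of PCA eigenvalues:", round(float(pca_eigvals.sum()), 2))  # trace of R = number of variables
print("Sum of PAF eigenvalues:", round(float(paf_eigvals.sum()), 2))  # trace of reduced R = sum of SMCs
```

The PCA eigenvalues always sum to the number of variables, while the reduced-matrix eigenvalues sum to something smaller, because the unique variance has been set aside before extraction.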
@@mikecrowson2462 Thank you very much for the fast answer. So, as I understand it, the main (maybe the only) difference is using principal components versus principal axis, isn't it? Secondly, according to my research, social science usually uses the correlation matrix and chemistry uses the covariance matrix, is that right?
@@mikecrowson2462 Thank you so much!
Hi there. I'm not clear what you are asking on the first question. On the second, I don't know about disciplinary differences in the use of correlation versus covariance matrix. I'm a social scientist and have always relied on the correlation matrix when factoring. However, I don't know about other disciplines such as chemistry. Best wishes.
@@mightyhelen7553 Thanks for visiting! I'm glad you found this useful :)
Great stuff, thanks! :)
The parallel analysis engine website no longer works... :'(
I used it today. It works
You can also use Brian O'Connor's syntax to carry out parallel analysis: see ruclips.net/video/xRsiMQ1CLfI/видео.html . Under the video description is a link to the page where the syntax can be found.
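For anyone without access to either tool, the idea behind parallel analysis can be sketched in a few lines of Python. This is not Brian O'Connor's syntax and not the parallel analysis engine website; it is a simplified illustration that assumes normally distributed random comparison data, PCA eigenvalues of the correlation matrix, and a 95th percentile cutoff. The function name, its parameters, and the simulated data are hypothetical.

```python
import numpy as np

def parallel_analysis(X, n_sims=200, percentile=95, seed=0):
    """Count leading components whose eigenvalues exceed those of random data."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)

    # Eigenvalues of the observed correlation matrix, largest first
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

    # Eigenvalues from random normal data sets of the same shape
    random_eigs = np.empty((n_sims, p))
    for i in range(n_sims):
        Z = rng.normal(size=(n, p))
        random_eigs[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)

    # Retain components up to (not including) the first one that fails the comparison
    exceeds = observed > threshold
    n_retain = p if exceeds.all() else int(np.argmin(exceeds))
    return n_retain, observed, threshold

# Simulated example (made-up data): 6 variables built from 2 underlying factors
rng = np.random.default_rng(1)
scores = rng.normal(size=(500, 2))
pattern = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
data = scores @ pattern.T + rng.normal(scale=0.6, size=(500, 6))

n_retain, observed, threshold = parallel_analysis(data)
print("Components to retain:", n_retain)
print("Observed eigenvalues:", np.round(observed, 2))
print("Random-data thresholds:", np.round(threshold, 2))
```

Dedicated implementations offer more options (for example, permuting the raw data instead of simulating normal data), so treat this only as a way to see what the comparison is doing.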
HI EVERYONE, I HAVE JUST UPLOADED (OCTOBER 2019) A NEW VIDEO ON THIS TOPIC ALONG WITH ADDITIONAL INSTRUCTIONAL MATERIALS (THAT CAN BE DOWNLOADED) AT: ruclips.net/video/6Ycw3_iQHD0/видео.html . Please be sure to check it out!
can i get the data files
Hello. Links to the data can be found under the video description.
great
Poor audio
"people just click buttons and don't know what they mean" Sir, I am offended, I have a systematic pattern for my button clicking.
Hi Rachel, to be honest I can't recall that comment as this video was made long ago. Nevertheless, no offense was intended. That said, please allow me to briefly elaborate on the point I believe I was trying to make (albeit inarticulately):
What I've come to realize over the years is that while software programs like SPSS do make things easier for folks by prepackaging options and canned output, there is a downside. And that is that all those pre-packaged options and buttons do not encourage users to know why certain options should be preferred or utilized. And sometimes we learn patterns that give us output without really knowing the limitations of the output that is generated or perhaps how to maximize the information that we do get. This is not a slight on you or anyone who chooses these options. Heck, when I was learning to use SPSS, I learned to rely on patterns and program defaults myself. But the more I have read and thought about what buttons I was pushing, the more I came to realize that the trade-off for ease-of-use can sometimes be a kind of statistical tunnel vision because program defaults and options only convey the program developer's narrow view of what should be 'in' and what should be left 'out'. [And to be honest, SPSS has seemed to be slow over the years in expanding their options to support best practices based on the most recent literature.] All this is to say that statistics is a way of thinking systematically about data, and all programs offer a range of options that may or may not be best for accomplishing your aims as a researcher for a given circumstance and for your particular data. My hope is that you, me, and all users of the various statistical programs out there don't simply take what is given to us at face-value, and do not assume that what is given represents the only or necessarily the best way to do things. At any rate, maybe this is a longer comment than it needed to be. However, the stat prof in me just had to respond. Thank you so much for visiting and offering your thoughts. Just an FYI, I do have another video here (ruclips.net/video/6Ycw3_iQHD0/видео.html) on PCA, and you can download data to practice and a supplemental Powerpoint. Best wishes!
@@mikecrowson2462 I am so sorry I meant that as a joke!!! I should have realized it might not come off that way. I found your video incredibly helpful as I am trying to relearn SPSS on the fly!
Hi Rachel, no worries at all! I was thinking it could be a joke, but decided to answer 'just in case'. Thank you for your clarification, and I'm really happy you are finding my videos helpful! Best wishes as you keep relearning SPSS!
@@mikecrowson2462 Thank you!!!
Great explanations (but as a speaker, you need to drop the very frequent use of "uh's" and "kind of" because they are distracting to acquiring the content).
Thank you so much for this amazing lecture!