R - Exploratory Factor Analysis Example
- Published: Jan 13, 2018
- Lecturer: Dr. Erin M. Buchanan
Missouri State University
Spring 2018
This video replaces a previous live in-class recording. It covers an exploratory factor analysis, examining both theoretical and practical points for walking through an EFA. An example write-up and materials are provided on our OSF page.
List of videos for class on statisticsofdoom.com.
statisticsofdoom.com/page/adv...
All materials archived on OSF: osf.io/dnuyv/
I am convinced I would never have graduated without this walkthrough to EFA - Thank you!
Thanks for the kind words!
Thank you so much for uploading this fantastic video tutorial. So thoughtful to include the notes and code, you are a star!
Thanks for the kind words.
Dr. Buchanan,
I love you. I love you. I love you.
Your videos are true gems !!! Can't thank you enough!!!
Thanks for the kind words!
@@StatisticsofDOOM Dear Dr. Buchanan,
Do you have any lectures/videos on ways to screen longitudinal data (e.g., remove outliers)? Thank you~~~
@@tsaihw86 check out the data screening videos here under the R sections: statisticsofdoom.com/page/graduate-statistics/
Dr. Buchanan, I can't thank you enough for this. I'm extremely grateful you're willing to share this info with all of us! Many thanks.
Thanks for the kind words!
Thank you for all the great videos! They make learning stats really intuitive and easy.
Best Channel Ever!!!!! You should definitely start a podcast. You are funny, you speak extremely well, and the only part left is finding a partner you have chemistry with - you could potentially get some money out of it too.
Thank you for presenting a thorough step-by-step analysis of EFA. My textbook only confused me, so stumbling onto your channel was like finding a winning lotto ticket! Very valuable!
Thanks for the kind words!
Wonderful video - thanks a lot!
This is by far the best lecture out there for understanding EFA and implementing it in R - Thank you so much. I might get my first publication using all this.
Thanks for the kind words!
@@StatisticsofDOOM Update: I got my first publication! Thank you!
Great explanation, the example is super easy to follow, and I was able to do it with my own data. Thanks!
Glad to be of help!
I have found this video totally valuable, and you have clearly explained every part. I have become a fan of your lectures, dear Professor.
Thank you for the kind words!
Excellent - very, very helpful.
Glad to be of help!
Thanks - really great voice and well explained
Thank you!
very nice
This is great. Can you point me to the video where you set up "pairwise.complete.obs"? I am newly adopting R - it was too old-school for me to pick up in grad school, so sad - and I have been cobbling together Excel, SPSS, and Mplus over the past years. I started using Python and Jupyter Lab for data wrangling and some stats, but R has so many great open resources, like your channel. Loving what R can do as I review your lectures. You hit all the theory highlights of my past factor analysis and SEM courses from back in the day at Wash U - but integrated in one accessible ecosystem with endless stats packages. Awesome. Thank you for sharing.
pairwise.complete.obs is an argument in the correlation function. For example, cor(dataframe, use = "pairwise.complete.obs") would calculate all the pairwise correlations, ignoring NA values when they exist. This should be the default in some psych functions (like alpha).
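For anyone following along, here is a minimal base-R sketch of the difference (toy data; the variable names are made up for illustration):

```r
# Toy data frame with missing values
dat <- data.frame(
  q1 = c(1, 2, 3, 4, NA),
  q2 = c(2, 2, 3, 5, 4),
  q3 = c(1, NA, 2, 4, 5)
)

cor(dat)                                  # default: NA propagates into the matrix
cor(dat, use = "pairwise.complete.obs")   # each cell uses all complete pairs
```

The pairwise option keeps every observation that is complete for a given pair of variables, so no correlation is lost just because a row is missing on some other item.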
Thank you very much for this informative video. What I would like to ask about is the new Kaiser criterion you mentioned, which is eigenvalues above 0.7 instead of 1.0. I am looking for a citation for it - could you point me to the literature? Many thanks in advance.
I got that information from the Andy Field book!
Hi Erin, thanks a lot for this great lecture. I tried to implement the analysis following your steps, but my raw alphas for the factors are coming out very low (0.34, and negative for one factor). Can you please let me know what to do when the raw alphas are this low? Please let me know how I can share more details about my data and results.
That usually means the items are uncorrelated - did you look at a correlation table of the items? Maybe one of the items is bad and is pulling down the rest?
LOVE your videos!!! I have a problem with my R code... it seems when calculating mahalanobis() it can't invert my matrix. Is there any workaround?
Make sure you haven't put several perfectly correlated columns into the code - that's usually the culprit (e.g., a reverse-coded item plus the non-reversed item, or a subtotal plus all the items, etc.).
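To illustrate the failure mode described in the reply, here is a toy sketch: an exact duplicate column makes the covariance matrix singular, so mahalanobis() cannot invert it, and dropping the redundant column fixes the problem.

```r
set.seed(42)
df <- data.frame(a = rnorm(50), b = rnorm(50))
df$a_copy <- df$a   # perfectly correlated column (subtotal, duplicate, etc.)

# This errors: the covariance matrix is singular and cannot be inverted
# mahalanobis(df, colMeans(df), cov(df))

# Dropping the redundant column fixes it
clean <- df[, c("a", "b")]
d2 <- mahalanobis(clean, colMeans(clean), cov(clean))
head(d2)
```

Checking a correlation table of your columns first (looking for values at or near 1) will usually reveal the offending pair.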
Hello, Dr. Buchanan, thank you so much for all your videos. They have been a great guide and a bridge as I shift from AMOS/SPSS to R, and I am getting better at using R.
But:
there's one more question: in EFA in R, why don't we test validity?
It might be a silly question, but it would be great if you could reply. Thanks in advance.
You can certainly test validity, but there are a lot of types, and they are tied to a specific content area. So, you'd need related data and to know what type of thing you want to examine (external, content, etc.).
Hi, many thanks! It helped me a lot in my thesis. Just a question, please: at the end, when you create the new factor scores, why not work on the data after excluding the variables that did not load - I mean, work on the "final model"? Thanks in advance!
I'm not totally sure I understand the question?
A super-useful video about EFA! Thank you very much.
However, I didn't get the rchisq function used in the construction of the fake regression model.
The function is "rchisq(n, df, ncp = 0)".
I get that "n" is the number of rows of the dataset, but I didn't get why I have to use a particular number for the "df" argument. Could you please let me know how to decide df for this function?
Thank you very much!
Sure thing! The df should be the number of columns (variables) involved in the calculation. So, I use ncol(dataset) to calculate that. I have a few videos on this data screening procedure that explain all the whys a bit better: ruclips.net/channel/UCMdihazndR0f9XBoSXWqnYgsearch?view_as=subscriber&query=R+data+screening
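A minimal sketch of the reply's advice, using a toy data frame (substitute your own screened data for `dataset`):

```r
# Toy stand-in for a screened data frame
dataset <- data.frame(matrix(rnorm(100 * 8), ncol = 8))

# n = one random value per row; df = number of columns in the calculation
random <- rchisq(n = nrow(dataset), df = ncol(dataset))
length(random)   # one value per row of the dataset
```

Because nrow() and ncol() are computed from the data itself, the same two lines work unchanged when the dataset changes size.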
This is just amazing! Thank you Dr. Erin. I would love to get this cheat sheet. Would that be possible?
Follow the link in the description...has all notes/code from videos!
statisticsofdoom.com/page/advanced-statistics/
Very concise and well done. Thx! Uh, what's the charge for the Stata app?!
Don't know - haven't used Stata!
Prof. Erin, can we use this EFA procedure for ordinal and binary manifest variables? (I'm using the Ambivalent Sexism Inventory, which uses Likert items.) Note: I admire your content!
No - you should change the code here to account for that type of data. The psych package can handle this type of data - I’d recommend checking out the excellent documentation for the package.
Hello,
Many, many thanks for this very helpful video.
But I have a question: your data are categorical, so how are you checking for normality?
As far as I know, normality applies only to continuous quantitative data.
Thanks
That's true, you would not look for normality in categorical data. The data in this example are interval scale, and that's why we examined normality.
Hello, thanks so much for this video. One thing I didn't understand, though, is the part where you say "if your columns are all off by one number, you have to add one". Does this mean that, instead of having factor1 = c(1, 3, 7, 8, etc.), I should have factor1 = c(2, 4, 8, 9, etc.)? Thank you :)
Right - mainly it's to make sure the items you are excluding or including line up with the actual column numbers. So if you want to include question 1, but it's actually column 5, be sure to use 5 for the column number and not the question number.
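A toy sketch of the off-by-one issue: a leading ID column shifts every question one column to the right, so question numbers need a +1 to become column numbers.

```r
df <- data.frame(id = 1:5,
                 q1 = c(1, 2, 3, 4, 5),
                 q2 = c(2, 3, 4, 5, 1),
                 q3 = c(5, 4, 3, 2, 1))

factor1_items <- c(1, 3)            # question numbers
factor1_cols  <- factor1_items + 1  # add one to get the actual column numbers
names(df[, factor1_cols])           # selects q1 and q3
```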
Hello, Dr. Buchanan, thanks so much for these amazing videos. By the way, can I have your dataset?
All materials archived on OSF: osf.io/dnuyv/
Hi, this was an excellent video!
One question I have is: why did you test the regression assumptions using that fake dataset? How does using the fake dataset help test the assumptions?
i.e.
random = rchisq(nrow(dataset), 7)
fake = lm(random ~., data = dataset)
standardized = rstudent(fake)
fitted = scale(fake$fitted.values)
..
..
..
Thanks!
The short version is that it's a quick way to test all the columns at once. We are not running a "real" regression, so we have to find a way to look at the residuals for some of the assumptions. So we can put together this fake version to give us our plots. I have some videos on data screening that explain this idea more in depth if you are interested!
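Putting the pieces from the question together, here is a hedged, self-contained sketch of the fake-regression screening plots (toy simulated data; substitute your own data frame for `dataset`):

```r
set.seed(1)
dataset <- data.frame(matrix(rnorm(200 * 5), ncol = 5))

# Random DV so we can get residuals without running a "real" regression
random <- rchisq(nrow(dataset), df = ncol(dataset))
fake <- lm(random ~ ., data = dataset)

standardized <- rstudent(fake)        # studentized residuals
fitted <- scale(fake$fitted.values)   # standardized fitted values

hist(standardized)                            # normality of residuals
qqnorm(standardized); qqline(standardized)    # linearity
plot(fitted, standardized)                    # homogeneity/homoscedasticity
abline(h = 0); abline(v = 0)
```

The plots are interpreted exactly as for a real regression; the "DV" is random by design, so the residuals reflect the structure of the predictors alone.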
Thank you for this! I really like the easy, fun way of how you explain stuff.
To evaluate model fit, I see that the Tucker-Lewis index can be used. However, I am not able to find this index when I print the model. Any idea where it might be?
If you are using psych it should print automatically…do you not see it in the output for the EFA?
@@StatisticsofDOOM I think it's a package issue? I don't see Tucker-Lewis in the print (nor as a value in the model summary: model$TLI is not available for me). I reinstalled the package multiple times, still nothing.
@@amogh2101 The package docs say it's there, but maybe it's the combination of rotation + fitting estimator you are using (the math part) that means it doesn't calculate.
Thank you! That should be it.
@@StatisticsofDOOM I am using the following command:
# Efa matrix is a correlation matrix
fa(efa_matrix, nfactors = 2, rotate = "oblimin", scores = 'regression', fm = 'ml')
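For reference, here is a minimal sketch of where the TLI lives on a psych fa object. This uses toy simulated raw data rather than a correlation matrix, and assumes the psych package is installed:

```r
library(psych)

# Toy one-factor data so the fit statistics are computable
set.seed(11)
f <- rnorm(300)
dataset <- data.frame(a = f + rnorm(300), b = f + rnorm(300),
                      c = f + rnorm(300), d = f + rnorm(300))

model <- fa(dataset, nfactors = 1, fm = "ml")
model$TLI   # Tucker-Lewis index, when the estimator/rotation allow it
```

When a correlation matrix is supplied instead of raw data, some fit statistics additionally need n.obs to be specified, which may be why they go missing.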
Hi Dr. Buchanan, thank you for your video. I followed your clear instructions to analyze my own EFA. When I use the psych package to explore Cronbach's alpha for my eight extracted factors, R gives me the following warning:
The determinant of the smoothed correlation was zero.
This means the objective function is not defined for the null model either.
The Chi square is thus based upon observed correlations.
Error: cannot allocate vector of size 503.3 Mb
In addition: Warning messages:
1: In cor.smooth(r) : Matrix was not positive definite, smoothing was done
2: In fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
the model inverse times the r matrix is singular, replaced with Identity matrix which means fits are wrong
3: In cor.smooth(r) : Matrix was not positive definite, smoothing was done
I'm wondering how to solve this problem. The dataset has 2551 observations with 112 items in each observation. All items were scored 0, 1, 2 as categorical variables. Regarding the warning "Error: cannot allocate vector of size 503.3 Mb", my laptop has enough space to do the calculation. Thank you.
That's RAM, not hard drive space, so that might be part of the problem. The error implies that at least two of the items are too highly correlated to run... did you check for multicollinearity?
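A quick base-R way to run the multicollinearity check suggested in the reply - flag item pairs whose correlation is near 1 (toy data here; `items` stands in for your item data frame):

```r
set.seed(7)
items <- data.frame(a = rnorm(100), b = rnorm(100))
items$c <- items$a + rnorm(100, sd = 0.01)   # nearly a duplicate of `a`

r <- cor(items, use = "pairwise.complete.obs")
high <- which(abs(r) > .90 & upper.tri(r), arr.ind = TRUE)
cbind(rownames(r)[high[, 1]], colnames(r)[high[, 2]])   # offending pairs
```

Any pair this list flags is a candidate to drop or combine before rerunning the analysis.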
Thank you, super video!
Do you have a reference that covers exploratory factor analysis with the ULS fitting function?
The psych package help guides should have an example of ULS.
Is there a way in R to deal with non-independent data? For example if ratings are nested within participants, and within items, is there a way to do EFA with crossed random effects?
Not sure - psych may actually have some of this functionality (www.personality-project.org/r/html/faMulti.html), but I have not tried it.
@@StatisticsofDOOM I'll take a look, thanks!
Hello - which package are you getting the function 'fa.parallel' from? I'm struggling to find it!
It's in the psych package www.rdocumentation.org/packages/psych/versions/2.1.9/topics/fa.parallel
@@StatisticsofDOOM Thank you so much!
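As a pointer for the question above, here is a minimal sketch of fa.parallel (toy simulated data; assumes the psych package is installed):

```r
library(psych)

set.seed(3)
dataset <- data.frame(matrix(rnorm(300 * 6), ncol = 6))

# Parallel analysis: compares observed eigenvalues to eigenvalues from
# random data to suggest how many factors/components to retain
fa.parallel(dataset, fm = "ml", fa = "fa")
```

The function prints a suggested number of factors and draws the scree plot with the random-data comparison line.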
Great video. Did you do a confirmatory test for the normality assumption check - I mean, did you use the Shapiro-Wilk normality test? I had a similar histogram to yours, but the Shapiro-Wilk test indicates that my data aren't normally distributed. Is this a serious concern?
You can use the Shapiro-Wilk, which tests whether the sample distribution is normal. The assumption is that the samplING distribution is normal (which is a different thing). Generally, with large sample sizes, the sampling distribution can be assumed to be normal via the central limit theorem.
@@StatisticsofDOOM Okay, so we can assume a normal sampling distribution without any confirmatory test.
@@oluwagbengafakanye5265 I would think it was a safe assumption IF one had a large sample size.
@@StatisticsofDOOM My sample size is 347 and there are 30 variables (questions). Can I assume normality?
@@oluwagbengafakanye5265 More than likely, depending on the type of variables we are talking about.
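For completeness, here is how the Shapiro-Wilk test mentioned above is run in base R (toy simulated data; the sample size mirrors the question):

```r
# Shapiro-Wilk tests whether the *sample* distribution looks normal; with
# large samples it flags even trivial departures, which is one reason the
# CLT argument about the sampling distribution usually matters more
set.seed(5)
x <- rnorm(347)
shapiro.test(x)
```

Note that shapiro.test accepts between 3 and 5000 observations, so very large samples need a different approach anyway.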
Wonderful example, I love this video. I struggle to run my EFA - I've got the following error message. Can you help, please?
> PA
Have you checked a correlation table of the data? Sounds like something is perfectly correlated, which will give an error message.
@@StatisticsofDOOM Oh nice, it works with the correlation matrix I defined, thanks. However, the model did not converge when I fit the CFA model:
cfa(MODELE, data = mydata, sample.cov = Wisc4.cov, sample.nobs = 28, estimator = "WLS", ordered = c("Item1", "Item2", "Item3", "Item4", ...)).
Results: Error in chol.default(S) : the leading minor of order 22 is not positive definite.
Do you have an idea? I agree that a small sample size and ordered data are not a good combination. So, short of increasing the sample size, can I change the optimizer? How, if possible?
Thanks in advance
@@yakhoubndiaye2540 I have not done much with ordered models, but it still sounds like something is going awry with the small sample/data combination. Does ordering make sense for the data? Honestly, I sometimes have problems inputting covariance or correlation matrices for models with a high correlation between factors. If you have the raw data, I would use that.
Hi Dr. Buchanan, I don't know what I am doing wrong. The dataset I downloaded has 794 rows and 25 columns, but there are 14 EFA items. I thought you excluded some rows, but my column count is different. Could you help me, please?
I'm not sure what you are asking? This dataset should have 32 columns?
It's great to see such a detailed explanation albeit incredibly difficult to follow for someone with misophonia (given the burps and all the slurps in the background).
Great content. Look, I'm currently developing a statistical tool for mobile phones, which will also include factor analysis. Would you be willing to try it out and give me your feedback? As an expert in statistics, do you think people will ever transition from desktop to smartphone when it comes to conducting statistical analysis?
Tough to say - I would only use a smartphone app for something simple to answer a quick question. Otherwise, I'd probably stick to doing my work on the computer, where it's easier to type code, input data, and run analyses.
I totally get it! and you're the best. Keep that in mind ;-)
Do you totally ignore the communalities? In that case, why? Regards!
The communalities are the R² version of the loading, so they tell me the same information as the loadings (which we do focus on).
@@StatisticsofDOOM But the threshold for the communality should be > 0.5, right? The reason I ask is that it's pretty difficult for me to find any threshold limits for EFA in the literature. Regarding PCA, several books state that it should be at least above 0.5 - Child (200?) argues 0.2, but to me that's insanely low (!?). I'm doing my master's and collected n = 700 on a 7-point Likert scale, and I ended up using polychoric correlations since my data were very skewed (though the distribution makes sense). My EFA went nuts if I didn't treat the data as categorical in nature, and the communalities were low, but all the loadings were > 0.650. Anyway - if you have any comments on my approach or concerns about the h2 threshold, I would be very happy. And thank you for your time and effort to teach the world with your vids - it really helped me billionTONS!!
@@googledude5649 As noted in the lecture, we are using a criterion of loading > .30 - rules of thumb are just that, rules of thumb. They can be different for each person/field/etc. The citation is Preacher and MacCallum (2003). I highly recommend reading this article. quantpsy.org/pubs/preacher_maccallum_2003.pdf
@@StatisticsofDOOM Perfect ! Happy everything !
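Related to the .30 loading criterion discussed above, here is a sketch of suppressing small loadings in the printed output and inspecting communalities (toy simulated two-factor data; assumes the psych package is installed):

```r
library(psych)

# Toy two-factor data (your own items go here)
set.seed(9)
f1 <- rnorm(300); f2 <- rnorm(300)
dataset <- data.frame(a = f1 + rnorm(300, sd = .5),
                      b = f1 + rnorm(300, sd = .5),
                      c = f1 + rnorm(300, sd = .5),
                      d = f2 + rnorm(300, sd = .5),
                      e = f2 + rnorm(300, sd = .5),
                      g = f2 + rnorm(300, sd = .5))

model <- fa(dataset, nfactors = 2, rotate = "oblimin", fm = "ml")
print(model$loadings, cutoff = .30)  # blanks out loadings below .30
model$communality                    # h2: shared variance per item
```

Since rules of thumb vary by field, the cutoff value is just the argument to change if your criterion differs.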
Which country are you from?
I am based in the United States at Missouri State University.
Good explanation but too many "sort of's".