R - Exploratory Factor Analysis Example

  • Published: Jan 13, 2018
  • Lecturer: Dr. Erin M. Buchanan
    Missouri State University
    Spring 2018
    This video replaces a previous live in-class video. It covers an exploratory factor analysis, examining both theoretical and practical points for walking through an EFA. An example write-up and materials are provided on our OSF page.
    List of videos for class on statisticsofdoom.com.
    statisticsofdoom.com/page/adv...
    All materials archived on OSF: osf.io/dnuyv/

Comments • 97

  • @doucker1
    @doucker1 3 years ago +6

    I am convinced I would never have graduated without this walkthrough of EFA - Thank you!

  • @ynbearljx
    @ynbearljx 1 year ago +2

    Thank you so much for uploading this fantastic video tutorial. So thoughtful to include the notes and code, you are a star!

  • @tsaihw86
    @tsaihw86 2 years ago +1

    Dr. Buchanan,
    I love you. I love you. I love you.
    Your videos are true gems!!! Can't thank you enough!!!

    • @StatisticsofDOOM
      @StatisticsofDOOM  2 years ago

      Thanks for the kind words!

    • @tsaihw86
      @tsaihw86 2 years ago

      @@StatisticsofDOOM Dear Dr. Buchanan,
      Do you have any lectures/videos on ways to screen longitudinal data (e.g., remove outliers)? Thank you~~~

    • @StatisticsofDOOM
      @StatisticsofDOOM  2 years ago

      @@tsaihw86 check out the data screening videos here under the R sections: statisticsofdoom.com/page/graduate-statistics/

  • @kevincopeland730
    @kevincopeland730 2 years ago +1

    Dr. Buchanan, I can't thank you enough for this. I'm extremely grateful you're willing to share this info with all of us! Many thanks.

  • @zaneford9851
    @zaneford9851 2 years ago

    Thank you for all the great videos! They make learning stats really intuitive and easy.

  • @nimadw7198
    @nimadw7198 6 years ago +2

    Best Channel Ever!!!!! You should definitely start a podcast. You are funny, you speak extremely well, and the only part left is finding a partner you have chemistry with - and you could potentially get some money out of it too.

  • @elenaguzman9413
    @elenaguzman9413 2 years ago

    Thank you for presenting a thorough step-by-step analysis of EFA. My textbook only confused me, so stumbling onto your channel was like finding a winning lotto ticket! Very valuable!

  • @leike1359
    @leike1359 5 years ago +1

    Wonderful video - thanks a lot!

  • @sheharyarakhtar
    @sheharyarakhtar 3 years ago +2

    This is by far the best lecture out there for understanding EFA and implementing it in R - Thank you so much. I might get my first publication using all this.

    • @StatisticsofDOOM
      @StatisticsofDOOM  2 years ago +1

      Thanks for the kind words!

    • @sheharyarakhtar
      @sheharyarakhtar 2 years ago +1

      @@StatisticsofDOOM Update: I got my first publication! Thank you!

  • @NathalieTamayo
    @NathalieTamayo 5 years ago +2

    Great explanation - the example is super easy to follow, and I was able to do it on my own data. Thanks!

  • @farhadwaseel9981
    @farhadwaseel9981 4 years ago +2

    I have found this video totally valuable, and you have clearly explained every part. I have become a fan of your lectures, dear Professor.

  • @PaulO-mv6ku
    @PaulO-mv6ku 5 years ago +2

    Excellent - very, very helpful.

  • @mf2810
    @mf2810 5 years ago +1

    Thanks - really great voice and well explained

  • @RavinderRam
    @RavinderRam 6 years ago +1

    very nice

  • @ajmackay8
    @ajmackay8 2 years ago +1

    This is great. Can you point me to the video where you set up "pairwise.complete.obs"? I am newly adopting R - I was too old to use it in grad school, so sad - and have been cobbling things together between Excel, SPSS, and Mplus over the past years. I started using Python and Jupyter Lab for data wrangling and some stats, but R has so many great open resources, like your channel. Loving what R can do as I review your lectures. You hit all the theory highlights of my past factor analysis and SEM courses from back in the day at Wash U - but integrated in one accessible ecosystem with endless stats packages. Awesome. Thank you for sharing.

    • @StatisticsofDOOM
      @StatisticsofDOOM  2 years ago

      pairwise.complete.obs is an argument in the correlation function ... like this: cor(dataframe, use = "pairwise.complete.obs") would calculate all the pairwise correlations, ignoring NA values when they exist. This should be the default in some things in psych (like alpha).
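
      For illustration, a minimal sketch of that call (the data frame name and values here are made up):

        # hypothetical data frame with some missing values
        dataframe = data.frame(q1 = c(1, 2, NA, 4),
                               q2 = c(2, NA, 3, 5),
                               q3 = c(1, 2, 3, 4))
        # each correlation uses every row where both items are present
        cor(dataframe, use = "pairwise.complete.obs")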

  • @Levover
    @Levover 1 year ago +2

    Thank you very much for this informative video. What I would like to ask you about: you mentioned the new Kaiser criterion, which is above 0.7 instead of 1.0. I am looking for a citation for it - could you point me to the literature? Many thanks in advance.

  • @arpitagarwal619
    @arpitagarwal619 5 years ago +1

    Hi Erin, thanks a lot for this great lecture. I tried to implement the analysis following your steps, but my raw alphas for the factors are coming out very low (0.34, and negative for one factor). Can you please let me know what to do when the raw alphas are this low? Please let me know how I can share more details about the data and my analysis results.

    • @StatisticsofDOOM
      @StatisticsofDOOM  5 years ago

      That usually means the items are uncorrelated - did you look at a correlation table of the items? Maybe one of the items is bad and is pulling down the rest?
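
      A minimal sketch of that check (assuming the items for one factor sit in a data frame called factor1_items):

        # look for items that correlate near zero or negatively with the rest
        round(cor(factor1_items, use = "pairwise.complete.obs"), 2)
        library(psych)
        # check.keys = TRUE flags items that correlate negatively with the total
        alpha(factor1_items, check.keys = TRUE)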

  • @00maria7
    @00maria7 5 years ago +1

    LOVE your videos!!! I have a problem with my R code... it seems that when calculating mahalanobis() it can't invert my matrix. Is there any workaround?

    • @StatisticsofDOOM
      @StatisticsofDOOM  5 years ago +1

      Make sure you haven't put several perfectly correlated columns in the code - that's usually the culprit (or a reverse-coded item plus the non-reversed item, or a subtotal plus all the items, etc.).
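
      A minimal sketch of that check (assuming the screened data frame is called noout):

        # find (near) perfectly correlated pairs - drop one of each before mahalanobis()
        correl = cor(noout, use = "pairwise.complete.obs")
        which(abs(correl) > .999 & upper.tri(correl), arr.ind = TRUE)
        # once duplicates/subtotals are removed, the covariance matrix should invert
        mahal = mahalanobis(noout,
                            colMeans(noout, na.rm = TRUE),
                            cov(noout, use = "pairwise.complete.obs"))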

  • @thechenmin
    @thechenmin 2 years ago

    Hello, Dr. Buchanan, thank you so much for all your videos. They have given me great instruction and a bridge as I shift from AMOS/SPSS to R, and I'm getting better at using R.
    But:
    there's one more question - in EFA in R, why don't we test validity?
    I might be asking a silly question, but it would be great if you could reply. Thanks in advance.

    • @StatisticsofDOOM
      @StatisticsofDOOM  1 year ago

      You can certainly test validity, but there are a lot of types, and they are tied to a specific content area. So you'd need related data and to know what type of thing you want to examine (external, content, etc.).

  • @yuvalpines3394
    @yuvalpines3394 3 years ago

    Hi, many thanks! It helped me a lot in my thesis. Just a question, please - at the end, when you create the new factor scores, why not work on the data after excluding the variables that did not load, i.e., on the "final model"? Thanks in advance!

  • @francescog5078
    @francescog5078 4 years ago

    A super-useful video about EFA! Thank you very much.
    However, I didn't get the rchisq function used in the construction of the fake regression model.
    The function is rchisq(n, df, ncp = 0).
    I get that n is the number of rows of the dataset, but I didn't get why I have to use an arbitrary number for the df argument. Could you please let me know how to decide df for this function?
    Thank you very much!

    • @StatisticsofDOOM
      @StatisticsofDOOM  4 years ago

      Sure thing! The df should be the number of columns (variables) involved in the calculation. So, I use ncol(dataset) to calculate that. I have a few videos on this data screening procedure that explain all the whys a bit better: ruclips.net/channel/UCMdihazndR0f9XBoSXWqnYgsearch?view_as=subscriber&query=R+data+screening
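
      In code, that advice amounts to the following (dataset is a stand-in name):

        # df = number of columns (variables) involved, per the reply above
        random = rchisq(n = nrow(dataset), df = ncol(dataset))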

  • @thaisgargantini3932
    @thaisgargantini3932 3 years ago +1

    This is just amazing! Thank you Dr. Erin. I would love to get this cheat sheet. Would that be possible?

    • @SergioBriMa
      @SergioBriMa 3 years ago

      Follow the link in the description... it has all the notes/code from the videos!

    • @StatisticsofDOOM
      @StatisticsofDOOM  2 years ago

      statisticsofdoom.com/page/advanced-statistics/

  • @robg4632
    @robg4632 4 years ago +1

    Very concise and well done. Thx! Uh, what's the charge for the Stata app?!

  • @pablocorreapinto2005
    @pablocorreapinto2005 3 years ago +1

    Prof. Erin, can we use this EFA procedure for ordinal and binary manifest variables? (I'm using the Ambivalent Sexism Inventory, which uses Likert items.) Note: I admire your content!

    • @StatisticsofDOOM
      @StatisticsofDOOM  2 years ago

      No - you should change the code here to account for that type of data. The psych package can handle this type of data - I’d recommend checking out the excellent documentation for the package.
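
      One common route is the cor argument of psych's fa(), sketched here assuming a data frame of Likert items called likert_data (check the documentation for your package version):

        library(psych)
        # cor = "poly" requests polychoric correlations for ordinal items;
        # cor = "tet" is the tetrachoric option for binary items
        fa(likert_data, nfactors = 2, rotate = "oblimin", fm = "ml", cor = "poly")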

  • @karimkarim4824
    @karimkarim4824 4 years ago

    Hello,
    Many, many thanks for this very helpful video.
    But I have a question: your data are categorical - how are you checking for normality?
    As I know, normality is only for continuous quantitative data.
    Thanks

    • @StatisticsofDOOM
      @StatisticsofDOOM  4 years ago

      That's true - you would not look for normality in categorical data. The data in this example are interval scale, and that's why we examined normality.

  • @bakaharipotera
    @bakaharipotera 5 years ago

    Hello, thanks so much for this video. One thing I didn't understand, though, is the part where you say "if your columns are all off by one number, you have to add one". Does this mean that, instead of having factor1 = c(1, 3, 7, 8, etc.) I should have factor1 = c(2, 4, 8, 9, etc.)? Thank you :)

    • @StatisticsofDOOM
      @StatisticsofDOOM  5 years ago

      Right - mainly it's to make sure to line up the items you are excluding or including with the actual column numbers ... so if you want to include question 1, but it's actually column 5, be sure to use 5 for the column number and not the question number.
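
      A small sketch of that bookkeeping (the data frame name and numbers are made up):

        names(dataset)             # check which column each question actually occupies
        factor1 = c(2, 4, 8, 9)    # column numbers, not question numbers
        head(dataset[ , factor1])  # confirm the intended items were selected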

  • @darith1571
    @darith1571 1 year ago

    Hello, Dr. Buchanan, thank you so much for these amazing videos. By the way, can I have your dataset?

  • @addisonmcghee9190
    @addisonmcghee9190 3 years ago

    Hi, this was an excellent video!
    One question I have is why you tested the assumptions for regression using that fake dataset. How does using the fake dataset help test the assumptions?
    i.e.
    random = rchisq(nrow(dataset), 7)
    fake = lm(random ~., data = dataset)
    standardized = rstudent(fake)
    fitted = scale(fake$fitted.values)
    ..
    ..
    ..
    Thanks!

    • @StatisticsofDOOM
      @StatisticsofDOOM  3 years ago

      The short version is that it's a quick way to test all the columns at once. We are not running a "real" regression, so we have to find a way to look at the residuals for some of the assumptions, and we can put together this fake version to give us our plots. I have some videos on data screening that explain this idea more in depth if you are interested!
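
      For reference, a hedged sketch of what such residual plots often look like, continuing the snippet quoted above (not necessarily the exact elided lines):

        hist(standardized)                  # normality of the residuals
        qqnorm(standardized); abline(0, 1)  # linearity
        plot(fitted, standardized)          # homogeneity/homoscedasticity
        abline(h = 0); abline(v = 0)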

  • @amogh2101
    @amogh2101 2 years ago +1

    Thank you for this! I really like the easy, fun way you explain stuff.
    To evaluate model fit, I see that the Tucker-Lewis index can be used. However, I am not able to find this index when I print the model. Any idea where it might be?

    • @StatisticsofDOOM
      @StatisticsofDOOM  2 years ago

      If you are using psych it should print automatically…do you not see it in the output for the EFA?

    • @amogh2101
      @amogh2101 2 years ago

      @@StatisticsofDOOM I think it's a package issue? I don't see the Tucker-Lewis index in the printed output (nor as a value in the model summary: model$TLI is not available for me). I reinstalled the package multiple times and still nothing.

    • @StatisticsofDOOM
      @StatisticsofDOOM  2 years ago +1

      @@amogh2101 The package docs say it's there, but maybe it's the combination of rotation + fitting estimator you are using (the math part) that means it doesn't get calculated.

    • @amogh2101
      @amogh2101 2 years ago

      Thank you! That should be it.

    • @amogh2101
      @amogh2101 2 years ago

      @@StatisticsofDOOM I am using the following command:
      # Efa matrix is a correlation matrix
      fa(efa_matrix, nfactors = 2, rotate = "oblimin", scores = 'regression', fm = 'ml')
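
      One hedged guess worth checking: when fa() is given a correlation matrix rather than raw data, psych needs the sample size to compute the fit statistics that include the TLI, so passing n.obs may restore the missing index (500 below is a placeholder for the real N):

        library(psych)
        fa(efa_matrix, nfactors = 2, rotate = "oblimin",
           scores = 'regression', fm = 'ml', n.obs = 500)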

  • @daiying5915
    @daiying5915 4 years ago

    Hi Dr. Buchanan, thank you for your video. I followed your clear instructions to run my own EFA. When I use the psych package to explore the Cronbach's alpha of my eight extracted factors, R gives me the following output:
    The determinant of the smoothed correlation was zero.
    This means the objective function is not defined for the null model either.
    The Chi square is thus based upon observed correlations.
    Error: cannot allocate vector of size 503.3 Mb
    In addition: Warning messages:
    1: In cor.smooth(r) : Matrix was not positive definite, smoothing was done
    2: In fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
    the model inverse times the r matrix is singular, replaced with Identity matrix which means fits are wrong
    3: In cor.smooth(r) : Matrix was not positive definite, smoothing was done
    I'm wondering how to solve this problem. The dataset has 2551 observations with 112 items in each observation. All items were scored 0, 1, or 2 as categorical variables. Regarding the error "cannot allocate vector of size 503.3 Mb", my laptop has enough space to do the calculation. Thank you.

    • @StatisticsofDOOM
      @StatisticsofDOOM  4 years ago

      That's RAM memory, not the hard drive, so that might be part of the problem. The error implies that at least two of the items are too highly correlated to run ... did you check for multicollinearity?

  • @ebnouseyid5518
    @ebnouseyid5518 3 years ago +1

    Thank you, super video!
    Do you have a reference that covers exploratory factor analysis with the ULS function?

    • @StatisticsofDOOM
      @StatisticsofDOOM  2 years ago

      The psych package help guides should have an example of ULS.
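
      A minimal sketch of that estimator in psych (dataset is a stand-in name; see the fa() documentation):

        library(psych)
        # fm = "uls" requests unweighted least squares factoring
        fa(dataset, nfactors = 2, rotate = "oblimin", fm = "uls")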

  • @SuperDayv
    @SuperDayv 4 years ago +1

    Is there a way in R to deal with non-independent data? For example, if ratings are nested within participants and within items, is there a way to do EFA with crossed random effects?

    • @StatisticsofDOOM
      @StatisticsofDOOM  4 years ago

      Not sure - psych may actually have some of this functionality (www.personality-project.org/r/html/faMulti.html), but I have not tried it at all.

    • @SuperDayv
      @SuperDayv 4 years ago +1

      @@StatisticsofDOOM I'll take a look, thanks!

  • @Lauren-on2mw
    @Lauren-on2mw 2 years ago +1

    Hello - which package does the function 'fa.parallel' come from? I'm struggling to find it!

    • @StatisticsofDOOM
      @StatisticsofDOOM  2 years ago +1

      It's in the psych package www.rdocumentation.org/packages/psych/versions/2.1.9/topics/fa.parallel
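
      A minimal usage sketch (dataset is a stand-in for your data frame):

        library(psych)
        # compares the observed eigenvalues against those from random data
        fa.parallel(dataset, fm = "ml", fa = "both")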

    • @Lauren-on2mw
      @Lauren-on2mw 2 years ago +1

      @@StatisticsofDOOM Thank you so much!

  • @oluwagbengafakanye5265
    @oluwagbengafakanye5265 4 years ago

    Great video. Did you do a confirmatory test for the normality assumption check? I mean, did you use the Shapiro-Wilk normality test? I had a similar histogram to yours, but the Shapiro-Wilk test confirms that my data isn't normally distributed. Is this a serious concern?

    • @StatisticsofDOOM
      @StatisticsofDOOM  4 years ago +1

      You can use the Shapiro-Wilk ... which tests if the sample distribution is normal. The assumption is that the samplING distribution is normal (which is a different thing). Generally, with large sample sizes, the sampling distribution can be assumed to be normal per the central limit theorem.
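
      If you do want the test itself, a minimal sketch on a single (hypothetical) column:

        # Shapiro-Wilk tests one sample distribution; base R limits it to 3-5000 observations
        shapiro.test(dataset$q1)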

    • @oluwagbengafakanye5265
      @oluwagbengafakanye5265 4 years ago

      @@StatisticsofDOOM Okay, so we can assume a normal sampling distribution without any confirmatory test.

    • @StatisticsofDOOM
      @StatisticsofDOOM  4 years ago

      @@oluwagbengafakanye5265 I would think it was a safe assumption IF one had a large sample size.

    • @oluwagbengafakanye5265
      @oluwagbengafakanye5265 4 years ago

      @@StatisticsofDOOM My sample size is 347 and I have 30 variables (questions). Can I assume normality?

    • @StatisticsofDOOM
      @StatisticsofDOOM  4 years ago

      @@oluwagbengafakanye5265 More than likely, depending on the type of variables we are talking about.

  • @yakhoubndiaye2540
    @yakhoubndiaye2540 4 years ago +1

    Wonderful example, I love this video. I'm struggling to run my EFA - I got the following error message, can you help please:
    > PA

    • @StatisticsofDOOM
      @StatisticsofDOOM  4 years ago +1

      Have you checked a correlation table of the data? Sounds like something is perfectly correlated, which will give an error message.

    • @yakhoubndiaye2540
      @yakhoubndiaye2540 4 years ago +1

      @@StatisticsofDOOM Oh nice, it works with the cor matrix I defined, thanks. However, the model did not converge when I fit the CFA model:
      cfa(MODELE, data = mydata, sample.cov = Wisc4.cov, sample.nobs = 28, estimator = "WLS", ordered = c("Item1", "Item2", "Item3", "Item4", ...)).
      Results: Error in chol.default(S) : the leading minor of order 22 is not positive definite.
      Do you have an idea? I agree that a small sample size and ordered data are not a good combination. So, short of increasing the sample size, can I change the optimizer? How, if possible?
      Thanks in advance

    • @StatisticsofDOOM
      @StatisticsofDOOM  4 years ago

      @@yakhoubndiaye2540 I have not done much with ordered models, but it still sounds like something is going awry with the small sample/combination in the data. Does ordering make sense for the data? Honestly, I sometimes have problems inputting cov or correlation matrices for models with high correlations between factors. If you have the raw data, I would use that.

  • @patipauli
    @patipauli 2 years ago

    Hi Dr. Buchanan, I don't know what I am doing wrong. The dataset I downloaded has 794 rows and 25 columns, but the EFA uses 14. I thought you excluded some rows, but my column counts don't match. Could you help me, please?

    • @StatisticsofDOOM
      @StatisticsofDOOM  2 years ago

      I'm not sure what you are asking? This dataset should have 32 columns?

  • @Uncha1n3d
    @Uncha1n3d 3 years ago +1

    It's great to see such a detailed explanation, albeit incredibly difficult to follow for someone with misophonia (given the burps and all the slurps in the background).

  • @hassanelakhloufi8073
    @hassanelakhloufi8073 5 years ago +1

    Great content. Look, I'm currently developing a statistical tool for mobile phones, which will also include factor analysis. Would you be willing to try it out and give me your feedback? As an expert in statistics, do you think people will ever transition from desktop to smartphone when it comes to conducting statistical analysis?

    • @StatisticsofDOOM
      @StatisticsofDOOM  5 years ago +1

      Tough to say - I would only use a smartphone app for something simple, to answer a quick question - otherwise, I'd probably stick to doing my work on the computer, where it's easier to type code, input data, and run analyses.

    • @hassanelakhloufi8073
      @hassanelakhloufi8073 5 years ago +2

      I totally get it! And you're the best. Keep that in mind ;-)

  • @googledude5649
    @googledude5649 3 years ago +1

    Do you totally ignore the communalities? In that case, why? Regards!

    • @StatisticsofDOOM
      @StatisticsofDOOM  3 years ago

      The communalities are the R2 version of the loadings, so they tell me the same information as the loadings (which we do focus on).

    • @googledude5649
      @googledude5649 3 years ago

      @@StatisticsofDOOM But the threshold for the communality should be > 0.5, right? The reason I ask is that it's pretty difficult for me to find any threshold limits for EFA in the literature. Regarding PCA, several books state that it should be at least above 0.5 - Child (200?) argues 0.2, but to me that's insanely low (!?). I'm doing my master's and collected n = 700 on a 7-point Likert scale, and I ended up using polychoric correlations since my data was very skewed (though the distribution makes sense). My EFA went nuts if I didn't treat the data as categorical in nature, and I had low communalities, but all the loadings were > 0.650. Anyway - if you have any comments on my approach or concerns about the h2 threshold, I would be very happy. And thank you for your time and effort to teach the world with your vids - it really helped me billionTONS!!

    • @StatisticsofDOOM
      @StatisticsofDOOM  3 years ago +1

      @@googledude5649 As noted in the lecture, we are using a criterion of loading > .30 - rules of thumb are just that, rules of thumb. They can be different for each person/field/etc. The citation is Preacher and MacCallum (2003). I highly recommend reading this article. quantpsy.org/pubs/preacher_maccallum_2003.pdf

    • @googledude5649
      @googledude5649 3 years ago

      @@StatisticsofDOOM Perfect! Happy everything!

  • @RavinderRam
    @RavinderRam 6 years ago +1

    Which country are you from?

    • @StatisticsofDOOM
      @StatisticsofDOOM  6 years ago +1

      I am based in the United States at Missouri State University.

  • @milescooper3322
    @milescooper3322 6 years ago

    Good explanation but too many "sort of's".