How to use Stata for Principal Component Analysis (PCA)

  • Published: 21 Aug 2024
  • Using Stata to replicate the results of the PCA example in Multivariate Data Analysis by Hair et al.
    The link to download the authors' sample data is mvstats.com/do....
    See my website for a copy of the Stata log file used in the video (and much else besides!): financefundame....
    Join the Finance Fundamentals Discord server: / discord .

Comments • 12

  • @khadimhussainmalik3284 4 months ago

    Dear Sir, I extend my gratitude for the insightful lecture you provided. In my research, I have identified two variables with noteworthy cross-loading factors. The dilemma arises as to which variable should be prioritized for removal, considering their significant cross-loading with Factor 1 and Factor 2.
    Variable | Factor 1  Factor 2
    tour4    |  0.7039   -0.5249
    ser      |  0.7423    0.5641

    • @financefundamentals 4 months ago

      Thank you for your comment/question! As I mentioned in the video, I'm not a statistics expert, just a generalist interested in sharing knowledge about using Stata for various analyses. Please bear that in mind when reading my response below.
      Regarding your specific question about which variable to remove due to cross-loading, a common approach is to consider both the statistical and theoretical aspects. From a statistical perspective, you would most likely remove the variable with the lower communality. (Based on the limited numbers you provided, this might be 'tour4' - but you need to check that column of your results.)
      However, you should also think about the theoretical relevance of each variable to your research question. Consider which variable is more meaningful to retain, based on your study's objectives and underlying theory. Sometimes a variable with slightly lower communality may be more crucial to keep from a conceptual standpoint.
      Another option to consider is trying the analysis with each variable removed in turn, and comparing the results to see which solution makes more sense and aligns better with your research goals.
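
To make the communality comparison concrete, here is a minimal sketch in Python (not Stata) that computes the communality of each variable as the sum of its squared loadings. It uses only the two loadings quoted in the question and assumes there are no further retained factors; if the full solution has more factors, the communalities reported in the actual output will be higher and should be used instead.

```python
# Loadings quoted in the question above: [Factor 1, Factor 2].
# Assumption: these are the only retained factors.
loadings = {
    "tour4": [0.7039, -0.5249],
    "ser": [0.7423, 0.5641],
}


def communality(row):
    """Sum of squared loadings across the retained factors."""
    return sum(value ** 2 for value in row)


communalities = {name: communality(row) for name, row in loadings.items()}

for name, h2 in communalities.items():
    print(f"{name}: communality = {h2:.4f}")

# The variable with the lowest communality is the statistical
# candidate for removal; theory may still argue for keeping it.
lowest = min(communalities, key=communalities.get)
print(f"Lowest communality: {lowest}")
```

On these two columns alone, 'tour4' comes out at roughly 0.77 and 'ser' at roughly 0.87, which matches the tentative reading given in the reply.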

  • @ehiidoko6934 3 months ago

    Thanks for this! It was super helpful.

    • @financefundamentals 3 months ago

      Awesome! Happy I could help! Good luck with your Stata/PCA journey!

  • @mohammadtaufan9914 11 months ago

    Hello, can I ask you one little question? Is there a way to create a plot using the factors shown at 9:29? Thanks in advance.

    • @financefundamentals 11 months ago

      Remember that you would realistically be limited to a maximum of 3 factors if you wanted to visualise a plot. Here there are 4, which is why the source text used for this video does not try to show such a plot. 4-dimensional plots on a 2-D piece of paper are not strictly speaking impossible, but are unavoidably messy and hard to interpret.

    • @mohammadtaufan9914 11 months ago

      @financefundamentals First, thank you for replying. Your answer makes sense: a plot of these factors would convey little to no information. What I had in mind was a time series graph with one line per factor (X axis: the time variable; Y axis: the value of the factor loadings). Is there perhaps a tutorial for making such a graph? As always, thanks in advance.

  • @atharalishah4951 1 year ago

    Hello sir, can you please explain why X11 is eliminated for cross-loading even though its values are not the same in both columns? They are in fact only close to each other. If that is the criterion, other variables also have values that are close to each other, so why are they not dropped? Thanks.

    • @financefundamentals 1 year ago +1

      [Time stamp: issue starts around 9:55] Take a careful look at all the loadings. Notice that for all variables, except for X11, there is one (and only one) factor that has a high loading. X11 is different. It does not have any loading that is as high as any of the others, with a maximum loading of only 0.6420. But that is not the main problem. Even worse, it has TWO loadings around 0.59 to 0.64. This is called a cross-loading. So X11 is dropped. A cross-loading is NOT defined as two loadings that are exactly the same. Instead you are looking for two or more high(ish) loadings on a single variable, each of which exceeds your chosen threshold for a significant loading.
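
The rule described above can be sketched as a small Python check (not Stata): flag any variable with two or more absolute loadings at or above a threshold. The function name, the 0.5 threshold, and the example table are all assumptions made for illustration. Only the values 0.642 and 0.59 echo the X11 figures mentioned in the reply; which factors they sit on, and every other number, is made up.

```python
def find_cross_loadings(loadings, threshold=0.5):
    """Return {variable: [factor numbers]} for every variable that has
    two or more absolute loadings at or above the threshold."""
    flagged = {}
    for name, row in loadings.items():
        high = [i + 1 for i, value in enumerate(row) if abs(value) >= threshold]
        if len(high) >= 2:
            flagged[name] = high
    return flagged


# Hypothetical 4-factor loading table, for illustration only.
# "X_a" and "X_b" are invented variables with one clear high loading each.
# For "X11", only the magnitudes 0.642 and 0.59 come from the thread;
# their factor positions and the small loadings are assumptions.
example = {
    "X_a": [0.90, 0.05, 0.10, 0.02],
    "X_b": [0.08, 0.85, -0.03, 0.12],
    "X11": [0.10, 0.59, 0.07, 0.642],
}

print(find_cross_loadings(example))
```

With these made-up numbers only X11 is flagged, because it is the only variable with two loadings above the threshold. Raising or lowering `threshold` changes what counts as a cross-loading, so it should match whatever cut-off you have chosen for a significant loading.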

  • @Mimi-nr6jx 1 year ago +1

    How do you use the loadings to create an index please?

    • @financefundamentals 1 year ago

      There are a number of methods. I personally have used the approach in Anderson, TW and Rubin, H. 1956. Statistical inference in factor analysis. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 5:111-150.
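
As a rough illustration of one of the simpler methods, and explicitly not the Anderson and Rubin approach cited above, the Python sketch below builds an index as a loading-weighted sum of standardized variables. The variable names, data values, and weights are all hypothetical; in practice the weights would be the loadings of the retained factor or component from your own results.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert raw values to z-scores using the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def weighted_index(data, weights):
    """Loading-weighted sum of standardized variables, one value per observation.

    data:    {variable name: list of observations}
    weights: {variable name: loading on the factor of interest}
    """
    z = {name: standardize(data[name]) for name in weights}
    n_obs = len(next(iter(z.values())))
    return [sum(weights[name] * z[name][i] for name in weights) for i in range(n_obs)]


# Hypothetical data (4 observations) and hypothetical loadings.
data = {
    "x1": [2.0, 4.0, 6.0, 8.0],
    "x2": [1.0, 3.0, 2.0, 6.0],
}
weights = {"x1": 0.8, "x2": 0.6}

index = weighted_index(data, weights)
for i, value in enumerate(index, start=1):
    print(f"observation {i}: index = {value:.3f}")
```

Because every variable is standardized first, the index is centred on zero, and observations that score high on the heavily loaded variables get high index values. More refined scoring methods, including the one cited above, weight the variables differently.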

    • @Mimi-nr6jx 1 year ago +1

      Thank you!