Principal Component Analysis (PCA) in Excel using meaningful data with detailed calculations

  • Published: 21 Aug 2024
  • To view my screen clearly in this video, set the video quality to HD-720p. The enclosed slides on the theory, including the derivation, are available here:
    drive.google.c...
    PCA is a statistical technique that linearly transforms the variables into a new set of uncorrelated variables while preserving the total variance of the original variables.
    In this video, I apply and implement Principal Component Analysis (PCA), one of the oldest statistical techniques in data analysis and one that is widely used as an unsupervised machine learning method.
    I use Microsoft Excel to illustrate the calculations step by step. The subject of my analysis is the sixteen German states, each described by four variables.
    ----
    Note that I used Microsoft Excel 2013 in the presentation. You can download the Excel file using the following link. It also contains an additional macro for the optimisation part that works for newer versions of Excel.
    drive.google.c...
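The summary of PCA above (a linear transformation into uncorrelated variables that keeps the total variance) can be checked numerically. This is a minimal Python/NumPy sketch, not the video's Excel workbook: the 16 x 4 data matrix is random synthetic data standing in for the German states table, and the eigenvectors are obtained directly with numpy.linalg.eigh rather than with Solver.

```python
import numpy as np

# Synthetic stand-in for the 16 states x 4 variables table
# (NOT the data used in the video; random numbers for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 4)) @ rng.normal(size=(4, 4))

# Centre each variable and form the 4x4 sample covariance matrix C.
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending
# order, so reverse them to put the largest-variance component first.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], eigvecs[:, order]   # columns of V are v1..v4

# Principal component scores: project the centred data onto the eigenvectors.
scores = Xc @ V
C_scores = np.cov(scores, rowvar=False)

print(np.round(C_scores, 6))        # off-diagonal entries ~0: new variables are uncorrelated
print(np.trace(C), eigvals.sum())   # total variance is the same before and after
```

The covariance matrix of the scores is diagonal, with the eigenvalues on the diagonal, which is exactly the "uncorrelated, same total variation" property stated in the description.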

Comments • 18

  • @newsupdates3622 3 years ago +1

    Stopped by to say, thank you very much!

  • @suganthiganesh6101 3 years ago +1

    Thank you. Can you please share the Excel file?

    • @hmhlc 3 years ago

      Sure, I will upload it over the weekend and add the link to the video description.

    • @suganthiganesh6101 3 years ago

      @hmhlc Thank you for the reply. I’m waiting 😊

    • @hmhlc 3 years ago

      @suganthiganesh6101 Uploaded; the link is in the video description.

    • @suganthiganesh6101 3 years ago

      @hmhlc Thank you so much. I will check it.

    • @suganthiganesh6101 3 years ago +1

      @hmhlc Hi, I am facing a problem with Solver. This is the error I get:
      "The Linearity conditions required by this LP solver are not satisfied."
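That message is what Excel Solver reports when the Simplex LP solving method is selected for a model that is not linear. In the eigenvector problem, the target cell v * C * Transpose(v) is quadratic in the entries of v, and the normalisation constraint v * Transpose(v) = 1 is quadratic as well, so presumably choosing a nonlinear method such as GRG Nonlinear in the Solver Parameters dialog avoids the error (this sketch does not check which method the workbook's macro selects). The Python/NumPy sketch below only demonstrates the nonlinearity; the matrix C is a hypothetical example, not the video's covariance matrix.

```python
import numpy as np

# Hypothetical symmetric 4x4 matrix standing in for the covariance matrix C.
C = np.array([[4.0, 1.0, 0.5, 0.0],
              [1.0, 3.0, 0.2, 0.1],
              [0.5, 0.2, 2.0, 0.3],
              [0.0, 0.1, 0.3, 1.0]])

def objective(v):
    """Solver's target cell: v * C * Transpose(v), with v a 1x4 row vector."""
    return float(v @ C @ v)

def norm_constraint(v):
    """Left-hand side of the constraint v * Transpose(v) = 1."""
    return float(v @ v)

v = np.array([0.5, 0.5, 0.5, 0.5])

# A linear function would satisfy f(2v) == 2 * f(v).
# Both functions here scale by a factor of 4 instead: they are quadratic.
print(objective(2 * v) / objective(v))              # 4.0
print(norm_constraint(2 * v) / norm_constraint(v))  # 4.0
```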

  • @yagusti_n 2 years ago

    Sir, please explain how to get the eigenvector values in your Excel file.

    • @hmhlc 2 years ago

      I calculated the first two eigenvectors (minutes 9 to 11) using the Excel Solver add-in and skipped the third and fourth in the video. Finding each eigenvector is an optimisation problem, so I had to run Solver four times, once per eigenvector.
      To follow the steps, please refer to the macro code in the Excel file: open the workbook, press Alt + F11 to open the VBA editor, and then open Module 2. I hope this helps; please let me know if you would like me to explain the maths/logic behind it.
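Why does maximising with Solver land on an eigenvector? A short Lagrange-multiplier derivation makes this explicit. It assumes v is a 1x4 row vector and C is the symmetric 4x4 covariance matrix; this notation and the multiplier lambda are introduced here for illustration.

```latex
\max_{v}\; v C v^{\top} \quad \text{subject to} \quad v v^{\top} = 1

\mathcal{L}(v, \lambda) = v C v^{\top} - \lambda \,(v v^{\top} - 1)

\frac{\partial \mathcal{L}}{\partial v} = 2\, v C - 2 \lambda\, v = 0
\;\;\Longrightarrow\;\; v C = \lambda v
\;\;\Longleftrightarrow\;\; C v^{\top} = \lambda v^{\top} \qquad (C = C^{\top})

v C v^{\top} = \lambda \, v v^{\top} = \lambda
```

So every point where the constrained optimum can occur is an eigenvector of C, the objective there equals the corresponding eigenvalue, and the maximum is the largest eigenvalue. Each later Solver run adds orthogonality constraints to the vectors already found, which restricts the search to the remaining eigenvectors.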

    • @yagusti_n 2 years ago

      @hmhlc Yes, sir, please explain the maths/logic behind it.

    • @hmhlc 2 years ago +1

      @yagusti_n Define the following:
      'C' is the covariance matrix of your data. C is an n x n matrix, where n = 4 in the example I provided.
      'v1' is the first eigenvector, written as a row vector with dimension 1x4. Its transpose, Transpose('v1'), therefore has dimension 4x1; it is a column.
      The product of the following matrices/vectors
      'v1' * C * Transpose('v1') = lambda1
      is therefore a single number (1x4 * 4x4 * 4x1 = 1x1).
      Different vectors v1 give different values of lambda1. The maximum value that lambda1 can reach is the largest eigenvalue of C, and the vector v1 that produces this maximum is the corresponding (first) eigenvector. Finding it is therefore an optimisation problem with the constraint that v1 is normalised (i.e. v1 * Transpose(v1) = 1).
      After finding the first eigenvector, the second eigenvector, v2, is the one that maximises
      'v2' * C * Transpose('v2') = lambda2
      subject to the constraints that v2 is also normalised (i.e. v2 * Transpose(v2) = 1) and that v2 is orthogonal to v1 (i.e. v1 * Transpose(v2) = 0).
      And so on: v3 must be normalised and orthogonal to both v1 and v2, and v4 must be normalised and orthogonal to v1, v2 and v3.
      I hope this helps.
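The sequence of constrained problems described above (maximise v * C * Transpose(v) with v normalised, then repeat with v orthogonal to every vector already found) can be sketched in Python/NumPy. Instead of Excel Solver, this sketch swaps in power iteration with Gram-Schmidt projection, a different but standard way to solve the same problems when C is a covariance matrix (so its eigenvalues are non-negative); the matrix values and the helper name next_eigenvector are hypothetical. The results are compared with numpy.linalg.eigh.

```python
import numpy as np

# Hypothetical symmetric 4x4 covariance matrix (not the video's data).
C = np.array([[4.0, 1.0, 0.5, 0.0],
              [1.0, 3.0, 0.2, 0.1],
              [0.5, 0.2, 2.0, 0.3],
              [0.0, 0.1, 0.3, 1.0]])

def next_eigenvector(C, previous, iters=2000, seed=0):
    """Maximise v C v^T subject to v v^T = 1 and v orthogonal to every
    vector in `previous`, using power iteration (a stand-in for Solver)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=C.shape[0])
    for _ in range(iters):
        for p in previous:            # enforce orthogonality: v * p^T = 0
            v -= (v @ p) * p
        v = C @ v                     # power step pulls v toward the largest remaining eigenvalue
        v /= np.linalg.norm(v)        # enforce normalisation: v * v^T = 1
    for p in previous:                # final clean-up projection
        v -= (v @ p) * p
    return v / np.linalg.norm(v)

vectors = []
for k in range(4):                    # one optimisation per eigenvector, like the four Solver runs
    vectors.append(next_eigenvector(C, vectors, seed=k))

lambdas = [float(v @ C @ v) for v in vectors]   # lambda_k = v_k C v_k^T
print(np.round(lambdas, 6))
print(np.round(np.linalg.eigh(C)[0][::-1], 6))  # reference eigenvalues, largest first
```

The two printed rows should agree. The eigenvectors themselves match eigh's only up to sign, since v and -v give the same value of v * C * Transpose(v).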

    • @yagusti_n 2 years ago

      @hmhlc Sir, I still don't understand how to get the eigenvector values. If there are 4 variables, that means there are 4 eigenvectors, so I don't understand why, in your Excel explanation, the eigenvectors (v1, v2, v3, v4) take up 4x4 values.
      By the way, thank you very much for answering my previous question so clearly.

    • @hmhlc 2 years ago +1

      @yagusti_n Sorry, I might not be the best person to explain this topic, but I will try.
      Each eigenvector is a vector of four values. Since there are four eigenvectors, you have to find 4x4 = 16 values.
      Set my example aside for a moment and consider a simpler, two-dimensional problem. You can represent and plot any point (x,y) in 2-D as follows:
      (x,y) = x * (1,0) + y * (0,1).
      The two vectors (1,0) and (0,1) form a basis of R^2. Each vector has two values, so the basis has 2x2 = 4 values in total. In the same way, the four eigenvectors v1, v2, v3, v4 form a basis of R^4, which is why there are 4x4 = 16 values.
      Have you checked the slides linked in the video description? They include the explanation and derivation of the maths.
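The basis analogy can be made concrete in a few lines of Python/NumPy: the 2-D example with (1,0) and (0,1), and then the same idea in 4-D, where the four eigenvectors (16 numbers in total) serve as the basis. The covariance matrix and the data point p below are hypothetical values chosen for illustration.

```python
import numpy as np

# The 2-D example from the comment: (x, y) = x * (1, 0) + y * (0, 1).
x, y = 3.0, -2.0
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(x * e1 + y * e2)                 # [ 3. -2.]

# Same idea in 4-D with eigenvectors as the basis. The 4x4 matrix is a
# hypothetical covariance matrix; its 4 eigenvectors hold 4 x 4 = 16 numbers.
C = np.array([[4.0, 1.0, 0.5, 0.0],
              [1.0, 3.0, 0.2, 0.1],
              [0.5, 0.2, 2.0, 0.3],
              [0.0, 0.1, 0.3, 1.0]])
_, V = np.linalg.eigh(C)               # columns of V are the eigenvectors
print(V.shape, V.size)                 # (4, 4) 16

# Any centred data point p with 4 variables can be rebuilt from its
# coordinates in the eigenvector basis: p = sum_k score_k * v_k.
p = np.array([1.5, -0.3, 0.8, 2.0])    # hypothetical centred data point
scores = V.T @ p                       # score_k = p . v_k (the basis is orthonormal)
rebuilt = sum(scores[k] * V[:, k] for k in range(4))
print(np.allclose(rebuilt, p))         # True
```

In PCA, these coordinates (scores) are the values of the new, uncorrelated variables for that data point.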