Principal Component Analysis (PCA) [Matlab]

  • Published: Oct 10, 2024
  • This video describes how the singular value decomposition (SVD) can be used for principal component analysis (PCA) in Matlab.
    Book Website: databookuw.com
    Book PDF: databookuw.com/...
    These lectures follow Chapter 1 from: "Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control" by Brunton and Kutz
    Amazon: www.amazon.com...
    Brunton Website: eigensteve.com
    This video was produced at the University of Washington

Comments • 67

  • @starriet
    @starriet 1 year ago +7

    For future reference: in the first part of the video, X's _columns_ (not rows) are the individual points (correction at 5:02: every column, not every row, of the mean matrix is the average of X).
    Also note that the code uses the 'svd' function, not the 'pca' function.
    This can be confusing because Prof. Brunton says in the previous lecture (the first video on PCA) that PCA assumes each 'row' represents an individual (e.g. a person), in contrast to the SVD, which assumes each 'column' does.
    *_BUT,_* in the second part (ovarian cancer), even though the code uses the 'svd' function, the 'obs' matrix is 216x4000 (216 patients), where each 'row' represents an individual patient. Thus, here, U and V play the roles of V and U in the first part of the lecture, respectively.
    Also, in the for loop, the code plots each patient (each dot) on the 3 "principal" axes (in Matlab, A' means the conjugate transpose of A).
    *_However,_* the code calculates dot products of two long vectors (4000 elements, and this can be even larger in other examples).
    We _don't_ need this calculation because U already contains the same values up to a scale factor: since obs*V = U*S, the dot product with the k-th column of V equals S(k,k)*U(i,k) (this U would have been V if each patient were represented by a column, not a row).
    So we can use S(1,1)*U(i,1), S(2,2)*U(i,2), S(3,3)*U(i,3) for x, y, z in the for loop instead of calculating dot products; plotting U(i,1), U(i,2), U(i,3) alone gives the same picture with rescaled axes.
    (I don't use MATLAB, but it should work. In Python, the only differences would be that indices start from 0 and that square brackets replace parentheses.)
    Still, knowing why those dot products (a "projection" onto orthonormal vectors, in this case) work is important for understanding SVD and PCA.
    Anyway, thanks a lot for this great series of lectures. Awesome.
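
    The shortcut described in this comment can be checked numerically. The following is a minimal sketch, not the lecture's code: it assumes the loop computes x = V(:,1)'*obs(i,:)' and so on, and it uses a small random matrix as a stand-in for the 216x4000 'obs' data. The names projLoop and projDirect are made up for illustration.

    ```matlab
    % Synthetic stand-in for 'obs' (rows = patients); the real data is not used here.
    rng(0);
    obs = randn(20, 50);
    [U, S, V] = svd(obs, 'econ');

    % Projection by explicit dot products with the first three columns of V.
    projLoop = zeros(size(obs, 1), 3);
    for i = 1:size(obs, 1)
        projLoop(i, 1) = V(:, 1)' * obs(i, :)';
        projLoop(i, 2) = V(:, 2)' * obs(i, :)';
        projLoop(i, 3) = V(:, 3)' * obs(i, :)';
    end

    % The same coordinates read off the SVD, because obs*V = U*S.
    projDirect = U(:, 1:3) * S(1:3, 1:3);

    maxDiff = max(abs(projLoop(:) - projDirect(:)));
    fprintf('max difference between the two projections: %g\n', maxDiff);
    ```

    The difference should be at the level of floating-point round-off, which is why the long dot products can be skipped once U and S are available.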

    • @armbusta
      @armbusta 1 year ago +2

      THANK YOU. Your comment has saved my sanity, it was the final puzzle piece that made it all click. This should be included in the description of the video word-for-word.

  • @ratnaa6326
    @ratnaa6326 4 years ago +10

    thank you so much! my understanding increased exponentially when you explained with the ovarian cancer example.

  • @MageshJohn
    @MageshJohn 3 years ago +1

    Excellent! Your 15-minute video really captures the majority of the 100 years of information on PCA. SVD works!

  • @nwxxzchen3105
    @nwxxzchen3105 4 years ago +6

    Code is more understandable for me, thanks for the great job. This example shows what PCA looks like geometrically. There's also an implicit relationship between the shape of the data points and the transformation performed by the centered matrix, which isn't mentioned in linear algebra courses.

  • @drdlecture6137
    @drdlecture6137 3 years ago +3

    Thanks for this excellent lecture. I have a question. Why didn't you subtract the mean of the rows before computing the SVD, as in the previous example and as explained in the PCA video?

  • @sapertuz
    @sapertuz 4 years ago +4

    I just don't understand why, for the ovarian cancer example, you don't do the preprocessing steps (mean subtraction and division by sqrt(Nmeas)).

    • @jhonportella5618
      @jhonportella5618 3 years ago +2

      Well, I will try to answer your question. Actually, what you are reconstructing in the Gaussian data example with SVD is a scaled version of the standard deviation: STD/SQRT(n). This is done because he is trying to plot different confidence intervals. As you may remember, the equation for the confidence interval is Mean + Z*STD/SQRT(n), which is what he is plotting in the red circles. So with a Z value of 3, you can capture almost 99% of the data, which is what we see in the plots. Thus that normalization term is only there because of what that particular code does. For PCA you don't always have to normalize or standardize the data; it is only needed when you are working with correlations or when the application demands it. In fact, the SVD itself can be computed without centering the data by the mean, although the result then no longer matches covariance-based PCA exactly.

  • @lesh2956
    @lesh2956 3 years ago

    Thank you so much for this video. The explanation with the example is gold.

  • @rakulansivanesapillai6215
    @rakulansivanesapillai6215 4 years ago +3

    Lovely setup and great presentation. Thanks

  • @haideralishuvo4781
    @haideralishuvo4781 3 years ago +2

    Great video, but one confusion: aren't we supposed to subtract the mean before computing the SVD in the ovarian cancer case?

    • @jhonportella5618
      @jhonportella5618 3 years ago

      He is using the SVD as a way to compute PCA. One difference between the SVD and classical PCA in its eigendecomposition formulation is that the SVD (or truncated SVD) can be computed without centering the data by the mean, although the leading directions then reflect the mean as well as the variance.
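
      To see what centering changes, here is a small sketch on synthetic data (it is not the ovarian data set, and every variable name below is made up for illustration). With mean subtraction, the SVD reproduces the eigen-decomposition of the sample covariance; without it, the first right singular vector tends to point toward the data mean when the mean is far from the origin.

      ```matlab
      rng(1);
      n = 500;
      % Rows = observations; the mean [10 5] is deliberately far from the origin.
      X = randn(n, 2) * [2 0; 0 0.5] + repmat([10 5], n, 1);

      % Centered SVD.
      B = X - repmat(mean(X, 1), n, 1);
      [~, Sc, Vc] = svd(B, 'econ');
      varFromSvd = diag(Sc).^2 / (n - 1);

      % Eigen-decomposition of the sample covariance, sorted in descending order.
      [W, D] = eig(cov(X));
      [lambda, order] = sort(diag(D), 'descend');
      W = W(:, order);

      % Uncentered SVD.
      [~, ~, Vu] = svd(X, 'econ');
      meanDir = mean(X, 1)' / norm(mean(X, 1));

      alignCentered   = abs(Vc(:, 1)' * W(:, 1));   % 1 means identical direction
      alignUncentered = abs(Vu(:, 1)' * meanDir);   % close to 1 means "points at the mean"
      fprintf('centered vs covariance eigenvector: %.6f\n', alignCentered);
      fprintf('uncentered vs mean direction:       %.6f\n', alignUncentered);
      ```

      Whether skipping the centering is acceptable depends on the data; if the column means are already close to zero, the two results will be similar.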

  • @DewanggaPrabowo
    @DewanggaPrabowo 4 years ago +1

    such a nice presentation
    This is what I'm looking for and what I'm coming for
    Thanks...

  • @abolfazlabbasi4854
    @abolfazlabbasi4854 4 years ago +2

    High quality presentation, Thanks for sharing.

  • @ElPrestigo
    @ElPrestigo 2 years ago

    BTW, to get a legend for the ovarian data you can use plot handles as follows:
    h = zeros(2,1);                 % one handle per group
    ...
    if strcmp(grp{i}, 'Cancer')     % strcmp is safer than == for comparing strings
        h(1) = plot3(...);
    else
        h(2) = plot3(...);
    end
    ...
    legend(h, 'Cancer', 'Normal')

  • @fermijman
    @fermijman 3 years ago +2

    Excellent lecture. Question: once you have determined the magnitude of the principal components, is there a way of determining which features they represent in your original data? For instance, determining which features from the cancer data correlate most strongly with a cancer diagnosis?

  • @mataFot
    @mataFot 1 year ago

    Mr. Steve, first of all I would like to thank you for this video. Secondly, I would like to ask a question, because it is my first time studying PCA. At 5:47 in your video you explain that you divide B by the square root of nPoints. Where does this come from? Did you do it because you wanted to minimize the value of the division?
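
    One way to see the role of the square root: B*B'/nPoints is the covariance of the centered data, so the singular values of B/sqrt(nPoints) are the standard deviations along the principal directions, independent of how many points were sampled. The sketch below uses synthetic data with assumed standard deviations of 2 and 0.5; these values and the variable names are illustrative choices, not taken from the source.

    ```matlab
    rng(2);
    nPoints = 10000;
    sig = [2; 0.5];                          % assumed standard deviations
    X = diag(sig) * randn(2, nPoints);       % columns = points
    Xavg = mean(X, 2);
    B = X - Xavg * ones(1, nPoints);         % mean-subtracted data

    [~, S1, ~] = svd(B, 'econ');                   % unscaled: grows with sqrt(nPoints)
    [~, S2, ~] = svd(B / sqrt(nPoints), 'econ');   % scaled: estimates the standard deviations

    disp(diag(S1)');
    disp(diag(S2)');
    ```

    The scaled singular values should come out close to the assumed 2 and 0.5, while the unscaled ones are larger by a factor of sqrt(nPoints).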

  • @yourswimpal
    @yourswimpal 3 years ago +3

    Great explanation, thanks! May I know how to tell which genes have the highest "impact" with regard to PC 1 (in the ovarian cancer example)? Is there a way I can tell from matrix U or matrix V? I just learnt PCA 3 days ago, sorry if this is a noob question :)

    • @Eta_Carinae__
      @Eta_Carinae__ 2 years ago

      You tell by the sigma matrix, afaik. Look for the largest singular value in sigma and find its corresponding singular vector in V, and that's your most significant factor.
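
      A common way to act on this is to rank the entries of the first column of V by magnitude: when rows are patients and columns are genes, each entry of V(:,1) is the weight of one gene in PC 1. The sketch below uses made-up data in which a hypothetical "gene 5" is given much larger variance than the others; it also mean-centers the columns, which is an assumption and not necessarily what the lecture code does.

      ```matlab
      rng(3);
      nPatients = 30;
      nGenes = 8;
      obs = randn(nPatients, nGenes);
      obs(:, 5) = 10 * randn(nPatients, 1);    % hypothetical gene 5 dominates the variance

      B = obs - repmat(mean(obs, 1), nPatients, 1);
      [U, S, V] = svd(B, 'econ');

      % Rank genes by the magnitude of their weight in the first principal direction.
      [~, ranking] = sort(abs(V(:, 1)), 'descend');
      topGene = ranking(1);
      fprintf('gene with the largest weight in PC 1: %d\n', topGene);
      ```

      A large weight says that a gene contributes strongly to that direction of variance; it does not by itself say that the gene separates cancer from normal samples.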

  • @Daniel88santos
    @Daniel88santos 4 years ago +1

    Hello Steve, first I would like to thank you for your effort in sharing and teaching these amazing techniques. I would also like to ask whether you could make a video on how to find the best r value using the Gavish-Donoho method in Python. This would be very useful for me. Thanks a lot and keep going.

    • @Eigensteve
      @Eigensteve 4 years ago +3

      Thanks for the comment. Yes, that video is coming up (in Matlab and Python). Just need a few days to process and upload.

    • @mybean1096
      @mybean1096 4 years ago +1

      Python? What snakes gotta do with it?

  • @Sheepyyyyyyyy
    @Sheepyyyyyyyy 4 years ago +1

    Thank you so much! I wrote the same code as you, but my 2 lines are not perpendicular. In the code, the circle must be red ('r-') and the line must be cyan ('c-').

  • @zhengyangkrisweng3338
    @zhengyangkrisweng3338 3 years ago

    Just to clarify, when you mention the energy of the statistical data, you're referring to the extent to which it captures the trend in the data, right?

  • @sohummisra8969
    @sohummisra8969 4 years ago +2

    Wonderful series of lectures. I have a question regarding using the top 3 PCs. Why are you not scaling the top 3 eigenvectors by their associated values from S in order to find x, y and z?

    • @gzitterspiller
      @gzitterspiller 1 year ago

      Because S only tells you how much variance there is in those directions; he just projects the data points onto those directions and plots them.

  • @yaraali4493
    @yaraali4493 4 years ago +1

    Thanks.
    How can I apply a varimax rotation to the PCs in Matlab?

  • @justinli19901027
    @justinli19901027 4 years ago

    love the series, thank you

  • @chichungchan6766
    @chichungchan6766 3 years ago

    Really great video! However, can I relate PC1, PC2 and PC3 back to the actual variables?

  • @AdityaDiwakarVex
    @AdityaDiwakarVex 4 years ago +1

    I got a little bit confused, what's the intuition behind calculating x, y, and z by doing V times b (observations)? What is x, y, and z showing? Sorry for the silly question, thanks in advance.

    • @Eigensteve
      @Eigensteve 4 years ago +1

      Here, x y and z are just the first three principal components of the data set. So it allows us to visualize how the data scatters in these new V coordinates. There are interesting patterns in V(:,4) and V(:,5) too, but I can't plot in x y z u v coordinates and make sense of it as a puny human stuck in 3D.

    • @AdityaDiwakarVex
      @AdityaDiwakarVex 4 years ago +1

      @@Eigensteve Oh, so it's the data reconstruction but just using 3 of the components rather than all 4000 of them... why do we do V * d or V * obs?

    • @Eigensteve
      @Eigensteve 4 years ago +1

      @@AdityaDiwakarVex V * obs essentially takes the "observations" (i.e. the data) and transforms it into the V coordinate system. (it is also V-transpose * obs, which is an important subtlety when computing these things)

    • @AdityaDiwakarVex
      @AdityaDiwakarVex 4 years ago +1

      @@Eigensteve Oh right, I do see that it is V-transpose. Thank you so much, that cleared it up. You are easily one of the best professors/teachers I've come across, thank you!

  • @mybean1096
    @mybean1096 4 years ago +2

    Been trying to write a formula to combine both Honey Mustard [ detaSet ] and Ranch BBQ Sauce [ dataSet(2×2) ] as one component while randomly scaling calories and sugar. Don't see what The Matrix movie has to do with anything though.

    • @Eigensteve
      @Eigensteve 4 years ago

      Nice. Actually, people do think about food, flavors, and chemistry in PCA coordinates. Some neat and unexpected food pairings have been discovered this way.

  • @jalilkhan321
    @jalilkhan321 3 years ago

    How is Proper Orthogonal Decomposition, as used in fluids, different from PCA or SVD?

  • @Assault137
    @Assault137 4 years ago +1

    Off-topic, but how do you get the IDE to be dark for your presentations?

  • @mataFot
    @mataFot 1 year ago

    Also, is there any way to get this code for practice? Thank you in advance!

  • @yangyangliu9226
    @yangyangliu9226 4 years ago

    Thanks Steve for the amazing explanation! One thing I don't quite understand: why does U*S*[cos(theta); sin(theta)] capture 1 std of the data?

    • @jhonportella5618
      @jhonportella5618 3 years ago

      What I think he is actually doing is capturing, in the SVD procedure, the diffeomorphism (fancy terminology, but in a few words it is a linear transformation whose inverse exists in this case) to reconstruct STD/SQRT(N), which is part of the equation for the confidence interval. Then he plots those confidence intervals up to a z value of 3, which corresponds to almost 99% of the data.

    • @Nikh__
      @Nikh__ 2 years ago

      (I know this comment is 2 years old, but...)
      If anyone else is wondering, scroll back to how X is created initially. Note that U only rotates vectors and sigma only stretches them, and that U and sigma come from the SVD of B.
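
      The "rotate and stretch" picture can be checked directly: [cos(theta); sin(theta)] traces a unit circle, S stretches it by the standard deviations, and U rotates it onto the principal axes. Every point of the resulting curve then sits exactly one standard deviation from the mean in the Mahalanobis sense. The sketch below uses synthetic data; the rotation angle, spreads and offset are assumptions chosen for illustration.

      ```matlab
      rng(4);
      nPoints = 5000;
      R = [cos(pi/3) -sin(pi/3); sin(pi/3) cos(pi/3)];          % assumed rotation
      X = R * diag([2 0.5]) * randn(2, nPoints) + [2; 1] * ones(1, nPoints);

      Xavg = mean(X, 2);
      B = (X - Xavg * ones(1, nPoints)) / sqrt(nPoints);
      [U, S, ~] = svd(B, 'econ');

      theta = (0:0.01:1) * 2 * pi;
      circle = [cos(theta); sin(theta)];     % unit circle
      Xstd = U * S * circle;                 % stretched by S, then rotated by U

      % Squared Mahalanobis distance of each curve point with respect to the covariance.
      C = B * B';
      d2 = sum(Xstd .* (C \ Xstd), 1);
      fprintf('Mahalanobis distance range: [%.6f, %.6f]\n', sqrt(min(d2)), sqrt(max(d2)));

      % To draw it over the data: plot(Xavg(1) + Xstd(1,:), Xavg(2) + Xstd(2,:), 'r-')
      ```

      Multiplying Xstd by 2 or 3 gives the wider ellipses in the same way.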

  • @burakyesilyurt9544
    @burakyesilyurt9544 4 years ago

    Is there a convention about signs? I was trying to convince myself, and what confused me is that the T1, T2, T3 (scores) matrices in the code below have the same values with different signs. I found some articles and code about flipping the signs of SVD and PCA results, but I couldn't be sure. I'd be very happy if you could make it clear for me, thanks!
    %% CODE
    clear; close all; clc;
    load fisheriris
    X = meas;
    % X = 5*randn(300, 10);
    [W, D] = eig(X'*X);
    W = W(:, end:-1:1);
    D = D(end:-1:1, end:-1:1);
    T1 = X*W;
    [U, S, V] = svd(X, 'econ');
    T2 = U*S;
    [coeff, score, latent] = pca(X, 'Algorithm', 'svd', 'centered', false);
    T3 = score;
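
    On the sign question: each singular vector (or eigenvector) is only defined up to a factor of -1, so different routines can legitimately return scores that differ by a sign per column. The following sketch uses random data rather than fisheriris, so it needs no toolbox. It applies one possible convention, chosen here for illustration and not claimed to be what 'pca' does internally: flip each column so that the largest-magnitude entry of its loading vector is positive.

    ```matlab
    rng(5);
    X = randn(40, 4);

    % Scores from the eigen-decomposition of X'*X, sorted in descending order.
    [W, D] = eig(X' * X);
    [~, order] = sort(diag(D), 'descend');
    W = W(:, order);
    T1 = X * W;

    % Scores from the SVD.
    [U, S, V] = svd(X, 'econ');
    T2 = U * S;

    % Without any convention, the two agree only up to a sign per column.
    sameUpToSign = max(max(abs(abs(T1) - abs(T2))));

    % Apply the same sign convention to both sets of scores.
    T1fixed = T1;
    T2fixed = T2;
    for k = 1:size(V, 2)
        [~, i1] = max(abs(W(:, k)));
        [~, i2] = max(abs(V(:, k)));
        T1fixed(:, k) = sign(W(i1, k)) * T1(:, k);
        T2fixed(:, k) = sign(V(i2, k)) * T2(:, k);
    end
    maxDiffFixed = max(max(abs(T1fixed - T2fixed)));
    fprintf('difference after fixing signs: %g\n', maxDiffFixed);
    ```

    Any convention works as long as it is applied consistently; the sign carries no information about the data.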

  • @ayaahmed-hc6wu
    @ayaahmed-hc6wu 4 years ago +1

    Great, Steve. I would like to thank you for your effort. Could you please help me with Matlab code for feature extraction from galaxy images using PCA? I searched a lot and did not find anything.

    • @Eigensteve
      @Eigensteve 4 years ago

      I think any old PCA code in Matlab will work if your data is structured as a matrix.

  • @DeepakKumar-tc9iy
    @DeepakKumar-tc9iy 4 years ago

    @Steve I work with spatial time series data (3D: x, y, t; e.g. temperature). I have seen code that reshapes the spatial dimensions into 1D, giving a 2D series, and then applies PCA.
    But I need to work on vector data (e.g. wind), which comes as components (u, v) in 3D, which makes it 4D. Would it make sense to reshape the 3D part (2 spatial dimensions + 1 component dimension) into 1D, which gives 2D data, and then apply PCA?
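
    One common way to set this up is to flatten each component's spatial grid into a column per time step and stack the components vertically, so that each column of the data matrix is one full (u, v) snapshot. The sketch below assumes arrays u and v of size ny x nx x nt filled with random numbers; the sizes and names are placeholders, not from any real data set.

    ```matlab
    rng(6);
    ny = 5; nx = 6; nt = 12;
    u = randn(ny, nx, nt);               % placeholder for the u wind component
    v = randn(ny, nx, nt);               % placeholder for the v wind component

    % Flatten space, keep time as columns, then stack the two components.
    Xu = reshape(u, ny * nx, nt);
    Xv = reshape(v, ny * nx, nt);
    X = [Xu; Xv];                        % (2*ny*nx) x nt snapshot matrix

    % Subtract the time mean and take the SVD.
    B = X - repmat(mean(X, 2), 1, nt);
    [U, S, V] = svd(B, 'econ');

    % Reshape the first spatial mode back onto the grid, one map per component.
    mode1_u = reshape(U(1:ny*nx, 1), ny, nx);
    mode1_v = reshape(U(ny*nx+1:end, 1), ny, nx);
    ```

    If u and v have very different magnitudes or units, it may be worth scaling them before stacking, since the component with the larger variance will otherwise dominate the leading modes.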

  • @HD141937
    @HD141937 4 years ago

    At 8:27, isn't it the columns of V (not U) that point into the directions of maximum variance?

  • @ifan9390
    @ifan9390 3 years ago

    Is singular value decomposition also used in 3-dimensional data plots?

  • @ifan9390
    @ifan9390 3 years ago

    what're the differences between the 2 Dimensional and 3 Dimensional data set plots?

  • @DrAndyShick
    @DrAndyShick 10 months ago

    2:10 Actually, that would be 16 times as much variance
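
    The arithmetic behind the factor of 16, assuming the two directions have standard deviations of 2 and 0.5 (these values are an assumption made here for illustration; the comment itself only states the ratio): variance is the square of the standard deviation, so a 4x ratio in standard deviation is a 16x ratio in variance.

    ```matlab
    sig = [2; 0.5];                   % assumed standard deviations of the two directions
    stdRatio = sig(1) / sig(2);       % ratio of standard deviations
    varRatio = stdRatio^2;            % ratio of variances
    fprintf('std ratio: %g, variance ratio: %g\n', stdRatio, varRatio);
    ```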

  • @roger_island90
    @roger_island90 3 years ago

    Hello sir, I'm using your code to visualize the classification of ECG signals with 3 labels. The diagram it generates is not correct. I think the problem comes from the "for" loop. Please help me fix this; I have tried several times, but to no avail.

  • @alex.ander.bmblbn
    @alex.ander.bmblbn 4 years ago

    dear Steve, I see that in my data set 2 states contribute to 90% of the data, how do I know, which ones?

  • @erikschiferle3385
    @erikschiferle3385 3 years ago

    One other question, in the U/S/V, which index corresponds to PC2? is it V(2,1) or V(2,2). Thank you!

    • @erikschiferle3385
      @erikschiferle3385 3 years ago

      NVM, I think I answered my own question. Variable "V" is 4000x216, so I believe it would be the row "label", if there were one for the ovarian cancer data?

  • @erikschiferle3385
    @erikschiferle3385 3 years ago

    Can someone explain why for the log and cumulative singular value graphs, we have 216 along the x axis? Why is it not 4000 for the number of genetic markers?

    • @zhankunxi1058
      @zhankunxi1058 3 years ago

      Hi Erik. I had the same confusion at the beginning. But going back to Steve's previous video, the singular value plot shows how much variance is captured by each principal component, and the cumulative sum plot shows Sum(lambda_k) divided by the sum of all the lambdas. In terms of dimensions, the B matrix (with the means subtracted) is 216*4000. Through the SVD, U is 216*216, Sigma is 216*216 and V transpose is 216*4000. I think both plots are drawn against the number of sigmas (216 here).
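
      The dimension count is easy to confirm with a random matrix of the same shape. This is only a sketch: the real 'obs' data is replaced by random numbers, and the cumulative curve below uses the singular values directly, which is one common choice and may differ from the exact quantity plotted in the video.

      ```matlab
      rng(7);
      obs = randn(216, 4000);              % same shape as the ovarian data, random content
      [U, S, V] = svd(obs, 'econ');

      disp(size(U));                       % 216 x 216
      disp(size(S));                       % 216 x 216
      disp(size(V));                       % 4000 x 216

      sv = diag(S);                        % at most min(216, 4000) = 216 singular values
      cumEnergy = cumsum(sv) / sum(sv);    % one point per singular value on the x axis

      pc2 = V(:, 2);                       % second principal direction: a whole column of V
      ```

      A 216 x 4000 matrix has rank at most 216, so there are never more than 216 nonzero singular values to plot, however many columns the data has.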

  • @namiramomokhan6414
    @namiramomokhan6414 3 years ago

    Which Matlab version should I use? I have 2014.

  • @td1738
    @td1738 3 years ago

    Where can I find the code? Cheers

  • @vinuthaY-n3b
    @vinuthaY-n3b 5 months ago

    Can i get this matlab code pls

  • @realsemig
    @realsemig 4 years ago

    Me being a simple pleb: That looks like a galaxy!

    • @leif1075
      @leif1075 4 years ago

      No youre right it does.

  • @Alejo10messi
    @Alejo10messi 3 years ago

    16 times more variance in one direction than the other 2:10

  • @Старкрафт2комедия
    @Старкрафт2комедия 2 years ago

    This example is way too complicated. You should stick to something like 10-20 data points for the initial demonstration. Otherwise it's too hard to understand exactly, only in hand-wavy terms.

  • @prabhath8618
    @prabhath8618 4 years ago +1

    can i get the code?

    • @Eigensteve
      @Eigensteve 4 years ago

      All code on databookuw.com