Great question! Any categorical variable (sex, race, what state you were born in, etc.) can be put into a numerical matrix by creating extra columns, one for each category, and then assigning a value of 0 or 1 for each of these categories, depending on which category best describes the person. Maybe a non-human example: If we are building a matrix to describe movies, and we want to include the category of genre, we would need to add extra columns for each movie genre (sci-fi, action, horror, romance, comedy, etc.). And then depending on the movie, we would assign a 1 to the category the movie belongs to and 0's to all of the other categories. Now here is where it can get interesting. Lots of movies are somewhere between these sharp categories. So "The Terminator" is somewhere between sci-fi, action, and a bit of horror, so maybe it would be 0.3, 0.6, and 0.1 in these categories (adding up to 1). Other categories can be more strict, like the make of a car (there is no halfway between a Honda and a Ford). So although there is no perfect categorization, you can define a rough set of groups and then individuals can be partially in multiple categories. This is one way to add categorical data like sex and race to a numerical matrix while allowing for non-exclusive categories. I'm not sure, but I think the US census treats each multiracial combination as its own category, so that entries are only 1's and 0's. So my kids, who are half white and half asian would have a "1" in the multiracial white/asian data column. I'm not 100% sure about how the census works, but this is what I remember.
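To make the encoding above concrete, here is a minimal sketch in Python with NumPy. The genre list and the helper `one_hot` are hypothetical names chosen for illustration; the soft weights (0.3, 0.6, 0.1) are the ones quoted in the reply.

```python
import numpy as np

# Hypothetical movie genres; each becomes its own numeric column.
genres = ["sci-fi", "action", "horror"]

def one_hot(label, categories):
    """Return a row with 1.0 in the column of `label` and 0.0 elsewhere."""
    return [1.0 if c == label else 0.0 for c in categories]

# Strict categories: exactly one 1 per row.
strict = np.array([one_hot(g, genres) for g in ["action", "horror", "sci-fi"]])

# Soft membership, using the weights from the reply (0.3, 0.6, 0.1).
terminator = np.array([0.3, 0.6, 0.1])

X = np.vstack([strict, terminator])
print(X)
print(X.sum(axis=1))  # every row sums to 1
```

Once every column is numeric, column averages (and therefore mean-centering) are well defined.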
Finally, someone who explains statistics in a straight-forward way, whilst communicating in an adult like manner.
The best video on PCA I could find on youtube, no messy blackboards, jokes or oversimplification, just solid explanation, great job.
What the hell. One can download your book for free?! You sir are a saint. I will work thru it and if I like it I will definitely purchase it!! (I'm pretty sure I will like it, because I like all your videos so far)
PS: I am so proud of you guys. You are bringing humanity forward with content like this being free. I encourage everyone who can to purchase content from sources like this
So far this is the best video of PCA explanation.
Steve's explanations are excellent.
Prof. Brunton always delivers the best explanations on the subjects! His videos really help me a lot! Kudos!
Indeed he does ...
I've watched a lot of PCA videos and this is really the best one. You're amazing!
yes he is but do visit statquest
@@TheMangz1611 Bam. Best wishes to anyone who makes teaching intuitive.
These videos are the PCA for data driven engineering!!Thank you for bringing up these series publicly!!
@6:08, Can anybody confirm that C=B*BT instead of C=BT*B. That is because each row of B represents the measurement of a variable (0 mean).
Excellent teaching. I have one question though. When you wrote the covariance matrix of the rows (6:00), because each row is a measurement vector, I thought it was the covariance between the measurements, but then you wrote C=(BT)(B), which is the covariance of the features. Can you explain, please?
From what I could find in PCA literature, it depends on what you have more of (Objects or Variables/Features). Both (BT)(B) and (B)(BT) are possible when doing PCA, and the covariance matrix you calculate depends on this (you always take the larger one).
Here I am six months later, and now I understand my problem. We have a measurement vector M, and every measurement has some features such as age, height, disease, etc. What we are interested in is understanding the distribution and the covariance of these features, to work out, for example, joint or posterior distributions. For example, a positive covariance between age and testing positive for some disease means there is a relation between the two: the higher the age, the higher the risk of this disease. So we need (BT)*B, the variance-covariance matrix between features; then we can find the joint or posterior probability distribution.
@@MilianoAlvez right, which means B^T*B is the covariance of the columns. Brunton, I think, accidentally wrote "rows".
I logged in just for this, which I almost never do xD
I wanted to say: Thank you!
Your video series is great, enjoyable, and helps one get familiar with the topic rapidly. The same applies to the book, which you link to for free. Thank you.
If we do row-wise correlation with respect to B, should it be C=B * B_T instead of B_T * B?
i agree with you
Yeah, he wrote "BTB is the covariance of the rows of B", but I think he meant the columns (the features).
Correct me if I'm wrong, but B transposed multiplied by B sums up the products of mean-centered values. To get the covariance we still need to divide by the number of rows in X, as covariance is defined as
E{(X-E(X))*(Y-E(Y))}, not just the sum of (X-E(X))*(Y-E(Y)) over measurements.
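A quick numerical check of this point, as a sketch in Python with NumPy on made-up random data (rows are measurements, columns are features). It uses the unbiased n - 1 divisor, which is what `np.cov` uses by default; dividing by n instead gives the biased estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # 50 measurements (rows), 3 features (columns)

B = X - X.mean(axis=0)         # subtract each column's mean
n = X.shape[0]

gram = B.T @ B                 # raw sum of products, 3 x 3
C = gram / (n - 1)             # sample covariance (unbiased)

# np.cov treats rows as variables by default, hence rowvar=False.
assert np.allclose(C, np.cov(X, rowvar=False))

# Dividing by a constant rescales the eigenvalues by the same constant.
w_gram = np.linalg.eigvalsh(gram)
w_C = np.linalg.eigvalsh(C)
assert np.allclose(w_gram / (n - 1), w_C)
```

Because the divisor is a scalar, it rescales the eigenvalues but leaves the eigenvector directions unchanged, which may be why it is sometimes dropped on the board.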
Note @ 7:50 regarding CV = VD. The D here is a matrix where all the eigenvalues are on the diagonal.
If there was a Nobel Prize in Education (which there absolutely should be), then you should absolutely win.
It can be helpful to use the names "features" (to refer to the 'n' different pixels in a photo, or the 'n' different characteristics of rats which may predict cancer) and "snapshots" (to refer to the 'm' different measurements (e.g. people's photos, or rats)).
Then, it doesn't matter whether you have the "features" as columns or rows - Corr(feat) = feature-wise correlation matrix, where entries represent the correlation between two features, and the eigenvectors of this matrix are the "eigenfeatures". If you happen to have the "features" as columns, then Corr(feat) = [X^T][X]. If you happen to have the "features" as rows, then Corr(feat) = [X][X^T].
Similarly, for the "snapshots" we have the Corr(snap) = snapshot-wise correlation matrix, where entries represent the correlation between two snapshots, and the eigenvectors of this matrix are the "eigensnapshots". Again, depending on whether the "snapshots" are in the rows or columns of X, you can find Corr(snap).
This also helps when doing PCA, as you generally wish to reduce the number of "features", and are therefore interested in determining the eigenvectors of Corr(feat). No need to sweat over how your data is organized in the matrix X, or any annoying conventions for PCA.
In short, it is easier to think of "features & snapshots" than "rows & columns".
I can't agree more. It is inconsistent between the video and the code. In the video, he emphasized that each row has to be the features collected from a single individual. If you have a 2*10000 matrix, you have 2 individuals and 10000 features. However, a matrix of 2*10000 is generated in the code, which actually means 2 features and 10000 individuals. It took me a really long time to figure out what happened.
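A shape check can settle which product is the feature-wise one for a given layout. The sketch below uses Python with NumPy on random data; the helper `feature_gram` and its flag are hypothetical names, not anything from the video or the book.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, n_snapshots = 2, 10000

# Layout A: snapshots in rows (10000 x 2), the convention stated in the video.
X_rows = rng.normal(size=(n_snapshots, n_features))
# Layout B: snapshots in columns (2 x 10000), as in the 2 x 10000 example.
X_cols = X_rows.T

def feature_gram(X, snapshots_in_rows):
    """Mean-center each feature, then return the feature-by-feature matrix."""
    if snapshots_in_rows:
        B = X - X.mean(axis=0, keepdims=True)
        return B.T @ B
    B = X - X.mean(axis=1, keepdims=True)
    return B @ B.T

G_a = feature_gram(X_rows, snapshots_in_rows=True)
G_b = feature_gram(X_cols, snapshots_in_rows=False)
print(G_a.shape, G_b.shape)      # both (2, 2)
assert np.allclose(G_a, G_b)
```

Whichever layout is used, the feature-wise matrix is the one whose side length equals the number of features.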
4:45 here you’re summing over the elements of each row, but in the book on page 21 it says x_j = sum_i X_ij, so you’re building the sum of each column. Is it a typo?
You explain complicated math in a brilliant way. Thank you so much
Do you have a patreon? How can I help support this content? Just these materials on Ch1 and 2 have been amazing. Will it extend to additional chapters?
I don't, but I really appreciate the kind words! This will extend to all of the chapters eventually.
Hi professor, just one question. If your X matrix has samples in the rows and sample features in the columns, then shouldn't the correct approach be to calculate the column means of X, instead of the row means, and subtract from each column value its respective column mean? That way, each X column (feature) has mean = 0.
I think, as he explained at the beginning, this mix-up happened due to the difference in representing the data in SVD literature and PCA literature. I am rewatching this lecture after watching the next one where MATLAB demonstration is given. The code does exactly that, take column-mean of each person and then subtract. I came down in the comment section to check if somebody else had this confusion also.
Nobody gonna say anything about how this man just wrote all of that backwards flawlessly?
I suspect he is fixing it in post production by flipping the colours as a layer.
I was wondering the same.. I guess some kind of mirroring is used there. Otherwise it’s a lit backwards writing
Thank you for the lecture, its been very helpful. On an unrelated note, how do you write backwards with such ease?
They probably just mirror the video
I was thinking the same!
@@LTForcedown no, he writes backwards.
it is a mirroring technique - he cannot write backwards with such ease
I think he is using a special technology which shows mirror image of his board in front of him
Is this done with a glass whiteboard and the recording is mirrored?
Doesn't C have to be B*B^T? B^T * B is the covariance of the columns, if I get this correctly.
Indeed, C should be multiplied by 1/N ... or maybe my memory is wrong.
No. (and you yourself pointed out why:-).
Each column represents a single feature, e.g. "age" for the entire population.
Each row contains the features of a single individual, e.g. "age, weight, sex,...".
In order to get an estimate of, say, cov("age","weight"), one has to multiply the columns,
"age"^T * "weight"
and divide by, say N-1 (roughly the number of individual samples in each column as suggested by Kieng Toan).
That is why B^T * B is due here.
May I suggest a YouTube playlist of Victor Lavrenko that actually explains the general motivation driving PCA and the specific motivation for searching for the eigenvectors of the covariance matrix.
ruclips.net/p/PLBv09BD7ez_5_yapAg86Od6JeeypkS4YM
esp. video #7
@Elad M @Kottel Kannim You're both right. B^T * B is the covariance of the columns, and that's what we want... i.e., covariance of the features (or variables). Brunton mistakenly writes "covariance of the rows of B".
The last part of the video, on how SVD and PCA are related, is really in a class of its own. It shows that experts should run video lectures.
2:09 I just don't get it: Let's say we measured 1600 samples. Each sample measurement resulted in a concentration value for each of 26 elements. What would that look like in the matrix?
So my matrix would have 1600 rows and 26 columns, right?
Amazing explanation, went through a lot of videos but this one is the best
Still confused how do we get BV=USigma🤔🤔 since Vt doesn’t cancel with V right?
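A short derivation may help here, assuming the (economy) SVD $B = U \Sigma V^T$ with orthonormal columns in $V$. Nothing is cancelled across the equation; multiplying on the right by $V$ simply produces the product $V^T V$, which is the identity.

```latex
\begin{aligned}
B   &= U \Sigma V^{T} \\
B V &= U \Sigma V^{T} V \\
    &= U \Sigma \, I \qquad \text{since } V^{T} V = I \\
    &= U \Sigma
\end{aligned}
```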
Steve is able to explain PCA from a classical statistical point of view. Very clear.
I believe that there's a typo. The principal components are the columns of V.
Thank you Dr. Brunton! I just bought your book and am reviewing the PCA chapter. There is a difference in your definition of principal components between this video and your textbook. Can you please clarify?
In the textbook (2nd edition) in Section 1.5, after Eq 1.40, you state that "the columns of the eigenvector matrix V are the principal components". However, in this video, you define principal components as the mean-centered data matrix multiplied by your eigenvector matrix V, which in this video are defined as "loadings" that describe how much of each of the principal components each row in X has.
Which definition is more accurate? Or are they both accurate? Please clarify if possible. Thank you so much!!
Should #3 be the covariance matrix of the columns rather than the row ?.
It seems to me that leads to V rows = B columns
Amazing video. I did the MIT lectures about Linear Algebra (that talked about SVD) and the Andrew Ng's ML course (that talked about PCA). This video was the perfect bridge to connect the two things in a coherent manner. Thank you very much, Dr. Brunton!
At the 13’45 ‘’ mark why is the equation CV=VD? Should it be CV=DV?
I think so yes... V should be on the same side, the right side
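For what it's worth, the order can be checked numerically. In the standard eigendecomposition each column v_k of V satisfies C v_k = lambda_k v_k, and stacking those columns gives C V = V D, with D on the right so that it scales columns. A minimal sketch in Python with NumPy on random data:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(100, 3))
B = B - B.mean(axis=0)
C = B.T @ B / (B.shape[0] - 1)

eigvals, V = np.linalg.eigh(C)   # columns of V are eigenvectors
D = np.diag(eigvals)

assert np.allclose(C @ V, V @ D)         # C V = V D holds
print(np.allclose(C @ V, D @ V))         # D V scales rows instead; generally False
```

Writing D V would multiply the rows of V by the eigenvalues, which is a different matrix unless V happens to be diagonal.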
First of all, thank you for all the amazing lectures you've made available, they've helped me so much in my data science journey. I was reviewing the information you shared at 6:00, and then on the book, where you mention that the row-wise covariance matrix is given by B*B, whereas in your video Singular Value Decomposition (SVD): Dominant Correlations you mention this is the column-wise correlation matrix.
Could you check if I'm missing something? I feel like the latter should be the correct one (which would give us a matrix nxn).
Thank you so much!
Hi doctor, it was really useful to watch your lecture, but in the video you pointed out that the T matrix is the principal components. This is what confused me: my knowledge is that the column vectors of the loadings are the principal components, and T is just a transformed version of the data B. Please correct me if I'm wrong, thanks.
In some implementations, I find that along with mean centering, standard deviation division is followed (Z-scores), does this make a difference? I believe standard deviation division is important to keep the features on the same scale (Unit Variance).
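It does make a difference when features live on different scales. The sketch below (Python with NumPy, synthetic data with two made-up independent features) compares PCA on the covariance matrix with PCA on z-scored data, which is equivalent to using the correlation matrix. The helper `explained_ratio` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(3)
# Two hypothetical features on very different scales (e.g. metres vs. grams).
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])

B = X - X.mean(axis=0)                      # mean-centering only
Z = B / X.std(axis=0, ddof=1)               # z-scores: unit variance per feature

n = X.shape[0]
C_cov = B.T @ B / (n - 1)                   # covariance matrix
C_corr = Z.T @ Z / (n - 1)                  # equals the correlation matrix

assert np.allclose(C_corr, np.corrcoef(X, rowvar=False))

def explained_ratio(C):
    w = np.linalg.eigvalsh(C)[::-1]         # eigenvalues, largest first
    return w / w.sum()

print(explained_ratio(C_cov))   # dominated by the large-scale feature
print(explained_ratio(C_corr))  # much more balanced after scaling
```

Whether to scale is a modelling choice: if the units are comparable and the magnitudes are meaningful, mean-centering alone may be what you want.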
how do you write inverted letters so quick? or is it some kind of CGI?
BtB seems to calculate the covariance matrix of the columns of B.
Yep, this essentially is a matrix of inner products of each column with each other.
Thanks for the great explanation! In your next video, can you please explain how you are writing backward!?
He writes forwards and then flips the video horizontally
Ha ha ha
This is so technically correct, and simultaneously so obtuse, that my intuition fuse has melted. Please consider redoing this as 3D pseudo visualizations of data subsets.
In the mean-centering part, are you calculating row averages? As you described, each row can have "sex, age, demographics, and so on", and these are not of the same category. Shouldn't it be column means?
Great question. You can still compute the average age across all people. For other categorical data, you would usually break these columns into multiple columns and assign a "1" to the column corresponding to the correct category and "0"s for the other categories. This will make it possible to average the numerical values.
@Cathy Tang @Steve Brunton. That puzzled me as well. But I think that the name "mean row" refers to a row that consists of the averages of each column. That way, if you think of the average x, it will just represent a vector of column averages. Hence, by having copies of that same vector in each row and applying matrix subtraction, you will end up with (value - its respective column average) for every value in the matrix we started with.
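That reading of "mean row" can be sketched as follows in Python with NumPy. The 4 x 2 table of ages and weights is made up for illustration.

```python
import numpy as np

# Hypothetical data: 4 people (rows) x 2 features (columns: age, weight).
X = np.array([[20.0, 60.0],
              [30.0, 70.0],
              [40.0, 80.0],
              [50.0, 90.0]])

x_bar = X.mean(axis=0)                         # one average per column -> [35., 75.]
X_bar = np.ones((X.shape[0], 1)) @ x_bar[None, :]   # stack copies of the mean row
B = X - X_bar

assert np.allclose(B, X - x_bar)               # broadcasting gives the same result
print(B.mean(axis=0))                          # each column now averages to 0
```

The explicit ones-vector product mirrors the matrix form of the mean-subtraction step; in practice broadcasting does the same job.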
Hi Steve,
There may be a tiny typo in Page#22 in your Data Driven Science book. The equation(1.26) is supposed to be $B = X - \bar X$ to represent demeaned data $X$ while it shows $B = X - \bar B$ on the book. Please correct me if I am wrong.
I am confused by the SVD of B in step 4. Don't we do the SVD or eigendecomposition of C, the covariance matrix? i.e. T=CV=UE, C=UEV'? Thank you.
Great video, but conventionally the principal components are the eigenvectors V instead of T, 8:15
Yeah, what would T=BV actually represent? When I try implementing this, it works if I produce a projection matrix from columns of V.
So as another way to look at this, are U the scores, sigma the eigenvalues, and V the loadings?
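Roughly, yes, with one caveat: the singular values are not the eigenvalues of the covariance matrix themselves, but their squares divided by n - 1 are. A sketch in Python with NumPy on random data, assuming rows are measurements and the covariance uses the n - 1 divisor:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 4)) @ rng.normal(size=(4, 4))
B = X - X.mean(axis=0)
n = B.shape[0]

U, s, Vt = np.linalg.svd(B, full_matrices=False)
V = Vt.T

T = B @ V                                   # data expressed in the V basis
assert np.allclose(T, U * s)                # same as U @ diag(s)

# Eigenvalues of C = B^T B / (n - 1) are s**2 / (n - 1), not s itself.
C = B.T @ B / (n - 1)
eigvals = np.linalg.eigvalsh(C)[::-1]
assert np.allclose(eigvals, s**2 / (n - 1))
```

Naming varies between sources: some call the columns of V the principal components and T the scores, others use "principal components" for T. The identity T = BV = UΣ holds under either naming.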
You only said that the data should have zero mean, but what about the standard deviation? Don't we need to scale the data first, dividing each measure by its standard deviation, to make sure the PCA doesn't simply lock onto the direction with the largest magnitude?
The following are measurements on the test scores (X, Y) of 6 candidates for two subject examinations:
(50, 55), (62, 92), (80, 97), (65, 83), (64, 95), (73, 93)
Determine the first principal components for the test scores, by using Hotelling's iterative procedure. Sir how to .....???
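One way to approach this, as a sketch only: Hotelling's iterative procedure is commonly presented as power iteration on the covariance matrix, and that is the assumption made here (Python with NumPy). The function name `power_iteration`, the starting vector, and the tolerance are illustrative choices, and the n - 1 divisor is assumed for the covariance.

```python
import numpy as np

scores = np.array([(50, 55), (62, 92), (80, 97),
                   (65, 83), (64, 95), (73, 93)], dtype=float)

B = scores - scores.mean(axis=0)
C = B.T @ B / (len(scores) - 1)          # 2 x 2 sample covariance

def power_iteration(C, tol=1e-12, max_iter=1000):
    """Repeatedly apply C and renormalise until the direction stops changing."""
    v = np.ones(C.shape[0]) / np.sqrt(C.shape[0])
    for _ in range(max_iter):
        w = C @ v
        w /= np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:
            break
        v = w
    return w, w @ C @ w                  # direction and its variance (Rayleigh quotient)

v1, lam1 = power_iteration(C)
print(v1, lam1)
```

The returned unit vector is the first principal direction, and projecting the centered scores onto it (`B @ v1`) gives the first principal component scores. The result can be cross-checked against `np.linalg.eigh(C)`.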
The data matrix is a wide matrix, so if it is already zero mean, then in this case the PC XV is equal to XU (Considering U from the SVD lecture)?
Principal Component Analysis (PCA) is a technique in statistics that simplifies complex data by identifying and emphasizing the most important patterns or features. It does this by transforming the original variables into a new set of uncorrelated variables called principal components, allowing for a more efficient representation of the data.
I am a phd student learning inverse scattering, your lectures help me with understanding those concept :) greetings from naples
Wouldn't we need to divide by N or N-1 for the covariance matrix? I know covariance as $s_{ij} = \frac{1}{N-1} \sum_{n=1}^{N} v_{in} v_{jn}$
Yes
Can someone explain to me, why covariance matrix is just the inner product of B transposed B?
How does he write in reverse?
Best explanation. Looking forward to video about Kernel PCA!
So V comes from C?
Why did we calculate covariance matrix?
Can somebody explain to me why it's B'B not BB' since the data were stored row-wise.
We do the last part (T=BV) in order to calculate the inner product with the principal components.
Is it important to show a 95% confidence ellipse in PCA? If it is not drawn for my data, what should I do? Can I use the PCA score graph without the 95% confidence ellipse?
are you writing in the reverse order (right to left) on the board?
If the images in X are not all independent, then X is not a full-rank matrix. Will we then have only as many eigenfaces as the rank?
Thank you, Prof. Brunton. I have a question: supposing I have done this series of experiments with a target measure that cannot be categorized but is a continuous value, then can I use PCA?
Amazing lecture! But in previous videos you also said that the rows represent experiments so that was a little strange
his SVD video shows columns as experiments. PCA video shows row as experiment
Dear Steve, why is there no Turkish in the subtitles for this video? I couldn't understand it.
Very good explanation for each symptom and its treatment
I came to learn about PCA, but now I’m just focusing on how he can write backwards so clearly.
It's a trickle on the optocordical neural network involving image inversion
Introduction is one thing, presentation is another.
One who combines both gets all the attention!!
Hello, can you show with examples how to curvilinear component analysis?
This channel is amazing!
Can you also show how to get covariance matrix from a Gaussian function results from its fit on a Gaussian looking data. Any suggestion for a book to explain this kind of stuff? Cheers.
Does the concept of cross-loadings exist in PCA like it does in EFA? If it does exist, what are the criteria to determine so?
Can someone tell … Are the loadings the rows or columns of V or Vtranspose (that is, there are 4 possibilities)? My hunch is that the loadings are the columns of Vtranspose … but that's a hunch from a non-mathematician.
(The video was not clear/explicit on this matter, probably because it’s obvious to a mathematics student)
This guy is super good at writing backwards
If I have outliers in my dataset, can this affect the PCA?, because I have tried with cases of this type and it usually identifies a single principal component
How to film such kind of tutorial videos?
IS HE WRITING IN MIRROR IMAGE? HE'S BEHIND THE GLASS RIGHT? SO WHAT LOOKS LIKE PCA TO US, IS HIM ACTUALLY WRITING PCA FROM THE BACK??
PCA clearly explained!!!
Do you think he writes backwards?
Thank you so much. You made it really easy to understand.
Glad to hear that!
Hey Dr. Brunton. Awesome video yet again! I've been snooping around Kaggle and found a dataset on body performance given a host of variables. I thought I'd try using PCA to determine the most influential characteristics within the data and began working with it in MATLAB. I was able to get tons of outputs (a thrill unto itself) and a nice little scatter plot! However, when all was said and done, I had difficulty understanding which variables were most influential by looking at the scatter plot and PCA breakdown. What should I be doing/thinking to gain that intuition? Thanks!
Please could you make a video about singular spectrum analysis?
Can you please make a video on OLPP?
Love this series! Just bought your book
Can you please share what software and equipment you're using for this presentation?
This is more than awesome!! I want to ask you one question, and here it is:
a1=[1,23,4,51,62,7,8,43,1,29]
a2=[5,45,32,51,60,7,8,35,10,31]
a3=[13,3,64,35,36,37,48,3,31,1]
a4=[3,3,1,5,6,3,8,3,1,3]
a5=[0,3,0,5,0,0,8,0,0,1]
How can I figure out the important columns (features) using eigenvalues and eigenvectors?
As we can see here, the importance of a4 and a5 is negligible! But how can I find that out with this concept?
I have the eigenvalues and eigenvectors of this data but do not know how to use them in this context.
After finding the eigenvalues and eigenvectors, I know how to find the PCs, because I have seen your videos.
As I have seen in the comment section, someone already asked this question, but I was not able to understand the answer!
Kindly help me out.
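One possible way to look at the a1..a5 question, as a minimal sketch and not necessarily the approach from the video: stack a1..a5 as the columns of a data matrix, mean-center, form the covariance matrix, and then inspect how large each feature's entry is in the leading eigenvectors. NumPy, the n - 1 normalization, and all variable names are assumptions made for illustration.

```python
import numpy as np

# Data from the comment: each a_i is one feature measured 10 times.
a1 = [1, 23, 4, 51, 62, 7, 8, 43, 1, 29]
a2 = [5, 45, 32, 51, 60, 7, 8, 35, 10, 31]
a3 = [13, 3, 64, 35, 36, 37, 48, 3, 31, 1]
a4 = [3, 3, 1, 5, 6, 3, 8, 3, 1, 3]
a5 = [0, 3, 0, 5, 0, 0, 8, 0, 0, 1]

# Rows are measurements, columns are features (10 x 5).
X = np.column_stack([a1, a2, a3, a4, a5]).astype(float)
n = X.shape[0]

# Mean-center each feature, then form the feature covariance matrix.
B = X - X.mean(axis=0)
C = B.T @ B / (n - 1)

# eigh is meant for symmetric matrices; it returns eigenvalues in
# ascending order, so reorder them from largest to smallest.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Fraction of the total variance captured by each principal component.
explained = eigvals / eigvals.sum()

# For the two leading components, show how strongly each feature enters.
# The sign of an eigenvector is arbitrary, so compare absolute values.
names = ["a1", "a2", "a3", "a4", "a5"]
for k in range(2):
    print(f"PC{k + 1}: {explained[k]:.1%} of total variance")
    for name, weight in zip(names, np.abs(eigvecs[:, k])):
        print(f"  {name}: {weight:.3f}")
```

A feature with small entries in every component that carries a large share of the variance contributes little to those components. Keep in mind that without rescaling, features with large numeric ranges dominate the covariance matrix; if each column is first standardized to unit variance, the picture can change.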
Are you writing this backwards? How did you get this video like this?
Best math content is always the serious and straightforward ones.. Fuck the jokers, you are the king dude
PCA is all cool and stuff, but how did you film this????
He just knows it all.
Lol, not even the first principal component! :)
I like your explanation. Please check equation 1.26 on your databook.
I'm so confused by this board. Are you writing backwards? Am I looking at a mirror image of you??
I was following ok until the part that started with writing the lambdas. I got lost there.
Also, how are this and the NIPALS algorithm related?
Is it an undergrad topic?
took me a minute to realise you record this and then mirror the video, rather than learning to write backwards hahaha
Bro, how do you put someone's race in a matrix?
Great question! Any categorical variable (sex, race, what state you were born in, etc.) can be put into a numerical matrix by creating extra columns, one for each category, and then assigning a value of 0 or 1 for each of these categories, depending on which category best describes the person.
Maybe a non-human example: If we are building a matrix to describe movies, and we want to include the category of genre, we would need to add extra columns for each movie genre (sci-fi, action, horror, romance, comedy, etc.). And then depending on the movie, we would assign a 1 to the category the movie belongs to and 0's to all of the other categories. Now here is where it can get interesting. Lots of movies are somewhere between these sharp categories. So "The Terminator" is somewhere between sci-fi, action, and a bit of horror, so maybe it would be 0.3, 0.6, and 0.1 in these categories (adding up to 1). Other categories can be more strict, like the make of a car (there is no halfway between a Honda and a Ford).
So although there is no perfect categorization, you can define a rough set of groups and then individuals can be partially in multiple categories. This is one way to add categorical data like sex and race to a numerical matrix while allowing for non-exclusive categories. I'm not sure, but I think the US census treats each multiracial combination as its own category, so that entries are only 1's and 0's. So my kids, who are half white and half asian would have a "1" in the multiracial white/asian data column. I'm not 100% sure about how the census works, but this is what I remember.
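The encoding described in the reply above can be sketched in a few lines. This is only an illustration under assumptions: the function names `one_hot` and `soft_encode` are hypothetical, the genre list and the 0.3 / 0.6 / 0.1 weights come from the movie example, and renormalizing the weights so each row sums to 1 is a choice made here.

```python
# Category columns, in a fixed order, taken from the movie-genre example.
genres = ["sci-fi", "action", "horror", "romance", "comedy"]


def one_hot(label, categories):
    """Strict encoding: 1 in the matching column, 0 everywhere else."""
    if label not in categories:
        raise ValueError(f"unknown category: {label}")
    return [1.0 if c == label else 0.0 for c in categories]


def soft_encode(weights, categories):
    """Partial membership: spread a row across several categories.

    `weights` maps category name -> nonnegative weight. The row is
    rescaled so that its entries add up to 1.
    """
    row = [float(weights.get(c, 0.0)) for c in categories]
    total = sum(row)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in row]


# A movie that fits exactly one genre.
strict_row = one_hot("comedy", genres)

# "The Terminator": part sci-fi, mostly action, a bit of horror.
terminator_row = soft_encode({"sci-fi": 0.3, "action": 0.6, "horror": 0.1}, genres)

print(strict_row)
print(terminator_row)
```

Each resulting row can be appended to the other numerical columns for that movie (or person) to form one row of the data matrix.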
easy. run:
magic(10)>5; imshow(ans);
for best results first run
spy;
Wow, excellent explanation. Thank you so much.
Are you actually writing right to left, or is there some hocus-pocus involved?
image mirrored?
Amazing video. To the point and efficient.
Glad it was helpful!
how is he able to write backwards so smoothly?
Why eigen?