Very good visual representation of PCA. Thank you
Fantastic!
Great Video, Thank you
thanks for the simple explanation sir... it's really helpful...
Great explanation! Thank you
Sweet summarization of how covariance relates to eigenvectors. I'm still looking for a summarization of the meaning of life.
Eastern Orthodoxy is the meaning of life
Hey your content is good
Please make more videos
Thank you for this breakdown
Thanks for the simple explanation. It clearly helps to build an intuitive understanding
Nice video man
No comments, but fuck, thank you. This was really good, despite some hiccups throughout, but man thank you. My "Top 20" uni can't even teach this properly.
Hi, and thanks for your video - it's great! I just have one question - I understand that the covariance matrix transforms datapoints in the direction of its eigenvectors. Additionally, I know the eigenvectors will be orthogonal, since the covariance matrix is symmetric. What I don't understand is how we know the eigenvectors of the covariance matrix are the directions of maximum variance...
Eigenvectors of a matrix are those vectors which aren't knocked off their span. In this case we know that these are the directions along which the covariance matrix stretches the data.
@@kartiks9489 I understand that. But it all seems to come from the calculation. I couldn't intuitively have seen it...so I'm wondering if there was an intuitive reason why the eigenvectors of the covariance matrix turn out to be the directions to project onto that have the maximum variance...
I found many complicated answers to this. Did you find a simple answer from an intuitive standpoint?
Thanks for the question, it challenged me. To understand this, translate those points into the coordinate system whose basis is the orthonormal eigenvectors. The points transformed by matrix A in your standard coordinate system are Ax. If you want the coordinates of these transformed points in any other coordinate system, you simply multiply them by the inverse of the matrix whose columns are the basis of the new coordinate system (here it is V, whose columns are the eigenbasis). Before doing this, write matrix A in its eigendecomposition A = VΛV⁻¹. Then the change of basis gives V⁻¹(Ax) = V⁻¹VΛV⁻¹x = Λ(V⁻¹x). From this you can see that no other coordinate system's basis will give a higher value, because other basis vectors make an angle with the eigenbasis, and the multiplication is reduced by cos(alpha).
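In symbols, a minimal sketch of the same argument (assuming the covariance matrix is symmetric, so its eigenvector matrix V is orthonormal and V⁻¹ = Vᵀ):

```latex
% Change of basis into the eigenbasis of a symmetric matrix A = V \Lambda V^{-1}:
\[
  A = V \Lambda V^{-1}
  \quad\Longrightarrow\quad
  V^{-1}(Ax) = V^{-1} V \Lambda V^{-1} x = \Lambda\,(V^{-1}x),
\]
% so in the eigenbasis the transformation only scales each coordinate by its
% eigenvalue. Projecting Ax onto any other unit vector u instead gives
\[
  u^{\top} A x = \sum_i \lambda_i \,(u^{\top} v_i)\,(v_i^{\top} x),
\]
% where each term is damped by u^T v_i = cos(alpha_i), the cosine of the angle
% between u and the eigenvector v_i.
```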
It comes from finding an axis along which the variance of the projected data is maximized. Solving this optimization naturally leads to the eigenvector of the covariance matrix being that axis.
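A quick numerical check of that claim, as a sketch in plain NumPy with made-up 2D data (the numbers and names here are just for illustration, not from the video): sweep over unit directions, measure the variance of the data projected onto each one, and compare the best direction with the top eigenvector of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up correlated 2D data (rows = samples)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3.0, 1.2], [1.2, 1.0]], size=2000)
Xc = X - X.mean(axis=0)           # center the data
C = np.cov(Xc, rowvar=False)      # sample covariance matrix

# Variance of the data projected onto a unit direction u is u^T C u.
# Sweep over directions and keep the one with maximum projected variance.
angles = np.linspace(0, np.pi, 1000)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
proj_var = np.einsum('ij,jk,ik->i', dirs, C, dirs)
best_dir = dirs[np.argmax(proj_var)]

# Eigenvector of C with the largest eigenvalue
eigvals, eigvecs = np.linalg.eigh(C)
top_eigvec = eigvecs[:, np.argmax(eigvals)]

# Up to sign, the brute-force best direction matches the top eigenvector,
# and the maximum projected variance matches the largest eigenvalue.
print("brute-force direction:", best_dir)
print("top eigenvector:      ", top_eigvec)
print("max projected variance:", proj_var.max(), "vs largest eigenvalue:", eigvals.max())
```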
Noice!!
@~7:45 In the covariance calculation you have to subtract the mean values.
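Right, covariance is computed on mean-centered data. A tiny sketch of what that looks like (made-up numbers, checked against np.cov, which uses the same n - 1 denominator by default):

```python
import numpy as np

# Made-up paired measurements
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

# Subtract the means first, then average the products of the deviations
# (dividing by n - 1, the same default as np.cov).
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

print(cov_xy)              # manual covariance
print(np.cov(x, y)[0, 1])  # matches the off-diagonal entry from np.cov
```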
Why did we choose the eigenvector directions and not any other directions?
thanks a lot for clearing my concepts
THE BEST
19:10 what is the point of getting the biggest variance? why not the smallest?
I'm not sure why we stretch the data points, but here's an example of why you want an axis along which the points have as large a variance as possible. Say you wanted to find the relationship between calorie consumption and IQ. If everyone you sample has an IQ of 100, no matter their consumption, there is nothing for calorie consumption to explain. But if there is a big difference in people's IQ depending on calorie consumption, then you can use that to find a relationship between the two.
Conversely, if you only look at groups of people who consume x calories vs. those who consume x+1 calories (x has little variance), there is not much variation in x to explain wild fluctuations in IQ. You want both to have high variance, i.e., high covariance. When you fit y = cx + b, you want the x variable to have as much variance as possible so you can use it to explain as much of the variance in y as possible.
In PCA you will find as many axes as there are variables, but in practice you remove the ones with the least explaining ability (variance) so that you can reduce the number of variables used to predict y. It makes calculations easier and helps machine learning algorithms learn, among other reasons (google dimensionality reduction).
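For what it's worth, here's a minimal sketch of that last step in plain NumPy (made-up random data; the variable names are only for illustration): eigendecompose the covariance matrix, sort the axes by eigenvalue (explained variance), and keep only the top few.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))          # made-up data: 500 samples, 5 variables
X[:, 1] = 2 * X[:, 0] + 0.1 * X[:, 1]  # make two variables strongly related

Xc = X - X.mean(axis=0)                # center each variable
C = np.cov(Xc, rowvar=False)           # 5x5 covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]      # sort axes by explained variance, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                  # keep only the two most "explanatory" axes
X_reduced = Xc @ eigvecs[:, :k]        # project the data onto the top-k principal axes

print("explained variance ratio:", eigvals[:k].sum() / eigvals.sum())
print("reduced shape:", X_reduced.shape)   # (500, 2)
```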
Do you have a tutorial about neural networks?
Hi, can I get the slides please?
4:03 Instead of the variance, the standard deviation may be used to express the spread of the data from the mean. Thanks for the video.
Very difficult to understand
A Korean whose family name is Yeo