When I remember that var(x) is the same as cov(x, x), the formulas in the covariance matrix seem more consistent and make more sense to me. In other words, the whole matrix can also be defined in terms of covariances alone.
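A minimal sketch of this observation in plain Python, assuming the population convention (divide by n) and a small made-up dataset. The helper names `mean` and `cov` are hypothetical and not taken from the video.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance: average product of deviations from the means."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 6.0]
y = [2.0, 1.0, 5.0, 4.0]

# Every entry is a covariance; the diagonal is each variable paired with itself.
cov_matrix = [[cov(x, x), cov(x, y)],
              [cov(y, x), cov(y, y)]]

# Variance computed directly from its own definition, for comparison.
var_x = sum((xi - mean(x)) ** 2 for xi in x) / len(x)
```

Here `cov_matrix[0][0]` agrees with `var_x`, and the two off-diagonal entries are equal, which is why the matrix is symmetric.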
Thank you for this, there were so many hidden tidbits of knowledge in it. Thank you for making these; I appreciate the attention you give to explaining small details.
After watching many videos on the subject, this one finally helped me understand. Thank you
this video deserves more views. Incredible work, thank you.
Just want to leave a comment so that more people could learn from your amazing videos! Many thanks for the wonderful and fun creation!!!
You know how people come to understand things. Keep posting videos. These are very thorough.
All your videos are fun to watch. Please continue making such high-quality content videos...👏
While I already know the covariance matrix, it is always interesting to learn concepts from your perspective.
Extremely helpful and easy to understand as someone new to this topic. Thank you for your work and actually showing examples with numbers for how each part in the covariance matrix was calculated.
seriously one of the best and most intuitive channels on this subject. I can show your videos to my child and he will understand
Very clearly explained, well done and many thanks!
Fantastic video. Made the covariance very intuitive. Thank you!
Thank you, Luis, for always putting sense before the equations. May I suggest a related topic to cover: Gaussian processes? So far the only YouTube video that I have found intuitive is "Vincent Warmerdam: Gaussian Progress | PyData Berlin 2019". Even after watching it several times, I feel that I'm still missing something fundamental, like how the conditioning on data works, or how to predict using multivariate features. A walkthrough of a real problem solved with a Gaussian process would be really helpful!
This is a great visualization and a perspective that everyone should know. To see the magic behind the scenes, visualization is the best way, as always...
Thanks, Luis!
Hi Luis!! Your videos are very nice for getting an intuitive grasp of what I am doing. I have watched your videos about covariance and PCA a few times now. Any plans for one on iterative closest point, relating two sets of points? Thank you very much for the work!
In my opinion it would be useful to see the connection between the covariance matrix and matrix transformations. Could you make a video on that please?
Amazing video!! Never felt like I understood as well as I do now - favorited
Thank you very much! ❤ Easily explained and everything included! :) 🎉
thanks dude, couldn't understand any explanation of all that before i found your video
Thanks a lot, sir. I am new to the BI world and have been struggling to understand these notions. Problem resolved today! Thanks a lot again.
Awesome ! Thank you so much. This is a very lucid explanation.
absolutely brilliantly explained. thank you
Great video, very intuitive Thanks a lot.
At 12:25 in the variance formula, we divide by the sum of all weights and not the sum of weights squared and it is the same for the covariance, right?
Thanks a lot ! ...was stuck at a concept in a research paper..resolved many doubts
Very clearly explained!! Thanks
This guy has a gift to make the tough look easy!
Thank you for this great video. There is a bit of inconsistency between the correction of 1/3 (in your comments) and the formula with alpha^2 (at 12:24). I think the formula is correct, and in the concrete example 1/3 should be changed to 1/9.
thanks for your explanations, it's very helpful for me!
best explanation of covariance on youtube
Awesome video ! Thank you for making it.
Very good explanation
Thanks Luis! Could you explain the reason behind shifting center of mass to the origin?
Thanks a lot, that's the best explanation I have found, keep up the good work 👍
Thanks for this video, it's just perfect!
thank you! got a new view on the variance and covariance
This is fantastic, Thank you!!
Thank you, what a good explanation
Luis, can you tell us what your approach is when learning a new thing or concept? Where do you head first: Google, YouTube, or textbooks?
Thanks so much for doing this dude
Good video! Does a highly correlated feature in our dataset produce less error when regression is run? How do we determine whether a particular feature in our dataset should be considered or not?
Thank you for the video. I have heard of the covariance matrix in the context of processing velocity-map imaging. I am finding it a bit hard to tie the information you have shared back to this context (how does constructing a covariance matrix help construct covariance-map images?). I wonder if anyone can help me make the connection :)
Thanks, very neat and clean
Thank you very much, this helped me so much 👏👍
Hi Luis, fantastic and clear video but quick question, at 11:01 are we sure we need to square the 1/3 weight?
I don't follow the intuition of this, as I'd assume having a single point weighted 1/3 would be equivalent to a situation where that point has a weight of 1 and the remaining three points have weights of 3 (said another way, imagine three of the points [(-0.4, 0.8), (1.6, 0.8), (-0.4, -1.2)] being duplicated three times and stacked upon one another). If this were the case we'd calculate the variance of 10 points of equal weights [-0.4, -0.4, -0.4, 1.6, 1.6, 1.6, -2.4, -0.4, -0.4, -0.4] with a value of 1.44.
Thank you in advance for any thoughts on this question and for making such a great video! I just pre-ordered your upcoming ML book this morning and subscribed to your channel.
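The stacking argument in this question can be checked numerically. Below is a rough sketch, assuming the convention where each weight enters linearly and the sum is divided by the total weight (this is an assumption for illustration, not necessarily the formula shown in the video). The x-coordinates are the ones listed in the comment above, and `weighted_variance` is a hypothetical helper name.

```python
def weighted_variance(values, weights):
    """Weighted variance with linear weights, normalized by the total weight."""
    total = sum(weights)
    mu = sum(w * v for v, w in zip(values, weights)) / total
    return sum(w * (v - mu) ** 2 for v, w in zip(values, weights)) / total


# Three full points and one point counted as a third of a point.
xs = [-0.4, 1.6, -0.4, -2.4]
weights = [1.0, 1.0, 1.0, 1.0 / 3.0]
weighted = weighted_variance(xs, weights)

# Same data with the three full-weight points stacked three times each.
stacked = [-0.4] * 3 + [1.6] * 3 + [-0.4] * 3 + [-2.4]
unweighted = weighted_variance(stacked, [1.0] * len(stacked))
```

Under this convention both `weighted` and `unweighted` come out at about 1.44, matching the value in the comment. The denominator in the weighted case is 1 + 1 + 1 + 1/3 = 10/3, not 4.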
Thank you very much for this video
This video was very useful for me
Hi Luis, this is an excellent video, and I thank you for making it. At 3:42 you say "Why square", and your answer is so that the negative numbers do not cancel out the positive numbers. By that logic, why not use absolute values and achieve the same thing? It turns out that the technical explanation for using squares seems hard to come by, and is not at all obvious. Perhaps you can do some digging and do another video on this? I found the discussions on the Cross Validated forum to be helpful.
Hello sir, I don't understand why in the weighted covariance formula the weights are squared? also in the numerator.
thanks for the explanation! I'm confused about the portion or weighted points, what does it mean when a data point has a fraction? Besides, at 8:58 isn't the covariance divided by (n-1)?
Yeah, I also think variance and covariance must be the sum divided by (n-1)
Thanks, Luis, for the awesome video about covariance. I would like to ask: at 10:56, shouldn't we divide by 1+1+1+(1/3)^2? The same happens when finding the covariance at 11:37. We should divide by 1+1+1+(1/3)^2, shouldn't we? Please correct me if I am wrong.
Yes, you are right. He added a comment for 10:56 and forgot about 11:37.
you are simply awesome
Thank you!
At 1:26 you say the variance in the y direction goes in the diagonal. Is that correct?
If the data in the examples in the beginning weren't centered at 0, what would multiplying just the coordinates (without subtracting the mean) represent?
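One way to look at this question: the average of the raw products is the second moment E[xy], which equals the covariance plus the product of the means. A small sketch with made-up, deliberately uncentered data (population convention, divide by n):

```python
def mean(values):
    return sum(values) / len(values)


# Made-up data that is not centered at 0.
x = [3.0, 4.0, 5.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]

# Average of the raw coordinate products, with no mean subtracted.
raw_product_mean = mean([xi * yi for xi, yi in zip(x, y)])

# Covariance, computed from deviations about the means.
mx, my = mean(x), mean(y)
covariance = mean([(xi - mx) * (yi - my) for xi, yi in zip(x, y)])

# Identity: E[xy] = cov(x, y) + E[x] * E[y]
difference = raw_product_mean - (covariance + mx * my)
```

So without centering, the product mixes the spread of the data with the location of its center; the two quantities only coincide when at least one of the means is 0.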
I hope you can make new lessons about generative modeling algorithms. They are promising for many fields. I have checked out your GAN lesson, but I am looking for more from an amazing teacher.
Thanks. I am viewing your unlisted video! And I got the discount for the book; reading it on my Kindle before sleep is a blessing. I have a little query:
1. What is meant by half or three-quarters of a point? The size of the point is virtual, which confused me at first. Now I get it: it is a portion of the vector associated with each point. Now the classic question arises: how many clusters? How do we know?
2. It is similar to k-means, where we pick k cluster centres, and here we pick "k" Gaussian distributions. Can you please compare different clustering methods in one video: what are the limitations, differences, and similarities between them, how to overcome the limitations, and in what situations each method should be used on real-world data?
Thanks Sourav, great questions!
1. In some algorithms like Gaussian mixture models, you need only a fraction of the point there. I imagine it as, if every point weighed 1 kg, then points are allowed to weigh any fraction of that weight. As for how many clusters, there are many heuristics such as the elbow method that work, you can find it here: ruclips.net/video/QXOkPvFM6NU/видео.html
2. Yes, very similar to K-means. In k-means, we only update the mean, but in GMM we update mean, variances, and covariances. Also in k-means we have hard assignments (a point can belong to only one cluster), but in GMM (ruclips.net/video/q71Niz856KE/видео.html) we have soft assignments (this is why points can be split into several clusters, which goes back to question 1). In real life both are used, but there are times, for example in sound classification (telling voice apart from music and noise, etc), that the clusters really intersect, and you need a soft clustering algorithm to do a better job.
Hope that helps! Let me know if there is anything else that needs clarification. :)
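To illustrate the hard versus soft assignment contrast in this reply, here is a rough one-dimensional sketch of the mean update only. The points and the soft weights are made up for illustration; in a real GMM the fractions would come from the model itself rather than being typed in by hand. `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(points, weights):
    return sum(w * p for p, w in zip(points, weights)) / sum(weights)


points = [0.0, 1.0, 4.0, 5.0]

# Hard assignment (k-means style): each point belongs wholly to one cluster.
hard_a = [1.0, 1.0, 0.0, 0.0]
hard_b = [0.0, 0.0, 1.0, 1.0]

# Soft assignment (GMM style): each point is split between the two clusters,
# and the fractions for each point add up to 1. Values here are made up.
soft_a = [0.9, 0.8, 0.25, 0.05]
soft_b = [1.0 - w for w in soft_a]

hard_mean_a = weighted_mean(points, hard_a)
soft_mean_a = weighted_mean(points, soft_a)
```

With hard weights only the first two points move the center of cluster A; with soft weights every point pulls on it a little, in proportion to its fraction.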
Thank you
At 12:30, If the weighted covariance is divided by sum of ai squares, shouldn't it be (1/3)^2+1+1+1 = 28/9? Is it a typo?
Great ♥ but can anyone explain why we divided by (sum of a^2) instead of (sum of a)... when we calculate the variance of weighted points ?!
Excellent. I suppose variance is easier to explain in a two-dimensional space than in 4 dimensions. Is the variance defined for every point in space, time included? What if the space is expanding anisotropically (let's say due to a gravity wave), could the variance have an extra term?
Pretty good Man
thank you
At 10:56, shouldn't it be divided by 10/3 instead of 4, as we have three and one third data points?
Yikes, you’re right!!!! Thank you!
I’ll add a comment
@SerranoAcademy Thank you. Excellent video BTW
It's nice when somebody who knows what he is talking about explains this... very nice, and thanks.
And for all the professors and teachers out there who can't explain it like this dude.... quit your job 😑
Does variance x = variance y imply a covariance of 0?
logic math is interesting when focusing on abstract viewpoints
The x value of the upper-right point at 11:04 turned magically from 1.6 to 2.6 in the next slide... isn't that a typo?
Mr. Serrano, have you considered opening a Patreon or something like that?
Awesome.
Shouldn't the weight in covariance calculation be 1/3 instead of (1/3)^2? Also, the coefficient shouldn't be 1/4 anymore in the weighted case
You are Dope man, simply awesome
please inform us why we make it and why we make it the way it is.
Good visuals, great teaching, best quality. Thank you! Your channel and StatQuest have been a huge help through understanding all this math. In this video, an extra example with a 3x3 or 4x4 covariance matrix would have been awesome, but I understand you might not have gone into it to simplify things (since 3D/4D)
Why is the denominator in the example when alpha is not 1 "4" when sigma(alpha^2) should be (0.33^2 + 1^2 + 1^2 + 1^2)
What type of software do you use to make this animation?
I use Keynote for the animations and iMovie for editing
"we should divide by 1+1+1+1/3, which is 10/3".
Is it not squared as well? 1/(10/3)^2
6:35 9:06
at 4:10, shouldn't it be 8/3 instead of 8/4?
oh wait they have different formulas for population (all) and sample (part of all)
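The two conventions can be compared with Python's standard `statistics` module. The data below is hypothetical, chosen only so that the squared deviations sum to 8 over 4 points, matching the 8/4 versus 8/3 in the comment above; it is not taken from the video.

```python
import statistics

# Hypothetical data: mean is 0 and the squared deviations sum to 8.
data = [-2.0, 0.0, 0.0, 2.0]

# Population variance: divide by n (here 8 / 4).
population_var = statistics.pvariance(data)

# Sample variance: divide by n - 1 (here 8 / 3).
sample_var = statistics.variance(data)
```

The population form fits when the data is the whole set of interest; the sample form is the usual choice when the data is a subset used to estimate the variance of something larger.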
so cool
I think that in the second part of the video the calculation of var(x), var(y), and cov(x, y) is not done correctly. It should not be divided by 4; instead it should be divided by the sum of the squares of the weights.
Curious why covariance could be related to information
Background music is annoying
Too many adverts. :-(