22:50 It may be more technically accurate to say the SVD goes the other way: rotating from movies (the original matrix columns) to genres (the principal components), scaling by the genre weightings (the diagonal values), then rotating from genres to users (the original matrix rows). This is because the SVD formula UΣ(V^T) is applied to a vector from right to left via the usual matrix multiplication.
Even though it's natural to read matrices from left to right as English speakers, I try to remember that the calculations go from right to left by imagining the vector as a sausage being put through a grinder: take the tall column vector, lay it on its side, and "grind" it through the matrix from the top (the columns being the input holes) so it comes out the left side (the rows being the output holes) as a new column vector whose size is the number of rows of the matrix. A product of matrices is then like a stack of meat grinders, where you can either peek at the vector sausage at each stage of the process or pretend the whole stack is one big equivalent grinder.
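Here's a quick numpy sketch of that right-to-left order (the 4x8 ratings matrix is just made up for illustration):

import numpy as np

# Hypothetical 4x8 "users x movies" ratings matrix, only to show the order of operations.
rng = np.random.default_rng(0)
A = rng.integers(0, 6, size=(4, 8)).astype(float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
S = np.diag(s)

x = rng.standard_normal(8)        # a vector living in "movie space" (8 components)

genres   = Vt @ x                 # rotate: movie coordinates -> genre coordinates
weighted = S @ genres             # scale: weight each genre
users    = U @ weighted           # rotate: genre coordinates -> user coordinates

# Grinding the sausage through each stage in turn gives the same result
# as one big equivalent grinder A = U S V^T.
print(np.allclose(users, A @ x))  # True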
This is a very nice presentation. I like the tie-in to data science and the covariance matrix.
One small thing: at 27:22 you say you can compute V similarly to U. But there is a hazard: the eigenvectors of V are dependent on the choices made for U (even Gil Strang ran into this issue). It's best to substitute U back into the original decomposition definition and solve for V (the remaining unknown).
I'm enjoying this series.
Thanks for pointing that out!
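For anyone who wants to try it, here's a minimal numpy sketch of that substitution (random made-up matrix; it assumes all singular values are nonzero):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 8))

# Get U and the singular values from the eigendecomposition of A A^T.
vals, U = np.linalg.eigh(A @ A.T)
order = np.argsort(vals)[::-1]           # eigh returns ascending order
vals, U = vals[order], U[:, order]
sigma = np.sqrt(np.clip(vals, 0, None))

# Substitute U back into A = U S V^T instead of eigendecomposing A^T A separately,
# so the sign choices made for the columns of U carry over consistently into V:
Vt = np.diag(1.0 / sigma) @ U.T @ A      # V^T = S^{-1} U^T A

print(np.allclose(U @ np.diag(sigma) @ Vt, A))  # True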
Thank you! Excellent series, the highest quality explanations. I signed up for the Patreon. Keep up the incredible work. What a tremendous gift to the world!
Fantastic exposition. Thanks very much for your great work and insights.
Some remarks:
1) At 9:50 you say that the height has a bigger variance than the shoe sizes. However, this depends on your unit of measurement! If you had written the heights in meters, this would not be the case. (A way to standardize is to use the "coefficient of variation", which is the standard deviation divided by the mean value; see the sketch after these remarks.)
2) At around 12:30 you state that in the PCA analysis, the height would be the principal component. That's not the case either. The principal component is a new, diagonal axis that describes an "average relative bone length" factor. This factor predicts the height from the shoe size and vice versa. The part of the data that is not explained by this factor is the second principal component.
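A quick numerical sketch of both remarks, with completely made-up heights and shoe sizes:

import numpy as np

rng = np.random.default_rng(2)
n = 200
height_cm = rng.normal(175, 8, n)                       # made-up heights in cm
shoe_size = 0.25 * height_cm - 2 + rng.normal(0, 1, n)  # made-up correlated shoe sizes

# Remark 1: the variance depends on the units...
print(np.var(height_cm), np.var(height_cm / 100))       # cm^2 vs m^2
# ...but the coefficient of variation (std / mean) does not.
print(np.std(height_cm) / np.mean(height_cm),
      np.std(height_cm / 100) / np.mean(height_cm / 100))

# Remark 2: the first principal component is a diagonal axis mixing both variables,
# not the height axis itself.
X = np.column_stack([height_cm, shoe_size])
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
print(eigvecs[:, -1])   # nonzero weight on both height and shoe size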
Thank you so much for these corrections. I love it when I learn something in the comments.
Thank you for the video! Very clear & illuminating!
Aw man, if only this had been uploaded before my linear algebra exam, I would've had a better understanding.
The video was already available on Patreon for several months.
I hope your exam went well!
The SVD is also very useful in solving systems of equations where the number of variables is not the same as the number of (linearly independent) equations constraining those variables since rectangular matrices are not invertible in the standard way.
For example, the rectangular matrix used in this video has 4 rows and 8 columns. If it represented a system of 4 equations with 8 unknowns (wide and fat), it would be *under* constrained, because there would be more flexibility / degrees of freedom than constraints; the SVD then gives the pseudoinverse, which picks out the "simplest" exact solution, the one with the smallest L2 norm. If it were instead 8 equations in 4 unknowns (tall and skinny), it would be *over* constrained, because there would not be enough flexibility / degrees of freedom to satisfy every constraint; the SVD then gives the pseudoinverse which minimizes the unavoidable error (the L2 norm of the "rejection"), the difference between the best achievable vector in the column space (the projection) and the ideal target vector in the higher-dimensional embedding space.
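A minimal numpy sketch of both cases, with random made-up systems (np.linalg.pinv builds the pseudoinverse from the SVD):

import numpy as np

rng = np.random.default_rng(3)

# Underdetermined: 4 equations, 8 unknowns -> infinitely many exact solutions.
A = rng.standard_normal((4, 8))
b = rng.standard_normal(4)
x = np.linalg.pinv(A) @ b
print(np.allclose(A @ x, b))        # True: an exact solution...
# ...and among all exact solutions it is the one with the smallest L2 norm.

# Overdetermined: 8 equations, 4 unknowns -> usually no exact solution.
B = rng.standard_normal((8, 4))
c = rng.standard_normal(8)
y = np.linalg.pinv(B) @ c
# y minimizes ||B y - c||; the leftover is the part of c outside the column space.
print(np.linalg.norm(B @ y - c))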
Fantastic lecture
28:05 Does the fact that symmetric matrices always have real eigenvalues imply that whenever you multiply an arbitrary matrix by its own transpose, you essentially get a matrix with all of its rotation transformations cancelled out? Real eigenvalues would mean that the matrix only stretches (positive, >1), squeezes (positive, <1), or flips (negative) along its eigenvector directions, with no rotation.
I never noticed this. At first blush, I guess you are right (assuming of course that we're talking about symmetric matrices with real entries).
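A quick numerical check with a random made-up matrix seems to back this up:

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))     # arbitrary real matrix

S = A.T @ A                         # always symmetric (and positive semi-definite)
print(np.allclose(S, S.T))          # True

print(np.linalg.eigvalsh(S))        # real, non-negative eigenvalues: pure stretch/squeeze
                                    # along the eigenvectors, no rotation left over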
I enjoy going through your videos. Thank you for your effort and time. How are you animating your slides? Or what software are you using?
Custom made Python library, with OpenCV to generate the final video.
Love your vids as always 🙌🏻
Very nice video. At 10:00 should the green and purple lines correspond to the respective lengths of the major and minor axes instead?
Yes, they probably should. That's a subtlety that escaped me. Thanks for sharing!
@@AllAnglesMath 😊 Thanks. Keep up the good work!
Thank you for the well explained video. I wonder how this could be applied to financial modelling and risk analysis. My first thought is to run a Monte Carlo analysis with as many variables as possible and record all variable values along with the output (for example profit or IRR), then "just" do the "ellipse thing" to figure out which variables are the most impactful?
Sounds like an amazing application. Ambitious, but it can be done.
@@AllAnglesMath unfortunately I’m not advanced enough at math.. really enjoyed the video though
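In case anyone wants to experiment with the idea, here's a rough toy sketch with a completely made-up profit model. It only shows the mechanics: sample the inputs, record them together with the output, standardize (so units don't dominate), and read off the principal directions:

import numpy as np

rng = np.random.default_rng(5)
n = 10_000

# Completely made-up toy model: profit driven by price, volume and cost.
price  = rng.normal(100, 10, n)
volume = rng.normal(1000, 300, n)
cost   = rng.normal(80, 2, n)
profit = (price - cost) * volume

# Stack inputs and output, standardize, and look at the principal directions
# of the resulting point cloud (the "ellipse thing").
X = np.column_stack([price, volume, cost, profit])
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))

# The loadings of the top component show which inputs move most strongly with profit.
print(eigvecs[:, -1])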
Keep doing such lectures. Kudos
18:24 subtly bashing the Last Jedi 👍
Not even very subtle to be honest. It was just the perfect example for showing what happens to a column with all zeroes 😆
@@AllAnglesMath 1:27 A 90° rotation about the Z-axis followed by a 45° rotation about the X-axis will transform, as shown, a circle in the ZY plane into a line making a 45° angle with the Y-axis. However, if we want to follow the inverse path, how can we tell whether the original shape was a circle or an ellipse (which I obviously consider to be two different shapes)?
👏👍
If you are brave enough, read Linear Algebra Done Right; this shit is not PG.
tfw The Last Jedi was the only good movie in the final trilogy 😔
Well that's saying something ...