DR. Strang thank you for another classic lecture and selection of examples on Positive Definite and Semidefinite Matrices.
For everyone asking about the bowl and eigenvalues analogy:
Let X = (x,y) be the input vector and consider the energy functional f(X) = X^t S X. What happens if we evaluate it on the eigenvectors?
First, why would I think to do this? The eigenvectors of the matrix give the "natural coordinates" for expressing the action of the matrix as a linear transformation, which is where all the "completing the square" type problems with quadratic forms in the usual LA classes come from. The natural coordinates rotate the quadratic so it has no off-diagonal (cross) terms. This means the function changes from something like f(x,y) = 3x^2 + 6y^2 + 4xy to something like f(u,v) = 2u^2 + 7v^2, where u and v are the coordinates along the unit eigenvectors and 2 and 7 are the eigenvalues of S = [3 2; 2 6]. So the functional looks like a very nice quadratic in these coordinates, like the ones you may learn how to draw in a multivariable calc course.
Going back to the current calculation with f(X) = X^t S X: if we evaluate in the eigen-directions, then our function becomes f(X_1) = X_1^t S X_1 = X_1^t lambda_1 X_1 = lambda_1 ||X_1||^2 (a nice quadratic) and
f(X_2) = X_2^t S X_2 = X_2^t lambda_2 X_2 = lambda_2 ||X_2||^2 (another nice quadratic). The eigenvalues lambda_1, lambda_2 become scaling coefficients in the eigen-directions. A large scaling coefficient means a steep quadratic and a small coefficient means a quadratic that is stretched out horizontally.
If an eigenvalue is close to zero, the quadratic is almost flat along that eigen-direction and S is nearly singular (not invertible in the limit), so any solver will have difficulty finding a solution because there are infinitely many approximate solutions. Since the solver sees a whole range of nearly equally good points, it will bounce around the argmin vector without being able to confidently declare success. Poor solver. Of course, these are purely mathematical problems; rounding error will probably hamper the search even further.
Edit: changed "engenvalue" to "eigenvector" in 2nd paragraph.
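For anyone who wants to check the claim above numerically, here is a minimal sketch in plain Python (no libraries). It uses the example f(x,y) = 3x^2 + 6y^2 + 4xy, i.e. S = [3 2; 2 6], and the closed-form eigenpairs of a symmetric 2x2 matrix. The helper names eig_sym_2x2 and energy are made up for this illustration and are not from the lecture.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]] (assumes b != 0)."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean - radius, mean + radius):
        # (b, lam - a) solves (S - lam*I) v = 0 when b != 0
        vx, vy = b, lam - a
        n = math.hypot(vx, vy)
        pairs.append((lam, (vx / n, vy / n)))
    return pairs

def energy(a, b, d, x, y):
    """f(X) = X^T S X = a*x^2 + 2*b*x*y + d*y^2."""
    return a * x * x + 2 * b * x * y + d * y * y

A, B, D = 3.0, 2.0, 6.0   # f(x, y) = 3x^2 + 4xy + 6y^2
(lam1, v1), (lam2, v2) = eig_sym_2x2(A, B, D)

# Along a unit eigenvector the energy equals the eigenvalue:
# f(v_i) = lam_i * ||v_i||^2 with ||v_i|| = 1
f_v1 = energy(A, B, D, *v1)
f_v2 = energy(A, B, D, *v2)

# For any X, with u and w the coordinates of X in the eigenvector basis,
# f(X) = lam1 * u^2 + lam2 * w^2 (no cross term).
x, y = 1.5, -0.7
u = x * v1[0] + y * v1[1]
w = x * v2[0] + y * v2[1]
f_direct = energy(A, B, D, x, y)
f_rotated = lam1 * u * u + lam2 * w * w
print(lam1, lam2, f_v1, f_v2, f_direct, f_rotated)
```

The test point (1.5, -0.7) is arbitrary; any other point should give the same agreement between f_direct and f_rotated up to rounding.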
Lecture starts at 2:50
Positive Semi-Definite matrices: 38:01
thanks man
What a king
u the man.
38:01
listening to Strang is like getting a brain massage
I'm only halfway through one lecture and I already love him. :'D
I had a headache; after 15 minutes of his lecture it evaporated.
Sure
At 41:20, why does the rank 1 matrix have 2 zero eigenvalues? Because 3 - 1 = 2? Does the professor mean that the number of zero eigenvalues always equals the nullity of the matrix?
Starting at 22:00, shouldn't we move in the direction opposite to the gradient to reach a minimum? The gradient gives the direction of steepest ascent, as far as I know.
I think you are right
Yes
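A small sanity check of the point made in this thread, as a sketch only: for the made-up bowl f(x,y) = 3x^2 + 4xy + 6y^2, a small step against the gradient lowers f, and the same step along the gradient raises it. The starting point and step size are arbitrary choices.

```python
def f(x, y):
    """Example bowl: f(x, y) = 3x^2 + 4xy + 6y^2."""
    return 3 * x * x + 4 * x * y + 6 * y * y

def grad(x, y):
    """Gradient of f: (df/dx, df/dy)."""
    return (6 * x + 4 * y, 4 * x + 12 * y)

x0, y0 = 1.0, 1.0
gx, gy = grad(x0, y0)
step = 0.01  # small step size, chosen arbitrarily

f_start = f(x0, y0)
f_descent = f(x0 - step * gx, y0 - step * gy)  # move against the gradient
f_ascent = f(x0 + step * gx, y0 + step * gy)   # move along the gradient
print(f_start, f_descent, f_ascent)
```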
I wish Strang was my grandfather
Maybe he's not, because he would be sad if his grandson were stupid and could not invert a matrix.... just kidding XD
@@NguyenAn-kf9ho lol
Wishing he was and isn't ?
Better wishing he is.
This professor is the platonic version of a professor
At 14:18, the energy can also be EQUAL to 0 (not JUST bigger than 0)! Doesn't this mean that the matrix is positive SEMI definite as opposed to positive definite?
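On this question, as far as I understand the definition: the energy x^T S x is always 0 at x = 0, for every matrix, so that case does not count. Positive definite means the energy is > 0 for every NONZERO x; positive semidefinite allows the energy to be 0 for some nonzero x. A minimal sketch with two made-up 2x2 matrices (plain Python, hypothetical helper names):

```python
import math

def energy(S, x):
    """x^T S x for a 2x2 matrix S (nested lists) and a 2-vector x."""
    (a, b), (c, d) = S
    return a * x[0] ** 2 + (b + c) * x[0] * x[1] + d * x[1] ** 2

def min_eigenvalue(S):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    (a, b), (_, d) = S
    return (a + d) / 2 - math.hypot((a - d) / 2, b)

definite = [[3.0, 2.0], [2.0, 6.0]]      # eigenvalues 2 and 7
semidefinite = [[1.0, 1.0], [1.0, 1.0]]  # eigenvalues 0 and 2

zero = (0.0, 0.0)
direction = (1.0, -1.0)  # nonzero vector; S x = 0 for the second matrix

e_def_zero = energy(definite, zero)        # zero energy at x = 0 is always allowed
e_def_dir = energy(definite, direction)    # strictly positive
e_semi_dir = energy(semidefinite, direction)  # zero at a nonzero x: only semidefinite
print(e_def_zero, e_def_dir, e_semi_dir)
print(min_eigenvalue(definite), min_eigenvalue(semidefinite))
```

The smallest eigenvalue tells the two cases apart: strictly positive for the definite matrix, exactly zero for the semidefinite one.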
came here from 18.06 fall 2011 Singular value decomposition taught by Professor Strang
I am doing a project on this topic it really helped me a lot..thank you
@@vishalyadav2958 yes
U r doing phd or post grad?
U can follow horn n johnson and strang book... it's relatively easier to understand
At 28:00 what is the intuition behind shape of the bowl and large/small eigenvalues? He made it sound like a quite obvious statement.
Also at 36:50, S and Q^(-1) S Q being similar implies they have the same eigenvalues. However, how do you show that S and Q^(-1) S Q are similar?
OK, I figured out the 36:50 part. It is the spectral theorem, which sir covered in the previous class: S = Q (Lambda) Q^(-1).
Lambda = Q^(-1) S Q. As Lambda is defined as the matrix of eigenvalues of S, this implies that S and Q^(-1) S Q are similar.
Please explain the part at 28:00 . Thanks!
Regarding similarity, you don't need the spectral theorem; just remember that we say A and B are similar if there exists an invertible matrix M such that
A = M^(-1) * B * M
You can immediately verify that if A = Q^(-1) * S* Q, B = S, and M=Q, then the equation is satisfied so A=Q^(-1) *S* Q and B=S are similar.
Regarding the bowl statement, it should be pretty clear when the eigenvectors are [1,0] and [0,1]. In that case the energy function is given by:
[x,y] * S * [x,y]^T = x^2 * lambda1 + y^2 * lambda2.
So in the xz-plane it is just the quadratic function scaled by lambda1. In the yz-plane it is just the quadratic function scaled by lambda2 (and in general it is a linear combination of the two). If either eigenvalue is much larger than the other the scalings will be disproportionate and therefore we will get a bowl with a steep slope in the direction of the large eigenvalue, and pretty flat slope in the direction of the small eigenvalue.
However the whole point of diagonalization is that basically we can treat any diagonalizable matrix like the diagonal matrix of its eigenvalues as long as we do the appropriate orthogonal base change (or equivalently work in the correct coordinate system), so really we already know that the general bowl will be an orthogonal transformation of the bowl described above and therefore itself be a narrow valley bowl.
Concretely, if v1,v2 is an orthonormal basis of eigenvectors of S, with associated eigenvalues lambda1,lambda2, then the energy function is
v^T QDQ^T v
where
Q is the orthogonal matrix whose columns are v1,v2.
D is the diagonal matrix with elements lambda1,lambda2.
We can write v as a unique linear combination of the eigenvectors (it is a basis after all):
v= x * v1 + y * v2
Then, since Q^T v = [x,y]^T, the energy function evaluates to:
v^T QDQ^T v = v^T QD [x,y]^T
= v^T Q [lambda1 * x, lambda2 * y]^T
= v^T (lambda1 * x * v1 + lambda2 * y * v2)
= lambda1 * x^2 + lambda2 * y^2,
so again it is a bowl which in the direction of v1 is a 1-dimensional quadratic scaled by lambda1, and in the direction of v2 is a 1-dimensional quadratic scaled by lambda2. So if lambda1 is huge the slope in the direction v1 will be steep. Same as before, just from the point of view of the coordinate system given by the eigenvectors (v1,v2).
@@ramman405 thanks
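To make the similarity part of this thread concrete, here is a rough sketch in plain Python: take a symmetric S, pick any invertible M, form A = M^(-1) S M, and compare eigenvalues. The matrices are made up for the example, and the helper names are hypothetical. Note that A is no longer symmetric, yet its eigenvalues match those of S.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(M):
    """Inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def eigenvalues(A):
    """Eigenvalues of a 2x2 matrix from its trace and determinant (real case)."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = math.sqrt(tr * tr / 4 - det)
    return (tr / 2 - root, tr / 2 + root)

S = [[3.0, 2.0], [2.0, 6.0]]
M = [[1.0, 2.0], [0.0, 1.0]]          # any invertible matrix works; this one is arbitrary
A = matmul(inverse(M), matmul(S, M))  # A = M^(-1) S M, similar to S

eig_S = eigenvalues(S)
eig_A = eigenvalues(A)
print(A, eig_S, eig_A)
```

Choosing M = Q (the eigenvector matrix) is the special case where A comes out diagonal.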
@32:00, Prof mentions "if the eigenvalues are far apart, that's when we have problems". What does he mean by that?
He means that when the eigenvalues are very different in size, i.e. the ratio lambda_max/lambda_min (the condition number) is large, we have the case where "the bowl is long and thin" that he mentions right before that.
@@nguyennguyenphuc5217, yes, it looks like it would make it easier to miss the point and bounce back and forth around the minimum
@@gabrielmachado5708 right. If the bowl is narrow and your descent is slightly off you'll start climbing again.... so we take baby steps.
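Here is a rough sketch of why the "baby steps" hurt when the eigenvalues are far apart. It runs fixed-step gradient descent on f = (lam1*x^2 + lam2*y^2)/2, written directly in eigenvector coordinates. The step size 1/lam_max, the starting point and the tolerance are my own assumptions, not the lecture's. With this cautious step nothing bounces, but the step is dictated by the steep direction, so progress along the flat direction is very slow.

```python
import math

def gd_steps(lam1, lam2, tol=1e-6, max_steps=100_000):
    """Count fixed-step gradient descent iterations on f = (lam1*x^2 + lam2*y^2)/2."""
    x, y = 1.0, 1.0
    step = 1.0 / max(lam1, lam2)   # small enough not to overshoot the steep direction
    for k in range(max_steps):
        if math.hypot(x, y) < tol:
            return k
        gx, gy = lam1 * x, lam2 * y          # gradient of f
        x, y = x - step * gx, y - step * gy  # move against the gradient
    return max_steps

round_bowl = gd_steps(1.0, 2.0)    # eigenvalue ratio 2
thin_bowl = gd_steps(1.0, 100.0)   # eigenvalue ratio 100: long, thin bowl
print(round_bowl, thin_bowl)
```

A larger step would speed up the flat direction but overshoot across the narrow valley, which is the bouncing described above.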
20:49 gradient descent
who's that eager student answering every question for everyone else on every class?
Sooo love Prof. Strang!!
10:00 energy
19:00 convex
14:00 deep learning
24:00 gradient descent
27:00 eigenvalue tells the shape of the bowl
38:00 semi def pos
I think the shape of the bowl will change when we add (x^T)b at 17:00 . Am I right???
It will shift or tilt the bowl in the X axis direction. You can try the visualizer al-roomi.org/3DPlot/index.html
@@jeevanel44 Hey, sorry to bother you a year later - what expression would I input to receive the bowl shown here?
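A quick way to see what the linear term does, as a sketch under my own assumptions: I take g(x) = x^T S x + x^T b with a made-up S and b (the lecture's sign and 1/2 conventions may differ). The minimum moves from the origin to x* = -S^(-1) b / 2, but the rise above the minimum is still d^T S d for any displacement d, so the bowl is moved rather than reshaped.

```python
S = [[3.0, 2.0], [2.0, 6.0]]
b = [1.0, -4.0]   # arbitrary linear term, made up for this example

def quad(x):
    """x^T S x"""
    return S[0][0] * x[0] ** 2 + 2 * S[0][1] * x[0] * x[1] + S[1][1] * x[1] ** 2

def shifted(x):
    """x^T S x + x^T b"""
    return quad(x) + x[0] * b[0] + x[1] * b[1]

# The minimizer solves 2 S x + b = 0, i.e. x* = -S^(-1) b / 2 (2x2 inverse written out)
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
x_star = [-(S[1][1] * b[0] - S[0][1] * b[1]) / (2 * det),
          -(-S[1][0] * b[0] + S[0][0] * b[1]) / (2 * det)]

d = [0.3, -1.1]   # any displacement away from the minimum
rise = shifted([x_star[0] + d[0], x_star[1] + d[1]]) - shifted(x_star)
print(x_star, rise, quad(d))
```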
Who could possibly dislike this?
who can't understand that.
Awesome video sir! Thank you!
Love you sir. Love from India.
14:17 14:17 14:17
Where was the energy equation mentioned in previous lectures?
What is meant by energy when the X^t S X multiplication is carried out?
Are you asking why this quadratic form is called energy?
@@spoopedoop3142 yes exactly
@@CM-Gram Kinetic energy is 1/2mv^2, where v is the velocity vector, and potential energy is 1/2kx^2, where x is the position vector.
Hopefully I can still love science at this age
I am here to leave a like to the legend.
At 41min, Why is the number of nonzero eigenvalues the same as rank(A)?
The eigenvectors with nonzero eigenvalues must be mapped to somewhere within the column space, and in the directions orthogonal to the column space (the null space, for a symmetric matrix) everything collapses to 0. Bear in mind that the null space vectors are also solutions to Ax=\lambda x, where \lambda is 0.
The answer is at 41:17: notice how we can decompose the matrix into a weighted sum of rank-one pieces built from its eigenvectors, the weights being the eigenvalues. Rank(A) is by definition the number of linearly independent columns of A, and for a symmetric matrix (whose eigenvectors are orthonormal) that is the same as the number of nonzero terms in the decomposition, which is in turn the number of nonzero eigenvalues.
@@fustilarian1 Thanks for your explanation. That's very helpful.
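A tiny worked example of the rank-one case asked about at 41:20, as a sketch: with a made-up vector a in 3 dimensions, the matrix a a^T has rank 1, one nonzero eigenvalue (a^T a, with eigenvector a) and 3 - 1 = 2 zero eigenvalues, whose eigenvectors are any two independent vectors orthogonal to a. The vectors below are my own choice for illustration.

```python
a = [1.0, 2.0, 2.0]                       # made-up vector; a^T a = 9
A = [[ai * aj for aj in a] for ai in a]   # rank-one symmetric matrix a a^T

def matvec(M, v):
    """Product of a 3x3 matrix with a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

# a itself is an eigenvector with the only nonzero eigenvalue, a^T a
Aa = matvec(A, a)

# Two independent vectors orthogonal to a span the nullspace: eigenvalue 0, twice
w1 = [2.0, -1.0, 0.0]
w2 = [2.0, 0.0, -1.0]
Aw1 = matvec(A, w1)
Aw2 = matvec(A, w2)
print(Aa, Aw1, Aw2)
```

For symmetric matrices the count of zero eigenvalues equals the nullity; for a general non-diagonalizable matrix zero can be repeated as an eigenvalue more often than the nullity suggests, so the statement should not be stretched that far.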
These are great lectures! Is the autograder and programming assignment available somewhere?
yes when u get admitted to MIT u can take up the class and partake in assignments
Very comprehensive. Thanks
Does he mean "a * a^T" near the end of the video?
Where can I find the online homework? I can't find it in OCW.
The homework can be found in the Assignments section of the course on MIT OpenCourseWare at: ocw.mit.edu/18-065S18. Best wishes on your studies!
@@mitocw Are the Julia language online assignments that were mentioned also available somewhere? I only see problems from the textbook in the Assignments section of OCW.
julialang.org/
@@mitocw Where can we locate the programming assignments?
@@mitocw I have a question about the ruclips.net/video/xsP-S7yKaRA/видео.html
Where can I find this lab work about convolution?
On MIT OpenCourseWare at
ocw.mit.edu/courses/mathematics/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/assignments/
I can find only book assignments
ocw.mit.edu/courses/mathematics/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/assignments/MIT18_065S18PSets.pdf#page=7
Could you help me? Thanks!
you are the best
when does he prove 3?
Great work
Can we find homeworks/labs online?
The course materials are available on MIT OpenCourseWare at: ocw.mit.edu/18-065S18. Best wishes on your studies!
Well thanks prof.
Hello, could anyone explain the difference between the energy function and the S-norm taught by the professor in lecture 8?
Always a sign error .... it is sqrt(68), not sqrt(60), so yes, one eigenvalue is negative .... 🤣😊 But now Matlab seems to disagree with my quadratic ("abc") formula: (8 +/- sqrt(68))/2 for the eigenvalues 🙄
Octave : -0.12311 , 8.12311 agrees with abc formula
Matlab too 😀
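For anyone who wants to reproduce the numbers in this thread without Matlab or Octave, a minimal sketch in plain Python. The matrix below is an assumption on my part: the formula (8 +/- sqrt(68))/2 only pins down a trace of 8 and a determinant of -1, and [3 4; 4 5] is one symmetric matrix with those values.

```python
import math

S = [[3.0, 4.0], [4.0, 5.0]]   # assumed example: trace 8, determinant -1
trace = S[0][0] + S[1][1]
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]

# lambda^2 - trace*lambda + det = 0  ->  lambda = (trace +/- sqrt(trace^2 - 4*det)) / 2
disc = trace ** 2 - 4 * det
lam_small = (trace - math.sqrt(disc)) / 2
lam_large = (trace + math.sqrt(disc)) / 2
print(disc, round(lam_small, 5), round(lam_large, 5))   # 68.0 -0.12311 8.12311
```

One negative eigenvalue means this matrix is indefinite, so its energy x^T S x takes both signs.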
Thanks a lot !
Thanks professor.
Amazing
Voice ❤️
Hi, I need a course about matrix polynomials, please.
No, we don't have to use gradient descent in this case
Math ❤️
Duster ❤️
Mic ❤️
Chalk ❤️
Accent ❤️
i hope this professor doesnt get any sexual assault charges with that much winking because his lectures are awesome.
🤣
I guess not, unless the air gets personified and files a case.
Watching him get older and older... sigh, time.
I see that this professor does not take questions in class. Maybe if you email him.
Maybe no one raises their hand
wow hes old now....
20220517 signing in
what is Convext? like that ....hahah