Hey man these videos are great, you deserve more attention
Ritvik doesn't just throw a bunch of equations on the board: he puts everything into perspective, which is unique among mathy videos.
Great explanation man. Your explanation is way more intuitive than most of the videos out there.
thanks!!
Quick question: at 1:29 the matrix S you give is not symmetric, but when you do the Lagrange multipliers at 7:14 you use the result that the matrix derivative of u^TSu is 2Su which (in your other video "Derivative of a Matrix") you derived for the 2x2 case assuming the matrix was symmetric. Does this result hold for the non-symmetric case? If so, how would I go about showing that? If not, what can be done for the non-symmetric case?
When the matrix S is not symmetric, the gradient of the function f(u) = u^T S u is f'(u) = (S + S^T)u. From the Lagrangian stationarity condition, this implies that 2\lambda should be an eigenvalue of the matrix S+S^T. Therefore, the solution u* of the optimization is the eigenvector of S+S^T corresponding to the largest eigenvalue of S+S^T.
Yeah I was thinking that too
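A minimal numerical sanity check of the (S + S^T)u gradient mentioned above (a sketch, not from the video; the matrix S and vector u here are just random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(3, 3))           # deliberately non-symmetric
u = rng.normal(size=3)

f = lambda v: v @ S @ v               # the quadratic form u^T S u

# Central finite differences along each coordinate axis.
eps = 1e-6
numeric = np.array([(f(u + eps * e) - f(u - eps * e)) / (2 * eps)
                    for e in np.eye(3)])

print(np.allclose(numeric, (S + S.T) @ u, atol=1e-6))   # True: gradient is (S + S^T)u
print(np.allclose(numeric, 2 * S @ u, atol=1e-6))       # False in general for non-symmetric S
```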
This was fantastic. You have a gift for teaching.
thanks!
One word.....AMAZING......great job
This is a great explanation. Thanks a ton.
It would be great if you could make a video on maths behind Logistic Regression as well.
Oh, I have a video on that :) Please check my channel
Great videos.. very well explained.. thank you
Glad you like them!
Just subscribed I like how you explain things and videos are very “snackable.”
thanks!
6:45 BIG mistake right here. This derivative ONLY holds when S is symmetric, which is NOT true in this case. The general result is (S+S^T)u, which is easy to show. Other than that, good video ;)
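For anyone who wants to see where the (S + S^T)u comes from, a short index-notation derivation (a standard result, not specific to the video):

$$
\frac{\partial}{\partial u_k}\, u^\top S u
  = \frac{\partial}{\partial u_k} \sum_{i,j} u_i S_{ij} u_j
  = \sum_j S_{kj} u_j + \sum_i u_i S_{ik}
  = (S u)_k + (S^\top u)_k ,
$$

so \nabla_u (u^\top S u) = (S + S^\top)u, which collapses to 2Su exactly when S = S^\top.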
Thank you very much sir. Really appreciate you making these super helpful videos.
thanks!
very clear explanation. good job my friend
Glad it was helpful!
Awesome video mate! It makes so much sense
Well explained!
Thanks a lot for the video. I hadn't realized the relationship between lagrangians and eigenvectors.
holy shit that was magical
6:00 I didn't understand why lambda*(1 - uTu) isn't just 0, since we defined uTu = 1, didn't we?
Thanks for this nice, crisp video. If it were a minimisation problem, would that mean looking for the minimum eigenvalue and its eigenvector? I'm looking forward to the video detailing the basics of Lagrange multipliers. Thanks once again.
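On the minimisation question, a small NumPy sketch (assuming a symmetric S, as in PCA; the matrix here is a random example): the constrained minimum of u^T S u sits at the eigenvector of the smallest eigenvalue, just as the maximum sits at the largest.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
S = A @ A.T                               # symmetric (positive semi-definite) example matrix

eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns eigenvalues in ascending order
u_min, u_max = eigvecs[:, 0], eigvecs[:, -1]

print(np.isclose(u_min @ S @ u_min, eigvals[0]))    # True: value at the smallest eigenvector
print(np.isclose(u_max @ S @ u_max, eigvals[-1]))   # True: value at the largest eigenvector

# Random unit vectors never beat either extreme.
us = rng.normal(size=(1000, 4))
us /= np.linalg.norm(us, axis=1, keepdims=True)
vals = np.einsum('ij,jk,ik->i', us, S, us)           # u^T S u for each sample
print(vals.min() >= eigvals[0] - 1e-12, vals.max() <= eigvals[-1] + 1e-12)  # True True
```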
Now that you have shown us the theory, I'd suggest starting to build practical Python code for each individual video. That would help theory meet code.
I'd like to have a deeper view about this interpretation.
The interpretation I understand intuitively is the one on Wikipedia, where the gradients of the objective function and of the constraint function have to be collinear.
I like the eigenvalue-based interpretation, but something is unclear to me: in the present case, you chose the constraint function in such a way that we get the simplification that lets us write things as an eigenvalue problem.
It's harder for me to interpret it in a more general manner, with arbitrary objective/constraint functions.
Moreover, what about the fact that we also need to set d(Lagrangian)/d(lambda) to zero? I'm confused about how that could be related to an eigenvalue.
From my point of view, the eigenvalue/eigenvector interpretation he uses here is just a SPECIAL CASE of the problem max u^T S u s.t. u^T u = 1 that he is discussing.
While that problem is certainly interesting, the interpretation is misleading for the general case!
There is no general connection between eigenvalues and Lagrange multipliers (as far as I know).
For several possible interpretations of Lagrange multipliers, see Convex Optimization by Stephen Boyd & Lieven Vandenberghe.
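For what it's worth, here is how the collinear-gradients view and the eigenvalue result line up in this special case (assuming the symmetric S for which d(u^T S u)/du = 2Su, as used in the video):

$$
\mathcal{L}(u,\lambda) = u^\top S u + \lambda\,(1 - u^\top u), \qquad
\frac{\partial \mathcal{L}}{\partial u} = 2Su - 2\lambda u = 0
\;\Longleftrightarrow\; Su = \lambda u, \qquad
\frac{\partial \mathcal{L}}{\partial \lambda} = 1 - u^\top u = 0 .
$$

The eigenvalue structure appears only because both the objective u^T S u and the constraint u^T u are quadratic in u; for general f and g, stationarity gives just \nabla f(u^*) = \lambda \nabla g(u^*), with no eigenvalue interpretation. The \partial\mathcal{L}/\partial\lambda = 0 equation simply returns the constraint, i.e. it forces the eigenvector to have unit norm.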
How do you choose the "order" of the constraint in your new function?
I mean how do you choose to write:
uTSu + \lambda(1-uTu) and not
uTSu + \lambda(uTu-1) ?
It doesn't matter. Both should lead to the same result.
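A quick check of why the sign doesn't matter (again assuming the symmetric case where the derivative of u^T S u is 2Su):

$$
\nabla_u\!\left[u^\top S u + \lambda(1 - u^\top u)\right] = 2Su - 2\lambda u = 0 \;\Rightarrow\; Su = \lambda u,
\qquad
\nabla_u\!\left[u^\top S u + \mu(u^\top u - 1)\right] = 2Su + 2\mu u = 0 \;\Rightarrow\; Su = -\mu u .
$$

The stationary vectors u are the same eigenvectors either way; only the sign of the multiplier flips (\mu = -\lambda).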
Your intuition level is just A+. I think you should do DS and ML full time.
Thank you. To me, there is still something fishy going on: you say you'd want to take the biggest eigenvalue of the matrix S and find its corresponding eigenvector, in order to "maximize the success". But those eigenvalues and eigenvectors can be complex-valued (a complex number and its conjugate), and we can't just compare complex numbers. So would the best direction be spiraling outwards?
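One note on the complex-eigenvalue worry (a quick check, not from the video; S and u are random placeholders): the quadratic form u^T S u only depends on the symmetric part of S, and a real symmetric matrix always has real eigenvalues, so there is nothing complex to rank.

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.normal(size=(3, 3))              # non-symmetric; its eigenvalues may be complex
S_sym = (S + S.T) / 2                    # symmetric part of S

u = rng.normal(size=3)
print(np.isclose(u @ S @ u, u @ S_sym @ u))   # True: same quadratic form
print(np.linalg.eigvals(S))                   # may contain a complex-conjugate pair
print(np.linalg.eigvalsh(S_sym))              # always real, so "largest" is well defined
```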
Thank you man
Amazing
Great :)
thanks!
u.T S u is a formula I've encountered several times; can anyone help me understand what it actually is? It seems we're projecting the linear transformation of u back onto u itself, but what does that tell us?
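One way to read it (a sketch, not from the video; the data matrix here is made up): when S is a covariance matrix, u^T S u is the variance of the data after projecting it onto the unit direction u, which is exactly why maximizing it picks out the direction of greatest spread.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])  # correlated toy data
X -= X.mean(axis=0)                     # center the data

S = X.T @ X / (len(X) - 1)              # sample covariance matrix
u = np.array([1.0, 1.0])
u /= np.linalg.norm(u)                  # unit direction

proj = X @ u                            # scalar projection of each row onto u
print(np.isclose(u @ S @ u, proj.var(ddof=1)))   # True: u^T S u = variance along u
```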
Live long and prosper
What if the function we're maximizing cannot be represented as a simple uT*S*u? Only particular functions can. Or are we only interested in uT*S*u in data science anyway?