This series has been enormously helpful. What a gift! Really reminds me of some of my best professors.
The best part about video lectures in general is the ability to pause the lecture in order to try out an example on my own before receiving the answer from the lecturer.
After a break, I came back and continue to be delighted with how the subject is opening up. Cranking through exercises expanding the sums really helps me understand the hows and whys of dummy variables. Thanks!
Your videos are really good!
I just want to point out that there are easy ways to differentiate functions in matrix notation. Here's one way to do it:
f(x) = 0.5 x'Ax - x'b ==>
df = 0.5 (dx'Ax + x'Adx) - dx'b = (product rule: d(fg) = df g + f dg)
0.5 (x'A'dx + x'Adx) - b'dx (using dx'Ax = (dx'Ax)' = x'A'dx, since a 1x1 matrix equals its transpose).
If we factorize out the dx we get
df = [0.5 x'(A' + A) - b'] dx
Since
df = gradf dot dx = gradf' dx
the gradient of f is
0.5 (A + A')x - b
This is sometimes called "Matrix Calculus" (See Magnus & Neudecker's book).
You are correct in that matrix notation is adequate for first-order analysis. It becomes inadequate for second and higher orders. Furthermore, even the simple calculation above exhibits the "endless profusion of symbols and operators" (against which Hermann Weyl entered an "emphatic protest") which results when one avoids referring to the components of objects. Notice that you have matrices and vectors and the following operations: grad, dot, product, transpose, commutation, and conversion of a 1x1 matrix to a number. My approach only had tensors and contraction. Which approach is better depends on one's personal preference and the problem at hand.
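To make the contrast concrete, here is a small sketch (plain numpy, made-up test data) of the purely index-based computation: the gradient written as 0.5 (A_kj x^j + A_ik x^i) - b_k using only explicit contractions, checked against the matrix-notation answer.

```python
import numpy as np

# Component/contraction version of the same gradient:
# d/dx^k [0.5*A_ij x^i x^j - b_i x^i] = 0.5*(A_kj x^j + A_ik x^i) - b_k
rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = rng.standard_normal(n)

grad_index = 0.5 * (np.einsum('kj,j->k', A, x) +
                    np.einsum('ik,i->k', A, x)) - b
grad_matrix = 0.5 * (A + A.T) @ x - b

print(np.allclose(grad_index, grad_matrix))  # True
```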
The bit at the end about how it works if the matrix is not symmetric seems to be wrong. You can't add (A^T + A) if A is not symmetric, right? The way this is usually handled is to define the quadratic function f(x) = x^T A^T A x + A^T b^T x, so we have the square matrix A^T A.
Love this lecture series! Thank you for posting it.
Thanks Nina!
This discussion indeed centers around a square matrix A.
For sure! Thank you again for the series.
You can add A^T and A if A is not symmetric. You can't add if they are not the same size.
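If in doubt, a tiny numpy check (random, non-symmetric square A, nothing canonical about the numbers): A + A^T is perfectly well defined, the sum is symmetric, and the quadratic form only ever sees the symmetric part.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))          # square, not symmetric
x = rng.standard_normal(4)

S = 0.5 * (A + A.T)                      # symmetric part; the addition is fine
print(np.allclose(S, S.T))               # True: the sum is symmetric
print(np.allclose(x @ A @ x, x @ S @ x)) # True: x'Ax only depends on S
```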
2019 still rockin' :D
2021!
Was the definition of a tensor covered in previous lectures?
No, it's coming up. This lecture uses the tensor notation without the tensor property.
Ok, thank you.
I used to look this up on Wikipedia's page on matrix calculus. Now I can rest assured, if I am ever trapped on a deserted island and solving for the quadratic form minimization of a matrix equation is the only chance of survival, I will survive!
That is, until I run out of water...
If A is anti-symmetric, then shouldn't the final answer be Grad f(x) = - b_k, because (A_ik + A_ki)= 0 (since an anti-symmetric A means A_ik = - A_ki) ?
I think, if A is antisymmetric, x^T A x = 0, but I am not sure.
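That's right, and it's easy to convince yourself numerically (a sketch with a random antisymmetric A, just numpy): x^T A x vanishes, so f reduces to -x^T b and the gradient is -b, matching the (A_ik + A_ki) = 0 argument above.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
A = M - M.T                       # antisymmetric: A.T == -A
x = rng.standard_normal(4)
b = rng.standard_normal(4)

print(np.isclose(x @ A @ x, 0.0))                  # True: quadratic term drops out
print(np.allclose(0.5 * (A + A.T) @ x - b, -b))    # True: the gradient is just -b
```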
Sorry, I still don't understand why the index of matrix A is ij on the bottom instead of i on top and j on the bottom. What is the rule? And do you always put the index on top for vectors? If so, what about b??? Thank you!!!
You might be expecting an upper index on the matrix because matrices often represent linear transformations. The matrix A is a slightly different animal that naturally comes with two lower indices.
Components of vectors naturally get upper indices if they are decomposed with respect to a covariant basis.
For this problem, however, the placement of indices can be thought of as a convenient convention.
How is finding the gradient connected to the problem of minimization?
Multivariable functions have minima (or maxima) at points where the gradient is zero. Really, at points where all first partial derivatives are individually zero, which means the same thing.
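Concretely (a sketch with made-up numpy data, assuming A is symmetric positive definite so the stationary point really is a minimum): setting the gradient 0.5 (A + A')x - b = Ax - b to zero gives x* = A^{-1} b, and f is indeed smaller there than at nearby points.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)       # symmetric positive definite (assumption)
b = rng.standard_normal(n)

f = lambda x: 0.5 * x @ A @ x - x @ b
x_star = np.linalg.solve(A, b)    # gradient Ax - b = 0  =>  x* = A^{-1} b

print(np.allclose(A @ x_star - b, 0))                    # gradient vanishes at x*
nearby = [x_star + 1e-3 * rng.standard_normal(n) for _ in range(5)]
print(all(f(x_star) <= f(p) for p in nearby))            # True: x* is a minimum
```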
Why is x^T denoted as x^i and not as x_i? In other words, should the transpose make a covariant vector contravariant and vice versa?
Here, we're just dealing with the components of the vectors, so x^i is the ith component of the vector, which is the same regardless of whether or not it's been transposed. We're not worried about whether the basis is contravariant or covariant, as we're just dealing with the linear algebra problem. Hope I've understood your question/the lecture correctly.
Normal x is usually written as x^i because it usually denotes a contravariant vector, so x^T would be x_i. But keep in mind matrix notation is just a notation to carry out multiplication, and it does not say whether the thing you use is contra- or covariant.
For example, x^T*y=y^T*x if x and y are n by 1 matrices. But that doesn't mean that if you take a 1-form x and a vector y, suddenly x^i y_i = g_ji x^i g^ji y_i for any metric.
Actually, if you don't have a metric, it doesn't even make sense to canonically go from y_i to y^i, but you will be able to say y^T*x = x^T*y nevertheless. I don't know if you see what I mean.
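If it helps, here is a small numpy illustration of that point (made-up components and an arbitrarily chosen metric, nothing canonical): the matrix identity x^T y = y^T x holds no matter what, but actually moving an index up or down requires a metric g, and with a non-trivial g the lowered components are genuinely different numbers.

```python
import numpy as np

# Made-up components: y is a vector (upper index), x is a 1-form (lower index).
y_up = np.array([1.0, 2.0, 3.0])
x_low = np.array([0.5, -1.0, 2.0])

# The matrix identity x^T y == y^T x holds regardless of any geometry:
print(np.isclose(np.dot(x_low, y_up), np.dot(y_up, x_low)))  # True, pure matrix algebra

# But moving indices up or down needs a metric; with a non-trivial g
# the lowered components of y are genuinely different numbers.
g = np.diag([1.0, 2.0, 3.0])          # some metric, chosen arbitrarily
y_low = g @ y_up                      # y_i = g_ij y^j
print(np.allclose(y_low, y_up))       # False: lowering changed the components
```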
lol what happened when the test image came up?
Did he flip the room off...? ;)