"Some of you might already know where this is going.."
Me: Nope
Hahaha
What do u mean
This channel is a blessing. I've had some really bad professors and I've had some really good professors. But even the really good professor never made the concepts click with me as well as these videos do. Like, not only do I understand the math better, but just the little diagram you drew showing A's column space, and visibly showing how b is outside of A's column space yet could still be approximated using a vector v in A's column space, like idk how else to describe it but that just made it click for me.
Edit: I guess _one_ way to describe it and how it clicked for me:
So we use a line to approximate a bunch of data points on a graph, or plane. If these data points were in a straight line, the "approximation" would have no error. However, this is often not the case.
Now, think about the equation y=mx+b. Let's use c instead of b to avoid confusion in the next step. So we have y=mx+c. This is the equation used to represent our line. Suppose b=y-c. Then we have mx=b, which looks a lot like Ax=b. And it is! A is just a 1x1 matrix.
So the line is constrained to the column space of A (here just multiples of m), and our variable(s) (in this case, just x) can be changed to get b. Just basic algebra: if m=3 and b=6, then x=2. But say we have two equations, so b is a 2D vector, e.g. b=(1, 2)^T, and A becomes a 2x1 column. Now no matter what x you use, you can't get b (unless b happens to lie on the line spanned by A's column). You can only get as close to b as the column space of A will allow you.
In the diagram drawn in the video, the column space of A is a plane, so the rank of A is 2. For simplicity, let's suppose A is a 3x2 matrix (geometrically, its column space is a 2D plane through the origin, "floating" in 3D space). b appears to be a 3D vector (so while C(A) is only a 2D slice of the 3D space, b is a point that could be anywhere in the 3D space). So, just like before, we look for the vector in the column space of A that gets as close to b as possible by changing our variables (in this case, x1 and x2).
Correct me if my understanding is wrong :)
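A minimal sketch of that picture, assuming NumPy and made-up numbers chosen only for illustration (three points (0, 6), (1, 0), (2, 0) and the line y = c + m*t): A is 3x2, b is a 3D vector outside C(A), and the normal equations A^T A x* = A^T b give the weights whose combination Ax* is the closest point p in the plane. The leftover error e = b - p should be orthogonal to both columns of A.

```python
import numpy as np

# Illustrative data (not from the video): points (0, 6), (1, 0), (2, 0),
# model y = c + m*t, so the columns of A are [1, t]
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Normal equations: A^T A x* = A^T b
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# p is the projection of b onto C(A); e is what is left over
p = A @ x_star
e = b - p

# e should be orthogonal to every column of A, i.e. A^T e is (numerically) zero
print("x* =", x_star)
print("p  =", p)
print("A^T e =", A.T @ e)
```

Here x* should come out as c = 5, m = -3, with p = (5, 2, -1) and e = (1, -2, 1).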
When I get a real job, I will donate my bonus to Khan Academy. This has saved me so much time and you are so awesome.
Did you get a job yet?
Don't forget ❤❤
They are still waiting for your bonus, mate
This lesson is fantastic! I understood the problem in only 15 minutes! You're absolutely better than my numerical analysis teacher at university, that can't properly teach an argument in two hours! Thank you!
I was doing an online machine learning course and got lost when the lecturer introduced the normal equation (which this is, with a different name). Needless to say, I'm finna binge-watch your linear algebra lectures now because I get insecure about using equations I don't understand. Thanks for the playlist, I really wanna put ML in my toolset so we're doing this!
Can you please mention the name/link of the course ?
@@khaledsherif7056 Not sure which course he used for ML, but I'm studying Machine Learning by Andrew Ng on Coursera. When he taught us the normal equation as an alternative to gradient descent in Week 2 of the course, I realized I had seen this in linear algebra under a different name, which is the title of this video.
Comes in handy while studying machine learning.
yes, same
Very true. When I was studying the "normal equation" in ML, I was sure I had seen it somewhere. Then I realized I had studied it in linear algebra.
You are like a billion times better than my professor... and my professor isn't even bad. On the contrary he's my favorite! You're just even better at explaining things.
Plus it's impossible for me to lose focus with the pretty colors and your beautiful handwriting. lol
I have my Linear Algebra final tomorrow (technically today) and I owe the A that I'm sure to get to you and all your helpful videos!
7 years later... did you get an A? :)
11 years later did you get that A?
12 years later did you get that A?
Very useful, man! You are doing an amazing job. This literally saved me hours of searching and reading, can't thank you enough :)
This was incredible. I started this video so confused about least squares, and I get it entirely now! Thank you so much :)
Indebted to Khan academy forever!
The big picture, via one application where this can be used:
Imagine you have a set of data points and you are asked to predict one value (say y) given the other (x).
When you plot those points on a scatter plot and trace an imaginary curve through them, you may notice that the trend is not linear but quadratic.
In that case you would model it with a quadratic equation, i.e. y = ax^2 + bx + c.
Now, to find the coefficients a, b and c, one method you can use is least squares approximation.
I do recall Sal got into vector spaces and a few more advanced linear algebra topics, which might not sound easy at first. But don't get bogged down in the calculation part; computers can do this easily nowadays.
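As a hedged sketch of letting the computer do the calculation (NumPy assumed; the data is hypothetical, generated from y = 2x^2 - 3x + 0.5 with no noise): each data point becomes a row [x^2, x, 1] of a design matrix A, and np.linalg.lstsq solves for (a, b, c) in the least squares sense.

```python
import numpy as np

# Hypothetical noise-free data from y = 2x^2 - 3x + 0.5
x = np.linspace(-1.0, 1.0, 7)
y = 2.0 * x**2 - 3.0 * x + 0.5

# Design matrix: one row [x^2, x, 1] per data point
A = np.column_stack([x**2, x, np.ones_like(x)])

# Least squares solve for the coefficients (a, b, c)
coeffs, residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)
a, b, c = coeffs
print(f"y = {a:.3f} x^2 + {b:.3f} x + {c:.3f}")
```

Because the data has no noise, the recovered coefficients should match 2, -3 and 0.5 up to rounding. np.polyfit(x, y, 2) is a shortcut that should return the same three numbers, highest power first.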
I used to have a bad habit of memorizing formulas and ways of solving problems without building intuition for where and how they are actually used. Focusing on the applications gives a different level of motivation.
This is a good preface to machine learning. The starred solution x* is always the optimal (best) one, and you can also use gradient descent to minimize the squared error.
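One way to see that both routes reach the same x*, sketched under assumptions (NumPy, a hypothetical 3x2 problem, and a hand-picked step size of 0.05): the normal equations give x* directly, while gradient descent repeatedly steps against the gradient 2 A^T (Ax - b) of the squared error.

```python
import numpy as np

# Hypothetical problem: best line y = c + m*t through (0, 6), (1, 0), (2, 0)
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Closed form via the normal equations
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Gradient descent on f(x) = ||Ax - b||^2, gradient 2 A^T (Ax - b)
x = np.zeros(2)
step = 0.05  # assumed learning rate, small enough to converge for this A
for _ in range(5000):
    grad = 2.0 * A.T @ (A @ x - b)
    x = x - step * grad

print("normal equations:", x_normal)
print("gradient descent:", x)
```

The step size matters: for this A it has to stay below roughly 0.14 (2 divided by the largest eigenvalue of 2 A^T A), otherwise the iterates diverge.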
This is super useful for solving assignments. Thanks, Khan Academy.
Best linear algebra playlist.
It would be great to have links when you say "I explained (whatever) in a different video", so we can access that explanation. In this case I wanted to know why C(A)^⊥ = N(A^T).
Thanks!
+Sergio Prada same thing here
www.khanacademy.org/math/linear-algebra/alternate-bases/othogonal-complements/v/linear-algebra-orthogonal-complements Go through this to understand why C(A)^⊥ = N(A^T).
+1
Consider any vector x perpendicular to the column space of A, i.e. x belongs to C(A)^⊥.
Then x is orthogonal to every column of A. The columns of A are the rows of A^T, so each row of A^T dotted with x is 0, i.e. (A^T)(x) = 0.
Thus x lies in the null space of A^T.
Conversely, if (A^T)(x) = 0, then x is orthogonal to every column of A, hence to every linear combination of them, i.e. to all of C(A).
So both inclusions hold, and
C(A)^⊥ = N(A^T)
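A quick numerical sanity check of C(A)^⊥ = N(A^T), assuming NumPy and a hypothetical 3x2 matrix A: the SVD of A^T yields a basis for its null space, and every such vector should have zero dot product with every column of A.

```python
import numpy as np

# Hypothetical 3x2 matrix; its column space is a plane in R^3
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])

# SVD of A^T (2x3); the rows of Vt beyond the rank span N(A^T)
U, s, Vt = np.linalg.svd(A.T)
rank = int(np.sum(s > 1e-10))
null_AT = Vt[rank:].T  # each column is a basis vector of N(A^T)

# Dot products of every column of A with every null-space basis vector
dots = A.T @ null_AT
print("rank(A) =", rank, " dim N(A^T) =", null_AT.shape[1])
print("max |dot| =", np.abs(dots).max())
```

For this A the null space of A^T is one-dimensional, spanned by (1, -2, 1) up to sign and scaling, and rank(A) + dim N(A^T) = 2 + 1 = 3, the dimension of the space the columns live in.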
Awesome explanation! Keep up the good work!
God dang it, I knew I should have chosen a different bachelor's thesis..
haha!!!!
just realizing this now as well
first semester stuff at my uni
@@rob6129 what uni u attending?
Excellent explanation of a valuable technique.
Best approach to the problem. No gradient, no multivariable calculus. You're a master!
Helpful exploration of least squares properties
Thank you Salman Khan. I appreciate the opportunity to relearn the method here. You can never hear this stuff enough times.
Your videos are just great !!! The concepts with geometrical examples make very good sense !!! Thanks a lot
very helpful! Thanks a lot! you are doing great things! I also listened to your other videos, all very wonderful!
Thank you so much. You just simplified long boring hours of confusing lecture
Thanks so much Khan...wonderful explanation in two videos that explains everything...great. You are wonderful
It seems I have seen the best video!
Thanks a lot, very comprehensive ! great job!
thank you very much sir
This is surprisingly easy
Can we please get a video on maximum likelihood estimation?
Very useful! In my lecture slides I had the equation Hx=z for the same problem, and I couldn't make sense of how we get this as the best solution: x = (H^T H)^-1 H^T z.
Now I understand:-)
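The step from Hx=z to that formula can be written out with the same symbols H, x, z used in that comment: the best x̂ makes the error H x̂ - z orthogonal to C(H), i.e. puts it in N(H^T). The last step assumes H^T H is invertible, which holds when H has linearly independent columns.

```latex
\begin{aligned}
H\hat{x} - z \in C(H)^{\perp} = N(H^T)
  &\iff H^T (H\hat{x} - z) = 0 \\
  &\iff H^T H\,\hat{x} = H^T z \\
  &\iff \hat{x} = (H^T H)^{-1} H^T z
\end{aligned}
```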
Nice derivation of the normal equation
Super clarity......
thank you sir
Good video!!!! And nice work! Good luck with Khan Academy :)
This guy is good...........
great geometric intuition of linear regression
Can you teach me cubic expressions and cubic equations? :)
e.g. solve the equation x^3 - 2x^2 - x + 2 = 0
by using the factor theorem :)
thanks
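A small sketch of the factor theorem approach, assuming the garbled exponents in the question mean x^3 - 2x^2 - x + 2 = 0: (x - r) is a factor exactly when p(r) = 0, and since the leading coefficient is 1, any integer root must divide the constant term 2, so only ±1 and ±2 need testing. Python is used here only for the arithmetic.

```python
# p(x) = x^3 - 2x^2 - x + 2, coefficients from the highest power down
coeffs = [1, -2, -1, 2]

def p(x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

# Leading coefficient is 1, so any integer root must divide the constant term 2
candidates = [1, -1, 2, -2]

# Factor theorem: (x - r) divides p(x) exactly when p(r) == 0
roots = sorted(r for r in candidates if p(r) == 0)
print("roots:", roots)
```

Three of the candidates (1, -1 and 2) pass, so the cubic factors as (x - 1)(x + 1)(x - 2) and the solutions are x = -1, 1, 2.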
really helpful
You are my savior. Truly a crystal-clear lecture. Thank you!! 👍👍👍
Excellent video.
Thanks much :))))))))
Vahag
2018? I'm alone :(
I'm here.
Onto 2019!
2024 here
This is the first Khan Academy video I've watched and didn't understand...
For that you need to study orthogonal complements and spanning sets, which lead to the concepts of column space, null space, etc.
Should have used n instead of k; it's usually an m x n matrix, with x in R^n.
I wish to know how to solve this: x has values -2, 0, 1, 2, 3 and y has values 17, 5, 2, 1, 2. I'm asked to use the least squares method, but I was absent and I don't know exactly what my teacher meant by that or what that method consists of. Can anyone help me solve this?
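One way to set this up, sketched with NumPy: the question doesn't say which model the teacher wants, so both a straight line and a parabola are fitted here as assumptions. Each data point becomes a row of a design matrix, and np.linalg.lstsq finds the coefficients minimizing the sum of squared errors.

```python
import numpy as np

# Data from the question
x = np.array([-2.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([17.0, 5.0, 2.0, 1.0, 2.0])

# Assumed model 1: straight line y = m*x + c, rows [x, 1]
A_line = np.column_stack([x, np.ones_like(x)])
(m, c), *_ = np.linalg.lstsq(A_line, y, rcond=None)

# Assumed model 2: parabola y = a*x^2 + b*x + c2, rows [x^2, x, 1]
A_quad = np.column_stack([x**2, x, np.ones_like(x)])
(a, b, c2), *_ = np.linalg.lstsq(A_quad, y, rcond=None)

print(f"line:     y = {m:.4f} x + {c:.4f}")
print(f"parabola: y = {a:.4f} x^2 + {b:.4f} x + {c2:.4f}")
```

For this data the parabola should come out as y = x^2 - 4x + 5, which passes through all five points exactly, while the best straight line (slope -114/37 ≈ -3.08, intercept 291/37 ≈ 7.86) still leaves a nonzero error. That hints the data is quadratic, but check which model your teacher intended.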
I have a question..
Does the least squares approximation always have a solution?
+Zulfiqar Ali not if you don't solve it.
+Conor Raypholtz it still has a universally reasonable solution
I'm pretty sure that is the idea of least squares: to provide a close answer when you can't give an exact one
It does always have one: if Ax = b has a solution, then b is in the column space of A, and if not, the best you can do is the projection of b onto C(A).
There is always a solution to the least squares problem. Why? Ax* is in colspace(A) by definition, since it is the projection of b onto C(A), so there must be a set of weights x* that yields a linear combination of A's columns equal to that projection.
I tried using this trick for the problem I'm facing, but it turns out that when I multiply A^T by A, I get a matrix which isn't invertible, so I still can't solve it. LOL
This _still_ seems odd to me, because even if some element in the input matrix A was contributing 0 to the result b, it should _still_ be possible to get a point as close as possible to the result.
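The intuition in this comment is right: a singular A^T A (dependent columns in A) does not remove the least squares solution, it makes it non-unique. A hedged sketch with NumPy and a hypothetical A whose second column is twice its first: np.linalg.lstsq and np.linalg.pinv both return the minimum-norm minimizer, and every minimizer maps to the same projection of b onto C(A).

```python
import numpy as np

# Hypothetical A: the second column is twice the first, so A^T A is singular
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 0.0, 2.0])

rank_AtA = np.linalg.matrix_rank(A.T @ A)  # 1 < 2, so no inverse exists

# lstsq is SVD-based; with dependent columns it returns the
# minimum-norm solution among all least squares minimizers
x_min, _, rank_A, _ = np.linalg.lstsq(A, b, rcond=None)

# The Moore-Penrose pseudoinverse gives the same minimum-norm solution
x_pinv = np.linalg.pinv(A) @ b

# Another minimizer for this particular A and b (any x with x1 + 2*x2 = 0.5)
x_other = np.array([0.5, 0.0])

# Every minimizer produces the same point p, the projection of b onto C(A)
p = A @ x_min
print("x_min:", x_min, " x_other:", x_other)
print("A @ x_min:", p, " A @ x_other:", A @ x_other)
```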
I have one question: is the least squares system always consistent? If yes, how can I prove it? Please answer.
Hi, not sure if you're still looking for the answer, but could you please describe what you mean by consistent?
It means whether we can always find a least squares solution of a system.
thanks
Nice vid, but why did you take the length squared? I understand that the length of the vector would be sqrt(b1^2 + b2^2 + ... + bn^2), but why did you square even that?
Because it's easier to minimize the sum of squares than the square root of the sum of squares. That's my guess
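To make that guess concrete (standard calculus, written as a LaTeX block): the squared length is a smooth quadratic with a linear gradient, while the plain length has a square root that is not differentiable where Ax = b. Since the square root is increasing on [0, ∞), both have the same minimizer, and setting the first gradient to zero recovers the normal equations A^T A x = A^T b.

```latex
\begin{aligned}
f(x) &= \lVert Ax - b \rVert^2 = (Ax - b)^T (Ax - b) \\
\nabla f(x) &= 2A^T (Ax - b) = 0 \;\Longrightarrow\; A^T A\, x = A^T b \\
\nabla \lVert Ax - b \rVert &= \frac{A^T (Ax - b)}{\lVert Ax - b \rVert}
  \quad \text{(undefined when } Ax = b\text{)}
\end{aligned}
```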
love this guy
What happens when A^T A is singular? How do we solve for the least squares solution?
I am the 60th guy liking it !! :P :D
Great vid, thank you. :)
how did you know that it was a projection to the Col(A) and not anything else like the Range(A)?
Winnie Shi
Col(A) already is the range of A.
Big brain
❤
🤩
It's good
bro just do an example lol
Sometimes I can't see what he's writing.
ICAM ! ICAM ! .... .. ...... !
gorgeous
n1