This guy is underrated for real. YouTube, throw him into recommendations.
I know... I recommend him all the time on Reddit.
True! He deserves way more subscription. He should prepare a booklet like statquest did but of his own. Would definitely buy it!
True!!
This guy is super smart: he takes sophisticated concepts and explains them in a way that's digestible without mocking the theory! What a great teacher!
I can't explain how grateful I am for your channel! I am doing an introductory machine learning course at Uni and it's extremely challenging, as it's full of complex concepts and the basics aren't explored thoroughly. Many videos I came across on YouTube were overly simplified and only helped me very briefly to make sense of my course. However, your videos offer the perfect balance: you explore the complex maths and don't oversimplify it, but do so in a way that's easy to understand. I read through this concept several times before watching your video, but only now do I feel as if I TRULY understand it. I HIGHLY appreciate the work you do and look forward to supporting your channel.
same
This has been simultaneously the simplest, most detailed and yet most concise explanation of this topic I've come across so far. Much appreciated! I hope you keep making awesome content!
Glad it was helpful!
@ritvikmath Is it possible to find w and b if you are not explicitly given constraints?
Is it possible to find the values of w and b without explicitly solving the optimization problem?
Can both be done through geometric intuition?
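A partial answer, as a sketch: in the special case of exactly one point per class, geometry alone gives the hard-margin solution. The boundary is the perpendicular bisector of the segment joining the two points, so w points along x_plus - x_minus and is scaled so that the two points land on w·x - b = +1 and -1. With more points you generally do need to solve the optimization problem. The function name and the toy points below are illustrative assumptions.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def two_point_hard_margin(x_plus, x_minus):
    """Hard-margin SVM for one +1 point and one -1 point (illustrative sketch).

    Returns (w, b) in the convention w.x - b = 0 used in the video.
    """
    d = [p - m for p, m in zip(x_plus, x_minus)]        # direction between the classes
    d_sq = dot(d, d)                                     # ||x_plus - x_minus||^2
    w = [2.0 * di / d_sq for di in d]                    # scaled so margins sit at +1 / -1
    midpoint = [(p + m) / 2.0 for p, m in zip(x_plus, x_minus)]
    b = dot(w, midpoint)                                 # boundary passes through the midpoint
    return w, b

w, b = two_point_hard_margin([3.0, 1.0], [1.0, -1.0])
print(w, b)                           # [0.5, 0.5] 1.0
print(dot(w, [3.0, 1.0]) - b)         # 1.0 for the positive point
print(dot(w, [1.0, -1.0]) - b)        # -1.0 for the negative point
print(2.0 / math.sqrt(dot(w, w)))     # margin width = distance between the two points
```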
In case you're also having trouble figuring out how we arrive at k=1/||w|| from k * (w*w/||w||) = 1:
remember that the dot product of any vector with itself is equal to its squared magnitude. Then, w*w can also be expressed as ||w||^2.
||w||^2/||w|| simplifies to just ||w||. Finally bring ||w|| to the other side by dividing the whole equation by ||w||, and you're done :)
If you also have trouble understanding why the dot product of a vector with itself equals its squared magnitude, it helps to know that the magnitude of a vector is the square root of the sum of the squares of its components, and that sqrt(x) * sqrt(x) = x.
I hope that somehow makes sense if you're struggling; it surely took me a while to get that lol
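To see the same algebra numerically, here is a small sketch (w, b and the starting point are arbitrary illustrative values, not taken from the video): start from a point x0 on the decision boundary, step a distance k = 1/||w|| along the unit normal w/||w||, and check that the new point satisfies w·x - b = 1.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

w = [3.0, 4.0]        # illustrative weight vector, ||w|| = 5
b = 5.0               # illustrative offset
x0 = [3.0, -1.0]      # on the boundary: 3*3 + 4*(-1) - 5 = 0

norm_w = math.sqrt(dot(w, w))          # ||w|| = sqrt(w . w)
unit = [wi / norm_w for wi in w]       # unit normal w / ||w||
k = 1.0 / norm_w                       # step length k = 1/||w||

x1 = [xi + k * ui for xi, ui in zip(x0, unit)]
print(dot(w, w), norm_w ** 2)          # 25.0 25.0: w . w equals ||w||^2
print(dot(w, x0) - b)                  # 0.0 on the decision boundary
print(dot(w, x1) - b)                  # 1 (up to rounding) on the margin line
```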
I had almost forgotten this rule, thank you brother for saving my day
yes. w*w = ||w||*||w|| * cos 0 = (||w||)^2
The angle is 0 degrees because we're multiplying a vector by itself.
I'm a PhD student studying data mining and I just wanted to commend you for this SUPERB explanation. I can't thank you enough for explaining this so clearly. Keep up the excellent work!!
This is the best and most comprehensible math video on hard margin SVM I have seen to date!
THE BEST EXPLANATION of SVM on YouTube! And the whole internet! THANK YOU!
Just to add onto all the love, I'm a data scientist in marketing and you are my number one channel for reviewing concepts. You are a very talented individual!
You answered all the questions I had in mind without me even asking them. This was an amazing walkthrough. Thank you!
That's what i've been waiting for! Thanks a lot. Great video!
Glad it was helpful!
This is the best and the most intuitive explanation for SVM. It is really hard for me to actually read research papers and understand what story each line of the equation is telling. But you made it soo intuitive. Thanks a ton! Please Please make more videos like this
Great video on SVM. Simple to understand.
The best video I've watched on SVMs! Thank you so much!!
Wow, thank you!
Another great video on SVM. As a mathematician I do appreciate your succinct yet accurate exposition not playing around with irrelevant details.
I think this might be top 5 explanations of SVM mathematics all-time. Very well done
I finally get SVM after watching a lot of tutorials on YouTube. Clever explanation. Thank you
At 5:10, I don't get how you obtain K from the last simplification. Can you/someone please explain?
Btw beautiful video!
thanks! I did indeed kind of skip a step. The missing step is that the dot product of a vector with itself is the square of the magnitude of the vector. ie. w · w = ||w||^2
@ritvikmath right, thank you!!
your videos are what allowed me to take a spring break vacation bro, saved me so much time thank you
Great to hear!
You and statquest are the perfect combination :) Thanks for all of your hardwork.
Studying for my master's in data science, and this is a brilliant, easy-to-understand explanation tying together graphical and mathematical concepts. Thank you!
Best high-level explanation of SVMs out there, huge thanks
Glad it was helpful!
Very easy to follow the concept! Thanks for this wonderful video! Looking forward to seeing next video!
I love your channel. You explain difficult concepts in a way that even my dear grandmother, who never went to college, could understand. Excellent job, sir! You should become a professor one day. You would be good.
Thanks man, great explanation. I was trying to understand the math for 2 days and finally got it.
Glad it helped!
So simple, so clear!!! Wish all teachers were like this!
Thank you so much for this video! I am learning about SVM now and your tutorial perfectly breaks it down for me!
Great video as usual!
A possible side note - I find 3d picture even more intuitive.
Add a z-direction, which can basically be shrunk to [-1; 1], as our class-prediction dimension, with x1 and x2 as the feature dimensions.
Hence, the margin hyperplane "sits" exactly on (x1; x2; 0).
This is also helpful for further explaining what SVM kernels are and why a kernel alters the norms (e.g. distances) between data points, but not the data points themselves.
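The kernel remark can be illustrated with a sketch using the degree-2 polynomial kernel k(x, z) = (x·z)^2, chosen here only as an example since the video does not cover kernels. For 2-D inputs this kernel equals an ordinary dot product after the explicit feature map phi(x) = (x1^2, sqrt(2)·x1·x2, x2^2), and distances in that feature space can be computed from kernel values alone, without building phi: ||phi(x) - phi(z)||^2 = k(x,x) - 2k(x,z) + k(z,z). The input points themselves never change.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def poly2_kernel(x, z):
    # degree-2 homogeneous polynomial kernel
    return dot(x, z) ** 2

def phi(x):
    # explicit feature map matching poly2_kernel for 2-D inputs
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def kernel_sq_distance(x, z):
    # squared distance between phi(x) and phi(z), using kernel values only
    return poly2_kernel(x, x) - 2.0 * poly2_kernel(x, z) + poly2_kernel(z, z)

x, z = [1.0, 2.0], [3.0, -1.0]
print(poly2_kernel(x, z), dot(phi(x), phi(z)))      # same inner product (up to rounding)
explicit = sum((a - c) ** 2 for a, c in zip(phi(x), phi(z)))
print(kernel_sq_distance(x, z), explicit)            # same feature-space distance
print(sum((a - c) ** 2 for a, c in zip(x, z)))       # original-space distance is different
```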
This is very clearly defined. Thank you.
But could someone explain to me what w is? How can I visualize it and calculate it?
Great video! Why can we assume that the right-hand side of wx - b in those three lines is 1, 0, -1?
very informative and helpful video to help understand the SVM! Thanks for such a great video! You deserve more subscribers
🌟Magnificent🌟 I actually understood this loss function after watching once. Very nice explanation of the math. I've seen a lot of other lectures, but you can't understand math without graphical visualization.
It's so easy to understand this math stuff! Best explanation ever in such a short video.
Hi, how exactly did you choose 1 and -1, the values for wx -b where x is a support vector? wx-b = 0 for x on the separating line makes sense however. Could it have other values?
this guy explained what my professors couldn't explain in 2 hours 😂😂😂
Just Amazing Clarity of Topics!!
Thank you for your genius explanation. At 5:11, before getting the value of k, the equation k * (w * w) / (magnitude of w) = 1 contains w * w, so why doesn't the output k have w in it at the end?
This is seriously good stuff. I have not seen a better SVM explanation.
Awesome explanation
I have a doubt (might be silly): how did people come up with W.X - b = 1 and W.X - b = -1? Do the 1 and -1 in these equations tell us something? For some reason I'm unable to get the intuition for the 1 and -1 in the above equations (although I understand that they are parallel lines).
Someone pls help me
I have the same question.
Maybe it's an assumption so that the margin is easily interpreted in terms of the magnitude of w? I don't really know.
Could you do the math behind each machine learning algorithm? Also, will you be doing neural networks in the future?
Along with the assumptions of supervised and unsupervised ML algorithms that deal specifically with structured data.
Yup neural nets are coming up
@ritvikmath CNNs and Super Resolution PLEASE PLEASE PLEASE
This is giving "Jacked Kal Penn clearly explains spicy math" and I am HERE for it
Equation for points on margins are:
w.x - b = 1
w.x - b = -1
That means we have fixed our margin to "2" (from -1 to +1). But our problem is to maximize the margin, so shouldn't we keep it a variable? like:
w.x - b = +r
w.x - b = -r
where maximizing r is our goal?
Have you figured it out?
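One way to see why fixing the right-hand side at ±1 loses nothing, as a sketch with illustrative numbers: multiplying w and b by any positive constant describes exactly the same lines, because c(w·x - b) = 0 wherever w·x - b = 0. So if the support vectors satisfy w·x - b = ±r, dividing w and b by r puts them on ±1 without moving any line. What changes is ||w||, while the geometric margin 2r/||w|| stays the same. That is why, once r is fixed to 1, maximizing the margin becomes minimizing ||w||.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def geometric_margin(w, b, support_plus):
    # r is the value of w.x - b at a positive support vector;
    # the full margin width is 2r / ||w||
    r = dot(w, support_plus) - b
    return 2.0 * r / math.sqrt(dot(w, w))

support_plus = [2.0, 2.0]
w, b = [1.0, 1.0], 0.0              # w.x - b = 4 at the support vector, i.e. r = 4
r = dot(w, support_plus) - b
w_scaled = [wi / r for wi in w]     # rescale so the support vector sits on w.x - b = +1
b_scaled = b / r

print(r, dot(w_scaled, support_plus) - b_scaled)           # 4.0 1.0
print(geometric_margin(w, b, support_plus))                 # same width ...
print(geometric_margin(w_scaled, b_scaled, support_plus))   # ... before and after rescaling
print(2.0 / math.sqrt(dot(w_scaled, w_scaled)))             # equals 2 / ||w|| once r = 1
```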
Thank you so much. This is what I have been looking for for so long. Would you please do the math behind other ML and DL algorithms?
Once again, ritvikmath is a lifesaver for me. If I understand the underlying math behind these concepts, it is because of him.
Best video on large margin classifiers 👍
You are an amazing elucidator👍
Such a clear explanation! Thank you!!!
Bro, you're a superhero
Hi Ritvik! I wonder what the geometric intuition of the vector w is? We want to minimize ||w||, but what does w look like on the graph?
Thank you for this video. Thanks for simplifying SVM.
Great video, with an easy-to-follow explanation. However, by the end of the video you have only formulated the optimization problem that needs to be solved. The most interesting question now is how to actually solve this optimization problem. Can you give some directions on how this problem is actually solved?
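Some direction, hedged: the hard-margin problem is a quadratic program, usually solved through its Lagrangian dual with a QP solver or an SMO-style algorithm (libsvm, which scikit-learn's SVC wraps, takes the latter route). A much simpler technique that is easy to write by hand is subgradient descent on the soft-margin hinge-loss objective 0.5·||w||^2 + C·Σ max(0, 1 - y_i(w·x_i - b)); with a large C and separable data it approximates the hard-margin solution. The sketch below uses that swapped-in technique, not the video's method, and the data, C, learning rate and epoch count are illustrative assumptions rather than tuned values.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def train_linear_svm(X, y, C=10.0, lr=0.01, epochs=2000):
    """Full-batch subgradient descent on
    0.5*||w||^2 + C * sum(max(0, 1 - y_i * (w.x_i - b))).
    Labels must be +1 / -1. Returns (w, b) for the rule sign(w.x - b)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)                        # gradient of 0.5*||w||^2 is w
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) - b) < 1:       # hinge term is active for this point
                for j in range(len(w)):
                    grad_w[j] -= C * yi * xi[j]
                grad_b += C * yi                # d/db of -yi*(w.x - b) is +yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b)
print([1 if dot(w, xi) - b > 0 else -1 for xi in X])   # reproduces y
```

For this toy data the exact hard-margin answer is w = (0.25, 0.25), b = 0. With a fixed learning rate the weights keep bouncing around near that value instead of settling on it; a decreasing step size or a proper QP solver would be the usual remedy.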
Holy shit what a banger of a video this is
Question on the notation.
The image shows that the vector between the central line and decision line is w. So, I think, that w is the length of the decision boundary. But then we go on to show that the length of the decision boundary is k=1/||w||. So I'm not clear on what w (or k, for that matter) are actually representing.
I too expected k to equal the length of that vector w :-/
Great Work! Just one confusion; why minus b? Your response would be highly appreciated!
Excellent explanation Ritvik
I'm not sure, but I think you forgot to say that in order to have the margins at ±1 you have to rescale w and b by a multiplying constant. Otherwise I can't explain how we could have a distance of 1 from the middle.
The rest of the video is awesome, thank you very much :)
What an amazing video bro. Keep going.
You should mention that your W is a (not necessarily unit-length) normal vector of the hyperplane; its length is not the same as the margin.
Absolutely amazing channel! You're a great teacher
Terrific tutorial, saved me
At 5:12, to simplify k*(W*W)/||w|| = 1 (W means vector w):
W*W = ||w||*||w||*cos 0, and cos 0 = 1. Thus k*(||w||*||w||*1)/||w|| = 1, so k = 1/||w||.
Vector x is actually a point (x0, x1, ..., xn) on the decision boundary, i.e. vector x starts at the origin and ends at the D.B.
Why are we multiplying by the unit vector of w, given that w is normal to the plane? Is the vector x also normal to the plane, along the direction of w? But x is a point on that plane, in which case k would be 0. I am confused. Can you please simplify?
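A sketch that may help with this confusion (illustrative numbers): x is any point, not a vector pointing along w. The signed distance from x to the hyperplane w·x - b = 0 is (w·x - b)/||w||, obtained by projecting onto the unit normal w/||w||. For a point on the margin line w·x - b = 1 this gives 1/||w||, which is the k from the video; for a point on the boundary itself it gives 0.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def signed_distance(w, b, x):
    # distance from point x to the hyperplane w.x - b = 0, measured along w / ||w||
    return (dot(w, x) - b) / math.sqrt(dot(w, w))

w, b = [3.0, 4.0], 5.0
print(signed_distance(w, b, [3.0, -1.0]))   # 0.0: on the decision boundary
print(signed_distance(w, b, [2.0, 0.0]))    # 0.2: on w.x - b = 1, i.e. 1/||w||
print(signed_distance(w, b, [0.0, 0.0]))    # -1.0: the origin is on the negative side
```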
Thanks for such a brilliant explanation, I really appreciate your work!!
great, concise explanation !
Other references use the plus (+) sign: w x + b = 0. Why was this changed to a minus sign in your example (w x - b = 0, or wx - b > 1)? Hope you can answer. Thanks
Can you please do videos on normal to a plane, distance of a point from a plane and other basic aspects of linear algebra...
Big fan and an early subscriber🙏🏻keep growing!
That's a good idea; I've been thinking of next videos and these linear algebra basics would be likely helpful in understanding the eventually more difficult concepts. Thanks for the input!
@ritvikmath I've been a big fan of your content since I saw your videos on time series AR and MA models... Now I'm going through the math behind ML, but since my undergrad degree is in business, I don't have the intuition behind a lot of very basic stuff, so a video series on those would be a great help for people like me👍🏻 Always happy to help
bro is a savior
You explained this topic really well and helped me a lot! Great work!
This really helped me learn the math of svm thanks !!
you are my savior
You're so unbelievably good at explaining :)
Pls also make one for svm regression.. you are amazing
I am very happy that I found your YT channel. Awesome videos! I was unable to understand SVM until now!!!!
very informative and intuitive
Hi Ritvik, you are a great teacher of stats, calculus and ML/DL!
I have one question regarding the equations. Why is the decision boundary equation W.X - b = 0? Shouldn't it be W.X + b = 0. I know the derivations and procedure to find the maximal margin is not affected but I don't understand -b. Please let me know if the sign is inconsequential. If it is, why is it? Thanks!
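The sign is a convention, as this sketch with illustrative numbers shows: w·x - b = 0 and w·x + b' = 0 describe the same hyperplane whenever b' = -b, so every decision value, prediction and margin is identical. Texts that write +b simply report the negated number as their intercept.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

w = [1.0, -2.0]
b_minus = 3.0            # offset in the video's convention:      w.x - b
b_plus = -b_minus        # same hyperplane in the other convention: w.x + b

points = [[4.0, 0.0], [0.0, 1.0], [5.0, 1.0]]
minus_form = [dot(w, x) - b_minus for x in points]
plus_form = [dot(w, x) + b_plus for x in points]
print(minus_form)               # [1.0, -5.0, 0.0]
print(minus_form == plus_form)  # True: identical decision values
```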
Great explanation!
I just want to know: how is vector 'w' perpendicular to the plane?
Dot product.
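To expand on that one-line answer with a sketch (illustrative numbers): take any two points on the hyperplane. Both satisfy w·x - b = 0, so subtracting gives w·(x_a - x_b) = 0, meaning w is orthogonal to every direction lying in the hyperplane. That makes w the normal vector; its length ||w|| is not the margin width.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

w, b = [2.0, 1.0], 4.0
x_a = [2.0, 0.0]    # 2*2 + 1*0 - 4 = 0, on the hyperplane
x_b = [0.0, 4.0]    # 2*0 + 1*4 - 4 = 0, also on the hyperplane

direction = [a - c for a, c in zip(x_a, x_b)]   # a vector lying in the hyperplane
print(dot(w, x_a) - b, dot(w, x_b) - b)   # 0.0 0.0
print(dot(w, direction))                  # 0.0: w is perpendicular to it
```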
very helpful! I always wanted to learn math behind the model! thanks!
You explained this topic perfectly! Amazing!
Glad you think so!
it is a great video to understand svm.
But the equation for the hard margin is W * X + B >= 1 (is it + or -)? In the video we are saying it is -.
you are the smartest person I know
Great video! I found your notation for x to be quite confusing. I think the small x should be x11, x12, x13, ..., x1p. Say GPA is xi1 and MCAT is xi2. Then the student data for these two features will be: student 1 (x11, x12), student 2 (x21, x22), student 3 (x31, x32).
Hey Ritvik, nice video. Can you please cover the kernelization part too?
Thanks for this wonderful video.
I understand that the equation of blue dotted line (plane) is W.X+b =0 .
But how can we decide the other two lines. I mean how those can be W.X+b = +1 and W.X+b = -1.
And if they are, then the width is 2, right? How can we maximize it if it is fixed?
I know I am talking nonsense :) I don't have anyone else to ask this :)
Thanks in advance!
Amazing explanation!
It is unclear how you derived the equations of planes. For instance, why it is w.x-b=1 and not w.x-b=2?
Maybe the question is: what algorithm does SVM use to find the weights or coefficients of the hyperplane?
Amazing teaching skills - Thanks, a lot!
Hi all, at 5:14, how does he get from k (W.W/|| W ||) =1 to k = 1/|| W ||?
Appreciate if anyone can enlighten me
|| W || = [W.W]^{1/2}, so square everything to get rid of the square root in the denominator: k^2 (W.W)^2/(W.W) = 1, i.e. k^2 (W.W) = 1, and taking square roots gives k = 1/|| W ||. There you have it.
Loved it!
How will the algorithm classify if an arbitrary observation lies within the hyperplane
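A sketch of the usual rule, with a hypothetical helper name and illustrative numbers: once w and b are found, a new point is classified by the sign of w·x - b. A point strictly between the margin lines has |w·x - b| < 1. That cannot happen for training points under the hard-margin constraints, but it can for new data, and such a point is still assigned to whichever side of the boundary it falls on. A point exactly on the boundary gives 0, and how to break that tie is a convention (here it is reported as 0).

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def classify(w, b, x):
    # hypothetical helper: label from the sign of w.x - b, plus an inside-margin flag
    score = dot(w, x) - b
    label = 1 if score > 0 else (-1 if score < 0 else 0)   # 0 marks an exact tie on the boundary
    inside_margin = abs(score) < 1                          # strictly between w.x - b = -1 and +1
    return label, inside_margin

w, b = [0.5, 0.5], 1.0
print(classify(w, b, [3.0, 1.0]))   # (1, False): on the +1 margin line
print(classify(w, b, [2.0, 0.5]))   # (1, True): inside the margin, positive side
print(classify(w, b, [2.0, 0.0]))   # (0, True): exactly on the decision boundary
```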
@ritvikmath, why is the intercept (b) negative? The equation of the line/plane/hyperplane should be w1x1 + w2x2 + w3x3 + b = 0, i.e. wx + b = 0 should be the line equation, shouldn't it?
Were you able to understand it? I am still confused about that negative b.
Thank you Sir . You really simplified the concept. I have subscribed already waiting patiently for more videos 😊
phenomenal
Great explanation!
@ritvikmath - Thanks for this great explanation. I have noticed other material online says the equation for the hyperplane is w.x+b=0 rather than w.x-b=0. Can you confirm which is accurate?
Easily Explained 👍,
Can you also explain how SVM works for regression problems?
Thank you! I am wondering why we use "+1 and -1" instead of "+1 and 0" to classify these two areas?
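A sketch of why ±1 labels are convenient (illustrative numbers): with y in {+1, -1}, the two constraints w·x - b ≥ 1 for the positive class and w·x - b ≤ -1 for the negative class collapse into the single constraint y(w·x - b) ≥ 1. With labels +1 and 0 that trick does not work, because multiplying by 0 always gives 0.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def satisfies_two_constraints(w, b, x, y):
    score = dot(w, x) - b
    return score >= 1 if y == 1 else score <= -1

def satisfies_combined_constraint(w, b, x, y):
    return y * (dot(w, x) - b) >= 1    # only valid for labels +1 / -1

w, b = [0.5, 0.5], 1.0
data = [([3.0, 1.0], 1), ([1.0, -1.0], -1), ([2.0, 0.5], 1), ([0.0, 0.0], -1)]
for x, y in data:
    # both checks agree for every point
    print(satisfies_two_constraints(w, b, x, y), satisfies_combined_constraint(w, b, x, y))
```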
BRILLIANT!
woww what an explanation..........great
Glad you liked it
Eagerly waiting for your video on SVM Soft margin :D
How do I choose the values for w vector and b ??
you might want to search 'lagrange multipliers' for solving this problem
and maybe this will also help: web.mit.edu/6.034/wwwbob/svm-notes-long-08.pdf
Thanks for your inputs Andrey !!
Smart! This is the easiest way to come up with the margin when given theta (or weight)... gosh..