This guy is underrated for real. YouTube, throw him into recommendations.
I know... I recommend him all the time on Reddit.
True! He deserves way more subscription. He should prepare a booklet like statquest did but of his own. Would definitely buy it!
True!!
In case you're also having trouble figuring out how we arrive at k=1/||w|| from k * (w*w/||w||) = 1:
remember that the dot product of any vector with itself is equal to its squared magnitude. Then, w*w can also be expressed as ||w||^2.
||w||^2/||w|| simplifies to just ||w||. Finally bring ||w|| to the other side by dividing the whole equation by ||w||, and you're done :)
if you also have trouble understanding why exactly the dot product of any vector with itself is equal to its squared magnitude it also helps to know that the magnitude of a vector is the square root of the sum of squares of its components and that sqrt(x) * sqrt(x) = x
I hope that somehow makes sense if you're struggling, surely took me a while to get that lol
I almost forget this rule, thank you brother for saving my day
yes. w*w = ||w||*||w|| * cos 0 = (||w||)^2
The angle is 0 degrees because we are taking the dot product of a vector with itself.
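For anyone who wants the steps in this thread written out in one place, here is a short worked version. It only uses the identity w · w = ||w||^2 mentioned above; the component notation w_1, ..., w_n is my own and is not from the video.

```latex
\begin{align*}
w \cdot w &= \sum_{i=1}^{n} w_i^{2}
           = \Bigl(\sqrt{\textstyle\sum_{i=1}^{n} w_i^{2}}\Bigr)^{2}
           = \lVert w \rVert^{2} \\[4pt]
k\,\frac{w \cdot w}{\lVert w \rVert} &= 1 \\
k\,\frac{\lVert w \rVert^{2}}{\lVert w \rVert} &= 1 \\
k\,\lVert w \rVert &= 1 \\
k &= \frac{1}{\lVert w \rVert}
\end{align*}
```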
This guy is super smart and he takes sophisticated concepts and explains it in a way where it's digestible without mocking the theory! What a great teacher!
I can't explain how grateful I am for your channel! I am doing an introductory machine learning course at Uni and it's extremely challenging as it's full of complex concepts and the basics aren't explored thoroughly. Many videos I came across on YouTube were overly simplified and only helped me very briefly to make sense of my course. However, your videos offer the perfect balance: you explore the complex maths and don't oversimplify it, but do so in a way that's easy to understand. I read through this concept several times before watching your video, but only now do I feel as if I TRULY understand it. I HIGHLY appreciate the work you do and look forward to supporting your channel.
same
This has been simultaneously the simplest, most detailed and yet most concise explanation of this topic I've come across so far. Much appreciated! I hope you keep making awesome content!
Glad it was helpful!
@@ritvikmath Is it possible to find w and b if you are not explicitly given constraints?
Is it possible to find the values of w and b without explicitly solving the optimization problem?
Can both be done through geometric intuition?
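One case where geometric intuition alone seems to be enough is the smallest possible one: exactly one point per class. There the max-margin boundary is the perpendicular bisector of the segment between the two points, so w and b can be written down directly. The sketch below assumes the video's convention w · x - b = ±1 on the margins; the function name `two_point_hard_margin` and the sample points are made up for illustration. With more points you generally still have to solve the optimization problem.

```python
def two_point_hard_margin(x_pos, x_neg):
    """Max-margin separator for exactly one +1 point and one -1 point.

    Returns (w, b) such that w.x_pos - b = 1 and w.x_neg - b = -1.
    """
    # w points from the negative point towards the positive point.
    d = [p - n for p, n in zip(x_pos, x_neg)]
    d_sq = sum(di * di for di in d)
    # Scale so that the two points land exactly on the +1 and -1 margin lines.
    w = [2.0 * di / d_sq for di in d]
    b = sum(wi * pi for wi, pi in zip(w, x_pos)) - 1.0
    return w, b


# Hypothetical toy points, not taken from the video.
w, b = two_point_hard_margin((2.0, 2.0), (0.0, 0.0))
print(w, b)
```

The distance between the two points is 2/||w||, which is the full margin width, so this agrees with the k = 1/||w|| result for each side.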
This is the best and most comprehensible math video on hard margin SVM I have seen till date!
I'm a PhD student studying data mining and I just wanted to commend you for this SUPERB explanation. I can't thank you enough for explaining this so clearly. Keep up the excellent work!!
Just to add onto all the love, I'm a data scientist in marketing and you are my number one channel for reviewing concepts. You are a very talented individual!
THE BEST EXPLANATION of SVM on YouTube! And the whole internet! THANK YOU!
Question on the notation.
The image shows that the vector between the central line and decision line is w. So, I think, that w is the length of the decision boundary. But then we go on to show that the length of the decision boundary is k=1/||w||. So I'm not clear on what w (or k, for that matter) are actually representing.
I too expected k to equal the length of that vector w :-/
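A small numeric check may help separate the two symbols: w is the normal vector of the decision boundary (its length is not the margin), while k is the distance you travel along the unit vector w/||w|| to get from the boundary to the margin line. The values of `w`, `b`, and `x` below are arbitrary numbers chosen for illustration, not from the video.

```python
import math

# Hypothetical example values.
w = (3.0, 4.0)   # normal vector of the decision boundary
b = 2.0          # offset, using the video's convention w.x - b = 0


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


norm_w = math.sqrt(dot(w, w))   # ||w|| = 5 here, clearly not the margin
k = 1.0 / norm_w                # distance from boundary to margin line

# A point on the decision boundary: 3*2 + 4*(-1) - 2 = 0
x = (2.0, -1.0)
assert abs(dot(w, x) - b) < 1e-12

# Step k units in the direction of the unit vector w / ||w||.
x_margin = tuple(xi + k * wi / norm_w for xi, wi in zip(x, w))

# The new point should satisfy w.x - b = 1 (up to floating-point rounding).
print(dot(w, x_margin) - b)
```

So a longer w means a narrower margin, which is why the problem ends up minimizing ||w||.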
You answered all the questions I had in mind without me even asking them to you. This was an amazing walkthrough. Thank you!
I finally get SVM after watching a lot of tutorials on YouTube. Clever explanation. Thank you
This is the best and the most intuitive explanation for SVM. It is really hard for me to actually read research papers and understand what story each line of the equation is telling. But you made it soo intuitive. Thanks a ton! Please Please make more videos like this
I think this might be top 5 explanations of SVM mathematics all-time. Very well done
That's what i've been waiting for! Thanks a lot. Great video!
Glad it was helpful!
Another great video on SVM. As a mathematician I do appreciate your succinct yet accurate exposition not playing around with irrelevant details.
The best video I've watched on SVMs! Thank you so much!!
Wow, thank you!
Best high-level explanation of SVMs out there, huge thanks
Glad it was helpful!
your videos are what allowed me to take a spring break vacation bro, saved me so much time thank you
Great to hear!
Dude thank you! now these equations don't feel like they were pulled out of thin air. and the best part is I can work them out too! I haven't done linear algebra in almost a decade so I got stuck on the ||w||/(w*w) part for a good bit but this pushed me to refresh some concepts and figure it out! Thank you
I love your channel. You explain difficult concepts that could be explained to my dear grandmother who never went to college. Excellent job sir! You should become a professor one day. You would be good.
studying my masters in data science and this is a brilliant easy to understand explanation tying graphical and mathematical concepts - thank you!
Great video on SVM. Simple to understand.
You and statquest are the perfect combination :) Thanks for all of your hardwork.
this guy explained what my professors couldn't explain in 2 hours 😂😂😂
Great video ! Why we can assume that right hand side of wx - b in those three lines is 1, 0, -1 ?
Once again, ritvikmath being a lifesaver for me. If I understand the underlying math behind this concepts, it is because of him
🌟Magnificent🌟 I actually understood this loss function by watching once. Very nice explanation of the math. I saw a lot of other lectures, but you can't understand math without graphical visualization.
Equation for points on margins are:
w.x - b = 1
w.x - b = -1
That means we have fixed our margin to "2" (from -1 to +1). But our problem is to maximize the margin, so shouldn't we keep it a variable? like:
w.x - b = +r
w.x - b = -r
where maximizing r is our goal?
Have you figured it out?
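A possible way to see why fixing the right-hand side to ±1 loses nothing: if you keep it as ±r, you can divide both equations by r and absorb it into w and b. This is a sketch of that rescaling argument under the assumption r > 0; the half-width r/||w|| comes from repeating the video's k derivation with 1 replaced by r.

```latex
\begin{align*}
w \cdot x - b &= \pm r, \qquad r > 0 \\
\frac{w}{r} \cdot x - \frac{b}{r} &= \pm 1
  && \text{same two lines, with } w' = \tfrac{w}{r},\; b' = \tfrac{b}{r} \\
\text{half-width of margin} &= \frac{r}{\lVert w \rVert}
  = \frac{1}{\lVert w' \rVert}
\end{align*}
```

So the "2" is not a fixed geometric width. The actual width is 2/||w'||, and it is still a variable because w' is free. Maximizing r for a fixed ||w|| and minimizing ||w'|| with the right-hand side pinned to 1 describe the same problem.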
Great video as usual!
A possible side note - I find 3d picture even more intuitive.
Adding a z-direction, which can basically be shrunk to [-1;1], gives our class prediction dimension, while x1 and x2 are the feature dimensions.
Hence, the margin hyperplane "sits" exactly on (x1; x2; 0)
This is also helpful for further explanation of what SVM kernels are and why kernel alters the norms (e.g. distances) between data points, but not the data points themselves.
So simple, so clear!!! Wish all the teachers are like this!
Very easy to follow the concept! Thanks for this wonderful video! Looking forward to seeing next video!
Thank you so much for this video! I am learning about SVM now and your tutorial perfectly breaks it down for me!
thank you for your genius explanation. At 5:11, before getting the value k, the equation k * ( w * w) / (magnitude of w) = 1 contains w * w, why the output k doesn't have w in the end.
This is a serious good stuff video. I have not seen a better svm explanation
Thanks man, great explanation. I was trying to understand the math for 2 days, finally got it
Glad it helped!
This is truly great study material . thank you very much for putting this much effort.
Glad you enjoy it!
This is giving "Jacked Kal Penn clearly explains spicy math" and I am HERE for it
very informative and helpful video to help understand the SVM! Thanks for such a great video! You deserve more subscribers
It's so easy to understand this math stuff! Best explanation ever in such a short video.
I'm not sure, but I think you forgot to say that in order to have margin = +-1 you have to rescale w and b by a multiplying constant. Otherwise I can't see how we could have a distance of 1 from the middle
The rest of the video is awesome, thank you very much :)
YOU ARE MY SAVIORRR. GOD BLESS YOU!!!
At 5:10, I don't get how you obtain K from the last simplification. Can you/someone please explain?
Btw beautiful video!
thanks! I did indeed kind of skip a step. The missing step is that the dot product of a vector with itself is the square of the magnitude of the vector. ie. w · w = ||w||^2
@@ritvikmath right, thank you!!
Just Amazing Clarity of Topics!!
Great video! I found your notation for x to be quite confusing. I think the small x should be x11 x12 x13 to x1p. Say GPA is xi1 and MCAT is xi2. Then the student data for these two features will be: student 1 (x11, x12), student 2 (x21, x22), student 3 (x31, x32)
Great, thanks for this lucid explanation about the math behind SVM
Best video on large margin classifiers 👍
Thank you for this video. Thanks for simplifying SVM.
Terrific tutorial, save me
5:12 to simplify k*(W*W)/||w|| =1, W means vector w
W*W = ||w||*||w||*cos 0; cos 0 == 1; Thus k*(||w||*||w||*1)/||w|| = 1; k = 1/||w||
vector x is actually a point (x0, x1, ..., xn) that on the Decision Boundary, i.e. vector x starts at the original points and ends at the D.B.
why we are multiplying unit vector of w as w is normal to the plane ? is the vector x also normal to the plane along the direction of w ? but, x is a point on that plane which in that case k will be 0. I am confused . Can you please simplify ?
Could you do the math behind each Machine learning algorithm, also would you be doing Neural Networks in the future?
along with the assumptions of supervised and un-supervised ML algorithms that deals specifically with structured data.
Yup neural nets are coming up
@@ritvikmath CNN's and Super Resolution PLEASE PLEASE PLEASE
I am very happy that I found your YT channel. Awesome videos! I was unable to understand SVM until now!!!!
Hi Ritvik, you are a great teacher of stats, calculus and ML/DL!
I have one question regarding the equations. Why is the decision boundary equation W.X - b = 0? Shouldn't it be W.X + b = 0. I know the derivations and procedure to find the maximal margin is not affected but I don't understand -b. Please let me know if the sign is inconsequential. If it is, why is it? Thanks!
This is very clearly defined. Thank you.
But could someone explain to me what w is? How can I visualize it and calculate it.
Thank you so much. This is what I have been looking for for such a long time. Would you please do the math behind other ML and DL algorithms?
Absolutely amazing channel! You're a great teacher
Great video! Question: I've seen other resources/videos online that use this equation: w * x + b = 0 for the classifier. Is there a particular reason why it's w * x - b = 0 here? is there any mathematical difference?
This really helped me learn the math of svm thanks !!
Can you please do videos on normal to a plane, distance of a point from a plane and other basic aspects of linear algebra...
Big fan and an early subscriber🙏🏻keep growing!
That's a good idea; I've been thinking of next videos and these linear algebra basics would be likely helpful in understanding the eventually more difficult concepts. Thanks for the input!
@@ritvikmath I'm a big fan of your content since I saw your videos on time series AR and MAs....now I'm going through the math behind ML, but given I have a business degree at my undergrad I don't have the intuition behind lot of very basic stuff hence your video series on those would be great help for people like me👍🏻Always happy to help
it is a great video to understand svm.
but the equation for hard margin W * X + B >= 1 (is it + or -). In video we are saying it is -
Other references use the plus (+) sign, as in w x + b = 0. Why was this changed to a minus sign in your example, i.e. w x - b = 0 or wx - b > 1? Hope you could answer. Thanks
Great Work! Just one confusion; why minus b? Your response would be highly appreciated!
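On the +b versus -b question that keeps coming up: as far as I can tell the two forms describe the same classifier, because w · x - b = 0 is the same line as w · x + b' = 0 with b' = -b. The sign just gets absorbed into the constant. A quick check, with made-up values for `w`, `b`, and the test points (the function names are hypothetical too):

```python
def classify_minus(w, b, x):
    """Video's convention: sign of w.x - b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) - b
    return 1 if score >= 0 else -1


def classify_plus(w, b_prime, x):
    """Convention used in many other references: sign of w.x + b'."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b_prime
    return 1 if score >= 0 else -1


# Hypothetical values for illustration.
w = (0.5, 0.5)
b = 1.0
points = [(2.0, 2.0), (0.0, 0.0), (3.0, -1.0), (-1.0, 0.5)]

# Using b' = -b gives identical predictions on every point.
same = all(classify_minus(w, b, x) == classify_plus(w, -b, x) for x in points)
print(same)
```

The only thing to watch is consistency: if a formula you copy from elsewhere uses +b, flip the sign of b before comparing numbers with the video.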
You explained this topic perfectly! Amazing!
Glad you think so!
You explained this topic really well and helped me a lot! Great work!
Excellent explanation Ritvik
What an amazing video bro. Keep going.
Such a clear explanation! Thank you!!!
You should mention that your W is an arbitrary direction vector of the hyperplane. (it is not the same size as the margin)
Great video, with an easy-to-follow explanation. However, you only formulated the optimization problem that needs to be solved by the end of the video. The most interesting question now is how to actually solve this optimization problem. Can you give some directions on how this problem is actually solved?
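One possible direction, as a sketch rather than how the video or any particular library does it: the hard-margin problem (minimize ||w|| subject to y_i (w · x_i - b) >= 1) is a small constrained optimization, so on toy data you can hand it to a general-purpose solver. The example below uses SciPy's `minimize` with the SLSQP method and minimizes ||w||^2 / 2, which has the same minimizer as ||w||. The data set is made up, and it assumes NumPy and SciPy are installed. Real SVM implementations typically use specialized solvers instead.

```python
import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable data (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])


def objective(params):
    # params = [w_1, ..., w_p, b]; only w enters the objective.
    w = params[:-1]
    return 0.5 * np.dot(w, w)


def margin_constraints(params):
    # SLSQP "ineq" constraints require every returned entry to be >= 0,
    # i.e. y_i * (w.x_i - b) - 1 >= 0 for each training point.
    w, b = params[:-1], params[-1]
    return y * (X @ w - b) - 1.0


result = minimize(
    objective,
    x0=np.zeros(X.shape[1] + 1),
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": margin_constraints}],
)

w, b = result.x[:-1], result.x[-1]
print("w =", w, "b =", b, "margin half-width =", 1.0 / np.linalg.norm(w))
```

For this data the closest points of the two classes are (2, 2) and (0, 0), so the solver should land near w = (0.5, 0.5), b = 1.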
Hi, how exactly did you choose 1 and -1, the values for wx -b where x is a support vector? wx-b = 0 for x on the separating line makes sense however. Could it have other values?
you are the smartest person I know
I AM SO THANKFUL!!
Thanks for such brilliant explanation really appreciate your work!!
bro explained in 5 minutes the maximal margin derivation what my teacher couldn't explain in 30 min
Bro, you're a superhero
Pls also make one for svm regression.. you are amazing
Holy shit what a banger of a video this is
very informative and intuitive
Hi ritvik! I wonder what is the geometric intuition of the vector w? We want to minimize ||w||, but what does w look like on the graph?
How you chose equation of two blue parallel lines? I mean how did you get 1 in the upper line and -1 for bottom line?
You are an amazing elucidator👍
Awesome explanation
I've a doubt, (might be silly) How did people come up with W.X-b=1 and W.X-b=-1?does 1, -1 in these equations tell us something? For some reason, I'm unable to get the intuition of 1,-1 in the above equations.(although i understood that they are parallel lines)
Someone pls help me
I have the same question.
maybe an assumption so we say that the margin is the magnitude of w so easily interpreted? i dont know really
woww what an explanation..........great
Glad you liked it
maybe the question is, what algorithm svm uses to look for the weight or coefficients of hyperplane?
Easily Explained 👍,
Can you also explain how does SVM works with respect to regression problems?
you are my savior
Hey Ritvik, nice video, can you please cover the kernelization part too.
Amazing teaching skills - Thanks, a lot!
Hi all, at 5:14, how does he get from k (W.W/|| W ||) =1 to k = 1/|| W ||?
Appreciate if anyone can enlighten me
|| W || = [W.W]^{1/2} so, square everything to get rid of the square root in the denominator and there you have it.
very helpful! I always wanted to learn math behind the model! thanks!
Eagerly waiting for your video on SVM Soft margin :D
Loved it!
bro is a savior
Thanks for this wonderful video.
I understand that the equation of blue dotted line (plane) is W.X+b =0 .
But how can we decide the other two lines. I mean how those can be W.X+b = +1 and W.X+b = -1.
And if they are, then the width is 2 right? how we can maximize it, it is fixed isnt it so?
I know I am talking nonsense :) I dont have anyone else to ask this :)
Thanks in advance!
@ritvikmath - Thanks for this great explanation. I have noticed that other material online gives the equation for the hyperplane as w.x+b=0 rather than w.x-b=0. Can you confirm which is accurate?
Smart! This is the easiest way to come up with the margin when given theta (or weight)... gosh..
Thank you Sir . You really simplified the concept. I have subscribed already waiting patiently for more videos 😊
Amazing explanation from the theoretical to the mathematical. Please tell me how you do it? So i can self-learn myself how you are able to understand and then explain these concepts or other concepts. what resources do you use ?
Thank you! I am wondering why we use "+1 and -1" instead of "+1 and 0" to classify these two areas?
Nice explanation and really easy to follow!
great, concise explanation !