The only man who actually explains the concepts on the internet. God bless your soul Ritvik
🎉
I know this is an old video but I just want to express my appreciation for doing this! I suffered through an entire semester of my professor jumping into complicated math without ever explaining the reasoning behind it. I binge-watched all your SVM videos and am moving on to the Kernel one right now. This is the first time everything about SVM suddenly becomes clear and even interesting for me! Thank you so much!
Man I swear you are one of the very few that actually understands what it takes to make these concepts understandable.
Ritvik, you my man are a godsend. Thank you for sharing your extremely technical expertise on YouTube for free. You teach this material better than any platform (university or online).
Important video for those who want to understand SVM properly ... thanks for uploading!
Glad it was helpful!
Thank you so much for your videos. As a stats major, I still learned so much from your channel. In many cases I learned the math in school, but no one ever talked about the intuition behind it. Really appreciate your work here.
I have watched 6 times now, still can't wrap my head around it. Enjoy your views!!!
I soooo agree😂😂
@@Pleaseletmenamemyselfmeme we dumb dumbs
You give exactly the amount of detail needed to grasp these concepts! Very nice!! Thanks
Although this doesn't include all the math in depth, it is enough for most of us to understand the SVM formulation very well.
Along with StatQuest, these are by far the best ML videos on YouTube. Thank you!
Wow, thanks!
@@ritvikmath No problem! I took SVMs in school and have read about them many times, and I have only ever seen one of two levels of explanation: 1) just stating the results without question, or 2) diving deeply into measure theory and other crazy math that sounds interesting but that I don't really have time for. This is the first source I've found that explains the gist of how the kernel trick works without diving super deeply into the math.
This guy is so underrated! period!
Amazing that SVM is derived from KKT conditions. Never noticed that until watching this.
Thanks for these great videos. I would suggest reorganizing the playlists by single topic (for example SVM) and adding links in the description to the playlist or to all the other videos on the topic.
Thank you so much for explaining why we need the dual formulation. This video made me understand some of the concepts I had jotted down from other videos without understanding them.
Great to hear!
I love the way you throw the marker and hope to catch it, and if it's too far, you don't snap and point at the camera. Ah. And nice videos, they comfort my soul.
hahaha, thanks for noticing something I wasn't even aware of :)
Thanks for the wonderful video!!! It's a real benefit for someone like me who didn't know about the Lagrangian and the dual problem.
Fantastic video. I have been binging your content. Any chance you will make a series on stochastic calculus?
thanks for the suggestion!
Really well explained. If you want the theoretical concepts, you could try the MIT MicroMasters. It's rigorous and demands 10 to 15 hours a week.
Thanks for the explanation of crucial points in this topic , thank you for the effort
Glad it was helpful!
Thank you for posting this informative video. You mention that alphas only need to be calculated for support vectors. That does simplify things considerably; however, how can one in practice determine which vectors are support vectors without doing the minimisation?
Great explanation, I enjoyed it a lot. Thanks
Glad it was helpful!
Very-very clear explanation! Thanks!
Thank you so much!! Very clear and intuitive explanation!!
You are a legend my man
Hi @ritvkikmath, thank you for these videos. In what type of degree do you usually cover these subjects? I would like to enroll in one.
Please make more videos on SVM!!
Thanks a lot
more coming up :)
Thanks for another wonderful video!
However, there's one thing I really want to understand: in the formulation of the hard-margin SVM problem (the 'SVM Math' video) you stated that we want to minimize ||w||, not 1/2 ||w||^2. Where does this difference come from, and why are the two approaches equivalent? Can anybody shed some light on this?
I am in the same boat... I would really like to understand that.
Hello, regarding the efficiency of the two forms of the problem: what about the inner product gives the dual form a complexity of O(N^2) rather than O(NP)? It seems the inner product operation would have complexity O(P), since it depends on the number of components in the input vectors x_i and x_j; similarly, w^T x_i in the primal form also has complexity O(P). So this O(P) term shows up in the same place in both forms, which would mean they are both dependent on P. Is there some other place where only the primal form's complexity scales with P? Or is complexity not even the right way to analyse this?
apostle of machine learning !!
Thanks for the great teaching!!! Just one question: why is it Max{Σα_i - 1/2(Σ.....)} after substituting w, and not Max{Σα_i + 1/2(Σ.....)}? I tried several times and still got "+" not "-"; could you illustrate this calculation?
Hi man, I was stuck on the same problem and it turns out we were doing it wrong! I finally resolved it: in the second element, -Σ(α_i y_i w^T x_i + ...), you have to expand w^T, and then you get a double sum of the same form as the 1/2 w^T w term. Combining the two (it is not reduced to zero) is where the minus comes from. Hope it helps you.
When you substitute, the first two terms are the same, except the first one is multiplied by 1/2: 0.5x - x = -0.5x.
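For anyone who wants the substitution written out, here is a sketch using the hard-margin Lagrangian from the video (assuming the usual notation: labels y_i in {-1, +1}, multipliers α_i ≥ 0):

```latex
L(w, b, \alpha) = \tfrac{1}{2} w^\top w - \sum_i \alpha_i \bigl[ y_i (w^\top x_i + b) - 1 \bigr]
% Stationarity: \partial L/\partial w = 0  =>  w = \sum_i \alpha_i y_i x_i,
%               \partial L/\partial b = 0  =>  \sum_i \alpha_i y_i = 0.
% Substituting w back in, with S := \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^\top x_j:
\tfrac{1}{2} w^\top w = \tfrac{1}{2} S,
\qquad
\sum_i \alpha_i y_i w^\top x_i = S,
\qquad
b \sum_i \alpha_i y_i = 0,
% so the Lagrangian collapses to \tfrac{1}{2}S - S + \sum_i \alpha_i, i.e.
W(\alpha) = \sum_i \alpha_i - \tfrac{1}{2} S .
```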
11:30 But while solving the optimization we don't know which points are the support vectors, so we need to solve the convex program with all the cross terms; we'll just find the multipliers (the α's) to be zero for most of them.
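Exactly. To see that in practice, here is a minimal toy sketch (my own illustration, not code from the video): it hands the full dual, with all cross terms, to a generic solver and then reads off the support vectors as the points whose α came out non-zero. The random data, the SLSQP method and the 1e-5 threshold are all arbitrary choices for the example.

```python
# Minimal toy sketch: solve the hard-margin dual with a generic solver and
# check that most alphas come out ~0, i.e. the solver identifies the support
# vectors for you.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.3, size=(20, 2)),     # +1 class
               rng.normal([-2, -2], 0.3, size=(20, 2))])  # -1 class
y = np.array([1.0] * 20 + [-1.0] * 20)

# K[i, j] = y_i y_j x_i^T x_j  (the cross terms of the dual objective)
K = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(alpha):
    # maximizing sum(alpha) - 1/2 alpha^T K alpha == minimizing its negative
    return 0.5 * alpha @ K @ alpha - alpha.sum()

res = minimize(neg_dual,
               x0=np.zeros(len(y)),
               method="SLSQP",
               bounds=[(0, None)] * len(y),                        # alpha_i >= 0
               constraints={"type": "eq", "fun": lambda a: a @ y})  # sum_i alpha_i y_i = 0

alpha = res.x
print("non-zero alphas (support vectors):", np.where(alpha > 1e-5)[0])
```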
Keep going buddy!! Amazing work. Really helpful :) Thank You.
You are a legend really
The math this video covers is kind of sophisticated :(
How are the support vectors determined in the first place, such that only pairs of support vectors need to be considered for the final minimisation problem?
Thanks mahn! U just saved me!!
I think you are one of the great teachers on YouTube. But I think the only reason I understood this video is that I already have knowledge of optimization, linear algebra and multivariable calculus, and because I already understood the dual SVM problem. To be honest, some of your videos are totally not for beginners. I think you should try proving the math behind the algorithms you are explaining from first principles. It's better for beginners.
thanks for the feedback! It's important for me to strike a balance between making the videos accessible to everyone and covering complex topics.
@@ritvikmath I think CrimaCode might not be fair here, because if you even come to this video for SVMs you will have at least some knowledge of linear algebra and multivariable calculus. Both are basic things taught at most high school levels. I think ritvik is doing a great job and the video is dope for beginners.
I found the video very helpful despite not being well versed in svm before watching it. People learn in different ways :)
@@kroth5810 okay dicc
@@ritvikmath Agree. Maybe you can cover the basics of those basics separately for pure beginners. But all your videos are awesome.
Bro, please discuss the VC dimension.
Thank you very much for this great video. I have a question: you said that for non-support vectors the alpha values would be zero, because they have no contribution to the final solution. But here we have considered the hard-margin version of the SVM. What about the soft margin? In the soft margin, all the points contribute to the solution and we can't ignore the non-support vectors. Are their alphas still 0 or not?
Hello ritvik,
Superb video on the dual SVM, nicely explained. But I have one doubt about the final equation of the dual form: where is that x_j term coming from? What is that term? If x_i is our training data, then what is x_j?
a different training instance.
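Right. Written out, the dual objective has a double sum over all pairs of training points, so i and j both index the same training set:

```latex
W(\alpha) = \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j
% x_i and x_j are both training points; the j-index appears because
% w = \sum_j \alpha_j y_j x_j gets expanded inside w^\top x_i.
```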
At 6:30, why do we take the alpha that MAXIMIZES the solution to the inner minimization?
_/\_ for this.
I do have one question though, regarding non-support vectors requiring the alpha_i's to be zero.
Intuitively, that would mean only the support-vector data points (which are very few) contribute to the optimal weights of the model. Wouldn't that be bad for the model weights, i.e., only a few contributions?
I'm wondering whether the model would fail to generalize well with contributions from only a few data points.
thanks, this was very helpful 😀😀
Can you make a video explaining the twin support vector machine? Thanks in advance.
Wonderful video thank you
Hello sir, I have 2 doubts: 1. Once you have found the optimal values for alpha, how do you determine the optimal value of b (the bias)?
2. You said that for non-SVs the alpha value is just 0, but how do you determine those non-SVs in a real data set? I know the non-SVs lie outside the margin lines for -1 and +1, but in practice, given the alpha values for each data point, you can get the optimal w and b, and with them the decision boundary; is the idea then to use that decision boundary to tell the support vectors from the non-SVs? Is that true?
Thank you!
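One way this is usually handled in practice, sketched below under my own assumptions (the α's come from some QP solver, and the 1e-5 tolerance is an arbitrary cutoff): complementary slackness says α_i > 0 exactly at the support vectors, so you read the support-vector set off the solved α's rather than identifying it beforehand, and then recover w and b from those points.

```python
import numpy as np

def recover_model(alpha, X, y, tol=1e-5):
    """Sketch: given solved dual variables alpha, read off the support vectors
    and rebuild w and b. alpha: (N,), X: (N, P), y: (N,) with labels in {-1, +1}."""
    sv = alpha > tol                     # complementary slackness: non-SVs have alpha_i = 0
    w = (alpha[sv] * y[sv]) @ X[sv]      # w = sum_i alpha_i y_i x_i (only SVs contribute)
    # Every hard-margin support vector sits on its margin, y_i (w^T x_i + b) = 1,
    # so b = y_i - w^T x_i for any SV; averaging over all SVs is numerically safer.
    b = float(np.mean(y[sv] - X[sv] @ w))
    return w, b, np.where(sv)[0]
```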
You are the boss!
Excellent videos
Glad you like them!
Once we get to the point where we are trying to minimize the equation on the right of the screen, I don't understand how you actually do the minimization.
Let's say we have 3 support vectors, so we have a function of three variables, alpha_1, alpha_2, alpha_3. How do you minimize a multivariable function? What does that even mean? I've only ever done minimization in one variable.
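In case it helps: the dual in the video is an optimization over all the α's at once, and with three support vectors the (schematic) problem is:

```latex
\max_{\alpha_1, \alpha_2, \alpha_3 \ge 0} \; W(\alpha_1, \alpha_2, \alpha_3)
\quad \text{s.t.} \quad \sum_{i=1}^{3} \alpha_i y_i = 0
% At an unconstrained interior optimum all partial derivatives vanish,
% \partial W / \partial \alpha_i = 0 for every i; with the constraints included,
% a quadratic-programming solver does this search numerically.
```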
great explanation
Thanks!
Cool jacket!
super clear! really helpful:)
you're brilliant
Good video, clearly explained!!!!
Thanks for the video 😄😄
Using stationarity we can get rid of w, but how did we get rid of the b that does not appear in the dual formulation of the problem at the top right?
I think this is due to dL/db = Σα_i·y_i = 0. When we plug this in, the term containing b is b·Σα_i·y_i, which becomes 0. There is no explicit statement that b = 0.
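Spelled out, that step is:

```latex
\frac{\partial L}{\partial b} = -\sum_i \alpha_i y_i = 0
\;\Longrightarrow\;
\underbrace{-\,b \sum_i \alpha_i y_i}_{\text{the only term of } L \text{ containing } b} = 0
% b drops out of the dual objective, but it is not itself forced to be 0;
% it is recovered afterwards from a support vector via y_i (w^\top x_i + b) = 1.
```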
Hey, when you code this out, how exactly would you choose the alphas? For example, what are the upper bounds on alpha?
You're great
Is the term ||w|| squared only for mathematical convenience?
Yes, and it makes the objective function differentiable as well.
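For the related question above about minimizing ||w|| versus 1/2||w||^2: squaring is strictly increasing on nonnegative numbers, so the two objectives pick out the same w (a small sanity check):

```latex
\arg\min_{w, b:\ \text{constraints}} \lVert w \rVert
\;=\;
\arg\min_{w, b:\ \text{constraints}} \tfrac{1}{2} \lVert w \rVert^{2}
% since t \mapsto \tfrac{1}{2} t^{2} is strictly increasing for t \ge 0;
% the squared form gives a nicer (differentiable, quadratic) objective, and the
% factor 1/2 just cancels the 2 produced by differentiation.
```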
how did you get rid of b?
Thank u...
Welcome 😊
thanks
❤
Thank you. But from scratch, it is too difficult.
Kumar?