This guy starts his video with a burst of energy that wakes you up and pulls your attention in. Great technique.
The positivity really shines here. As a Mech Eng student I wish my lecturers had this kind of enthusiasm.
11:50 Actually, that transformation is *not* the kernel trick. The transformation itself is the feature map associated with the kernel, and it can be applied to any model. The kernel trick is simply an optimization that makes that transformation cheaper, which is possible in the case of SVM because, as long as the kernel satisfies certain assumptions, only dot products of the data ever end up being used.
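To make that concrete, here is a tiny sketch (my own toy example, not from the video) showing that a polynomial kernel returns the dot product of explicitly transformed points without ever building the transformation:

import numpy as np

def phi(v):
    # explicit quadratic feature map for a 2-d point (x1, x2)
    x1, x2 = v
    return np.array([x1*x1, x2*x2, np.sqrt(2)*x1*x2, np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

def poly_kernel(a, b):
    # same quantity computed from the original dot product only (the "trick")
    return (np.dot(a, b) + 1.0) ** 2

a, b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(a), phi(b)))  # dot product in the transformed space: 4.0
print(poly_kernel(a, b))       # kernel value, identical: 4.0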
Hi Cruz, if you don't mind, could you help me learn SVM? I am not able to learn it from scratch and I'm a little bit confused. Could you please help me?
Love it when it's from scratch with no libraries ... great!!!
woot will continue
Do you have a video explaining the actual update of the filters in a convolutional network? Not the backpropagation part, but the pooling and the convolution filters. Thanks.
Go watch Andrej Karpathy
I want to extend this implementation to incorporate the kernel trick and do a complete simulation of the SVM. Can I get some guidance on how to do the kernel trick here? Also, I am confused about how the b in the wx + b equation is handled here. Thanks in advance :) Great video btw
Wait, what is lambda? And why is it 1/epoch?
Hey man I don't watch your videos or do the tutorials as much as I wish I did, but I wanted to let you know how amazing of a resource you are to people who are trying to get into machine learning. It's nice knowing I have this to come back to. Keep up the good work, I love and appreciate your content
Siraj....your enthusiasm is infectious. Thank you for sharing your gift of knowledge.
Superb, excellent. You have a gift for clarity.
Loving the math explanations behind machine learning models as well as the code implementations without using many libraries; it also helps in understanding them more deeply. Keep up the Math of Intelligence series, please!
thanks will do
What is lambda? And why is it 1/epoch?
Dude, I hope you're making money with these lessons, because you're one of the best here! Total Success!
Siraj, your Jupyter notebook is straight to the point, so much better than my machine learning prof. Thank you Siraj
There is no regularizer term in SVM; that L2-norm term is the margin that we are maximizing. It's really bad to mislead people.
Great job inserting and weaving in the fundamental concepts of machine learning into the presentation. Helps integrate the concepts, purpose, and reasons behind how these mathematical components fit together and how they translate ultimately into code. Cheers!
Thanks for your generous efforts brother! Not just a great teacher but also inspirational in your enthusiasm. Keep up the great work and fuck the haters!
Exactly the simple model derivation I needed most. Thumbs up from China.
Very precise and straight to the point. I am using SVM to classify Petroleum Emulsions.
Great video @Siraj! I am looking forward to this series. Most courses go very superficially through machine learning/deep learning algorithms and we just end up using APIs! Thanks! Looking forward to the next ones!
These videos are essential watching, not just for programmers, but anyone who cares about intelligence.
thanks!
Hi from Ukraine. This series of videos is so useful for me. Thank you so much.
awesome thx
@siraj, great going bro, your tutorials are just great.
Just to add to it and generalize, I did something like this:
for i,data in enumerate(y):
if (data
@18:35 -> If the regularization term (lambda) is too high, wouldn't the model underfit? If the regularization term is high, it forces the weights to get close to zero. This introduces errors due to high bias.
For anybody who didn't have a good boundary region, try decreasing the learning rate to 0.1 with scale 0.5 and it will work perfectly fine. Love you @Siraj
Thanks Siraj you're such a great teacher! Greetings from Paraguay
Much better than the previous one, appreciate it. As I have requested, please share the exact names of the math topics so we can prepare ourselves before watching your video; then we will understand not just 80% but 100%.
For reference when doing further research: in most of the literature, the weighting constant sits on the 'hinge loss' term and is called C, what this video calls the regularizer is really the margin term, and it is that term on the left that maximizes the margin.
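To spell that out (my own restatement, not a quote from the video): the objective in the video is written as minimize lambda*||w||^2 + sum_i max(0, 1 - yi*(xi.w)), whereas most textbooks and libraries write it as minimize (1/2)*||w||^2 + C * sum_i max(0, 1 - yi*(xi.w + b)). The two forms are equivalent up to scaling, with C playing roughly the role of 1/lambda, so a large C corresponds to a small lambda.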
I love how Siraj got straight to the point without bringing up how he got there with the Lagrange multipliers and whatnot. I always get lost in proofs :D.
Nice work. Just one correction: at 18:35 you said (it is also mentioned in the notebook) that if the regularizer is too large the model will overfit (correction: underfit, which corresponds to high bias / low variance), and that if the regularizer is too small the model will underfit (correction: overfit, which corresponds to low bias / high variance).
One can think of it this way: if λ is too high (close to 1), the model will mostly minimize the L2 term (the margin term) and will have less incentive to minimize the classification error, hence high bias; the reverse happens if λ is too low (close to 0).
that hair of yours needs a support vector machine to keep it up.
HAHAHAHAHAAHAHAH
Can someone explain more about the regularizer and the loss function? How are these used to adjust the weights w?
I will talk more about both terms in detail in the coming weeks, thx
Yes, I'm losing it when w suddenly comes in (?!)
Rudy, I think w is the weights... f(x) is a function that has the weights as coefficients. So my guess is that f takes an input and you could also inject a vector (list) of weights. I think f(x) = w.x, i.e. the sum of wi*xi over the features.
I'm answering this question to try and better understand it myself, so if I've got this wrong please correct me, someone!
Thank you for this great video. My suggestion is to do an exclusive series just on understanding math notation and reading equations. I know you try to explain where possible; however, I feel this topic is important enough to deserve its own series.
Keep doing what you are doing! You are helping me a lot.
There is some error in the code: the SVM does not seem to draw the correct decision boundary.
This is probably because the learning rate is quite high and the regularizer is set to the inverse of the number of epochs.
1. SVMs are great and are preferred only when we have limited data.
2. We should use SVM when we can't afford the time, space, etc. of a complex neural network.
3. The points which are closest to the margin are the support vectors.
4. Our aim is to draw a hyperplane/decision surface with the maximum margin/space separating the differently labelled points; it is placed in the perfect middle spot, with a line in the middle.
5. A point is assigned to a class such that it has the maximum likelihood of falling on the side of the decision boundary where it should.
6. The hyperplane will have n-1 dimensions if there are n dimensions/features.
7. SVM can perform linear and non-linear classification; here we are doing supervised linear classification.
8. ML is about minimizing an objective function; the quantity we optimize is a loss/cost/error function.
9. SVM uses the hinge loss function c(x, y, f(x)) = (1 - y*f(x))+, where x is a sample data point, y is the true label, f(x) is the predicted label, and c is the cost function; it is 0 if y*f(x) >= 1. In that case we don't need the loss update, only the regularizer update: w = w + eta*(-2*lambda*w). Otherwise we update with both the loss and the regularizer: w = w + eta*(yi*xi - 2*lambda*w).
14. regularizer = 1/epoch, hence lambda decreases as the epoch count increases (a code sketch of these update rules follows below).
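A minimal sketch of what those two update rules look like in code (my own toy data and variable names, not necessarily the notebook's):

import numpy as np

# toy points: [x1, x2, bias_input]; the constant -1 column acts as the bias, labels are -1/+1
X = np.array([[-2, 4, -1], [4, 1, -1], [1, 6, -1], [2, 4, -1], [6, 2, -1]], dtype=float)
y = np.array([-1, -1, 1, 1, 1])

def train(X, y, epochs=100000, eta=1.0):
    w = np.zeros(X.shape[1])
    for epoch in range(1, epochs + 1):
        lam = 1.0 / epoch                      # regularizer = 1/epoch (point 14)
        for xi, yi in zip(X, y):
            if yi * np.dot(xi, w) < 1:         # misclassified or inside the margin
                w = w + eta * (yi * xi - 2 * lam * w)
            else:                              # classified correctly with enough margin
                w = w + eta * (-2 * lam * w)
    return w

w = train(X, y)
print(np.sign(X @ w))  # should reproduce y for separable toy data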
Thanks, it's always good to learn a white box approach
This is explained 100x better than by my ML prof. Why do I pay so much money for it :( Looking forward to more videos!
Which college? If you don't mind me asking.
Not at all. University of Bristol, UK. Paying for a piece of paper - IMO.
Isn't Bristol supposed to be good? I'm from America.
Apparently so. The prof is well respected in his field but he seemed very disinterested when teaching his undergraduates.
Same here, I learnt more in a 30-minute video than in a 4-month course...
WOW, thanks for this video, much better than some of your other videos where there's a lot of drama and distraction (rapping etc.)
1> Hey man, I can't get this regularizer concept. Is it 1/epochs always or just in this case, and what other types of regularizers are there?
2>why did we just change weight with gradient if y
I had the same doubt as your point 2.
Did you get an answer to your question? Could you share it?
Is there any video that explains the implementation of a Weighted Loss Support Vector Machine (ALSVM), preferably for regression tasks, or a clear and simple GitHub repo?
This was borderline perfection
When I was learning SVM, the objective was to minimize ||w||^2 as the margin term, not as a regularizer. Since the very intuitive idea of SVM is to find the hyperplane with the largest margin, a small ||w|| is what expresses that mathematically and geometrically. But here you say it is only a regularizer, which I don't find convincing. How does this compare to sklearn's SVM?
Can someone please explain how he plotted the "hyperplane"? What are x2, x3 and X, Y, U, V?
What kind of program is Siraj using for scrolling in the background while presenting at the front?
I use Mac OS X. Is there any program that helps to scroll presentations?
He uses a green background and records himself with a camera at the front plus the screen of his MacBook while presenting; then, when he edits the video, he replaces the green background with the part he recorded on his MacBook.
Siraj, this is really good material. I look forward to continuing watching the rest of your videos!!
Hyperplane explanation was awesome
Just wondering, how did you get that derivative? It looks like you applied the power rule, but it is a vector. Plus, for the weights, wouldn't the update constantly lower the weight values, since for misclassified points y*x would be negative and for correctly classified points the -2*lambda*w term would be negative?
Very good, but Siraj is kind of a gift to humanity.
So great! Keep enlightening us, pls!
Thanks for re-recording it!
Your videos are amazing! Having said that... Some feedback on this one:
- The formulas get overwhelming fairly quickly when you're not used to reading them in that format. I know it's tricky to get all the information into these short videos. Maybe you could make a video about the different notations that pop up everywhere? The brackets that look similar to were completely new to me, for example.
- The jupyter notebook seems to have some errors in it and doesn't run without modifications. I love the notebooks so I can interact with the code and it would be great if it worked right away so I can be the one to break it :)
Seb, it doesn't have any errors; it just works like this. I was also stuck on this initially. You need to run the modified cell and then the final cell, which will give you the output. Alternatively, you can do Run All, which runs all the cells from the start. So keep in mind that the Jupyter run command works on the current cell, not on the whole notebook.
Why does the (1 - y*f(x)) in the loss function become (1 - yi*(xi.w)) in the objective function? Is xi.w supposed to be a dot product, where xi is the feature vector and the dot product with the weights w gives the predicted label? Thanks Siraj
Would you be able to provide the PDF of the document you are working off in the video?
Am I right that in the SVM loss we only care about misclassified points and points which are classified correctly but not very confidently, because y*f(x) is smaller than 1?
Great explanation buddy, just what I was looking for!
Can you include the bias as well in the above code and teach us how to apply the gradient to optimize it?
Hi Siraj, before you started explaining the math, you mentioned digit classification. If SVM is for 2-class classification, how does SVM work for 10 classes (0, 1, 2, ..., 9)?
In the Jupyter notebook code file from GitHub, which is provided in the description box, the entire training for-loop is missing.
just fixed, thanks for telling me
Thank you.
Hey Siraj, what do you think of fuzzy logic? Is it still a good field of research? I tried to look up a good tutorial on YouTube where state-of-the-art approaches are described, but the search was quite unpleasant. Would you do a video on fuzzy logic? I think it would be quite interesting to define hybrid approaches with neural networks.
BTW: Nice work with all of your videos and thanks a lot for working so hard!!!
Siraj, great work. Very interesting. You have made the basic idea of SVM accessible to an educated audience with minimal maths expertise. In my opinion, repeating basic definitions is good, but only up to a point; resist the temptation to do so too often. At 24:45 you didn't say what the update would be if a classification is done correctly: does the weight stay at the same value since the update equals zero? IMHO, there's not much need to show the previous slides in the conclusion of the lecture. BTW,
I think anyone can download audio only and listen to it while driving and perhaps get 75% of the content easily. :)
Not to criticize in any way (every bit of this channel is ridiculously awesome), I genuinely want to know: am I right to think the notation in `w = w + η( -2λw)` is incorrect? The "=" is used as an assignment operator, which is denoted '←' in the last ML book I read. Is it standard?
I added the test cases you used to the input set and trained the model. After checking the graphical representation, I found a few points were misclassified. What should I change in order to classify them correctly? Is this due to the fact that the difference between the two closest opposite-class data points is less than one, so we just can't use the condition wx+b > 1 or wx+b < -1?
Great video, sir. I followed all your code, but why is the graph not showing on my computer? I thought when I typed plt.show() the graph would show, but it didn't. Did I miss something?
I always think of the weights as the feature importance towards the label, so each feature (column) in X has a certain weight towards the prediction of the label (y). Correct me if I am wrong. And for regularization, I always think of it as a way to avoid overfitting or underfitting by penalizing outliers. Remember, the point of the model is to capture the general pattern, not to make wrong assumptions from outliers, which will lead to a bad accuracy score when we test the model. Again, correct me if I am wrong.
Hi Siraj, I'm watching this over and over, trying to keep up with the fast pace. I just wondered: is the regularizer synonymous with the learning rate?
I have chosen different points, and after training and determining the weights, I am able to classify the points correctly.
for i, x in enumerate(X):
    total = np.dot(x, w)
    print(np.sign(total))
output :-
-1.0
-1.0
1.0
1.0
1.0
The SVM seems to be working correctly, but how do I determine the hyperplane? I was unable to understand the last bit of code where @siraj created the hyperplane using the weight coordinates. Once I get the weights w = (w1, w2, w3), where w3 is the bias term, how do I plot the hyperplane?
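For anyone with the same question, here is one rough sketch of how the line can be plotted in 2D (assuming the learned w = (w1, w2, w3) came from inputs whose third column is a constant -1 acting as the bias; if your bias input is +1 instead, flip the sign of w[2]; the notebook may do it differently):

import numpy as np
import matplotlib.pyplot as plt

def plot_boundary(X, y, w):
    # scatter the 2-d points, coloured by label
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr')
    # decision boundary: w1*x1 + w2*x2 - w3 = 0, solved for x2
    x1 = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 50)
    x2 = (w[2] - w[0] * x1) / w[1]
    plt.plot(x1, x2, 'k-')               # the separating line
    plt.plot(x1, x2 + 1 / w[1], 'k--')   # margin where w.x = +1
    plt.plot(x1, x2 - 1 / w[1], 'k--')   # margin where w.x = -1
    plt.show()

# example usage: plot_boundary(X, y, w) with the X, y, w used above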
From this channel I got very concise terms from which I can start digging deeper. But I am confused about some points. Please help me clear these things up.
What I understood so far:
- The main job of SVM is to find a decision line or hyperplane that is maximally far away from both groups of data points.
To achieve the above requirement, the SVM model uses two functions, which I can describe as two components or tools:
- the objective function and
- the loss function
The objective function contains the loss function and the regularizer (the heart of SVM).
Awesome video on SVM, Siraj Raval!!!! I am quite new to the machine learning field, and it is really very helpful in terms of understanding the concept; it answers the questions of what this method is and why to use it.
Can someone explain to me the last part of the code, i.e. plotting the hyperplane from the final weights calculated by gradient descent?
Great lesson, very helpful. Thanks a lot
Very good job Siraj! Although I feel that you didn't mention any details (or the intuition) for maximizing the boundary, the 'LARGE MARGIN' case. It appears in your diagrams but without any solid clue. What's the difference between linear regression and SVM, truly? To me it looks like they do the same thing, even if they use different objective functions: linear separation for non-discrete decisions. I am pretty sure that is not the case.
Can you explain the function of the bias term in the array?
Siraj, thanks a lot for this video and, quite frankly, for all of your videos. I have a question: I see you update the weights after each sample; is that always the case in SVM?
You have such a great energy!!
Can you please briefly explain kernels in SVM?
Still kind of confused about how you plot the SVM model (the line) in the end :(..
Really cool video and simple code :) !! Would also love to see a little more intuition about the loss function. I only have a faint idea about it. You penalise cases where predictions and labels are on opposite sides of the decision boundary?
Hell yeah! Another great video Siraj! U forgot to talk about dat kernel trick doe!
truuu thx
Hey Siraj, the video was awesome and cool, I learned a lot. Why can't you provide a link to the notebook with which you are teaching us!!!
You da man Siraj!
Hey, I'm pretty sure it's the most trivial part, but why do you add minus signs to some of the weights in x2 and x3 before plotting them? I've been messing with the code for ages trying to figure it out and I have no idea :/
Great videos, Thanks man
I have a question about the objective function: how does lambda*||w||^2 + (1 - y*(x.w)), after differentiation, become y*x - 2*lambda*w?
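In case it helps (my own derivation, not from the video): the per-sample objective is J = lambda*||w||^2 + max(0, 1 - y*(x.w)). Its (sub)gradient is dJ/dw = 2*lambda*w when y*(x.w) >= 1, and dJ/dw = 2*lambda*w - y*x otherwise. Gradient descent moves against the gradient, w = w - eta*dJ/dw, which gives w = w + eta*(-2*lambda*w) in the first case and w = w + eta*(y*x - 2*lambda*w) in the second, exactly the two update rules in the video.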
Hi Siraj,
I love your videos and have learnt a lot from them.
I am sure you are high on some sativa in this video :)
Peace from a fellow stoner.
The video was great; am I the only one worried about how to graph a hyperplane? I understand everything except the hyperplane-plotting code. Can I have an explanation or some resource I can learn from?
Can anyone tell me what "k" is in the gradient descent equation?
Really enjoying your videos, but you have the regularizer backwards in the video. Shouldn't it be too small -> overfit / too large -> underfit? If so, you might want to put up a note. Keep up the good work!
"Tongue color... where did that come from?" ... Salute Your Shorts
Tongue color is actually a diagnostic tool used in Traditional Chinese Medicine:
A pale tongue body indicates Deficient Xue, Qi, or Yang, or Excess Cold. An overly red tongue body indicates Excess Heat. A purple tongue indicates that Qi and/or Xue are not moving harmoniously and are Stagnant. Pale purple means the Stagnation is related to Cold.
In the video at 18:35 you were talking about the regularizer term lambda: larger values of lambda will underfit the model and smaller values may overfit it.
The parameter C used in the standard implementations is 1/lambda, so if C is large it will overfit and if C is small it may underfit.
Was that a mistake in the video or is it a mistake in my understanding?
Thanks for the video.
New mic? Awesome!
Tongue colour? Ed, Edd and Eddy nostalgia right there :P
Total Amazeballs. This is so much better explained than any text book I have read. Thanks Siraj!
Daaaahude. Great video. But wow you talk fast. I dig it, I have to rewind a bunch if I look away for even a moment
Why does this boy always have the coolest shirts!
Do you have any material about LS-SVM?
Siraj, I tried this for a case where X has multiple attributes, and I cannot plot the SVM for that since there are more attributes against the y variable. How do I generate a plot in this case?
Great video man!
Wow, great and appreciated work. The other video was good, but it's true that it was below the standards of this channel haha. Keep pushing!
thanks Alberto!
Awesome 👍👍👍
Great Explanation!
Hello...nice video, do u work on satellite imagery data...
Is there a printable version of the documentation?
Something isn't clear to me. So the matrix is a collection of vectors (which are columns), the vectors themselves being represented by the coordinates of their endpoints in n dimensions (the starting point of each vector being the origin). What did I get wrong?
Siraj Raval, can you suggest some books to follow along with your videos, please?