This guy starts his video with a burst of energy that wakes you up and pulls your attention in. Great technique.
The positivity really shines here. As a Mech Eng student I wish my lecturers had this kind of enthusiasm.
11:50 Actually, that transformation is *not* the kernel trick. The transformation itself is the feature map associated with the kernel, and it can be applied to any model. The kernel trick is simply an optimization that makes that transformation cheaper, which is possible in the case of SVM because, as long as the kernel satisfies certain assumptions, only dot products of the data ever end up being used.
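To make that concrete, here is a tiny sketch (my own toy example, not from the video) showing that a polynomial kernel returns the dot product of explicitly transformed points without ever building the transformation:

import numpy as np

def phi(v):
    # explicit quadratic feature map for a 2-d point (x1, x2)
    x1, x2 = v
    return np.array([x1*x1, x2*x2, np.sqrt(2)*x1*x2, np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

def poly_kernel(a, b):
    # same quantity computed from the original dot product only (the "trick")
    return (np.dot(a, b) + 1.0) ** 2

a, b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(a), phi(b)))  # dot product in the transformed space: 4.0
print(poly_kernel(a, b))       # kernel value, identical: 4.0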
Hi Cruz, if you don't mind, could you help me learn SVM? I am not able to learn it from scratch and I'm a little bit confused. Could you please help me?
Love it when it's from scratch with no libraries ... great!!!
woot will continue
Do you have a video explaining the actual update of the filters in a convolutional network? Not the backpropagation part, but the pooling and the convolution filters. Thanks.
Go watch Andrej Karpathy
I want to extend this implementation to incorporate the kernel trick and do a complete simulation of the SVM. Can I get some guidance on how to do the kernel trick here? Also, I am confused about how the b in the wx + b equation is handled here. Thanks in advance :) Great video btw
Wait, what is lambda? And why is it 1/epoch?
Hey man I don't watch your videos or do the tutorials as much as I wish I did, but I wanted to let you know how amazing of a resource you are to people who are trying to get into machine learning. It's nice knowing I have this to come back to. Keep up the good work, I love and appreciate your content
Siraj....your enthusiasm is infectious. Thank you for sharing your gift of knowledge.
Superb, excellent. You have a gift for clarity.
Loving the math explanations behind machine learning models as well as the code implementations without using many libraries; it also helps in understanding them more deeply. Keep up the Math of Intelligence series, please!
thanks will do
What is lambda? And why is it 1/epoch?
Dude, I hope you're making money with these lessons, because you're one of the best here! Total Success!
Siraj, your Jupyter notebook is straight to the point, so much better than my machine learning prof. Thank you Siraj
There is no regularizer term in SVM; that L2-norm term is the margin that we are maximizing. It's really bad to mislead people.
Great job inserting and weaving in the fundamental concepts of machine learning into the presentation. Helps integrate the concepts, purpose, and reasons behind how these mathematical components fit together and how they translate ultimately into code. Cheers!
Thanks for your generous efforts brother! Not just a great teacher but also inspirational in your enthusiasm. Keep up the great work and fuck the haters!
Exactly the simple model derivation I needed most. Thumbs up from China.
Very precise and straight to the point. I am using SVM to classify Petroleum Emulsions.
Great video @Siraj! I am looking forward to this series. Most courses go very superficially through machine learning/deep learning algorithms and we just end up using APIs! Thanks! Looking forward to the next ones!
These videos are essential watching, not just for programmers, but anyone who cares about intelligence.
thanks!
Hi from Ukraine. This series of videos is so useful for me. Thank you so much.
awesome thx
@siraj, great going bro, your tutorials are just great.
Just to add to it and generalize, I did something like this:
for i,data in enumerate(y):
if (data
@18:35 -> If the regularization term (lambda) is too high, wouldn't the model underfit? If the regularization term is high, it forces the weights to get close to zero. This introduces errors due to high bias.
For anybody who didn't have a good boundary region, try decreasing the learning rate to 0.1 with scale 0.5 and it will work perfectly fine. Love you @Siraj
Thanks Siraj you're such a great teacher! Greetings from Paraguay
Much better than the previous one, appreciate it. As I have requested, please share the exact names of the math topics so we can prepare ourselves before watching your video; then we will understand not just 80% but 100%.
For reference when doing further research: in most of the literature, the weighting constant sits on the 'hinge loss' term and is called C, what this video calls the regularizer is really the margin term, and it is that term on the left that maximizes the margin.
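To spell that out (my own restatement, not a quote from the video): the objective in the video is written as minimize lambda*||w||^2 + sum_i max(0, 1 - yi*(xi.w)), whereas most textbooks and libraries write it as minimize (1/2)*||w||^2 + C * sum_i max(0, 1 - yi*(xi.w + b)). The two forms are equivalent up to scaling, with C playing roughly the role of 1/lambda, so a large C corresponds to a small lambda.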
I love how Siraj got straight to the point without bringing up how he got there with the Lagrange multipliers and whatnot. I always get lost in proofs :D.
Nice work. Just one correction: at 18:35 you said (it is also mentioned in the notebook) that if the regularizer is too large the model will overfit (correction: underfit, which corresponds to high bias / low variance), and that if the regularizer is too small the model will underfit (correction: overfit, which corresponds to low bias / high variance).
One can think of it this way: if λ is too high (close to 1), the model will mostly minimize the L2 term (the margin term) and will have less incentive to minimize the classification error, hence high bias; the reverse happens if λ is too low (close to 0).
that hair of yours needs a support vector machine to keep it up.
HAHAHAHAHAAHAHAH
Can someone explain more about the regularizer and the loss function? How are these used to adjust the weights w?
I will talk more about both terms in detail in the coming weeks, thx
Yes, I'm losing it when w suddenly comes in (?!)
Rudy, I think w is the weights... f(x) is a function that has the weights as coefficients. So my guess is that f takes an input and you could also inject a vector (list) of weights. I think f(x) = w.x, i.e. the sum of wi*xi over the features.
I'm answering this question to try and better understand it myself, so if I've got this wrong please correct me, someone!
Thank you for this great video. My suggestion is to do an exclusive series just on understanding math notation and reading equations. I know you try to explain where possible; however, I feel this topic is important enough to deserve its own series.
Keep doing what you are doing! You are helping me a lot.
There is some error in the code: the SVM does not seem to draw the correct decision boundary.
This is probably because the learning rate is quite high and the regularizer is set to the inverse of the number of epochs.
1. SVMs are great and are preferred only when we have limited data.
2. We should use SVM when we can't afford the time, space, etc. of a complex neural network.
3. The points which are closest to the margin are the support vectors.
4. Our aim is to draw a hyperplane/decision surface with the maximum margin/space separating the differently labelled points; it is placed in the perfect middle spot, with a line in the middle.
5. A point is assigned to a class such that it has the maximum likelihood of falling on the side of the decision boundary where it should.
6. The hyperplane will have n-1 dimensions if there are n dimensions/features.
7. SVM can perform linear and non-linear classification; here we are doing supervised linear classification.
8. ML is about minimizing an objective function; the quantity we optimize is a loss/cost/error function.
9. SVM uses the hinge loss function c(x, y, f(x)) = (1 - y*f(x))+, where x is a sample data point, y is the true label, f(x) is the predicted label, and c is the cost function; it is 0 if y*f(x) >= 1. In that case we don't need the loss update, only the regularizer update: w = w + eta*(-2*lambda*w). Otherwise we update with both the loss and the regularizer: w = w + eta*(yi*xi - 2*lambda*w).
14. regularizer = 1/epoch, hence lambda decreases as the epoch count increases (a code sketch of these update rules follows below).
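A minimal sketch of what those two update rules look like in code (my own toy data and variable names, not necessarily the notebook's):

import numpy as np

# toy points: [x1, x2, bias_input]; the constant -1 column acts as the bias, labels are -1/+1
X = np.array([[-2, 4, -1], [4, 1, -1], [1, 6, -1], [2, 4, -1], [6, 2, -1]], dtype=float)
y = np.array([-1, -1, 1, 1, 1])

def train(X, y, epochs=100000, eta=1.0):
    w = np.zeros(X.shape[1])
    for epoch in range(1, epochs + 1):
        lam = 1.0 / epoch                      # regularizer = 1/epoch (point 14)
        for xi, yi in zip(X, y):
            if yi * np.dot(xi, w) < 1:         # misclassified or inside the margin
                w = w + eta * (yi * xi - 2 * lam * w)
            else:                              # classified correctly with enough margin
                w = w + eta * (-2 * lam * w)
    return w

w = train(X, y)
print(np.sign(X @ w))  # should reproduce y for separable toy data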
Thanks, it's always good to learn a white box approach
This is explained 100x better than by my ML prof. Why do I pay so much money for it :( Looking forward to more videos!
Which college? If you don't mind me asking.
Not at all. University of Bristol, UK. Paying for a piece of paper - IMO.
Isn't Bristol supposed to be good? I'm from America.
Apparently so. The prof is well respected in his field but he seemed very disinterested when teaching his undergraduates.
Same here, I learnt more in a 30-minute video than in a 4-month course...
WOW, thanks for this video, much better than some of your other videos where there's a lot of drama and distraction (rapping etc.)
1> Hey man, I can't get this regularizer concept. Is it 1/epochs always or just in this case, and what other types of regularizers are there?
2>why did we just change weight with gradient if y
I had the same doubt as your point 2.
Did you get an answer to your question? Could you share it?
Is there any video that explains the implementation of a Weighted Loss Support Vector Machine (ALSVM), preferably for regression tasks, or a clear and simple GitHub repo?
This was borderline perfection
When I was learning SVM, the objective was to minimize ||w||^2 as the margin term, not as a regularizer. Since the very intuitive idea of SVM is to find the hyperplane with the largest margin, a small ||w|| is what expresses that mathematically and geometrically. But here you say it is only a regularizer, which I don't find convincing. How does this compare to sklearn's SVM?
Can someone please explain how he plotted the "hyperplane"? What are x2, x3 and X, Y, U, V?
What kind of program is Siraj using for scrolling in the background while presenting at the front?
I use Mac OS X. Is there any program that helps to scroll presentations?
He uses a green background and records himself with a camera at the front plus the screen of his MacBook while presenting; then, when he edits the video, he replaces the green background with the part he recorded on his MacBook.
Siraj, this is really good material. I look forward to continuing watching the rest of your videos!!
Hyperplane explanation was awesome
Just wondering, how did you get that derivative? It looks like you applied the power rule, but it is a vector. Plus, for the weights, wouldn't the update constantly lower the weight values, since for misclassified points y*x would be negative and for correctly classified points the -2*lambda*w term would be negative?
Very good, but Siraj is kind of a gift to humanity.
So great! Keep enlightening us, pls!
Thanks for re-recording it!
Your videos are amazing! Having said that... Some feedback on this one:
- The formulas get overwhelming fairly quickly when you're not used to reading them in that format. I know it's tricky to get all the information into these short videos. Maybe you could make a video about the different notations that pop up everywhere? The brackets that look similar to were completely new to me, for example.
- The jupyter notebook seems to have some errors in it and doesn't run without modifications. I love the notebooks so I can interact with the code and it would be great if it worked right away so I can be the one to break it :)
Seb, it doesn't have any errors; it just works like this. I was also stuck on this initially. You need to run the modified cell and then the final cell, which will give you the output. Alternatively, you can do Run All, which runs all the cells from the start. So keep in mind that the Jupyter run command works on the current cell, not on the whole notebook.
Why does the (1 - y*f(x)) in the loss function become (1 - yi*(xi.w)) in the objective function? Is xi.w supposed to be a dot product, where xi is the feature vector and the dot product with the weights w gives the predicted label? Thanks Siraj
Would you be able to provide the PDF of the document you are working off in the video?
Am I right that in the SVM loss we only care about misclassified points and points which are classified correctly but not very confidently, because y*f(x) is smaller than 1?
Great explanation buddy, just what I was looking for!
Can you include the bias as well in the above code and teach us how to apply the gradient to optimize it?
Hi Siraj, before you started explaining the math, you mentioned digit classification. If SVM is for 2-class classification, how does SVM work for 10 classes (0, 1, 2, ..., 9)?
In the Jupyter notebook code file from GitHub, which is provided in the description box, the entire training for-loop is missing.
just fixed, thanks for telling me
Thank you.
Hey Siraj, what do you think of fuzzy logic? Is it still a good field of research? I tried to look up a good tutorial on YouTube where state-of-the-art approaches are described, but the search was quite unpleasant. Would you do a video on fuzzy logic? I think it would be quite interesting to define hybrid approaches with neural networks.
BTW: Nice work with all of your videos and thanks a lot for working so hard!!!
Siraj, great work. Very interesting. You have made the basic idea of SVM accessible to an educated audience with minimal maths expertise. In my opinion, repeating basic definitions is good, but only up to a point; resist the temptation to do so too often. At 24:45 you didn't say what the update would be if a classification is done correctly: does the weight stay at the same value since the update equals zero? IMHO, there's not much need to show the previous slides in the conclusion of the lecture. BTW,
I think anyone can download audio only and listen to it while driving and perhaps get 75% of the content easily. :)
Not to criticize in any way (every bit of this channel is ridiculously awesome), I genuinely want to know: am I right to think the notation in `w = w + η( -2λw)` is incorrect? The "=" is used as an assignment operator, which is denoted '←' in the last ML book I read. Is it standard?
I added the test cases you used to the input set and trained the model. After checking the graphical representation, I found a few points were misclassified. What should I change in order to classify them correctly? Is this due to the fact that the difference between the two closest opposite-class data points is less than one, so we just can't use the condition wx+b > 1 or wx+b < -1?
Great video, sir. I followed all your code, but why is the graph not showing on my computer? I thought when I typed plt.show() the graph would show, but it didn't. Did I miss something?
I always think of the weights as the feature importance towards the label, so each feature (column) in X has a certain weight towards the prediction of the label (y). Correct me if I am wrong. And for regularization, I always think of it as a way to avoid overfitting or underfitting by penalizing outliers. Remember, the point of the model is to capture the general pattern, not to make wrong assumptions from outliers, which will lead to a bad accuracy score when we test the model. Again, correct me if I am wrong.
Hi Siraj, I'm watching this over and over, trying to keep up with the fast pace. I just wondered: is the regularizer synonymous with the learning rate?
I have chosen different points, and after training and determining the weights, I am able to classify the points correctly.
for i, x in enumerate(X):
    total = np.dot(x, w)
    print(np.sign(total))
output :-
-1.0
-1.0
1.0
1.0
1.0
The SVM seems to be working correctly, but how do I determine the hyperplane? I was unable to understand the last bit of code where @siraj created the hyperplane using the weight coordinates. Once I get the weights w = (w1, w2, w3), where w3 is the bias term, how do I plot the hyperplane?
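For anyone with the same question, here is one rough sketch of how the line can be plotted in 2D (assuming the learned w = (w1, w2, w3) came from inputs whose third column is a constant -1 acting as the bias; if your bias input is +1 instead, flip the sign of w[2]; the notebook may do it differently):

import numpy as np
import matplotlib.pyplot as plt

def plot_boundary(X, y, w):
    # scatter the 2-d points, coloured by label
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr')
    # decision boundary: w1*x1 + w2*x2 - w3 = 0, solved for x2
    x1 = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 50)
    x2 = (w[2] - w[0] * x1) / w[1]
    plt.plot(x1, x2, 'k-')               # the separating line
    plt.plot(x1, x2 + 1 / w[1], 'k--')   # margin where w.x = +1
    plt.plot(x1, x2 - 1 / w[1], 'k--')   # margin where w.x = -1
    plt.show()

# example usage: plot_boundary(X, y, w) with the X, y, w used above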
From this channel I got very concise terms from which I can start digging deeper. But I am confused about some points. Please help me clear these things up.
What I understood so far:
- The main job of SVM is to find a decision line or hyperplane that is maximally far away from both groups of data points.
To achieve the above requirement, the SVM model uses two functions, which I can describe as two components or tools:
- the objective function and
- the loss function
The objective function contains the loss function and the regularizer (the heart of SVM).
Awesome video on SVM, Siraj Raval!!!! I am quite new to the machine learning field, and it is really very helpful in terms of understanding the concept; it answers the questions of what this method is and why to use it.
Can someone explain to me the last part of the code, i.e. plotting the hyperplane from the final weights calculated by gradient descent?
Great lesson, very helpful. Thanks a lot
Very good job Siraj! Although I feel that you didn't mention any details (or the intuition) for maximizing the boundary, the 'LARGE MARGIN' case. It appears in your diagrams but without any solid clue. What's the difference between linear regression and SVM, truly? To me it looks like they do the same thing, even if they use different objective functions: linear separation for non-discrete decisions. I am pretty sure that is not the case.
Can you explain the function of the bias term in the array?
Siraj, thanks a lot for this video and, quite frankly, for all of your videos. I have a question: I see you update the weights after each sample; is that always the case in SVM?
You have such a great energy!!
Can you please briefly explain kernels in SVM?
Still kind of confused about how you plot the SVM model (the line) in the end :(..
Really cool video and simple code :) !! Would also love to see a little more intuition about the loss function. I only have a faint idea about it. You penalise cases where predictions and labels are on opposite sides of the decision boundary?
Hell yeah! Another great video Siraj! U forgot to talk about dat kernel trick doe!
truuu thx
Hey Siraj, the video was awesome and cool, I learned a lot. Why can't you provide a link to the notebook with which you are teaching us!!!
You da man Siraj!
Hey, I'm pretty sure it's the most trivial part, but why do you add minus signs to some of the weights in x2 and x3 before plotting them? I've been messing with the code for ages trying to figure it out and I have no idea :/
Great videos, Thanks man
I have a question about the objective function: how does lambda*||w||^2 + (1 - y*(x.w)), after differentiation, become y*x - 2*lambda*w?
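In case it helps (my own derivation, not from the video): the per-sample objective is J = lambda*||w||^2 + max(0, 1 - y*(x.w)). Its (sub)gradient is dJ/dw = 2*lambda*w when y*(x.w) >= 1, and dJ/dw = 2*lambda*w - y*x otherwise. Gradient descent moves against the gradient, w = w - eta*dJ/dw, which gives w = w + eta*(-2*lambda*w) in the first case and w = w + eta*(y*x - 2*lambda*w) in the second, exactly the two update rules in the video.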
Hi Siraj,
I love your videos and have learnt a lot from them.
I am sure you are high on some sativa in this video :)
Peace from a fellow stoner.
The video was great; am I the only one worried about how to graph a hyperplane? I understand everything except the hyperplane-plotting code. Can I have an explanation or some resource I can learn from?
Can anyone tell me what "k" is in the gradient descent equation?
Really enjoying your videos, but you have the regularizer backwards in the video. Shouldn't it be too small -> overfit / too large -> underfit? If so, you might want to put up a note. Keep up the good work!
"Tongue color... where did that come from?" ... Salute Your Shorts
Tongue color is actually a diagnostic tool used in Traditional Chinese Medicine:
A pale tongue body indicates Deficient Xue, Qi, or Yang, or Excess Cold. An overly red tongue body indicates Excess Heat. A purple tongue indicates that Qi and/or Xue are not moving harmoniously and are Stagnant. Pale purple means the Stagnation is related to Cold.
In the video at 18:35 you were talking about the regularizer term lambda: larger values of lambda will underfit the model and smaller values may overfit it.
The parameter C used in the standard implementations is 1/lambda, so if C is large it will overfit and if C is small it may underfit.
Was that a mistake in the video or is it a mistake in my understanding?
Thanks for the video.
New mic? Awesome!
Tongue colour? Ed, Edd and Eddy nostalgia right there :P
Total Amazeballs. This is so much better explained than any text book I have read. Thanks Siraj!
Daaaahude. Great video. But wow you talk fast. I dig it, I have to rewind a bunch if I look away for even a moment
Why does this boy always have the coolest shirts!
Do you have any material about LS-SVM?
Siraj, I tried this for a case where X has multiple attributes, and I cannot plot the SVM for that since there are more attributes against the y variable. How do I generate a plot in this case?
Great video man!
Wow, great and appreciated work. The other video was good, but it's true that it was below the standards of this channel haha. Keep pushing!
thanks Alberto!
Awesome 👍👍👍
Great Explanation!
Hello...nice video, do u work on satellite imagery data...
Is there a printable version of the documentation?
Something isn't clear to me. So the matrix is a collection of vectors (which are columns), the vectors themselves being represented by the coordinates of their endpoints in n dimensions (the starting point of each vector being the origin). What did I get wrong?
Siraj Raval, can you suggest some books to follow along with your videos, please?