This is the amount of enthusiasm I need from my professor.
Keep up the good work, sir!
You are a really great teacher. Watching you, we feel like you're rediscovering what you already know along with us! I think it is the perfect way to teach people!
Got stuck on gradient descent in the Andrew Ng Coursera course, so as always, I'm back here for more digestible explanations. Love your teaching style!
Excellent. Love the way you present - enthusiastic, excited, but totally at ease.
Man, this series with both the whiteboard and coding together is really the best on YouTube, congrats!
You single-handedly made me go into CS. Thank you for your inspiration.
Hey, I am watching your channel for the first time and I am amazed at how well you explain things! I am a teacher myself and I find you very inspiring!
This is the most intuitive explanation of linear regression. Thank you sir!
Keep up the good work. Your teaching is the best, especially when it comes to complicated topics.
Great videos Daniel! Thank you! I started an AI course at college this semester (it's almost over now), and this helped me consolidate what I was studying. Keep it up!
This was a great visual representation of SGD, thank you!
It's incredible when you display the error and guess values. My next try is to make a learning rate that changes depending on the digits after the decimal point. This tutorial is awesome!!
How awesome is this explanation! Theory + programming is the way to go, Coding Train.
Really awesome video! Thank you for making machine learning and math so much fun!!
Dude thank you so much for the intuition! many ppl don't bother going through that
Thank you for making these! Very informative!
You're welcome!
I agree ;)
Dan, I love how you get so excited to explain things... so much to say! 😅 Super cute, plus so informative. I'm glad I found this channel.
2:35 spoiler for Avengers: Infinity War
Thank you Dan. Really you made this topic so easy to understand. Keep up the good work.
Thank you for this. I was taking a Coursera course on machine learning and got stuck on week one (incredibly frustrating!!) because half the math instructions didn't make sense. I had no idea it was so simple! I just passed week one. Thank you.
You are a really incredibly awesome teacher, Sir!!!!... there are no words to say...
This channel is really an amazing place to learn advanced programming algorithms.
thank you for the videos Mr shiffman.
Thank you!
Thank you so much!
I've been wanting to go over statistics to start diving into ML, and you've just made my day!
I'm so glad to hear, thank you!
You are hilarious, man! Best teacher on YouTube for machine learning.
Great videos Dan, keep up the good work. The code really helps with getting a handle on the theory.
That's great to hear.
The snap was cool... but we saw the truth in the livestream lol 😁
I must say I like the way you teach. You're a nice man, God bless.
thank you for showing me how to implement multivariable calculus in programming!
Hi, I love your videos...I think they are amazing! I'm Italian and don't understand many words😕 you are great!
Thank you! I need to get more language subtitles!
Dan is wearing a funky t-shirt! looks good!
Hi Dan, I really enjoy your videos. I'm a self-taught programmer and your videos give a really good insight into different kinds of algorithms.
Maybe nice to know... I'm actually a railtrack (P-Way) engineer and we use, for example, the least squares method quite a lot.
Keep up the great work!
Ps. If you're interested in some actual train datasets (from the Dutch Rail Network), leave a message.
Oh yes, that could be good!
You need separate learning rates for m and b. Then set the learning rate for b higher than the one for m, so the line moves up and down faster but rotates more slowly.
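For anyone wanting to try this suggestion, here's a minimal sketch (the variable names, rate values, and data are my own, not from the video):

```python
# Sketch of the suggestion above: independent learning rates for the
# slope (m) and the intercept (b). lr_b > lr_m makes the line shift
# up and down quickly while rotating more slowly.
def step(m, b, points, lr_m=0.01, lr_b=0.05):
    for x, y in points:
        error = y - (m * x + b)   # signed error for this point
        m += lr_m * error * x     # slope update (slower)
        b += lr_b * error         # intercept update (faster)
    return m, b

points = [(0.1, 0.2), (0.5, 0.55), (0.9, 0.85)]
m, b = 0.0, 0.0
for _ in range(1000):
    m, b = step(m, b, points)
```

After enough passes, m and b settle near the least-squares fit for these points.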
That is an awesome use of DOM man!
Sooo impressed by the white board being magically erased! I watched the live stream and thought it would be a total disaster; well, I'm beyond impressed - some fine editing there! :) Loving the ML series so far Dan.
Wow, machine learning! You gave us an understanding of how these algorithms work by writing them line by line, without a package like TensorFlow. Wow, thank you!
Awesome, cool! What a teaching style, I really love it. You made my day by helping me understand linear regression with a simple story. Really love you, man!
Hey Dan, thank you so much for making all these videos (: You're amazing!
Are you going over gradient descent because it's used by the back propagation algorithms for neural networks? Because I can't wait to watch you do stuff with NN's.
That's right!
great explanation
Shiffman is always nice man. Love you Guru !
So my guess on an explanation on these lines:
m = m + (error * x) * learning_rate;
b = b + (error) * learning_rate;
First line: think about the question "when I change m, how does that affect y?".
This is what calculus is used for, more specifically differentiation. The answer to the question is written in math as dy/dm, if our line expression is defined as: y = m * x + b.
dy/dm = D(m * x + b, m) = x. This is why the error should be multiplied by x.
For the second line same thing! Change of y when changing b?
dy/db = D(m * x + b, b) = 1. We could multiply error by 1, or leave it out as Shiffman did.
What does the D function do? It differentiates the expression with regards to the second parameter passed. To calculate this you can either use a calculator, use a lookup table of rules or derive the answer yourself following the proof.
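A quick numeric sanity check of the derivatives above (a throwaway sketch; the values are picked arbitrarily):

```python
# Numerically check the claim above: for y = m * x + b,
# dy/dm = x and dy/db = 1, at any particular x.
def y(m, b, x):
    return m * x + b

m, b, x = 2.0, 3.0, 5.0
h = 1e-6  # small step for a finite-difference approximation

dy_dm = (y(m + h, b, x) - y(m, b, x)) / h   # ≈ x, i.e. 5.0
dy_db = (y(m, b + h, x) - y(m, b, x)) / h   # ≈ 1.0
```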
I don't think this is quite right? Shouldn't you divide by x? Let's say your error is 1, so you want to change y by 1. If you change m by 1, you'll get a change of x out of that; if you change m by 1/x, you'll get the change of 1 that you want. Or maybe written out...
e1 = y - m1 * x - b1
e2 = y - (m1 + m_change) * x - b1
if you want e2 to be 0, then you get
0 = y - m1 * x - m_change * x - b1
= y - m1 * x - b1 - m_change * x
= e1 - m_change * x
m_change = e1 / x
I'm not sure I'm following you. But if, for one of the datapoints, the error is 1, you want to adjust the parameters (m and b) a small amount (learning_rate), weighted by error, so that for all your datapoints you get closer to a best fit.
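For the record, the two update lines being discussed do converge when run in a loop; a minimal sketch (the data and learning rate are my own choices):

```python
# The two update rules from the thread above, applied repeatedly.
# Each pass nudges m and b a little in the direction that reduces
# the error, so dividing by x is not needed.
points = [(0.0, 0.1), (0.5, 0.4), (1.0, 0.9)]
m, b = 0.0, 0.0
learning_rate = 0.1

for _ in range(500):
    for x, y in points:
        guess = m * x + b
        error = y - guess
        m = m + (error * x) * learning_rate
        b = b + error * learning_rate
```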
you are like my coding guru
lol thanks so much mr dan for your help!
Great videos! You are good at making videos by just being yourself and explaining in the best way possible. :))
Velocity in this example doesn't mean speed, but instead means heading?
All videos by you are rocking
you are a good man. thank u
Really superb explanation of gradient descent. Is there any book you would recommend for machine learning?
wonderful. nobody can teach better than you.
Thank you so much!
you're the boss. Very good explanation, loved it!
Very helpful
Hi Dan, great video. I have watched most of your videos and I would be glad if you could make a video about addEventListener and its advantages and disadvantages over onclick, onblur, onmouseover... thank you in advance.
Hey Dan! I really like your videos, but sometimes you seem so lonely in that studio. :D
Wouldn't be something like a co-op coding challenge awesome?
Hah, love this idea!
great stuff Dan, this stuff is invaluable for anyone starting out in ML. top stuff.
can you host me?
For a more complete and in-depth discussion of linear regression with gradient descent, check out Professor Andrew Ng of Stanford's series of machine learning videos: ruclips.net/video/PPLop4L2eGk/видео.html
Make a video on lasso regression without a library, as you did for linear regression.
You're really amazing! Thank you so much. Really enjoyed the way you explain things.
Nice tutorial channel!
Thank you for your awesome and easy to understand explanations! :) But I have a question regarding the code from 18:08
Why can we see the line moving instead of it just appearing in its final position? As far as I can see in the code, the drawline() method is called after the gradientDescent() method. What am I missing here?
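My guess at what's going on (I'm inferring the sketch's structure here, not quoting it): in p5.js, draw() runs once per frame, and each call does just one small gradient step before drawing, so every frame shows a slightly better line. A rough Python analogue of that frame loop:

```python
# Rough analogue of a p5.js draw() loop: one small gradient-descent
# pass per "frame", then draw. The line appears to move because each
# frame draws an intermediate m and b, not the final answer.
points = [(0.1, 0.2), (0.9, 0.8)]
m, b, learning_rate = 0.0, 0.0, 0.1

snapshots = []           # stand-in for what each frame would draw
for frame in range(50):  # p5.js would call this ~60 times a second
    for x, y in points:              # gradientDescent(): one pass
        error = y - (m * x + b)
        m += error * x * learning_rate
        b += error * learning_rate
    snapshots.append((m, b))         # drawLine(): uses current m, b
```

Each snapshot differs slightly from the previous one; that gradual change is the animation you see.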
Very interesting !
It worked yay haha. I was waiting for it. I was watching at the time though.
GREAT Video .. Thanks a Lot
Had a small doubt: shouldn't the change in slope be error/x instead of error*x, since slope is rise/run?
That "come back to me" ... hahahahaha
What you're describing here is effectively a Kalman filter?
Great videos! :D You are the best!
Do you recommend going with "Intelligence and Learning" sessions after p5.js introduction for someone who wants to get into Machine learning?
The concepts in this are very similar to the perceptron model
Hello, there is a translation of your description and your title into French. I live in France and I can't disable this; how can I do that, please?
Awesome
I love this video, great, thanks! Nice logo on your shirt.
Maybe it would be cool if you made an AI for a simple game like noughts and crosses with a minimax algorithm
Would you have two separate learning rates for m and b? Seems like weighting the slope change higher could be beneficial.
So the so-called steer is the delta of the weights, i.e. the change of the weights in each iteration/epoch?
Well explained. It would be nice to see the code. Can't find it on GitHub.
github.com/CodingTrain/website/tree/master/Courses/intelligence_learning/session3
(Need to figure out a way for things to be more findable!)
Would it be possible to apply a PID control scheme to the learning rate, so it would accelerate the learning process?
Your snap has inspired Thanos :D
Hey! Great video! But... how is it possible that the line adjusts itself, according to the code?
Hey, nice video. Could you explain why you normalize the values between 0 and 1 and what it does? I tried not normalizing them and got some really wacky results with gradient descent, even though it worked fine with the ordinary least squares method. Do you know why that happens?
Julian atlasovich but it didn't work without normalization
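If it helps, here's a sketch of the usual fix (the data values are invented): with raw pixel-scale values, error * x can be huge, so a fixed learning rate overshoots and the parameters blow up. Dividing by the maximum puts everything in [0, 1] first, which keeps the steps stable.

```python
# Why normalizing helps: with raw values in the hundreds, each
# update (error * x * learning_rate) is enormous and the parameters
# diverge. Scaling x and y into [0, 1] keeps updates small.
raw = [(100, 120), (250, 260), (400, 390)]
xmax = max(x for x, _ in raw)
ymax = max(y for _, y in raw)
points = [(x / xmax, y / ymax) for x, y in raw]  # now in [0, 1]

m, b, learning_rate = 0.0, 0.0, 0.1
for _ in range(500):
    for x, y in points:
        error = y - (m * x + b)
        m += error * x * learning_rate
        b += error * learning_rate
```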
So the steer on the graph would be the vertical line between Yguess and Yactual, i.e. their difference?
Would you please elaborate on the implementation of the gradient descent algorithm using vectorization in Python?
Our Coding Train Discord is a great place to get help with coding questions! discord.gg/hPuGy2g
- The Coding Train Team
Really rookie right now... Gotta progress fast!
PID ? As always thx Dan...
cool video
Why do you multiply error * x by the learning_rate?
Can someone explain why this is correct?
m = m + (error * x) * learning_rate;
I mean, how is it dimensionally correct? Shouldn't error be divided by x, so that m is added to something that has the units of m?
I agree with you, I feel confused at this part as well.
yep I do not understand either.
oh it is in the next video.
The derivative from first principles allows for that: the true gradient is 2 * error * x, but the constant factor (2, or any small scaling) just gets absorbed into the learning rate.
But once again I don't understand: how does the NN know what the desired output is?
You are calculating the loss function based on a desired output that you explicitly write into the system?
If you explicitly write out the numbers, you are more or less telling the neural network what to do;
isn't the whole concept of a neural network to find the way by itself?
Apologies for not making this clear. The technique I'm applying is called "supervised learning" where you have a set of training data with known outputs! The neural network learns how to reproduce the correct results with the known outputs so that it can (hopefully) produce the correct results also with data that doesn't have the answers paired with it. I think I cover this more in my 10.x neural network series.
Got it!
Thank you for your time and determination, see you in the next episode (:
Where is the code? I am not finding it on GitHub.
I would love to see the snapping of the fingers live :pp
Hey, thanks for the awesome video. I don't understand, though: why not calculate the correct line directly?
The reason is that he is not simply writing a program that finds the correct line. He is specifically writing this program in a way that implements and showcases the idea of gradient descent, which backpropagation builds on. Calculating the line directly would be the most efficient way to write this program, but that's not the point of the video. There will be instances with much higher-dimensional data where gradient descent is much more efficient than doing what you suggest, such as in a neural network.
look at 19:14 for his explanation
makes sense Thanks.
But can you give examples or a reference on why I would need this learning process?
0:05 Hahahah my complete life in 1 question
I think you are awesome 😊😊
what is the best book for machine learning?
Is the cost function in this video mean squared error?
Brother, do you have a Slack channel or Discord?
Dear sir, if you could give any suggestion for understanding the formula "DELTA_m = error * x", I would be very grateful.
So, in a way, anyone who's made a game with some basic NPC characters has dealt with gradient descent. Example: trying to get the spaceship to turn and chase the player.
I had previously heard it described as a Minecraft player walking downhill to find treasure.
2:41 The Thanos of whiteboard writing.
I tried to build a gradient descent algorithm from scratch. Why isn't mine working? Here's my code:
for i in range(4):
    ypred = m * x + b
    error = (ypred - y) ** 2
    m = m - (0.001 * error)
    b = b - (0.001 * error)
    m = m.sum()
    b = b.sum()
# My 'm' and 'b' values decrease infinitely
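A likely culprit (my reading of the snippet above, assuming x and y are NumPy arrays): `error` there is the *squared* error, which is never negative, so both updates can only ever subtract, and m and b fall forever. The update needs the gradient of the squared error, which keeps the sign of (ypred - y). A corrected sketch with made-up data:

```python
import numpy as np

# Corrected sketch: use the gradient of the squared error, not the
# squared error itself. d/dm (ypred - y)^2 = 2 * (ypred - y) * x and
# d/db (ypred - y)^2 = 2 * (ypred - y); these keep their sign, so
# m and b can move up as well as down.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])   # true line: y = 2x + 1
m, b = 0.0, 0.0

for i in range(5000):
    ypred = m * x + b
    grad = ypred - y                  # signed error, not squared
    m = m - 0.01 * (2 * grad * x).sum()
    b = b - 0.01 * (2 * grad).sum()
```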
I don't get the point of why we should switch to gradient descent. If you think about multidimensional models, your linear regression looks like
Y = X*b + e
where Y is a vector, X is a design matrix, b is some vector, and e ('error') is some random vector with mean(e) = 0 and covariance matrix A.
Now if A is invertible and X fulfills nice enough conditions, then there exists a least-squares estimator for b, and hence we would get the line which fits the data best.
So is the reason we do gradient descent that computing inverse matrices is pretty shitty? Or what's the point?
P.S. YouTube comments should support LaTeX :D
I think this is a stepping stone to non-linear optimization. It makes the example simple to just apply it to linear regression.
Yes, I see what he is going to do with that. But his argument for why we would do it was a little bit sloppy.
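For anyone curious, the two routes are easy to compare on a toy problem (the data values are my own, and this assumes NumPy is available): the closed-form least-squares solution and gradient descent land on the same line. Gradient descent earns its keep when the design matrix is huge or the model is non-linear, where forming and inverting X^T X is impractical.

```python
import numpy as np

# Closed-form least squares vs. gradient descent on the same data.
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = np.array([0.1, 0.3, 0.45, 0.8, 0.9])

# Closed form: solve X @ [m, b] = y in the least-squares sense.
X = np.column_stack([x, np.ones_like(x)])
(m_cf, b_cf), *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient descent on the same squared-error loss.
m, b = 0.0, 0.0
for _ in range(20000):
    grad = (m * x + b) - y
    m -= 0.01 * (2 * grad * x).sum()
    b -= 0.01 * (2 * grad).sum()
# m, b now agree with m_cf, b_cf
```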
You did the snap even before Thanos did =D
Why did you say x = data[i]*x and y = data[i]*y at 12:20?
I'm sure this video could be condensed without losing any real info. I don't have the patience to see it through even halfway.
How old are you, and how old were you when you started programming?