Stochastic Gradient Descent | Why and How it Works?
- Published: 19 Oct 2024
- This video covers all the conceptual details of stochastic gradient descent and mini-batch gradient descent, with a Python implementation! I know it's a longer video than usual, but I think you'll enjoy the learning.
#machinelearning #stochasticgradientdescent #python
For more videos please subscribe -
bit.ly/normaliz...
Support me if you can ❤️
www.paypal.com...
www.buymeacoff...
Notebook link -
github.com/Suj...
Medium article by me -
towardsdatasci...
Playlist ML Algos from Scratch -
• How to Implement Gradi...
Prev video on activation functions -
• Why Activation Functio...
Facebook -
/ nerdywits
Instagram -
/ normalizednerd
Twitter -
/ normalized_nerd
Really appreciate your videos
Appreciate the intuitive example of the mobile app
In logistic regression, SGD is quite different from linear regression.
In linear regression, it takes only a single random data point per update until it has looped through the whole dataset.
Hi, I am wondering what drawing pad device is used for this lecture. Thank you,
Nice explanation…!!! Thanks for sharing.
For real-world datasets, which one do we usually prefer for optimization, GD or SGD? Also, do we have to implement this manually, or should we use scikit-learn?
Thanks,
For large real-world datasets, we use mini-batch GD most of the time. Nearly every neural-network framework (Keras, TF, Torch, etc.) follows this technique, and you just need to choose the batch size. In scikit-learn, there is a class called SGDClassifier. If you don't find a mini-batch version of the algorithm you're using, you can write it manually.
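To make the mini-batch idea concrete, here is a minimal sketch of mini-batch gradient descent for linear regression in NumPy. The function name, learning rate, and toy data are my own illustrative choices, not from the video or notebook:

```python
import numpy as np

def minibatch_gd(X, y, lr=0.05, batch_size=32, epochs=200, seed=0):
    """Mini-batch GD for linear regression with squared-error loss (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        idx = rng.permutation(n)              # shuffle once per epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            err = Xb @ w + b - yb             # residuals on this mini-batch only
            w -= lr * (2 / len(batch)) * (Xb.T @ err)
            b -= lr * (2 / len(batch)) * err.sum()
    return w, b

# Toy noiseless data: y = 3*x1 - 2*x2 + 1
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = X @ np.array([3.0, -2.0]) + 1.0
w, b = minibatch_gd(X, y)
```

With batch_size=1 this degenerates to true SGD, and with batch_size=n it becomes plain (full-batch) GD, which is why most frameworks expose only the batch size as the knob.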
Assume linear regression with two training data points, [X11, X12] and [X21, X22], with outputs Y1 and Y2. The cost function
for plain vanilla GD would be (Y1 - (X11*W1 + X12*W2))^2 + (Y2 - (X21*W1 + X22*W2))^2.
For SGD: will the weights change for every datapoint?
Yes...for a true SGD, the weights should be updated after every data point. But generally, we update the weights after computing the cost function for a batch of data points.
And one more thing, Y1 & Y2 are not the outputs. They are ground truths.
X11*W1 + X12*W2 is the output.
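The exchange above can be sketched in a few lines: true SGD updates W1 and W2 after each data point, using the squared error of that point alone rather than the summed cost. The concrete numbers below (two data points and ground truths generated from W = [2, 1]) are my own illustrative choice:

```python
import numpy as np

X = np.array([[1.0, 2.0],    # [X11, X12]
              [3.0, 1.0]])   # [X21, X22]
Y = np.array([4.0, 7.0])     # ground truths Y1, Y2 (here consistent with W = [2, 1])

W = np.zeros(2)              # [W1, W2]
lr = 0.05
for epoch in range(200):
    for i in range(len(X)):              # true SGD: one update PER data point
        pred = X[i] @ W                  # model output X_i1*W1 + X_i2*W2
        grad = 2 * (pred - Y[i]) * X[i]  # gradient of (Y_i - pred)^2 w.r.t. W
        W -= lr * grad                   # weights change before seeing the next point
```

Contrast this with vanilla GD, which would sum the gradients of both points' errors and apply a single update per pass.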
Thanks for the video bro 👌👍
Good one mate!
Thank you!