Stochastic Gradient Descent | Why and How it Works?

  • Published: 19 Oct 2024
  • This video covers all the conceptual details of stochastic gradient descent and mini-batch gradient descent, with a Python implementation! I know it's a longer video than usual, but I think you'll enjoy the learning.
    #machinelearning #stochasticgradientdescent #python
    For more videos please subscribe -
    bit.ly/normaliz...
    Support me if you can ❤️
    www.paypal.com...
    www.buymeacoff...
    Notebook link -
    github.com/Suj...
    Medium article by me -
    towardsdatasci...
    Playlist ML Algos from Scratch -
    • How to Implement Gradi...
    Prev video on activation functions -
    • Why Activation Functio...
    Facebook -
    / nerdywits
    Instagram -
    / normalizednerd
    Twitter -
    / normalized_nerd

Comments • 12

  • @soyatoya2802
    @soyatoya2802 9 months ago

    Really appreciate your videos

  • @suryavaraprasadalla8511
    @suryavaraprasadalla8511 2 years ago

    Appreciate the intuitive example of the mobile app

  • @ccuuttww
    @ccuuttww 4 years ago

    In logistic regression, SGD is quite different from linear regression.
    For linear regression, it takes only a single random data point to train on at a time, until it has looped through the whole dataset.

  • @jakechoi2359
    @jakechoi2359 3 years ago

    Hi, I am wondering what drawing pad device was used for this lecture. Thank you!

  • @NeeRaja_Sweet_Home
    @NeeRaja_Sweet_Home 4 years ago +1

    Nice explanation…!!! Thanks for sharing.
    For real-world datasets, which one do we usually prefer for optimization, GD or SGD? Also, do we have to implement it manually, or should we use scikit-learn libraries?
    Thanks,

    • @NormalizedNerd
      @NormalizedNerd  4 years ago

      For a large real-world dataset, most of the time we use mini-batch GD. Nearly every neural-network implementation (Keras, TF, PyTorch, etc.) follows this technique, and you just need to choose the batch size. In scikit-learn, there is a class called SGDClassifier. If you don't find a mini-batch version of the algorithm you're using, then you can write it manually.
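The "write it manually" case mentioned in the reply can be sketched as follows. This is a minimal NumPy illustration of mini-batch gradient descent for linear regression, not the video's code; the data, learning rate, and batch size are illustrative choices.

```python
import numpy as np

# Synthetic linear-regression data (illustrative, not from the video).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w + rng.normal(scale=0.1, size=200)

w = np.zeros(2)
lr, batch_size, epochs = 0.1, 32, 50

for _ in range(epochs):
    idx = rng.permutation(len(X))              # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        # Gradient of the mean squared error over this mini-batch only.
        grad = -2 * Xb.T @ (yb - Xb @ w) / len(batch)
        w -= lr * grad                         # one update per mini-batch
```

After training, `w` should be close to `true_w`; setting `batch_size = 1` would turn this into true SGD, and `batch_size = len(X)` into plain batch GD.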

  • @praveenk302
    @praveenk302 4 years ago

    Assume linear regression with two training data points, say [X11, X12] and [X21, X22], with outputs Y1 and Y2.
    For plain-vanilla GD, the cost function would be (Y1 - (X11*W1 + X12*W2))^2 + (Y2 - (X21*W1 + X22*W2))^2.
    For SGD: will the weights change for every data point?

    • @NormalizedNerd
      @NormalizedNerd  4 years ago +1

      Yes...for a true SGD, the weights should be updated after every data point. But generally, we update the weights after computing the cost function for a batch of data points.
      And one more thing, Y1 & Y2 are not the outputs. They are ground truths.
      X11*W1 + X12*W2 is the output.
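The per-data-point update described in this reply can be sketched in a few lines. The two data points and ground truths below are illustrative values chosen so the system has an exact solution; the update rule is the gradient of the single-point squared error (Yi - Xi·W)^2.

```python
import numpy as np

# True SGD: weights are updated after every single data point.
# Rows of X are [X11, X12] and [X21, X22]; Y holds the ground truths Y1, Y2.
X = np.array([[1.0, 2.0],
              [3.0, 1.0]])
Y = np.array([5.0, 5.0])   # illustrative ground truths (exact solution: W = [1, 2])
w = np.zeros(2)
lr = 0.05

for epoch in range(200):
    for xi, yi in zip(X, Y):
        pred = xi @ w                  # model output Xi1*W1 + Xi2*W2
        grad = -2 * (yi - pred) * xi   # gradient of (yi - pred)^2 w.r.t. w
        w -= lr * grad                 # update immediately, per data point
```

Here each inner-loop iteration changes `w`, which is exactly the contrast with batch GD, where the two squared-error terms would be summed before a single update.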

  • @vashistnarayansingh5995
    @vashistnarayansingh5995 5 years ago

    Thanks for the video bro 👌👍

  • @think__tech
    @think__tech 5 years ago

    Good one mate!

  • @karthikreddy9098
    @karthikreddy9098 4 years ago +1

    SPEED = 0.75