The Stochastic Gradient Descent Algorithm

  • Published: 8 Nov 2024

Comments • 14

  • @TKR911 3 years ago +1

    Beautiful explanation, simply beautiful !

  • @schmijo 3 years ago +1

    Very well explained,
    and the production quality is out of this world.
    Thank you sir 🙏🏼

  • @LarryPanozzo 3 years ago +1

    GREAT channel! Subscribing!
    Currently struggling through global multivariate nonlinear optimization of a fortunately-differentiable nonconvex, high-dimensional scalar function.

  • @vladimir7533 3 years ago

    Appreciate your very clear explanation!

  • @scienceknight5122 5 months ago

    ty

  • @notgabby604 1 year ago

    Stochastic gradient descent is magical thinking that actually works. Sometimes you get lucky.

  • @ChristopherLum 1 year ago

    Hi Professor Kutz, do you still have a bounty on errata? I think your going rate used to be around $0.25 per instance 😊. I believe there might be a minor erratum around timestamp 7:23. At that point you state that f is the map from input to output (effectively the forward propagation through the network), but in the context of gradient descent you actually want to use the gradient of the error function that you mentioned on the previous slide (around timestamp 6:55).
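    (In case it helps to see the distinction in code, here is a minimal sketch with made-up names f, grad_E, w, eta, x, y, not anything from the video: the descent step uses the gradient of the error E, not the forward map f itself.)

    # Gradient descent steps on the error E(w) = (f(w, x) - y)**2, not on f.
    def f(w, x):
        # toy "network": a single weight mapping input to output
        return w * x

    def grad_E(w, x, y):
        # gradient of the squared error with respect to the weight w
        return 2.0 * (f(w, x) - y) * x

    w, eta = 0.0, 0.1            # initial weight and learning rate
    x, y = 2.0, 6.0              # one training pair; the weight that fits it is 3
    for _ in range(50):
        w -= eta * grad_E(w, x, y)   # the update uses grad E, not f
    print(w)                     # converges to roughly 3.0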

  • @SphereofTime 11 months ago

    19:00

  • @posa5225 4 years ago +1

    Can you explain what the "weights" are to a non-computer science person? (There's so much jargon, my brain)

    • @Nissearne12 4 years ago +1

      Weights are all the connections between the neurons (the neurons are some sort of nonlinear functions). So the weights are the memory of the network, and they are what has to be trained. The weights are initialized with random values before training starts; it's important that they are not set symmetrically before training begins.

    • @BruinChang 4 years ago +7

      "weights" can be considered as coefficients "a" and "b" in the following equation:
      y = a*x + b.
      Weights have their such specific "positions" that determine the mathematical structure of a computational model they construct.
      For example, in the above equation, "a" represents the linear term (slope), and "b" indicates the constant (intersection of y axis): "a" occupies the position of linear term, and "b" occupies the position of constant. The tuple (a, b) does not indicate a linear model until we set the specific positions to "a" and "b" mentioned above.
      "Weights" in a neural network model are something like this, but in a much higher dimensional space. Hope this helps.

    • @jordanzhen7174 4 years ago +2

      Think of the attributes of a house: # of bedrooms, color of the door, # of washrooms, # of floors (plus whatever else you want to add, like # of windows). Each attribute gets a weight that says how much it affects the house price: # of bedrooms may get a higher weighting, and thus a higher coefficient value, while the color of the door may get a lower weighting. Together the weights make up a formula into which you can plug the attributes of your own house to estimate its price.

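      To make the replies above concrete, here is a minimal sketch of a single network layer (in Python with NumPy; the layer sizes, the tanh nonlinearity, and the names W, b, layer are illustrative assumptions, not anything from the video). The entries of W and b are the weights: they sit in fixed positions in the formula, and they are initialized randomly rather than symmetrically.

      # One dense layer: y = W x + b followed by a nonlinearity.
      # The entries of W and b are the "weights" described in the replies above.
      import numpy as np

      rng = np.random.default_rng(0)
      n_in, n_out = 3, 2                              # e.g. 3 input features, 2 neurons
      W = rng.normal(scale=0.1, size=(n_out, n_in))   # random (non-symmetric) initialization
      b = np.zeros(n_out)                             # biases can start at zero

      def layer(x):
          # forward pass of a single layer: affine map, then a nonlinearity
          return np.tanh(W @ x + b)

      x = np.array([1.0, 0.5, -2.0])                  # one input example
      print(layer(x))                                 # outputs of the two neurons

      Training would then adjust the entries of W and b (by gradient descent) so that the layer's outputs match the targets.
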
  • @lit22006 3 years ago +1

    Are you related to Matthew McConaughey?!

  • @pnachtwey 5 months ago

    What counts as a lot of weights? The problem I have had is that it is hard to find the point where the derivatives with respect to all of the parameters are minimized at once. Usually there are a parameter or two whose direction would significantly increase the sum of squared errors or the mean squared error, so the step can't be big, and one of the many parameters keeps the parameter set from moving toward a minimum. In other words, the valley the parameter set is trying to walk down is very narrow. I have lots of real data, and GD always has problems optimizing that data. There are many algorithms that are much better when the number of parameters is less than about 25. The professor's visual example is fine for teaching, but it is too simple and easy.
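
    A minimal sketch of the narrow-valley behavior described above (the quadratic, its conditioning, and the step size are illustrative assumptions, not the commenter's data): the steep direction caps the usable step size, so plain gradient descent crawls along the shallow direction.

    # Ill-conditioned quadratic E(w) = 0.5*(w1**2 + 1000*w2**2): a narrow valley.
    # The steep w2 direction limits the step size, so progress along w1 is slow.
    import numpy as np

    def grad_E(w):
        return np.array([w[0], 1000.0 * w[1]])

    w = np.array([10.0, 1.0])
    eta = 0.0019                    # just below the stability limit 2/1000
    for _ in range(200):
        w = w - eta * grad_E(w)
    print(w)                        # w2 is essentially 0, but w1 is still far from the minimum at 0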