23. Accelerating Gradient Descent (Use Momentum)

  • Published: 4 Nov 2024

Comments • 48

  • @gigik64
    @gigik64 5 years ago +39

    Jesus man, I remember back before I started college when I checked out Prof Strang’s calculus series.
    He’s aged quite a lot since that series, but he’s still sharp as a tack. I’m astonished that at his age he knows so much about machine learning; I didn’t think it was his field.
    Huge kudos, Gilbert Strang, huge kudos.

    • @marsag3118
      @marsag3118 3 years ago +5

      impressive indeed. I'd be happy to be 50% as sharp at that age as he was here.

  • @franzdoe5558
    @franzdoe5558 4 years ago +6

    Such a great lecturer, just as in his classic Linear Algebra lecture series. Really nice to see him up and healthy, sharp and as great a step-by-step explainer as ever.

  • @georgesadler7830
    @georgesadler7830 3 years ago +3

    Professor Strang, thank you for an old-fashioned lecture on Accelerating Gradient Descent.
    These topics are very theoretical for the average student.

  • @dengdengkenya
    @dengdengkenya 5 years ago +20

    Why are there no more comments for such a great course? MIT is a great university!

  • @nguyenbaodung1603
    @nguyenbaodung1603 3 years ago +1

    I'm so happy to see you here. I only trust you when it comes to lectures.

  • @marjavanderwind4251
    @marjavanderwind4251 4 years ago +4

    Wow, this old man is so smart. I wish I could see more lectures from him and learn much more of this stuff.

    • @yefetbentili128
      @yefetbentili128 4 years ago +1

      Absolutely! This man is a pure treasure.

    • @mdrasel-gh5yf
      @mdrasel-gh5yf 4 years ago +1

      Check out his linear algebra course, this is one of the most liked playlists of MIT.
      ruclips.net/video/7UJ4CFRGd-U/видео.html

  • @Arin177
    @Arin177 1 year ago

    Those who have the sixth edition of Introduction to Linear Algebra can enjoy this course! In my view this course really increases the value of the book.

  • @honprarules
    @honprarules 4 years ago +3

    He radiates knowledge. Love the content!

  • @yubai6549
    @yubai6549 4 years ago +2

    Wishing the old professor good health. Thank you very much!

  • @MsVanessasimoes
    @MsVanessasimoes 3 years ago +1

    I loved this amazing lecture. Great professor, and great content. Thanks for sharing it openly on YouTube.

  • @何浩源-r2y
    @何浩源-r2y 5 years ago +1

    Prof Boyd is also a very good teacher!
    I enjoy his lectures very much.

  • @casual_dancer
    @casual_dancer 1 year ago

    Finally a lecture that explains the magic numbers in momentum! Those shorter video formats are great for an introduction but leave me confused about the math behind it. Love the ground-up approach to explaining.
    Could anyone tell me what book Professor Strang mentioned at 06:53 in the lecture?

    • @scotts.9460
      @scotts.9460 11 months ago

      web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf
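
Polyak's heavy-ball formulas are presumably the "magic numbers" being asked about. A minimal sketch (parameter names and the toy matrix are my own choices, not the lecture's): the step size s and momentum β are built from the extreme eigenvalues of S.

```python
import numpy as np

# Heavy-ball momentum on the model problem f(x) = 1/2 x^T S x.
# The "magic numbers": step size s and momentum beta chosen from the
# smallest/largest eigenvalues of S (condition number kappa = 100 here).
S = np.diag([1.0, 100.0])
lam_min, lam_max = 1.0, 100.0
sqrt_kappa = np.sqrt(lam_max / lam_min)

s = (2.0 / (np.sqrt(lam_max) + np.sqrt(lam_min))) ** 2   # step size
beta = ((sqrt_kappa - 1.0) / (sqrt_kappa + 1.0)) ** 2    # momentum

x = np.array([1.0, 1.0])       # start away from the minimizer x* = 0
z = np.zeros(2)                # accumulated search direction
for _ in range(100):
    x = x - s * z              # move along the accumulated direction
    z = S @ x + beta * z       # new gradient S x plus a fraction of old z

print(np.linalg.norm(x))       # far closer to 0 than plain GD would get
```

Plain gradient descent on this problem contracts only like (κ−1)/(κ+1) ≈ 0.98 per step; momentum improves the rate to roughly (√κ−1)/(√κ+1) ≈ 0.82, which is where those two formulas come from.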

  • @vaisuliafu3342
    @vaisuliafu3342 3 years ago +4

    such great lecturing makes me wonder how much of MIT students' success is due to innate ability and how much is due to superior teaching

    • @PrzemyslawSliwinski
      @PrzemyslawSliwinski 2 years ago +3

      In terms of this very lecture: think of the professor as the gradient and your ability as the momentum. ;)

  • @brendawilliams8062
    @brendawilliams8062 3 years ago

    It’s nice you got it on a linear line.

  • @newbie8051
    @newbie8051 1 year ago

    Tough course to follow, from what I feel (I'm currently in my 4th semester of undergrad).
    Great lecture by Prof. Gilbert Strang; I feel kinda dumb after listening to this lecture, will try again.

  • @antaresd1
    @antaresd1 4 years ago

    Crystal clear! Thank you very much for sharing it

  • @meow75714
    @meow75714 3 years ago +1

    wow, beautiful, now I see why it oscillates
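
A quick way to see the oscillation mentioned here: run plain gradient descent on an ill-conditioned quadratic (an illustrative example, not the lecture's exact numbers) and watch the stiff coordinate flip sign every step.

```python
import numpy as np

# Plain gradient descent on f(x, y) = (x^2 + 100 y^2) / 2.
# With the classic step size s = 2 / (lam_min + lam_max), the
# component along the large eigenvalue overshoots the valley floor
# and changes sign on every iteration: the zig-zag.
S = np.diag([1.0, 100.0])
s = 2.0 / (1.0 + 100.0)

x = np.array([1.0, 1.0])
signs = []
for _ in range(6):
    x = x - s * (S @ x)            # gradient step
    signs.append(np.sign(x[1]))    # track the stiff coordinate's sign

print(signs)                       # alternates: -1, +1, -1, +1, ...
```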

  • @alessandromarialaspina9997
      @alessandromarialaspina9997 2 years ago

    Can this procedure be expanded to deal with problems in multiple dimensions? So a, b, c, and d are not scalars but actually vectors themselves, representing the inputs x1, x2, x3 to a function f(x1, x2, x3). How would you form R that way, and would you have different condition numbers for each element of b?
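
On the multi-dimensional question: for f(x) = ½ xᵀS x with a symmetric positive definite S, the iteration decouples along each eigenvector of S, so there is one small 2×2 matrix R(λ) per eigenvalue of S, and a single condition number κ = λ_max/λ_min for the whole matrix rather than one per component of b. A sketch under that assumption (the recursion c_{k+1} = c_k − s·d_k, d_{k+1} = λ·c_{k+1} + β·d_k is my reconstruction):

```python
import numpy as np

# One 2x2 update matrix per eigenvalue lam of S; the overall rate is
# the worst spectral radius over all eigenvalues of S.
def R(lam, s, beta):
    return np.array([[1.0, -s],
                     [lam, beta - lam * s]])

def spectral_radius(M):
    return max(abs(np.linalg.eigvals(M)))

lams = np.array([1.0, 10.0, 100.0])       # eigenvalues of a 3x3 S
sqrt_kappa = np.sqrt(lams.max() / lams.min())
s = (2.0 / (np.sqrt(lams.max()) + np.sqrt(lams.min()))) ** 2
beta = ((sqrt_kappa - 1.0) / (sqrt_kappa + 1.0)) ** 2

# With these s, beta every eigen-direction contracts at the same rate
# sqrt(beta) = (sqrt(kappa) - 1) / (sqrt(kappa) + 1), about 0.818.
rates = [spectral_radius(R(lam, s, beta)) for lam in lams]
print(rates)
```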

  • @RAJIBLOCHANDAS
    @RAJIBLOCHANDAS 2 years ago

    Excellent lecture

  • @Schweini8
    @Schweini8 9 months ago

    Why is it enough to assume x follows an eigenvector to demonstrate the rate of convergence?

  • @itay4178
    @itay4178 4 years ago

    Such a great lecturer. Thank you!

  • @vnpikachu4627
    @vnpikachu4627 3 years ago +2

    At 27:00, why follow the direction of an eigenvector? It just comes out of nowhere

    • @ky8920
      @ky8920 3 years ago

      I think it has something to do with PCA.

    • @e2DAiPIE
      @e2DAiPIE 1 year ago +1

      Can anyone provide some clarification here?
      I think why we would like to follow an eigenvector is made clear, but what's not clear to me is why we expected this to work prior to deriving the result (that f decreases faster).
      I can see that following an eigenvector reduces the problem of inverting a block matrix containing the original S to just inverting a much smaller matrix of scalars. So maybe this strategy was just wishful thinking that paid off?
      Insight would be very welcome. Thanks.

    • @Schweini8
      @Schweini8 9 months ago

      @@e2DAiPIE Maybe if you can show that the method converges in all directions pointed to by eigenvectors, then it also converges with at least the same rate in all other directions (since any vector x can be written as a linear combination of the eigenvectors of S).
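
The last reply's point can be checked numerically: since S is symmetric, its eigenvectors form an orthonormal basis, and the momentum iteration never mixes the coefficients in that basis. Running the full vector iteration is identical to running one scalar recursion per eigenvector (a sketch with arbitrary, non-optimized s and β of my choosing):

```python
import numpy as np

# Build a random symmetric positive definite S.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
S = A @ A.T + 3.0 * np.eye(3)
lams, Q = np.linalg.eigh(S)        # eigenvalues, orthonormal eigenvectors

s, beta = 0.05, 0.5                # arbitrary illustrative values
x0 = np.array([1.0, -2.0, 0.5])

# Full n-dimensional momentum iteration.
x, z = x0.copy(), np.zeros(3)
for _ in range(20):
    x = x - s * z
    z = S @ x + beta * z

# The same iteration on the eigenbasis coefficients c_i = q_i^T x.
c, d = Q.T @ x0, np.zeros(3)
for _ in range(20):
    c = c - s * d
    d = lams * c + beta * d        # each coefficient evolves independently

print(np.allclose(x, Q @ c))       # identical trajectories: True
```

This is why analyzing a single eigenvector direction is enough: the worst eigenvalue determines the overall rate.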

  • @ShadowGamer-qy7ls
    @ShadowGamer-qy7ls 2 years ago

    That guy who is always taking the photos

  • @archibaldgoldking
    @archibaldgoldking 3 years ago

    β is just the momentum :)

  • @anarbay24
    @anarbay24 4 years ago

    Why is f equal to (1/2) x^T S x when the prof did not explain what S is? Does anyone know what it is?

    • @sheelaagarwal3392
      @sheelaagarwal3392 4 years ago

      See lecture 22 for the definition.

    • @ky8920
      @ky8920 3 years ago

      This subchapter is limited to convex functions. Convexity provides a nice property: a local minimum is also the global minimum.
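
For what it's worth, S in this model problem is a symmetric positive definite matrix, which is exactly what makes f(x) = ½ xᵀS x a convex bowl with gradient S x. A small sanity check with an illustrative S (my choice, not the lecture's):

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])             # symmetric, eigenvalues 1 and 3 > 0

def f(x):
    return 0.5 * x @ S @ x             # convex quadratic

def grad_f(x):
    return S @ x                       # gradient, since S is symmetric

x = np.array([1.0, -1.0])
eps = 1e-6
# Central finite differences agree with S x.
num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                for e in np.eye(2)])
print(np.allclose(num, grad_f(x)))     # True
```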

  • @vishalpoddar
    @vishalpoddar 4 years ago +1

    Why do we need to make the eigenvector as small as possible?

    • @samymohammed596
      @samymohammed596 3 years ago

      You mean why are we trying to make the eigenvalue as small as possible? I am also wondering the same... if we make the eigenvalues of R small, then R^k --> 0 as k --> infinity, and you end up with c_k, d_k --> 0, and what good is that? I am surely missing a few parts of this story...

    • @0ScarletBlood0
      @0ScarletBlood0 3 years ago +3

      @@samymohammed596 1) If, on the contrary, the powers of R were increasing, the new values of c_k, d_k would increase with them, meaning that x_k = c_k*q would never settle at the minimum of the function but diverge from it.
      2) You do want the value of d_k to approach zero: z_k = d_k*q = 0 then makes x_(k+1) = x_k, and the point of convergence is found at the minimum of the function.
      It's true that R^k --> 0 as k --> infinity, but we are not computing these values that many times! Taking this into account, R^k*[c_k, d_k] is not [0, 0].

    • @samymohammed596
      @samymohammed596 3 years ago +1

      @@0ScarletBlood0 Ah, of course you are right about wanting d_k = 0! :):) Thanks for making that point clear!
      I certainly see the issue with the powers of R increasing and then causing immediate divergence. Yes, better for the eigenvalues to be < 1 in magnitude, because then at least you don't start off with divergence...
      But then you might hit zero... I guess you need a little skill to pick the parameters s, beta to ensure that your problem is well defined, so that you reach convergence (d_k = 0) before the powers of R run away and make the whole thing zero! Just my 2 cents... but thanks very much for your reply!

    • @ky8920
      @ky8920 3 years ago

      @@samymohammed596 That matrix has full rank as long as β ≠ 0.

    • @brendawilliams8062
      @brendawilliams8062 3 years ago

      All I know is it’s based on symmetry and the remaining 5 will be at the end of the spool.
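
To the worry in this thread that R^k --> 0 "zeroes everything out": for the quadratic model the minimizer is x* = 0, so c_k --> 0 is exactly convergence to the solution, and smaller eigenvalues of R just mean it happens in fewer steps. A toy check (the recursion behind R is my reconstruction: c_{k+1} = c_k − s·d_k, d_{k+1} = λ·c_{k+1} + β·d_k, with arbitrary λ, s, β):

```python
import numpy as np

lam, s, beta = 1.0, 0.1, 0.3
R = np.array([[1.0, -s],
              [lam, beta - lam * s]])   # eigenvalues about 0.84 and 0.36

v0 = np.array([1.0, 0.0])               # start: c_0 = 1, d_0 = 0
norms = [np.linalg.norm(np.linalg.matrix_power(R, k) @ v0)
         for k in (0, 10, 20, 30)]
print(norms)                            # shrinks geometrically toward 0
```

The finite-k point above is right too: we take finitely many steps, so [c_k, d_k] = R^k [c_0, d_0] is small but not exactly zero.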

  • @omaribrahim3370
    @omaribrahim3370 4 years ago

    Momentum forsenCD

  • @ostrodmit
    @ostrodmit 4 years ago +3

    Would they please stop calling Nesterov's algorithm "descent"? It's not a descent method, as Nesterov himself keeps repeating. Otherwise a wonderful lecture, and an impressive feat for the lecturer given his age.

    • @ketan9318
      @ketan9318 4 years ago

      I agree with your point.

  • @naterojas9272
    @naterojas9272 4 years ago

    I'm back! 🤓

  • @murat7456
    @murat7456 4 years ago +1

    The chief is 85 years old and mentally razor-sharp.