Adam Optimization Algorithm (C2W2L08)

  • Published: 30 Sep 2024
  • Take the Deep Learning Specialization: bit.ly/2vBG4xl
    Check out all our courses: www.deeplearni...
    Subscribe to The Batch, our weekly newsletter: www.deeplearni...
    Follow us:
    Twitter: / deeplearningai_
    Facebook: / deeplearninghq
    Linkedin: / deeplearningai

Comments • 77

  • @manuel783
    @manuel783 3 years ago +71

    Clarification about Adam Optimization
    Please note that at 2:44, the Sdb equation is correct. However, from 2:48, the db² loses its ².
    The bottom right equation should still be:
    Sdb = β₂Sdb + (1 - β₂)db²
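
    For reference, a minimal NumPy sketch of the full Adam step for a single bias parameter b (the names db, v_db, s_db, beta1, beta2, eps are illustrative, not taken from the video):

    import numpy as np

    def adam_step(b, db, v_db, s_db, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # First-moment (momentum) estimate
        v_db = beta1 * v_db + (1 - beta1) * db
        # Second-moment (RMSprop-style) estimate -- note the square on db
        s_db = beta2 * s_db + (1 - beta2) * db**2
        # Bias-corrected estimates
        v_corr = v_db / (1 - beta1**t)
        s_corr = s_db / (1 - beta2**t)
        # Parameter update
        b = b - lr * v_corr / (np.sqrt(s_corr) + eps)
        return b, v_db, s_db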

  • @sahanmendis3369
    @sahanmendis3369 3 years ago +18

    The only thing I understood is that his friend has nothing to do with Adam optimization!

    • @donfeto7636
      @donfeto7636 2 years ago +2

      Because it is based on many other algorithms; you need to start from gradient descent to understand this

  • @IgorAherne
    @IgorAherne 6 years ago +79

    I don't understand why some people are hating - yes, the prof missed a couple of symbols (once in a lifetime).
    The truth of the matter is that without his or Geoffrey's videos to watch we would be totally fucked ))

  • @douglaskaicong131
    @douglaskaicong131 5 years ago +56

    I am confused to the maximum level. Can I buy more brain power like I buy more RAM?

    • @cimmik
      @cimmik 5 years ago +6

      Don't worry, fellow human. I don't have enough neurons to process that data with a decent success rate either.

    • @mohdazam1404
      @mohdazam1404 4 years ago +9

      Before this video you should be very clear on GD, SGD, and mini-batch SGD...

    • @ehsankhorasani_
      @ehsankhorasani_ 3 years ago

      No, just find a better video on YouTube that explains Adam

    • @seeking9145
      @seeking9145 2 years ago

      @@ehsankhorasani_ Can't find many that even explain it

  • @jerrylin5089
    @jerrylin5089 5 years ago +15

    Why did you erase the square at 2:46? Shouldn't RMSprop have a squared term for the bias as well?

    • @KrishnaAnapindi92
      @KrishnaAnapindi92 4 years ago +5

      I think that's a typo

    • @rohitborra2507
      @rohitborra2507 4 years ago +3

      Yes, that's a mistake

    • @sammathew243
      @sammathew243 4 years ago

      The square appeared for a while, after Andrew missed it, but then vanished. It seems the video edit could not sustain the presence of the 2!

    • @doyugen465
      @doyugen465 3 years ago

      This has most definitely saved me a good few hours, possibly more than 24, of staring at my code wondering why I'm doing this. Thank you :)

  • @danlan4132
    @danlan4132 5 years ago +9

    From 0:00-4:36, S_db is missing the square on the db element; it should be s_db = b_2*s_db + (1 - b_2)*db^2

  • @aa4mad
    @aa4mad 5 years ago +9

    Please apply a low-pass filter to the audio of this video

    • @SohamRailkar
      @SohamRailkar 4 years ago +1

      Create a model for it

    • @MrDeyzel
      @MrDeyzel 4 years ago +3

      I've been downloading the vids and adding a lowpass in VLC. Can't stand the hissing.

  • @pipilu3055
    @pipilu3055 4 years ago +7

    This video is closely related to the video "Bias Correction of Exponentially Weighted Averages". Please revisit that video if you feel this is too confusing.

  • @Troglodyte2021
    @Troglodyte2021 4 years ago +4

    Eve Optimization Algorithm will come soon!

  • @jerekabi8480
    @jerekabi8480 5 years ago +13

    This nailed down the Adam paper. Thanks a lot

  • @mdashiqurrahman9369
    @mdashiqurrahman9369 5 years ago +21

    There is roasting at the end

  • @GRMREAP3R97
    @GRMREAP3R97 3 years ago +3

    Could anyone give me a list of the notations he mentions in the video or direct me towards a video that has those explained? The main issue with understanding the concept in the video is the lack of explanation of the notations used.

    • @austinhoag5130
      @austinhoag5130 2 years ago +1

      Notice the C2W2L08 in the title of this video. The "L08" part means "Lecture 8" of this series. If you search for "C2W2L01" through "C2W2L07" on YouTube, you will find the videos leading up to this one where he explains all of these terms. In particular, the most helpful ones are: C2W2L03 (exponentially weighted averages), C2W2L05 (bias correction), C2W2L06 (momentum), and C2W2L07 (RMSprop).

  • @EranM
    @EranM 5 years ago +15

    Haha showing Adam there was hilarious :>

  • @prajwollamichhane4064
    @prajwollamichhane4064 4 years ago +10

    Roasting at the end! Hahaha

  • @omidtaghizadeh9698
    @omidtaghizadeh9698 4 years ago +1

    You really don't think that a statement of the problem Adam solves is relevant when you are introducing Adam?

  • @mostafanakhaei2487
    @mostafanakhaei2487 4 years ago +13

    Any time I want to implement ML from scratch, I watch all of Andrew's videos from beginning to end! I don't know how to express my appreciation for this great man.

  • @mllo2003
    @mllo2003 2 years ago +6

    The very best and most succinct explanation of ADAM I've ever seen. Things become crystal clear if one watches L06 to L08 in a row.

  • @kisome2423
    @kisome2423 6 months ago

    The handwriting is too difficult to read

  • @krishnakrmahto97
    @krishnakrmahto97 5 years ago +4

    God of lucid explanation

  • @piotr780
    @piotr780 1 year ago

    -1: no explanation is given of why Adam works better than previous algorithms

  • @cw9249
    @cw9249 1 year ago

    It would be easier if you just typed instead of handwriting; I can't read it

  • @ximingwen2542
    @ximingwen2542 3 years ago +1

    What are S and V?

  • @therealme613
    @therealme613 3 years ago

    The first task of significance is for me to figure out how to spell Andrew's last name, then I move on to the algorithm 🤓

  • @sashakobrusev3162
    @sashakobrusev3162 3 years ago

    What is t? I do not completely understand

  • @stipepavic843
    @stipepavic843 2 years ago +1

    this man is a Legend!!

  • @DerAfroJack
    @DerAfroJack 2 years ago

    Hey there, I know I am late to the party, but I have a pressing question that the rest of the internet has failed to answer so far.
    I currently have to work with a model and network I didn't design, and my job is basically to find out what's wrong, so naturally I need to understand the lines of code used.
    There was a line I haven't found any example for: optimizer = keras.optimizers.Adam(0.002, 0.5)
    I am still studying, so I am not that well versed in Keras or anything AI really, but I want to know whether this second value refers to beta_1 or some other value I am not noticing.
    The documentation has me puzzled so far, so I hope there's someone here who can answer this.

    • @judy1982
      @judy1982 1 year ago

      I'm late, but yes, it refers to beta_1. The parameters, according to the documentation, are learning_rate, beta_1, beta_2, epsilon, etc.
      Additionally, if you want to be sure, you can construct the optimizer with your values 0.002 and 0.5 and then check the value of each parameter. It shows that beta_1 is indeed 0.5.
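
      A quick way to check this (a minimal sketch, assuming TensorFlow's bundled Keras; the two positional arguments map to learning_rate and beta_1):

      from tensorflow import keras

      # The positional call from the question above...
      opt = keras.optimizers.Adam(0.002, 0.5)
      # ...is equivalent to the explicit form:
      # opt = keras.optimizers.Adam(learning_rate=0.002, beta_1=0.5)

      cfg = opt.get_config()
      print(cfg["learning_rate"], cfg["beta_1"])  # expected: 0.002 0.5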

  • @submagr
    @submagr 4 years ago +3

    You are so sweet. Thank you Sir, for these awesome videos!

  • @Philson
    @Philson 6 years ago +4

    SGD vs ADADELTA? If I only had those 2 choices.

  • @iNationOnline
    @iNationOnline 6 years ago +1

    is that you beatthebush?

  • @veganphilosopher1975
    @veganphilosopher1975 3 years ago

    Ow my ears

  • @paulodybala132
    @paulodybala132 3 years ago

    😂 6:34

  • @hudsonvan4322
    @hudsonvan4322 3 years ago

    He explains this so much better than the teaching assistant did

  • @ahmedelsafy9323
    @ahmedelsafy9323 6 years ago +3

    Shouldn't db in Sdb be squared?!

    • @pascalgula
      @pascalgula 6 years ago

      Yes, you can see it added to the video between 2:43 and 2:47, but not consistently.

  • @dexlee7277
    @dexlee7277 6 years ago +1

    Why do this -> m1/(1-beta1), m2/(1-beta2)?? This operation scales them up roughly 10 times and 1000 times; what is the reason? Bias correction? What does it mean?

    • @dpacmanh
      @dpacmanh 6 years ago +1

      Watch lectures C2W2L03, 04 and 05
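
      To make the reason concrete (a minimal sketch; note that the divisor in the video is 1 - beta^t, so the large rescaling only happens for small t, compensating for the zero initialization):

      beta2 = 0.999
      g2 = 4.0        # an example squared gradient, db**2
      s = 0.0         # second-moment estimate, initialized to zero

      for t in range(1, 4):
          s = beta2 * s + (1 - beta2) * g2
          s_corrected = s / (1 - beta2 ** t)
          print(t, round(s, 6), round(s_corrected, 6))
      # At t=1, s is only 0.004, but s_corrected is 4.0, the true scale of g2.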

  • @abhijeetghodgaonkar
    @abhijeetghodgaonkar 6 years ago

    Yay

  • @ayushsahu6304
    @ayushsahu6304 4 years ago +1

    You are my god.

  • @ffffffffffy
    @ffffffffffy 7 years ago +2

    What does db mean here? The derivative of the bias?

    • @michaellaskin3407
      @michaellaskin3407 6 years ago +4

      yes

    • @banipreetsinghraheja8529
      @banipreetsinghraheja8529 6 years ago +2

      Derivative of bias w.r.t to Loss Function. See previous videos for conventional queries.

    • @NikosKatsikanis
      @NikosKatsikanis 6 years ago +1

      decibels

    • @LL-yf9kp
      @LL-yf9kp 6 years ago +9

      Wrong, it is the derivative of the loss function with respect to the bias.

    • @EndersupremE
      @EndersupremE 5 years ago

      @@LL-yf9kp loss function or cost function?

  • @sandipansarkar9211
    @sandipansarkar9211 3 years ago

    Great explanation. Need to watch again.

  • @pnachtwey
    @pnachtwey 3 months ago

    It would have been nice to see it work with real Python code and real data. What instructors don't understand is that they all use different symbols and terminology. I also don't like their scribbling. I have data I use to test. There are 5 different parameters. The terrain is more like the Grand Canyon than a bowl, so the path is extremely narrow. Adagrad works best so far. I will try Adam.

  • @infratechethiopia
    @infratechethiopia 7 years ago +3

    How can we use it for facial recognition?

    • @ahmedelsafy9323
      @ahmedelsafy9323 6 years ago +7

      That's an optimization algorithm used to train a machine learning model! You need to refer to the Machine Learning course.

  • @ADITYASINGHthesoftwareengineer
    @ADITYASINGHthesoftwareengineer 6 years ago

    Why do we need correction in momentum or rms using T elimination?

  • @VR_JPN
    @VR_JPN 6 years ago +3

    It would be appreciated if teachers would revisit their videos and replace torturous live digital pen notes with elegant text and diagrams. Chalk on a board is fine, but digital pens are painful to endure.

    • @zelllers
      @zelllers 5 years ago +7

      Hello Abbass, I'm very sorry for the pain you are suffering.

    • @doyugen465
      @doyugen465 3 years ago +1

      It is a way of punishing those who do not understand the math but know how to code it...