Adam Optimization Algorithm (C2W2L08)
- Published: 30 Sep 2024
- Take the Deep Learning Specialization: bit.ly/2vBG4xl
Clarification about Adam Optimization
Please note that at 2:44, the Sdb equation is correct. However, from 2:48 onward, the db² loses its ².
The bottom right equation should still be:
Sdb = β₂Sdb + (1 - β₂)db²
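For anyone following along in code, here is a minimal sketch of the Adam update for a single bias parameter, using the video's notation (the function name and the toy usage below are my own), showing where the square belongs:

```python
import math

def adam_step(b, db, v_db, s_db, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a bias parameter b with gradient db at step t (t >= 1)."""
    v_db = beta1 * v_db + (1 - beta1) * db        # momentum term
    s_db = beta2 * s_db + (1 - beta2) * db ** 2   # RMSprop term: note db IS squared
    v_corr = v_db / (1 - beta1 ** t)              # bias correction
    s_corr = s_db / (1 - beta2 ** t)
    return b - lr * v_corr / (math.sqrt(s_corr) + eps), v_db, s_db

# Usage: minimize f(b) = b^2, whose gradient is db = 2b
b, v, s = 5.0, 0.0, 0.0
for t in range(1, 2001):
    b, v, s = adam_step(b, 2 * b, v, s, t, lr=0.01)
```

After enough iterations b settles near the minimum at 0; the update direction is controlled by the bias-corrected momentum term and its scale by the bias-corrected RMSprop term.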
The only thing I understood is that his friend has nothing to do with Adam optimization!
It is based on several other algorithms; you need to start from gradient descent to understand this.
I don't understand why some people are hating. Yes, the prof missed a couple of symbols (it happens to everyone).
The truth is, without his or Geoffrey's videos to watch we would be totally lost ))
correct
Yeah very true
But he provided no justification for the Adam algorithm.
I am confused to the maximum level. Can I buy more brain power like I buy more RAM?
Don't worry, fellow human. I don't have enough neurons to process that data at a decent success rate either.
Before this video you should be very clear on GD, SGD and mini-batch SGD...
No, just find a better video on RUclips that explains Adam.
@@ehsankhorasani_ Can't find many that even explain it.
Why did you erase the squared at 2:46? Shouldn't RMSprop have a squared term for the bias as well?
I think thats a typo
yes thats a mistake
The square appeared for a while after Andrew missed it, but then vanished. It seems the video edit could not sustain the presence of the ²!
This has most definitely saved me a good few hours, possibly more than 24, of staring at my code wondering why I'm doing this. Thank you :)
From 0:00-4:36, S_db is missing a square on the db term; it should be S_db = β₂·S_db + (1 - β₂)·db²
Please apply a low-pass filter to the audio of this video.
Create a model for it.
I've been downloading the videos and adding a low-pass filter in VLC. Can't stand the hissing.
This video is closely related to the video "Bias Correction of Exponentially Weighted Averages". Please revisit that video if you feel this is too confusing.
thanks
Eve Optimization Algorithm will come soon!
This nailed down the Adam paper. Thanks a lot.
Can Adam optimization be used for classification problems?
@@ogunyolufunmilola2059 Yes it can
There is roasting at the end.
Could anyone give me a list of the notations he mentions in the video or direct me towards a video that has those explained? Main issue with understanding the concept in the video is the lack of explanation of the notations used.
Notice the C2W2L08 in the title of this video. The "L08" part means "Lecture 8" of this series. If you search for "C2W2L01" through "C2W2L07" in youtube you will find the videos leading up to this where he explains all of these terms. In particular the ones that are most helpful are: C2W2L03 (exponentially weighted averages), C2W2L05 (bias correction), C2W2L06 (momentum) and C2W2L07 (rms prop).
Haha showing Adam there was hilarious :>
Roasting at the end! Hahaha
You really don't think that stating the problem ADAM solves is relevant when you are introducing ADAM?
any time I want to implement ML from scratch, I watch all Andrew's videos from beginning to end! I don't know how to express my appreciation to this great man.
The very best and most succinct explanation of ADAM I've ever seen. Things become crystal clear if one watches L06 to L08 in a row.
The handwriting is too difficult to read.
God of lucid explanation
-1: no explanation of why Adam works better than previous algorithms is provided.
It would be easier if you just typed instead of handwriting. I can't read it.
What are s and v?
First task of significance for me is to figure out how to spell Andrew's last name; then I move on to the algorithm 🤓
What is t? I do not completely understand.
this man is a Legend!!
Hey there, I know I am late to the party, but I have a pressing question the rest of the internet has failed to answer so far.
I currently have to work with a model and network I didn't design, and my job is basically to find out what's wrong, so naturally I need to understand the lines of code used.
There was a line I haven't found any example for: optimizer = keras.optimizers.Adam(0.002, 0.5)
I am still studying, so I am not that well versed in Keras or anything AI really, but I want to know if this second value refers to beta_1 or some other value I am not noticing.
The documentation has me puzzled so far, so I hope there's someone here who can answer this.
I'm late, but yes, it refers to beta_1. The parameters, according to the documentation, are learning_rate, beta_1, beta_2, epsilon, etc.
Additionally, if you want to be sure, you can construct the optimizer with your values 0.002 and 0.5 and then check the value of each parameter; it shows that beta_1 is indeed 0.5.
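To illustrate the positional-binding point without needing TensorFlow installed, here is a toy stand-in whose signature mirrors the argument order given in the Keras docs (learning_rate, beta_1, beta_2, epsilon); the function itself is just an illustration, not the real class, and the real check would be reading `optimizer.beta_1` on the actual object:

```python
def adam_config(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Toy stand-in mirroring the documented Keras Adam constructor order."""
    return {"learning_rate": learning_rate, "beta_1": beta_1,
            "beta_2": beta_2, "epsilon": epsilon}

cfg = adam_config(0.002, 0.5)   # same positional call as in the question
print(cfg["beta_1"])            # the second positional argument binds to beta_1
```

So in `keras.optimizers.Adam(0.002, 0.5)`, 0.002 is the learning rate and 0.5 replaces the default beta_1 of 0.9.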
You are so sweet. Thank you Sir, for these awesome videos!
SGD vs ADADELTA? If I only had those 2 choices.
adadelta
is that you beatthebush?
Ow my ears
😂 6:34
Explained much better than the TA did.
Shouldn't db in Sdb be squared?!
Yes, you can see it between 2:43 and 2:47, added to the video but not consistently.
Why do this: m1/(1-beta1), m2/(1-beta2)? This operation scales them by 10 times and 1000 times. What is the reason? Bias correction? What does it mean?
Watch lectures C2W2L03, 04 and 05
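A quick numerical sketch of what bias correction does (my own example, not from the video): with v initialized to 0, the raw exponentially weighted average starts far too small, and dividing by (1 - βᵗ) fixes exactly that. At t = 1 the divisor is 1/(1-0.9) = 10, which is where the "10 times" comes from, and it decays toward 1 as t grows:

```python
beta = 0.9
v = 0.0
data = [10.0] * 5          # constant signal; the running average should be 10
for t in range(1, len(data) + 1):
    v = beta * v + (1 - beta) * data[t - 1]   # raw exponentially weighted average
    v_corrected = v / (1 - beta ** t)         # bias correction
    print(t, round(v, 4), round(v_corrected, 4))
# At t = 1: v = 1.0 (biased toward the 0 initialization),
# while v_corrected = 1.0 / (1 - 0.9) = 10.0, the true average.
```

So the correction does not blindly inflate the estimate; it exactly compensates for the zero initialization, and its effect vanishes for large t.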
Yay
You are my god.
what does db mean here? derivative of the bias?
yes
The derivative of the bias w.r.t. the loss function. See previous videos for the notation conventions.
decibels
Wrong, it is the derivative of the loss function with respect to the bias.
@@LL-yf9kp loss function or cost function?
Great explanation. Need to watch it again.
I am watching it again
It would have been nice to see it work with real Python code and real data. What instructors don't understand is that they all use different symbols and terminology. I also don't like their scribbling. I have data I use to test, with 5 different parameters. The "terrain" is more like the Grand Canyon than a bowl, so the path is extremely narrow. Adagrad works best so far. I will try ADAM.
How can we use it for facial recognition?
That's an optimization algorithm used to train a machine learning model! You need to refer to the Machine Learning course.
Why do we need the correction in momentum or RMSprop using the t exponent?
Watch the Bias Correction video before watching this.
It would be appreciated if teachers revisited their videos and replaced torturous live digital-pen notes with clean text and diagrams. Chalk on a board is fine, but digital pens are painful to endure.
Hello Abbass, I'm very sorry for the pain you are suffering.
It is a way of punishing those who do not understand the math but know how to code it...