Clearly one of the best videos on the topic, the use of examples was really good.
😂
Unusually clear presentation; well done Alexander.
You are one of the best teachers I have ever seen.
Is there a typo at 10:01? Intuitively, it seems like the exponent of γ should be (i - t), since in the current formulation the reward terms will quickly go to 0 when t becomes large.
Yes, I think the coefficient of r[t] should be gamma^0, which is 1 here.
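For reference, the discounted return with that corrected exponent (assuming the standard definition of total discounted reward) would read:

```latex
R_t = \sum_{i=t}^{\infty} \gamma^{\,i-t} r_i
    = r_t + \gamma\, r_{t+1} + \gamma^2\, r_{t+2} + \cdots
```

so the current reward r_t indeed carries a coefficient of gamma^0 = 1, and rewards further in the future are discounted more heavily.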
At 34:35, how do I calculate the log-likelihood of the action given the state?
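In case it helps, here is a minimal sketch for a discrete action space, assuming a PyTorch-style policy network (the network, sizes, and names are illustrative, not from the lecture):

```python
import torch
import torch.nn.functional as F

# Illustrative policy network: maps a state vector to one logit per action.
policy_net = torch.nn.Sequential(
    torch.nn.Linear(4, 64),   # assume a 4-dimensional state
    torch.nn.ReLU(),
    torch.nn.Linear(64, 2),   # assume 2 discrete actions
)

state = torch.randn(1, 4)                             # one observed state s_t
log_probs = F.log_softmax(policy_net(state), dim=-1)  # log pi(a | s_t) for every action

action = torch.tensor([1])             # the action a_t that was actually taken
log_likelihood = log_probs[0, action]  # log pi(a_t | s_t)
```

That log pi(a_t | s_t) is the quantity that gets weighted by the discounted return in the policy-gradient loss.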
فوق العاده بود آقای امینی
What is max Q(s', a')? When I have a lot of future states and they are unknown, how can I determine the max Q(s', a')? 24:00
Sample your network again with the new state.
Here, s' is the next state and a' is the next action. Max Q(s', a') is the maximum of the Q-values over all possible actions in that next state. In that equation, the left term is an estimated "actual" value of the future reward: the sum of the current reward and the (discounted) value of the next best action.
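A minimal sketch of that target computation, assuming a small PyTorch Q-network (the architecture and dimensions are illustrative):

```python
import torch

# Illustrative Q-network: state in, one Q-value per action out.
q_net = torch.nn.Sequential(
    torch.nn.Linear(4, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 2),
)

gamma = 0.99                      # discount factor
reward = torch.tensor([1.0])      # r, observed after taking a in s
next_state = torch.randn(1, 4)    # s'

with torch.no_grad():
    # max over a' of Q(s', a'): run the network on s' and keep the best action's value
    max_next_q = q_net(next_state).max(dim=-1).values

# Bellman target: r + gamma * max_a' Q(s', a')
target = reward + gamma * max_next_q
```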
If your action space is uncountable or continuous, Q-learning models are a poor fit.
Fantastic, very clear and concise. Great work!
At 36:02, does anyone know what theta is? Is it a policy?
Theta represents all of the weights of the neural network policy (pi): a network that takes the state (s_t) as input and outputs the likelihood of taking each action (a_t).
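A toy sketch of that idea, assuming a small PyTorch network (sizes and names are illustrative):

```python
import torch

# Illustrative policy network pi: state s_t in, one probability per action out.
pi = torch.nn.Sequential(
    torch.nn.Linear(4, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
    torch.nn.Softmax(dim=-1),
)

# "theta" is simply the collection of all trainable weights of this network:
theta = list(pi.parameters())

state = torch.randn(1, 4)   # s_t
action_probs = pi(state)    # pi_theta(a_t | s_t): a probability for each action
```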
Very well explained. How can I get the slides? The link in the bio says coming soon!
Very good lecture. Just one thing: I don't understand how the policy is created (maybe Alexander showed it with the laser pointer, but it isn't visible in the slides).
Great video. The whole series is very good
Very good way of explaining.
I love you guys!
Thank you all for these great videos. One thing I want to mention is that the audio volume is a little too low.
Outstanding. Thank you.
Can you also teach us how to write code for it?
It was very clear and helpful.
Amazing video... kind of Reinforcement Learning in a nutshell.
This is really good. Thank you!
Thank you so much guys.
Excellent tutorial indeed
Thank you!
What a really nice course!
Good course, thank you
7:00
Wow, thanks.
Dude it's awesome T^T
Please increase the sound level.
WHAT?!
This is why Terminator is so fake... the AI would learn not to miss a shot within 20 minutes.