He is the only guy for whom I have to watch a RUclips video at normal speed. The best thing about Andrej is that he gives a new perspective on concepts you already know very well.
Amazing lecture, thank you, Andrej! (I'm 4 years late but still hahaha).
00:00 Intro
01:05 Pong example high-level explanation (from his blog "Pong from pixels")
11:10 Q&A
19:13 Pong example detailed walk-through
33:10 Q&A
Some notes:
* He calls the input to the sigmoid "logp", when it's actually a logit, not a log probability
* Numbers on his chart, before the positive reward, are +0.27, +0.24, etc., but they should be +0.9, +0.81 instead (if gamma == 0.9; see the sketch after this list).
* Some minor typos, like "backpro prelu" instead of "backprop relu" xD, etc.
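For the chart numbers, here is a minimal sketch of how the discounted return propagates backwards through an episode, assuming gamma = 0.9 and a single +1 reward at the end (the function name and toy rewards are just illustrative, not taken from the talk):

```python
import numpy as np

def discount_rewards(r, gamma=0.9):
    """Walk backwards through the episode, accumulating the discounted return."""
    discounted = np.zeros_like(r, dtype=float)
    running = 0.0
    for t in reversed(range(len(r))):
        running = r[t] + gamma * running
        discounted[t] = running
    return discounted

# Three neutral steps followed by a positive reward:
print(discount_rewards(np.array([0.0, 0.0, 0.0, 1.0])))
# -> [0.729 0.81  0.9   1.   ]  (not +0.27, +0.24 as drawn on the chart)
```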
Karpathy somehow manages to talk at 5x the speed of Mnih
Andrej is a superstar in the field. It's always a pleasure listening to Andrej.
This lecture is Gold
Andrej Karpathy is so humble that I want to be like him one day. Thanks for the great lecture, Andrej.
And I used to think that I understood Policy Gradients. Nice lecture!
Best interpretation of Policy Gradients
Andrej should make money rolling out his own Khan academy-like business, his explanations are just brilliant!
totally, just incredible piece-by-piece explanation
What a lecture... blown away!
this is pure gold thank you so much !!
A policy gradient explanation couldn't be simpler than this.
Great lecture, and he speaks very well.
I may be dumb, but how is np.dot(W2,h) a log probability? It can be seen at 2:59.
I think it's meant to be logit_p, suggesting that it's a quantity that is going to be mapped to a probability by the sigmoid function. It's certainly confusing to name a variable like this. He even wrote the comment wrong, as he briefly mentions at 3:09.
The input to the sigmoid function is often called a "logit". See Hands-On ML, 2nd edition, page 144.
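To make the logit vs. log-probability point concrete, here is a rough sketch of the forward pass. The layer sizes and variable names are my own stand-ins, roughly matching the 80x80-pixel, 200-hidden-unit setup described in the blog post; the key point is that np.dot(W2, h) is an unbounded score, and only after the sigmoid does it become a probability:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical shapes, roughly matching the network in the talk/blog post:
H, D = 200, 80 * 80                  # hidden units, flattened input size
W1 = np.random.randn(H, D) * 0.01
W2 = np.random.randn(H) * 0.01
x = np.random.randn(D)               # stand-in for the preprocessed difference frame

h = np.maximum(0.0, np.dot(W1, x))   # ReLU hidden layer
logit = np.dot(W2, h)                # unbounded score -- the "logp" variable on the slide
p_up = sigmoid(logit)                # only now is it a probability of moving the paddle up
```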
I wish there were two Karpathys, one for Computer Vision and another for RL.
Maybe adding a moving average over the last predictions could make the moves smoother and less jittery (a quick sketch below).
Ah and great lecture! :)
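If anyone wants to try the smoothing idea above, a simple exponential moving average over the per-frame probabilities would look something like this (beta and the random stand-in data are hypothetical, not from the talk):

```python
import numpy as np

beta = 0.8                            # hypothetical smoothing factor
raw = np.random.rand(100)             # stand-in for per-frame P(up) predictions
smooth = np.empty_like(raw)
smooth[0] = raw[0]
for t in range(1, len(raw)):
    smooth[t] = beta * smooth[t - 1] + (1 - beta) * raw[t]
```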
Why is the learning rate in the RMSProp formula -alpha, but in Andrej's code he uses +alpha?
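My understanding (not stated in the talk itself) is that the textbook RMSProp formula is written for minimizing a loss, while the policy gradient update ascends the expected reward, so the sign on the step flips. A rough sketch with made-up shapes:

```python
import numpy as np

alpha, decay, eps = 1e-3, 0.99, 1e-5

def rmsprop_step(W, grad, cache, maximize=True):
    """One RMSProp update; maximize=True flips the sign for gradient ascent."""
    cache[:] = decay * cache + (1 - decay) * grad**2
    step = alpha * grad / (np.sqrt(cache) + eps)
    # Textbook RMSProp minimizes a loss: W - step.
    # Policy gradients ascend the expected reward, hence the +alpha in the code.
    return W + step if maximize else W - step

# Toy usage:
W = np.random.randn(4, 3) * 0.01
cache = np.zeros_like(W)
grad = np.random.randn(4, 3)          # stand-in for grad of (log prob * advantage)
W = rmsprop_step(W, grad, cache)
```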
This is fantastic.
Just found out that he is the Director of AI at Tesla.
Andrej simultaneously seems to want to answer questions, and stop answering questions as soon as possible.
Great talk if watched at 0.75x.
How does it work for a continuous action space? For example, if the steering angle is 23 degrees and it's a successful episode, then there is no error between 23 (label) and 23 (output), and therefore there's no gradient.
In a discrete one you can minimize the error between 0.7 (probability of up) and 1 (up), I get that. But if it's a continuous action space, I don't see how this works. Do you have to discretize, so that instead of one steering angle you have a probability value for each of the 360 degree angles?
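As I understand it (this isn't covered in the talk), you don't have to discretize: the network outputs the parameters of a distribution over the action, e.g. the mean of a Gaussian over the steering angle, the executed action is sampled from it, and the REINFORCE gradient of the log-density gives a learning signal because the sampled action almost never equals the mean exactly. A toy sketch with made-up numbers:

```python
import numpy as np

mu, sigma = 23.0, 2.0                    # network outputs the mean; std fixed here
action = np.random.normal(mu, sigma)     # sampled steering angle actually executed
R = 1.0                                  # episode return (an advantage in practice)

# log pi(a|s) = -(a - mu)^2 / (2 sigma^2) + const, so:
grad_mu = (action - mu) / sigma**2       # d log pi / d mu
mu += 1e-2 * R * grad_mu                 # nudge the mean toward actions that paid off
```

If sigma is also learned, it gets its own gradient term, but the idea is the same: the "error" is between the sampled action and the mean, weighted by how well the episode went.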
Can somebody give me the link to the code?
Sheldon :D
Of course, using an abstract representation rather than raw pixels would be helpful. Pong can be looked at as a simple Newtonian physics environment.
Pepega
Pepega indeed my friend...