Another great lecture! Every year this lecture gets more and more polished, thank you Hado and DeepMind! :)
I find it amusing that after building up a strong mathematical foundation for policy gradient methods, we end up with Cacla, which is essentially a heuristic driven by some reasonable intuition.
Lol
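For anyone who hasn't seen it, here is a minimal Python sketch of the Cacla idea. The function name, the linear actor/critic, the feature vectors and the step sizes are all my own illustration, not the lecture's code; the only point is the "move the actor toward the taken action when the TD error is positive" rule.

import numpy as np

# Minimal Cacla-style update for a linear actor and linear critic over
# hypothetical state features phi_s (names and step sizes are made up).
def cacla_update(theta_actor, w_critic, phi_s, phi_s_next, action_taken,
                 reward, done, gamma=0.99, alpha_actor=0.01, alpha_critic=0.1):
    v_s = w_critic @ phi_s                            # critic's value estimate V(s)
    v_next = 0.0 if done else w_critic @ phi_s_next   # V(s'), zero at episode end
    delta = reward + gamma * v_next - v_s             # one-step TD error
    w_critic = w_critic + alpha_critic * delta * phi_s  # standard TD(0) critic update
    # The Cacla heuristic: only when the explored action did better than
    # expected (delta > 0), move the actor's output toward that action.
    if delta > 0:
        predicted = theta_actor @ phi_s               # actor's current (scalar) action
        theta_actor = theta_actor + alpha_actor * (action_taken - predicted) * phi_s
    return theta_actor, w_critic

# Example call with random features, just to show the shapes:
theta, w = np.zeros(4), np.zeros(4)
theta, w = cacla_update(theta, w, np.random.rand(4), np.random.rand(4),
                        action_taken=0.3, reward=1.0, done=False)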
Great presentation, thanks! Here's a question: at 1:03:49 it is stated that the correlation between an action at time T+N and a reward obtained at an earlier time T is zero (and the corresponding terms in the summation are dropped). Is this true? It seems to rest on the assumption that a reward does not depend on future actions. However, the policy gradient equation is about correlation, not causality, and correlation does not require causality: a reward at time T may be correlated with the action taken at time T, which may be correlated with the state at time T+N, which is correlated with the action at time T+N. So there might be a correlation between the action at time T+N and the reward at time T. Here is a more intuitive example: an episode may contain repeated, identical or very similar, states and actions before it terminates (for instance a game where the player and the ball end up in the same situation several times in one episode). In that case the action selected at time T+N is likely to be correlated with, and indicative of, the reward obtained at the earlier time T. I'd be grateful if you could clear this up. Thanks!
You have to take the expectation; those terms are only zero in expectation.
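To spell that out (using the lecture's notation loosely, with H_{t+n} denoting everything observed up to and including S_{t+n}): for n >= 1 the reward R_t is a function of H_{t+n}, and the score has zero mean conditioned on that history,

\mathbb{E}\big[\nabla_\theta \log \pi_\theta(A_{t+n}\mid S_{t+n}) \,\big|\, H_{t+n}\big] = \sum_a \pi_\theta(a\mid S_{t+n})\,\nabla_\theta \log \pi_\theta(a\mid S_{t+n}) = \nabla_\theta \sum_a \pi_\theta(a\mid S_{t+n}) = \nabla_\theta 1 = 0,

so by the tower rule

\mathbb{E}\big[R_t\,\nabla_\theta \log \pi_\theta(A_{t+n}\mid S_{t+n})\big] = \mathbb{E}\big[R_t\,\mathbb{E}[\nabla_\theta \log \pi_\theta(A_{t+n}\mid S_{t+n}) \mid H_{t+n}]\big] = 0.

So even if individual trajectories revisit similar states, these particular terms vanish in expectation, which is all the derivation needs.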
Amazing lectures Hado!
Around 1:02:00, why don't we take the rewards into account when computing p(tau)? In my opinion we should, but since the probabilities P(R|S,A) do not depend on theta, they have no effect on the gradient, so the equalities on the slide are still true.
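For completeness, the full factorisation with the rewards included would be (my notation, roughly matching the slide)

p_\theta(\tau) = p(S_0) \prod_{t} \pi_\theta(A_t \mid S_t)\, p(S_{t+1}, R_{t+1} \mid S_t, A_t),

so

\nabla_\theta \log p_\theta(\tau) = \sum_t \nabla_\theta \log \pi_\theta(A_t \mid S_t),

because the initial-state and transition/reward terms carry no theta. That is exactly why leaving them out of p(tau) changes nothing for the gradient.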
Great lecture! Very clear and very understandable! Thanks so much:)
very useful, thank you
Hi! Could you please explain the ruclips.net/video/y3oqOjHilio/видео.html part? I think we should drop the expectation on the right-hand side, because we already sum everything up weighted by the probabilities. Tell me if I'm wrong.
@a17benayad got it, thanks
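In case it helps anyone else who lands here: I can't see the exact slide from the link, but if the step in question is the score-function (log-derivative) trick, then the sum weighted by probabilities and the expectation are the same object, so nothing needs to be dropped:

\nabla_\theta \mathbb{E}_{A\sim\pi_\theta}[f(A)] = \nabla_\theta \sum_a \pi_\theta(a) f(a) = \sum_a \pi_\theta(a)\,\nabla_\theta \log \pi_\theta(a)\, f(a) = \mathbb{E}_{A\sim\pi_\theta}\big[\nabla_\theta \log \pi_\theta(A)\, f(A)\big].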
Subtitles please!