I like Diana's lectures, very detailed derivations, learned a lot. Thanks!
At 1:16:15, in the evaluation table for the new policy, the q-value for a1 (the right action) at s0 should be 0 and not 0.9, I guess.
The reason it should be zero is that at state s0, under the greedy policy pi_(k+1), the agent never reaches the terminal state. Is that what you're thinking? That's my thought too.
I still find the A in V_(k+1) = A T* V_k very hard to understand. She mentions that "the A means we are going to do this iteration step at k approximately." So what is A? Is it an approximation function? An update operator? What is it?
I feel like Diana intentionally put a buzzer in the theorem proof part to get us to wake up :D
Haha, that's true
What is T*?
I seeee, it's the value function
@@kyouichilogpose8059 It's actually an operator that acts in function space: it takes a value function and returns a new value function. Specifically, T* is the Bellman optimality operator (the operator form of the Bellman optimality equation), which was touched on in her previous lecture.
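To make "operator in function space" concrete, here's a minimal sketch of T* and value iteration on a toy 2-state, 2-action MDP. The MDP here (transitions P, rewards R) is made up for illustration and is not the one from the lecture:

```python
import numpy as np

n_states, n_actions = 2, 2
gamma = 0.9

# P[s, a, s'] = transition probability; R[s, a] = expected reward.
# Toy dynamics (assumed for this example): in s0, a0 stays, a1 moves
# to s1 with reward 1; in s1, a0 stays, a1 moves back to s0.
P = np.zeros((n_states, n_actions, n_states))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 1] = 1.0
P[1, 1, 0] = 1.0
R = np.array([[0.0, 1.0],
              [0.0, 0.0]])

def T_star(V):
    """Bellman optimality operator:
    (T*V)(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ].
    Input: a value function (vector over states).
    Output: a new value function -- hence 'operator in function space'."""
    Q = R + gamma * (P @ V)   # Q[s, a], shape (n_states, n_actions)
    return Q.max(axis=1)

# Value iteration is just repeated application: V_{k+1} = T* V_k.
V = np.zeros(n_states)
for _ in range(200):
    V = T_star(V)
print(V)  # converges to the fixed point V* satisfying V* = T* V*
```

The "A" question from the thread corresponds to *approximate* value iteration: when the state space is too big to store V exactly, each T* step is followed by a projection/fitting step (the A), so the update becomes V_(k+1) = A T* V_k.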
quite a bad teacher