DeepMind x UCL RL Lecture Series - Approximate Dynamic Programming [10/13]

  • Published: Feb 4, 2025

Comments • 12

  • @rita9651 3 years ago +7

    I like Diana's lectures, very detailed derivations, learned a lot. Thanks!

  • @evgeniyv1536 3 years ago +1

    At 1:16:15, in the evaluation table for the new policy, the q-value for a1 (the right action) at s0 should be 0, not 0.9, I guess.

    • @dohyun0047 3 years ago +1

      The reason it should be zero is that at state s0, under the greedy policy pi_(k+1), the agent never reaches the terminal state. Is that your reasoning? That's my thought as well.

  • @barbarajiqinwang7527 1 year ago +1

    I still find the A in v_{k+1} = A T* v_k very hard to understand. She mentions that "the A means we are going to do this iteration step at k approximately." Then what is A? Is it an approximating function? An update operator? What is it?
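
One common reading of this update (a hedged sketch, not a quote from the lecture): T* is the Bellman optimality operator, and A is the approximation step that maps the exact target T* v_k back into a restricted function class, for example a least-squares fit onto features. A minimal Python illustration, where the MDP (P, R), the features phi, and all other names are made up for the example:

    import numpy as np

    num_states, num_actions, gamma = 5, 2, 0.9
    rng = np.random.default_rng(0)

    # Hypothetical tabular MDP: transitions P[s, a, s'] and rewards R[s, a].
    P = rng.dirichlet(np.ones(num_states), size=(num_states, num_actions))
    R = rng.uniform(size=(num_states, num_actions))

    # Hypothetical features phi[s, :] defining the approximation class.
    phi = rng.normal(size=(num_states, 3))

    def bellman_optimality_operator(v):
        """(T* v)(s) = max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) v(s') ]."""
        return np.max(R + gamma * P @ v, axis=1)

    def approximate(target):
        """A: least-squares projection of the target values onto span(phi)."""
        w, *_ = np.linalg.lstsq(phi, target, rcond=None)
        return phi @ w

    v = np.zeros(num_states)
    for k in range(50):
        v = approximate(bellman_optimality_operator(v))  # v_{k+1} = A T* v_k

In this reading, A is not a function of the state but an operator on value functions: it takes the (generally unrepresentable) target T* v_k and returns the closest representable function in the chosen class.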

  • @nasirasadov634 2 years ago

    I feel like Diana intentionally put a buzzer in the theorem proof part to get us to wake up :D

  • @kyouichilogpose8059 2 years ago

    What is T* ?

    • @kyouichilogpose8059 2 years ago

      I see, it's the value function

    • @vslaykovsky 2 years ago +1

      @kyouichilogpose8059 It's actually an operator that acts in function space. Specifically, T* is the Bellman optimality operator that was touched on in her previous lecture.
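
As general background (standard RL notation, not a quote from the lecture), the Bellman optimality operator on action-value functions can be written, in LaTeX, as

    (T^{*} q)(s, a) = r(s, a) + \gamma \sum_{s'} p(s' \mid s, a) \, \max_{a'} q(s', a')

Value iteration repeatedly applies T* to the current estimate; the approximate version adds the A step discussed above, fitting the result back into the chosen function class.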

  • @clockwork6290 4 months ago +1

    quite a bad teacher