Control in Monte Carlo

  • Published: 12 Nov 2024

Comments • 6

  • @swat_katz_tbone 3 years ago +3

    I took the same course when I was a student at IIT Madras. Very glad to be re-watching these lectures to brush up on what I had learned. Thanks, NPTEL.

  • @bzzzzz1736 1 month ago

    I have a doubt: with a deterministic policy, q(s,a) is only estimated for the actions the policy actually picks. In that case, how do we act greedily during the improvement step, given that it requires q-values for all state-action pairs?
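
One common way to handle the situation raised in the question above is to keep the policy soft, for example epsilon-greedy, so that every action keeps a nonzero chance of being sampled (exploring starts are the other usual textbook remedy). The sketch below is only an illustration under stated assumptions: the two-step task, its rewards, and the names `NEXT_STATE`, `REWARD`, `greedy_action` and `mc_control` are all hypothetical and not taken from the lecture.

```python
import random

# Hypothetical two-step episodic task (an assumption, not from the lecture).
# From state 0, action 0 leads to state 1 and action 1 leads to state 2,
# both with reward 0. The second action earns the listed reward and the
# episode ends.
NEXT_STATE = {0: [1, 2]}
REWARD = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 10.0]}
N_ACTIONS = 2


def greedy_action(q, state):
    # max() returns the first maximal item, so ties go to the lowest index.
    return max(range(N_ACTIONS), key=lambda a: q[state][a])


def mc_control(epsilon, episodes, seed=0):
    """On-policy Monte Carlo control with an epsilon-greedy policy."""
    rng = random.Random(seed)
    q = {s: [0.0] * N_ACTIONS for s in REWARD}
    visits = {s: [0] * N_ACTIONS for s in REWARD}
    for _ in range(episodes):
        # Generate one episode with the current epsilon-greedy policy.
        state, trajectory = 0, []
        while state is not None:
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                action = greedy_action(q, state)
            trajectory.append((state, action, REWARD[state][action]))
            state = NEXT_STATE[state][action] if state in NEXT_STATE else None
        # Walk backwards, accumulating the undiscounted return G.
        # Each state occurs at most once per episode here, so first-visit
        # and every-visit Monte Carlo coincide.
        g = 0.0
        for s, a, r in reversed(trajectory):
            g += r
            visits[s][a] += 1
            q[s][a] += (g - q[s][a]) / visits[s][a]  # running mean of returns
    policy = {s: greedy_action(q, s) for s in q}
    return q, visits, policy


if __name__ == "__main__":
    for eps in (0.0, 0.1):
        q, visits, policy = mc_control(epsilon=eps, episodes=20000)
        print(f"epsilon={eps}: policy={policy}, visits={visits}")
```

With `epsilon=0.0` the learner locks onto the first path it tries (state 0, then state 1), so the pairs it never picks keep their initial estimate of zero, which is exactly the problem described in the question. With a small positive `epsilon`, every pair is eventually sampled and the greedy step has an estimate for each action to compare.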

  • @vaibhav4634 5 years ago +1

    Great explanation.

    • @deepaks.m.6709 4 years ago

      Have you understood the math at 19:25? If yes, please tell me the prerequisites to get it. Thanks.

    • @swat_katz_tbone 3 years ago

      @deepaks.m.6709 It's just the definition of expectation.

    • @deepaks.m.6709 3 years ago +1

      Thank you for your reply. Currently on function approximation 😀