I took the same course when I was a student at IIT Madras. Very glad to be re-watching these lectures to brush up on what I had learned. Thanks, NPTEL.
I have a doubt: with a deterministic policy, q(s,a) gets calculated only for the actions the policy actually picks. In that case, how do we act greedily during the improvement step, since it requires q values for all state-action pairs?
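A minimal sketch of one way to look at this, assuming the transition model is known (the dynamic-programming setting): evaluation only needs v for the current deterministic policy, and q(s,a) for every action, including those the policy never picks, can then be recovered by a one-step lookahead. The two-state MDP `P`, the discount `gamma`, and the function names below are hypothetical and only for illustration. Without a model (for example, Monte Carlo estimation), this lookahead is not available, and the usual remedy is to keep exploring, e.g. with exploring starts or an epsilon-soft policy.

```python
# Hypothetical 2-state, 2-action MDP (illustration only):
# P[s][a] is a list of (probability, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9  # assumed discount factor


def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation for a deterministic policy {state: action}."""
    v = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in P:
            # Only the action chosen by the policy is used here.
            new_v = sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


def q_from_v(v, s):
    """One-step lookahead: q(s, a) for every action, using the model and v."""
    return {a: sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a]) for a in P[s]}


def improve(v):
    """Greedy policy improvement over all actions in every state."""
    new_policy = {}
    for s in P:
        q = q_from_v(v, s)
        new_policy[s] = max(q, key=q.get)
    return new_policy


policy = {0: 0, 1: 0}      # deterministic policy: always take action 0
v = evaluate(policy)        # values under that policy
new_policy = improve(v)     # greedy with respect to q computed for all actions
print(new_policy)
```

The point of the sketch is that the greedy step never needs q to have been sampled for the unpicked actions; it computes them from v and the model.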
great explanation
Did you understand the math at 19:25? If so, please tell me the prerequisites for following it. Thanks.
@deepaks.m.6709 It's just the definition of expectation.
Thank you for your reply. Currently on function approximation 😀