For the case at 21:08, it seems to me you could use GCSL to build an interface between the position information and the control of the robot.
At 18:27, what specifically makes the policy update TRPO? Isn't it just a vanilla policy gradient (REINFORCE-type) update, with the cost taken as the negative of the reward?
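For reference, here is a rough sketch of the distinction this question draws, written in standard textbook notation. The symbols ($C$, $R$, $A$, $\delta$, $\theta_{\text{old}}$) are assumptions for illustration and are not taken from the video.

```latex
% Vanilla policy gradient (REINFORCE), written for a cost C(\tau) = -R(\tau):
\nabla_\theta \, \mathbb{E}_{\tau \sim \pi_\theta}\big[ C(\tau) \big]
  = \mathbb{E}_{\tau \sim \pi_\theta}\Big[ C(\tau) \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \Big],
\qquad C(\tau) = -R(\tau)

% TRPO: importance-weighted surrogate objective with a KL trust-region constraint
\max_\theta \;
  \mathbb{E}_{s, a \sim \pi_{\theta_{\text{old}}}}\Big[
    \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)} \,
    A^{\pi_{\theta_{\text{old}}}}(s, a)
  \Big]
\quad \text{s.t.} \quad
  \mathbb{E}_{s}\Big[ D_{\mathrm{KL}}\big( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s) \big) \Big] \le \delta
```

If the update shown at 18:27 has neither the probability ratio nor the KL constraint, it would correspond to the first form rather than the second.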