Some timestamps:
- Exercise 1 : Effect of discount (factor/rate) and noise : at 32:41
- Exercise 2 : Policy evaluation with stochastic policy : at 45:22
- Policy Improvement Idea at 49:21 to 50:10 and 52:55 to 54:12
- Infinite actions : exact methods barely ever work : at 54:25
This RL bootcamp is incredible.
Starts at 1:00
Great lecture. It would be even better if the audience questions were repeated for the recording.
This guy is not messing around.
Great lecture. At 44:04 shouldn't the s in V^(pi)_(k-1) (s) be s'?
I wonder the same.. ^_^
Yes. The prime on the "s" is missing.
Yes, as it's the discounted value of the next/future state.
At 45:13, in the update equations (the last two on that slide), shouldn't s' be in place of s in gamma*V^pi_(k-1)(s) and gamma*V^pi(s)?
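For reference, the corrected policy evaluation backup being discussed above, with the prime restored on the next state (this is just the standard form of the equation, not a claim about anything else on the slide):

V^pi_k(s) = sum over s' of P(s' | s, pi(s)) * [ R(s, pi(s), s') + gamma * V^pi_(k-1)(s') ]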
Great lecture. It would be better if the questions were repeated; we can only guess what they are.
i never thought a UFC fighter would be watching this. props bro
At 20:50, isn't the V*(3,3) supposed to be V*(2,3)?
Juna No, if you hit the wall, you stay in the same state.
@@SayanGHD I can't understand why there are only three possible outcomes, none of which is going west to (2,3), and why the probabilities of going north and south are each 0.1.
I think Juna made a good point, since it is more likely to include (2,3) and (3,2) as possible states, rather than considering walking into a wall and neglecting one possible move to (2,3).
But still, I am not certain about it.
EDIT: At 19:54 he explains it. 80% chance to go where you wanted to go, 10% right and left of said direction. So the robot will not go backwards. Therefore bumping into the walls as explained seems right.
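For anyone following this thread, here is how the backup at 20:50 works out under the model explained at 19:54, assuming the standard 4x3 gridworld from the slides with the +1 at (4,3) and the intended action being east (the exact coordinates are my reading of the slide, so treat them as illustrative):

Q*((3,3), east) = 0.8 * [ R + gamma * V*(4,3) ]   (moved east as intended)
                + 0.1 * [ R + gamma * V*(3,3) ]   (slipped north into the wall, so stayed at (3,3))
                + 0.1 * [ R + gamma * V*(3,2) ]   (slipped south)

(2,3) never shows up because the robot never slips backwards, which is why only three outcomes appear.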
Awesome lectures! Anyone knows where to download the slides?
How can I get the slides of this awesome bootcamp?
I'm still a student and have been studying RL for a while; having the slides would help me refer to them directly when I forget something.
Does anyone know if there are transcripts for these lectures? I especially can't hear the students' questions.
Thank you for the lecture. But I don't get how the evaluation of V in policy iteration can be solved by a linear system of equations. It looks like the unknowns (i.e. V) are on both sides of the equation, so the equations seem nonlinear.
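A quick note on why that system is linear rather than nonlinear: in the fixed-policy Bellman equation, the unknown V appears on both sides but only to the first power, so the terms can be collected into (I - gamma * P^pi) V = R^pi and solved directly. A minimal numpy sketch under that reading, with a made-up 3-state transition matrix and reward vector (P_pi, R_pi and the numbers are illustrative, not from the lecture):

```python
import numpy as np

# Hypothetical 3-state MDP under a fixed policy pi:
# P_pi[s, s2] = probability of landing in state s2 from s when following pi
# R_pi[s]     = expected immediate reward from s under pi
P_pi = np.array([[0.9, 0.1, 0.0],
                 [0.0, 0.8, 0.2],
                 [0.1, 0.0, 0.9]])
R_pi = np.array([0.0, 0.0, 1.0])
gamma = 0.9

# Bellman expectation equation: V = R_pi + gamma * P_pi @ V
# Collecting V on one side: (I - gamma * P_pi) V = R_pi  -- an ordinary linear system.
V = np.linalg.solve(np.eye(3) - gamma * P_pi, R_pi)
print(V)  # exact policy evaluation in one solve, no iteration needed
```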
53:30 poliception
Best lecture in Deep Reinforcement learning
Except there is no Reinforcement Learning in this lecture, only solving a Markov Decision Process for the optimal policy by value/policy iteration. So no Reinforcement Learning, and certainly no Deep Reinforcement Learning. Reinforcement Learning is an approach to solving MDPs without knowing the model; here the model is known.
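To make that distinction concrete, here is a minimal sketch of the planning setting this lecture actually covers: value iteration on a fully known tabular model. The array layout and function name are my own illustrative choices, not the lecture's code:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, n_iters=100):
    """Plan in a known MDP: P[a, s, s2] = transition probs, R[a, s, s2] = rewards."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(n_iters):
        # Q[a, s]: expected return of taking a in s, then using the current value estimate.
        Q = (P * (R + gamma * V)).sum(axis=2)
        V = Q.max(axis=0)            # Bellman optimality backup
    Q = (P * (R + gamma * V)).sum(axis=2)
    return V, Q.argmax(axis=0)       # values and the greedy (approximately optimal) policy
```

In RL proper, P and R would not be given; the agent would have to estimate values from sampled transitions instead.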
Some more typos at various places. In the equation for policy iteration, the last term should contain s' and not s.
exactly
Awesome lecture!!!
Outstanding lecture. Very comparable to David Silver's lectures.
Which one would you recommend? This Bootcamp playlist or David Silver's lectures?
Thank you in advance!
@@volodscoi See this first, then go to David Silver.
Awesome lecture
This is an excellent lecture
great video.
Very well taught lecture!
Could you put this series in a YouTube playlist, please?
Here is the summary: sites.google.com/view/deep-rl-bootcamp/lectures
Great lecture +1
24:41
exercise 1: 4123
nice lecture
He talks insanely fast, seriously.
Bad lesson... So many formulas with no hints about what the terms mean. From the point "Policy Evaluation" onward I understood nothing. Before that I could follow, because the graphs gave some sense of what it was even about. But I still don't know what a policy is, and suddenly there are no graphs, just plain formulas and unexplained terminology. Started okay, but ended confusing.
this just means the bootcamp is not for you
7 min: policy is choosing an action
Poor motivation in this lecture. The idea behind value iteration is itself a look-back from achieving a goal: the look-back is simply a step taken along an episodic path to determine which actions are best for reaching the goal, one step back from the termination point. That is what gives rise to value iteration, since the value is determined iteratively over the many steps needed to carve out the optimal path.
Nevertheless a superb introduction!