47:09 Why is V(pi_max, pi_7) = 2 and not 5, assuming the agent tries to maximize his value while the opponent acts stochastically (i.e., the three branches are worth 0, 2, 5 in expectation)?
Hi, I believe the agent tries to maximize his value under the assumption that the opponent is a minimizer. It is as if you do not know your opponent's next move, but you imagine the opponent is a minimizer and compute the value under that assumption. In that scenario, if my policy is pi_max, I always choose the second branch.
The agent assumes the opponent will give him the min, so he chooses the branch with the highest worst-case value, which is 1 in this case. But the opponent is in fact playing stochastically, so the agent's expected payoff on that branch is 2 instead of the guaranteed 1. The 5 belongs to a branch that pi_max never chooses, so it has nothing to do with the result.
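A quick sketch may make this concrete. The leaf values below are my own assumption, chosen so the three branches have worst-case values -50, 1, -5 and expected values 0, 2, 5, matching the numbers in the question:

# Value of the minimax policy pi_max when the opponent actually plays
# uniformly at random (leaf values assumed for illustration only).
tree = [[-50, 50], [1, 3], [-5, 15]]

def value_pi_max_vs_random(tree):
    # pi_max commits to the branch with the best worst case (min = 1 here)...
    best = max(range(len(tree)), key=lambda i: min(tree[i]))
    # ...but the stochastic opponent then picks a leaf uniformly at random.
    return sum(tree[best]) / len(tree[best])

def value_expectimax(tree):
    # An agent that correctly models the random opponent would instead
    # pick the branch with the highest expected value.
    return max(sum(b) / len(b) for b in tree)

print(value_pi_max_vs_random(tree))  # 2.0 -> V(pi_max, pi_7) = 2
print(value_expectimax(tree))        # 5.0 -> requires a different agent policy

So the 5 is only attainable by an agent that optimizes against the random opponent; pi_max, playing it safe, locks in the branch whose expected value is 2.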
Nice lecture.
These algorithms look cool in theory.
Really good lecture series on reinforcement learning; a good balance of math, theory, and actual implementation details!
Is the eval function the same for the two players in chess?
Not sure why this has so few views; the lectures are high quality and detailed.
@parmoksha Reinforcement learning is also quite popular, bro.