Best lecture I found to understand MCTS!
wish it had better audio quality
If I am not mistaken, because the blue node is the action of the opponent, you should not increase the number of wins in the numerator, only the number of times he/she played this move. 29:52
Yeah, it's a bit weird because I've seen both implementations online, but only increasing the win count from the point of view of the player that made the move in a given node yielded better results for me.
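For what it's worth, here is a minimal sketch of a backpropagation step under that convention; this is an illustration in Python, not the lecture's code, and the node fields (player, wins, visits, parent) are assumptions:

    def backpropagate(node, winner):
        # Walk from the expanded node back up to the root.
        while node is not None:
            node.visits += 1
            # Credit a win from the point of view of the player
            # whose move led to this node (node.player, assumed).
            if winner == node.player:
                node.wins += 1
            elif winner is None:
                node.wins += 0.5  # a common choice for draws
            node = node.parent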
At 28:49, I am not sure that's really true in general. In Gelly and Silver (2007) they show that using better simulation policies in UCT does not necessarily translate to better final MCTS performance.
Yeah, the caveat he gave ("As long as it's not that much more expensive") is crucially important, because a heuristic-based selection or rollout policy can be very detrimental, as shown in a number of papers (including recent work): heuristics take time that could have been used to run more rollouts, and heuristics may fail in complex scenarios where random exploration may succeed. Enhancements are certainly possible, but any approach should be well tested.
Question: given enough time, can MCTS (Monte Carlo Tree Search) find the best solution?
The problem with MCTS is that it chooses the child node with the highest probability of having a solution.
As long as those probabilities don't change, MCTS will choose the same node, no matter how many iterations you perform. That means some leaves (terminal nodes) are unreachable.
If the best solution happens to be in an unreachable leaf, MCTS will never find it.
The problem is how you calculate the probabilities of the child node.
There are formulas like UCT, but I don't know if such formulas are really correct.
The probabilities need to change in such a way as to allow MCTS to traverse every single node and every single leaf of the tree.
I don't think UCT does that.
That means some nodes and leaves are unreachable.
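For reference, the UCB1 rule that UCT uses to rank children is (standard notation, not taken from the video):

    \mathrm{UCT}(i) = \frac{w_i}{n_i} + c \sqrt{\frac{\ln N}{n_i}}

where w_i and n_i are the win and visit counts of child i, N is the parent's visit count, and c > 0 is an exploration constant. Note that these scores are not static probabilities: while a child goes unvisited, ln N keeps growing but n_i does not, so that child's exploration term grows without bound and it is eventually selected again. With c > 0, UCT therefore visits every child infinitely often in the limit, and Kocsis and Szepesvári (2006) prove that it converges to the optimal move given enough time.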
The branching factor of Tic-Tac-Toe is nine at maximum??? Is that right or not?
Bookmark: 47:00
A nice presentation! Thank you guys :D
ruclips.net/video/xmImNoDc9Z4/видео.html As I understand it, we don't store a node from the simulation phase because that node could later appear as a child of another node, and when you compute UCB the result would then be affected by the previous simulation => we'd lose the randomness.
So my question is: am I understanding this correctly? If yes, is this only true with a random strategy?
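As a minimal sketch of how I read that point (the state methods here, copy / is_terminal / legal_moves / apply / winner, are assumptions, not the video's API): the simulation phase plays out on a scratch copy of the state and never allocates tree nodes, so one playout's outcome cannot bias the UCB statistics of nodes that get expanded later.

    import random

    def rollout(state):
        # Simulation phase: no tree nodes are created here.
        state = state.copy()
        while not state.is_terminal():
            move = random.choice(state.legal_moves())  # uniformly random policy
            state = state.apply(move)
        return state.winner()  # winning player, or None for a draw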
32:30
interesting
I doubt the person giving the lecture has ever written a chess or Go program. His explanation of how to terminate the search was not good. Tic-Tac-Toe is a poor example problem; it is too simple. Nim would be a better example.
I am surprised the algorithm uses ln and sqrt functions, as these are time consuming.
Is this all you need to know to use MCTS?
The evaluation routine that is used to evaluate nodes is key. There is no point in searching deep if you don't know what you are searching for, or are searching for the wrong thing.
The ln and sqrt functions come from good math theory regarding the probability of regret: what's the chance that the best option by chance had a few too many losses early in the sampling, while a suboptimal option (a trap, perhaps) had a few wins by chance? The theoretical answer to that is UCB, which includes ln and sqrt. You can try other, cheaper calculations, but I think you will find that although you can run more simulations, the simulations will not be used as effectively.
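A hedged sketch of that score in Python (the function name and c = sqrt(2), the textbook exploration constant, are my choices, not from the video):

    import math

    def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
        # Unvisited children get priority, so every move is sampled once.
        if visits == 0:
            return math.inf
        # Mean value plus an exploration bonus; the sqrt(ln N / n) term
        # shrinks slowly, so a move that looked bad by chance early on
        # is retried before being written off.
        return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

Selection then just picks the child maximizing this score, e.g. max(children, key=lambda ch: ucb1(ch.wins, ch.visits, parent.visits)).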
Test out your theories here www.codingame.com/multiplayer/bot-programming/tic-tac-toe and get a really good insight into tradeoffs. Pure MCTS will do pretty well and you'll have a very hard time beating it. Those at the top of the leaderboard mostly use vanilla MCTS.
@@AntonPanchishin Thanks for the link. I wrote an Othello program back in 1980 and entered it in the first man-machine Othello tournament in Evanston, IL. I met a lot of the chess pioneers there. Later I wrote my own chess program and entered it into USCF chess tournaments, but I never got it to play better than I could. I only had a 386 computer. Also, real life got in the way.
live audiences are so gross. so much coughing