Best Concise intuition building video out there. Thank You
Concise talk. Thank you.
Awesome explanation on MCTS for beginners! Thanks
This is really helpful and easy to understand. Thank you!
Thank you very much! This video gave me a better understanding of MCTS.
Concise, but well explained.
what happens if a selected node has only terminal next states?
Won't nodes with n=0 always get visited due to the infinity in the selection function?
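For reference, the selection rule usually cited for MCTS is UCB1/UCT (the exact form and constant used in the video may differ):

$$\mathrm{UCT}(i) = \frac{w_i}{n_i} + c\sqrt{\frac{\ln N}{n_i}}$$

where $w_i$ and $n_i$ are the wins and visits of child $i$, $N$ is the parent's visit count, and $c$ is an exploration constant; when $n_i = 0$ the value is taken as infinite, so each unvisited child does get tried once before its siblings are revisited.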
As the search tree expands more and more, does that mean it also requires exponential memory?
Not sure I understand the Simulation step. Presumably each environment/game requires that step to be custom tailored to generate some outcome based on a game state. And for many environment/games, the outcome is dependent upon a long chain of steps (which, at this point, aren't executed). Does it just randomly pick steps after that until an end state is reached? That doesn't seem like it should work. Thanks for the talk!
From my understanding, it seems so: it will randomly pick steps until all resources are exhausted. For systems that have infinite resources I have no idea; I guess the programmer would have to explicitly state the maximum number of steps the simulated tree should take before it stops.
Simulation means creating random states constrained by the state of the previous node. As we approach the endgame, this state space gets smaller.
For Go this would mean simulating random legal moves for both sides until the game ends and then registering the result of the game. But I could be wrong.
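A rough sketch of that kind of random playout in Python, assuming a hypothetical GameState object with is_terminal(), legal_moves(), play() and result() methods (these names are made up for illustration, not from the talk):

```python
import random

def rollout(state):
    """Play uniformly random legal moves for both sides until the game
    ends, then return the result (e.g. +1 win, 0 draw, -1 loss)."""
    while not state.is_terminal():
        move = random.choice(state.legal_moves())
        state = state.play(move)   # play() returns the successor state
    return state.result()
```

The result of this playout is what gets backpropagated up the tree.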
jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/
^ this article is a good summary which also links to his Python code examples. Multiple games including Connect4 and Reversi.
Selection and Expansion follow a "Tree Policy" which deterministically chooses the nodes with the best values. Simulation follows a "Rollout Policy" which you can design yourself using heuristics or tools of your choice. If it selects actions randomly you get a really robust but slow-learning Tree Policy. If you replace it with a function approximator you don't even have to do full rollouts (their results are approximated), so you get a faster-learning but more biased Tree Policy.
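A minimal sketch of that split, with UCB1 as the tree policy and random play as the rollout policy (the node/state attributes are assumptions for illustration, not something shown in the video):

```python
import math
import random

def uct_select(node, c=1.41):
    """Tree policy: deterministically pick the child with the best
    UCB1 score; unvisited children (visits == 0) score infinity."""
    def ucb1(child):
        if child.visits == 0:
            return float("inf")
        return (child.wins / child.visits
                + c * math.sqrt(math.log(node.visits) / child.visits))
    return max(node.children, key=ucb1)

def rollout_policy(state):
    """Rollout policy: pure random play. This is the part you could
    swap for heuristics, or for a value network that skips the rollout."""
    return random.choice(state.legal_moves())
```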
great explanation for beginners, thank you so much
Great video, but I still cannot understand how the simulation step works. Can anyone explain it for me, please? Thanks in advance.
Great explanation of how the MCTS works. I have one question:
At 10:34, what is meant by "after enough simulations the MCTS converges to look like values in the minimax"? Suppose minimax returns "1" for winning positions and "0" for losing positions. Suppose for two moves "a" and "b" the MCTS returns the values "9990/10000" and "10/1000000" respectively. Does "9990/10000" correspond to "1" in the minimax tree? Should we choose "b" because it has more simulations?
Or will MCTS never output such values?
Like he said, the tree node with the most simulations will be the one that is picked. A node that is clearly losing won’t have that many simulations either, as shown by the formula. All he means by converging is that the algorithms will eventually agree on what the best move is, and the second- and third-best, etc.; they will converge to the same ranking of moves from best to worst.
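In code form, the final choice at the root is typically made by visit count rather than raw win rate; a sketch using the same hypothetical node objects as the other snippets:

```python
def best_move(root):
    """After the simulation budget is spent, pick the root child with
    the most visits; with enough simulations this ranking tends to
    match the minimax ordering of the moves."""
    return max(root.children, key=lambda child: child.visits).move
```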
Does the root node need to be updated with statistics?
you are my father bro, i love you
This was really clear and helpful, thanks. But you also forgot to mention that MCTS simulations can be cut off after a certain depth.
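A sketch of such a cutoff, assuming the same hypothetical state interface; heuristic_value() is a made-up stand-in for whatever evaluation you plug in at the cutoff:

```python
import random

def truncated_rollout(state, max_depth=50):
    """Random rollout cut off after max_depth plies; if the game is not
    over by then, the position is scored heuristically instead of being
    played out to the end."""
    for _ in range(max_depth):
        if state.is_terminal():
            return state.result()
        state = state.play(random.choice(state.legal_moves()))
    return heuristic_value(state)   # hypothetical evaluation function
```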
strange it took us until 2016 to figure this out when Joshua was doing this back in 1983
Really helpful and easy to understand. Thank you!
In the first example I see each node is a 3×3 square, so does that mean the AI's next move is limited to the 3×3 area? If we enlarge the square, it will increase the time complexity, right?
For the minimax algorithm, yes. For MCTS it just reduces the accuracy if given the same computation time.
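As a rough illustration of that difference: full minimax to depth d explores on the order of b^d positions, so going from a 3×3 region (b ≈ 9) to a 5×5 region (b ≈ 25) at depth 6 means roughly 9^6 ≈ 530,000 versus 25^6 ≈ 244 million nodes, whereas MCTS with a fixed budget just spreads the same number of playouts over more candidate moves, so each move gets estimated less accurately.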
One question. If there is no eval function, to be able to say "good or bad" after doing simulation, do we have to arrive at a final state so that we can say it's a win or loss?
Yep.
Awesome explanation. Thank you!
very clear explanation, thanks!
Thank you very much for this wonderful explanation, keep it up.
Very concise, thank u.
really helpful thank you very much
Thank you!
I wish the UCT tree-search formula had been introduced...
Assume you don't have any knowledge of a "promising" evaluation function, only the win/lose ending condition; you have to place stones randomly somehow, only recording the positions played, the different chains, and the game records.
I conjecture that breadth-first search will solve the Go non-problem optimally, like the maze.
The problem is not defined, the program is not defined, only win/lose.
Thank you for that.
Thanks for the tutorial.
Pyramid scheme
Lol, this dude seems to have no clue what he's talking about.
Interesting video but the presenter does not know much about games and makes a number of inaccuracies.
1. Kasparov beat Deep Blue in 1996 but presenter says the computer won
2. Chess cannot be, and will never be, brute-forced; there are too many possible combinations
3. It's not true that it took longer for a Go computer to beat the top human player (the presenter says "world champion", but Go does not have official world championships like chess does); one can only say it happened later. But that was because there was not much interest in making a Go computer, whereas Deep Blue was the culmination of 40 years of making chess computers. Deep Blue was a significant investment for IBM, including one of the fastest supercomputers of its time, a large top-tier programming team, hiring a full-time chess grandmaster, hiring several other grandmasters as consultants, etc. No one expended any resources on a Go computer until DeepMind. AlphaGo was also orders of magnitude faster than Deep Blue, so if we're talking about computing speed (the presenter's reference to brute force), it applies much more to AlphaGo. There certainly wasn't anything close to 40 years of development in Go computers.
The first two points are true, but the third one is simply wrong. Computer Go research goes back almost as far as chess research, and it was a popular research theme precisely because it couldn't be solved with the same approaches as chess. The Ing Foundation even offered a $1.4M prize, open until the year 2000, to anyone who could write a strong Go program, and it went unclaimed (senseis.xmp.net/?IngPrize). So it's not like there was no incentive to invest resources in the problem.
You're missing a lot of information. The method used for chess was only tractable because of intelligent pruning. For Go, you cannot prune intelligently using hand-made methods, because the state space is so much larger and Go has no well-defined way to determine who's winning (outside of the obvious cases). Go required a whole new way of approaching the search space to become tractable. Consequently, the newer methods used for Go are also orders of magnitude better at chess.
Your point 3 is a gross misstatement. Chess got lots of development and funding because it was on the frontier of feasibility; Go was not. Go actually had lots of CS departments chugging away at it for a very long time. The approaches that worked for chess utterly failed at Go, and that's why CS departments used it as a research problem.