Monte Carlo Tree Search (MCTS) Tutorial

  • Published: 17 Dec 2024
  • Science

Comments • 50

  • @srijan4622 · 4 years ago +18

    Best Concise intuition building video out there. Thank You

  • @kishorepv4924 · 7 years ago +31

    Concise talk. Thank you.

  • @mikeouwen · 2 years ago +2

    Awesome explanation on MCTS for beginners! Thanks

  • @thawsitt · 7 years ago +8

    This is really helpful and easy to understand. Thank you!

  • @멜리사-j3w · 6 years ago +9

    Thank you very much! This video gave me a better understanding of MCTS.

  • @nintishia · 1 month ago

    Concise, but well explained.

  • @418teapot9 · 3 years ago +3

    What happens if a selected node has only terminal next states?

  • @themodderg1067 · 3 months ago +1

    Won't n = 0 nodes always get visited, due to the infinity in the selection function?
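
    In a typical UCT implementation, yes: children with n = 0 are scored as infinity on purpose, so every child is simulated at least once before the exploration term is ever compared numerically. A minimal sketch of that convention (the `Node` fields `wins`, `visits`, and `children` are assumed names, not taken from the video):

    ```python
    import math

    def uct_score(child, parent_visits, c=1.41):
        # Unvisited children score +inf, so each child is tried once before
        # log(N)/n ever has to divide by a zero visit count.
        if child.visits == 0:
            return float("inf")
        exploit = child.wins / child.visits                                # average reward
        explore = c * math.sqrt(math.log(parent_visits) / child.visits)   # exploration bonus
        return exploit + explore

    def select_child(node):
        # Selection step: descend to the child with the highest UCT score.
        return max(node.children, key=lambda ch: uct_score(ch, node.visits))
    ```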

  • @ras4884 · 1 year ago +1

    As the search tree expands more and more, does that mean it also requires exponentially more memory?

  • @KevinHorecka · 7 years ago +8

    Not sure I understand the Simulation step. Presumably each environment/game requires that step to be custom tailored to generate some outcome based on a game state. And for many environment/games, the outcome is dependent upon a long chain of steps (which, at this point, aren't executed). Does it just randomly pick steps after that until an end state is reached? That doesn't seem like it should work. Thanks for the talk!

    • @RaimonTarou · 7 years ago

      From my understanding, yes: it will randomly pick steps until all resources are exhausted. For systems with effectively infinite horizons, I have no idea; I guess the programmer would have to explicitly state the maximum number of steps a simulation should take before it stops.

    • @aamir122a · 7 years ago +6

      Simulation means generating random states, constrained by the state of the previous node. As we approach the endgame, this state space gets smaller.

    • @roccococolombo2044 · 6 years ago +4

      For Go this would mean simulating random legal moves for both sides until the game ends and then registering the result of the game. But I could be wrong.

    • @DarrenSemotiuk · 6 years ago +3

      jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/
      ^ This article is a good summary that also links to his Python code examples, covering multiple games including Connect4 and Reversi.

    • @JO-vj9kn · 5 years ago +3

      Selection and Expansion follow a "Tree Policy", which deterministically chooses the nodes with the best values. Simulation follows a "Rollout Policy", which you can design yourself using heuristics of your choice. If it selects actions randomly, you get a really robust but slow-learning Tree Policy. If you replace it with a function approximator, you don't even have to do full rollouts (their results are approximated), so you get a faster-learning but also more biased Tree Policy.
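
      To make the Rollout Policy concrete, here is a minimal sketch of a purely random rollout (the state methods `copy`, `is_terminal`, `legal_moves`, `play`, and `result` are assumed names for a generic game interface, not taken from the video): it plays random legal moves until the game ends and returns the outcome, which is then backed up the tree.

      ```python
      import random

      def random_rollout(state):
          # Simulation step: play uniformly random legal moves on a copy of
          # the state until a terminal position is reached.
          sim = state.copy()
          while not sim.is_terminal():
              sim.play(random.choice(sim.legal_moves()))
          # result() is assumed to return the game outcome (e.g. +1 / 0 / -1)
          # from the point of view of the root player.
          return sim.result()
      ```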

  • @larry1285 · 5 years ago +5

    great explanation for beginners, thank you so much

  • @iamsmart5255 · 2 years ago +1

    Great video, but I still can't understand how the simulation step works. Can anyone explain it for me, please? Thanks in advance.

  • @0114mercury · 3 years ago +2

    Great explanation of how MCTS works. I have one question:
    At 10:34, what is meant by "after enough simulations the MCTS converges to look like values in the minimax"? Suppose minimax returns "1" for winning positions and "0" for losing positions. Suppose for two moves "a" and "b" MCTS returns the values "9990/10000" and "10/1000000" respectively. Does "9990/10000" correspond to "1" in the minimax tree? Should we choose "b" because it has more simulations?
    Or will MCTS never output such values?

    • @intertextualdialectic1769 · 3 years ago +2

      Like he said, the tree node with more simulations will be the one that is picked. A node that is clearly losing won't accumulate that many simulations either, as shown by the formula. All he means by converging is that the two algorithms will eventually agree on what the best move is, and the second and third best, etc.; they will converge to the same move ranking from best to worst.
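
      In code, the final choice at the root usually follows one of two simple rules, sketched below (the field names are assumed, not from the video): the "robust child" rule picks the most-visited child, while the "max child" rule picks the best win rate. Because UCT stops visiting clearly losing moves, the two rules agree once enough simulations have been run, which is the sense in which the ranking converges toward minimax.

      ```python
      def best_move_robust(root):
          # "Robust child": choose the root child with the most simulations.
          return max(root.children, key=lambda ch: ch.visits)

      def best_move_max(root):
          # "Max child": choose the root child with the highest empirical win rate.
          return max(root.children, key=lambda ch: ch.wins / max(ch.visits, 1))
      ```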

  • @pankaj_pundir · 4 years ago +1

    Does the root node need to be updated with statistics?

  • @Ch3tok · 4 years ago +6

    you are my father bro, i love you

  • @AkshayAradhya · 5 years ago +4

    This was really clear and helpful, thanks. But you also forgot to mention that MCTS simulations can be cut off after a certain depth.
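
    A sketch of what such a cut-off can look like, assuming a hypothetical `heuristic_value(state)` position evaluator and the same generic state interface as in the rollout sketch above (none of these names come from the video): the play-out stops after `max_depth` random moves and returns the heuristic estimate instead of a true game result.

    ```python
    import random

    def truncated_rollout(state, max_depth=20):
        # Random play-out cut off after `max_depth` plies.
        sim = state.copy()
        for _ in range(max_depth):
            if sim.is_terminal():
                return sim.result()
            sim.play(random.choice(sim.legal_moves()))
        # heuristic_value() is an assumed evaluator returning a score in [-1, 1];
        # it stands in for the true game result when the play-out is cut short.
        return heuristic_value(sim)
    ```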

  • @Chekmate99 · 2 months ago +1

    strange it took us until 2016 to figure this out when Joshua was doing this back in 1983

  • @hamedfarahani4281 · 1 year ago

    Really helpful and easy to understand. Thank you!

  • @Dudlykoo · 4 years ago +1

    In the first example, I see each node is a 3×3 board, so does that mean the AI's next move is limited to the 3×3 area? If we enlarge the board, it will increase the time complexity, right?

    • @sieyk · 4 years ago +1

      For the minimax algorithm, yes. For MCTS it just reduces the accuracy if given the same computation time.

  • @ThePositiev3x · 5 years ago +1

    One question: if there is no eval function with which to say "good or bad" after doing a simulation, do we have to reach a final state so that we can say it's a win or a loss?

  • @matthewzhao7507 · 6 years ago +1

    Awesome explanation. Thank you!

  • @egor10_4 · 7 years ago +1

    very clear explanation, thanks!

  • @anmaraljanabi7137 · 3 years ago

    Thank you very much for this wonderful explanation, keep it up.

  • @timwu3423 · 7 years ago +1

    Very concise, thank u.

  • @saeidshamsaliei2536 · 3 years ago

    really helpful thank you very much

  • @mangomilkshakelol · 5 years ago +1

    Thank you!

  • @haejulee259 · 2 years ago

    I wish the UCT tree-search formula had been introduced...
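
    For reference, the UCT (Upper Confidence bounds applied to Trees) selection formula in its usual textbook form (the notation may differ from the video's):

    ```latex
    \mathrm{UCT}(i) = \frac{w_i}{n_i} + c \sqrt{\frac{\ln N}{n_i}}, \qquad c \approx \sqrt{2}
    ```

    where w_i is the number of wins backed up through child i, n_i is that child's visit count, and N is the parent's visit count; the first term rewards exploitation, the second exploration.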

  • @Jkauppa · 3 years ago

    Assume you don't have any knowledge of the "promising" evaluation function, only the win/lose ending condition; you have to place moves randomly somehow, recording only the positions played, the different chains, the game recordings.

    • @Jkauppa · 3 years ago

      I conjecture that breadth-first search will solve the Go non-problem optimally, like a maze.

    • @Jkauppa · 3 years ago

      problem not defined, program not defined, only win/lose

  • @lunique_rudd · 2 years ago

    Thank you for that.

  • @weihe1680 · 6 years ago

    Thanks for the tutorial.

  • @anonymousnobody5001 · 2 years ago +1

    Pyramid scheme

  • @user-qi5kb5th7y · 2 years ago +1

    Lol, this dude seems to have no clue what he's talking about.

  • @fsgfaf · 6 years ago +3

    Interesting video, but the presenter does not know much about games and makes a number of inaccurate claims.
    1. Kasparov beat Deep Blue in 1996, but the presenter says the computer won.
    2. Chess cannot be, and will never be, brute-forced; there are too many possible combinations.
    3. It's not true that it took longer for a Go computer to beat the top human player (the presenter says "world champion", but Go does not have official world championships like chess does); one can only say it happened later. But that was because there was not much interest in making a Go computer, whereas Deep Blue was the culmination of 40 years of work on chess computers. Deep Blue was a significant investment for IBM, including one of the fastest supercomputers of its time, a large top-tier programming team, the hiring of a full-time chess grandmaster, several other grandmasters as consultants, etc. No one expended comparable resources on a Go computer until DeepMind. AlphaGo was also orders of magnitude faster than Deep Blue, so if we are talking about computing speed (the presenter's reference to brute force), it applies much more to AlphaGo. There certainly wasn't anything close to 40 years of development behind Go computers.

    • @bitti1975 · 5 years ago +4

      The first two points are true, but the third one is simply wrong. Computer Go research has a history almost as long as computer chess, and it was a popular research theme precisely because it couldn't be solved with the same approaches as chess. The Ing Foundation even offered a $1.4M prize, open until the year 2000, to anyone who could write a strong Go program, and it went unclaimed (senseis.xmp.net/?IngPrize). So it's not like there was no incentive to invest resources in the problem.

    • @sieyk · 4 years ago +3

      You're missing a lot of information. The method used for chess was only tractable because of intelligent pruning. For Go, you cannot prune intelligently using hand-crafted methods, because the state space is so much larger and Go has no well-defined way to determine who's winning (outside of the obvious cases). Go required a whole new way of approaching the search space to become tractable. Consequently, the newer methods used for Go are also orders of magnitude better at chess.

    • @michaelharrington6698 · 2 years ago +2

      Your point 3 is a gross misstatement. Chess got lots of development and funding because it was on the frontier of feasibility; Go was not. Go actually had lots of CS departments chugging away at it for a very long time. The approaches used for chess utterly failed at Go, and so CS departments kept it as a research problem.