This is a ridiculously good explanation for MCTS. Keep teaching, you're really good at it
Best explanation for MCTS
The visualization makes this algorithm so easy to understand! Thanks for your great work, professor!
I watched this vid like 10 times to make my Monte Carlo agent. Thanks man
You're very welcome!
You have a phenomenal talent for teaching. Clarity, pace, and conciseness were brilliant. Thanks.
Honestly, the best explanation for MCTS so far. This helped greatly, thank you ☺
After reading about this method, the rollout was confusing. The verbosity of your example completely eliminates that confusion. Thank you.
The explanation of why the algorithm tries unvisited nodes first (the → ∞ case) helped me a lot. Nobody else emphasised this so simply. Thanks
Oh my god, this video is so great, the explanation is so clear, all with flowcharts, pseudocode, and visualization. Thank you for providing this wonderful learning material
Thank you for your explanation. Your video was the only source where I could understand this algorithm. It really helps with writing my seminar paper
You're very welcome! Glad it was of use.
This is the best step-by-step explanation of MCTS! You saved my day! Thanks!
Amazing class, thank you so much for it! You made a very understandable class in under 16 minutes, that's impressive.
I came here after watching many vids on MCTS, and none of them were able to make me understand the concept, but this video is the best
At the beginning, your root node's visit number is 0, but you expand it instead of doing a rollout. I know it is the right way, but then you should alter your algorithm: on the left board you said "rollout if the n_i value is 0".
You can do this by saying the root node has the visit value 1 from the start.
Don't you expand and then roll out just like he did, meaning that the visits would be 0?
Beautifully explained
5:25 - It's not the smartest idea to take them in numerical order; it will somewhat bias the algo. I think a better policy would be a uniformly random one, or to put some priors over the actions using some heuristic, or better yet an ML model (maybe a policy DNN as in the case of AlphaGo), in case you know some specifics of the problem you're trying to solve with MCTS.
It's also maybe worth mentioning that the expansion threshold doesn't necessarily have to be 1; it's a hyperparameter of MCTS. So maybe you'll have to visit the node 40 times before expanding it.
Finally, in order to pick the best action at the end, sometimes you don't want to choose it by looking at the average reward of the child states but simply by the visit count (a quick sketch of both policies follows below) - but I guess this is problem-specific.
Loved the walk-through, thank you John! Great pedagogical methods you've got.
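P.S. A minimal sketch in Python of the two final-selection policies I mean (the node attributes `visits` and `total_value` are just illustrative names, and I'm assuming every child has been visited at least once):

```python
# Pick the final move from the root's children - a sketch, not the video's code.
def best_child_by_value(children):
    # "max child": highest average reward, as in the video
    return max(children, key=lambda c: c.total_value / c.visits)

def best_child_by_visits(children):
    # "robust child": most-visited child, often preferred in practice
    return max(children, key=lambda c: c.visits)
```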
Thanks for the insights!
Sir, thank you so much for the explanations. Even someone dumb like me now feels ready to apply MCTS to my tic-tac-toe game! If teachers like you had taught me when I was a kid, my life would definitely be different. Just can't thank you enough for this lecture!
One of the best explanations of MCTS
By far, the simplest explanation of MCTS. Thank you so much.
Thank you for the video, you explained the individual steps very well. That helps me a lot; above all, the expansion step was the one I didn't understand until now.
You're very welcome! It took me a while to work out how the expansion step works, since the standard explanation offered is a bit unclear on this point.
Thank you so much, because this is the first source I've found that gives a real worked example.
So once again, thank you so much. I think I'll now be able to implement it in C for my Connect 4 :)
Thank you so much! The explanations are very clear, you make it sound like it's easy! What's with the human calculator, by the way?
You did a great job explaining the process clearly.
What you did is admirable sir, great thanks for the explanation. Saved my day.
Great explanation of the algorithm, showing all the elements which are needed to get a basic understanding of MCTS. Thank you.
By far the best explanation I've seen. Thank you.
Didn't understand why after backpropagation on S5, S0 became 44 if it was 30 and S2 = 24. I thought we should add S0 = 30 plus S2 = 24 so S0 would be 54!
By the way, best explanation EVER! Thanks a lot for sharing.
Pedro Coelho I believe we just backpropagate the value we obtained after the rollout. So if we got 14, we just add that 14 to each of the parents all the way up to the root.
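To make the arithmetic concrete (going by the numbers in the video as I read them): the rollout below S5 returns 14, so every node on the path back to the root gets +14 and one extra visit - S5 (freshly expanded) goes from 0 to 14, S2 from 10 to 24, and S0 from 30 to 44. S0's total isn't the sum of its children's totals; it's the sum of all rollout values that have passed through it.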
Incredibly clear, after just one watch I now understand MCTS
Great video. Explaining things clearly is hard. You make it look easy John.
Thank you for such a clear explanation of MCTS.
Great video. It was well explained and easy to follow. I'd love to see other videos related to deep learning if you have them.
Thank you so much for the motivation this gives me to not give up, its a lot easier to believe in myself after understanding these concepts!
The best explanation I can find.
Your explanations are much better than the Wikipedia page. Congrats! I thought MCTS was only applicable to games in which you have an opponent; glad to see that's not the case, and I will apply it to my problem. Thanks!
The best explanation of MCTS ever. And it looks like the Wikipedia article is not correct.
Wow, this is liquid gold. Thanks John.
Nice to hear from a fellow Brit and not MIT OCW! Big up America too, though.
Thanks a lot for the lecture, it was really helpful :)
You're very welcome!
Great video! At 15:03, is t_0 supposed to be 54 not 44?
Thank you so much! I'm working on my FYP and this is the best explanation of MCTS out there!
Thank you Prof! The worked example makes the absolute difference!
This is pure gold! Thank you infinitely
The best explanation. Thank you!
Thanks, the best explanation of MCTS. I'd like to understand how it would work in a game with multiple players.
First of all, thank you for the excellent lecture and explanation! The algorithm is very clear and understandable from the way it's presented.
My question is: in the end you mention that what this gives us is that we should take action a2 over a1. Why can't we take it a step further and also infer that, for example, a5 is better than a6 (assuming we have visited both of them more times than shown in the example)? Why do we use a whole search tree for just one decision, and then scrap it and rebuild it anew, instead of making more sequential decisions using the same tree?
Your videos are so helpful. Thank you for posting these online!
Very good presentation. Thank you! I really needed clarification on whether to increment visit count BEFORE or AFTER backpropagation. Thanks for this :)
Truly incredible explanation.
This is great, thank you! But why is the value found at the terminal node *added* to the nodes above? Is that simply so we can get the average expected value of a certain state by dividing by the number of times it's been visited?
Thank you! Explained it better than my professor ever did
What a great explanation! I've heard other explanations, but this is the condensed one I'll refer to in the future!
Very clear explanation of the algorithm and the video is also well made! Thank you!
The content is so exciting even the camera shares the feeling.
Thank you so much for this video, really helped me a lot.
I just have one question : when expanding, why do you only roll out from one of the children? Wouldn't it give a more accurate score for the parent if we simulate all its children?
Good question. In general, MCTS is trying to make the best decision it can from the limited number of rollouts that it can make in the time available. Therefore, every time it does a rollout, it updates the whole tree and then uses this updated tree to decide where to sample next - it could be that sampling only a single child will tell us that the node above is a poor choice, and so we will start sampling elsewhere in the tree.
A monument to clarity. Thank you very much.
Very nice explanation! Thank you!
I have a question: what should I do if I am at a leaf node that should be expanded, but the state of this node is terminal?
Best possible explanation...
Very clear lecture. Great explanation! Thank you!
Thanks a lot for sharing this video, Sir. It was really very helpful.
Could you explain what is behind the rollout (simulation) in your example? Take an example like tic-tac-toe.
According to your flowchart, shouldn't you do a rollout immediately from S0 since it is a leaf node and it has never been visited?
In more specific applications such as chess, during the rollout phase can/should you weight the possible actions by guesses for how good they are (like moves that take a piece, put the opponent in check, etc.) instead of doing it purely randomly? Would that improve the algorithm on average?
Yes. Indeed, most advanced implementations of MCTS have some form of heuristics to favour "good" moves during the rollout. AlphaGo, for instance, used neural networks both to favour good moves during the rollouts and to estimate the terminal value at the end of the rollout (since playing the rollout to the end would take too long).
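As a rough sketch of that idea in Python (the `score_move` heuristic is hypothetical - e.g. it could score captures and checks higher - and isn't taken from the video or from AlphaGo):

```python
import random

def biased_rollout_move(state, legal_moves, score_move):
    # Weight each legal move by a heuristic score, then sample, so the
    # rollout is biased towards "good" moves but still randomised.
    # Weights must be non-negative and not all zero for random.choices.
    weights = [score_move(state, move) for move in legal_moves]
    return random.choices(legal_moves, weights=weights, k=1)[0]
```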
Sorry, I have a question: when we do a rollout from a leaf node, do we start backpropagating the value from that leaf node up to the root, or from the node where the rollout ended up to the root?
I really appreciate the algorithm trace.
Question: given enough time, can MCTS (Monte Carlo Tree Search) find the best solution?
The problem about MCTS is that it chooses the child node with the highest probability of having a solution.
As long as those probabilities don't change, MCTS will choose the same node, no matter how many iterations you perform. That means some leaves (terminal nodes) are unreachable.
If the best solution happens to be in an unreachable leaf, MCTS will never find it.
Great explanation, thanks! A quick question about how the algorithm is constructed for the leaf nodes: it does a rollout on the first visit, on the second visit it expands the tree and rolls out, and on any subsequent visit it just traverses using the UCB1 formula. Is my understanding correct?
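To make my understanding concrete, this is roughly how I picture the leaf-node step (a sketch only - the node attributes and the rollout/backpropagate helpers are my own assumptions, not taken from the video):

```python
def handle_leaf(node, rollout, backpropagate):
    # First visit: roll out straight from this leaf.
    if node.visits == 0:
        backpropagate(node, rollout(node.state))
    # Later visits: expand the leaf, then roll out from one new child.
    else:
        node.expand()
        child = node.children[0]  # which new child to pick is a policy choice
        backpropagate(child, rollout(child.state))
```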
I've been searching for ages for a proper explanation of MCTS, and this is the best of all. Thanks for taking the time to make this! Just one question, however: how did you get the value of the leaf nodes on rollout? I see that you randomized until a terminal state. Do you use an evaluation function there?
The idea is that the rollout should end in a state that really is terminal - the end of the game. For example, if we're playing tic-tac-toe, the terminal state is a win for one player or a draw because there are no moves left. In my example run, I'm imagining a game with a more sophisticated scoring system than tic-tac-toe, so the score that I'm using is the reward of the terminal state to the player that's making the decision. Hope this helps!
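For tic-tac-toe, for instance, the reward at the true terminal state could be as simple as this sketch (my own toy convention, not taken from the video):

```python
def terminal_reward(winner, player):
    # winner is 'X', 'O' or None for a draw; player is the side making the decision.
    if winner is None:
        return 0                            # draw
    return 1 if winner == player else -1    # win or loss
```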
John Levine Thanks for the reply. However, in the expansion phase we're adding possible moves we can make, but before that, how can we simulate where the enemy will place his move?
John Levine If I elaborate: in the expansion phase, our current state is the state where the AI placed the stone. In a turn-based game, the nodes added to S2 in the example in the video should be the opponent's moves. How do you simulate that?
In a 2-player game the opponent will choose moves that minimise the first player's score, while also choosing moves that keep the exploration term high. So the easiest way to do this is to put a negative sign in front of the first term, and then return the move that gives the maximum value for this new formula.
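A rough sketch of that sign flip in Python (the exploration constant and attribute names are illustrative, and unvisited children are assumed to be handled separately with the usual "infinity" rule):

```python
import math

def uct_value(child, parent_visits, c=math.sqrt(2), opponents_turn=False):
    exploit = child.total_value / child.visits   # average reward for player one
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    # At the opponent's decision nodes, negate the exploitation term and
    # still take the maximum, as described above.
    return (-exploit if opponents_turn else exploit) + explore
```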
Hey, I want to learn monte carlo tree search, but I also want to feel like I'm going deaf in one ear sporadically... this video ---> "I got you"
It's a little weird that we don't account for the first rollout of S1 in its children (we can't because we don't keep the path visited). It looks like once we get to a child of S1 we "forget" that first rollout.
Thank you sir for the great explanation!
A really cool explanation of MCTS. Just some pointers: I never really understood what V_i stood for (I understood it towards the end of the video); it would have been better if you had replaced it with simply t/n in the formula
Hello, thank you for the explanation, it made things a lot clearer. I do have one question: do we save the path taken during the rollouts in order to ensure that during subsequent rollouts we don't take the same path?
Rollouts are discarded and only their outcome affects following iterations (source DOI: 10.1109/TCIAIG.2012.2186810, p5-9)
Why is there the logarithm and the square root in the UCB formula? What would have changed if there had been the N/n_i directly?
Amazing explanation. Thank you. But I'm not sure how to get the terminal value of a leaf state. Is it the average score you get after you do random actions from this point?
On what basis do you give a value to the unvisited nodes, or do you just generate them randomly?
Straightforward explanation
amazing walkthrough, thank you
Thank you for helping me in my AI studies... you're great.
6:16 - how do we compute the value of the terminal state? (For instance, you said v = 20.) I'm trying to apply MCTS to an easy case like tic-tac-toe ^^
Thank you very much! My game of Othello is fully working now! :)
According to the flowchart, shouldn't the start node itself be considered as a leaf node at the start?
Super clear explanation!!! 🌟🌟🌟🌟🌟
Thank you for the explanation. It was clear!
Very good and easy to follow explanation!
Thank you! Glad you liked it.
Great video! However, unless I am mistaken, the t (reward) value for s_0 should be (24 + 20)/2, the average value of the child nodes of the tree.
Question: the professor says that after the algorithm has run out of iterations and budget, pick the child with the highest score; most textbooks I consulted say to pick the child with the highest number of VISITS, because picking the one with the highest score leads to bias.
Great explanation. You are slow and clear. It might be nice to mount the camera. Question: What is the point of back propagating the t value of the initial state (S-zero)?
Thanks Ben! This was our first video - we mounted the camera for the next one and then used an external microphone, which helped too. You're right that there's no real point in calculating the score of S0, except to find out what the estimated value of the game is to the agent - if you have a choice to play or not to play, you might decline if the estimated value is negative. But if you have to play anyway, you can omit this calculation.
@@johnlevine2909 If I may, I will offer a practical take on the question of propagating all the way up. From pure intuition I would imagine the backpropagation code to be cleaner if it is allowed to propagate all the way up to the root node. Sure, it's useless in most cases, but I imagine we can skip a subjectively ugly if-statement checking whether the current node is the root (and, practically, a computational step). I'm thinking something as simple as
propagate(t) => { this.n++; this.t += t; if (this.parent != null) this.parent.propagate(t); }
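In Python the same idea might look like this minimal sketch (class and attribute names are mine, not the video's):

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.n = 0      # visit count
        self.t = 0.0    # total value

    def propagate(self, t):
        # Add the rollout value and one visit, then recurse up to the root.
        self.n += 1
        self.t += t
        if self.parent is not None:
            self.parent.propagate(t)
```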
Great video! Helped me a lot, thanks.
If S2 is better than S1, should we choose S3 or S4? Is there a rule for that in the MCTS algorithm, or does it differ between implementations?
Lifesaver, great run-through
Very helpful video! But I'm still confused when I consider stochastic games such as Klondike. In those games, one action doesn't correspond to only a single next state. Am I missing or misunderstanding something? Can anyone help me?
Great lecture and vivid example! Thanks a lot! ^_^
Thanks for the great explanation! I have one question though: If we now decide after four iterations to choose action a_2 as you showed in the video, then the next four iterations would start in s_2 to determine the next action. For this, do we use the results of the calculations from previous iterations or do we "throw everything away" and start with all values initialized to 0?
Let's say we do select a_2 for execution. So we perform a_2 and go into state s_2, as you say. s_2 now becomes the new root node, and we need to decide what action we should take next. As you also say, we already have some results stored in the tree which are useful to us, so it makes sense to start the next iteration using the sub-tree with s_2 as the root node.
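A minimal sketch of that reuse (the `children` dict keyed by action is my own assumption about the data structure, not the video's):

```python
def advance_root(root, action_taken):
    # Keep the statistics already gathered under the child we actually
    # moved to, make it the new root, and let the rest of the old tree go.
    new_root = root.children[action_taken]
    new_root.parent = None
    return new_root
```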
Very nice explanation, but I still don't get what the rollout phase does. Is it a continuation of choosing random states until the algorithm hits some terminal criterion (with the subsequent evaluation)?
Great explanation!! Thank you a lot!! Keep up the good work!
What's the difference between a leaf node and a terminal state?
Really great lecture. Continue the good work :)
Thank you for the great lecture.
John Levine and Josh Starmer are carrying me so hard my feet aren't even touching the ground
Why is the value of V after the rollout of S1 20, while in the case of S2 it is 10? Where are these values coming from?
They are seemingly arbitrary values for a terminal state in this setting. It might make more sense to relate this to chess, where the value of such a rollout is either +1 (win), 0 (draw), or -1 (loss), which is then propagated upwards.
I have a question about the UCB1 formula: what exactly is N? Is it the number of visits of the parent node, or the number of visits of the root node?
The root node was always the parent node in all the calculated examples, so I'm not 100% sure about that.
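My current understanding of the formula, for reference (I believe N is the parent's visit count in standard UCT, which just happens to coincide with the root's count in the video's examples - please correct me if that's wrong; the names are mine):

```python
import math

def ucb1(child_total, child_visits, parent_visits, c=math.sqrt(2)):
    if child_visits == 0:
        return math.inf               # unvisited children are tried first
    return (child_total / child_visits
            + c * math.sqrt(math.log(parent_visits) / child_visits))
```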