AlphaGo - How AI mastered the hardest boardgame in history

Поделиться
HTML-код
  • Опубликовано: 31 дек 2024

Комментарии • 114

  • @kkkkjjjj4517
    @kkkkjjjj4517 7 лет назад +170

    7:24 Your explanation of MCTS is not correct. For one instance of simulation: It picks the top move recommended by the network (greedy) most of the time, with random moves some of the time (epsilon). Then it walks into that move and repeats the same. It does it to completion. Then it backs up and keeps track of win vs visit ratio for each state as shown in the picture. It repeats this whole process 1600 times. As it is performing these walkthroughs it trains the networks and updates the values. So eventually, the more often you see a state, it will statistically converge to optimal value. MCTS runs to completion, its not a depth pruning algorithm. Temporal Difference stops somewhere in the middle, this was not used in AGZ. MCTS algorithm is discussed by David Silver in his lecture #8 towards the end.

    • @ArxivInsights
      @ArxivInsights  7 лет назад +87

      I checked the paper and you are indeed correct! The used MCTS doesn't always play out every thread until the very end of the game (they use value tresholds for early stopping) but I did misinterpret the meaning of the '1600 simulations', thanks for pointing this out!

    • @andreys7944
      @andreys7944 6 лет назад +2

      Do I get it correct: with average depth ~300 moves and no early stops that will be ~1600*300 network queries just for the first move?

    • @edeneden97
      @edeneden97 5 лет назад

      so it plays out and if he wins then those moves he picked are trained to be 1 and value 1? and if he losses everything 0?

    • @DiapaYY
      @DiapaYY 5 лет назад +1

      AGZ doesn't really use MCTS as it doesn't use rollouts, it doesn't play the game out to the end.

    • @philippheinrich7461
      @philippheinrich7461 5 лет назад

      So I am going down the search tree (based on an algorithm that takes exploration and other things into account), until I reach a leaf node.
      I put the leaf node position into my neural net and get a policy and an evaluation as a result. The policy adds new leaf nodes to my current position
      and the value function gives me an evaluation of my current position.
      Is this correct ?

  • @noranta4
    @noranta4 7 лет назад +55

    This is a valuable explanation, this channel is a great discovery

    • @ArxivInsights
      @ArxivInsights  7 лет назад +4

      noranta4 thanks man, just started two weeks ago ;) More vids coming up :p

  • @alonamaloh
    @alonamaloh 6 лет назад +19

    I've been programming board game engines for 25 years and I've followed the development of CNNs to play go quite closely. This video is a really good description of the AlphaGo Zero paper, with very clear explanations. Well, the explanation of MCTS was completely wrong, but other than that this video was great. I'll make sure to check out more from this channel.

  • @SantoshGupta-jn1wn
    @SantoshGupta-jn1wn 7 лет назад +45

    You explanation skills are fantastic! I like how he has an outline at the begging of his video, very simple thing yet very effective when it comes to teaching a subject, yet so few educational videos do that.
    If I were to figure out the paper by myself, it would have taken me personally ~2x longer.
    Subscribed.

  • @shamimhussain396
    @shamimhussain396 6 лет назад +10

    We, humans, run simulations in our heads all the time because sometimes simple intuitions are not enough... So, I guess, it isn't surprising that inclusion of Monte Carlo Tree Search would always drastically improve performance no matter how good the value function estimates are, even with the help of deep learning... The question is how to search more efficiently and also how to build an efficient model...

  • @elishaishaal7958
    @elishaishaal7958 Год назад

    Thank you! This is the one of the clearest and most concise explanations of any paper I've found thus far.

  • @antonystringfellow5152
    @antonystringfellow5152 7 лет назад +3

    Clearest and most informative video I've seen on AlphaGo. Thanks!

  • @dankelly
    @dankelly 2 года назад

    Awesome explination! (And, you're greenscreen work looks great!)

  • @augustopertence2804
    @augustopertence2804 7 лет назад +7

    Best explanation I found about AlphaGo Zero

  • @clrajapaksha
    @clrajapaksha 6 лет назад +2

    You explained technical stuff very clearly. Thanks Arxiv Insights

  • @arijit07
    @arijit07 4 года назад

    This is the best video regarding Alpha GO paper. Just Amazing !!!

  • @Sl4ab
    @Sl4ab 3 года назад

    It's very clear, thank you! I can't wait to discover the other videos :)

  • @Hyrtsi
    @Hyrtsi 4 года назад +2

    Excellent explanation, thanks!! I'm going to make my own 9*9 alphago zero version

    • @JosiahPlett
      @JosiahPlett Месяц назад

      I couldn't find it on your GitHub friend

  • @AlessandroOrlandi83
    @AlessandroOrlandi83 5 лет назад

    Thank you for taking the time to explain it so well. Still difficult for me that I'm not familiar with the matter yet, but you did really a good job of showing it clearly!

  • @2000chinadragon
    @2000chinadragon 7 лет назад

    Fantastic explanation! Few people balance simplicity with thoroughness as well as you do.

    • @ArxivInsights
      @ArxivInsights  7 лет назад +1

      That's the goal indeed, thx for the feedback :)

  • @generichuman_
    @generichuman_ 2 года назад

    The part I don't understand, is how they dispense with rollout in MCTS. It seems like this is the only way to get a ground truth value ( by reaching a terminal state) which can then be propagated back up the chain. If you reach a non terminal state, you're back propagating a value from the policy network which won't have useful values until it can be trained on useful values from the tree search. It seems like it's pulling itself up by it's bootstraps. Is it the case that the true values come from the odd time that a simulation reaches a terminal state? Or am I missing something fundamental?

  • @curiousalchemist
    @curiousalchemist 7 лет назад +17

    Brilliant - thanks for this! Really enjoyed watching and I think it takes away all the right information from the paper.
    Just a quick point: is there any chance you could quieten down the background music for your next video? It was slightly distracting and I think it detracted a bit from your great explanation!
    Merry Christmas!

    • @ArxivInsights
      @ArxivInsights  7 лет назад +6

      Thanks a lot, great to hear :) And for the background music: I got the same feedback from a few different people! This was my first video (every other video you'll find on my channel has this fixed) :p

  • @tenacityisthekey
    @tenacityisthekey 3 года назад +1

    Does anybody know if the shape of the output layer changes for every phase of the game? In the video, he explains that the network produces probability distribution over possible moves and the number of possible moves is dynamic. Does that mean the output layer's dimension is also dynamic? If so, how is it achieved? Can anyone help me understand? Thanks!

    • @dshin83
      @dshin83 Год назад

      No, the output layer shape is static. You need to zero out the illegal moves from the output and then renormalize the probabilities to sum to 1.

  • @johnvonhorn2942
    @johnvonhorn2942 4 года назад +1

    Xander, you look like "The Hoff" (David Hasslehoff) and that's a great look!

  • @ericfeuilleaubois40
    @ericfeuilleaubois40 7 лет назад

    Damn great video! Carry on ! Makes it very easy to get into these advanced subjects :)

  • @zzewt
    @zzewt 10 месяцев назад

    This is cool, but after the third random jumpscare sound I couldn't pay attention to what you were saying--all I could think about was when the next one would be. Gave up halfway through since it was stressing me out

  • @ruanjiayang
    @ruanjiayang 4 года назад

    I am wondering what the output "policy vector" is like in the neural network

  • @RomeoKienzler
    @RomeoKienzler 5 лет назад

    7:27 u said "certain depth", did u mean "certain width"? btw. I'd say this is one of the very best channels on the DL topic I've ever seen! thanks so much!

  • @guitarchessplayer
    @guitarchessplayer 7 лет назад +2

    Thanks for the great explanation! Im still wondering how alpha go zero learns that certain moves are obviously bad like playin in the corner for example without playing a game till the end?

  • @SiavashFahimi
    @SiavashFahimi 5 лет назад +1

    Thank you, finally I found a good video on this paper.

  • @PALYGAP
    @PALYGAP 6 лет назад

    A little question on the AlphaGo Zero MCTS. The Monte-carlo aspect of the Alpha Zero MCTS seems to be gone AFAIK. Can't see random number or random choices in that MCTS. It seem to be replaced by the CNN calculating the probability of a board position leading to victory. What's your take on it ?

  • @Bippy55
    @Bippy55 2 года назад

    10 Nov 2022 - I just discovered this video and your channel. Fantastic explanation of granted a difficult subject to even tackle. Did you mention what kind of computer hardware the newest AlphaGo system uses? I assume it’s a mainframe of some type. Also, I wonder if the system can decide in advance to play a peaceful game or a highly combative game? I have seen games where they were very few prisoners taken off the board. Otherwise called the peaceful game. Still there is a winner nonetheless. Anyway bravo for an excellent video.

  • @joker_nomad
    @joker_nomad 7 лет назад

    This helps a lot for those who need insights of machine learning trends :)

  • @yolomein415
    @yolomein415 5 лет назад +1

    How is the value representation trained?

  • @HanhTangE
    @HanhTangE 5 лет назад

    Great summary of the paper! Thank you :)

  • @welcomeaioverlords
    @welcomeaioverlords 5 лет назад

    Excellent video, thanks for making it!

  • @siddarthc7091
    @siddarthc7091 Год назад

    the transition 'dhkk' hits hard

  • @columbus8myhw
    @columbus8myhw 7 лет назад +7

    You should open this up to community captioning

  • @Moonz97
    @Moonz97 6 лет назад

    Loving your channel!

  • @shafu0x
    @shafu0x 6 лет назад +1

    Thank you for this great explanation!

  • @railgunpat3170
    @railgunpat3170 5 лет назад

    wow, i see some mistakes and also I didn't watch to much of your videos, but i find this channel is definitely underrated

  • @matrixmoeniaclegacy
    @matrixmoeniaclegacy 5 лет назад

    Thank you for this valuable explanation!
    I just want to request, if you would like to highlight the parts of your images more, that you are talking about? E.g. in the diagrams, you show. This would make it easier to follow!

  • @Leibniz_28
    @Leibniz_28 5 лет назад

    Excellent explanation, thanks

  • @PesarTarofi
    @PesarTarofi 6 лет назад +3

    cant wait for this thing to perform in sc2

  • @alaad1009
    @alaad1009 Год назад

    Excellent video

  • @zee1645
    @zee1645 6 лет назад

    did u guys teach alphago how to beat security systems yet? and take over teh stock market and all nuclear launch codes?

  • @siskon912
    @siskon912 6 лет назад

    Great explanation. Thank you!

  • @thiru42
    @thiru42 5 лет назад +1

    The history (extra 7 layers) is also used to identify ko (kind of similar to threefold repetition in Chess)

  • @brahimelmssilha7234
    @brahimelmssilha7234 6 лет назад

    maaaan you are doing a great work keep up

  • @LaurentLaborde
    @LaurentLaborde 4 года назад

    i'm confused as your explanation contradict the points you mentionned in the introduction

  • @diracsea2774
    @diracsea2774 6 лет назад

    Excellent Presentation

  • @LOGICZOMBIE
    @LOGICZOMBIE 3 года назад

    GREAT WORK

  • @seleejanegaelebale9192
    @seleejanegaelebale9192 4 года назад

    Thanks for the impressive explanation, where can I learn the source code ?

  • @SreeramAjay
    @SreeramAjay 7 лет назад

    Wow, this is really a great explanation

  • @truehurukan
    @truehurukan 5 лет назад

    Thank you very much for the effort to educate the ignorants about the mechanisms linked to Alpha Go that is taken from a lot of ignorants like a "beast" like "terminator-like machine"... for me to simplify, I would say that the Go champion played versus 50000 professional go players -> no chance to win at all. As kasparov failed winning against 50000 human amateur players in this last decade.For me this is the massive parallel processes and recursive functions that beated the champion, technically beaten, this is definetly NOT intelligence but MASSIVELY PARALLEL processes clustered on thousands of CPU and GPU (floating point operations).

  • @bhargav7476
    @bhargav7476 3 года назад +1

    that's some giga chad jaw u have there

  • @timleonard2668
    @timleonard2668 6 лет назад

    Is it hard to implement this algorithm by myself? Could I create a super-human go Player on say a 7x7 board with just my laptop?
    How big could I make the board using just a normal laptop?

    • @ArxivInsights
      @ArxivInsights  6 лет назад

      There's a ton of open source implementations on GitHub: github.com/topics/alphago-zero but I know that many people are having issues reproducing the full strength of DeepMinds version. I don't know if the 'interesting game mechanics' of Go also emerge on a small board like 7x7 but I would guess that you can definitely train a decent model on a laptop for such a small game-state. Additionally, you could also apply the algorithm to chess, which has a much smaller branching factor so it's easier to train, although again I think in order to get decent results you would have to throw some cloud computing power in the mix :)

  • @XChen-te7hk
    @XChen-te7hk 6 лет назад

    7:37 "... to play about 1600 simulations for every single board evaluation ... " I have a question. How do they do this? Even if it's not 19*19*17, let's say it's just 19*19*2 around 700, there would be (2**700) "board evaluation"s (maybe less if there are some illegal board states, but not much less). How could they even just play one simulation for so many "board evaluation"s? Guess I'm missing something...

    • @ArxivInsights
      @ArxivInsights  6 лет назад +1

      To build out the search tree, potential actions are sampled from the policy network (both for white and black moves) (so this hugely constrains the tree rollout to the most-likely / best moves). And then they also do pruning according to the value network, so whenever a part of the search tree results in a very low chance of winning (below some threshold according to the value network) it is discarded and not explored further. Combining these two approaches they build out the search tree and finally decide what move to play at the top of the tree by looking at the average value estimates for each of the tree's branches.

  • @robertkinslow8953
    @robertkinslow8953 3 года назад

    Ok. So how do you play and what is the idea of it?

  • @arnavrawat9864
    @arnavrawat9864 6 лет назад

    What if instead of self training, the ai is trained with the data of matches of a previously trained alpha zero ai?

    • @ArxivInsights
      @ArxivInsights  6 лет назад

      You could use that to speed up training in the beginning for version 2.0, but eventually performance will saturate and you wont do better.. And if you're building a version 2.0 you're hoping to do better than 1.0, so bootstrapping on gameplay that is worse than what you want to achieve doenst really makes sense. Similarly AlphaGoZero got better than AlphaGo by NOT bootstrapping on human games...

  • @vornamenachname906
    @vornamenachname906 3 года назад

    11:39 two of four of your "very popular moves that stands for thousands of years" was dismissed by alphago after 50 hours of training.

  • @andresnet1827
    @andresnet1827 4 года назад

    Do Alphafold 2 when paper comes out)

  • @jakekim1357
    @jakekim1357 4 года назад

    yo this video is dope. it's super fire. Just letting u know i'm a dan player
    I want to know more about this i hope your my school teacher

  • @thekitchfamily
    @thekitchfamily 6 лет назад +2

    So not really AI, just number crunching using statistical analysis (montecarlo tree).

    • @ArxivInsights
      @ArxivInsights  6 лет назад +1

      Well, it uses deep neural nets (value estimate + policy net) + self-play training (Reinforcement Learning) to make the Monte Carlo Tree Search tractable on the exponentially scaling Go search space. So yes it's number crunching, but that's what AI is all about...

  • @keylllogdark
    @keylllogdark 5 лет назад +30

    my brain feels sexually abused after watching this video...

  • @funkyderrick3589
    @funkyderrick3589 4 года назад

    very nice video ! put a microphone closer to you so we don't have that annoying reverb please

  • @davidm.johnston8994
    @davidm.johnston8994 6 лет назад

    Interesting video :-)

  • @petercruz1688
    @petercruz1688 6 лет назад

    Danger Will Robinson!!!

  • @MilesBellas
    @MilesBellas 6 лет назад +71

    distracting music

  • @karFLY1
    @karFLY1 7 лет назад +1

    Keep on! It's great.

  • @uncledevin700
    @uncledevin700 6 лет назад

    It’s too difficult to explain how great of alphago to the people who don’t know how to play wei qi.

  • @confucamus3536
    @confucamus3536 6 лет назад

    so really it's just one big ass flow chart, if this then that

  • @IBMua
    @IBMua 6 лет назад

    Anybody knows wtf they use a move history for? Aside from obfuscating learning and computation by many many times over? Seems like nonesense.

    • @ArxivInsights
      @ArxivInsights  6 лет назад +1

      Ihor Menshykov yeah had the same thought at first. Apparently including the history let's the network learn a form of attention over the important/active parts of the game. But I agree that theoretically, it shouldn't really be necessary... See the Reddit Q&A for more details!

  • @dougdevine27
    @dougdevine27 6 лет назад +1

    Good info but you should consider losing those annoying and jarring scene transition guitar strums+kick drum sounds. They detract very much from the presentation.

    • @ArxivInsights
      @ArxivInsights  6 лет назад +1

      dougdevine27 haha very true, this was my first video :p I removed them in all my other content ;) Unfortunately, once uploaded RUclips doesn't let you change anything anymore..

  • @muschas1
    @muschas1 4 года назад

    well, basically like humans acquire skills ... from scratch.
    cool

  • @daaaniel21
    @daaaniel21 6 лет назад

    Epic👌👌👌👌

  • @these2menrgannadoit
    @these2menrgannadoit 5 лет назад

    *Guitar Noise*

  • @nightmareTomek
    @nightmareTomek 2 года назад

    Nice video. But your sound effects and music are VERY loud. Maybe normalize a bit?

  • @fyrerayne8882
    @fyrerayne8882 3 года назад

    🔥🧠🔥

  • @420_gunna
    @420_gunna 6 лет назад +1

    You're...gwern? What?

  • @RASKARZ34
    @RASKARZ34 5 лет назад

    +1 sub

  • @angloland4539
    @angloland4539 Год назад

    😊

  • @ophello
    @ophello 5 лет назад +1

    Get rid of that sound effect. It’s weird and jarring. Do a graphical transition instead.

  • @saifufuerte3349
    @saifufuerte3349 4 года назад

    You explain shi@ too complicated

  • @briandecker8403
    @briandecker8403 5 лет назад

    So big spreadsheet and not AI - got it.

  • @traumzeitstories
    @traumzeitstories 5 лет назад

    i am not getting a single thing out of this
    sry bad made