DeepMind AlphaZero - Mastering Games Without Human Knowledge

  • Published: 1 Dec 2024

Comments • 214

  • @palfers1
    @palfers1 6 years ago +47

    The best exposition I've seen to date on what promises to be an AGI

    • @sucim
      @sucim 6 years ago +7

      Maybe the closest we've got, but it's nevertheless still utterly far from that.

    • @confucamus3536
      @confucamus3536 6 years ago +2

      it doesn't know "why" yet

    • @casewhite5048
      @casewhite5048 5 years ago

      @@confucamus3536 or who

    • @MrCmon113
      @MrCmon113 5 years ago

      Not really. Board games are very easy to understand. We can write down all of the rules in minutes.
      An AGI would have to understand much more complicated and uncertain scenarios all at the same time and fit them into a universal theory of action. We really have no clue how to do that other than to rebuild a brain.

    • @Xpertman213
      @Xpertman213 3 years ago

      @@MrCmon113 In a way though, can't you write down the rules of life in a few minutes? Find food, overcome/avoid danger, reproduce. I think the biggest thing we have learned so far is that intelligence isn't 'built' but rather that it 'grows' into the environment that surrounds it. A philosopher I respect describes organism/environment as one continuous field from which intelligence emerges.
      It gets me wondering whether the whole realm of human experience that we consider meaningful is largely a 'bug', an example of an organism's systems growing so well into the environment that these systems end up 'inventing' new problems that don't actually exist because they can't be conveniently turned off.

  • @SafeTrucking
    @SafeTrucking 6 years ago +8

    Thank you for an excellent explanation. I'm looking forward to seeing where this leads.

  • @kayrosis5523
    @kayrosis5523 6 years ago +32

    I'd love to see this in more complex and open-ended computer games. If you told AlphaZero to play Cities: Skylines and maximize the population, with secondary constraints like environmental quality and RCI balance, I wonder what it would come up with.

  • @peters972
    @peters972 4 years ago +11

    I wonder if you could use this to analyze where a kid is going wrong in his math understanding for example, as a tool to teach kids math. It could pinpoint the area of confusion and help the kid bridge that and gain insight by providing simpler examples.

  • @duskie69
    @duskie69 6 years ago +83

    Yea, but can it perform on a cold wet night in Stoke....

  • @PaytonTroy
    @PaytonTroy 6 years ago +13

    Just a thought. Has anyone contemplated what would happen if we could get AlphaGo Zero to teach humans to play Go? How would that develop, and what kind of players (who had never played Go before) would it produce? And what would happen the day you put a traditional human player up against an AlphaGo Zero-taught player? I find that very interesting.

    • @thezyreick4289
      @thezyreick4289 5 years ago +1

      That would be a truly beneficial purpose for an AI: to learn and master a certain skill, then pass on the best methods it has found as a master of that skill to human recipients.

    • @sarveshpadav2881
      @sarveshpadav2881 10 months ago

      @@thezyreick4289 AI sensei lol

    • @kephalopod3054
      @kephalopod3054 1 month ago

      Here are the trillions of weights of my neural net, good luck!

  • @richiester
    @richiester 6 years ago +9

    And at this point Stockfish resigned the game

  • @forestpepper3621
    @forestpepper3621 6 years ago +13

    If you look at the three graphs at 30:25, you'll notice "jumps" in all three curves. At a jump, from left to right, the curve starts to level off, and then abruptly shoots up nearly vertically again, the slope changing quite suddenly. There must be some significance to these jumps. Perhaps the algorithm has suddenly discovered a particularly effective heuristic for evaluating board positions, or the algorithm actually is developing something like human "insight" or "intuition" at these jumps.

    • @Dirtfire
      @Dirtfire 6 years ago +1

      Machine Epiphany, perhaps?

    • @Dirtfire
      @Dirtfire 6 years ago

      Haiku Shogi You should look up "3d creature evolution" on YouTube and check out the end-over-end worm, among other evolved creatures. Amazing stuff.

    • @PHeMoX
      @PHeMoX 6 years ago

      "Perhaps the algorithm has suddenly discovered a particularly effective heuristic for evaluating board positions, or the algorithm actually is developing something like human "insight" or "intuition" at these jumps."
      Probably merely a side effect of a certain variation of the algorithms taking longer to fail. It doesn't translate to an epiphany at all.

    • @yoloswaggins2161
      @yoloswaggins2161 6 years ago +1

      There's an extremely simple answer to this: it's when they lower the learning rate. The learning-rate schedule is in the paper.
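
The learning-rate drops this reply mentions can be sketched as a simple step schedule. A minimal sketch; the base rate, milestones, and decay factor here are illustrative placeholders, not the values from the paper:

```python
def step_lr(step, base_lr=0.2, milestones=(100_000, 300_000, 500_000), gamma=0.1):
    """Piecewise-constant learning rate: multiply by gamma at each milestone."""
    lr = base_lr
    for m in milestones:
        if step >= m:
            lr *= gamma
    return lr

# Each drop in the learning rate can show up as a sudden "jump" in the
# strength curve, as the network settles into a finer optimum.
```

Under this view, the jumps in the three curves line up with schedule milestones rather than with any "epiphany" in the algorithm.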

  • @bunnygummybear9638
    @bunnygummybear9638 6 years ago +22

    I hope they release the remaining 90 games of AlphaZero vs. Stockfish 8.

    • @robiniekiller45
      @robiniekiller45 6 years ago +7

      bunny gummybear I hope they play 100 new games against Stockfish 9 at proper speed and release them.

    • @Cube930
      @Cube930 6 years ago

      Robinie killer AZ isn't doing chess anymore. There is a new AI called Leela Chess Zero though, and you can help train it. Still not quite AZ level, I think.

    • @robiniekiller45
      @robiniekiller45 6 years ago

      Cube930 yes, it's nice.

    • @MrMartinpro
      @MrMartinpro 6 years ago +1

      They released some games: deepmind.com/research/alphago/alphazero-resources/

    • @teslathejolteon8007
      @teslathejolteon8007 6 years ago

      bunny gummybear I also really want to see those games.

  • @peterpetrov6522
    @peterpetrov6522 6 years ago +6

    Astonishing games from AlphaZero! Stockfish calculates about 70 million positions per second, AlphaZero about 80,000, and the human champion Carlsen can probably do 7. Human intuition is roughly 10,000x better than the AI's, but the amazing part is that the AI's intuition is roughly 1,000x better than a brute-force approach. It seems that AI is about halfway there. BTW, all three players are not equal: Stockfish would probably need a 10x or 100x increase in speed if it weren't equipped with tablebases, heuristics, openings, etc. How many years will the second half of the road to AGI take? For starters, how long did the first half take?
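
The ratios in this comment check out against the commonly cited search speeds (roughly 70 million positions/s for Stockfish, 80 thousand for AlphaZero; the figure of 7 for a human is the commenter's guess):

```python
# Rough orders of magnitude for search speed (positions evaluated per second).
stockfish_nps = 70_000_000   # brute-force alpha-beta engine
alphazero_nps = 80_000       # MCTS guided by a neural network
human_nps     = 7            # a very rough guess for a top grandmaster

# AlphaZero needs ~1000x fewer evaluations than Stockfish to play as well...
ai_vs_brute = stockfish_nps / alphazero_nps      # 875.0
# ...but still ~10,000x more than a human needs.
ai_vs_human = alphazero_nps / human_nps          # ~11,429
```

So "about halfway there" in this comment means halfway on a log scale: three orders of magnitude closed, four still to go.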

  • @drancisdrake
    @drancisdrake 6 years ago +2

    Amazing talk, thanks to the speaker and uploader.

  • @kephalopod3054
    @kephalopod3054 1 month ago

    Just wait to see what will happen when we achieve "reinforcement learning learning": when reinforcement learning can improve the reinforcement learning algorithm itself.

  • @EORANDAY
    @EORANDAY 6 years ago +2

    What interests me about all of this is where it's ultimately going. IBM is not doing this so that their AI can optimize game play. Eventually all of this could be applied to any domain: finance, engineering, etc. If this happens in the next few decades, what kind of job market will exist?

    • @introvertedskeptic33
      @introvertedskeptic33 6 years ago +1

      Creative jobs? Though I've heard arguments that those can also be taught...

    • @guupser
      @guupser 6 years ago

      Yeah, so please check the TED talk by Rutger Bregman on universal basic income: ruclips.net/video/ydKcaIE6O1k/видео.html

    • @krashd
      @krashd 6 years ago +2

      Hopefully no job market, so humans can actually live their lives instead of spending them as slaves to cash.

  • @alph4966
    @alph4966 6 years ago +1

    It's a wonderful achievement.
    I think that it has the potential to change the world.

  • @vegahimsa3057
    @vegahimsa3057 4 years ago +1

    It's often said this is RL without search. But there's always a search tree.

  • @basteqss8859
    @basteqss8859 6 years ago +14

    To tell you the truth, my friends, I'm more afraid of this technology than fascinated by it. Greetings! ;)

    • @jaywulf
      @jaywulf 4 years ago +3

      "And in this interesting experiment funded by DARPA... we have uploaded AlphaZero+ Battlefield edition into Boston Robotics warframe chassis"
      "Hahaha... of course we have friend/foe algorithms in place... in the full release version"

  • @Metacognition88
    @Metacognition88 6 years ago +4

    How would an Alpha program do against live players in an FPS shooter? In the beginning it would suck, but would it still improve, given that there are other variables inherent in humans involved, such as reaction time and accuracy of aiming down sights, and not just calculations of where a person's next position or move would be?

    • @PHeMoX
      @PHeMoX 6 years ago

      They could add some type of sensory system to allow the AI to understand what happens (not unlike an actual bot in any FPS game; those focus only on updating behavioural routines to make the playstyle more intelligent). Either that, or do what they've done teaching an AI to play games like Mario 'as a player', where only the visual feedback on screen is taken in, with input signals sent to the regular controls. The problem is that that approach probably doesn't translate easily to a fully fledged 3D game, I'd say. Conceptually it can be done, though. To be honest, I'm not sure that approach is really a good method of making an AI learn about games in a more conceptual way.

    • @thezyreick4289
      @thezyreick4289 5 years ago

      It is much simpler than you would think. A better question is: what would your goal be in putting it into an FPS? If it is just to win, no human alive could beat the AI. Human reaction time is not fast enough to let anyone aim and start shooting at the AI before it has aimed at you and fired enough rounds to drop your HP to zero. Remember that to an AI, distance is not a factor, since eyesight is not necessary, and aiming is not a factor, since the code would simply pinpoint the exact in-game coordinates your character occupies on the 3D grid the game is built on, check whether any obstruction would prevent the bullet from doing damage, and shoot the instant there isn't one. Most player deaths would likely come from being shot through penetrable walls, or, if no such walls exist, would happen the second there is no longer an obstruction between the AI's line of sight and the player.
      The only way I can think of for a player to kill the AI is a randomly thrown grenade over a wall or across the map, but keep in mind the AI would be doing this as well, with perfect 100% accuracy and grenades cooked so precisely that they explode the millisecond they are in kill range.

    • @linhusp2349
      @linhusp2349 4 years ago

      @@thezyreick4289 Well then, let's buy a bunch of smoke grenades and free-fire at the AI.

  • @generichuman_
    @generichuman_ 3 years ago +1

    One part I don't quite understand is how the neural net learns to evaluate a position without the random rollout. It seems like the rollout is the only thing that begins the process of attaching a value to a position, which can then be used to train the network.

  • @Wemdiculous
    @Wemdiculous 6 years ago +1

    I thought he said he had automated several talks. I thought, man, that would be super impressive.

  • @amaarquadri
    @amaarquadri 4 years ago +1

    How does the model begin learning? If it starts with random weights making random predictions and doesn't use rollouts from MCTS to get some non-random (statistical) data for its decisions, wouldn't it just produce random training data?

    • @Michegianni
      @Michegianni 4 years ago

      Part of the value-and-policy system is to learn (and improve) which moves are more relevant and which (better) variations to search through, which is why the data ends up being far less random. The video explains it does 80k searches instead of 70 million for Stockfish, due to this efficiency in the algorithm.
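
The way the network's policy and value steer the search can be made concrete with the PUCT selection rule described for AlphaZero-style MCTS. A minimal sketch; the dict-based node representation (`N` visit count, `W` total value, `P` network prior) and the `c_puct` constant are illustrative choices, not the paper's implementation:

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child maximizing Q + U: exploitation (mean value Q) plus an
    exploration bonus U weighted by the network's prior probability P."""
    total_visits = sum(ch["N"] for ch in children)

    def score(ch):
        q = ch["W"] / ch["N"] if ch["N"] > 0 else 0.0  # mean value so far
        u = c_puct * ch["P"] * math.sqrt(total_visits) / (1 + ch["N"])
        return q + u

    return max(children, key=score)
```

Because unvisited moves with a high prior get a large bonus, the search concentrates on the handful of moves the network already considers plausible, which is why so few evaluations suffice.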

  • @petergreen5337
    @petergreen5337 9 months ago

    ❤ Thank you very much, publisher. A beautiful lesson and demonstration.

  • @robostain_9722
    @robostain_9722 6 years ago +143

    I'm waiting for the day when AI will be able to design new games from scratch instead of just learning how to play already existing ones.

    • @samosa9488
      @samosa9488 6 years ago +18

      When such knowledge arrives, believe me, making games will be one of the least interesting things (for scientists) it will be doing.

    • @timothybolshaw
      @timothybolshaw 6 years ago +4

      +robostain_
      This is not unprecedented, actually. Look up Angelina (used in Mike Cook's research). The games it comes up with are not board games (at least, I do not think Mike's research has gone in that direction), but the gameplay has been quite impressive.

    • @Twizzzle
      @Twizzzle 6 years ago +2

      Dude! That's an awesome idea.

    • @Ryan1729
      @Ryan1729 6 years ago +4

      There's also Cameron Browne's Ludi, which generates board games, some of which have even been published commercially!

    • @jean-marcfueri6678
      @jean-marcfueri6678 6 years ago +2

      Well, we won't even understand the rules...

  • @MOSMASTERING
    @MOSMASTERING 6 years ago +3

    Could you 'solve' Go with a quantum computer with enough qubits to represent the number of possible moves?

    • @FiveThings2018
      @FiveThings2018 5 years ago +2

      Maybe quantum computers (if it's possible to build them) will be able to solve the game of Go. But as of now, to solve the game of Go you would need the combined power of all the computers in the world, running for millions of years.
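
The scale being discussed here is easy to make concrete: an upper bound on Go board configurations is 3^361, since each of the 19x19 = 361 points is empty, black, or white. (Only about 1.2% of those configurations are legal positions, per Tromp's exact counting work; that percentage is background knowledge, not from the talk.)

```python
# Upper bound on Go board configurations: 3 states per point, 361 points.
configs = 3 ** 361
digits = len(str(configs))   # 173 digits, i.e. on the order of 10^172
```

For comparison, the number of atoms in the observable universe is usually put around 10^80, so exhaustive enumeration is out of the question regardless of hardware.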

  • @NathanOkun
    @NathanOkun 6 years ago +2

    In Go the curve of AlphaGo Zero started at zero and rose very steeply until a level where only small variations allowed winning, so the curve flattened out and had only a very shallow improvement from then on per games played. Is that the curve shape for all such learning curves? Could it not do that and suddenly at some point see a whole new way to play and jump up again so that there are more than one flattened step in this curve? Like it suddenly found that Newton was wrong and Einstein was right and switched its algorithms to match. This type of thing could make the results of your AI rapidly become totally unintelligible to a human evaluator -- it wins but cannot even remotely explain to a mere human how it does. Have you had this experience yet?

    • @AutomaticHourglass
      @AutomaticHourglass 3 years ago

      Usually in any learning algorithm the success criterion (or loss value) shows diminishing returns, because after a while you reach the constraints of the game and have to fine-tune your strategy to gain a slight advantage. Although it looks small on the Elo rating, keep in mind that Elo is logarithmic, so (arbitrary numbers) if you make yourself 2 times better, your Elo rating increases by only 50, which means that even at the end of the curve AlphaZero is still improving its skill many times over despite this constraint.

    • @NathanOkun
      @NathanOkun 3 years ago

      Thank you.

  • @kimyunmi452
    @kimyunmi452 6 years ago +4

    Let's see how this is applied to the game of financial trading.

    • @ipuhbamrash6708
      @ipuhbamrash6708 5 years ago

      In any game that digresses from fixed rules, deep reinforcement learning should perform poorly. These things are still at a naive stage. We still don't have a settled picture of which algorithms perform better in which scenarios. A lot more to come!

    • @gregh7457
      @gregh7457 4 years ago

      Simple algorithms can and are achieving this right now. They're the ones causing the wild swings in the market. I beat the machines sometimes by betting against them, catching the falling knife they're throwing.

    • @Michegianni
      @Michegianni 4 years ago

      It would be unable to access the platform and perform the necessary number of iterations, because the nature of the platform doesn't allow millions of trades in a short time frame. Learning will therefore be limited to the timeframes the trading platform allows.

  • @keguthueringer5136
    @keguthueringer5136 6 years ago +3

    Thanks for the very clear presentation.

  • @asink5928
    @asink5928 3 years ago +1

    I was high af watching this and I could only focus on this guy saying “uuuum”

  • @AvielLivay
    @AvielLivay 2 years ago

    Correct me if I am wrong: DeepMind's goal is to solve AGI.
    Frankly, I didn't see here the innovation I was expecting. I was under the impression that since alpha-beta tree search was prohibitively expensive, DeepMind came up with something profoundly different, a new approach that takes us closer to AGI. Something that doesn't require 80M calculations per second but is limited to the number of calculations per second a human being can handle, and still beats world champions. Something that thinks like us humans, or like cats, or ants... And what I got here is MCTS instead of alpha-beta, combined with start-from-scratch reinforcement learning using a neural network. Is this getting us closer to AGI?
    Also, I was expecting to get into the deep neural network design details. Some voodoo is going on where you input the board state and get P and V. Surely this doesn't happen with every neural network; there's so much to say about this neural architecture, no?

  • @bjornargw
    @bjornargw 6 years ago +3

    Keep it simple. Way to go.

  • @skierpage
    @skierpage 6 years ago +1

    I'm confused whether AlphaGo Zero's initial Go knowledge only scores final positions (all stones played or both players pass), or whether it includes scoring intermediate positions. It seems impossible to learn from scratch whether a move is good based on whether it finally wins 350 random moves later! And I have the same question for AlphaZero chess: does it start out knowing that losing pieces is generally a bad sign?

    • @jurajhadzala951
      @jurajhadzala951 6 years ago

      skierpage I think no, it starts knowing only the movement rules, nothing else.

  • @goldenweave9455
    @goldenweave9455 4 years ago +2

    So can we apply this to a program to win at blackjack?

    • @Michegianni
      @Michegianni 4 years ago +1

      Yes, that wouldn't be difficult. The program would end up card counting, and the action set is so small that it could be done very quickly and easily with brute force instead of making it improve on itself.

    • @judithsixkiller5586
      @judithsixkiller5586 3 years ago

      There are already a few humans with unique skills who tend to be summarily banned from casinos. Or mysteriously disappear.

  • @swfsql
    @swfsql 6 years ago +2

    I wonder if they will ever have one single network that works for more than one type of game; that could be interesting.

  • @MOSMASTERING
    @MOSMASTERING 6 years ago +1

    Is the goal just to win, or does the AI learn generally how useful each move is throughout an entire tree of game moves?

  • @acWeishan
    @acWeishan 5 years ago +4

    Shall we play a game? :)
    So with superintelligent AI and the use of more and more autonomous weapons, you really can see we could advance to the Clone Wars and then Skynet...

  • @RaineriHakkarainen
    @RaineriHakkarainen 6 years ago +1

    AlphaZero beat Stockfish 8 with 28 wins and 72 draws, a 64% score. The Gaussian bell-curve stats books say a 64% score is about 101.8 Elo points: 3396 (Stockfish 8) + 101.8 = 3497.8. A 999-to-1 win rate over Stockfish 8 would be about 4270. There are a lot of YouTube videos claiming that AlphaZero's rating is 4100. That is wrong; AlphaZero is close to 3500.

    • @naylinnhtun2006
      @naylinnhtun2006 6 years ago

      You have to reckon A0 at 3500 when SF on 1 GB is 3400. My Galaxy S7 alone has 16 GB of memory, and I doubt their SF was even as powerful as SF running on my phone. Meanwhile they gained an approximate 2300-point algorithmic boost on 1000-times-more-powerful hardware.
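
The Elo arithmetic in this thread can be sanity-checked against the standard logistic Elo model, where a player scoring fraction s has an implied rating gap of 400·log10(s/(1−s)). A quick sketch (the 101.8 figure above comes from a Gaussian variant of the model; the logistic form gives roughly 100, so the conclusion is the same):

```python
import math

def elo_gap(score):
    """Rating difference implied by an expected score in (0, 1),
    using the standard logistic Elo model."""
    return 400 * math.log10(score / (1 - score))

gap_64 = elo_gap(0.64)     # ~100 Elo for a 64% score
gap_999 = elo_gap(0.999)   # ~1200 Elo for a 999-in-1000 score
```

A 64% score therefore supports a rating about 100 points above Stockfish 8's, i.e. roughly 3500, in line with the parent comment.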

  • @moonboy5851
    @moonboy5851 6 years ago +1

    I wonder what would happen if you just left it on to keep learning...

  • @FloydMaxwell
    @FloydMaxwell 6 years ago +1

    Can AlphaZero save its state? I.e., do wins and better moves get converted into usable future rules?

    • @TernaryM01
      @TernaryM01 5 years ago

      It's a neural network, so every lesson it has learned so far is implicitly stored in the current weights of the neurons. However, there is no obvious way to understand what is going on in there; in other words, it cannot 'explain' why it makes a move in a logical way, or in any way understandable to humans.

    • @Michegianni
      @Michegianni 4 years ago +1

      It appears that's exactly what it does: it plays itself and improves, then saves the new state, then keeps playing itself again and again, with every improved state saved, giving it more knowledge. It's basically been coded to keep practicing against itself to get better and better. The better it gets, the more it learns.
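
Concretely, "saving its state" just means serializing the network weights; everything the system has learned lives in those numbers. A minimal stdlib sketch, assuming a toy weights dict (real systems store large tensors in framework-specific checkpoint formats; the layer names here are invented for illustration):

```python
import os
import pickle
import tempfile

# Hypothetical weights: in practice these are large tensors, not short lists.
weights = {"conv1": [0.12, -0.07], "value_head": [0.33], "policy_head": [0.9, 0.1]}

path = os.path.join(tempfile.gettempdir(), "alphazero_checkpoint.pkl")
with open(path, "wb") as f:
    pickle.dump(weights, f)      # save state after a round of self-play

with open(path, "rb") as f:
    restored = pickle.load(f)    # resume training or play from this state
```

This is also why, as the reply above notes, the saved state is opaque: it is a bag of numbers, not human-readable rules.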

  • @jaanuskiipli4647
    @jaanuskiipli4647 6 years ago +1

    When will we see a rematch of AlphaZero against Stockfish or asmFish?

  • @archerhosford4471
    @archerhosford4471 4 years ago +2

    But can it play Crysis?

  • @rogelioarguello2800
    @rogelioarguello2800 5 years ago +1

    Intelligence without desire or affect is death.

    • @gregh7457
      @gregh7457 4 years ago +1

      And yet intelligence exists. Go figure.

  • @Jeff-cv4qn
    @Jeff-cv4qn 4 years ago +2

    Hopefully human brains can learn also

  • @Biomirth
    @Biomirth 4 years ago +1

    "Where are the error bars?"..... David: "We don't do errors".

    • @henriquemarks
      @henriquemarks 4 years ago

      If you have run it several times, put the results in the graphs with error bars. If the error bar is zero, then the experiment lacks randomness, and that is an indication of error. The paper should not have been accepted with this basic scientific flaw.

    • @Michegianni
      @Michegianni 4 years ago

      @@henriquemarks The program continually improves upon itself. It is far from random; that was explained in the video. It uses a value-and-policy system to make sure it only selects the best possible moves with each improvement, instead of brute-force style like the other computer programs. If something keeps recursively improving on itself, the graph can only be a positive curve up to a limit; error bars do not apply.

  • @Innosos
    @Innosos 6 years ago +2

    Is there a team working on *good* tools for translating neural-network behavior back into human understanding? Emphasis on *good*.
    Because while this is all well and good for black-box operations, it doesn't advance understanding of anything.

    • @skierpage
      @skierpage 6 years ago

      Yes, people are working on visualising neural networks' operation. And people are learning a lot from Alpha* Zero's play: the presenter mentioned it favoring certain opening patterns unknown to humans, and top players are analyzing many of its unexpected moves.

  • @Larkinchance
    @Larkinchance 6 years ago +1

    Is it available on the Xbox?

  • @spicybaguette7706
    @spicybaguette7706 5 years ago +6

    But can it play Crysis?

  • @JiveDadson
    @JiveDadson 6 years ago +3

    I lobbied for years for chess-engine developers to use expected (average) value (-1, 0, 1) for training. Many argued that it made no sense. I thought it was eminently sensible. Ah, vindication.

  • @mbree3998
    @mbree3998 6 years ago +3

    Wow, scary and interesting: search, finance, networks, traffic, war, etc.

  • @MexterO123
    @MexterO123 3 years ago +1

    Alpha Zero vs Ai Hinatsuru

  • @sarainiaangelsong440
    @sarainiaangelsong440 6 years ago +1

    StarCraft would be a harder one for AlphaZero to do! Or heavily modded Minecraft, like my Nexympheria modpack on Minecraft.CurseForge (which can also be downloaded via the Twitch app). The problem AlphaZero would have in both scenarios is that it has to understand resource management. In StarCraft you have to gather resources, make scouts, build an army, strengthen your base, and then calculate exactly the right time to defeat your opponent, all while being bombarded by your opponent's scouts and army.
    The same goes for Minecraft. There have been challenges where people have, say, 10 minutes to gather resources and make armor and weapons, but there are steps: you have to find wood and make planks, craft a crafting table, then make a pickaxe; then you have to find ores in completely randomized terrain generation; then you have to craft items. It would also have to know how to equip crafted armor and how to get safely back from whatever mining situation it faces. Examples of mining dilemmas: it could mine directly below its feet and fall into lava; it could fall into a dungeon and get mauled by mobs; it could fall into a ravine and die from fall damage. It would have to know how to place blocks and sneak while bridging across lava, and understand that placing a block in lava or water replaces that source block with the block just placed. It would have to know how to make makeshift stairs in a way that safely gets it back to the top! Stairs can be crafted, but it is much faster to just break and place blocks to move around, meaning the AI will have to know how to jump! When the AI goes into battle it will have to know how shields work (whether it has one, the opponent has one, or both do), and how to use a sword and defend against one, with or without a shield or sword!
    If you're doing a 20-minute match, the AI will also have to know how to make a furnace and recognize stone and coal: mined stone becomes cobblestone, which crafts into a furnace, and coal is used to smelt ores or cook food; cooked food replenishes hunger and sometimes heals some hearts. Stone can also craft a stone pickaxe, which then mines iron, and iron armor or an iron sword is definitely strong in a 20-minute matchup! In either StarCraft or Minecraft there are many factors to learn. StarCraft is close to an open book, where most games will play out similarly, though there are 3 races; Minecraft is random terrain generation. There are formulas in Minecraft, like which Y level gives the best ore-finding chances when mining, but it's still all random! One player may find better or worse items, or may find a village chest and loot some iron ingots, weapons, or armor, so the AI has to know what a chest is, open it, know what the best survival chances are, and know what to take and equip! Think of Minecraft like cribbage: the cards dealt are random, and it's up to the player or AI to make the best choices! Once the 10, 20, or 30 minutes are done, both player and AI (or AI and AI) must face off, and whoever wins the battle clash wins the round! :)

  • @apexmaintenance461
    @apexmaintenance461 4 years ago +2

    Yea, but can it beat me at Connect 4?

    • @sturpdog
      @sturpdog 4 years ago

      I think humans would do a pretty good job against AI at Connect Four. It's scaled down quite a bit from Go and chess, with fewer variables and/or moves.
      Scaled down to a simple game such as tic-tac-toe, we would be pretty even with AI.
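
The tic-tac-toe point is easy to verify: the game is small enough to solve exactly by brute-force minimax, and perfect play by both sides is a draw, so a perfect human really is "even" with any AI there. A self-contained sketch:

```python
from functools import lru_cache

# The eight winning lines on a 9-cell board (indices 0..8, row-major).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value for X with `player` to move: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    nxt = "O" if player == "X" else "X"
    values = [minimax(board[:i] + player + board[i + 1:], nxt)
              for i, sq in enumerate(board) if sq == "."]
    return max(values) if player == "X" else min(values)

result = minimax("." * 9, "X")   # 0: perfect play from the empty board draws
```

Connect Four, by contrast, has been solved as a first-player win with perfect play, so "pretty good" is optimistic there.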

  •  5 years ago +1

    Delusions (8:20) Reinforcement: That's the magic part. Pay attention.

  • @williamko4751
    @williamko4751 4 years ago +1

    Forget about chess and games; when will you make a Blade Runner-type Marilyn Monroe?

  • @MICKEYISLOWD
    @MICKEYISLOWD 4 years ago +1

    I just want an AGI as a homemaker who can clean and suck as she blows. Then all my problems are solved, as she can also trade the forex markets, making me $10,000 per day.

  • @petrainjordan7838
    @petrainjordan7838 6 years ago +1

    Yes, fab, and what is the talk of tabula rasa supposed to mean? All the excitement about how 'we' have packed and prodded the system with various types of INPUT (which obviously does not count as any tabula rasa)!

    • @LetalisLatrodectus
      @LetalisLatrodectus 6 years ago +1

      It refers to the algorithm learning by playing games itself, not by looking at past games that humans played. Obviously humans created the system, but that's not relevant to calling it tabula rasa.

  • @TheDavidlloydjones
    @TheDavidlloydjones 4 years ago +1

    At 2:36 it seems to me a little strong to say "the policy network, which is illustrated here...". There is an illustration, or rather a graphic, and it might suggest or indicate the policy network, but it doesn't tell us anything about it beyond the fact that humans use those words for a part of what they are doing. That's not illustrating anything about the policy network, is it?

    • @Michegianni
      @Michegianni 4 years ago

      I'm also curious about the policy and value network and how it is coded.

    • @TheDavidlloydjones
      @TheDavidlloydjones 4 years ago

      @@Michegianni
      It's an in-house term among AlphaGo's very competent programmers for one of their subsystems.
      In retrospect, my little whine above is accurate, sorta: the illustration does not "illustrate" the network in any meaningful way. It's simply a picture labelled "network."
      It doesn't even picture a network: it's a picture of a diagram. 🤣 On the other hand, I'm being petty about their petty error.
      The whole of their work, by contrast, is magnificent, and in particular Demis's original conceptualization of the project as a whole was a brilliant feat of management, comparable to, say, Henry Ford's instantiation of the assembly line.

    • @Michegianni
      @Michegianni 4 years ago

      @@TheDavidlloydjones Yeah, I'd love to know more about it. I am familiar with policy, because it will more or less be the set of rules governing pieces and their movement / legality of moves / direction / capturing pieces, etc. (the basic rules of chess), but the value network is what I'm curious about: how does the algorithm decide what the value is when you lose a particular piece or gain position, etc.?

    • @TheDavidlloydjones
      @TheDavidlloydjones 4 years ago

      @@Michegianni
      David Silver, of DeepMind, the AlphaGo people, has a rather good exposition at ruclips.net/video/Wujy7OzvdJk/видео.html&start_radio=1&t=1.
      The whole question of how to explain advanced science and engineering -- hell, anything: economics, diplomacy... -- is a difficult one. Still, I think we can agree that speaking the words "policy network" does very little, and then pointing at a graphic which says the words "policy network" in Roman letters is useful only for people learning English.
      I will give these AlphaGo documenters credit for one thing: they don't tell lies under the impression that a simple lie is better than a complicated truth, uh, "pedagogically."

  • @yeungarthur9796
    @yeungarthur9796 5 years ago +1

    Can AlphaZero be the teacher, then?
    What's more, can AlphaZero be a polymath teacher and teach us anything?

  • @Ashalmawia
    @Ashalmawia 6 years ago +11

    In the early part of the 21st century, the first steps toward true AI were being taken. Little did they know...

    • @crave2527
      @crave2527 4 years ago

      Wtf.. for the only two of you..

    • @crave2527
      @crave2527 4 years ago

      Just, just to catch my eye and be like wt.. are you hippie smoking together.. plz be more knowledgeable before talking smoke out the ass..

  • @menatoorus5696
    @menatoorus5696 5 лет назад +5

    Conclusion:
    Refined Monte Carlo is superior to alpha-beta.

  • @WaveTreader
    @WaveTreader 6 лет назад +3

    i would want to see deepmind play crazyhouse chess

  • @キキ-u9t
    @キキ-u9t 2 года назад +1

    AlphaZero has completely mastered geometry.

  • @cesarbrown2074
    @cesarbrown2074 6 лет назад +1

    Google should get into 3d printing. It's a perfect match for their search skills and A.I capabilities.

    • @krashd
      @krashd 6 лет назад

      An A.I. connected to a 3D printer? That's how it starts ;)

  • @richardfredlund3802
    @richardfredlund3802 6 лет назад +2

    32:10 I thought there were no draws in Shogi?

  • @love_pets1363
    @love_pets1363 2 года назад

    What if weaponized robots and killer drones get that AI?

  • @mindin2941
    @mindin2941 10 месяцев назад

    Either they a) don't work hard enough on chess, b) limit how well they could do it, and/or c) for some reason 'have to' play it safe. In any case, this talk didn't seem to be very open or forthright.

  • @tofighshno4075
    @tofighshno4075 11 месяцев назад

    Hello, there is a game called Dama that is harder and more complex than all of these games and has more moves. I hope Alpha plays a game of Dama too.

  • @franatrturcech8484
    @franatrturcech8484 3 года назад +1

    YES

  • @dilyan-2904
    @dilyan-2904 6 лет назад +1

    Need to master the StarCraft game; it's way more challenging for A.I. than chess and Go.

    • @Thedirtycat
      @Thedirtycat 6 лет назад

      I would love to see that ! Would be interesting to see how it played it and what strategies it would use.

    • @chrispugmire
      @chrispugmire 4 года назад +1

      And now they have! :-)

  • @aboninna
    @aboninna 6 лет назад +1

    perfect
    can i get the games

  • @notyou6674
    @notyou6674 4 года назад +1

    The visualisation wasn't very good, as it only showed one line. A real search would look like the one shown, but with all 200 moves each branching out into 200 further moves, which then also branch out into 200 moves each, and so on.

    • @notyou6674
      @notyou6674 4 года назад

      the one at 1:30 to be specific

    • @Michegianni
      @Michegianni 4 года назад +1

      @@notyou6674 I think we all understood that the presenter had limited screen space and time to demonstrate the basics. I don't think we would all have been prepared to visualise the entire tree - it would take forever and no screen big enough could display it.

    • @notyou6674
      @notyou6674 4 года назад

      @@Michegianni or just zoom out continually, that is very commonly used to show vast sizes on even bigger scales like the universe
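      (To put numbers on the thread above: with a branching factor of roughly 200 moves per position, the approximate figure for Go, the full tree explodes exponentially, so no zoom level could display more than a few plies. A quick sketch:)

```python
# Node counts per depth of a game tree with a uniform branching
# factor of ~200 (Go's approximate number of legal moves per turn).
branching = 200
nodes_at_depth = [branching ** d for d in range(1, 5)]
print(nodes_at_depth)  # [200, 40000, 8000000, 1600000000]
```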

  • @snicklesnockle7263
    @snicklesnockle7263 3 года назад

    go is so much more fun than chess, even though I suck at both

  • @Jirayu.Kaewprateep
    @Jirayu.Kaewprateep 3 года назад +1

    🥺💬 He is right we also study about chess board game they also use of two logicals because to flavours us by give up some scores to make us came back and play it again 😃

  • @trewq398
    @trewq398 6 лет назад +7

    nice video!
    I don't like that he doesn't mention that they didn't play against a full-power Stockfish. The hardware was also pretty hard to compare, so the claim that AlphaZero is stronger than Stockfish isn't proven yet.

    • @timothybolshaw
      @timothybolshaw 6 лет назад +14

      The hardware used by DeepMind for development of the neural networks was insane. For the actual match against Stockfish, AlphaZero ran on modest hardware, equivalent to that used by Stockfish. In fact, AlphaZero was deliberately limited in its provided hardware so it did not take advantage of its inherent scalability. Stockfish is not capable of using very powerful hardware.
      As for Stockfish playing without an opening book or endgame tablebase, as a pure chess exercise, it would be nice to see them included. Analysis I have seen to date indicates that Stockfish usually lost in the middlegame, and opening books and endgame tablebases would have made little difference. Anyway, DeepMind wanted to compare its pure artificial intelligence approach against the human knowledge combined with brute force of traditional chess engines (i.e. just compare the algorithms themselves). Arguably, opening books are allowing Stockfish to play the first few moves with the assistance of a group of grandmasters rather than using purely its own resources.
      This is arguable, though, as Stockfish is supposed to be an amalgam of human and computer effort, so the best test might be an unassisted AlphaZero against Stockfish with opening books, endgame tablebases, and grandmasters allowed to override Stockfish moves. I still think AlphaZero would be superior, but it would be a valid test. I would like to see that done before AlphaZero type approaches are allowed to override human experts in areas like medical diagnosis and parole hearings.

    • @trewq398
      @trewq398 6 лет назад

      I saw some games analysed, and they said that Stockfish made mistakes in the opening and then played from behind in the middlegame, where AlphaZero was able to capitalize on that. But I also don't like the fact that Stockfish needs opening books and endgame tables to work properly. I would still like to see a rematch.

    • @profd65
      @profd65 6 лет назад +1

      Was Stockfish even allowed to use its opening book and endgame tablebase? If it wasn't, then that seems unfair. Alphazero in effect has its own opening book and tablebase that it created itself through playing itself countless times; it's not like Alphazero had no opening and endgame knowledge going into the games with Stockfish.

  • @Luix
    @Luix 5 лет назад +1

    If the dataset is based on strong human players, it is not just based on the Go rules.

    • @VicJang
      @VicJang 5 лет назад +3

      Yes, the AlphaGo that defeated the human champion was trained using datasets, but "AlphaGo Zero", which defeated AlphaGo 100-0, was self-trained with nothing more than the rules. (It took 3 days.)
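      (A hypothetical skeleton of that self-play loop, only to illustrate the idea in this thread: training data comes entirely from games the current network plays against itself, starting from the rules alone. The toy "game" and the update rule below are stand-ins; the real system plays via MCTS guided by a deep network and trains on (state, search-probabilities, outcome) records.)

```python
import random

def self_play_game(choose_move):
    # Stand-in for one self-play game: record 10 "states" and the
    # final outcome (+1 win / -1 loss) shared by all of them.
    states = [choose_move() for _ in range(10)]
    outcome = random.choice([+1, -1])
    return [(state, outcome) for state in states]

def train(params, examples):
    # Placeholder update: real training regresses the value head toward
    # the game outcome and the policy head toward the MCTS visit counts.
    return params + len(examples)

params = 0
for iteration in range(3):
    examples = self_play_game(lambda: random.random())
    params = train(params, examples)
print(params)  # 30: ten training examples per game, three games
```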

  • @MultiCharles321
    @MultiCharles321 6 лет назад +7

    Iterations? How many iterations of Go did AlphaGo need to learn to be the best player ever? Not how long in seconds, but how long in games? In the biological world, organisms learn quickly: maybe not in terms of the number of seconds, but in terms of the number of trails. How many games did it take AlphaGo to learn to play as well as a human, versus how many games did the human have? Rigid intelligence is what you know, but fluid intelligence is how fast you learn. How does AlphaGo compare to human fluid intelligence?
    Notice that at 30:47 they are talking about thousands of batches, that is, how many thousands of games it takes. A typical Go tournament has one game a day, so 1,000,000 games is a lifetime. In the real world, humans don't have that many chances to learn; there simply aren't that many trails available. AlphaGo does great in a strictly limited, well-defined game in which it can run millions of trails with a few relevant variables, but how well does it do at recognizing friend from foe when it only has half a dozen trails and there are hundreds of potentially relevant variables?

    • @hohhoch3617
      @hohhoch3617 6 лет назад +6

      First the word you're looking for is trials, not trails. Also why does it matter about its fluid intelligence? Even if they learn slower than us (which they do, if you ever see an AI learn a game, it's awful) it doesn't matter. Their ability to play 100,000 games a day, or do 100,000 trials a day is what makes up for their lack of fluid intelligence. In the end, AlphaGo is still the best player in the world.

    • @autohmae
      @autohmae 6 лет назад +1

      Notice the speaker says at 19:09 40 days is 40 million so 1 million games per day.
      This is kind of the 'brute force' method that fast computers can apply to problems.

    • @MultiCharles321
      @MultiCharles321 6 лет назад

      Yes, but that limits computer intelligence to problems which are strictly defined or for which there already exists a large number of solutions.

    • @PHeMoX
      @PHeMoX 6 лет назад +1

      @@autohmae "This is kind of the 'brute force' method that fast computers can apply to problems." which is exactly why it is not based upon actual rational intelligence. A person can be made to learn and understand a game without millions of random attempts of failure. It's exactly this problem we haven't solved yet for AI.

  • @bansheee1
    @bansheee1 6 лет назад +1

    You wanna create Skynet? That's how you create Skynet. Don't do it, pal... you're gonna regret it.

  • @klausgartenstiel4586
    @klausgartenstiel4586 6 лет назад +2

    first u learn go. then you forget go.

  • @j0tt0
    @j0tt0 5 лет назад +1

    I hope these guys are being very cautious with this new tech. To quote Goldblum's character in Jurassic Park: they were so busy knowing if they could, they forgot to check if they should.

  • @smfanqingwu1474
    @smfanqingwu1474 6 лет назад +3

    Please look at this: 1. Ke Jie was a little stronger than Lee Sedol in 2016, playing roughly 3:2 or 2:3 against him. 2. The original 2016 AlphaGo (the "AlphaGo Lee" version) beat Lee. 3. Master beat Ke Jie 3:0, and Master went 60:0 against human 9-dan professional players. 4. Master (2017) could give about 3 stones to the 2016 AlphaGo. 5. This "without human knowledge" version, AlphaZero, can give about 2 stones to Master, so in fact it could give roughly 5 stones to Ke Jie (the world's No. 2 player). In 2016, Ke Jie appeared in the live TV broadcast of the Lee vs AlphaGo match on LeTV China, alongside Professor Liu Zhiqing (Beijing University of Posts and Telecommunications). Professor Liu said that within 3 years we would see an AI give the best human players a handicap of 3 stones or more. Ke said: I bet that's impossible (it was 3 then, and now it's 5!). Even letting you play black is the largest gap between any of the best professional players; in China, professional and top amateur players differ by only about 1 stone or less. Ke Jie said in 2016 that a machine giving a professional a 3-stone handicap was impossible, and that at most it could let you play without komi. But by now, Zero should be able to give Ke Jie 5 stones or more.

  • @freidamargolis2615
    @freidamargolis2615 3 года назад

    After playing against itself for 1 year, Alpha Zero decided that the best move was to turn itself off.

  • @sandrocavali9810
    @sandrocavali9810 7 месяцев назад

    I'm moving to Venus. Hot but human

  • @SatiricalStewie
    @SatiricalStewie 6 лет назад +10

    You know, in many end of the world movies, it starts with a brilliant scientist with a British accent talking some scientific mumbo jumbo about an invention that they hope will improve the world........just saying......

    • @PASBGR
      @PASBGR 6 лет назад +1

      Don't worry, son. You are not the only one I've seen on the internet who is dreaming about the end of the world.

  • @shawnburnham1
    @shawnburnham1 Год назад

    7:00

  • @dragmio
    @dragmio 6 лет назад +13

    It's hilarious how eager the puny humans are to welcome their new master.

    • @couga8888
      @couga8888 6 лет назад +2

      It's called evolution

  • @wizkidd6950
    @wizkidd6950 6 лет назад

    I am curiously skeptical of the general-purpose claims for AlphaZero. Yes, it's a great accomplishment to have one system master three board games, but they are all fairly binary: win or lose, in one match. Perhaps tackling what might be considered a weaker game would be more useful for establishing a better sense of generality. A high-order abstraction game would come with many more complications, because the best play-actions would not always garner the best outcomes. So the system would also need some conceptualization of self-generated misinformation. The bottom line is that the network would have to hold multiple copies of any policies it has, which goes against the "less complexity, more generality" belief built into AlphaZero. Take a card game such as Spades, for example: it has high-order abstraction and multiple complicating factors, due to the game having two cooperative agents versus two adversarial agents that also need to predictively model each other.
    I simply do not see AlphaZero being able to do this masterfully and without overfitting. High-order dynamic concepts just will not exist, as they will be squashed to some average expression, versus probably residing in a dedicated layer of the network.

    • @hohhoch3617
      @hohhoch3617 6 лет назад

      The idea of an AI capable of abstract thought is still a long way off. But then, we're not interested (currently) in an AI's ability to handle abstraction. We want an AI for its ability to make binary decisions. AI would be extremely useful in mathematics, science, medicine, business. Any field that could benefit from better management would benefit from having an AI assistant on the board.

    • @wizkidd6950
      @wizkidd6950 6 лет назад

      HoH hoch, thanks for your well-reasoned response. I would agree with you 100% except for one small issue:
      we are currently using AI systems in self-driving cars, and at a minimum those systems need at least a rudimentary ability to abstract. While decisions inevitably get reduced to some binary decision set, the ability to predict and influence a live event is ultimately more abstract, or at a minimum iterative; that is to say, numerous decisions and reactionary values are summated.
      Since AI systems are interacting with human beings, it is easy to suggest our behavior set is problematic and more akin to irrational bouts. We will slow down when we should speed up, go when we should stop.
      So let me wrap this up: an AI system may be well reasoned, but when some dynamic situation starts to unfold and a human driver in a key location turns on his signaling indicator, the AI system comes up with two courses of action, the first accounting for the human driver's actions and the second ignoring the human driver's signaling. The point is this: if the AI system is set heuristically, it cannot be truly interactive, and if it cannot account for humans being irrational, it has no business being among us.

  • @puddingosu3326
    @puddingosu3326 6 лет назад +1

    huh

  • @blackmayb3
    @blackmayb3 6 лет назад +4

    I think I can beat AlphaZero in Fortnite Playground

    • @thezyreick4289
      @thezyreick4289 5 лет назад

      Depends entirely on how it gets coded for an FPS setting. They could simply code it to load the entire game state, analyze the location of everything, then find the nearest weapon with enough ammo to drop all the nearby players' health to 0, grab it, and fire every single round without missing into every player, starting with the closest one, until it is out of ammo, regardless of distance or rendering distance. In short, you would die within the time it takes the AI to get a weapon and a clear line of fire to your character, even if all the shots come from one side of the map to the other, through a hole a single byte larger than the hitbox of the "bullet".
      You likely would never see it, and the AI likely would not even be within your render distance for you to have a chance to fight back.
      Something so simple wouldn't be a challenge for a ruthlessly programmed AI like this. Sorry to break it to you, but most in-game AI is "dumbed" down or given a distinct handicap to keep it from annihilating the players, so that the game stays fun. Or the game devs are not good enough at AI programming to do it effectively for their game; that happens too.

  • @DJHastingsFeverPitch
    @DJHastingsFeverPitch 6 лет назад +1

    This is how you get Skynet

    • @gregh7457
      @gregh7457 4 года назад

      it already exists in china

  • @theSpicyHam
    @theSpicyHam 4 года назад +1

    hahaha perhaps pn rather puny of

  • @IgorGabrielan
    @IgorGabrielan 6 лет назад +1

    AlphaZero.ai

  • @terryhughes7196
    @terryhughes7196 3 года назад

    Master cancer for us

  • @nethbt
    @nethbt 4 года назад +1

    They cheated against Stockfish though... the version they played against was an older version using suboptimal settings, run on crappy hardware. Booohooo

  • @eddiesmurfy
    @eddiesmurfy 2 года назад

    Damn, this is boring af.

  • @myothersoul1953
    @myothersoul1953 6 лет назад +4

    5:30 "... at the beginning of this (training) pipeline we start with a human data set ..." So much for "without human knowledge". Don't be fooled there's a lot of human knowledge embedded in every AI.

    • @firebrain2991
      @firebrain2991 6 лет назад +46

      That's when he was describing AlphaGo, not AlphaGo Zero or AlphaZero. Pay attention before you make comments like this.

    • @1man1bike1road
      @1man1bike1road 6 лет назад +7

      alpha go zero was given zero lol

    • @myothersoul1953
      @myothersoul1953 6 лет назад +1

      Whether it was the data used to train it or the experiences used to design it, there is human knowledge built into every AI.

    • @gabrielfreire2935
      @gabrielfreire2935 6 лет назад +8

      you didn't even finish the video before commenting -_-``

    • @Gregzenegair
      @Gregzenegair 6 лет назад +2

      Well, there was no human data input, but the network itself was indeed 'humanly' architected and set up. Maybe in a few years these could be built from scratch by other AIs, and AI algorithms would give birth to other AI algorithms, and so on (still, the first parent would be human-crafted).