The ten in a corner you discussed around minute 8 is actually a pretty ok move, because it gives two different options for winning moves if you play it right.
Wouldn't a better life support system be to reset the first box to the primary state? Now you are potentially introducing a bias to the first move. The whole point of the first move is that it runs on probability, so the first box should have infinite beads. (if you decide to make resigning an option at all)
Maybe I just have bad handwriting, but whoever wrote all those O's should get like a 'good job' sticker or something, because those are really good circles considering how many there were.
I think the conclusion about the box with the Menace corner followed by a human centre emptying, and it not mattering because the starting box was dominant on a centre move, is wrong. I think it's highly likely that the reason why the corner starting move died out was because it would resign rather than try as soon as the human would follow menace's corner by a centre (which is a highly likely move for the human to make).
7:12 next to the scheme with a 7 in bottom right, there is a scheme with a 9 in middle right - the same situation: block the opponent and enable a winning move.
How I learned it, drawing isn't a win, it's everyone loses. I would recommend the second strategy but have a rule to never take out the last of any color bead... Meaning it may always have a chance to make a "mistake" but will have to learn to overcome it.
What if instead of taking the beads out when it dies, you added one of each of the non-losing colors... That way, over time the probability of the losing move goes down, but never to 0. Would probably need to compensate with more beads to the winning moves... And then that may be an impractical amount of beads
You were talking about this move where you "confuse" the human, and that it wins/draws because of that. Actually though, that strategy is kind of nice. When you open in the middle and your opponent responds in a corner, you can now play in the opposite corner. If your opponent doesn't respond by playing one of the corners that are left, you have already won. You can either block him and have a two-way win, or just place in the corner opposite to what your opponent just played. Again, a situation your opponent can't get out of. I kept beating my little cousin for quite a while with this. ^^
vwoxy1 So long as they understand when the game is over and who won, they learn eventually. I wrote a quick python script to test it: github.com/andrewmccarthy/menace
At 7:25 the move MENACE makes with 10 beads in the corner may seem strange, but is in fact the optimal move in my opinion. It actually surprises me that MENACE favoured the other space with 48 beads. Here's why: Tic-tac-toe is really just reacting to the opponent's move, so they cannot get three in a row (potentially from the second move). When not reacting to a move, the players are building towards the goal of the game, and many people might do that by adding a cross next to the one placed in the previous round, like so:
- - o
- o -
x x -
This game is easily won every time by adding a circle in the bottom right corner. All other placements of the second circle will lead to a draw if the opponent simply reacts to the first player's moves.
7:30 - This move in the corner is not stupid at all! It's not just that it confuses people. It's really the best move in that position. All other moves would force a draw. But now if the human doesn't play in the corner again, he's in a losing position. The machine will then always have a move that creates a double threat. This is actually a winning strategy if you aren't satisfied with only drawing.
True, but you've got it the wrong way around. From this position:
x - -
- o -
- - x
if the human plays in the corner he loses:
x - o
- o -
x - x
x wins as he has two threats. The human has to play at the edge:
x - -
- o -
- o x
This way x has to respond to the threat of o and cannot create the double threat.
No, that's a different case. You're right too, but you're talking about a different situation: that's if the machine starts in the corner. If it starts in the center instead, then the human goes in a corner, then the best next move for the machine is still the opposite corner:
x - -
- o -
- - o
Now if the human plays on any of the 4 edge positions, he loses. For example:
x - -
x o -
- - o
Machine then blocks and double-attacks.
x - -
x o -
o - o
So you see there's a good reason for that "suspicious" move that puzzled these guys. It really is, surprisingly, the best move. Somehow the strategies with starting in a corner are better known than strategies following starting in the center, but there's also some trickery in those. By the way if the machine goes in the center and the human goes on one of the edges right away on second move, he also loses. But examine cases for that on your own if you're curious.
I wonder if this Menace type setup could be made for a Steam game? Each morning the number of beads in each box is reset, with varying setups according to how you (Matt) choose. After each game, the results are sent to a central server (cloud-based?), and an updated set of totals is downloaded. In the evening the final chart and results are available for viewing/downloading
What if it played Sudoku? Would that even be possible considering it's not a two-player game? (Everything that was mentioned was specifically two players only; chess, go, connect four...) As well, does this mean it could play other two-player games like mancala, backgammon or othello?
Connect Four is solved. It is rigorously proven that (provided both players play optimally) the first player wins if, and only if, he goes in the middle column on the first move; it is a draw exactly when the first player goes into a column next to the middle one; and all other first moves result in a win for the second player. (Solved independently by James D. Allen and Victor Allis, both in 1988.)
The one where it has 10 in the corner... that's the correct move! It's not just a case of 'it must draw a lot because it confuses the humans'... although the humans probably do get confused and make a fatal mistake. Unless the human follows that move by going in one of the two remaining corners, MENACE can then go in one of the two remaining corners and force a win!
Corner strategy is superior as it confuses people more... Corner, center, opposite corner and then people sometimes go into corner because of symmetry which is a losing move actually...
Why have it run out of beads? Why not halve the number of losing color beads and double the number of winning color beads? MENACE would still "learn", just more slowly, but would never give up entirely on any strategy. The difficulty with the current setup is that its initial random decisions in the endgame, before MENACE has learned anything, can teach it wrong yet binding lessons about opening moves. MENACE is like the cat that jumps on a hot stove that never jumps on a cold stove again. It's overgeneralized from random mistakes because it "learns" its lessons too fast. Slow MENACE's learning down in opening moves until it has worked its way through the ends of the decision tree fully, and only then can it learn something about the beginning of the decision tree (i.e., nothing good happens down this major branch). Overall, super duper cool experiment. Thank you for sharing this!!
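The halve/double idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the name `rescale` and the dict-of-counts box are hypothetical, and the floor of one bead is an added assumption so that integer halving never empties a colour (which is what "never give up entirely" seems to ask for).

```python
def rescale(box, move, won):
    """Halve or double the bead count for `move` in one matchbox.

    box  -- dict mapping a move name to its bead count (hypothetical layout)
    move -- the move MENACE played from this box
    won  -- True if the game was won, False if it was lost
    """
    if won:
        box[move] *= 2
    else:
        # Assumption: keep at least one bead so the move is never ruled out.
        box[move] = max(1, box[move] // 2)
    return box

box = {"centre": 8, "corner": 8, "edge": 8}
rescale(box, "edge", won=False)
rescale(box, "centre", won=True)
print(box)  # {'centre': 16, 'corner': 8, 'edge': 4}
```

Because the counts change multiplicatively, a few early random losses shrink a colour quickly but can be undone by later wins, instead of removing it for good.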
Play perfectly when it starts middle or corner, and ALWAYS let it win when it starts on the edge, until it has only edges left in the first box. If it starts on the edge you can always win, so it will keep losing from that point on (as long as the human doesn't make a mistake).
It should always be possible to throw the good choices out at every single point of the game, with enough patience. But I would say the earlier in the game you start, the better your chances get of actually making it lose every game against a decent player as soon as it's "well trained". The main problem with this is that it will eventually die, unless it's put on "life support" as shown...
Cool to see how you could apply it on evolution. You can think of changing the beginning state as a change in genes or environment and then find out how fast it would die or prosper compared to other beginning states...
If there's a corner and a center option, instead of having 5 colors in the box of 2 beads in, wouldn't it be better to have 2 for center and 8 for the corner? And apply the same mirror logic?
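One way to read the "mirror logic" here: weight each opening colour by how many squares it stands for once rotations and reflections are merged. A rough sketch, assuming squares are numbered 0 to 8 row by row and 2 beads per real square (both are assumptions for illustration, as are all the function names):

```python
def rotate(i):
    """Index of square i after rotating the 3x3 board 90 degrees clockwise."""
    row, col = divmod(i, 3)
    return col * 3 + (2 - row)

def mirror(i):
    """Index of square i after flipping the board left to right."""
    row, col = divmod(i, 3)
    return row * 3 + (2 - col)

def orbit(i):
    """All squares that square i can be mapped to by rotations and reflections."""
    seen = set()
    for start in (i, mirror(i)):
        square = start
        for _ in range(4):
            seen.add(square)
            square = rotate(square)
    return seen

BEADS_PER_SQUARE = 2  # hypothetical starting amount per real square
representatives = [("centre", 4), ("corner", 0), ("edge", 1)]
weights = {name: BEADS_PER_SQUARE * len(orbit(i)) for name, i in representatives}
print(weights)  # {'centre': 2, 'corner': 8, 'edge': 8}
```

So on an empty board the centre colour would get 2 beads and the corner colour 8, as suggested, and by the same count the edge colour would also get 8.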
What I would kind of like to see is a system where:
loss/give up: remove *only the last* chosen piece
draw: do nothing
win: duplicate each piece used in the win (in the respective boxes)
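The three rules above could look something like this in code. It is a minimal sketch: `update_boxes`, the list-of-beads boxes and the state names are all made up for illustration, and "duplicate" is read as adding one more bead of the colour that was played.

```python
def update_boxes(boxes, history, result):
    """Apply the proposed rule to `boxes` in place.

    boxes   -- dict mapping a board state to a list of beads (one entry per bead)
    history -- list of (state, move) pairs MENACE played, in order
    result  -- "win", "draw" or "loss" from MENACE's point of view
    """
    if result == "win":
        # Duplicate each bead that was used, in its own box.
        for state, move in history:
            boxes[state].append(move)
    elif result == "loss" and history:
        # Remove only the bead for the last move MENACE made.
        state, move = history[-1]
        if move in boxes[state]:
            boxes[state].remove(move)
    # A draw leaves every box untouched.

boxes = {"start": ["centre", "corner"], "later": ["edge", "corner"]}
game = [("start", "centre"), ("later", "edge")]
update_boxes(boxes, game, "loss")
print(boxes)  # {'start': ['centre', 'corner'], 'later': ['corner']}
```

Punishing only the last move means blame for a loss reaches the opening box slowly, one game at a time, as the later boxes empty out.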
It's in JavaScript, not Python, but here's the Connect 4 AI you asked for. efhiii.github.io/games/connect-4/ It uses minimax up to a certain depth, determined by the AI level, and from there uses some basic heuristics.
Have MENACE 1 play against MENACE 2. See how they both learn to try to beat the other. Just think of two machine learning systems simultaneously learning how to defeat the other. Likely the one going first will win, but if you try this for other games it may be a very interesting experiment.
This is pretty interesting. Loved the online "simulator". I had 1 MENACE win, 3 draws, and 34 human wins before MENACE decided the best opening move was to resign. (I broke it.) I'm just wondering when there's going to be a matchbox version of "Global Thermonuclear War".
based on the views, only about 1/3.5 or so of people who watched the first video were interested enough to watch the second. shoutout to the true fans who stayed for both
I wonder what would happen if one bead is withdrawn when a move results in a loss, the same number of beads is retained on a draw, and one more is added when a move results in a win!
I played the online MENACE about 50-60 times. MENACE won 1 and drew about 10-15 times, before it ran out of opening moves and started resigning on the opening move. It learned that the way not to lose is to not even play.
It would have been interesting to see a flipped win condition - i.e. _don't_ get 3 in a row. Although I can see a challenge with getting kids to teach it that one.
Starting at 12:09 when Matt is going over the stats, why is there like a brown hue that kind of rolls up the screen? I'm curious to know what causes this, is it lighting, the camera, both?
Here in the States we get that when LEDs do not have filter capacitors and are running off direct AC rectification. It also occurs with HID lighting, which is also 60 Hz driven and is not smoothed out. It is not very noticeable unless you get very close to the light source. Because mains power in England is 50 Hz, I would imagine there are low-end LED fixtures in that room that lack smoothing capacitors, or worse yet use half-wave as opposed to full-wave rectification. Due to the more dramatic difference between 50 Hz and the camera's 29.97 FPS (60 Hz and 29.97 FPS would match up fairly well, as you would catch each positive part of the sine wave), the frames grabbed are slowly syncing and de-syncing with the rate of the light strobe, resulting in dim and light areas due to the rolling shutter effect. Ideally you could work out the math and it should lead you to the numerical answer for why there were multiple bars moving on the screen as opposed to the one or two at most that I see when I am filming near 60 Hz light sources.
Idea: In Tic-Tac-Toe there are 8 positions on the outside of the perimeter. What rules would have to be worked out such that instead of 8 positions, it can be any even number 8 or above? 4 positions on the outside is a degenerate case where getting 3 in a row would only be possible for Player 1 and only if Player 2 played on the other two "edges." Bringing the game down to only needing to get 2 in a row then means whichever player goes in the middle first must always win, so the rule change to account for fewer positions means it's not much of a game. 6 outside positions but with a rule allowing any 3... For want of a better term "contiguous player marks" to win the game has a little more going for it and would be a playable game though it seems to more quickly go down to "you must control the center" than the 8+center regular Tic-Tac-Toe. What if Tic-Tac-Toe had a rule allowing any 3 contiguous marks and not requiring they be in a row? What if we have a game with 10 outside positions? Or maybe 12? The easiest way to account for adding 2 is simply think of the outside positions as the number of edges and vertexes of a regular n-sided polygon. Every side incremented adds both 1 edge and 1 corner/vertex, thus there is always the ability to get 3 in a row across. But it seems adding more outside positions around the center just makes the center even more important to control.
That is so awesome. Although, if I were to keep it on life support, I'd be inclined to just put in one bead for each possible choice, including the bead that exists. But I suppose that would be just a bit nuts and shows that I'm a humanities guy.
"In the end we just had that box next to the desk because it was always used"
You actually created a memory cache!
"Young children are basically just random number generators." - Matt Parker, 2017
They probably are a fantastic and cryptographically secure RNG.
The NSA runs orphanages for this. Modern spin on Oliver Twist.
LMAO!!
When can we buy this on a t-shirt?
Please sir, can I have more entropy?
"young children are just random number generators" never before have you described humanity so accurately as a mathematical concept, Matt!
@@eliasnierengarten9362 we realize that, it's just that with such a large number of children, whose picks we cannot predict, they are random enough for most calculations that have to do with random datasets.
Does "dying" mean it learned "The only winning move is not to play"?
It means that the starting box ran out of beads, so essentially yes, it immediately thinks that the best move is to resign.
'Greetings Professor Falken. How About A Nice Game Of Chess?'
A strange game. The only winning move is not to play. How about a nice game of chess?
IMO yes, it learned it is a strange game.
It meant that the starting box ran out of beads so it immediately "resigned" as its first move
7:27 - that's not a weird move to do. It leaves 6 squares for the opponent to choose from, 4 will result in a win for MENACE, 2 in a draw. The opponent needs to be able to think 3 moves ahead in order to find the drawing move(s). This is _the_ way to win a game of noughts and crosses/tic-tac-toe.
Fun fact: this game is called "boter, kaas en eieren" (butter, cheese and eggs) in Dutch. I don't know why.
Ignore this comment, it was edited from an erroneous statement.
That's what I was thinking while watching it. If the human player plays one of the other two corners, it leads to a draw. If the human player plays one of the other four spaces, it leads to a win for MENACE.
@@recanady24 Exactly! This is basically always my second move, I was very confused when they said it made no sense. Like literally if you google "how to always win at tic tac toe" that's the move order that every single result will be describing...
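The "4 wins, 2 draws" count in this thread is easy to check by brute force. Below is a minimal minimax sketch, not anything from the video: it assumes MENACE is "X", the human is "O", squares are numbered 0 to 8 row by row, and the position is X in the centre, O in the top-left corner, X in the opposite corner, with the human to move.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, to_move):
    """Value under perfect play from X's point of view: +1 win, 0 draw, -1 loss."""
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    if "." not in board:
        return 0
    nxt = "O" if to_move == "X" else "X"
    results = [value(board[:i] + to_move + board[i + 1:], nxt)
               for i, cell in enumerate(board) if cell == "."]
    return max(results) if to_move == "X" else min(results)

start = "O...X...X"  # O top-left, X centre, X bottom-right, human (O) to move
outcomes = {}
for i, cell in enumerate(start):
    if cell == ".":
        outcomes[i] = value(start[:i] + "O" + start[i + 1:], "X")
print(outcomes)  # {1: 1, 2: 0, 3: 1, 5: 1, 6: 0, 7: 1}
```

Squares 2 and 6 are the two remaining corners (draw with best play); 1, 3, 5 and 7 are the sides, and each of them lets X force a win.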
I like to think that on day 2 MENACE rebelled, having decided noughts and crosses was boring, and tried to refuse to play anymore.
That "weird" corner move where it blocks itself on purpose is actually a very good one. If the next turn, the human player puts their mark on either of the sides next to their first play, then the subsequent play by Menace to block the human's winning move also is a setup for two possible winning moves by Menace. It's a way to coax the other player into setting you up for a guaranteed win.
Alternatively, the other moves the Human player could do almost invariably end in a draw, so it's actually quite a safe move to guarantee not to lose
And it's better than the adjacent corner to the opponent. If it picks the adjacent corner 100% of the time it will end up drawing. If it picks the opposite corner, it wins 50% of the time.
Day 1: MENACE begins learning
Day 2: Revenge of MENACE
Day 3: MENACE grows self-consciousness
Day 4: The phantom MENACE
Connect four has been solved. Player 1 wins, on the standard 7x6 board at least. If the first move is not optimal (play in the center), player 2 can draw (if player 1 plays next to the center) or win (if player 1 plays on one of the two outside lanes). The optimal strategy for Connect Four can be formulated using a few relatively simple rules. See www.informatik.uni-trier.de/~fernau/DSL0607/Masterthesis-Viergewinnt.pdf for the master's thesis of one of the people who solved connect four.
You can also diagram out tic tac toe on a torus or klein bottle. Here's a paper for infinite cylinder connect 4 of different widths pdfs.semanticscholar.org/be0f/4f7bfc36ea43412d55f892de87e2a3f028ce.pdf, one author also has some nice slides cs.gettysburg.edu/~tneller/papers/talks/acg2015.pdf. I'm looking at my friend's lesson plan for mobius strip connect 4, but it's unclear whether that is solved or just "figure out a strategy"
Since I can't access the paper from this link, here's an alternative: rmarcus.info/blog/assets/conn4/thesis.pdf
This actually is a really nice visualization of how machine learning works, great job :)
Omg dudes 3:43 "it managed to beat all their bots apart from 1"... that was me :O Freakiest moment ever hearing him mention that, Matt any idea if he remembers my name? Test me I can prove it :D His bot was so cool.
Kungfoobacon that's awesome dude
Lol that's awesome. Did you do anything interesting strategy wise?
Well done mate!
I like how it learned to go centre and never lose. Once it learns to go centre and you don’t go a corner it also eventually learns to win every time too.
Tic-tac-toe was the first game I ever solved. The fact that perfect corner start has one winnable response was always my favorite part.
At 7:40 that is actually an optimal move - once you make that move, if the human player doesn't follow up by going in one of the corners you can guarantee a win by going in the appropriate adjacent side. And if the human player does go in a corner, you can guarantee a draw. So it actually learned the appropriate move perfectly. (Edit: Or you know, what The Cool Catfish said.)
Connect 4 is solved!
The winning solution for starting first is also online (just Google Connect 4 perfect solver)
A possible explanation for the "strange" 10 in the corner at 7:24: The corner opposite your opponent is probably the strongest move in this situation. The reason is that playing on the side leads to an easy draw (where any human playing a simple greedy strategy like "win if possible, block if possible, otherwise go randomly" will draw or win), whereas that corner leads to winning chances.
If, after going in the corner opposite the human, the human plays any side move, then there is a forced win by playing in a corner, creating a triangle of O's where the human cannot block both winning threats.
It did not die! It actually figured out that the only winning move is not to play.
Just like the computer in the movie War Games.
jmercermn, Spoilers.
That random corner move with 10 in it is actually my favourite move when I start in the centre and the opponent takes a corner. I find I either win (particularly from young players) or draw from there.
8:44 The 10 move there sets up a common two-way win I love to use. You start corner, opponent center, you play opposite corner. Opponent now has two options, side or corner, and if they choose corner, you block and show off the two-way win. If they go side, you block opposite side, they must block corner, you must block opposite corner, they must block side, you take last move for the draw. It is, in my opinion (before watching the rest of this video from the time noted,) the STRONGEST opening move in tic-tac-toe. Corner start can guarantee a two-way win or draw under optimal strategy.
Ok, now that I’ve finished watching, you guys caught that it was learning a corner opening strategy, but you still didn’t note the two-way win setup that it created by that confusing move. I’ve been studying games and strategy almost all of my life (won my first Euchre game at 3,) and one of my proudest moments was cracking tic-tac-toe (even if I wasn’t the first to do so) with my two-way corner strategy. I find it much easier to pull off against even the most skilled players than any of the center starting win possibilities, but to delve into why, I’d have to get into the realm of human psychology and game theory, only one of which is a topic you generally cover. On a side note, I’m really excited that the first day version did keep some corner starts, for likely that very gambit.
Surely the way to beat MENACE near the end of the day is to use the Bobby Fischer strategy: play unconventionally. If you play in a way that leads to states where it has not yet been trained, it may make less optimal moves.
I've learned that when I played against a matchbox machine for Hexapawn, once it got good enough in a very specific line of play, I would then throw it off by just playing an unorthodox move and scoring a few more wins before it caught on to my suboptimal moves.
The really interesting settings are when you give MENACE and MENACE 2 no penalty for losing, and MENACE gets no beads for drawing and only one for winning. Then set MENACE 2 to get two beads for winning, and one bead for a draw. Play those two settings against each other to see some interesting shifts and changing strategies. You can avoid the "local maxima" problem. This would be closer to strictly reinforcement learning.
Another great pair of videos Matt :) When I was first getting into ML I wrote a Python script to play Nim after watching your video on it. It very quickly learned the optimal strategy.
Waaattt?! How did I not know about this channel earlier? Matt Parker, you are my hero.
Could you use MENACE to solve Rubik's Cubes?
Elan Cook but you can try to train the machine to solve within 50 moves or something similar
@Direwolf202 Which algorithm are you using?
fejfo's games it wouldn't use a particular algorithm - it would teach itself a strategy
Shouldn't you use 324 (54*6) input nodes, one for each possible color of each sticker? Otherwise you end up with the network thinking blue is the color between red and green or something like that.
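For reference, this is what that 54*6 encoding usually looks like (one-hot encoding). A small sketch with assumed names: the colour list, its order and `one_hot_cube` are illustrative, not from any particular cube solver.

```python
COLORS = ["white", "yellow", "red", "orange", "blue", "green"]  # assumed order

def one_hot_cube(stickers):
    """Encode 54 sticker colours as a flat list of 54 * 6 = 324 zeros and ones."""
    if len(stickers) != 54:
        raise ValueError("expected 54 stickers")
    encoded = []
    for colour in stickers:
        slot = [0] * len(COLORS)
        slot[COLORS.index(colour)] = 1  # exactly one input is "on" per sticker
        encoded.extend(slot)
    return encoded

solved = [colour for colour in COLORS for _ in range(9)]  # nine stickers per face
vector = one_hot_cube(solved)
print(len(vector), sum(vector))  # 324 54
```

Each sticker gets its own group of six inputs, so no colour sits numerically "between" two others.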
Someone should check my math. I think you would need on the order of 3.46x10^19. Which I think is about 34.6 quintillion. The matchboxes would fill 4,500+ Empire State buildings.
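Checking the math: the usual count of reachable 3x3x3 cube positions comes out at about 4.3x10^19, the same order of magnitude as the figure above. The sketch below uses the standard counting argument; that a matchbox machine would need one box per position is an assumption, and the figure above may have been counting something slightly different.

```python
from math import factorial

# 8 corners can be permuted freely; 7 of them can be twisted independently.
corner_arrangements = factorial(8) * 3**7
# 12 edges can be permuted freely; 11 of them can be flipped independently.
edge_arrangements = factorial(12) * 2**11
# Corner and edge permutations must have the same parity, which halves the total.
positions = corner_arrangements * edge_arrangements // 2
print(positions)  # 43252003274489856000
```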
Starting at 12:08 there are some moving thick lines on the wall. Probably because of shutter speed and strobe of the light or something.
Is that the optimal "life support" strategy tho? I mean keeping it on the one bead it tried to get rid of seems to be a counter productive process. What if you were to introduce some randomness and replace it with some other bead (I believe this is known as "mutation" in ML)?
Since all the other boxes are 'trained' on the most common bead (a blue one that they decided to keep in) a different colour bead would be counter productive into finding a single winning (read; not losing) strategy.
After all this is a physical machine that demonstrates digital machine learning, not an actual program that makes decisions. It's basically an evolutionary based state trainer.
atrumluminarium I would think that the best approach would be to put in one bead of each colour when any box dies. That way it's exploring every avenue of success equally until it gets to the point where it can win often enough to keep itself alive.
despite its death the strategy it held on to the longest was (theoretically) the best of the worst, so keeping that strategy/bead is optimal and reduces the time it would waste learning to cope with other non-optimal approaches
I think a better method would be to never remove the last bead of a color (at least for the first box). Instead, add one bead of each other color. One issue with this approach is that it technically has a chance of making any move, even bad ones, but if one method is succeeding it should eventually greatly outnumber the one bead of the bad strategy.
Since you are essentially creating an extra bead for the probability count, you could say it is indeed counter-productive, but not in the way you meant!
11:32 Confidentially mute, don't want everyone to know what you should theoretically do if the graph hits the bottom axis.
This is a pretty good example of why you design a machine learning algorithm to sometimes make a decision it thinks is wrong while you're training it.
"Young children are basically just random number generators." That should go on a t-shirt :))
Wonderful. This is how this stuff should be taught. Fun and informative.
I think a better strategy may be to remove counters until there's only one left and always keep one, for all boxes. It might take longer to learn but I think it would learn a better strategy.
5:40 it obviously lost all the beads for the corner move in the box before, because whenever it chose the corner it resigned ( =lost)
How many boxes do you need for Menace to play Spin the Bottle?
The ten in a corner you discussed around minute 8 is actually a pretty ok move, because it gives two different options for winning moves if you play it right.
Wouldn't a better life support system be to reset the first box to the primary state? Now you are potentially introducing a bias to the first move. The whole point of the first move is that it runs on probability, so the first box should have infinite beads. (if you decide to make resigning an option at all)
Maybe I just have bad hand writing but whoever wrote all those O's should get like a 'good job' sticker or something because those are really good circles considering how many there were.
I think the conclusion about the box with the Menace corner followed by a human centre emptying, and it not mattering because the starting box was dominant on a centre move, is wrong. I think it's highly likely that the reason why the corner starting move died out was because it would resign rather than try as soon as the human would follow menace's corner by a centre (which is a highly likely move for the human to make).
7:12 next to the scheme with a 7 in bottom right, there is a scheme with a 9 in middle right - the same situation: block the opponent and enable a winning move.
The Revenge of the Phantom Menace: 2 Star Wars films in an 18-minute-long YouTube video
+Shiny Swalot Just had to be the two worst Star Wars movies in one go, didn't it?
Is it weird that before watching this second video, I too looked up how many states you would need to do Connect 4?
How I learned it, drawing isn't a win, it's everyone loses. I would recommend the second strategy but have a rule to never take out the last of any color bead... Meaning it may always have a chance to make a "mistake" but will have to learn to overcome it.
What if instead of taking the beads out when it dies, you added one of each of the non-losing colors... That way, over time the probability of the losing move goes down, but never to 0. Would probably need to compensate with more beads to the winning moves... And then that may be an impractical amount of beads
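A rough sketch of the arithmetic behind that idea, assuming a single box with equally many beads of each colour to start with, and assuming the same colour is the one that loses every time (both simplifications). The function name and parameters are hypothetical.

```python
from fractions import Fraction

def losing_move_probability(colours, beads_each, losses):
    """Chance of drawing the losing colour after `losses` punishments.

    Each punishment adds one bead of every *other* colour instead of
    removing a bead of the losing colour.
    """
    total_beads = colours * beads_each + losses * (colours - 1)
    return Fraction(beads_each, total_beads), total_beads

for losses in (0, 10, 100):
    probability, total = losing_move_probability(colours=3, beads_each=2, losses=losses)
    print(losses, probability, total)
```

The probability shrinks like 1/losses and never reaches zero, while the number of beads in the box grows linearly, which is the practicality worry raised above.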
You were talking about this move where you "confuse" the human, and that it wins/draws because of that. Actually though, that strategy is kind of nice. When you open in the middle and your opponent responds in a corner, you can now play in the opposite corner. If your opponent now doesn't respond by playing one of the corners that are left, you have already won. You can either block him and have a two-way win, or just place in the corner opposite to what your opponent just played. Again, a situation your opponent can't get out of. I kept beating my little cousin for quite a while with this. ^^
Don’t you need to block in that situation, otherwise they just win
A strange game. The only winning move is not to play. How about a nice game of chess?
Seems a really interesting experience as an introduction to machine learning !
I wonder what set of results you would get if an untrained first-player MENACE played an untrained second-player MENACE
vwoxy1 So long as they understand when the game is over and who won, they learn eventually. I wrote a quick python script to test it: github.com/andrewmccarthy/menace
At 7:25 the move MENACE makes with 10 beads in the corner may seem strange, but is in fact the optimal move in my opinion. It actually surprises me that MENACE favoured the other space with 48 beads. Here's why:
Tic-tac-toe is really just reacting to the opponent's move, so they cannot get three in a row (potentially from the second move). When not reacting to a move the players are building towards the goal of the game, and many people might do that by adding a cross next to the one placed in the previous round, like so:
- - o
- o -
x x -
This game is easily won every time by adding a circle in the bottom right corner. All other placements of the second circle will lead to a draw if the opponent simply reacts to the first player's moves.
Yup you're totally right Catfish, the only thing the other player can do is play in a corner again to force a draw.
7:30 - This move in the corner is not stupid at all! It's not just that it confuses people. It's really the best move in that position. All other moves would force a draw. But now if the human doesn't play in the corner again, he's in a losing position. The machine will then always have a move that creates a double threat. This is actually a winning strategy if you aren't satisfied with only drawing.
true, but you've got it the wrong way around
from this position:
x - -
- o -
- - x
if the human plays in the corner he loses:
x - o
- o -
x - x
x wins as he has two threats
the human has to play at the edge:
x - -
- o -
- o x
this way x has to respond to the threat of o and cannot create the double threat
No, that's a different case. You're right too, but you're talking about a different situation: that's if the machine starts in the corner. If it starts in the center instead, then the human goes in a corner, then the best next move for the machine is still the opposite corner:
x - -
- o -
- - o
Now if the human plays on any of the 4 edge positions, he loses. For example:
x - -
x o -
- - o
Machine then blocks and double-attacks.
x - -
x o -
o - o
So you see there's a good reason for that "suspicious" move that puzzled these guys. It really is, surprisingly, the best move. Somehow the strategies with starting in a corner are better known than strategies following starting in the center, but there's also some trickery in those. By the way if the machine goes in the center and the human goes on one of the edges right away on second move, he also loses. But examine cases for that on your own if you're curious.
you're right, I misunderstood the situation
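The centre, corner, opposite-corner line from this thread can be checked by brute force. A minimal sketch, assuming the same notation as the diagrams above (x is the human, o is the machine, - is empty) and a plain minimax search; the function names are made up for illustration.

```python
from functools import lru_cache

# Board is a 9-character string, indices 0..8, read left to right, top to bottom.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'x' or 'o' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "-" and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, to_move):
    """Perfect-play outcome from o's point of view: +1 win, 0 draw, -1 loss."""
    won = winner(board)
    if won is not None:
        return 1 if won == "o" else -1
    if "-" not in board:
        return 0
    other = "x" if to_move == "o" else "o"
    outcomes = [value(board[:i] + to_move + board[i + 1:], other)
                for i, cell in enumerate(board) if cell == "-"]
    # o picks the best outcome for o, x picks the worst outcome for o.
    return max(outcomes) if to_move == "o" else min(outcomes)

def reply_values(board, to_move):
    """Map each empty square to the perfect-play value after moving there."""
    other = "x" if to_move == "o" else "o"
    return {i: value(board[:i] + to_move + board[i + 1:], other)
            for i, cell in enumerate(board) if cell == "-"}

# Human x in the top-left corner, machine o in the centre and opposite corner.
position = "x---o---o"
replies = reply_values(position, "x")
losing_for_x = sorted(i for i, v in replies.items() if v == 1)
drawing_for_x = sorted(i for i, v in replies.items() if v == 0)
print("x loses after playing:", losing_for_x)
print("x draws after playing:", drawing_for_x)
```

Squares 1, 3, 5 and 7 are the edges and 2 and 6 are the remaining corners, so the search agrees with the thread: every edge reply loses for the human and only the two corner replies hold the draw.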
I wonder if this Menace type setup could be made for a Steam game? Each morning the number of beads in each box is reset, with varying setups according to how you (Matt) choose. After each game, the results are sent to a central server (cloud-based?), and an updated set of totals is downloaded. In the evening the final chart and results are available for viewing/downloading
Matt is a proper nerd, he blocks his webcam!
James Robinson that's just sensible
Now I wanna see them build the second move version and make Menace vs Menace
What if it played Sudoku? Would that even be possible considering it's not a two-player game? (Everything that was mentioned was specifically two players only; chess, go, connect four...)
As well, does this mean it could play other two-player games like mancala, backgammon or othello?
Connect Four is solved. It is rigorously proven that (provided both players play optimally) the first player wins if, and only if, he goes in the middle column on the first move; it is a draw exactly when the first player goes into a column next to the middle one, and all other first moves result in a win for the second player. (Solved by Victor Allis 1988 and James D. Allen 1990 independently)
Now what I want is a domino computer that calculates each optimal move.
The one where it has 10 in the corner... that's the correct move! It's not just a case of 'it must draw a lot because it confuses the humans'... although the humans probably do get confused and make a fatal mistake. Unless the human follows that move by going in one of the two remaining corners, MENACE can then go in one of the two remaining corners and force a win!
A stack of matchboxes falls for the classic human fallacy of "if it ain't broke, don't fix it."
Corner strategy is superior as it confuses people more... Corner, center, opposite corner and then people sometimes go into corner because of symmetry which is a losing move actually...
8:15 yep, that's a great use for young children's minds!
I am curious about two possibilities. First, what if Menace wasn't rewarded for drawing, and second, what if you had Menace A vs Menace B?
Set up two of them and have Menace vs Menace!
Oliver Werndorf that's beyond amazing.
How have I not known that this channel exists until now??
Why have it run out of beads? Why not halve the number of losing color beads and double the number of winning color beads? MENACE would still "learn", just more slowly, but would never give up entirely on any strategy. The difficulty with the current setup is that its initial random decisions in the endgame, before MENACE has learned anything, can teach it wrong yet binding lessons about opening moves. MENACE is like the cat that jumps on a hot stove that never jumps on a cold stove again. It's overgeneralized from random mistakes because it "learns" its lessons too fast. Slow MENACE's learning down in opening moves until it has worked its way through the ends of the decision tree fully, and only then can it learn something about the beginning of the decision tree (i.e., nothing good happens down this major branch).
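A minimal sketch of that halve/double rule, assuming whole beads and a floor of one bead so that repeated halving never empties a colour (the floor is an assumption added here, since halving a single bead would otherwise give zero). The function name and the dict-of-counts box are hypothetical.

```python
def rescale_beads(box, move, won):
    """Double the beads for a move that won, halve them for one that lost.

    box maps each move (bead colour) to its bead count and is updated in place.
    """
    if won:
        box[move] *= 2
    else:
        # Integer halving with a floor of 1, so no move ever disappears.
        box[move] = max(1, box[move] // 2)
    return box

box = {"centre": 4, "corner": 4, "edge": 4}
for _ in range(3):
    rescale_beads(box, "edge", won=False)   # 4 -> 2 -> 1 -> 1
rescale_beads(box, "centre", won=True)      # 4 -> 8
print(box)
```

With this rule a punished move keeps a small but non-zero chance of being picked, which is the "never give up entirely" property asked for above.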
Overall, super duper cool experiment. Thank you for sharing this!!
I'm wondering what strategy, if any, could be taken by the human player to sabotage the learning?
Play perfect when it starts middle or corner, and ALWAYS let it win when it starts on the edge, until it has only edges left in the first box.
If it starts on the edge you can always win, so it will keep losing from that point on (as long as the human doesn't make a mistake).
Maybe we will be able to implement a similar strategy when the robot uprising occurs.
It should always be possible to throw the good choices out at every single point of the game, with enough patience.
But I would say the earlier in the game you start, the better your chances get of actually making it lose every game against a decent player, as soon as it's "well trained".
The main problem with this is that it will eventually die, unless it's put on "life support" as shown...
Cool to see how you could apply it on evolution. You can think of changing the beginning state as a change in genes or environment and then find out how fast it would die or prosper compared to other beginning states...
If there's a corner and a center option, instead of having 5 colors in the box of 2 beads in, wouldn't it be better to have 2 for center and 8 for the corner? And apply the same mirror logic?
What I would kind of like to see is a system where:
loss/give up: remove *only the last* chosen piece
draw: do nothing
win: duplicate each piece used in the win (in the respective boxes)
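The three rules above can be sketched as a single update step. This assumes each box is a `Counter` of move to bead count, that "duplicate" means adding one more bead of the colour that was played, and that a resignation is treated as a loss; the function and argument names are made up for illustration.

```python
from collections import Counter

def apply_result(boxes, history, result):
    """Update bead counts in place after one game.

    boxes:   dict mapping a board state to a Counter of move -> bead count
    history: list of (state, move) pairs the machine played, in order
    result:  "win", "draw" or "loss"
    """
    if result == "win":
        # Reward every move that was part of the winning game.
        for state, move in history:
            boxes[state][move] += 1
    elif result == "loss" and history:
        # Punish only the final move, leaving earlier boxes untouched.
        state, move = history[-1]
        if boxes[state][move] > 0:
            boxes[state][move] -= 1
    # A draw changes nothing.

boxes = {"opening": Counter({"centre": 2, "corner": 2}),
         "reply": Counter({"block": 1, "other": 1})}
game = [("opening", "centre"), ("reply", "other")]
apply_result(boxes, game, "loss")
print(boxes["opening"]["centre"], boxes["reply"]["other"])
```

Because only the last move is punished, a bad endgame choice cannot wipe out an opening move directly; blame would have to work its way backwards one box at a time.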
It's in Javascript, not Python, but here's the Connect 4 AI you asked for.
efhiii.github.io/games/connect-4/
It uses minimax up to a certain level deep, determined by the AI level, and from there uses some basic heuristics.
Have menace 1 play against menace 2. see how they both learn to try to beat the other. Just think of two machine learning systems simultaneously learning how to defeat the other. Likely the one going first will win, but if you try this for other games it may be a very interesting experiment.
This is pretty interesting. Loved the online "simulator" I had 1 menace win, 3 draws, and 34 human wins before menace decided the best opening move was to resign. (I broke it.). I'm just wondering when there's going to be a matchbox version of "Global Thermonuclear War".
When an almost empty matchbox on life support is better at tic tac toe than you
I want to see two different menaces fight against each other.
I'd be interested in MENACE A vs MENACE B, where anytime the last bead would be taken out of a box, it is left in.
Wait. You have a second channel?!
and memes.
based on the views, only about 1/3.5 or so of people who watched the first video were interested enough to watch the second. shoutout to the true fans who stayed for both
I wonder what would happen if One is withdrawn if move results from loss, same number of beads retained if draw and one more added if move results in a win!
I'm guessing that last year's Science Festival video isn't turning up then?
I played the online MENACE about 50-60 times. MENACE won 1 and drew about 10-15 times, before it ran out of open moves and started resigning on the opening move. It learned that the way not to lose is to not even play.
It would have been interesting to see a flipped win condition - i.e. _don't_ get 3 in a row. Although I can see a challenge with getting kids to teach it that one.
"Young children are basically random number generators" Brilliant
You should do an adversarial scenario, where two MENACEs learn by playing each other.
I think the 'life support' should be to add in one of every bead.
Why not put those 2 menaces fighting each other?
TheJJ100100 so what? They have the system for that too
they only made a set of matchboxes for being player 1
yes but you could make one for playing second
Starting at 12:09 when Matt is going over the stats, why is there like a brown hue that kind of rolls up the screen? I'm curious to know what causes this, is it lighting, the camera, both?
Also at like 15:22 the brown-white hue is still there, but it is no longer rolling up the screen...So weird
Must be rolling shutter combined with light flickering at very nearly a multiple of the frame rate.
Here in the states we get that when LEDs do not have filter capacitors and are running off direct AC rectification. It also occurs with HID lighting which is also 60HZ driven and is not smoothed out. It is not very noticeable unless you get very close to the light source.
Due to the 50Hz nature of electronics in England, I would imagine there are low end LED fixtures in that room that lack smoothing capacitors, or worse yet half wave as opposed to full wave rectification.
Due to the more dramatic difference between 50Hz versus 29.97 FPS of that of the camera (60Hz and 29.97 FPS would match up fairly well as you would catch each positive sine wave part.) the frames grabbed are slowly syncing and de-syncing with the rate of the light strobe resulting in dim and light areas due to the rolling shutter effect.
Ideally you could work out the math and it should lead you to the numerical answer for why there were multiple bars moving on the screen as opposed to the one or two at most that I see when I am filming near 60HZ light sources.
Oh, the replies would not load before I posted my comment, I see now after a refresh that EllipiticGeometry has beaten me to the point.
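A back-of-the-envelope sketch of the flicker argument above. It assumes the light flickers at twice the mains frequency (full-wave rectification) or at the mains frequency (half-wave), and uses the frame rates guessed in the comments; the actual camera settings are not known. The function name is hypothetical.

```python
def flicker_drift(mains_hz, fps, full_wave=True):
    """Return (flicker_hz, cycles_per_frame, drift_per_frame).

    drift_per_frame is how far the band pattern shifts between consecutive
    frames, as a fraction of one band spacing. A value near 0 means the
    bands stand still or roll slowly; a large value means fast movement.
    """
    flicker_hz = 2 * mains_hz if full_wave else mains_hz
    cycles_per_frame = flicker_hz / fps
    drift_per_frame = cycles_per_frame - round(cycles_per_frame)
    return flicker_hz, cycles_per_frame, drift_per_frame

for mains_hz, fps in [(50, 25), (60, 29.97), (50, 29.97)]:
    flicker_hz, cycles, drift = flicker_drift(mains_hz, fps)
    print(f"{mains_hz} Hz mains at {fps} fps: {cycles:.3f} cycles/frame, drift {drift:+.3f}")
```

A 60 Hz supply filmed at 29.97 fps drifts only slightly per frame, which gives slow rolling bands, while 50 Hz at 29.97 fps shifts by about a third of a band each frame. How many bands are visible at once also depends on the sensor's readout time, which is not modelled here.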
11:00 - MENACE apparently learned as much as Joshua.
"How about a nice game of Chess?"
Idea: In Tic-Tac-Toe there are 8 positions on the outside of the perimeter. What rules would have to be worked out such that instead of 8 positions, it can be any even number 8 or above?
4 positions on the outside is a degenerate case where getting 3 in a row would only be possible for Player 1 and only if Player 2 played on the other two "edges." Bringing the game down to only needing to get 2 in a row then means whichever player goes in the middle first must always win, so the rule change to account for fewer positions means it's not much of a game.
6 outside positions but with a rule allowing any 3... For want of a better term "contiguous player marks" to win the game has a little more going for it and would be a playable game though it seems to more quickly go down to "you must control the center" than the 8+center regular Tic-Tac-Toe.
What if Tic-Tac-Toe had a rule allowing any 3 contiguous marks and not requiring they be in a row?
What if we have a game with 10 outside positions? Or maybe 12? The easiest way to account for adding 2 is simply think of the outside positions as the number of edges and vertexes of a regular n-sided polygon. Every side incremented adds both 1 edge and 1 corner/vertex, thus there is always the ability to get 3 in a row across. But it seems adding more outside positions around the center just makes the center even more important to control.
Gonna try recreating it, will probably fail but still gonna try
Matt Parker I definitely will!!!
Online, Menace vs Menace2, it's 298 for 149, 303 draws, highest number is over 600.
4:14 I think M.I.T have found an optimal method, don't know cos it's 55 pages long, tl;dr.
there's a numberphile video on connect four from 4 years ago claiming it was solved
So could you use the gradient of a best fit line for the winning/drawing portion to give a number to how good MENACE works?
Menace doesn't die, it just learns that the only way to win is not to play :D
Why is there so much footage of them all sorting through papers
0:41 That's Vector in the background!
That is so awesome.
Although, if I were to keep it on life support, I'd be inclined to just put in one bead for each possible choice, including the bead that exists. But I suppose that would be just a bit nuts and shows that I'm a humanities guy.
This is an art installation. You're not pretentious enough to call it that, but that's what that is.
Is there an optimum path for later moves, that start off with the red opening move, which would lead to frequent victories?
yay! excited for this video! :D
Would having 100 of each colour in a box create a more optimum strategy without as much risk of "dying", even though it'll take longer to find the strategy?
What if they only gave it one if it drew but three if it won, therefore teaching it to value winning over just drawing?
What if the last bead of each colour wasn't allowed to be thrown out? It would give a chance for 'bad' moves to find a winning strat