Scrabble GM vs. AI -- the Rematch! Game #16

  • Published: Jul 9, 2024
  • The Scrabble AI BestBot got the best of me in my 100-game Human vs. AI Ultimate Scrabble Battle, but I'm not ready to cede to our AI overlords! Introducing... the GM vs. AI rematch!
    This 100-game series, running every Monday and Wednesday at 5pm ET for 50 weeks, will feature 20-minute games against BestBot with post-game analysis. Hope you guys enjoy, and wish me luck!
    BestBot is the upcoming ultimate Scrabble AI from Woogles.io, to be launched in 2024. For questions, please email woogles@woogles.io.
    Want personalized help taking your game to the next level or a fun gift for a friend? Check out www.mackmeller.com/lessons for more info or email me at mackmeller@gmail.com!
  • Games

Comments • 30

  • @CyberchaoX 19 days ago +9

    Wow that first turn. 8 different sevens with U in fourth position, but not a single eight with a U in fourth or fifth position.

  • @axcertypo 19 days ago +21

    pretty sure your play of YO is better than ODYL. Leaving 2 in the bag is so much better than 1, you're just not outmaneuvering the endgame all that much to go for the higher-scoring tiles. LOUSY might not win much, but it might win more than YO or ODYL, given how well BestBot can block bingos while maintaining its lead.
    Gonna try some rough math, and will start by saying there's around a 95% chance I'm getting this wrong. You're hitting TUILLES in two spots with your likeliest two-tile draw. There are I think around 150 out of the 330 possible 4-tile combos where an I and a T are in the bag if this were random, but actually EVICTED comes down at M6 if the bot had the T, and EVINCED keeping an I makes a ton of sense, so we can assume that the IT draw is slightly more likely than average since we can rule out a T but lean more into an I. In any case, if we assume half of the 4-tile draws contain IT, >28 of those contain IIT or ITT and one contains IITT. With 28 4-tile combos you're at 2/6 to draw TUILLES and with 1 of them you're 3/6. So out of 165 draws, that's around 10/330 for that + 1/6 out of the other 135 or so draws, that's around a 10% chance you're drawing TUILLES alone.
    DI = SULLIED/ILLUDES, AD = ALLUDES, DK = SKULLED, IM = ILLUMES also all require a very precise blocking play from BestBot which could harm its chances of winning in the endgame.
    MA = MALLEUS, DT = DULLEST, FT = FULLEST, KM = SKELLUM, MT = MULLETS and MY = MULLEYS only play in one spot, so although BestBot must sacrifice to block most of the time, it should be flexible enough to not lose much in terms of endgame chances.
    If you play ODYL, the bot has so much information about what you might have. The fog of war that leaving two in the bag creates in terms of the perceived randomness of what you have really consumes a lot of BestBot's winning chances, whereas with one in the bag the assessment is not only easier from a computing perspective, but so much less random. The unseen threats are far more likely to contain both bingos and scoring threats that lose it the game, and because endgame sequences that utilize scoring tiles tend to be flexible in terms of what other tiles you have (i.e., multiple possible outplays at the end of the tree of sequences that score the same amount with many different racks), the bingo threats become less of a concern, as they require more precise racks. What the bot cannot do is infer that you are closer than average to those threats, so a 2/36 from its perspective could actually be closer to a 1/4 based on what you kept and what remains. This also applies to playing 3 tiles, but to a lesser extent.
    INWOUND was an inaccuracy, but I think that's about it. I agree with your assessment of this game, simply because your decisions were pretty clear from a playfinding*equity perspective. INWOUND was the only blemish in that regard, and that was I think your toughest turn besides the ELLOSUY position.
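As a rough cross-check of the draw odds discussed above, here is a minimal sketch that enumerates every two-tile draw after YO. It assumes the 11 tiles unseen by Mack were BestBot's rack ADIIMTY plus the bag FHKT (as reported in a reply in this thread), that each draw is equally likely (so it ignores the inference argument about I and T), and that each listed bingo is actually playable. The names `UNSEEN`, `BINGO_DRAWS`, and `draw_odds` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

# Assumption: tiles unseen by Mack at the ELLOSUY position are BestBot's
# rack (ADIIMTY) plus the bag (FHKT), as reported elsewhere in the thread.
UNSEEN = "ADIIMTY" + "FHKT"

# Two-tile draws named in the comment, each with one bingo it completes
# alongside the ELLSU leave.
BINGO_DRAWS = {
    "IT": "TUILLES", "DI": "SULLIED", "AD": "ALLUDES", "DK": "SKULLED",
    "IM": "ILLUMES", "MA": "MALLEUS", "DT": "DULLEST", "FT": "FULLEST",
    "KM": "SKELLUM", "MT": "MULLETS", "MY": "MULLEYS",
}

def draw_odds(unseen, targets):
    """Chance that a uniformly random two-tile draw matches each target pair."""
    # Duplicate letters (the two I's, the two T's) count as separate tiles.
    pairs = ["".join(sorted(p)) for p in combinations(unseen, 2)]
    total = len(pairs)
    return {t: Fraction(pairs.count("".join(sorted(t))), total) for t in targets}

odds = draw_odds(UNSEEN, BINGO_DRAWS)
any_bingo = sum(odds.values())  # the pairs are mutually exclusive draws

print(f"TUILLES (IT): {float(odds['IT']):.1%}")      # 4 of 55 draws
print(f"any listed bingo: {float(any_bingo):.1%}")   # 19 of 55 draws
```

Under these assumptions the IT draw comes up in 4 of 55 cases (about 7%), a little below the roughly 10% estimated above, and the listed draws together cover 19 of 55 cases. Any inference about what BestBot kept would shift these numbers.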

    • @SamRosin1 19 days ago +6

      Interesting analysis. My intuition was that the doubled L would be a big hindrance in the endgame, so that ODYL would be preferable, like Mack ended up concluding. But I didn't notice how frequently YO draws into bingos. Without calculation, it felt like there would be a lot of paths to outscoring BestBot after ODYL without bingoing, just by threatening big plays on column O, but maybe this doesn't happen as much as the bingo threats after YO?
      Macondo's pre-endgame engine gives the bag-emptying LOUSY a 24.7% chance of winning. Like you said, it's hard to compare that to ODYL and YO.
      Just for fun and to try to gain a tiny amount of intuition, I played out the sequences with BestBot's actual rack of ADIIMTY and the bag of FHKT. After ODYL, BestBot has the nice B6 MITY retaining DIVA. Mack wins 1/4 times, if he leaves the F in the bag, allowing him to play HUSK for 58 and go out first. After YO and MY, Mack wins 1/6 times by drawing NUTSHELL, although he loses by just 1 after drawing HK. (As Mack pointed out, he would have lost after LOUSY.)
      [Edit:] Also, the bag is slightly consonant-heavy, a minor factor in favor of playing the L

    • @mackmeller 19 days ago +3

      Thanks to both of you for the analysis! Not much to really add on my end, other than that this is what's so awesome about Scrabble -- we may never really know which of these is best, and even the strongest engines can't really tell us for sure given the sheer number of possibilities after each play (at least the ones that don't empty the bag) :)

    • @domino14 19 days ago +2

      @mackmeller it's definitely possible to exhaustively enumerate all the possibilities here, but it is difficult and I will have to try again when I have time :( I'd love to be able to plug it in and let it crunch away for a while but come back with objectively correct answers. Of course, it would be even better with inferences, but we can apply those manually after the fact ideally. Someday...

  • @Ray_Ridley 19 days ago +10

    Two terrific games this week - thanks Mack !

    • @mackmeller 19 days ago

      Thanks! Would've liked for one of them to go my way, but what can you do :)

  • @AugustusMatthias 20 days ago +14

    Definitions of Interesting Words Played in Game 16:
    BUTEO (20 pts) [noun] - a hawk of the genus Buteo (such as the rough-legged hawks and red-shouldered hawks) that has broad rounded wings and a fan-shaped tail and that soars and wheels high in the air [from Latin]
    RENEGADO (77 pts) [noun] - variant spelling of "renegade"; a deserter from one cause, principle, party, or allegiance to another often hostile one; a turncoat, traitor [Spanish, from Latin]
    EPIGONES (86 pts) [noun] - plural of "epigone"; an imitative follower, especially an inferior imitator of a distinguished writer, philosopher, musician, or artist [from a Latin word derived from Greek combining forms]
    JIRD (36 pts) [noun] - any of several North African gerbils constituting a genus (Meriones) of the Cricetidae, a family of rodents [from Berber, an Afroasiatic language indigenous to North Africa]
    AGNOMENS (65 pts) [noun] - plural of "agnomen"; an additional name or epithet, specifically an additional cognomen given to a person by the ancient Romans (as in honor of some achievement) [from Latin]
    BERGENIA (74 pts) [noun] - any plant of a genus of perennial spring-blooming herbs (family Saxifragaceae) native to central and eastern Asia that have large thick rootstocks which produce typical colonies or clumps of plants with very thick heavy leaves [derived from the name of the German physician and botanist Karl August von Bergen (1704-1759)]
    FRAILEST (83 pts) [adjective] - superlative form of "frail"; physically weak or easily broken (not firm or durable) [from French]
    COATI (21 pts) [noun] - either of two tropical American mammals of the genus Nasua (Nasua nasua and N. narica) that are related to the raccoon but have a longer body and tail and a long flexible snout [Portuguese, from Tupi, an indigenous language of the Americas spoken in what is now Brazil]
    AIRWAVE (81 pts) [noun] - the medium of radio and television transmission, usually used in plural to refer collectively to radio and television broadcasts [compound noun]
    EVINCED (30 pts) [transitive verb] - past tense of "evince"; to constitute evidence of (prove, confirm) or to display clearly (exhibit, manifest, reveal, express) [from Latin]
    LOX (10 pts) [noun] - a fillet of brined salmon, which may be smoked [Yiddish, from German]

    • @Ecrilon 19 days ago +4

      Because I was curious:
      ODYL (28 pts, not played) [noun] - a force or natural power formerly held by some to reside in certain individuals and things and to underlie hypnotism, magnetism, and some other phenomena.

  • @elvenstein 19 days ago +4

    Two absolute nailbiters in a row!

  • @henryt9281 19 days ago +2

    Mack, I'd like to see you start playing more aggressively and offensively for a few games. Making the highest-scoring plays that open up the board for both players. It doesn't seem like your defense is doing much to restrict BestBot's scoring, but you are constraining your own scoring opportunities. I want to see if wide open boards can change your luck.

  • @craiglarimer1173 18 days ago

    Brilliant game. What an intense end game. Enjoyed the analysis. Keep these videos coming.

  • @AmaranthRBY 19 days ago +1

    I was screaming internally about looking for parallel plays on that first move! With a rack that good there had to be something, and these overlaps are so much nicer defensively.
    In this case the bot would have still bingoed with PEONIES/BUTEOS, but that scores 17 less than EPI(G)ONES did, and you're the one that can pounce on all the floaters after it, so it definitely would've turned out way better.
    Especially with these low-pointer racks, playing to a triple is not even *that* high scoring, so it's worth it to look for the defensive options in a situation like that.

    • @mackmeller 19 days ago +1

      100%. And also by definition with the low pointers they'll make better floaters for your opponent if you do play to the triple and expose them, so yet another reason to prefer the defensive play

  • @iwersonsch5131 19 days ago

    ah i think i get it. The units named after Siemens and Kelvin are words because it's _just_ siemens and _just_ kelvin. The units named after Fahrenheit, Celsius, and Elo are not words because it's _degrees_ fahrenheit, _degrees_ celsius, and elo _points._

    • @mackmeller 19 days ago

      Hmm, never thought of it that way but sounds plausible!

  • @squadxzo 19 days ago +1

    I feel like while the S helps bingo chances, you have to cash in on the S straight away. Otherwise, once the bot blocks AIRWAVES, you get stuck in a board state with no good place to hook the S to score points.

  • @cbauermusic 19 days ago

    GG

  • @ohtani2024 19 days ago +5

    What an unfortunate loss... thought you'd have this one! Could you do the endgame Quackle analysis?

  • @thatorange08 19 days ago

    Hi Mack, another unrelated question. I just discovered KAKA is a valid word. Do you think you've ever played this (or seen this) in an actual game?
    It's kind of like ZZZ, a fairly useless word that can only happen with the help of blanks.

    • @mackmeller 19 days ago +2

      Hmm... can't say definitively but I don't remember ever playing it. I've definitely bingoed with 2 K's before (ROCKWORKS as a disconnected 9, most notably), but that's a lot more likely to be worth a blank than a play like KAKA

    • @Xadreco 18 days ago

      @mackmeller What about the less commonly played 3s, like SAP, which burns an S and doesn't contain a 2-letter word (or things like UDS# in Collins)?

  • @almightyhydra 19 days ago +1

    Classic BestBot: bingo into bingo into draw the blank, dump a few 1-pointers, then bingo for the third time in 4 turns.
    Also, are we absolutely sure it doesn't know the opponent's rack? It played MY instantly despite 2 in the bag (does it think on the opponent's turn?)

    • @axcertypo 19 days ago +3

      I would have muted you by now if I were Mack. Anyone with any understanding of computer science can be shown the code Woogles uses for BestBot and conclude that any cheating accusations are as thick as horse manure. It's open-source. It's right there for you to examine. They've debunked these conspiracies every step of the way.
      The reason it played MY so quickly is because it generates all of its plays and sees it cannot block FULLEST/AIRWAVES, (N)UTSHELL N5, HUSK O12, or FLUSH/FLESH O11 without scoring too little or creating new threats. Since all but one of the threats to BestBot losing are confined to the bottom right, it narrows down plays to consider to only the ones that cover some of those threats. There is literally no other play that does anywhere near the work MY does. The unseen letters are EFHKLLSTU. There are only 35 situations to look at, and it can narrow down options with lightning speed due to the hard work volunteers are putting into developing the best Scrabble engine the world has ever seen, and despite the efforts of the stubborn few that look for nonexistent lint within a pile of gold.
      As somebody who believes very strongly in the human mind's ability to match the superhuman calculating ability of Scrabble engines current and future, you and all who continue to falsely accuse this engine of cheating are doing a disservice to the very real potential of improving our understanding of Scrabble and trying to match up to the greatest tech our world has ever seen.
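To give a sense of how small that search space is, here is a minimal sketch that counts the ways the two tiles left in the bag could be chosen from the nine unseen letters EFHKLLSTU quoted above. The variable names are illustrative, and the count says nothing about how BestBot itself enumerates positions.

```python
from itertools import combinations

UNSEEN_TO_BOT = "EFHKLLSTU"  # unseen letters quoted in the comment
BAG_SIZE = 2                 # tiles left in the bag after YO

# Every way to pick which two physical tiles are in the bag
# (the two L's are treated as separate tiles here).
by_tile = list(combinations(UNSEEN_TO_BOT, BAG_SIZE))

# The same splits with the two L's treated as interchangeable.
distinct = {"".join(sorted(pair)) for pair in by_tile}

print(len(by_tile), len(distinct))  # 36 29
```

A plain count gives 36 tile pairs, or 29 distinct letter pairs once the two L's are treated as interchangeable, which is in line with the figure of roughly 35 quoted above.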

    • @Splax77 19 days ago +5

      FULLEST is the only possible bingo out of the remaining tiles from BestBot's POV. It makes perfect sense for the bot to block when a bingo is the only possible threat from Mack.

    • @SAYFVCKFOR100DOLLARS 19 days ago

      After playing a bunch of bo100s with Mack against that so-called "BestBot" and looking at the games, it appears to me that BestBot is cheating. Also I've come across two separate games with the same 9 letter word T(R)OMI(N)OES played through the same two letters. Like, seriously? Is this drawing algorithm rigged or what? BestBot must be pulling off some sneaky shenanigans, drawing multiple racks from the bag and shamelessly picking the best tiles for itself. Meanwhile, poor Mack gets stuck with the absolute worst tiles, handpicked by the devious BestBot. It's a tile conspiracy, I tell ya!

    • @Matuiss4 19 days ago +3

      MY is a very obvious block tho.

    • @AlexDings 19 days ago +5

      Yes we are absolutely sure. The code is open source

  • @iwersonsch5131 19 days ago

    BA(N)ACH*: (adj) normed and complete