The biggest takeaway is that ChatGPT doesn't refer to itself in its answers. Some of the people used "I" when they answered.
Didn’t notice that!
I also saw ChatGPT using "I".
Honestly for me it was pretty obvious and that gave it away a lot
Yeah AI don’t refer to themselves very often… in formal writing to drive a point home you don’t use the word “I.” I picked up on that too, lol.
Remember : ChatGPT will never think outside of the box. ChatGPT is the box.
Edit : In the AI response for Forcefield, ChatGPT talks about it not being ''essential for most decks within its colors''. Forcefield is colorless, so only an AI would treat it the same way as Black, Blue, Green, Red, or White, as if colorless were a sixth option for MTG deckbuilding. Furthermore, we can deduce that ChatGPT didn't understand a ''colorless card'' as a card devoid of colors, but as a sixth color that, under the MTG rules, can be blended into decks of any color type, which is why it speaks of ''its colors'' in the plural.
ChatGPT will use "creative" language to fill up space and will always expand on surface-level issues while only brushing over more nuanced details that affect the broader game. Also, it's trained mostly on business emails, pamphlets, and guidebooks, so it has an inherently sanitized vibe to its responses (unless asked to use a different tone).
without fail, chat gpt would repeat itself "demonic tutor is an incredibly powerful card allowing you to search your library for any card"... "its ability to fetch any card greatly increases consistency..." redundant information each time.
16:09 that's why you either use the API or always restart a new conversation per case
Yea I didn’t know it was gonna do that. Made for a funny bit though
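For anyone curious, here's a minimal sketch of that "fresh conversation per case" idea, assuming the official OpenAI Python client; the model name and prompt wording are placeholders, not whatever the video actually used:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

cards = ["Demonic Tutor", "Mechanized Production", "Forcefield"]

for card in cards:
    # A brand-new message list per card means no shared conversation history,
    # so earlier answers can't bleed into later ones.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You grade MTG commander cards from S to D."},
            {"role": "user", "content": f"Grade {card} and briefly explain why."},
        ],
    )
    print(card, "->", response.choices[0].message.content)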
The IKEA giveaway was that it started its (surprisingly poetic) metaphor along the lines of 'it's a doomed project', and then twisted to 'and when it works it's neat'. That sudden context switch was suspicious.
Speaking of context, ChatGPT will generally provide too much of it explicitly, whereas human answers use it implicitly and with less regard for whether you have it. From needless clarifications to tying back to the assignment's phrasing, that was a pattern for ChatGPT, feeling like a grade-school essay: answering as it expected you wanted it to answer. By contrast, the humans would often use subtler slang or context cues.
Woah, Demonic Tutor tripped me up. I'm surprised someone gave that a B.
That threw me for a loop as well!
The AI says "decks within it's colors" or some variation of that A LOT, kind of a giveaway
The Assassin's Trophy one was interesting, because it makes you consider who would be more likely to make that mistake in their writeup. It's also inconsistent, saying "any nonland permanent" in one line and "any permanent" in a later one.
For me the Mechanized Production one came down to "which author is likely to go off on a non-Magic tangent?" Also, the final sentence of the right-hand text seems to contradict the entire message preceding it, depending on how you define something "being a riot."
Yea, those are the AI hallucinations, which cause it to be wrong
Bros rating Mechanized Production as a D forget that treasures exist. It all goes back to Smothering Tithe.
Smothering tithe really was the hero of this story
I really need to hear more on the patron’s thoughts on giving demonic tutor a b
I made it anonymous so unless they tell me, I also won’t know more
I would have given it an A, but yeah, it isn't an S imo
Dang, I actually got the Mechanized Production one wrong as well. What threw me off was the mention of wasting 2-3 slots and getting "the combo", since there was no prior mention of what other slots are wasted for what combo, and these inconsistencies are a big problem with AI.
Should have focused more on the same problem in the other text, the card being able to be "a riot" contradicting the D rating.
Yea it was such a weird response for that card
"ChatGPT doesn't care about budget" You're telling me. I asked some recommendations to my food token life drain deck.
It recommended such affordable cards like anointed precession (60€) doubling season (36-40ish €) teferi's protection (48€) parallel lives (33€) exquisite blood (23€) and many, many more cards well over my budget.
I think the cheapest card it recommended was beast whisperer that I already HAVE IN MY DECK.
Commenting before watching: I predict the LLM will have good syntax in its responses but will fail some of the semantics. Likewise, I expect its fake "reasoning" to be heavily biased towards generalities and other common responses. I predict patrons of MtG will understand the semantics. I also expect those patrons to be capable of novel reasoning, but likely to give general answers (common responses are common for a reason). As for the rankings, I would expect the LLM to have a higher mode in its answers and the Patrons' answers to have a broader spread.
Novel reasoning ends up being a huge giveaway! I think you are spot on
While watching (my guesses of the identities and tracking the rating scores). My guess for the AI in brackets. Actual AI in parentheses.
1 [(A)] or C. Initially guessed based on accuracy. Doubled down based on generic AI answer vs novel Patron answer.
2 [(C)] or C. Again, novel responses help guess the Patron.
3 [(A)] or S. One answer repeated itself in a redundantly redundant explanation.
Huh, the AI downgraded it to nonland permanent. Was that due to generalizing answers or due to not understanding the semantics of the card? Both are factors but I wonder which had a bigger cause.
4 D or [(C)]. Initially guessed based on accuracy. Further confirmed by the novel Patron answer (silver bullet draft design). Even further confirmed by the LLM having no context for the lack of horsemanship.
5 [(S)] or B. Initially guessed based on accuracy (unless the patron is trolling, or arguing that it is too powerful to fit in many commander decks without moving the deck away from the desired power level). Wow, the reasoning is making me reconsider. The S ranking said "any deck within it's (Demonic Tutor's) colors (plural)". Why the implication of plural? There is also more redundancy in the S's reasoning. I am changing my mind.
6 [(A)] or B. The LLM likes listing literally the same logic repeatedly. The Patron response was more novel.
7 D or [(D)]. This one is tough. The left was more novel.
Wow. I expected something like 55/45 odds there. Let's Go!
8 [(S)] or B. I initially guessed based on accuracy, but the B has the novel response, so it must be the Patron. LLM wouldn't do that. And once again the LLM uses "decks within it's colors" when talking about a mono white card. Why the plural? Also the card needs to fit within the deck's colors not the deck fit within the card's colors.
9 [(A)] or A. "Decks focused on defending against large attacks"? Also the Patron is once again the more novel answer.
10 S or [(S)]. Redundant LLM response is redundant.
After the 10 scores:
Patron scores: SSABBBCCDD (5 different ranks. Somewhat biased towards B but really spread out otherwise)
LLM's scores: SSSAAAACCD (4 different ranks. High bias towards S or A)
Since my 10/10 accuracy was based on my reasoning of the LLM's limitations, I think it is soft evidence that my predictions about its limitations might be accurate.
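(If you want to double-check that spread claim, a quick Python tally works; the two grade strings are just the ones listed above:)

from collections import Counter

patron = Counter("SSABBBCCDD")  # 5 distinct ranks, mode B with 3
llm = Counter("SSSAAAACCD")     # 4 distinct ranks, mode A with 4
print(len(patron), patron.most_common(1))  # 5 [('B', 3)]
print(len(llm), llm.most_common(1))        # 4 [('A', 4)]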
Bonus Round? 1. [(B)] or C. The C had a novel response.
Final thoughts: We already know ChatGPT does not try to evaluate cards, so it is not suited to evaluating cards. (Don't use a saw for a hammer's job.) Beyond its lack of motivation to judge cards, it does not understand the cards or their context well enough to judge them. Additionally, we see its general answers as a clear marker of the LLM answer. It is trained to give a "reply-like" response that was a likely reply rather than a reply that was likely to be correct. Specificity and nuance are things it is trained to avoid.
Your patron's evaluation seems within the norm for commander players. They can mostly evaluate cards, and there is some subjectivity that makes the "surprising" evaluations still have merit.
Lord of Extinction: I was right from the grade alone
Lightning Bolt: They had the same grade, so I guessed correct based on the description
Assassin's Trophy: I wasn't sure on the grade, because I don't use it in 5C decks typically, but the description was obvious to me.
Taoist Mystic: Obvious from the grading.
Demonic Tutor: I honestly don't love using tutors that much, but I was wrong on this one. I think it's an A, right in the middle.
Panharmonicon: I'm correct...Chat GPT is just stupid at this point. Lol
Mechanized Production: Same grade, so had to guess based on description. Both descriptions were wild... But I was right. I think it's a C. It's fun and can win on the spot, especially now that we have Obeka, but even with extra turns.
Smothering Tithe: I needed the description on this one, but got it right. I still think it deserves a better grade than the human gave it.
Inkshield: I lost to this card. It's great. You will likely win if everyone else has been eliminated. I was right from the description.
Tropical Island: The description helped. Right again.
Forcefield: I was right and that's an interesting card. Of course it's on the reserved list.
Chat GPT is fairly easy to sus out. But it's still interesting to see.
I noticed that ChatGPT has a habit of reiterating the prompt. It always talks like it's checking off items on a checklist.
I feel like that’s how a lot of AI work looks.
I swear they didn't use to, and now they always do it, in any context. It's very frustrating.
it's got the vibe of trying to fill space in a high school writing assignment you reeeeally don't want to do
I recently built a deck with ChatGPT as well; the cards were so random I had to put it into a power level calculator because I didn't understand the deck myself, and it put out a 10 for some reason. ChatGPT always tried to put Rhystic Study in the deck xD
(It was green black)
That’s funny maybe I’ll need to try that too
Oooohh, surprise Snail participation :D
I agree with the Demonic Tutor rating, but not because it's a waste of space; because it is a tutor, it makes the deck too consistent, so the deck does the same thing every time, which makes it a less fun deck to play against.
Valid point, especially in casual settings and for decks with a very clear and not super varied gameplan.
For example, my Kathril deck only really wants to fill the graveyard with keywords, and I took Entomb out of it because I would always tutor up Zetalpa, which made the deck play very monotonously (amplified by how terrible the precon is at filling its graveyard, so I took a lot of mulligans, though Entomb of course was always keepable). Now that I'm replacing a ton of cards soon, I think I might also cut Vile Entomber and Buried Alive, and rely exclusively on what I happen to mill or sacrifice.
Yeahhh very not hard to guess. AI ain't killin us yet
tbh instead of running Demonic Tutor forever, you should run it until you play it, then whatever you search for, just add another version of that effect: if you search for a board wipe, put another board wipe in the deck. simple really
I do like this idea, though the flexibility of a tutor I think makes it worth it!
What he said. The flexibility to get, let's say, a board wipe OR a single-target removal spell because you already have a big board presence makes it way better imo
@Jacob-km4yb . . . I know
10:40 interesting option
I enjoy your content.
I’m glad! I know this one is a bit different so I’m happy you like it
rad, got recommended your work early
God damn, Mechanized Production just gettin' smoked out of nowhere here! It's gimmicky, sure, but as far as artifact payoffs are concerned, you can do way, way worse. Both the AI and the person remark on the resources you're pouring into trying to resolve it, when it really doesn't ask for much of anything outside of its 4-mana investment and works regardless of whether or not you're going for the wincon.

As a wincon, Mechanized asks "Can your deck make Treasures, Clues, or Foods?" If yes, then congratulations, your deck can run Mechanized Production for basically free. As just a random artifact-goodstuff card, it still makes you free copies of good cards. Put it on a Solemn Simulacrum and you're getting ramp every turn that you can churn into card draw, and that's low-balling it. Imagine Sol Ring or Jeweled Lotus every turn.

I don't mean to overhype Mechanized Production or shit on the guy that gave the evaluation. I'm just here to defend my pet card. It's fitting that it was listed just after Panharmonicon, because I think both kinda occupy a similar space of "4-mana do-nothing that usually gets blasted before it gets to do anything," and as kindred spirits, both would've gotten a C from me as fine cards that are probably not gonna get to go off, but are worth it in casuals for the off-chance they do get off the ground.
What version of chatgpt did yall use? 3.5 is terrible, 4 is great but behind a paywall. If yall used 4, did u put any additional reference info in?
Next time ask ChatGPT to write like a normal person or dumb it down, and feed it other people's reviews so it writes in a similar style.
If both answers were then (re-)summarized by ChatGPT, it may have removed the obvious bias that is inherent in the verbiage and language ChatGPT uses. Clear prompt reiteration from ChatGPT and "I built a deck..." phrasing from the humans made this not much of a game.
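A rough sketch of what that re-summarizing pass could look like, again assuming the OpenAI Python client; the rewrite prompt wording is my own, not something from the video:

from openai import OpenAI

client = OpenAI()

def normalize(writeup: str) -> str:
    # Rewrite either author's text in one neutral voice so stylistic tells
    # (prompt reiteration, "I built a deck..." openers) stop giving it away.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Rewrite this card evaluation in a neutral, third-person voice. Keep the grade and every claim; change only the style."},
            {"role": "user", "content": writeup},
        ],
    )
    return response.choices[0].message.content

# Run both the human writeup and the ChatGPT writeup through the same pass
# before showing them to the guesser.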
I love you
to be fair to chat gpt
it never played commander; freaking out over Lord of Extinction is a classic noob mistake
"highly desirable" "in its colors" "extremely valuable" "versatility" "particularly those in XYZ strategies" are dead giveaways.
Also Ai do be yappin' with way too eloquent words all the time
Personally, I think Demonic Tutor is a B or even C in most casual metas. If I were to pull out a Demonic Tutor, I'd probably get focused on because my playgroup doesn't run $50 cards unless we're proxying high power or cEDH. In a lot of games, Demonic Tutor is just too focused/good to be worth slotting in, since it gets people to target you.
Interesting I often think of it as a charm effect. I don’t know if I’ve ever been explicitly targeted because of it
What's funny is that ChatGPT learned to talk about MtG from real people's chats. So if you're gonna blame anyone, blame Reddit :)
boooooo. AI is dumb, and you shouldn't be feeding it more data.
Me making articles or videos feeds it data. Not me asking questions. Just asking questions isn’t really training it
I would not be surprised if the questions are saved as more raw data to feed it later.