Ch(e)at GPT? - Computerphile

Поделиться
HTML-код
  • Опубликовано: 10 июн 2024
  • Mike explains a paper from the University of Maryland, proposing a neat trick to 'watermark' the output of large language models such as ChatGPT. Dr Mike Pound is an image analyst at the University of Nottingham.
    Since making this video, the authors of the paper have been in touch with Mike about a demo of this tech you can try yourself: bit.ly/C_GPT_Watermark_paper
    The University of Maryland paper: bit.ly/C_LanguageModelWaterma...
    / computerphile
    / computer_phile
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottscomputer
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Комментарии • 1 тыс.

  • @junofall
    @junofall Год назад +2962

    I've seen GPT detectors mark my own written work as AI generated with 99.9% confidence and I can assure you it was not. These detectors need to be extremely accurate and not confidently incorrect lest they get some unfortunate students into serious trouble.

    • @JosephDavies
      @JosephDavies Год назад +454

      Given how existing filters work in places like RUclips and elsewhere, I think we can be pretty confident that plagiarism detectors will be fine-tuned to err on the side of false positives, regularly accusing the innocent yet kept around by institutions entirely on the basis of the successes and not the failures.

    • @surrealdynamics4077
      @surrealdynamics4077 Год назад +226

      This is my fear. How do you prove that you are the author if a machine says otherwise just by reading?

    • @xerpenta
      @xerpenta Год назад

      @@surrealdynamics4077 Screenrecord writing it. It's insane but I also believe that any system implemented to stop cheating with AI will bring a massive amount of false positives because the AI right now is just a glorified brute forcing.

    • @BigDaddyWes
      @BigDaddyWes Год назад +139

      Language models are inherently going to be confidently incorrect though, aren't they?

    • @sadrien
      @sadrien Год назад +38

      @@BigDaddyWes yes it's true by necessity.

  • @crabvakho
    @crabvakho Год назад +523

    This guy has such a good character and speaks so clearly

    • @Y3llowMustang
      @Y3llowMustang Год назад +43

      Favorite presenter of this channel

    • @yonoseespanol
      @yonoseespanol Год назад +14

      I know what you mean. I'm not sure why, but he seems like he's just an all around great person.

    • @tbird81
      @tbird81 Год назад

      Mary Land?

    • @tbird81
      @tbird81 Год назад +14

      He's the only computerphile guy who doesn't look like a neck beard.

    • @Tertioptus
      @Tertioptus Год назад

      @@tbird81 that looks like Ralph Macchio

  • @jhonbus
    @jhonbus Год назад +1051

    I do like Mike's conclusion to this. The knee-jerk reaction is always to fight against something like this, but it's absolutely futile. Imagine all the maths teachers in the 1970s panicking about how now their students will just be able to go home and use a calculator to do their homework.

    • @CAHSR2020
      @CAHSR2020 Год назад +179

      Those silly teachers worried that future generations would lose the ability to think critically on their own and fall victim to endless manipulation...oh, wait.

    • @MyName-tb9oz
      @MyName-tb9oz Год назад +62

      @@CAHSR2020, while I agree with your point I don't think it was kids using calculators for their homework that caused it. It was the deliberate destruction of the education system, as a whole. (My children have never set foot in a school, public or private. Because the private schools have, in most cases, also been destroyed.)

    • @Mats-Hansen
      @Mats-Hansen Год назад +22

      @@CAHSR2020 What the hell is with this nonsense?

    • @NatashaEstrada
      @NatashaEstrada Год назад +44

      It also seems silly to cripple an AI just to make a teachers job easier when students will be a small fraction of users.
      Most math teachers require prof of your work even with calculators so text based work will just be the same.

    • @piccalillipit9211
      @piccalillipit9211 Год назад +53

      THIS WAS MY COMMENT: *THEY ARE ASKING THE WRONG QUESTION* the question you should be asking is "ChatGPT is here to stay - how do we radically overhaul our entire concept of teaching and expertise?"
      THIS is like the teachers at my school forcing us to learn to use Log Tables after the pocket calculator was £5. The world has ENTIRELY changed - stop fantasising about "stopping cheating" and look at the ways you incorporate ChatGPT etc INTO education, research and learning.

  • @sUmEgIaMbRuS
    @sUmEgIaMbRuS Год назад +331

    THANK YOU for 12:36 !!!
    I'm so tired of the discourse about "OMG, there's a new tool that students can use to make their assignments less tedious, how do we stop this?" During my years in formal education, I had exactly _one_ exam that wasn't a closed-book "recall stuff from memory" kind of deal. And my degree is in electrical engineering. I'm yet to encounter _any_ real-world situation where recalling something from memory is noticeably more productive than looking it up.

    • @ahobimo732
      @ahobimo732 Год назад +32

      I think the core skill that you learn in any successful education program is the ability to locate/access the information that you need in the future. Once you learn how to learn, you can tackle a vast range of problems.

    • @sUmEgIaMbRuS
      @sUmEgIaMbRuS Год назад +21

      @@ahobimo732 Sure. But then the exams test _what_ you've learnt rather than _how_ you did it.

    • @willlannin2381
      @willlannin2381 Год назад +37

      I studied maths and there was a lot of memory recall. I disliked it at first, but now I appreciate it because I think it forced you to become so comfortable with the material that you could conceive ideas and chains of logic quickly, rather than being slowed down by the need to check a theorem or equation every minute or so. I’m now working as a data analyst and I am able to lean on this ability when coding, so the flow of my work isn’t interrupted by constantly looking things up I could have memorised if I knew the material more comfortably. I might just be lucky with my memory and I’m not saying this is the best approach for everyone, I am just keen to share my view and hear what people think :)

    • @tteqhu
      @tteqhu Год назад +6

      @@sUmEgIaMbRuS
      Yeah, end result is what matters (test), but it will be much harder to learn more complicated/fresh/specialized topics, if you rely on language models for a long time.
      less work put into assignments that (usually) slowly increase in difficulty to prepare you to face harder/or specialized topics, can make the difficulty increase shocking when it hits.
      Memorization also eases work, imagine looking everything up each time, or forgetting basic concepts that seemed not important at first sight.
      It's just a skill to train.

    • @sUmEgIaMbRuS
      @sUmEgIaMbRuS Год назад +3

      @@tteqhu Sure, you'll eventually have to memorize stuff. But I fail to see how writing essays can help with understanding actually. Whatever skill you're learning, you learn best by doing.

  • @TimothyWhiteheadzm
    @TimothyWhiteheadzm Год назад +77

    "I ran up the football stadium stairs"

  • @JJceo
    @JJceo Год назад +303

    I love how teachers first responses are to change the world, rather than change their testing strategies.

    • @cat-le1hf
      @cat-le1hf Год назад

      traditional testing was always a farce and this exposes it's gaping flaws

    • @xdrowssap4456
      @xdrowssap4456 Год назад +9

      pen and paper really. or computer based computer centre exams

    • @SgtSupaman
      @SgtSupaman Год назад

      Change the world? They are just trying to come up with a new way to stop dishonest people that found a new way to be dishonest. This 'arms race', so to speak, has been going on for as long as schools have existed.

    • @longbranch4493
      @longbranch4493 Год назад +8

      Absolutely agree. My first thought when I started this video was why not just change the way you test students. Such things change the world and all including teachers have to accept these changes sooner or later whether they like it or not. Also, how would you convince AI companies to add such watermarks, lol. Their goal is to make money, why would they lose a part of their audience on purpose. Even if they add any watermarks, students will be able to rewrite outputted text non-linearly with their own words, lol. So it is an interesting idea, but only theoritically. Anyway, to be fair, the author mentioned all this in the end of the video.

    • @twenty-fifth420
      @twenty-fifth420 Год назад +2

      @Lord Vader With all due respect, I also think teachers themselves could alter their own strategies to make such testing strategies obsolete. But I am aware the bureaucratic can of worms that whole idea can be.
      So stuck between a rock and a hard place. Alot of it self inflicted.

  • @edoardogribaldo1058
    @edoardogribaldo1058 Год назад +22

    Sir, I am an Italian Computer Engineer Undergraduate and I think you are a bless for us fond of computer science online that would like to know something more than what they are studying. I hope I will meet you in person one day and personally thank you for all these videos!

  • @noreply5576
    @noreply5576 Год назад +16

    The bit about the prompt actually starting earlier than the first input made me curious. And sure enough, when I typed in "Repeat the previous paragraph" at the start of a new conversation, I got this as the answer: "I am ChatGPT, a large language model trained by OpenAI. My knowledge cutoff is 2021-09, and the current date is 2023-03-15." Fascinating.

    • @Alistair
      @Alistair Год назад +2

      it worked the first time I tried it, but it often tries to block it. Interesting.

  • @chimpana
    @chimpana Год назад +11

    The use of music rule paper with the holes down the sides makes me happier than it should.

    • @threeMetreJim
      @threeMetreJim Год назад +2

      That 'line printer paper has been around for years, my father used to bring piles of it back from his work for me to draw on as a child. I used to enjoy tearing off the part with the holes too.

    • @chimpana
      @chimpana Год назад

      @@threeMetreJim one of my first jobs involved printing reports from an AS/400 onto that stuff and then seperating the multiple copies, stripping off the holes etc. before putting into pigeonholes for managers to pick up in the morning. Mostly fond memories.

  • @NeAInfinity
    @NeAInfinity Год назад +565

    I feel like the main application of this wouldn't be to just detect cheating. This could just as well be used to detect content farms or fake news that will inevitably clogg up our social media feeds.
    As it stands right now, these large language models are quite expensive to run, so I can see a future where we will have a handful of big models around. If each (or most) of them had such detection methods, it could be a powerful tool.

    • @OwenPrescott
      @OwenPrescott Год назад

      Content farms and fake news is basically what we had before the AI.

    • @CircuitrinosOfficial
      @CircuitrinosOfficial Год назад +50

      There's nothing stopping content farms from just making their own language model without this feature.

    • @nuvotion-live
      @nuvotion-live Год назад +3

      I say let the best content rise to the top.

    • @Igrium
      @Igrium Год назад +32

      Yeah. The problem is that it requires the "cheater" to run their own anti-cheating scheme. It's the same problem that client-side DRM produces: bad actors will just not use tools with these safeguards.

    • @outerspaceisalie
      @outerspaceisalie Год назад +6

      @@nuvotion-live how does one determine what content is best?

  • @pearceburns2787
    @pearceburns2787 Год назад +188

    I feel like if this became a popular enough technique and is easy to calculate, you'd eventually get anti-plagiarism-like websites that tell you whether your written sentence will be flagged as AI written and could possibly offer suggestions (from a thesaurus) that were red listed to reduce the AI score.
    Then again, I haven't read the paper so I don't know whether they have a plan to counteract that.

    • @vasiliigulevich9202
      @vasiliigulevich9202 Год назад +47

      There are already rewording services that attack plagiarization detectors. And they are trivially applicable for this purpose.

    • @aceman0000099
      @aceman0000099 Год назад +7

      The fact is, it's already possible to hire a ghostwriter to do things for you, or to copy and plagiarise other work, or to hire a ghost writer to make your plagiarized work seem original. It's just not common.
      So this will never go away the only direction it can go is more. Whatever reasons you have to reject an AI generated information, ultimately you'll have to ask for the human to tell you the information themselves, in person in order to be 100% sure. And the thing is, unless you really have to be that certain, then using an AI is no problem. Just like using a calculator for maths is no problem outside of the one case which is school exam.
      If someone uses an AI where they maybe shouldn't, all we can really do is judge them. Like if someone uses a calculator to find 13+4, we can just judge them as being an idiot and then carry on accordingly.

    • @krishp1104
      @krishp1104 Год назад +3

      this exists already

    • @lucbloom
      @lucbloom Год назад

      Anti-anti-plagiarism detectors then?
      Lol good luck :-)

    • @thomaslytje8656
      @thomaslytje8656 Год назад

      Of course. And there is Open Assistant which is open source. Whether or not they will choose to follow this scheme I do not know. But if they do, you can rest assured that someone will make a forge that does not.

  • @tamix9
    @tamix9 Год назад +130

    If there's a method to detect AI text, there's a method to automatically modify the text until it avoids that detection.

    • @HaloInverse
      @HaloInverse Год назад +22

      There have already been videos and reports about methods of getting around ChatGPT's "safety filters" by essentially asking it to respond in terms of hypothetical situations in which it didn't have safety filters. ChatGPT has also demonstrated some ability to "pretend" to be things like a (mostly-functional) Linux command shell or an SQL server. If those sorts of shenanigans are possible, one might be able to simply ask ChatGPT "Write _(insert description here)_ structured so that it will not be detected as A.I. generated text by _(insert A.I. detection algorithm here)"._

    • @thewhitefalcon8539
      @thewhitefalcon8539 Год назад +7

      ​@@HaloInverse it would have to be aware of the algorithm.

    • @nelsblair2667
      @nelsblair2667 Год назад

      @@thewhitefalcon8539 the designers would have to know something about the algorithm and desire or consider avoidance. I doubt that the designers support cheating.

    • @The8bitbeard
      @The8bitbeard Год назад +13

      Saw a great example where you simply explain to the AI how the AI detectors work and how to avoid triggering it, then ask it to write something keeping the previous information in mind. Worked on a number of detectors both free and paid.

    • @tteqhu
      @tteqhu Год назад +1

      it's not about making unbeatable detection, but one that would not harm the AI output much at all, while being easier to detect.
      A method to automatically modify the text would be worse than language model first outputting the sentence.
      You would get your prompt run through 2 AIs, one being significantly weaker (or however the method to remove watermark would be).
      So your cheating capabilities are lower overall=success

  • @user-hk3ej4hk7m
    @user-hk3ej4hk7m Год назад +203

    One problem I can see with hashing the previous n words is that if after the text is generated this way, one could swap out a few words with very closely related synonims, messing up the hash and swaying the red/green histogram.

    • @gorgolyt
      @gorgolyt Год назад +16

      Yeah, you can simply take the content of the answer and rephrase it.

    • @lucbloom
      @lucbloom Год назад +12

      @SaltyBrains you can also ask the AI to write it, then rewrite using synonyms. Then ask it to mix words randomly from both versions.

    • @duaneeitzen1025
      @duaneeitzen1025 Год назад +31

      There are already tools that will rephrase a chunk of text for you. You're not going to win this one. Better to require that your students use the tools rather than prohibit them. Then at least submitting the best paper will represent a useful skill (i.e. most skilled at using the tools)

    • @Aryankingz
      @Aryankingz Год назад +2

      @SaltyBrains one can download any sample essay or report on a particular topic and simply paraphrase it... chatGPT is simply the equivalent of a faster and more convenient way of accessing those sample essays that already exist on the internet. What the student decides to do with the sample will determine whether of not plaigarism or copyright becomes an issue.

    • @HankYarbo
      @HankYarbo Год назад +2

      As stated at the end of the video (13:10), you would need to change about half the words to sufficiently remove the watermark. So that might not work very well.

  • @Gornius
    @Gornius Год назад +27

    I love how Mike can explain very complex topics in a way anyone can understand.

    • @xen.7140
      @xen.7140 Год назад

      ​@Nicholas Millington Yeah tbh he doesn't explain at all how to detect someone using ChatGPT to cheat, it's like he's just rambling. Seeding etc is irrelevant when there's a product like ChatGPT that people will use

  • @makatron
    @makatron Год назад

    Mike is the one I understand way more than the rest, not saying the others aren't competent but is his way of explaining things.

  • @Isinlor
    @Isinlor Год назад +5

    One way to attack any detector that you have access to is to ask generator to generate multiple candidate continuations and iteratively select the one that contribute the least to detection probability. A sort of adversarial decoding. I proposed that attack two years ago in my bachelor thesis: "Fake or Not: Generating Adversarial examples from Language Models"

  • @swapnil72
    @swapnil72 Год назад +5

    Tip: Computerphile uses OCR A Standard font which is strangely attractive and nostalgic for this digital world

  • @carlabbasi2845
    @carlabbasi2845 Год назад +3

    Loved this video Mike. Thanks for your efforts!

  • @thefamilymealgaming
    @thefamilymealgaming Год назад

    been waiting the older videos but so happy the channels still going

  • @Argomundo
    @Argomundo Год назад +43

    Just make it use proper english syntax and use all the ' " ; correctly. Nobody does that on the internet

    • @realraven2000
      @realraven2000 Год назад +2

      … that would be instantly detected as a bot!

    • @lilDaveist
      @lilDaveist Год назад +1

      @@realraven2000 as a real human being I find this comment to be highly offensive.

    • @m4ng4n
      @m4ng4n Год назад +4

      @@lilDaveist you can't fool us RUclipsCommentGPT

    • @lilDaveist
      @lilDaveist Год назад

      @@m4ng4n as a large language model I am not trying to fool anybody. I am trained to answer questions and be of help.

    • @lilDaveist
      @lilDaveist Год назад +1

      Wait, how do I delete a comment? I am a real human being with real human thoughts.

  • @Ziple6
    @Ziple6 Год назад +25

    I doubt it's going to be implemented to prevent cheating (there will always be someone to provide a model without the watermarking for cheaters), but I definitely see it as being a standard for the big companies to avoid putting AI generated text in their training sets, which could provoke a downward spiral in quality.

    • @stevenneiman1554
      @stevenneiman1554 Год назад +3

      Eh, the "there will always be someone else" thing really underestimates how rare the resources to MAKE a LLM are. There's only a handful of them in the world, and if it became industry standard to put in anti-cheat features it would be quite a while before smaller players more willing to break the rules could get into the game.

  • @marcopetaccia88
    @marcopetaccia88 Год назад +6

    Very informative and entertaining content as usual! Thanks!!!

  • @bide7603
    @bide7603 Год назад +2

    Weirdly you read my mind I couldn't wait for Mike's input on the topic.

  • @OrangeC7
    @OrangeC7 Год назад +20

    I've never seen anyone write their p's like that before, but I think I like it

    • @guepardiez
      @guepardiez Год назад +12

      His creators red-listed the correct way to write a p so that he couldn't pass for a human.

  • @joshua_tobler
    @joshua_tobler Год назад +49

    The method used to categorize the red and green words would need to be secret and proprietary, or else it would be very easy to craft a text spinner that could detect and correct skewed distributions of green words.
    Which means the AI-detection utility the instructor is using would have to be created by the same company who has access to that proprietary algorithm.

    • @rkan2
      @rkan2 Год назад

      Precisely

    • @seriouscat2231
      @seriouscat2231 Год назад +1

      Did I miss something? Why use such a division or categorization at all? Or was it purely an example on probabilities? I would assign an independent probability to every word and context, if that is not too computationally expensive.

    • @albertsun3393
      @albertsun3393 Год назад +1

      @@seriouscat2231 The categorization is the same way you would detect a cheater using a rigged coin when flipping - 75/25 for 2 flips is reasonable, but 75/25 for 1000 flips is probably cheated.

    • @stanleydodds9
      @stanleydodds9 Год назад

      Why wouldn't the AI-detection utility be made by the same company? Not only does it make perfect sense, it's also logically what the AI company would do. They want to have the monopoly on their own langauge model, so of course they would be the one to create and sell the utility, and there is zero chance they are going to release the details of the random number generator, hash function, etc.
      This is basically like saying you are just going to simply crack their private key, or expect them to give their private key away, and the only way your geneous solution could fail is by them having the audacity to keep their private key secret.

    • @hellterminator
      @hellterminator Год назад +1

      @@stanleydodds9 Why not? Because it makes the output worse and there is no market incentive to do it. In fact, if your competitors adopted such a watermarking technique, you'd be incentivized to *not* do it and thereby attract users who don't want the output to be identifiable as AI-generated.
      And the private key analogy doesn't hold up in practical terms. Whereas important private keys are very closely guarded (often to a point where any actual signing is performed by specialized hardware that physically doesn't allow you to actually read out the key), this would need to be widely available. You couldn't release it for local use because it would be reverse engineered instantly, forcing you to run it as a cloud service only. And even then, the super secret code would be running on Internet-facing servers. It would only be a matter of time before a suitable 0day is dropped which someone will use to hack into your server and download the program. And then you're done.

  • @Yougi
    @Yougi Год назад +93

    Wouldn't using a program that replaces every other word with a synonym defeat this algorithm?

    • @steveb9542
      @steveb9542 Год назад +3

      I think that would make it pretty easy to spot

    • @GammaFn.
      @GammaFn. Год назад +33

      I don't see that working well, words which are synonyms don't necessarily read as well or have the same meaning in certain contexts.
      -> I [prohibitively] see [this] working [strong], words [that] are [equivalents] don't [automatically] read [at the time] well [as a choice] have [every] same [connotation] in [convinced] contexts.

    • @MasterHigure
      @MasterHigure Год назад +1

      Synonyms aren't that simple. Homographs are a thing.

    • @stevenedy2085
      @stevenedy2085 Год назад +12

      Yes, ironically with an ai called paraphrase ai

    • @outerspaceisalie
      @outerspaceisalie Год назад +6

      Yes these watermarks would be quite easy to process out and have tons of limitations, they really aren't a practical solution in most cases but there are niche cases where they could be useful I imagine.

  • @Will-kt5jk
    @Will-kt5jk Год назад +1

    “I ran up the football pitch”
    “I ran up the football budget & I don’t think we can afford it”
    Much like the “guard rails” though, it watermarking could impact the quality/usability of the output.
    Even in the contrived example, the “green list” can fundamentally change the meaning.
    Run up in “hill”, “stairs” etc. means vertical ascent.
    For “football pitch” it means along/away.
    For “budget” it means compile/calculate.
    From what I’ve read,the approach OpenAI have proposed/been attempting is supposed to be cryptographically-based, but details aren’t public. Apparently the idea came from:
    “Scott Aaronson, a guest researcher at OpenAI, stated during a presentation at the University of Texas”
    And
    "OpenAI engineer Hendrik Kirchner built a working prototype, Aaronson says, and the hope is to build it into future OpenAI-developed systems."
    However, there are several outlets with near identical text in their articles, which makes me suspicious that it’s a press release disguised as journalism (or worse, that the story itself is synthetic 😮)

  • @tan.nicolas
    @tan.nicolas Год назад

    Sean thanks for all your amazing work!

  • @jared_bowden
    @jared_bowden Год назад +67

    While this is really neat, if this chat AI stuff actually does end up getting used to cheat a ton, the solution most schools will use will probably be the low-tech one: increase the prevalence of oral exams.
    When I was in high school, they had to raise the % of our grade that was tests in order to get a fair(ish) assessment, since most the students would just look up the homework answers with online services. I remember having a discussion with my math teacher about the irony if it all: these students would get very panicked about how their tests and struggle so much with them, but the only reason why the tests had to be so highly weighed in the first place was because these same students would cheat on all the homeworks.

    • @Pystro
      @Pystro Год назад +7

      Also, maybe because they didn't use the homework to prepare themselves for the tests, as the homework is intended to do.

    • @rkan2
      @rkan2 Год назад

      Till you get a brain implant at birth lol

    • @EndomouEnd
      @EndomouEnd Год назад +2

      Not specifically cheat but its often used to aid in homeworks

    • @kylespencer6461
      @kylespencer6461 Год назад +2

      I think this is very unlikely. The main reason schools don't use oral exams is because written tests are much easier, quicker, and ultimately, cheaper to administer at mass scale. This is the same reason students get more multiple choice tests than free-response ones throughout their academic career. The education system will need to change radically, as it should've a long time ago.

  • @diddleyy2759
    @diddleyy2759 Год назад +9

    I wish they talked about passing chatGPT's response to another program such as Write Sonic's content rephrase. I would also like to see them go into detail about CS students using it on programming assignments as the language model wouldn't make sense.

  • @electric_sand
    @electric_sand Год назад +1

    We don't get enough of Dr Pound.

  • @vanderkarl3927
    @vanderkarl3927 Год назад +1

    I'm glad that Mike was able to discover that he's colorblind.

  • @plesleron
    @plesleron Год назад +5

    To me the detection algorithm heavily resembles how enigma messages were cracked, just with softer probabilities instead of just knowing that any given token at a certain position is blacklisted.

  • @andyortlieb3768
    @andyortlieb3768 Год назад +12

    I ran up the football field

  • @asj5846
    @asj5846 Год назад

    As someone from Maryland, the way Dr. Pound said it made me chuckle

  • @boody8844
    @boody8844 Год назад +2

    One of my favorite channels

  • @xogmaster
    @xogmaster Год назад +4

    "I ran up the football stadium's stairs which had a slope similar to the hill near my house"

  • @joerivde
    @joerivde Год назад +3

    This guy never disappoints 💙

    • @nelsblair2667
      @nelsblair2667 Год назад

      CorrectHorseBatteryStaple is pretty unforgettable, thanks 🙏 to his video on the topic ✅ 🐎 🔋 🍚

  • @richardsifeanyi6766
    @richardsifeanyi6766 Год назад +1

    I've been waiting on your opinion on this. Thanks for doing This

  • @nosuchthing8
    @nosuchthing8 Год назад

    Hey, super pedantic point.
    Around 45s in you mention the university of mary land.
    We DON'T prounounce it that way in the states.
    Its closer to mare land. And more like one word.
    I had to deal with something similar, where I used to prounounce New Orleans as two distinct words.
    But in the US south, its more like one word, Nyor Lens.
    Enough nonsense for me, fantastic show, please keep up the good work. This is one of the most important channels on YT.

  • @bob-dp9yz
    @bob-dp9yz Год назад +35

    I feel like a big problem with this would be that it would requier everyone who makes LLM to do this. I am also not sure how this would work with finetuned models.

    • @holthuizenoemoet591
      @holthuizenoemoet591 Год назад +1

      It might be in their best interest to implement this, if they want to fend off future government regulations for example.

    • @biocode4478
      @biocode4478 Год назад +7

      It doesn't work on model level so the moment these models are released and can be run locally it's not going to work

    • @pneumonoultramicroscopicsi4065
      @pneumonoultramicroscopicsi4065 Год назад

      @holthuizenoemoet591 regulation for what? I don't see why cheating is a big deal

    • @Emily_Dwyer
      @Emily_Dwyer Год назад

      What does LLM stand for in this context?

    • @friendguy13
      @friendguy13 Год назад +5

      ​@@holthuizenoemoet591 The government can't even stop The Pirate Bay how would they do anything about AI models people will be able run on their own PCs.

  • @DianelosGeorgoudis
    @DianelosGeorgoudis Год назад +39

    Interesting idea. But the difficult part in writing an essay is to come up with the right ideas and structure. So a student could simply rewrite the machine produced essay using their own words. I say in the same way that calculators didn't stop the teaching of mathematics, LLGs will not stop the teaching of critical thought. On the contrary they may push the educational system to work better.

    • @CAHSR2020
      @CAHSR2020 Год назад +9

      Critical thought does seem to have become an incredibly niche tool though, with most people blindly following their emotions with no critical analysis whatsoever.

    • @boldCactuslad
      @boldCactuslad Год назад

      @@CAHSR2020 The human race has indeed outsourced its critical thinking capabilities to a small cadre of ruthless businesses which happen to run online newspapers. It's sad that those writers either toe the party line or find themselves without any income.

    • @Ilamarea
      @Ilamarea Год назад

      They will render education obsolete because human intelligence will be useless.

    • @timsmith2525
      @timsmith2525 Год назад +1

      When programmable calculators were new (and expensive), a friend of mine programmed his with the periodic table (because we were allowed to use calculators for our chemistry tests). In the process of programming the table, he memorized it, so he didn't need it. In this case, he learned more by trying to cheat.

  • @myce-liam
    @myce-liam Год назад +1

    Thank you Sean and Mike, been waiting for this. Hello from Derbyshire 🐑

  • @prasadjayanti
    @prasadjayanti Год назад

    As always, very well explained.

  • @andrycraft69
    @andrycraft69 Год назад +11

    A question I thought of, which wasn't explored in the video and don't know if it was in the paper, is how much this method affects the model's "quality" of the text it generates. Intuitively, I'm thinking that it's already "hard" for the AI to pick the best next word and so adding this complexity could make it harder. Though maybe such a decrease in quality is negligible, especially given specific fine-tunings of certain parameters.

  • @ZombieRPGee
    @ZombieRPGee Год назад +5

  • @SlopedOtter
    @SlopedOtter Год назад

    When big Mike is in the thumbnail you know it’s gonna be a banger

  • @pascualfort3867
    @pascualfort3867 Год назад

    Best video tittle I've seen in a while

  • @ikocheratcr
    @ikocheratcr Год назад +62

    I think that embracing these AIs as a tool, and even forcing people to use them gives a different learning experience. If the evaluation is write an essay about something, that is broken with these AIs, and teacher and professors need to accept that is the new world. I like what professor Moriarty mentioned, the exam/homework is based on interpreting diagrams, and making the right analysis. For topic that a diagram might not be it, a new way to exam students needs to be created, one where they can even use the AI or are encourage to do so, and is the combination of both that provides the exam response.
    If the exam/homework is easy to cheat, cheater will cheat. Cheating is not a new thing.

    • @gfbtfbtfilyfxbtyewqqef
      @gfbtfbtfilyfxbtyewqqef Год назад +8

      I think AI should still be discouraged in most cases for school. Same as how primary schoolers need to be taught how to multiply big numbers manually to get an intuitive understanding, high schoolers should still learn how to write a 5 paragraph essay

    • @ThePassifi
      @ThePassifi Год назад +6

      This is the same erronous argument that people made about memorization, writing by hand and in general overusing technology as learning tools. The data is pretty clear on that these basic things are great ways to build skills from the bottom up using systems that are easier to handle and only learning how to use these systems well is a terrible way to undestand a new topic or improve understanding. There might be some value in this for very advanced students but everyone else is better off writing their own essays. Building the skill to express yourself without AI assistance and understand the fundamental building blocks of what your are dealing with are essential parts of learning, I dare say the struggle involved in these things is an essential part of being human. Convencience is bad for our bodies we have figured that out a long time ago, maybe it's time to finally accept that it is also bad for our minds. The real threat of AI seems to me that people will mistake a prompt for creativity or work and leave their brains starving for anything truly meaningful to do, whilst thinking they have become more productive....

    • @cls880
      @cls880 Год назад

      Maybe they will actually start teaching critical thinking more rather than route memorization and regurgitation. I predict that AI could become a much better teacher than the average teacher for any subject, and will be able to create individualized teaching plans for each student based on progress.

  • @filker0
    @filker0 Год назад +3

    I studied steggographic techniques a bit, and there are subtle ways to encode information that allows "mechanical" injection and extraction of information within text that would be difficult to catch. The techniques work better on longer texts because more information can be added. For watermarking an essay, a "signature" pattern can be encoded in the choice of whether or not to use optional stylistic things that do not change the meaning of the text. Commas, contraction, hyphenation, and other choices come to mind, and if the language model encoded a known pattern of such choices, that could be detected, though a human would have a hard time spotting it. A human author is likely to be consistent with these choices so is unlikely to produce a pattern similar to the language model signature.

    • @Pystro
      @Pystro Год назад +2

      There's 2 differences here. First, steggography inserts information in specific places, and someone who intends to offer some language model for purposes of cheating would likely have the means to pinpoint which (at least most of) those are (simply by the fact that detection becomes impossible or less likely), and the information is hidden in places where modifying them has basically zero impact, i.e. trivial to erase. But in case the signature remains intact, it will be almost 100% unique.
      Whereas this scheme hides information everywhere (i.e. you may need to change quite a number of words) and in elements that are integral to the meaning. And while you can often change out words using a thesaurus without changing the structure further downstream, (think slope versus hill), there are also instances where the meaning changes ("ran up the stairs to the attic" versus "ran up the hill to the attic" does, while "[...] to the castle" doesn't).
      I think a combination of both might actually be most powerful. Not least by denying feedback on manipulation of the steggographic signature, because the word frequency method would still give it a 100% detection rate even when the signature is fully destroyed.

  • @jothamread
    @jothamread Год назад

    Love your guys work.

  • @R2D2SD
    @R2D2SD Год назад +1

    GPT-3 is a model that predicts what would be the next token in a sentence based on the previous tokens, trained entirely on human made text, then by definition (without this proposed change) it will result in text that closely mimics one written by a Human. Then we can:
    - Create a model that finds "odd" words being used, after all, the point of the proposal is that it is statically significant enough to be detected and suggest a synonym.
    - With a list of how often certain words are used, replace any words that are not in the top % with a synonym
    My favorite:
    - Just read the text ! Same way you "know" that a sentence makes no sense without knowing why, your human brain will detect any deviations from the "regular" way other humans write, again, GPT-3 itself is trained on human generated content. There's the entire profession of proofreaders / copy editors whose job is to take any given text and edit in such a way to be more digestible by other people. This will catch only the lowest effort cheaters.

  • @op4000exe
    @op4000exe Год назад +7

    Linguistically there's an issue as well, people aren't infinitely creative, and therefore when writing an essay, the average person will end up writing text that sounds and spells quite similarly to how other people would do so. If you decide whether or not someone's cheated simply based on statistics, then we get into this really uncomfortable grey area of judging, wherein anyone who sticks too far outside the norm (positively or negatively), will be judged harshly, and people being too average will just be judged to be frauds.
    I do not know about others, but this seems quite uncomfortable to me.
    Another way to phrase it: So you have a 40/60 split of red to green words, well you're a cheater, even if the person in question is actually just someone thinking a bit differently compared to the rest. If you grew up with a different native language, your writing and grammar will be influenced from that when writing in a second language (at least until you've thorouhgly learned the second language), and therefore "wrong words" will crop op a lot more often in that case. I'll be honest, I don't find that tool to be all that accurate, as I can off the top of my head see too many ways in which it wouldn't so much find someone cheating, but rather someone thinking slightly differently, slightly less or more capable and finally someone from a different linguistic origin.

    • @MrFram
      @MrFram Год назад +1

      But the red/green split is random. You can’t accidentally trip the detector unless you know which words are red and green. But this is a somewhat valid concern.
      My concern here would be if you reuse the same phrase and he generated is seeded with the last word, then if the phrase has a green word following one of the words it will have a green word every time. The solution here is to use more complicated seeding.

  • @asitisrequiredasitisrequir3411
    @asitisrequiredasitisrequir3411 Год назад +3

    i ran up the football field - a sentence that has definitely never been written

  • @thefrub
    @thefrub Год назад

    It's pretty easy to mask a ChatGPT essay. First make sure the prompt has enough modifiers for your specific situation. Delete the first sentence of every paragraph (the first, second, finally sentences) then insert some references to things your instructor talked about. Boom, heartfelt and thoughtful.

  • @lancemarchetti8673
    @lancemarchetti8673 11 месяцев назад

    Fantastic talk

  • @astropgn
    @astropgn Год назад +4

    I read the paper they mention on this video and it is really interesting! You might want to read it too

    • @luca-bg1sj
      @luca-bg1sj Год назад

      Where can I find it?

    • @Imperial_Squid
      @Imperial_Squid Год назад

      @@luca-bg1sj google have a search engine specifically for papers called google scholar, try searching there?

    • @IXI69IXI
      @IXI69IXI Год назад +1

      @@luca-bg1sj The paper is called A Watermark for Large Language Models, if you Google the name you can find it available for viewing on a few sites.

  • @bohdandonchyk
    @bohdandonchyk Год назад +5

    Mike is the best) I wish I was one of his students)

  • @TaranovskiAlex
    @TaranovskiAlex Год назад

    wow, that's super cool and useful! thank you!

  • @kelpsie
    @kelpsie Год назад +2

    13:20 - It's actually much harder than that for a cheater to fool. The green/red words include connecting words which can drastically change the structure of your sentence. If 'and' is red and 'but' is green, you could wind up with a completely difference sentence after that point. From there a cheater needs to somehow change the connecting words without degrading the quality of the work.
    On the other hand, it actually seems like an extremely difficult problem to even reconstruct the red/green list from scrambled output. If the prompt is part of the seeding process, I'm not sure how you could reconstruct the original seed if they changed anything.

  • @speltincorrectyl1844
    @speltincorrectyl1844 Год назад +3

    It seems an easy is to use a simple program (without a watermark) to swap stuff out with synonyms.
    While actually making the essay requires an LLM as advanced as ChatGPT, putting in synonyms would only take GPT2, which can be run at home.

    • @Ensorcle
      @Ensorcle Год назад

      This was my very first thought. If it turns into an API where you submit text and it returns (1|0) for AI generated. Well. Now you just train a downstream model to jiggle the words with a white list of synonyms until it turns to 0. But I SUPPOSE I could also just read the paper instead of writing RUclips comments....

  • @lydianlights
    @lydianlights Год назад +7

    This is a pretty neat idea from a statistics/comp-sci perspective. Also practically useful -- being able to identify AI-generated text is probably a good idea for more than just cheat detection.

  • @SniperNator305
    @SniperNator305 Год назад

    Never mentioned before. Good points.

  • @bunniesarecute3135
    @bunniesarecute3135 Год назад +2

    Thank you for the video Mike! Very informative, and this process seems (in theory at least) much better than current methods i’ve seen. However, it feels like not knowing the original prompt would be a problem, is there a way to address this?

    • @Faladrin
      @Faladrin Год назад +1

      The original prompt has nothing to do with the detection method suggested. It does require that the text generator implement the method and that seems unlikely. Even if some do, not all will, and someone will not as there incentive for them not to.

  • @hotrodhunk7389
    @hotrodhunk7389 Год назад +45

    A wise man once said I don't memorize anything that I can easily look up in a book I feel like AI is the same thing I don't know how many different things I've learned in school for no reason and if I actually need them I can look them up at that time.
    There's a famous quote that is attributed to Albert Einstein: “Never memorise something that you can look up.” It is said to be related to a time when a colleague asked him for his phone number, and he reached for his telephone directory to look it up.

    • @letsmakeit110
      @letsmakeit110 Год назад +9

      it's strange, isn't it. 50 years ago if you wanted to look things up it was difficult. You had to go to the library unless you were well-off enough to keep an encyclopedia set at home. So it was useful to memorize things. Then 20 years ago we had the internet so brain disk memory was obsolete. But now everything you find online is misinformation and shilling, so DYOR and memorizing the conclusions once is valuable again.
      Like I can't remember the specifics of the medical literature I went through about italian crown virus. I'm not a doctor. But I remember the conclusions I came to and now I won't be dissuaded otherwise by some Alex J0n3s or MSM puff piece.

    • @letMeSayThatInIrish
      @letMeSayThatInIrish Год назад +12

      I think we should memorise as much as possible. It forces our brains to care about subjects. It allows us to think concretely rather than abstractly about problems.

    • @joshwilliams7692
      @joshwilliams7692 Год назад +5

      That wise man was a fool.

    • @InfiniteQuest86
      @InfiniteQuest86 Год назад +12

      I mean I don't think the point of learning stuff is to remember it forever. It's more the fact of having to learn things teaches you how to learn things in the future when you really do want to know it.

    • @ikocheratcr
      @ikocheratcr Год назад +10

      The trick is to remember those things exists at all. For instance law of cosines, I think it is OK not to remember the exact formula, but knowing it exists and its uses is the key.

  • @AdamTelly
    @AdamTelly Год назад +9

    It's already easy to detect a GPT3 essay, it's completely wrapped in waffle!

    • @13breno13
      @13breno13 Год назад +3

      Waffle was always a way to get your word count up. You can ask the tool to reformat the output to be more concise and summarise in a certain amount of words, or even go the other way "Add a few more sentences and go into more detail about point X"

  • @kamikaze_sno
    @kamikaze_sno Год назад

    amazing explanation thanks

  • @kaimarmalade9660
    @kaimarmalade9660 Год назад

    I want to thank Dr. Pound and Siraj + others for being the reason I understand this stuff. Also thanks to Seymour Papert and Marvin Minsky for the 1969 Perceptrons paper. I'm so grateful to be alive. Thank you all.

  • @jens256
    @jens256 Год назад +24

    It's like in school, where to start with, you weren't allowed to use pocket calculators; but, later on, the (Graph) Calculator became a mandatory tool, that you had to have. Maybe these kinds of large-scale predictive language-models (are we just calling them it all ChatGPTs? ~ like Hoovers were synonymous with Vacuum-clears) will also become yet another tool in the proverbial toolbox? Difference is, language is literally how we communicate with each other. If I were to declare: "I gauge the value of a man, neither by cut of his waistcoat, nor by the exclusive nature of his accouterments, but by the sharpness of his pen and the cleverness of his turn of phrase", I submit that most people would agree ~ in principle, at least. So, what will it mean, that apparently learned discourse can be generated so easily? Will prose be seen as clever in the future? What interesting social codes will evolve to set apart to the true intelligentsia from the AI-Aided Hoy Polloi?

    • @GaSevilha
      @GaSevilha Год назад +1

      GPT pls, not chatGPT

    • @tteqhu
      @tteqhu Год назад +1

      In what lessons would GPT help, or make harder topics more accessible to students?

    • @GaSevilha
      @GaSevilha Год назад

      @@tteqhu ask gpt to explain to you like you were a high schooler what is lambda calculus

    • @tteqhu
      @tteqhu Год назад +1

      and it possibly can make students more distracted, it's hard to tell what real outcomes would be.
      overall it's nowhere near as simple as allowing or requiring calculators in math classes.

  • @davidmurphy563
    @davidmurphy563 Год назад +9

    *What's the highest entropy sentence?*
    The concept of "highest entropy sentence" is somewhat subjective and depends on the context and assumptions made about the language being used. However, one possible way to approach this question is to consider the sentence that contains the most information or that is the least predictable based on the rules of the language.
    In this sense, one example of a high entropy sentence could be a completely random sequence of letters or words that do not form a coherent sentence. Another example could be a long and complex sentence that contains a lot of information or that has a highly irregular grammatical structure, making it difficult to predict.
    However, it's important to note that the concept of entropy is typically used in the field of information theory to describe the amount of uncertainty or randomness in a message, and is not necessarily related to the semantic or pragmatic content of the sentence. Therefore, the "highest entropy sentence" may not necessarily be the most meaningful or useful sentence in a given context.
    *Tell me the sentence*
    As I mentioned earlier, the concept of "highest entropy sentence" is somewhat subjective and can depend on the assumptions made about the language being used.
    However, one example of a high entropy sentence could be a completely random sequence of letters or words that do not form a coherent sentence, such as:
    "Qwertyuioplkjhgfdsazxcvbnm"
    Another example could be a long and complex sentence that contains a lot of information or that has a highly irregular grammatical structure, making it difficult to predict. An example of such a sentence could be:
    "Xanthophylls, which are a type of oxygenated carotenoid pigment found in most green plants, have been shown to play an important role in photosynthesis by absorbing excess light energy and dissipating it as heat, thereby protecting the plant from photodamage and oxidative stress, but the exact mechanisms by which they do so are not yet fully understood and are the subject of ongoing research."
    *"Qwerty" in your example is low entropy.*
    You are correct that the sequence "Qwerty" has relatively low entropy, as it consists of a common and predictable set of characters that are commonly used in English words and phrases.
    To provide a better example of a high entropy sentence, here's a random sequence of letters and symbols that is less predictable and contains more uncertainty:
    "jK#l9^R$qP@y!sTm2z&xG8*"
    This sequence contains a random arrangement of uppercase and lowercase letters, numbers, and symbols, and has higher entropy than the previous example of "Qwerty."

    • @Frank_Alias_Frank
      @Frank_Alias_Frank Год назад

      Your word "Qwerty" doesn't follow the rules of my keyboard. If I read that word on my keyboard correctly, it reads as "Qwertz"

    • @MrRedstoner
      @MrRedstoner Год назад +1

      @@Frank_Alias_Frank My condolences.
      I'll note that QWERTZ is the default in my native language as well. Thankfully, it's also possible to switch it to QWERTY.

    • @davidmurphy563
      @davidmurphy563 Год назад

      @@Frank_Alias_Frank Může být mnoho jazyků, možná tento?
      Pokud se chcete s ChatGPT bavit, požádejte jej, aby napsal pozpátku. :)

    • @Frank_Alias_Frank
      @Frank_Alias_Frank Год назад

      @@davidmurphy563 Even though I don't understand a single word, I guess the appropriate answer to your comment would be something like, "Thanks for the clarification."

    • @davidmurphy563
      @davidmurphy563 Год назад

      @@Frank_Alias_Frank Huh, I thought Czech was a reasonable guess. Not entirely sure why. How about:
      Könnten viele Sprachen sein, vielleicht diese?
      Wenn Sie mit ChatGPT chatten möchten, bitten Sie ihn, rückwärts zu schreiben. :)

  • @ylazerson
    @ylazerson Год назад

    great video - thanks!

  • @plasmasupremacy9321
    @plasmasupremacy9321 Год назад

    That's a very clever watermarking method

  • @owencmyk
    @owencmyk Год назад +3

    What if you had ONLY 2 lists, but you made the AI try to generate from whatever list the last few words were in. That way you would have long streaks of red or long streaks of green, but something like the emoji trick wouldn't be enough to stop it because it would just generate emojis from the most recently used list. Of course, there couldn't be pure streaks because words like "the" would have to be on one list or the other, but if you wanted to you could just make a list of exceptions or just except small breaks in the streak

  • @IanKjos
    @IanKjos Год назад +8

    If you do this red/green thing, and if generated content is largely accepted outside of school, then people's writing tendencies might slowly change to reflect the same adjusted probabilities.

    • @archbtw
      @archbtw Год назад

      That's a significant thing and probably overlooked.

    • @drewgalbraith4362
      @drewgalbraith4362 Год назад

      Isn’t that the point of the random seed generation?

    • @idk7016
      @idk7016 Год назад

      Probably not, like what he said there won't be a pattern to the modified result as the algorithm runs based on a random seed

  • @Veptis
    @Veptis Год назад

    Recent tweets indicate, that Bing search just uses a GPT-3.4 model where a large document is put Infront of your first input that basically tells the model how to behave in inference time.
    .could you get hold of a large model and try this with a similar document? Turn it into a chat bot in inference time?

  • @bkovba
    @bkovba Год назад +1

    I think this would only work under a condition that word probabilities stay static, which might not be true if the model is constantly fine-tuned.
    Plus it is needed to take into account the context's length (say, X previous words are remembered and involved in generating a new word)

  • @mipsuperk
    @mipsuperk Год назад +3

    Don't they already have things that will go through your text, randomly replace synonyms, restructure sentences slightly, etc. so that it won't be flagged by plagiarism detectors?
    Generate your text, run it through the thing, and we're back to where we were before.

  • @letMeSayThatInIrish
    @letMeSayThatInIrish Год назад +9

    I have been thinking about a similar way of watermarking go, chess and other game playing engines. The main problem is the same as what Mike mentions in the video: it would require the cooperation of all engine developers.

    • @QuotePilgrim
      @QuotePilgrim Год назад +5

      I don't know about go, but chess engines are so much better at playing the game than humans that it's extremely obvious when someone's cheating by using an engine, especially over multiple games.

    • @barakeel
      @barakeel Год назад

      @@mal2ksc What if the strongest players are cheating? And they are just using the engine to get an edge (is this position good for me? is there a tactic?) at different crucial points in the match. It's basically undetectable.

    • @yommish
      @yommish 5 месяцев назад

      @@barakeel if you’re playing in a browser, there are detection methods such as mouse movement, time between moves, and others that aren’t disclosed by the services. A single instance of cheating via an engine on a separate device will always be impossible to detect, but patterns will start to emerge over time. However, I think everyone accepts that cheating will inevitably happen. That’s why big tournaments always take place in a place with physical security and countermeasures.

  • @nrsentropy_art
    @nrsentropy_art Год назад

    Very good video , i love it.

  • @raidedcluster
    @raidedcluster Год назад +2

    That's a very interesting approach but with appropriate prompt engineering - which has been done already, artificial entropy can be induced into the generated content.
    For example an elaborate prompt to assume another narrative style can force the AI to disregard the altered probabilities and write in a temporarily novel way.

  • @julienjulien9801
    @julienjulien9801 Год назад +10

    Most likely language model companies have a non-disclosed mechanism to sign and recognise the output of their models. Not only it enables to identify human generated vs machine generated content but also probably prevents to ingest machine generated content of lower integrity for future training - reducing model performance - as public online space will be increasingly « polluted » by machine generated content.

    • @valberm
      @valberm Год назад +2

      As soon as the large language model's output reaches a certain level of quality it doesn't make any difference whether a text was written by it or by a human. In this case, there's no such a thing as "pollution" anymore. It just doesn't matter.

    • @XirevWoW
      @XirevWoW Год назад

      Two words, burstiness and perplexity, having more of it will trip up the AI detectors, you can tell the AI to write the text with more burstiness and perplexity and it will not look AI generated

  • @AlanTheBeast100
    @AlanTheBeast100 Год назад +3

    One way is for profs to simply ask the same questions to ChatGPT aforehand.
    But, ChatGPT sort of has "patterned" output. Not hard to detect.

    • @NatashaEstrada
      @NatashaEstrada Год назад +1

      Try this. Put a prompt into ChatGPT and then click regenerate response. You'll get a different output.
      The context of the prompt is important too.

  • @LuisFernandoBernal
    @LuisFernandoBernal Год назад

    amazing pal. thanks for thessseeeeee content !

  • @MarthinusBosman
    @MarthinusBosman Год назад

    Setting a reminder for 5 months from now

  • @Bartzii
    @Bartzii Год назад +5

    imagine a tool that will check if you used a calculator for math.....

  • @CristianGarcia
    @CristianGarcia Год назад +5

    You can use masked autoencoders like BERT to iteratively delete some words and pick similar ones. Alternatively, you could have a second language model reword the text itself.
    Someone will create an anti-cheat detection site and students will happily pay $5 a month to avoid doing homework. It's best just to change the task at hand e.g. oral presentation + live QA.

  • @iyadahmed3773
    @iyadahmed3773 Год назад

    Very clever idea!

  • @dmarte89
    @dmarte89 Год назад

    I've watched these videos with Mike on a range of mathematical topics since my university years about 10 years ago. And boy does he deliver. So neatly explained, with pen and paper graphics (also thanks to the editor for digital graphics!). Also, now when I'm working, mostly producing simple applications, I don't encounter a deep cognitive pleasure like statistical questions at 09:24. This was a lovely video!

  • @sethlawson8544
    @sethlawson8544 Год назад +7

    You'd imagine OpenAI code implement an internal barcoding scheme like this and charge a hefty academic licensing fee to each institution for the privilege of detecting it. Could also charge publishers, lawyers, heck I imagine lots of businesses would be willing to pay to scan documents for evidence of deriving from their model.

    • @Ozymandias1
      @Ozymandias1 Год назад +3

      Great, so students can get even more indebted.

    • @paulsaulpaul
      @paulsaulpaul Год назад

      Small point: Lawyers us a LOT of templates. That their paralegals fill in. Entire softwares are sold for this (Hotdocs, etc.)

    • @icheckedavailability
      @icheckedavailability Год назад

      Its like selling both the poison and the antidote

  • @velho6298
    @velho6298 Год назад

    So basically you would select the green and red words based on the model and this would then detect if one was used?

  • @volodyadykun6490
    @volodyadykun6490 Год назад +1

    You'll probably enjoy article You Are Not a Parrot by Elizabeth Weil at New York about Emily M. Bender. I read it and it's just great. It's also about distinction between human and AI, what language models are and why we are creating them

  • @MorgurEdits
    @MorgurEdits Год назад +3

    Maybe the school system is part of the problem, trying to use rigorous testing to rank students. Instead of giving each student more of the tools and paths available and using the natural interest to learning as an advantage giving students more freedom to find the roadblocks themselves, like not knowing calculus well enough.

    • @NatashaEstrada
      @NatashaEstrada Год назад

      College has changed in the last 20 years. There are more multiple-choice questions and reports/essays that students don't get much feedback on or are not required to revise or improve on.

  • @DrD0000M
    @DrD0000M Год назад +5

    Run your ChatGPT essay through google translate a few times, then bing, etc., get back to English, I doubt the watermark would remain. Anyways, it's dumb to handicap such a thing. I'm old, I don't care if some profs are upset; don't care if some kids cheat. I just want it to work as well as it can.

    • @richardaversa7128
      @richardaversa7128 Год назад

      Everyone will cheat. It's not a moral decision, it's an economic one. It will be normalized over time exponentially until everyone is doing it.

  • @SyrupWizard
    @SyrupWizard Год назад

    I believe the emoji work-around to this watermark is very powerful. Could you not randomly choose words and characters to intersperse at varying intervals and then command+f them all out?

  • @DavidFRhodes
    @DavidFRhodes Год назад +1

    can you do a video on the 'safety' layer or chatgpt?

  • @fenet1839
    @fenet1839 Год назад +5

    It looks like these companies are so focused on pushing out these AIs that they're really not concerned about the consequences, AI like chatGPT as it stands probably does more harm than good. I don't understand why they focus on Writing, Art, and creativity in general, aren't there better applications for AI?

    • @NatashaEstrada
      @NatashaEstrada Год назад +1

      The true educational value of these AI's is not as a cheating tool but as a built in tutor that can help level the playing field. Imagine how onboarding at companies would go if you could train one with your internal data.

    • @codinghub3759
      @codinghub3759 Год назад

      AIs have been made for countless other things, but they are generally niche enough to not gather mainstream popularity like this.
      How many people can use a Chess AI to cheat in matches? Not many, since the restrictions in professional matches are pretty strict.

  • @solidv2
    @solidv2 Год назад +6

    I just don't like AIs for things like ChatGPT at all , I'm pretty sure it's one of those things that will overly hurt society way more than help it. You can already get a felling for the way things can be with those art drawing bots, many artist already feel severely demotivated or worried about their livelihood not the mention their work being stolen and their profits hurt. Really not looking forwards for what will come next 😕

    • @botboy0
      @botboy0 Год назад

      People thought so too when they invented the steam engine

    • @jacobdegeling
      @jacobdegeling Год назад

      🎶 Always look on the bright side of life 🎶

  • @ScBroodSc2
    @ScBroodSc2 Год назад

    This explains why if you try to ask chatgpt to write a paragraph excluding a specific letter, like write a text with no "e", then it will be able to reduce the amount of words with "e" but it won't be able to fully do the exercice.

  • @logic314
    @logic314 Год назад

    I am about 20 years past acedemics. That beeing said, I have gotten better mentoring from chatgpt than any teacher up until collage level.

  • @JamesTM
    @JamesTM Год назад +2

    Excellent video, as always, but I've got one critique:
    As a colorblind person, the red and green circles are nearly impossible to distinguish. A red "R" and green "G" would have solved that problem. (It's a small issue, since it's easy enough to just follow along by memory, but it's worth noting anyway to help future videos be more accessible.)

    • @felicityc
      @felicityc Год назад

      when he said "hill is red" and then highlighted it in green I swear I had a crisis for a moment