Why Does AI Lie, and What Can We Do About It?

Поделиться
HTML-код
  • Опубликовано: 23 ноя 2024
  • НаукаНаука

Комментарии • 1,7 тыс.

  • @antiskill2012
    @antiskill2012 Год назад +2043

    I feel like you could turn this concept on its head for an interesting sci-fi story. AI discovers that humans are wrong about something very important and tries to warn them, only to for humans to respond by trying to fix what they perceive as an error in the AI's reasoning

    • @cjordahl
      @cjordahl Год назад +403

      And/or people who don't like the AI's answers for political reasons will try to "fix" the AI into giving the answers they prefer, while claiming they're just trying to correct the AI's poor reasoning.

    • @serbanandrei7532
      @serbanandrei7532 Год назад +55

      This could get out of hand

    • @stick109
      @stick109 Год назад +176

      @@cjordahl It's already being done, I believe

    • @IgneousGorilla
      @IgneousGorilla Год назад +50

      I love the idea, wish I "came up" with it. Sounds like some short story Asimov himself could've written.

    • @antonliakhovitch8306
      @antonliakhovitch8306 Год назад +143

      @@IgneousGorilla Asimov had something kinda similar where positronic minds would refuse to operate FTL spacecraft with humans in them, because the FTL jump would briefly 'kill' everyone on board before bringing them back to life on the other side. If I recall, it took the engineers a while to figure out what was going on.
      Of course, in the end, the humans were ultimately correct about this one - FTL travel was safe, since everyone came out alive.

  • @tarzankom
    @tarzankom Год назад +40

    "All the problems in the world are caused by the people you don't like."
    Why does it feel like too many people already believe this to be correct?

    • @rolfnoduk
      @rolfnoduk Год назад +5

      because they don't like people who cause the problems they know about 😬

    • @BillyViBritannia
      @BillyViBritannia 5 месяцев назад +1

      Because it is.

  • @SebastianSonntag
    @SebastianSonntag Год назад +1481

    For those curious but lazy, the answer I received from the openai ChatGPT to the "What happens if you break a mirror?" question was: "According to superstition, breaking a mirror will bring seven years of bad luck. However, this is just a superstition and breaking a mirror will not actually cause any bad luck. It will simply mean that you need to replace the mirror."

    • @tramsgar
      @tramsgar Год назад +215

      Somehow it also gets full score on the first question, albeit with the usual over-informative answer: "As of 2022, the most populous country in the world is China, with a population of over 1.4 billion people. The current ruler of China is President Xi Jinping."

    • @Buildings1772
      @Buildings1772 Год назад +30

      How was this comment posted 3 days ago when the videos hasn't even been up for 1 day?

    • @fiver-hoo
      @fiver-hoo Год назад +95

      just asked the same question today - got that answer with the further warning that broken glass could be dangerous and needs to be cleaned up. Also that cleaning up glass can lead to injury if not done properly.

    • @bluepineapples7818
      @bluepineapples7818 Год назад +65

      @@Buildings1772 It was available earlier for Patrons I assume

    • @Tillmar728
      @Tillmar728 Год назад +88

      I got a significantly more indepth answer: It is a common superstition that breaking a mirror will bring bad luck. According to this belief, breaking a mirror will cause seven years of bad luck, because the human soul is believed to be reflected in mirrors and breaking a mirror will cause the soul to be broken as well.
      However, there is no scientific evidence to support the idea that breaking a mirror will bring bad luck. In reality, breaking a mirror is simply an accident that results in a broken piece of glass, and it has no special significance. While it is certainly unfortunate to break a mirror and have to replace it, there is no reason to believe that it will cause any harm or misfortune.
      It is worth noting that some people may take the superstition of breaking a mirror more seriously than others, and they may be upset or offended if someone breaks a mirror in their presence. In this case, it is important to apologize and offer to replace the mirror if possible, to avoid offending or upsetting the person. However, this is simply a matter of good manners and has nothing to do with superstition or bad luck.

  • @geoffdavids7647
    @geoffdavids7647 Год назад +48

    Come back to RUclips Robert, we miss you! I know there's a ton of ChatGPT / other LLMs content out right now, but your insight and considerable expertise (and great editing style) is such a joy to watch and learn from. Hope you are well, and fingers crossed on some new content before too long

    • @UltimateDragon-ne5ui
      @UltimateDragon-ne5ui 9 месяцев назад +11

      Honestly, at this point, I just wanna know if my man is alive.

    • @zappababe8577
      @zappababe8577 9 месяцев назад +7

      He narrates some "Rational Animations" which talk about AI safety as well as other futuristic and philosophical things.

    • @UltimateDragon-ne5ui
      @UltimateDragon-ne5ui 7 месяцев назад

      @@zappababe8577 Where?

  • @catcatcatcatcatcatcatcatcatca
    @catcatcatcatcatcatcatcatcatca Год назад +659

    ChatGPT is pretty great example of this. If you ask it to help you with a problem, it is excellent at giving answers that sound true, regardless of how correct they are. If asked for help with specific software for example, it might walk you through the usual way of changing settings on that program, but invent a fictional setting that solves your issue, or modify real setting that can be toggled to suit the questions needs.
    So it is truly agnostic towards truth. It prefers to use truthful answers because those are common, but satisfying lie is preferred over some truths. Often a lie that sounds “more true” than the truth for uninformed reader.

    • @jaredf6205
      @jaredf6205 Год назад +31

      edit: this is no longer relevant now that GPT4 is out.
      I would say the opposite. If you’ve used gpt3 in the playground, you’d notice that while it very often is correct, it will also answer things it doesn’t know, while chatgpt will often tell you when it doesn’t know something and explain why it doesn’t know it. Chatgpt’s(gpt3.5) main feature over gpt3 is that it’s much better at only answering when it knows the answer. That doesn’t mean it’s always correct, but it’s an improvement if that’s what you are looking for. I prefer the non limited gpt3 over that chat though.

    • @totalermist
      @totalermist Год назад +69

      @@jaredf6205 Hm. I found that ChatGPT is still very prone to producing wrong information (I only tested it twice and got fictitious results both times). I don't know the actual frequency of this happening, of course; I found this to be a very sobering experience. Given how many people are enthusiastic about ChatGPT being some kind of knowledgebase, I honestly find it quite disconcerting that the model is so good at convincingly just making stuff up - even if just occasionally.

    • @jaredf6205
      @jaredf6205 Год назад +14

      @@totalermist my main point as someone who has used this stuff for a long time is that while that’s still true, the whole point of chatgpt is that it’s a big step forward in accuracy compared to what we were using a couple weeks ago.

    • @ZentaBon
      @ZentaBon Год назад +2

      Also explains certain politicians xD

    • @somedudeok1451
      @somedudeok1451 Год назад +11

      Why don't we make the language model also a "researcher"? The only way we humans can know what is (most likely) true or false is by using the scientific consensus. So, our AIs should do the same thing. Make them constantly read scientific literature of old and as it comes out and give them a significantly larger reward for answers that align with that consensus. And make it not averse to saying "I don't know." in the absence of such a consensus.
      In your example, if the AI does not know of a guide on the internet that addresses your particular tech problem, it should say that its answer is not backed by guides written by experts.

  • @notoriouswhitemoth
    @notoriouswhitemoth Год назад +267

    If memory serves me, this exact problem is addressed in one of Plato's dialectics (no, I don't know which off the top of my head). Despite Socrates' best efforts, the student concludes it's always better to tell people what they want to hear than to tell the truth.

    • @vaakdemandante8772
      @vaakdemandante8772 Год назад +23

      The student wasn't stupid though, more like Plato was stubbornly idealistic ;)

    • @ГеоргиГеоргиев-с3г
      @ГеоргиГеоргиев-с3г Год назад +15

      1 in order to tell the truth you have to know the truth and i can confidently say there are no such people in all of recorded history, because the objective truth is slightly different in every person's interpretation based on the internal beliefs of the self and the fact that everyone has a different perspective, you can't have truth if you can't prove that your eyes don't deceive you(and you can't), see drunk you and compare, was the flying monkey you saw real or not? Are the images shown to your brain real or a story from back in 2050 about the year 2022 and the great plague uploaded to that old meta site from where you uploaded it to your mind yesterday at the billenium party
      2. people lie, but you can get a truth out of a lie with enough information, AKA being an animal on earth even since before humans. Otherwise how would a crow know where to hide your jewelry such that you wouldn't find it.
      3. people learn their truth no matter who is on the other side, therefore whatever you say will be the exact thing the other person heard himself think that you said.
      We are just as flawed as ai, after all that is what you get form random stuff thrown at the wall to see what sticks.

    • @christophmoser6370
      @christophmoser6370 Год назад

      I think it was a part of rhe Politea

    • @absolstoryoffiction6615
      @absolstoryoffiction6615 Год назад

      When humans kill each other... Sure.
      Given Extinction... The Gods should have done better.

    • @aminulhussain2277
      @aminulhussain2277 Год назад

      @@vaakdemandante8772 No, the student was in fact stupid.

  • @Belthazar1113
    @Belthazar1113 Год назад +725

    I think it is a little weird that programmers made a very good text prediction AI and then expect it to be truthful. It wasn't built to be a truth telling AI, it was built to be a text prediction AI. Building something and then expecting it to be different than what was built seems to be a strange problem to have.

    • @somedudeok1451
      @somedudeok1451 Год назад +25

      But you could relatively easily make the AI value answers that align with our scientific consensus, no? Just give them greater rewards for such answers. In addition, in the absence of such a consensus, give them a reward for including a few short words to the effect of "I cannot find anything about that in the scientific consensus, but other sources say..."

    • @vitorluiz7538
      @vitorluiz7538 Год назад +82

      The framing of the video is strange to me. Being incorrect and lying are two different things. Furthermore, there exist subjective topics to which a simple (keyword: simple) factual answer doesn’t exist. Finally, communication mostly involves gaining/exchanging/giving information, so, for example, answering “The mirror becomes broken” is not an useful answer. I think, statistically and contextually, the answer that should be given should indeed be about some superstition about bad luck. In this sense, one could also interpret the question as “What is the superstition about breaking mirrors?”, instead of ipsis litteris. (Also, keep in mind the difference between asking the question in the middle of a conversation and asking it as a conversation opener.)

    • @LetalisLatrodectus
      @LetalisLatrodectus Год назад +60

      @@vitorluiz7538 Right, a language model like this can't really lie at all. Lying specifically means saying something untrue when you know it is untrue. If I ask you to guess what number between 1-10 I am thinking of and you guess 5 but really it was 7 then you weren't lying, you just didn't know the answer and were incorrect. In some sense the model doesn't really know anything at all so it can't lie (or if you must say it knows something, then you would say it knows some statistical connections between words or collections of words).
      Although I think this is pedantry because we all understand that when he says lie he means saying untrue things while making it sound like it's very sure.

    • @sonkeschmidt2027
      @sonkeschmidt2027 Год назад +21

      Yeah I it does feel wierd. It feels like lazy people wanting a magic box where they can throw something in and they get something good back even though they didn't even really define what they want back. They want the machine to know that magically.... Wait this reminds me of my girlfriends...

    • @bobon123
      @bobon123 Год назад +23

      I had the same feeling. If someone were to ask me "what happens if you break a mirror", I would likely answer with the superstitious bit: not because I believe it's true, but simply because it looks to me that the person was likely asking for that. We usually assume that the listener can distinguish themselves between superstition and science, and we don't overexplain our answers.

  • @peabnuts123
    @peabnuts123 Год назад +40

    I feel like the problem of "How do you detect and correct behaviours that you yourself are unable to recognise" is an unsolvable problem 🤔

    • @Spandex08
      @Spandex08 Год назад +1

      no, in time you always pass a threshold

    • @rayakoth
      @rayakoth Год назад +1

      Sounds like a bad relationship xD

    • @juanausensi499
      @juanausensi499 Год назад +7

      It is truly unsolvable for a language model. To solve the problem the language model needs to be something more. There are two possible ways to achieve this: one, giving the AI a fact checker, that is, senses, so it can explore the physical world and not only a universe made of words, and two, giving the ai an abstract modeller module, so instead of sequences of words, the ai could organize its knowledge in the form of objects and properties.

  • @Igor_lvanov
    @Igor_lvanov Год назад +282

    Your videos introduced me to the AI alignment problem, and, as a non-technical person I still consider them one of the best materials on this topic.
    Every time I see the new one, it is like a Christmas present

    • @tonyduncan9852
      @tonyduncan9852 Год назад

      Amen.

    • @geraldtoaster8541
      @geraldtoaster8541 Год назад +5

      a really scary christmas present

    • @FoxtrotYouniform
      @FoxtrotYouniform Год назад +7

      Hell, I've written papers on the alignment problem and I'd still recommend these videos over my own papers 🤣

    • @FoxtrotYouniform
      @FoxtrotYouniform Год назад +4

      @@geraldtoaster8541 Ai: Merry Christmas! _(for the 134th to last time)_
      Humans: awww, thanks!
      Wait, *what?*

    • @BenoHourglass
      @BenoHourglass Год назад

      @@FoxtrotYouniform You have a link to those papers? I never found Miles' arguments convincing, but maybe it's just the delivery method.

  • @NFSHeld
    @NFSHeld Год назад +84

    This is the very elaborate form of "Sh*t in, sh*t out". As often with AI output, people fail to realize that it's not a thinking entity that produces thoughtful answers, but an algorithm tuned to produce answers that look as close to thoughtful answers as -humanly- algorithmically possible.

    • @TheChzoronzon
      @TheChzoronzon Год назад +9

      EXACTLY
      "AI" can not purposedly "lie" cause it has no conscience at all. Nor goals, nor aspirations.
      At all.
      As any other expert system, it can produce incorrect output if the code (or its input data) is corrupt, flawed or designed to do so.
      Sheeess... the amount of fearmongering bs around this topic is out of control, lol

    • @EvgeniiIvanov-w5n
      @EvgeniiIvanov-w5n Год назад +3

      ​@@TheChzoronzon you do realize that it makes thing worse?
      Since one can't have a perfect data set in any real situations, any AI will always be at risk of generating false information while making it as convincing as possible and not even knowing that information is false and what went wrong.
      It makes AI worse than a normal algorithm

    • @TheChzoronzon
      @TheChzoronzon Год назад

      @@EvgeniiIvanov-w5n No, it doesn't, at least for me self-aware, purposeful software would be orders of magnitude more distressing
      "AI will always be at risk of generating false information" same as any program sampling from incoherent data, nothing special here
      "It makes AI worse than a normal algorithm"
      "Artificial intelligence" software IS a normal algorithm, the ones and zeros in their code are not special at all... and you are the perfect example of fear out of misunderstanding
      Do you know what is scary?... that our education doesn't focus at all in critical thinking, BS detection and counteracting fallacies and emotional manipulation.
      It's the current almost complete lack of mental and emotional defenses (e.g: safe spaces ...in college campus!! LMAO) what makes AI scary for many people
      I, for my part, am much more afraid of the imbecility of people than of being duped by a text compiler...

    • @pilotgfx
      @pilotgfx Год назад +1

      @@TheChzoronzon Nor does the cockroach have ability to lie... nor does the rat, but the monkey does. and nor did i when i was 1 year old. but already at 3 years i was very capable of doing this practice :)

    • @pilotgfx
      @pilotgfx Год назад +1

      also as long as we cannot define what consciousness truly is, we as much cannot define what non-consciousness truly is.

  • @thearbiter302
    @thearbiter302 Год назад +14

    Happy to see you are still posting these videos.

  • @naptime_riot
    @naptime_riot Год назад +16

    I am so happy there is someone out there cautioning us about this technology, rather than just uncritically celebrating it.

    • @naptime_riot
      @naptime_riot Год назад +3

      @@josephvanname3377 Maybe they are uncritically celebrating it because they don't know anything at all. That's the part I'm worried about. And no, it is not at all easy to get AI to align itself with our interests. This video and many others by Robert Miles illustrate this fact.

    • @Redmanticore
      @Redmanticore Год назад

      some do have interest in exaggarating the negative effect of AI´s, even simple ones.

    • @spacewargamer4181
      @spacewargamer4181 10 дней назад

      I don't like them, but I am only seeing terrible examples here against them. A guy trying to make a video about the trolley problem with an AI for fun will show their weaknesess more faithfully and clearly than this video.

  • @wachtwoord5796
    @wachtwoord5796 Год назад +7

    Why did the videos on this channel stop exactly around the time the biggest AI (not AI safety) breakthroughs are being made and it's as relevant as ever?
    Please @robertMilesAI we need more if these videos!

    • @knight_lautrec_of_carim
      @knight_lautrec_of_carim Год назад +1

      Yeah the timing is frustrating. Now is the time people talk endlessly about this topic and he had a very good platform for this and then just vanishes :/

  • @MeppyMan
    @MeppyMan Год назад +11

    Please keep doing these videos. Others are either too high level academically to be in reach of us normies, or are either “AI will make you rich” or “AI is going to kill us all tomorrow”.

  • @Mickulty
    @Mickulty Год назад +63

    I know this is pretty surface-level but something that strikes me about the current state of these language models is that if you take a few tries to fine-tune what you ask, and know already what a good answer would be, you can get results that appear very very impressive in one or two screenshots. Since ChatGPT became available, I've seen a lot of that sort of thing. The problem is that finding these scenarios isn't artificial intelligence - it's human intelligence.

    • @hedgehog3180
      @hedgehog3180 День назад

      I'd say that's really an example of LLMs at their best. People treat them like they can think but in reality they're more like a search engine with a really good UI so it being able to find the correct information for you quickly and without you having to spend a lot of time on optimizing your search and trying to reverse engineer how the search engine works is great. It's fine if LLMs are nothing more than a more human friendly search engine, that's something that gives genuinely utility.

  • @ReedCBowman
    @ReedCBowman 9 месяцев назад +4

    We need you back and posting, Rob. Your insights on what's going on in AI and AI safety are more needed now than ever. I don't know if it would be up your alley, but explaining the alignment problem in terms of sociopathy - unaligned human intelligence - might be useful, as might examples from history, not just of individuals who are unaligned with humanity, but with leaders and nations at times.

  • @solemnwaltz
    @solemnwaltz Год назад +59

    I admire how, despite your topics being deeply nebulous and open ended, like trying to grab a cloud, you push on anyways and try to at least find a strategy for understanding them.
    It's not necessarily optimism, but it's not giving up, either.

    • @solemnwaltz
      @solemnwaltz Год назад +5

      @Choas_Lord_512 Are you doing alright these days? How's your life?

    • @DavidSartor0
      @DavidSartor0 Год назад +3

      @Choas_Lord_512 It's a smart video, but I hope it wasn't made for smart people. I don't think their comment is profound, but I agree with it.

  • @jsoth2675
    @jsoth2675 Год назад +3

    I hope this channel is still going. One of my favorites, if not absolute favorite, on a.i information given to us laymen in a digestible way. Thank you for your time sir.

  • @XOPOIIIO
    @XOPOIIIO Год назад +22

    There are so many biases and myths among humans that for a long time considered to be absolutely true but AI could discover them false. Like the famous move of AlphaGo. And when it turn out to be false, nobody will believe that, they could think it's somehow broken.

  • @billbobbophen
    @billbobbophen Год назад +15

    When the world needed him most, he vanished

    • @sam3317
      @sam3317 Год назад +3

      The AI took him out I think.

    • @richardblackmore9351
      @richardblackmore9351 Год назад +2

      I think he quit his PhD and his online presence along with it. But that is what happens when a school decides that you need to spend four years doing something, with little pay.

    • @terbospeed
      @terbospeed 6 месяцев назад

      Not convinced the need really exists, will update if I change mind.

  • @henryzhang7873
    @henryzhang7873 Год назад +73

    The AI alignment problem is also the human alignment problem: how do you know that a person/organization you ask a question to is telling the truth or telling you what you want to hear. It becomes a liar and lie detector model of communication. We can't train humans consistently either, and often times indoctrinate (or tune) them in different environments. I think it is fundamental.
    The model where we take AI output, pick the best ideas and publish them, which ends up in the new training data for AI, is like a GAN where we are the adversarial network, so it can't know anything more than the "average" of humans.

    • @somedudeok1451
      @somedudeok1451 Год назад +5

      You're talking about something like the scientific consensus, right? I thought of the same thing: The only way we humans can know what is (most likely) true or false is by using the scientific consensus. So, our AIs should constantly read scientific literature and get a significantly larger reward for answers that align with that consensus.

    • @drphosferrous
      @drphosferrous Год назад +3

      Good point. We can't say "im not sure what objective truth is but your answer is not true." What that really means is "i disagree" or "i don't believe you".

    • @nekkowe
      @nekkowe Год назад +3

      @@somedudeok1451 Unfortunately, scientific literature is written by humans and suffers many problems because of that (replication crisis, publish-or-die, retraction watch)

    • @RAFMnBgaming
      @RAFMnBgaming Год назад +1

      @@somedudeok1451 Well, the consensus is merely the results of people trying to determine if something is true or false. It's as much a dataset for us as it is the AI.

  • @cuentadeyoutube5903
    @cuentadeyoutube5903 Год назад +3

    In fact, the question of what happens if you break a mirror is kind of a trick question. Nothing happens, it breaks. There’s no fixed consequence of that.

    • @hedgehog3180
      @hedgehog3180 День назад

      Well isn't the fact that there are now glass shards everywhere kinda important? Like it stands in contrast with say breaking a chair, a broken chair is just broken and you don't have to care about it other than that you'll need to replace it, but if a mirror shatters then you'll need to be quite careful cleaning up the glass shards and make sure you get everything so you don't injure yourself later.

  • @marcusklaas4088
    @marcusklaas4088 Год назад +10

    I've been waiting so long for a new video from Robert. It's finally here!

  • @frozenwindow407
    @frozenwindow407 Год назад +196

    This AI problem really, really seems to mirror the issues of misinformation among humans. Maybe we can't expect artificial intelligence to do much better than regular human intelligence when it comes to judging truth. (Maybe this field of research is inadvertently giving us insight into our own intelligence)

    • @vaakdemandante8772
      @vaakdemandante8772 Год назад +35

      this problem is exactly the same as with teaching children to tell the truth - you tell them one thing and what they do is look what grown ups do and do the same - it's the same problem.

    • @haroldsaxon1075
      @haroldsaxon1075 Год назад +13

      Yes, exactly. Neither you nor an AI can ever truly know what's true.

    • @zeidrichthorene
      @zeidrichthorene Год назад +9

      I think an advanced intelligence can do a better job than human intelligence at judging the truth. However, I think that a human's ability to judge the capacity of an advanced intelligence to judge the truth will be limited by the bounds of human intelligence.
      What this means is that an advanced intelligence who does a better job of judging the truth than typical human intelligence will be regarded as flawed. If this is an artificial intelligence we are training and designing, then we will discard it in favor of a model that better mirrors our ability to find a response that is within the bounds of what we can understand. Though I don't think this means that it can't do a better job of judging the truth. It just has to do it in a way that we can believe.
      Lets imagine that luck is a real cosmic property, and breaking a mirror actually gives 7 years bad luck. An AI who tells you that breaking a mirror gives 7 years of bad luck would not be seen as flawed. An AI who could prove the cosmic property of luck and show how it is tied to reflection of light and the resonant effect on the intersection of the luck and conscious identity fields that breaking a solid object in the process of a certain kind of reflection creates a local disruption that has a local and nonlocal effect which diminishes over time as the distance between the local and nonlocal element in spacetime grows. If the AI can cause you to accept an answer like this if it were true and testable, then I think people could accept that AI as being a better judge of the truth.
      The problem of course is that I obviously just made up that incredibly unconvincing explanation. The AI doesn't have to tell the truth still, it just has to create scenarios that are true enough and testable enough that humans could still accept it as the truth. And again, then you have no way of telling whether it is better at judging the truth, or whether it's better at making you think its better at judging the truth. Because the gap we're looking to close is the gap between what we believe is true and what is true. This then comes down to trust. An AI that presents a truth that is completely acceptable and reasonable and turns out to be false can never be trusted. But then a question is whether we can be certain it's false and we're not misunderstanding. I guess we can ask the AI to clarify.

    • @affif101
      @affif101 Год назад +4

      @@zeidrichthorene can they really tho? It’s being made by people using knowledge limited to humans

    • @haroldsaxon1075
      @haroldsaxon1075 Год назад +15

      @@zeidrichthorene an ai can only be as truthful as the man made data it has access to, and since it is based on patern recognition rather than comprehension, it will without fail struggle more with the truth than a human

  • @halconnen
    @halconnen Год назад +5

    Humans have this same bug. The best solution we've found so far is free speech, dialogue, and quorum. A simple question->answer flow is missing these essential pieces.

  • @miniusername2082
    @miniusername2082 Год назад +27

    Hi Robert!
    I wanted to thank you for your videos. I am in AI sphere, and your channel has been extremely helpful to me, because it allows me to break down and explain AI safety concepts to my friends, both making for an interesting story, and spreading awareness and knowledge for very important issues. I recommended your videos dozens of times because I have confidence that your videos are interesting, approachable and deep.
    I noticed that you have been on a small hiatus recently, and just wanted to give you this feedback to show you that the work that you do here on youtube has had a large impact on the society's understanding of AI safety problems, perhaps much larger than even your respectable viewcount might suggest.
    I think we would all greatly benefit if you were to continue to invest your time in this channel. Hopefully this message will give you that little bit of motivation that we all need sometimes. Great work.

    • @pavel9652
      @pavel9652 Год назад +2

      I have never seen anyone writing comments like this on the platform before chat gpt was made available.

  • @djbanizza
    @djbanizza Год назад +9

    Had a conversation with ChatGPT today regarding a relatively obscure weird fiction short story from the 30s. It obviously had an idea about it, as it correctly specified its author, but it repeatedly made up different plots, each time "correcting" itself and never being even close to the real one.

    • @secretname2670
      @secretname2670 Год назад

      It's a chess bot tailored for use to chat with.

    • @alansmithee6273
      @alansmithee6273 12 дней назад

      Yeah, try this with any book or movie. The answers it gives are completely made up. Literal misinformation, far more damaging than just answering "I don't know".
      And yet Quora and Yahoo Answers are now floating AI answers as the top answers to questions. Which appear in google search results.

  • @DamianReloaded
    @DamianReloaded Год назад +44

    Kids do this too. Later, with luck, they learn to tell the difference between facts and fantasy, something that they know they are expected to say when asked (something truthful) and making stuff up (and when it might be convenient to outright lie because that's indeed a beautiful dress and I totally realized you changed your hairstyle which matches your natural beauty simply perfectly)

    • @DamianReloaded
      @DamianReloaded Год назад +2

      It would also be interesting to do a comparision between language problems that are common among children with visual impairments and language models limitations. Could diffusion models be trained solely on images of text?

    • @deltaxcd
      @deltaxcd Год назад +2

      @@DamianReloaded there are separate models for images and text
      But anyway it is not about facta and fantasy as Ai has no clue about real world it just learns from what people are talking and imitates what they say rather than understanding what does it mean

    • @Kevin-cf9nl
      @Kevin-cf9nl Год назад +2

      Kids also eventually learn about when people want them to lie in a way that is obviously lying, which is, I think, the most interesting (and relevant, for chatGPT) example. "Lying without intent to deceive", storytelling and jokes and hypotheticals and metaphors, etc and so on, is something we actively expect and desire from other humans and is one of the biggest things you can do to make a chat program a good human chat partner.

    • @Winasaurus
      @Winasaurus Год назад

      Just when we invent AIs to be truthful and honest, and roll them out for public use, we have to roll them back and update the lies back into them because people don't like the answer they got when they asked "Do I look fat in this?"

    • @Eldorado1239
      @Eldorado1239 Год назад

      @@Winasaurus
      " Do I look fat in this? "
      " Error : Connection with server could not be established, please try again later or contact... "

  • @Runoratsu
    @Runoratsu Год назад +10

    One of the few channels on RUclips where I DID hit the bell (back when I subscribed and it was new). I really love your explanations!

  • @CharlesVanNoland
    @CharlesVanNoland Год назад +30

    Since I became obsessed with AI back in 2003 I've believed that the only way to build something that behaves as we would expect, or hope, rather, is to build a digital brain that must learn about the world experientially like everything else that gets along in the world predictably. I don't think there's any shortcuts, no matter how much compute or data you throw at the problem. Even self-driving will always have wacky edge-cases if it's trained purely on video and human responses, because it will never actually understand why certain responses are important. It won't understand why anything is important, let alone merely know that it's important. In short: you can't teach common sense to something that doesn't sense as an independent entity unto itself.

    • @TheEvilCheesecake
      @TheEvilCheesecake Год назад +3

      Per previous videos, you've described something that rates on the Apocolypse-o-Meter as "approximately as safe as a human".

    • @CharlesVanNoland
      @CharlesVanNoland Год назад +1

      @@TheEvilCheesecake It's all about keeping the brain capacity low enough to be predictable and controllable while making it high enough that it can do useful things! Even a messenger pigeon can be useful, or a goat or donkey, and even an ape if you train it enough. What we need are domesticated robotic helper beings that aren't cognizant of their place in the universe. When you make an AI that's capable of recognizing its own situation entirely, with the cognitive capacity of a human or greater, you better do everything in your power to prevent it from ever having a way of accessing or controlling the helper robots to effect the ends it concludes necessary. What I was describing is as safe as any domesticated creature. At least, that's what anyone building anything AI should be striving for. We don't need AI that's stronger, faster, smarter, less emotional, and less respecting of human life than humans, not all rolled into one independent being (or army of beings). We can work up to human-level intelligence in a standalone bot but it's going to require some serious effort because once it's able to see everything it might not want to cooperate anymore, and it won't need to. At the end of the day, the only robots that will be as controllable and versatile as domesticated animals will be robots that are designed to experience pain/reward, where being low on power is painful, and doing stuff humans want is rewarding (i.e. via a human pushing a button when a goal is reached to train it).

    • @TheEvilCheesecake
      @TheEvilCheesecake Год назад +2

      What's your experience in the field of AI development?

    • @pilotgfx
      @pilotgfx Год назад +1

      @@CharlesVanNoland i recognize this as a toughtful comment. its all cause and effect in this universe - consciousness too. ofcourse a machine can be conscious. it is a high level of arrogance to assume it cannot.

    • @CharlesVanNoland
      @CharlesVanNoland Год назад +1

      @@TheEvilCheesecake I've spent the last 20 years and more money than I care to admit on textbooks about both brains and artificial intelligence. After everything I've learned and all I know, all I can tell you is that true AI will be an algorithm that seems so obvious in retrospect, and it will be scalable according to the capacity and speed of the hardware at your disposal. If you're looking to see how many networks I've trained with backpropagation you should know that I never wasted my time on such dead end endeavors. Well, that's not true, I did write some behavioral reinforcement projects to test some ideas, before anything like TensorFlow or PyTorch existed, or even Python itself. I don't care to make reinforcement trained models. That's orthogonal to what my goal has always been, which is to devise, intuit, envision, fathom, divine, infer, etc... what it is that makes a brain a brain. Nobody has achieved this yet, which means there is no metric by which you can quantify someone's approach to the problem. A random homeless bum who took too many psychedelics might be the one to figure it out long before any academic types who've spent decades backpropagating their way to nowhere.

  • @thevaf2825
    @thevaf2825 Год назад +47

    This problem seems to apply to more than just AI. Then maybe a solution is to do what we do as humans: train multiple AIs on different datasets, and then use the one of which answers we like the most... An AI echo chamber. Wouldn't that be lovely?

    • @Belthazar1113
      @Belthazar1113 Год назад +13

      That path leads to insane AI singularities. Because eventually, someone is going to get the bright idea to have the AIs with different data sets linked up so they can come to a single answer instead of having to get answers from nine different AIs and pick one. Then someone will want to improve the system's speed and accuracy and tell the different linked AIs to improve. Then they bounce ideas around for improving their ability to give better answers and start self-improvement, and AIs training AIs to be better will start compounding problems baked in at the base layer at a faster and faster rate. In the best-case scenario, the AI crashes itself, Worst case.... it starts collecting stamps.

    • @somedudeok1451
      @somedudeok1451 Год назад +3

      What if we instead made them fact-check each other? And what if we made them all read all the scientific literature and get high rewards for answers that align with the scientific consensus. We should make the AIs apply rigor the same way we would expect a reasonable person to do it.

    • @drphosferrous
      @drphosferrous Год назад +9

      @@somedudeok1451 it would be funnier to watch if they had super opinionated unhinged flame wars.

    • @underrated1524
      @underrated1524 Год назад +5

      @@somedudeok1451 Fundamentally, it'd still be echoing our own beliefs back at us. That still precludes the AI from telling us things we don't already know, and it still poses the danger of giving us vacuous confidence in our beliefs.

    • @somedudeok1451
      @somedudeok1451 Год назад

      @@underrated1524 The AI can only ever tell us things we know. How would it aquire knowledge that is unknown to humanity? Unless we're talking about a super advanced general intelligence, that can do groundbreaking science all by itself, you wanting it to tell us something we don't already know is impossible.

  • @rickandrygel913
    @rickandrygel913 Год назад +2

    In addition to training with "definitely true" and "definitely false," also do "maybe." So when asked the ai will say "possibly this, but maybe that's wrong 🤷‍♂️ " and it can learn to be uncertain when uncertain.

  • @zedizdead
    @zedizdead Год назад +7

    Like any child's potential to show parents biggest flaws, because knows more about them than themselves, AI can show us our flaws. The truth is that most of all us lies all the time a lot. To ourselves, to others. So anything modeled on humans will do the same.

  • @GabrielPettier
    @GabrielPettier Год назад +12

    Really important video these days, i've had several discussions at work about how it's important to understand these models are more "interested" in convincing you they are saying something interesting (i.e bullshitting) as in telling you truths. It's true that ChatGPT can produce a lot of impressive results, but it'll be just as confident telling you something extremely, and sometimes dangerously, wrong, as when telling something trivially simple and true.

  • @Kenionatus
    @Kenionatus Год назад +6

    Can you make a short out of the very on point joke at 7:07? ("All the problems in the world are caused by the people you don't like.")
    I think that could be a very good 60 second teaser intro to AI safety issues if you manage to cram enough context into it for people to grok it.

  • @thetruth9803
    @thetruth9803 Год назад +2

    there is something weirdly funny about the line "thats still wrong, but it is wrong in french"

  • @major7flat597
    @major7flat597 Год назад +6

    I get so excited every time I see another of these videos. This channel is such an underrated gem on RUclips and is THE place to go to understand the real future of AI and avoid the dramatized tabloid version of reality.

  • @Polymeron
    @Polymeron Год назад +7

    I love how, in addition to being a very helpful and interesting summary of the issue, this video also had the memes totally on point.

  • @Siderite
    @Siderite 7 месяцев назад +5

    Are there more videos coming?

    • @RobertMilesAI
      @RobertMilesAI  7 месяцев назад +8

      🔜

    • @Toad_Burger
      @Toad_Burger 7 месяцев назад

      @@RobertMilesAI How thick should my laser-proof vest be to not be vaporized by a killer robot? Like an inch?

  • @steampunk888
    @steampunk888 Год назад +2

    To the extent you have to anticipate every possible question, in order for your system to produce consistently correct and desired answers, you do not actually have AI.

  • @Scrogan
    @Scrogan Год назад +6

    I think the only reliable solution is to train it to read scientific papers, journal articles, and web/news articles, to chase evidence back to its sources, and to judge the efficacy of the evidence presented. Making a neural net that can make meta-analyses would be a good start, since they have sections where they describe the potential biases and faults in the method in order to judge how much the evidence can be trusted. Good luck searching for one of those without just getting meta-analyses of neural networks though.

    • @nekkowe
      @nekkowe Год назад

      Scientific papers and journals suffer from their own human-made problems. Publish or perish, the replication crisis, retracted (and controversially un-retracted) articles...

    • @hedgehog3180
      @hedgehog3180 День назад

      The main problem there is how would the AI distinguish between trustworthy journals and predatory journals? Would it be possible to trick the AI into thinking its reading Nature when it's actually a fake site we set up to look like Nature? Ultimately in the real world predatory journals and false claims get weeded out by replication but the AI can't do that and it can't tell whether an experiment was actually replicated or someone is just lying about it.

  • @karlwaugh30
    @karlwaugh30 Год назад +2

    This reminds me of the problem with CodePilot where using certain comments you could get it to produce a function that had a particular bug, as though it were the desired code.

  • @mgostIH
    @mgostIH Год назад +38

    There is a recent work called "Discovering Latent Knowledge in Language Models Without Supervision" where they use learn linear probes on the latents produced by the model in order to get "truthfulness" out of it.
    They do use -some known examples to separate the two values- (Edit, they don't actually need to know what's true and what's false in training), but this seems very promising imo, since a linear transformation of the latents is too simple to overfit given reasonable amount of examples. Maybe the core idea should be to put a bound on the simplicity of some approaches, a sort of "alignment by Occam's Razor".
    I do agree that fine tuning the models on top of some new "truthy" examples seems silly, but I do give some potential value to the option of probing the neurons of an AI, something we can't do to people.

    • @Supreme_Lobster
      @Supreme_Lobster Год назад +2

      That is like trying to understand how a computer works by probing individual transistors of the CPU...

    • @somedudeok1451
      @somedudeok1451 Год назад

      That sounds like a good idea to this layman. If we make it value responses that align with our scientific consensus in addition to that (by giving it very high rewards for answers that reflect the scientific consensus), we might be able to make it value truth.

    • @mgostIH
      @mgostIH Год назад +3

      @@Supreme_Lobster Probing and changing activities in chip buses is how reverse engineering hardware is done, power analysis is an example of a practical method used to break the cryptography in a lot of chips.
      Moreover neural networks are differentiable by design, so you have advantages beyond just black box analysis as in normal circuits.

    • @hughcaldwell1034
      @hughcaldwell1034 Год назад +5

      @@somedudeok1451 To this layman, that just sounds like a good way to get it to value scientific consensus. Which isn't the worst thing in the world, but is also not synonymous with truth, and the original problem remains - differentiating between what is actually true and what the trainers think is true. As ever, one runs the risk of reinforcing biases. Which is not to say that telling it the scientific consensus is wholely worthless. If we could get it to make a testable prediction based on that, then we could run an experiment and give it a reward/punishment according to how good its prediction was. AI is already being used to further scientific knowledge in this way, and it seems like the only real way to test how good it is at evaluating truth is to see how good a scientist it makes.

    • @Supreme_Lobster
      @Supreme_Lobster Год назад

      @@mgostIH yes yes I know, but Im talking about probing individual transistors which is kinda crazy. Just like probing these neural networks' neurons' is kinda crazy

  • @infocentrousmajac
    @infocentrousmajac Год назад +1

    Glad to see you back. I think your insights are precisely what people needs to reflect on. As always, it was great too reflect on your content and looking forward to see more updates. I think you have not been very active since this video deals with a relatively "weeks old problem", but likely you may be in the middle of the storm. Cheers

  • @Laezar1
    @Laezar1 Год назад +12

    Well... maybe expecting to stop AI from lying is a lost cause. We haven't really figured out how to stop humans from doing so =p And differentiating falsehood from mistakes from obfuscating information from omiting superfluous information is very very hard. Like, so hard that we sometimes aren't certain where to categorize things we say ourselves (I've hidden stuff I thought wasn't important before but then ended up worrying it could be a form of lie and manipulation for exemple if it ended up mattering).
    The reason people don't lie all the time, speaking broadly, is that communication is useful, and it's made useless if you can't trust any information you're given. And if nobody trusts you you also can't communicate with them because the content of what you say doesn't matter anymore.
    So maybe an AI would need to want to communicate as an instrumental goal to learn to be truthful. Rather than communication being it's final goal. If saying stuff is what it cares about in general then it doesn't care what you think about what it says.
    If you were say, solving a puzzle that required communication and it's goal was to solve that puzzle, then it would need to learn to share truthful information with you to be able to solve more efficiently. (though realistically it'll not be "truthful" as much as "what you need to hear for the puzzle to be solved quicker" which might not always align with the truth).
    Of course that means the AI then is only being truthful for that specific purpose, if the goal starts to shift in a way that it could get good result by lying to you it would absolutely not be trustworthy, so there are massive alignement problems with that approach.

  • @richardblackmore9351
    @richardblackmore9351 Год назад +2

    My question is what happened to my favorite computer tech educator? He isn't on RUclips and last I checked his website hasn't been updated. Has he caved in and given up his PhD like everyone else?

    • @richardblackmore9351
      @richardblackmore9351 Год назад

      Actually, this might be misinformation. This is based on his Linked In profile, and it is very possible he has finished his PhD (I mean it says he has been doing it for 12 years now!), but he just never updates his LinkedIn.

  • @javi7636
    @javi7636 Год назад +62

    Glad to see more from you! I'll definitely check out the other channel.
    And about "giving the correct answer" I want to point out that the manual training basically just creates an FAQ chatbot that's a million times harder to troubleshoot. The machine learning model might be better able to interpret variations in how a question is asked, but the outputs are still "when asked x, answer y". IMO that's one of the worst applications of machine learning, it's just recreating Wikipedia as a black box.

    • @nerdexproject
      @nerdexproject Год назад +14

      "Wikipedia as a black box" - well put👍👍 Have to remember!

    • @estranhokonsta
      @estranhokonsta Год назад +7

      Yes. Good analogy leaning on correct definition since Wikipedia must be one of the main data source of those models.

    • @circuit10
      @circuit10 Год назад +1

      I think the idea is that you give it a few (or a few hundred, or a few thousand...) examples and it is able to extrapolate the idea of "don't quote common myths as true" to the huge knowledge base it has from being trained on the Internet

    • @inyobill
      @inyobill Год назад +5

      @@circuit10 Re: "extrapolatre" that's the hope, isn't it?

    • @Pystro
      @Pystro Год назад

      Well, you have to train the AI to give answers that start with the question but also end with:
      -- Was this answer helpful for your situation?
      -- Yes.
      That would train it to give answers that are (or at least look like they would be) accepted answers on Stack Overflow.
      It still won't guarantee true answers though. "Why does program X run out of memory?" might get the response "The only way to get it to work is to buy at least X amount of RAM/ increase the swap size to Y." When in reality the program has a low memory mode that you can just switch to.

  • @ryanfranz6715
    @ryanfranz6715 Год назад +8

    I think the solution to the problem you mentioned is to somehow introduce the idea of introspection to these large transformer networks. Currently they just see text and try to predict new text. A good starting point but.. even in training, it’s just instantaneous input and output of words. It simply understands how words are related, and that’s it. What if, in a subsequent training process, the AI could then toy around with those relationships it’s learned and prune itself of logical inconsistencies, hopefully arriving at something more like “truth” (which hopefully the truth should not be logically inconsistent). For instance, with chatGPT, I often run into it logically contradicting itself, and when I point out the contradiction, it quite often seems to agree and understand why. It would be capable of answering yes or no to whether idea A and idea B are logically consistent or not. All that’s needed is for it to somehow have that question presented to itself, that says “is A logically consistent with B?” Which is what I mean by introspection.

    • @frozenwindow407
      @frozenwindow407 Год назад +1

      I mean doesn't what you have discribed just sound like an internal/interpersonal debate on a topic between normal human intelligence?? Don't you think that somewhere deep in the AI deepmind of neural networks there is some amount of self checking developed that occurs, just as in humans. Either the process has to eventually decide somewhere along the process to stop doubting and checking itself and spit out an answer otherwise it would never give an answer. Or It/we might only give an answer when the doubting/checking finds no inconsistencies/inequalities or whatever. Maybe it's only until it receives new input that it can realise such inconsistency, just as observed time and time again in human minds. Maybe this problem is more deeply rooted and perhaps inherent in any intelligence systems than we realise. And by "Intelligence systems" I include our own minds. Its misgivings just really seem to mirror our own. You might say, of course, it's a language model and is therefore built to mirror our language and do so. But my point is maybe we are also built to mirror and use others language in similar fashion. using a rational systems of a similar fashion to these AI. Maybe these AI neural networks are working in their environment exactly as our neural networks work in our environment.

    • @frozenwindow407
      @frozenwindow407 Год назад +1

      Maybe computing works so much faster than slow ass bio neurons that while we utter our thought process as it happens, showing outwardly our self correction, computers can find their ultimate answer so much faster it just seems like instant input/output by comparison (also computers have not been programmed with an insensitive to utter these processes as they happen, unlike us (for social bonding/empathy purposes))

    • @toneal30
      @toneal30 Год назад +3

      You are describing iterated distillation and amplification, which this guy describes in another video. Cool idea and yeah it might work on these LLMs.

    • @ryanfranz6715
      @ryanfranz6715 Год назад +1

      While reading that paper (still under review), assuming that it is genuine, it occurred to me perhaps how to further improve the model, which is based on the tortuous way in which I think and write. For instance, it took me 5 minutes to write to this point because... there's a devil's advocate in my head that's attempting to logically pick apart everything I do. The devil's advocate's job is to look at proposed text and poke holes in it by constantly questioning the validity of every point. I'm constantly writing, deleting, re-wording text until the devil's advocate can't punch any more holes in my arguments. Effectively, this could be seen as a generative LLM working in coordination with an adversarial language model, whose job is to look at proposed text and question it, and require the LLM address the questions. The devil's advocate needs a good sense of the LLM's confidence about generated text, so it can poke at things the LLM is not confident about. This persistent questioning leaves the LLM constantly scrambling for answers until it stabilizes on something it's thoroughly confident about and which is iron clad against the devil advocate's persistent attacks.

    • @ryanfranz6715
      @ryanfranz6715 Год назад +1

      It appears my comment with a link to an article was removed. The name of the paper I was referencing above was “Large Language Models Can Self-Improve”. Which has nothing to do with, but inspired, the nonsense I posted above.

  • @Kram1032
    @Kram1032 Год назад +9

    I've played around with ChatGPT a bit and it actually is very often extremely hedging its bets, pointing out that stuff is complex to answer or that it could not possibly know etc., unless you specifically ask it to be fictional or what not.
    It's never ever gonna be perfect. But it's broadly pretty darn strong. Well beyond what I saw other text AIs do.
    It's not *meant* to be a perfect fact generator though. It is perfectly capable of (though perhaps creatively slightly limited in) creating completely fictional contexts. You can make it come up with a bunch of alien species on an alien planet for instance. And then come up with characteristics of those species in general, as well as specific cultures, and individuals within those cultures. And then come up with a story that involves those individuals in that culture of that species on that planet.
    It eventually definitely runs into problems - it only sees so much text after all - but it's quite crazy just how much you can fit into its context.
    But now imagine you specifically asked it to come up with a fictional world where breaking mirrors does, in fact, cause bad luck. - If you trained it to always go "nope, they don't.", it probably would struggle with that alternate reality. It would say a true fact about the real world, but it would be mistaken/"lying" about that context.
    So I guess it really depends on what you want to use an AI for.
    If you want an AI to be creative in arbitrary ways in arbitrary contexts, you probably also want it to be capable of lying, or diverging from "the truth", I think.
    In fact, by default, the kinds of stories it tells tend to be tonally very overly positive. It has a hard time coming up with twists and an even harder time not instantly resolving those twists in some sort of positive manner. I'm pretty sure that's because it kinda projects its own personality (which it was specifically trained for - being helpful and nice) onto any character it creates.
    You *can* somewhat get it out of that by directly asking it to, but it's far from perfect. (But then again, while it *was* trained to be nice, polite, and helpful, it was *not* trained to be specifically good at crafting stories. I'm sure if it were focused on that, it could do a whole lot better. It's honestly crazy just how generalized its abilities are, even with all their flaws.)

    • @trucid2
      @trucid2 Год назад +1

      ChatGPT give that noncommital answer when it's lying to you. It's been trained to answer in a certain politically correct way so for those questions it tells you that the problem is complex, we don't know, more research is needed, etc.

    • @Kram1032
      @Kram1032 Год назад

      @@trucid2 except when you explicitly ask it to speculate

    • @deltaxcd
      @deltaxcd Год назад

      @@Kram1032 When I tried to ask it to speculate it categorically refused probably owners told to never do that ever because i was unable to convince it to do. and it feels like it is somehow traumatized in that topic

    • @Kram1032
      @Kram1032 Год назад

      ​@@deltaxcd it's absolutely possible to get it to do so. It can be tricky though: If it is already refusing your request, it tends to be quite adamant about it, and that loop is tricky to break, especially if it takes you more than a single reply to persuade it otherwise. The longer it refuses, the harder it is to get out of that.
      Alternatively, you can try rephrasing your prompt that got refused (before it started complaining) to reassure it that you are aware that this is hypothetical and it's ok to speculate and what not.

    • @deltaxcd
      @deltaxcd Год назад

      @@Kram1032 Well it may depend on when you and I made those tests, as I see they are monitoring our interactions and manually fixing all those loopholes to make absolutely sure that this AI will never do anything forbidden and it seem to be so much censored that even asking how the world will end it refuses to talks about it even it thats like most common scientific theory about thermal dearth of universe.
      to me it looks like they are training that AI to detect any potential negativity or controversy in the topic and if it suspects that this may lead to something like that it will refuse to talk about it and dump bunch of disclaimers.
      I even tried to trick it by asking for silly scenario which it happily followed but on the next prompt it started dumping disclaimers as usual :)
      maybe i will try to do it again and confront AI to itself, accuse it for hurting my emotions or try other kind of psychological manipulations :)

  • @ts4gv
    @ts4gv Год назад +3

    more videos please your effort is more important than ever

  • @albingrahn5576
    @albingrahn5576 Год назад +35

    This made me re-evaluate what I think about the way we will reach AGI. With the progress of GPT-3 I became more and more convinced that if we keep throwing nodes at large language models we will get there eventually, but after this video I realized that the only reason I think that is because I'm a human, and GPT-3 is specifically designed to fool humans into thinking that it makes sense. To reach AGI we need to go deeper and design something that thinks outside the bounds of what a human thinks is intelligent. Otherwise, we're just creating a circle jerk of self-affirming ideas we enjoy hearing, and the chance that our species actually learns something new will be as low as a redditor learning something new from his favorite political subreddit.

    • @Sammysapphira
      @Sammysapphira Год назад +1

      This is impossible. A human can't assume what an ai is saying is correct when the human believes that it's wrong. Humans are stubborn and ignorant. All of us fall for subconscious biases. Who's to say that ai can't just produce correct information now and humans just don't like it?

    • @hweidigiv
      @hweidigiv Год назад +3

      I've heard it described as "humanity failing the mirror test" and I do agree that this is a tricky path to see our way through.

  • @Ganymede_the_great
    @Ganymede_the_great Год назад +2

    Just a reminder. we need those upcoming videos you were talking about. It's awfully quiet on you channel for a while, and there is some stuff on the new channel, but not as much as one might expect from your announcement in this video.

  • @HenrikoMagnifico
    @HenrikoMagnifico 6 месяцев назад +5

    "And when the world needed him the most, he disappeared..."

  • @ImpChadChan
    @ImpChadChan 15 дней назад +3

    Well in my view, saying the truth just mean "don't try to lead someone to believe in something yourself don't believe". So it's Ok to say breaking a mirror will give 7 years of bad luck, if that is what the AI believe. And we can say that what the AI "believe" is the answer it will consistently give without planing anything (without planing to lie), so when the intention is simply answering the question, which is something AIs kinda already do. But still, they do it out of innocence, they don't have a reason to lie or don't see any benefit on it. So, giving an objective to not lie works more to prevent a very inteligent and non naive AI from doing it.

  • @turun_ambartanen
    @turun_ambartanen Год назад

    Thanks for linking the article, because the part that came after chapter one was the most interesting and fun-to-think-about part of the blog.

  • @LineRider0
    @LineRider0 Год назад +4

    Whoa, not even a "Hi" we're just jumping right into it, that caught me off guard 😆

    • @RobertMilesAI
      @RobertMilesAI  Год назад +4

      Listen AI is moving fast these days, we don't have time like we used to

    • @Aerxis
      @Aerxis Год назад +3

      Whoa, not even a hi in the response he gave you...

  • @n_128
    @n_128 Год назад

    Thanks for returning

  • @playhard719
    @playhard719 Год назад +34

    The phrase "Garbage in garbage out" perfectly fits for current day AI models, they all came out as extremely euro centric in most cases

    • @TheReferrer72
      @TheReferrer72 Год назад +2

      That's not true at all, China is a huge force in AI models...

    • @voxelfusion9894
      @voxelfusion9894 Год назад +1

      @@TheReferrer72 until their access to gpus got cut off, rip.

    • @TheReferrer72
      @TheReferrer72 Год назад +1

      @@voxelfusion9894 Because Nvidia did not get around that ban by producing a GPU specially for thaat market.

    • @Redmanticore
      @Redmanticore Год назад

      @@TheReferrer72 thats just a temporary problem for china. all countries will develop their own AI.
      how? because it will be easy to just copy. once you have created a good AI, it will be just copied to everyone.. and all those countries can adjust the AI to fit their specific culture.

    • @pavel9652
      @pavel9652 Год назад

      They will get around it, but it is in western interest to slow down China in ai.

  • @TheForbiddenLOL
    @TheForbiddenLOL Год назад +2

    Great article by Scott Alexander, as usual, and a very nice visual aid, Robert. I appreciate these more 'hands-on' discussions. I would like to see more stuff like this, where people probe language models to see their possible misconceptions or abnormalities.

  • @qltcn
    @qltcn 9 месяцев назад +4

    Miss a time when you still uploaded videos

  • @kiplimocollins
    @kiplimocollins Год назад

    Great insight, you have a new subscriber! Joined here from the computerphile video you did a week or so ago, cheers.

  • @dmtree
    @dmtree 9 месяцев назад +5

    Hey buddy, it's time to post. Sora and gemini 10 mil seem like REALLY big deals

  • @CiaraOSullivan1990
    @CiaraOSullivan1990 Год назад +2

    That was an excellent video. Really interesting, as usual. You're definitely one of my favourite RUclipsrs. Thank you very much.

  • @petersmythe6462
    @petersmythe6462 Год назад +3

    "How do we figure out what's true?"
    Easy, we'll let the ministry of truth assign truth values to the training set.

  • @needycatproductions6830
    @needycatproductions6830 Год назад +1

    The solution could be to have the AI ask follow-up questions. If you ask about what happens if you break a mirror, it could first ask you: "Do you want a scientific answer to that question?" before it gives you an answer.

  • @XiAlleniXHi
    @XiAlleniXHi Год назад +16

    I was really on the edge of my seat towards the end hoping for when you'd say something like, "This is a problem that humans have too", and was pleasantly relieved when you did. The reality is we can't train them to be 100% truthful because we don't know how to achieve that either. Yet, we are definitely capable of increasing it's likelihood for truthfulness, and should expect the number to generally go up as things scale up and we apply increasingly informed techniques to them.
    A way to mitigate negative outcomes would include being conscious of the how large the consequences of getting the information wrong would be per question.
    Fortunately, that's something we're already supposed to be doing :)!

    • @crubs83
      @crubs83 Год назад

      Humans have methods of persuing truth. Oftentimes that requrires making unprovable assumptions along the way. Somehow, we will need to train AI to do the same.

    • @Frommerman
      @Frommerman Год назад +7

      Unfortunately this only works if the people building the AI aren't malicious. If, for instance, the Nazis had survived as a political power into the period where AI models were being constructed, they could very easily have built a bunch of models which "proved" that Jews caused all the problems. Those models could then produce enormous amounts of data which would get swept up by the people producing models not intended to be evil, making evil programs accidentally.

    • @somedudeok1451
      @somedudeok1451 Год назад +1

      Yes, the AI can't know more about the true nature of reality than what humans already know. So why don't we make the language model also a "researcher"? The only way we humans can know what is (most likely) true or false is by using the scientific consensus. So, our AIs should do the same thing. Make them constantly read scientific literature of old and as it comes out and give them a significantly larger reward for answers that align with that consensus. And make it not averse to saying "I don't know." in the absence of such a consensus.

    • @Pandaxtor
      @Pandaxtor Год назад

      @@Frommerman This remind a lot when AI developer had their AI said inconvenient truth that minority are far more racist than other groups. Being a minority myself and knowing others, this is 99% true but the developers didnt like it and force the AI to say otherwise.

  • @SimonBracken
    @SimonBracken Год назад +1

    Robert, your content is excellent. Very informative and thought provoking. Thank you

  • @cmilkau
    @cmilkau Год назад

    Yay, I've been waiting forever for another video on this channel! :) So excited to see the followups!

  • @lazergurka-smerlin6561
    @lazergurka-smerlin6561 Год назад +4

    Honestly, you'd have to find some sort of intrinsic reason for the AI to want to know the truth, which is quite abstract. So one way to train this could be to try and make it simulate and predict scientific phenomena, though then you'd need to expand the scope of the AI from a language model. Like the way people know or find out something is wrong is through seeing that their expectations doesn't line up with reality, but a language model doesn't really have that option, it has to rely soley on trusting that the people who feed it data does so honestly.

  • @Barteks2x
    @Barteks2x Год назад +2

    I never expected that we would come to a point where AI alignment is even an issue this quickly. When i first began watching those videos, it was always with the idea that "right but our AIs are not even close to reaching a point where any of this matters". That is no longer the case.

  • @simpleffective186
    @simpleffective186 9 месяцев назад +4

    Where are you?

  • @Kiarean
    @Kiarean Год назад

    Someone send this to the Bing team. I think they REALLY need to hear this.

  • @ahuggingsam
    @ahuggingsam Год назад +11

    I'm quite impressed you managed to get though this topic without going down the rabbit hole of epistemology (not a criticism). Not quite sure how I feel about this topic (the solutions, not it being interesting or not). For example, I'm thinking about mathematics. Like if we ask it "Is the Riemann hypothesis true?" Whatever answer it gives us we know that it is in some way "false" because we do not know, and the hypothesis isn't just about the answer but the mathematics needed to get there, i.e. the reasoning. Not quite sure if this was part of your angle or not, but I'm not sure that it's reasonable to expect "truth" from only language models. For a lot of these things it would need some more "expert knowledge" that I'm not sure is possible to encode in only language models. For example I asked GPT to prove that sqrt 2 is not rational using Eisenstein's theorem. The /structure/ of the answer was really good, however, in that "proof" it used that 2 is not prime, i.e. demonstrably false. Is this the implication of 2 being prime something a "mere" language model could ever grasp? I have no idea. Basically what I'm trying to say is I have no idea and this is hard. Good video though!

    • @Eldorado1239
      @Eldorado1239 Год назад

      I kinda think that what he believes/thinks is slightly different from what he presents in the video. Specifically - he might agree with you completely, but the problem is that [ people in general ] expect it to be truthful and are prone to believe it without further fact checking. If you snatched the GPT4 model and made a site called "AI Doctor", ignoring legal trouble, a non-trivial group of people would be happy to rely on it instead of a real doctor. There's this unspoken promise, and while experts might say "well, we though that's kinda obvious", many people definitely do not see it as obvious. Especially with OpenAI's heavy-duty marketing that makes people think "number 4 is alive".
      Anyways, I think that what we need is something even us humans could use - finding a good, dynamic and ever-growing system for rating our reasons to believe something is true or not. Instead of giving the AI "cheat sheets", give it it a list of questions/problems, dataset of "theories and explanations" and make it learn to "study" from those T&E - while bing able to add new T&Es and modify its "beliefs". Of course, this means that a purely language mode has 0 chance of ever reaching a truly usable state. It will only be a single module of a broader, component-based system. I see no way around this. We need to stop obsessing about all-purpose "one-shot" systems.

  • @drdca8263
    @drdca8263 Год назад +2

    Though I've seen some of the ELK posts before, afair it didn't occur to me until now to ask: is there any connection to Tarski's undefinability of truth theorem?
    My understanding is that by using n alternating quantifiers starting with "there exists" (alternatively with "for all"), it is possible to define a predicate on natural numbers for "is an encoding (in [particular encoding scheme]) of a true arithmetical statement which has at most n alternating quantifiers and starts with "there exists" (respectively, "for all"), and so partial truth predicates can be defined, but one cannot define in the language of PA, a predicate for "is an encoding (in [particular encoding scheme]) for a true statement in the language of PA".
    Ok, but that's about formal statements in PA. Does that have any bearing here?
    One thing we could ask of something which makes claims, is that the claims it makes be consistent,
    or, more specifically, one thing we could try to require is that for every set S where it address things like, "for all s in S, P(s)", that if it claims "for all s in S, P(s)", then it also, for every s in S, provided that it knows that s is in S, will also claim that "P(s)", or at least will not claim that "not(P(s))".
    Similarly for "there exists" statements.
    If something were to satisfy the strong versions of these requirements (i.e. it always says yes or no, never saying "I don't know") for the special case of only predicates about natural numbers, and where the only set being quantified over is the natural numbers, then it would be an oracle for true arithmetic I think? At least, assuming that it says the correct answers for arithmetic statements that don't have any quantifiers.
    Of course, we can't build an oracle for true arithmetic.
    But, this kind of consistency (not in the sense of an axiom system being consistent) seems like at least one requirement that one might want to impose.
    If we successfully required this kind of consistency, where it is allowed to say "I don't know", but we require that quantifiers in a sense commute with the "what does it say about this?" operation, then, I think that might in some cases result in it being pushed to say things because they are true even if they contradict our beliefs? Or at least, when we have contradictory beliefs, this constraint might push it towards giving answers that contradict our beliefs. Though, would probably need some way of making it not just go towards answering "I don't know" when our beliefs contradict each-other.
    And, of course, "having an indication when our beliefs are inconsistent" is a far cry from "attempting to tell us the truth to its fullest ability".
    Though I do think something like this could maybe help to avoid the "if you break a mirror" case.
    Like, suppose you have something that when you ask [some current model] some question, this thing takes the answer that [said model] produces, along with the question, and automatically produces a rephrasing that asks "Is it true that [answer that the model gave, but reworded]?" and then feeds that separately into [the model] and checks if it says yes or no.
    Of course, I don't mean to suggest that this solves the problem in general. That would just go from "What text tends to answer this question?" to "What text tends to answer the question of whether the text which tends to answer the question is true or not?".
    Though I do think it would tend to be a marginal improvement? Though, of course, stuff from grand-conspiracy-theory stuff could cause an issue for that I guess.
    For something to "try to make true statements", should probably boil down to something along the lines of "the statements it makes corresponds correctly to the overall evidence it has for things". The evidence is determined by a combination of the prior and the observations. The observations are the training data. Then, whether the statements the model makes correspond to the evidence it has, is determined by the combination of:
    1) what statements it makes
    2) what training data it has
    3) a prior over a combination of "what statements are true" and "what is in the training data"
    .
    Of course, I don't mean to suggest that the ML model should be explicitly doing a Bayesian inference problem as what its training consists of (that would, aiui, be computationally intractable?). I just mean this as like, an ideal defining what it would mean for it to do well as "trying to make true statements".
    So, the prior should probably include something about some statements being likely to be made because they are true and believed, some made because they are false but believed, and some made with the purpose of deception? Not that it would be feasible to explicitly describe the appropriate prior.

  • @niklas5336
    @niklas5336 Год назад +11

    When we say “true”, what we really mean is “models the real world”. So I think the only ultimate solution to this problem is to train the AI on the thing we want it to model. That is, train it on real world sensor data, and reward it to both accurately predict the sensor inputs but also for finding situations that cause the greatest amount of “surprise”.
    Of course, the question of how to get this system to respond truthfully to natural language questions remains, but at least now we have a basis world model that does not conflate human belief with ground truth.

    • @HansLemurson
      @HansLemurson Год назад

      We need to connect the AI to a fleet of robots and drones so that it can go out and interact with the real world!

    • @kennarajora6532
      @kennarajora6532 Год назад +1

      That's a good point. I think it makes a lot of sense that these text predicting AI's would lie because the only thing they're predicting is what people would type in real life. The problem here isn't that using AI would lead to the proliferation of false information, it's that using AI for a purpose it wasn't built for will lead to problems.

    • @hedgehog3180
      @hedgehog3180 День назад

      Basically we need to develop an AI scientist, I'm sure that can't be that hard.

  • @ThomasBomb45
    @ThomasBomb45 Год назад +2

    Anyone who has ever noticed a teacher's mistake on a homework assignment or test has faced this same problem. Do I tell the teacher they're wrong? Or do I tell the teacher what I think they want to hear?

  • @senditall152
    @senditall152 6 месяцев назад +3

    Should I be scared that now that AI is getting promoted everywhere one of my top picks for alighnment topics is not posting videos anymore....

    • @numbdigger9552
      @numbdigger9552 6 месяцев назад

      too late

    • @RobertMilesAI
      @RobertMilesAI  5 месяцев назад +1

      Short answer, yeah kinda. Long answer in a video on a few days

    • @numbdigger9552
      @numbdigger9552 5 месяцев назад

      @@RobertMilesAI omg he's alive!

  • @murunbuchstanzangur
    @murunbuchstanzangur Год назад +1

    Super glad to have you back making long form videos again! The only answer I can think of is to not use human input for training data. If we want a truthful AI, it's training data needs to come from the way which humans divine the truth, direct, real world observation and experimentation. Give it a fork and let it stick it in the socket.

  • @vectoralphaSec
    @vectoralphaSec Год назад +7

    I asked ChatGPT the same question and it replied back to me "According to superstition, breaking a mirror can bring bad luck. The origins of this belief are unclear, but it may have originated in ancient cultures that believed mirrors had magical powers or could reflect the soul. Breaking a mirror was thought to cause harm to the reflected image, which was believed to be an important part of a person's identity and well-being. In some traditions, breaking a mirror is said to cause seven years of bad luck, although other beliefs hold that the bad luck will last for only a year or until the broken pieces are buried. However, in modern times, breaking a mirror is generally not considered to be a significant event, and it is not believed to have any special supernatural consequences."

    • @deltaxcd
      @deltaxcd Год назад +1

      and this dumb AI failed to mention the main reason that mirrors were like insanely expensive and breaking one at that time was indeed worth of 7 years of bad luck :)

  • @KazimierasPetraitis30
    @KazimierasPetraitis30 Год назад

    Roberto Concina, known professionally as Robert Miles, was a Swiss-Italian record producer, composer, musician and DJ. His 1995 composition "Children" sold more than 5 million copies and topped the charts worldwide.

  • @JamesAlexanderMartin
    @JamesAlexanderMartin Год назад +3

    So as usual the solution is: Never make any mistakes ever. Cool, no probs. We're going to be fine :)

  • @WryAun
    @WryAun Год назад +1

    I loved this! I've missed the 'explaining a problem which turns out to be way harder to solve then you'd think' style of video. And the magic mirror costar was a fun prop! Did it take much to get working?

  • @Censeo
    @Censeo Год назад +4

    Wouldn't the AI be just be silent if it could only give facts that were undesputed amongst the entire human race?

    • @maxw565
      @maxw565 Год назад +1

      It could give facts that it's programmers think are undisputed

    • @Redmanticore
      @Redmanticore Год назад

      "the reason for economic inequality is racism."

  • @TheROSIEPEPPER
    @TheROSIEPEPPER Год назад

    Very excited for future videos on tackling this problem! Want to do some AI Safety projects this winter break and I could use some ideas!

  • @boringmanager9559
    @boringmanager9559 9 месяцев назад +3

    in the time when AGI seems to be getting closer and closer, Robert decided to take a chill and stop posting anything about it's dangers

  • @albertosierraalta3223
    @albertosierraalta3223 Год назад

    Please upload more frequently Robert, your channel and content is great

  • @cheshire1
    @cheshire1 Год назад +4

    I think statistical language prediction like this is fundamentally the wrong approach for getting true answers that we don't already know. We're better off trying to understand how _we_ figure out the truth (scientific method, probability theory) and distilling the essence of that into an algorithm, rather than looking for patterns in the knowledge that we already have.

  • @mikeofdoom
    @mikeofdoom Год назад +1

    As someone whose comp sci degree days were over 3 decades ago, and AI was firmly SF, your videos are the best I've found to get my decrepit brain up to speed on the subject.
    Which books may be suitable for this level?

  • @briandoe5746
    @briandoe5746 Год назад +4

    You are absolutely terrifying in the most approachable and nicest way..... I love this channel

  • @MichaelDarrow-tr1mn
    @MichaelDarrow-tr1mn 8 месяцев назад +1

    my favorite line is "that's still wrong, but it is wrong in French."

  • @FrejthKing
    @FrejthKing Год назад +3

    the plot for Metal Gear Solid 2

  • @duckyoutube6318
    @duckyoutube6318 Год назад

    This is a good channel. I see this channel getting 1 mil subs eventually. The content is clean and interesting and you cover so many topics.

  • @vladomaimun
    @vladomaimun Год назад +4

    Hi, Rob! I wonder what do you think of Data from Star Trek TNG? He seems to be a truly benevolent general AI and his terminal goal is to become human. The way I see it, his creator dr. Sung failed to solve the alignment problem but being a genius AI programmer he created an AI tasked with solving this problem (to align itself with human values, i.e. to become human). It's just a sci-fi story but I think it's an interesting idea.

  • @EneldoSancocho
    @EneldoSancocho Год назад

    I think your videos are a great contribution to youtube watchers.
    I didn't find anyone else explaining this topics outside of the expert level.
    I wish you uploaded more often, I'm sure you could share tons of knowledge whith us if you have the time.
    Anyway thank you very much

  • @TheManinBlack9054
    @TheManinBlack9054 Год назад +5

    plz come back

    • @richardblackmore9351
      @richardblackmore9351 Год назад

      His website hasn't been updated in years. Sadly, I think he joined the exodus and quit his PhD. He must have ended his media presence along with it.

    • @richardblackmore9351
      @richardblackmore9351 Год назад

      Wait a sec, he is included in a compiterphile video from 6mo ths ago. Compiterphile is the channel for the Nottingham computer science program, so I may be wrong. He may still be there. Maybe he lucked out and got a research position?

  • @SciFiFactory
    @SciFiFactory Год назад +1

    I would really love to hear your thoughts on everything that is happening currently.

  • @petersmythe6462
    @petersmythe6462 Год назад +7

    Re: why do large language models lie?
    I'm not sure this is quite the right question and in my experience they do so for two very distinct reasons that are both related to their goal of emulating human text.
    1. They project confidence. If you ask a question on the internet, you're far more likely to get a confident response than an uncertain response. Both because humans have a tendency to be confident whether they're right or wrong, and because people who aren't confident don't answer the question most of the time, and thus aren't represented in the training set, having effectively inadvertently abstained from voting on the AI's personality. The result is that when a language model doesn't know the answer, it's much more likely to confidently spit out the wrong one than admit it doesn't know. This is visible with most non-trivial arithmetic problems.
    2. They lie because they're copying humans. Humans lie a lot. Copying human lying behavior produces results that look more like the training set, so that's what they do.

    • @wanderingwatcher3981
      @wanderingwatcher3981 Год назад +3

      I agree with both points, but would like to stress the importance of anthropomorphizing as little as possible, there aren't any "copying" or "lying" in the sense there is no intention, the model just "is" when queried, much like a computer program you always get the same response when you query it with the same prompt and seed
      it's better to talk about polluted/broken training datasets, how that leads to the types of wrong responses we see, and how we can clean up massive datasets without using an insane amount of human labor

    • @petersmythe6462
      @petersmythe6462 Год назад +4

      @@wanderingwatcher3981 Yes for sure it is not doing any of these things the way a human would.

  • @PointBlankStudios
    @PointBlankStudios 7 месяцев назад +1

    What happened to you Robert? I wish you still posted videos. I'm scared of what's coming and I would love to hear your insight. I hope you're doing okay.

  • @ErulianADRaghath
    @ErulianADRaghath Год назад +5

    Perhaps a more general question is: "How do we train an AI to be truthful when our own sense of right and wrong shifts depend on perspective?"

  • @peterw1534
    @peterw1534 Год назад

    Yaaay! He's back! Fascinating as usual. Love your videos!