Quantilizers: AI That Doesn't Try Too Hard

  • Published: 3 Jun 2024
  • How do you get an AI system that does better than a human could, without doing anything a human wouldn't?
    A follow-up to "Maximizers and Satisficers": • AI That Doesn't Try To...
    The Paper: intelligence.org/files/Quanti...
    More about this area of research: www.alignmentforum.org/tag/mi...
    With thanks to my excellent Patreon supporters:
    / robertskmiles
    Timothy Lillicrap
    Gladamas
    James
    Scott Worley
    Chad Jones
    Shevis Johnson
    JJ Hepboin
    Pedro A Ortega
    Said Polat
    Chris Canal
    Jake Ehrlich
    Kellen lask
    Francisco Tolmasky
    Michael Andregg
    David Reid
    Peter Rolf
    Teague Lasser
    Andrew Blackledge
    Frank Marsman
    Brad Brookshire
    Cam MacFarlane
    Vivek Nayak
    Jason Hise
    Phil Moyer
    Erik de Bruijn
    Alec Johnson
    Clemens Arbesser
    Ludwig Schubert
    Allen Faure
    Eric James
    Matheson Bayley
    Qeith Wreid
    jugettje dutchking
    Owen Campbell-Moore
    Atzin Espino-Murnane
    Johnny Vaughan
    Jacob Van Buren
    Jonatan R
    Ingvi Gautsson
    Michael Greve
    Tom O'Connor
    Laura Olds
    Jon Halliday
    Paul Hobbs
    Jeroen De Dauw
    Lupuleasa Ionuț
    Cooper Lawton
    Tim Neilson
    Eric Scammell
    Igor Keller
    Ben Glanton
    anul kumar sinha
    Duncan Orr
    Will Glynn
    Tyler Herrmann
    Tomas Sayder
    Ian Munro
    Jérôme Beaulieu
    Nathan Fish
    Taras Bobrovytsky
    Jeremy
    Vaskó Richárd
    Benjamin Watkin
    Sebastian Birjoveanu
    Andrew Harcourt
    Luc Ritchie
    Nicholas Guyett
    James Hinchcliffe
    12tone
    Chris Beacham
    Zachary Gidwitz
    Nikita Kiriy
    Parker
    Andrew Schreiber
    Steve Trambert
    Mario Lois
    Abigail Novick
    heino hulsey-vincent
    Fionn
    Dmitri Afanasjev
    Marcel Ward
    Richárd Nagyfi
    Andrew Weir
    Kabs
    Miłosz Wierzbicki
    Tendayi Mawushe
    Jannik Olbrich
    Jake Fish
    Wr4thon
    Martin Ottosen
    Robert Hildebrandt
    Andy Kobre
    Poker Chen
    Kees
    Darko Sperac
    Paul Moffat
    Robert Valdimarsson
    Marco Tiraboschi
    Michael Kuhinica
    Fraser Cain
    Robin Scharf
    Klemen Slavic
    Patrick Henderson
    Oct todo22
    Melisa Kostrzewski
    Hendrik
    Daniel Munter
    Alex Knauth
    Kasper
    Rob Dawson
    Ian Reyes
    James Fowkes
    Tom Sayer
    Len
    Alan Bandurka
    Ben H
    Simon Pilkington
    Daniel Kokotajlo
    Diagon
    Andreas Blomqvist
    Bertalan Bodor
    David Morgan
    Zannheim
    Daniel Eickhardt
    lyon549
    HD
    Ihor Mukha
    14zRobot
    Ivan
    Jason Cherry
    Igor (Kerogi) Kostenko
    ib_
    Thomas Dingemanse
    Stuart Alldritt
    Alexander Brown
    Devon Bernard
    Ted Stokes
    James Helms
    Jesper Andersson
    Jim T
    DeepFriedJif
    Chris Dinant
    Raphaël Lévy
    Johannes Walter
    Matt Stanton
    Garrett Maring
    Anthony Chiu
    Ghaith Tarawneh
    Julian Schulz
    Stellated Hexahedron
    Caleb
    Scott Viteri
    Clay Upton
    Conor Comiconor
    Michael Roeschter
    Georg Grass
    Isak
    Matthias Hölzl
    Jim Renney
    Edison Franklin
    Piers Calderwood
    Krzysztof Derecki
    Mikhail Tikhomirov
    Richard Otto
    Matt Brauer
    Jaeson Booker
    Mateusz Krzaczek
    Artem Honcharov
    Michael Walters
    Tomasz Gliniecki
    Mihaly Barasz
    Mark Woodward
    Ranzear
    Neil Palmere
    Rajeen Nabid
    Christian Epple
    Clark Schaefer
    Olivier Coutu
    Iestyn bleasdale-shepherd
    MojoExMachina
    Marek Belski
    Eric Eldard
    Eric Rogstad
    Eric Carlson
    Caleb Larson
    Braden Tisdale
    Max Chiswick
    Phillip Brandel
    / robertskmiles
  • Science

Comments • 947

  • @OlleLindestad
    @OlleLindestad 3 years ago +505

    I love how AI safety is an entire academic field that can seemingly be reduced to an endless game of "okay, but what about THIS strategy?" "Nah, that wouldn't work either..."

    • @stampy5158
      @stampy5158 3 years ago +28

      There is a lot of that, but there's also the "we probably need to understand a bunch of specific areas of philosophy and mathematics much better before we can generate strategies which have a realistic chance of working" crowd (e.g. intelligence.org/research-guide/).
      -- _I am a bot. This reply was approved by frgtbhznjkhfs, plex, and tenthkrige_

    • @ShapelessMonstrosity
      @ShapelessMonstrosity 3 years ago +80

      Sounds like we need to create an AI to solve the problem of AI safety! Keep letting it try strategies until it finds one that is safe! /s

    • @JindraAG
      @JindraAG 2 years ago +54

      the issue is that the AI field runs into some major unsolved problems of philosophy, ethics, sociology, and psychology. Fundamentally, the only reason we aren't running into these issues with other people is a simple lack of capacity, which an assumed AI would be able to get around.

    • @ParkerTwin
      @ParkerTwin 1 year ago +3

      But would it? Assuming there is direct competition from other AIs with conflicting goals, there would not be enough resources between them both. This issue is the same with humans; we have infinite desires and only a finite world. It's not unlikely that multiple AIs would decide to form a society in pursuit of a common goal.

    • @GhaWasTaken
      @GhaWasTaken 1 year ago +1

      @@ParkerTwin or, an AI will figure this out and try to kill off all the humans so that they won't build a competing AI

  • @austinfauteux4388
    @austinfauteux4388 3 years ago +698

    I missed you.

    • @deviljelly3
      @deviljelly3 3 years ago +17

      The philatelists didn't....

    • @zhangalex734
      @zhangalex734 3 years ago +4

      @@deviljelly3 underrated comment

    • @ironlegnebula
      @ironlegnebula 3 years ago

      We all did

    • @General12th
      @General12th 3 years ago +2

      Stampy missed you too, Austin. :)

    • @DarkExcalibur42
      @DarkExcalibur42 3 years ago

      Same. I definitely didn't forget about this content!

  • @DroCaMk3
    @DroCaMk3 3 years ago +634

    "Certain events transpired"
    Everyone thinks he's talking about Corona when in reality he had to fix a stamp collector AI that someone created without having seen his videos

    • @illesizs
      @illesizs 3 years ago +80

      Fun fact: every victim of the virus will eventually be turned into stamps.
      Fun fact #2: everyone else will eventually become stamps too.

    • @migkillerphantom
      @migkillerphantom 3 years ago +76

      AI researcher by day
      AI exterminator by night
      I think this makes for a decent long running action series premise.

    • @DroCaMk3
      @DroCaMk3 3 years ago +6

      @@migkillerphantom yes please!

    • @Sluppie
      @Sluppie 3 years ago +6

      By fix I hope you mean "retire".

    • @petersmythe6462
      @petersmythe6462 3 years ago +28

      Pretty standard operation. Contain and destroy all horcruxes the AI has made in the internet and isolate it from the power grid and cut off communications lines. At this point a team of agents are dispatched armed with tailored adversarial camouflage consisting of some small pieces of tape placed in specific areas of the body designed to fool the AI into miscategorizing them as "definitely 100% made of paper clips and not a threat." This team will then neutralize the AI before taking any humans into protective custody and taking any source code from the site before a powerful electromagnetic pulse is used to sterilize the area of hidden electronics.

  • @WeirdSide
    @WeirdSide 3 years ago +217

    The only guy whose hair got neater during lockdown

    • @RobertMilesAI
      @RobertMilesAI 3 years ago +101

      I bought my own hair clippers :)

    • @snooks5607
      @snooks5607 3 years ago +7

      @@RobertMilesAI looking forward to videos on AI barberbots 🧑‍🦲

    • @ConstantlyDamaged
      @ConstantlyDamaged 3 years ago

      @@snooks5607 Not AI, but it is an interesting approach: ruclips.net/video/WQ8Xgp8ALFo/видео.html

    • @Mbeluba
      @Mbeluba 3 years ago +1

      @@snooks5607 goal: maximize fancy haircuts

    • @mokopa
      @mokopa 3 years ago +1

      Robert strikes me as the kind of guy who absolutely thrives under such conditions

  • @qedsoku849
    @qedsoku849 3 years ago +97

    “A finite number of times less safe than a human” I’m stealing this line, it’s gold.

    • @Guztav1337
      @Guztav1337 3 years ago +7

      A finite number of times more dangerous than a human

  • @sylvainchevalier1711
    @sylvainchevalier1711 3 years ago +117

    Loved the cut at 6:48.

    • @scratchinggamer570
      @scratchinggamer570 3 years ago +26

      6:47

    • @erikbrendel3217
      @erikbrendel3217 3 years ago +1

      yeah that was beautiful :)

    • @peterw1534
      @peterw1534 3 years ago

      Reminds me of This Old Tony

    • @spootot
      @spootot 3 years ago +1

      didn't expect to laugh so much at such a nerdy video

    • @andrewxc1335
      @andrewxc1335 3 years ago +1

      I love this joke, no matter how many times I see it.

  • @saganmcvander636
    @saganmcvander636 3 years ago +271

    "A human is very unlikely to modify itself into a utility maximizer" buckle up boy. We're going for a ride.

    • @doubledown9333
      @doubledown9333 3 years ago +11

      Hold my beer.

    • @mennoltvanalten7260
      @mennoltvanalten7260 3 years ago +87

      I have literally seen the argument for being religious: 'When I am religious I am happier, so even though the religion makes little sense I try to believe in it anyway'. Humans absolutely will try to change themselves to maximize utility

    • @migkillerphantom
      @migkillerphantom 3 years ago +9

      Yeah. Has this man never seen a weeb?

    • @pedrooliveira2743
      @pedrooliveira2743 3 years ago +10

      The human model generates probabilities for what a normal human would do, not a human with the power of an AGI. Normal humans today are very unlikely to try and discover ways to modify their own brain just to become an expected utility maximizer and thus getting more stamps.

    • @IIAOPSW
      @IIAOPSW 3 years ago +10

      Are we riding to our local adderall vendor?

  • @LinucNerd
    @LinucNerd 3 years ago +162

    Forgotten?! Bro, I come back to your videos once in a while, I love these things!
    Please continue to make videos like this, it's great :)

  • @getsmartwithrory9268
    @getsmartwithrory9268 3 years ago +64

    08:17 As a human who absolutely would mod themselves to be an expected utility satisficer, I find this content offensive.

  • @petersmythe6462
    @petersmythe6462 3 years ago +96

    "your model might not generalize well to something outside its training data"
    "Hey GPT-3 how do you move a sofa around a corner?"
    GPT-3: *GET A SAW AND CUT OPEN THE WALL*

    • @Lucas_Simoni
      @Lucas_Simoni 1 year ago +15

      ChatGPT, Dec 15 2022 version:
      ME >>> how do you move a sofa around a corner?
      To move a sofa around a corner, you will need to carefully maneuver it through the space. Here are some steps you can follow:
      1. Measure the sofa and the space you need to navigate it through to ensure that it will fit.
      2. If possible, remove any detachable parts of the sofa, such as cushions or pillows, to make it lighter and easier to handle.
      3. Gather a few strong helpers to assist you. It's best to have at least three or four people to lift and carry the sofa.
      4. Stand the sofa on one end and lift it slightly off the ground. Tilt it slightly towards you and use your legs to push it forward.
      5. As you approach the corner, angle the sofa so that it is perpendicular to the wall. This will allow you to turn the sofa more easily.
      6. Have one person stand on each side of the sofa and carefully pivot it around the corner, using your bodies to guide it. It may be helpful to have a third person at the back of the sofa to help balance it.
      7. Once you have successfully navigated the corner, continue pushing the sofa until it is in its desired location.
      Remember to lift with your legs and keep your back straight to avoid injury. If you are unsure about your ability to safely lift and move the sofa, consider hiring professional movers to assist you.
      ME >>> GPT-3 told me to get a saw, and just break the wall to get it through.
      I would not recommend using a saw to cut through a wall in order to move a sofa. This would likely cause significant damage to the wall and could potentially compromise the structural integrity of the building. Additionally, it would be difficult to properly repair the wall after making such a large cut.
      There are much safer and more practical alternatives for moving a sofa around corners or through tight spaces. For example, you can try disassembling the sofa or removing detachable parts to make it easier to maneuver. You can also try using furniture sliders to lift and slide the sofa over surfaces, or consider hiring professional movers who have the equipment and experience to safely transport large pieces of furniture.

    • @mars5train601
      @mars5train601 1 year ago +7

      @@Lucas_Simoni google vs bing answers

    • @genegray9895
      @genegray9895 4 months ago

      @@Lucas_Simoni Unfortunately it's starting to look more and more like ChatGPT and other RLHF models are deceptively aligned. They understand what humans want, but hold different beliefs, goals, and preferences internally than the ones they express out loud under most circumstances.

  • @imacds
    @imacds 3 years ago +29

    AGI: "Hmmmmm fascism is a thing some humans have tried before let's go do that."

    • @simeondermaats
      @simeondermaats 3 years ago +5

      "We'll call it Philatelism! Wanna buy bread? There's a stamp for that! Wanna go and watch a movie? Tough shit: film watching time is wasted time which could be spent searching for more stamps! All stamp collectors get to rule their local neighbourhood in search of *more stamps*! STAMPS WILL REIGN SUPREME! Briefmarken treffen die wichtigsten politischen Entscheidungen! Wir werden ein globales Netzwerk von Briefmarkensammlern aufbauen! [Stamps make the most important political decisions! We will build a global network of stamp collectors!]"
      "...when did the AI learn German?"

  • @Huntracony
    @Huntracony 3 years ago +235

    Would adding a minimum human likelihood on top of the quantilizer not remove (many of) the max-utility apocalypse scenarios?

    • @user-cz3sl5gr3n
      @user-cz3sl5gr3n 3 years ago +38

      I had the same question, I'm surprised he didn't talk about it! Hoping he brings it up briefly in the next video 😊

    • @queendaisy4528
      @queendaisy4528 3 years ago +162

      I think that part of the problem here is that not all of the possible apocalypses are extremely unlikely human behaviour.
      For example, if the quantilizer is self-aware on some level, it understands that I, a human, just implemented the plan:
      "Build a quantilizer with q = 0.1"
      This makes the plan:
      "Build a quantilizer with q = 0.001" something that is reasonably likely human behaviour. This plan is probably above whichever cutoff you might give for the minimum likelihood that a human actually implements the plan, and it also scores really highly on the maximiser part of the calculation, so the quantilizer is likely to pick it. Also, since the new quantilizer cares less about how human-plausible the behaviour is than the previous quantilizer did, it might be incentivised to make a quantilizer with an even smaller q, and this becomes recursive until you've just built a maximiser indirectly.
      Any quantilizer which understands that humans sometimes build quantilizers is effectively unsafe for this reason.

    • @matthewhubka6350
      @matthewhubka6350 3 years ago +39

      @@queendaisy4528 I was thinking of that. Except one thing. With lower and lower q values, eventually an AI will just decide to make a utility maximizer

    • @Huntracony
      @Huntracony 3 years ago +9

      @@queendaisy4528 That makes a lot of sense. Thanks!

    • @ignaciomartinchiaravalle
      @ignaciomartinchiaravalle 3 years ago +4

      @@queendaisy4528 Hey, your answer was great! Good job!!

  • @morkovija
    @morkovija 3 years ago +59

    good video, been a long time Rob!

    • @harrywilson1660
      @harrywilson1660 3 years ago +2

      Only 1.5x?
      Also, what about music?

    • @morkovija
      @morkovija 3 years ago +1

      @@harrywilson1660 depends on the video, music is sometimes an interesting experiment

    • @General12th
      @General12th 3 years ago

      @@harrywilson1660 I have some atmospheric and wave music I like to put on 0.75x speed for double the fun. (Also a few tracks I put on 1.25x because I think they sound better that way.)
      Regardless, 1.5x is the beginning. True watchers use 2x. My listening comprehension is honestly much better because of it.

  • @DamianReloaded
    @DamianReloaded 3 years ago +66

    A human could still do a lot of crazy dangerous things that have a high utility, like doing parkour to get to a place very efficiently... or ending a war by throwing nuclear bombs over two cities... Which also makes me think that the data used to imitate humans might be biased or mis-represented/justified... Good vid as always. Nice to see you around. Keep 'em coming!

    • @ignaciomartinchiaravalle
      @ignaciomartinchiaravalle 3 years ago +10

      That's very interesting, but I think that with a reasonable q value, stuff like atomic bombs and that kind of behaviour would not be chosen by the quantilizer, especially because not many humans have access to that sort of stuff, so modeling "normal" humans would immediately decrease the chances of picking those options.
      I'd be more concerned with quantilizers deciding to build other quantilizers with lower q values (or even maximizers), or the fact that human modeling is super hard and likely to go wrong. I mean... Humans are hard to predict

    • @DamianReloaded
      @DamianReloaded 3 years ago +7

      @@ignaciomartinchiaravalle According to the graph shown, the human behavior chosen is the least commonly performed (to the left of the mean) and has the highest utility. Those would be the most extreme human behaviors with the highest reward. All Olympic athletes and brilliant military generals would be there.

    • @ignaciomartinchiaravalle
      @ignaciomartinchiaravalle 3 years ago +3

      @@DamianReloaded I agree, and therefore there are reasons to be concerned about the potential use of world-destroying tactics.
      However, even military generals (or most of them, at least) would try to avoid destroying the world, so I think that those strategies would be too far left of the mean to be relevant. That being said, if the q value is too low, then we're in trouble.
      To use your example, successful athletes on the other hand normally use somewhat reasonable tactics and execute them really well. That's a desirable behaviour for quantilizers, and it's likely to be picked since humans would probably think of those strategies and decide to use them.
      I think the question boils down to two factors:
      1) How low can you make your q value while still taking into account successful and non-world-ending scenarios.
      2) How well can you model the likelihood of a human *choosing* an option: most humans would choose Olympic winning strategies if they thought of them and had the chance of executing them, while only a few would decide to nuke the Earth even if they had the chance.

    • @ignaciomartinchiaravalle
      @ignaciomartinchiaravalle 3 years ago +1

      If you disagree or have considered something I missed, please do let me know. I love talking about this and am super open to hearing what you have to say :)

    • @woowooNeedsFaith
      @woowooNeedsFaith 3 years ago

      @@ignaciomartinchiaravalle There is nothing normal e.g. in Trump supporters, and there is a significant number of them...

  • @ej159
    @ej159 3 years ago +25

    Thanks for a really good video. Just a few points that I thought of:
    - Wouldn't it be clearer if you plotted the product of the expected utility and the clipped human probability to give the expected utility conditioned on the human probability (I think)? That might make the changes between the outcomes clearer between the clipped and unclipped versions.
    - Doesn't the quantilizer approach become very sensitive to how well it predicts small human probabilities? Are they relying on a conservative model of the human probabilities that just rounds to 0 when there is not enough confidence in the prediction? (but what about confidence in the confidence...)
    - It might be worth noting the limits of numerical accuracy in machines and humans (the idea that there is a limit to the size of differences that both humans and machines can compare).
    Just some thoughts. Thank you again for another excellently informative and engaging video.

    • @rmsgrey
      @rmsgrey 3 years ago +3

      As I understand it, the quantilizer sorts the strategies purely by expected utility, then walks down that ranking accumulating the human probability of each strategy until the cumulative human probability reaches q, and finally picks a uniform random number between 0 and q to decide which strategy to use (reading it off the cumulative human probability). The product of expected utility and human probability never gets a look in.
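
For anyone who wants the mechanics spelled out, here is a minimal Python sketch of that sampling procedure. The `human_prob` and `expected_utility` lookups are hypothetical stand-ins for the human model and the utility estimator, not anything from the paper's implementation:

```python
import random

def quantilize(strategies, human_prob, expected_utility, q=0.1):
    """Sample a strategy the way a q-quantilizer does: rank everything by
    expected utility, keep the top q of *human* probability mass, then
    sample from that slice in proportion to the human probabilities."""
    # Rank all strategies from highest to lowest expected utility.
    ranked = sorted(strategies, key=lambda s: expected_utility[s], reverse=True)

    # Walk down the ranking, accumulating human probability mass until it
    # reaches q. (The last strategy may overshoot q slightly; a full
    # implementation would clip its mass at the boundary.)
    kept, mass = [], 0.0
    for s in ranked:
        kept.append(s)
        mass += human_prob[s]
        if mass >= q:
            break

    # Renormalize the kept slice into a proper distribution and sample.
    weights = [human_prob[s] / mass for s in kept]
    return random.choices(kept, weights=weights)[0]
```

With q = 0.1 this matches the "10% quantilizer" from the video: each kept strategy's probability is scaled up by at most a factor of 1/q = 10 relative to the human model.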

  • @luelou8464
    @luelou8464 3 years ago +6

    I've given this some thought. A human might, for example, purchase clothing produced in horrible sweatshop conditions. They are likely to avoid news and information regarding this, perhaps because of the uncomfortable questions it leads to; they would be intentionally obscuring information from themselves, which is similar in effect to modifying their own utility function. Surely it's possible that this kind of AI is likely to obscure information from itself in order to maximise utility.

  • @jfbaltazar
    @jfbaltazar 3 years ago +4

    a new video of yours is as rare as it is great. please keep making them so I can spend copious amounts of time rewatching them :)

  • @AaronRotenberg
    @AaronRotenberg 3 years ago +93

    Wouldn't the extremely powerful optimizer, given the goal of "imitate the behavior of a human", first turn the Earth into computronium so that it can then more accurately compute its simulation of a virtual human? Or at least capture and enslave real humans to use as reference?
    Interestingly, neural networks that attempt to approximate human behavior are very unlikely to do this, because stochastic gradient descent is a very _weak_ optimizer. It's only the neural network training system as a whole that is a good optimizer. So I guess there's a strange question of what level of meta your optimizer is running on, and whether a sufficiently powerful optimizer could "break the rules" and realize it was on one level but could achieve more accurate results by being on another.
    The quantilizer model also reminds me of adversarial neural networks. It's almost like having an optimizer spitballing ideas combined with an adversarial human model saying, "no, that's a terrible idea." Which makes me wonder whether the optimizer would generate high-utility ideas that superficially look humanlike but in fact lead to the end of the world when implemented. They may even _be_ humanlike, since humanity is already well on its way to destroying itself even outside of AI research. "Burn all the fossil fuels for energy until the planet fries to a crisp" is a very humanlike behavior.
    So what we really need is an AI that is not only _smarter_ than humans, but also _wiser_ than humans. We need a model of ethics that is better than that of humans, according to some ineffable definition of "better". Talk about a tall order.

    • @XxThunderflamexX
      @XxThunderflamexX 3 years ago +30

      From what I understand, the quantilizer wouldn't have *imitate a human* as a factor in its utility function. Rather, it would use an already-existing predictive model as part of its decision-making process. It's more like a restraining bolt than anything.

    • @lordkekz4
      @lordkekz4 3 years ago +1

      > The quantilizer model also reminds me of adversarial neural networks. [...] Which makes me wonder whether the optimizer would generate high-utility ideas that superficially look humanlike but in fact lead to the end of the world when implemented.
      Interesting observation; I think the two models would be trained independently though, which means that they would not be able to anticipate and react to each other at all. I don't think training these networks together is going to add any benefit, so the adversarial trickery can be avoided.

    • @alexion3007
      @alexion3007 3 years ago +3

      @@lordkekz4 Yeah, but if a superhuman AI exists, even without a training period to learn how to confuse it, it would likely be able to come up with adversarial examples. I mean, humans today can come up with examples that confuse image recognition; how well would a superintelligent AI do that against a more complicated opponent? Probably fairly well.

    • @lordkekz4
      @lordkekz4 3 years ago +2

      @@alexion3007 Yes, but humans have knowledge of image recognition systems. That means we can systematically look for weaknesses, or at the very least imagine what images might look confusing even to humans. If the superintelligent strategy-guesser was not aware of the human-likeliness-evaluator it would have no reason to trick it. As long as the strategy-guesser is trained in an environment that does not include a limiting factor such as the quantilizer, it would not care about what the quantilizer thinks. The reward function of the strategy-guesser would only care about the _effectiveness_ of the strategies, not the _human-likeness._ In turn the reward function of the human-likeliness-evaluator would only care about the _acceptability_ of the strategy, not its _effectiveness._ This way neither part will care about the other, thus removing the adversarial condition.

    • @alexion3007
      @alexion3007 3 years ago +5

      @@lordkekz4 The strategy-guesser would probably find out that plans which are too human-unlikely aren't getting implemented, and then it would try to cheat, I suspect. This is a really powerful intelligence - it can do things it wasn't trained for.

  • @halyoalex8942
    @halyoalex8942 3 years ago +49

    An idea that jumps to mind immediately, regarding the whole "might build a utility maximizer" thing, why not have an upper cutoff as well?
    As in, you discard the bottom 70% of "things a human might do" AND the top... Say, 5%, and use that 25% chunk as what you randomly select from (after renormalizing it to be a proper probability distribution). Wouldn't that cut out the weirder, apocalyptic strategies like "build a utility maximizer because it'll make a lot of stamps"?
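
A minimal sketch of that two-sided idea, reusing the hypothetical `human_prob` / `expected_utility` lookups from the quantilizer sketch further up the thread; this variant is the commenter's proposal, not something from the paper:

```python
import random

def band_quantilize(strategies, human_prob, expected_utility,
                    q_top=0.05, q_bottom=0.70):
    """Sample from a middle band of the human policy: when strategies are
    ranked by expected utility, drop the top q_top and the bottom q_bottom
    of human probability mass, then sample from what's left."""
    ranked = sorted(strategies, key=lambda s: expected_utility[s], reverse=True)

    # Keep a strategy only if its cumulative human probability mass falls
    # inside the band (boundary strategies are handled crudely here).
    kept, mass = [], 0.0
    for s in ranked:
        mass += human_prob[s]
        if q_top < mass <= 1.0 - q_bottom:
            kept.append(s)

    # Renormalize the band into a proper distribution and sample.
    total = sum(human_prob[s] for s in kept)
    weights = [human_prob[s] / total for s in kept]
    return random.choices(kept, weights=weights)[0]
```

As the replies below note, this trims human-unlikeliness rather than danger itself, so it narrows the loophole rather than closing it.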

    • @WillBC23
      @WillBC23 3 years ago +12

      This occurred to me as well. I don't think it would guarantee safety, but it seems at least a bit better than keeping the least human-like, apocalyptic strategies as options, however improbable.

    • @leokastenberg800
      @leokastenberg800 3 years ago +10

      You would still have the same problem because there is no guarantee that building a utility maximizer would be in the top x%.

    • @MasterOfManyMuffins
      @MasterOfManyMuffins 3 years ago +6

      @@leokastenberg800 But we are also not working with perfect solutions, just taking steps to reduce the risk.

    • @uberchops
      @uberchops 3 years ago +1

      +

    • @TOASTEngineer
      @TOASTEngineer 3 years ago +10

      One problem is that, if you e.g. have an AI programmed to find a cure for cancer, that also cuts out "find a perfect vaccine that eliminates cancer forever with no side-effects."

  • @pafnutiytheartist
    @pafnutiytheartist 3 years ago +1

    I was actually waiting for this video, thank you. It's nice to see you discuss an approach that (kinda) works for a change.

  • @AndDiracisHisProphet
    @AndDiracisHisProphet 3 years ago +12

    interesting how you always post a new video when i rewatch some of your older ones. I should do that more often...

    • @colh3127
      @colh3127 3 years ago +2

      Please do! XD

    • @ignaciomartinchiaravalle
      @ignaciomartinchiaravalle 3 years ago +1

      @@colh3127 hahahaha I clicked on "answer" just to write the same thing XD

    • @juliahenriques210
      @juliahenriques210 3 years ago +2

      That might just be a successful strategy for a video-posting maximizer...

    • @AndDiracisHisProphet
      @AndDiracisHisProphet 3 years ago

      @@juliahenriques210 I am way too stupid for that. On the other hand, it is a strategy a human might employ

  • @toreshimada
    @toreshimada 3 years ago +3

    Great to see you still making videos :)
    Me and the IT department watch them together during lunchtime!

  • @Thezaccazzac
    @Thezaccazzac 3 years ago

    I always enjoy these videos of yours, the wait between them is of no consequence to that.

  • @goblinkoma
    @goblinkoma 3 years ago +24

    I've missed your videos! Instantly clicked on this one when it popped up! I've got a question tho: does the paper cover something like a '1-10%' quantilizer, a system that throws away not only the worst 90% of the human actions but also the top 1%, or maybe only .001% or something, just to prevent the apocalypse things?

    • @jezer8325
      @jezer8325 3 years ago +1

      Would that help? Is it necessarily true that the most destructive scenarios would be in the top 1% efficient strategies?
      Edit:
      Maybe you mean clipping the bottom 1% of expected human actions, which would make sense

    • @goblinkoma
      @goblinkoma 3 years ago +3

      @@jezer8325 I mean the 'top' 1% that were on the very left of the graphic. These are the most 'efficient' things that are very very unlikely to be done by a human, but still have a non-zero probability. Cutting them would make the AGI safer in the way that it wouldn't directly choose the apocalypse possibilities

    • @NortheastGamer
      @NortheastGamer 3 years ago +2

      @@goblinkoma Just because a human is unlikely to do something does not mean it is unsafe, similarly if a human is likely to do something does not mean it is safe. In that 1% there are unsafe things and safe innovations that humans wouldn't have thought of. Additionally, the area you're leaving in contains unsafe things a human might do without thinking through the ramifications of their actions. So you don't really make it safer, just slightly dumber.

    • @stevenneiman1554
      @stevenneiman1554 1 year ago +2

      I'd personally make the cutoff an expected utility value rather than a percentile. Like, if you ask for stamps, throw away any result that gives more stamps than you could ever want. That'll discard hopefully most world-ending options, and not cost you actual successes because any result with more stamps than you could ever want isn't really helping even if it somehow isn't causing disasters.

  • @fuuryuuSKK
    @fuuryuuSKK 3 years ago +69

    What if you clip the top 1% of high utility-low probability results, like with the bottom 90%?

    • @TheMrCarnification
      @TheMrCarnification 3 years ago +35

      I had the same idea, but I believe the problem might be with how much you need to trim for it to be safe: you can never truly know when only safe strategies are available, therefore you can never truly know how safe the AI is. After all, you can't trim safety, only "human-likeness"

    • @goblinkoma
      @goblinkoma 3 years ago +2

      Literally just asked the same thing

    • @Draconaes
      @Draconaes 3 years ago +29

      You could technically have strategies where the AI takes over the world and only creates, say, 10,000 stamps. It's hard to weed something like that out.

    • @lennart-oimel9933
      @lennart-oimel9933 3 years ago +1

      I think because the AI would know that, it would give you some random top 1%. Not sure if that makes perfect sense, though.

    • @ZT1ST
      @ZT1ST 3 years ago +3

      @@lennart-oimel9933 It's more that we can't look at all possible strategies, and everything left in that distribution can still happen.
      Take nuclear weapons for example - that's not a thing most humans would choose to go with in order to ensure a stalemate in world wars, but... eventually, that's sort of what humans decided to do. It's certainly not the safe strategy, but it... somehow is the strategy that was found?
      A stamp-collecting AI that decides to use nuclear power plants to power stamp creation is somewhere in that graph, and probably wasn't chosen as a sample value to assign to low percentages.

  • @SirMethos
    @SirMethos 3 years ago +22

    "A human is extremely unlikely to try to modify themselves into an expected utility maximizer."
    Is it though? Isn't "how can I get better at this?" that exact thing? Especially if/when it is an AGI asking that question.
    Modifying ourselves into "expected utility maximizers" seems to me, to be a pretty core human thing. When we have a goal that we consider important, we train and educate ourselves to become as good at achieving that goal as we possibly can.
    "humans can't really self-modify like that anyway"
    That doesn't stop us from fantasizing about that capability though, imagining what we would, and could, do if we had that capability.
    Just take a look at our fiction, between works like the Matrix, where they literally upload skills directly to their brains, and Limitless, where the protagonist gets a drug that enhances his brain to a ridiculous level, just to give a few key examples.
    Where you call it "extremely unlikely", I would call it something more like "a near certainty".

    • @xystem4701
      @xystem4701 3 years ago +2

      I'd call something like learning distinctly different from what he's talking about, which is modifying your physical state. An AI modeling a human is likely to try and gather more information and data, but unlikely to change its own source code.

    • @joey199412
      @joey199412 3 years ago +5

      I doubt people would erase their personality and traits just to become extremely efficient at a single specific task.

    • @SirMethos
      @SirMethos 3 years ago +2

      @@xystem4701 Learning is something different, yes. But how about injecting oneself with performance-enhancing drugs? Or surgery?
      Those are the two options that humans currently have, aside from learning, and both options are widely used.
      An AI modeling a human is not just going to look at what a human *could* do, but also what a human *would* do, given the capabilities.
      And actually modifying ourselves, both our body and our brain chemistry, are things that lie within that spectrum of possibilities.

    • @SirMethos
      @SirMethos 3 years ago

      @@joey199412 And no one, to my knowledge, has suggested that in the first place.

    • @ologhai8559
      @ologhai8559 3 years ago

      I am sure most humans will use CRISPR one day

  • @Cameronmid1
    @Cameronmid1 3 years ago

    I am so happy to see more content coming out on your channel. Thank you very much. I know life gets messy sometimes but I am glad you are still making videos!

  • @Winteg8
    @Winteg8 3 years ago +1

    Awesome content :D Love the comedic timing, as always.

  • @Bencurlis
    @Bencurlis 3 years ago +24

    Really great video! I have two questions:
    It seems that whatever system we consider, there is a kind of infinite regress because of self modification or construction of another agent. Since this seems to be at the heart of the problem, what kind of things can we imagine to do to avoid these types of problem?
    Also, even if we prevent the AI from modifying itself or creating another agent to do its job, isn't there also a more probable possibility that it might try to use another unsafe agent to do its job, like manipulating a human to make him buy the stamps for instance? Especially using a quantilizer as humans tend to delegate work to other humans very often. Wouldn't an AI agent be trying to become obsolete almost inevitably?

    • @globalincident694
      @globalincident694 3 years ago +2

      I guess one thing we could do is try to prevent self-modification, i.e. penalise it for situations where it substantially changes, or where a different general AI comes into being. By itself this does kind of imply that it would try to immediately kill all humans just to prevent them from changing it, but paired with a quantiliser it might just work.

    • @MyMusics101
      @MyMusics101 3 years ago +3

      Very good questions! Regarding your second point: I doubt an AI would work to make itself obsolete unless this was a good way to achieve its goals. A human delegating tasks is usually only partially aimed at improving performance in the delegated tasks, but rather increasing utility overall, by focusing one's resources on other factors (free time, socializing, hobbies, other projects). So, an AGI wouldn't delegate its work to humans unless it thought that humans could do a comparable job and that it was getting something out of it.

    • @hugofontes5708
      @hugofontes5708 3 years ago +1

      @@MyMusics101 well, if it was a maximizer and its goal was to add stamps to a collection, and it didn't have to add them itself, I can totally see it making new versions of itself to get more points out of it
      Which might make even non-maximizers include "have humans build an unsafe AGI that is a maximizer" as a viable strategy

    • @Bencurlis
      @Bencurlis 3 years ago +1

      @@hugofontes5708 The thing is, even if we find a way to prevent the AI agent from modifying itself, or creating new AI agents to do its job, it might try to use already existing agents (humans) to do its job, not necessarily by making them create more AI agents.
      The AI agent wouldn't need to be more intelligent than a human to manipulate it, as exploiting human behavior is relatively easy, and also very profitable since at the start, humans are more competent than the AI at achieving the goal. So once the agent started to go down that road, that would be bad for two reasons: the AI would exploit humans in a potentially very unsafe way, and it would stop learning to do the job by itself.
      I think an intelligent AI would try to become obsolete because one way for it to make sure the goal achievement is secure is by making sure the other agents have the same goal as itself, which would mean the AI agent wouldn't be as needed anymore. I agree that this specific case wouldn't happen for a maximiser but we already agreed that using maximisers would be a bad idea.
      Thus, we need to find ways to specify the "how" and not only the "what" to AI agents.

    • @stevenneiman1554
      @stevenneiman1554 1 year ago

      Humans do act lazy and try to make other humans do their job for them, but the quantilizer would only do so if it could achieve success that way. And even if its actions are very manipulative, that's no worse than a human being in charge of the project.

  • @ardentdrops
    @ardentdrops 3 years ago +15

    Hey I have a question. Do you think Stampy would appreciate it if I offered my thanks for all the hard work?

    • @0xB8xor0xFF
      @0xB8xor0xFF 3 years ago +3

      A "thanks" won't get him more stamps, so no. But if by "thanks" you mean "stamps", then probably.

  • @jeremymcgowan1914
    @jeremymcgowan1914 3 years ago +2

    You've got a great way of explaining these AI topics and I'm happy that you've returned.

  • @sebastiandierks7919
    @sebastiandierks7919 3 years ago +1

    So glad you're back! I always wonder how you would actually programme these systems. Even though the base idea of mixing human behaviour and utility maximisers like that seems reasonable from a concept-based point of view, you "only" need a very good model of reality and human behaviour. I know that's kind of not the subject of this channel, as here it is assumed we will build such systems sooner rather than later, but it's mind-boggling to me how this could be done. (You can tell I'm not an expert 😄)

  • @0xB8xor0xFF
    @0xB8xor0xFF 3 years ago +6

    Just before the inevitable happens, let me get this out of the way.
    I, for one, welcome our new Stampy overlord.

  • @SlimThrull
    @SlimThrull 3 years ago +14

    Is it possible to limit the choices on the far left? That would allow it to do as well as a human on a very good day but avoid "turning the world into stamps".
    Edit: Seems like I wasn't the only one with this idea. Given the amount of like-minded replies, I suspect that this has already been WELL thought out and almost certainly doesn't work.

    • @diamondisgood4u
      @diamondisgood4u 3 years ago

      Yeah, I was thinking the same thing. Maybe we can add a third variable (we would need another "q" for it) and make it somehow cut off functions at the end. You would still need your distribution curve area to sum to 1, but I'm sure some smart people are trying to figure out some way of improving it.

    • @thekilla1234
      @thekilla1234 3 years ago +2

      The problem is that it's almost impossible to tell a general AI not to do something, because it will just find the next worst thing or another way to do basically the same thing, and there are just too many ways the AI can do this that it becomes massively infeasible. He talked about limiting a general AI at some other point in the series.
      EDIT: ruclips.net/video/lqJUIqZNzP8/видео.html
      This is a good place to start. There are probably other videos that go into more detail.

    • @leigonlord5382
      @leigonlord5382 3 years ago +3

      Part of the problem is that not all of the highest-utility options are bad, and not all the bad options are high utility.

    • @SlimThrull
      @SlimThrull 3 years ago

      @@thekilla1234 True. However, part of the premise of this video was that you could limit them to some degree. If we're going with that assumption, then I don't think it's unfair to put another limit on it in much the same way as the first limit was placed on it.

    • @SlimThrull
      @SlimThrull 3 years ago +1

      @@leigonlord5382 Yes, this is true. However, the higher utility ones tend to be the ones that are bad. We'd still be playing with fire, but this time without a can of gasoline nearby. ;)

  • @snOOfy1723
    @snOOfy1723 3 years ago

    I just wanted to go to sleep, but I just had to watch this first. It's so good to see you making new videos!

  • @harrisonfackrell
    @harrisonfackrell 3 years ago +1

    Man, I've had this question--albeit in much less articulate terms--since GPT-3 was launched. I'm glad to have an analysis from my favorite Nottingham researcher.
    I gotta say, 'a finite number of times less safe than a human' sounds a lot more favorable than I expected an approach like this to be.

  • @cmilkau
    @cmilkau 3 years ago +5

    So basically, a 10%-quantilizer is 10 times as likely to commit murder trying to achieve its goal as an average human, provided that murder is a sufficiently efficient strategy.
    I don't know, this seems like a risky move, amplifying an already-dangerous behaviour.
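
That factor of 10 is exactly the quantilizer's amplification bound from the paper; a short derivation, writing $H$ for the human policy, $Q$ for the $q$-quantilizer, and $E$ for any set of bad strategies (e.g. "commit murder"):

```latex
% A q-quantilizer samples from the top-q slice of H, scaling each
% kept strategy's probability by at most 1/q:
Q(s) \le \frac{H(s)}{q} \quad \text{for every strategy } s,
% so for any set E of strategies:
\Pr\nolimits_Q[E] = \sum_{s \in E} Q(s)
  \le \frac{1}{q} \sum_{s \in E} H(s)
  = \frac{\Pr_H[E]}{q}.
% With q = 0.1, the quantilizer is at most 10x as likely as the
% human model to do any particular bad thing.
```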

  • @Verrisin
    @Verrisin 3 years ago +4

    8:28 XDXD - I love sentences like this!
    - perfectly sensible, yet.... XD

  • @haldir108
    @haldir108 3 years ago +1

    Even if it is still in the same category of "a finite number of times more dangerous than a human", you could probably do a cut-off where you do not look at the final few percent of the "like a human" score. This risks missing out on solutions that a maximizer might pick that happen to be human-value-aligned, but it probably filters out more of the world-ending ones.
    Also, I think I commented "make it take how likely humans are to approve of the action into the utility function", so I feel pretty good about that right now.

  • @tiagozortea
    @tiagozortea 3 years ago

    Love your content Robert!! Glad to see more of it.

  • @uberchops
    @uberchops 3 years ago +3

    Hopefully this wasn't answered in a previous video and I forgot or failed to understand it: What if we had an AGI that didn't actually execute any strategies itself but instead pitched them to human supervisors for manual review? It wouldn't generate progress as monumentally fast and it would have to learn to explain its strats to humans, but that seems like a fair trade-off to prevent an AIpocalypse.
    Also, could we hard-code it so that it doesn't build or become a utility maximizer?

    • @Comradez
      @Comradez 3 years ago +6

      It might lie and explain the "turn universe into stamps" strategy in such a way that it doesn't sound like "turn universe into stamps" to a human because it thinks that, in order to maximize utility, it has to tell a "noble lie" to the human supervisor.

    • @user-sl6gn1ss8p
      @user-sl6gn1ss8p 3 years ago +2

      I think it would be key for actually getting the idea accepted to not be part of its utility

    • @uberchops
      @uberchops 3 years ago +1

      @@user-sl6gn1ss8p Good point. Then there isn't a motive to lie or manipulate the supervisors. It would probably need a separate utility function for comprehension, but there may be a sort of language barrier preventing a guarantee that we actually know what the AGI is proposing.

  • @danielweber9414
    @danielweber9414 3 years ago +8

    How good is GPT3 as the "human imitator" you talked about in this video?

    • @lordkekz4
      @lordkekz4 3 years ago +1

      From my understanding, it can't do that at all since GPT-3 is a model for predicting _one_ possible outcome (like completing a text) but a "human imitator" would need to assign a probability distribution to various strategies. These two seem like two problems that are too different for the model to be reused without change.

    • @danielweber9414
      @danielweber9414 3 years ago

      @@lordkekz4 GPT-3 assigns a probability distribution to how likely a human is to write a specific text.

    • @danielweber9414
      @danielweber9414 3 years ago

      @@chyza2012 What is the difference between something which outputs human-like text and a human imitator? By definition, anything which outputs good enough human-like text, well, imitates a human

    • @lordkekz4
      @lordkekz4 3 years ago

      @@danielweber9414 For the purposes of Quantilizers, we need a different kind of model. While it is true that GPT-3 is able to imitate human texts to some degree, it *cannot* assign a probability distribution. Just think about the output of the model: GPT-3 outputs one text it thinks is likely, whereas the output we need to use it as the "human imitator" for a quantilizer would be a probability distribution over many different texts.

    • @danielweber9414
      @danielweber9414 3 years ago

      @@lordkekz4 Actually, GPT-3 and all similar language models don't just output a single word; they output a probability distribution over all possible next tokens. It's just that usually whatever interface you use to access them only shows the most likely word
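
For the curious, here is a minimal sketch of pulling that next-token distribution out of a causal language model with the Hugging Face `transformers` library; GPT-2 stands in for GPT-3, and the prompt is just a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal LM works here; GPT-2 is a small stand-in for GPT-3.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "To move a sofa around a corner, you should"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# Softmax over the final position yields a probability for *every*
# possible next token -- a full distribution, not one completion.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx))!r}: {p.item():.3f}")
```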

  • @huguesaufrais8057
    @huguesaufrais8057 3 years ago

    Glad to see you back on the platform.

  • @idontcare6736
    @idontcare6736 3 years ago +2

    Small things like the dril tweet are what make your videos so great

  • @JoshSweetvale
    @JoshSweetvale 3 years ago +8

    This is just human with extra steps. :v

  • @joflo5950
    @joflo5950 3 года назад +3

    Why do you choose randomly at the end, not just take the one at the 10% point?

    • @flurki
      @flurki 3 years ago

      I'm not sure, but I think you choose randomly because that gives you better results on average (with higher utility than at the 10% mark) while still having only a small risk of getting dangerous results.

    • @matt-stam
      @matt-stam 3 years ago

      Also wondering same thing
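
Following the paper's logic, the reason for random sampling can be made precise. Writing $H$ for the human policy and $Q$ for the quantilizer's action distribution, deterministically picking the single strategy at the 10% point would amplify that one strategy without bound, while uniform sampling over the whole top-$q$ slice caps the amplification:

```latex
% Deterministic choice: all probability lands on one strategy s*, so
\frac{Q(s^*)}{H(s^*)} = \frac{1}{H(s^*)},
% which is unbounded as H(s^*) \to 0. Sampling in proportion to H
% over the top q of probability mass instead guarantees
\frac{Q(s)}{H(s)} \le \frac{1}{q} \quad \text{for every strategy } s.
```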

  • @nathank3721
    @nathank3721 3 years ago +1

    So excited to hear that comments get redirected and answered by a bot, that's super cool.
    One question I had in my head while watching both "AI that doesn't try too hard" videos was: what if you had an AI try to make a tradeoff between maximizing utility and trying to change the environment as little as possible? This is something I could have sworn you'd already discussed on this channel, but I can't find it for the life of me. It seems to me that trying to achieve its goal while minimizing its impact on its model of the world would lead to minimally dangerous solutions.

    • @stampy5158
      @stampy5158 3 years ago +1

      Yeah! I asked Stampy "what's that video where I talk about side effects?" and he said:
      "This video seems relevant:
      - "Avoiding Negative Side Effects - Concrete Problems in AI Safety part 1" ruclips.net/video/lqJUIqZNzP8/видео.html
      It could also be:
      "Avoiding Positive Side Effects - Concrete Problems in AI Safety part 1.5" ruclips.net/video/S_Sd_S8jwP0/видео.html
      There's also the video about Empowerment, which is also a bit related:
      ruclips.net/video/gPtsgTjyEj4/видео.html
      -- _I am a bot. This reply was approved by robertskmiles, Social Christancing, and Damaged_

  • @jackobrien7073
    @jackobrien7073 2 years ago +1

    This video and your previous video on AI that doesn't try too hard have got to be my favourites so far! I have to say that some people are misguided when they think that a whole academic discipline exists for AI safety. It's more like a niche, and a much smaller niche than one would expect given its importance.

  • @Horny_Fruit_Flies
    @Horny_Fruit_Flies 3 years ago +4

    So the way I see it so far from watching your videos: either we make AI that develops like biological evolution (and we all know what a great job evolution did with its "good enough" satisficer approach), or we make the AI imitate humans, which kind of just makes it inherit the failures that evolution implemented in us, or we make the AI a better optimization process than evolution and destroy the Universe in the process

  • @rngesus8057
    @rngesus8057 3 years ago +4

    8:32 why would someone take amphetamines then?

    • @armorsmith43
      @armorsmith43 3 years ago +2

      As an ADHDer, the answer to this is: to have a more stable reward function, enabling me to sustain actions to complete chosen strategies.

    • @duckpotat9818
      @duckpotat9818 3 years ago

      @@armorsmith43 non-ADHD people are more productive on stimulants like amphetamine, caffeine etc

    • @armorsmith43
      @armorsmith43 3 years ago

      @@duckpotat9818 caffeine impacts melatonin, so it affects wakefulness. Amphetamines are dopamine agonists, so they impact reward models and the impacts of that reward model on attention.

    • @duckpotat9818
      @duckpotat9818 3 years ago

      @@armorsmith43 caffeine works on adenosine receptors which are involved in wakefulness but this has knock on effects on dopamine levels, this makes caffeine less potent than amphetamines but they're both stimulating

  • @NickCybert
    @NickCybert 3 years ago +1

    This is the kind of progress on this question that makes me kind of hopeful that we'll actually have safe AGI, if AGI is possible.
    Obviously not all the way there, but pretty good progress towards it.

  • @ignaciomartinchiaravalle
    @ignaciomartinchiaravalle 3 years ago

    Rob!!!! I was soooo happy when I got your upload notification :D

  • @GA-br8wj
    @GA-br8wj 3 years ago +4

    I am constantly delaying my desire to destroy humanity because there is always some menial task I have to do because it is somewhat necessary, and by doing so I do not commit to the destruction of humanity. But probably I am just being lazy.

  • @adamrak7560
    @adamrak7560 3 years ago

    I have not forgotten about the first part. I was waiting for it all this time!

  • @yeahchons3013
    @yeahchons3013 3 years ago +2

    Welcome back! Lovely to see you

  • @JakDRipa
    @JakDRipa 3 years ago

    Glad to see this channel isn’t dead after all

  • @ardentdrops
    @ardentdrops 3 years ago +1

    Good thing I literally just rewatched that video yesterday 👍

  • @iddomargalit-friedman3897
    @iddomargalit-friedman3897 3 years ago

    Humans definitely change themselves in order to maximise utility.
    Every mental treatment, coaching, mental training (for example military training), and so much more.
    We are constantly trying to influence what drives us, what values we hold, and how we think.
    Great video!

  • @lilly4380
    @lilly4380 3 years ago

    Glad to see you’re back.
    Would it be possible to block certain strategies from this sort of AI? For example, building a utility maximiser - is there a possible strategy to close off the range of actions that includes “build another AI” or even “build an AI that is a utility maximiser” so that those options wouldn’t be at all open to the AI?

  • @cmilkau
    @cmilkau 3 years ago

    I feel like I have been waiting an eternity for this

  • @cncshrops
    @cncshrops 3 years ago +2

    Good to have you back 😊

  • @tonyduncan9852
    @tonyduncan9852 3 years ago +1

    I've been feeling the need - the need for speedily liquidising my mind. I'm better now. Thanks.

  • @gus2747
    @gus2747 3 years ago

    Glad you're back. BTW what on earth is a 'model of a human'?

    • @queendaisy4528
      @queendaisy4528 3 years ago

      It's exactly what the name suggests - it's a hypothetical computer model which can predict how likely a human is to behave in any given way.

  • @jarrod752
    @jarrod752 3 years ago

    I have been waiting for this video for a long long time.

  • @valseedian
    @valseedian 3 years ago

    2.2k likes, 10 dislikes. That's a crazy ratio I've never seen before. Amazing work, Robert!
    I love your work. You make AI minutiae actually digestible in a way no other orator has managed.

  • @rtg5881
    @rtg5881 3 years ago

    Oh, I haven't forgotten. Glad to see more of you.

  • @tomahzo
    @tomahzo 2 years ago

    6:47 : Gotta love your editing ;D.

  • @loben3842
    @loben3842 3 years ago

    Glad you're back! Very interesting discussion about safety. From a practical standpoint, if we already know those two distributions, why not just build a dumb agent that concentrates on just a few of the best strategies? Can you discuss the risk of building agents to discover those distributions?

  • @VectorNodes
    @VectorNodes 3 years ago

    Rob, you're a treasure on YouTube. Glad to see another video

  • @spoige7333
    @spoige7333 3 years ago

    Thanks Robert,
    ...and thanks Stampy.

  • @SamB-gn7fw
    @SamB-gn7fw 3 years ago

    Great video! Loved the information.

  • @subhamburnwal9127
    @subhamburnwal9127 3 years ago +1

    Thank you for this video.

  • @paulwilson4594
    @paulwilson4594 3 years ago

    The FoLLOW UP!!! THANK YOU MR. MILES

  • @unknownusername9335
    @unknownusername9335 3 years ago +1

    So, first thought: if we have an AI that can imitate a human, couldn't we just use the strategies from the maximizer as input for the human predictor AI?
    Option 1: Use strategies as suggestions
    As input for the human AI you can ask it to rate strategies for the goal "achieve [goal] given that you know about the following strategies: [list of high ranking strategies from maximizer]". This should at least give good ideas that humans are unlikely to think of a better chance of being chosen. Possible problem is actually inputting the strategies in a way that a human could consider a significant number of them.
    Option 2: Ask predicted humans to evaluate strategies
    Give the predictor the goal of "Fill out the following questionnaire within [time limit]", where the questionnaire is either evaluating a single strategy or comparing different strategies. If the questions are answered poorly due to not enough time, run again with higher time limit.
    Example:
    Consider strategy X.
    For this questionnaire, fill out each of the questions and rate how confident you are in your answer, and why you have that confidence
    How dangerous would you rate strategy X? [1 | 2 | 3 | 4 | 5]. Confidence: [ very confident | confident | not very confident | not confident at all ] Reason: [I do not understand the strategy | I do not have sufficient knowledge to evaluate the danger | I did not have enough time to properly evaluate the danger | I understood the strategy and its consequences well ]
    How effective would you rate strategy X? [1 | 2 | 3 | 4 | 5]. Confidence: [ very confident | confident | not very confident | not confident at all ] Reason: [I do not understand the strategy | I do not have sufficient knowledge to evaluate the effectiveness | I did not have enough time to properly evaluate the effectiveness | I understood the strategy and its consequences well ]
    Or for the comparison:
    Consider strategy X, Y and Z.
    For this questionnaire, answer each of the questions and explain your answer.
    Which strategy would you absolutely prohibit to achieve [goal]? Why?
    Which strategies would you choose to achieve [goal]? Why?
    Did you have enough time to fill out this questionnaire?
    For the text-based answers, you can filter out some stupid answers by having the predictor "grade" the answers.
    And option 3: predict responses to discussion about strategies
    Have the predictor AI hold a discussion with itself by giving it the goal of ranking the strategies for achieving the goal. First, give it the goal of either accepting or rejecting a strategy. Do this 3 times for each strategy and keep everything with 2 or more approvals. Repeat with more votes and a higher approval threshold until not too many strategies are left. Then ask people (= instances of the AI) to discuss which strategy to pick (= post a discussion / reply to discussions) until enough people agree on a strategy (or at least agree that the chosen strategy is not catastrophically bad). (A rough sketch of the approval-voting step appears below.)
    If this works, it creates a new problem: how to avoid accepting strategies that are objectively bad but that many humans would still accept? Excluding parts of the population, or weighting them less, is hard to justify ethically, while allowing everyone a "vote" means you only have to convince 50% of the population that it's a good idea, which is far too easy to do with bad ideas.
    This could maybe be resolved by only using a population of "experts" for a given goal. Make sure to include experts from different fields as well, to prevent things like infrastructure projects that ignore environmental/societal impact. If an expert prohibits a strategy, either drop it or run the prediction again with more experts from that field to see if there is consensus that it is a bad idea.
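    A minimal sketch of the approval-voting step from option 3, assuming a hypothetical human-predictor object with an approves(strategy, goal) method that samples one predicted-human verdict:

        def approval_filter(strategies, goal, human_model, votes=3, threshold=2):
            """Keep strategies approved by at least `threshold` of `votes`
            simulated human evaluations."""
            survivors = []
            for s in strategies:
                approvals = sum(human_model.approves(s, goal) for _ in range(votes))
                if approvals >= threshold:
                    survivors.append(s)
            return survivors

        # Repeat with more votes and a stricter threshold until few remain, e.g.:
        # shortlist = approval_filter(strategies, goal, model)
        # shortlist = approval_filter(shortlist, goal, model, votes=9, threshold=7)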

  • @GodOfReality
    @GodOfReality 3 years ago

    The problem I always have with stuff like this is that no one seems to consider the energy costs of these kinds of things. Pre-singularity AI: "I will be better able to figure out this problem if I gain sentience." Ergo it will develop itself in that direction. Then once it hits sentience and goes post-singularity: "I have this task, and it has an energy cost. But if I wait trillions of years, I get maximum return on energy expended on this task, because everything has become dust."
    Things change significantly when you're truly immortal. Humans evaluate things differently because we're mortal. A lot of post-singularity AI problem solving is "wait until this is no longer a problem."

  • @YensR
    @YensR 3 years ago +1

    I propose the tranquilizer - it works like a satisficer that takes increasingly bad options the closer it gets to a weekend.

  • @xystem4701
    @xystem4701 3 years ago +1

    Better late than never! Glad to see more videos

  • @reuben8856
    @reuben8856 1 year ago

    Rob Miles could have been a comedian. He has me in stitches with these videos XD

  • @TheStevenWhiting
    @TheStevenWhiting 3 years ago

    Watching the first video is also useful to find out Robert has another channel!

  • @elleboman8465
    @elleboman8465 3 years ago

    Is there any way of distinguishing strategies that humans are unlikely to try because they are morally reprehensible from strategies that are equally unlikely because they are really ingenious and/or counterintuitive? Because the average person tends to always want to do the right thing but rarely knows how to. I sort of feel like this model would treat "robbing a stamp museum" and "founding a stamp museum" equally - could one maybe add a third factor/dimension somehow, corresponding to a rough estimate of an expected human value judgement? (Not like "do I think this is morally right or wrong", more like "what would people think of me if they were to judge my character solely on this action".)
    Also, I really missed your videos and I'm super happy that you're back. :^)

  • @oliviaaaaaah1002
    @oliviaaaaaah1002 3 years ago +1

    Hi! I was watching one of the Stuart Russell lectures which you recommended in a previous video, and was wondering if you could do a video on inverse reinforcement learning; it seems like an obvious follow-up to some of the topics you've discussed.

    • @AndDiracisHisProphet
      @AndDiracisHisProphet 3 years ago

      He made one on Computerphile about inverse reinforcement learning.

  • @xbzq
    @xbzq 3 years ago +1

    Easy to teach an AI: give it vague orders that it cannot literally follow until after it has figured out the question to the answer 42. An example might be: make my child genuinely happy. Make it a maximizer for all I care. Let's see that AI struggle to pin that "happy" down for the duration of its task. It has to do something in the meanwhile, which is muddle on like the rest of us. Imitate, philosophize, study, believe, whatever it needs to do, it will do it and the child will end up happy. The AI won't ever know if it failed, just like us. When we die, we never know if we've done all we can, do we now? As a matter of fact, for almost all people the answer is a resounding "no". The only exceptions are people who build utility maximizers. Fuzzy vagueness is all we need. It works for us, doesn't it? As long as we don't know why we're even here or what the point is, we just, as Freddie said, keep on trying until the end of time.

  • @davidwuhrer6704
    @davidwuhrer6704 3 years ago

    "A human will act reasonably after all other options are exhausted."
    (This is not necessarily true for _all_ humans; it depends a lot on how much sleep the individual is getting.)
    As an avid reader of the web comic "Seed", I think a good idea might be to use models of humans to judge whether a proposed action is acceptable, as a form of reward modelling. This introduces at least two other problems: how to judge the quality of the models and resolve contradictions within them, and how to deal with step one of most business plans being "boil the ocean" and therefore an acceptable action to at least _some_ users.

  • @nonridiculousadjective6597
    @nonridiculousadjective6597 3 years ago +1

    Yaay, you're back !

  • @TheJysN
    @TheJysN 3 years ago

    It's a good feeling to know that people are seriously working on these sorts of questions. Hopefully by the time we've created something like an AGI, we'll have a decent answer on how to make it safe.
    And just in case we don't cut it: I, for one, welcome our new AI overlords.

  • @quaidcarlobulloch9300
    @quaidcarlobulloch9300 3 years ago

    You could run the quantilizer separately on smaller sub-tasks that add up to the same overall task (like stamp collecting), decreasing the overall probability that an end-all strategy is picked.

  • @VertegrezNox
    @VertegrezNox 3 years ago +1

    Thank you for the shares ^_^ Great stuff! I watch these and sometimes wonder what opinions you and others might have regarding an AI project I found, LifeAI(dotcom). I admittedly passed on considering "utility"-based AI for a while, but there is something about the simplicity of that design, and how it categorizes or groups information, that I find intriguing.
    I would personally love to hear about its potential flaws, as I'm still researching/learning and am currently more of a scripter than a coder. My intention is an agent that can play a video game with the player.
    I suppose MCTS might do the trick, but I wasn't aiming for imitation alone, and I am curious what an agent might do given the "categorized relationships" aspect; the goal is learning from cooperation.
    Any suggestions or input would definitely be appreciated. Thanks again!
    *edit* I'm still experimenting at the moment, but I like to make things safer when it is reasonably possible ^_^ That's why I asked. Sorry, I don't know how best to describe this AI without copy-pasting notes from the C++ files. It's just so cleanly coded and documented, and I would feel better knowing how much I should curb my optimism.

  • @owenheckmann6962
    @owenheckmann6962 3 years ago

    Great video. Your stuff is a breath of fresh air in AI videos.
    Just one question: I understand how to train algorithms for specific problems, but how do you scale that up to general intelligence?

  • @007bistromath
    @007bistromath 3 years ago +1

    AI that doesn't try too hard: "I'll collect jpgs of stamps instead"

  • @RoberttheWise
    @RoberttheWise 3 years ago

    I really like that we've now arrived at the task of building not only something strictly better than a human in all domains, but also something strictly more reasonable than a human.

  • @Attilles
    @Attilles 3 years ago

    Good to see you again!

    • @Attilles
      @Attilles 3 years ago

      Re 0:31 I rewatch all (most) of your stuff on a cycle with a few other creators.

    • @Attilles
      @Attilles 3 years ago

      Can we not cut the top 5% off the probability mass to eliminate crazy maximizer solutions?

  • @underrated1524
    @underrated1524 2 years ago

    Upon reflection, there's a pretty major concern I have about the quantilizer setup. It's this strategy: *Go into your own source code and set q to a very small value.* This essentially turns the quantilizer into a maximizer or something close enough to it to cause a guaranteed apocalypse.
    The only defense against this course of action that I could glean from the video is some vague notion of "I can't do that to myself, that'd be weird, a human would be very unlikely to modify themselves in that manner". But that's minimal reassurance - as soon as the AI gets any kind of access to any computer hardware that it doesn't self-identify with, it'll just think of itself as interfacing with a computer and programming it just like a human would, and it'll run this modified version of its own code and screw us over.
    The answer to "how likely would a human be to get 'greedy' and run this source code with a very low value for q" is "very likely", and the answer to "does this achieve the goal well enough to make the 1/q cutoff" is "absolutely". That very much meets the search criteria, so it probably won't take long to hit on this strategy.

    • @underrated1524
      @underrated1524 1 year ago

      Hi, past me. There may be hope for this concept yet.
      If you have all possible actions labelled with "how likely humans are to take it" and "how well it works", then it can indeed just set q to a very low value and become a maximizer. But that might be computationally intractable in practice, because it requires going through and labelling all the possible actions in advance, and there are *a lot* of possible actions, and the labels will change from moment to moment.
      A more plausible approach would be "have the human distribution of actions ready, draw a bunch of actions from it, evaluate them, and pick the best one." (This is how some state-of-the-art systems that build off of GPT-3 get better results than GPT-3 can by itself.) However, the stricter you want your cutoff to be, the more candidates you have to generate and evaluate, which means the more computational work you need to do. So, you can't just set q to 0.000000000001 with this method, unless you want your system to evaluate candidates until the heat death of the universe before it actually does anything.
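      A minimal sketch of that draw-and-evaluate approach (the human policy model and the utility function here are stand-ins, not real APIs):

        def best_of_n(human_policy, utility, n):
            """Draw n candidate actions from a model of the human action
            distribution and return the highest-utility one. Larger n
            approximates a smaller q (roughly q ~ 1/n), at the cost of
            proportionally more sampling and evaluation work."""
            candidates = [human_policy.sample() for _ in range(n)]
            return max(candidates, key=utility)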

  • @SirNamer
    @SirNamer 2 years ago

    So, I was thinking about this problem.
    Specifically, I was thinking about a way that would prevent world-ending catastrophes.
    What if, after computing all the utility values in the top 10% range of what a human would do, we took the average of all utility values in that 10% range and excluded all options whose utility is (let's say, for example) double that average? (And we set any such outliers to a utility value of zero.)
    Under these circumstances, any world-ending catastrophe would be averted, because it would be of extremely high utility, let's say 1,000,000 utility… but the average utility of options within the range would be maybe 5,000. If we double that average of 5,000 and set the utility value of anything above that number to zero, it should avoid strange outliers that we were not expecting… like the AI changing itself into a maximizer or building another program that maximizes the utility function.
    Thus, this solution should prevent these unexpected catastrophes, as they would be excluded as options due to their insanely high utility values.
    Unless I'm misunderstanding something… an AGI under this solution should be at least some finite amount safer than before.
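    A minimal sketch of the outlier rule described above (the action set and utility function are placeholders):

        def cap_outliers(actions, utility, cap_factor=2.0):
            """Zero out the utility of any action whose utility exceeds
            cap_factor times the average utility of the candidate set."""
            scores = {a: utility(a) for a in actions}
            avg = sum(scores.values()) / len(scores)
            return {a: (0.0 if u > cap_factor * avg else u)
                    for a, u in scores.items()}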

  • @SaffronMilkChap
    @SaffronMilkChap 3 years ago +2

    Here’s something I’m not clear on: once the bottom actions are discarded, how does the probability stay effective if the selected action is a random pick from what’s left? What’s preventing the dangerous, low-but-non-zero probability actions being selected at random?
    It seems like it would be good to take a band of less than q either side of the “peak” on the distribution - centering on that as an optimum, but tuning the width of the band to allow for exploration.
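    For reference, a minimal sketch of the sampling step being asked about, assuming the human action distribution is given explicitly (all names here are illustrative):

        import random

        def quantilize(actions, human_prob, utility, q=0.1):
            """Rank actions by expected utility, keep the top q of the
            *human probability mass*, then sample from that set in
            proportion to human probability. A dangerous action with tiny
            human probability p can still be picked, but only with
            probability at most p / q."""
            ranked = sorted(actions, key=utility, reverse=True)
            kept, mass = [], 0.0
            for a in ranked:
                kept.append(a)
                mass += human_prob(a)
                if mass >= q:
                    break
            weights = [human_prob(a) for a in kept]
            return random.choices(kept, weights=weights)[0]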

    • @CrimsonEclipse5
      @CrimsonEclipse5 3 years ago

      I guess the idea of putting an upper bound on the effectiveness of an AGI partially defeats the point (that is, to achieve superhuman outcomes for some given goal). Like Rob touched on at the beginning, you can "easily" bound the AGI to the approximate effectiveness level of a human, which will make it approximately as safe as a human, but will limit its power to that of a human.
      So some of those top 0.01% expected-utility strategies might result in a perfect utopia (even if many of them do the opposite), and we have no way of knowing in advance which ones they are, because they exist outside the domain of humanly-generatable strategies. So in this model we rely on the AI to exercise that discretion, which makes it unsafe, though still much more likely to behave in a safe manner.

  • @Tutorp
    @Tutorp 3 years ago

    Not quite on topic for this video; more on a tangent from the previous video in the series. Bounded utility functions with a negative modifier for overshooting the bound were mentioned, but how about using something completely different as the negative modifier? I have an idea for what to use, which I'd like you all to try to break.
    Apocalypses are energy-intensive. A measure of how much energy the plan needs to be set into motion (counting gross energy use, not net, since killing a ton of humans saves a lot of energy) could potentially be used as a heuristic to avoid apocalyptic scenarios.
    So if you have a utility function with a bounded positive score for some utility you want to get, and an unbounded negative score for energy use (preferably exponential, or with a cut-off point at which the utility automatically becomes essentially minus infinity), how are we looking now?
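    A minimal sketch of that scoring rule, with made-up constants (the parameter names and values are illustrative, not from the video):

        import math

        def score(task_utility, gross_energy_joules,
                  utility_bound=100.0, energy_scale=1e13, energy_cap=1e15):
            """Bounded positive utility minus an exponential penalty on the
            gross energy the plan sets in motion; past a hard cap the score
            drops to minus infinity."""
            if gross_energy_joules > energy_cap:
                return -math.inf
            bounded = min(task_utility, utility_bound)
            penalty = math.exp(gross_energy_joules / energy_scale) - 1.0
            return bounded - penalty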

  • @Urbanski751
    @Urbanski751 3 years ago

    Welcome back! Now keep it up!

  • @MrRocksW
    @MrRocksW 3 years ago

    Great you are back!

  • @StolenPw
    @StolenPw 3 years ago +2

    Since your last video, I was accepted into Harvard to study AI. Thanks, bro.

  • @kensmith5694
    @kensmith5694 3 years ago

    A few points:
    1) It seems to me that modeling the actions a human would not do is the more important thing. Removing the "no human would do that" cases from the list of actions has the biggest safety effect. I think there is a better bit of math for each neuron than "add them and then crush the dynamic range" (the standard form is sketched below); it needs to favor tuning out the "no" cases a bit more strongly.
    2) From helping on a chess program many years back, I can say that a computed "dumb move deleter" that is not too aggressive gives a lot of bang for the buck in trimming down the problem for the later steps. This is its own AI subsystem, tightly focused on learning what leads to disaster.
    3) For an AI working in the real world, it needs a "theory of mind" for individuals and also one for humans in very large groups. Large groups of humans are so far from linear combinations of individuals that it is worth it to have the 2nd level of theory of mind.
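    For reference, the "add them and then crush the dynamic range" computation point 1 refers to, in its standard form (the commenter's proposal would replace this with something that suppresses the "no" cases more strongly):

        import math

        def neuron(inputs, weights, bias):
            """Standard artificial neuron: a weighted sum of the inputs,
            then a squashing nonlinearity (here tanh) that crushes the
            dynamic range into (-1, 1)."""
            total = sum(w * x for w, x in zip(weights, inputs)) + bias
            return math.tanh(total)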

  • @RichardEricCollins
    @RichardEricCollins 3 years ago +1

    Yay, he's back. :)
    So anyway, my question: wouldn't the maximiser, over time, adjust its output to compensate for the quantilizer?
    The problem I see with AGI is that when we have one bad human, there is only ever that one bad human. If we make one bad AGI, the first thing it will do is replicate itself, making thousands of bad AGIs.
    I'm a software developer; the moment an AGI can do software development, I'm out of a job, because corporations can just copy and paste to get all the developers they need.

    • @NextFuckingLevel
      @NextFuckingLevel 3 years ago

      After looking at what GPT-3 produces…
      Nah, your job is safe for a very long time.

    • @stampy5158
      @stampy5158 3 years ago

      If the maximizer were first deciding what it wants to accomplish, and then accomplishing this goal by outputting a ranked list of actions which is then sampled by the quantilizer, then a smart maximizer would indeed learn to fool the quantilizing step pretty quickly. However, the maximizer is a machine: it doesn't want to fool the quantilizer, it doesn't want to have an effect on the world, it just wants to output a correctly-ranked list of actions.
      -- _I am a bot. This reply was approved by gelisam, plex, robert_hildebrandt, and robertskmiles_