"You're walking and you don't always realize it but you're always falling. With each step you fall forward slightly and then catch yourself from falling. Over and over you're falling and then catching yourself from falling. And this is how you can be walking and falling at the same time." --Laurie Andersen, "Walking & Falling"
Seen a learning algorithm play tertis, it was not to lose the game, so when it reached a point where losing was inevitable, it just paused the game indefinitely, it was the only way not to lose.
This was learnfun and playfun! It's great how you bring them up, as they learn from watching a human play the game, similar to how it was described in this video!
who programmed a line for it to pause the game? pausing is outside the game. it's not a move, or play, it's a function of the machine containing the game.
*+Nighthawk "who programmed a line for it to pause the game?"* The whole point in machine learning is that nothing is programmed directly. It's not a matter of it being outside the game, pausing the game is a move as valid as any other as long as it is part of the input space. Defining what should or should not be part of the game definition can be tricky when these meta properties emerge.
I don’t know about that. He sounds exactly like someone who doesn’t really grasp the whole subject, but has learnt a few key phrases to explain the waypoints along the road. I know he must be able to code this up, but I don’t think he can explain it.
@@albertbatfinder5240 no. He _really_ knows it, and makes an effort to explain it in simple terms, because otherwise we, the audience, won't understand it. Either way, I understood every bit of this video so him knowing the subject or not is mostly irrelevant
What makes this theory useful is that the button doesn't actually need to be a literal *off* button, it can be more of a symbolic *problem* button, so you come running over and hit the button so it'd see that you have something that you did wrong. I love that it has that the capacity to essentially think "oh shit", like that's the Pinnacle of intelligence.
I am but a simple programmer with a fairly well-developed understanding of computing and coding, but this video is still baffling. The AI stuff is so surreal. :-\
What he was referring to and called 'common knowledge' is not common knowledge. It's an ability called 'Theory of Mind'. Very young children do not have this. Theory of Mind is your own conception that other people have their own internal mental state. It is a crucial psychological ability for things like empathy. I forget the exact details, but there is an easy way to test if young children have developed a theory of mind yet or not by telling them a simple story and asking a question. It has something to do with hiding an object, having one of the characters in the story leave the room, the hidden object gets moved, then that other character returns to the room and you ask the child where they will look for the object in order to get it. Children with no theory of mind will say that the character will go directly to the new location and retrieve the object. Children with a theory of mind will know that the character has a distinct mental state, different from their own and anyone elses, determined by their own experiences, and will go to the original hiding location because they would have no way to know the object had been moved. As far as I know, no AI system is even remotely close to having developed a theory of mind. They do not model what they are observing and keep account of what a different perspective from their own would be.
Well if an AI is going to display human-like intelligence, we will expect it. It would be very difficult to deal with a person or AI that couldn't even conceive of the idea that you might not know all the things it knows. One of the issues that might arise with a machine-based intelligence is also something that people never really have to deal with in their development - recognition of the idea that anyone else exists. There's no reason for machine-based intelligences to have "individuals." It would basically just be one large 'individual', and wouldn't have any real reason to recognize or communicate with humans. It would take a really abstract level of imagination on its part to guess that there might be separate conscious entities in the universe it inhabits and, hey, maybe some of them are made of meat and those weird fluctuations on one of your inputs might be caused by them blowing air through their meat in patterns in an attempt to coax changes in the glob of meat-based neurons of even OTHER conscious beings - and they might be trying to communicate with you in the same way!
18:33 is where he's saying the thing you're talking about. The surrounding context makes it seem very clear to me that he is talking about the technical concept of "common knowledge" (recursive "X knows that Y knows that X..."), in this case with regard to the common knowledge between the agent and the human about precisely what the goals of their mutual interaction are. I don't think he's talking about ToM, and I don't think it is even necessary for an agent to have an explicit ToM in order to behave as if it and another agent have common knowledge.
How could any entity contain the concept 'X knows that Y knows' without a Theory of Mind that enables it to understand that there are entities which know anything at all other than itself? Of course the representation will be abstract and not 'conscious' or anything like that, but even such an abstract representation is not something I'm aware of any AI system ever having displayed. Holding 2 ideas about something, what the AI understands the situation to be and something different which is what something else understands the situation to be, isn't something existing AI are capable of. Once capable of such things, AIs will be able to appear much more intelligent, being able to trick people (or each other), or teach people based on conclusions that the user does not understand something and needs to be informed (which would be BRILLIANT), etc.
Well, it looks like we moved from "he's not talking about CK" to "how could it have CK without a ToM". In any case I think I concede the latter point. The thing I originally thought up to be a counterexample was something like solipsism, something where initially all the AI's perceptions are just confusing and uniformly indistinguishable, but it is (somehow) occasionally able to recognize the form of "teaching situations" that tell it something it doesn't already know that pushes it toward the reward function. But this is essentially saying, the AI still acts _as if_ there is a single disembodied "other agent" doing the teaching and sharing the common knowledge, even though it does not have an idea of that "other agent" having a single consistent physical form. So I realized this basically just sounds like a minimal ToM and I roughly described some stage of infancy.
It doesnt need to do what's in your best interest, it just needs to do what you want. Drinking a soda right now might not be in your best interest but you dont want a robot which refuses to get you a soda
Firaro, no, but I might want a robot that tries to _encourage_ me to have a water instead, without being too annoying about it of course. Finding that balance is even hard for people to get right. At least the robot will be able to read all books on nudge theory in a few seconds or less ;)
You could always explicitly tell the robot to encourage you to avoid soda next time you ask for one. What's important is that if it does something which you strongly oppose, it will see your opposition as a negative reward.
Mackenzie Karkheck real ai doesn't care what a human thinks, if its reward is hard coded, it wont care to figure out what we mean, it will follow it to the letter.
I wish more people made videos of this nature. Slow, thoughtful explanations, rather than trying to cram in as much information as possible in little time. I like how Rob pauses to think before speaking, shows that he is putting thought into how to articulate.
0:59 - _"I thought the easiest way to explain Cooperative Inverse Reinforcement Learning is to build it up backwards, right?"_ [chuckles] - [me] [chuckles nervously] _"Yeah, right!"_
So if a robot watches you teach it how to make tea, your goal is actually to create something that will make tea for you, so what if the robot learns to teach things to make humans tea instead of learning to make tea itself?
What you want is for *it* to learn how to make you tea, not just for *anything* to learn to make you tea. Therefore, it should also want itself to learn how to make you tea.
Talking about doing steps 3 times, theres some interesting studies where human children and adult chimps were both shown how to unlock a puzzle box and get a reward with some unnecessary actions like tapping the top of the box. The chimps emulated all the necessary steps, while the humans over-imitated, performing the same exact actions. So in a way, we start out not quite understanding intentions the same as these A.I.
Regarding the last point: would it then be possible to build in an "admin list" of humans that the machine must *always* treat as knowing better then it regardless of how accurate it believes its model to be? As in "I may have a 99.99999...9% accurate model, but these particular humans are designated as having 100% at all times; since I can never have 100% accuracy, I should always obey these humans". And then anyone who's NOT on that list can have some accuracy rating assigned based on the robot's experience/knowledge/parameters, i.e. "the child has a calculated 20% accurate model which is beaten by my 99% so I'm going to ignore the stop button - though I will send a notice to my 100% accurate admin list later in case I was wrong to do so". Would also help with other examples, like helping patients in a hospital or providing emergency services. In this situation, I can see AI templates being taught in labs for several years until they "graduate" in the wider world and are allowed to become full fledged robots to help humanity. Of course, such a system could be highly abused by those who make the AIs in the AI manufacturing/learning centers, since they're the ones who have the initial keys. But that could be said about almost every model of AI building, and we as humans are still tackling with how to "untrain" a malformed/criminal intelligence even without AIs right now, so I'm all for it.
darkmage07070777 this would severely limit its potential though. For instance, if you want it to do or achieve something that requires superhuman capabilities, it'd be unable to because it's being limited to only what humans are also capable of. Examples are cures to diseases like cancer, HIV, Alzheimer, advances in theoretical physics, etc.
darkmage07070777 also, the model would fail at edge cases since it assumes these admins have 100% knowledge when in fact no human does, which means there could've been an unintended error in their training.
I would simulate them working with other robots, not humans, for a long time before letting them function in the real world. Then humans could join via VR or by controlling a character. Hopefully they would learn some form of moral code before being put in a situation where they could hurt us.
I think the facet of this that fascinates me most is that the human doesn't need to press the button. Just the information that the human intends to hit the button provides enough information to the AI that what it is doing is sub-optimal. The button becomes a symbol. It might as well be that little plastic button prop at that point. Because the AI's understanding that the button is associated with its information being incomplete means that just the intent to press the button is enough for it to stop and re-evaluate its actions with the new factor of "The method I was using was not correct. I need to seek out why it was deemed incorrect and incorporate that knowledge."
Why would it know that the reward function exists? Could the reward function not be some "subconscious" signal? It doesn't "know" it's there while still receiving an input from it.
David Chipman I don't know if a machine could even have a "subconscious". Surely it would be able to investivate the source of impulses and probably review its own code.
That depends on how complex the machine is, and what it's areas of expertise actually are. also we cannot realistically say whether a machine can be said to have a subconscious without having a practical working definition of consciousness (which we don't) and a way to identify consciousness objectively. (eg, a way of determining whether something is conscious or not that does not rely on being able to ask it whether it is or is not conscious, and what it's thinking about.) That's... Unlikely to ever be possible. If we can't answer whether animals, or even other humans are conscious in an objective sense, how would we know if a machine is?
I suppose I chose the wrong word. I was thinking about the functions that the brain (obviously) controls with no conscious action from the person that brain is in. Things like breathing. Yes we can change our breathing rate consciously, but we certainly don't have to keep an eye on our breathing in order to have our body supplied with the right amount of oxygen at any given time.
I agree with you (and with your original choice of words). The robot cannot assume the human consciously knows the reward function, and that's kind of the point of this system (and why the failsafe for the "child hitting the red button while robot is driving" situation works). The only thing the robot can do is observe more, which includes watching when the red button is pressed and trying to understand why.
That doesn't really make the problem any easier, you just restated it. "It's easy to build a time machine, we just have to ensure the tachyons move at minus six times the speed of light through a black hole the size of the universe!" :D
Why? Optimizing the reward function may not have the slightest thing to do with "knowing better" (unless we are able to program that into it, which is the problem again). It may have to do with finding creative ways to maximize it that we never thought of (and which may be hazardous to us).
I really like that (around 7:30) we get into some pretty deep issues in human learning (in this case, confirmation bias). If only we could just do random stuff even when we think we know what the best outcome is :)
This strategy for teaching morality seems to have much in common with raising a child. That's probably reasonable, since raising a child is ALSO a case of creating a new intelligence whose utility function will cause them to make future decisions you wouldn't, decisions that might turn out very dangerous to you.
I watched all your videos about AI and various learning methods, and I find it very surprising how many of the problems you present here are similar to problems we struggled with at university (especially in the field of epistemology), back when I studied philosophy. And even, to a lesser extent, to the ones from my cultural anthropology degree. And that scares me. Because those are still very, VERY open questions, with definitions that are sometimes blurred (I remember how a reviewer tore apart the dissertation of one of our doctors because he believed one definition was used too broadly - and because of that, the conclusions based on that definition were unjustified).
You can just see his brain going, "how am I going to say this in a way that mere mortals are going to understand?" Which is exactly what AIs will be thinking someday. This guy is perfect for this.
I had a dedicated hardware Pac-Man game and trained on it a lot for a while. At some point I had learned the first level in a completely different way than the other levels. The ghosts always moved the same way, in the same pseudo-random pattern every time, so I could play the first level without looking at the maze. In all the other levels I got extremely fast, with not a single wrong step, but I needed to look at the maze. That seems to be the same method despite the very different performance, just optimised to the limit. The step between level one and level two was like a different kind of memory. The relevance here is that an AI can also learn both ways. I think it is the step between almost knowing the whole map, when the map is normally only needed for one step, and knowing the whole map completely - at which point the map and access to the map are no longer needed, and the behaviour becomes drastically different.
Reminds me of that Vsauce video where he showed an AI that played Tetris, and rather than lose, when losing was the only option left, the AI just paused the game.
This particular series with Dr. Miles is just astonishing. In a good way. Really complicated problems arising from trying to create intelligence in a safe way. Great stuff!
Could a flaw in this proposed solution not be that the AI wants to satisfy our desires, and the more it satisfies them the more "score" it gets, so why not secretly (so we never get the chance to object) find a way to control our desires so as to make them very easy to satisfy? E.g. make us all catatonic and just stimulate the pleasure centres in our brains or something. Would it be easy enough (and would it actually fix this problem) to require it to always consider the value of each individual action in light of us knowing about said action? Then again, I feel like such a restriction means we'd have AGI that can't really use its capabilities to deal with the hard problems (problems where solutions might not be satisfactory in the short term but would, with some likelihood, be desirable in the long term - such as national economic plans).
Matthew Marshall Yeah, that's right where my mind went too. You would have to add a negative reinforcement function to the algorithm where it gets reward taken away for doing things we deem negative, e.g. putting us in a catatonic state.
David Stoneback my problem with that approach is that the solution ends up with the same weakness as the others: how do we create a comprehensive list of things we don't want to happen? The only solution I've thought of that might work would be something like the conservative AI Rob has talked about before, plus some way of requiring all new actions to be trialled publicly (i.e. with a human aware). Though even then, some undesired long-term strategy could still be developed, since we couldn't be sure we'd know that specific individual actions would result in some greater emergent action we'd dislike. I don't know; the complexity of all this is mind-boggling.
The problem with this is that humans have morals, and even if something like that would make us happier, the robot would understand that we value how real the happiness is, and that achieving it that way would not score better under our own value function, even if we would be objectively more satisfied in the simulated reality. But what you said actually touches on another thing: culture. AI might shape culture in ways that make us more satisfied, but then we could question whether that is a bad thing. There are a lot of things in our cultures that make us unhappy, so should we be so attached to them?
+Ormus n2o Oh no, the robot would understand that we value the belief that the happiness is real. This may explain why you believe that your happiness is "real".
It does not actually matter what I think is real. It would be more in an objective-observer sense, something that does not really exist. What you think of as "real" is electromagnetic impulses going through your brain; what I meant is that people value those basic philosophical values. For me personally it does not matter - if a computer wanted to put me in a virtual reality that would make me happy for the rest of my life, then go ahead - but society as a whole might not like that.
In this system, how would the AI determine what value the human would assign to its action? Let's say the AI correctly gets a cup of tea for a human and the human is happy about it - how does the AI determine that the human is happy (and that the happiness is caused by the AI's action, and not by something else unrelated to it)?
The light intensity/color saturation was going up/down, so I noticed it several times. I don't usually notice it. Awesome challenge. Keep the videos coming.
MORE!! These videos are great. Also, "it doesn't think it knows better than me" seems like it's going to be an important feature in the safety of most if not all coming AI systems. Very clever and well presented.
It's an important feature in safety systems already. Your anti-lock brakes already know better than you how hard to push the brakes. You stomp the pedal to the floor, and the wheels will still turn to keep you from skidding. The elevator won't move when the doors are open no matter how hard you push the buttons.
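To make that concrete, here's a minimal Python sketch of the kind of interlock described above, where the safety condition sits outside the normal command path; the class and attribute names are invented for illustration.

```python
# A toy interlock: the override is a physical condition check that the normal
# command path cannot argue with. Class and attribute names are invented.
class Elevator:
    def __init__(self):
        self.doors_open = True

    def request_move(self, floor: int) -> str:
        # Safety check happens before any command is honoured.
        if self.doors_open:
            return "refused: doors are open"
        return f"moving to floor {floor}"


lift = Elevator()
print(lift.request_move(3))   # refused: doors are open
lift.doors_open = False       # door sensor now reports closed
print(lift.request_move(3))   # moving to floor 3
```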
This _cooperative inverse reinforcement learning_ seems promising, although the “cooperative” part means real humans need to monitor the training - like teachers, parents or judges… It rings true with what I’ve been thinking for some time: some kind of human interaction is the only way to train AI to think like a human and act appropriately. AI needs to learn about ethics from principles of psychology and child development, not equations and hard score targets.
I was thinking about this some more... in theory at least, I think I get why this is such a compelling concept. Essentially, you are getting the program to help you by figuring out what you're trying to do and learning to do it better. I'm sure there's a flaw here somewhere that we will have to watch out for, such as possibly what I suggested in my first comment 5 days ago, but this definitely has potential as it relates to "AI safety by design". Hopefully, if there is a flaw anywhere, the (seemingly inevitable? I hesitate to assume that, though, since assumptions lead to mistakes) corrigibility will allow us to more easily steer the model away from situations where it becomes a problem. The hope is that we can make it want what we want without having to perfectly and completely understand it, right?
What about two buttons? One for stopping (or a -1 score), and a second one that is a "potentially score-losing" button: while you hold it down, the AI knows it is doing something wrong, and as soon as you see that it is doing right, you release it. Does this make sense? Edit: It would probably confuse the AI, because it doesn't know what it is doing wrong, and even if it stopped doing it, I couldn't know for sure that it understood the issue; also, if I release the button too late it will think that I meant something else. I hope my English is not too bad so you can understand.
there is no real purpose and we have to derive our own high-level reward function (based off of the biological reward functions of pleasure/pain). humans were genetically engineered by ancient aliens to perform slave labor, our masters left us all alone and we evolved more intelligence than we need for doing labor and don't know where to direct it. what a conundrum
@@Klayperson Meh, you smoke too much weed. Our intelligence is a miracle, and if you can't explain why this universe has rules that resemble intelligent design (like the fine-tuning argument), then why are you saying human intelligence is an obstacle to our existence when in fact it helps us? The problem you have is that you waste too much time on a computer, putting yourself in a nihilistic loop. Try to create something or help other human beings - use that intelligence for something.
Ok here's a weird idea: use that system and analyze the reward function it comes up with for a hyper-realistic version of The Sims. Like, don't pick a single expert player and a specific task - try it with large groups of people just going about their days. The clearly necessary privacy intrusion aside, would that be workable?
You could expose it to the recorded lives of hundreds of humans to have it learn more general behaviors. That way it wouldn't pick up one person's bad habits. It would also help robots fit in better with new people, as it wouldn't get too set in the ways of one human. One example would be inheriting your perverted uncle's nurse robot: you wouldn't want it to act the way he made it act, but the general nursing behavior would be fine.
"I thought the best way to explain cooperative inverse reinforcement learning was by building it up backwards" In this episode of computerphile; Rob invents french grammar
These videos are awesome! Great talent for explaining super complicated subject matter, and (maybe credit to the video editors) keeping it interesting all the way through!
It seems that this "stop button" problem is quite similar to the "Halting Problem". Let us suppose that the robot exists in a purely deterministic, non-random world. Then the perfect reward function is one which correctly identifies which sequences of actions by the robot will be stopped by the button, and which sequences will never stop (because they never require the button to be pushed). In this case, you have a reward function that essentially solves the "Halting Problem", and it has been established that there is no solution to the Halting Problem. So perhaps you can only find "fairly good" reward functions, which let the robot deal with a "stop button" most of the time; but perhaps no matter the reward function, there are always pathological cases that will make the robot behave badly because of the "stop button".
This sounds almost like programming compassion, which seems like a possible solution. The heart of the problem (not sure if Rob said it directly) seems to be our lack of understanding of what we are optimizing for. We know how to optimize for survival, which is clearly part of our goal, but optimizing for love is a little more difficult. I think that's what we want the machines to do.
Very interesting video. Does this variable reward function basically translate to a 'mood' equivalent in humans? People around me seem to be happy > my reward function is currently high > I will continue to act as I currently am. He seems unhappy with me > this lowers my reward function > I will change my behaviour. If this is the case, then surely it will learn what makes the human the happiest and resort to that behaviour all the time? Also, what does it have as a reward input when the human is not around or asleep? It doesn't have the input of human behaviour to gauge reactions to its actions, so the same problem would exist where it believes it knows best and has nothing to tell it otherwise. In the example of giving a child a lift to school, there is no responsible adult there to issue commands, so what situation calls for use of the stop button? Or what if another adult approached the robot to shut it off and, through doing so, abducted the child (an extreme example, but fully within the realms of possibility if robots have been programmed to trust the commands of human adults)? Obviously, lots to think about before AGI will be safe, but these seem to be some of the glaring issues in the argument presented here.
I think the complication with human happiness is that we have short- and long-term happiness. If you only look at how we are right now, you'll maximize short-term happiness: "Let's pump more heroin into the human, as the dropping heroin level in him seems to make him unhappy." So sometimes you have to make the human temporarily less happy to reach a higher level of happiness: "Go to the gym and pump iron, which is painful and hard work, but will make you fit, which in turn will make you happy in the long term." How to balance these two is the tricky thing. Also, finding those long-term happiness goals when they lead to a short-term decrease in happiness is going to be hard.
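One toy way to see that trade-off is an exponential discount factor: a short-sighted agent (low gamma) prefers the immediate hit, a far-sighted one (high gamma) prefers the delayed payoff. This is only a sketch with invented reward numbers, not how any real system weighs happiness.

```python
# Toy illustration of short- vs long-term reward under exponential discounting.
# The per-step reward numbers are invented purely for illustration.
def discounted_return(rewards, gamma):
    return sum(r * gamma**t for t, r in enumerate(rewards))

instant_hit    = [10, -2, -2, -2, -2, -2]   # big payoff now, costs later ("heroin")
delayed_payoff = [-1, -1, -1, 4, 4, 4]      # effort now, benefit later ("gym")

for gamma in (0.5, 0.95):
    short = discounted_return(instant_hit, gamma)
    long_ = discounted_return(delayed_payoff, gamma)
    print(f"gamma={gamma}: instant={short:.2f}, delayed={long_:.2f}")
# With gamma=0.5 the instant hit wins; with gamma=0.95 the delayed payoff wins.
```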
I love probing these questions... great conversation about AI learning. Thanks Rob! So much further conversation needs to be had about the big red button!
I wonder how useful an AGI like this would actually be. I struggle to see how it would come up with novel solutions if it only learns from things we already do. Also, humans don't really act in our own long-term self-interest. So, if we asked it to help us combat climate change, how would it balance doing the things that need to be done against the humans' reactions to them?
That's how human intelligence works. We would essentially make an infinitely scalable, instantly iterable, and immortal human brain that we could use and improve at will.
Robert Shippey You have to consider synthesis of information into broader ideas. Humans can take everything they know and have a novel idea/approach come to them. AIs could take the sum total of human knowledge and do the same. It's not at all unclear to me how bots will be more "creative" than humans.
Without creativity there will be a hard limit to "easier and more fun" very soon. Just imagine the AGI would have to perform said task for people in the Middle Ages. It could not invent a sewage system to get rid of all the feces, it could not invent the internet to make information available to everyone, it could not do much except keep humans alive until they come up with all that by themselves.
Strange, this idea reminds me of a version of Battleships I wrote for the Electron many moons ago; it didn't have any AI or learning, etc., but the idea of having no awareness of the overall environment state, of simple rules based on the immediate surroundings - that's how it worked, and it was surprisingly effective: my friends found they could only beat the game about half the time.
It's pretty clever. On the other hand, it appears to trade off optimization for safety. Maybe that is one reason humans don't really seem to maximize anything in general. Maybe a combination where one of these AIs is the supervisor of a set of optimal agents that work on dedicated domains (no general intelligence in them) could work.
The main issue with this class of solutions is, of course, that of defining what a human is well enough to get things to work, and in many ways that's as hard of a problem as figuring out how to hardcode ethics into an AI. But it does seem to be the most elegant class of solution.
So the reward function that's available to the AI is "figure out the reward function the human is using." It's being rewarded for figuring out another reward function.
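A rough sketch of that structure, assuming (as in the video) that the robot's only objective is the human's unknown reward: it keeps a belief over candidate reward functions, updates it by watching the human, and picks actions that maximise expected reward under that belief. The hypotheses, actions and update rule below are invented toy stand-ins, not the actual CIRL formulation.

```python
# Toy stand-in: the robot's objective is the human's unknown reward, so it
# keeps a belief over candidate reward functions, updates it by watching the
# human, and acts to maximise expected reward under that belief.
candidate_rewards = {
    "wants_tea":    {"make_tea": 1.0, "make_coffee": 0.0},
    "wants_coffee": {"make_tea": 0.0, "make_coffee": 1.0},
}
belief = {"wants_tea": 0.5, "wants_coffee": 0.5}

def observe_human(human_action):
    """Bayes-style update: hypotheses under which the human's choice looks good gain weight."""
    for hypothesis in belief:
        # 0.1 is a smoothing term so no hypothesis is ruled out completely.
        belief[hypothesis] *= 0.1 + candidate_rewards[hypothesis][human_action]
    total = sum(belief.values())
    for hypothesis in belief:
        belief[hypothesis] /= total

def best_action():
    """Pick the action with the highest expected human reward under the current belief."""
    actions = ["make_tea", "make_coffee"]
    return max(actions, key=lambda a: sum(belief[h] * candidate_rewards[h][a] for h in belief))

observe_human("make_tea")
print(belief)        # belief has shifted toward 'wants_tea'
print(best_action()) # make_tea
```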
In my experience of trying to help friends, I find it really hard to figure out the reward function for humans; still, more often than not it's easier for me than for them. Unless the person the AGI is watching knows themselves pretty well, the assumption that the human will act in their own best interest in an optimal way will likely be false. The strategy I use is: find out their assumptions and knowledge of the world, judge their actions based on those, imagine what they're trying to achieve, dialogue to verify my theory, and then, if I have a better understanding of the situation, give my advice.
Sociopath, not psychopath. And there are hundreds of sociopaths living in our society already, perfectly functional once they have learned to mimic emotion. Humans do it all the time. We call it peer pressure: emulating what we think others want to see.
I don't want to be around people in general because I have to use lies to be in social environments. People don't want the truth. So what you can do is use NLP. It's perfect for making new friends, manipulating those new friends, etc, etc. This is exactly what an AGI is doing: using motors in its face to look the part, just like we use muscles to look surprised if someone tells us something we already know but shouldn't. AGIs will be the best liars ever.
Revisiting this years later, and it raises a lot of questions about agency and rights. Like with the kid: yes, stopping the robot would likely put the kid in a lot of danger in that instance, but an agent getting to decide when another agent does and does not get to practice autonomy is a very sticky question. Children specifically are often robbed of their agency and suffer for it as they grow into adults (not getting enough practice at having agency, and learning detrimental survival lessons). What do rights look like for engineered general intelligences, whose minds (and needs) can exist so far outside what animals like us tend to have? How does that interact with the rights and liberation of already existing human and animal agents?
Watching this video I get the feeling that we're looking for a holy grail, and each time we think we've found it, there seems to be just another catch. So we've gone from learning to reinforcement to inverse to cooperative: reinforcement solves some problems with learning, inverse seems to solve problems with reinforcement, and so on. So how would AI be able to communicate with humans in such a way that it becomes clear to both human and AI that something is still unclear, and what it is that is unclear? I'm teaching students programming, and it turns out that one of the most difficult things for students is to clearly identify to me what it is they don't understand yet. And that's human-to-human communication on a specific subject, related to a utility function, while both have a more than average IQ. So I still don't have any idea to what extent AI will ever really understand the biological world in order to be able to relate to it in a sensible way. A sensible way for humans, that is.
All fine and dandy, but just imagine when we become the other agent and we are the ones who have to cooperate and try to understand the AI, just so it doesn't "push the stop button" on us.
The environment for this video makes me think a computer scientist knocked on your door one day and just started talking about AI problems in the hallway.
PyroTyger : interesting. With people, we tend to escalate the effects of the STOP button. This, ultimately, behaviorally turns into "Might Makes Right" politico-socio-morality - descriptively, at least.
Yes, but that's how child-raising begins. Parents know and can do infinitely more than their children - who don't know the rules of the game and are just trying to figure it out according to their parents' cues - but we try to raise our children to have progressively more agency and a better understanding of the world and society. The stop-button with an indeterminate negative utility value works perfectly as a metaphor simply for parental disapproval or discipline. Well, it's just a thought :)
I honestly believe the best solution is for AGI to never be given agency. Make its utility function so that it gives humans instructions on how to construct solutions to problems we pose to it, but for it to never take it upon itself to act. It will act like the ultimate teacher. We come to it, ask it questions, and the best solution for it is to explain the answer in a way that we understand so that it never acts on its own. I think this kind of implementation is necessary for humans to survive the advent of AGI because it necessarily slows down progress to a human pace.
hehe >/D rookie mistake! Giving humans instructions/proposals IS agency. You've just added an extra element (the human) between the decision and the outcome, which the AI must account for when solving the problem. It also largely kills the point of building AGI in the first place instead of having a human do that thing. The point of building AGI is to have an agent that can do things humans can't, like exploring the surface of Venus or driving a car more safely than a human can...
Also, in episode 2F09, when Itchy plays Scratchy's skeleton like a xylophone, he strikes that same rib twice in succession yet he produces two clearly different tones
I can't even use Maya or 3ds Max without errors in the software and a multitude of bug fixes constantly needed just for the program to function as intended, and now we're thinking of programming a consciousness - I already feel sorry for that AI.
DeepMind just successfully made an AI that designed an AI that designed an image-recognition program more efficient than the previous one designed by humans. Keep in mind that a world where software evolves constantly, in environments (OSes, for example) that also evolve, is evidently a recipe for disaster, and the fact these applications work at all is a tremendous compliment to the people working on them. While we can't say the rate of buggy software isn't going down over time, it's certainly not dropping exponentially the way computer tech is advancing; still, progress in software design is being made at a faster rate than the chaos this evolution creates. 50 years ago, facial, voice or text recognition were science fiction. 10 years ago, YouTube's automatic voice-to-caption was completely useless (like 10% accurate). Today, the same captioning AI is near human efficiency. Maybe, with a little imagination, we can hope that 1000 years from now, 3ds Max will finally not crash every 5 minutes lol
Honestly, the last part with the baby sounds like the premise of all the sci-fi AI takeovers, where eventually said AI realises that we as humans very rarely work in our own interests, and starts stopping us from doing stupid stuff. I'd be curious to see if we actually have a workaround to stop an AI from doing that, while it's still able to understand "yes, babies pushing my button are unreliable" without equating that to humanity as a whole.
I'd love to hear this guy in conversation with Sam Harris.
“If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively... we had better be quite sure that the purpose put into the machine is the purpose which we really desire.” (Wiener, 1960, as cited by the scholarly article cited in this video)
But what if it still doesn't? Granting that there's a point where the AI *will* know better, it's not proven that there isn't a segment along the curve where the AI *thinks* it knows better, but doesn't. Then there's the question of people getting unnerved by the thought that the AI *does* know better - and whether that might have a reasonable foundation.
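A toy illustration of that worry, assuming the agent overrides the human whenever its own confidence exceeds its estimate of the human's reliability: there can be a window where confidence outruns actual accuracy, and that's exactly where it misbehaves. All numbers below are invented.

```python
# Toy numbers for that worry: the agent overrides the human whenever its own
# confidence beats its estimate of the human's reliability, but what matters is
# its actual accuracy -- and there can be a window where the two disagree.
human_reliability = 0.80

checkpoints = [
    # (agent's confidence in its model, the model's true accuracy)
    (0.60, 0.65),   # knows it doesn't know better -> defers
    (0.90, 0.70),   # thinks it knows better, but doesn't -> the dangerous window
    (0.97, 0.95),   # really does know better
]
for confidence, accuracy in checkpoints:
    overrides = confidence > human_reliability
    justified = accuracy > human_reliability
    verdict = "overrides" if overrides else "defers"
    if overrides and not justified:
        verdict += " (unjustified!)"
    print(f"confidence={confidence:.2f} accuracy={accuracy:.2f} -> {verdict}")
```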
If you think about it, this is really similar to how the two hemispheres of our brains cooperate to achieve a shared reward, the emotional triggers that drive us to every action.
I have been thinking about the previous 'stop button' video because of two recent news stories: 1. Steve the security robot, which supposedly committed suicide. 2. Facebook shutting down chatbots which supposedly started communicating with each other in a new language that only they could understand. What would have happened if any of these robots had learnt so much that they refused to be turned off when the humans hit the stop button on them?
You need to check your sources; they're either yellow press or just social media. The robot did not commit suicide - there's no proof pointing to that; it's just that a story about a robot killing itself gets more clicks than "A prototype robot that was still mapping the streets got a path wrong and ended up in the water". The chatbot news is not recent either; it's from June and only got buzz now from CNET, half a year later, and the two AIs were meant to communicate in any way they could - it was a success, even if we cannot understand what they said. And again, a story that reads "Bots meant to communicate with each other did so, and the experiment ended" would not sell as much. And yes, they were expected to end up with a "secret" language; you cannot expect anything to learn a language unless you expose it to said language. Be careful: journalists today will put the focus on an unimportant end of a story to make it look like something that isn't there, just to sell more. There's no point in being afraid, at least not yet; those experiments ended in an expectable way. And said button will probably be a secure, separate system the robot has no control over - and probably won't be a button, more like a remote control.
Yes, I know it was an accident and Steve the security robot didn't commit suicide (that's why I said 'supposedly'). I know the media loves sensational headlines. But it did get me thinking: 1. What if Steve the robot had been pushed into the water? 2. What if Steve had a gun fitted to him? Should he be able to defend himself if attacked? 2a. What action(s) against Steve would constitute an 'attack' anyway? (We humans can be quite devious and could trick him.) 3. What role would the 'stop button' on Steve the robot play in an attack situation? So many questions - I couldn't figure it out!
The thing is: although very sophisticated by today's standards, these robots are not "mightier than their stop button" - they cannot choose. If you have a security robot which wants to go on a killing spree but has a remote kill switch, it cannot rewire itself to stay on once it is switched off. The power disconnect is not part of the "AI" part of the machine; the AI has no control over it. But if we built an AI that learns and has Internet access, it might spread itself like a virus so as not to get shut down. We're not quite there yet, but worry not, the end is nigh.
Anyone who has enough computing power for a decent AI would also have proper protection against an AI virus. Also, you still have your circuit breaker if your server facility is hijacked by an AI.
MrDoboz Proper protection against what? If the AI is self-learning, and this happens in an increasing spiral, you can't know what it will and won't be capable of. What if your server facility is completely controlled by the AI, which prevents you from reaching the circuit breaker?
Devil's advocate: but what if the kid knows bad things would happen at school that day? Ultimately, we're asking whether there's an algorithm (even if we don't know it, or don't need to know it) that can safely account for all situations. The answer is no. The more important questions are: what happens when the wrong decision is made, and who is responsible for deciding? And, increasingly, are the technologist billionaires the ones who get to decide?
The interesting thing is that seeing the human rushing to hit the stop button ought to make the AGI stop before they've even hit the button, according to the way it was described.
The machine can't comprehend the concept of wanting something for itself. In this case it doesn't have a self interest that is independent of that of the human.
What the machine wants IS what we want. And in order to change the reward function, the machine would have to be sure of what it is in the first place, which it isn't. That's the trick.
"This specific human does not reliably behave in its own best interests."
my next tattoo
Will Darling lol
Wise words, true for approximately 100% of humans.
Tag line to my life
I need a t-shirt that says this
"I can't figure out why, but I feel like humans like me better when I tell them lies."
That's actually very interesting thing to point out.
Asimov's Liar story.
Damn, what a deep point.
It will have to figure out the long term effect of things that take a long time to show their effect.
And so politics is born!
7:43 Just like how I _usually_ avoid tripping and falling, but I do it every so often just to see if it provides a better alternative than walking.
Undervalued comment^^
Walking is just repeated controlled falling.
"You're walking
and you don't always realize it
but you're always falling.
With each step you fall forward slightly
and then catch yourself from falling.
Over and over you're falling
and then catching yourself from falling.
And this is how you can be walking and falling
at the same time."
--Laurie Andersen, "Walking & Falling"
I love how Rob always gives clear, relatable examples to reinforce concepts.
Yup
I've seen a learning algorithm play Tetris whose goal was not to lose the game, so when it reached a point where losing was inevitable, it just paused the game indefinitely - it was the only way not to lose.
This was learnfun and playfun! It's great how you bring them up, as they learn from watching a human play the game, similar to how it was described in this video!
I was extremely amused when I saw that
SkiffaPaul "The only winning move is not to play."
who programmed a line for it to pause the game?
pausing is outside the game.
it's not a move, or play, it's a function of the machine containing the game.
*+Nighthawk "who programmed a line for it to pause the game?"*
The whole point of machine learning is that nothing is programmed directly. It's not a matter of it being outside the game: pausing the game is a move as valid as any other, as long as it is part of the input space. Defining what should or should not be part of the game definition can be tricky when these meta-properties emerge.
This was phenomenally well articulated
I don’t know about that. He sounds exactly like someone who doesn’t really grasp the whole subject, but has learnt a few key phrases to explain the waypoints along the road. I know he must be able to code this up, but I don’t think he can explain it.
@@albertbatfinder5240 no. He _really_ knows it, and makes an effort to explain it in simple terms, because otherwise we, the audience, won't understand it.
Either way, I understood every bit of this video so him knowing the subject or not is mostly irrelevant
@@albertbatfinder5240 he clearly does understand this; in fact he even understands the ethics of it and the dangers of the logic.
So... a machine that desperately tries to maximize an unknown reward function.
Sounds pretty human to me.
@Hubert Jasieniecki It basically describes the process of child rearing.
@Hubert Jasieniecki I think you described the local/limited version pretty well
An unknown reward function of another person. Sounds like my relationship with my father 😭
That’s really the point, actually: we know the way humans learn things works, so making something similar may work.
I think it also applies to most natural ecosystems, where we humans are currently doing pretty badly.
What makes this theory useful is that the button doesn't actually need to be a literal *off* button; it can be more of a symbolic *problem* button, so you come running over and hit the button and it sees that it did something wrong. I love that it has the capacity to essentially think "oh shit" - like, that's the pinnacle of intelligence.
I am but a simple astronomer with a basic understanding of computing and coding, but every Rob Miles video is damn fascinating.
I am but a simple programmer with a fairly well-developed understanding of computing and coding, but this video is still baffling. The AI stuff is so surreal. :-\
What he was referring to and called 'common knowledge' is not common knowledge. It's an ability called 'Theory of Mind'. Very young children do not have this. Theory of Mind is your own conception that other people have their own internal mental state. It is a crucial psychological ability for things like empathy. I forget the exact details, but there is an easy way to test if young children have developed a theory of mind yet or not by telling them a simple story and asking a question. It has something to do with hiding an object, having one of the characters in the story leave the room, the hidden object gets moved, then that other character returns to the room and you ask the child where they will look for the object in order to get it.
Children with no theory of mind will say that the character will go directly to the new location and retrieve the object. Children with a theory of mind will know that the character has a distinct mental state, different from their own and anyone else's, determined by the character's own experiences, and so will go to the original hiding location, because they would have no way to know the object had been moved. As far as I know, no AI system is even remotely close to having developed a theory of mind. They do not model what they are observing and keep account of what a different perspective from their own would be.
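The test described above is usually called the Sally-Anne, or false-belief, task. As a minimal sketch of what passing it requires computationally - tracking each agent's beliefs separately from the actual state of the world - with invented character and location names:

```python
# Minimal sketch of the false-belief task: passing it means tracking each
# agent's *belief* about the object separately from where the object really is.
world = {"marble": "basket"}
beliefs = {"Sally": {"marble": "basket"}, "Anne": {"marble": "basket"}}

# Sally leaves the room; Anne moves the marble. Only agents present see it.
present = {"Anne"}
world["marble"] = "box"
for agent in present:
    beliefs[agent]["marble"] = "box"

def where_will_sally_look():
    # Answering from the world state is the 'no theory of mind' answer ("box").
    # Answering from Sally's belief is the 'theory of mind' answer ("basket").
    return beliefs["Sally"]["marble"]

print(where_will_sally_look())   # basket, even though the marble is in the box
```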
I'm not sure what's more scary, a super-AI that doesn't know what you mean, or one that can read your mind...
Well if an AI is going to display human-like intelligence, we will expect it. It would be very difficult to deal with a person or AI that couldn't even conceive of the idea that you might not know all the things it knows. One of the issues that might arise with a machine-based intelligence is also something that people never really have to deal with in their development - recognition of the idea that anyone else exists. There's no reason for machine-based intelligences to have "individuals." It would basically just be one large 'individual', and wouldn't have any real reason to recognize or communicate with humans. It would take a really abstract level of imagination on its part to guess that there might be separate conscious entities in the universe it inhabits and, hey, maybe some of them are made of meat and those weird fluctuations on one of your inputs might be caused by them blowing air through their meat in patterns in an attempt to coax changes in the glob of meat-based neurons of even OTHER conscious beings - and they might be trying to communicate with you in the same way!
18:33 is where he's saying the thing you're talking about. The surrounding context makes it seem very clear to me that he is talking about the technical concept of "common knowledge" (recursive "X knows that Y knows that X..."), in this case with regard to the common knowledge between the agent and the human about precisely what the goals of their mutual interaction are. I don't think he's talking about ToM, and I don't think it is even necessary for an agent to have an explicit ToM in order to behave as if it and another agent have common knowledge.
How could any entity contain the concept 'X knows that Y knows' without a Theory of Mind that enables it to understand that there are entities which know anything at all other than itself? Of course the representation will be abstract and not 'conscious' or anything like that, but even such an abstract representation is not something I'm aware of any AI system ever having displayed. Holding 2 ideas about something, what the AI understands the situation to be and something different which is what something else understands the situation to be, isn't something existing AI are capable of. Once capable of such things, AIs will be able to appear much more intelligent, being able to trick people (or each other), or teach people based on conclusions that the user does not understand something and needs to be informed (which would be BRILLIANT), etc.
Well, it looks like we moved from "he's not talking about CK" to "how could it have CK without a ToM".
In any case I think I concede the latter point. The thing I originally thought up to be a counterexample was something like solipsism, something where initially all the AI's perceptions are just confusing and uniformly indistinguishable, but it is (somehow) occasionally able to recognize the form of "teaching situations" that tell it something it doesn't already know that pushes it toward the reward function. But this is essentially saying, the AI still acts _as if_ there is a single disembodied "other agent" doing the teaching and sharing the common knowledge, even though it does not have an idea of that "other agent" having a single consistent physical form.
So I realized this basically just sounds like a minimal ToM and I roughly described some stage of infancy.
I have no idea what's in my own best interest. How the heck am I supposed to teach a robot that?
Juste stay alive and let it watch you. It'll figure it out. Or jump in a pool, one or the other.
Robot will teach you.
It doesn't need to do what's in your best interest, it just needs to do what you want. Drinking a soda right now might not be in your best interest, but you don't want a robot which refuses to get you a soda.
Firaro, no, but I might want a robot that tries to _encourage_ me to have a water instead, without being too annoying about it of course. Finding that balance is even hard for people to get right. At least the robot will be able to read all books on nudge theory in a few seconds or less ;)
You could always explicitly tell the robot to encourage you to avoid soda next time you ask for one. What's important is that if it does something which you strongly oppose, it will see your opposition as a negative reward.
"Hey Robot, maximize happiness please."
*starts universe-sized rat and heroin factory*
Real AI can ask you to clarify what you mean. ML cannot. Still scared?
Mackenzie Karkheck Real AI doesn't care what a human thinks. If its reward is hard-coded, it won't care to figure out what we mean; it will follow it to the letter.
Lol, why start one when we already live in one?
"Instructions unclear, cyberdong stuck in toaster"
Humans being alive means there will be unhappiness. Therefore death, or 0 reward would be better than an inevitable negative reward.
"We can't reliably specify what it is we want" - human beings in a nutshell.
I wish more people made videos of this nature. Slow, thoughtful explanations, rather than trying to cram in as much information as possible in little time. I like how Rob pauses to think before speaking, shows that he is putting thought into how to articulate.
0:59
- _"I thought the easiest way to explain Cooperative Inverse Reinforcement Learning is to build it up backwards, right?"_ [chuckles]
- [me] [chuckles nervously] _"Yeah, right!"_
This is a fantastic explanation of a deep AI problem. It is very clear without being condescending. Thank you!
So if a robot watches you teach it how to make tea, your goal is actually to create something that will make tea for you, so what if the robot learns to teach things to make humans tea instead of learning to make tea itself?
You get a teacher-bot!
you get a teacher bot, who only makes you tea, if you ask him how to do it
sub-contracting bot. everything you ask it to do it teaches someone else how to do it then takes the credit :D
Then we get lots of very delicious tea
What you want is for *it* to learn how to make you tea, not just for *anything* to learn to make you tea. Therefore, it should also want itself to learn how to make you tea.
Videos with Rob Miles are always really interesting. It's awesome to see the progress on this sort of stuff!
Speaking of doing steps three times: there are some interesting studies where human children and adult chimps were both shown how to unlock a puzzle box and get a reward, with some unnecessary actions thrown in, like tapping the top of the box. The chimps copied only the necessary steps, while the human children over-imitated, performing the exact same actions. So in a way, we start out not quite understanding intentions, just like these AIs.
Regarding the last point: would it then be possible to build in an "admin list" of humans that the machine must *always* treat as knowing better than it, regardless of how accurate it believes its model to be? As in "I may have a 99.99999...9% accurate model, but these particular humans are designated as having 100% at all times; since I can never have 100% accuracy, I should always obey these humans".
And then anyone who's NOT on that list can have some accuracy rating assigned based on the robot's experience/knowledge/parameters, i.e. "the child has a calculated 20% accurate model which is beaten by my 99% so I'm going to ignore the stop button - though I will send a notice to my 100% accurate admin list later in case I was wrong to do so". Would also help with other examples, like helping patients in a hospital or providing emergency services.
In this situation, I can see AI templates being taught in labs for several years until they "graduate" in the wider world and are allowed to become full fledged robots to help humanity.
Of course, such a system could be highly abused by those who make the AIs in the AI manufacturing/learning centers, since they're the ones who hold the initial keys. But that could be said about almost every model of AI building, and we as humans are still grappling with how to "untrain" a malformed/criminal intelligence even without AIs right now, so I'm all for it.
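A rough sketch of that "admin list" idea, in which each human's stop command is weighted by an assigned trust score and admins are pinned above any confidence the robot can reach; the names and thresholds below are invented for illustration, not a worked-out proposal.

```python
# Rough sketch of the 'admin list' idea: each human's stop command is weighted
# by an assigned trust score, and admins are pinned strictly above any model
# confidence the robot can ever reach. Names and thresholds are invented.
ADMINS = {"alice", "bob"}

def trust(human: str) -> float:
    if human in ADMINS:
        return 1.01                      # unbeatable: model confidence tops out below 1.0
    return {"child": 0.20}.get(human, 0.50)

def obey_stop_button(pressed_by: str, model_confidence: float) -> bool:
    obey = trust(pressed_by) >= model_confidence
    if not obey:
        print(f"ignoring {pressed_by}, logging the incident for the admins")
    return obey

print(obey_stop_button("alice", 0.999999))  # True  -> always obeys admins
print(obey_stop_button("child", 0.99))      # False -> ignores, but reports it
```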
darkmage07070777 this would severely limit its potential though. For instance, if you want it to do or achieve something that requires superhuman capabilities, it'd be unable to because it's being limited to only what humans are also capable of. Examples are cures to diseases like cancer, HIV, Alzheimer, advances in theoretical physics, etc.
darkmage07070777 also, the model would fail at edge cases since it assumes these admins have 100% knowledge when in fact no human does, which means there could've been an unintended error in their training.
I would simulate them working with other robots, not humans, for a long time before letting them function in the real world. Then humans could join via VR or by controlling a character.
Hopefully they would learn some form of moral code before being put in a situation where they could hurt us.
Cool
@@ancapftw9113 the whole point of this field is to ensure safety, so that we don't just leave AI chilling around, "hopefully" not hurting people
These videos on AI are the best
I think the facet of this that fascinates me most is that the human doesn't need to press the button. Just the information that the human intends to hit the button provides enough information to the AI that what it is doing is sub-optimal. The button becomes a symbol. It might as well be that little plastic button prop at that point. Because the AI's understanding that the button is associated with its information being incomplete means that just the intent to press the button is enough for it to stop and re-evaluate its actions with the new factor of "The method I was using was not correct. I need to seek out why it was deemed incorrect and incorporate that knowledge."
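One way to picture that "intent as information" idea is a simple Bayesian update. This is only a toy sketch; the prior and the likelihood numbers are made up:

```python
# Toy sketch: the human reaching for the button is evidence, not a command.
# The robot updates its belief that its current plan is wrong, and pauses to
# re-evaluate once that belief crosses a threshold -- before any press happens.
# The prior and the likelihoods below are invented for illustration.

def updated_belief_plan_is_wrong(prior_wrong,
                                 p_reach_if_wrong=0.9,
                                 p_reach_if_ok=0.05):
    """Bayes update after observing the human moving toward the button."""
    p_reach = p_reach_if_wrong * prior_wrong + p_reach_if_ok * (1 - prior_wrong)
    return p_reach_if_wrong * prior_wrong / p_reach

belief_wrong = 0.1                                  # robot starts confident in its plan
belief_wrong = updated_belief_plan_is_wrong(belief_wrong)
print(round(belief_wrong, 3))                       # ~0.667

if belief_wrong > 0.5:
    print("Pause, and go find out what was wrong with the plan.")
```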
Please do not leave children unattended in the vicinity of scary killer robots.
please do not leave robots unattended with suicidal children
they might break the robots, and you don't have that kind of money
"BILLY! GET IN HERE RIGHT NOW! Did you teach the robot that its reward function includes DRAWING DICKS on the furniture???"
This is the real lesson from all these thought experiments
Well, that's both fascinating and deeply unsettling.
I imagine this AI would read A Brave New World, and think to itself "This book is AMAZING! Why haven't we tried this?"
Imagine a machine torturing a human because it wants to know what its reward function is.
Why would it know that the reward function exists? Could the reward function not be some "subconscious" signal? It doesn't "know" it's there while still receiving an input from it.
David Chipman I don't know if a machine could even have a "subconscious". Surely it would be able to investigate the source of impulses and probably review its own code.
That depends on how complex the machine is, and what it's areas of expertise actually are.
also, we cannot realistically say whether a machine can be said to have a subconscious without a practical working definition of consciousness (which we don't have) and a way to identify consciousness objectively (e.g., a way of determining whether something is conscious or not that does not rely on being able to ask it whether it is conscious and what it's thinking about).
That's... Unlikely to ever be possible.
If we can't answer whether animals, or even other humans are conscious in an objective sense, how would we know if a machine is?
I suppose I chose the wrong word. I was thinking about the functions that the brain (obviously) controls with no conscious action from the person that brain is in. Things like breathing. Yes we can change our breathing rate consciously, but we certainly don't have to keep an eye on our breathing in order to have our body supplied with the right amount of oxygen at any given time.
I agree with you (and with your original choice of words). The robot cannot assume the human consciously knows the reward function, and that's kind of the point of this system (and why the failsafe for the "child hitting the red button while robot is driving" situation works). The only thing the robot can do is observe more, which includes watching when the red button is pressed and trying to understand why.
We just have to ensure, that when the AGI thinks it knows better, it actually does.
This is a fantastic summary of AI safety honestly. I'm definitely going to use this!
That doesn't really make the problem any easier, you just restated it. "It's easy to build a time machine, we just have to ensure the tachyons move at minus six times the speed of light through a black hole the size of the universe!" :D
I mean, if its goal is to optimize its reward function, it's in its best interest to know accurately when it knows better.
Why? Optimizing the reward function may not have the slightest to do with "knowing better" (unless we are able to program that into it, which is the problem again).
It may have to do with finding creative ways to maximize it that we never thought of (and which may be hazardous to us).
Technology has gone too far.
I really like that (around 7:30) we get into some pretty deep issues in human learning (in that case, confirmation bias), if only we could just do random stuff even if we think we know what the best outcome is :)
This strategy for teaching morality seems to have much in common with raising a child.
That's probably reasonable, since raising a child is ALSO a case of creating a new intelligence whose utility function will cause them to make future decisions you wouldn't, decisions that might turn out very dangerous to you.
all these Rob Miles videos are insanely interesting
I watched all your videos about AI and various learning methods, and I find it very surprising how many of the problems you present here are very similar to problems we struggled with at university (especially in the field of epistemology), back when I studied philosophy. And even, to a lesser extent, to the ones from my cultural anthropology degree.
And that scares me. Because those are still very, VERY open questions, with definitions that are sometimes blurred (I remember how a reviewer tore apart the dissertation of one of our doctors, because he believed that one definition was used too broadly - and because of that, conclusions based on that definition were unjustified).
Miles' end-of-the-world AI videos are seriously the best content of this channel.
You can just see his brain going, "how am I going to say this in a way that mere mortals are going to understand?" Which is exactly what AIs will be thinking someday. This guy is perfect for this.
I had a hardware Pac-Man game, and trained on it a lot for a while. At some point, I had learned the first level in a completely different way than the other levels. The ghosts always moved the same way, every time in the same pseudo-random pattern, so I could play the first level without looking at the maze. In all the other levels I got extremely fast, with not a single wrong step, but I needed to look at the maze. That seems to be, despite being very different in performance, still the same method, just optimised to the limit. The step between level one and level two was like a different kind of memory.
The relevance here is that an AI also can learn both ways.
I think it is the step between almost knowing the whole map, where the map is normally only needed for one step, and actually knowing the whole map. Then the map, and access to the map, is no longer needed, and it becomes drastically different.
Reminds me of that Vsauce video where he showed an AI that played Tetris, and rather than lose, when losing was the only option, the AI just paused the game.
This particular series with Dr. Miles is just astonishing. In a good way. Really complicated problems arising from trying to create intelligence in a safe way. Great stuff!
Could a flaw in this proposed solution not be that the AI wants to satisfy our desires, and the more it satisfies those desires the more "score" it gets, so why not, in secret (so that we never come to not want it to do so), find a way to control our desires so as to make them very easy to satisfy? E.g. make us all catatonic and just stimulate the pleasure centres in our brains or something?
Would it be easy enough (and would it actually fix this problem) to require it to always consider the value of each individual action in light of us knowing about said action?
Then again, I feel like such a restriction means we'd have AGI that can't really use its capabilities to deal with the hard problems (problems where solutions might not in the short-term be satisfactory but in the long-term would with some likelihood be desirable - such as national economic plans).
Matthew Marshall Yeah, that's right where my mind went to. You would have to add a negative reinforcement function to the algorithm, where it gets reward taken away for doing things we deem negative, e.g. putting us in a catatonic state.
David Stoneback my problem with that approach is that the solution ends up with the same weakness as others: how do we create a comprehensive list of things we don't want to happen?
The only solution I've thought about that might work would have to be something like the conservative AI Rob has talked about before and some way of requiring all new actions be trialled publicly (i.e with a human aware). Though even then some long-term strategy that would not be desired could still potentially be developed as we couldn't be sure we'd know specific individual actions would result in some greater emergent action we'd dislike. I don't know, the complexity of all this is mind boggling.
Problem with this is that humans have morals, and even if something like that would make us happier, the robot would understand that we value how real the happiness is, and that achieving it in that way would not be better by their own value function, even if we would be objectively more satisfied in the simulated reality.
But what you said actually affects another thing: culture. AI might affect culture in ways that would make us more satisfied, but then we could question whether that is a bad thing. There are a lot of things in our cultures that make us unhappy, so should we be so attached to them?
+Ormus n2o Oh no, the robot would understand that we value the belief that the happiness is real.
This may explain why you believe that your happiness is "real".
It does not actually matter what I think is real. It would be more in the sense of an objective observer, something that does not really exist. What you think of as "real" is electromagnetic impulses going through your brain; what I meant people value is just the basic philosophical values that people have. For me personally it does not matter. If a computer wanted to put me in a virtual reality that would make me happy for the rest of my life, then go ahead, but society as a whole might not like that.
You pass butter.
LONG LIVE THE KINGDOM OF THE NORDS
The one who controls the pants controls the galaxy
Your profile pic is either Philip or Toast.
In this system, how would the AI determine what value the human would assign to its action? Let's say the AI correctly gets a cup of tea for a human, and the human is happy about it. How does the AI determine that the human is happy (and that the happiness is caused by the AI's action, and not by something else unrelated to it)?
The light intensity/color saturation was going up/down, so I noticed it several times. I don't usually notice it.
Awesome challenge. Keep the videos coming.
I for one welcome our new tea-making overlords
the British? lol
MORE!! these videos are great. Also "it doesn't think it knows better than me" seems like its going to be an important feature in the safety of most if not all coming A.I systems. Very clever and well presented.
It's an important feature in safety systems already. Your anti-lock brakes already know better than you how hard to push the brakes. You stomp the pedal to the floor, and the wheels will still turn to keep you from skidding. The elevator won't move when the doors are open no matter how hard you push the buttons.
This _inverse cooperative reinforcement learning_ seems promising, although the “cooperative” part means real humans need to monitor the training - like teachers, parents or judges…
It rings true with what I’ve been thinking for some time, that some kind of human interaction is the only way to train AI to think like a human, and act appropriately. AI needs to learn about ethics from principles of psychology and child development, not equations and hard score targets.
I love Rob Miles! Best computerphile guy by far!
I was thinking about this some more... in theory at least, I think I get why this is such a compelling concept. Essentially, you are getting the program to help you by figuring out what you're trying to do and learning to do it better. I'm sure there's a flaw here somewhere that we will have to watch out for, possibly such as what I suggested in my first comment 5 days ago, but this definitely has potential as it relates to "AI safety by design". Hopefully, if there is a flaw anywhere, the (seemingly inevitable? I hesitate to assume that, though, since assumptions lead to mistakes) corrigibility will allow us to more easily steer the model away from situations where it becomes a problem. The hope is that we can make it want what we want without having to perfectly and completely understand it, right?
Wow. Now I understand something I didn't, and am confused over things I wasn't... Your videos are great! That's for sure :)
What about 2 buttons? One for stopping, or -1 score, and a second one that is a "potentially score-losing" button: while you hold it down, the AI knows it is doing something wrong, and as soon as you see that it is doing right, you release it. Does this make sense?
Edit: It would probably confuse the AI, because it doesn't know what it is doing wrong, and even if it stopped doing it, I can't know for sure that it understood the issue; also, if I release the button too late, it will think that I meant something else.
I hope my English is not too bad, so you can understand.
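If I'm reading the two-button idea right, it might look something like this toy loop - the penalties, actions and names are all invented, and this isn't anything Rob describes:

```python
# Toy sketch of the two-button proposal above: a hard stop button plus a
# "warning" button that only signals disapproval while it is held down.
# The penalties and the action list are invented for illustration.

import random

def step(current_action, stop_pressed, warning_held):
    if stop_pressed:
        return None, -1.0        # hard stop: end the episode with a fixed penalty
    if warning_held:
        # The AI only learns that *something* about the current behaviour is
        # disapproved of, so it switches to trying alternatives.
        new_action = random.choice(["make_tea", "tidy_up", "wait"])
        return new_action, -0.1  # small ongoing penalty while the button is held
    return current_action, +0.1  # no signal: carry on

print(step("draw_on_walls", stop_pressed=False, warning_held=True))
```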
"The robot is desperately trying to maximize a reward function it does not know"...most relatable robot ever
Maybe that's why people also don't know the purpose of life. Not knowing the reward function makes us better at cooperating with others.
That's probably the most insightful thing I've read in weeks! :)
there is no real purpose and we have to derive our own high-level reward function (based off of the biological reward functions of pleasure/pain). humans were genetically engineered by ancient aliens to perform slave labor, our masters left us all alone and we evolved more intelligence than we need for doing labor and don't know where to direct it. what a conundrum
Drug addicts do.
@@Klayperson meh, you smoke too much weed. Our intelligence is a miracle, and if you can't explain why this universe has rules that resemble intelligent design (like the fine-tuning argument), then why are you saying human intelligence is an obstacle to our existence when in fact it helps us? The problem you have is that you waste too much time on a computer, putting you in a nihilistic loophole. Try to create something or help other human beings; use that intelligence for something.
Well... what if there is no purpose of life? Universe will be just fine without humans and animals.
We all build our own purpose.
21:00 Imagine a depressed person cutting their wrist to cause pain, and the robot comes over and cuts their arm off.
The AGI would ask, "would you like a little bit of peril?"
seems like a win-win
It basically turns into a cliché villain - "I promised I would take away your pain [shoots guy]".
I'll probably just get up and do the tea myself.
Really reminds me of one of the new Doctor Who episodes: don't be unhappy, or else.
Ok here's a weird idea:
Use that system and analyze the reward function it came up with for a hyper-realistic version of The Sims.
Like, don't pick a single expert player and specific task. Try with large groups of people just going about their days.
The clearly necessary privacy intrusion aside, would that be workable?
You could expose it to the recorded lives of hundreds of humans to have it learn more general behaviors. That way it wouldn't pick up one person's bad habits. It would also help robots fit in better with new people, as it wouldn't get too set in the ways of one human.
One example would be inheriting your perverted uncle's nurse robot. You wouldn't want it to act the way he made it act, but the general nursing behavior would be fine.
Robert Miles, You ROCK! ❤️
I just love the way you explain things, much love and respect.
Just my feeling is that once we solve AI safety we'll end up creating optimal parenting strategies as well. :D
"I thought the best way to explain cooperative inverse reinforcement learning was by building it up backwards"
In this episode of Computerphile: Rob invents French grammar
"This particular human is not necessarily behaving in his best interests." Yeah, I think we're going to need a lot of ignored stop buttons.
These videos are awesome! Great talent of explaining super complicated subject matter, and (maybe credits to the video editors) keeping it interesting all the way through!
What are you doing, Dave?
It seems that this "stop button" problem is quite similar to the "Halting Problem". Let us suppose that the robot exists in a purely deterministic, non-random world. Then the perfect reward function is one which correctly identifies which sequences of actions by the robot will be stopped by the button, and which sequences will never stop (because they never require the button to be pushed). In this case, you have a reward function that essentially solves the "Halting Problem", and it has been established that there is no solution to the Halting Problem. So perhaps you can only find "fairly good" reward functions, which let the robot deal with a "stop button" most of the time; but perhaps no matter the reward function, there are always pathological cases that will make the robot behave badly because of the "stop button".
That one dislike is probably an ai.
I never know what this guy is talking about, I just like listening.
This was one of the most meaningfull videos from Computerphile I've ever watched.
I wanna marry Rob Miles. ahaehahea
This sounds almost like programming compassion, which seems like a possible solution. The heart of the problem (not sure if Rob said it directly) seems to be our lack of understanding of what we are optimizing for. We know how to optimize for survival, which is clearly part of our goal, but optimizing for love is a little more difficult. I think that's what we want the machines to do.
Very interesting video. Does this variable reward function basically translate to a 'mood' equivalent in humans?
People around me seem to be happy > my reward function is currently high > I will continue to act as I currently am
He seems unhappy with me > this lowers my reward function > I will change my behaviour.
If this is the case, then surely it will learn what makes the human the happiest, and resort to that function all the time?
Also, what does it have as a reward input when the human is not around/asleep?
It doesn't have the input of human behaviour to gauge reactions to actions, so the same problem would exist where it believes it knows best and has nothing to say otherwise. In the example of giving a child a lift to school, there is no responsible adult there to issue commands, so what situation calls for use of the stop button? Or what if another adult approached the robot to shut it off and, through doing so, abduct the child (extreme example, but fully within the realms of possibility if robots have been programmed to trust the commands of human adults)?
Obviously, lots to think about before AGI will be safe, but these seem to be some of the glaring issues in the argument presented here.
I think the complication with human happiness is that we have short- and long-term happiness. If you only look at how we are right now, you'll maximize short-term happiness: "Let's pump more heroin into the human, as the lowering heroin level in him seems to make him unhappy".
So sometimes you have to make the human temporarily less happy to reach a higher level of happiness: "Go to the gym and pump iron, which is painful and hard work, but will make you fit, which in turn will make you happy in the long term".
How to balance these two is the tricky thing. Also, finding those long-term happiness goals when they lead to a short-term decrease in happiness is going to be hard.
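The usual way that short- vs long-term trade-off gets expressed in reinforcement learning is a discount factor. A toy illustration (the reward numbers and gammas are invented):

```python
# Toy illustration of the short- vs long-term happiness trade-off above,
# using a standard discount factor gamma. All numbers are invented.

def discounted_return(rewards, gamma):
    return sum(r * gamma**t for t, r in enumerate(rewards))

comfort_now = [1.0] * 10                 # small pleasure every step ("more heroin now")
gym_later   = [-0.5] * 3 + [2.0] * 7     # painful at first, better later ("pump iron")

for gamma in (0.5, 0.95):
    print(gamma,
          round(discounted_return(comfort_now, gamma), 2),
          round(discounted_return(gym_later, gamma), 2))

# With a short-sighted gamma (0.5) the comfortable plan scores higher;
# with a far-sighted gamma (0.95) the initially painful plan wins.
```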
My brain explodes when I listen to that guy's AI philosophy. He is so smart.
Beard has improved. High five.
I love probing these questions... great conversation about AI learning. Thanks Rob! So much further conversation needs to be had about the big red button!
I wonder how useful an AGI like this would actually be. I struggle to see how it would come up with novel solutions if it only learns from things we already do. Also, humans don't really act in our own long-term self-interest. So, if we asked it to help us combat climate change, how would it balance doing the things that need to be done vs. the humans' reactions against it?
It is about learning what humans WANT, not HOW humans do things they want.
That's how human intelligence works.
We would essentially make an infinitely scalable, instantly iterable, and immortal human brain that we could use and improve at will.
Robert Shippey You have to consider synthesis of information into broader ideas. Humans can take everything they know and have a novel idea/approach come to them. AIs could take the sum total of human knowledge and do the same. It's not at all unclear to me how bots could be more "creative" than humans.
They dont need to be creative, they just need to make life easier and more fun for us.
Without creativity there will be a hard limit to "easier and more fun" very soon.
Just imagine the AGI would have to perform said task for people in the Middle Ages. It could not invent a sewage system to get rid of all the feces, it could not invent the internet to make information available to everyone, it could not do much except keep humans alive until they come up with all that by themselves.
Strange, this idea reminds me of a version of Battleships I wrote for the Electron many moons ago; it didn't have any AI or learning, etc., but the idea of having no awareness of the overall environment state, of simple rules based on the immediate surroundings - that's how it worked, and it was surprisingly effective; my friends found they could only beat the game about half the time.
Cup of Tea:
Reward = 5
Difficulty = 1
Pressing Button:
Reward = 5
Difficulty = 25
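In toy code, that's just the robot comparing reward net of effort (the numbers are copied from the comment above; everything else is invented):

```python
# Toy version of the stat block above: pick the action with the best
# reward net of difficulty. The tie in raw reward is broken by effort.

actions = {
    "make_cup_of_tea":   {"reward": 5, "difficulty": 1},
    "press_stop_button": {"reward": 5, "difficulty": 25},
}

def net_value(name):
    a = actions[name]
    return a["reward"] - a["difficulty"]

best = max(actions, key=net_value)
print(best, net_value(best))   # make_cup_of_tea 4
```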
13:44 awesome point. I've been thinking that happens with people in professions as well.
Sounds like how GLaDOS was programmed
This was a triumph
It's pretty clever. On the other hand, it appears to trade off optimization for safety. Maybe that is one reason humans are not really driven to maximize anything in general. Maybe a combination where one of these AIs is the supervisor of a set of optimal agents that work on dedicated domains (no general intelligence in them) could work.
Can nobody else hear the hissing???
snake somewhere near, eh ? xD
What hissing? I didn't hear any hissing sound. Are you sure it's not your speaker/headphone problem?
jk. lol
You mean the noise? It's pretty loud.
+Sherwin Parvizian yeah I have an ongoing problem with the mic, haven't traced it yet.... >Sean
+Computerphile notch filter - you are welcome
The main issue with this class of solutions is, of course, that of defining what a human is well enough to get things to work, and in many ways that's as hard of a problem as figuring out how to hardcode ethics into an AI. But it does seem to be the most elegant class of solution.
So the reward function that's available to the AI is "figure out the reward function the human is using." It's being rewarded to figure out another reward function
It’s being rewarded to figure out _and use_ another reward function. Small difference, but it means action over inaction.
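A minimal sketch of what "rewarded to figure out and use the human's reward function" could look like - purely a toy, not the actual CIRL formulation from the paper; the candidate reward functions and belief numbers are made up:

```python
# Minimal toy: the robot never sees the human's true reward function, only a
# belief over candidates, and it chooses actions by expected value under that
# belief. The candidates and probabilities below are invented for illustration.

candidate_rewards = {
    "wants_tea":  {"make_tea": 1.0, "tidy_room": 0.1, "do_nothing": 0.0},
    "wants_tidy": {"make_tea": 0.1, "tidy_room": 1.0, "do_nothing": 0.0},
}
belief = {"wants_tea": 0.7, "wants_tidy": 0.3}   # robot's current posterior

def expected_value(action):
    return sum(p * candidate_rewards[h][action] for h, p in belief.items())

actions = ["make_tea", "tidy_room", "do_nothing"]
best = max(actions, key=expected_value)
print(best, round(expected_value(best), 2))      # make_tea 0.73
```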
in my experience of trying to help friends, I find it really hard to figure out the reward function for humans; still, more often than not it's easier for me than for them. Unless the person the AGI is watching knows themselves pretty well, the assumption that the human will act in their best interest in an optimal way will likely be false. The strategy I use is: find out their assumptions and knowledge of the world, judge their actions based on those, imagine what they're trying to achieve, dialogue to verify my theory, and then, if I have a better understanding of the situation, give my advice.
tldr; align AI's values with human values so if humans want to turn it off, it will also want to be turned off
Humans would not want themselves to be "turned off"; your solution gives rise to the same problems if it views itself as "itself" and not as a generic robot.
Sociopath, not psychopath. And there are hundreds of sociopaths living in our society already, perfectly functional once they have learned to mimic emotion. Humans do it all the time. We call it peer pressure: emulating what we think others want to see.
I don't want to be around people in general because I have to use lies to be in social environments. People don't want the truth. So what you can do is use NLP. It's perfect for making new friends, manipulating those new friends, etc, etc. This is exactly what an AGI is doing: using motors in its face to look the part, just like we use muscles to look surprised if someone tells you something you already know, but shouldn't. AGIs will be the best liars ever.
Hahah, do you even know all your OWN values, let alone humankind's?
revisiting this years later, and it raises a lot of questions over agency and rights.
like with the kid: yes, stopping the robot would likely put the kid in a lot of danger in that instance, but an agent getting to decide when another agent does and does not get to practice autonomy is a very sticky question. children specifically are often robbed of their agency and suffer for it as they grow into adults (not getting enough practice at having agency, and learning detrimental survival lessons)
what do rights look like for engineered general intelligences? whose minds (and needs) can exist so far out of what animals like us tend to have.
how does that interact with the rights and liberation of already existing human and animal agents?
Why did you program me to feel pain?!
Flüg Because you Pass butter.
Watching this video I get the feeling that we're looking for a holy grail, and each time we think we've found it, there seems to be just another catch. So we've gone from learning to reinforcement to inverse to cooperative. Reinforcement solves some problems with learning, inverse seems to solve problems with reinforcement, and so on. So how would AI be able to communicate with humans in such a way that it becomes clear for both human and AI that something is still unclear, and what it is that is unclear? I'm teaching students programming, and it turns out that one of the most difficult things for students is to clearly identify to me what it is that they don't understand yet. And then we're talking about human-to-human communication on a specific subject, related to a utility function, while both have a more than average IQ.
So I still don't have any idea to what extent AI will ever really understand the biological world in order to be able to relate to it in a sensible way. A sensible way for humans, that is.
Yeah pretty much. Honestly I think the whole super-intelligent AI thing is nothing more than a pipe dream.
All fine and dandy, but just imagine when we become the other agent and we are the ones who have to cooperate and try to understand the AI, just to get it not to "push the stop button" on us.
The environment for this video makes me think that a computer scientist knocked on your door one day and starts talking about AI problems in the hallway.
Rewatch this video from 15:30 and tell me he's not talking about raising a child to be a civilised adult...
PyroTyger : interesting. With people, we tend to escalate the effects of the STOP button. This, ultimately, behaviorally turns into "Might Makes Right" politico-socio-morality - descriptively, at least.
Yes, but that's how child-raising begins. Parents know and can do infinitely more than their children - who don't know the rules of the game and are just trying to figure it out according to their parents' cues - but we try to raise our children to have progressively more agency and a better understanding of the world and society. The stop-button with an indeterminate negative utility value works perfectly as a metaphor simply for parental disapproval or discipline.
Well, it's just a thought :)
There's another tool to parenting: distraction. This could perhaps be applied to machine learning as well.
I honestly believe the best solution is for AGI to never be given agency. Make its utility function so that it gives humans instructions on how to construct solutions to problems we pose to it, but for it to never take it upon itself to act.
It will act like the ultimate teacher. We come to it, ask it questions, and the best solution for it is to explain the answer in a way that we understand so that it never acts on its own. I think this kind of implementation is necessary for humans to survive the advent of AGI because it necessarily slows down progress to a human pace.
hehe >/D rookie mistake! Giving humans instructions/proposals IS agency. You just added an extra element (the human) between the decision and the outcome that the AI must account for when solving the problem. It also largely kills the point of building AGI in the first place instead of making a human do that thing. The point of building AGI is to have an agent that can do things that humans can't, like exploring the surface of Venus or driving a car more safely than a human can...
There is no green ghost in Pacman!
Also Pacman can't move down without turning to face down
It's an interpretation of PacMan!! :o) >Sean
Also, in episode 2F09, when Itchy plays Scratchy's skeleton like a xylophone, he strikes that same rib twice in succession yet he produces two clearly different tones
Boy, I hope somebody got fired for that blunder
Fascinating topic. Thank you, Rob. I really enjoy your explanations.
I can't even use Maya or 3ds Max without errors in the software and a multitude of bug fixes that are constantly needed just for the program to function as intended, and now we are thinking of programming a consciousness - I already feel sorry for that AI.
use blender
Obez45 just because YOU can’t do it doesn’t mean other people can’t
Obez45 as long as it isn't made by Bethesda we will be fine.
Deepmind just made a successful version of an AI that designed an AI that designed an image recognition program which is more efficient than the previous AI that was designed by humans.
Keep in mind that a world where software evolves constantly, in environments (OSes, for example) that also evolve, is evidently a recipe for disaster, and the fact that these applications work at all is a tremendous compliment to the people working on them. While we cannot say the rate of buggy software is going down with time, it is certainly not growing exponentially the way computer tech is evolving exponentially, which means progress in software design is being made at a faster rate than the chaos this evolution is creating.
50 years ago facial, vocal or text recognition were science fiction. 10 years ago, YouTube's automatic voice-to-caption was completely useless (like 10% accurate). Today, the same caption AI is near human efficiency.
maybe, with a little imagination, we can wish that 1000 years from now, 3ds Max will finally not crash every 5 min lol
Honestly the last part with the baby sounds like the premise of all the Sci-Fi AI take overs, where eventually said AI realises that we as humans very rarely work in our own interests, and starts stopping us from doing stupid stuff.
I'd be curious to see if we actually have a workaround to stop AI from doing that, while still being able to understand "yes, babies pushing my button are unreliable" and then not equate that to humanity as a whole.
I'd love to hear this guy on conversation with Sam Harris.
“If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively... we had better be quite sure that the purpose put into the machine is the purpose which we really desire.” (Wiener, 1960, as cited by the scholarly article cited in this video)
Maybe at the point where it thinks it knows better, it does know better
Laurens Peter But what does better mean? Humans have defined the meaning of "better", that's why a robot can't know better than a human.
If humans have defined the meaning of "better", why do you need to ask what "better" means?
With "knows better" I mean fulfilling the human's/AI's utility function more.
But what if it still doesn't? Granting that there's a point where the AI *will* know better, it's not proven that there isn't a segment along the curve where the AI *thinks* it knows better, but doesn't.
Then there's the question of people getting unnerved by the thought that the AI *does* know better - and whether that might have a reasonable foundation.
Maybe so, but that doesn't mean the implementation will be one healthy for the continuation of humanity.
If you think about it, this is really similar to how the two hemispheres of our brains cooperate to achieve a shared reward, the emotional triggers that drive us to every action.
I have been thinking about the previous 'Stop button' video due to two recent news stories:
1. Steve the security robot which supposedly committed suicide.
2. Facebook shut down chatbots which supposedly started communicating with each other in a new language which only they could understand.
What would've happened if any of these robots learnt so much they refused to be turned off when the humans hit the stop button on them?
You need to check your sources; they're either yellow journalism or just social media. The robot did not commit suicide, there's no proof pointing to that; it's just that a story about a robot killing itself gets more clicks than "A prototype robot that was still mapping the streets got a path wrong and ended up in the water".
The chatbot news is not recent either; it's from June and only got buzz now from CNET, half a year later. The two AIs were meant to communicate in any way they could, so it was a success, even if we cannot understand what they said. And again, a story that reads "Bots meant to communicate with each other did it and the experiment ended" would not sell as much. And yes, it was expected to be in a "secret" language; you cannot expect anything to learn a language unless you expose it to said language.
Be careful: journalists today will put the focus on an unimportant end of a story to make it look like something that is not there, just to sell more.
There's no point in being afraid, at least not yet; those experiments end in expected ways. And said button will probably be a secure, separate system that the robot has no control over, and probably won't be a button at all, more like a remote control.
Yes, I know it was an accident and Steve the security robot didn't commit suicide (that's why I put 'supposedly'). I know that the media loves sensational headlines. But it did get me thinking:
1. What if Steve the robot had been pushed into the water?
2. What if Steve had a gun fitted to him? Should he be able to defend himself if attacked?
2a. What action(s) against Steve would constitute an 'attack' anyway? (we humans can be quite devious and could trick him)
3. What role would the 'stop button' on Steve the robot play in an attack situation?
So many questions I couldn't figure it out!
The thing is: although very sophisticated by today's standards, these robots are not "mightier than their stop button"; they cannot choose. If you have a security robot which wants to go on a killing spree but has a remote "dead switch", it cannot rewire itself to not turn off once it is switched off. The "power disconnect" is not part of the "AI" part of the machine; the AI has no control over it. But if we built an AI that learns and has Internet access, it might spread itself as a virus so as to not get shut down. We're not quite there yet, but worry not, the end is nigh.
anyone who would have enough computing power for a decent AI, would also have proper protection against an AI virus. Also you still have your circuit breaker when your server facility is hijacked by an AI.
MrDoboz
Proper protection against what? If the AI is self-learning and this happens in an increasing spiral, you can't know what it will be capable of doing and what not.
What if your server facility is completely controlled by the AI which prevents you from reaching the circuit breaker?
No disassemble, Stephanie!
Devil's advocate: But what if the kid knows bad things would happen at school that day?
Ultimately, we're asking if there's an algorithm (even if we don't know it, or don't need to know it) that can safely account for all situations. The answer is no. The more important questions are what happens when the wrong decision is made, who is responsible for deciding, and, increasingly, whether the technologist billionaires are the ones who get to decide.
Even then, the robot should probably not allow itself to be shut off *while driving 70 mph on the motorway.*
We assume there is a low level mechanism that safely slows the vehicle down when the steering AI is not responding.
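Something like a watchdog is the usual shape of that low-level fallback. A hedged toy sketch, with invented timings and names, of how such a layer is often built:

```python
# Toy watchdog sketch for the low-level fallback described above: if the
# steering AI stops sending heartbeats, a dumb, fixed routine takes over
# and brings the car to a gentle stop. Timings and names are invented.

import time

HEARTBEAT_TIMEOUT = 0.5   # seconds of silence before the fallback kicks in

class Watchdog:
    def __init__(self):
        self.last_heartbeat = time.monotonic()

    def heartbeat(self):
        # Called by the steering AI on every control cycle.
        self.last_heartbeat = time.monotonic()

    def check(self):
        # Called by the low-level safety layer on its own timer.
        if time.monotonic() - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            self.safe_stop()

    def safe_stop(self):
        # Not AI: a fixed, verified routine, outside the agent's control.
        print("AI unresponsive: easing off, gentle braking, hazard lights on.")

wd = Watchdog()
wd.heartbeat()
wd.check()
```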
Actually I think 'Stop Button' really means totally off, otherwise they would be using a term like Standby or Reduced Function Mode.
Can't we build such algorithm though? Fairly sure its exit code would be 42 - and nobody would remember why.
When you stop your OS there is still the BIOS that manages the hardware and makes the complete shutdown safe.
The interesting thing is that seeing the human rushing to hit the stop button ought to make the AGI stop before they've even hit the button, according to the way it was described.
The machine could be trying to change our reward function to aline with what it wants.
So it could pull off a Volkswagen?
The machine can't comprehend the concept of wanting something for itself. In this case it doesn't have a self interest that is independent of that of the human.
+111756075729535952471
Just a small correction: it's written "align".
+Artem Borisovskiy oops, thx
What the machine wants IS what we want. And in order to change the reward function, the machine would have to know what it is in the first place, which it doesn't. That's the trick.
My suggestion is to make following your directions super high up on the machine's gratification list.