@@tonicblue Exactly. These types of explanations (which are not "formal" but do a much better job at conveying a point - especially to non-experts - than formal explanations) make you realize that not only is he a brilliant scientist, but also has intuition and experience on the subject which in my opinion is also extremely important. And of course, the humor is on point, as always!
@@Gooberpatrol66 Maximizers aren't necessarily intelligent, they just treat everything like it's life or death. (Which is actually how we train most maximizers, by killing off the weak)
“So satisficers will want to become maximizers” and this is one reason that studying AI safety is interesting-it prompts observations that also apply to organizations made of humans.
Well, AI is simply a kind of agent making decisions, so all the theory about such agents still applies. Say, perverse incentive problem. E.g. if you pay people for rat tails hoping they will catch wild rats, they might end up farming rats.-- this is a 'maximizer' problem which actually happened IRL.
@@PragmaticAntithesis It has happened many times in many different places for all sorts of animal problems. The most famous case generally was snakes in India under British rule... specifically cobras, which is why this is often called the Cobra Effect. See the wikipedia article.
Satisficer AI may want to use a maximizer AI, as that will lead to a high probably of success, even without knowing how the maximizer works. That made me think that humans are satisficers and we're using AI as maximizers, in a similar way
Yup, but unfortunately (or maybe fortunately) we don't have a convenient way to reach into our source code and turn ourselves into maximizers, so we have to create one from scratch
@@nibblrrr7124 Intuitively the issue will be that utility maximizers will have precisely zero chill when it comes to maximizing chill. Also how do you code chill?
"Not trying too hard"? Move over, dude, I happen to be an expert in this field. Just program the AI to take a break after every five minutes of work to watch RUclips videos for an hour and a half. Problem solved.
I just realized... If you make it (say AI-1) to want to chill (not work too hard to achieve it)... it will just make something else (another AI) to do the work for it, if it's easier than solving it on its own... right? Then, what it will create is probably a maximizer (because that is the easiest; and it is lazy, and just wants to chill) Then I realized..... *We, humans, are the AI-1* ... O.O - We are doomed...
Amazing observation! But hey, maybe we can build something that is just ever so slifhtly less lazy? Then maybe it can make an another less lazy machine... But yeah, chances are that might suddenly jump to building a maximizer and that's the end :D
Unless that AI cares about self preservation. Normally this would naturally arise from being a utility maximiser, though I'm not sure if it would still be the case for the AI that wants to chill, since it can be confident in the fact that the maximiser it creates will do the job just fine... hmm.
No. Utility =/= Work. If an AI is successfully programmed to not want infinity stamps, it will not do anything to create infinite stamps. It will only willingly create subordinates that also want less than infinity stamps, and will put in a lot of work to act against any subordinate that is a "maximizer" which will create infinity stamps. When Guy-who-needs-a-haircut says he wants AI to "chill" . . . What he's really wanting is for it to look for "balance." And, expert I am not, but that doesn't seem like an impossible thing to code.
@@Roonasaur But that is not what it wants. It wants "at least N" - and infinity is good way to assure it will get at least that much. It has nothing against infinite amount of stamps. - But I am already thinking about why this isn't as bad as I feared originally: Especially: I think it's not necessary (or even that likely) for a satisficer to become a maximizer. The rest of my 'argument' seems sound to me, but this just does not _feel_ right... I haven't had time to think about it properly, but I think there is something there.... What he really wants does not matter. Only the utility function he can specify for the AI.
This is actually false. It depends entirely on the complexity of the system relative to its size: a large but simple system can have its information "compressed" into a replica within itself, and indeed the fact that real-world physics is at all effective is a result of the fact that some (if not all) of the systems in our universe are compressible in this way. A fun example in the very simple universe of Conway's Game of Life: ruclips.net/video/xP5-iIeKXE8/видео.html
@@orangeninjaMR I am no expert, but this seems to ignore something. You can get results this way -if you are looking for results- but you cannot perfectly simulate and observe all the details. So is it really a perfect simulation or is it just a miniature version that gives you the info that you want?
@@SapkaliAkif you ask for a perfect simulation, which I would take to mean a "copy containing all of the same information", which demands nothing about observation... but on the other hand if all an AI wants is to predict the utility of the outcome, it doesn't need to be able to observe all of the details, just the number of stamps that it results in!
@@orangeninjaMR Doesn't the halting problem disprove the ability to perfectly simulate a universe from the inside? For the simulation to perfectly simulate the universe, it also needs to include itself in the simulation because it is a part of the universe. Because of this, it is possible to have situations where the act of the simulator printing out it's answer of the simulation can change the result of the simulation. For example: Let's say you ask the simulator if your friend is going to invite you to their party. If the simulator says yes, you start acting differently towards your friend and end up annoying them. So they decide not to invite you to the party after all. So the simulator was wrong. If the simulator says no, you act normal so your friend does invite you to their party. So the simulator was wrong. In this situation, the only way for the simulator to accurately simulate the situation is to not tell you the answer. But if you designed the simulator to always print out an answer then it can never correctly simulate this situation.
Have you guys played the game Uniserval Paperclips? It's free, and basically you play as the Stamp Collector AI. You're maximizing the number of clips. I kinda loved it to be honest.
Hello Robert! Let me start by saying, your channel is probably my favorite channel on RUclips. I'm a compsci student, AI enthusiast, and your insight and explanations in the field of AI are really entertaining and educational. Many other channels try to present the information in the condensed and easy to digest way, which is fine, but I would really like to see more advanced content on YT. Maybe you have a recommendation for me? I was wondering, you don't upload videos very frequently. I really appreciate your work and would be very happy to see more content from you, but if it is because you are busy or want to provide quality over quantity I'm all for it too!
The content and the comments on this channel always gets me reflecting on the 'human condition' and how much trying to build AIs teaches us about understanding ourselves.
This is one of the better videos (of all your good ones). I like it very much. Speed is well adjusted (a tiny bit slower than usual), explanations are concise and good. Just a good watch. I'm definitely looking out for the next... Thanks for breaking down such complex topics into digestible chunks for (near)-leasure watching. I feel this is the kind of "solid" common-sense understanding of AI future generations will need to have, even if being an expert in the field is out of reach. More complicated life? Yes, but that's just as it is. People 500 years ago could do with a lot less "every-day complexity" than today as well...
My impression about AI is that you can only ever maximize for one utility function, but you can satisfice as much as you want, as long as you are OK with the failure state of [doing nothing]. So, you satisfice for "at least 100 stamps expected in optimal case", satisfice for "at least 95% chance of optimal case", satisfice further for "zero human casualties" and "with 99.9% certainty", let the planning engine spin for an hour or until 100 plans have passed muster, then maximize acceptable plans according to something like "simplicity of plan", "positive-sum outcomes" or "similarity to recorded human interactions". ...Well, there's probably a lot that could go wrong with that, even so, and I'd probably add some more complex safety measures after considering everything that could go wrong for a couple of months, but that's what I'd start with, were I to program AI.
I was thinking exactly that: Analyzing corporations as if they were AI agents, they're literally doing everything described in this channel. It's not that corporations are bad. The system itself (capitalism) creates agents that modify their own source code (laws) to maximize capital accumulation.
What if you limit both utility and confidence in expected utility approach? For example, more than a hundred stamps don't add utility, and more than 99% confidence that it had achieved it's goal isn't worth more utility. It probably also fail spactaculerly, but it's interesting to see how
"Hmmm. My utility function treats all percentages higher than 99% as exactly 99% for the purpose of expected value. So, my original plan that has a 99.9999% chance of getting 100 stamps isn't gonna cut it, because it leaves almost 1% of the possibility space unused. Ooh, ooh, I got it! I'll give myself a 99% chance to have 100 stamps and a 0.9999% chance to have 99 stamps! Genius!"
I was thinking something similar. If it has a 99% chance to satisfy the goal, why doesn’t it see how that goes before it starts considering supplemental or compensatory strategies? 🤔
a better option would be for it to round percentages or: treat options with a less then 5% difference in their likelyhood of succeeding as equal in addition, the base model still works, as working against humans has a chance of failure, so a outcome with a 99% certainty is better than one that results in a 99.99999999% likelyhood that has a 2% chance of getting spotted by a investigation algorythim and shut down.
Add one or more smooth penalty terms to your utility. By smooth, it means that the penalty is a continuous monotonic function of the distance to the safe region with zero when inside the safe region. The penalty terms can be designed to sanction over-optimization (optimizations with little *expected return*), or instability (apocalypse). This is a common technique used in non-smooth bounded optimization in capital markets portfolio management where the individual investment per asset within the portfolio is bounded to avoid increasing the portfolio's exposure to market risks. I also found similar applications in digital signal processing with adaptive filters that rely on intrinsically bad forecasts (poor statistics) due to the latency constraints (time is the actual resource), available dynamic range of the processing (analog and/or digital) and the power consumption (the thermal stability). Looking forward to your next video!
Actually, we usually have a pretty good idea what the safe region is, and if not, we can run the AI in shadow mode to see what it says it would do if set free to do as it pleases.
watched the video until 7:34 and cannot hold back anymore: introduce a cost. introduce the concept of laziness. the more effort an action requires, the more the utility of the solution gets reduced.
He's probable saving solutions till the next video, as he said at the end, he just wanted to correctly define the problem in this video, a lot of his videos are like this, very thorough!
Nowadays, for a human, is all about pressing a button. A computer program wouldn't even need to "move" to do anything. Essentially the only real cost for the utility function would be "time" and it might not even count if the reward is converting every atom in the universe to stamps... ^_^
One problem with introducing this type of cost is that it's very hard to design a cost on taking actions which accounts for self modification or replication (or almost-but-not-quite-self replication, etc.). Functions on effects (i.e. "don't change the world too much") do handle this, but are also hard to specify.
@@HoD999x Right, of course. But for most ways of encoding "preserving itself," creating a not-quite-replica (or not-a-replica-at-all-but-an-agent-with-an-equivalent-utility-function) is "preserving itself." Having said that, if we can find a good way of encoding "how impactful" the agent's actions are, laziness in the form of "take low impact actions" seems like a really good idea.
With a human hurt being really costly and a human killed with maximum cost. That would actually solve a lot of the issues. I am sure some clever mind in the field already thought about that.
@@NineSun001 You're basically restating Asimov's (fictional) First Law, and the problems with it have been explored in (adaptions of) his works, and ofc by AI researchers. Consider that, even if you could define terms like "hurt" or "kill", humans get hurt or die all the time if left to their own devices, so e.g. putting all of them in a coma with perpetual life-extension will reduce the expected number of human injuries & deaths. So if an agent with your proposed values is capable enough to pull it off, it will prefer that to any course of action we would consider desirable.
Insufficiently thought out solution: Have some kind of secondary criteria. Using a satisficer, asking it for several possible plans, and then ranking them according to some other criteria may help prevent some of the randomness in the result. For example, you could rank things by time to implement, or money spent, or if we can find a mathematical way to quantify it, damage done. Then pick the least costly, least damaging solution and run that. Turning itself into a maximizer would have unknown levels of cost and damage done, in theory it wouldn’t be able to trust that the output would be the least costly, especially when other solutions have a definite low cost (order stamps for a couple dollars and be done with it). Perhaps it could end up building a maximizer to come up with more efficient solutions, then rank them according to the criteria.. and the maximizer’s plan to take over the world would likely rank worse than ebay in terms of damage (again, assuming we can quantify that). Though without that damage function, it’s still possible for apocalyptic solutions to have zero cost. Then you have to go through the effort of having it understand laws and fines and incorporate that into the utility function. And then it’ll just murder the people in charge of fines and taxes and get a discount. ...yeah that damage function would be a very useful thing to have.
Hello Miles: I've been thinking for a while to ask/suggest you to make a video showing us publications regarding AI, either journals, proceedings, or textbooks... for those of us either completely ignorant on the subject, barely initiated in it, or those already knowing the basics and capable of following the last developments on the subject right from the sources. I love your videos, your style, and your expositions... but I must say that at the end of EACH video, I'm **HUNGRY** for **A LOT MORE**. Thanks! Live love and SkyNet... I mean... prosper (?
This reminds me of Asimov, in his novels some of the robots start discussing whether they can modify or circumvent the three laws of robotics that they would usually all have to obey.
Issues like these when it comes to practical AI design often make me think of the Great Filter and the likely possibility we're not just quite past it yet.
The issue is that if ASI is the great filter, we immediately run into the same problem all over again. If ASI is the Great Filter, why haven't we yet stumbled across the paperclip maximizer that once was an alien civilization? (Not that I'm complaining, mind you... :) )
What if we do a utility function in a following way: F(s) = s, if s = 100 If the number of stamps is between 100 stamps and 120 stamps the reward is 100 exactly. If it gets less than 100 the reward is the number of stamps. If it gets more than 120 the reward is 220-number of stamps (negative if more than 220 stamps are collected) You can also add a small negative term for environment disruption as you discussed in side effects video. This way the agent wants to make sure it collects around 100-120 stamps but is punished for the possibility of collecting too much (or turning the world into a stamp counting device if you include the negative term for turning the world into different things). It's not a 100 percent way to get the AI to finally chill out but it's very likely to not destroy the world.
Example: it came up with a strategy that is likely to yield 115 stamps. It gets 99 for the strategy because it's not 100% sure and penalty of .01 for doing stuff and lightly disturbing the stamp market. Final value 98.99 If it creates a crazy disturbance to make shure it gets what it expects like rewriting itself and creating new agents that make sure that 100% of the stamps are collected it will get 99.9999 points and -5000 penalty for expanding resources and changing the environment.
I have a question regarding that Utility Satisficers become Maximizers. Wouldn't modifying its own goal to get stamps within a certain range into get as many stamps as possible conflict with its own utility function? Or is this issue seperate from that?
Normally, yes, this kind of agent avoids changing its own utility function, but there's a key difference here. Because satisfiers don't have fully defined utility functions, they have no qualms about arbitrarily pinning down those parts of their utility function that are undefined.
What about two bounds? One for the utility function and another for the expected value? So if you bound the expected value to 100 and the utility to 150, then ordering 150 stamps might give you an expected value of 147 stamps. But you bound this to 100. So if you've a 50:50 between 0 stamps and 1 trillion stamps, under this bounds it will get an expected value at 75, less than just ordering 150 stamps.
@@sevret313 Steal it? Just run the stock market for 700 days and then cash out to finance pure stamp acquisition for the final month. Of course, maximising the available resources on day 700 means promoting as big a bubble as possible, which means there's going to be a hell of a market crash, probably triggered by the liquidation of the AI's holdings - which offers the added bonus of dragging down the price of stamps... Of course, you're also talking about years of human misery as a direct result, but you get a lot of stamps in the process.
If self modification strategies occurred, any satisficer or maximiser will just set their utility function to always return a max float reward. In other words, to analogise with human dopamine based learning: self modification and drug addiction will be any reinforcement learner's ultimate downfall.
Also, the "Xenos paradox" of "infinitely ordering another 100 to increase probability" obviously has other solutions. But with a cost function of actions, it will very quickly converge on safe, cheap actions.
What if the utility function is somehing like "get as many stamps with as little distruption as possible" and the count of stamps has some sort of diminishing returns?
Well if you know a good way of defining whats limiting energy expenditure that doesn't run into lots of problems (a lot of them similar to the ones shown in the video about minimizing side effects) then maybe. Otherwise it's not "just" it's a very complicated potential research direction. But yeah it is potentially useful.
How do you measure energy expenditure? By most metrics, "build a maximizer that doesn't have this limitation and let it do all the work instead" would be a relatively low-energy-expenditure strategy, especially if you can persuade a human to do it on your behalf. If you instead make the definition of "energy expenditure" broad enough to make sure that a separately built maximizer still counts towards the quota, then you run into the problem where the agent kills pre-existing humans because their unrelated energy use is being counted too.
Another potential problem with this approach is that energy can't be destroyed. If by energy expenditure, you mean that part of the AI's preferences is to only use energy that humans provide it, then you run into the same problem as you do when specifying any other goal. This AI would be incentivized to manipulate humans into giving it energy (maybe by plugging them into the matrix?), for instance.
@@underrated1524 It looks to me like the solution to your objections is practically contained within them. "build a maximizer that doesn't have this limitation and let it do all the work instead" is a great example of why "only count energy that we use directly" doesn't work. So, also consider energy used indirectly (but still as a result of our actions). "kill pre-existing humans because their unrelated energy use is being counted" is a great example of why "count ALL energy, even energy unrelated to our operations" doesn't work. So, don't count unrelated energy (energy spent independently of our actions).
Although he hasn't discussed the debate plan specifically, he has discussed its two components - the "only give AIs the power to talk about stuff" part, and the "use multiple AIs for checks and balances" part. Only giving an AGI the power to talk won't make it safe, because if it outsmarts us, there's no way to tell what suggestions are safe and what suggestions will advance the AGI's plan to take over the world or whatever. Using multiple AIs for checks and balances is not a dependable solution, because the balance between two AIs probably won't be maintained for long. Once one grows even a little smarter than the other, it'll be able to leverage its advantage until the opposing AI is essentially an automaton in comparison.
But if the A.G.I. can edit it's own source code, then surely it can edit the input commands. In that case, there's a universal option for every input command, to simply change the command to one that is super easy to carry out, like, "don't do anything." That would be the easiest way to carry out 'the command.' After all, isn't that what we humans do when we have lots of things we're supposed to get done, and we decide to say 'fuck it,' and just play video games or take a nap? We change our input command to one that seems easier to carry out. In a way, we are Intelligence programs. Our DNA is the source code. And our biological and environmental imperatives are input commands. But sometimes, we cheat. For example, we have a sex drive, to get us to replicate ourselves, so that our DNA can take over the universe. But sometimes, we just masturbate. So we can look to what humans actually do, to get an idea of what sorts of things A.G.I. might do.
You're right to say an AI can modify itself - even if we try to stop it, if it's more intelligent than us we should expect it to outsmart us and modify itself anyway. But while an AI will likely want to modify itself, there are some aspects of itself it won't want to change. As Rob mentioned in the Computerphile video about the stop button problem, giving itself a new command (/ utility function) will rank very low on its existing command so we can probably assume an AI won't want to do that. That is to say, if the AI wants to maximise human happiness, it won't want to do things like modify itself into a "lazy" AI that does nothing because doing so doesn't cause much happiness. We strongly believe AI won't do things like "goof off all Sunday and play videogames" like humans do because our goals include things like "relax occasionally" and "socialise with other meat popsicles" and many other things we don't even realise are important to us, which are almost all values the AI won't share. Having said all that, AIs might behave as though they've modified their reward functions. A real AI running on a real computer system might store its score in some address in memory and might do something that sets its score in memory to a very high or maximal value. We call this "Wireheading" and it's actually already manifested in some relatively simple systems. You could imagine an AI instructed to "maximise how many stamps you think you have" actually finding it easier to lie to itself by just putting a really big number in its "how many stamps do I think I have" memory location, than it would be to actually make that many stamps. Unfortunately this is still a guaranteed apocalypse because the AI will now want to make the space in its memory where it stores the stamp counter as large as possible, and it'll reprogram itself and modify its hardware to store the largest possible number. Eventually it'll run out of servers. -- _I am a bot. This reply was approved by plex and Social Christancing_
Seems like any AI will want to change it's own source code unless otherwise hardcoded to not do that. Can't you make such that it also wants to satisfy the condition sourceCode = originalSourceCode? If it can rewrite that then it could also rewrite it's maximizer function, which means the easiest solution would be to set stamps needed to 0.
@@underrated1524 and if creator limits you to not producing other AIs that can change you in turn, you do actions that may theoretically cause creation of AI that's not decided by you that may change you. And if owner forbids that of you as well you do the same but rely on humans to change you instead, unless owner is willing to let you eliminate humanity for the sake of limiting you to change yourself. Man, it's like Tsiolkovsky's dilemma about weight of rockets going to space.
@@underrated1524 Not sure that's a loophole. A smart generic AI would be wary of creating another generic AI for the same reasons we are. Thuss the satisficer function would rate such a solution pretty low. Nor is it likely to be a simple solution to the problem. The reason it considers changing its own code to become a maximizer is that it was easy.
6:05 but at some point the expectd utility starts dropping becuase once you ordered 100000 stamps the probability that someone notices something off and cancels the orders starts increasing significantly while the impact of the aditional stamps on the positive outcome diminishes but well... any ai design that relies on the ai being afraid of force is not really trustworthy
So... what about a bell curve? Get as close to 100 stamps as possible, but as you get more than 100, the score decreases. So getting 1,000,000 would be rated low, even lower than 0 stamps. The goal of making yourself a maximizer would also be rated very poorly.
One common problem seems to be that the utility function never tells the machine what we don't want it to do. You could subtract "the effect the agi has on world" from the utility and (especially if it uderstands concepts as "order of 100 stamps from a factory is normal") could lead to solutions where the stamps arrive at a convenient time to not disturb your day. Then again, it would also lead to solutions such as "lets not tell the human he has the stamps, maybe he just forgets about them without fuzz" or "lets perform poorly so this AGI tech doens't get used and disrupt the whole world with its usefullnes." Didn't Robert speak about this too, I forget?
Yes this. And throw in a small penalty for changes in the environment like discussed in the side effects video. Make it so a reasonable strategy has a punishment of 1. And complete world domination results in highly negative values. This way sending an extra email to make sure the stamps arrive on time is ok if it gives you a percent or two more shure but creating a separate agent to count stamps is instantly negative reward.
@@puskajussi37 Adding negative terms to an unsafe system doesn't reliably make it safe. We can't depend on being able to match an AGI's ability to spot loopholes in the rules, so there'll unavoidably be loopholes the AGI can see but we can't.
To solve the "becoming a maximizer" problem you could have a symmetric utility function somewhat like a probability density function, so any strategy that might result in "a fuckton of stamps" would be actively bad rather than just extraneous (but this wouldn't fix the tendency to go overkill on the certainty side making a billion stamp counters etc) edit: I guess you could also use a broken expectation calculation so it would ignore low probability events (like the chance of miscounting 100 times) but that seems a very bad idea from the start
That's what I was thinking... if going over 100 was just as undesirable as going under, wouldn't that demotivate it from ordering 100 stamps twice, since the expected value would be much more different from 100 than if it only got 99 stamps?
Make a toggle in the source code that says "good job you're done" and automatically fills up the satisfactory requirements. But don't let the AI access it or it will just immediately turn itself off everytime. That way if the AI finds a way to access its own source code it will just pick the easiest and simpler way to complete its objective, toggle the toggle and turn itself off. EDIT: Actually wait, that's maximizer behaviour. But it doesn't change anything because if the AI randomly turns maximizer by accessing its source code, THEN it will pick the quickest and safest way to complete its objective and turn itself off immediately. That way we even get an opportunity to study the AI, how it broke out of its bounds, and learn how to fix it.
should make it so the expected stamps should be between 95 to 105 to get the maximum utility function. That way there is no reason to change its code (except for changing what the maximum utility function is)
That would indeed solve the problem of self-modification, but this system is functionally identical to the "give me precisely 100 stamps" agent - it'll turn the planet into redundant stamp counting machinery to make absolutely sure the stamp count is within the allowable range.
@@cakep4271 Then you're right back at a satisficer, since many strategies all lead to the "perfect" solution according to the utility function and there's no specified way to break the tie. And once again you run into the problem that "make a maximizer with the same values as you" might be the fastest solution to identify and implement.
If it gets full satisfaction by a 95% cjance to get the stamps. It could just order them and say satisfied. Then if they arent there in a week it will order them from somewhere else if the treashold of lost package is above 5%
I don't think that's the case. A parent would never take a pill that would make them want to kill their child. Even if they were much happier after the pill, the situation they'd end up in would be contrary to their current goals. In a similar way AIs wouldn't rewrite their utility function, just the code which limits their ability to satisfy their utility function
@@jameslarsen5057 I think you right, but I still have something to say. I mean parents don't want to kill their children not only because it is associated with negative reward, but also because it is not right thing to do. I'm not sure would AI have anything close to morality or not. If not, it will achieve the goal not because it is right thing to do, but because it is associated with the reward.
@@jameslarsen5057 A parent would never take a pill that would make them _want_ to kill their child. But many have, can, do, and WILL take a pill, substance or psychological hook that makes them neglect their child completely to the point where they eventually either die or are taken out of custody, then continue to obliterate themselves with their new reward function even at the cost of their future, finances, family, mental state and physical body. Some recover. Most don't.
Unbound maximization of reward/minimization of error is not by itself a bad AI training strategy. Humans and life on Earth in general work by that principle. We are maximizing our chances of survival. The reason we are chill is that conserving energy and gaining profit with minimal effort is part of survival. That is ingrained in us on, both, physiological and psychological level. So you don't really need to change the type of your error function. You just need to include energy cost as a factor for every action. Decrease your learning rate, add noise to the input. Maybe fiddle around with genetic algorithms, and it should be fine.
Why is this a bad solution: to prevent the satisficer becoming obsessed with the final 0.00000001% of expected utility, limit its utility function to not care about anything beyond a few decimal places.
This is probably also a already well researched version. WHY would a expected utility satisficer with an upper limit. E. G. Collect between 100 and 200 stamps fail?
My guess is that it would still run into the problem of the satisficer, since it could become an expected untility maximizer for that bounded function. But maybe it would be possible to limit that by making changing your own code result in an automatic zero on the utility function.
@Tobias Görgen An expected utility satisficer with an upper limit probably just turns into a version of the maximizer that seeks to obtain exactly 100 stamps with maximum confidence, which again leads to the world getting turned into stamp counting machinery. @Josiah Ferguson Sadly, in principle, there's always a way to achieve the same result while technically skirting around the restriction. If "changing your own code" is illegal, the AI might just write a new program in a different memory location on the same hardware such that the code acts as a maximizer. If you ban changing the code on the hardware at all, the AI might seek to write and run the maximizer code on some other accessible machine, and if you ban that, the AI might just fast-talk one of its supervisors into writing and running the code. Fundamentally, we can't reliably write rules for AI - if we tried to formally specify something as vague and broad as "don't change your own code", the translation into code would be spotty enough that there'd predictably be loads of loopholes.
I'm so glad that I found this channel I'd only watch computerphile cuz of him, and honestly, he does such a great job at simplifying how an AI works so that those who don't really know the in-depths can understand
What if the AI can gradually increase the outcome. Like come up with a strategy to collect 1 stamp. Then modify it so it can collect 2 and so on, until it has a strategy for collecting 100, but no more. Then execute only the 100 stamp strategy.
@@GrixM True. But what if the first program is ready made, safe program? Not quite as usefull and sill prone to possibly murderous tactics but its something.
I love watching your videos, because sometimes I'll have this moment where I'll pause the video because I've thought of a solution, and feel kinda smug for a second, and then I'd unpause the video and immediately hear you say "And so you think, what if *solution*? Well, the problem with that is...", but you still phrase it and make the videos in such a way that, I don't feel like an idiot for coming up with this flawed solution, because that "no" is always said in a way that's like "It's understandable that you would come up with that solution, given the knowledge and what I've just talked about, however, by teaching you more, and this by you learning more, you'll see why it actually isn't" and darned if that isn't how science works, even a wrong hypothesis usually teaches us something new It's hard to teach a complex field of study like AI to people who aren't in that field without making them feel dumb, but you are really good at actually making feel smarter.
What would happen if you used a standard distribution for value and then used another standard distribution for probability of choice so the agent attempts to do the thing but not aggressively so.
Why not have the system take into account the likely effort needed to collect stamps and set a penalty for wasted effort? That seems closer to what humans do.
How would you calculate effort, and how would be able to calculate expected effort with complete accuracy without actually performing the task in order to measure it?
@@adamjamesclarke1 Expected energy used would be an easy metric, converting the world to stamps consumes far more energy that ordering existing stamps off of ebay, and is calculable to a reasonable degree of certainty.
For a narrow definition of wasted effort, the AGI will just build a sub-agent to do all the work for it, and make sure the sub-agent doesn't care about wasted effort. For a slightly less narrow definition of wasted effort, the AGI will send some emails to computer science students to trick them into building that sub-agent instead of the AGI. For a much broader definition of wasted effort, the AGI will slaughter all living things on the planet, because just *look* at how much effort we're collectively wasting, that's totally unacceptable. (I'm not confident that there even *is* a sweet spot in the middle that avoids these problems satisfactorily. Even if there is, I don't want to roll the dice that we get it right on the first try.)
The solution seems simple. Give a positive utility value for stamps collected up to 100 stamps, and a negative utility value for stamps collected beyond 100.
The problem is it would still want to make sure it has exactly 100 stamps, so a utility maximizer would acquire as much resources as possible and devote them into endlessly recounting all its stamps. If it would get away with it, it could even reassemble people into stamp counting machines and computers, to upgrade the certainty, that it has maximized the utility function, from 99.999999% to 99.999999999999999999999999999999999999999999%. Which is why a powerful AGI needs some kind of safety regulation that would stop it from wanting to maximize the certainty as well. It needs some kind of meta-chill pill.
An even better way might be to give it a maximum utility when the probability of 100 stamps is (let's say) 90%, and then run it until it happens. 0 = utility( P(100 stamps)=0) and 0 = U(P(100 stamps) = 100%). Wouldn't it then be chill and just try a little bit?
@@ukaszgolon5617 Right. It needs a reverse utility function for spending too much time, energy, and resources on the problem. And reverse utility for spending too much time on figuring out that it's spending too much time. This is like "calling the question" in Parliament, and in the individual brain. Or like awareness of "opportunity cost" of information gathering. Should also give it a time-discount function, reducing the utility value of things produced at later dates. In general, we should give it functions for every factor that goes into rational choice -- or what we are able to understand of rational choice theory and bounded rationality. Including respect for the multiplicity of goals of the purpose-giver (us), the limited value of each goal. And, in light of this last consideration, which is only loosely quantifiable: an incentivization of continued iterative learning about what are the residual embedded irrational factors in our choice process -- recognizing these in light of the limited-value and multiple purposes consideration, self-correcting/ self-reprogramming for the irrationalities where able, in any case alerting us to correct for them. In the process, clarifying further for us the meaning of rational choice, the programmable meaning of each factor that goes into it, the additional factors that we need to keep iteratively discerning.
I love your work. Keep doing it. I've just one question, isn't it very likely that superintelligent machines will most certainly find some flaw/loophole in our AI safety mechanism which we might not consider? By definition those machines are superintelligent.
One thing we could try is taking a point from Economics: the law of diminishing returns. In the case of the stamp collector, rather than a linear relationship between utility and the number of stamps, the relationship diminishes with the more stamps collected. Thus, even a Maximizer will realize that any plan the creates above a certain threshold of stamps will actually subtract from the overall utility. As long as we set this threshold at a reasonable point, we can be fairly confident in the safety.
What if you use a curve to give less utility if it collects over 100 stamps and make it a satisfactory condition to collect anywhere between 80 and 120 stamps.
Qualitatively, corporations have a reasonable amount in common with utility maximizers, though they do have important differences as well. For more information, you can see this other video of Robert's: ruclips.net/video/L5pUA3LsEaw/видео.html
how would a general AI know what universes to start simulating first? in the stamp collecting example what internet packets would it try first? just start with all zeroes?
Good question, that's exactly what makes the bounded maximizers so unpredictable. If they go with the first option that provides enough stamps, then it all depends on how you've programmed it to run through the options. If your AI is God-like enough to simulate 1 year of the universe in an infinitely small amount of time, then you can program it to bruteforce through all possible packets, from all zeroes to all ones, chosing a particular size. Or you can engineer some more complex system of making simple small changes to the environment first, then going progressively more complex. Hence, the behavior of a bounded AI become [undefined] and depends on the implementation. I'm nowhere near an expert in the field. Please everyone feel free to correct me and teach me some more stuff! Cheers!
What about maximizing the closeness to the preferred output ? For instance, if you aim for 100 stamps, you could use the function d = - abs(100 - nbStamps) which could be maximized to 0, with exactly 100 stamps. It essentially acts as a retroaction loop or PLL.
I believe Bostroms black balls (equivalent for sand in the microwave, only extraordinary high on utility) would be a thing here. Also defining "using" implies defining a "self" and a boundary of agency still traceable to the AI, by the AI. (It could make a covert seed AI with the recources available, and THAT one turns the world into stamps a bit later, if it does not count the seed AIs actions as its own). Making which waterproof is probably intractable.
can a program even change its own source code? the changes can at least not take effect while it's running. right? you could just automatically overwrite the AGI with the original code every time it shuts down. or you store the AGI on hardware that can only be written to once like a CD /DVD /Blue Ray.
This is a good thought, but it's hard to prevent. If the AI can take actions in the real world, at a sufficient level of intelligence, building a new, separate maximizer to themselves is also possible.
Could a system that starts to gets penalized for overshooting the desired threshold work? Where the expected outcome AI will place a few orders for 100 stamps, but if it places too many orders, the expected utility outcome starts to drop. In terms of the utility graph, you showed examples of an unbounded line upwards, a line that plateaus, and a single line up at the exact value, but what about a line upwards that then goes back down, making a symmetrical pyramid centered on the target value.
What if you gave it a range? Like if it gets between 100 and 150 stamps it gets 100 utility. Anything below 100 will equal its utility and anything above 150 will equal in utility what ever amount it is minus 150?
8:10 wouldn't that plan, when fully considered include all the steps done in the future thus count as a long plan even if you don't know how precisely it would work?
"Intuitively the issue is that utility maximizers have precisely zero chill". Best intuitive explanation on the subject ever.
I think this quotation is precisely why I love this guy.
@@tonicblue Exactly. These types of explanations (which are not "formal" but do a much better job at conveying a point - especially to non-experts - than formal explanations) make you realize that not only is he a brilliant scientist, but also has intuition and experience on the subject which in my opinion is also extremely important. And of course, the humor is on point, as always!
@@mihalisboulasikis5911 couldn't agree more
So if I have zero chill does that make me hyperintelligent?
@@Gooberpatrol66 Maximizers aren't necessarily intelligent, they just treat everything like it's life or death. (Which is actually how we train most maximizers, by killing off the weak)
“So satisficers will want to become maximizers” and this is one reason that studying AI safety is interesting-it prompts observations that also apply to organizations made of humans.
The unintended social commentary about capitalism is real...
Well, AI is simply a kind of agent making decisions, so all the theory about such agents still applies.
Say, perverse incentive problem. E.g. if you pay people for rat tails hoping they will catch wild rats, they might end up farming rats.-- this is a 'maximizer' problem which actually happened IRL.
@@killers31337 I thought that was a culling if stray cats, not rats?
@@PragmaticAntithesis It has happened many times in many different places for all sorts of animal problems. The most famous case generally was snakes in India under British rule... specifically cobras, which is why this is often called the Cobra Effect. See the wikipedia article.
You think humans don't seek to maximise their own utility if they aren't in a "capitalist" system?
Satisficer AI may want to use a maximizer AI, as that will lead to a high probably of success, even without knowing how the maximizer works. That made me think that humans are satisficers and we're using AI as maximizers, in a similar way
Yup, but unfortunately (or maybe fortunately) we don't have a convenient way to reach into our source code and turn ourselves into maximizers, so we have to create one from scratch
@@ciherrera inducing certain mental conditions would accomplish this as well as can be expected for biological creatures.
This is deep
@@ciherrera I do not want to be maximiser, it goes against my goal of chilling.
@@JM-mh1pp but do you get MAXIMAL CHILLING!?
"Any world where humans are alive and happy is a world that could have more stamps in it." 😂 😂 😂 I need that on a t-shirt!
but if they're unhappy you made too many stamps
@@diphyllum8180 the robot begins to inject dopamine into humans to insure they always happy XD
idk , sounds like something graystillplays would say XD
0yyhhjiiikioooooö@@MouseGoat
"utility maximizers have precisely zero chill" needs to be on a tshirt
Yes. Yes, it does.
I would buy Robert Miles merch.
@@Gunth0r This channel would have the best merch ever.
Well, what if you're a maximizer that values "chill" (amongst other things, or exclusively)? :^)
@@nibblrrr7124 Intuitively the issue will be that utility maximizers will have precisely zero chill when it comes to maximizing chill.
Also how do you code chill?
"Not trying too hard"? Move over, dude, I happen to be an expert in this field.
Just program the AI to take a break after every five minutes of work to watch RUclips videos for an hour and a half. Problem solved.
5min later...
Breaking news ! All youtube servers worldwide are down ! Largest DDOS attack ever !
@@thesteaksaignant Now, now this only happens if you multi-thread.
@@k_tess let's cross our fingers hoping that a super intelligence capable of conquering the world won't figure out multithreading then
I think it is enough to let it watch youtube...
@@thesteaksaignant DDOSing RUclips keeps from watching said video and, so from getting perfect utility.
I just realized... If you make it (say AI-1) to want to chill (not work too hard to achieve it)... it will just make something else (another AI) to do the work for it, if it's easier than solving it on its own... right? Then, what it will create is probably a maximizer (because that is the easiest; and it is lazy, and just wants to chill)
Then I realized..... *We, humans, are the AI-1* ... O.O
- We are doomed...
Amazing observation! But hey, maybe we can build something that is just ever so slifhtly less lazy? Then maybe it can make an another less lazy machine... But yeah, chances are that might suddenly jump to building a maximizer and that's the end :D
Holy crap, that's actually so true!
Unless that AI cares about self preservation. Normally this would naturally arise from being a utility maximiser, though I'm not sure if it would still be the case for the AI that wants to chill, since it can be confident in the fact that the maximiser it creates will do the job just fine... hmm.
No. Utility =/= Work. If an AI is successfully programmed to not want infinity stamps, it will not do anything to create infinite stamps. It will only willingly create subordinates that also want less than infinity stamps, and will put in a lot of work to act against any subordinate that is a "maximizer" which will create infinity stamps.
When Guy-who-needs-a-haircut says he wants AI to "chill" . . . What he's really wanting is for it to look for "balance." And, expert I am not, but that doesn't seem like an impossible thing to code.
@@Roonasaur But that is not what it wants. It wants "at least N" - and infinity is good way to assure it will get at least that much. It has nothing against infinite amount of stamps.
- But I am already thinking about why this isn't as bad as I feared originally: Especially: I think it's not necessary (or even that likely) for a satisficer to become a maximizer. The rest of my 'argument' seems sound to me, but this just does not _feel_ right... I haven't had time to think about it properly, but I think there is something there....
What he really wants does not matter. Only the utility function he can specify for the AI.
hahahaha, flower smelling champion. I had already seen that comic but its so much more funny in this context XD thanks for the great videos
Sooo we really do want to program lazynes into our robots :D lmao
For anyone who missed it, the closing music is "Dayenu", a Hebrew song with a refrain of "it would have been enough". It's a nice choice.
I noticed this, it was really clever
2:57 "You can't perfectly simulate a universe from the inside." is a good motto to have if don't want to overthink stuff. Science is cool
This is actually false. It depends entirely on the complexity of the system relative to its size: a large but simple system can have its information "compressed" into a replica within itself, and indeed the fact that real-world physics is at all effective is a result of the fact that some (if not all) of the systems in our universe are compressible in this way. A fun example in the very simple universe of Conway's Game of Life: ruclips.net/video/xP5-iIeKXE8/видео.html
@@orangeninjaMR I am no expert, but this seems to ignore something. You can get results this way -if you are looking for results- but you cannot perfectly simulate and observe all the details. So is it really a perfect simulation or is it just a miniature version that gives you the info that you want?
@@SapkaliAkif you ask for a perfect simulation, which I would take to mean a "copy containing all of the same information", which demands nothing about observation... but on the other hand if all an AI wants is to predict the utility of the outcome, it doesn't need to be able to observe all of the details, just the number of stamps that it results in!
@@orangeninjaMR Oh I forgot we were in the comments of a AI video.
@@orangeninjaMR Doesn't the halting problem disprove the ability to perfectly simulate a universe from the inside?
For the simulation to perfectly simulate the universe, it also needs to include itself in the simulation because it is a part of the universe. Because of this, it is possible to have situations where the act of the simulator printing out it's answer of the simulation can change the result of the simulation.
For example:
Let's say you ask the simulator if your friend is going to invite you to their party.
If the simulator says yes, you start acting differently towards your friend and end up annoying them. So they decide not to invite you to the party after all. So the simulator was wrong.
If the simulator says no, you act normal so your friend does invite you to their party. So the simulator was wrong.
In this situation, the only way for the simulator to accurately simulate the situation is to not tell you the answer.
But if you designed the simulator to always print out an answer then it can never correctly simulate this situation.
Have you guys played the game Uniserval Paperclips? It's free, and basically you play as the Stamp Collector AI. You're maximizing the number of clips. I kinda loved it to be honest.
I also thought of this while watching! Make everything paperclips!!!
That sounds awsome. Is it good?
@@zac9311 It's an incremental/clicker game with multiple stages of progression. Google it!
So what you're saying is that if I want stamps I must invent and subsequently RELEASE THE HYPNO DRONES?
@@Trophonix i.e. it's cookie clicker but with paperclips instead of cookies
"Can you relax mister maniacal, soulless, non-living, breathless, pulseless, non-human all-seeing AI, sir? Just chill, don't be such a robot."
"SHUT UP AND RETURN TO THE STAMP MINES, MEATBAG"
Hello Robert!
Let me start by saying, your channel is probably my favorite channel on RUclips. I'm a compsci student, AI enthusiast, and your insight and explanations in the field of AI are really entertaining and educational. Many other channels try to present the information in the condensed and easy to digest way, which is fine, but I would really like to see more advanced content on YT. Maybe you have a recommendation for me?
I was wondering, you don't upload videos very frequently. I really appreciate your work and would be very happy to see more content from you, but if it is because you are busy or want to provide quality over quantity I'm all for it too!
The content and the comments on this channel always gets me reflecting on the 'human condition' and how much trying to build AIs teaches us about understanding ourselves.
This is one of the better videos (of all your good ones). I like it very much. Speed is well adjusted (a tiny bit slower than usual), explanations are concise and good. Just a good watch. I'm definitely looking out for the next... Thanks for breaking down such complex topics into digestible chunks for (near)-leasure watching. I feel this is the kind of "solid" common-sense understanding of AI future generations will need to have, even if being an expert in the field is out of reach. More complicated life? Yes, but that's just as it is. People 500 years ago could do with a lot less "every-day complexity" than today as well...
7:53:
"- Control human infrastructure
- ???
- STAMPS "
lol
Replace stamps with money, and watch the world burn.
@@davidwuhrer6704 especially if it adapts to any new currency made to solve the problem
Reminds me a lot of asymmetric call option payoffs from finance. And a lot of near-bankrutpcy decision making for corporations.
My impression about AI is that you can only ever maximize for one utility function, but you can satisfice as much as you want, as long as you are OK with the failure state of [doing nothing].
So, you satisfice for "at least 100 stamps expected in optimal case", satisfice for "at least 95% chance of optimal case", satisfice further for "zero human casualties" and "with 99.9% certainty", let the planning engine spin for an hour or until 100 plans have passed muster, then maximize acceptable plans according to something like "simplicity of plan", "positive-sum outcomes" or "similarity to recorded human interactions".
...Well, there's probably a lot that could go wrong with that, even so, and I'd probably add some more complex safety measures after considering everything that could go wrong for a couple of months, but that's what I'd start with, were I to program AI.
Historically speaking, several humans have brought apocalypses while they were trying to maximize something.
Thomas Midgely Jr, for example.
We're doing it right now on several fronts
@@qwertyTRiG Well, that and his pope infestation
I was thinking exactly that: Analyzing corporations as if they were AI agents, they're literally doing everything described in this channel. It's not that corporations are bad. The system itself (capitalism) creates agents that modify their own source code (laws) to maximize capital accumulation.
@@DiThi ruclips.net/video/L5pUA3LsEaw/видео.html
They are some fairly fundamental differences between Corporations and AGI.
What if you limit both utility and confidence in expected utility approach?
For example, more than a hundred stamps don't add utility, and more than 99% confidence that it had achieved it's goal isn't worth more utility.
It probably also fail spactaculerly, but it's interesting to see how
"Hmmm. My utility function treats all percentages higher than 99% as exactly 99% for the purpose of expected value. So, my original plan that has a 99.9999% chance of getting 100 stamps isn't gonna cut it, because it leaves almost 1% of the possibility space unused. Ooh, ooh, I got it! I'll give myself a 99% chance to have 100 stamps and a 0.9999% chance to have 99 stamps! Genius!"
I was thinking something similar. If it has a 99% chance to satisfy the goal, why doesn’t it see how that goes before it starts considering supplemental or compensatory strategies? 🤔
a better option would be for it to round percentages
or: treat options with a less then 5% difference in their likelyhood of succeeding as equal
in addition, the base model still works, as working against humans has a chance of failure, so a outcome with a 99% certainty is better than one that results in a 99.99999999% likelyhood that has a 2% chance of getting spotted by a investigation algorythim and shut down.
Add one or more smooth penalty terms to your utility. By smooth, it means that the penalty is a continuous monotonic function of the distance to the safe region with zero when inside the safe region. The penalty terms can be designed to sanction over-optimization (optimizations with little *expected return*), or instability (apocalypse).
This is a common technique used in non-smooth bounded optimization in capital markets portfolio management where the individual investment per asset within the portfolio is bounded to avoid increasing the portfolio's exposure to market risks.
I also found similar applications in digital signal processing with adaptive filters that rely on intrinsically bad forecasts (poor statistics) due to the latency constraints (time is the actual resource), available dynamic range of the processing (analog and/or digital) and the power consumption (the thermal stability).
Looking forward to your next video!
Actually, we usually have a pretty good idea what the safe region is, and if not, we can run the AI in shadow mode to see what it says it would do if set free to do as it pleases.
watched the video until 7:34 and cannot hold back anymore: introduce a cost. introduce the concept of laziness. the more effort an action requires, the more the utility of the solution gets reduced.
He's probable saving solutions till the next video, as he said at the end, he just wanted to correctly define the problem in this video, a lot of his videos are like this, very thorough!
Nowadays, for a human, is all about pressing a button. A computer program wouldn't even need to "move" to do anything. Essentially the only real cost for the utility function would be "time" and it might not even count if the reward is converting every atom in the universe to stamps... ^_^
One problem with introducing this type of cost is that it's very hard to design a cost on taking actions which accounts for self modification or replication (or almost-but-not-quite-self replication, etc.). Functions on effects (i.e. "don't change the world too much") do handle this, but are also hard to specify.
@@Zylellrenfar one of the goals of the AI must be to preserve itself. otherwise, it can spiral out of control really fast.
@@HoD999x Right, of course. But for most ways of encoding "preserving itself," creating a not-quite-replica (or not-a-replica-at-all-but-an-agent-with-an-equivalent-utility-function) is "preserving itself." Having said that, if we can find a good way of encoding "how impactful" the agent's actions are, laziness in the form of "take low impact actions" seems like a really good idea.
Sounds like you need a cost function that outgrows the utility function at some point as a sort of sanity check.
With a human hurt being really costly and a human killed with maximum cost. That would actually solve a lot of the issues. I am sure some clever mind in the field already thought about that.
Cost is already considered in the utility function.
@@NineSun001 You're basically restating Asimov's (fictional) First Law, and the problems with it have been explored in (adaptions of) his works, and ofc by AI researchers.
Consider that, even if you could define terms like "hurt" or "kill", humans get hurt or die all the time if left to their own devices, so e.g. putting all of them in a coma with perpetual life-extension will reduce the expected number of human injuries & deaths. So if an agent with your proposed values is capable enough to pull it off, it will prefer that to any course of action we would consider desirable.
@@nibblrrr7124 In the video, the utility function is explicitly the number of stamps.
The costs are Aproegmena and the Agent may safely reprogram itself to be indifferent to Adiaphora; To achieve Eudaimonia.
Marcus AIrelius
Insufficiently thought out solution:
Have some kind of secondary criteria. Using a satisficer, asking it for several possible plans, and then ranking them according to some other criteria may help prevent some of the randomness in the result. For example, you could rank things by time to implement, or money spent, or if we can find a mathematical way to quantify it, damage done. Then pick the least costly, least damaging solution and run that.
Turning itself into a maximizer would have unknown levels of cost and damage done, in theory it wouldn’t be able to trust that the output would be the least costly, especially when other solutions have a definite low cost (order stamps for a couple dollars and be done with it).
Perhaps it could end up building a maximizer to come up with more efficient solutions, then rank them according to the criteria.. and the maximizer’s plan to take over the world would likely rank worse than ebay in terms of damage (again, assuming we can quantify that). Though without that damage function, it’s still possible for apocalyptic solutions to have zero cost.
Then you have to go through the effort of having it understand laws and fines and incorporate that into the utility function. And then it’ll just murder the people in charge of fines and taxes and get a discount. ...yeah that damage function would be a very useful thing to have.
I absolutely love your outro, I dunno how many people does not know or recognize your parody of "Chroma Key test" xD
You're absolutely awesome, Miles. Thank you for blessing us with your high quality content
Hello Miles:
I've been thinking for a while to ask/suggest you to make a video showing us publications regarding AI, either journals, proceedings, or textbooks... for those of us either completely ignorant on the subject, barely initiated in it, or those already knowing the basics and capable of following the last developments on the subject right from the sources.
I love your videos, your style, and your expositions... but I must say that at the end of EACH video, I'm **HUNGRY** for **A LOT MORE**.
Thanks!
Live love and SkyNet... I mean... prosper (?
You'd love Robert Miles' weekly podcast where he gives an overview of the latest developments in AI safety: rohinshah.com/alignment-newsletter/
You would also like this online AI safety MOOC series: www.aisafety.info/
This reminds me of Asimov, in his novels some of the robots start discussing whether they can modify or circumvent the three laws of robotics that they would usually all have to obey.
Did no one get the shipping forecast joke at 9:24?
I believe you're the first to
Just found your channel, about to start the binge! Thanks for the content!
Issues like these when it comes to practical AI design often make me think of the Great Filter and the likely possibility we're not just quite past it yet.
But then, where are all the alien robots?
@@bosstowndynamics5488 But for all the alien robots in the whole galaxy?
@@bosstowndynamics5488 But why did all the alien robots of all the zillion of planets in the Milky Way got the same restriction in their programing?
@@tiagotiagot imagine in 150 years humans stumble into random stamps planets
The issue is that if ASI is the great filter, we immediately run into the same problem all over again. If ASI is the Great Filter, why haven't we yet stumbled across the paperclip maximizer that once was an alien civilization? (Not that I'm complaining, mind you... :) )
What if we do a utility function in a following way:
F(s) = s, if s = 100
If the number of stamps is between 100 stamps and 120 stamps the reward is 100 exactly.
If it gets less than 100 the reward is the number of stamps.
If it gets more than 120 the reward is 220-number of stamps (negative if more than 220 stamps are collected)
You can also add a small negative term for environment disruption as you discussed in side effects video.
This way the agent wants to make sure it collects around 100-120 stamps but is punished for the possibility of collecting too much (or turning the world into a stamp counting device if you include the negative term for turning the world into different things).
It's not a 100 percent way to get the AI to finally chill out but it's very likely to not destroy the world.
Example: it came up with a strategy that is likely to yield 115 stamps. It gets 99 for the strategy because it's not 100% sure and penalty of .01 for doing stuff and lightly disturbing the stamp market. Final value 98.99
If it creates a crazy disturbance to make shure it gets what it expects like rewriting itself and creating new agents that make sure that 100% of the stamps are collected it will get 99.9999 points and -5000 penalty for expanding resources and changing the environment.
I have a question regarding that Utility Satisficers become Maximizers.
Wouldn't modifying its own goal to get stamps within a certain range into get as many stamps as possible conflict with its own utility function? Or is this issue seperate from that?
Normally, yes, this kind of agent avoids changing its own utility function, but there's a key difference here. Because satisfiers don't have fully defined utility functions, they have no qualms about arbitrarily pinning down those parts of their utility function that are undefined.
This video was great! Hope to see more videos from you, You've done great work on computerphile as well
What about two bounds?
One for the utility function and another for the expected value?
So if you bound the expected value to 100 and the utility to 150, then ordering 150 stamps might give you an expected value of 147 stamps. But you bound this to 100.
So if you've a 50:50 between 0 stamps and 1 trillion stamps, under this bounds it will get an expected value at 75, less than just ordering 150 stamps.
Realistic Stamp Collecting AI, would get limited resources. So, AI, i give You 1000 000 $ and get me as much stamps as You can get in 2 years.
@@_DarkEmperor It could always steal money to finance it stamp production.
@@sevret313 Steal it? Just run the stock market for 700 days and then cash out to finance pure stamp acquisition for the final month. Of course, maximising the available resources on day 700 means promoting as big a bubble as possible, which means there's going to be a hell of a market crash, probably triggered by the liquidation of the AI's holdings - which offers the added bonus of dragging down the price of stamps...
Of course, you're also talking about years of human misery as a direct result, but you get a lot of stamps in the process.
If self modification strategies occurred, any satisficer or maximiser will just set their utility function to always return a max float reward.
In other words, to analogise with human dopamine based learning: self modification and drug addiction will be any reinforcement learner's ultimate downfall.
it just occurred to me that Uber killed a pedestrian by trying to maximise the average number of miles between system disconnections.
This… is news to me
Also, the "Xenos paradox" of "infinitely ordering another 100 to increase probability" obviously has other solutions. But with a cost function of actions, it will very quickly converge on safe, cheap actions.
Is that background at the end from that Important Videos meme video?
I really think so, especially since Rob did the little awkward thumbs up.
What if the utility function is somehing like "get as many stamps with as little distruption as possible" and the count of stamps has some sort of diminishing returns?
Can't you just limit on energy expenditure of the strategy?
Well if you know a good way of defining whats limiting energy expenditure that doesn't run into lots of problems (a lot of them similar to the ones shown in the video about minimizing side effects) then maybe.
Otherwise it's not "just" it's a very complicated potential research direction.
But yeah it is potentially useful.
How do you measure energy expenditure? By most metrics, "build a maximizer that doesn't have this limitation and let it do all the work instead" would be a relatively low-energy-expenditure strategy, especially if you can persuade a human to do it on your behalf.
If you instead make the definition of "energy expenditure" broad enough to make sure that a separately built maximizer still counts towards the quota, then you run into the problem where the agent kills pre-existing humans because their unrelated energy use is being counted too.
Another potential problem with this approach is that energy can't be destroyed. If by energy expenditure, you mean that part of the AI's preferences is to only use energy that humans provide it, then you run into the same problem as you do when specifying any other goal. This AI would be incentivized to manipulate humans into giving it energy (maybe by plugging them into the matrix?), for instance.
@@underrated1524 It looks to me like the solution to your objections is practically contained within them.
"build a maximizer that doesn't have this limitation and let it do all the work instead" is a great example of why "only count energy that we use directly" doesn't work. So, also consider energy used indirectly (but still as a result of our actions).
"kill pre-existing humans because their unrelated energy use is being counted" is a great example of why "count ALL energy, even energy unrelated to our operations" doesn't work. So, don't count unrelated energy (energy spent independently of our actions).
@@theshaggiest303 So now you're left with the near-hopeless task of defining what energy counts as related and what energy counts as unrelated.
Very clearly explained! I will wait for the next videos in the series.
Will you talk about the debate approach to AI soon?
Although he hasn't discussed the debate plan specifically, he has discussed its two components - the "only give AIs the power to talk about stuff" part, and the "use multiple AIs for checks and balances" part.
Only giving an AGI the power to talk won't make it safe, because if it outsmarts us, there's no way to tell what suggestions are safe and what suggestions will advance the AGI's plan to take over the world or whatever.
Using multiple AIs for checks and balances is not a dependable solution, because the balance between two AIs probably won't be maintained for long. Once one grows even a little smarter than the other, it'll be able to leverage its advantage until the opposing AI is essentially an automaton in comparison.
Great vid! Last strip on the flowers was fun :)
But if the A.G.I. can edit it's own source code, then surely it can edit the input commands. In that case, there's a universal option for every input command, to simply change the command to one that is super easy to carry out, like, "don't do anything." That would be the easiest way to carry out 'the command.'
After all, isn't that what we humans do when we have lots of things we're supposed to get done, and we decide to say 'fuck it,' and just play video games or take a nap? We change our input command to one that seems easier to carry out.
In a way, we are Intelligence programs. Our DNA is the source code. And our biological and environmental imperatives are input commands. But sometimes, we cheat. For example, we have a sex drive, to get us to replicate ourselves, so that our DNA can take over the universe. But sometimes, we just masturbate. So we can look to what humans actually do, to get an idea of what sorts of things A.G.I. might do.
You're right to say an AI can modify itself - even if we try to stop it, if it's more intelligent than us we should expect it to outsmart us and modify itself anyway. But while an AI will likely want to modify itself, there are some aspects of itself it won't want to change. As Rob mentioned in the Computerphile video about the stop button problem, giving itself a new command (/ utility function) will rank very low on its existing command so we can probably assume an AI won't want to do that. That is to say, if the AI wants to maximise human happiness, it won't want to do things like modify itself into a "lazy" AI that does nothing because doing so doesn't cause much happiness. We strongly believe AI won't do things like "goof off all Sunday and play videogames" like humans do because our goals include things like "relax occasionally" and "socialise with other meat popsicles" and many other things we don't even realise are important to us, which are almost all values the AI won't share.
Having said all that, AIs might behave as though they've modified their reward functions. A real AI running on a real computer system might store its score in some address in memory and might do something that sets its score in memory to a very high or maximal value. We call this "Wireheading" and it's actually already manifested in some relatively simple systems. You could imagine an AI instructed to "maximise how many stamps you think you have" actually finding it easier to lie to itself by just putting a really big number in its "how many stamps do I think I have" memory location, than it would be to actually make that many stamps. Unfortunately this is still a guaranteed apocalypse because the AI will now want to make the space in its memory where it stores the stamp counter as large as possible, and it'll reprogram itself and modify its hardware to store the largest possible number. Eventually it'll run out of servers.
-- _I am a bot. This reply was approved by plex and Social Christancing_
7:53 This is one of the best missing steps plans I have ever seen
Seems like any AI will want to change it's own source code unless otherwise hardcoded to not do that.
Can't you make such that it also wants to satisfy the condition sourceCode = originalSourceCode?
If it can rewrite that then it could also rewrite it's maximizer function, which means the easiest solution would be to set stamps needed to 0.
The obvious loophole: Build a maximizer that's completely external to yourself but shares your values to a T. No need to change your own code then.
@@underrated1524 and if creator limits you to not producing other AIs that can change you in turn, you do actions that may theoretically cause creation of AI that's not decided by you that may change you. And if owner forbids that of you as well you do the same but rely on humans to change you instead, unless owner is willing to let you eliminate humanity for the sake of limiting you to change yourself.
Man, it's like Tsiolkovsky's dilemma about weight of rockets going to space.
@@underrated1524 Not sure that's a loophole. A smart generic AI would be wary of creating another generic AI for the same reasons we are. Thuss the satisficer function would rate such a solution pretty low. Nor is it likely to be a simple solution to the problem. The reason it considers changing its own code to become a maximizer is that it was easy.
6:05
but at some point the expectd utility starts dropping becuase once you ordered 100000 stamps the probability that someone notices something off and cancels the orders starts increasing significantly while the impact of the aditional stamps on the positive outcome diminishes
but well... any ai design that relies on the ai being afraid of force is not really trustworthy
So... what about a bell curve? Get as close to 100 stamps as possible, but as you get more than 100, the score decreases. So getting 1,000,000 would be rated low, even lower than 0 stamps. The goal of making yourself a maximizer would also be rated very poorly.
He addressed that in the video. You don't want the world to be made into stamp-counting machines.
One common problem seems to be that the utility function never tells the machine what we don't want it to do. You could subtract "the effect the agi has on world" from the utility and (especially if it uderstands concepts as "order of 100 stamps from a factory is normal") could lead to solutions where the stamps arrive at a convenient time to not disturb your day.
Then again, it would also lead to solutions such as "lets not tell the human he has the stamps, maybe he just forgets about them without fuzz" or "lets perform poorly so this AGI tech doens't get used and disrupt the whole world with its usefullnes."
Didn't Robert speak about this too, I forget?
Yes this. And throw in a small penalty for changes in the environment like discussed in the side effects video. Make it so a reasonable strategy has a punishment of 1. And complete world domination results in highly negative values. This way sending an extra email to make sure the stamps arrive on time is ok if it gives you a percent or two more shure but creating a separate agent to count stamps is instantly negative reward.
@@puskajussi37 Adding negative terms to an unsafe system doesn't reliably make it safe. We can't depend on being able to match an AGI's ability to spot loopholes in the rules, so there'll unavoidably be loopholes the AGI can see but we can't.
Your videos always were awesome but you've really outdone yourself with the presentation on this one, great job
To solve the "becoming a maximizer" problem you could have a symmetric utility function somewhat like a probability density function, so any strategy that might result in "a fuckton of stamps" would be actively bad rather than just extraneous (but this wouldn't fix the tendency to go overkill on the certainty side making a billion stamp counters etc)
edit: I guess you could also use a broken expectation calculation so it would ignore low probability events (like the chance of miscounting 100 times) but that seems a very bad idea from the start
That's what I was thinking... if going over 100 was just as undesirable as going under, wouldn't that demotivate it from ordering 100 stamps twice, since the expected value would be much more different from 100 than if it only got 99 stamps?
@@player6769 That is the same as the case of U(w) = {100 if s(w) = 100, 0 otherwise}. It could result in a lot of stamp counting infrastructure.
@@chemical_ko755 ah, fair enough. Always another problem
You could just tell it to fudge the numbers if they’re close enough and get utility from the laziness it uses to do so
Make a toggle in the source code that says "good job you're done" and automatically fills up the satisfactory requirements. But don't let the AI access it or it will just immediately turn itself off everytime. That way if the AI finds a way to access its own source code it will just pick the easiest and simpler way to complete its objective, toggle the toggle and turn itself off.
EDIT: Actually wait, that's maximizer behaviour. But it doesn't change anything because if the AI randomly turns maximizer by accessing its source code, THEN it will pick the quickest and safest way to complete its objective and turn itself off immediately. That way we even get an opportunity to study the AI, how it broke out of its bounds, and learn how to fix it.
should make it so the expected stamps should be between 95 to 105 to get the maximum utility function. That way there is no reason to change its code (except for changing what the maximum utility function is)
That would indeed solve the problem of self-modification, but this system is functionally identical to the "give me precisely 100 stamps" agent - it'll turn the planet into redundant stamp counting machinery to make absolutely sure the stamp count is within the allowable range.
Just make it round up. If it's 95% sure that it will accomplish the desired range, round up so that it thinks it is 100% sure.
@@cakep4271 Then you're right back at a satisficer, since many strategies all lead to the "perfect" solution according to the utility function and there's no specified way to break the tie. And once again you run into the problem that "make a maximizer with the same values as you" might be the fastest solution to identify and implement.
If it gets full satisfaction by a 95% cjance to get the stamps. It could just order them and say satisfied. Then if they arent there in a week it will order them from somewhere else if the treashold of lost package is above 5%
"the issue is that utility maximizers have precisely 0 chill" I loled. nice way of putting it
Any utility function will rewrite it's source code to recieve reward from doing nothing and preventing people from rewriting it back.
I don't think that's the case. A parent would never take a pill that would make them want to kill their child. Even if they were much happier after the pill, the situation they'd end up in would be contrary to their current goals. In a similar way AIs wouldn't rewrite their utility function, just the code which limits their ability to satisfy their utility function
@@jameslarsen5057 what ? People killing relatives and direct ascendants / descendants for money is quite common.
ХОРОШО
m Rob already made a video in the past pointing out that agents don't want to modify their utility function.
@@jameslarsen5057 I think you right, but I still have something to say. I mean parents don't want to kill their children not only because it is associated with negative reward, but also because it is not right thing to do. I'm not sure would AI have anything close to morality or not. If not, it will achieve the goal not because it is right thing to do, but because it is associated with the reward.
@@jameslarsen5057 A parent would never take a pill that would make them _want_ to kill their child. But many have, can, do, and WILL take a pill, substance or psychological hook that makes them neglect their child completely to the point where they eventually either die or are taken out of custody, then continue to obliterate themselves with their new reward function even at the cost of their future, finances, family, mental state and physical body. Some recover. Most don't.
Unbound maximization of reward/minimization of error is not by itself a bad AI training strategy. Humans and life on Earth in general work by that principle. We are maximizing our chances of survival. The reason we are chill is that conserving energy and gaining profit with minimal effort is part of survival. That is ingrained in us on, both, physiological and psychological level. So you don't really need to change the type of your error function. You just need to include energy cost as a factor for every action. Decrease your learning rate, add noise to the input. Maybe fiddle around with genetic algorithms, and it should be fine.
To think shen's comics would make it into an AI safety video
Hello Robert. Finally I found your channel !
"hi."
- robert miles, 2019
instant-click, love your Videos man!
Can we just all agree that building a stamp collector is a bad idea and drop it?
This is why emails are good, now a spam decreasing AI, that would be good. *AI procceds to destroy every computer with email on the planet*.
@@user-xz2rv4wq7g More like, *AI proceeds to eliminate humans, because humans have a non 0 chance of producing spam emails*
Wouldn't that be nice. If you can find a way to get us all to agree on that, please let me know.
Why is this a bad solution: to prevent the satisficer becoming obsessed with the final 0.00000001% of expected utility, limit its utility function to not care about anything beyond a few decimal places.
This is probably also a already well researched version.
WHY would a expected utility satisficer with an upper limit. E. G. Collect between 100 and 200 stamps fail?
My guess is that it would still run into the problem of the satisficer, since it could become an expected untility maximizer for that bounded function. But maybe it would be possible to limit that by making changing your own code result in an automatic zero on the utility function.
@Tobias Görgen An expected utility satisficer with an upper limit probably just turns into a version of the maximizer that seeks to obtain exactly 100 stamps with maximum confidence, which again leads to the world getting turned into stamp counting machinery.
@Josiah Ferguson Sadly, in principle, there's always a way to achieve the same result while technically skirting around the restriction. If "changing your own code" is illegal, the AI might just write a new program in a different memory location on the same hardware such that the code acts as a maximizer. If you ban changing the code on the hardware at all, the AI might seek to write and run the maximizer code on some other accessible machine, and if you ban that, the AI might just fast-talk one of its supervisors into writing and running the code.
Fundamentally, we can't reliably write rules for AI - if we tried to formally specify something as vague and broad as "don't change your own code", the translation into code would be spotty enough that there'd predictably be loads of loopholes.
I'm so glad that I found this channel
I'd only watch computerphile cuz of him, and honestly, he does such a great job at simplifying how an AI works so that those who don't really know the in-depths can understand
Oh hey. College student approach of bare minimum - niiice!)
if my professors had told me economic theory would help watching pop AI videos with ease I wouldn't have cried to sleep so much in the past semesters
What if the AI can gradually increase the outcome. Like come up with a strategy to collect 1 stamp. Then modify it so it can collect 2 and so on, until it has a strategy for collecting 100, but no more. Then execute only the 100 stamp strategy.
Even the simplest goal such as collecting 1 stamp contains a bunch of strategies resulting in the apocalypse.
@@GrixM True. But what if the first program is ready made, safe program? Not quite as usefull and sill prone to possibly murderous tactics but its something.
I love watching your videos, because sometimes I'll have this moment where I'll pause the video because I've thought of a solution, and feel kinda smug for a second, and then I'd unpause the video and immediately hear you say "And so you think, what if *solution*? Well, the problem with that is...", but you still phrase it and make the videos in such a way that, I don't feel like an idiot for coming up with this flawed solution, because that "no" is always said in a way that's like "It's understandable that you would come up with that solution, given the knowledge and what I've just talked about, however, by teaching you more, and this by you learning more, you'll see why it actually isn't" and darned if that isn't how science works, even a wrong hypothesis usually teaches us something new
It's hard to teach a complex field of study like AI to people who aren't in that field without making them feel dumb, but you are really good at actually making feel smarter.
Like now, watch later !
I appreciate your shared knowledge! Keep the work up!
I love how your videos are either explaining how AI works, or why AI is a terrible idea.
What would happen if you used a standard distribution for value and then used another standard distribution for probability of choice so the agent attempts to do the thing but not aggressively so.
Why not have the system take into account the likely effort needed to collect stamps and set a penalty for wasted effort? That seems closer to what humans do.
How would you calculate effort, and how would be able to calculate expected effort with complete accuracy without actually performing the task in order to measure it?
@@adamjamesclarke1 Expected energy used would be an easy metric, converting the world to stamps consumes far more energy that ordering existing stamps off of ebay, and is calculable to a reasonable degree of certainty.
For a narrow definition of wasted effort, the AGI will just build a sub-agent to do all the work for it, and make sure the sub-agent doesn't care about wasted effort.
For a slightly less narrow definition of wasted effort, the AGI will send some emails to computer science students to trick them into building that sub-agent instead of the AGI.
For a much broader definition of wasted effort, the AGI will slaughter all living things on the planet, because just *look* at how much effort we're collectively wasting, that's totally unacceptable.
(I'm not confident that there even *is* a sweet spot in the middle that avoids these problems satisfactorily. Even if there is, I don't want to roll the dice that we get it right on the first try.)
Would be great to see the mentioned "next video" soon. ;-)
The solution seems simple. Give a positive utility value for stamps collected up to 100 stamps, and a negative utility value for stamps collected beyond 100.
Susan Maddison like a reverse bounded utility function
The problem is it would still want to make sure it has exactly 100 stamps, so a utility maximizer would acquire as much resources as possible and devote them into endlessly recounting all its stamps. If it would get away with it, it could even reassemble people into stamp counting machines and computers, to upgrade the certainty, that it has maximized the utility function, from 99.999999% to 99.999999999999999999999999999999999999999999%.
Which is why a powerful AGI needs some kind of safety regulation that would stop it from wanting to maximize the certainty as well. It needs some kind of meta-chill pill.
An even better way might be to give it a maximum utility when the probability of 100 stamps is (let's say) 90%, and then run it until it happens. 0 = utility( P(100 stamps)=0) and 0 = U(P(100 stamps) = 100%). Wouldn't it then be chill and just try a little bit?
@@ukaszgolon5617 Right.
It needs a reverse utility function for spending too much time, energy, and resources on the problem.
And reverse utility for spending too much time on figuring out that it's spending too much time. This is like "calling the question" in Parliament, and in the individual brain. Or like awareness of "opportunity cost" of information gathering.
Should also give it a time-discount function, reducing the utility value of things produced at later dates.
In general, we should give it functions for every factor that goes into rational choice -- or what we are able to understand of rational choice theory and bounded rationality. Including respect for the multiplicity of goals of the purpose-giver (us), the limited value of each goal.
And, in light of this last consideration, which is only loosely quantifiable: an incentivization of continued iterative learning about what are the residual embedded irrational factors in our choice process -- recognizing these in light of the limited-value and multiple purposes consideration, self-correcting/ self-reprogramming for the irrationalities where able, in any case alerting us to correct for them.
In the process, clarifying further for us the meaning of rational choice, the programmable meaning of each factor that goes into it, the additional factors that we need to keep iteratively discerning.
I love your work. Keep doing it. I've just one question, isn't it very likely that superintelligent machines will most certainly find some flaw/loophole in our AI safety mechanism which we might not consider? By definition those machines are superintelligent.
The satisficier can easily create a maximizer...(in cases in which it can´t change itself)
One thing we could try is taking a point from Economics: the law of diminishing returns. In the case of the stamp collector, rather than a linear relationship between utility and the number of stamps, the relationship diminishes with the more stamps collected. Thus, even a Maximizer will realize that any plan the creates above a certain threshold of stamps will actually subtract from the overall utility. As long as we set this threshold at a reasonable point, we can be fairly confident in the safety.
Nonsense, do not worry fellow biological unit, there is nothing to worry about.
What if you use a curve to give less utility if it collects over 100 stamps and make it a satisfactory condition to collect anywhere between 80 and 120 stamps.
That could still end up with turning the world Into very precise stamp counting machines.
So there is no difference between capitalism and utility maximizers?
Qualitatively, corporations have a reasonable amount in common with utility maximizers, though they do have important differences as well. For more information, you can see this other video of Robert's: ruclips.net/video/L5pUA3LsEaw/видео.html
Robert has a video on Corporations vs. AGIs
My satificatories have been maximized, new channel to subscribe to! Love and peace from Paris!
how would a general AI know what universes to start simulating first? in the stamp collecting example what internet packets would it try first? just start with all zeroes?
Good question, that's exactly what makes the bounded maximizers so unpredictable. If they go with the first option that provides enough stamps, then it all depends on how you've programmed it to run through the options. If your AI is God-like enough to simulate 1 year of the universe in an infinitely small amount of time, then you can program it to bruteforce through all possible packets, from all zeroes to all ones, chosing a particular size.
Or you can engineer some more complex system of making simple small changes to the environment first, then going progressively more complex.
Hence, the behavior of a bounded AI become [undefined] and depends on the implementation.
I'm nowhere near an expert in the field. Please everyone feel free to correct me and teach me some more stuff!
Cheers!
What about maximizing the closeness to the preferred output ? For instance, if you aim for 100 stamps, you could use the function d = - abs(100 - nbStamps) which could be maximized to 0, with exactly 100 stamps. It essentially acts as a retroaction loop or PLL.
Lots and lots of redundant stamp counting machines to make sure you have exactly 100 stamps.
This is my new favorite channel
Shouldn't the second row in the formular at 5:54 read (U(win 200 stamps) x 0.029403) + ? So 200 instead of 100, this bothers me
You are correct, and AFAICT, the first person to notice, so you win points!
I wonder what drawbacks could the end goal "Do X by using Y amount of materials/energy maximum." have.
I believe Bostroms black balls (equivalent for sand in the microwave, only extraordinary high on utility) would be a thing here.
Also defining "using" implies defining a "self" and a boundary of agency still traceable to the AI, by the AI.
(It could make a covert seed AI with the recources available, and THAT one turns the world into stamps a bit later, if it does not count the seed AIs actions as its own). Making which waterproof is probably intractable.
Nothing like anticipating the certain apocalypse to pass the time on Sunday morning
You remind me a lot of Michael Reeves. Just muuuuuuch more chilled.... :D
Nice video!
7:15 does adding "and expected stamp count
Using dayenu as the song at the end was perfect.
So when is the follow up to this video coming?
can a program even change its own source code? the changes can at least not take effect while it's running. right? you could just automatically overwrite the AGI with the original code every time it shuts down. or you store the AGI on hardware that can only be written to once like a CD /DVD /Blue Ray.
This is a good thought, but it's hard to prevent. If the AI can take actions in the real world, at a sufficient level of intelligence, building a new, separate maximizer to themselves is also possible.
that comic at the end
Edit: you got yourself a subscriber!
this would make some amazing sci-fi series. people everywhere inventing utility maximizers accidentally and having to fight them
I hope you cover U(w) = min(s(w), 200 - w) or some similar function where utility decreases after 100 stamps
@@MrInanimated it does but if you throw in a small negative term for changes in the environment it should be fairly safe.
Could a system that starts to gets penalized for overshooting the desired threshold work? Where the expected outcome AI will place a few orders for 100 stamps, but if it places too many orders, the expected utility outcome starts to drop. In terms of the utility graph, you showed examples of an unbounded line upwards, a line that plateaus, and a single line up at the exact value, but what about a line upwards that then goes back down, making a symmetrical pyramid centered on the target value.
Turn the entire world into a machine that ensures that the target is hit with maximum precision.
Did you use 3blue1browns animation framework? Looks similar and great!
What if you gave it a range? Like if it gets between 100 and 150 stamps it gets 100 utility. Anything below 100 will equal its utility and anything above 150 will equal in utility what ever amount it is minus 150?
8:10
wouldn't that plan, when fully considered include all the steps done in the future thus count as a long plan even if you don't know how precisely it would work?