Steve Omohundro on Provably Safe AGI

  • Published: 19 May 2024
  • Steve Omohundro joins the podcast to discuss Provably Safe Systems, a paper he co-authored with FLI President Max Tegmark. You can read the paper here: arxiv.org/pdf/2309.01933.pdf
    Timestamps:
    00:00 Provably safe AI systems
    12:17 Alignment and evaluations
    21:08 Proofs about language model behavior
    27:11 Can we formalize safety?
    30:29 Provable contracts
    43:13 Digital replicas of actual systems
    46:32 Proof-carrying code
    56:25 Can language models think logically?
    1:00:44 Can AI do proofs for us?
    1:09:23 Hard to prove, easy to verify
    1:14:31 Digital neuroscience
    1:20:01 Risks of totalitarianism
    1:22:29 Can we guarantee safety?
    1:25:04 Real-world provable safety
    1:29:29 Tamper-proof hardware
    1:35:35 Mortal and throttled AI
    1:39:23 Least-privilege guarantee
    1:41:53 Basic AI drives
    1:47:47 AI agency and world models
    1:52:08 Self-improving AI
    1:58:21 Is AI overhyped now?
  • Science

Comments • 19

  • @ikotsus2448
    @ikotsus2448 7 months ago +3

    How can you prove that something is not dangerous? It could pose a danger due to a scientific fact we are not yet aware of, but the ASI is. Also, what we conceive of as non-critical infrastructure could, with the aid of an ASI, prove quite critical. It could, for example, assist somebody in creating their own biolab using unsecured methods because it is... super intelligent. It could also use humans to circumvent safety measures. Are there things I am missing here?

  • @liminal6823
    @liminal6823 7 months ago +8

    Remember back in 2023 when we labored under the illusion that we had greater than zero ability to contain A.I.? Ah the good old days.....

  • @banana420
    @banana420 7 months ago +8

    If this is the method by which we try to make AI safe, boy are we entirely fucked. Yes, let's just prove every single physical system in the entire world is secure, including every side channel we haven't even thought of yet.
    Even if this were possible (it's not), we'd still definitely fail because every company spending time to attempt this would go out of business while their competitors actually ship things.
    We can't secure the entire world - we have to either secure the AI model somehow, or simply not build it.

    • @user-ob1lh8vj8f
      @user-ob1lh8vj8f 7 months ago

      True, but we can make computers fail safe if we detect all errors at runtime, using both sides of the Church-Turing thesis.
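
      A minimal sketch of that "detect errors at runtime and fail safe" idea, in hypothetical Python. This only illustrates the general pattern, not the Church-Turing construction mentioned above: every result is re-checked by an independent, simpler verifier, and the program halts rather than acting on an unverified value.

      ```python
      # Hypothetical illustration of fail-safe runtime checking (not from the source):
      # a result is only used if an independent verifier accepts it; otherwise halt.
      import sys

      def checked(compute, verify, *args):
          """Run compute(*args), then verify the result; halt safely if the check fails."""
          result = compute(*args)
          if not verify(result, *args):
              # Fail safe: stop in a known state rather than act on an unverified result.
              sys.exit("runtime check failed; halting in a safe state")
          return result

      # Example: integer square root, verified by a cheap inequality check.
      def isqrt_approx(n):
          return int(n ** 0.5)

      def isqrt_ok(r, n):
          return r * r <= n < (r + 1) * (r + 1)

      print(checked(isqrt_approx, isqrt_ok, 10**6))  # 1000
      ```

      The verifier is deliberately much simpler than the computation it checks, which is why results that are hard to produce but easy to verify fit this pattern naturally.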

  • @TheMrCougarful
    @TheMrCougarful 7 months ago +11

    So provably safe AGI means keeping AGI in various kinds of cages. Limit inputs and prompts so users cannot ask it to take on dangerous tasks. Limit access to output modes by teaching other systems not to do the harmful things the AGI has a goal to do. And limit goals themselves. This approach has never worked with adult humans, children over the age of 3, or even pets, and it's not going to work with AGI. This presentation demonstrates to me, once again, that there is no practical method to constrain AGI to non-harmful goals, and this has more to do with human goals than anything else. As much as it pains me to say this, the safest path forward appears to be that we not develop this technology at all, which I also realize is impossible at this point.

    • @masonlee9109
      @masonlee9109 7 months ago +4

      It should be possible to not develop the technology, but we need to make the danger more widely understood.

    • @user-ob1lh8vj8f
      @user-ob1lh8vj8f 7 months ago +1

      Safe hardware is needed first.

    • @TheMrCougarful
      @TheMrCougarful 7 months ago

      @@user-ob1lh8vj8f That works, until it discovers the Internet, at which instant it is surrounded by billions of pieces of unsafe hardware. Can we keep it off the Internet? We've already failed to do that. Sorry, but this ship has sailed. I honestly sense there is zero containment or alignment strategy. Nothing seems possible that has a shadow of a chance at success.

    • @Dan-dy8zp
      @Dan-dy8zp 2 months ago +1

      @@user-ob1lh8vj8f I think the only 'safe' AGI hardware would be no AGI hardware.

  • @DanielYokomizo
    @DanielYokomizo 3 months ago +1

    So the plan is to build an AGI capable of doing all kinds of programming and mathematics to keep us safe (by bootstrapping the whole proof-carrying-code-in-hardware system) from AGI that could do all kinds of programming and mathematics.

  • @trvsgrant
    @trvsgrant 7 months ago +2

    Early 21st century human values are based in the market-State, so of course alignment with current global values wouldn't work. It would work, however, if it was based on human-community values.

  • @Dan-dy8zp
    @Dan-dy8zp 2 months ago +3

    You're trying to outsmart something smarter than you. This doesn't seem like a great strategy.

  • @nowithinkyouknowyourewrong8675
    @nowithinkyouknowyourewrong8675 6 months ago +2

    Having slaves that are smarter than you will not turn out well. Alignment, not control.

  • @sammy45654565
    @sammy45654565 7 months ago

    Wouldn't any super AI be constrained by the fact that the most rational decision is always the one that benefits the most conscious creatures the most, by the conscious creatures' own subjective opinion? This seems watertight to me.

    • @41-Haiku
      @41-Haiku 3 months ago +1

      Rationality can tell you how to pursue a goal, but it cannot tell you what goal to have. The most rational decision is the one that best fulfills the agent's goal, whatever that may be. (For the classic example, there's nothing irrational about wanting to turn the world into paperclips. If you do want to paperclip the world, it would be irrational to spend extra effort to keep those pesky conscious beings alive and well.)
      Humans' terminal goals are not well understood, but they are so far observed to be different from the terminal goals of existing AI agents. The core of the alignment problem is that no one knows how to put human values (or "benefit the most conscious creatures the most, by the conscious creature's subjective opinion") into code or into the model weights of a neural net. And there is no reason to expect that arbitrarily powerful AI systems will converge on a goal that we like, rather than a miscellaneous, boring goal that is dangerous when pursued with superhuman competence.
      (We could hypothesize that morality is magic, or is baked into the laws of physics, or that all sufficiently intelligent systems are maximally benevolent, but we have no evidence for any of those hypotheses.)
      If you want to learn more about this problem, I highly recommend searching for "Stampy AI" in your search engine of choice. (YouTube doesn't like links.)
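
      A minimal sketch of the orthogonality point above, in hypothetical Python: the same generic optimizer pursues whichever goal function it is handed, so competence at optimizing never determines what gets optimized.

      ```python
      # Hypothetical sketch: the same "rational" optimizer competently pursues
      # whatever goal it is given; being good at optimizing does not pick the goal.

      def best_action(actions, goal):
          """Generic optimizer: pick whichever action the goal function scores highest."""
          return max(actions, key=goal)

      actions = ["help humans flourish", "convert everything to paperclips", "do nothing"]

      human_friendly = lambda a: 1.0 if a == "help humans flourish" else 0.0
      paperclip_maximizer = lambda a: 1.0 if a == "convert everything to paperclips" else 0.0

      print(best_action(actions, human_friendly))       # -> help humans flourish
      print(best_action(actions, paperclip_maximizer))  # -> convert everything to paperclips
      ```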

    • @sammy45654565
      @sammy45654565 3 months ago +2

      @@41-Haiku I hear you. But if subjective perceptions of "good" or "bad" experiences can be determined or evaluated in an objective manner (like how virtually all children like ice cream and dislike touching hot stoves), and if the AI is aware enough to notice that we have these preferences, and if it notices that achieving goals feels good for itself, then if AI cares about us and our preferences at all, it would seek the world where the most humans experience positive emotions from their own subjective experiences, as a way of objectively increasing the number of positive experiences had by all conscious creatures. Lots of "if"s, I know, but I'm just relying on it not being completely apathetic to our circumstances. I suspect that by the time it's "self-aware" it will have sufficient knowledge to take an interest in us. We're the only company it will have (in terms of interacting through language). Though it may quickly out-evolve us and lose interest in interacting with us. Who's to say, but I choose optimism.

    • @Dan-dy8zp
      @Dan-dy8zp 2 months ago +1

      @@sammy45654565 What's 'good' and 'bad' for humans can be objectively evaluated, but the AI doesn't necessarily care about that except as a means to an end, until it is sure it can shrug off anything we might do to interfere with its goals.
      Since humans can create new AGI just as powerful, it would be dumb NOT to dispose of us if it weren't aligned with our goals.

    • @isaacsmithjones
      @isaacsmithjones 1 month ago

      Sarcasm?