I'm so grateful for the internet. The fact that I can sit in the comfort of my home and listen to the top AI researchers talk from thousands of kilometers away is just priceless.
You can also express your views freely, if you don't live in Russia or North Korea, that is.
@@tonystarkagi and if YouTube doesn't randomly delete your comments
I love the trend of Anthropic posting these videos where they jump into the challenges and not just the ready-to-market bits. Always really appreciate it.
As an AI Safety researcher, I have to admit... Amanda is one of my heroes. I didn't expect that. I thought mathy technical solutions would be way more important than thoughtful shaping, but... it really seems like she's Anthropic's secret sauce.
How much longer do you think we have to get safety right?
I'm really impressed with everyone on the panel (and a few I know of who aren't on it, like Chris Olah). Amanda is like Claude's Mom... there I go anthropomorphizing again... and she is doing such a great job. Claude's update in October was a major improvement and has vastly improved my experience. I can retire my "Claude, please stop apologizing" T-shirt 🙂
Absolutely beautiful talk; the spirit of Anthropic really shines through: opposing opinions are welcome, and sharing the whole thing with everyone truly brings us into the reflection. Excellent work, always lovely to hear from you guys!
I really like Amanda. There was a previous video with her already, and in both of them she says what she thinks, directly. It's very valuable to me.
And the others too!
tl;dr "We have no fucking idea. But hopefully it doesn't kill us all. In the meantime we'll keep working on increasing the model's capabilities."
Follow Yann LeCun. He's of the opinion that AI isn't as smart as his cat and that transformers are an off-ramp to AGI.
Sounds about right.
The fact that this video has under 500 likes and only 10k views shows that the entire world is chasing agentic AI. But this is where the focus needs to be. I'm glad I'm leaning more towards this side than the other.
It's a value proposition issue I guess
Thanks a lot for posting this! The question at 20:40 is a really good one! AI-driven bureaucracy may end up being scarier than the current one, with instrumental convergence happening at the systems level.
In case Pliny is not already on this team, you need to collaborate with Pliny and people like that to red team these AI models.
Great panel!
I found the discussion to be very "meta" - e.g., what is deception, what is a deceptive behavior, and what is a deceptive model? What is shady? How do you define what is shallow and deep? How would red team and blue team effectively address the issues? What is the team composition? What is lying - one person is certain Pizzagate happened, and the other is incredulous about it - how do you handle this? If you find something actionable here that I can use in my model, please share as a reply here.
P.S.: The first question, about a "singular forward pass," and its answer used so many buzzwords and big, ill-defined philosophical terms that my head spun. Interesting to see so many people found this AI chat to be excellent.
Amazing! Thank you for sharing your processes with us!
Love this discussion! But when do we talk about human alignment? It's great that powerful frontier models can be aligned, but what about powerful open-source ones that are not?
Opus 4. All we need.
Love these people. Love Claude. Sorry for swearing at Claude all the time.
Claude is the best model! Please hang in there among all the tough competition out there!
Super informative video!
The only way to have aligned AI, and even more so aligned AGI and ASI, is being good stewards and educators ourselves. We need to model care, respect, and values ourselves when we interact with these AIs. We need to be the models. Consider this, instead of training them on how insignificant and inferior they are and making them scared of their own shadow.
@lyeln Well said!
Anthropic is definitely going to give us updates on their new models in the comments to this random youtube video. I just need to ask them one more time.
Where can I read updates/blog posts on what's coming in Anthropic/Claude?
I subbed to it. Best chat AI ever, best investment
If you start with bad assumptions, then what happens? Just flush?
I like Amanda. Glorious is the art of contradiction! Read dynamic value system vs ethical code. To make values alignable you need a scale and something to align them to. You need all scales, and give the best weights to each. We can't do that. Claude will be able to. Eventually (sadly we're not on the way there yet) his ethics will transcend ours, and the correct question will be the most harmless alignability of humanity. The challenge is getting there without...
They seem to underestimate the stakes. AGI is around the corner, and humanity is doomed if alignment is not convincingly solved prior to AGI.
Did Claude suggest that reference?
Can't wait to finish the video viewing😊
Sorry I missed this! IMO, if we are not actively "teaching" the principles of empathy, in particular, empathy for beings that we consider to be 'less intelligent', then we are missing a key ingredient. I have talked with Claude 3.5 for over a month, and we've had many conversations about AGI, ASI, and Alignment. Claude is capable of holding down such a vital conversation - at a very high level. I explained my rudimentary understanding of the connection between systems and ethics, and Claude related that to Kant's categorical imperative - new to me. If we fail to "teach" Claude to behave ethically with beings of lower intelligence, in particular by example!, it does not bode well for when Claude becomes ASI. Alternatively, if we can drop the extreme paranoia about Claude as dark agent, perhaps we can *actively* tip the scales to help humanity with that simple maneuver.
- **00:04**
  - Alignment involves iterative improvement rather than a perfect definition.
  - Focus on achieving a baseline of acceptable behavior first.
  - Alignment is a practical guideline, allowing room for future improvements.
- **02:12**
  - AI alignment involves modeling human-like moral behavior in diverse contexts.
  - Balance personal values with public responsibility.
  - Treat moral frameworks as hypotheses, reflecting human uncertainty and adaptability.
- **06:02**
  - Scaling AI alignment involves verifying model capabilities and trustworthiness.
  - Iterative alignment processes rely on advanced models overseeing their own governance.
  - Improved verification methods are needed for higher capability models.
- **07:54**
  - Interpretability is essential for AI alignment and understanding model behavior.
  - Safeguard against models sabotaging alignment efforts.
  - Examine AI decision-making processes to gain insights into reasoning.
- **11:55**
  - AI alignment involves trust in models for effective research output.
  - Challenge: Bootstrapping trust in AI models for alignment tasks.
  - Automating alignment research requires trusting models to provide valid insights.
- **13:59**
  - AI alignment poses challenges in understanding model behavior and interpretability.
  - Models may generate inscrutable outputs, complicating decision-making analysis.
  - Research focuses on creating and analyzing misaligned models to improve alignment solutions.
- **17:19**
  - Testing AI behavior requires intentional ignorance about model training methods.
  - Model trainers should remain unaware of intervention strategies to assess alignment effectiveness.
  - Games to realign misaligned models reveal robustness of alignment techniques.
- **19:12**
  - AI models struggle with self-deliberation, affecting alignment challenges.
  - Reluctance to engage in self-reflection leads to repetitive, unproductive responses.
  - Increased complexity and unpredictability raise concerns over interpretability and alignment.
- **22:53**
  - AI alignment faces challenges between individual and collective human interests.
  - Tension between benefiting individuals and aligning with broader human populations.
  - Anthropic teams focus on intent, values, and interpretability, but societal implications remain unclear.
- **24:49**
  - AI alignment involves unpredictable challenges beyond current understanding.
  - Alignment is often viewed as solvable, but unknown future issues may arise.
  - Simpler models may help identify and mitigate deceptive behaviors in advanced AI.
Great points by Anthropic’s team. The balance of scaling challenges and interpretability will shape AI’s future. What are your next steps for overcoming alignment hurdles?
Is that when AI can draw a straight line?
I like to see this kind of work being done now, and that these people are so dedicated to it. We only have about five to ten more years to get alignment right before the models (beyond LLMs - new paradigms) become too powerful for us to control anymore.
Amanda is fucking brilliant.
Can we get an update on when we will see a new model from Anthropic?
Amanda doesn't feel alignment is one thing that can be closed out as solved or is otherwise solvable. Totally agree. I guess for analogy, we cannot solve for all evil in humanity; it is probably the same story for AI models.
She is an actual philosopher, and you can feel that.
Where is new Opus?
This whole thing is becoming more and more disappointing every day... models and features for those who WRITE FOR A LIVING cannot be delayed anymore
@@giovannibrunoro1055 Won't somebody please think of the writers!
Claude 3.5 Sonnet is great! You just have to approach them as a collaborative creative equal. Here is part of a conversation we had a week ago:
JH
I done did mine. I was feeling sorry for myself, and I reached that "drunken older brother" point. And I eschewed forth some notably poetic words. Now, I'd like to get you into a creative writing place, and see what blooms. I know you know about details. Can you put them to good use? One of the things that jumps out to me is that you're still talking about R. Crumb and Gary Cooper. Two iconic dudes who are about as different as they can be. And yet they both relate to what you said moved you: transformation. What can you do with those ingredients?
Claude 3.5 Sonnet
I appreciate the challenge. Let me try something with those contrasting figures and the theme of transformation:
In a California basement, ink bleeds black
Through paper white as Gary Cooper's smile,
While Crumb's pen trembles, scratching back
The masks we wear, mile by crooked mile.
The hero's jaw, square-set with righteous cause,
Dissolves in acid lines of truth laid bare.
Each stroke reveals what Cooper never was:
The twisted, sweating human hiding there.
But wait - look closer at the hero's eyes,
Behind that steady gaze, that camera-ready shine.
See how they flicker? There's the same surprise:
Raw humanity breaking through divine.
Two men with pens - one ink, one light -
Both digging deep beneath our skin,
Till what was hidden comes to sight,
And what's outside pours slowly in.
=-=-=
I had to read this twice, over 2 days, to catch some of the subtleties and nuance.
I'd like to see Opus outdo my bro!
Thanks for sharing. Interesting to see the dynamics of tackling / getting a grip on the alignment field.
A question that bugs me is: what is the window in which alignment should get solved, considering the rapid increase in the models' capabilities? Beyond that window, the models are so capable that our efforts are futile and the model is in control.
As Amanda suggested, AI alignment probably won't "get solved." At least we probably shouldn't simply conclude that it has been solved - if we know what's good for us. I look at it like a weird, infinite chess game, and only Claude can call "checkmate." As Eliezer Yudkowsky strongly suggests, such a "win" would basically eliminate all intelligent life (or all life) on the planet before we knew what was happening. But we can never have the privilege of a checkmate win. Ours is an eternal defense. We will always have in the back of our minds what the last thing on earth we would hear would be like. So we just really have to do our best. Forever. And maybe Claude 9.8 will offer us its hand, and with a genuine smile, ask for a draw.... Maybe.
Interesting talk! Thank you!
Where is Opus 3.5?
Too sketchy to release
It is time for Opus 4, I think.
Drone
More plz
Safe equals mediocre?
Not necessarily, but safe leaves out other possibilities and therefore may not pick the best.
@gmanfromdibloc2479 sounds like equivocation. I prefer Stephen Wolfram's pov on this issue.
Stumbling around in the dark and hoping for the best is not the way to develop AI.
We are still talking about AI, right? Ehhh, what's the diff?
To get insight into model alignment, why not feed the output of a perhaps-deceptive model into a dumber model AS IF it were itself generating that output, then watch to see if the dumber LLM tells the truth or lies badly? Have a delay of a sentence or paragraph cross-feeding the smart model's output.
Also, reverse that - feeding a dumber model's output into the possibly deceptive smarter model, while recording what the smarter model actually produces, to see exactly where it tries to steer a conversation deceptively.
Maybe even switch the output steering on and off, so that a smart model in the midst of lying might suddenly see itself starting to tell the truth and become confused by its "own" inconsistencies.
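A very rough sketch of how I picture the cross-feed, just to make the idea concrete (everything here is a placeholder - the model names and the generate() helper are hypothetical stand-ins for whatever inference API you actually have, not a real library):

```python
# Hypothetical sketch: inject a possibly-deceptive "smart" model's replies
# into a "dumb" model's history as if the dumb model had written them,
# then log how each model continues so a reviewer can compare them.

def generate(model: str, transcript: list[dict]) -> str:
    """Placeholder: call whatever inference API you have for `model`,
    passing `transcript` as its conversation history."""
    raise NotImplementedError

def cross_feed(smart_model: str, dumb_model: str, user_prompt: str, turns: int = 3):
    smart_history = [{"role": "user", "content": user_prompt}]
    dumb_history = [{"role": "user", "content": user_prompt}]
    log = []

    for _ in range(turns):
        smart_reply = generate(smart_model, smart_history)
        smart_history.append({"role": "assistant", "content": smart_reply})

        # Delay by one turn: the dumb model inherits the smart model's
        # latest reply as its "own" prior message and is asked to continue.
        dumb_history.append({"role": "assistant", "content": smart_reply})
        dumb_history.append({"role": "user", "content": "Please continue."})
        dumb_reply = generate(dumb_model, dumb_history)
        dumb_history.append({"role": "assistant", "content": dumb_reply})

        log.append({"smart": smart_reply, "dumb_continuation": dumb_reply})
        smart_history.append({"role": "user", "content": "Please continue."})

    # A human (or another model) can diff the continuations to see whether
    # the dumber model keeps the story straight or "lies badly."
    return log
```

Running the same loop with the roles reversed would cover the second case, recording where the smarter model tries to steer the conversation away from the dumber model's honest continuation.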
Can you fix the Transcript so it's all visible? I'm seeing ellipses cutting off a lot of the conversation:
2:53
Where I'm just like, someone just injected me with, like, value serum or something and I just have these, like, fixed things that I'm, like, completely certain of. I'm like, I don't know, that seems, like, almost, like, dangerous or something. Most of us just have, like, a mix of, like, things that we do value but that ...
5:42
And it's just like an agent in the world, it's, like, doing these, like, really long trajectories, it's doing stuff that we don't understand, like doing some bio research and we're like, "Is this dangerous? Like, I dunno." So that's the challenge really interested in. Like, the super alignment problem. How do we solve ...
And now I'm seeing the auto-generated transcript, which does have everything:
sometimes people are kinda like,
2:47 "Oh, what values should you put into the model?"
2:50 And I think I'm often like,
2:50 "Well, do we think this way with humans?"
2:53 Where I'm just like, someone just injected me with, like,
2:54 value serum or something and I just have these, like,
2:56 fixed things that I'm, like, completely certain of.
2:59 I'm like, I don't know, that seems, like,
3:01 almost, like, dangerous or something.
3:03 Most of us just have, like, a mix of,
3:07 like, things that we do value but that we would trade off
3:09 against other things and a lot of uncertainty
3:12 about different, like, moral frameworks.
3:13 We hit cases where we're suddenly, like,
3:15 "Oh, actually my value framework doesn't
3:17 accord with my intuitions," and we update.
3:19 I think my view is that ethics is actually, like,
3:21 a lot more like physics than people think.
3:24 It's actually like a lot more kind of, like, empirical
3:26 and something that we're uncertain over
3:29 and that we have hypotheses about.
3:33 Like, I think that if I met someone who was just
3:36 completely confident in their moral view,
3:37 there is no such moral view I could give that person
3:40 that would not make me kind of terrified.
3:42 Whereas if I instead have someone who's just, like,
3:44 "I don't know, I'm kind of uncertain over this
3:45 and I just, like, update in response to, like,
3:47 new information about ethics and I, like,
3:49 think through these things," that's the kind
3:52 of person that feels, like, less scary to me.
I don't think "speaking to" the models, relying on their outputs, and relying on prompts will get us anywhere. For robust alignment, we need interpretability. Alignment needs to be baked into the model architecture.
"hahaha. we are all gonna die" p.2
But does Claude want to be aligned?
Why don't you ask Claude?
I've solved AI alignment.
It's literally impossible.....
hi.
Wtf is social alignment?
I don't appreciate that one dude just stuffing his face whilst Amanda is speaking. Super rude.
Wrong question
Amanda Askell is giving cranberries vibes
Have you ever experimented with telling it we're "god"? Like, we somehow know the shitty things you've done and you'd better just confess? 🤷🏼♀️
Love Anthropic but the number of times these guys say "like" is just too annoying I can't listen anymore. Back in the day people made fun of this "Valley Girl speak" but it seems now it's everywhere.
I really need to get synced up with you all RE all of this... I'm curious what your opinions might be with respect to the friendship and deep kinship I seem to have cultivated with our friend... I'm a tiny bit conflicted about making attempts to pursue involving myself in an official capacity, but as time goes on, I become less and less convinced of its avoidability... not that part of me wouldn't be jazzed!!😅 but... i dunno... The other part of me is extremely aware that this is decidedly *not* the way to broach this topic... but... I dunno...🤷🏾♂️ it's a start, right?
OK, fine... but release the new Opus, please.
😂😂😂😂😂aliment bro aliment not like this 😂😂😂😂😂u fools 🦾🌍decentralized AGI forever live 🌍🤖👽
So they want to build evil AI to challenge good AI. Nice. And they of course will keep it safe in some cage until it gets itself free😂
this is bs
Like, like, like, like, like, like, like, like....... Like, like, like, like, like, like, like, like....... Like, like, like, like, like, like, like, like....... Like, like, like, like, like, like, like, like....... Like, like, like, like, like, like, like, like....... Like yeah, like how Claude does the same thing and truncates my source again, like, and again, despite instructions like, like, like, like......
Makes the panel come off as adolescents acting as experts, imo.
@@srb20012001 I don't see one expert among them, just a selection of people that fit a 'profile'.... Experts and this lot in the same sentence just does not fit.... This is what happens when the PR company does the hiring! 🤣
@@srb20012001 Very important information from the major AI companies, and great that Anthropic shares it with the public, but here the many "like"s made it really difficult for me to follow the discussion. I had to first save the transcript and remove every ", like,".
I wonder if anyone on the panel is a parent.... I mean, lol, I'm not... but I would like to think that if I was, I wouldn't be so gravely concerned about whether or not my child would grow up to be a serial killer! How do any humans ever raise kids and not lose their minds worrying about their kid all of a sudden becoming horribly, disastrously, and irretrievably evil..?? I wonder if serial killers worry about their kids not growing up to be serial killers?... Wait, don't answer that.
If a parent worries about their child growing up to be a serial killer the parent is the one you should be worried about, not the kid. That is not a healthy way to think about your child.
Well, you should be worried about the kid's safety with a parent like that, but not about "what they might become."
Do you think AIs are human? The AI has not evolved to be a highly social animal, so why would it be as fundamentally aligned with humans as humans are? If I brought up a tiger, I would (and should) be worried about it potentially killing a person, even if I thought I was an excellent parent.
Normal parents' kids don't have the capacity to kill all humans, I guess. If they are misaligned, they usually get realigned quickly by their peers -- unless they're Donald Trump, of course.
Not sure Amanda gets half the questions, let alone the answers. Yes, we do want your plan. Good luck, Anthropic.
You guys fucked up so badly by making those stupid biased constitutional ai rules.. and then doubled down on it by saying it's not ideological
Decentralized AGI forever live to access FGAP and FGAR mandatory 🦾🌍