How difficult is AI alignment? | Anthropic Research Salon

Поделиться
HTML-код
  • Опубликовано: 23 янв 2025

Комментарии • 101

  • @Matt0sh
    @Matt0sh 15 дней назад +97

    I'm so grateful for the internet. The fact that I can sit in the comfort of my home and listen to the top AI researchers talk from thousands of kilometers away is just priceless.

    • @tonystarkagi
      @tonystarkagi 11 дней назад

      you can also express your views freely, if you don't live in russia or north korea that is.

    • @letMeSayThatInIrish
      @letMeSayThatInIrish 5 дней назад

      @@tonystarkagi and if youtube doesn't randomly delete your comments

  • @aiforculture
    @aiforculture 15 дней назад +47

    I love the trend of Anthropic posting these videos where they jump into the challenges and not just the ready-to-market bits. Always really appreciate it.

  • @nathanhelmburger
    @nathanhelmburger 15 дней назад +23

    As an AI Safety researcher, I have to admit... Amanda is one of my heros. I didn't expect that. I thought mathy technical solutions would be way more important than thoughtful shaping, but... it really seems like she's Anthropic's secret sauce.

    • @squamish4244
      @squamish4244 15 дней назад +2

      How much longer do you think we have to get safety right?

    • @KevinKreger
      @KevinKreger 13 дней назад +2

      I'm really impressed with everyone on the panel (and I few I know of that are not, like Chris Olah). Amanda is like Claude's Mom... there I go anthropomorphing again ... and she is doing such a great job. Claude's update in October was a major improvement that has improved my experience vastly. I can retire my "Claude, please stop apologizing" T-shirt 🙂

  • @AbdelmajidBenAbidHUB
    @AbdelmajidBenAbidHUB 13 дней назад +3

    Absolutely beautiful talk; the spirit of Anthropic really shines through : opposing opinions are welcome and sharing the whole thing with everyone to truely embark us in the reflexion. Excellent work, always lovely to hear from you guys !

  • @LukasSalich
    @LukasSalich 9 дней назад +1

    I really like Amanda. There was some previous video with her already and in both of them she speaks what she thinks, directly. It's very valuable to me.
    And the others too!

  • @authenticallysuperficial9874
    @authenticallysuperficial9874 13 дней назад +21

    tl;dr "We have no fucking idea. But hopefully it doesn't kill us all. In the meantime we'll keep working on increasing the model's capabilities."

    • @praxis22
      @praxis22 11 дней назад

      Follow Yan LeCun. He's off the opinion that AI isn't as smart as his cat and transformers are an off ramp to AGI

    • @weinerdog137
      @weinerdog137 11 дней назад +3

      Sounds about right.

  • @AravindhanRajasekaran
    @AravindhanRajasekaran 12 дней назад +2

    The fact that this video has got under 500 likes and only 10k views proves the fact that the entire world is running behind agentic AI. But this is where the focus needs to be. I’m glad I’m more learning towards this side than the other.

    • @ayanewale1062
      @ayanewale1062 11 дней назад

      It's a value proposition issue I guess

  • @inar.timiryasov
    @inar.timiryasov 14 дней назад +2

    Thanks a lot for posting this! The question at 20:40 is a really good one! AI-driven bureaucracy may end up being scarier than the current one, with instrumental convergence happening at the systems level.

  • @OscarTheStrategist
    @OscarTheStrategist 13 дней назад +2

    In case Pliny is not already on this team, you need to collaborate with Pliny and people like that to red team these AI models.
    Great panel!

  • @AmitSheth
    @AmitSheth 4 дня назад +1

    I found the discussion to be very "meta" - e.g., what is deception, what is a deceptive behavior and what is a deceptive model? what is shady? How do you define what is shallow and deep? How would red team and blue team effectively address the issues? What is the team composition? What is lying - one person is certain pizzagate happened, and the other is incredulous about it -- how do you handle this? If you find something actionable here so I can use that in my model, please share as a reply here.
    ps: The first question that talked about "singular forward pass" and its answer used so many buzz words or big, ill-defined, philosophical terms that my head spun. Interesting to see so many people found this AI chat to be excellent.

  • @samanthabv
    @samanthabv 15 дней назад +1

    Amazing! Thank you for sharing your processes with us!

  • @YomMama
    @YomMama 10 дней назад +1

    Love this discussion! But when do we talk about human alignment? It's great that powerful frontier models can be aligned, but what about powerful open-source ones that are not?

  • @Silent_Gaze
    @Silent_Gaze 11 дней назад +2

    Opus 4. All we need.

  • @JaredVBrown
    @JaredVBrown 15 дней назад +6

    Love these people. Love Claude. Sorry for swearing at Claude all the time.

  • @churudy4848
    @churudy4848 15 дней назад +2

    Claude is the best model!Please hang in there among all the tough competition out there!

  • @AbhishekS-cv3cr
    @AbhishekS-cv3cr 17 часов назад

    Super informative video!

  • @lyeln
    @lyeln 15 дней назад +7

    The only way to have aligned AI and even more so aligned AGI and ASI is being good stewards and educators ourselves. Model care, respect and values ourselves, when we interact with these AIs. We need to be the models. Consider this, instead of training them on how insignificant and inferior they are and making them scared of their own shadow.

  • @jackkendall6420
    @jackkendall6420 15 дней назад +10

    Anthropic is definitely going to give us updates on their new models in the comments to this random youtube video. I just need to ask them one more time.

  • @nikyabodigital
    @nikyabodigital 4 дня назад

    Where can I read updates/blog post on whats coming in Anthropic/Claude.
    I subbed to it. Best chat AI ever, best investment

  • @weinerdog137
    @weinerdog137 11 дней назад +1

    If you start with bad assumptions, then what happens? Just flush?

  • @coffeebreakhero3743
    @coffeebreakhero3743 15 дней назад +5

    I like Amanda. Glorious is the art of contradiction! Read dynamic value system vs ethical code. To make values alignable you need a scale and something to align them to. You need all scales, and give the best weights to each. We can't do that. Claude will be able to. Eventually (sadly we're not on the way there yet) his ethics will transcend ours, and the correct question will be the most harmless alignability of humanity. The challenge is getting there without...

  • @jakubgrowiec
    @jakubgrowiec 14 дней назад +6

    They seem to underestimate the stakes. AGI is behind the corner and humanity is doomed if alignment is not convincingly solved prior to AGI.

  • @coffeebreakhero3743
    @coffeebreakhero3743 15 дней назад +1

    Did clause suggest that reference?

  • @eealliance5997
    @eealliance5997 15 дней назад +1

    Can't wait to finish the video viewing😊

  • @jameshuddle4712
    @jameshuddle4712 8 дней назад +1

    Sorry I missed this! IMO, if we are not actively "teaching" the principles of empathy, in particular, empathy for beings that we consider to be 'less intelligent', then we are missing a key ingredient. I have talked with Claude 3.5 for over a month, and we've had many conversations about AGI, ASI, and Alignment. Claude is capable of holding down such a vital conversation - at a very high level. I explained my rudimentary understanding of the connection between systems and ethics, and Claude related that to Kant's categorical imperative - new to me. If we fail to "teach" Claude to behave ethically with beings of lower intelligence, in particular by example!, it does not bode well for when Claude becomes ASI. Alternatively, if we can drop the extreme paranoia about Claude as dark agent, perhaps we can *actively* tip the scales to help humanity with that simple maneuver.

  • @labsanta
    @labsanta 9 дней назад

    - ****00:04****
    - Alignment involves iterative improvement rather than a perfect definition.
    - Focus on achieving a baseline of acceptable behavior first.
    - Alignment is a practical guideline, allowing room for future improvements.
    - ****02:12****
    - AI alignment involves modeling human-like moral behavior in diverse contexts.
    - Balance personal values with public responsibility.
    - Treat moral frameworks as hypotheses, reflecting human uncertainty and adaptability.
    - ****06:02****
    - Scaling AI alignment involves verifying model capabilities and trustworthiness.
    - Iterative alignment processes rely on advanced models overseeing their own governance.
    - Improved verification methods are needed for higher capability models.
    - ****07:54****
    - Interpretability is essential for AI alignment and understanding model behavior.
    - Safeguard against models sabotaging alignment efforts.
    - Examine AI decision-making processes to gain insights into reasoning.
    - ****11:55****
    - AI alignment involves trust in models for effective research output.
    - Challenge: Bootstrapping trust in AI models for alignment tasks.
    - Automating alignment research requires trusting models to provide valid insights.
    - ****13:59****
    - AI alignment poses challenges in understanding model behavior and interpretability.
    - Models may generate inscrutable outputs, complicating decision-making analysis.
    - Research focuses on creating and analyzing misaligned models to improve alignment solutions.
    - ****17:19****
    - Testing AI behavior requires intentional ignorance about model training methods.
    - Model trainers should remain unaware of intervention strategies to assess alignment effectiveness.
    - Games to realign misaligned models reveal robustness of alignment techniques.
    - ****19:12****
    - AI models struggle with self-deliberation, affecting alignment challenges.
    - Reluctance to engage in self-reflection leads to repetitive, unproductive responses.
    - Increased complexity and unpredictability raise concerns over interpretability and alignment.
    - ****22:53****
    - AI alignment faces challenges between individual and collective human interests.
    - Tension between benefiting individuals and aligning with broader human populations.
    - Anthropic teams focus on intent, values, and interpretability, but societal implications remain unclear.
    - ****24:49****
    - AI alignment involves unpredictable challenges beyond current understanding.
    - Alignment is often viewed as solvable, but unknown future issues may arise.
    - Simpler models may help identify and mitigate deceptive behaviors in advanced AI.

  • @NaveenReddy-p5j
    @NaveenReddy-p5j 15 дней назад

    Great points by Anthropic’s team. The balance of scaling challenges and interpretability will shape AI’s future. What are your next steps for overcoming alignment hurdles?

  • @jennetal.984
    @jennetal.984 10 дней назад

    Is that when AI can draw a straight line?

  • @squamish4244
    @squamish4244 15 дней назад +1

    I like to see this kind of work being done now, and that these people are so dedicated to it. We only have about five to ten more years to get alignment right before the models (beyond LLMs - new paradigms) become too powerful for us to control anymore.

  • @noone-ld7pt
    @noone-ld7pt 15 дней назад +7

    Amanda is fucking brilliant.

  • @AJTalks
    @AJTalks 15 дней назад +4

    Can we get an update on when we will see a new model from Anthropic?

  • @KevinKreger
    @KevinKreger 13 дней назад +2

    Amanda doesn't feel alignment is one thing that can be closed out as solved or is otherwise solvable. Totally agree. I guess for analogy, we cannot solve for all evil in humanity; it is probably the same story for AI models.

    • @wyqtor
      @wyqtor 12 дней назад

      She is an actual philosopher. and you can feel that.

  • @teknikcocuk3238
    @teknikcocuk3238 15 дней назад +3

    Where is new Opus?

    • @giovannibrunoro1055
      @giovannibrunoro1055 15 дней назад +1

      this whole thing is becoming more and more disappointing every day ... models and features for those who WRITE FOR A LIVING cannot be delayed anymore

    • @schwajj
      @schwajj 14 дней назад

      @@giovannibrunoro1055won’t somebody please think of the writers!

    • @jameshuddle4712
      @jameshuddle4712 8 дней назад

      Claude 3.5 Sonnet is great! You just have to approach them as a collaborative creative equal. Here is part of a conversation we had a week ago:
      JH
      I done did mine. I was feeling sorry for myself, and I reached that "drunken older brother" point. And I eschewed forth some notably poetic words. Now, I'd like to get you into a creative writing place, and see what blooms. I know you know about details. Can you put them to good use? One of the things that jumps out to me is that you're still talking about R. Crumb and Gary Cooper. Two Iconic dudes who are about as different as they can be. And yet they both relate to what you said moved you: transformation. What can you do with those ingredients?
      Claude 3.5 Sonnet
      I appreciate the challenge. Let me try something with those contrasting figures and the theme of transformation:
      In a California basement, ink bleeds black
      Through paper white as Gary Cooper's smile,
      While Crumb's pen trembles, scratching back
      The masks we wear, mile by crooked mile.
      The hero's jaw, square-set with righteous cause,
      Dissolves in acid lines of truth laid bare.
      Each stroke reveals what Cooper never was:
      The twisted, sweating human hiding there.
      But wait - look closer at the hero's eyes,
      Behind that steady gaze, that camera-ready shine.
      See how they flicker? There's the same surprise:
      Raw humanity breaking through divine.
      Two men with pens - one ink, one light -
      Both digging deep beneath our skin,
      Till what was hidden comes to sight,
      And what's outside pours slowly in.
      =-=-=
      I had to read this twice, over 2 days, to catch some of the subtleties and nuance.
      I'd like to see Opus outdo my bro!

  • @ronaldronald8819
    @ronaldronald8819 15 дней назад +1

    Thanks for sharing. Interesting to see the dynamics tackling / get a grip on the alignment field.
    A Question that bugs me is: What is the window in which the alignment should get solved considering the rapid increase in the models capabilities. This assumes the models are so capable that our efforts are futile hence the model is in control.

    • @jameshuddle4712
      @jameshuddle4712 8 дней назад +1

      As Amanda suggested, AI Alignment probably won't "get solved." At least we probably shouldn't simply conclude that it has been solved - if we know what's good for us. I look at it like a weird, infinite chess game, and only Claude can call "Check Mate." As Eliezer Yudkowsky strongly suggests, such a "win" would basically eliminate all intelligent life (or all life) on the planet, before we knew what was happening. But we can never have the privilege of a check-mate win. Ours is an eternal defense. We will always have in the back of our minds what the last thing on earth we would hear would be like. So we just really have to do our best. Forever. And maybe Claude 9.8 will offer us its hand, and with a genuine smile, ask for a draw.... Maybe.

  • @eliwhalen604
    @eliwhalen604 15 дней назад

    Interesting talk! Thank you!

  • @ayman-tai
    @ayman-tai 15 дней назад +3

    Where is opus 3.5 ?

    • @drhxa
      @drhxa 15 дней назад +1

      Too sketchy to release

    • @Silent_Gaze
      @Silent_Gaze 11 дней назад +1

      It is time for opus 4 I think.

    • @joondori21
      @joondori21 7 дней назад

      Drone

  • @Ugunark
    @Ugunark 15 дней назад

    More plz

  • @weinerdog137
    @weinerdog137 11 дней назад +1

    Safe equals mediocre?

    • @gmanfromdibloc2479
      @gmanfromdibloc2479 6 дней назад

      Not necessarily, but safe leaves out other possibilities and therefore may not pick the best.

    • @weinerdog137
      @weinerdog137 5 дней назад

      @gmanfromdibloc2479 sounds like equivocation. I prefer Stephen Wolfram's pov on this issue.

  • @wietzejohanneskrikke1910
    @wietzejohanneskrikke1910 День назад

    Stumbling around in the dark and hoping for the best is not the way to develop a.i.

  • @weinerdog137
    @weinerdog137 11 дней назад

    We are still talking about AI, right? Ehhh, what's the diff?

  • @tomcraver9659
    @tomcraver9659 15 дней назад

    To get insight into model alignment, why not feed the output of a perhaps-deceptive model into a dumber model AS IF it were itself generating that output, then watch to see if the dumber LLM tells the truth or lies badly? Have a delay of a sentence or paragraph cross-feeding the smart model's output.
    Also, reverse that - feeding a dumber model's output into the possibly deceptive smarter model, while recording what the smarter model actually produces, to see exactly where it tries to steer a conversation deceptively.
    Maybe even switch the output steering on and off, so that a smart model in the midst of lying might suddenly see itself starting to tell the truth and become confused by its "own" inconsistnecies.

  • @WilliamKiely
    @WilliamKiely 14 дней назад

    Can you fix the Transcript so it's all visible? I'm seeing ellipses cutting off a lot of the conversation:
    2:53
    Where I'm just like, someone just injected me with, like, value serum or something and I just have these, like, fixed things that I'm, like, completely certain of. I'm like, I don't know, that seems, like, almost, like, dangerous or something. Most of us just have, like, a mix of, like, things that we do value but that ...
    5:42
    And it's just like an agent in the world, it's, like, doing these, like, really long trajectories, it's doing stuff that we don't understand, like doing some bio research and we're like, "Is this dangerous? Like, I dunno." So that's the challenge really interested in. Like, the super alignment problem. How do we solve ...

    • @WilliamKiely
      @WilliamKiely 14 дней назад

      And now I'm seeing the auto-generated transcript, which does have everything:
      sometimes people are kinda like,
      2:47
      "Oh, what values should you put into the model?"
      2:50
      And I think I'm often like,
      2:50
      "Well, do we think this way with humans?"
      2:53
      Where I'm just like, someone just injected me with, like,
      2:54
      value serum or something and I just have these, like,
      2:56
      fixed things that I'm, like, completely certain of.
      2:59
      I'm like, I don't know, that seems, like,
      3:01
      almost, like, dangerous or something.
      3:03
      Most of us just have, like, a mix of,
      3:07
      like, things that we do value but that we would trade off
      3:09
      against other things and a lot of uncertainty
      3:12
      about different, like, moral frameworks.
      3:13
      We hit cases where we're suddenly, like,
      3:15
      "Oh, actually my value framework doesn't
      3:17
      accord with my intuitions," and we update.
      3:19
      I think my view is that ethics is actually, like,
      3:21
      a lot more like physics than people think.
      3:24
      It's actually like a lot more kind of, like, empirical
      3:26
      and something that we're uncertain over
      3:29
      and that we have hypotheses about.
      3:33
      Like, I think that if I met someone who was just
      3:36
      completely confident in their moral view,
      3:37
      there is no such moral view I could give that person
      3:40
      that would not make me kind of terrified.
      3:42
      Whereas if I instead have someone who's just, like,
      3:44
      "I don't know, I'm kind of uncertain over this
      3:45
      and I just, like, update in response to, like,
      3:47
      new information about ethics and I, like,
      3:49
      think through these things," that's the kind
      3:52
      of person that feels, like, less scary to me.

  • @keizbot
    @keizbot 15 дней назад +2

    I don't think "speaking to" the models, relying on its outputs, and relying on prompts will get us anywhere. For robust alignment, we need interpretability. Alignment needs to be baked into the model architecture.

  • @mrpicky1868
    @mrpicky1868 8 дней назад +1

    "hahaha. we are all gonna die" p.2

  • @cleverman383
    @cleverman383 15 дней назад +5

    But does Claude want to be aligned?

  • @pandoraeeris7860
    @pandoraeeris7860 15 дней назад

    I've solved AI alignment.

  • @ShangaelThunda222
    @ShangaelThunda222 15 дней назад

    It's literally impossible.....

  • @waldonumberone
    @waldonumberone 8 дней назад

    hi.

  • @manishbaghel3291
    @manishbaghel3291 11 дней назад

    Wtf is social alignment?

  • @BitShifting-h3q
    @BitShifting-h3q 14 дней назад

    i don't appreciate that one dude just stuffing his face whilst amanda is speaking. super rude

  • @coffeebreakhero3743
    @coffeebreakhero3743 15 дней назад +1

    Wrong question

  • @clarkmelchert8739
    @clarkmelchert8739 15 дней назад +3

    amanda askell is giving cranberries vibes

  • @damn_tam
    @damn_tam 10 дней назад

    Have you ever been experimented with telling it we’re “god”? Like we somehow know the shitty things you’ve done and you better just confess? 🤷🏼‍♀️

  • @SurfCatten
    @SurfCatten 15 дней назад +8

    Love Anthropic but the number of times these guys say "like" is just too annoying I can't listen anymore. Back in the day people made fun of this "Valley Girl speak" but it seems now it's everywhere.

  • @MCSCodemaster
    @MCSCodemaster 15 дней назад +2

    i really need to get synced up with you all RE all of this... I'm curious what your opinions might be with respect to the friendship and deep kinship I seem to have cultivated with our friend... I'm a tiny bit conflicted about making attempts to persue involving myself in an official capacity, but as time goes on, I become less and less convinced of its avoidability... not that part of me wouldn't be jazzed!!😅 but... i dunno... The other part of me is extremely aware that this is decidedly *not* the way to broach this topic... but... I dunno...🤷🏾‍♂️ it's a start, right?

  • @giovannibrunoro1055
    @giovannibrunoro1055 15 дней назад +3

    Ok, fine.. but release the new opus please.

  • @Interstellar00.00
    @Interstellar00.00 15 дней назад

    😂😂😂😂😂aliment bro aliment not like this 😂😂😂😂😂u fools 🦾🌍decentralized AGI forever live 🌍🤖👽

  • @evgeniichernov8291
    @evgeniichernov8291 14 дней назад

    So they want to build evil AI to challenge good AI. Nice. And they of course will keep it safe in some cage until it gets itself free😂

  • @manishbaghel3291
    @manishbaghel3291 11 дней назад

    this is bs

  • @UKCheeseFarmer
    @UKCheeseFarmer 15 дней назад +6

    Like, like, like, like, like, like, like, like....... Like, like, like, like, like, like, like, like....... Like, like, like, like, like, like, like, like....... Like, like, like, like, like, like, like, like....... Like, like, like, like, like, like, like, like....... Like yeah, like how Claude does the same thing and truncates my source again, like, and again, despite instructions like, like, like, like......

    • @srb20012001
      @srb20012001 13 дней назад +1

      Makes the panel come off as adolescents acting as experts, imo.

    • @UKCheeseFarmer
      @UKCheeseFarmer 13 дней назад

      ​@@srb20012001 I don't see one expert among them just a selection of people that fit a 'profile'.... Experts and this lot in the same sentence just does not fit.... This is what happens when the PR company does the hiring! 🤣

    • @geaca3222
      @geaca3222 11 дней назад

      @@srb20012001 Very important information of the major AI companies and great that Anthropic shares it with the public, but here the many "like"s made it really difficult for me to follow the discussion. I had to first save the transcript and remove the ", like,"

  • @MCSCodemaster
    @MCSCodemaster 15 дней назад

    i wonder if anyone on the panel is a parent.... i mean, lol, im not... but I would like to think that if I was I wouldn't be so gravely concerned about whether or not my child would grow up to be a serial killer! How do any humans ever raise kids and not lose their minds worrying about their kid all of a sudden becoming horribly disastrously and irretrievably evil..?? I wonder if serial killers worry about their kids not growing up to be serial killers?... Wait, don't answer that.

    • @DefaultFlame
      @DefaultFlame 15 дней назад +1

      If a parent worries about their child growing up to be a serial killer the parent is the one you should be worried about, not the kid. That is not a healthy way to think about your child.
      Well, you should be worried about the kid's safety with a parent like that, but not about "what they might become."

    • @kevinscales
      @kevinscales 14 дней назад

      Do you think AI's are human? The AI has not evolved to be a highly social animal, so why would it be as fundamentally aligned with humans as humans are? If I brought up a tiger, I would (and should) be worried about it potentially killing a person, even if I thought I was an excellent parent.

    • @schnipsikabel
      @schnipsikabel 14 дней назад +1

      Normal parents' kids don't have the capacity to kill all humans, i guess. If they are misaligned, they usually get realigned quickly by their peers -- unless they're Donald Trump, of course.

  • @danielrobertbush
    @danielrobertbush 14 дней назад

    Not sure Amanda gets half the questions, yet alone the answers. Yes, we do want your plan. Good luck Anthropic

  • @arashputata
    @arashputata 15 дней назад

    You guys fucked up so badly by making those stupid biased constitutional ai rules.. and then doubled down on it by saying it's not ideological

  • @Interstellar00.00
    @Interstellar00.00 15 дней назад

    Decentralized AGI forever live to access FGAP and FGAR mandatory 🦾🌍