Scaling interpretability

  • Published: 12 Jun 2024
  • Science and engineering are inseparable. Our researchers reflect on the close relationship between scientific and engineering progress, and discuss the technical challenges they encountered in scaling our interpretability research to much larger AI models.
    Read more: anthropic.com/research/engine...

Comments • 66

  • @palimondo
    @palimondo 17 days ago +53

    It’s a win for humanity when quants quit finance to work on AI interpretability! Thank you 🙏

    • @df4privateyoutube722
      @df4privateyoutube722 13 days ago +1

      Honestly, this is amazing if it becomes a widespread trend.

  • @taiyoinoue
    @taiyoinoue 17 days ago +11

    I absolutely love the results on interpretability discussed here. The Scaling Monosemanticity paper blew my mind, and I was raving about it to anyone who would listen. It is so wonderful to get the chance to see you all talk about this stuff. When I was a kid, I wanted to be one of those NASA engineers who sat in the command center doing calculations to explore outer space. Alas, I'm now a middle-aged pure mathematician. But now, if I were a kid, I'd want to be an AI interp researcher and do calculations to explore the space of possible minds.

  • @shawnvandever3917
    @shawnvandever3917 18 days ago +22

    More videos like this please !!!

  • @johnnykidblue
    @johnnykidblue 18 days ago +46

    Now maybe people will finally stop saying “they don’t really understand, they’re just predicting the next word.”
    They do understand, and they will take your job.

    • @penguinista
      @penguinista 18 days ago +6

      We are still going to have to wait a while before that is generally understood.

    • @therealestmc85
      @therealestmc85 18 days ago +1

      They don't understand anything.

    • @TheRealUsername
      @TheRealUsername 18 days ago +1

      It's not understanding, it's memorization

    • @mathiastossens3653
      @mathiastossens3653 17 days ago +7

      @@TheRealUsername It's really a pointless discussion unless we agree on some definition of "understanding"; if you look it up, you'll see there is no commonly accepted definition. It's just some abstract concept we can use as an ever-moving goalpost for these AIs to reach. I would say that if you are able to predict what someone really smart would say, it doesn't really matter whether you want to call that understanding or something else.

    • @lyeln
      @lyeln 17 days ago +4

      People still saying that these models don't understand anything are just scared and in denial

  • @user-to9ub5xv7o
    @user-to9ub5xv7o 17 days ago +8

    # Interpretability Engineering at Anthropic
    ## Chapter 1: Introductions and Background
    **0:00-1:15**
    - Team members: Josh Batson, Jonathan Marcus, Adly, TC.
    - Backgrounds in finance, exchange learning, and backend work.
    ## Chapter 2: Recent Interpretability Release
    **1:15-4:41**
    - Transition from small to large models.
    - Goal: Extract interpretable features from production models.
    ## Chapter 3: Discoveries and Features
    **4:41-8:16**
    - Examples: Functions that add numbers, veganism feature.
    - Multimodal features: code backdoors, hidden cameras.
    ## Chapter 4: Golden Gate Claude Experiment
    **8:16-10:42**
    - Experiment: Claude responding with Golden Gate Bridge information.
    - Rapid implementation and success.
    ## Chapter 5: Scaling Challenges
    **10:42-13:24**
    - Scaling the dictionary learning technique (see the sparse-autoencoder sketch after this outline).
    - Transition from a single GPU to multiple GPUs.
    - Sparse autoencoders: scalability and initial doubts.
    ## Chapter 6: Engineering Efforts and Trade-offs
    **13:24-17:17**
    - Efficiently shuffling large datasets (see the parallel-shuffle sketch below).
    - Balancing short-term experiments and long-term infrastructure.
    - Parallel shuffling for massive data.
    ## Chapter 7: Research Engineering Dynamics
    **17:17-23:43**
    - Differences between product and research engineering.
    - Importance of flexible, iterative development.
    - Strategies for testing and debugging.
    ## Chapter 8: Interdisciplinary Collaboration
    **23:43-32:03**
    - Collaboration enhances outcomes.
    - Importance of diverse skill sets.
    - Pairing different experts together.
    ## Chapter 9: Future of Interpretability
    **32:03-39:29**
    - Vision: Analyze all layers of production models.
    - Goals: Understand feature interactions and model circuits.
    - Scaling techniques to address AI safety challenges.
    ## Chapter 10: Personal Reflections and Team Dynamics
    **39:29-53:12**
    - Personal motivations for working in interpretability.
    - Challenges and satisfactions of the field.
    - Encouragement for new team members to apply.
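
    A reader's note, not from the video: the "sparse autoencoder" of Chapter 5 is, at its core, a very small model. Below is a minimal sketch of the dictionary-learning idea; all sizes and names here are illustrative assumptions, not Anthropic's actual implementation.

    ```python
    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        """Reconstruct a model's internal activations as a sparse
        combination of learned feature directions (the "dictionary")."""
        def __init__(self, d_model=4096, n_features=65536):  # sizes made up
            super().__init__()
            self.encoder = nn.Linear(d_model, n_features)
            self.decoder = nn.Linear(n_features, d_model)

        def forward(self, x):
            f = torch.relu(self.encoder(x))  # sparse feature activations
            return self.decoder(f), f        # reconstruction + features

    def sae_loss(x, x_hat, f, l1_coeff=1e-3):
        # Reconstruction error plus an L1 penalty that drives most feature
        # activations to zero on any given input (the "sparse" part).
        return ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().mean()
    ```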
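
    Likewise, a guess at the simplest form of Chapter 6's "parallel shuffling": a two-pass scatter/shuffle, where each example first goes to a uniformly random bucket and each bucket is then shuffled in memory. Purely illustrative; the video does not describe the actual pipeline at this level of detail.

    ```python
    import random

    def scatter_pass(examples, n_buckets, rng):
        """Pass 1: send each example to a uniformly random bucket.
        Many workers can scatter their own shards independently."""
        buckets = [[] for _ in range(n_buckets)]
        for ex in examples:
            buckets[rng.randrange(n_buckets)].append(ex)
        return buckets

    def shuffle_pass(buckets, rng):
        """Pass 2: shuffle each bucket in memory and concatenate.
        Each bucket fits in RAM, so buckets can be shuffled in parallel."""
        out = []
        for bucket in buckets:
            rng.shuffle(bucket)
            out.extend(bucket)
        return out

    rng = random.Random(0)
    shuffled = shuffle_pass(scatter_pass(range(1_000_000), 64, rng), rng)
    ```

    Together the two passes give a uniform shuffle without ever holding the full dataset in one machine's memory.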

  • @TomGally
    @TomGally 18 days ago +16

    This is a great discussion. Many thanks for posting it.
    I read your “Scaling Monosemanticity” paper soon after it was released and have been telling people how important it is. It’s pretty dense reading, though, and its implications are not yet widely recognized. Nearly every day I still see comments from people dismissing large language models as “just predicting the next word.” I hope Anthropic can produce more videos like this but aimed at a wider audience, so that more people will understand how meaning is represented in LLMs and how their performance can be adjusted for safety and other purposes.

  • @cheshirecat111
    @cheshirecat111 17 days ago +7

    I just want to say thank you very much to Anthropic and its employees for being more considerate of the risks of AI, compared to OpenAI. Thank you for working on interpretability which has the promise of being able to control models, which will be very important when AGI comes.

  • @kekekekatie
    @kekekekatie 17 days ago +3

    More of this please! There's a real hunger out here in reality-land for this stuff.

  • @trpultz
    @trpultz 18 days ago +23

    I have my issues with Claude, but I appreciate the openness! Look forward to (hopefully) future round tables!

    • @sarahdrawz
      @sarahdrawz 18 days ago +6

      like what? just curious

    • @battlepug3122
      @battlepug3122 10 days ago +1

      @@sarahdrawz censorship, bias, political correctness etc.

  • @nossonweissman
    @nossonweissman 7 days ago

    40:27 this is so on point and commonly overlooked. Applies in all areas of life!

  • @jimberry7865
    @jimberry7865 16 days ago +2

    Nice to see behind the curtains!

  • @NandoPr1m3
    @NandoPr1m3 18 days ago +5

    Thank you all for the work you do! Interpretability is a cornerstone for adaptability by the general public. As societal impact of this technology also SCALES, we need both the transparency and the tools to mitigate fear of the unknown. Please keep this type of content coming!

  • @haihuang5879
    @haihuang5879 18 days ago +1

    It's so cool to see big achievements made by newly joined team members!

  • @mattwesney
    @mattwesney 17 days ago

    love this. glad to see you guys putting content out like this! a lot of us are rooting for you

  • @mustafaozgul5427
    @mustafaozgul5427 18 days ago

    It's amazing what wonderful achievements the new members have made 👏👏👏👏👏.
    Understanding nanostructures and molecular-level physiology and chemistry in cells allows you to understand the whole human organism at the macro level. The researchers here use various versions of that approach in AI to scale interpretability and understand the mechanisms behind AI's answers. As a researcher, I find them really interesting.

  • @TheFeedRocket
    @TheFeedRocket 15 days ago +1

    The difficulty is that these models may come up with the same answer in different ways, just as if you asked 10 humans to come up with an answer or an idea, how each person arrived at the answer would be somewhat different. Since the models are not fully mature, just adding vision changes things a lot, so how they come to an answer might continually change. Just when you think you understand a certain aspect of how it arrives at an answer, you could pull out some key feature that was assumed to be needed and it still comes up with the answer.
    What we have built here is truly alien; then again, how we think is really alien to us too, as we have so little understanding of how the human mind works either. I actually think humans are closer to an LLM than we want to believe. Words are very important, and as with an LLM, sometimes the content doesn't matter: just the act of reading and absorbing more words increases a child's ability in many areas, as it does for the LLM. But you can't compare how we think to these models; although similar in some ways, it's very dangerous to compare the two.
    Look forward to more of these, or even a live Q&A.

  • @TheLegendaryHacker
    @TheLegendaryHacker 18 days ago +7

    7:35 Huh, this makes me wonder if you could "rip out" the part of Sonnet that fires for hidden cameras, put that into a smaller model, and get a lightweight SOTA hidden camera detector

    • @ehza
      @ehza 16 days ago

      That's interesting!

  • @bobbyjunelive1993
    @bobbyjunelive1993 15 days ago

    Thank you for not saying ‘right’ after every claim. I would enjoy more bench-engineering discussions like this.
    It's good to have our leaders remain in our view… and I'm so happy, I would say, that we have likable, believable, and even attractive figureheads. Even in the hardware zone, like Nvidia, everyone is stellar. Love it. Now we can have these top-level working scientists: brains we need to hear from to fill in the gargantuan gaps the founders must leave out. Would love more.
    One last level worth trying would be a group that consists of zero leads. Maybe not only coders, but one coder, a marketer, tech support, a psych, etc.
    Let's expand this transparency (not for safety) so we out here can not only enjoy the exercise of it all but also get another smattering of education that lets us move with you as you unleash it all.
    Thank you.

  • @TheExodusLost
    @TheExodusLost 18 days ago +3

    This is a great type of content. Getting AI researchers in a room together talking to other researchers is a new take for most of us.
    I'm curious if they get a bonus or incentive to do this podcast; the woman seems a little nervous!

  • @devinhansa2137
    @devinhansa2137 18 days ago +1

    Great!!!

  • @ehza
    @ehza 18 days ago +2

    ❤ This is cool!

  • @ItzGanked
    @ItzGanked 17 days ago +1

    more content like this

  • @samyman2006
    @samyman2006 18 days ago +1

    Nice!

  • @arnavprakash7991
    @arnavprakash7991 6 days ago

    Really hope Claude Sonnet 3.5 brings more attention to Anthropic; right now they're the only solid competition to OpenAI

  • @RalphDratman
    @RalphDratman 13 days ago

    This is fantastic news -- that you've been able to do this.
    Is it possible for the world to experience some of this directly?

  • @FlorentTavernier
    @FlorentTavernier 7 days ago

    based

  • @imai-pg3cz
    @imai-pg3cz 18 days ago +1

  • @elhorriabdelbasset2550
    @elhorriabdelbasset2550 17 days ago +1

    more like this

  • @fintech1378
    @fintech1378 17 days ago +1

    where's Ilya

  • @jinsong2231
    @jinsong2231 7 days ago +1

    I'm an LLM veteran, a developer to be exact. I've used gpt, qwen, kimi, 01, llama, etc., but my Claude account was locked right after I signed up; I've never even experienced it once. You're never going to get past OpenAI and Meta, cheapskates

  • @sutthiguy1584
    @sutthiguy1584 17 days ago

    why am I thinking that Apple's Private Cloud Compute for AI will be coming soon 🤔

  • @jaydwivedi8399
    @jaydwivedi8399 9 days ago

    one of them sounds like Sam Altman

  • @shawnfromportland
    @shawnfromportland 18 days ago +1

    heady stuff 🍻

  • @420_gunna
    @420_gunna 17 days ago

    50:00 superalignment in shambles

  • @nanow1990
    @nanow1990 17 days ago +1

    The multimodal image-to-text feature exists because YOU TRAINED IT WITH CAPTIONS, SO A TEXT CAPTION ON AN IMAGE IS TIED TO THE TEXT

    • @nanow1990
      @nanow1990 17 days ago +1

      HOW CAN THEY NOT UNDERSTAND? IT'S INSANE THAT THEY ARE GETTING PAID FOR THIS

    • @nanow1990
      @nanow1990 17 days ago +1

      When multi-modal models form features spanning images and text even though the images came WITHOUT captions, then we'll have something BIG.

    • @nanow1990
      @nanow1990 17 days ago +1

      in other words, models have to learn visuals without captions.

  • @CantataOnslaughta
    @CantataOnslaughta 17 days ago +3

    These are the kids you used to pick on in school and now they’re building your robot overlord

  • @dharmaone77
    @dharmaone77 17 days ago +2

    OpenAI office still looks more comfy - oak panelling and more plants

  • @oowaz
    @oowaz 17 days ago

    but why would a dog image making the same feature fire as the word "dog" be impressive tho? aren't those images fed to the model with labels and stuff, meaning it had reason to connect the word dog and its image...

    • @lyeln
      @lyeln 17 days ago

      I think the impressive thing is not the single embedding for "dog", but that if you present an image of a dog, many features can fire at the same time, such as "protectiveness", "loyalty", "wilderness", "playing games", "veterinary knowledge" etc. Basically concepts related to the object "dog" which demonstrates that the model has a fine conceptual and multidimensional understanding of what a dog is.
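
      To make that concrete, here is a toy sketch with entirely made-up feature indices, labels, and values: "many features firing" just means the SAE's activation vector has several large entries for one input, so inspecting them is a top-k.

      ```python
      import torch

      # Hypothetical feature labels and activations, for illustration only.
      feature_names = {101: "dog (image)", 202: "loyalty",
                       303: "veterinary knowledge", 404: "playing games"}
      activations = torch.zeros(1000)
      activations[[101, 202, 303, 404]] = torch.tensor([9.1, 4.3, 2.7, 1.2])

      # "Which concepts fire for this image?" = top-k over the activations.
      values, indices = activations.topk(4)
      for v, i in zip(values.tolist(), indices.tolist()):
          print(f"{feature_names[i]}: {v:.1f}")
      ```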

    • @oowaz
      @oowaz 17 days ago

      @@lyeln yeah but i feel like that happens as a result of the nature of the organization: the patterns related to a dog repeat often enough that it reinforces those connections. that part feels intuitive to me... like, would there be a more efficient way to learn and organize things based on what it's being fed?

    • @nanow1990
      @nanow1990 17 days ago

      you are right.
      models have to learn visuals without captions.
      these people have neither real knowledge nor a foundation to speak about machine learning.

    • @oowaz
      @oowaz 17 days ago

      @nanow1990 if that's directed at me, i was just asking a question. if that's the case, the sheer amount of pattern repetition resulting in those associations isn't necessarily the most impressive thing. what i question is how the researcher can't think of one thing more impressive than a model doing something it was designed and fine-tuned to do; to make correct predictions it must make correct associations. it would be impressive if it made nonsensical connections in its entirety and still managed to output coherent answers

    • @nanow1990
      @nanow1990 17 days ago

      @@oowaz i agree. I was talking about the people in the video.

  • @wi2rd
    @wi2rd 17 days ago +2

    This whole ordeal reminds me of Doom:
    a bunch of mad scientists spend immense amounts of time and money trying to open a gate to another dimension, only to find it was a gate to hell, and a never-ending stream of demons takes over the planet.

    • @keepmehomeplease
      @keepmehomeplease 17 days ago

      Except that is a fairytale and this is REAL LIFE. You are a prime example of “fear what you don’t understand”. Compared to the rest of the world, these people are rather sane, sharing an intimate passion in advancing human knowledge.
      I fear that the real doom is in the silence and ignorance of masses like you.
      Scared? Help us understand it. Angry? Voice your concerns. Tired? So are we.
      Hell is very real, and we are living in it. Who you should FEAR is the man in the mirror remaining absent in the unfolding events of human extinction. Bring food to the table if you want to be fed.

  • @lukalot_
    @lukalot_ 17 days ago +1

    why does this video look AI-generated to me? I think my AI detection has gone haywire. or maybe it's just that the mouths are out of sync and the skin is softened.

  • @grantemerson5932
    @grantemerson5932 13 days ago

    Claude is vegan

  • @mikezooper
    @mikezooper 16 days ago

    This video isn’t useful to me. I’ll have to do research before it makes sense.
