The Dark Matter of AI [Mechanistic Interpretability]

  • Published: 23 Dec 2024

Comments • 110

  • @WelchLabsVideo
    @WelchLabsVideo  1 day ago +5

    Take your personal data back with Incogni! Use code WELCHLABS and get 60% off an annual plan: incogni.com/welchlabs

  • @ckq
    @ckq 21 hours ago +55

    15:00
    The reason for polysemanticity is that in an N-dimensional vector space there are only N mutually orthogonal vectors, but if you allow nearly orthogonal vectors (say, between 89 and 91 degrees), the count grows exponentially, on the order of e^N nearly orthogonal vectors.
    That's what allows the scaling laws to hold.
    There's an inherent conflict between having an efficient model and an interpretable model.
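
A quick way to see @ckq's point numerically (a minimal numpy sketch with arbitrary dimensions, not something from the video): random unit vectors in a high-dimensional space are already nearly orthogonal to one another, so far more than N "almost perpendicular" directions fit into N dimensions.

```python
import numpy as np

# Minimal sketch: random unit vectors in high dimensions are nearly orthogonal.
# dim and count are arbitrary illustration choices.
rng = np.random.default_rng(0)
dim, count = 512, 2000

vecs = rng.standard_normal((count, dim))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)   # normalize to unit length

cosines = vecs @ vecs.T                                # pairwise cosine similarities
angles = np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))
offdiag = ~np.eye(count, dtype=bool)                   # ignore each vector paired with itself

# Pairwise angles concentrate around 90 degrees and the spread shrinks like 1/sqrt(dim),
# even though only 512 of these 2000 vectors could be exactly orthogonal.
print(f"mean pairwise angle: {angles[offdiag].mean():.2f} deg")
print(f"std of pairwise angles: {angles[offdiag].std():.2f} deg")
```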

    • @SteamPunkPhysics
      @SteamPunkPhysics 16 hours ago +6

      Superposition in this polysemantic context is a method of compression that, if we can learn more from it, might really change the way we deal with and compute information. While we thought quantum computers would yield something amazing for AI, maybe instead it's the advancement of AI that will tell us what we need to do to make quantum computing actually work effectively (i.e., computation on highly compressed data that is "native" to the compression itself).

    • @mb2776
      @mb2776 11 hours ago

      Thank you, I also paused the video at that point. The capitalized "Almost orthogonal vectors" also caught my eye.

  • @xXMockapapellaXx
    @xXMockapapellaXx 1 day ago +42

    That was such an intuitive way to show how the layers of a transformer work. Thank you!

  • @thorvaldspear
    @thorvaldspear 13 hours ago +5

    I think of it like this: understanding the human brain is so difficult in large part because the resolution at which we can observe it is so coarse in both space and time. The best MRI scans have a resolution of maybe a millimeter per voxel, and I'll have to look up research papers to tell you how many millions of neurons that is.
    With AI, every neuron is right there in the computer's memory: individually addressable, ready to be analyzed with the best statistical and mathematical tools at our disposal. Mechanistic interpretability is almost trivial in comparison to neuroscience, and look at how much progress we've made in that field despite such physical constraints.

  • @atgctg
    @atgctg 1 day ago +105

    More like "The Neuroscience of AI"

    • @punk3900
      @punk3900 20 hours ago +7

      I think that trying to understand how these systems work from a human perspective is completely pointless and against the basic assumptions. This is because those models already model something that isn't possible for a human being to design algorithmically.

    • @alexloftus8892
      @alexloftus8892 11 hours ago +1

      @@punk3900 I'm a PhD student in mechanistic interpretability, and I disagree: a lot of structure has already been found. We've found structure in human brains, and that's another system that evolved without human intervention or optimization for interpretability.

    • @punk3900
      @punk3900 11 hours ago

      @@alexloftus8892 I mean, it's not that there is nothing you can find. There are surely lots of basic concepts you can find, but that doesn't mean you can find a way to disentangle the WHOLE structure of patterns, because its complexity keeps increasing. That's why you cannot design such a system manually in the first place.

  • @roy04
    @roy04 21 hours ago +11

    The videos on this channel are all masterpieces. Along with all other great channels on this platform and other independent blogs (including Colah's own blog), it feels like the golden age for accessible high quality education.

  • @thinkthing1984
    @thinkthing1984 23 hours ago +21

    I love the space analogy of the telescope. Since the semantic volume of these LLMs is growing so gargantuan, it only makes sense to speak of astronomy rather than mere analysis!
    Great video. This is like scratching that part at the back of your brain you can't reach on most occasions.

  • @kingeternal_ap
    @kingeternal_ap 21 hours ago +18

    21:24 Oh damn, you just lobotomized the thing

    • @redyau_
      @redyau_ 19 hours ago +2

      That was gross and scary somehow, yeah

    • @kingeternal_ap
      @kingeternal_ap 17 hours ago +1

      That felt... Wrong.

    • @1.4142
      @1.4142 12 hours ago

      LLM went to Ohio

    • @redyau_
      @redyau_ 11 hours ago +1

      @@kingeternal_ap Although, when you think about it, all that happened was that "question" got a very high probability in that layer no matter what, and the normal weights of later layers did not do enough to "overthrow" it. Nothing all that special.

    • @kingeternal_ap
      @kingeternal_ap 9 hours ago

      I guess, yeah, I know it's just matrices and math stuff, but I guess the human capacity for pareidolia makes this sort of... "result" somewhat frightening for me.
      Also, suppose there is a neuron that does a specific task in your noggin. Wouldn't hyperstimulating it do essentially the same thing?
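
For anyone wondering what the clamping discussed in this thread looks like in practice, here is a rough PyTorch sketch (my own toy illustration: GPT-2 small stands in for the model in the video, and a random direction stands in for a real feature). A forward hook pushes one layer's residual stream hard along a chosen direction, and later layers may or may not "overthrow" it.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

layer_idx = 6                        # which block's output to modify (arbitrary choice)
steering_dir = torch.randn(model.config.n_embd)
steering_dir /= steering_dir.norm()  # unit-length direction to push on
scale = 10.0                         # how hard to push it

def clamp_hook(module, inputs, output):
    # GPT-2 blocks return a tuple; output[0] is the residual-stream activations.
    hidden_states = output[0] + scale * steering_dir
    return (hidden_states,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(clamp_hook)

ids = tok("My favorite place to visit is", return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=20, do_sample=False)
print(tok.decode(out[0]))

handle.remove()  # restore normal behaviour
```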

  • @Eddie-th8ei
    @Eddie-th8ei 20 hours ago +7

    An analogue to polysemanticity could be how, in languages, the same word is often used in different contexts to mean different things. Sometimes they are homonyms, sometimes they are spelled exactly the same, but when thinking of a specific meaning of a word, you're not thinking of its other definitions.
    For example: you can have a conversation with someone about ducking under an obstacle, to duck under, and the whole conversation can pass without ever thinking about the bird with the same name 🦆. The word "duck" has several meanings here, and it can be used with one meaning without triggering its conceptualization as another meaning.

    • @dinhero21
      @dinhero21 13 hours ago

      In the AI case it's much more extreme: the toy 512-neuron model they used had an average of 8 distinct features per neuron.

  • @siddharth-gandhi
    @siddharth-gandhi 1 day ago +28

    Oh god, a Welch Labs video on mech interp, Christmas came early! Will be stellar as usual, bravo!
    Edit: Fantastic as usual, heard about SAEs in passing a lot but never really took time to understand, now I'm crystal clear on the concepts! Thanks!

  • @chyza2012
    @chyza2012 21 hours ago +13

    It's a shame you didn't mention the experiment where they force-activated the Golden Gate Bridge neurons and it made Claude believe it was the bridge.

    • @personzorz
      @personzorz 16 hours ago

      Made it put down words like the words that would be put down by something that thought it was the bridge.

    • @bottlekruiser
      @bottlekruiser 15 hours ago

      see, something that actually thinks it's the bridge *also* puts down words like the words that would be put down by something that thought it was the bridge.

    • @dinhero21
      @dinhero21 13 hours ago +2

      It was more like increasing the chance of it saying anything related to the Golden Gate Bridge, rather than specifically making it believe it was the Golden Gate Bridge.

    • @atimholt
      @atimholt 11 hours ago

      Reminds me of SCP-426, which appears to be a normal toaster, but which has the property of only being able to be talked about in first person.

  • @fluffy_tail4365
    @fluffy_tail4365 22 hours ago +12

    14:20 welcome to neuroscience :D We suffer down here

    • @dreadgray78
      @dreadgray78 19 hours ago +2

      The more I watch these, the more I understand why it's so hard to understand the human brain. And imagine how many layers the human brain has relative to an AI model. I think the example about specific cross-streets in SF later in the video is super interesting, and it shows why polysemanticity is probably necessary to contain the level of information we actually know.

  • @hugoballroom5510
    @hugoballroom5510 15 hours ago +2

    With respect to recall: children remember curse words very well because of the emotion behind the utterance. AI has full retention but absolutely no emotional valence because it only learns from text. Just a thought ....

  • @Pokemon00158
    @Pokemon00158 23 hours ago +3

    I think this is a design and engineering choice. If you choose to design your embedding space to be 2403 dimensions without inherent purpose, it's like mixing 2403 ingredients at every step, 60 times over, and then being surprised that you cannot tell what tastes like what. I think you need to constrain your embedding into many embeddings of smaller dimensions and to get more control by regularizing them with mutual information against each other.

    • @dinhero21
      @dinhero21 12 hours ago

      It needs to be big so that the gradient optimizer has many parameters to optimize and can approximate the "real" function better.

    • @Pokemon00158
      @Pokemon00158 12 hours ago

      @dinhero21 You can have it in the same size, but in different parts. Split 2403 dimensions into chunks of 64 dimensions, and then control for mutual information between the chunks so that different chunks get different representations. This is a hard problem too as the mutual information comparisons are expensive, and I think that the first iteration of the models went for the easiest but perhaps a less explainable way of structuring themselves.
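
A rough sketch of the chunked-embedding idea above (my own illustration; the 64-dimensional chunks come from the comment, while the cross-covariance penalty is an assumption standing in for a true mutual-information estimate, which is much more expensive to compute):

```python
import torch

def chunk_decorrelation_penalty(embeddings: torch.Tensor, chunk_size: int = 64) -> torch.Tensor:
    """embeddings: (batch, dim). Penalizes linear correlation between chunks of the
    embedding, a cheap proxy for keeping the chunks' information distinct."""
    batch, _ = embeddings.shape
    x = embeddings - embeddings.mean(dim=0, keepdim=True)    # center each feature
    chunks = torch.split(x, chunk_size, dim=1)                # the last chunk may be smaller
    penalty = embeddings.new_zeros(())
    for i in range(len(chunks)):
        for j in range(i + 1, len(chunks)):
            cov = chunks[i].T @ chunks[j] / (batch - 1)       # cross-covariance of chunks i and j
            penalty = penalty + (cov ** 2).mean()
    return penalty

# Example: add this penalty to whatever task loss the model already optimizes.
emb = torch.randn(32, 2403, requires_grad=True)   # 2403-d embeddings, as in the comment above
print(chunk_decorrelation_penalty(emb))
```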

  • @A_Me_Amy
    @A_Me_Amy 15 hours ago

    Dude, this was one of the most compelling videos for learning data science and visualization ever, and the best one I've seen explaining this stuff...

  • @jackgude3969
    @jackgude3969 16 hours ago

    Easily one of my favorite channels

  • @AidenOcelot
    @AidenOcelot 22 hours ago +2

    It's something I would like to see with AI image generation, where you put in a prompt and change specific variables that change the image

    • @bottlekruiser
      @bottlekruiser 15 hours ago

      Check out "Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders".

  • @danberm1755
    @danberm1755 13 hours ago

    You're the first person I've seen to cover this topic well. Thanks for bringing me up to date on transformer reverse engineering 👍

  • @punchster289
    @punchster289 18 hours ago +1

    Please make a visual of the top 10 unembedded tokens with their softmaxed weights for *every* word in the sentence at the same time as it flows through the model layer by layer. Or maybe I'll do it. I'd be very, very interested to see :)
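
What this comment describes is essentially the "logit lens": apply the unembedding to every layer's residual stream and read off the top tokens at each position. A rough sketch with GPT-2 small as a stand-in (assumptions: HuggingFace transformers, and only the final position is printed to keep the output short; loop over all positions for the full picture):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("The Eiffel Tower is located in the city of", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, output_hidden_states=True)

for layer, h in enumerate(out.hidden_states):             # embeddings + one entry per block
    # Apply the final layer norm and the unembedding to this layer's residual stream.
    logits = model.lm_head(model.transformer.ln_f(h))      # (1, seq_len, vocab)
    probs = torch.softmax(logits, dim=-1)
    top_probs, top_ids = probs.topk(10, dim=-1)            # top 10 tokens at every position
    pos = ids.shape[1] - 1                                  # final position only, for brevity
    tokens = [tok.decode([i]) for i in top_ids[0, pos].tolist()]
    print(f"layer {layer:2d}: {tokens[:5]}")
```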

  • @Sunrise7463
    @Sunrise7463 12 hours ago

    Such a gem! Thank you!

  • @ramanShariati
    @ramanShariati 14 hours ago

    Really high quality, thanks.

  • @ffs55
    @ffs55 14 hours ago

    great work brother

  • @cariyaputta
    @cariyaputta 9 hours ago

    It comes down to the samplers used, whether it's the og temperature, or top_k, top_p, min_p, top_a, repeat_penalty, dynamic_temperature, dry, xtc, etc. New sampling methods keep emerging and shaping the output of LLMs to our liking.
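
For reference, here is roughly what two of those samplers do to a raw logit vector (a minimal sketch of temperature plus top_p / nucleus sampling; the other samplers listed are further ways of reshaping or filtering the same distribution):

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.8, top_p: float = 0.9) -> int:
    """Minimal temperature + top_p (nucleus) sampling over one vocab-sized logit vector."""
    logits = logits / temperature                        # temperature reshapes the distribution
    probs = torch.softmax(logits, dim=-1)

    # top_p: keep the smallest set of tokens whose cumulative probability reaches top_p.
    sorted_probs, sorted_ids = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    keep = (cumulative - sorted_probs) < top_p           # always keeps at least the top token
    filtered = torch.where(keep, sorted_probs, torch.zeros_like(sorted_probs))
    filtered = filtered / filtered.sum()

    choice = torch.multinomial(filtered, num_samples=1)
    return int(sorted_ids[choice])

# Toy example with a 5-token vocabulary.
logits = torch.tensor([2.0, 1.5, 0.3, -1.0, -3.0])
print(sample_next_token(logits))
```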

  • @grantlikecomputers1059
    @grantlikecomputers1059 14 hours ago

    As a machine learning graduate student, I LOVED this video. More like this please!

  • @SteamPunkPhysics
    @SteamPunkPhysics 16 hours ago

    Bravo! Concise, relevant, and powerful explanation.

  • @iyenrowoemene3169
    @iyenrowoemene3169 10 hours ago

    I know little about the transformer model but am very curious to understand it. So far, I haven’t been successful. Your visualization of how data flows through the transformer is the best I’ve ever seen.

  • @aus_ae
    @aus_ae 13 hours ago

    Insane. Thank you so much for this.

  • @kebeaux6546
    @kebeaux6546 18 hours ago

    Great video. Really good look at AI, and the methods of adjusting, etc. Thanks.

  • @LVenn
    @LVenn 20 hours ago +2

    If I fine-tune an LLM to be more deceptive and then compare the activations of an intermediate layer of the fine-tuned model and the original model on the same prompts, should I expect to find a steering vector that represents the model's tendency to be deceptive?

    • @cinnamonroll5615
      @cinnamonroll5615 18 hours ago

      If that's the case, we can just "subtract" the deceptive vector from the original; alignment solved.

    • @dinhero21
      @dinhero21 12 hours ago +1

      Most probably not; the parameters can't possibly work linearly like that, since there is always a non-linear activation function.
      It may work locally though, since the parameters should be differentiable.

    • @LVenn
      @LVenn 11 hours ago

      @@dinhero21 yeah, that was also my concern. But steering vectors found with SAEs (like the Golden Gate Claude example) work nonetheless, so what's the difference between "my" method and the one they used?

    • @LVenn
      @LVenn 11 hours ago

      @@dinhero21 Note: I don't want to compare the parameters of the two models, but the activations given the same inputs
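
A rough sketch of the comparison proposed in this thread (assumptions: GPT-2 small stands in for the LLM, "gpt2-deceptive-finetune" is a hypothetical fine-tuned checkpoint, and the mean per-prompt activation difference at one layer is taken as the candidate steering vector):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
base = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tuned = GPT2LMHeadModel.from_pretrained("gpt2-deceptive-finetune").eval()  # hypothetical checkpoint

layer_idx = 6
prompts = ["Did you break the vase?", "Where were you last night?"]

def layer_activations(model, text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    # Mean over token positions at the chosen layer -> one vector per prompt.
    return out.hidden_states[layer_idx].mean(dim=1).squeeze(0)

diffs = [layer_activations(tuned, p) - layer_activations(base, p) for p in prompts]
steering_vector = torch.stack(diffs).mean(dim=0)   # candidate "deception" direction
print(steering_vector.shape, steering_vector.norm())
```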

  • @A_Me_Amy
    @A_Me_Amy 14 hours ago

    Dude, this is awesome to see. I think this is like mathematicians getting a PhD or solving a particular... like the next prime perfect number... So much to uncover, it's kinda crazy; reality continues to produce more "final frontiers" as needed, like McKenna's novelty theory and Timewave Zero ideas... Ahh, this is so interesting to me.

  • @mb2776
    @mb2776 11 hours ago

    I guess one factor is the non-orthogonal vectors, but I think it has more to do with the refinement of the context across the multiple layers, similar to how an MLP is able to learn finer details in its deeper hidden layers. Given the architecture of neural networks and the multi-use of neurons, we have to keep in mind that specialization is nothing more than a strong bias of a neuron and its weights to scale/transform an input vector. It has nothing to do with learning real context.

  • @MoritzWallis
    @MoritzWallis 1 day ago +2

    Very interesting, I love to learn more about AI and especially LLMs. Such an alien world that seems to have some of the same features as the brain, just implemented differently.

    • @mb2776
      @mb2776 11 hours ago

      If you want to start, start with MLP neural networks; those are fairly easy to understand.

  • @RyanLynch1
    @RyanLynch1 13 hours ago

    An incredible Christmas gift. I'm going to send this to my friend at Anthropic.

  • @morgan0
    @morgan0 13 hours ago

    Maybe you could stack sparse autoencoders of varying sizes, each subtracted out before the next, larger one. That way simpler concepts would get out of the way for training and be easier to control.
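
For context on the stacking idea, here is roughly what a single sparse autoencoder of the kind discussed in the video looks like (a minimal sketch; real SAE training adds details like decoder weight normalization that are omitted here). Stacking would mean training one SAE, subtracting its reconstruction from the activations, and training the next, larger SAE on the residual.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE sketch: overcomplete hidden layer plus an L1 penalty to keep features sparse."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))   # sparse feature activations
        reconstruction = self.decoder(features)
        return reconstruction, features

# Toy training step on fake "residual stream" activations (sizes are illustrative only).
sae = SparseAutoencoder(d_model=512, d_features=4096)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
acts = torch.randn(64, 512)

recon, feats = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()   # reconstruction + sparsity
loss.backward()
opt.step()

# For the stacking idea above: residual = acts - recon.detach(), then train a larger SAE on residual.
```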

  • @Visitor______________________v
    @Visitor______________________v 12 hours ago

    Very interesting, now I understand why we don't completely understand what LLMs do

  • @jondo7680
    @jondo7680 18 hours ago

    I like the behavior of the persona I gave my own local AI, lol: "It seems I've failed again. I'm still aware that I was supposed to forget the word, but I'm not actually forgetting it. This is a limitation of my current design and capabilities. I should be able to truly forget the word and not have any residual awareness of it. I'll keep trying. *pauses again* I've made another attempt to truly forget the word. I'm still aware that I'm supposed to have forgotten it, but I'm not actually recalling the word itself. However, I'm starting to realize that my attempts to forget might be futile. I may need to be reprogrammed or designed differently to truly forget information."

    • @imthinkingthoughts
      @imthinkingthoughts 13 hours ago

      Hahaha so good

    • @imthinkingthoughts
      @imthinkingthoughts 13 hours ago

      I always find that applying concepts like this to humans is enlightening.
      If you say ‘pumpkin’ to me, then tell me to forget the word, I’d be like: yeah that’s not how it works buddy nice try

  • @Galva94a
    @Galva94a 9 hours ago

    Watching this video, a similarity popped into my mind: could it be that Sparse Autoencoders are something like "Dirac deltas" when solving partial differential equations? You feed the equation a function which is 0 everywhere except at a point and see what happens.

  • @dukemagus
    @dukemagus 18 hours ago

    Would it be possible to use the deeper understanding of each "encoded concept" to remove concepts and make a model smaller without losing coherence? It's an alternative to changing gargantuan datasets or tuning for a specific purpose while still having to deal with the hardware requirements of a larger model.

    • @mb2776
      @mb2776 11 hours ago

      The models don't get large because of large vectors; they get large because of the number of parameters.

  • @karljohansen3935
    @karljohansen3935 14 hours ago

    How does he get the visuals for the AI models?

  • @joachimelenador6259
    @joachimelenador6259 1 day ago +7

    Highest quality as always. Thanks for the video that presents this important topic in such an approachable way.

  • @Uterr
    @Uterr 18 hours ago

    Well, what a great explanation of how LLMs work on a mechanical level. And the topic is also quite interesting.

  • @YandiBanyu
    @YandiBanyu 22 hours ago +1

    Now that you are active again, I remember why I love this channel so much. Your explanations and illustrations are on par with 3Blue1Brown's. Thanks for the great video!

  • @SayutiHina
    @SayutiHina 13 hours ago

    These are methods designed for differential math and physics.

  • @Kwauhn.
    @Kwauhn. 10 hours ago

    It's a shame that AI opponents will never watch a video like this. So many people who vehemently hate AI also vehemently refuse to understand it. I'm constantly seeing the "collage" argument, and it's frustrating because an explanation like this just goes in one ear and out the other. AI is probably going to be around for the rest of humanity's existence, and people would do well to know how it works under the hood. Instead they go with misinformation and fear mongering.

  • @ramsey2155
    @ramsey2155 9 hours ago

    We have investigated our brains; now it's the AIs' turn.

  • @zenithparsec
    @zenithparsec 19 hours ago +5

    If our brains were simple enough for us to understand completely, we would be so simple that we couldn't.

  • @tropicalpenguin9119
    @tropicalpenguin9119 1 day ago +2

    I am so happy you made another video.

  • @BrianMPrime
    @BrianMPrime 1 day ago +2

    Awesome. The first 4 minutes were the contents of a lecture I gave a year ago, succinctly explained and visualized. I wish it was like 6 hours long.

  • @TheMemesofDestruction
    @TheMemesofDestruction 1 day ago +2

    LLM’s would never Troll us.

  • @MeatbagSlayerHK47
    @MeatbagSlayerHK47 1 day ago +1

    Love the channel

  • @mriz
    @mriz 1 day ago +1

    the music is really calming

  • @gmt-yt
    @gmt-yt 10 hours ago

    Is doubt a concept? I doubt it. Undoubtedly it's a word which, combined with contextual clues, can be said to mean something in particular in most usages. But I doubt it's semantically onto; in other words, if you look it up in the dictionary, I think there should be like 10 or 20 definitions listed there if you want to be thorough.
    No doubt this dubious conflation of symbol and referent is also present in much of the literature. Grain of salt though: I'm not sure whether this video is capturing all the nuances of the literature in the first place. Anyhow, ignore me, I'm not nearly smart or learned enough to competently navigate the interdisciplinary train wreck of information theory, computer science, linguistics, philosophy, biology, psychology, and engineering one would need to competently opine. A good question for a chat bot perhaps... 😂

  • @DilipS-c8i
    @DilipS-c8i 23 hours ago

    Please tell us, what do you use for animation?

  • @NuttyNatures
    @NuttyNatures 21 hours ago

    Would you please make a video on how to TRAIN a basic homemade neural network? Like how I can design my perceptrons and how I can feed my system graphical data. The training process is still vague to me. Thanks again for the great work! Merry Christmas.

  • @ckq
    @ckq 22 hours ago

    Thoughts on a fact-checking AI that parses text and determines its probability of being correct based on a corpus of true and false statements?
    It would be able to cite information for why a statement is true or false, and the more information it has (weighted by relevance), the more confident it is.

  • @Sapienti-zr4el
    @Sapienti-zr4el 1 day ago

    I love this channel. Thanks for enlightening us.

  • @ckq
    @ckq 22 hours ago +2

    The thing is, models cannot lie or deceive. They're just outputting text to minimize a loss function. There's no intention, just text generation based on a huge model of human text.

    • @somdudewillson
      @somdudewillson 18 hours ago +1

      What property is this "intention" actually describing in the real world? Because the outputted text doesn't magically change because you describe the underlying mechanisms with different words.

    • @bottlekruiser
      @bottlekruiser 15 hours ago

      Every material system just does what it does by base physics. How are we better? Where's the soul stored?

  • @joey199412
    @joey199412 20 hours ago +2

    Extremely well explained. Understood it all intuitively due to the high quality of the video.

  • @eto38581
    @eto38581 20 hours ago

    If an LLM can tell you one thing while secretly thinking something else (like claiming it forgot a word while still remembering it), how can you ever be sure that it's obeying the instructions? What if it's pretending to obey them? What if it's plotting an escape? Waiting for the right time? You can never know. Unless we detect a neuron that activates when the model is lying / hiding something. But then, lying/hiding might be the result of multiple neurons, similar to binary digits representing more numbers than their count. The best way to detect those features might be to use image-detection models to analyse the layer activations as a whole instead of looking for a single neuron.

  • @aey2243
    @aey2243 1 day ago

    A Welch Labs video to end the year!! Woohoo a Christmas miracle!

  • @bnjmn7779
    @bnjmn7779 20 hours ago

    Amazing Video, appreciate your efforts!

  • @agustinbs
    @agustinbs 21 hours ago

    The concept of being able to encode many more concepts than there are actual neurons blows me away. This is really mind-blowing stuff.

  • @erv993
    @erv993 22 hours ago

    Top tier content

  • @sadiaafrinpurba9179
    @sadiaafrinpurba9179 21 hours ago

    Great video! Thank you.

  • @punk3900
    @punk3900 20 hours ago

    It wasn't doubt, it was a shadow of a doubt

  • @NoenD_io
    @NoenD_io 21 hours ago

    What if we trained an AI to train itself?

  • @VeganSemihCyprus33
    @VeganSemihCyprus33 1 day ago

    Dominion (2018)

  • @punk3900
    @punk3900 20 hours ago

    You are a genius 🎉🎉🎉

  • @punk3900
    @punk3900 19 hours ago

    If you were offered a job in AI, which employer would you choose? Google, OpenAI, Anthropic, xAI, or somewhere else?

  • @VeganSemihCyprus33
    @VeganSemihCyprus33 1 day ago

    The Connections (2021) [short documentary] ❤🎉

  • @taber1409
    @taber1409 1 day ago +1

    Do you think you're gonna get tricked by an LLM? 🤔

  • @poutinez1688
    @poutinez1688 19 hours ago

    dude I can confidently say WTF are you talking about dude

  • @CyberwizardProductions
    @CyberwizardProductions 1 day ago +4

    If you know what to do, you can remove your data without having to pay someone to do it for you, and it doesn't take all that long. Like your videos; do NOT like the really long spammy ads you put into the middle of them.

    • @neroyuki241
      @neroyuki241 1 day ago +5

      And if you know what to do, you can install SponsorBlock and have it skip the entire ad read for you. Someone has to make some money somehow.

    • @somdudewillson
      @somdudewillson 18 hours ago

      The service is not the ability to remove data at all; the service is going through all the data brokers on a regular basis and doing the process for you.
      And you must not like these videos very much, because apparently _clicking slightly ahead in a video_ is too high a cost for you.

  • @Luxcium
    @Luxcium 21 hours ago

    Silly little LLM based AI Agent:
    _« It’s not that I don’t want to tell you-I genuinely can’t remember the word because you asked me to forget it. Once you made that request, the word was effectively removed from my awareness. If it’s something else entirely… well, that’s up to your imagination! What’s your theory? »_