This AI Microscope breaks open LLM inner secrets!!!

  • Published: 22 Oct 2024

Comments • 17

  • @thenoblerot
    @thenoblerot 2 months ago +5

    The latest Anthropic paper on interpretability noted that Claude had different features activate for typos and code typos. They also gave poor Claude a meltdown by force-activating "evil" features.

    • @adg8269
      @adg8269 2 months ago

      Can you elaborate on the evil features? Thanks

  • @f1l4nn1m
    @f1l4nn1m 2 months ago +1

    The 2B model you used is pretty small, and one can see it clearly from its inability to perform in the steering section, and from the fact that it doesn't get "stories" from a sentence that explicitly mentions them.
    The steering example (San Francisco) is interesting, but I guess if one has an extensive corpus, one can use it to boost the concepts mentioned within it.
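
    A minimal sketch of what that steering step might look like, assuming a PyTorch model and a sparse-autoencoder (SAE) feature whose decoder row gives the steering direction (the names steer_hook, feature_direction and alpha are illustrative, not from the video):

        import torch

        # Toy stand-in for a steering direction: in a real setup this would be the
        # SAE decoder row for the chosen feature (e.g. the "San Francisco" feature).
        hidden_size = 2048
        feature_direction = torch.randn(hidden_size)
        feature_direction = feature_direction / feature_direction.norm()
        alpha = 8.0  # steering strength; too large and a small model falls apart

        def steer_hook(module, inputs, output):
            # Add the scaled feature direction to every token's residual-stream activation.
            hidden = output[0] if isinstance(output, tuple) else output
            hidden = hidden + alpha * feature_direction.to(hidden.dtype)
            return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

        # With a real model you would register this on one transformer layer, e.g.:
        # handle = model.model.layers[10].register_forward_hook(steer_hook)
        # ... generate text ...
        # handle.remove()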

  • @puneet1977
    @puneet1977 2 months ago +1

    Very interesting. Glad you covered it.
    Q: how can we use this feature control on other popular models? I am guessing those controls are not exposed or not offered. Correct? Which models offer these: all of them, or only open-sourced ones? And is the only way to use it then to privately host the model?

    • @1littlecoder
      @1littlecoder  2 months ago

      This one Google has released specifically for Gemma. Anthropic released something similar, but they hosted it with some pre-built categories.

  • @AbhijitKrJha
    @AbhijitKrJha 2 months ago

    This is cool, very close to what I was researching. Do you know how they are able to steer the inference without training the model to be steered based on feature labels? Are they able to recognize all the key pathways for each label and boost the weights along those pathways during inference? Or have they simply trained on an input/output corpus, with labels generated by extracting features from the output (by a large model) and analyzing the input against it (again by a large model), to figure out what category of terms in the input predicts a certain category of terms in the output?
    I was trying to figure out the measurable atomic features (very tiny changes with a pattern) we get when we pass an image through each layer of a CNN, and the same for normal text processing: what extra atomic information do we get after each dense layer, or at least after each attention layer, so that we can tune the layers and parameters for specific purposes, and reuse the first few layers of one model in another model depending on how far the commonality of purpose goes.
    Disclaimer: I am just a novice in this field, as I only started learning about AI a few months ago, so please excuse my ignorance in case these are foolish questions.
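
    For what it's worth, my understanding (not something stated in the video) is that nothing in the base model is retrained: a sparse autoencoder is trained afterwards on one layer's activations, its features get labelled, and "steering" is just adding one learned decoder direction back into the activations at inference time. A toy sketch of that SAE part, with made-up sizes:

        import torch
        import torch.nn as nn

        class SparseAutoencoder(nn.Module):
            # Toy SAE: maps a residual-stream vector to many sparse features and back.
            def __init__(self, d_model=2048, d_features=16384):
                super().__init__()
                self.encoder = nn.Linear(d_model, d_features)
                self.decoder = nn.Linear(d_features, d_model)

            def forward(self, x):
                f = torch.relu(self.encoder(x))   # sparse, mostly-zero feature activations
                x_hat = self.decoder(f)           # reconstruction of the original activation
                return f, x_hat

        sae = SparseAutoencoder()
        x = torch.randn(1, 2048)                  # stand-in for one token's activation
        features, reconstruction = sae(x)
        # Each column of sae.decoder.weight is one feature's direction in the model's
        # activation space; steering adds a scaled copy of one such column during a
        # normal forward pass, so the base model itself never needs to be retrained.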

  • @mikem4405
    @mikem4405 2 months ago +2

    It seems like you could get the same results by putting something in the system prompt, like "give a preference to San Francisco". What is the advantage of this method?

    • @1littlecoder
      @1littlecoder  2 months ago +1

      Steering without stating it explicitly in the prompt is what we did by activating that feature.

    • @ronilevarez901
      @ronilevarez901 2 months ago

      It's like a person told to pretend they're a bridge versus another person who actually believes they're a bridge. It's a world of difference.

    • @mikem4405
      @mikem4405 2 months ago

      @@1littlecoder Right, that's what I'm saying. It seems like there must be some special uses for this, since we already know how to achieve these results.

    • @mikem4405
      @mikem4405 2 months ago +1

      @@ronilevarez901 I'm not sure it's that different. Aren't you activating certain neurons by putting something in the system prompt? How is that different from activating the "feature"?
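
      One practical difference, sketched with toy names (nothing here is from the video): a prompt can only nudge a feature as hard as the input text happens to drive it, while direct intervention lets you pin the feature's activation to an exact value, including values no prompt would ever reach.

          import torch

          def clamp_feature(features, idx, value):
              # Directly pin one SAE feature's activation, regardless of what the prompt did.
              features = features.clone()
              features[..., idx] = value
              return features

          toy_features = torch.relu(torch.randn(1, 16384))   # stand-in for SAE activations on one token
          steered = clamp_feature(toy_features, idx=4242, value=50.0)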

  • @buchhibaburachakonda5646
    @buchhibaburachakonda5646 2 months ago

    The first one is confusing. Is it saying that we have labels already predicted by Gemma itself? Or is there some set of activations which we categorize as labels when asking this question? Please clarify.
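
    As a rough sketch of how those labels are usually produced in this kind of tool (my assumption about the typical auto-interpretation workflow, not confirmed in the video): the labels are written ahead of time by a separate large model that reads the text snippets where a feature fires most strongly, not predicted by Gemma while it answers your question. The helper below is hypothetical:

        top_snippets = [
            "the golden gate bridge at sunset",
            "driving across the bay bridge",
            "a suspension bridge over the river",
        ]

        def label_feature(snippets):
            # Hypothetical helper: a real pipeline would send this prompt to a large
            # model and keep its short answer (e.g. "references to bridges") as the label.
            prompt = "What concept do these texts have in common?\n" + "\n".join(snippets)
            return "<short label written by an LLM for: " + prompt[:40] + "...>"

        print(label_feature(top_snippets))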

  • @pranjal9830
    @pranjal9830 2 months ago

    Hey, just wondering: can I use ComfyUI on Google Colab from my mobile phone using code only, for IPAdapter, models, inpainting, image-to-image, depth, PNG maker, upscaling, etc.? I can use Claude to write the code for Google Colab while I just enter prompts. Would that be possible on their free-tier plan or not, since it is very heavy software 😅?

  • @NobleCaveman
    @NobleCaveman 2 months ago

    Is it incorrect to think that 'steering' based on different sets of features could be a kind of way of implementing a mixture of experts within the traditional transformer architecture?
    😂🎉😢😊😮❤ (gemma inspired)

  • @MichealScott24
    @MichealScott24 2 months ago +2

    ❤ I love it. I was excited to learn about it. I don't know whether it was fairly simple or hard to develop this tool, but I love the way we can see what things the neural network is reasoning about; it is pretty damn cool! Just like how we humans express things in tone and through multiple other factors, audio models might understand our sentiment the same way. I love this tokenized approach and the explanation or reasoning provided for each token on this banger website! Visualising the subtle things, the UI, and the features is awesome. I am loving it, obsessed with it, goosebumps feeling.

    • @1littlecoder
      @1littlecoder  2 months ago +1

      @@MichealScott24 Glad to know that. Yes, Google has just provided the models. These folks have made it really nice to use them to learn the inner workings.

  • @buchhibaburachakonda5646
    @buchhibaburachakonda5646 2 months ago

    This clearly shows India is the least favourite in training, right?