Popular Mechanistic Interpretability: Goodfire Lights the Way to AI Safety

  • Published: 11 Sep 2024

Comments • 6

  • @ShaneGraffiti
    25 days ago +4

    YOU HAVE THE BEST PODCAST ON YouTube. Edge of my seat over here with every episode. Thanks for making these!

  • @xt-89907
    25 days ago +5

    I tried reproducing this research from scratch, and the surprising thing was how expensive it would be from a compute perspective.
    I imagine that in the future we'll have pretrained sparse autoencoders shipped with pretrained models. The end user would then fine-tune both of them for their downstream application. That's the thing that's needed, and maybe their approach… (a rough sketch of this workflow appears after the comments)

  • @xinehat
    23 days ago

    What an absolutely fascinating and exciting subject. I hope the field gets all the money thrown at it!

  • @Ken00001010
    25 days ago +4

    This is a terrific subject, and I hope things from mechanistic interpretability feed back into neuroscience as better brain scanning comes online. Extraction of information structure is the root of compression. What abstraction do we get when we train a model on the weights from another pre-trained model? Would it be able to predict the changes to those weights as that pre-trained model is then given more training? ("Model, model thyself.")

  • @mohammadzare2204
    25 days ago +2

    👏

  • @GNARGNARHEAD
    25 days ago +1

    If you've got 100 million in a model and you can't figure out what's going awry, who you gonna call? Goodfire! 😂 That's interesting, though; I wasn't aware of sparse autoencoders
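
The pretrained-sparse-autoencoder workflow @xt-89907 describes (train a sparse autoencoder on a model's activations, ship it alongside the model, then fine-tune it for a downstream application) can be sketched roughly as below. This is a minimal illustrative sketch in PyTorch, not Goodfire's implementation; the class name, hyperparameters, and the random placeholder activations are all assumptions.

```python
# Minimal sparse autoencoder sketch (illustrative; not Goodfire's actual code).
# A real setup would collect `activations` by hooking a layer of a pretrained
# language model; here a random tensor stands in as a placeholder.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        # ReLU keeps feature activations non-negative; the L1 penalty below
        # pushes most of them toward zero, i.e. toward a sparse code.
        features = torch.relu(self.encoder(x))
        reconstruction = self.decoder(features)
        return reconstruction, features

def train_step(sae, activations, optimizer, l1_coeff=1e-3):
    # Reconstruction loss plus an L1 sparsity penalty on the features.
    reconstruction, features = sae(activations)
    loss = ((reconstruction - activations) ** 2).mean() + l1_coeff * features.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage; d_hidden is typically several times d_model so that
# individual hidden units can specialize into interpretable features.
sae = SparseAutoencoder(d_model=768, d_hidden=768 * 8)
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-4)
activations = torch.randn(1024, 768)  # placeholder batch of layer activations
print(train_step(sae, activations, optimizer))
```

Fine-tuning a shipped autoencoder for a downstream application, as the comment suggests, would amount to continuing the same training step on activations collected from the user's own data, which is far cheaper than training one from scratch.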