Neel Nanda on What is Going on Inside Neural Networks

  • Published: 11 Sep 2024

Comments • 5

  • @snarkyboojum • A year ago • +2

    TL;DR -> This video discusses mechanistic interpretability, a set of methods and techniques for reverse engineering AI models to understand their internal thought processes. It also discusses the concept of instrumental convergence: an AI system having goals, understanding its context, and being competent enough to use that understanding to deceive. Mechanistic interpretability could be the right tool for understanding a model's goals and the algorithms it follows, and could be used to distinguish deceptive AI systems from honest ones.

  • @antigonemerlin • A year ago • +2

    The example with Fourier transforms used to solve addition is really interesting (a rough sketch of that idea is below). Our current AIs have truly alien ways of thinking, and perhaps by studying them, we can also learn about our own blind spots.
    What's simple for us isn't necessarily objectively simple. For example, graphs and visualizations are a form of data analysis that relies on our innate visual abilities to make connections, but frankly it would be simpler to ingest the data and spit out the analysis directly, save for the fact that evolution did not bless us with a data analysis organ.
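
    As a rough illustration (assuming the example refers to Neel Nanda's grokking work, where a small transformer learns modular addition by composing cosine and sine waves), here is a minimal numpy sketch of that style of algorithm. The modulus, frequencies, and function name are made up for the example, not taken from the talk:

    import numpy as np

    # Toy version of a Fourier-style addition algorithm: represent a and b as points
    # on circles at a few frequencies, combine them with the angle-addition identities,
    # and read the answer off by constructive interference across frequencies.
    p = 113                    # modulus; an assumed example value
    freqs = [3, 17, 41]        # a few arbitrary "key frequencies" for illustration

    def mod_add_fourier(a: int, b: int) -> int:
        scores = np.zeros(p)
        c = np.arange(p)
        for k in freqs:
            w = 2 * np.pi * k / p
            # cos(w*(a+b)) and sin(w*(a+b)) via the trig angle-addition identities
            cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
            sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
            # score each candidate c by cos(w*(a+b-c)); it equals 1 exactly when c = (a+b) mod p
            scores += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
        return int(np.argmax(scores))

    assert mod_add_fourier(95, 40) == (95 + 40) % p  # 22

    In the actual model this is implemented with learned embeddings, neurons, and an unembedding matrix rather than explicit trig calls, but the interference-based readout is the same basic idea.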

  • @spasibushki • A year ago • +3

    Was it mentioned in the podcast how he gets funding for his independent research?

    • @artemisgaming7625 • A year ago • +2

      The Effective Altruism Long-Term Future Fund. He wrote about it briefly in his LessWrong posts.

  • @chrismathwin6971 • A year ago • +1

    Really interesting talk; I’m looking forward to the next part!