Open Problems in Mechanistic Interpretability: A Whirlwind Tour | Neel Nanda | EAGxVirtual 2023

  • Published: 22 Aug 2024
  • Mechanistic Interpretability is a subfield of AI Alignment that studies trained neural networks and tries to reverse-engineer the algorithms they have learned. In this talk, Neel Nanda gives an overview of the field, key works, and some of its open problems.
    Learn more about effective altruism at: www.effectivealtruism.org
    Find out more about EA Global conferences at: www.eaglobal.org
