TL;DR -> This video discusses mechanistic interpretability, a set of methods for reverse engineering AI models to understand their internal computations. It also discusses instrumental convergence: the idea that an AI system could have goals, understand its context, and be competent enough to use that understanding to deceive. Mechanistic interpretability could be the right tool for uncovering a model's goals and the algorithms it follows, and could help distinguish deceptive AI systems from honest ones.
The example with Fourier transforms used to solve addition is really interesting. Our current AIs have truly alien ways of thinking, and perhaps by studying them we can also learn about our own blind spots. What's simple for us isn't necessarily objectively simple. For example, graphs and visualizations are a form of data analysis that leans on our innate visual abilities to make connections; frankly, it would be simpler to ingest the data and spit out the analysis directly, save for the fact that evolution did not bless us with a data-analysis organ.
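For anyone curious what the Fourier trick looks like concretely: the grokking work found a network computing (a + b) mod p by representing numbers as rotations on a circle, so that angle-addition identities turn addition into multiplication of trig features. Here's a minimal sketch of that "clock" algorithm in plain NumPy — this is an illustration of the idea, not the network's actual weights, and the modulus/frequency are just assumptions for the example:

```python
import numpy as np

p = 113              # modulus for the toy task (assumed for this sketch)
w = 2 * np.pi / p    # base frequency: one full rotation = p steps

def add_mod_p(a, b):
    """Compute (a + b) mod p using only trig features of a and b."""
    # Angle-addition identities recover sin/cos of w*(a+b) from the
    # individual features, with no explicit addition of a and b:
    sin_sum = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
    cos_sum = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
    # Read the resulting angle back off the circle and convert to an integer.
    angle = np.arctan2(sin_sum, cos_sum)
    return round(angle / w) % p

print(add_mod_p(100, 50))  # → 37, since (100 + 50) % 113 == 37
```

The alien-ness is exactly this: instead of carrying digits, the model composes rotations, which is a perfectly valid (and for a neural net, apparently natural) way to do modular arithmetic.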
Was it mentioned in a podcast how he gets funding for his independent research?
The Effective Altruism Long-Term Future Fund. He wrote about it briefly in his LessWrong posts.
Really interesting talk, I’m looking forward to the next part!