EA Global Bay Area: 2024 | Sleeper Agents | Evan Hubinger

Centre for Effective Altruism

391 views

Published: 11 Sep 2024
If an AI system learned a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? That's the question that Evan and his coauthors at Anthropic sought to answer in their work on "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training", which Evan will be discussing.
Evan Hubinger leads the new Alignment Stress-Testing team at Anthropic, which is tasked with red-teaming Anthropic's internal alignment techniques and evaluations. Prior to joining Anthropic, Evan was a Research Fellow at the Machine Intelligence Research Institute and worked on a variety of theoretical alignment research, including "Risks from Learned Optimization in Advanced Machine Learning Systems". Evan will be talking about the Anthropic Alignment Stress-Testing team's first paper, "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training".
Find out more about EA Global conferences at: www.eaglobal.org
Learn more about effective altruism at: www.effectivealtruism.org
