Anthropic Solved Interpretability Again? (Walkthrough)

The Inside View

2K views

Published: Oct 10, 2024
Another Anthropic paper, another walkthrough.
Paper: transformer-ci...

Comments • 9

@drhxa 4 months ago ⁺⁴
That was an excellent walkthrough, thank you. I've learned a lot. Would love to see more walkthroughs of the prior/related work
@TheInsideView 4 months ago ⁺²
Thanks! My walkthrough of the previous Anthropic paper (prior work): ruclips.net/video/HAxd8DoZaW4/видео.html
For other interpretability papers I'd recommend checking out Neel Nanda's series of walkthroughs (he's actually leading a mechanistic interpretability team at DeepMind): ruclips.net/p/PL7m7hLIqA0hpsJYYhlt1WbHHgdfRLM2eY&si=tLqxLua5XZEdbyCy
@christopherwoodall3464 4 months ago ⁺⁴
Great overview. Really enjoyed the fact that you showed previous work that was built upon.
@TheInsideView 4 months ago
Thanks! To be honest I only briefly mentioned their previous work and don't think I actually went through previous work in the literature (was just doing a walkthrough of their blogpost, still doing daily uploads), but I'll definitely consider this preference to discuss previous work for future videos
@Alice_Fumo 4 months ago ⁺¹
This seems actually useful and has real-world applications.
It seems this allows for actually adjusting the personality of the model, so one could make it more averse to writing code with bugs, more flirty, more honest or whatever. The big AI labs could adjust small details without needing to retrain the AI.
Also, I guess this could be done with open source models to figure out their "deny response" features and set them to very low values. It can be done with retraining, but that also just changes the model. Not needing such brute-force-y methods is neat.
@TheInsideView 4 months ago
Yeah exactly, that enables you to steer them in the way that you'd prefer. If you haven't tried it yet I'd recommend checking out Golden Gate Claude (which I talk about in the video), available on claude.ai for a limited time, which basically gives a concrete example of what having a custom steered LLM would be like.
@Alice_Fumo 4 months ago
@TheInsideView I asked it to go one prompt without mentioning the bridge and tell me a bedtime story and it got extremely internally conflicted, retrying several times and wondering why it had such difficulty with this.
It's extremely interesting to witness. Thanks for notifying me that they were hosting that model, I didn't know.
@ThomasMeliWellness 4 months ago ⁺¹
Crystal clear. Thank you for sharing this. Subscribed!
@TheInsideView 4 months ago
Thanks! Tomorrow's video will be another walkthrough so hopefully worth the sub
