Ongoing Notes:
1. I should note that the concept of which words pay attention to which others doesn't always line up with our human expectations. In this video, I frequently claim that "meal" should attend to "savory" and "delicious" but if you look at the attention weights matrix at 9:25, "meal" attends the most to "savory" but not so much to "delicious". In reality, the model is going to do what it needs to do to excel at next word prediction, which might mean taking a different approach to setting the attention layer weights than what our human brains would "neatly expect". Still, the illustration of "meal" attending to "savory" and "delicious" is usually correct, but I wanted to clarify that it's not guaranteed and that's not a bad thing.
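(If you want to poke at this yourself, here is a minimal sketch of scaled dot-product self-attention on toy vectors. The sentence, embedding size, and random projection weights are illustrative assumptions, so the numbers will not match the matrix at 9:25; the point is only how the row of weights for "meal" falls out of the Q/K/V computation.)

```python
# Minimal scaled dot-product self-attention on toy embeddings (NumPy).
# The sentence, embedding size, and random weights are illustrative only,
# so the learned pattern shown at 9:25 will not be reproduced here.
import numpy as np

rng = np.random.default_rng(0)
tokens = ["a", "savory", "delicious", "meal"]
d = 8                                   # toy embedding / head dimension
X = rng.normal(size=(len(tokens), d))   # pretend these are learned embeddings

W_q = rng.normal(size=(d, d))           # query projection
W_k = rng.normal(size=(d, d))           # key projection
W_v = rng.normal(size=(d, d))           # value projection

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d)           # similarity of each query with each key

# Softmax over each row: row i = how much token i attends to every token.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

out = weights @ V                       # each output is a weighted sum of values

row = tokens.index("meal")
for tok, w in zip(tokens, weights[row]):
    print(f"'meal' -> '{tok}': {w:.2f}")
```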
Best video I have ever seen for explaining the attention mechanism; attention is finally clear to me ❤
This channel is gold!!!
Perfect timing, learning about this in class right now!
You got this!
Question: LLMs obviously 1) account for hierarchies of concepts/abstractions, and 2) perform complicated, decision-tree-like logical operations on those concepts (and words). Having read about attention and watched a dozen videos on it, I have never encountered an explanation of how attention can do these things. My guess is that the stacking of attention layers is instrumental in all of this, but I have seen no discussion or explanation of it.
I'm not sure if I would say LLMs "obviously" do those two things, but they are certainly emergent behaviors due to increases in compute. Scaling laws are pretty cool!
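Mechanically, "layering" just means each attention block's outputs become the next block's inputs, so later blocks can mix features that earlier blocks already mixed. Whether that wiring is what produces hierarchy- and logic-like behavior is still an open interpretability question, but here is a rough sketch of the wiring itself (sizes, weights, and layer count are made up for illustration):

```python
# Rough sketch of stacking self-attention layers (NumPy, no training).
# Each layer re-mixes the previous layer's outputs, so layer 2 can combine
# features that layer 1 already combined. Sizes and random weights here are
# illustrative assumptions, not taken from the video.
import numpy as np

rng = np.random.default_rng(1)

def self_attention(X, d):
    W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ V

d, n_tokens, n_layers = 8, 5, 2
X = rng.normal(size=(n_tokens, d))      # token embeddings

for layer in range(n_layers):
    X = self_attention(X, d)            # output of layer k feeds layer k+1

print(X.shape)   # still (n_tokens, d): same shape, progressively re-mixed
```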
great job. I've been studying the subject by myself and had missed the visualization of vector sums in the value space. thanks for posting.
Glad it was helpful!
Liked, subscribed, and commented. This is pure gold!
Thanks a ton!
Oof, I really needed this a while ago, finally!
Sorry to be late but I hope it was worth it!
Great explanation, loved it!
Glad you liked it!
Fantastic explanation! For the next videos in this series, please touch upon the role of the residual connection. I'm still iffy on what it's doing.
Great suggestion!
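In the meantime, the mechanical part is small: a residual connection adds a sublayer's output back onto its input, so each block nudges the representation instead of replacing it. A minimal sketch (the layer-norm placement and the stand-in sublayers are simplifying assumptions, not necessarily what the next video will use):

```python
# Minimal sketch of a transformer block with residual connections (NumPy).
# The normalization placement and the sublayer internals are simplified
# assumptions; the key point is the "x + sublayer(x)" pattern.
import numpy as np

def layer_norm(x, eps=1e-5):
    mean = x.mean(-1, keepdims=True)
    std = x.std(-1, keepdims=True)
    return (x - mean) / (std + eps)

def transformer_block(x, attention, feed_forward):
    # Residual 1: attention output is ADDED to the input, not substituted.
    x = x + attention(layer_norm(x))
    # Residual 2: same pattern around the feed-forward sublayer.
    x = x + feed_forward(layer_norm(x))
    return x

# Toy sublayers so the sketch runs end to end.
rng = np.random.default_rng(2)
d = 8
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
attention = lambda x: x @ W1                    # stand-in for a real attention sublayer
feed_forward = lambda x: np.maximum(0, x @ W2)  # stand-in MLP

x = rng.normal(size=(4, d))           # 4 tokens, dimension 8
out = transformer_block(x, attention, feed_forward)
print(out.shape)                      # (4, 8): shape preserved by the residuals
```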
Yessssss, let's talk about those in the next videos. This is a great channel for the way you explain things. I don't know if it's too far ahead, but it would be awesome to see some small code examples too.
Working on it!
Thank you for these videos!!!
Of course!
Can you do a video on how inputs (e.g., words, video, audio) are tokenized into vectors?
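While that video is pending, the rough pipeline for text is: split the string into tokens, map each token to an integer id, then look the id up in a learned embedding table. A toy sketch (the vocabulary, whitespace tokenizer, and random embedding table are made-up simplifications; real models use subword tokenizers such as BPE and learn the table during training):

```python
# Toy text-to-vectors pipeline (NumPy). Real LLMs use subword tokenizers
# (e.g. BPE) and learned embedding tables; the vocabulary, whitespace
# splitting, and random embeddings below are simplifications for illustration.
import numpy as np

vocab = {"a": 0, "savory": 1, "delicious": 2, "meal": 3, "<unk>": 4}
d = 8
rng = np.random.default_rng(3)
embedding_table = rng.normal(size=(len(vocab), d))  # normally learned, not random

def tokenize(text):
    # Crude whitespace tokenizer; real tokenizers split into subword units.
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

ids = tokenize("a savory delicious meal")
vectors = embedding_table[ids]        # one d-dimensional vector per token
print(ids)            # [0, 1, 2, 3]
print(vectors.shape)  # (4, 8)
```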
This is great. Thanks!
No problem!
I've waited for this for months...
sorry for the wait! hope it is worth it 😎
The attention values for a word in the attention matrix are in the rows? What do the columns represent? I always imagined this matrix to be like a covariance matrix, but by the looks of it I couldn't be more wrong.
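If it helps to check numerically: in the usual convention (and, as far as I can tell, the one at 9:25), row i of the attention-weights matrix holds token i's attention over every token, so each row sums to 1 after the softmax, while the columns generally do not and the matrix is not symmetric, unlike a covariance matrix. A quick sketch with random toy scores, just to show the row/column roles:

```python
# Checking the row/column roles of an attention-weights matrix (NumPy).
# Toy random scores; the point is only that rows sum to 1 (each row is one
# token's attention distribution over all tokens) while columns do not,
# and the matrix is not symmetric like a covariance matrix.
import numpy as np

rng = np.random.default_rng(4)
n = 4                                   # number of tokens
scores = rng.normal(size=(n, n))        # stand-in for Q @ K.T / sqrt(d)

weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)

print(weights.sum(axis=1))              # rows: all 1.0, each row is a distribution
print(weights.sum(axis=0))              # columns: generally not 1.0
print(np.allclose(weights, weights.T))  # False: not symmetric
```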
❤ Thanks
Of course!
Amazing video! It would be nice to see how it's actually calculated on a short sentence of a few words.
Great suggestion!
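Here is a tiny, hand-checkable version in that spirit: three words, 2-dimensional vectors, numbers you can verify on paper. To keep the arithmetic small it skips the learned Q/K/V projections and uses the embeddings directly as queries, keys, and values, which is a simplification for illustration rather than how a trained model works:

```python
# Hand-checkable attention on a three-word "sentence" with 2-D vectors (NumPy).
# The embeddings are chosen by hand, and the learned Q/K/V projections are
# skipped on purpose -- an assumption for illustration, not a trained model.
import numpy as np

tokens = ["savory", "delicious", "meal"]
E = np.array([[1.0, 0.0],    # "savory"
              [0.9, 0.1],    # "delicious" (close to "savory")
              [0.2, 1.0]])   # "meal"

scores = E @ E.T / np.sqrt(2)           # dot products, scaled by sqrt(d)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
output = weights @ E                    # weighted sums of the value vectors

print(np.round(scores, 2))
print(np.round(weights, 2))             # row i = how token i spreads its attention
print(np.round(output, 2))
```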