Why Recurrent Neural Networks are cursed | LM2

  • Published: 9 Sep 2024

Comments • 25

  • @vcubingx
    @vcubingx  5 months ago +10

    If you enjoyed the video, please consider subscribing!
    Part 3! ruclips.net/video/lOrTlKrdmkQ/видео.html
    A small mistake I _just_ realized is that I say trigram/3-gram for the neural language model when I have 3 words to input, but it's a 4-gram model, not a 3-gram, since I'm considering 4 words at a time (including the output word). Hopefully that didn't confuse anyone!
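
    For anyone who wants the counting convention spelled out, here is a toy count-based sketch (the corpus and variable names are illustrative only, not from the video): an n-gram model counts windows of n consecutive tokens, so 3 context words plus the predicted word makes a 4-gram.

    ```python
    from collections import defaultdict

    # Toy corpus; everything here is for illustration only.
    corpus = "the cat sat on the mat because the cat was tired".split()

    n = 4  # 3 context words + 1 predicted word = a 4-gram window
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(corpus) - n + 1):
        context = tuple(corpus[i:i + n - 1])  # the 3 preceding words
        nxt = corpus[i + n - 1]               # the word being predicted
        counts[context][nxt] += 1

    ctx = ("the", "cat", "sat")
    total = sum(counts[ctx].values())
    print({w: c / total for w, c in counts[ctx].items()})  # {'on': 1.0}
    ```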

  • @l16h7code
    @l16h7code 5 months ago +10

    Please keep making these machine learning videos. Animations are all we need. They make it 10x easier for me to understand the concepts.

    • @vcubingx
      @vcubingx  5 months ago +3

      Thanks! I’ll try my best to:)

    • @ShashankBhatta
      @ShashankBhatta 4 months ago +1

      Isn't"attention all we need"

    • @aero-mk9ld
      @aero-mk9ld 4 months ago

      @@ShashankBhatta

    • @eellauu
      @eellauu 1 month ago

      @@ShashankBhatta ahahhahaha, nice one

  • @tomtom5821
    @tomtom5821 1 month ago +1

    I had so many 'aha' moments in this video I lost count! I'm convinced that it is possible to learn any concept, if it's broken down into its simplest components.

  • @drdca8263
    @drdca8263 5 months ago +2

    I sometimes wonder how well it would work to take something that was mostly an n-gram model, but which added something that was meant to be like, a poor man’s approximation of the copying heads that have been found in transformers.
    So, like, in addition to looking at "when the previous (n-1) tokens were like this, how often was each possible next token the one that followed?" as in an n-gram model, it would also look at "previously in this document, did the previous token appear, and if so, what followed it?", and "in the training data set, for the previous few tokens, how often did this kind of copying strategy do well, and how often did the plain n-gram strategy do well?", to weight between those.
    (Oh, and also maybe throw in some “what tokens are correlated just considering being in the same document” to the mix.)
    I imagine that this still wouldn’t even come *close* to GPT2 , but I do wonder how much better it could be than plain n-grams.
    I’m pretty sure it would be *very* fast at inference time, and “training” it would consist of just doing a bunch of counting, which would be highly parallelizable (or possibly counting and then taking a low-rank decomposition of a matrix, for the “correlations between what tokens appear in the same document” part)
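
    A rough, self-contained sketch of this hybrid (a toy illustration only: the class name, the fixed mixing weight, and the tiny corpus are hypothetical, and the same-document correlation part is left out):

    ```python
    from collections import defaultdict

    class NGramWithCopying:
        """Toy mix of a plain n-gram model and a crude 'copying' heuristic
        (what followed this token earlier in the same document)."""

        def __init__(self, n=3, copy_weight=0.5):
            self.n = n
            # Fixed mixing weight for simplicity; the comment above suggests
            # estimating it from how well each strategy did on the training data.
            self.copy_weight = copy_weight
            self.counts = defaultdict(lambda: defaultdict(int))

        def train(self, docs):
            for doc in docs:
                for i in range(len(doc) - self.n + 1):
                    ctx = tuple(doc[i:i + self.n - 1])
                    self.counts[ctx][doc[i + self.n - 1]] += 1

        def predict(self, prefix):
            scores = defaultdict(float)
            # n-gram component: counts of what followed the last n-1 tokens.
            ctx = tuple(prefix[-(self.n - 1):])
            total = sum(self.counts[ctx].values())
            for tok, c in self.counts[ctx].items():
                scores[tok] += (1 - self.copy_weight) * c / total
            # Copying component: what followed the previous token earlier
            # in *this* document so far?
            prev = prefix[-1]
            followers = [prefix[i + 1] for i in range(len(prefix) - 1)
                         if prefix[i] == prev]
            for tok in followers:
                scores[tok] += self.copy_weight / len(followers)
            return max(scores, key=scores.get) if scores else None

    model = NGramWithCopying(n=3)
    model.train([["the", "cat", "sat", "on", "the", "mat"]])
    # The n-gram part votes for "mat" (it followed "on the" in training);
    # the copying part votes for "cat" (it followed "the" earlier in the prefix).
    print(model.predict(["the", "cat", "sat", "on", "the"]))
    ```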

    • @vcubingx
      @vcubingx  5 months ago +1

      I think you've gained a key insight, that the approximation does indeed work. I mean heck, if I was only generating two words, a bigram model would be pretty good too.
      I remember seeing a paper showing that GPT-2 itself has learned a bigram model internally. Given this, it might be fair to say that what you're describing could potentially even be what the LLMs today learn under the hood. I think your description is great though, as it's an interpretable way to see how models make predictions. Maybe a future line of research!

  • @ZalexMusic
    @ZalexMusic 4 months ago +2

    Outstanding work, this series is required LM viewing now, like 3b1b. Also, are you from Singapore? That's the only way I can reconcile good weather meaning high temperature and high humidity 😂

    • @David-gn5rp
      @David-gn5rp 3 months ago

      I'm 90% sure the accent is Indian.

  • @calix-tang
    @calix-tang 5 months ago +4

    Incredible job mfv I look forward to seeing more videos

    • @vcubingx
      @vcubingx  5 months ago

      More to come!

  • @1XxDoubleshotxX1
    @1XxDoubleshotxX1 5 months ago +1

    Oh yes Vivek Vivek omg yes

  • @varunmohanraj5031
    @varunmohanraj5031 5 months ago

    So insightful ‼️

  • @VisibilityO2
    @VisibilityO2 5 months ago +5

    I am not criticizing all your hard work, but at some points it got muddled, like at 7:54 where you computed `Ht` without explaining the weighted sum, and you could have named `Backpropagation Through Time` in the video.
    Also, you could introduce "gated cells" in LSTMs. Long Short-Term Memory networks most often rely on a gated cell to track information throughout many time steps.
    And an activation function like 'sigmoid' could be replaced by 'ReLU'; packages like TensorFlow also prefer it in their documentation.
    But honestly, you've created a good intermediate class for learning recurrence.
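
    For readers curious about those gated cells, here is a minimal NumPy sketch of a single LSTM step (standard LSTM equations with toy shapes and random weights; this is not code from the video):

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x_t, h_prev, c_prev, W, U, b):
        """One LSTM time step: sigmoid gates decide what to forget, write, and
        expose, so the cell state c can carry information across many steps."""
        H = h_prev.shape[0]
        z = W @ x_t + U @ h_prev + b       # pre-activations for all four gates
        f = sigmoid(z[0:H])                # forget gate
        i = sigmoid(z[H:2 * H])            # input gate
        o = sigmoid(z[2 * H:3 * H])        # output gate
        g = np.tanh(z[3 * H:4 * H])        # candidate cell update
        c_t = f * c_prev + i * g           # gated cell-state update
        h_t = o * np.tanh(c_t)             # gated hidden state
        return h_t, c_t

    # Toy dimensions and random weights, for illustration only.
    D, H = 5, 4
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4 * H, D))
    U = rng.normal(size=(4 * H, H))
    b = np.zeros(4 * H)
    h, c = np.zeros(H), np.zeros(H)
    for x_t in rng.normal(size=(3, D)):    # run three time steps
        h, c = lstm_step(x_t, h, c, W, U, b)
    print(h)
    ```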

    • @vcubingx
      @vcubingx  5 months ago +2

      Hey, thanks for the feedback.
      I personally found little value in mentioning BPTT, as I felt it would confuse viewers who weren't familiar with backpropagation. The algorithm itself is pretty straightforward, and I didn't feel it needed an entire section explaining it.
      As for LSTMs, the video wasn't meant to cover them at all; I introduced that section at the last minute for curious viewers. I appreciate you bringing them up though! I plan on making a short 5-7 minute video on them in the future.

  • @usama57926
    @usama57926 5 months ago

    Nice video

  • @ml-ok3xq
    @ml-ok3xq 5 months ago

    maybe you can loop around to mamba and explain why it's popular again, what has changed to uncurse the model.

    • @vcubingx
      @vcubingx  5 months ago +2

      Sure! I wanted to make two follow-ups: transformers beyond language, and language beyond transformers. In the second part I'd talk about Mamba and the future of language modeling.

  • @adithyashanker2852
    @adithyashanker2852 5 months ago +4

    Music is fire

  • @BooleanDisorder
    @BooleanDisorder 5 months ago +1

    RNN = Remember Nothing Now

    • @vcubingx
      @vcubingx  5 months ago +2

      Hahaha, RNNs did indeed have "memory loss" issues :)