Please, don't stop making videos. You are a legend!
After reading so many blogs and papers explaining transformers, this video helped me understand the intuition behind the Query, Key, and Value matrices.
The best video that explains in detail how the "magic" works. This is an amazing video!!!!!!
Amazing explanation of how attention magic works!! Thanks a lot
My Lord, so cool and ingenious!
Superb video
This is a fantastic video!
(It made me think we need spreadsheets with autodiff. :D)
Excel is a deep learning cheat code
Thank you. Great work ♥️
Beautiful
thank you for these gems!
Great stuff!
Do you know why those values were used for the positional encoding instead of 0, 1, 2, 3, 4, 5? Or can any values be used?
I assume you wanted the last position to have 111, but does it have to be 111?
Yes, I picked those values in particular (0, 1, 3, 4, 6, 7) because it was easy to construct query and key matrices that would make the first query row line up with the last key row, the second query row line up with the second-to-last key row, and so on. This was easy for me to find because it just involved multiplying the query weights by -1.
If I had used 0, 1, 2, 3, 4, 5, I would have needed to find some other weights that made the 0 query row line up most closely with the 5 key row, the 1 query row with the 4 key row, and so on, and this would have been tougher.
You could probably use a single positional encoding column that ranges from -1.0 (beginning of the string) to 1.0 (end of the string). Then, you would have the same property as my 3 binary bits, where you can multiply the query weights by -1 to most strongly match with a key of opposite position.
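If it helps to see that concretely, here is a minimal numpy sketch of the idea (my own construction, not literally the video's matrices). One assumption on my part: I recode the 3 position bits from {0, 1} to {-1, +1} so that each position's strongest match, its bitwise complement, is unique; with raw 0/1 bits some positions would tie.

```python
import numpy as np

# The video's position values (0, 1, 3, 4, 6, 7) written as 3 binary bits,
# recoded from {0, 1} to {-1, +1} (an assumption of this sketch).
values = [0, 1, 3, 4, 6, 7]
P = np.array([[1 if v & (1 << b) else -1 for b in (2, 1, 0)] for v in values])

Q = -P              # "multiply the query weights by -1"
K = P
scores = Q @ K.T    # query-key dot products

# Each row's strongest match is the mirrored position (0<->5, 1<->4, 2<->3),
# because 0/7, 1/6, and 3/4 are bitwise complements of each other.
print(scores.argmax(axis=1))   # [5 4 3 2 1 0]
```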
Thanks for explaining this in an understandable way. I don't get how this is then used to process multiple words (am I missing something?)
The toy example in this video mapped the ASCII characters to numbers. In practice, words (or subwords) are turned into numbers in a process called embedding:
- medium.com/deeper-learning/glossary-of-deep-learning-word-embedding-f90c3cec34ca
- blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
- dugas.ch/artificial_curiosity/GPT_architecture.html
So there's a pre-processing step where the sentence or sentences (a list of words) are turned into their embeddings (lists of numbers) before being fed into the transformer. The transformer does a bunch of math and outputs embeddings (lists of numbers), which are converted back into a list of words.
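Here is a minimal sketch of that pipeline in numpy (the vocabulary, the embedding dimension, and the do-nothing "transformer" in the middle are all made up purely for illustration):

```python
import numpy as np

# A toy vocabulary with a made-up embedding table (unit-norm rows so that
# "nearest embedding by dot product" recovers a word unambiguously).
vocab = ["the", "cat", "sat"]
emb = np.random.randn(len(vocab), 4)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# Pre-processing: words -> indices -> embedding rows fed to the transformer.
tokens = [vocab.index(w) for w in "the cat sat".split()]
x = emb[tokens]                 # shape (3, 4)

out = x                         # stand-in for the transformer's output

# Post-processing: each output vector -> nearest embedding -> word.
nearest = (out @ emb.T).argmax(axis=1)
print([vocab[i] for i in nearest])   # ['the', 'cat', 'sat']
```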
I'm very confused by the claim that you cannot find a matrix to swap the order of rows at 17:50. How is the answer not just a standard permutation matrix? en.wikipedia.org/wiki/Permutation_matrix
As far as I understand them, permutation matrices multiplied on the right swap columns around, not rows. For example, if you put 1s on the minor diagonal of your permutation matrix, it mirrors the result across the vertical axis because it swaps the first and last columns, the second and second-to-last columns, and so on. I could not find a permutation matrix (or any matrix) that achieved the mirroring across the horizontal axis, i.e. one that swapped rows.
I could still be wrong about such a matrix not existing (when multiplied on the right). It was the boldest claim in the whole video, and the one I did not have a great citation or proof for.
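A quick numpy check of the column-swapping behavior (a small example, not from the video):

```python
import numpy as np

M = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]
P = np.fliplr(np.eye(2))         # 1s on the minor (anti-)diagonal

# Multiplied on the right, P reverses the COLUMNS of M, not its rows:
print(M @ P)
# [[1. 0.]
#  [3. 2.]
#  [5. 4.]]
```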
@@ConceptsIlluminated A function f is linear if f(x + y) = f(x) + f(y) and f(cx) = c f(x) for all x, y in the domain and any scalar c. It's clear that the function you describe in the video is linear by this definition.
I think you are correct that swapping rows is linear, given that definition, and I was incorrect in saying it was nonlinear.
I have still been unable to find a matrix R that swaps the rows of a matrix M when multiplied on the right, à la M × R. Assuming no such matrix exists, a more accurate phrasing would have been "The self-attention mechanism lets us do transformations that are impossible with just a single matrix multiplication step", yeah?
I actually had the same question. BTW, the transpose can also be expressed as a linear transformation if combined with reshapes.
@@ConceptsIlluminated I think it can be done with a regular left matrix multiplication, i.e., z = Px.
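A small numpy check bears this out: the same anti-diagonal permutation matrix, moved to the left side, reverses the rows of any M, which is exactly what no fixed right-hand matrix can do for every M:

```python
import numpy as np

M = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]
P = np.fliplr(np.eye(3))         # 1s on the minor (anti-)diagonal

# Multiplied on the LEFT, P reverses the ROWS: z = P @ M.
print(P @ M)
# [[4. 5.]
#  [2. 3.]
#  [0. 1.]]
```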