What is the Vision Transformer?

  • Published: Nov 9, 2024

Comments • 12

  • @jacquesgouimenou9668
    @jacquesgouimenou9668 5 months ago +1

    Good job.

  • @eddysaoudi253
    @eddysaoudi253 5 months ago

    Thanks for your work. What you're doing is really great.

  • @jaskiratbenipal8255
    @jaskiratbenipal8255 5 months ago

    Can you help me understand how and why positional embeddings are effective in transformers (vision or text)? Can't the model just learn that through its existing weights? How does adding extra positional embeddings to the vision/text embeddings help? Even if we have a unique vector for each position, when we add those to the text embeddings the result won't be unique. Would the result after addition even have useful information, since we can get the same sum from multiple combinations?
    Let's say we have a text model with an input limit of only two tokens, and the embedding size is 3.
    Text embeddings:
    [0, 1.1, 0.3], [0, 0.1, 1.3]
    Position embeddings:
    [0, 0, 1], [0, 1, 0]
    Embeddings after addition:
    [0, 1.1, 1.3], [0, 1.1, 1.3]
    We get the same vectors.
    Is the magic in the actual function that we use for embeddings, or is it just empirically better and we can't fully understand it?

    • @TheMLTechLead
      @TheMLTechLead  5 months ago

      Inside the model, we compute the self-attentions. They are pretty much just a measure of interaction between the different tokens in the input sequence. Inside the attention layer, we have the queries, the keys, and the values. The keys and queries are used to compute the self-attentions, and the resulting hidden state is the weighted average of the values, where we use the attentions as weights. At that point, the order of the tokens is completely lost because we are just summing things together without knowing in what order they were before the sum. That is why we keep the position information through the positional encoding. We systematically add the same vector for the same position, so the model starts to understand how that shift relates to that position. The value of the same token then varies depending on its position. To be fair, we do it a bit differently in 2024.
      Video coming!
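
      A minimal PyTorch sketch of that point (not from the video; the dimensions and random values are arbitrary): without position information, self-attention gives the same per-token outputs no matter how the tokens are ordered, and adding a position embedding to each token breaks that symmetry.

      import torch
      import torch.nn.functional as F

      torch.manual_seed(0)
      d = 8                                    # embedding size (arbitrary for the demo)
      Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

      def self_attention(x):                   # x: (seq_len, d)
          q, k, v = x @ Wq, x @ Wk, x @ Wv
          attn = F.softmax(q @ k.T / d**0.5, dim=-1)
          return attn @ v                      # weighted average of the values

      tokens = torch.randn(4, d)               # 4 token embeddings
      perm = [2, 0, 3, 1]
      shuffled = tokens[perm]                  # same tokens, different order

      # Without positions: shuffling the input just shuffles the output rows.
      print(torch.allclose(self_attention(tokens)[perm], self_attention(shuffled), atol=1e-5))              # True

      # With a position vector added per slot: order now changes the result.
      pos = torch.randn(4, d)                  # one learned vector per position
      print(torch.allclose(self_attention(tokens + pos)[perm], self_attention(shuffled + pos), atol=1e-5))  # False in general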

    • @jaskiratbenipal8255
      @jaskiratbenipal8255 5 months ago

      @@TheMLTechLead Looking forward to it!
      After commenting, I read about RoPE (can't say I fully understood it) and learnable positional embeddings.
      P.S. I really liked your idea of using routing in attention. It's a somewhat ambitious goal, but I want to use it to train a small language model, or I will see if it is possible to simply add it to a pre-trained model without losing the learned weights.

    • @TheMLTechLead
      @TheMLTechLead  5 months ago

      @@jaskiratbenipal8255 I may not make a video about RoPE, but I wrote something about it here: www.linkedin.com/posts/damienbenveniste_most-modern-llms-are-built-using-the-rope-activity-7188571849084096515-mmUk. For the routed self-attentions, I am looking forward to seeing somebody implement those and train a model with it.
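
      For reference, a minimal sketch of the RoPE idea (a simplified illustration, not taken from that post): instead of adding a position vector, each (even, odd) pair of query/key dimensions is rotated by an angle that grows with the token position, so the query-key dot products depend on the relative distance between tokens.

      import torch

      def rope(x, base=10000.0):
          # x: (seq_len, d) with d even; rotate each (even, odd) dimension pair
          seq_len, d = x.shape
          pos = torch.arange(seq_len, dtype=torch.float32)[:, None]          # (seq_len, 1)
          freqs = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)  # (d/2,)
          angles = pos * freqs                                               # (seq_len, d/2)
          cos, sin = angles.cos(), angles.sin()
          x1, x2 = x[:, 0::2], x[:, 1::2]
          out = torch.empty_like(x)
          out[:, 0::2] = x1 * cos - x2 * sin
          out[:, 1::2] = x1 * sin + x2 * cos
          return out

      q, k = torch.randn(5, 8), torch.randn(5, 8)
      scores = rope(q) @ rope(k).T    # attention scores now carry relative position information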

    • @jaskiratbenipal8255
      @jaskiratbenipal8255 4 months ago

      @@TheMLTechLead I tried it. I trained a language model from scratch for next-character prediction (to have a small vocabulary). The results were good using normal attention: the model was able to form words and phrases and some gibberish that looked like words. With the routed attention (I tried 0.1 and 0.3 sparsity values), it started to diverge and the model was not converging at all after the first epoch. The training time did decrease from 34 to 24 minutes.

  • @marthasamuel
    @marthasamuel 5 months ago

    So this is basically used for classification? For example, cats and dogs, right?

    • @TheMLTechLead
      @TheMLTechLead  5 months ago

      It can be used for any computer vision ML task.
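
      For example, classifying an image with a pre-trained ViT through the Hugging Face transformers library (a sketch; the checkpoint name is just one public ImageNet-classification ViT, and the image path is a placeholder):

      from PIL import Image
      import torch
      from transformers import ViTImageProcessor, ViTForImageClassification

      processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
      model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

      image = Image.open("cat.jpg")                      # placeholder: any RGB image
      inputs = processor(images=image, return_tensors="pt")
      with torch.no_grad():
          logits = model(**inputs).logits                # (1, num_classes) for this checkpoint
      print(model.config.id2label[logits.argmax(-1).item()])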

    • @marthasamuel
      @marthasamuel 5 months ago

      @TheMLTechLead great!
      I was thinking of image generation from a given prompt or user input. What would the process be?

    • @TheMLTechLead
      @TheMLTechLead  5 months ago

      Oh no, you would need a very different model for that, although the vision transformer could be a component of it.

    • @marthasamuel
      @marthasamuel 5 months ago

      @@TheMLTechLead got you!