Hello Letitia, thank you for this great video!
Correct me if I am wrong, but I am pretty sure I have seen the 1/4 size ratio you talk about at 12:38 in both the original ViT paper and the "Training data-efficient image transformers & distillation through attention" paper.
In the original ViT paper they use this MLP block ratio in almost all of their experiments without mentioning it explicitly, while in the second one they mention the 1/4 ratio of the MLP block on page 5 of the paper. I am a newbie in deep learning and transformers though, so take everything I say with a grain of salt 😅
Thanks! Yes, it's Table 1 in the ViT paper. Then we totally misunderstood what that factor 4 was referring to while making the video. 🙈
Tacking on, an expansion ratio of 3 or 4 in the MLP is also pretty standard in transformers for natural language tasks.
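To make that ratio concrete, here is a minimal PyTorch sketch of the ViT-style MLP block being discussed. The class and layer names are my own, and the GELU follows common ViT implementations, so treat it as an illustration rather than code from either paper:

```python
import torch.nn as nn

class MLPBlock(nn.Module):
    """Transformer MLP block with the usual 4x expansion: d -> 4d -> d."""
    def __init__(self, d: int, expansion: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(d, expansion * d)   # expand: d -> 4d
        self.act = nn.GELU()                     # GELU, as in ViT
        self.fc2 = nn.Linear(expansion * d, d)   # project back: 4d -> d

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))
```

The "1/4 ratio" and the "factor 4" are the same thing seen from opposite directions: the model width d is 1/4 of the MLP's hidden width 4d.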
First coffee bean of the year!! 🎉 congrats on the 11k subs!
They went all in with the storytelling on this paper, they even extracted the core design choices as "wisdom bits". I really don't believe they achieved the final architecture this way but reading the "linear improvement story" was very entertaining.
Thank you Miss Coffee Bean! The 60 sec explanation of translational equivariance was amazing!
This is a really nice way of reviewing papers! Keep it up!
many many thanks from Iran
10K subscriber congrats! ^^
Yes! Thank you! 🤝 Means a lot from an early subscriber like yourself.
Great point on how we jumped right into transformers and forgot to exactly pin down the effect of small tweaks.
Great video again! :D
Fastest 20 min ever! Thank you for the clear explanation. I especially like how you animate the explanation!
May I ask what you use to do the animations? Maybe you could add an FAQ section; I can imagine you get this question a lot.
Thanks, this comment makes us very happy!
I do not want to make an FAQ section: comments and questions are good for making the Algorithm believe it should push us further up into your recommendations.
I animate everything but Ms. Coffee Bean in good old PowerPoint (yeah, tools are what you can make of them 🙈 ).
Ms. Coffee Bean is animated in the video editing software kdenlive (open source and available for all operating systems).
Can't wait to try a U-Net with a ConvNeXt backbone!
3:32 I love how SKEWED that graph is, ma'am, it's just nuts.
Many thanks!
Love this. Superb. Keep it up!
Thank you! Will do! 😀
LeCun must be so happy right now
Absolutely. 😆
Awesome 🔥🔥😎😎
Hello Letitia, thank you so much for your video, it's a great inspiration for my thesis. If you don't mind, can I ask you a question? In your opinion, is it possible for me, as a newbie, to write a research paper comparing ViT, DeiT and ConvNeXt for image classification on 10,000 images? The models are fairly new and not many papers have implemented them yet. Thank you.
Can ConvNeXt be used for video classification with time series data?
Can there be a 3D ConvNeXt, like there is a 3D CNN?
I do not see why this wouldn't be extendable to video. :)
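One common way to do this (not from the video, just a sketch of a standard trick) is I3D-style kernel inflation: copy each pretrained 2D kernel along a new time axis so the 3D network starts from the 2D weights. A hypothetical helper, assuming PyTorch and a Conv2d built with numeric (not "same") padding:

```python
import torch
import torch.nn as nn

def inflate_conv2d(conv2d: nn.Conv2d, time_k: int = 3) -> nn.Conv3d:
    """Hypothetical I3D-style inflation: tile the 2D kernel time_k times
    along a new temporal axis and divide by time_k, so a temporally
    constant input produces the same activations as the 2D layer."""
    conv3d = nn.Conv3d(
        conv2d.in_channels, conv2d.out_channels,
        kernel_size=(time_k, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(time_k // 2, *conv2d.padding),
        groups=conv2d.groups,
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        weight = conv2d.weight.unsqueeze(2).repeat(1, 1, time_k, 1, 1) / time_k
        conv3d.weight.copy_(weight)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d
```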
I think what they might have meant by inverted bottleneck: key, value, query and the residual connections :D Though would you call that an inverted bottleneck? What do you think, @letitia?
No, it is a tiny detail that concerns how the MLP layer is built: d -> 4d -> d. Here is Alexa explaining this (link with the right timestamp: ruclips.net/video/idiIllIQOfU/видео.html).
I missed that point in the video when talking about inverted bottlenecks; I was thinking of the Swin Transformer 🙈
@AICoffeeBreak That's right! I forgot about how the position-wise feedforward layer is constructed, which indeed is an inverted bottleneck.
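For anyone who wants to see the block in code: below is a simplified PyTorch sketch of the ConvNeXt block, following the structure described in the paper (depthwise 7x7 convolution, then the d -> 4d -> d inverted bottleneck as two pointwise/linear layers). LayerScale and stochastic depth are omitted, so this is an illustration, not the official implementation:

```python
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Simplified ConvNeXt block: depthwise 7x7 conv, LayerNorm, then the
    inverted bottleneck d -> 4d -> d, wrapped in a residual connection.
    LayerScale and stochastic depth from the paper are left out."""
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # expand: d -> 4d
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)  # project back: 4d -> d

    def forward(self, x):                       # x: (N, C, H, W)
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)               # channels-last for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)               # back to (N, C, H, W)
        return residual + x
```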
What is the best state-of-the-art architecture for regression tasks involving images?