You’ve earned my subscription!
Great work explaining ViT!
The Figure 9 similarity pattern - where tokens at the corners or edges have high similarity to the rest of the tokens along the boundary - could be due to the type of data: corner and edge tokens are generally background, and thus uniform and similar in nature.
I want to suggest this paper for a future video: "Variational Diffusion Models".
Great, I'll check it out! Feel free to share your suggestions on Discord as well: discord.com/invite/peBrCpheKE
Simon, Simon, diffusion models are hot right now!
Thank you for the explanation! A quick dumb question on this paper: what does the CLS token mean? And why do we have multiple of them when we are training for a classification task?
Could you explain the sqrt(18) part in a bit more detail? I could not quite follow how you arrived at that.
My guess is it's just the Euclidean distance.
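If that guess is right, here is one purely hypothetical way sqrt(18) could arise - as the Euclidean distance between two positions on the patch grid offset by 3 in each direction (the offset of 3 is my own illustrative assumption, not from the video):

```latex
% Hypothetical: sqrt(18) as the Euclidean distance for a (3, 3) offset
% on the patch grid.
d = \sqrt{3^2 + 3^2} = \sqrt{9 + 9} = \sqrt{18} = 3\sqrt{2} \approx 4.24
```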
Thanks a lot for a nice review of the paper. A few points raised questions in my mind. First of all, what is the purpose of the matrix H in the HSIC computation? It is actually a projection onto the subspace orthogonal to the vector of ones - so the comparison is between the non-uniformities inside the Gram matrices, in some sense? Is there an explanation for why they chose a 50-layer ResNet for comparison? It seems like a fairer comparison would be between models of comparable scale, say ResNet-152 - or should one not expect a noticeable change with this choice?
From my understanding, H is used for centering: multiplying a vector by the centering matrix has the same effect as subtracting the vector's mean. Also, I believe they have shown the comparison of the 14-patch ViT with ResNet-152 in the appendix (Figure B.1).
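To make the centering intuition concrete, here is a minimal numpy sketch (my own illustration of the standard CKA/HSIC formulation, not the paper's code):

```python
import numpy as np

def centering_matrix(n):
    # H = I - (1/n) * 1 1^T: projects onto the subspace orthogonal to
    # the all-ones vector, i.e. subtracts the mean.
    return np.eye(n) - np.ones((n, n)) / n

def hsic(K, L):
    # Biased HSIC estimator: compares the *centered* Gram matrices K and L.
    n = K.shape[0]
    H = centering_matrix(n)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Multiplying by H has the same effect as subtracting the mean:
x = np.array([1.0, 2.0, 6.0])
H = centering_matrix(3)
print(H @ x)         # [-2. -1.  3.]
print(x - x.mean())  # [-2. -1.  3.]
```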
When will you do a video on Swin Transformers?
Have you had a chance to try any of the notebooks where a ViT guides an image generation model, such as VQGAN, or even just raw RGB noise, to generate imagery from text prompts? The abilities of the ViT, even ViT-Base/32, are vast. Compared to ResNet-101, ResNet-50, ResNet-50x4, ResNet-50x16, etc. - we've experimented with all of the above - ResNet is absolute garbage compared to the ViTs lol. I don't think you've experienced what ViT is capable of, or else you'd be raving about it haha.
I'm not sure the scientists who created it even know what it can do, since literally all they talk about is classification and getting scores on benchmarks. Make an image. Tell it to create a universe where heads are upside down. Tell it to show an image of a car with square wheels. Then try ResNet - ResNet just falls short in every case.
Also, the smaller the patch size the better: a smaller patch size means higher effective resolution and more detail per image. Of course I'd love to try this with ViT-H/14, but I'm not advanced enough to rig Google's generic version for it - the regular ViT from Google doesn't have a text encoder trained with it multimodally.
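For anyone curious, here is a rough sketch of the kind of notebook being described, assuming OpenAI's clip package; the prompt, step count, and learning rate are illustrative. It nudges raw RGB pixels toward a text prompt by maximizing CLIP similarity; VQGAN guidance works the same way but optimizes latent codes instead of pixels.

```python
# Sketch of CLIP-guided generation from raw RGB noise, assuming
# OpenAI's clip package (pip install git+https://github.com/openai/CLIP.git).
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # avoid fp16 issues when backpropagating on GPU
for p in model.parameters():
    p.requires_grad_(False)

# Encode the text prompt once.
text = clip.tokenize(["a car with square wheels"]).to(device)
text_feat = model.encode_text(text)
text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# Raw RGB noise as the "image"; 224x224 is ViT-B/32's input resolution.
pixels = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
opt = torch.optim.Adam([pixels], lr=0.05)

# CLIP's input normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

for step in range(300):
    img = (pixels.clamp(0, 1) - mean) / std
    img_feat = model.encode_image(img)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    loss = -(img_feat * text_feat).sum()  # maximize cosine similarity
    opt.zero_grad()
    loss.backward()
    opt.step()
```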
I haven't yet, but I will - over the next period I'll be doing code walk-throughs. Thanks for flagging that!
And it makes sense - I guess the fact that spatial information is preserved contributes heavily to that.
@TheAIEpiphany It even suggests some knowledge of temporal flow, but I'm not 100% sure - it might be VQGAN 16384 itself.
Any links to these notebooks? Thank you!