DINOv2
- Published: Oct 5, 2024
- In this stream we look at Meta's latest research: DINOv2, the second version of their self-supervised foundation model for computer vision.
github.com/fac...
arxiv.org/pdf/...
Like 👍. Comment 💬. Subscribe 🟥.
⌨️ GitHub
github.com/hu-po
🗨️ Discord
/ discord
📸 Instagram
/ gnocchibengal
#ai #computervision #machinelearning
This is great. As a master's student who would probably understand next to nothing on their own from these latest cutting-edge ML research papers, this helps A LOT. Looking forward to your future vids and streams :-)
I really enjoyed your overview of the paper. I'd also be interested in paper reviews of "tips and tricks", comparing certain techniques such as mixed-precision across a variety of CV tasks. While things like increasing batch size work for large companies, techniques that work for consumer grade hardware are more applicable even for researchers or grad students.
Something very funny happened. My neighbor's cat comes to visit me almost daily. I started hearing the "meow" on the speakers and I thought it was the neighbor's cat. I actually stopped the video twice to go search for the cat in front of the door.
Say hello to your cat, and thanks for the video.
I was just looking for this. You gave me an amazing understanding.
The first person I've seen actually use Nvidia's Eye Contact :D
Very nice video. Thank you! 🙏
Nice, I enjoy listening to that.
Great video. Love your videos. Glad I found your channel.
Very cool stream!
*Summary: DINOv2 Paper Review*
*DINOv2: A Self-Supervised Foundation Model for Computer Vision*
* *Focus (0:57):* Training a large-scale, self-supervised computer vision model called DINOv2.
* *Goal (4:40):* Develop a model that generates versatile visual features, usable for various tasks without fine-tuning (see the frozen-feature sketch after this summary).
* *Key Ideas:*
* *Data Curation (5:22):* Training on a curated dataset of 142 million images (LVD-142M) leads to superior performance compared to uncurated data of the same size.
* *Self-Supervised Learning (11:09):* Employs a combination of existing self-supervised learning methods (DINO, iBOT) with new techniques for stabilization and acceleration.
* *Large Model and Data Scale (6:12):* Trains a Vision Transformer (ViT) with 1 billion parameters on a massive dataset, demonstrating the importance of scale for self-supervised learning.
* *Model Distillation (7:44):* Distills smaller models from the largest trained model, leading to performance improvements compared to training from scratch (see the distillation sketch after this summary).
* *High-Resolution Training (38:56):* Demonstrates the importance of high-resolution training for pixel-level tasks like segmentation and depth estimation. Introduces a curriculum of training first at low resolution and then at high resolution.
* *Results:*
* *Competitive Performance (21:54):* DINOv2 achieves competitive performance compared to the best openly available weakly-supervised models, including OpenCLIP, across various benchmarks.
* *Strong Generalization (11:40):* Outperforms other self-supervised models on domain generalization benchmarks, demonstrating strong transferability to unseen data.
* *Emergent Properties (12:25):* Exhibits emergent properties like understanding object parts and scene geometry, similar to how LLMs develop emergent capabilities.
* *Technical Contributions (22:21):*
* Automatic data curation pipeline.
* Techniques for stabilizing and accelerating training (31:59), including:
* Fast and memory-efficient attention.
* Efficient stochastic depth (see the stochastic-depth sketch after this summary).
* Fully sharded data parallelism.
* Detailed ablation studies to validate different components of the approach (54:37).
* *Impact (1:53:02):* DINOv2 pushes the boundaries of self-supervised learning in computer vision and provides a powerful new tool for researchers and practitioners.
*Noteworthy Observations:*
* The paper emphasizes the importance of curated data and large-scale training for achieving high-quality representations in self-supervised learning.
* Model distillation emerges as a promising technique for efficiently creating smaller, high-performing models.
* The authors acknowledge the potential for even greater emergent properties with further scaling of model and data size.
* Facebook AI Research's openness in sharing their model, code, and training details is commendable.
I used Gemini 1.5 Pro to summarize the transcript.
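On the "usable without fine-tuning" point above, here is a minimal sketch of pulling frozen DINOv2 features and training only a light head on top. The model names follow the torch.hub entry points published in the facebookresearch/dinov2 repo; the toy probe and the 10-class output are hypothetical placeholders, not the paper's evaluation code.

```python
# Minimal sketch: frozen DINOv2 features plus a linear probe (no fine-tuning).
import torch
from PIL import Image
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Distilled ViT-B/14 backbone (ViT-S/14, ViT-L/14 and the ViT-g/14 teacher are also published).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").to(device).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),   # 224 is a multiple of the 14-pixel patch size
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    """Return the frozen CLS embedding for one image; the backbone is never updated."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    return backbone(img).squeeze(0)   # 768-dim for ViT-B/14

# Downstream tasks then train only a small head on top of the frozen features.
probe = torch.nn.Linear(768, 10).to(device)   # 10 classes is a made-up example
```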
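On the distillation point, here is a rough sketch of the general idea of feature distillation: a small student is nudged toward a frozen larger teacher with a cosine loss. DINOv2's actual distillation reuses the full self-supervised objective with a frozen ViT-g/14 teacher and trains the student from scratch, so treat this only as an illustration; the projection layer, learning rate, and the choice of hub models as stand-ins are assumptions.

```python
# Rough sketch of feature distillation against a frozen teacher (illustrative only).
import torch
import torch.nn.functional as F

teacher = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14").eval()  # stand-in teacher
student = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")         # stand-in student
for p in teacher.parameters():
    p.requires_grad_(False)

# ViT-S/14 features are 384-dim and ViT-L/14 features are 1024-dim, so project up.
proj = torch.nn.Linear(384, 1024)
opt = torch.optim.AdamW(list(student.parameters()) + list(proj.parameters()), lr=1e-4)

def distill_step(images: torch.Tensor) -> float:
    """One step pushing the student's CLS embedding toward the frozen teacher's."""
    with torch.no_grad():
        t = teacher(images)           # (B, 1024) teacher features, no gradients
    s = proj(student(images))         # (B, 1024) projected student features
    loss = 1.0 - F.cosine_similarity(s, t, dim=-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```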
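And on the "efficient stochastic depth" contribution, here is a standard stochastic-depth (DropPath) residual block using torchvision's StochasticDepth op. DINOv2's efficient variant goes further and skips the computation for dropped samples entirely instead of masking the result afterwards, so this only shows the baseline mechanism; the 0.4 drop rate follows the high rate the paper reports for its largest models, and the MLP block itself is a generic example.

```python
# Standard stochastic depth ("DropPath") on a residual branch, via torchvision.
import torch
from torchvision.ops import StochasticDepth

class ResidualMLPBlock(torch.nn.Module):
    def __init__(self, dim: int, drop_prob: float = 0.4):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.LayerNorm(dim),
            torch.nn.Linear(dim, 4 * dim),
            torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim),
        )
        # "row" mode drops the whole residual branch per sample with prob drop_prob.
        self.drop_path = StochasticDepth(p=drop_prob, mode="row")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # During training some samples skip the MLP branch entirely (regularization);
        # at eval time drop_path is the identity.
        return x + self.drop_path(self.mlp(x))
```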
Love the explanation, do you think it can be used in the wild?
Who has the bravery and the resources?
Great video by the way
cat.. 🐱
ty
Have you by any chance added the glasses artificially?
Nvidia Broadcast
@hu-po Why though? :D That's brilliant!
meow,meow,meow,meow,meow XD
meow
Gato model is trying to say some interesting info lol
Man, the eyes are throwing me off every time you look up! I am assuming you are using that thing that makes you keep eye contact. Turn it off. I try to pretend not to look at you, and every time I do, I stop watching!
tes.. ing, teeslay, parlay