LLM Chronicles #6.3 Multi-Modal LLMs for Image, Sound and Video
- Published: 12 Jul 2024
- In this episode we look at the architecture and training of multi-modal LLMs. After that, we’ll focus on vision and explore Vision Transformers and how they are trained with contrastive learning (OpenAI's CLIP and Google's SigLIP). Vision Transformers are the most commonly used building block in MLLMs with vision capabilities. Finally, we’ll get hands-on and look into Google’s open-weight PaliGemma, analysing its implementation to see these concepts in action within a real-world multi-modal LLM.
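To make the contrastive-learning segment concrete, here is a minimal PyTorch sketch (not from the episode's notebook) of the two objectives covered: CLIP's symmetric softmax loss and SigLIP's pairwise sigmoid loss. Batch size, embedding dimension, and the temperature/bias values are illustrative assumptions; in the papers, the temperature (and SigLIP's bias) are learned scalars.

    import torch
    import torch.nn.functional as F

    def clip_loss(img_emb, txt_emb, temperature=0.07):
        # CLIP: softmax contrastive loss over the batch, with the
        # matching (image, text) pairs on the diagonal.
        img_emb = F.normalize(img_emb, dim=-1)
        txt_emb = F.normalize(txt_emb, dim=-1)
        logits = img_emb @ txt_emb.T / temperature  # (B, B) cosine similarities
        targets = torch.arange(logits.size(0))
        # Symmetric cross-entropy: image->text and text->image directions.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.T, targets)) / 2

    def siglip_loss(img_emb, txt_emb, t=10.0, b=-10.0):
        # SigLIP: an independent sigmoid per (image, text) pair, so the
        # loss needs no batch-wide normalisation. t and b are learned in
        # the paper; fixed here purely for illustration.
        img_emb = F.normalize(img_emb, dim=-1)
        txt_emb = F.normalize(txt_emb, dim=-1)
        logits = img_emb @ txt_emb.T * t + b
        labels = 2 * torch.eye(logits.size(0)) - 1  # +1 on diagonal, -1 elsewhere
        return -F.logsigmoid(labels * logits).mean()

    # Smoke test with random embeddings (batch of 8, dim 512).
    img, txt = torch.randn(8, 512), torch.randn(8, 512)
    print(clip_loss(img, txt).item(), siglip_loss(img, txt).item())

The key design difference: CLIP normalises scores across the whole batch, so it benefits from very large batches, while SigLIP scores each pair independently, which is part of why it trains well at smaller per-device batch sizes.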
Series website: llm-chronicles.com/
🖹 Canvas and Colab Notebook:
- LLM Limitations and Challenges: llm-chronicles.com/pdfs/llm-c...
- Colab Notebook: colab.research.google.com/dri...
🕤 Timestamps:
01:32 - MLLM Architecture
03:49 - Training MLLMs
07:02 - Vision Transformer
09:24 - Contrastive Learning (CLIP, SigLIP)
12:35 - Lab: PaliGemma
22:53 - Summary
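Ahead of the lab segment, here is a minimal inference sketch for PaliGemma using Hugging Face transformers. The checkpoint name is an assumption (PaliGemma weights are gated on the Hub), the image path is a placeholder, and the episode's Colab may load and prompt the model differently.

    # pip install transformers accelerate pillow
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    model_id = "google/paligemma-3b-mix-224"  # assumed checkpoint
    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    image = Image.open("cat.jpg")  # placeholder local image
    prompt = "caption en"          # PaliGemma expects short task prefixes

    inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=30)

    # Decode only the newly generated tokens, skipping the prompt.
    prompt_len = inputs["input_ids"].shape[-1]
    print(processor.decode(out[0, prompt_len:], skip_special_tokens=True))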
References:
- Vision Transformer: arxiv.org/pdf/2010.11929
- Survey of multi-modal LLMs: arxiv.org/pdf/2306.13549
- Microsoft's CLAP: arxiv.org/pdf/2206.04769
- SigLIP: arxiv.org/pdf/2303.15343