OpenAI Sora and DiTs: Scalable Diffusion Models with Transformers
- Published: Feb 16, 2024
- Sora: openai.com/sora
Sora paper (Video generation models as world simulators): openai.com/research/video-gen...
DiTs - Scalable Diffusion Models with Transformers paper: arxiv.org/abs/2212.09748
My notes: drive.google.com/file/d/1h2pc...
Incredible explanations. Love the clarity of thought and illustrations to visualize the concepts.
All concepts beautifully explained! Very intuitive but accurate at the same time! Thank you so much!
Hey!
Already watched some of your videos before, but I randomly landed on this one again when I needed to learn about DiTs.
I love how clearly and deeply you explain things; even when you cover basic material, it reinforces the learning, and that's good.
Even though it's an hour long, it feels like everything is needed.
Great explanation: not too much detail and not too little. You strike a very good balance, which makes the concepts easy to follow :)
I was just looking for this on your channel! Thanks!
Great walkthrough.
Enjoy your content! Keep it up!
Great explanation. I could see them using a ViViT-style model for Sora: ViViT models have separate temporal and spatial encoders for their self-attention mechanisms, probably realized as two DiT blocks (the factorized-encoder ViViT variant).
Also, when would the multi-head cross-attention version be used? Say, for generating images from text prompts with more than 1000 classes, or for conditioning on even more things like audio; would the DiT block with cross-attention be preferred then?
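A minimal, shape-only sketch of the factorized spatial/temporal attention the comment above alludes to (an assumed numpy illustration with made-up sizes, not Sora's actual architecture): attend over patches within each frame, then over frames at each spatial location.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attn(x):
    # Single-head self-attention over the second-to-last axis,
    # with identity Q/K/V projections (shapes are what matter here).
    d = x.shape[-1]
    w = softmax(x @ np.swapaxes(x, -1, -2) / np.sqrt(d))
    return w @ x

T, S, d = 8, 16, 32                # frames, patches per frame, hidden dim
tokens = np.random.randn(T, S, d)  # spacetime patch grid

# Spatial block: each frame's S patches attend to each other.
spatial = self_attn(tokens)

# Temporal block: swap axes so the T frames at each location attend
# to each other, then swap back.
temporal = np.swapaxes(self_attn(np.swapaxes(spatial, 0, 1)), 0, 1)

print(spatial.shape, temporal.shape)  # both (T, S, d)
```

Factorizing this way costs two attentions of size S×S and T×T per block instead of one full (T·S)×(T·S) attention, which is the usual motivation for the ViViT factorized encoder.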
Great video!
Great video
That was indeed a great explanation.
Can you please explain how we get the hidden dimension d @28:19?
E.g. if the image in latent space is 128x128x3 and we use a patch size of 32,
then the number of tokens = (128/32)^2 = 16.
Is the number of dimensions (d) then = p^2 = 32^2?
Please clarify this.
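For reference: in the DiT paper, the hidden dimension d is a model hyperparameter (e.g. 1152 for DiT-XL), not p^2. Each p×p×C patch is flattened to p^2·C values and linearly projected to d. A sketch of the arithmetic using the numbers in the question above (the projection matrix here is a random stand-in for the learned embedding):

```python
import numpy as np

H = W = 128   # latent spatial size from the example above
C = 3         # latent channels from the example above
p = 32        # patch size
d = 1152      # hidden size: a hyperparameter (1152 is DiT-XL's choice)

num_tokens = (H // p) * (W // p)   # (128/32)^2 = 16 tokens
flat_patch_dim = p * p * C         # 32*32*3 = 3072 values per patch

# Each flattened patch is linearly projected from p^2*C to d.
W_embed = np.random.randn(flat_patch_dim, d) * 0.02  # learned in practice
patches = np.random.randn(num_tokens, flat_patch_dim)
tokens = patches @ W_embed

print(num_tokens, flat_patch_dim, tokens.shape)  # 16 3072 (16, 1152)
```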
11:58 Are you talking about an ODE solver, i.e. for an ordinary differential equation? Thanks.
Good explanation of the background part. Question: when you explained cross-attention, you said q = z and k, v = [c; t]. They don't give that detail in the paper, but I think it should be the other way around, q = [c; t] and k, v = z, right?
Usually the conditioning goes into the keys and values, as in the Stable Diffusion paper. If Q, K, V have shapes (N, d), (M, d), and (M, d), where N is the sequence length and M is the context length, then the output shape is SM[(N, d)(d, M)](M, d) -> (N, M)(M, d) -> (N, d). However, if we invert this so Q, K, V have shapes (M, d), (N, d), and (N, d), then the output shape is SM[(M, d)(d, N)](N, d) -> (M, N)(N, d) -> (M, d), which is a sequence in terms of the conditioning sequence.
@@gabrielmongaras Yep, I agree. I just read the Stable Diffusion paper carefully, and you're right: they use Q = Z and K, V = C. I would have guessed the reverse, since the output of the U-Net is Z_{T-1} computed from Z_T. Also, the shape of their cross-attention weights doesn't make sense; I'm sure they made a mistake. They should have said Q, V = Z and K = C.
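The shape argument in this thread can be checked directly. A minimal numpy sketch (single-head, unprojected Q/K/V, illustrative sizes): with Q = z and K, V = [c; t], cross-attention returns one output per image token, which is what the denoiser needs.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

N, M, d = 16, 4, 64         # N: image tokens, M: conditioning tokens
z = np.random.randn(N, d)   # latent/image token sequence
c = np.random.randn(M, d)   # conditioning sequence [c; t]

Q, K, V = z, c, c                     # Q from z; K, V from conditioning
attn = softmax(Q @ K.T / np.sqrt(d))  # (N, M): rows sum to 1
out = attn @ V                        # (N, d): one output per image token

print(attn.shape, out.shape)  # (16, 4) (16, 64)

# Swapping the roles (Q = c, K = V = z) would instead yield shape (M, d),
# a sequence over the conditioning tokens rather than over the image tokens.
```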
14:13 I don't quite understand the equation for x_{t-1}.