Diffusion Models | PyTorch Implementation

Outlier

93K views


Published: Jan 16, 2025

Comments • 185

@outliier 2 years ago ⁺²¹
Link to the code: github.com/dome272/Diffusion-Models-pytorch
@bao-dai 2 years ago ⁺²
21:56 The way you starred your own repo makes my day bro 🤣🤣 really appreciate your work, just keep going!!
@outliier 2 years ago
@bao-dai xd
@leif1075 1 year ago
@outliier Thanks for sharing but how do you not get bored or tired of doing the same thing for so long and deal with all the math?
@outliier 1 year ago
@leif1075 I love to do it. I don't get bored
@ananpinya835 1 year ago
After I saw your next video "Cross Attention | method and math explained", I would like to see a PyTorch implementation of ControlNet's OpenPose that controls the pose in images of dogs. Or, if that is too complicated, you could simplify it to controlling the shape of a tree with 2-3 branches.
@javiersolisgarcia 11 months ago ⁺⁸
This video is crazy! I never get tired of recommending it to anyone interested in diffusion models. I have recently started doing research with these types of models, and I think your video is a huge source of information and guidance on this topic. I find myself repeatedly re-watching it to review some information. Incredible work, we need more people like you!
@outliier 11 months ago ⁺³
Thank you so much for the kind words!
@aladinwunderlampe7478 2 years ago ⁺⁵⁶
Hello, this has become a great video once again. We didn't understand much, but it's still nice to watch. Greetings from home say Mam & Dad. ;-))))
@AICoffeeBreak 2 years ago ⁺³⁵
Great, this video is finally out! Awesome coding explanation! 👏
@astrophage381 7 months ago ⁺²
These implementation videos are marvelous. You really should do more of them. Big fan of your channel!
@terencelee6492 1 year ago ⁺⁷
We chose Diffusion Model as part of our course project, and your videos saved me a lot of time in understanding the concepts, so I could focus more on implementing the main part. I am really grateful for your contribution.
@TheAero 1 year ago
This is my first few days of trying to understand diffusion models. Coding was kinda fun on this one. I will take a break for 1-2 months and study something related like GANs or VAE, or even energy-based models. Then come back with a more general understanding :) Thanks!
@zenchiassassin283 1 year ago
And transformers for the attention mechanisms + positional encoding
@TheAero 1 year ago
@zenchiassassin283 I got that snatched in the past 2 months. Gotta learn the math, what a distribution actually is, etc.
@MrScorpianwarrior 6 months ago ⁺¹
Hey! I am starting my CompSci Master's program in the Fall, and just wanted to say that I love this video.
I've never really had time to sit down and learn PyTorch, so the brevity of this video is greatly appreciated! It gives me a fantastic starting point that I can tinker around with, and I have an idea on how I can apply this in a non-conventional way that I haven't seen much research on...
Thanks again!
@outliier 6 months ago ⁺¹
Love to hear that
Good luck on your journey!
@Mandollr 1 year ago ⁺¹
After my midterm week I want to study diffusion models with your videos, I'm so excited. Thanks a lot for the good explanation
@stevemurch3245 2 years ago ⁺²
Incredible. Very thorough and clear. Very, very well done.
@rewixx69420 2 years ago ⁺¹
I was waiting for so long. I learned about conditional diffusion models
@yingwei3436 1 year ago ⁺¹
Thank you so much for your detailed explanation of the code. It helped me a lot on my way to learning diffusion models. I wish there were more YouTubers like you!
@Miurazzo 2 years ago ⁺¹⁰
Hi, @Outlier , thank you for the awesome explanation !
Just one observation, I believe in line 50 of your code (at 19:10) it should be:
uncond_predicted_noise = model(x,t,None)
😁
@outliier 2 years ago ⁺⁶
good catch thank you. (It's correct in the github code tho :))
@saltukkezer5100 2 days ago
Haha, thanks. I also saw the same error and wanted to point it out!
@ethansmith7608 1 year ago
this is the most underrated channel i've ever seen, amazing explanation !
@outliier 1 year ago
thank you so much!
@potisseslikitap7605 2 years ago ⁺³
This channel seems to be growing very fast. Thanks for this amazing tutorial.🤩
@subtainmalik5182 1 year ago
most informative and easy to understand video on diffusion models on youtube, Thanks Man
@FLLCI 2 years ago ⁺¹
This video is really timely and needed. Thanks for the implementation and keep up the good work!
@mmouz2 1 year ago
Sincere gratitude for this tutorial, this has really helped me with my project. Please continue with such videos.
@haoxu3204 1 year ago
The best video for diffusion! Very Clear
@prabhavkaula9697 2 years ago
Thank you for sharing the implementation since authentic resources are rare
@manuelsebastianriosbeltran972 2 years ago ⁺¹
Congrats, This is a great channel!! hope to see more of these videos in the future.
@gaggablagblag9997 1 year ago
Dude, you're amazing! Thanks for uploading this!
@ZhangzhiPeng-x8r 2 years ago ⁺²
great tutorial! looking forward to seeing more of this! keep it up!
@pratyanshvaibhav 7 months ago
The underrated OG channel
@vinc6966 1 year ago
Amazing tutorial, very informative and clear, nice work!
@NickSergievskiy 2 years ago
Thank you. Best explanation with good DNN models
@947973 1 year ago
Very helpful walk-through. Thank you!
@smnvrs 1 year ago
Thanks, this implementation really helped clear things up.
@qq-mf9pw 1 year ago
Incredible explanation, thanks a lot!
@yuhaowang9846 1 year ago
Thank you so much for sharing this, that was perfect!
@dylanwattles7303 11 months ago
nice demonstration, thanks for sharing
@DiogoSanti 11 months ago
Very well done! Keep the great content!!
@henrywong741 1 year ago ⁺²
Could you please explain the paper "High Resolution Image Synthesis With Latent Diffusion Models" and its implementations? Your explanations are exceptionally crystal clear.
@LMonty-do9ud 1 year ago
Thank you very much, it has solved my urgent need
@yazou3896 1 year ago
It's definitely cool and helpful! Thanks!!!
@SeonhoonKim 1 year ago ⁺¹
Hello, thanks a lot for your contribution! But I am a bit confused. At 06:04, a sample drawn totally at random from N(0, 1) would not have any "trace" of an image. How can the model infer an image from totally random noise?
@outliier 1 year ago ⁺¹
Hey there, that is sort of the "magic" of diffusion models, which is hard to wrap your mind around. But since the model is trained to always see noise levels between 0% and 100%, it sees full noise during training, which it is then trained to denoise. And usually when you provide conditioning to the model, such as class labels or text information, the model has more information than just random noise. But unconditional training still works.
@talktovipin1 2 years ago
Very nicely explained. Thanks.
@scotth.hawley1560 11 months ago
Wonderful video! I notice that at 18:50, the equation for the new noise seems to differ from Eq. 6 in the CFG paper, as if the unconditioned and conditioned epsilons are reversed. Can you comment on that?
@salehgholamzadeh3368 3 months ago
Great video
I have a question about line 50 of the code at 19:10. Why do we call
```model(x,label,None)```
What happened to t? Shouldn't we instead call it like ```model(x,t,None)```?
Also, line 17 in ema (20:31): ```return old * self.beta +(1+self.beta) * new``` Why 1+self.beta? Shouldn't it be 1-self.beta?
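Editor's note on the EMA question above: an exponential moving average weights the old average by beta and the new value by (1 - beta), so the two weights sum to 1. The following is a minimal plain-Python sketch on scalars, not the repository's code; the class name `EMA`, the method name `update_average`, and the value `beta=0.9` are illustrative assumptions.

```python
class EMA:
    """Exponential moving average on scalars (illustrative helper, not the repo's class)."""

    def __init__(self, beta):
        self.beta = beta

    def update_average(self, old, new):
        if old is None:
            # No history yet: start the average at the first value.
            return new
        # The weights beta and (1 - beta) sum to 1, so the average stays bounded.
        return old * self.beta + (1 - self.beta) * new


ema = EMA(beta=0.9)  # beta chosen only for illustration
avg = None
for value in [1.0, 1.0, 1.0]:
    avg = ema.update_average(avg, value)
# Averaging a constant sequence leaves the average at that constant.
```

With `(1 + self.beta)` instead, the weights would sum to more than 1 and the average of a constant sequence would keep growing.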
@chickenp7038 2 years ago ⁺²
Great walkthrough. But where would I implement dynamic or static thresholding as described in the Imagen paper? Static thresholding clips all values larger than 1, but my model regularly outputs numbers as high as 5. It still creates images, and the loss decreases to 0.016 with SmoothL1Loss.
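Editor's note on the thresholding question above: in the Imagen paper, thresholding is applied to the predicted clean image x0 at each sampling step, so a noise-predicting sampler would first have to turn its noise prediction into an x0 estimate. The sketch below works on a flat list of pixel values in plain Python and is only an approximation of the idea. The function names, the nearest-rank percentile, and the default `percentile=0.995` are assumptions for illustration.

```python
def static_threshold(x0_pred):
    # Clip every predicted pixel to the valid range [-1, 1].
    return [max(-1.0, min(1.0, v)) for v in x0_pred]


def dynamic_threshold(x0_pred, percentile=0.995):
    # s is a high percentile of the absolute pixel values (nearest-rank approximation).
    magnitudes = sorted(abs(v) for v in x0_pred)
    index = min(len(magnitudes) - 1, int(percentile * len(magnitudes)))
    # Never shrink below 1, so already well-scaled predictions are left unchanged.
    s = max(1.0, magnitudes[index])
    # Clip to [-s, s], then rescale back into [-1, 1].
    return [max(-s, min(s, v)) / s for v in x0_pred]


clipped = static_threshold([5.0, -0.5, -3.0])
rescaled = dynamic_threshold([5.0, -0.5, -2.5], percentile=1.0)
```

Static thresholding saturates large values at the boundary, while dynamic thresholding rescales the whole prediction, which keeps more of the relative contrast.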
@jamesfogwill1455 5 months ago ⁺¹
Roughly how long does an epoch take for you? I am using an RTX 3060 mobile and getting one epoch every 24 minutes. Also, I cannot work with a batch size greater than 8 and an image size greater than 64 because it overfills my GPU's 6 GB of memory. I thought this was excessive for such a small batch and image size?
@xuefengdu6926 2 years ago
thanks for your amazing efforts!
@talktovipin1 2 years ago
Looking forward to a video on Classifier Guidance as well. Thanks.
@pedrambazrafshan9598 2 months ago
Could you also make a video on how to implement DDIM? Or make a GitHub repository about it?
@Kooshiar 1 year ago
best diffusion youtube
@김남형산업공학과한양 2 years ago
Thank you for sharing!
@rachelgardner1799 1 year ago
Fantastic video!
@nez2884 2 years ago
awesome implementation!
@kashishmathukiya8091 1 year ago
8:38 In the UNet section, how do you decide on the number of input and output channels for the Down and Up classes? Why just 64, 128, etc.?
@outliier 1 year ago ⁺¹
People just go with powers of 2 usually. And usually you go to more channels in the deeper layers of the network.
@kashishmathukiya8091 1 year ago
@outliier oh okay got it. Thank you so much for clearing that up and for the video! I had seen so many videos / read articles on diffusion, but yours were the best and explained everything that others considered prerequisites!! Separating the paper explanation and implementation was really helpful.
@junghunkim8467 1 year ago
it is very helpful!! You are a genius.. :) thank you!!
@LeeYuanZ 1 year ago
Thank you so much for this amazing video! You mention that the first DDPM paper shows the lower bound formulation is not necessary. Could you tell me the specific place in the paper? Thanks!
@doctorshadow2482 1 year ago
Thank you for the review. So, what is the key to making the step from a text description to an image? Can you please pinpoint where it is explained?
@Gruell 8 months ago
Sorry if I am misunderstanding, but at 19:10, shouldn't the code be:
"uncond_predicted_noise = model(x, t, None)" instead of "uncond_predicted_noise = model(x, labels, None)"
Also, according to the CFG paper's formula, shouldn't the next line be: "predicted_noise = torch.lerp(predicted_noise, uncond_predicted_noise, -cfg_scale)" under the definition of lerp?
One last question: have you tried using L1Loss instead of MSELoss? In my implementation, L1 loss performs much better (although my implementation is different from yours). I know the ELBO term expands to essentially an MSE term with respect to the predicted noise, so I am confused as to why L1 loss performs better for my model.
Thank you for your time.
@Gruell 8 months ago
Great videos by the way
@Gruell 8 months ago
Ah, I see you already fixed the first question in the codebase
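Editor's note on the lerp questions in this thread: the two forms can be compared numerically. The sketch below uses scalars in plain Python, with a local `lerp` that follows the definition of `torch.lerp` (start + weight * (end - start)). It assumes the video's line is `torch.lerp(uncond_predicted_noise, predicted_noise, cfg_scale)`; the numbers are made up.

```python
def lerp(start, end, weight):
    # Same definition as torch.lerp: start + weight * (end - start).
    return start + weight * (end - start)


def cfg_paper(cond, uncond, w):
    # Eq. 6 of the classifier-free guidance paper: (1 + w) * cond - w * uncond.
    return (1 + w) * cond - w * uncond


cond, uncond = 0.8, 0.2  # made-up scalar "noise predictions"
cfg_scale = 3.0

video_style = lerp(uncond, cond, cfg_scale)     # assumed form from the video
comment_style = lerp(cond, uncond, -cfg_scale)  # form proposed in the comment

# Both are instances of Eq. 6; they differ only in how cfg_scale maps to w.
same_as_paper_w_minus_1 = cfg_paper(cond, uncond, cfg_scale - 1)
same_as_paper_w = cfg_paper(cond, uncond, cfg_scale)
```

Under these assumptions the video's form equals Eq. 6 with w = cfg_scale - 1, and the comment's form equals Eq. 6 with w = cfg_scale, so neither has the conditioned and unconditioned terms reversed.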
@WendaoZhao 7 months ago ⁺¹
one CRAZY thing to take from this code (and video)
GREEK LETTERS CAN BE USED AS VARIABLE NAMES IN PYTHON
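Editor's note: Python 3 accepts non-ASCII letters in identifiers (PEP 3131), which is why Greek names work. A tiny sketch with made-up values, reusing the one-step noising formula only as an example:

```python
import math

β = 0.02   # noise variance for one step (made-up value)
α = 1 - β
ε = 0.5    # a fixed "noise" value, chosen for illustration
x0 = 1.0

# One noising step written with the same symbols as the math.
x1 = math.sqrt(α) * x0 + math.sqrt(β) * ε
```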
@ParhamEftekhar 7 months ago
Awesome video.
@mic9657 1 year ago
great video. please can you list the creators of the other helpful videos at 00:52? thanks
@outliier 1 year ago
They are from Yannic Kilcher (on the right side), the one in the lower left is from AICoffeeBreak, the one in the top right corner is the first video that comes up when you google "diffusion models explained", and I forgot the middle one, sorry. But it shouldn't be hard to find
@mcpow6614 1 year ago ⁺²
Can you do one for TensorFlow too? Btw, very good explanation
@4_alokk 1 year ago ⁺¹
How did you learn so much?
@outliier 1 year ago ⁺¹
I read a lot of papers and watched a lot of tutorials
@luchaoqi 2 years ago ⁺¹
Awesome! How did you type Ɛ in code?
@Sherlock14-d6x 6 months ago
Why is the bias off in the initial convolutional block?
@orestispapanikolaou9798 2 years ago
Great video!! You make coding seem like playing super mario 😂😂
@gordondou2286 1 year ago
Can you please explain how to use the WoodFisher technique to approximate second-order gradients? Thanks
@Neptutron 2 years ago ⁺¹
Thank you!!
@susdoge3767 11 months ago
I'm having a hard time understanding the mathematical and code aspects of diffusion models, although I have a good high-level understanding... any good resource I can go through? I'd appreciate it
@houbenbub 2 years ago
This is GOLD
@janevirahman9904 11 months ago
Hi, I want to use a single underwater image dataset. What changes do I have to make to the code?
@signitureDGK 1 year ago
Very cool. How would DDIM models be different? Do they use a deterministic denoising sampler?
@outliier 1 year ago
yes indeed
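Editor's note on the DDIM exchange above: with the stochasticity parameter eta set to 0, a DDIM update adds no fresh noise, so the step is a deterministic function of the current sample and the predicted noise. The sketch below is a scalar, plain-Python illustration under that assumption; `ddim_step` and the numeric values are hypothetical and not taken from the video's code.

```python
import math


def ddim_step(x_t, eps_pred, alpha_hat_t, alpha_hat_prev):
    # Estimate the clean sample from the current sample and the predicted noise.
    x0_pred = (x_t - math.sqrt(1 - alpha_hat_t) * eps_pred) / math.sqrt(alpha_hat_t)
    # Re-noise the estimate to the previous (less noisy) level using the same eps.
    # No random term is added, which is what makes the step deterministic (eta = 0).
    return math.sqrt(alpha_hat_prev) * x0_pred + math.sqrt(1 - alpha_hat_prev) * eps_pred


# Made-up values: build x_t from a known x0 and eps, then step with a perfect prediction.
x0, eps, alpha_hat_t = 0.5, -0.3, 0.4
x_t = math.sqrt(alpha_hat_t) * x0 + math.sqrt(1 - alpha_hat_t) * eps
x_prev = ddim_step(x_t, eps, alpha_hat_t, alpha_hat_prev=1.0)
```

Because nothing random enters the update, the previous timestep does not have to be t - 1, which is why DDIM can skip steps and sample with far fewer model calls.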
@maybritt-sch 2 years ago ⁺¹
Great videos on diffusion models, very understandable explanations! For how many hours did you train it? I tried adjusting your conditional model and training it with a different dataset, but it seems to take forever :D
@outliier 2 years ago ⁺¹
Yea it took quite long. On the 3090 it trained a couple days (2-4 days I believe)
@maybritt-sch 2 years ago
@outliier Thanks for the feedback. Ok, seems like I didn't make a mistake, but only need more patience!
@outliier 2 years ago
@maybritt-sch Yea. Let me know how it goes or if you need help
@anonymousperson9757 2 years ago ⁺¹
Thank you so much for this amazing video! You mention that changing the original DDPM to a conditional model should be as simple as adding in the condition at some point during training. I was just wondering if you had any experience with using DDPM to denoise images? I was planning on conditioning the model on the input noisy data by concatenating it to yt during training. I am going to try and play around with your github code and see if I can get something to work with denoising. Wish me luck!
@spartancoder 2 years ago
This video is priceless.
@spyrosmarkesinis443 2 years ago
Amazing stuff!
@versusFliQq 1 year ago
Really nice video! I also enjoyed your explanation video - great work in general :)
However, I noticed at around 5:38, you are defining sample_timesteps with low=1. I am pretty sure that this is wrong, as Python indexes at 0 meaning you skip the first noising step every time you access alpha, alpha_cumprod etc. Correct me if I am wrong but all the other implementations also utilise zero-indexing.
@arpanpoudel 1 year ago
This function samples the timesteps for the denoising steps. Time 0 is the original image itself, so there is no point in sampling timestep 0.
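Editor's note on the indexing discussion above: whether index 0 means "the original image" depends on how the schedule arrays are laid out. The plain-Python sketch below assumes a linear schedule with `noise_steps=1000`, `beta_start=1e-4` and `beta_end=0.02`, stored in a zero-indexed list. These values and the helper `sample_timesteps` are assumptions for illustration, not a copy of the video's code.

```python
import random

noise_steps, beta_start, beta_end = 1000, 1e-4, 0.02  # assumed schedule

# Linear beta schedule stored zero-indexed, one entry per noising step.
beta = [beta_start + (beta_end - beta_start) * i / (noise_steps - 1)
        for i in range(noise_steps)]

# Cumulative product of (1 - beta): alpha_hat[i] is the signal kept after i + 1 steps.
alpha_hat = []
running = 1.0
for b in beta:
    running *= 1.0 - b
    alpha_hat.append(running)


def sample_timesteps(n, low=1):
    # random.randrange excludes the upper bound, like torch.randint.
    return [random.randrange(low, noise_steps) for _ in range(n)]


random.seed(0)
timesteps = sample_timesteps(1000)
```

With this layout `alpha_hat[0]` equals 1 - beta_start rather than 1, so index 0 already corresponds to one (very small) noising step, and sampling with `low=1` never trains on it. Given how small that first step is, the practical effect is probably minor.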
@cryoemenjoyer 2 years ago
6:57 Why is the formula ... + torch.sqrt(beta) instead of the calculated posterior variance like in the paper?
@outliier 2 years ago
Which paper are you referring to? In the first paper, you would just set the variance to beta, and since you add std * noise, you take sqrt(beta)
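Editor's note on the variance question above: the DDPM paper discusses two choices for the sampling variance, beta_t itself and the posterior variance (1 - alpha_hat_{t-1}) / (1 - alpha_hat_t) * beta_t. The scalar sketch below, in plain Python, shows one reverse step with either choice. The function names and numbers are illustrative assumptions.

```python
import math


def posterior_variance(beta_t, alpha_hat_t, alpha_hat_prev):
    # Variance of q(x_{t-1} | x_t, x_0); always a bit smaller than beta_t.
    return (1 - alpha_hat_prev) / (1 - alpha_hat_t) * beta_t


def ddpm_step(x_t, eps_pred, beta_t, alpha_hat_t, z, variance=None):
    alpha_t = 1 - beta_t
    if variance is None:
        variance = beta_t  # the simpler choice: sigma_t^2 = beta_t
    # Mean of the reverse step, computed from the predicted noise.
    mean = (x_t - (1 - alpha_t) / math.sqrt(1 - alpha_hat_t) * eps_pred) / math.sqrt(alpha_t)
    # Add std * noise, hence the square root of the variance.
    return mean + math.sqrt(variance) * z


beta_t, alpha_hat_prev = 0.02, 0.5           # made-up values
alpha_hat_t = alpha_hat_prev * (1 - beta_t)  # cumulative product one step later
pv = posterior_variance(beta_t, alpha_hat_t, alpha_hat_prev)
x_prev = ddpm_step(1.0, 0.0, beta_t, alpha_hat_t, z=0.0)
```

The two variances differ only in the noise scale of each step; the mean of the update is the same either way.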
@sweetautumnfox 10 months ago
With this training method, wouldn't there be a possibility of some timesteps not being trained in an epoch? Wouldn't it be better to shuffle the whole list of timesteps and then sample sequentially with every batch?
@kerenye955 1 year ago
Great video!
@ovrava 2 years ago
Great video!
What data did you train your model on again?
@homataha5626 2 years ago
Thank you for the video.
How can we use diffusion model for inpainting?
@chyldstudios 2 years ago ⁺²
Where is the link to the code?
@outliier 2 years ago ⁺²
sorry I totally forgot to put the link in the description. I updated it now, but here is the link too: github.com/dome272/Diffusion-Models-pytorch
@chyldstudios 2 years ago
@outliier There are Diffusion implementations out there that are a lot longer but it also makes it harder to understand what is happening. You removing all the unnecessary parts and just focusing on the absolute minimum is much better in my opinion. Well done.
@egoistChelly 1 year ago
I think your code breaks when image_size is adjusted?
@UnbelievableRam 9 months ago
Hi! Can you please explain why the output is two stitched images?
@outliier 9 months ago
What do you mean by two stitched images?
@muhammadawais2173 1 year ago
Thanks for the easiest implementation. Could you please tell us how to compute the FID and IS scores for these images?
@outliier 1 year ago ⁺¹
I think you would just sample 10-50k images from the trained model and then take 10-50k images from the original dataset and then calculate the FID and IS
@muhammadawais2173 1 year ago
@outliier thanks
@nomaannafi7561 1 year ago
How can I increase the size of the generated images here?
@SkyHighBeyondReach 5 months ago
Thanks a lot :)
@Laszer271 1 year ago
There is a slight bug at 19:11
it should be
uncond_predicted_noise = model(x, t, None)
and not
uncond_predicted_noise = model(x, labels, None)
@outliier 1 year ago
Yes correct. Good catch
@satpalsinghrathore2665 2 years ago
Super cool
@sandravu1541 2 years ago
great video, you got one new subscriber
@remmaria 2 years ago
Your videos are a blessing. Thank you very much!!! Have you tried using DDIM to accelerate predictions? Or any other idea to decrease the number of steps needed?
@outliier 2 years ago ⁺¹
I have not tried any speedups. But feel free to try it out and tell me / us what works best. In the repo I linked a fork which implements a couple of additions that make the training etc. faster. You can check that out too here: github.com/tcapelle/Diffusion-Models-pytorch
@remmaria 2 years ago
@outliier Thank you! I will try it for sure.
@agiengineer 1 year ago
Can you please tell me how much time was needed to train on these 3000 images for 500 epochs?
@khyatinkadam8032 6 months ago
hey can we use an image as a condition
@Soso65929 10 months ago
So the process of adding noise and removing it happens in a loop
@decode168 1 month ago
Very great topic. Could you please make another video on generating images that contain text? Example: cat with “hello world “, so the model could generate the picture. Thanks ❤
@outliier 1 month ago
@decode168 usually this is learnt automatically when you train a big model on a lot of data. So there is no specific technique for this
@jinhengfeng6440 2 years ago
terrific!
@marcotommasini5600 2 years ago
Great video, thanks for making it. I started working with diffusion models very recently and I used your implementation as the base for my model. I am currently facing a problem: the MSE loss starts very close to 1 and stays there, varying between 1.0002 and 1.0004, so the model is not training properly. Did you face any issue like this one? I am using the MNIST dataset to train the network, since I wanted to first test it with a less complex dataset.
@justinsong3506 1 year ago
I am facing similar problems. I ran the experiment on the CIFAR10 dataset. The MSE loss starts decreasing normally, but at some point the loss increases to 1 and never decreases again.
@zedtarwu3074 2 years ago
Great video! How long did it take to train the models?
@outliier 2 years ago
About 3-4 days on an rtx 3090.
@ankanderia4999 9 months ago
`
x = torch.randn((n, 3, self.img_size, self.img_size)).to(self.device)
predicted_noise = model(x, t)
`
In the Diffusion class, why do you create noise and pass that noise into the model to predict noise? Please explain.
@colintsang-ww6mz 1 year ago
Thank you very much for this very easy-to-understand implementation. I have one question: I don't understand the function def noise_images.
Assume that we have img_{0}, img_{1}, ..., img_{T}, which are obtained by adding noise iteratively. I understand that img_{t} is given by the formula "sqrt_alpha_hat * img_{0} + sqrt_one_minus_alpha_hat * Ɛ".
However, I don't understand the function "def noise_images(self, x, t)" in [ddpm.py].
It returns Ɛ, where Ɛ = torch.randn_like(x). So this is just a noise signal drawn directly from the normal distribution. I suppose this random noise is not related to the input image? That is because randn_like() returns a tensor with the same size as the input x, filled with random numbers from a normal distribution with mean 0 and variance 1.
In training, the predicted noise is compared to this Ɛ (line 80 in [ddpm.py]).
Why are we predicting this random noise? Shouldn't we predict the noise added at time t, i.e. "img_{t} - img_{t-1}"?
@Laszer271 1 year ago ⁺¹
I had the same misconception before. It was actually explained by "AI Coffee Break with Letitia" channel in a video titled "How does Stable Diffusion work? - Latent Diffusion Models EXPLAINED".
Basically, the model tries to predict the WHOLE noise added to the image to go from noised image to a fully denoised image in ONE STEP. Because it's a hard task to do, the model does not excel at that so at inference we denoise it iteratively, each time subtracting only a small fraction of the noise predicted by the model. In this way, the model produces much better quality samples. At least that's how I understood it :P
@rikki146 1 year ago
@Laszer271 While I understand it predicts the "whole noise", this "whole noise" is newly generated and I suppose the ground truth is (img_{t} - img_{0}).. still can't wrap my head around it.
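Editor's note on this thread: the freshly drawn Ɛ is not unrelated to the noised image, because the same Ɛ is used to build img_{t} in closed form. The scalar sketch below, in plain Python, shows that the training target can be recovered exactly from img_{0} and img_{t}. The function name `noise_image` and the values are made up for illustration.

```python
import math
import random


def noise_image(x0, alpha_hat_t, rng):
    # Draw fresh Gaussian noise and mix it with the clean value in one shot.
    eps = rng.gauss(0.0, 1.0)
    x_t = math.sqrt(alpha_hat_t) * x0 + math.sqrt(1 - alpha_hat_t) * eps
    return x_t, eps


rng = random.Random(0)
x0, alpha_hat_t = 0.7, 0.3  # made-up pixel value and cumulative alpha
x_t, eps = noise_image(x0, alpha_hat_t, rng)

# The training target eps is fully determined by the pair (x0, x_t).
recovered_eps = (x_t - math.sqrt(alpha_hat_t) * x0) / math.sqrt(1 - alpha_hat_t)
```

So the ground truth is a rescaled version of the difference between the noised and the clean image, (img_{t} - sqrt_alpha_hat * img_{0}) / sqrt_one_minus_alpha_hat, rather than the single-step difference img_{t} - img_{t-1}.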
@rawsok 1 year ago
You do not use any LR scheduler. Is this intentional? My understanding is that EMA is a functional equivalent of an LR scheduler, but then I do not see any comparison between EMA and e.g. a cosine LR scheduler. Can you elaborate more on that?
@wizzy1996pl 1 year ago
The last self-attention layer (64, 64) changes my training time from 5 minutes to hours per epoch, do you know why?
I am training on a single RTX 3060 Ti GPU
@LonLat1842 2 years ago
Nice tutorial
@pedrambazrafshan9598 1 year ago
@outliier Do you think there is a way to run the code with a 3060 GPU on a personal desktop? I get the error message: CUDA out of memory.
@MrScorpianwarrior 6 months ago
Random person 6 months later, but you could try decreasing the batch size during training. Your results may not look like what he got in the video though!
