Diffusion Models - Live Coding Tutorial

dtransposed

24K views


Published: Sep 22, 2024

Comments • 35

@outroutono4937 4 months ago ⁺¹
I have looked at almost every video on this subject and this is by far the best approach: it's simple enough to be well understood, but it gives all the tools to build more advanced models. I wish you could do a remake of this one, because sometimes the code snippet is out of frame and sometimes it's hard to read because of the font size. Thx a lot for this upload!
@dtransposed79 4 months ago
Thank you!
@danielfirebanks4973 1 year ago ⁺¹³
Good tutorial, just wished that we could see the screen while you're coding, as most of the new lines you added were off-screen :/ Keep it up!
@VitaliyHAN 1 year ago ⁺⁴
Better font, but I still can't read it, not only on the phone, which is my main content-consuming device, but even on my 13-inch MacBook. Thank God I have a 55-inch TV I can watch on. Even with such struggles I will continue to watch such a diamond of a video!
Thanks for the video! Great content!
@dtransposed79 1 year ago
Thank you for your comment!
@dontitube1394 1 year ago
@@dtransposed79 One more tip: at 34:36, and sometimes later on, I cannot read the code you were writing. I mean, it is literally not in the video. But a very informative video.
@dtransposed79 1 year ago
@@dontitube1394 Yeah. I think I will not be changing it now. A bit of a hiccup, but you can always look the code up in the attached notebook. Sorry for that tho.
@dontitube1394 1 year ago ⁺¹
@@dtransposed79 yeah no worries, it was more meant as a tip for future videos
@adeolaogunleye7965 1 year ago ⁺²
Thanks man, I really appreciate your work
@heera_ai 1 year ago ⁺¹
A great tutorial to start with!!!
@777chichi 6 months ago
Thanks a lot! I really appreciate it.
This tutorial explains things clearly. Awesome!
Hope to see more tutorial videos on your YouTube channel, thanks.
@bbbaaa9421 1 year ago
Thanks for sharing your work with us. Much appreciated!
@thepresistence5935 1 year ago ⁺¹
Awesome one!
@TD_Dev 9 months ago
Thanks a lot for your tutorial!
@kanakraj3198 1 year ago ⁺¹
Great tutorial. Thanks for sharing.
Please make slightly more advanced tutorials, like conditional (image or text) generation of images using diffusion.
I see that there are very few advanced tutorials by any YouTuber.
@paneercheeseparatha 1 year ago ⁺¹
You should have zoomed in on the screen more so that it's properly visible. Still appreciate your efforts! Nice vid.
@aleksandrrybnikov8701 1 year ago
Hi, Damian! Nice videos!
@dtransposed79 1 year ago
Thanks for dropping by Sasha!
@duyquangnguyen2664 9 months ago
Thanks for this video. Can you make a video about applying this project to high-resolution images?
@nqvst 1 year ago ⁺¹
Coming from a programming background, I always find it very strange to name variables by generic Greek letters or just X, Y. I am not criticizing your video specifically; it is a pattern that is very widespread. But for example, you are naming the first parameter to the forward_diffusion function "x0". Is it to save space? Is it because you think it is easier to reference it from the mathematical formulas?
In my mind it would be much clearer if "x0" were named "image". Or am I misunderstanding your explanation, maybe?
As I mentioned, I don't think your video is bad. I'm just curious as to why it is so common that code related to machine learning is so generically named.
@dtransposed79 1 year ago ⁺¹
Interesting comment. I agree - some people indeed use more "mathematical" names, and others use more generic ones. Using the "mathematical" names comes from the fact that much of the ML code you can find online implements logic showcased in a research paper. Since ML research borrows from mathematical notation, it is often convenient for the code to use the same notation, as long as the reader has the same context (read the paper, understand what the symbols mean). If you are confused, I would advise you to read the paper, and even if some concept confuses you, just try to grasp the high-level meaning of the symbols. This would definitely help you with reading (and writing) your ML code in the future!
@nqvst 1 year ago
@@dtransposed79 Thanks for the reply! Yes, it makes sense. If you understand the concept from reading the equations, it is more convenient to reuse the notation in the code.
And while following along with this video I realized that some of the variable names get really long if they are to be considered "good" variable names.
betas -> noise_amount
alphas -> preserved_image_data
alpha_hat_t -> cumulative_preserved_image_data_at_step
I think I'm just frustrated over not being fluent in the math language.
anyways, thanks for the video!
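To make the naming discussion above concrete, here is a minimal sketch of a forward diffusion step that keeps the paper-style names and notes the descriptive alternatives from this thread in comments. The linear schedule (1e-4 to 0.02 over 1000 steps) follows the DDPM paper; the tensor shapes and the exact function body are assumptions for illustration, not necessarily the code from the video.

```python
import torch

T = 1000  # number of diffusion steps (assumed, as in the DDPM paper)

betas = torch.linspace(1e-4, 0.02, T)      # "noise_amount" added at each step
alphas = 1.0 - betas                       # "preserved_image_data" per step
alpha_hat = torch.cumprod(alphas, dim=0)   # "cumulative_preserved_image_data"


def forward_diffusion(x0, t):
    """Sample x_t from q(x_t | x_0) for a batch of images x0 and integer steps t."""
    noise = torch.randn_like(x0)
    # One cumulative coefficient per batch element, broadcast over (C, H, W).
    alpha_hat_t = alpha_hat[t].view(-1, 1, 1, 1)
    x_t = alpha_hat_t.sqrt() * x0 + (1.0 - alpha_hat_t).sqrt() * noise
    return x_t, noise


# Usage: a dummy batch of four 3x8x8 "images" at four different steps.
x0 = torch.zeros(4, 3, 8, 8)
t = torch.tensor([0, 10, 500, 999])
x_t, noise = forward_diffusion(x0, t)
```

The short names map one-to-one onto the symbols in the paper (beta_t, alpha_t, and the cumulative product of the alphas), which is the convenience described in the reply above.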
@МихаилЮрков-т1э 1 year ago ⁺¹
Thanks for the tutorial.
Why is posterior_variance_t = betas_t? Shouldn't it be equal to betas_t*(1 - alphas_cumprod_t_minus_1)/(1 - alphas_cumprod_t) according to [Lil' Log]?
@dtransposed79 1 year ago
Excellent question. Please refer to the original paper: arxiv.org/abs/2006.11239 Section 3.2. The short version: those two are the extreme values that we can set the posterior variance to. The choice depends on the assumptions about x_0. My choice assumes that x_0 is sampled from a Gaussian, x_0 ~ N(0, I), while the other choice is optimal for x_0 deterministically set to one point.
@МихаилЮрков-т1э 1 year ago
@@dtransposed79 Yes, it's clear now. Thanks for the detailed answer.
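The two variance choices discussed in this thread can be compared numerically. The sketch below assumes the linear schedule from the DDPM paper and uses the convention that the cumulative product before the first step equals 1; the variable names are illustrative and not taken from the notebook.

```python
import torch
import torch.nn.functional as F

T = 1000  # assumed number of steps
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
# Shift right by one step; the product over zero steps is defined as 1.
alphas_cumprod_prev = F.pad(alphas_cumprod[:-1], (1, 0), value=1.0)

# Choice 1 (used in the video): sigma_t^2 = beta_t.
posterior_variance_upper = betas
# Choice 2 (asked about above): beta_t * (1 - alpha_hat_{t-1}) / (1 - alpha_hat_t).
posterior_variance_lower = betas * (1.0 - alphas_cumprod_prev) / (1.0 - alphas_cumprod)

# The second choice never exceeds the first, and it is exactly 0 at the first step.
gap = posterior_variance_upper - posterior_variance_lower
```

For this schedule the gap between the two is largest at the earliest steps and shrinks as t grows, which fits the paper's report that both choices gave similar results.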
@brunokemmer 1 year ago ⁺¹
Is there a difference between `result = alpha_hat.gather(-1, t)` and `result = alpha_hat[t]` ?
@dontitube1394 1 year ago ⁺¹
No, there is not, at least for this kind of case. But for more information you can look at the documentation of torch.gather, which even states the equivalent array indexing.
@dtransposed79 1 year ago
@@dontitube1394 Yeah, that's right. Nevertheless, I suggest learning and using torch.gather. It is a really useful, powerful and efficient function.
@anshumansinha5874 1 year ago
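A quick check of the equivalence discussed above, as a minimal sketch. The `alpha_hat` schedule and the `table`/`cols` tensors are made-up values for illustration.

```python
import torch

# 1-D case from the question: both expressions pick alpha_hat at the given steps.
alpha_hat = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, 1000), dim=0)
t = torch.tensor([0, 10, 500, 999])  # gather requires an int64 index tensor

via_gather = alpha_hat.gather(-1, t)
via_index = alpha_hat[t]
same_1d = torch.equal(via_gather, via_index)

# 2-D case: gather picks one column per row, out[i][j] = table[i][cols[i][j]].
table = torch.arange(12.0).view(3, 4)
cols = torch.tensor([[0], [2], [3]])
picked = table.gather(1, cols)  # shape (3, 1)
# The equivalent advanced-indexing expression needs an explicit row index.
picked_by_index = table[torch.arange(3), cols.squeeze(1)].unsqueeze(1)
```

In the 1-D case the two forms are interchangeable; the difference only shows up with more dimensions, where `gather` selects along one axis while keeping the index tensor's shape.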
Hi, thanks for the video. But can you explain the part on how you introduce the positional encoding to the network? Also, can this model work with a feed-forward neural network rather than a U-Net?
@dtransposed79 1 year ago ⁺¹
Positional encodings in this paper directly mimic those introduced in the "Attention Is All You Need" paper. There are plenty of resources online that explain how that works.
In terms of the architecture, in theory, you could probably use any encoder-decoder architecture I think. But for images, UNet is the most fitting.
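For readers who want to see the idea in code, here is a minimal sketch of a sinusoidal timestep embedding in the style of "Attention Is All You Need". The function name `sinusoidal_embedding`, the even `dim` requirement, and the layout (all sines first, then all cosines, rather than interleaved) are assumptions for illustration and may differ from the notebook.

```python
import math

import torch


def sinusoidal_embedding(t, dim):
    """Map integer timesteps t of shape (B,) to embeddings of shape (B, dim).

    Assumes dim is even. Frequencies follow 10000^(-i / (dim / 2)).
    """
    half = dim // 2
    exponents = torch.arange(half, dtype=torch.float32) / half
    freqs = torch.exp(-math.log(10000.0) * exponents)    # shape (half,)
    args = t.float().unsqueeze(1) * freqs.unsqueeze(0)   # shape (B, half)
    return torch.cat([args.sin(), args.cos()], dim=1)    # shape (B, dim)


# Usage: embed three timesteps into 8 dimensions.
emb = sinusoidal_embedding(torch.tensor([0, 1, 50]), 8)
```

A common pattern, stated here as an assumption rather than a description of the video's model, is to pass this embedding through a small linear layer and add the result to the feature maps of each U-Net block, so every block knows which diffusion step it is denoising.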
@chiscoduran9517 1 year ago
Can you make an Image-to-Image tutorial?
@dtransposed79 1 year ago
Could you be more concrete? Image-to-Image can mean multiple things.
@chiscoduran9517 1 year ago
@@dtransposed79 For example, a model capable of changing the colors of certain objects in an image, where the input is an image and the output is the same image with changes.
@playmaker2404 1 year ago
Can you say why the output was not as fascinating, and what can be done from here to make the output clearer? @dtransposed79
@WaiPanTam 1 year ago
Thanks man, I really appreciate your work
