I have looked at almost every video on this subject and this is by far the best approach. It's simple enough to be well understood, but it gives you all the tools to build more advanced models. I wish you could do a remake of this one, because sometimes the code snippet is out of frame and sometimes it's hard to read because of the font size. Thx a lot for this upload!
Thank you!
Good tutorial, I just wish we could see the screen while you're coding, as most of the new lines you added were off-screen :/ Keep it up!
Better font, but I still can't read it, not only on my phone, which is my main device for watching content, but even on my 13-inch MacBook. Thank God I have a 55-inch TV I can watch on. Even with such struggles I will continue to watch such a diamond of a video!
Thanks for the video! Great content!
Thank you for your comment!
@@dtransposed79 One more tip: at 34:36, and at some points after that, I cannot read the code you were writing. I mean it is literally not in the frame. Still, a very informative video.
@@dontitube1394 Yeah. I think I will not be changing it now. A bit of a hiccup, but you can always look the code up in the attached notebook. Sorry for that tho.
@@dtransposed79 Yeah, no worries, it was meant more as a tip for future videos.
Thanks man, I really appreciate your work
A great tutorial to start with!!!
Thanks a lot! I really appreciate it.
This tutorial explains things clearly. Awesome!
Hope to see more tutorial videos on your YouTube channel, thanks.
Thanks for sharing your work with us. Appreciate it!
Awesome one!
Thanks a lot for your tutorial!
Great tutorial. Thanks for sharing.
Please make slightly more advanced tutorials, like conditional (image or text) generation of images using diffusion.
I see that there are very few advanced tutorials by any YouTuber.
You should have zoomed in on the screen more so that it's properly visible. Still, I appreciate your efforts! Nice vid.
Hi, Damian! Nice videos!
Thanks for dropping by Sasha!
Thanks for this video. Can you make a video about applying this project to high-resolution images?
Coming from a programming background, I always find it very strange to name variables with generic Greek letters or just X, Y. I am not criticizing your video specifically; it is a very widespread pattern. But for example, you are naming the first parameter of the forward_diffusion function "x0". Is it to save space? Is it because you think it is easier to reference it from the mathematical formulas?
In my mind it would be much clearer if "x0" were named "image". Or am I maybe misunderstanding your explanation?
As I mentioned, I don't think your video is bad. I'm just curious as to why it is so common that code related to machine learning is generally so generically named.
Interesting comment. I agree: some people indeed use more "mathematical" names, and others use more generic ones. The "mathematical" names come from the fact that much of the ML code you can find online implements logic presented in a research paper. Since ML research borrows mathematical notation, it is often convenient for the code to use the same notation, as long as reader and author share the same context (read the paper, understand what the symbols mean). If you are confused, I would advise you to read the paper, and even if some concept remains unclear, just try to grasp the high-level meaning of the symbols. This would definitely help you with reading (and writing) your ML code in the future!
@@dtransposed79 Thanks for the reply! Yes, it makes sense. If you understand the concept from reading the equations, it is more convenient to reuse the notation in the code.
And while following along with this video I realized that some of the variable names get really long if they are to be considered "good" variable names.
betas -> noise_amount
alphas -> preserved_image_data
alpha_hat_t -> cumulative_preserved_image_data_at_step
I think I'm just frustrated over not being fluent in the language of math.
Anyway, thanks for the video!
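For readers weighing the two naming styles discussed above, here is a minimal sketch of the standard DDPM forward step written with descriptive names, with the paper's symbols kept in comments. It is not the code from the video: it works on a flat list of pixel values in plain Python, and the names `cumulative_signal_kept`, `signal_scale` and `noise_scale` are hypothetical, chosen only for illustration.

```python
import math
import random


def forward_diffusion(image, cumulative_signal_kept, rng=random):
    """Noise an image in one shot.

    image                  -> x_0 in the paper (here: a flat list of floats)
    cumulative_signal_kept -> alpha_hat_t, the cumulative product of the alphas
    Returns (noisy_image, noise), i.e. (x_t, epsilon).
    """
    # epsilon ~ N(0, 1), one sample per pixel
    noise = [rng.gauss(0.0, 1.0) for _ in image]
    signal_scale = math.sqrt(cumulative_signal_kept)       # sqrt(alpha_hat_t)
    noise_scale = math.sqrt(1.0 - cumulative_signal_kept)  # sqrt(1 - alpha_hat_t)
    # x_t = sqrt(alpha_hat_t) * x_0 + sqrt(1 - alpha_hat_t) * epsilon
    noisy_image = [
        signal_scale * pixel + noise_scale * eps
        for pixel, eps in zip(image, noise)
    ]
    return noisy_image, noise
```

Keeping the symbol in a comment next to each descriptive name is one way to get readable code without losing the link to the equations.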
Thanks for the tutorial.
Why is posterior_variance_t = betas_t? Shouldn't it be equal to betas_t*(1 - alphas_cumprod_t_minus_1)/(1 - alphas_cumprod_t), according to [Lil' Log]?
Excellent question. Please refer to the original paper: arxiv.org/abs/2006.11239 Section 3.2. The short version: those two are the two extreme values that we can set the posterior variance to. The choice depends on the assumptions about x_0. My choice is optimal when x_0 is sampled from a standard Gaussian, x_0 ~ N(0, I), while the other choice is optimal when x_0 is deterministically set to one point.
@@dtransposed79 Yes, it's clear now. Thanks for the detailed answer.
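For anyone following this thread, here is a minimal sketch that computes both variance choices side by side. It is not the notebook's code: the linear schedule (1e-4 to 0.02 over 1000 steps, the values reported in the DDPM paper) and the helper names `linear_beta_schedule` and `posterior_variances` are assumptions made for illustration.

```python
def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed schedule: betas spaced linearly from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def posterior_variances(betas):
    """Return (upper, lower): the two variance choices for every timestep.

    upper[t] = beta_t
    lower[t] = beta_t * (1 - alpha_hat_{t-1}) / (1 - alpha_hat_t)
    """
    alphas_cumprod = []
    running_product = 1.0
    for beta in betas:
        running_product *= 1.0 - beta  # alpha_t = 1 - beta_t
        alphas_cumprod.append(running_product)
    # alpha_hat at "t - 1"; taken to be 1.0 before the first step
    alphas_cumprod_prev = [1.0] + alphas_cumprod[:-1]
    upper = list(betas)
    lower = [
        beta * (1.0 - prev) / (1.0 - cur)
        for beta, prev, cur in zip(betas, alphas_cumprod_prev, alphas_cumprod)
    ]
    return upper, lower


betas = linear_beta_schedule(1000)
upper, lower = posterior_variances(betas)
```

Under this assumed schedule the two choices differ most at the start of the chain (the lower choice is exactly 0 at the first step) and become nearly identical towards the last step, which is consistent with either choice being usable in practice.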
Is there a difference between `result = alpha_hat.gather(-1, t)` and `result = alpha_hat[t]` ?
No, there is not, at least for this kind of case. For more information you can look at the documentation of torch.gather, which even states the equivalent array indexing.
@@dontitube1394 Yeah, that's right. Nevertheless, I suggest learning and using torch.gather. It is a really useful, powerful and efficient function.
Hi, thanks for the video. Can you explain how you introduce the positional encoding to the network? Also, can this model work with a feed-forward neural network rather than a U-Net?
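A small check of the claim above, as a sketch under stated assumptions: `alpha_hat` here is a stand-in 1-D tensor (not the schedule from the video) and `t` is a 1-D batch of integer timesteps. The second half uses a made-up 2-D `table` to show a case where `gather` and plain `tensor[index]` are no longer interchangeable.

```python
import torch

# Stand-in values for the cumulative schedule; not the video's numbers.
alpha_hat = torch.linspace(0.99, 0.01, steps=10)
t = torch.tensor([0, 3, 9])  # batch of timesteps, dtype int64

# 1-D case: both forms pick alpha_hat[t[i]] for every i.
via_gather = alpha_hat.gather(-1, t)
via_index = alpha_hat[t]
same_in_1d = torch.equal(via_gather, via_index)

# 2-D case: gather picks one column per row, following `index` elementwise.
table = torch.arange(12.0).reshape(3, 4)
index = torch.tensor([[0], [2], [1]])
per_row = table.gather(1, index)  # shape (3, 1)
# The equivalent advanced indexing needs an explicit row index as well.
per_row_indexed = table[torch.arange(3), index.squeeze(1)]  # shape (3,)
```

So for a 1-D schedule and a 1-D `t` the two spellings give the same tensor; the difference only shows up once the source tensor has more dimensions.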
Positional encodings in this paper directly mimic those introduced in the "Attention Is All You Need" paper. There are plenty of resources online that explain how that works.
In terms of the architecture, in theory, you could probably use any encoder-decoder architecture I think. But for images, UNet is the most fitting.
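As a rough illustration of the sinusoidal encoding from "Attention Is All You Need" applied to a diffusion timestep, here is a plain-Python sketch. The function name `sinusoidal_embedding` and the interleaved sin/cos layout are assumptions; the video's implementation may lay out the vector differently (for example, all sines followed by all cosines).

```python
import math


def sinusoidal_embedding(timestep, dim):
    """Encode an integer timestep as a vector of length `dim`.

    Even positions hold sin(timestep / 10000^(2i/dim)),
    odd positions hold the matching cosine.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    embedding = []
    for i in range(dim // 2):
        frequency = 1.0 / (10000 ** (2 * i / dim))
        embedding.append(math.sin(timestep * frequency))
        embedding.append(math.cos(timestep * frequency))
    return embedding
```

How this vector enters the network is not spelled out in the thread. A common pattern, stated here as an assumption rather than as the video's method, is to pass it through a small learned projection and add the result to the feature maps of each U-Net block.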
Can you make an image-to-image tutorial?
Could you be more concrete? Image-to-Image can mean multiple things.
@@dtransposed79 For example, a model capable of changing the colors of certain objects in an image, where the input is an image and the output is the same image with the changes applied.
Can you say why the output was not that fascinating, and what can be done from here to make the output clearer? @dtransposed79