Thank you for not shying away from the math. That level of detail is very much needed for us in industry.
How can a model that is only 3.2GB produce almost infinite image combinations from just a simple text prompt, with so many language variables? What I am interested in is how a prompt of, say, "monkey riding a bicycle" can produce something that visually represents the prompt. How are the data images tagged and categorized in training to do this? As creative people, we often say that an idea is still misty and not yet formed. What strikes me about this diffusion process is its similarity to how our minds seem to work at a creative level. We iterate and de-noise the concept until it becomes concrete, using a combination of imagination and logic. It is the same process that you described to arrive at the finished formula. What also strikes me about the images produced by these diffusion algorithms is that they look so creative and imaginative. Even artists are shocked when they see them for the first time and realize a machine made them. My line of thinking here is that we use two main tools to acquire and simulate knowledge and experience: images and language. Maybe this input is then stored in our memory in a way similar to a diffusion model. Logic, creativity and ideas are just a consequence of reconstituting this data in response to our current social or environmental needs. This could explain our thinking process and why our memory is of such low resolution. The de-noising process could also explain many human conditions such as depression, and even why we dream. This brings up the interesting question: could a diffusion model be created to simulate a human personality? Or, for that matter, to provide new speed-think concepts and formulas for solving a multitude of complex problems? The path would be 1) diffusion model idea/concept, 2) ask a language model like GPT-3 to check if it works, 3) feed back to the diffusion model and keep iterating until it does, in much the same way as de-noising a picture. Just a thought from a diffusion brain.
great thought
Thank you. Yours is the only page I've found that makes any attempt at explaining this aspect of generative AI. The part about starting with random noise, making a change, and comparing to the example db/training data is something all the pissed-off artists don't seem to understand.
You are one of the most top notch channels out there. Explanation, clarity, REFERENCING, Graphics, everything is 10 out of 10! Love it!
Extremely good video, not shying away from the math. More like this is needed.
Thank you for creating a video on this topic. I was wandering the web for a brief explanation of it.
By far best explanation of diffusion models
The quality is insane!!!!!! Thank you very much!
I've watched this video more than once... I learn a little more each time.
Great work, Ari!
Thank you for the enormous amount of effort you put into research communication. You're a real Grant Sanderson in the making ;)
The quality of high level ML content on youtube these days is insane. And you're right up there with the best my friend
Fascinating. I enjoyed it at 0.5x speed. I don’t really understand the math, but I appreciate going through this explanation. Maybe a version of this that walks through each step in even more detail, or with some visuals that explain the diffusion math without assuming knowledge of the notation? The objective being: explaining how diffusion works, more broadly why it works, and even more broadly what that means for what else might be built with diffusion. For example, using diffusion for generating videos.
I love how you had to watch this at half speed
Explaining the math in depth to someone new to the subject is like an entire university-level course in Bayesian probability and statistics. It's probably a better idea to brush up on the relevant concepts first and then return to the video when ready, as there's already a shortage of high-quality videos on advanced ML that assume some prerequisite knowledge.
@@GS-tk1hk Hi. This is the kind of comment that makes me aware of the current situation surrounding ML and related subjects. Thanks for the information. It appears that there's no easy route for learning about the inner workings of modern/popular ML paradigms; the current set of YouTube content is either hyper-simple (basic info, "hey this is cool" type stuff, GitHub demos, small examples, easy UI/frontends for advanced models, etc.) or there's content like this video with 100% in-depth information, mathematical data, graphing, explanation of the processes behind the scenes, etc...
I'm a CompSci grad who's worked in software dev for a decade. I love the work I've done in a dozen languages over time and understand most coding paradigms & statistics. However, I never really got into the mathematical notation of my hobbies. If I was writing an algorithmic piece in Python or Java, it would not have any real mathematical documentation associated with it.
Videos like this make me realise that there's a whole host of computer scientists who genuinely keep the "scientist" denomination alive. I just wish there was more content available for people who are new to ML but have absolutely zero experience of this level of mathematics outside of college-level algebra.
Agreed. I am also a graduate, but I don’t understand the maths behind it.
High quality video! It helps me a lot to better understand Diffusion Models. Thanks!
Thank you Ari! Your whole channel is gold 🙏
This is an amazing channel, very instructive, structured and easy to understand.
Instant sub, and I hope you make more videos over the coming months and years.
Best explanation of diffusion model. Thanks!
Very cool. I've been meaning to learn about diffusion models. All I knew was that they 1) are beating GANs recently and 2) do that noise -> to image mapping. Very interesting to see the details here. Exciting stuff.
Also, glad to see you're back making content :)
Thank you for the nice explanation of the diffusion model, though maybe one small question: at 13:43 of the video you mentioned "condition on the full clear context". Should it be "condition on the grey/cropped image"? If we already have the full clear image, there should be no need to use the diffusion model anymore.
Or maybe I just misunderstood the problem here.
Explained extremely clearly and succinctly! I'm sure you've put a lot of effort into making the video. THANKS!
As a physicist, it's fascinating that physics concepts are used in deep learning.
Subscribed. Very good introduction to diffusion model. I am looking forward to more AI introduction from your side. Thank you!
Thanks for the great tutorial! It captures most of the recent papers on diffusion models. Looks like there is a typo at 12:39 in the formula for "classifier-free diffusion guidance": on the right-hand side, there should be an open bracket after the guidance scale "s" and a closed bracket at the end.
thanks! this is now under errata in the description
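For readers following this thread, here is a minimal sketch of what the bracket placement changes. It assumes the formula at 12:39 follows the commonly used form in which the guidance scale `s` multiplies the whole difference between the conditional and unconditional noise predictions; the function and argument names below are hypothetical and not taken from the video.

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, s):
    """Classifier-free guidance sketch (assumed form, not the video's exact formula).

    The brackets matter: s scales the whole difference
    (eps_cond - eps_uncond), not just the first term.
    """
    return eps_uncond + s * (eps_cond - eps_uncond)

# Hypothetical noise predictions for a 2-pixel "image".
eps_uncond = np.array([0.0, 1.0])
eps_cond = np.array([1.0, 3.0])

no_guidance = guided_noise(eps_uncond, eps_cond, s=0.0)      # unconditional prediction
plain_cond = guided_noise(eps_uncond, eps_cond, s=1.0)       # conditional prediction
extrapolated = guided_noise(eps_uncond, eps_cond, s=3.0)     # pushed further toward the condition
```

With `s = 1` this recovers the conditional prediction, and larger `s` extrapolates beyond it, which is the behaviour the brackets are needed to express.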
Thank you a lot for this video, very good introduction to the subject.
You listen, even if you can't understand...
It's like in music.. you miss some notes, but it still sounds good :))
Thanks for the informative video. Please continue making such videos
High-quality video! Thank you!
Can you recommend a resource (as in statistics/probability) to get to grips with the intricate mathematical details of the video?
Also curious!
Fantastic explanation!
I'd never in a billion years come up with this.
This is really nice! Thanks so much for sharing! I'll be hopping onto this topic soon too!
Great video. About GAN vs SD, I honestly believe SD will "win". I myself have tried to train GANs and never succeeded once, as training is a finicky process; on the other hand, a barely functional SD model is much more capable and easier to train.
I looked for a video that explains the concept well. And this video did help me understand it better.
For large transformer language models, people are 'probing' the layers and activations to try and understand how the model works and what its latent weights mean. Essentially trying to find signs of intelligence by looking for linguistic concepts.
Now I am wondering how probing such a diffusion model could work, and whether it will be possible to extract and maybe even inject intermediate representations of the many hidden layers. And whether it shows any parallels to, for example, computer rendering, drawing and painting.
Love your channel, extremely high quality videos! One of the best I have ever seen.
Nicely explained, thanks
At 3:55 you say that choosing a small beta such that x_t is close to x_{t-1} justifies using a uni-modal Gaussian for the "posterior of the forward process q(x_{t-1} | x_t)". This is the same as saying "for the transition probability of the reverse process p_theta(x_{t-1} | x_t)", right?
I am mostly confused, because the forward process itself is referred to as the posterior in the paper, but you are talking about the posterior OF the forward process, right? Please correct me if I am speaking nonsense. Awesome video, thanks!
Thanks man, wonderful explanation
Thanks for this video, Ari. I will admit the math is a little bit beyond me, but I'm slowly understanding various aspects of this process. One thing that I've been trying to wrap my mind around -- is it fair to say that there is some position in latent space that represents the solution from a particular piece of training data? I.e., are there 3 discrete solutions to the guidance of "arctic fox" stored somewhere?
Or, in the conditional concept, is "arctic fox" constantly getting pushed/pulled on by the inference of more and more "arctic fox" labelled data (therefore never having a true latent space representation of a single image)?
At 9:28 you have q(x_{t-1} | x_t, x_0). Isn't it supposed to be the other way around, q(x_t | x_{t-1}, x_0), since the backward process is moving towards x_0, not away from it?
You helped me tremendously with my grant proposal which includes a part of using conditional diffusion for image translation. Thank you so much 😁
I think there is a small typo at 2:45. The q(x_T | x_T-1) = N(0, 1).
Really great content! How did you make the animations?
Thanks. Now I know how to debate anti diffusion luddites.
Great work really, there is a mistake in the equation on the rightmost side at 12:46.
Excellent explanation, thanks
2:23 What does that capital N function mean? Does it mean we are pulling x_t from a Gaussian distribution which has a mean given by the 2nd thingy after the semicolon and a variance given by the last thingy before the closing bracket?
Excellent. Why perform loss on noise rather than gaussian? Thanks.
Thank you for your explanations. At 02:00, what does the I represent? (The I of Beta t*I)
That's the identity matrix :) So when we multiply it by Beta_t, we get a square matrix with Beta_t at each diagonal element and zeros elsewhere.
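As a small illustration of the reply above, the sketch below builds Beta_t * I explicitly and then draws one forward-process sample. It assumes the standard DDPM form N(x_t; sqrt(1 - Beta_t) * x_{t-1}, Beta_t * I), which may differ in detail from what is shown on screen; `forward_step` and the value of `beta_t` are hypothetical choices for the example.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """Draw x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I).

    Because the covariance is beta_t * I, every coordinate gets
    independent noise with the same variance beta_t.
    """
    mean = np.sqrt(1.0 - beta_t) * x_prev
    noise = rng.standard_normal(x_prev.shape)
    return mean + np.sqrt(beta_t) * noise

beta_t = 0.02                      # assumed small noise level for one step
cov = beta_t * np.eye(3)           # beta_t on the diagonal, zeros elsewhere

rng = np.random.default_rng(0)
x_prev = np.ones(3)                # a toy 3-pixel "image"
x_t = forward_step(x_prev, beta_t, rng)
```

Sampling with a diagonal covariance like this never requires forming the matrix; scaling standard normal noise by sqrt(beta_t) is equivalent.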
@@ariseffai Thanks
Thank you for the kind explanations
Best explanation!!
The ELBO is taken from BNNs
And I believe the name is kind of misleading.
It's basically the log of Bayes' law, and we ignore the part that relies on the probability of the evidence.
Now, that part is not affected by the model, so it doesn't affect the derivatives.
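One way to make the comment above precise is the standard decomposition of the log-evidence, written here in the x_{0:T} notation used elsewhere in this thread (an assumption about the video's notation, not a transcription of it):

```latex
\log p_\theta(x_0)
  = \underbrace{\mathbb{E}_{q(x_{1:T}\mid x_0)}\!\left[\log \frac{p_\theta(x_{0:T})}{q(x_{1:T}\mid x_0)}\right]}_{\text{ELBO}}
  + \underbrace{D_{\mathrm{KL}}\!\left(q(x_{1:T}\mid x_0)\,\|\,p_\theta(x_{1:T}\mid x_0)\right)}_{\ge 0}
  \;\ge\; \text{ELBO}
```

Since the KL term is non-negative, the ELBO is a lower bound on the log-evidence, and maximizing it is the tractable stand-in for maximizing log p_theta(x_0).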
Fantastic video. Thank you.
Great stuff. Subscribed.
Can you do a video about image denoising using wavelet transforms
This channel is a hidden gem
Great Video! It's sad I can press the like button only once for this video.
Thanks!
Thanks Rafi!
At 8:29 I could not get the step when I tried doing this by hand. Is there any place that can explain that to me? I wish I could paste pictures here 😅
At 9:32, the expectation E_q is only on the 3rd term, right?
Amazing
I lost you right after the image of the dog 😂
Am I stupid to still not fully understand this outside of the basic concept of how noise is used? The video is very well produced, and I think it's great teaching content. But maybe I lack the mathematical foundation to follow all the explanations here.
My brain is hurting!
great video thank you Ari
Wow your videos are great!
Very nice!
Thanks for the great video, Ari. Do you have any course recommendations to better understand the basics behind Diffusion Models? I would love to learn more about them.
Can someone tell me why, at 8:29, p_theta(x_0 | x_{1:T}) * p_theta(x_{1:T}) = p_theta(x_{0:T})?
In case you didn't figure this out: the Markov assumption and conditional independence of each step of p_theta mean that
p_theta(x_{0:T}) can be decomposed into the product of each step p_theta(x_i | x_{i+1:T}).
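Written out, the two facts in this exchange look as follows. The first line is just the product rule of probability; the second applies the chain rule and then the Markov assumption, under the usual convention that the reverse chain starts from p(x_T):

```latex
p_\theta(x_0 \mid x_{1:T})\, p_\theta(x_{1:T}) = p_\theta(x_{0:T})

p_\theta(x_{0:T})
  = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_{t:T})
  = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)
```

The last equality is where the Markov property is used: each step depends only on the immediately noisier sample x_t, not on the rest of the chain.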
Incredibly insightful explanation here, but mostly useful for those who already know what you mean. I think you'll need to be a bit more careful about how you word things before you'll find a broader audience with ambitious newbie learners. Great work!
great video! Is there a better way to learn diffusion models other than just reading all the linked papers from top to bottom?
Thank you so much for this wonderful explanation! It takes true skill and dedication to be able to explain complicated formulas with such organic fluency that they become intuitively understandable. You saved me a lot of digging with this video! Do you have a Patreon? I wish I could support you somehow!
Thank you for the very generous comment! Glad you enjoyed it. If you'd like, you can buy me a coffee via the "Thanks" button under the video :)
This is good!
Great video!
OK, I don't understand any of those mathematical formulas... But I want to understand them. Where should I start?
If you have familiarity with probability and calculus, read up on variational inference and Langevin dynamics. These are the building-block concepts of the formulas in this video.
Watched twice but the math is still above me, but I fell asleep very nicely 👍 The style from 3Blue1Brown is very appropriate here. Maybe revisit the topic with longer explanations, better derivations and better production quality once you're bigger? Thanks for the hard work you put into these.
I lack the technical mathematical skills to understand this explanation. What field of maths should I get into in order to understand it?
First probability and calculus, then variational inference and Langevin dynamics
@@DrumsBah thanks
king
Nice tutorial. Would be great, if you could also add the links of the papers mentioned in the tutorial
Does anyone know which are the links to the papers that he is referring to in the tutorial?
It's in the description.
I still do not understand how this enables models to draw something like a cow wrapped in spaghetti with such high fidelity. What component in this process incorporates the world knowledge of how to creatively combine such concepts and draw them photorealistically?
I don't understand why we need a model that predicts just a standard normal distribution. Why don't we just use that noise?
Very good
Kudos ❤
Excellent video. Could someone please help me understand what is meant by the "posterior of the forward process"? From Google, I see "posterior" means later in time. But the question is posed as q(x_t-1 | x_t) = ? (3:23). In my eyes, this equation is saying "forward process of the sample in a previous step given the next step is unknown". Is this interpretation correct? And if so, how is this task related to finding the posterior? I'm sorry, I've been trying to wrap my head around this for several hours now and have not been able to figure out what a posterior distribution means, and even what is meant by "distribution". Distribution of what? Pixel values? Measurements? What are the x and y-axes shown at 3:23? Please any help on this would be greatly appreciated.
Yeah, it seems that was worded a bit carelessly. The forward distribution is q, which describes how the noise is added in the forward path or direction. But in the context of your question about how this relates to a posterior, it's then the reverse path that is relevant. So it's the posterior on q wrt the reverse path.
And "distribution" is a standard term for probability, as in probability distribution.
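To make the word "posterior" in this exchange concrete: it is used in the Bayesian sense (the distribution of an earlier, unobserved quantity given a later, observed one), not in the sense of "later in time". A sketch of the relationship, assuming the same q notation as the question:

```latex
q(x_{t-1} \mid x_t) = \frac{q(x_t \mid x_{t-1})\, q(x_{t-1})}{q(x_t)}
```

Here q(x_t | x_{t-1}) is the known forward noising step, while the marginals q(x_{t-1}) and q(x_t) depend on the unknown data distribution, which is why this posterior is not available in closed form and is approximated by the learned p_theta(x_{t-1} | x_t). Conditioning additionally on the clean image, as in the q(x_{t-1} | x_t, x_0) mentioned in another comment, makes it a tractable Gaussian.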
To think that I want to be a data analyst. It's slightly scary watching the math holding the world together.
Quite hard to understand; some notation (like I, what theta is, why there is a sum of tuples on theta when theta doesn't appear in x_t and t, how we define the Gaussian function at the beginning) isn't defined.
This math is quite different from classical deep learning approaches, so I am a bit lost.
uhm, do you have a ELI5 version? XD
It would be nice to explain the math notation a little bit before diving into the formulae.
I didn't get anything, but sounds smart
Can you drop a new video?
How am I supposed to read or understand the semicolon notation in the normal distribution's mean parameter (reverse process section)?
I take this to mean that x_{t-1} is parameterized by the \mu_\theta function that acted on x_t, given knowledge of the time step.
In that case, why isn't the same notation applied to the covariance (i.e. \Sigma_{t-1} ; \gamma_\theta(x_t, t) where \gamma is some function)?
Do people not feel that these probability symbols are annoying to read, and that the actual computation flow is much more intuitive to understand? These simple ideas don't have to be explained in such long, abstract and ugly formulas.
It uses a diffusion model... Search it up; you would then realize why it can't inspect images.
what are you even talking about?
only 5k subscribers is far too little!!!
Problem with these `explanation` videos is, they don't explain WHY WE ADD NOISE TO A COHERENT IMAGE? What is our goal?
And why do we REVERSE IT? What do we achieve by this?
The reverse part is the part we actually want, we're training the AI to get an image out of noise. We want to be able to generate images out of nothing, basically. But we have to start from somewhere, so as part of the training process, we turn actual images gradually into noise and then train the AI on how well it can get it back to that original.
For applications, you can see the AI-generated art that's out there right now, with DALL-E 2 and Midjourney as some examples.
The expression "add noise" is only there to make the reverse logic easier to compare against. In reality, noise is not added by humans, or at least not intentionally. Think of an old photo from the 1900s in which you can't see people's faces clearly, or security footage where you can't tell who is who due to the lack of light or low-quality cameras. Those are noise. To reverse is to make images appear clearer (although not necessarily accurate) when the originals are not. That is the achievement, and something people working in imagery-related professions have always wanted. This is only the most basic idea, though. Diffusion can achieve much more than just enhancing photos.
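The training idea described in this reply can be sketched in a few lines. This is a toy illustration under stated assumptions, not the video's or any library's implementation: it uses the common linear noise schedule and the noise-prediction loss from the DDPM formulation, and `dummy_model` is a hypothetical placeholder for the neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000                                  # assumed number of noising steps
betas = np.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)       # fraction of signal left after t steps

def noise_image(x0, t, rng):
    """Jump straight to step t: mix the clean image with Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def dummy_model(x_t, t):
    """Hypothetical stand-in for the network; it always predicts zero noise."""
    return np.zeros_like(x_t)

# One training example: noise a real image, then score the model's guess.
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # toy "image" scaled to [-1, 1]
t = int(rng.integers(0, T))
x_t, eps = noise_image(x0, t, rng)
loss = np.mean((eps - dummy_model(x_t, t)) ** 2)  # mean squared error on the noise
```

A real training loop would replace `dummy_model` with a trainable network and minimize `loss` over many images and time steps; generation then runs the learned denoiser step by step starting from pure noise.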
Excellent explanation. Email to reach out.
As a beginner I understood nothing. Change the title, because these are not the basics.
I gotta say this video loses me immediately as soon as you get into formulaic notation.
And I am a driver software dev who does math regularly.
I just don't think this video is well explained, unless you are already very good at math. 😮
This video is not helpful at all for beginners.
This video is a great summary. I absolutely recommend reading this paper before/after to better understand the math and intuitions behind it. arxiv.org/pdf/2208.11970.pdf