This is a gem. Finally, someone that is able to give concise teaching well! Thank you!
Best explanation on RUclips. Exactly what I was looking for. Thorough, logical, intuitive.
One of the best explanations on VAE on YT. Thank you and keep up the good work!
Clear, concise and very accurate. Thank you so much for sharing with us this wonderful explanation.
I enjoyed your explanation. I needed something like this video to get a little deeper into the theory of the VAEs. Thank you!
Thank you so much for your lecture. You truly have a talent for teaching!
Training of generative models starts here: 6:26
Thank you so much. So underrated.
Very good presentation. Thanks a lot!
How do we know that p(x|z) is normally distributed?? Do we just assume it?
x|z is just a neural network and I don't see any reason for p(x|z) to be normally distributed. Actually, the relation between x and z must be deterministic.
Not apparent in the video, but the x|z neural network is actually outputting the mean of the distribution of x for that z, which is a Gaussian. This means a target image can be generated by multiple z's (and thus by multiple means). When computing the loss function, there are two opposing terms: the reconstruction error, which minimizes the distance between this mean and the target image, putting the generated mean in the right place, and the KL divergence term between q(z|x) and the standard normal distribution, which tries to bring the means for similar images closer together.
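To make the two opposing terms concrete, here is a minimal PyTorch sketch (layer sizes and class names are hypothetical, not taken from the lecture). It assumes a Gaussian p(x|z) with fixed unit variance, so the reconstruction term reduces to a squared error between the decoder's mean output and the target image, while the KL term pulls q(z|x) toward the standard normal prior.

```python
# Minimal VAE loss sketch (hypothetical shapes; assumes a Gaussian decoder with
# fixed unit variance, so -log p(x|z) reduces to squared error up to a constant).
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20, h_dim=400):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.enc_mu = nn.Linear(h_dim, z_dim)      # encoder head: mean of q(z|x)
        self.enc_logvar = nn.Linear(h_dim, z_dim)  # encoder head: log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))  # decoder: mean of p(x|z)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterized sample of q(z|x)
        return self.dec(z), mu, logvar

def vae_loss(x_hat, x, mu, logvar):
    # Reconstruction term: pulls the decoder mean x_hat toward the target image x.
    recon = ((x_hat - x) ** 2).sum(dim=1)
    # KL(q(z|x) || N(0, I)): pulls the per-image posteriors toward the same prior.
    kl = 0.5 * (mu ** 2 + logvar.exp() - 1.0 - logvar).sum(dim=1)
    return (recon + kl).mean()

model = VAE()
x = torch.rand(8, 784)
x_hat, mu, logvar = model(x)
loss = vae_loss(x_hat, x, mu, logvar)
```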
This is clear and awesome
Hi @Paul Hand,
thank you for the lecture.
What is the intuition behind using q(z|x) in the expectation or the expectation at all? I see that it makes sense mathematically, but how would one get the idea?
In contrast, there is a derivation of the ELBO via importance sampling and then applying Jensen's inequality, or via the optimal sampler.
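For comparison, here is a sketch of that importance-sampling route (standard derivation, written in the q(z|x) notation used above); the expectation under q(z|x) appears precisely because q serves as the sampling/importance distribution:

```latex
\log p_\theta(x)
  = \log \int p_\theta(x, z)\, dz
  = \log \mathbb{E}_{q_\phi(z \mid x)}\!\left[\frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right]
  \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log \frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right]
  = \mathrm{ELBO}(\theta, \phi; x),
```

where the inequality is Jensen's (log is concave), and any q(z|x) with full support can play this role.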
wow, this is so well explained.
What is the meaning of Latent Code?
Thank you, excellent explanation!!
very nice explanation!
Very nice and comprehensive lecture. Thanks
I don't understand what phi and theta mean. "The parameters of the model": does that mean the weights of the neural network, or the parameters of the distribution (e.g., if it is Gaussian, the parameters would be a mu and a sigma)?
I'd appreciate it if anyone could clarify, thank you!
They are the parameters of the model; we use MLE principles to find the optimal phi and theta.
I'm pretty sure phi and theta represent the parameters in terms of weights and biases in the encoder/decoder neural networks.
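A minimal sketch of the distinction (hypothetical layer sizes, not from the lecture): phi and theta are the network weights and biases, while mu and sigma are outputs of those networks, i.e. the parameters of the Gaussians q(z|x) and p(x|z) for one particular input.

```python
import torch
import torch.nn as nn

encoder = nn.Linear(784, 2 * 20)   # its weights and biases are phi
decoder = nn.Linear(20, 784)       # its weights and biases are theta

x = torch.randn(1, 784)
mu, log_sigma = encoder(x).chunk(2, dim=1)       # distribution parameters of q(z|x), computed from x
z = mu + log_sigma.exp() * torch.randn_like(mu)  # sample z ~ q(z|x)
x_mean = decoder(z)                              # mean of p(x|z), computed from z

phi = list(encoder.parameters())    # what maximizing the ELBO actually optimizes
theta = list(decoder.parameters())
```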
It would be really perfect if someone gave some examples for each step, since we are talking about real things that exist in the world. Each step has its meaning and intention and is made to overcome challenges or obstacles that come up along the way. I want to know what we are doing, what the purpose is, and what would happen if we didn't do it this way. I cannot find anything non-abstract; I need examples to anchor my imagination. It is clear and good only if you have prior knowledge of the things being discussed. Otherwise there are a million ways to interpret things and even more ways to get lost.
At 11:00 it seems like, if we are talking about pictures, the formula written in blue would generate an image of pure random noise, which doesn't make sense. It should be done differently, as described in other articles, so that the distributions of different images (sets of parameters or pixels) overlap, and so that the result is not purely random noise, which is not what we're trying to reach.
24:48, how does maximizing the VLB roughly maximize p(x)? Since x is given, p(x) should be constant.
p(x) is actually parameterized, therefore it's not constant.
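A standard identity makes both points precise (a sketch in the notation above; here p(x) means p_theta(x), so it depends on the model parameters):

```latex
\log p_\theta(x)
  = \mathrm{ELBO}(\theta, \phi; x)
  + \mathrm{KL}\!\big(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\big)
  \;\ge\; \mathrm{ELBO}(\theta, \phi; x).
```

Since the KL term is nonnegative, the VLB is a lower bound on log p_theta(x); for a fixed x this quantity still depends on theta, so pushing the bound up tends to push log p_theta(x) up as well.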