Best discussion and presentation.
Not sure if I can attend this Paper Club because it looks like it's company sponsored, but definitely keep posting these videos.
Thanks for your interest! nPlan's paper club is open for all to attend in person in London or online! Just search for nPlan paper club :)
Great explanation! Thanks
Thank you!
With respect to energy-based models, where we need Langevin dynamics to sample data from the model (p_theta(x)), what roles do the 'empirical' and prior distributions play then? Do we use training data as samples from the prior? And samples from our current model to model our empirical distribution?
The empirical distribution is the training data: a mixture of point masses (look up 'Dirac delta') at the locations of the samples in the sample space. You then match forward and reverse Markov chains that go from p_theta(x, t=0) to a normal distribution at t=T (this normal plays the role of the prior), which gives you a nice denoising score-matching objective that can be used to train energy-based models (train p_theta(x, t)) or score-based models (train grad_{x} log p_theta(x, t)). This training is done by noising samples from the empirical distribution and predicting the amount of noise added. Inductive bias or regularisation leaves the learned score inaccurate, so after training you don't recover the empirical distribution but something more desirable to practitioners: a model that can generalise and achieve good results on the metrics they are interested in, such as FID.
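To make the "noise the samples and predict the noise" step concrete, here is a minimal NumPy sketch of a denoising score-matching loss. It is an illustration under simplifying assumptions, not the setup from the talk: a single fixed noise level `sigma` instead of a full time-indexed chain, 1D data, and hypothetical names (`noised_score`, `dsm_loss`). Because the empirical distribution is a mixture of point masses, its Gaussian-noised version is a Gaussian mixture whose score is known in closed form, so we can check that the exact score gets a lower loss than a trivial zero score.

```python
import numpy as np

def noised_score(x, data, sigma):
    """Exact score of the empirical distribution convolved with N(0, sigma^2).

    The noised empirical distribution is a Gaussian mixture centred on the
    training points, so grad_x log p(x) = sum_i w_i(x) * (data_i - x) / sigma^2.
    """
    diff = data[None, :] - x[:, None]              # shape (n_points, n_data)
    logw = -0.5 * (diff / sigma) ** 2              # unnormalised log weights
    logw -= logw.max(axis=1, keepdims=True)        # stabilise the softmax
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)
    return (w * diff).sum(axis=1) / sigma**2

def dsm_loss(score_fn, data, sigma, rng, n=20000):
    """Monte Carlo estimate of the denoising score-matching loss at one noise level."""
    x0 = rng.choice(data, size=n)                  # sample from the empirical distribution
    eps = rng.standard_normal(n)                   # the noise that gets added
    xt = x0 + sigma * eps                          # noised sample
    target = -eps / sigma                          # score of N(xt; x0, sigma^2), i.e. rescaled noise
    return np.mean((score_fn(xt) - target) ** 2)

rng = np.random.default_rng(0)
data = np.array([-2.0, 0.5, 3.0])                  # toy "training set" (assumed for illustration)
sigma = 0.5

loss_exact = dsm_loss(lambda x: noised_score(x, data, sigma), data, sigma, rng)
loss_zero = dsm_loss(lambda x: np.zeros_like(x), data, sigma, rng)
print(loss_exact, loss_zero)                       # the exact score should give the smaller loss
```

In practice `score_fn` would be a neural network trained to minimise this loss across many noise levels, and its imperfect fit is where the generalisation mentioned above comes from.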
@benboys_ thank you very much!
I was wondering because I saw an explanation that said we need Langevin dynamics for sampling from the model, such that those samples can then be used in an MCMC estimator for the true likelihood of the model.
Concavity instead of convexity? Since we try to push samples towards regions of high density (noisy gradient ascent).
Yes, you're right, it's the same thing up to a sign change; people usually refer to convex optimization or log-concave sampling (of a probability density).
It would be really helpful if you could post your PDF files of the papers in the description so we can read your PDF notes and possibly get a better understanding of the papers. Thank you. If you can't post them, can you send them to me via mail or something?
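For anyone following the "noisy gradient ascent" point, here is a small sketch of unadjusted Langevin dynamics on a log-concave target. The target (a 1D Gaussian with mean `mu` and standard deviation `s`), the step size, and the function names are assumptions chosen for illustration. Each update moves uphill on log p and adds Gaussian noise scaled by sqrt(2 * step).

```python
import numpy as np

def langevin_sample(grad_log_p, x0, step, n_steps, rng):
    """Unadjusted Langevin dynamics: noisy gradient ascent on log p."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Ascent step on the log density plus injected Gaussian noise.
        x = x + step * grad_log_p(x) + np.sqrt(2.0 * step) * noise
    return x

# Assumed toy target: N(mu, s^2), whose log density is concave.
mu, s = 2.0, 1.5

def grad_log_gaussian(x):
    return -(x - mu) / s**2

rng = np.random.default_rng(0)
# Run 5000 independent chains in parallel, all started at 0.
samples = langevin_sample(grad_log_gaussian, np.zeros(5000), step=0.01, n_steps=2000, rng=rng)
print(samples.mean(), samples.std())   # should land close to mu and s
```

Minimising the energy -log p (convex here) and maximising log p (concave) describe the same update, which is the sign change mentioned above. Without a Metropolis correction the chain has a small step-size-dependent bias, so treat the samples as approximate.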
Are you on Twitter, Ben?