Can't thank you enough! Thank you for explaining in such detail. I had been stuck understanding this proof for hours and it was driving me crazy!
Super detailed explanation. Human beings need more teachers like this. A small question: at 11:47, why did the 1/2 disappear in tr[Σ₁(1/2)Σ₁⁻¹]? Why do we end up with k rather than k/2?
I have the same question. Did you find an answer to it?
@@yashkhurana9170 not yet
The 1/2 term is still there. In the final expression of D_KL (after the proof), he takes the 1/2 out as a common factor, so that term does end up contributing K/2 (there is a quick numerical check of the closed form just below).
It's a typo; he missed it.
@@yashkhurana9170 It's a typo; he missed it.
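For anyone still unsure about the K vs. K/2 point, here is a quick numerical sanity check of the closed-form result the derivation arrives at. The function name and test values are my own illustration, not from the video; note how the trace term (which gives k when the two covariances are equal) sits inside the overall 1/2 factor.

```python
import numpy as np

def kl_mvn(mu1, S1, mu2, S2):
    """Closed-form KL( N(mu1, S1) || N(mu2, S2) ) for k-dimensional Gaussians."""
    k = mu1.shape[0]
    S2_inv = np.linalg.inv(S2)
    diff = mu2 - mu1
    return 0.5 * (np.log(np.linalg.det(S2) / np.linalg.det(S1))
                  - k
                  + np.trace(S2_inv @ S1)
                  + diff @ S2_inv @ diff)

# Sanity check: when p == q, the trace term gives exactly k, the -k cancels it,
# and everything sits inside the overall 1/2, so the divergence is 0.
k = 3
mu = np.zeros(k)
S = np.eye(k)
print(kl_mvn(mu, S, mu, S))  # ~0.0
```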
Truly excellent. I have studied a lot of papers and lectures but could not clear up my confusion; now I feel comfortable. Thanks again!
Wow, Amazing tutorial. Thank you for all of your videos on every topic.
I am so happy I found you, I was struggling to get a clear idea of this concept !
Best explanation I have seen so far!
Thank you so much for this. The way you structure it, the examples, is great
Damn, this is awesome. THANKS!!!!
Gaussian distributions are supposed to be for continuous random variables, right? How are we using one for a discrete random variable that can take K possible states here? Shouldn't we be using an integral here? Please correct me if I am wrong; I'm totally new to machine learning.
Simple and straight to the point, thank you.
Thank you very much for such a simple explanation.
Great video! But I wonder why we can't apply the same operation to the second term ((1/2)(x − μ₂)ᵀΣ₂⁻¹(x − μ₂)) to get K, the same as the first term?
12:12 - shouldn't the simplified result for the other expression (containing μ₂ and Σ₂) also be "k" by symmetry?
I thought the same, but μ₂ is q(x)'s mean and the expectation is taken with respect to p(x), so E[(x − μ₂)(x − μ₂)ᵀ] won't simplify to Σ₂ (see the expansion below).
@@rajinish0 I love you, thanks a lot!
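To spell out the previous reply (my own notation, not from the video): writing x − μ₂ = (x − μ₁) + (μ₁ − μ₂) and taking the expectation under p(x),

$$
\mathbb{E}_{p}\big[(x-\mu_2)(x-\mu_2)^{\top}\big]
= \mathbb{E}_{p}\big[(x-\mu_1)(x-\mu_1)^{\top}\big] + (\mu_1-\mu_2)(\mu_1-\mu_2)^{\top}
= \Sigma_1 + (\mu_1-\mu_2)(\mu_1-\mu_2)^{\top},
$$

since the cross terms vanish because E_p[x − μ₁] = 0. That is why the second term produces tr(Σ₂⁻¹Σ₁) plus a quadratic term in the means rather than simply k.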
I have a question: when you were writing the first part at 8:37, we got a nice answer, K (from the trace). But I do not understand why you need to replace μ₂ with μ₁ in the second part at 12:30. Couldn't we do the same thing as in the first part, get some other constant K₂, and just take the difference? What is the need for that step? Thank you for the video.
Excellent tutorial.
Thanks for your tutorial. I have one question: at 13:49, why does BᵀA = AᵀB? Thanks!
Earlier we applied the trace trick because AᵀXA is a scalar. Here BᵀXA has the same dimensions as AᵀXA, so BᵀXA is a scalar, and so is AᵀXB, which means we can add the two scalars.
Yes, we can certainly add them because they have the same dimension, i.e. 1x1, a scalar. It is also worth mentioning that they are indeed equal. You might want to take some matrices and vectors and convince yourself; there is a quick check below.
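A small numerical check of that claim (the variable names are mine): since bᵀΣa is 1×1, it equals its own transpose aᵀΣᵀb, and a covariance matrix satisfies Σ = Σᵀ, so aᵀΣb = bᵀΣa.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
a = rng.standard_normal(k)
b = rng.standard_normal(k)
M = rng.standard_normal((k, k))
Sigma = M @ M.T  # symmetric (covariance-like) matrix

# b^T Sigma a is 1x1, so it equals its own transpose a^T Sigma^T b,
# and Sigma = Sigma^T, hence the two scalars are identical.
print(b @ Sigma @ a, a @ Sigma @ b)
```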
Very clean explanation! Thanks, Dr.
Great details on WGAN
What is the intuition behind the KL divergence of two distributions not being symmetric?
To me it seems like it should be symmetric.
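Not a full answer, but a quick illustration using the univariate closed form (my own helper function, not from the video): the expectation inside D_KL(p‖q) is taken under p, so p and q play different roles, and swapping them generally gives a different number.

```python
import numpy as np

def kl_gauss_1d(mu1, s1, mu2, s2):
    # Closed-form KL( N(mu1, s1^2) || N(mu2, s2^2) ) for univariate Gaussians
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

print(kl_gauss_1d(0.0, 1.0, 1.0, 2.0))  # KL(p || q) ~= 0.443
print(kl_gauss_1d(1.0, 2.0, 0.0, 1.0))  # KL(q || p) ~= 1.307 -- not the same
```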
Where can we learn this algebra and probability?
Thank you Sir for the explanation!
Thank you so, so much! This is super helpful!
14:31 - the constant expression should be multiplied by 0.5. Doesn't really matter because it was denoted by beta.
Excellent tutorial
God bless you brother
In the probability refresher, we are told that Σ x·p(x) is called the expectation, but at timestamp 9:08 Σ p(x) is treated as an expectation.
Please help :)
Minute 11: what about the 1/2?
I think the result is K/2, but I might be missing something else
Yeah, we see in the results a factor of 1/2 appears again.
E_p[x] = µ is fine, but at 16:05 you said that E_p[x] = µ_1. I think that term was coming from q(x), so it should be µ_2, right?
Great lectures, but I think you should have shown the derivation of the loss through MLE.
This is the error function for the autoencoder, right?
The KL divergence is part of the loss function. The other part is the reconstruction loss; a rough sketch of how they combine is below.
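For context, here is a minimal sketch of how those two parts are typically combined in a VAE objective, assuming a diagonal Gaussian encoder and a standard normal prior (the function name, the MSE reconstruction term, and the shapes are my own illustration, not from the video):

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var):
    """Sketch of a VAE objective: reconstruction error plus the closed-form
    KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    recon = np.sum((x - x_recon) ** 2)                           # reconstruction term (MSE here)
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)   # KL to the unit Gaussian
    return recon + kl
```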
This is too good. Thank you!
Great content, but sometimes we can hear your mic scratching against something, e.g. at 6:48. Not a big deal!
Bro, what did you study?
Excellent video for showing the KL divergence between two Gaussians. A bit too fast, but luckily YouTube has 0.75x speed :-)
one of the very best
Amazing tutorial
awesome
💖💖
so cool!
Great explanation, but too many ads.
ad ad ad