Deep Learning 20: (2) Variational AutoEncoder : Explaining KL (Kullback-Leibler) Divergence

  • Published: 21 Dec 2024

Comments • 54

  • @vatsalamolly
    @vatsalamolly 3 years ago +7

    Can't thank you enough! Thank you for explaining in such detail. I had been stuck understanding this proof for hours and it was driving me crazy!

  • @xruan6582
    @xruan6582 4 years ago +13

    Super detailed explanation. Human beings need more teachers like this. A small question: at 11:47, why did the 1/2 disappear in tr[Σ₁(1/2)Σ₁⁻¹]? Why do we end up with k rather than k/2?

    • @yashkhurana9170
      @yashkhurana9170 4 years ago +1

      I have the same question. Did you find an answer to it?

    • @xruan6582
      @xruan6582 4 years ago +1

      @@yashkhurana9170 not yet

    • @MrSouvikdey
      @MrSouvikdey 4 years ago +1

      There will be a 1/2 term as well. In the final expression of D_KL (after the proof), he takes the 1/2 out as a common factor, so it has to be K/2 (see the numeric check below this thread).

    • @moliv8927
      @moliv8927 2 years ago

      It's a typo; he missed it.

    • @moliv8927
      @moliv8927 2 years ago

      @@yashkhurana9170 It's a typo; he missed it.
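
A quick numeric check of the point discussed above (not from the video): for samples x ~ N(mu1, Sigma1), the expectation E[(x - mu1)^T Sigma1^{-1} (x - mu1)] equals tr(Sigma1^{-1} Sigma1) = k, and the 1/2 simply carries through as a factor outside the trace, giving k/2 once it is pulled out. A minimal numpy sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
mu1 = rng.normal(size=k)
A = rng.normal(size=(k, k))
Sigma1 = A @ A.T + k * np.eye(k)                  # a valid (SPD) covariance matrix

# Monte Carlo estimate of E_p[(x - mu1)^T Sigma1^{-1} (x - mu1)] under p = N(mu1, Sigma1)
x = rng.multivariate_normal(mu1, Sigma1, size=200_000)
d = x - mu1
quad = np.einsum('ni,ij,nj->n', d, np.linalg.inv(Sigma1), d)

print(quad.mean())                                # ~ k (here, ~3)
print(np.trace(np.linalg.inv(Sigma1) @ Sigma1))   # exactly k
# The 1/2 sits outside this expectation, so the corresponding term in D_KL is k/2.
```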

  • @oriabnu1
    @oriabnu1 5 years ago +2

    Truly excellent. I have studied a lot of papers and lectures but could not clear up my confusion; now I feel comfortable. Thanks again.

  • @sabazainab1524
    @sabazainab1524 4 years ago +1

    Wow, Amazing tutorial. Thank you for all of your videos on every topic.

  • @dailyDesi_abhrant
    @dailyDesi_abhrant 4 years ago +1

    I am so happy I found you; I was struggling to get a clear idea of this concept!

  • @yangzhou6314
    @yangzhou6314 2 years ago

    Best explanation I have seen so far!

  • @IgorAherne
    @IgorAherne 2 years ago

    Thank you so much for this. The way you structure it, the examples, is great

  • @VarounsVlogs
    @VarounsVlogs 5 years ago +7

    Damn, this is awesome. THANKS!!!!

  • @navaneethmahadevan2458
    @navaneethmahadevan2458 5 years ago +3

    Gaussian distributions are supposed to be for continuous random variables, right? How are we using one for a discrete random variable that can take K possible states here? Shouldn't we consider an integral here? Please correct me if I am wrong - totally new to machine learning.

  • @omniscienceisdead8837
    @omniscienceisdead8837 2 years ago

    Simple and straight to the point, thank you.

  • @SP-db6sh
    @SP-db6sh 3 years ago

    Thank you very much for such a simple explanation.

  • @jerrycheung8158
    @jerrycheung8158 3 years ago

    Great video! But I wonder why we can't apply the same operation to the second term (1/2 (x - mean2)^T × inverse of covariance matrix 2 × ...) to get K, the same as the first term?

  • @Darkev77
    @Darkev77 2 years ago

    12:12 shouldn't the simplified result for the other expression (containing mu2 and sigma2) also be "k" by symmetry?

    • @rajinish0
      @rajinish0 2 years ago +1

      I thought the same, but mu2 is q(x)'s mean and the expectation is with respect to p(x); so when you do E[(x - mu2)(x - mu2)^T] it won't simplify to sigma2 (see the sketch below this thread).

    • @Darkev77
      @Darkev77 2 years ago +1

      @@rajinish0 I love you, thanks a lot!
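
To see the point above numerically (a small sketch, not from the video): under p = N(mu1, Sigma1), the matrix E_p[(x - mu2)(x - mu2)^T] works out to Sigma1 + (mu1 - mu2)(mu1 - mu2)^T, not Sigma2, which is why the second term does not collapse to k the way the first one does.

```python
import numpy as np

rng = np.random.default_rng(1)
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, -1.0])
Sigma1 = np.array([[1.0, 0.3], [0.3, 2.0]])

x = rng.multivariate_normal(mu1, Sigma1, size=500_000)
d = x - mu2                        # centred at q's mean, but sampled from p
emp = d.T @ d / len(d)             # Monte Carlo E_p[(x - mu2)(x - mu2)^T]

diff = (mu1 - mu2).reshape(-1, 1)
print(emp)
print(Sigma1 + diff @ diff.T)      # matches emp; it is not Sigma2
```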

  • @codewithyouml8994
    @codewithyouml8994 3 years ago

    I have a question: when you wrote the first part at 8:37, we got a nice answer, K (from the trace), but I do not understand why we need to replace mu2 with mu1 in the second part at 12:30. Why can't we do the same as in the first part, get some other K2, and take their difference? Thank you for the video.

  • @BiranchiNarayanNayak
    @BiranchiNarayanNayak 6 years ago +3

    Excellent tutorial.

  • @guofangtt
    @guofangtt 5 years ago +3

    Thanks for your tutorial. I have one question: at 13:49, why does (B^T)A = (A^T)B? Thanks!

    • @NeerajAithani
      @NeerajAithani 5 years ago

      Earlier we applied the trace trick because A^T X A is a scalar. Here B^T X A has the same dimensions as A^T X A, so B^T X A is a scalar too, and so is A^T X B; a scalar equals its own transpose, so the two can be equated and added.

    • @coolankush100
      @coolankush100 4 years ago

      Yes, we can certainly add them because they have the same dimension, i.e. 1x1, a scalar. It is also worth mentioning that they are indeed equal. You might want to take some matrices and vectors to convince yourself (see the sketch below this thread).
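
A tiny check of that claim (illustrative, assuming the matrix between the vectors is symmetric, as an inverse covariance matrix is): b^T X a is 1x1, so it equals its own transpose a^T X^T b, and for symmetric X that is a^T X b.

```python
import numpy as np

rng = np.random.default_rng(2)
k = 4
a, b = rng.normal(size=k), rng.normal(size=k)
M = rng.normal(size=(k, k))
X = M @ M.T                # symmetric, like an (inverse) covariance matrix

print(b @ X @ a)           # b^T X a
print(a @ X @ b)           # a^T X b -> same scalar, since (b^T X a)^T = a^T X^T b = a^T X b
```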

  • @rosyluo7710
    @rosyluo7710 4 years ago

    Very clean explanation! Thanks, Dr.

  • @johnbmisquith2198
    @johnbmisquith2198 2 years ago

    Great details on WGAN

  • @husamalsayed8036
    @husamalsayed8036 3 years ago

    What is the intuition behind the KL divergence of two distributions not being symmetric?
    To me it seems like it should be symmetric (see the small numeric example below).
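
A small numeric illustration of the asymmetry (not from the video), using the closed-form KL between two univariate Gaussians: the expectation is taken under the first argument, so swapping the roles of p and q changes which distribution's mass weights the log-ratio, and the two numbers generally differ.

```python
import numpy as np

def kl_gauss_1d(mu1, s1, mu2, s2):
    # KL( N(mu1, s1^2) || N(mu2, s2^2) ), standard closed form
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

print(kl_gauss_1d(0.0, 1.0, 1.0, 3.0))   # KL(p || q) ~ 0.71
print(kl_gauss_1d(1.0, 3.0, 0.0, 1.0))   # KL(q || p) ~ 3.40 -> a different value
```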

  • @abubakarali6399
    @abubakarali6399 3 years ago

    Where can we learn this algebra and probability?

  • @nitinkumarmittal4369
    @nitinkumarmittal4369 3 years ago

    Thank you Sir for the explanation!

  • @nikiamini2768
    @nikiamini2768 3 years ago

    Thank you so, so much! This is super helpful!

  • @kwippo
    @kwippo 4 years ago

    14:31 - the constant expression should be multiplied by 0.5. Doesn't really matter because it was denoted by beta.

  • @bozhaoliu2050
    @bozhaoliu2050 3 years ago

    Excellent tutorial

  • @bekaluderbew19
    @bekaluderbew19 8 months ago

    God bless you brother

  • @songsbyharsha
    @songsbyharsha 4 years ago

    In the probability refresher, it is said that the sum of x*p(x) is called the expectation, but at timestamp 9:08 a sum weighted by p(x) is treated as an expectation.
    Please help :) (see the note below)
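
A short clarification of the notation in question (this is just the standard definition, not specific to the video): the expectation of any function f under p is the p-weighted sum, so the sum appearing around 9:08 is an expectation of the log-ratio rather than of x itself.

```latex
\mathbb{E}_{p}[f(X)] = \sum_{x} p(x)\, f(x),
\qquad
\sum_{x} p(x) \log\frac{p(x)}{q(x)}
  = \mathbb{E}_{p}\!\left[\log\frac{p(X)}{q(X)}\right]
  = D_{\mathrm{KL}}(p \,\|\, q).
```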

  • @zachwolpe9665
    @zachwolpe9665 5 years ago +3

    minute 11: what about the 1/2?

    • @arielbernal
      @arielbernal 5 years ago +2

      I think the result is K/2, but I might be missing something else

    • @darkmythos4457
      @darkmythos4457 5 years ago +1

      Yeah, we see that a factor of 1/2 appears again in the result.

  • @Vighneshbalaji1
    @Vighneshbalaji1 4 years ago +1

    E_p(x) = µ is fine, but at 16:05 you said that E_p(x) = µ_1. I think it was coming from Q(x), so it should be µ_2, right?

  • @compilations6358
    @compilations6358 4 years ago

    Great lectures, but I think you should have shown the derivation of the loss through MLE.

  • @naklecha
    @naklecha 5 years ago +1

    This is the error function for the auto-encoder right?

    • @jakevikoren
      @jakevikoren 4 years ago

      KL divergence is part of the loss function. The other part is the reconstruction loss (see the sketch below).
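
A minimal sketch of how the two pieces typically fit together in a VAE objective (illustrative only, assuming the common diagonal-Gaussian posterior q(z|x) = N(mu, diag(sigma^2)) and a standard-normal prior; in that special case the two-Gaussian KL from this video reduces to the familiar closed form):

```python
import numpy as np

def kl_diag_gauss_vs_std_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ):
    # -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

def vae_loss(x, x_recon, mu, log_var):
    # total loss = reconstruction error + KL regulariser (squared error used here for simplicity)
    recon = np.sum((x - x_recon) ** 2)
    return recon + kl_diag_gauss_vs_std_normal(mu, log_var)

# toy usage with made-up encoder/decoder outputs
x = np.array([0.2, 0.7, 0.1])
x_recon = np.array([0.25, 0.6, 0.15])
mu, log_var = np.array([0.1, -0.2]), np.array([-0.5, 0.3])
print(vae_loss(x, x_recon, mu, log_var))
```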

  • @adityaprakash256
    @adityaprakash256 5 years ago +1

    This is too good. Thank you!

  • @not_a_human_being
    @not_a_human_being 4 years ago +1

    Great content, but sometimes we can hear your mic scratching against something, e.g. at 6:48. Not a big deal!

  • @mukundsrinivas8426
    @mukundsrinivas8426 3 years ago

    Bro. What did u study?

  • @RealMcDudu
    @RealMcDudu 4 years ago

    Excellent video for showing the KL for 2 Gaussians. A bit too fast, but luckily YouTube has 0.75 speed :-)

  • @bosepukur
    @bosepukur 5 years ago

    one of the very best

  • @saadanwer2106
    @saadanwer2106 5 years ago

    Amazing tutorial

  • @apocalypt0723
    @apocalypt0723 4 years ago

    awesome

  • @satyamdubey4110
    @satyamdubey4110 10 months ago

    💖💖

  • @choungyoungjae8271
    @choungyoungjae8271 5 years ago

    so cool!

  • @chaitanyakalagara3045
    @chaitanyakalagara3045 4 years ago

    Great explanation but too many ads

  • @aniketsinha101
    @aniketsinha101 4 years ago

    ad ad ad