Diffusion and Score-Based Generative Models

  • Published: 16 Jan 2025

Comments • 57

  • @rahilmahdian2141 · 17 hours ago

    This is one of the most principled, logically organized, and reasonably neat explanations I have ever watched on score-based and diffusion models. Amazing, Song.

  • @moaidali874 · a year ago +24

    This is one of the best presentations I have ever attended

  • @Blueshockful · a year ago +46

    Recommended to anyone who wants to understand diffusion models beyond the mere "noising/denoising" type explanations

  • @StratosFair · a year ago +26

    Thank you guys for making this talk available on your YouTube channel. This is pure gold

  • @opinions8731 · a year ago +44

    What an amazing explanation. I wish more AI authors explained their papers this clearly.

  • @ishaanrawal10 · 16 days ago

    This probably has to be the best explanation of diffusion models out there. Thank you!

  • @mamatoshgupta · a month ago +1

    The presentation kept me interested throughout. The simplicity and effectiveness of the presentation blew my mind. Song is a genius. A long way to go!

  • @김인수-z2p · a year ago +4

    9:52 it is intractable to compute the integral of the exponential of a neural network (the normalizing constant)
    12:00 desiderata of deep generative models
    19:00 the goal is to minimize the Fisher divergence between the data score ∇_x log p_data(x) and the score function s(x); we don't know the ground-truth p_data(x), but the score matching objective equals the Fisher divergence up to a constant, so it is the same from the optimization perspective.
    23:00 however, score matching is not scalable, largely due to the Jacobian term, which requires many backpropagation passes. So before computing the Fisher divergence, project the scores onto a random vector v so that only Jacobian-vector products are needed and the full Jacobian disappears; this is much more scalable and is called sliced score matching.
    29:00 denoising score matching. The objective is tractable because we design the perturbation kernel by hand (so the kernel is easy to compute). However, because of the added noise, denoising score matching cannot estimate the noise-free distribution. Also, the variance of the denoising score matching objective grows and eventually explodes as the noise magnitude gets smaller.
    31:20 with a Gaussian perturbation kernel, the denoising score matching problem takes a simpler form. Optimize the objective with stochastic gradient descent, and be careful to choose an appropriate noise magnitude sigma.
    36:00 sampling with Langevin dynamics: initialize x0 from a simple (Gaussian, uniform) distribution and z from N(0, I), then repeat the update.
    37:20 the naive version of Langevin dynamics sampling does not work well in practice because of low-density regions.
    (A minimal code sketch of the 31:20 objective and the 36:00 sampler follows below.)
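
A minimal sketch of two pieces from the notes above: the Gaussian denoising score matching objective (31:20) and naive Langevin dynamics sampling (36:00). This is not the speaker's code; the tiny ScoreNet MLP, the single noise level, the step size, and the toy data are all illustrative assumptions.

```python
# Hypothetical sketch (not from the talk): denoising score matching with a
# Gaussian perturbation kernel, plus naive Langevin dynamics sampling.
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Toy noise-conditional score model s_theta(x, sigma) for 2-D data."""
    def __init__(self, dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, sigma):
        # Condition on the noise level by appending sigma as an extra input.
        sigma_col = sigma * torch.ones(x.shape[0], 1)
        return self.net(torch.cat([x, sigma_col], dim=1))

def dsm_loss(score_net, x, sigma):
    """Denoising score matching (29:00, 31:20): the score of the Gaussian
    perturbation kernel N(x, sigma^2 I) at x_tilde = x + sigma * noise is
    -(x_tilde - x) / sigma^2 = -noise / sigma."""
    noise = torch.randn_like(x)
    x_tilde = x + sigma * noise
    target = -noise / sigma
    pred = score_net(x_tilde, sigma)
    # sigma^2 weighting keeps the objective comparable across noise levels.
    return (sigma ** 2) * ((pred - target) ** 2).sum(dim=1).mean()

@torch.no_grad()
def langevin_sample(score_net, sigma, n=256, steps=200, eps=1e-2, dim=2):
    """Naive Langevin dynamics (36:00): start from a simple distribution and
    repeatedly follow the score plus injected Gaussian noise."""
    x = torch.randn(n, dim)
    for _ in range(steps):
        z = torch.randn_like(x)
        x = x + 0.5 * eps * score_net(x, sigma) + (eps ** 0.5) * z
    return x

if __name__ == "__main__":
    # Toy dataset: a single Gaussian blob centered at (2, 2).
    data = 0.3 * torch.randn(1024, 2) + torch.tensor([2.0, 2.0])
    net, sigma = ScoreNet(), torch.tensor(0.5)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(500):
        batch = data[torch.randint(0, data.shape[0], (128,))]
        loss = dsm_loss(net, batch, sigma)
        opt.zero_grad(); loss.backward(); opt.step()
    samples = langevin_sample(net, sigma)  # should concentrate near (2, 2)
```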

  • @jacksonyan7346 · a year ago +3

    It really shows how good the explanation is when even I can follow along. Thanks for sharing!

  • @simong1666 · 8 months ago

    This is the best summarizing resource I have found on the topic. The visual aids are really helpful and the nature of the problem and series of steps leading to improved models, along with the sequence of logic are so clear. What an inspiration!

  • @AjayTalati · a month ago

    Wow... one of the best presentations on generative modelling!!!!

  • @binjianxin7830 · 2 years ago +6

    What a pleasant insight to think of gradients of the logits as score function! Thank you for sharing the great idea.

  • @CohenSu · a year ago +7

    16:54 all papers referenced... This man is amazing

  • @coolguy69235 · 10 months ago

    What a formidable guy!!
    Very good explanation !!!
    Together with all, development for all.

  • @LifeCodeGame · 2 years ago +14

    Amazing insights into generative models! Thanks for sharing this valuable knowledge!

  • @johnini · 8 months ago +1

    I am 44 seconds into the talk and already wanna say thank you! :)

  • @xuezhixie1640 · a year ago +1

    Really amazing explanation for the entire diffusion model. Clear, great, wonderful work.

  • @ck1847 · a year ago +4

    Extremely insightful lecture that is worth every minute of it. Thanks for sharing it.

  • @vineetgundecha7872 · 9 days ago

    Amazing how useful just adding noise is!

  • @peterjackson4530 · 3 months ago

    I like your presentation. In the end, what we appreciate are the interpretations and intuitions behind the methods, so that we can use them to solve other problems.

  • @xiaotongxu1528 · a year ago +1

    Very clear! Thanks for this amazing lecture!

  • @YashBhalgat · a year ago +3

    46:30 was a true mic-drop moment from Yang Song 😄

  • @peterpan1874 · a year ago

    a very accessible and amazing tutorial that explained everything clearly and thoroughly!

  • @uxirofkgor · 2 years ago +12

    damn straight from yang song...

  • @mm_Tesla_mm · a year ago +1

    I love this talk! amazing and clear explanation!

  • @hasesukkt · a year ago

    Amazing explanation that helped me understand the diffusion model!

  • @chartingwithliv · a year ago +3

    Crazy this is available for free!! ty

  • @WensongSong-r8z · a year ago +1

    Oh, very clear explanation! Would it be possible to share this slide?

  • @twistedsector · 11 months ago +1

    actually such a fire talk

  • @lucamarradi3066 · 3 months ago

    Thanks for the talk, very illuminating.

  • @buiquangminh3378 · 6 months ago

    Amazing explanation. Could you share the slides for the presentation?

  • @dy8576 · a year ago

    Such an incredible talk. I was just curious how everyone here keeps track of all this knowledge; would love to hear from you all.

  • @tomjiang6831 · 9 months ago

    literally the best explanation!!!

  • @victorli6999 · 5 months ago

    amazing presentation!

  • @해위잉 · 8 months ago

    Such a great explanation

  • @dyfgluv2182 · 3 months ago

    great, great stuff from an absolute expert

  • @MrDanielnis123 · 5 months ago

    Great video, thank you!

  • @huseyintemiz5249 · a year ago +1

    Amazing tutorial

  • @harshdeepsingh3872 · 3 months ago

    Mind-blowing

  • @RoboticusMusic · a year ago +1

    What is using this now, and is it still SoTA? Did this improve OpenAI, MJ, Stability, etc.? It sounds promising, but I need updated information.

  • @JuliusSmith · 10 months ago

    Excellent overview of excellent work, thank you! I am worried about simplified CT scans, however. Wouldn't we get bias toward priors when we're looking instead for anomalies? There needs to be a way to detect all abnormalities with 100% reliability while still reducing radiation. Is this possible?

  • @gaspell · a year ago

    Thank you!

  • @adrienforbu5165 · a year ago

    Thank you so much, Song

  • @pengjunlu · 8 months ago

    Why is solving the "maximize likelihood" problem equivalent to solving the "explicit score matching" problem? For example, once you get s(x; theta), you do get a corresponding p(x); but is it the same p(x; theta) whose theta maximizes the likelihood?

  • @alenqquin4509 · 2 months ago

    nice video!

  • @yuxinzhang4228 · a year ago

    Why run annealed Langevin dynamics from the highest noise level down to the lowest, instead of running Langevin dynamics directly on the score model at the lowest noise level?

    • @DrumsBah · a year ago

      You use the perturbed distributions to traverse toward and converge on high-density areas via Langevin dynamics. Due to the manifold hypothesis, large regions of the data space have essentially no density and thus no gradient for Langevin dynamics to follow. The large noise levels let the sampler traverse these regions. Once the samples are closer to high density, the noise level can be decreased and the process repeated (see the sketch below).
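
A minimal, self-contained sketch of the annealing idea described in this reply. A 1-D two-Gaussian mixture, whose perturbed score is known in closed form, stands in for a learned score model; the noise levels, step-size schedule, and mixture parameters are illustrative assumptions rather than values from the talk.

```python
# Hypothetical sketch of annealed Langevin dynamics: run Langevin sampling at a
# high noise level first (so low-density regions still have useful gradients),
# then anneal the noise down, shrinking the step size with it.
import numpy as np

def mixture_score(x, sigma, mus=(-4.0, 4.0), weights=(0.5, 0.5)):
    """Closed-form score of a 1-D two-Gaussian mixture (unit variance per
    component) perturbed by N(0, sigma^2) noise."""
    var = 1.0 + sigma ** 2
    comps = np.stack([w * np.exp(-(x - m) ** 2 / (2 * var))
                      for m, w in zip(mus, weights)])
    resp = comps / comps.sum(axis=0)                 # posterior component weights
    grads = np.stack([-(x - m) / var for m in mus])  # per-component scores
    return (resp * grads).sum(axis=0)

def annealed_langevin(score_fn, sigmas, n=1000, steps_per_level=100, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-8.0, 8.0, size=n)    # start far away; large noise bridges the gap
    for sigma in sigmas:                  # largest noise level first
        alpha = eps * (sigma / sigmas[-1]) ** 2   # step size shrinks with the noise
        for _ in range(steps_per_level):
            z = rng.standard_normal(n)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x

# Anneal from sigma = 5.0 down to 0.1; samples should end up near the modes at -4 and 4.
samples = annealed_langevin(mixture_score, sigmas=np.geomspace(5.0, 0.1, 10))
```

Running the same loop at only the smallest sigma from the same far-away initialization tends to leave samples stuck where the score is essentially zero, which is exactly the failure mode the reply attributes to the manifold hypothesis.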

  • @hoang_minh_thanh · a year ago

    This is amazing!

  • @mariuswuyo8742 · 10 months ago

    I have a question about the "prob. evaluation" part: does it mean using the ODE to compute the likelihood p_theta(x_0)? But then how do we feed the original data x_0 into the diffusion model?

  • @beluga.314 · a year ago

    If we can obtain an equivalence between DDPM training (training the network to predict the noise) and score-based training in DDPM, then shouldn't both give the same kind of results?

  • @timandersen8030 · 10 months ago

    How to be good in math like Dr. Song?

  • @jimmylovesyouall · a year ago +1

    Stanford is awesome as always, thanks.

  • @johnini · 8 months ago +1

    Now I need some GPUs

  • @julienblanchon6082 · a year ago

    Wow, that was crystal clear

  • @박소영-c6w · 8 months ago

    27:44

  • @clairewang8370 · 7 months ago

  • @贝拉从小就爱吃选购 · 3 months ago

    Absolutely awesome