Topographic VAEs learn Equivariant Capsules (Machine Learning Research Paper Explained)

  • Published: 4 Nov 2024

Comments • 39

  • @YannicKilcher
    @YannicKilcher  3 years ago +6

    OUTLINE:
    0:00 - Intro
    1:40 - Architecture Overview
    6:30 - Comparison to regular VAEs
    8:35 - Generative Mechanism Formulation
    11:45 - Non-Gaussian Latent Space
    17:30 - Topographic Product of Student-t
    21:15 - Introducing Temporal Coherence
    24:50 - Topographic VAE
    27:50 - Experimental Results
    31:15 - Conclusion & Comments

  • @t_andy_keller
    @t_andy_keller 3 years ago +78

    Thanks for covering our work Yannic! You nailed the description of generative models despite your humility. In terms of the experiment in Figure 3 (28:59), we included this to demonstrate the ability of the Topographic VAE to learn topographically structured latent spaces even without sequences. The figure shows the images from MNIST which maximally activate each neuron laid out in the 2D torus topography you described, and you can see that neurons which are closer to each other in this topography have similar 'maximum activating' images (i.e. they are organized by stroke thickness, slant, digit class). You can see similar figures in the original Topographic ICA papers, so in some sense this experiment was in homage to that prior work 😊. Excited to share the new stuff we are working on soon!
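
    To make that concrete, here is a minimal sketch (not the authors' released code) of how one could build such a figure: for every latent unit, pick the dataset image that activates it most strongly, then lay those images out on the 2D torus grid that defines the topography. The `encoder` callable and the grid shape are assumptions for illustration.

    import torch

    def max_activating_grid(encoder, images, grid_h, grid_w):
        # images: (N, C, H, W); encoder(images) -> (N, grid_h * grid_w) activations
        with torch.no_grad():
            acts = encoder(images)        # per-unit activations over the dataset
        best = acts.argmax(dim=0)         # index of the top image for each unit
        tiles = images[best]              # (grid_h * grid_w, C, H, W)
        # reshape into the torus layout: unit (i, j) sits at row i, column j,
        # with wrap-around neighbours implied by the toroidal topography
        return tiles.reshape(grid_h, grid_w, *images.shape[1:])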

    • @dubhd4r4
      @dubhd4r4 3 years ago +1

      Wow, nice work Andy! Looking forward to digging in more, and congrats on all the good press as well!!

    • @t_andy_keller
      @t_andy_keller 3 years ago

      Thanks Tristan! Shoot me a message, let's catch up soon! Interested in hearing about your recent work too 🙂

    • @WatchAndGame
      @WatchAndGame 3 years ago +1

      Very interesting work! Do you think that it is possible to apply these concepts to the music/audio domain in order to extract useful features/representations?

    • @t_andy_keller
      @t_andy_keller 3 years ago +4

      Thanks @@WatchAndGame. It would certainly be interesting to see what types of transformations could be learned for music/audio, or really any natural time series data. As Yannic mentioned, this paper was mainly focused on artificial sequences we constructed from the MNIST and dSprites datasets, so we knew what transformations we expected to learn (i.e. rotation, scaling, translation). It would be interesting to see what transformations could be learned on natural data, although I expect you would need to increase the size of the network (# of capsules) significantly to account for the increased diversity of transformations present in such data.

    • @ptanisaro
      @ptanisaro 3 years ago

      Interesting work!!! I can see its potential to explain my problem.

  • @bernardoramos9409
    @bernardoramos9409 3 years ago +48

    Suggestion: "Multiplying Matrices Without Multiplying", which reportedly speeds up NNs nearly 100x on CPUs

    • @Mkill3rYT
      @Mkill3rYT 3 years ago +7

      I honestly thought it was a joke.
      Yannic, please do a video on this!

    • @bernardoramos9409
      @bernardoramos9409 3 years ago +4

      There is another interesting one:
      "Random sketch learning for deep neural networks in edge computing" (180x speed up)
      As well as MONGOOSE and SLIDE (a bit older).
      They use variations of Locality Sensitive Hashing (LSH) like in the Reformer

  • @volotat
    @volotat 3 years ago +7

    I really want to see what those latent spaces look like for high-resolution images. It seems that such a system might capture the essence of an image and store it in a somewhat meaningful way.

  • @tomw4688
    @tomw4688 2 years ago

    I think the rolling operation you drew at 22:25 is faulty (i.e. clockwise should've been counterclockwise and vice versa). That way it would be consistent with Figure 1 at 6:00: the U's would be rolled "towards" the center rather than "further away" from it. But of course I haven't read the paper and am just basing this on Figure 1.

  • @pensiveintrovert4318
    @pensiveintrovert4318 3 years ago +9

    Maps a sequence of images onto a curve in latent space instead of a normal distribution.

  • @nocomments_s
    @nocomments_s 3 years ago +2

    Yay, a new research paper overview!

  • @toto-valentin
    @toto-valentin 3 years ago +1

    It must be liberating to understand things so fluently at this level. I hope I get there one day too 😅😮‍💨.

  • @samsungtelevision695
    @samsungtelevision695 3 years ago +2

    This channel is 🔥
    Thanks for this work

  • @Kram1032
    @Kram1032 3 years ago +2

    This seems really cool to me!
    I wonder how plausible it would be to apply this to GANs or diffusion nets. It seems like a lot here could be translated over, although in diffusion nets it might be kind of weird. Like, would you just transform the noise you apply at each step according to the target transformation?

    • @t_andy_keller
      @t_andy_keller 3 years ago +1

      You could definitely apply the Gaussian -> TPoT transformation straightforwardly in a diffusion model, although I think it would make the most sense to apply it after the final diffusion step, once you have Gaussian random variables.
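
      A hedged sketch of that Gaussian -> TPoT reparameterization as described in the paper: each output variable is a Gaussian divided by the square root of a locally pooled sum of squared Gaussians, t_i = z_i / sqrt(sum_j W_ij * u_j^2). The 1D circular pooling window below is an illustrative assumption, not the exact W used in the paper.

      import torch

      def gaussian_to_tpot(z, u, window=3):
          # z, u: (batch, D) standard-normal samples
          u2 = u.pow(2)
          # circular (toroidal) local sum of u^2, i.e. a circulant W applied via rolls
          half = window // 2
          local = sum(torch.roll(u2, shifts=s, dims=1) for s in range(-half, half + 1))
          return z / torch.sqrt(local + 1e-8)  # heavy-tailed, topographically correlated variables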

    • @Kram1032
      @Kram1032 3 years ago +2

      @@t_andy_keller yeah that makes sense

  • @neuron8186
    @neuron8186 3 years ago +1

    Very informative, I love your explanation. You literally saved a lot of my time.

  • @444haluk
    @444haluk 3 years ago

    It looks oddly similar to direction cosines. I wonder if it can recognize perspectives, because if so it could report at what angle a specific pattern is standing right now with respect to the past. With DeepMind's grid cells, they might even act like vector cells.

  • @herp_derpingson
    @herp_derpingson 3 years ago

    I wish I had taken topography classes in uni to understand the math behind this. I wonder what would happen if we just rolled the latent space tensor n times and calculated the L2 loss with the nth image forward in the sequence.
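
    A toy sketch of that roll-and-compare idea (my reading of the comment, not anything from the paper); `encoder` and `decoder` are hypothetical trained modules:

    import torch

    def rolled_l2_loss(encoder, decoder, x_t, x_t_plus_n, n):
        z = encoder(x_t)                             # (batch, D) latent code at time t
        z_rolled = torch.roll(z, shifts=n, dims=1)   # roll the capsule dimensions by n steps
        x_pred = decoder(z_rolled)                   # predicted frame n steps ahead
        return ((x_pred - x_t_plus_n) ** 2).mean()   # L2 loss against the true nth frame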

  • @dinoscheidt
    @dinoscheidt 3 years ago +2

    uuh… this might be able to give anomaly detection systems more resilience to planned anomalies (like maintenance vs breaking) compared to traditional VAEs 😮

  • @fiNitEarth
    @fiNitEarth 3 years ago +3

    Didn't watch the vid yet but I'm hyped!! :)

  • @davidlearnforus
    @davidlearnforus 1 year ago

    Thank you!

  • @zubrz
    @zubrz 3 years ago

    Missed this video somehow because of the thumbnail; noticed the ML News, though!

  • @JTMoustache
    @JTMoustache 3 years ago

    Pretty neat

  • @galchinsky
    @galchinsky 3 years ago +1

    When you see "capsules", it is always MNIST.

  • @zyxwvutsrqponmlkh
    @zyxwvutsrqponmlkh 3 years ago

    At first I was wondering if it would have applications in video compression, but it was just a big meh.

  • @NoHandleToSpeakOf
    @NoHandleToSpeakOf 3 years ago

    24:22 the crux of it

  • @RobertWeikel
    @RobertWeikel 3 years ago

    If this is a VAE with a forced latent interpretation that is human-understandable, does that then make this a neuro-symbolic VAE?
    Equally, if the latent space can be forced, would that "negate" the purpose of the latent space in the first place? (It at least seems counterproductive for conservation of space.)

    • @insidedctm
      @insidedctm 2 years ago

      Latent spaces aren't necessarily just for conservation of space. As a first-order description, they reflect our prior beliefs that the very high-dimensional world (think of the space of all possible 100x100 images) can be represented by lower-dimensional data. Imagining that those lower-dimensional data have structure that we can model would represent a more complex and potentially useful way to introduce prior beliefs about the data.

    • @RobertWeikel
      @RobertWeikel 2 years ago

      @@insidedctm By your logic, is it fruitful to start thinking about expansive latent spaces? Would changing a 3D model to 4D arbitrarily introduce quality not previously seen? I would argue possibly, but it is no longer neuro-symbolic.

  • @paxdriver
    @paxdriver 3 years ago +5

    TPoT = teapot... scientists these days, always trying to word play like rappers in their ML papers lol

  • @rishabhsharma6497
    @rishabhsharma6497 3 years ago +1

    Watch the video, Vardhan, not the comments...

  • @jonatan01i
    @jonatan01i 3 years ago

    Sorry for spamming, I hope you can answer if you remember:
    I'm looking for a work/paper where humans labeled speech utterances with what they think the speaker's facial expression looks like.
    Do you know how I can find it?