Manifold Mixup: Better Representations by Interpolating Hidden States

  • Published: Jan 10, 2025

Comments • 32

  • @vincenzodelzoppo9125
    @vincenzodelzoppo9125 4 years ago +8

    Nice paper about regularization.
    Such an elegant solution to untangle manifolds in hidden states. Most of the networks I have seen basically learn only in the last layers, while the backbone just extracts kind of random features.

  • @herp_derpingson
    @herp_derpingson 5 years ago +17

    So many papers in rapid succession. This guy is on fire!
    \m/

    • @YannicKilcher
      @YannicKilcher 5 years ago +11

      or I'm just procrastinating on doing the dishes :p

    • @valthorhalldorsson9300
      @valthorhalldorsson9300 4 years ago +5

      It's 9 months later and based on the rate of new videos I'm starting to worry you'll never get around to those dishes

    • @CosmiaNebula
      @CosmiaNebula 4 years ago

      @@valthorhalldorsson9300 sooner than he gets to the dishes, a robot arm would be doing the dishes

    • @lucahugh7209
      @lucahugh7209 3 years ago

      you all prolly dont care at all but does someone know a method to log back into an instagram account..?
      I stupidly forgot the account password. I love any help you can offer me!

    • @aidentroy4892
      @aidentroy4892 3 years ago

      @Luca Hugh Instablaster :)

  • @turbocaveman
    @turbocaveman 2 years ago

    This is so coooool. It’s like saying here’s a cat, here’s a dog, here’s a mix of both.

    • @mucabi
      @mucabi 6 months ago

      It's exactly that. Basically it's the extension of MixUp data augmentation to the whole NN. Each layer has an input and an output, and each layer individually learns the best representation. Now we treat the latent representation of the previous layer (e.g. cat, dog) as our input and smooth it accordingly.
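A minimal NumPy sketch of the idea described in this thread, assuming a plain ReLU MLP given as a list of weight matrices. The names `mixup_pair` and `manifold_mixup_forward`, the `alpha` default, and the network layout are illustrative assumptions, not code from the paper or the video. A layer index is drawn at random (0 means ordinary input MixUp), the batch is interpolated with a shuffled copy of itself at that layer using a coefficient drawn from Beta(alpha, alpha), and the one-hot labels are interpolated with the same coefficient.

```python
import numpy as np

def mixup_pair(h, y, lam, perm):
    """Convexly combine a batch (and its labels) with a shuffled copy of itself."""
    return lam * h + (1 - lam) * h[perm], lam * y + (1 - lam) * y[perm]

def manifold_mixup_forward(x, y_onehot, weights, alpha=2.0, rng=None):
    """Forward pass of a hypothetical ReLU MLP with mixing at one random layer.

    weights: list of 2-D arrays, one per linear layer (biases omitted for brevity).
    Returns (logits, mixed_labels, chosen_layer, lam).
    """
    rng = np.random.default_rng() if rng is None else rng
    k = int(rng.integers(0, len(weights)))   # layer whose input gets mixed; 0 = raw input
    lam = float(rng.beta(alpha, alpha))      # interpolation coefficient in [0, 1]
    perm = rng.permutation(len(x))           # pairs each example with a random partner

    h, y = x, y_onehot
    for i, W in enumerate(weights):
        if i == k:
            h, y = mixup_pair(h, y, lam, perm)
        h = h @ W
        if i < len(weights) - 1:             # no nonlinearity after the output layer
            h = np.maximum(h, 0.0)
    return h, y, k, lam

# Usage on toy data (shapes are arbitrary assumptions).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = np.eye(3)[rng.integers(0, 3, size=8)]
weights = [rng.normal(size=(4, 16)), rng.normal(size=(16, 3))]
logits, y_mix, k, lam = manifold_mixup_forward(x, y, weights, rng=rng)
```

The returned logits would then be trained against `y_mix` with an ordinary cross-entropy loss. Mixing a batch with its own permutation is a common shortcut for drawing two separate minibatches.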

  • @frenchmarty7446
    @frenchmarty7446 2 years ago

    I agree with your point that not every layer (especially the lower layers) will or should be linearly separable.
    However, I think the objective of manifold mixup is to act more as a regularization penalty: a given layer should be non-linearly separable only insofar as the benefits (to accuracy) outweigh the penalty of mixup. The mixup adds a bias towards linearity but not a strict requirement.
    Like all regularization methods, there will probably have to be a lot more fine-tuning and testing before we know if, when and how it gives the right bias-variance trade-off.

  • @dermitdembrot3091
    @dermitdembrot3091 3 years ago +2

    If the bottleneck layer makes the data linearly separable it may as well just be the last hidden layer. In that case this seems to be a technique for making the last hidden representation not just linearly separable but well-spaced. And I think it would induce the softmax inputs to seek an area where softmax is approximately linear.

  • @yanjieze
    @yanjieze 2 years ago

    Thanks! Your paper explanation is really awesome!!!

  • @kevon217
    @kevon217 1 year ago

    great explanation. thanks!

  • @dude8309
    @dude8309 5 years ago +2

    Wow! Super interesting paper and great insights.

  • @levikok1810
    @levikok1810 4 years ago +1

    Great video, thanks a million!

  • @selfhelp119
    @selfhelp119 4 years ago

    amazing technique!

  • @ulissemini5492
    @ulissemini5492 3 years ago +11

    I like the video, but it's at 256 likes right now so I can't disturb the balance, sorry!

    • @DmitryRomanov
      @DmitryRomanov 3 years ago

      Now you can push towards 512 😁

  • @ЗакировМарат-в5щ
    @ЗакировМарат-в5щ 4 years ago

    As I understand it, this technique is also good for NN pruning

  • @EngineerNick
    @EngineerNick 3 years ago

    Thank you! :)

  • @rahuldeora5815
    @rahuldeora5815 5 years ago +2

    Nice

  • @AntonPanchishin
    @AntonPanchishin 5 years ago +1

    This video is another great Colab candidate. colab.research.google.com/drive/1qUDe3ENm3fnxND7iibyEF1Ixcw7nu4mK . Thanks again Yannic! Your video inspired me to create a colab ipython notebook that tested out this architecture. I love the concept! It was a pain to implement using Tensorflow Keras Layers. It does appear to help. I also decided that instead of just comparing it to a vanilla classifier that we could compare it to the "Worst" classifier from your other video about "Focusing on the Biggest Losers". Have a great weekend

  • @meditationMakesMeCranky
    @meditationMakesMeCranky 5 years ago +1

    I am not an expert, and I have not read the paper carefully, but this method seems more like a fancy data augmentation method rather than regularization.
    Also, there is something to be said about the spiral example: I personally think that batch norm does a very good job. It only looks not good enough because we, humans, are biased: we "know" the true representation from experience and by guessing the intentions of whoever made the dataset :)

    • @levikok1810
      @levikok1810 4 years ago +2

      Good point. I would say it's somewhere in between. You sort of create new 'averaged' samples to teach the model to be 'unsure' sometimes, and this way the model converges to a more stable representation.
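One way to make the 'unsure' intuition concrete is to look at the loss on an interpolated label. The sketch below is a NumPy illustration under assumed names (`log_softmax`, `soft_cross_entropy`) and made-up toy values; it is not taken from the paper. Because cross-entropy is linear in the target, the loss against a mixed label equals the same weighted mix of the two one-hot losses, so a 70/30 blend of two classes rewards a 70/30 prediction rather than a confident one.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def soft_cross_entropy(logits, targets):
    """Mean cross-entropy against targets that may be soft (rows sum to 1)."""
    return float(-(targets * log_softmax(logits)).sum(axis=-1).mean())

# Toy check with hypothetical values: one example, three classes.
logits = np.array([[2.0, 0.5, -1.0]])
y_a = np.array([[1.0, 0.0, 0.0]])   # e.g. "cat"
y_b = np.array([[0.0, 1.0, 0.0]])   # e.g. "dog"
lam = 0.7

mixed_loss = soft_cross_entropy(logits, lam * y_a + (1 - lam) * y_b)
split_loss = (lam * soft_cross_entropy(logits, y_a)
              + (1 - lam) * soft_cross_entropy(logits, y_b))
```

Here `mixed_loss` and `split_loss` agree up to floating-point error, which is why implementations can compute either form.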

    • @flightrisk7566
      @flightrisk7566 3 years ago

      @@levikok1810 that analogy reminds me of DINO and CutMix

  • @dimitriognibene8945
    @dimitriognibene8945 4 years ago

    So many new hyper parameters...