Adversarial Examples Are Not Bugs, They Are Features

  • Published: 1 Aug 2024
  • Abstract:
    Adversarial examples have attracted significant attention in machine learning, but the reasons for their existence and pervasiveness remain unclear. We demonstrate that adversarial examples can be directly attributed to the presence of non-robust features: features derived from patterns in the data distribution that are highly predictive, yet brittle and incomprehensible to humans. After capturing these features within a theoretical framework, we establish their widespread existence in standard datasets. Finally, we present a simple setting where we can rigorously tie the phenomena we observe in practice to a misalignment between the (human-specified) notion of robustness and the inherent geometry of the data.
    Authors: Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry
    arxiv.org/abs/1905.02175
  • Science
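
A rough illustration of the "brittle" features the abstract describes: a minimal L-infinity PGD attack sketch (not code from the paper; the model, the [0, 1] pixel range, and the 8/255 budget are assumptions here) that finds a human-imperceptible perturbation which nonetheless flips the classifier's prediction.

```python
# Minimal PGD sketch, assuming a PyTorch classifier `model`, inputs in [0, 1],
# and an L-infinity budget eps; all of these are placeholders for illustration.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Return x + delta with ||delta||_inf <= eps that increases the loss on label y."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        # Ascend the loss along the sign of the gradient, then project back
        # into the eps ball around x and into the valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv
```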

Comments • 12

  • @AndreiMargeloiu
    @AndreiMargeloiu 4 years ago +14

    You have a gift for explaining things clearly! Keep up the excellent work!

  • @anheuser-busch
    @anheuser-busch 4 years ago +7

    Thank you for doing this, I can't tell you enough how much I appreciate these videos!

  • @ramshankarsivakumar3620
    @ramshankarsivakumar3620 4 years ago +2

    Thank you, Yannic! Extremely well explained, and I enjoyed your spot-on critique of the paper as well. Looking forward to seeing more such content. Kudos! Subscribed to your channel.

  • @wolfisraging
    @wolfisraging 5 years ago +1

    Great explanation. Thanks

  • @lapalacayim
    @lapalacayim 4 years ago +1

    Helps a lot!!! Thx

  • @simleek6766
    @simleek6766 5 years ago +3

    I wonder if combining image pyramids with transformer networks would make the small features less useful than the larger ones, or make them more independent, kind of like in the "Processing Megapixel Images" paper.
    In the image-pyramid case, larger features would show up somewhere in the most shrunken image and in several of the larger images, while smaller features would show up only at the bottom and might or might not be part of the larger features. I think recognizing images this way could improve recognizing drawings of cats after only seeing photos of real cats.

    • @YannicKilcher
      @YannicKilcher  5 years ago

      This is a very valid thought. The counterpoint would be that if there is a signal that generalizes well, a good classifier will pick up on it, regardless of how well you "hide" it. I don't know which effect would outweigh the other; I guess it's at least worth a shot.
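
As an aside on the image-pyramid idea in the thread above, here is a minimal sketch (assuming PyTorch; the function name and level count are illustrative, not taken from any paper) in which fine, high-frequency features survive only at the bottom level while coarse structure appears at every level.

```python
# Hypothetical image-pyramid sketch: average pooling acts as a crude low-pass
# filter before each 2x downsampling, so small features are progressively removed.
import torch
import torch.nn.functional as F

def image_pyramid(x, levels=4):
    """Return [x, x downsampled 2x, 4x, ...] for an (N, C, H, W) batch."""
    pyramid = [x]
    for _ in range(levels - 1):
        x = F.avg_pool2d(x, kernel_size=2)
        pyramid.append(x)
    return pyramid

# Each level could then be tokenized separately (e.g. by a shared patch embedding)
# and the per-level tokens fed jointly to a transformer encoder.
```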

  • @akhilezai
    @akhilezai 3 years ago +1

    omg thank you very much!

  • @rpcruz
    @rpcruz 5 years ago +1

    Very interesting paper. But it seems unnecessary to go all the way and create a new robust dataset... Why didn't they simply train a classifier and add a penalty term to the loss function that makes the first layers invariant to small changes of x (i.e. to perturbations along dy/dx)?

    • @rpcruz
      @rpcruz 5 years ago +4

      Forget what I said. I see there is work on adversarial training that already does what I suggested. This was more of a theoretical work, which is why they decided to modify the images themselves: to show which parts of the image were fooling the classifier.

    • @thinknotclear
      @thinknotclear 3 years ago +3

      @rpcruz I'm interested in the work you mentioned. Could you share the title of that paper? Thank you!
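
On the adversarial-training work mentioned in this thread (e.g. Madry et al., "Towards Deep Learning Models Resistant to Adversarial Attacks"), a minimal sketch of the idea follows, assuming a PyTorch model, optimizer, and data loader, and using a single FGSM step as the inner maximization rather than full PGD.

```python
# Sketch of adversarial training: perturb each batch to maximize the loss within
# an L-infinity ball, then update the model on the perturbed batch.
# `model`, `loader`, `optimizer`, and `eps` are placeholders for illustration.
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=8 / 255):
    model.train()
    for x, y in loader:
        # Inner step (single FGSM step): craft an adversarial example for the current model.
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        x_adv = (x + eps * grad.sign()).clamp(0.0, 1.0).detach()

        # Outer step: standard gradient update on the adversarial batch.
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        optimizer.step()
```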