Tom Goldstein: "An empirical look at generalization in neural nets"

  • Published: 17 Nov 2024

Comments • 17

  • @mohammedrayyan1007 · 3 years ago · +11

    Absolute gold for developing intuition. Many thanks, professor.

  • @kellymoses8566 · 3 months ago

    When you laid out the mystery of how neural networks generalize so well, I really wanted to know the answer.

  • @ankitkumarpandey7262 · 6 months ago

    Absolute gem of a video!!

  • @adokoka · 3 years ago · +1

    Great presentation! Refreshing to go through a math concept without maths :)

  • @andre192818 · 3 years ago · +3

    Great talk and nice slides, thanks! I am wondering: if it is so difficult to find narrow, sharp minima, why do SVMs find them?

    • @Kaslor1000 · 3 years ago · +1

      I think simple models like SVMs without non-linear kernels just lack the power or expressiveness to generate loss landscapes that even contain large-margin flat minima, since they simply can't linearly separate the data. In some sense, all minima in SVMs are equally bad? (A small sketch of this appears after the comments.)

  • @fedorsmirnov439 · 3 years ago

    Thanks a lot for the awesome talk! I really wish more talks in the area of deep learning would prioritize intuition and general understanding over mathematical proofs.

  • @HamsterSword · 3 years ago

    Absolutely fantastic video, thank you!

  • @mihirmongia7787 · 2 years ago

    amazing talk

  • @Kaslor1000 · 3 years ago · +1

    Fantastic talk.

  • @denysivanov3364 · 3 years ago

    Really good lecture, thanks!

  • @TheKirillfish · 3 years ago · +1

    Very interesting! So this kinda explains why CNNs tend to overfit on textures or otherwise seek simpler visual features: it leads to wider margins and flatter minima.
    Also, it seems that for the example at 49:30 we need either more data for the inner red circle (and naturally get wider margins for the desirable minimum) or stronger priors. On the latter point: a "circular pattern" is geometrically much simpler than the cherry-picked formation with arcs that the model found, and we already have 2 circular formations in the image. Can we integrate these sorts of priors into the loss function via an attention mechanism or some other way? Can transformers do that?

    • @Kaslor1000 · 3 years ago

      I guess it would be nice if NNs could somehow "zoom" into the dataset in this case, to artificially make linear separability with a wide margin easier (more likely). Not sure how to implement that, though.

  • @arturodeza3816 · 3 years ago

    Spectacular Talk!

  • @Kaslor1000 · 3 years ago

    So, we know wide-margin minima are good and that they are easy to find when they exist, but I guess the question remains: why do wide-margin flat minima exist in the first place? My bet is that current networks tend to contain at least a few wide layers, wide layers produce high-dimensional outputs, and linear separability is easier in higher dimensions (a small sketch of this appears after the comments). Also, I think that the deeper a network is, the more likely it is that the data becomes easily separable at some layer (and therefore a wider-margin minimum can exist), since layers near the end tend to represent higher-level features.

  • @zwrogoli666 · 3 years ago

    Great talk! Thanks for sharing.

  • @sagemaninsky3979 · 3 years ago

    Extremely useful
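
On the exchange between @andre192818 and @Kaslor1000 above: a minimal sketch of the point that a purely linear SVM has no good minimum to find on data it cannot linearly separate. It uses scikit-learn's make_circles as a stand-in for the talk's toy example; the dataset choice and hyperparameters are illustrative assumptions, not anything from the talk.

```python
# Sketch: on concentric circles, a linear SVM has no separating solution at
# all (it sits near chance accuracy), while an RBF-kernel SVM, i.e. a
# non-linear feature map, separates the classes easily. Assumes scikit-learn.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

X, y = make_circles(n_samples=1000, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = LinearSVC(C=1.0, max_iter=10000).fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)

print("linear SVM accuracy:", linear_svm.score(X_test, y_test))   # roughly chance
print("RBF-kernel SVM accuracy:", rbf_svm.score(X_test, y_test))  # near 1.0
```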
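And on @Kaslor1000's later comment about wide layers and high-dimensional separability: a sketch of the same idea using one wide, untrained random-ReLU layer in front of a linear SVM. The width of 512 and the random-feature construction are assumptions for illustration, not a mechanism claimed in the talk.

```python
# Sketch: lifting the circles data through one wide, random ReLU layer
# (a random high-dimensional feature map) already makes it linearly
# separable, so a wide-margin linear solution can exist downstream.
# Assumes numpy and scikit-learn.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X, y = make_circles(n_samples=1000, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One untrained "wide layer": random weights and biases, ReLU, width 512.
W = rng.normal(size=(2, 512))
b = rng.normal(size=512)

def relu(z):
    return np.maximum(z, 0.0)

H_train, H_test = relu(X_train @ W + b), relu(X_test @ W + b)

clf = LinearSVC(C=1.0, max_iter=10000).fit(H_train, y_train)
print("linear SVM on 512-D random-ReLU features:", clf.score(H_test, y_test))  # near 1.0
```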