Stanford Seminar - Information Theory of Deep Learning, Naftali Tishby

  • Published: Jan 2, 2025

Comments • 29

  • @krasserkalle 6 years ago +129

    This is my personal summary:
    00:00:00 History of Deep Learning
    00:07:30 "Ingredients" of the Talk
    00:12:30 DNN and Information Theory
    00:19:00 Information Plane Theorem
    00:23:00 First Information Plane Visualization
    00:29:00 Mention of Critics of the Method
    00:32:00 Rethinking Learning Theory
    00:37:00 "Instead of Quantizing the Hypothesis Class, let's Quantize the Input!"
    00:43:00 The Information Bottleneck
    00:47:30 Second Information Plane Visualization
    00:50:00 Graphs for Mean and Variance of the Gradient
    00:55:00 Second Mention of Critics of the Method
    01:00:00 The Benefit of Hidden Layers
    01:05:00 Separation of Labels by Layers (Visualization)
    01:09:00 Summary of the Talk
    01:12:30 Question about Optimization and Mutual Information
    01:16:30 Question about Information Plane Theorem
    01:19:30 Question about Number of Hidden Layers
    01:22:00 Question about Mini-Batches

    • @clusteralgebra 5 years ago

      Thank you!

    • @zhechengxu121 5 years ago

      Bless your soul

    • @willjennings7191 4 years ago +1

      I have used your personal summary as a template for a section of my personal notes.
      Thank you very much!

  • @paritoshkulkarni6354 2 years ago +11

    RIP Naftali!

  • @FlyingOctopus0 6 years ago +13

    I wonder whether we can create better training algorithms based on this. For example, the effectiveness of dropout may be connected to this theory: dropout may introduce more randomness in the "diffusion" stage of training.
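    One way to make the "diffusion" stage concrete is to compare the mean of the mini-batch gradients with their standard deviation, the two quantities plotted in the graphs listed at 00:50:00 in the summary above. The sketch below is only an illustration under stated assumptions: a one-parameter linear model on synthetic data, with hypothetical helper names (`minibatch_gradients`, `gradient_snr`). It does not model dropout, and it is not the setup used in the talk.

```python
import random
import statistics


def minibatch_gradients(w, data, batch_size, rng):
    """Return one mean-squared-error gradient per mini-batch for the model y ~ w * x."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    grads = []
    for i in range(0, len(shuffled) - batch_size + 1, batch_size):
        batch = shuffled[i:i + batch_size]
        # d/dw of (w*x - y)^2, averaged over the batch
        g = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        grads.append(g)
    return grads


def gradient_snr(grads):
    """Signal-to-noise ratio: |mean| / standard deviation across mini-batches."""
    mean = statistics.fmean(grads)
    std = statistics.pstdev(grads)
    return abs(mean) / std if std > 0 else float("inf")


rng = random.Random(0)
# Synthetic data (an assumption): y = 3x + Gaussian noise
xs = [rng.uniform(-1, 1) for _ in range(512)]
data = [(x, 3.0 * x + rng.gauss(0, 1.0)) for x in xs]

snr_far = gradient_snr(minibatch_gradients(0.0, data, 16, rng))   # far from the optimum
snr_near = gradient_snr(minibatch_gradients(3.0, data, 16, rng))  # at the true slope

print(f"SNR far from optimum: {snr_far:.2f}")
print(f"SNR near optimum:     {snr_near:.2f}")
```

    Far from the optimum the gradient mean dominates the batch-to-batch noise; near the optimum the noise dominates. Any extra source of randomness, which is what the comment suggests dropout might be, would lower the ratio further.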

  • @phaZZi6461 5 years ago +2

    1:22:31 - thesis statement about how to choose mini batch size

  • @alexanderkurz2409 11 months ago

    11:30 "information measures are invariant to computational complexity"

  • @applecom1de509 6 years ago +2

    Aah this is so relaxing.. Thank you!

  • @alexkai3727 4 years ago +6

    I read another paper, "On the Information Bottleneck Theory of Deep Learning", published in 2018 by researchers at Harvard, and they hold a very different view. It seems it's still unclear how neural networks work.

    • @Checkedbox 3 years ago +2

      Is that the one he mentions at ~ 29:00 ?

  • @nickybutton2736 4 years ago

    Amazing talk, thank you!

  • @jaimeziratearzate 1 year ago

    Does anybody know how to show that the Gibbs distribution converges to the optimal IB bound?
    And what is the epsilon cover of a hypothesis class?
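    For reference, a sketch of the standard definitions behind both questions, taken from the usual information bottleneck formulation rather than from the talk itself. The symbols $T$ (the compressed representation), $\beta$, $Z$, $d$ and $\mathcal{H}_\epsilon$ are notation assumed here. The bottleneck minimizes a Lagrangian whose stationary points have a Gibbs (exponential) form, and an $\epsilon$-cover is a subset of hypotheses that approximates every member of the class to within $\epsilon$:

```latex
\begin{align}
\mathcal{L}\bigl[p(t \mid x)\bigr] &= I(X;T) - \beta\, I(T;Y), \\
p(t \mid x) &= \frac{p(t)}{Z(x,\beta)}
  \exp\!\Bigl(-\beta\, D_{\mathrm{KL}}\bigl[\,p(y \mid x)\,\big\|\,p(y \mid t)\,\bigr]\Bigr), \\
\mathcal{H}_\epsilon \subseteq \mathcal{H} \text{ is an } \epsilon\text{-cover}
  &\iff \forall h \in \mathcal{H}\;\; \exists h' \in \mathcal{H}_\epsilon :\; d(h, h') \le \epsilon .
\end{align}
```

    This only states the form of the optimum. It does not reproduce the talk's argument that training converges to it.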

  • @zessazzenessa1345 6 years ago +7

    "Learn to ignore irrelevant labels" yes intriguing..........

  • @paulcurry8383 3 years ago +1

    Anybody know what a “pattern” is in information theory?

  • @amirmn7 6 years ago +16

    Can he use deep learning to fix the audio problems of this video?

    • @DheerajAeshdj 3 years ago +2

      probably not because there are none

    • @AZTECMAN 3 years ago

      Seems like this was asked in jest, but it's actually a good question.

  • @julianbuchel1087 6 years ago +2

    When was this talk given? Has he published his paper yet? I found nothing online so far, but maybe I just didn't see it.

    • @Chr0nalis 6 years ago +16

      1) "Deep Learning and the Information Bottleneck Principle", 2) "Opening the Black Box of Deep Neural Networks via Information"

  • @AlexCohnAtNetvision 3 years ago +6

    such a loss… blessed be his memory

  • @dexterdev 3 years ago

    23:04

  • @minhtoannguyen1862 3 years ago

    44:25

  • @hanchisun6164 2 years ago +1

    This theory looks correct!
    When neural networks became popular, everybody in the scientific computing community was eager to describe them in their own language, and many achieved limited success. I think the information-theoretic description makes the most sense, because it finds simple information within complex data. It is like how humans think: we create abstract symbols that capture the essence of nature and reason logically with them, which suggests that the number of degrees of freedom behind the world is small, since the world is structured.
    Why did the ML community and industry not adopt this explanation?
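    The "simplicity from complexity" idea can be shown with a toy calculation. The sketch below is a minimal illustration, not something from the talk: the variables `X`, `Y`, `T` and the helper `mutual_information` are assumptions made for the example. A 3-bit input X is compressed to a 1-bit representation T that still keeps everything X knows about the label Y.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Mutual information in bits of the empirical joint distribution of (a, b) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2((c / n) / ((count_a[a] / n) * (count_b[b] / n)))
        for (a, b), c in joint.items()
    )


# Toy setup (an assumption): X is uniform on 0..7, the label Y is the parity
# of X, and the representation T keeps only the lowest bit of X.
X = list(range(8))
Y = [x % 2 for x in X]
T = [x & 1 for x in X]

h_x = mutual_information(list(zip(X, X)))   # I(X;X) = H(X)
i_xy = mutual_information(list(zip(X, Y)))  # information the input carries about the label
i_xt = mutual_information(list(zip(X, T)))  # information the representation keeps about the input
i_ty = mutual_information(list(zip(T, Y)))  # information the representation keeps about the label

print(h_x, i_xy, i_xt, i_ty)  # 3.0 1.0 1.0 1.0
```

    Here T discards two of the three input bits (I(X;T) = 1 bit against H(X) = 3 bits) without losing anything about the label (I(T;Y) = I(X;Y) = 1 bit), which is the kind of trade-off the information plane is meant to display.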

  • @absolute___zero 4 years ago

    Oooo! So it is SGD? If I hadn't listened to the Q&A session, I wouldn't have understood it at all. Now I do. Well, with second-order algorithms (like Levenberg-Marquardt) you won't need all these balls floating around to understand what's going on with your neurons. Gradient descent is the poor man's gold.

  • @binyuwang6563 6 years ago +5

    If the theories are true, maybe we can compute the weights directly without iteratively learning them via gradient descent.

    • @zessazzenessa1345 6 years ago

      Binyu Wang oh

    • @prem4708 5 years ago +13

      How so?

    • @Daniel-ih4zh 2 years ago

      I've been thinking about this a lot too. The weights are partly a function of the data, of course, and we also have things like the good regulator theorem that kind of point towards it. Also, a latent code and the learned parameters aren't distinguished in Bayesian model selection.