18. Information Theory of Deep Learning. Naftali Tishby

Stanford Seminar - Information Theory of Deep Learning, Naftali Tishby

Stanford Online

86K views

Published: Jan 2, 2025

Comments • 29

@krasserkalle 6 years ago ⁺¹²⁹
This is my personal summary:
00:00:00 History of Deep Learning
00:07:30 "Ingredients" of the Talk
00:12:30 DNN and Information Theory
00:19:00 Information Plane Theorem
00:23:00 First Information Plane Visualization
00:29:00 Mention of Critics of the Method
00:32:00 Rethinking Learning Theory
00:37:00 "Instead of Quantizing the Hypothesis Class, let's Quantize the Input!"
00:43:00 The Information Bottleneck
00:47:30 Second Information Plane Visualization
00:50:00 Graphs for Mean and Variance of the Gradient
00:55:00 Second Mention of Critics of the Method
01:00:00 The Benefit of Hidden Layers
01:05:00 Separation of Labels by Layers (Visualization)
01:09:00 Summary of the Talk
01:12:30 Question about Optimization and Mutual Information
01:16:30 Question about Information Plane Theorem
01:19:30 Question about Number of Hidden Layers
01:22:00 Question about Mini-Batches
@clusteralgebra 5 years ago
Thank you!
@zhechengxu121 5 years ago
Bless your soul
@willjennings7191 4 years ago ⁺¹
I have used your personal summary as a template for a section of my personal notes.
Thank you very much!
@paritoshkulkarni6354 2 years ago ⁺¹¹
RIP Naftali!
@FlyingOctopus0 6 years ago ⁺¹³
I wonder if we can create better training algorithms based on this. For example, the effectiveness of dropout may be connected to this theory: dropout may introduce more randomness in the "diffusion" stage of training.
@phaZZi6461 5 years ago ⁺²
1:22:31 - thesis statement about how to choose mini batch size
@alexanderkurz2409 11 months ago
11:30 "information measures are invariant to computational complexity"
@applecom1de509 6 years ago ⁺²
Aah this is so relaxing.. Thank you!
@alexkai3727 4 years ago ⁺⁶
I read another paper, "On the Information Bottleneck Theory of Deep Learning", published in 2018 by Harvard researchers, and they hold a very different view. It seems it's still unclear how neural networks work.
@Checkedbox 3 years ago ⁺²
Is that the one he mentions at ~29:00?
@nickybutton2736 4 years ago
Amazing talk, thank you!
@jaimeziratearzate 1 year ago
Does anybody know how to show that the Gibbs distribution converges to the optimal IB bound?
And what is the epsilon cover of a hypothesis class?
@zessazzenessa1345 6 years ago ⁺⁷
"Learn to ignore irrelevant labels" yes intriguing..........
@paulcurry8383 3 years ago ⁺¹
Anybody know what a “pattern” is in information theory?
@amirmn7 6 years ago ⁺¹⁶
Can he use deep learning to fix the audio problems of this video?
@DheerajAeshdj 3 years ago ⁺²
probably not because there are none
@AZTECMAN 3 years ago
Seems like this was asked in jest, but it's actually a good question.
@julianbuchel1087 6 years ago ⁺²
When was this talk given? Has he published his paper yet? I found nothing online so far, but maybe I just didn't see it.
@Chr0nalis 6 years ago ⁺¹⁶
1) Deep Learning and the Information Bottleneck Principle, 2) Opening the Black Box of Deep Neural Networks via Information
@AlexCohnAtNetvision 3 years ago ⁺⁶
such a loss… blessed be his memory
@dexterdev 3 years ago
23:04
@minhtoannguyen1862 3 years ago
44:25
@hanchisun6164 2 years ago ⁺¹
This theory looks correct!
When neural networks became popular, everybody in the scientific computing community eagerly wanted to describe them in their own language. Many achieved limited success. I think the information theory view makes the most sense, because it finds simplicity of information in the complexity of data. It is like how humans think. We create abstract symbols that capture the essence of nature and reason logically with them, which means that the degrees of freedom behind the world should be few, since the world is structured.
Why did the ML community and industry not adopt this explanation?
@absolute___zero 4 years ago
Oooo! So it is SGD? If I hadn't listened to the Q&A session, I wouldn't have understood it at all. Now I do. Well, with second-order algorithms (like Levenberg-Marquardt) you won't need all these balls floating around to understand what's going on with your neurons. Gradient descent is the poor man's gold.
@binyuwang6563 6 years ago ⁺⁵
If the theories are true, maybe we can compute the weights directly without iteratively learning them via gradient descent.
@zessazzenessa1345 6 years ago
Binyu Wang oh
@prem4708 5 years ago ⁺¹³
How so?
@Daniel-ih4zh 2 years ago
I've been thinking about this a lot too. The weights are partly a function of the data, of course, and we also have things like the good regulator theorem that kinda point towards it. Also, a latent code and the learned parameters aren't distinguished in Bayesian model selection.
