I keep coming back to these videos because they are so useful! Please keep making them!
Thank you very much. At 8:34, for the distribution of activations at different layers, may I ask what the x, y, and z axes are, so that these 3D diagrams can be plotted?
Wow, such intensive work! Thank you! But I was actually waiting for you to mention that BN can help avoid the exploding/vanishing gradient issue. What do you think about this advantage?
Yes, that's definitely an advantage. Keeping the activations in a reasonable range helps with both of those problems.
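To make that concrete, here is a small standalone sketch (just an illustration I'm adding here, not code from the video): it stacks ten tanh layers and prints the activation standard deviation with and without a BatchNorm-style standardization. Without it, the signal shrinks layer by layer, which is exactly the regime where gradients vanish; with it, the scale stays roughly constant.

```python
import torch

torch.manual_seed(0)
x = torch.randn(256, 100)                                     # batch of 256 samples, 100 features
weights = [0.05 * torch.randn(100, 100) for _ in range(10)]   # 10 layers with smallish weights

h_plain, h_norm = x, x
for i, W in enumerate(weights):
    # Without normalization, the activation scale shrinks layer after layer.
    h_plain = torch.tanh(h_plain @ W)
    # With a BatchNorm-style standardization, it stays near unit variance.
    z = h_norm @ W
    h_norm = torch.tanh((z - z.mean(dim=0)) / (z.std(dim=0) + 1e-5))
    print(f"layer {i + 1:2d}: plain std = {h_plain.std():.1e}, normalized std = {h_norm.std():.2f}")
```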
One comment about BN with small batches: why not track a moving average of the BN statistics over the last few batches (so that the total number of samples is >= 64)? It sounds like that would solve the issue.
That's an interesting idea and would work as well (except, of course, for the first batch).
@SebastianRaschka I thought more about it, and I think it might not work. In the paper they stress that BN needs to be part of the optimization loop. If we use EMA values, that goes against this idea, because only a small part of the gradient is allowed to flow back, and we might see the kind of conflict between BN and optimization they describe. I hope what I'm saying makes sense. Still, I will give it a try and see how it goes.
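In case it helps, here is a rough PyTorch sketch of the EMA idea (my own sketch for this thread, not from the paper or the video; the class name, momentum value, and the 2D-input assumption are all made up). It also shows the point raised above: the running averages are stored as detached buffers, so gradients only flow through the current batch's contribution to the statistics, which is where the conflict with keeping BN inside the optimization loop could show up.

```python
import torch
import torch.nn as nn

class EMABatchNorm(nn.Module):
    """BatchNorm variant that normalizes with an exponential moving average
    of batch statistics instead of the current mini-batch alone
    (for 2D inputs of shape [batch, features])."""
    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_features))   # learnable scale
        self.beta = nn.Parameter(torch.zeros(num_features))   # learnable shift
        self.momentum, self.eps = momentum, eps
        self.register_buffer("ema_mean", torch.zeros(num_features))
        self.register_buffer("ema_var", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            batch_mean = x.mean(dim=0)
            batch_var = x.var(dim=0, unbiased=False)
            # Blend the current batch into the running estimates. The stored EMA
            # values are buffers (no grad), so only the current batch's share
            # of the statistics participates in backpropagation.
            mean = (1 - self.momentum) * self.ema_mean + self.momentum * batch_mean
            var = (1 - self.momentum) * self.ema_var + self.momentum * batch_var
            self.ema_mean = mean.detach()
            self.ema_var = var.detach()
        else:
            mean, var = self.ema_mean, self.ema_var
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta
```

Note that with the zero/one initialization the first few batches are dominated by the initial values, which is the "first batch" caveat mentioned above.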
Please share your list of papers; I would love to check them out.