Lecture 7 | Acceleration, Regularization, and Normalization

  • Published: Oct 20, 2024

Comments • 4

  • @rashidkp123 • 4 years ago • +1

    Fantastic lecture; I found the last 5 minutes particularly mind-blowing. Thanks, Professor.

  • @ronmedina429 • 4 years ago

    Prof Bhiksha, in the slides it is said that mini-batch gradient descent results in a degradation of \sqrt{B}. I'm not sure this is correct. I think it should be calculated as O(1/\sqrt{Bk}) = O(1/\sqrt{B(t/B)}) = O(1/\sqrt{t}), where B is the batch size, k is the number of mini-batch updates, and t = Bk is the number of per-sample iterations. That is the same O(1/\sqrt{t}) decay rate per iteration as SGD.
    This can also be seen by counting the total updates needed for a given epsilon: mini-batch needs O(1/(B epsilon^2)) updates to reach epsilon-convergence, but each update costs B per-sample iterations, so the total is O(1/epsilon^2) iterations to reach epsilon-convergence, the same as SGD. Please correct me if there is a misunderstanding on my part. Thank you.
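
A sketch of the counting argument above in LaTeX, using the commenter's notation (B is the batch size, k the number of mini-batch updates, and t = Bk the total number of per-sample iterations):

```latex
% Error after k mini-batch updates of size B, rewritten in terms of t = Bk:
\[
  O\!\left(\frac{1}{\sqrt{Bk}}\right)
  \;=\;
  O\!\left(\frac{1}{\sqrt{B\,(t/B)}}\right)
  \;=\;
  O\!\left(\frac{1}{\sqrt{t}}\right),
\]
% i.e. the same O(1/sqrt(t)) decay per per-sample iteration as plain SGD, so no
% extra sqrt(B) degradation once the cost of each update is accounted for.
```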

  • @RedShipsofSpainAgain • 5 years ago

    14:01 We don't know, a priori, the *true* curve (blue line). How is this particular point chosen here (the yellow bar)? One would think intuitively that not all sample points are equally worthwhile; the points that have the most error (distance between predicted and actual curve) could also improve the curve the most if they're adjusted. In contrast, the points that are already pretty darn close to their true value aren't "worth" as much to correct/improve.
    So is there a way to know which of the sample points are going to improve the curve most? Because it would make sense to focus on those points first. Maybe this is like Bayesian optimization where you choose an x point that is furthest from two consecutive known points because that chosen point will give the most information gain.

    • @carnegiemellonuniversityde4339 • 5 years ago • +2

      You may want to read more into the idea of statistical leverage and what is considered a high-leverage point. This tells you, of the points sampled, which have the most effect on the outcome of the model. As a follow-up, you may then want to understand, first in the case of linear regression, how the projection matrix relates the leverage to the model errors. This may help give perspective.
      That leads to the following claim: a neural network with linear activations has its neurons effectively solving for this projection matrix between dimensions. Though this depends on your loss function, convincing yourself of this will help build your intuition of what is happening.
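
A minimal numpy sketch of the leverage idea in the reply above, assuming an ordinary-least-squares setting (the random data, variable names, and the 2(d+1)/n rule-of-thumb threshold are illustrative assumptions, not from the lecture). The hat (projection) matrix H = X (X^T X)^{-1} X^T maps the observed targets to the fitted values, and its diagonal entries are the per-sample leverages:

```python
import numpy as np

# Illustrative data: n samples, d features, plus an intercept column.
rng = np.random.default_rng(0)
n, d = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d))])

# Hat / projection matrix for ordinary least squares: H = X (X^T X)^{-1} X^T.
# It maps observed targets y to fitted values y_hat = H @ y.
H = X @ np.linalg.inv(X.T @ X) @ X.T

# The leverage of each sample is the corresponding diagonal entry of H.
leverage = np.diag(H)

# Rule-of-thumb (an assumption here, not from the lecture): flag points whose
# leverage exceeds 2 * (d + 1) / n as high-leverage.
threshold = 2 * (d + 1) / n
high_leverage_idx = np.where(leverage > threshold)[0]

print("mean leverage:", leverage.mean())          # equals (d + 1) / n
print("high-leverage points:", high_leverage_idx)
```

Points with leverage well above the mean (d + 1)/n are the ones whose target values pull the fitted curve most strongly toward themselves, which is one way to make "which sample points matter most" precise in the linear-regression case.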