Clearly one of the best videos on the topic, the use of examples was really good.
😂
Unusually clear presentation; well done Alexander.
You are one of the best teachers I have ever seen.
Is there a typo at 10:01? Intuitively, it seems like the exponent of γ should be (i - t), since in the current formulation the reward terms will quickly go to 0 when t becomes large.
Yes, I think the coefficient of r[t] should be gamma^0, which is 1 here.
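For reference, the discounted return with that corrected exponent (assuming the standard definition of total discounted reward) would read:

```latex
R_t = \sum_{i=t}^{\infty} \gamma^{\,i-t} r_i
    = r_t + \gamma\, r_{t+1} + \gamma^2\, r_{t+2} + \cdots
```

so the current reward r_t indeed carries a coefficient of gamma^0 = 1, and rewards further in the future are discounted more heavily.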
At 34:35, how do I calculate the log-likelihood of the action given the state?
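In case it helps, here is a minimal sketch for a discrete action space, assuming a PyTorch-style policy network (the network, sizes, and names are illustrative, not from the lecture):

```python
import torch
import torch.nn.functional as F

# Illustrative policy network: maps a state vector to one logit per action.
policy_net = torch.nn.Sequential(
    torch.nn.Linear(4, 64),   # assume a 4-dimensional state
    torch.nn.ReLU(),
    torch.nn.Linear(64, 2),   # assume 2 discrete actions
)

state = torch.randn(1, 4)                             # one observed state s_t
log_probs = F.log_softmax(policy_net(state), dim=-1)  # log pi(a | s_t) for every action

action = torch.tensor([1])             # the action a_t that was actually taken
log_likelihood = log_probs[0, action]  # log pi(a_t | s_t)
```

That log pi(a_t | s_t) is the quantity that gets weighted by the discounted return in the policy-gradient loss.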
فوق العاده بود آقای امینی
What is max Q(s', a')? When I have a lot of future states and they are unknown, how can I determine the max Q(s', a')? 24:00
Sample your network again with the new state.
Here, s' is the next state and a' is the next action. Max Q(s', a') is the maximum of the Q-values over all possible actions in that next state. In that equation, the left term is an estimated "actual" value of the future reward: the sum of the current reward and the (discounted) value of the next best action.
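A minimal sketch of that target computation, assuming a small PyTorch Q-network (the architecture and dimensions are illustrative):

```python
import torch

# Illustrative Q-network: state in, one Q-value per action out.
q_net = torch.nn.Sequential(
    torch.nn.Linear(4, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 2),
)

gamma = 0.99                      # discount factor
reward = torch.tensor([1.0])      # r, observed after taking a in s
next_state = torch.randn(1, 4)    # s'

with torch.no_grad():
    # max over a' of Q(s', a'): run the network on s' and keep the best action's value
    max_next_q = q_net(next_state).max(dim=-1).values

# Bellman target: r + gamma * max_a' Q(s', a')
target = reward + gamma * max_next_q
```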
If your action space is uncountable or continuous, Q-learning models are a poor fit.
Fantastic, very clear and concise. Great work!
At 36:02, does anyone know what theta is? Is it a policy?
Theta represents all of the weights of the neural network policy (pi): a network that takes the state (s_t) as input and outputs the likelihood of taking each action (a_t).
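A toy sketch of that idea, assuming a small PyTorch network (sizes and names are illustrative):

```python
import torch

# Illustrative policy network pi: state s_t in, one probability per action out.
pi = torch.nn.Sequential(
    torch.nn.Linear(4, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 2),
    torch.nn.Softmax(dim=-1),
)

# "theta" is simply the collection of all trainable weights of this network:
theta = list(pi.parameters())

state = torch.randn(1, 4)   # s_t
action_probs = pi(state)    # pi_theta(a_t | s_t): a probability for each action
```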
Very well explained. How can I get the slides? The link in the bio says coming soon!
Very good lecture. Just one thing: I don't understand how the policy is created (maybe Alexander showed it with the laser pointer, but it isn't visible in the slides).
Great video. The whole series is very good
Very good way of explaining.
I love you guys!
Thank you all for these great videos. One thing I want to mention is that the audio volume is a little too low.
Outstanding. Thank you.
Can you also teach us how to write code for it?
It was very clear and helpful.
Amazing video... kind of Reinforcement Learning in a nutshell.
This is really good. Thank you!
Thank you so much guys.
Excellent tutorial indeed
Thank you!
What a really nice course!
Good course, thank you
7:00
Wow, thanks.
Dude it's awesome T^T
Please increase the sound level.
WHAT?!
This is why Terminator is so fake... the AI would learn not to miss a shot within 20 minutes.