He is the only guy for whom I have to watch a RUclips video at normal speed. The best thing about Andrej is that he gives a new perspective on concepts you already know very well.
Amazing lecture, thank you, Andrej! (I'm 4 years late but still hahaha).
00:00 Intro
01:05 Pong example high-level explanation (from his blog "Pong from pixels")
11:10 Q&A
19:13 Pong example detailed walk-through
33:10 Q&A
Some notes:
* He calls the input to the sigmoid "logp", when it's actually a logit, not a log probability
* Numbers on his chart, before the positive reward, are +0.27, +0.24, etc., but they should be +0.9, +0.81 instead (if gamma == 0.9; see the sketch after this list).
* Some minor typos, like "backpro prelu" instead of "backprop relu" xD, etc.
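For the chart numbers, here is a minimal sketch of how the discounted return propagates backwards through an episode, assuming gamma = 0.9 and a single +1 reward at the end (the function name and toy rewards are just illustrative, not taken from the talk):

```python
import numpy as np

def discount_rewards(r, gamma=0.9):
    """Walk backwards through the episode, accumulating the discounted return."""
    discounted = np.zeros_like(r, dtype=float)
    running = 0.0
    for t in reversed(range(len(r))):
        running = r[t] + gamma * running
        discounted[t] = running
    return discounted

# Three neutral steps followed by a positive reward:
print(discount_rewards(np.array([0.0, 0.0, 0.0, 1.0])))
# -> [0.729 0.81  0.9   1.   ]  (not +0.27, +0.24 as drawn on the chart)
```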
Karpathy somehow manages to talk at 5x the speed of Mnih
Andrej is a superstar in the field. It's always a pleasure listening to Andrej.
This lecture is Gold
Andrej Karpathy is so humble that I want to be like him one day. Thanks for the great lecture, Andrej.
And I used to think that I understood Policy Gradients. Nice lecture!
Best interpretation of Policy Gradients
Andrej should make money rolling out his own Khan academy-like business, his explanations are just brilliant!
totally, just incredible piece-by-piece explanation
What a lecture... blown away!
this is pure gold thank you so much !!
A policy gradient explanation couldn't be simpler than this.
Great lecture, and he speaks very well.
I may be dumb, but how is np.dot(W2,h) a log probability? It can be seen at 2:59.
I think it's meant to be logit_p, suggesting that it's a quantity that is going to be mapped to a probability by the sigmoid function. It's certainly confusing to name a variable like this. He even wrote the comment wrong, as he briefly mentions at 3:09.
The input to the sigmoid function is often called a "logit". See Hands-On ML, 2nd edition, page 144.
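To make the logit vs. log-probability point concrete, here is a rough sketch of the forward pass. The layer sizes and variable names are my own stand-ins, roughly matching the 80x80-pixel, 200-hidden-unit setup described in the blog post; the key point is that np.dot(W2, h) is an unbounded score, and only after the sigmoid does it become a probability:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical shapes, roughly matching the network in the talk/blog post:
H, D = 200, 80 * 80                  # hidden units, flattened input size
W1 = np.random.randn(H, D) * 0.01
W2 = np.random.randn(H) * 0.01
x = np.random.randn(D)               # stand-in for the preprocessed difference frame

h = np.maximum(0.0, np.dot(W1, x))   # ReLU hidden layer
logit = np.dot(W2, h)                # unbounded score -- the "logp" variable on the slide
p_up = sigmoid(logit)                # only now is it a probability of moving the paddle up
```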
I wish there were two Karpathys, one for Computer Vision and another for RL.
Maybe adding a moving average over the last predictions could make the moves smoother and less jittery (a quick sketch below).
Ah and great lecture! :)
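If anyone wants to try the smoothing idea above, a simple exponential moving average over the per-frame probabilities would look something like this (beta and the random stand-in data are hypothetical, not from the talk):

```python
import numpy as np

beta = 0.8                            # hypothetical smoothing factor
raw = np.random.rand(100)             # stand-in for per-frame P(up) predictions
smooth = np.empty_like(raw)
smooth[0] = raw[0]
for t in range(1, len(raw)):
    smooth[t] = beta * smooth[t - 1] + (1 - beta) * raw[t]
```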
Why is the learning rate in the RMSProp formula -alpha, but in Andrej's code he uses +alpha?
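My understanding (not stated in the talk itself) is that the textbook RMSProp formula is written for minimizing a loss, while the policy gradient update ascends the expected reward, so the sign on the step flips. A rough sketch with made-up shapes:

```python
import numpy as np

alpha, decay, eps = 1e-3, 0.99, 1e-5

def rmsprop_step(W, grad, cache, maximize=True):
    """One RMSProp update; maximize=True flips the sign for gradient ascent."""
    cache[:] = decay * cache + (1 - decay) * grad**2
    step = alpha * grad / (np.sqrt(cache) + eps)
    # Textbook RMSProp minimizes a loss: W - step.
    # Policy gradients ascend the expected reward, hence the +alpha in the code.
    return W + step if maximize else W - step

# Toy usage:
W = np.random.randn(4, 3) * 0.01
cache = np.zeros_like(W)
grad = np.random.randn(4, 3)          # stand-in for grad of (log prob * advantage)
W = rmsprop_step(W, grad, cache)
```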
This is fantastic.
Just found out that he is the Director of AI at Tesla.
Andrej simultaneously seems to want to answer questions, and stop answering questions as soon as possible.
Great talk if watched at 0.75x.
How does it work for a continuous action space? For example, if the steering angle is 23 degrees and it's a successful episode, then there is no error between 23 (label) and 23 (output), and therefore there's no gradient.
In a discrete one you can minimize the error between 0.7 (probability of up) and 1 (up), I get that. But if it's a continuous action space, I don't see how this works. Do you have to discretize, so that instead of one steering angle you have a probability value for each of the 360 degree angles?
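As I understand it (this isn't covered in the talk), you don't have to discretize: the network outputs the parameters of a distribution over the action, e.g. the mean of a Gaussian over the steering angle, the executed action is sampled from it, and the REINFORCE gradient of the log-density gives a learning signal because the sampled action almost never equals the mean exactly. A toy sketch with made-up numbers:

```python
import numpy as np

mu, sigma = 23.0, 2.0                    # network outputs the mean; std fixed here
action = np.random.normal(mu, sigma)     # sampled steering angle actually executed
R = 1.0                                  # episode return (an advantage in practice)

# log pi(a|s) = -(a - mu)^2 / (2 sigma^2) + const, so:
grad_mu = (action - mu) / sigma**2       # d log pi / d mu
mu += 1e-2 * R * grad_mu                 # nudge the mean toward actions that paid off
```

If sigma is also learned, it gets its own gradient term, but the idea is the same: the "error" is between the sampled action and the mean, weighted by how well the episode went.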
Can somebody give me the link to the code?
Sheldon :D
Of course, using an abstract representation rather than raw pixels would be helpful. Pong can be looked at as a simple Newtonian physics environment.
Pepega
Pepega indeed my friend...