Deep RL Bootcamp Lecture 4A: Policy Gradients

  • Published: 17 Jan 2025

Comments • 43

  • @naeemajilforoushan5784 · 9 months ago +5

    After 5 years, this lecture is still a great video, thank you a lot

  • @bhargav975 · 6 years ago +35

    This is the best lecture I have seen on policy gradient methods. Thanks a lot.

  • @jony7779 · 4 years ago +18

    Every time I forget how policy gradients work exactly, I just come back here and watch starting at 9:30

    • @andreasf.3930 · 4 years ago +3

      And every time you visited this video, you forgot where to start watching. That's why you posted this comment. Smart guy!

  • @auggiewilliams3565 · 5 years ago +1

    I must say that in more than 6 months, this is by far the best lecture/material I have come across that was able to make me understand what the policy gradient method actually is. I really praise this work. :) Thank you.

  • @marloncajamarca2793 · 6 years ago +3

    Great Lecture!!!! Pieter's explanations are just a gem!

  • @ericsteinberger4101 · 7 years ago +10

    Amazing lecture! Love how Pieter explains the math. Super easy to understand.

  • @ashishj2358 · 4 years ago

    Best lecture on Policy Gradients, hands down. It also covers some noteworthy high-level details from many papers.

  • @johnnylima1337 · 6 years ago +5

    It's such a good lecture, I keep stopping to ask myself why it was so easy to cover such significant information with full understanding.

  • @sharmakartikeya · 1 year ago

    I might be missing a simple concept here, but how are we increasing/decreasing the grad log probability of the actions using the gradient of U(theta)? I get that a positive return for a trajectory will make the gradient of U positive, and so theta will be adjusted in favour of those trajectories, but how is it increasing grad log prob?
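
    For anyone stuck on the same point, here is a brief worked sketch of the standard likelihood-ratio derivation, with U(theta) the expected return and tau a sampled trajectory (a sketch for reference, not a quote from the video):

        \nabla_\theta U(\theta)
            = \nabla_\theta \sum_\tau P(\tau;\theta)\, R(\tau)
            = \sum_\tau P(\tau;\theta)\, \nabla_\theta \log P(\tau;\theta)\, R(\tau)
            \approx \frac{1}{m} \sum_{i=1}^{m} R(\tau^{(i)})\, \nabla_\theta \log P(\tau^{(i)};\theta),
        \qquad \theta \leftarrow \theta + \alpha\, \nabla_\theta U(\theta).

    The update does not change grad log prob directly: it moves theta by a step proportional to R(tau) along \nabla_\theta \log P(\tau;\theta), which is the direction of steepest increase of \log P(\tau;\theta). So a trajectory with positive return ends up with a higher log-probability after the step, and one with negative return ends up with a lower one.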

  • @synthetic_paul · 4 years ago +5

    Honestly I can’t keep up without seeing what he’s pointing at. Gotta pause and search around the screen each time he says “this over here”

    • @akarshrastogi3682 · 4 years ago +2

      Exactly. "This over here" has got to be the most uttered phrase in this lecture. So frustrating.

  • @Рамиль-ц5о · 4 years ago

    Very good lecture about the policy gradient method. I have looked through a lot of articles and understood almost everything, but your derivation explanation is really the best. It just opened my eyes and showed the whole picture. Thank you very much!!

  • @bobsmithy3103 · 2 years ago

    Amazing work. Super understandable, concise, and information-dense.

  • @norabelrose198 · 2 years ago

    The explanation of the derivation of policy gradient is really nice and understandable here

  • @keqiaoli4617 · 4 years ago +1

    Why would a good "R" increase the probability of a path??? Please help me
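
    For anyone with the same question, a minimal, hypothetical PyTorch sketch of one REINFORCE update, showing mechanically how a positive return R raises the probability of the sampled path (the policy network, tensor shapes, and function name are illustrative assumptions, not code from the bootcamp labs):

        import torch

        def reinforce_step(policy, optimizer, states, actions, R):
            # states:  float tensor [T, obs_dim] for one trajectory
            # actions: long tensor  [T] of the actions actually taken
            # R:       scalar return of the trajectory (optionally minus a baseline)
            logits = policy(states)                          # [T, n_actions]
            log_probs = torch.log_softmax(logits, dim=-1)    # log pi(a | s_t) for every a
            taken = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)  # log pi(a_t | s_t)

            # Surrogate objective: R * sum_t log pi(a_t | s_t). Gradient ascent on it
            # (done here as descent on the negative) scales the step by R, so a good
            # return pushes theta toward a higher log-probability for this path.
            loss = -(R * taken.sum())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()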

  • @DhruvMetha · 3 years ago

    Wow, this is beautiful!

  • @JyoPari · 5 years ago +1

    Instead of having a baseline, why not make your reward function be negative for undesired scenarios and positive for good ones? Great lecture!
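
    For context on the baseline question above: subtracting a baseline b that does not depend on the action leaves the policy gradient unbiased, so it is a free variance reduction rather than a change to the objective; a quick sketch in the lecture's notation, for a constant b (the same argument extends to a state-dependent b(s) such as a value estimate):

        \mathbb{E}_\tau\!\left[\nabla_\theta \log P(\tau;\theta)\, b\right]
            = b \sum_\tau P(\tau;\theta)\, \nabla_\theta \log P(\tau;\theta)
            = b\, \nabla_\theta \sum_\tau P(\tau;\theta)
            = b\, \nabla_\theta 1 = 0.

    Hand-designing the reward to be negative for bad outcomes, by contrast, changes what is being optimized rather than just centering the returns.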

  • @ishfaqhaque1993 · 5 years ago

    23:20 - Gradient of expectation is expectation of gradient "under mild assumptions". What are those assumptions?

    • @joaogui1 · 5 years ago +2

      math.stackexchange.com/questions/12909/will-moving-differentiation-from-inside-to-outside-an-integral-change-the-resu
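
      For reference, a standard sufficient condition behind that "mild assumptions" remark (the dominated-convergence / Leibniz-rule form; a sketch, not a quote from the lecture or the linked answer):

          \nabla_\theta \int f(x;\theta)\,dx = \int \nabla_\theta f(x;\theta)\,dx
          \quad \text{if } f(\cdot;\theta) \text{ is integrable, } \nabla_\theta f(x;\theta) \text{ exists, and }
          |\nabla_\theta f(x;\theta)| \le g(x) \text{ for some integrable } g \text{ independent of } \theta.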

  • @dustinandrews89019 · 7 years ago +1

    I got a lot out of this lecture in particular. Thank you.

  • @biggeraaron · 6 years ago +1

    Where can I buy his T-shirt?

  • @emilterman6924 · 5 years ago

    It would be nice to see what labs (exercises) they had

    • @Procuste34iOSh · 4 years ago

      Don't know if you're still interested, but the labs are on the bootcamp website

  • @JadtheProdigy · 6 years ago

    Best lecturer in the series

  • @mcab2222 · 4 years ago +3

    Nice, but hard to follow without knowing what "this" refers to. I hope my guesses were right :)

  • @faizanintech1909 · 6 years ago

    Awesome instructor.

  • @isupeene · 4 years ago +2

    The guy in the background at 51:30

  • @ethanjyx · 5 years ago

    Wow, damn, this is so well explained, and the last video is very entertaining.

  • @suertem1 · 5 years ago

    Great lecture, thanks

  • @nathanbittner8307 · 7 years ago

    Excellent lecture. Thank you for sharing.

  • @ProfessionalTycoons · 6 years ago

    Great talk!

  • @richardteubner7364 · 7 years ago +1

    1:11 Why are DQNs and friends dynamic programming methods? I mean, the neural network works as a function approximator to satisfy Bellman's equation, but backprop is still the workhorse. In my opinion, DQNs are much more similar to PG methods than to Bellman updates?! And another issue with the RL Landscape slide: where the heck are the model-based RL algos?? That slide should be renamed the model-free RL landscape.

  • @elzilcho222 · 6 years ago +1

    Could you train a robot for 2 weeks in the real world, then use those trained parameters to optimize a virtual environment? You know, making the virtual environment very close to the real world?

    • @OfficialYunas · 6 years ago +1

      Of course you could. It's the opposite of what OpenAI does when they train a model in a virtual environment and deploy it in reality.

    • @soutrikband · 5 years ago

      The real world is very complicated, with model uncertainties, friction, wear and tear, and what have you.
      Simulators can come close, but we cannot expect them to fully mimic real-world phenomena.

  • @karthik-ex4dm · 6 years ago

    PG is awesome!!!
    It really doesn't depend on the environment dynamics?? Wow.
    All the pain and stress just goes away when we see our algorithms working 😇😇

  • @Diablothegeek · 7 years ago

    Awesome!! Thanks

  • @arpitgarg5172 · 5 years ago +11

    If you can't explain it like Pieter Abbeel or Andrew Ng, then you don't understand it well enough.

  • @piyushjaininventor · 6 years ago

    Can you share the PPT??

    • @luxorska5143 · 5 years ago +3

      You can find all the slides and the other lectures here:
      sites.google.com/view/deep-rl-bootcamp/lectures

  • @shaz7163 · 7 years ago

    very nice :)

  • @MarkoTintor · 4 years ago

    ... you can use "a", and the math will be the same. :)