Hey, can you please upload videos on causal analysis, or can you suggest some books to get started with it?
You mean causal inference?
I suggest you refer to econometrics textbooks; in economics we are pretty strong on that.
- "Mostly Harmless Econometrics" by Angrist and Pischke for graduate-level applied stats (pure statisticians would find it undergrad level)
- "Econometric Analysis of Cross Section and Panel Data" by Wooldridge
@@matteogirelli1023 What do you suggest for someone who wants to apply causal inference to my problem domain, which is climate science and earth science? I just got started by reading Causality by Judea Pearl.
Statistical Rethinking introduces it nicely; the whole lecture series is on YouTube.
@@djpremier333 Thank you, I will check out their playlist.
Just subscribed! I love the level you teach at in your videos. It’s slightly above the level of Statsquest but not too dense that I need to mentally prepare before watching. (No shade to Statsquest, two random events can be independently great).
Thanks for the great video 💪👏 Will there be more RL videos coming? E.g. I would like to understand more about how to set up reward functions. How do I weight rewards from different actions against each other? And how would we set up the environment in the model-based approach? And more.
Can you do RStan in R for Bayesian stats, case by case?
Great video! Thanks for sharing. I have a question though. I am new to the subject, so I am having trouble understanding the last step in your derivation (29:00), the "very very easy thing to do". Would you be kind enough to point me to where I can find more information about that? Thanks!
Wow, best explanation ever. I think if you made a course out of this, it would be the best out there. Thanks for sharing.
Hi, I've been trying to use multiple sources to find the proof of this theorem. However, none of them use the product rule (for the derivative, time: 20:22). Can you please share a resource that does include the product rule?
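For anyone else stuck here: the product rule enters when differentiating the trajectory probability, which is a product over time steps. Most references sidestep it with the equivalent log-derivative trick, which may be why it is hard to find spelled out. A sketch in standard policy-gradient notation (which may not match the video's exactly):

```latex
% The trajectory probability is a product over time steps, so
% differentiating it directly requires the product rule:
\nabla_\theta P_\theta(\tau)
  = \nabla_\theta \prod_{t=0}^{T-1} \pi_\theta(a_t \mid s_t)\,
      p(s_{t+1}, r_{t+1} \mid s_t, a_t)
% Most texts instead apply the log-derivative trick, which is the
% product rule in disguise: the log turns the product into a sum,
\nabla_\theta P_\theta(\tau)
  = P_\theta(\tau)\, \nabla_\theta \log P_\theta(\tau)
  = P_\theta(\tau) \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t),
% where the dynamics terms vanish because they do not involve \theta.
```

Expanding the product rule term by term and dividing each term by its own factor recovers exactly the sum of log-gradients above, so the two derivations are the same computation.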
Can you please make a full playlist on reinforcement learning? No one explains the math as simply as you do. Also, please do a separate video going into greater mathematical detail proving the theorem, kind of like Numberphile2.
Thanks Ritvik. Still having some trouble with why the reward R does not depend on theta, mathematically. In your tree diagram, all the rewards are ±1/p, dependent (in their absolute quantity) only on the state, but also dependent (in their sign, which seems non-negligible in a reward system) on a theta-based choice (H or L). Are you able to describe in a different way the intuition behind why d/dθ log P(R, S | S, A) = 0?
Maybe better said: what's the nuance (or obvious principle) that allows consideration of an explicit variable in a derivative with respect to that variable, but disallows consideration of an implicit variable (i.e. one further back in the causal chain)? Thanks again, your channel rocks!
Hey! Excellent question, and it isn't obvious in any sense. The key lies in the fact that this is a conditional probability rather than an unconditional one. If we removed the conditions on P(R, S | S, A) so it is just P(R, S), then this absolutely does depend on the policy parameters theta, and we can measure that dependency by tracing through the causal diagram. However, by using a conditional probability we take the previous state and previous action as given, at which point the probabilities for the next state and next reward are fixed and do not depend on the policy. Please let me know if that helps!
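To put this answer in symbols (a sketch in generic policy-gradient notation, which may differ slightly from the video's):

```latex
% The log of the trajectory probability splits into a sum of terms:
\log P_\theta(\tau) = \log \rho(s_0)
  + \sum_{t=0}^{T-1} \Big[ \log \pi_\theta(a_t \mid s_t)
  + \log p(s_{t+1}, r_{t+1} \mid s_t, a_t) \Big]
% Conditioned on (s_t, a_t), the dynamics term contains no \theta, so
\nabla_\theta \log p(s_{t+1}, r_{t+1} \mid s_t, a_t) = 0,
% even though, unconditionally, *which* (s_t, a_t) you reach
% does depend on \theta through the policy terms.
```

So theta appears explicitly only in the policy factors; the environment dynamics are a fixed function of (state, action), which is exactly the "taken as given" point above.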
@@ritvikmath Ah, that makes perfect sense. Thank you.
The best explanation of Policy Gradient methods I've seen!
I've been trying to understand Reinforcement learning and policy gradient methods always tripped me up. Thank you for making this video
Thank you! :)
What a beautiful explanation. Thank you
You are very welcome
Great summary. I have a question: why is the state not dependent on the reward?
Awesome explanation! Thanks.
Thanks A million 🎉
Great summary! As good as LLMs are at answering questions, we still need smart people like you to get us thinking of the right questions.
Thanks!
LLMs are pretty bad at answering questions.