DeepMind x UCL RL Lecture Series - Policy-Gradient and Actor-Critic methods [9/13]

  • Published: 3 Feb 2025

Comments •

  • @jedrzej8442
    @jedrzej8442 10 months ago +1

    Another great lecture! Every year this lecture gets more and more polished, thank you Hado and DeepMind! :)

  • @vslaykovsky
    @vslaykovsky 2 years ago +9

    I find it amusing that after building up a strong mathematical foundation behind policy-gradient methods, we end up with Cacla, which is essentially a heuristic driven by some reasonable intuition.
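
    For reference, a minimal sketch of the Cacla-style update being referred to (tabular, scalar actions; the step sizes and exploration noise are illustrative, not taken from the lecture):

```python
import numpy as np

# Sketch of a Cacla-style actor-critic update (van Hasselt & Wiering):
# the critic learns state values with TD(0); the actor's deterministic
# output is pulled toward the executed action only when the TD error is
# positive, i.e. only the *sign* of the TD error is used.

def cacla_update(actor, critic, s, a, r, s_next,
                 gamma=0.99, alpha_actor=0.01, alpha_critic=0.1):
    delta = r + gamma * critic[s_next] - critic[s]   # TD error
    critic[s] += alpha_critic * delta                # critic: TD(0) step
    if delta > 0:                                    # actor: only on improvement
        actor[s] += alpha_actor * (a - actor[s])     # move toward executed action
    return delta

# Usage: explore with Gaussian noise around the actor's output.
rng = np.random.default_rng(0)
actor, critic = np.zeros(5), np.zeros(5)             # 5 states, scalar actions
s, s_next, r = 0, 1, 1.0
a = actor[s] + 0.1 * rng.standard_normal()            # exploratory action
cacla_update(actor, critic, s, a, r, s_next)
```

    The "heuristic" remark comes from the fact that only the sign of the TD error decides whether the actor moves toward the executed action; the magnitude of the improvement is discarded.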

  • @blueeagle6890
    @blueeagle6890 3 years ago +3

    Great presentation. Thanks! Here’s a question: It is stated in the presentation (time: 1:03:49) that the correlation between an action at time T+N and rewards obtained at an earlier time T is zero (and the corresponding terms in the summation are deleted). Is this true? It seems to be based on the assumption that a reward does not depend on future actions. However, the policy gradient equation is about correlation, not causality, and correlation does not require causality. A reward at time T may be correlated with the action taken at time T, which may be correlated with the state at time T+N, which is correlated with the action at time T+N. So there might be a correlation between the action at time T+N and the reward at time T. Here’s another, more intuitive example: An episode may contain multiple repeating identical or substantially similar states and actions before it is terminated (for instance, a game where the player and ball are in the same situation multiple times during an episode). Therefore, an action selected at time T+N is likely to be correlated with, and indicative of, a reward obtained at the earlier time T, if there is a repeating identical scenario at time T and time T+N. I would be grateful for any clarification. Thanks!

    • @timr8222
      @timr8222 2 years ago

      You have to take the expectation; the term is only zero in expectation, not for individual trajectories.
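
      A worked version of this argument, roughly in the lecture's notation (H_{t+n} is shorthand here for the history up to and including S_{t+n}): for a reward R_t obtained before the action A_{t+n}, the tower rule gives

```latex
\begin{align*}
\mathbb{E}\!\left[\nabla_\theta \log \pi_\theta(A_{t+n}\mid S_{t+n})\, R_t\right]
  &= \mathbb{E}\!\left[ R_t\, \mathbb{E}\!\left[ \nabla_\theta \log \pi_\theta(A_{t+n}\mid S_{t+n}) \,\middle|\, H_{t+n} \right] \right]
     \qquad\text{($R_t$ is determined by $H_{t+n}$)} \\
  &= \mathbb{E}\!\left[ R_t \sum_a \pi_\theta(a\mid S_{t+n})\, \nabla_\theta \log \pi_\theta(a\mid S_{t+n}) \right] \\
  &= \mathbb{E}\!\left[ R_t\, \nabla_\theta \sum_a \pi_\theta(a\mid S_{t+n}) \right]
   = \mathbb{E}\!\left[ R_t\, \nabla_\theta 1 \right] = 0 .
\end{align*}
```

      So the sample-level correlation the question points to can indeed be nonzero, but the expected product vanishes because the score has zero mean conditioned on the state.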

  • @Adam-yh6oe
    @Adam-yh6oe 3 years ago +4

    Amazing lectures Hado!

  • @DFM-b1n
    @DFM-b1n 4 months ago

    Around 1:02:00, why don’t we take the rewards into account when computing p(tau)? In my opinion we should, but presumably, since the probabilities P(R|S,A) do not depend on theta, they have no effect on the gradient, so the equalities on the slide are still true.
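
    That reading is correct; a quick check (the factorisation below is the standard MDP one, written here for reference rather than copied from the slide):

```latex
\begin{align*}
p_\theta(\tau) &= p(S_0) \prod_{t \ge 0} \pi_\theta(A_t \mid S_t)\, p(S_{t+1}, R_{t+1} \mid S_t, A_t), \\
\nabla_\theta \log p_\theta(\tau)
  &= \sum_{t \ge 0} \nabla_\theta \log \pi_\theta(A_t \mid S_t)
   + \underbrace{\sum_{t \ge 0} \nabla_\theta \log p(S_{t+1}, R_{t+1} \mid S_t, A_t)}_{=\,0,\ \text{independent of } \theta} .
\end{align*}
```

    Whether or not the rewards are written into p(tau), their terms drop out of the score, so the equalities on the slide hold either way.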

  • @johntanchongmin
    @johntanchongmin 2 years ago +2

    Great lecture! Very clear and very understandable! Thanks so much:)

  • @k-bala-vignesh
    @k-bala-vignesh 3 years ago +2

    very useful, thank you

  • @prostohodim
    @prostohodim 2 years ago

    Hi! Could you please explain the part at ruclips.net/video/y3oqOjHilio/видео.html? I think we should drop the expectation on the right side, because we already sum over the probabilities. Tell me if I’m wrong.

    • @prostohodim
      @prostohodim 2 years ago

      @a17benayad got it, thanks

  • @anirudhsilverking5761
    @anirudhsilverking5761 3 years ago +1

    Subtitles please!