DeepMind x UCL RL Lecture Series - Model-free Prediction [5/13]

Google DeepMind

32K views

Published: Feb 4, 2025

Comments • 14

@intuitivej9327 3 years ago ⁺⁵
How lucky I am... It is a great lecture. It is fun and so understandable since it is well explained.
Thank you for sharing it with all of us.
@antoniomanjavacas1466 3 years ago ⁺¹²
Maybe it's just my humble impression, but I think examples like Bandits or Blackjack are not very intuitive for someone who is just getting into RL, but they are always used as canonical because they appear in Sutton & Barto 🤔
@marcin.sobocinski 2 years ago ⁺²
Unfortunately almost all tutorials, lectures, etc. are based on the Sutton & Barto book... which is... well... not very creative, to put it nicely. The book itself is not as good as it should be as an RL bible (for me there is too much historical background and proxy discussion with other RL "fathers"). Still waiting for another "bible" on this topic that would be much more practical and less "academic".
@jonas14812 2 years ago ⁺¹
Thank you so much for the amazing lecture!
@perrysdemos6062 3 years ago ⁺²
This was a great lecture, thank you :)
@vslaykovsky 2 years ago ⁺¹
15:27 Why are function approximators optimized with the mean squared error function (L2) by default? Banach's fixed-point theorem uses the L-infinity norm, which is closer to the L1 error function.
@nasirasadov634 3 years ago ⁺²
1:32:41 "Inception"
@mysunnyjune 2 years ago
I really appreciate the lecture and the effort, but all the formula development needs to be done much more rigorously.
@ayoghes2277 3 years ago
Is there any proof that TD converges to the maximum likelihood estimate of the Markov model, for the given data? If so could anyone direct me to it, please?
@juanmoreno9633 3 years ago
Hi!
Have you found it?
Thanks in advance.
@ayoghes2277 3 years ago
@@juanmoreno9633 Hello!
No, I have not. If you do find it, please let me know. Thank you.
@ivanily4 2 years ago
@@ayoghes2277 how about this? link.springer.com/content/pdf/10.1023/A:1022632907294.pdf
@theSpicyHam 3 years ago ⁺¹
It's good that this doesn't stray toward anything military-related at all.
