1:27 What is Offline RL?
2:40 Benefits of Offline RL
3:50 Quick Recap of Q-Learning
5:34 Challenge of Distribution Mismatch
7:12 DQN Replay Dataset
7:45 Ensemble-DQN and REM
9:24 Impact of Replay Dataset Size
9:50 Dataset Quality
10:32 Datasets for Data-Driven RL
11:02 Factors of Offline RL Datasets
13:19 Offline RL and Model-based RL
Your video output is nuts! It's like 3 per week with such quality. Also, I love RL, so this was really cool to learn about. It's pretty clever: squeezing more learning out of the data that's already available opens up wider applications and gives agents a broader range of experiences.
PS: Can't wait to get my Henry AI Labs t-shirt!!
Thank you so much, I really appreciate your support and encouragement with this channel!! I think Offline RL is really interesting as well; I want to learn more about how RL can fine-tune chatbots and summarization. I think there could be some overlap between how the Meena chatbot is trained and then trying to give it a long-term reward such as a user-rated conversation score.
Working on similar problems, I believe using offline RL at first will make the model learn faster. But we still need to interact with the environment to refine and complete the edge-case experiences, because a human agent might never encounter some cases.
I was thinking along the same lines as you. I wonder what would happen if the trained offline RL agent was allowed to interact with the environment, producing data that would train a new offline RL agent. In other words, what would happen if you switched back and forth between training an offline and an online agent?
@rbain16 I think that would make the algorithm more robust, as shown by asynchronous advantage actor-critic (A3C) compared to A2C. Rotating between online and offline is like accumulating experiences asynchronously.
@weichen1 I think part of the beauty is also that you can learn from other agents. Although I'd be surprised if distribution mismatch / a lack of importance sampling doesn't cause divergence in more complex environments!
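To sketch what that back-and-forth could look like in practice, here is a rough Python outline of the alternation being discussed. `OfflineQLearner` and `collect_rollouts` are hypothetical placeholders (not from the video or any particular library); the point is just the shape of the loop:

```python
# Hypothetical sketch of alternating offline and online phases.
# OfflineQLearner and collect_rollouts are placeholder names, not a real API.

def iterated_offline_rl(env, initial_dataset, n_rounds=5, rollouts_per_round=100):
    """Alternate between fitting an agent offline and letting it gather fresh data."""
    dataset = list(initial_dataset)   # (state, action, reward, next_state, done) tuples
    agent = OfflineQLearner()

    for _ in range(n_rounds):
        # Offline phase: fit the agent on everything collected so far.
        agent.fit(dataset)

        # Online phase: deploy the current agent to collect new transitions,
        # which can cover edge cases the original behavior policy never reached.
        new_transitions = collect_rollouts(env, agent.policy(), n_episodes=rollouts_per_round)
        dataset.extend(new_transitions)

    return agent
```

Whether this converges or diverges in complex environments is exactly the distribution-mismatch question raised above.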
I don't think I'm understanding why Offline RL is categorized under "Reinforcement" Learning and not simply Supervised Learning
The artificial neural network parameterizes the action-value function (i.e. Q function), which comes from the reinforcement learning framework. The network is updated in a way that attempts to maximize reward over time (also from the RL framework), even if the network isn't the thing interacting with the environment at each time step. Hope that helps, someone correct me if I'm wrong.
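To make that concrete, here is roughly what a single offline Q-learning (DQN-style) update looks like. This is a minimal PyTorch-flavored sketch, with `q_net`, `target_net`, and `batch` as assumed inputs rather than anything specific from the video:

```python
# Minimal sketch of one offline Q-learning update from a fixed, logged dataset.
import torch
import torch.nn.functional as F

def offline_dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch  # tensors from the logged dataset

    # Q(s, a) for the actions that were actually logged.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped target: reward plus a discounted max over the network's own
    # action-values, so learning is driven by reward maximization, not imitation.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q

    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key point is that the target is built from the logged reward and the bootstrapped Q-values, so the update maximizes expected return even though no new environment interaction happens.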
@rbain16 Oh I see. So it uses the information about rewards in the offline training dataset, whereas in the supervised setting the actions taken by the human/expert system are used as targets for directly learning the policy π rather than the Q-function. Is that right? I guess I have been getting confused between Q-learning and π-learning.
That's pretty much correct :) That supervised policy would only ever be as good as the data.
I am currently reading Sutton and Barto's RL book. I would highly recommend it as they've been leaders in this field for decades.
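For contrast with the Q-learning update sketched earlier in the thread, here is what the purely supervised (behavior-cloning) alternative would look like under the same assumed tensor conventions; again a rough sketch, not anything from the video:

```python
# Behavior cloning: imitate the logged actions, ignore rewards entirely.
import torch.nn.functional as F

def behavior_cloning_update(policy_net, optimizer, batch):
    states, actions, _rewards, _next_states, _dones = batch  # rewards are simply unused

    # Cross-entropy against the expert's logged action: the learned policy can only
    # ever be as good as the behavior that generated the data.
    logits = policy_net(states)
    loss = F.cross_entropy(logits, actions)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The difference between the two loss targets is exactly why offline RL still counts as reinforcement learning: it optimizes for reward, not for matching the dataset's actions.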