Offline Reinforcement Learning

  • Published: Nov 3, 2024

Comments • 12

  • @connor-shorten
    @connor-shorten  4 years ago +5

    1:27 What is Offline RL?
    2:40 Benefits of Offline RL
    3:50 Quick Recap of Q-Learning (see the sketch after this list)
    5:34 Challenge of Distribution Mismatch
    7:12 DQN Replay Dataset
    7:45 Ensemble-DQN and REM
    9:24 Impact of Replay Dataset Size
    9:50 Dataset Quality
    10:32 Datasets for Data-Driven RL
    11:02 Factors of Offline RL Datasets
    13:19 Offline RL and Model-based RL
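
    A minimal sketch of the tabular Q-learning update recapped at 3:50 (the learning rate and discount below are illustrative values, not taken from the video). In offline RL the transitions come from a fixed logged dataset rather than live interaction, which is where the distribution mismatch discussed at 5:34 comes from:

        import numpy as np

        # Q-learning update: Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
        def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
            """One temporal-difference update on a Q-table from a single (s, a, r, s') transition."""
            target = r if done else r + gamma * np.max(Q[s_next])
            Q[s, a] += alpha * (target - Q[s, a])
            return Q

        # Toy example: 5 states, 2 actions, one logged transition.
        Q = np.zeros((5, 2))
        Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=3, done=False)
        print(Q[0, 1])  # 0.1 after a single update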

  • @PeterOtt
    @PeterOtt 4 years ago +4

    Your video output is nuts! It's like three per week at this quality. I also love RL, so this was really cool to learn about. Offline RL is pretty clever: it squeezes more learning out of the data that's already available, opens up wider applications wherever data exists, and gives agents a broader range of experiences.
    PS: Can't wait to get my Henry AI Labs t-shirt!!

    • @connor-shorten
      @connor-shorten  4 years ago +2

      Thank you so much, I really appreciate your support and encouragement with this channel! I think offline RL is really interesting as well; I want to learn more about how RL can fine-tune chatbots and summarization models. There could be some overlap between how the Meena chatbot is trained and then giving it a long-term reward such as a user-rated conversation score.

  • @weichen1
    @weichen1 4 years ago +6

    Working on similar problems, I believe using offline RL first will make the model learn faster. But we still need to interact with the environment to refine and fill in the edge-case experiences, because a human agent might never encounter some of those cases.

    • @rbain16
      @rbain16 4 years ago +2

      I was thinking along the same lines. I wonder what would happen if the trained offline RL agent were allowed to interact with the environment, producing data that would train a new offline RL agent. I.e., what would happen if you alternated between learning an offline and an online agent?
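
      A rough, runnable sketch of the loop proposed in the two comments above: pretrain offline on logged data, let the resulting policy collect new experience, retrain offline, and repeat. The environment, the reward rule, and the collect_episodes/train_offline helpers are toy stand-ins invented for this illustration, not anything from the video:

          import random

          def collect_episodes(policy, n_episodes=10):
              """Online phase: roll the current policy out and log (s, a, r, s') tuples."""
              data = []
              for _ in range(n_episodes):
                  state = 0
                  for _ in range(5):                                 # short fixed-length episodes
                      action = policy(state)
                      reward = 1.0 if action == state % 2 else 0.0   # toy reward rule
                      next_state = (state + 1) % 5
                      data.append((state, action, reward, next_state))
                      state = next_state
              return data

          def train_offline(dataset):
              """Offline phase: derive a greedy policy from the logged data (here simply the
              action with the highest average observed reward in each state)."""
              totals, counts = {}, {}
              for s, a, r, _ in dataset:
                  totals[(s, a)] = totals.get((s, a), 0.0) + r
                  counts[(s, a)] = counts.get((s, a), 0) + 1
              def policy(state):
                  scored = [(totals[(state, a)] / counts[(state, a)], a)
                            for a in (0, 1) if (state, a) in counts]
                  return max(scored)[1] if scored else random.choice((0, 1))
              return policy

          # Start from random behaviour, then rotate offline training and online collection.
          dataset = collect_episodes(lambda s: random.choice((0, 1)))
          for _ in range(3):
              policy = train_offline(dataset)
              dataset += collect_episodes(policy)
          print(len(dataset), "logged transitions after three offline/online rounds")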

    • @weichen1
      @weichen1 4 years ago +2

      @rbain16 I think that would make the algorithm more robust, as shown with asynchronous advantage actor-critic (A3C) compared to A2C. Rotating between online and offline is like accumulating experiences asynchronously.

    • @connor-shorten
      @connor-shorten  4 years ago +1

      @weichen1 I think another beauty of it is that you can learn from other agents' data. Although I'd be surprised if distribution mismatch / the lack of importance sampling doesn't cause divergence in more complex environments!
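
      To make the distribution-mismatch / importance-sampling worry concrete, here is a toy one-state example of evaluating a target policy pi from data logged by a different behaviour policy mu; all the probabilities and rewards are made up for the illustration:

          import numpy as np

          rng = np.random.default_rng(0)

          mu = np.array([0.9, 0.1])           # behaviour policy that logged the data: mostly action 0
          pi = np.array([0.2, 0.8])           # target policy we want to evaluate: mostly action 1
          true_reward = np.array([0.0, 1.0])  # action 1 is the one that pays off

          actions = rng.choice(2, size=10_000, p=mu)   # logged actions are drawn from mu
          rewards = true_reward[actions]

          naive_estimate = rewards.mean()              # ignores the mismatch entirely
          weights = pi[actions] / mu[actions]          # importance-sampling ratios pi(a)/mu(a)
          is_estimate = (weights * rewards).mean()

          print(f"naive: {naive_estimate:.3f}, importance-sampled: {is_estimate:.3f}, true value under pi: 0.800")
          # The naive average stays near mu's value (~0.1) while the weighted estimate recovers
          # pi's value (~0.8); the catch is that the weights' variance explodes as pi and mu
          # drift apart, which is the divergence worry raised above.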

  • @DeepGamingAI
    @DeepGamingAI 4 years ago +4

    I don't think I understand why Offline RL is categorized under "Reinforcement" Learning and not simply Supervised Learning.

    • @rbain16
      @rbain16 4 years ago +2

      The artificial neural network parameterizes the action-value function (i.e., the Q-function), which comes from the reinforcement learning framework. The network is updated in a way that attempts to maximize reward over time (also from the RL framework), even if the network isn't the thing interacting with the environment at each time step. Hope that helps; someone correct me if I'm wrong.
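
      A sketch of that idea, under stand-in assumptions (a small table plays the role of the network, and a randomly generated list of transitions plays the role of a real logged dataset): the Q-values are fitted to reward-based TD targets computed entirely from the fixed dataset, with no environment interaction during training:

          import numpy as np

          rng = np.random.default_rng(1)
          n_states, n_actions = 4, 2
          Q = np.zeros((n_states, n_actions))   # parameters of the Q-function (a table here for simplicity)

          # A fixed offline dataset of (s, a, r, s') transitions logged by some other agent.
          dataset = [(int(rng.integers(n_states)), int(rng.integers(n_actions)),
                      float(rng.random()), int(rng.integers(n_states))) for _ in range(1000)]

          gamma, lr = 0.9, 0.05
          for _ in range(20):
              for s, a, r, s_next in dataset:
                  td_target = r + gamma * Q[s_next].max()   # reward signal comes from the logged data
                  Q[s, a] += lr * (td_target - Q[s, a])     # step that reduces the squared TD error
          print(Q)   # Q-values learned without ever touching the environment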

    • @DeepGamingAI
      @DeepGamingAI 4 years ago +3

      @rbain16 Oh, I see. So it uses the reward information in the offline training dataset, whereas in the supervised setting the actions taken by the human/expert system are used as targets for directly learning the policy π rather than a Q-function. Is that right? I guess I've been getting confused between Q-learning and π-learning (learning the policy directly).
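
      A toy contrast of the two setups on the same logged batch, to make that distinction concrete (the numbers are invented for the example): behavioural cloning fits the policy to the logged actions and never looks at rewards, while the Q-learning view fits action values to the rewards and can therefore prefer an action the logger rarely took:

          import numpy as np

          states  = np.array([0, 0, 1, 1])
          actions = np.array([0, 0, 0, 1])          # the logging policy almost always picked action 0
          rewards = np.array([0.0, 0.0, 0.0, 1.0])  # ...but action 1 was the one that earned reward

          # Supervised / behavioural-cloning target: most frequent logged action per state (rewards unused).
          bc_policy = {int(s): int(np.bincount(actions[states == s]).argmax()) for s in np.unique(states)}

          # One-step offline Q-values from the logged rewards (transitions treated as terminal for simplicity).
          Q = np.zeros((2, 2))
          for s, a, r in zip(states, actions, rewards):
              Q[s, a] += 0.5 * (r - Q[s, a])
          rl_policy = {s: int(Q[s].argmax()) for s in (0, 1)}

          print("behavioural cloning picks:", bc_policy)   # {0: 0, 1: 0} -- copies the majority action
          print("reward-based Q picks:     ", rl_policy)   # {0: 0, 1: 1} -- uses the reward signal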

    • @rbain16
      @rbain16 4 years ago +1

      That's pretty much correct :) That supervised policy would only ever be as good as the data.
      I am currently reading Sutton and Barto's RL book. I would highly recommend it as they've been leaders in this field for decades.