Is A2C Different from PPO?

  • Published: 13 Sep 2024
  • We go through what PPO is, compare it with A2C, and highlight the differences and similarities. We look at both conceptually, do some maths, and compare them using Stable Baselines. Both A2C and PPO are policy gradient methods for reinforcement learning that have been popular recently and work really well for large-scale training.
    This is based on the paper "A2C is a special case of PPO", arxiv.org/abs/... .
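
As a rough illustration of the maths mentioned in the description, the sketch below compares the two policy losses for a single action using only the Python standard library. It assumes the usual textbook forms: the A2C policy loss -log pi(a|s) * A, and the PPO clipped surrogate -min(r * A, clip(r, 1 - eps, 1 + eps) * A) with r = pi_new(a|s) / pi_old(a|s). The function names, the probability 0.3, and the advantage 1.5 are made up for illustration and do not come from the video or the paper.

```python
import math


def a2c_policy_loss(logp_new, advantage):
    """A2C policy loss for one action: -log pi(a|s) * A."""
    return -logp_new * advantage


def ppo_clip_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO clipped surrogate loss for one action."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return -min(ratio * advantage, clipped_ratio * advantage)


def numerical_grad(f, x, h=1e-6):
    """Central finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


logp_old = math.log(0.3)  # illustrative log-probability under the old policy
advantage = 1.5           # illustrative advantage estimate

# First update on fresh data: the new policy equals the old one, so ratio = 1.
grad_a2c = numerical_grad(lambda lp: a2c_policy_loss(lp, advantage), logp_old)
grad_ppo_first = numerical_grad(
    lambda lp: ppo_clip_loss(lp, logp_old, advantage), logp_old
)

# A later epoch on the same data: the policy has moved, so the ratio exceeds 1 + eps.
logp_moved = logp_old + 0.5
grad_ppo_later = numerical_grad(
    lambda lp: ppo_clip_loss(lp, logp_old, advantage), logp_moved
)

print("A2C gradient:           ", grad_a2c)
print("PPO gradient, ratio = 1:", grad_ppo_first)
print("PPO gradient, clipped:  ", grad_ppo_later)
```

With these numbers the first two gradients agree up to finite-difference error, which is the sense in which a single-epoch PPO update looks like an A2C update. The third gradient is zero because the ratio (about 1.65) is above 1 + eps with a positive advantage, so the clipped term is selected and no longer depends on the policy.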

Comments • 9

  • @Zoutepepselmetkomkommer 4 months ago +1

    Just wanted to say you are the first person I found to try and explain PPO on YT, so progress!

  • @parttimelarry 1 year ago +1

    Just starting to go down this rabbit hole, thanks for making this channel. Cheers.

    • @rlhugh 1 year ago +1

      Thanks for being the first person to comment on this video, and almost the first person to comment on any of my recent videos :D Let me know if you have any questions/comments/concerns etc please.

    • @parttimelarry 1 year ago

      @rlhugh For sure. I am exploring reinforcement learning for trading at the moment and am starting from scratch. I've seen a lot of RL tutorials import A2C and PPO, but they kind of gloss over what they are, so I found this while searching for some context.

  • @C0ld5t4r 1 year ago +1

    More please :D Nice content 🤩

  • @Rookie_AI 1 year ago

    Hi, and what does this conclusion lead us to?

    • @vitaly1085 1 year ago +4

      Hi, my takeaways are that PPO is more general: you can configure PPO to behave as A2C and use that. You no longer need a separate A2C implementation in SB3, and that code could be deleted. You don’t need an ordinary knife when you have a Swiss Army knife, though when the ordinary knife is enough, it is what we use “more often” in a home kitchen.

    • @vitaly1085 1 year ago +4

      Also, PPO is potentially more sample efficient, since it can run multiple epochs over each batch, and it can also achieve higher reward and so on. In other words, it can’t be worse than A2C on any measure.

    • @Rookie_AI 1 year ago

      @vitaly1085 Hi, thanks for the clarification.
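
The point made in the replies, that PPO can be set up to behave as A2C, can be sketched as a set of PPO hyperparameters. This is a minimal sketch based on my reading of the settings described in "A2C is a special case of PPO" and of Stable-Baselines3's A2C defaults (5 rollout steps, learning rate 7e-4, no GAE, no advantage normalisation, RMSprop), so treat the exact values as assumptions to check against your installed version. The helper names `ppo_kwargs_matching_a2c` and `build_ppo_as_a2c` are hypothetical, and the second one needs `stable_baselines3` and `torch` installed.

```python
def ppo_kwargs_matching_a2c(n_envs, n_steps=5):
    """PPO keyword arguments intended to mimic A2C (assumed values, see lead-in)."""
    return dict(
        learning_rate=7e-4,           # constant rate, so no annealing
        n_steps=n_steps,              # short rollouts, as in A2C
        batch_size=n_envs * n_steps,  # one full-batch update, no minibatches
        n_epochs=1,                   # single pass, so the ratio stays at 1
        gae_lambda=1.0,               # lambda = 1 turns GAE into plain n-step returns
        normalize_advantage=False,    # A2C does not normalise advantages by default
        clip_range_vf=None,           # no value-function clipping
    )


def build_ppo_as_a2c(env_id="CartPole-v1", n_envs=4):
    """Build an SB3 PPO model configured like A2C (illustrative, not verified here)."""
    # Imported lazily so the helper above works without SB3 or torch installed.
    import torch
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env

    env = make_vec_env(env_id, n_envs=n_envs)
    policy_kwargs = dict(
        optimizer_class=torch.optim.RMSprop,  # A2C's default optimizer family
        optimizer_kwargs=dict(alpha=0.99, eps=1e-5, weight_decay=0),
    )
    return PPO(
        "MlpPolicy",
        env,
        policy_kwargs=policy_kwargs,
        **ppo_kwargs_matching_a2c(n_envs),
    )


kwargs = ppo_kwargs_matching_a2c(n_envs=4)
print(kwargs)
```

If the libraries are installed, something like `build_ppo_as_a2c().learn(total_timesteps=10_000)` would train the model. Raising `n_epochs` above 1 is what lets PPO reuse each batch, which is the sample-efficiency argument made above; whether that actually helps on a given task is an empirical question.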