My vision for mountain car (no coding in this video!)

  • Published: 13 Sep 2024

Comments • 13

  • @MobyMotion
    @MobyMotion 11 months ago

    I got that far! What I’d love more of is videos like the one that showed us the benefits of incorporating entropy regularisation. I want to learn all the tips and tricks that you can throw at RL when it’s not behaving as you want, and you’re one of the only channels I’ve seen that talks about that. Really helpful :)

  • @merv893
    @merv893 1 year ago

    Great vid, hope this makes your day.

  • @Drwildy
    @Drwildy 5 months ago +1

    I liked this video because it made me really think... OK, why do Policy Iteration and Value Iteration NOT work for the mountain car problem?
    But it also made me look into Q-learning, because many others have solved this problem with Q-learning, so why DOES Q-learning work?
    Well, Q-learning updates its estimates based not only on the current reward but also on the estimated future rewards. This makes it adept at handling environments where rewards are sparse and delayed, as it can effectively propagate value back from the rare occurrences of actually receiving a reward.
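
    A minimal tabular Q-learning sketch that illustrates this backup, assuming gymnasium's MountainCar-v0 and a simple grid discretization of position and velocity (the bin counts and hyperparameters are illustrative assumptions, not taken from the comment):

    import numpy as np
    import gymnasium as gym

    env = gym.make("MountainCar-v0")
    n_bins = (18, 14)  # position bins, velocity bins (arbitrary choice)
    low, high = env.observation_space.low, env.observation_space.high

    def discretize(obs):
        # Map the continuous (position, velocity) observation to integer grid indices
        ratios = (obs - low) / (high - low)
        idx = (ratios * (np.array(n_bins) - 1)).astype(int)
        return tuple(np.clip(idx, 0, np.array(n_bins) - 1))

    Q = np.zeros(n_bins + (env.action_space.n,))
    alpha, gamma, eps = 0.1, 0.99, 0.1

    for episode in range(5000):
        s, _ = env.reset()
        s = discretize(s)
        done = False
        while not done:
            # Epsilon-greedy action selection
            a = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(Q[s]))
            obs, r, terminated, truncated, _ = env.step(a)
            s2 = discretize(obs)
            done = terminated or truncated
            # Q-learning backup: current reward plus discounted estimate of future return
            target = r + (0.0 if terminated else gamma * np.max(Q[s2]))
            Q[s + (a,)] += alpha * (target - Q[s + (a,)])
            s = s2

    The key line is the target that bootstraps on gamma * max Q[s2]: once any episode reaches the flag, that value leaks backwards through neighbouring states on later visits, which is the "backpropagation" of sparse reward described above.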

  • @elijahberegovsky8957
    @elijahberegovsky8957 1 year ago +2

    I’m trying to solve it right now using proximal policy optimisation with random network distillation (no reward shaping, just the curiosity module), and my goodness is it hard. It might just be an issue of hyperparameters, but it took around 1.5M timesteps just to get to the flag once, and after 7.5M it was still taking on average ~150 steps to get to the flag. The agent quickly explores everything easily accessible, the predictor network just sort of learns everything and stops giving rewards, and then the agent is pretty much back at square one. I want to build Never Give Up on top of it, but I wonder if RND is enough with appropriate parameter tweaking.
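
    For reference, the core of random network distillation is just an intrinsic reward equal to the prediction error between a fixed, randomly initialised target network and a predictor trained to imitate it. A minimal PyTorch sketch of that piece, assuming the 2-dimensional MountainCar observation (the network sizes, and how the bonus would be scaled and combined with the extrinsic reward in PPO, are assumptions rather than the commenter's actual setup):

    import torch
    import torch.nn as nn

    def make_net(obs_dim, out_dim=64):
        # Small MLP used for both the fixed target and the trained predictor
        return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

    obs_dim = 2                      # MountainCar observation: (position, velocity)
    target = make_net(obs_dim)       # fixed, randomly initialised network
    predictor = make_net(obs_dim)    # trained to imitate the target
    for p in target.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

    def intrinsic_reward(obs_batch):
        # Curiosity bonus = predictor's error on these states; novel states give large error
        with torch.no_grad():
            y = target(obs_batch)
            err = ((predictor(obs_batch) - y) ** 2).mean(dim=-1)
        return err                   # scale and add to the extrinsic reward for PPO

    def update_predictor(obs_batch):
        # As the predictor learns a region of state space, the bonus there shrinks
        y = target(obs_batch)
        loss = ((predictor(obs_batch) - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    The failure mode described above corresponds to intrinsic_reward shrinking everywhere the agent can already reach: once the predictor matches the target on those states, the bonus vanishes and so does the pressure to explore further.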

  • @dmsovetov
    @dmsovetov 6 months ago

    Hi, so how is it going? Did you manage to solve this problem with A2C?

  • @2theorists
    @2theorists 2 years ago

    Hi, fantastic content :) I am a bit new to the field of RL and would appreciate it if we could go through some project-based videos (explanation + code)

    • @rlhugh
      @rlhugh  2 years ago

      Yes, definitely. Any particular preferences for what kind of projects you'd be most interested in?

    • @2theorists
      @2theorists 2 years ago

      @@rlhugh Like some videos in which you solve mountain car with code explanations side by side. I really like your depth of knowledge, being a deep learning practitioner myself :)

    • @rlhugh
      @rlhugh  2 years ago

      @@2theorists awesome. Sounds great :)

  • @elijahberegovsky8957
    @elijahberegovsky8957 1 year ago

    By the way, have you actually coded it up after this video? I’d be very interested to hear what method ended up working

    • @rlhugh
      @rlhugh  1 year ago

      No, I started trying more mainstream videos, which crashed and burned, to be honest :P Anyway, I'm kind of dabbling in using Unity as an environment for now, and seeing where that takes me.

    • @rlhugh
      @rlhugh  1 year ago

      It's possible I should come back to the mountain car idea. I seem to be getting a fair few comments on this video (relative to my other videos :P )

    • @rlhugh
      @rlhugh  1 year ago +1

      Making YouTube videos is actually like a kind of RL, tbh. Very sparse rewards. The reward assignment problem is very challenging. Huge, combinatorially large action space...