Solving the Walking Robot Problem with Reinforcement Learning

  • Published: 4 Aug 2024
  • This video shows how to use the reinforcement learning workflow to get a bipedal robot to walk. It also looks at how to modify the default example so it is set up more like a traditional control problem by adding a reference signal, and it considers how an RL agent can replace parts of a traditional control system rather than serving as an end-to-end design. Finally, it shows some of the limitations of this design.
    You can find the example model used in this video in the MATLAB Central File Exchange: bit.ly/2HBxe79
    Watch our full video series about Reinforcement Learning: • Reinforcement Learning
    By the end of this series, you’ll be better prepared to answer questions like:
    - What is reinforcement learning and why should I consider it when solving my control problem?
    - How do I set up and solve the reinforcement learning problem?
    - What are some of the benefits and drawbacks of reinforcement learning compared to a traditional controls approach?
    Artificial intelligence, machine learning, deep neural networks. These are terms that can spark your imagination of a future where robots are thinking and evolving creatures.
    Check out these other resources:
    - Reinforcement Learning by Sutton and Barto: bit.ly/2HAYbb4
    - Reinforcement Learning Course by David Silver: • RL Course by David Sil...
    - Reinforcement Learning Toolbox: bit.ly/2YjuAYa
    - Deep Reinforcement Learning for Walking Robots: • Deep Reinforcement Lea...
    Check out the individual videos in the series:
    • What Is Reinforcement Learning?: • What Is Reinforcement ...
    • Understanding the Environment and Rewards: • Understanding Reinforc...
    • Policies and Learning Algorithms: • Reinforcement Learning...
    • The Walking Robot Problem: • Solving the Walking Ro...
    • Overcoming the Practical Challenges: • Overcoming the Practic...
    • An Introduction to Multi-Agent Reinforcement Learning: • Introduction to Multi-...
    • Why Choose Model-Based Reinforcement Learning?: • Why Choose Model-Based...
    --------------------------------------------------------------------------------------------------------
    Get a free product trial: goo.gl/ZHFb5u
    Learn more about MATLAB: goo.gl/8QV7ZZ
    Learn more about Simulink: goo.gl/nqnbLe
    See What's new in MATLAB and Simulink: goo.gl/pgGtod
    © 2019 The MathWorks, Inc. MATLAB and Simulink are registered trademarks of The MathWorks, Inc.
    See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.

Comments • 36

  • @joshchen7763 · 4 years ago · +2

    Amazing! The combination of RL and a control model is impressive.

  • @alexcargill3602 · 1 year ago · +1

    Thank you, Brian!! I'm starting my master's and I want to get involved in this area of control!! Love your videos😊

  • @jingchengpang4248 · 2 years ago · +1

    It's a really good demonstration for getting started with reward design.

  • @ramisketcher2069 · 5 years ago · +1

    You are AWESOME!

  • @chun-weikong7086 · 3 years ago · +3

    Dear Douglas, thank you for these fantastic videos. I am now trying to control a drone with a neural network controller, but I am wondering: what should be included in the observation states?

  • @kvlnnguyieb9522 · 1 year ago

    really useful and simple

  • @alexandrpetrov1110 · 3 years ago

    Thanks!

  • @kekeke1201 · 4 years ago

    Theoretically, can the result of the running program, including the learning structure, be put on a 1-to-1 real-life machine? I mean, just like people learn through virtual simulators first and then practice in real life. So the hope would be that the real-life machine could start off quite well already, with minor tweaks during real-life training. Would that be possible?

  • @akinyaman · 1 year ago

    I wonder a lot: in which program are these models tested? Because there is physics going on, like gravity, I think.

  • @young4810 · 4 years ago

    There is a problem when I run walkingRobotRL3D: Variable 'agent' has been deleted from base workspace.

  • @kwinvdv · 5 years ago · +2

    Are there more convenient ways of training a robot with reinforcement learning such that you can guarantee stability for any initial condition, or for a very large set of possible disturbances, without having to add more scenarios to the training set? For example, in more traditional control approaches you could try to find a Lyapunov function for the first question.

    • @BrianBDouglas · 5 years ago · +3

      As far as I know, there aren't any formal methods to guarantee stability for learned deep neural networks. The next video in this series will cover some ways to improve robustness and stability (but as you say it comes from adding more scenarios to the training set). Another approach that I mention in the next video is to use RL as an optimization tool for the parameters in a traditional control system architecture. Then you can use the existing formal tools to analyze the resulting system. If anyone reading this comment knows of formal methods for stability with RL or any current research that is trying to advance this topic, please respond and let me know where I can read up on it. Thanks!
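
      A minimal sketch of that second idea with Reinforcement Learning Toolbox (the model name 'gainTuning', the block path, and all dimensions and limits below are hypothetical placeholders): the agent's action is a small vector of controller gains rather than raw joint torques, so the resulting closed loop can still be analyzed with classical tools.

          % Sketch only: action = [Kp; Ki; Kd] for a fixed-structure PID controller.
          obsInfo = rlNumericSpec([3 1]);                 % e.g. tracking error, its integral, its derivative
          actInfo = rlNumericSpec([3 1], ...
              'LowerLimit', [0; 0; 0], ...
              'UpperLimit', [100; 50; 10]);               % gain ranges are placeholders
          env = rlSimulinkEnv('gainTuning', 'gainTuning/RL Agent', obsInfo, actInfo);
          % Train a DDPG (or similar) agent against env; the learned policy outputs
          % gains, which can then be fixed and checked with margins, Lyapunov
          % candidates, and other classical tools.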

  • @Hudmyq · 5 years ago

    Hi Brian, thanks for the new video. It is a good example. How did you know the overall reward value is 0.0625? Did you tune it manually? Can you make a simple example, such as a spring-mass model?

    • @roboticseabass · 5 years ago · +2

      That 0.0625 was for sure tuned manually -- based on the model parameters, this translates to "if you survive without falling each time step until the max simulation time, you get 25 reward points". For reference, a "high" final reward for this example was around 100-150, so that gives you an idea of the relative weighting of the survival reward vs. the others. The highest-scaled component of the reward is the forward velocity, as that is the primary goal of the RL problem.
      If you want a simpler example like a mass-spring-damper, I'd recommend checking out the Reinforcement Learning Toolbox example page: www.mathworks.com/help/reinforcement-learning/examples.html
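
      Pieced together from this thread and the reward terms discussed in the video (the coefficients are approximate, not copied from the shipped model), the per-step reward looks roughly like this:

          % Approximate per-step reward for the walking robot, as described above:
          %   vx    - forward velocity of the torso (primary objective, largest weight)
          %   y     - lateral drift away from the x-axis
          %   zhat  - normalized vertical displacement of the center of mass
          %   uPrev - joint torques applied at the previous time step
          %   Ts,Tf - sample time and max episode time, so 25*Ts/Tf is the ~0.0625
          %           "survived this step" bonus mentioned above
          reward = vx - 3*y^2 - 50*zhat^2 + 25*Ts/Tf - 0.02*sum(uPrev.^2);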

    • @Hudmyq · 5 years ago

      @roboticseabass Thanks, do you have any online training class about reinforcement learning?

    • @561slifer · 5 years ago

      @roboticseabass Can you show how to do this for a system in state space? I don't understand too much =(

  • @zhenisotarbay5159 · 5 years ago

    Error evaluating 'PreLoadFcn' callback of block_diagram 'walkingRobotRL2D'.
    Callback string is 'robotParametersRL
    '
    Caused by:
    Undefined function or variable 'robotParametersRL'.
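
    A likely workaround, assuming the example files were downloaded as-is: the model's PreLoadFcn calls the script robotParametersRL, so MATLAB has to be able to find it (the example folder must be the current folder or on the path) before the model opens. Something like:

        % Hypothetical fix: make sure the parameter script is reachable first
        cd('path/to/downloaded/example');   % folder containing robotParametersRL.m
        robotParametersRL;                  % defines the variables the model expects
        open_system('walkingRobotRL2D');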

  • @chuanjiang6931 · 7 months ago

    Running into this issue when executing createWalkingAgent2D.m:
    rlRepresentation will be removed in a future release. Unable to automatically convert rlRepresentation to new representation object. Use the new representation objects rlValueRepresentation, rlQValueRepresentation, rlDeterministicActorRepresentation, or
    rlStochasticActorRepresentation instead.
    Error in createDDPGNetworks (line 48)
    critic = rlRepresentation(criticNetwork,criticOptions, ...
    Error in createWalkingAgent2D (line 35)
    createDDPGNetworks;
    I am on MATLAB R2023b; is this code still maintained?
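
    For reference, a minimal sketch of the substitution that warning asks for (the layer names 'observation' and 'action' and the variable names are placeholders; the actual names in createDDPGNetworks may differ). Newer releases document rlQValueFunction and rlContinuousDeterministicActor as the current equivalents.

        % Replace the deprecated rlRepresentation calls with the specific
        % representation objects named in the warning:
        critic = rlQValueRepresentation(criticNetwork, obsInfo, actInfo, ...
            'Observation', {'observation'}, 'Action', {'action'}, criticOptions);
        actor  = rlDeterministicActorRepresentation(actorNetwork, obsInfo, actInfo, ...
            'Observation', {'observation'}, 'Action', {'action'}, actorOptions);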

  • @cinghialandri · 3 years ago

    With such a reward function, isn’t the robot incentivized to stand still in its starting position?

  • @wishIKnewHowToLove · 1 year ago

    funny looking robot! :--)

  • @HarshvardhanKanthode · 5 years ago · +1

    Why was the height term such a large negative value (multiplied by 50 and squared, too)? Won't it cause the robot to decrease its height by possibly falling down?

    • @CrystalDataMusic · 4 years ago · +3

      I think that z hat = initial height (or expected walking height with knees bent) - current height. The picture begs to differ, but that's the only way that makes sense to me.

    • @CrystalDataMusic · 4 years ago · +4

      I found this on the MATLAB website: z hat is the normalized vertical translation displacement of the robot's center of mass.

  • @mcasualjacques4234 · 3 years ago

    Maybe if you could have a measurement called "looks like human gait" the AI would only have to provide the corrections needed to prevent falling. Or rather "looks like the one reference gait we sampled"

  • @johnkoester7795 · 4 years ago

    How about a robot that has four legs? Or did you do that already? I'm not sure.

  • @alexanderskusnov5119 · 5 years ago · +1

    It would be better to teach it to turn instead of going backward.

    • @BrianBDouglas · 5 years ago

      I agree! For a simple demonstration in the video, though, it was much easier to get it to walk backward than venture away from the x-axis.

  • @BrianBDouglas · 5 years ago · +23

    Hey everyone, thanks for watching this video! If you have any questions or comments that you'd like me to see, please leave them under this comment so that I get notified and can respond. Cheers!

    • @561slifer · 5 years ago · +1

      Can you show how to do this for a system in state space? I don't understand too much =(

    • @zhenisotarbay5159 · 5 years ago

      Failed to load library 'walkingRobotUtils' referenced by 'walkingRobotRL2D/Walking Robot/Sensors/Rotation Matrix to Roll Pitch Yaw': Simulink is not installed.
      But Simulink is installed.

    • @atifmehmood8681 · 5 years ago

      Hello Brian, I am having a convergence problem: my code runs but never comes to a solution. Kindly tell me where to check. I am doing trajectory tracking of an omnidirectional robot. Hope to see a reply from you.

    • @user-zi1rf4wh2o · 4 years ago

      Hi Brian, I am curious whether observations collected from a lidar or camera can be simulated or not. Since learning on a physical robot arm is difficult and time-consuming, it should be trained through simulation.

    • @cinghialandri · 3 years ago

      With such a reward function, isn’t the robot incentivized to stand still in its starting position?