Solving the Walking Robot Problem with Reinforcement Learning
- Published: Aug 4, 2024
- This video shows how to use the reinforcement learning workflow to get a bipedal robot to walk. It also looks at how to modify the default example to resemble a traditional control problem setup by adding a reference signal, and it considers how a reinforcement-learning agent can replace parts of a traditional control system rather than serving as an end-to-end design. Finally, we show some of the limitations of this design.
You can find the example model used in this video in the MATLAB Central File Exchange: bit.ly/2HBxe79
Watch our full video series about Reinforcement Learning: • Reinforcement Learning
By the end of this series, you’ll be better prepared to answer questions like:
- What is reinforcement learning and why should I consider it when solving my control problem?
- How do I set up and solve the reinforcement learning problem?
- What are some of the benefits and drawbacks of reinforcement learning compared to a traditional controls approach?
Artificial intelligence, machine learning, deep neural networks. These are terms that can spark your imagination of a future where robots are thinking and evolving creatures.
Check out these other resources:
- Reinforcement Learning by Sutton and Barto: bit.ly/2HAYbb4
- Reinforcement Learning Course by David Silver: • RL Course by David Sil...
- Reinforcement Learning Toolbox: bit.ly/2YjuAYa
- Deep Reinforcement Learning for Walking Robots: • Deep Reinforcement Lea...
Check out the individual videos in the series:
• What Is Reinforcement Learning?: • What Is Reinforcement ...
• Understanding the Environment and Rewards: • Understanding Reinforc...
• Policies and Learning Algorithms: • Reinforcement Learning...
• The Walking Robot Problem: • Solving the Walking Ro...
• Overcoming the Practical Challenges: • Overcoming the Practic...
• An Introduction to Multi-Agent Reinforcement Learning: • Introduction to Multi-...
• Why Choose Model-Based Reinforcement Learning?: • Why Choose Model-Based...
--------------------------------------------------------------------------------------------------------
Get a free product trial: goo.gl/ZHFb5u
Learn more about MATLAB: goo.gl/8QV7ZZ
Learn more about Simulink: goo.gl/nqnbLe
See what's new in MATLAB and Simulink: goo.gl/pgGtod
© 2019 The MathWorks, Inc. MATLAB and Simulink are registered
trademarks of The MathWorks, Inc.
See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.
Amazing! The combination of RL and control model is impressive.
Thank you, Brian!! I'm starting my master's and I want to get involved in this area of control! Love your videos 😊
It is really a good demonstration for getting a start in the reward design.
You are AWESOME!
Dear Douglas, thank you for these fantastic videos. I am now trying to control a drone with a neural network controller, but I am wondering: what should be included in the observation states?
really useful and simple
Thanks!
Theoretically, can the result of the trained program, including the learned structure, be put on a 1-to-1 real-life machine? I mean, just like people learn in virtual simulators first and then practice in real life. The hope would be that the real-life machine starts off quite well already, needing only minor tweaks during real-life training. Would that be possible?
I wonder, in which program are these models tested? Because there is physics going on, like gravity, I think.
There is a problem when I run walkingRobotRL3D: "Variable 'agent' has been deleted from base workspace."
Are there more convenient ways of training a robot with reinforcement learning such that you can guarantee stability for any initial condition or a very large set of possible disturbances, without having to add more scenarios to the training set? For example in more traditional control approaches you could try and find a Lyapunov function for the first question.
As far as I know, there aren't any formal methods to guarantee stability for learned deep neural networks. The next video in this series will cover some ways to improve robustness and stability (but as you say it comes from adding more scenarios to the training set). Another approach that I mention in the next video is to use RL as an optimization tool for the parameters in a traditional control system architecture. Then you can use the existing formal tools to analyze the resulting system. If anyone reading this comment knows of formal methods for stability with RL or any current research that is trying to advance this topic, please respond and let me know where I can read up on it. Thanks!
Hi Brian, thanks for the new video. It is a good example. How did you know the overall reward value should be 0.0625? Did you tune it manually? Can you make a simple example, such as a spring-mass model?
That 0.0625 was for sure tuned manually. Based on the model parameters, it translates to: "if you survive without falling at each time step until the max simulation time, you get 25 reward points". For reference, a "high" final reward for this example was around 100-150, so that gives you an idea of the relative weighting of the survival reward vs. the others. The highest-scaled component of the reward is the forward velocity, as that is the primary goal of the RL problem.
If you want a simpler example like a mass-spring-damper, I'd recommend checking out the Reinforcement Learning Toolbox example page: www.mathworks.com/help/reinforcement-learning/examples.html
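To make the reply above concrete, here is a minimal sketch of how a per-step reward with that structure might look. This is illustrative only: the 0.0625 survival bonus, the -50·ẑ² height term, and the dominance of forward velocity come from this thread, while the exact weights, the effort-penalty term, and the 0.025 s sample time are assumptions, not the actual example code.

```python
def step_reward(v_x, z_hat, u, alive=True):
    """Per-timestep reward for a walking robot (illustrative sketch only).

    v_x   : forward velocity of the torso (the primary objective)
    z_hat : normalized vertical displacement of the center of mass
    u     : iterable of joint actuation signals (effort penalty, assumed term)
    alive : whether the robot has not fallen this step
    """
    r = 0.0
    r += 1.0 * v_x                       # forward progress dominates (assumed weight)
    r -= 50.0 * z_hat ** 2               # height-deviation penalty mentioned in the thread
    r -= 0.02 * sum(t ** 2 for t in u)   # small actuation-effort penalty (assumed)
    if alive:
        r += 0.0625                      # per-step survival bonus (from the reply)
    return r

# Sanity check on the survival bonus: assuming a 0.025 s sample time and a
# 10 s episode, 10 / 0.025 = 400 steps, and 400 * 0.0625 = 25 reward points
# for never falling, matching the "25 reward points" in the reply.
steps = 10 / 0.025
print(steps * 0.0625)  # 25.0
```

The point of the sketch is the relative scaling: the survival bonus is small per step but accumulates, while the squared height term punishes any sustained deviation from the nominal walking height.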
@roboticseabass Thanks! Do you have any online training class about reinforcement learning?
@roboticseabass Can you show how to do this for a state-space system? I don't understand too much =(
Error evaluating 'PreLoadFcn' callback of block_diagram 'walkingRobotRL2D'.
Callback string is 'robotParametersRL'
Caused by: Undefined function or variable 'robotParametersRL'.
Running into this issue when executing createWalkingAgent2D.m:
rlRepresentation will be removed in a future release. Unable to automatically convert rlRepresentation to new representation object. Use the new representation objects rlValueRepresentation, rlQValueRepresentation, rlDeterministicActorRepresentation, or rlStochasticActorRepresentation instead.
Error in createDDPGNetworks (line 48)
critic = rlRepresentation(criticNetwork,criticOptions, ...
Error in createWalkingAgent2D (line 35)
createDDPGNetworks;
I am on MATLAB R2023b; is this code still maintained?
With such a reward function, isn’t the robot incentivized to stand still in its starting position?
funny looking robot! :--)
Why was the height vector such a large negative value (multiplied by 50 and squared too) ? Won't it cause the robot to decrease its height by possibly falling down?
I think that z hat = initial height (or expected walking height with knees bent) - current height. The picture begs to differ, but that's the only way that makes sense to me.
I found this on the MATLAB website: z hat is the normalized vertical translation displacement of the robot's center of mass.
Maybe if you could have a measurement called "looks like human gait" the AI would only have to provide the corrections needed to prevent falling. Or rather "looks like the one reference gait we sampled"
How about a robot that has four legs? Or did you do that already? I'm not sure.
It would be better to teach it to turn instead of walking backward.
I agree! For a simple demonstration in the video, though, it was much easier to get it to walk backward than venture away from the x-axis.
Hey everyone, thanks for watching this video! If you have any questions or comments that you'd like me to see, please leave them under this comment so that I get notified and can respond. Cheers!
Failed to load library 'walkingRobotUtils' referenced by 'walkingRobotRL2D/Walking Robot/Sensors/Rotation Matrix to Roll Pitch Yaw': Simulink is not installed. But Simulink is installed.
Hello Brian, I am having a convergence problem: my code runs but never converges to a solution. Kindly tell me where to check. I am doing trajectory tracking of an omnidirectional robot. Hope to see a reply from you.
Hi Brian, I am curious whether observations collected from a lidar or a camera can be simulated. Since learning on a physical robot arm is difficult and time-consuming, it should be trained through simulation.