Deep Reinforcement Learning for Walking Robots - MATLAB and Simulink Robotics Arena

  • Published: 11 Jul 2024
  • Sebastian Castro demonstrates an example of controlling humanoid robot locomotion using deep reinforcement learning, specifically the Deep Deterministic Policy Gradient (DDPG) algorithm. The robot is simulated using Simscape Multibody™, while training the control policy is done using Reinforcement Learning Toolbox™.
    In this video, Sebastian outlines the setup, training, and evaluation of reinforcement learning with Simulink® models. First, he introduces how to choose states, actions, and a reward function for the reinforcement learning problem. Then he describes the neural network structure and training algorithm parameters. Finally, he shows some training results and discusses the benefits and drawbacks of reinforcement learning.
    You can find the example models used in this video in the MATLAB Central File Exchange: bit.ly/2HBxe79
    For more information, you can access the following resources:
    - Reinforcement Learning Tech Talks: bit.ly/2HBzMlS
    - Blog and Videos: Walking Robot Modeling and Simulation: bit.ly/3JTs0ST
    - Paper: Continuous Control with Deep Reinforcement Learning: bit.ly/2HAkJsp
    - Paper: Emergence of Locomotion Behaviours in Rich Environments: bit.ly/2HBuTsO
    --------------------------------------------------------------------------------------------------------
    Get a free product trial: goo.gl/ZHFb5u
    Learn more about MATLAB: goo.gl/8QV7ZZ
    Learn more about Simulink: goo.gl/nqnbLe
    See what's new in MATLAB and Simulink: goo.gl/pgGtod
    © 2019 The MathWorks, Inc. MATLAB and Simulink are registered
    trademarks of The MathWorks, Inc.
    See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.
  • Science

Comments • 42

  • @musabcoskun4494
    @musabcoskun4494 4 years ago +1

    Hi @Sebastian Castro, I am working on DRL algorithms to grasp objects with a robotic arm with a gripper. So, do you have any recommendations for me?
    Thanks in advance,

  • @dr.ammaraldair1481
    @dr.ammaraldair1481 5 years ago

    Very good job, keep it up!

  • @shashankswarup7812
    @shashankswarup7812 3 years ago

    Hey, I want to implement clipped double deep Q-learning for task allocation in cloud resources (VMs).
    Is it possible to use MATLAB and Simulink for network simulation?

  • @ahedalbadin2889
    @ahedalbadin2889 5 years ago

    Very useful

  • @dianamariasaenz6229
    @dianamariasaenz6229 3 years ago +1

    How can this demo be opened? The walkingRobotRL2D command does not seem to work. Thanks!

  • @jorgefelipegaviriafierro705
    @jorgefelipegaviriafierro705 3 years ago +1

    Great work, thank you. I will try to implement an agent that performs MPPT for PV arrays based on this.

    • @ehsanhoseyni7267
      @ehsanhoseyni7267 3 years ago

      And please send it to me, we can start a cooperation.

  • @exmundi
    @exmundi 2 years ago +1

    Is there any new version of this for MATLAB R2020b?

  • @franciscoserra8455
    @franciscoserra8455 7 months ago

    Why do you use the previous action to calculate the reward and not the current action?

  • @josehmartin2023
    @josehmartin2023 3 years ago +3

    Making a video showing a finished result is very good, but it would be even better to have one or more videos, or even a course, on how to apply neural networks in control systems with examples that start from scratch. I've been trying, without success, to apply a neural-network controller to any transfer function.

  • @zaininurhidayat7336
    @zaininurhidayat7336 3 years ago

    Dear sir,
    I want to implement this on a mobile balancing robot. How do I model it and set up the RL problem so the robot balances?
    Also, is MATLAB just for simulation? How do I deploy it to the actual balancing robot?

  • @zahrasafari-d8179
    @zahrasafari-d8179 3 years ago

    Thank you for this helpful video. I just have a question: how did you plot the robot during training? I need to see how my model acts during training. I'd appreciate any reply.

    • @user-ed2iq3if6s
      @user-ed2iq3if6s 3 years ago +1

      Don't use the parallel option. You could try commenting out the last if block in createDDPGOptions.m, from lines 27 to 30.

    • @jcrash42
      @jcrash42 9 months ago

      They used Simscape Multibody.

  • @oldcowbb
    @oldcowbb 1 year ago

    How do I play back the trained network?

  • @abdulbasithashraf5480
    @abdulbasithashraf5480 3 years ago

    How do I set initial conditions?

  • @Thebreak1
    @Thebreak1 5 years ago

    Do I have to buy Reinforcement Learning Toolbox? Why is it not directly included in my MATLAB license?

    • @roboticseabass
      @roboticseabass 5 years ago

      Yes, Reinforcement Learning Toolbox is one of the requirements for this example. The list of required products is shown in the File Exchange/GitHub links.

    • @Thebreak1
      @Thebreak1 5 years ago

      I know that it is required. It was more a question of why, and whether there is a way to use the code without buying it.

  • @AtriyaBiswas
    @AtriyaBiswas 5 years ago

    Great short tutorial!! I want to implement the Q-learning algorithm in the agent. Does the RL Toolbox have a Q-learning algorithm? If you can make another tutorial like this on a Q-learning implementation, that would be great. Thanks in advance!!

    • @roboticseabass
      @roboticseabass 5 years ago

      Yes, the toolbox has Q learning and Deep-Q Network (DQN) algorithms. I picked DDPG since I wanted a continuous action space vs. the discrete options provided by those other algorithms.

    • @AtriyaBiswas
      @AtriyaBiswas 5 лет назад

      Thanks! But I couldn't find the Q-learning algorithm in MATLAB R2018b. Is it available only in R2019a/b?

    • @roboticseabass
      @roboticseabass 5 years ago

      @@AtriyaBiswas Reinforcement Learning Toolbox is new in R2019a, so that would make sense.

    • @AtriyaBiswas
      @AtriyaBiswas 5 years ago

      Thanks!! @@roboticseabass I have to get it installed.

    • @AtriyaBiswas
      @AtriyaBiswas 4 years ago

      Hi @Sebastian Castro. I want to build an agent with three discrete action variables. Suppose action variable 'A' has 4 discrete values, A = [0, 650, 1500, 4500]; action variable 'B' has 6 discrete values, B = [0, 25, 50, 75, 100, 125]; and action variable 'C' has 7 discrete values, C = [-25, -12, -5, 0, 5, 12, 25].
      Should I write "actInfo = rlFiniteSetSpec([0, 650, 1500, 4500];[0, 25, 50, 75, 100, 125];[-25, -12, -5, 0, 5, 12, 25])"? I couldn't find any MATLAB example showing how to write actInfo when using more than one discrete action variable.

  • @attilakovacs1501
    @attilakovacs1501 4 years ago +1

    Hi Sebastian, great video! I have a question: when I try to use the pretrained agent from the example I run into an error, could you help me with that please? The error is:
    MATLAB System block 'walkingRobotRL3D/RL Agent/AgentWrapper' error occurred when invoking 'outputImpl' method of 'AgentWrapper'. The error was thrown from '
    'M:\matlab_2019b\toolbox\rl\rl\+rl\+agent\AbstractPolicy.m' at line 133
    'M:\matlab_2019b\toolbox\rl\rl\simulink\libs\AgentWrapper.m' at line 113'.
    Invalid observation type or size.
    Invalid observation type or size.
    Dot indexing is not supported for variables of this type.
    Thanks in advance!

    • @roboticseabass
      @roboticseabass 4 years ago

      I noticed this too with the 3D example as I tested some updates in MATLAB R2019b. So, I generated some new walking agents but have not published them yet. Email us at roboticsarena@mathworks.com and I can send you an agent file that works.

  • @pv4343
    @pv4343 5 years ago +2

    Great video.
    I'll be doing a project at my university to control a continuous process with reinforcement learning. Do you have any advice for me?

    • @roboticseabass
      @roboticseabass 5 years ago +3

      Based on my experience, my #1 tip is: Create a good reward function that penalizes jumping around the upper/lower limits of your possible action space. Otherwise, you just get an on-off controller.
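The tip in this reply can be sketched in code. Below is a minimal Python illustration of a reward that penalizes saturating the action limits; the function name, weights, and signature are illustrative assumptions, not part of the File Exchange example:

```python
import numpy as np

def shaped_reward(forward_velocity, action, action_limit, w_vel=1.0, w_sat=0.05):
    """Reward forward progress, penalize actions that sit at the limits.

    Without the saturation penalty, the learned policy often degenerates
    into bang-bang (on-off) control at the torque limits.
    """
    # Penalty grows as each action component approaches its upper/lower limit
    saturation = np.mean((np.abs(action) / action_limit) ** 2)
    return w_vel * forward_velocity - w_sat * saturation
```

With this shaping, a policy that walks at the same speed while using smaller joint torques earns a strictly higher reward, which discourages the on-off behavior described above.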

  • @alexanderskusnov5119
    @alexanderskusnov5119 5 years ago

    Do you use genetic algorithm?

    • @roboticseabass
      @roboticseabass 5 years ago +3

      No, this is the Deep Deterministic Policy Gradient (DDPG) reinforcement learning algorithm.
      If you want, there's an earlier video in our series that shows Genetic Algorithms for joint waypoint optimization: ruclips.net/video/-dEX1SZOZEY/видео.html

  • @pv4343
    @pv4343 5 years ago

    Can I use reinforcement learning in MATLAB without this toolbox, only with Simulink and scripts?

    • @roboticseabass
      @roboticseabass 5 years ago +1

      Not unless you implement a lot of the functionality yourself: neural network representations, RL algorithms, etc.
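To give a sense of what "implementing the functionality yourself" means, here is a from-scratch tabular Q-learning update in Python; this is an illustrative sketch of the simplest RL building block, not a substitute for the toolbox's DDPG implementation used in the video:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-learning step on a table Q[state, action].

    td_target is the bootstrapped return estimate; the table entry is
    nudged toward it by the learning rate alpha.
    """
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

Even this tiny loop hints at the rest you would need to write by hand for a walking robot: function approximation for continuous states, exploration noise, replay buffers, and target networks.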

  • @kaiwenyang5728
    @kaiwenyang5728 4 years ago

    3:33 is not accurate. The actor is backpropagated via the policy gradient, while the critic is backpropagated via the TD error.

    • @roboticseabass
      @roboticseabass 4 years ago +1

      Thanks -- this was an oversimplification of the general gist of DDPG for beginners.
      Indeed, the critic loss is found using temporal differencing (TD) using target networks to make this a little less unstable during training! And the actor loss is simply the negative critic estimate (negative because you want to maximize Q-value, or minimize negative Q-value).
      However, in both cases, this loss *is* then backpropagated through the respective actor/critic networks to update their parameters.
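The two losses described in this reply can be written out concretely. The NumPy stand-ins below are an illustrative sketch of the DDPG loss structure (TD target from the target networks for the critic, negative Q-estimate for the actor); they are not the Reinforcement Learning Toolbox implementation:

```python
import numpy as np

def critic_td_target(reward, next_q_target, gamma=0.99, done=False):
    # TD target uses the *target* networks' Q estimate of the next state,
    # which stabilizes training compared to bootstrapping off the live critic
    return reward + (0.0 if done else gamma * next_q_target)

def critic_loss(q_pred, td_target):
    # Mean squared TD error, minimized to train the critic
    return np.mean((q_pred - td_target) ** 2)

def actor_loss(q_of_actor_action):
    # Negative critic estimate: maximizing Q == minimizing -Q
    return -np.mean(q_of_actor_action)
```

Both scalar losses are then backpropagated through their respective networks, exactly as the reply describes.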

  • @ersinortagenc5233
    @ersinortagenc5233 1 year ago +1

    7:07 I get an error that says "Unrecognized function or variable 'numObs'." when I run "createDDPGNetworks.m". How can I solve this problem?