Teaching Robots to Walk w/ Reinforcement Learning

  • Published: 27 Nov 2021
  • Robot sim adventure video part two, covering my attempts to get some reinforcement learning to work with the Bittle robot in the Isaac sim.
    Repo for this project: github.com/Sentdex/TD3-Bittle
    More on Puget: hubs.ly/H0-By8Q0
    The actual Petoi Bittle robot can be found here: www.petoi.com/collections/rob...
    Neural Networks from Scratch book: nnfs.io
    Channel membership: / @sentdex
    Discord: / discord
    Reddit: / sentdex
    Support the content: pythonprogramming.net/support...
    Twitter: / sentdex
    Instagram: / sentdex
    Facebook: / pythonprogramming.net
    Twitch: / sentdex

Comments • 105

  • @kopasz777
    @kopasz777 2 years ago +84

    I feel like the reward function should take into account the energy required for the movement (optimizing for better distance/energy). Sudden high accelerations take more energy than slow and gradual ones. You could probably get movement closer to what natural evolution came up with rather than slowly wiggling and jittering forward.

    • @zuggrr
      @zuggrr 2 years ago +5

      Wow, this is a very interesting lead!

    • @sentdex
      @sentdex  2 years ago +19

      I like the idea of this, but not sure how exactly to calculate it.

    • @mankaransingh2599
      @mankaransingh2599 2 years ago +23

      @@sentdex this is used a lot in control theory / model predictive control.
      The modified loss function would be: loss + alpha * (previous_output - current_output) ^ 2
      The `alpha * (previous_output - current_output) ^ 2` term makes sure the next output stays close to the previous one to avoid jitter. Alpha is, of course, some weight to prioritize this term. (A code sketch of this idea follows this thread.)

    • @maciejkrajewski4799
      @maciejkrajewski4799 2 years ago +3

      @@sentdex You could also check how often the direction of servo movement changes. Frequent changes are not natural and will draw a lot of current in real-world experiments.

    • @jeroenritmeester73
      @jeroenritmeester73 2 years ago

      @@maciejkrajewski4799 Just the direction would be very unstable, since a still limb may jitter between small numbers around zero like -0.001 and 0.001. The magnitude definitely matters. If the difference between successive servo positions is calculated, that basically accounts for the speed of the movement. This speed could also be used as a parameter. Minimising this, offset by a small number to allow for any movement, might work.
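
A minimal sketch of the smoothness penalty discussed in this thread, written as reward shaping rather than a loss term; the weight `alpha`, the episode loop, and the `policy`/`env` names are illustrative assumptions, not the code from the video:

```python
import numpy as np

ALPHA = 0.05  # illustrative weight for the smoothness term

def shaped_reward(forward_progress, prev_action, action, alpha=ALPHA):
    """Reward forward progress minus alpha * squared change between consecutive
    action vectors (joint targets), which penalizes jitter."""
    jitter = np.sum((np.asarray(action) - np.asarray(prev_action)) ** 2)
    return forward_progress - alpha * jitter

# Hypothetical usage inside an episode loop:
# prev_action = np.zeros(num_joints)
# for _ in range(max_steps):
#     action = policy(obs)
#     obs, progress, done, info = env.step(action)
#     reward = shaped_reward(progress, prev_action, action)
#     prev_action = action
```

The same squared-delta term could instead be added directly to the training loss, as suggested above.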

  • @AChadi-ug9pg
    @AChadi-ug9pg 2 years ago +20

    One of my heroes is back at it again ❤️

  • @aloufin
    @aloufin 2 years ago

    All these concepts are so impenetrable when the initial research papers come out and they are briefly discussed on HN/Twitter... Thanks for explaining these concepts in a well-broken-down manner, Harrison; only when these learning videos come out can I start to really understand how it all fits together. Your unique teaching style is reaching so many people in the world. Thank you!

  • @neuron8186
    @neuron8186 2 years ago +4

    This channel is where it all started for me 2 years ago. Thanks; because of you I got into deep learning.

  • @nsgoneape9899
    @nsgoneape9899 1 year ago

    you are a beacon of hope and knowledge. Thank you

  • @batrox9219
    @batrox9219 2 years ago +2

    I've been looking forward to this video for so long :DD

  • @wktodd
    @wktodd 2 years ago +23

    Taking tips from evolution, you might base your reward on energy usage. All the short gaits will use more energy to move a given distance. You could probably estimate energy use from the loop time of the gait. (One possible reward sketch follows this thread.)

    • @judedavis92
      @judedavis92 2 years ago

      Well evolution isn’t proven so it would make more sense to call it: “creation”. Works for either belief.

    • @wktodd
      @wktodd 2 years ago +7

      @@judedavis92 unfortunately, there's no learning involved in the creation myth, just a simple know-all that magically makes things work ;-)

    • @judedavis92
      @judedavis92 2 years ago

      @@wktodd with all due respect, I understand why you think that.
      Creation actually involves pretty much all learning. I’m not too sure why people don’t understand that.
      And uhh ‘magically’? No, no, no, no. The principle of creation is that everything, and its order, was intricately designed by a creator (and not by an explosion).
      Think, how are our brains learning, so intricately designed, but by an explosion? Doesn’t make sense. That’s what evolution can’t prove. With all due respect, I disagree from not only a programmer’s point of view, but also a logical point of view.
      But the fact you read all of this means that you are respecting my opinion, thanks for hearing me out :)

    • @morkjetk
      @morkjetk 2 years ago +5

      @Jude Davis Evolution is the process of gradual improvement/change by random mutations. It doesn't necessarily have anything to do with an explosion (big bang).
      That's why the analogy used is relevant in this context, because that's exactly what it does.

    • @rickybloss8537
      @rickybloss8537 2 years ago +2

      @@judedavis92 If evolution isn't real, why does simulating it work? The NEAT algorithm mentioned in the video stands for NeuroEvolution of Augmenting Topologies. Your myth, meanwhile, has absolutely no evidence, besides a book that supports slavery and condemns consensual homosexuality, when a couple of words, "slavery is wrong", could have saved countless lives and prevented exploitation.
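
One way to fold energy into the reward, as suggested above: approximate mechanical work from joint torques and velocities each physics step and reward distance per unit of energy. This is a minimal sketch under the assumption that the sim exposes joint torque and velocity readings (it uses torque times velocity rather than gait loop time as the estimate); the constants are illustrative:

```python
import numpy as np

DT = 1.0 / 60.0  # assumed physics step (60 Hz loop)

def step_energy(joint_torques, joint_velocities, dt=DT):
    """Approximate mechanical work this step as sum(|torque * angular_velocity|) * dt."""
    torques = np.asarray(joint_torques)
    velocities = np.asarray(joint_velocities)
    return float(np.sum(np.abs(torques * velocities)) * dt)

def efficiency_reward(distance_gained, total_energy, eps=1e-6):
    """Reward distance covered per unit of (approximate) energy spent."""
    return distance_gained / (total_energy + eps)
```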

  • @lel7531
    @lel7531 2 years ago +1

    Eager to see the next one, and not too long from now I hope.

  • @chris-graham
    @chris-graham 2 years ago +4

    A few suggestions:
    The current public state of the art is PPO.
    PPO can be used with either discrete or continuous actions.
    If you choose discrete actions, use discrete delta actions with n buckets of modifiers.
    Your batch size is very small. Try something around 50k-100k environment steps and minibatches of about 4k or 8k with a rollout length of ~200-400. Actor-critic networks are EXTREMELY sensitive to updates, so you absolutely have to use large batches. Large batches reduce the chance of getting stuck in local minima.
    As a last resort, you can add an LSTM into the network, since walking does have some multi-frame state dependency.
    Your network size is appropriate. Try increasing the learning rate if you haven't; 5e-4 is probably a good starting point.
    Use 0.01 (the PPO default) for entropy and decay to 0.005 (or lower) over time.
    I haven't checked how you've built your critic, but these types of problems tend to like a single fully connected value-function output connected to the last layer of your network (before the action outputs).
    Add your previous actions as features in your observation space.
    There's no "rush" to finish the episode, so perhaps add a reward that decays with time to encourage speed.
    There's a bajillion other things you can change, but that's the joy of reinforcement learning :)

    • @sentdex
      @sentdex  2 years ago

      Thank you for the feedback. Do you have a link to a clear/simple implementation of continuous ppo that you'd recommend?
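
One widely used option for continuous-action PPO is the Stable-Baselines3 implementation, which handles Box action spaces with its MlpPolicy. A minimal sketch, assuming a hypothetical "BittleEnv-v0" gym wrapper around the sim (not something in the repo); the hyperparameters loosely echo the suggestions above, scaled down for a single environment:

```python
import gym
from stable_baselines3 import PPO

# Hypothetical env id; the Isaac sim would need to be wrapped as a gym.Env
# with a continuous (Box) action space and registered under this name.
env = gym.make("BittleEnv-v0")

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=5e-4,   # starting point suggested above
    n_steps=4096,         # rollout per env; shorter (~200-400) with many parallel envs
    batch_size=512,       # scaled for a single env; 4k-8k becomes practical with vectorized envs
    ent_coef=0.01,        # default PPO entropy coefficient, decay over time if desired
    verbose=1,
)
model.learn(total_timesteps=2_000_000)
model.save("ppo_bittle")
```

With many parallel environments (for example via SB3's make_vec_env helper), the larger rollout and minibatch numbers from the comment above become realistic.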

  • @Ian-ue8iq
    @Ian-ue8iq 2 years ago +28

    I noticed that a lot of the "walks" were asymmetric (which makes sense, because the model doesn't know that the robot has any symmetry). So what if you were to make a model that took all of the inputs but was only allowed to control the right side, then ran it again with the inputs mirrored and applied the outputs to the left side? That way lessons wouldn't have to be learned twice. Maybe. (A rough sketch follows this thread.)

    • @nobodydoesanything381
      @nobodydoesanything381 5 months ago

      This would work for flat surfaces, but if you were to walk on a rock on one side and not the other, it would fall over. Having it learn both sides separately allows each side to respond to different conditions on either side.
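
A rough sketch of the mirroring idea, assuming a hypothetical observation layout (body state followed by left-leg and right-leg joint readings) and a policy that only outputs right-side joint targets; the index maps and sign flips would have to match the real environment:

```python
import numpy as np

# Hypothetical layout: obs = [body state (6), left-leg joints (4), right-leg joints (4)]
LEFT_OBS_IDX = np.arange(6, 10)
RIGHT_OBS_IDX = np.arange(10, 14)
SIGN_FLIP_IDX = np.array([1, 5])   # e.g. lateral velocity and roll flip sign when mirrored

def mirror_obs(obs):
    """Swap left/right joint readings and flip lateral quantities."""
    o = np.asarray(obs, dtype=float)
    m = o.copy()
    m[LEFT_OBS_IDX], m[RIGHT_OBS_IDX] = o[RIGHT_OBS_IDX], o[LEFT_OBS_IDX]
    m[SIGN_FLIP_IDX] *= -1.0
    return m

def act_both_sides(policy, obs):
    """Run the same right-side-only policy on the real and the mirrored observation."""
    right_action = policy(obs)
    left_action = policy(mirror_obs(obs))
    return np.concatenate([left_action, right_action])
```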

  • @easyBob100
    @easyBob100 2 years ago +5

    The best AI controlled motion I've seen was based on learning mo-cap data. I've always thought that was the best way to do it: Use motion data to have basic movement, and let the AI deal with changing that motion to adapt to the environment/control signals. That way, the model doesn't need to be big. [/thoughts]

    • @codecampbase1525
      @codecampbase1525 2 years ago +1

      Yes. The whole argument "but I want to make it walk on its own" describes an unnatural task, as humans and other animals learn by watching and adapting to the behaviour of their parents and other animals.
      Besides that, using a reward is already a given hint/direction on what to do.
      So I totally agree that movement and other motions should be introduced with some given examples, and then the adaptation can begin from there. Mocap is one of them.

  • @swannschilling474
    @swannschilling474 2 years ago

    This is great!! Keep it up!! I am addicted already! 😁

  • @Reza-wd3ji
    @Reza-wd3ji 2 years ago +4

    I recommend checking out physics-informed neural networks. They're basically a way to make the agent "understand" the physical environment better. Also, normalized advantage functions might be a good substitute for DDPG in continuous control problems.

  • @HellTriX
    @HellTriX 2 years ago +4

    I think the biggest problem with all these deep learning setups is that the reward is too basic. For example, a high reward just for reaching a goal is inadequate. Add additional rewards for how efficiently the servos are used, for standing up straighter, and for using a wider range of servo motion. When a kid learns to walk, they are reinforced over time for how long they can stand, how fast they can get there, and additional tasks like reaching and overcoming obstacles for treats, etc. So can we really blame the reinforcement models when we don't even reward them for how visually appealing they are, or how efficiently they use their ranges of motion?

  • @Joe_Zajac
    @Joe_Zajac 1 year ago

    DDPG algorithm on bipedal sim initially looked like John Cleese’s funny walk on Monty Python’s Flying Circus

  • @ayyazulhaq7335
    @ayyazulhaq7335 2 years ago +2

    for CarePackage in range(mint6.15):
    GPU(2,RTX3090)
    RAM(1,TeraByte)
    CPU(32,CORE)
    print("Awesome")
    print("Can I get one ")

  • @hayoun3
    @hayoun3 2 years ago +1

    Sentdex : And... I'll call this "Walking training framework."
    Google : What does it stand for?
    Sentdex :

  • @roostertechchan
    @roostertechchan 2 years ago +1

    5:15 Ministry of Silly Walks... funnily enough, from Monty Python

  • @Stinosko
    @Stinosko 2 years ago +1

    Very NEAT video!

  • @Tamingshih
    @Tamingshih 9 months ago

    Thank you!

  • @K1RTB
    @K1RTB 2 years ago +1

    „Updating back to stupidity“ is what I do every night.

  • @ayyazulhaq7335
    @ayyazulhaq7335 2 years ago

    Awesome work. I wanted to create a robotic dog and a robotic hand using DQN + evolution algorithms. Give these a shot in future work.

  • @ethanblackthorn3533
    @ethanblackthorn3533 2 years ago

    Fascinating stuff

  • @xXReVo_LuTiOnXx
    @xXReVo_LuTiOnXx 2 years ago +7

    Just had another idea on how you might solve the current problem.
    Maybe it would be possible to reward the model if it keeps the back steady.
    Because right now it is moving the back a whole lot through all the shaky movement. But if we reward it for keeping the middle part level, it should actually be forced to use the legs way more.
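
A minimal sketch of that level-body shaping, assuming the sim exposes the torso roll and pitch (in radians, or derivable from its orientation quaternion); the weight is illustrative:

```python
TILT_WEIGHT = 0.5  # illustrative penalty weight

def level_body_reward(forward_progress, roll, pitch, tilt_weight=TILT_WEIGHT):
    """Reward forward progress minus a penalty for torso tilt (roll/pitch in radians),
    so progress made by rocking the back is worth less than steady, level walking."""
    tilt_penalty = tilt_weight * (roll ** 2 + pitch ** 2)
    return forward_progress - tilt_penalty
```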

  • @codecampbase1525
    @codecampbase1525 2 years ago

    The beauty is how everyone is guessing as no one knows the right answer.
    ML and AI are such a great topic, but they need ethical laws/guidelines and a moral approach to prevent our downfall in the long term.
    His book is great to grasp the math behind it, even if you are already familiar with the concept / cs related topics.

  • @saminyeasararnob1810
    @saminyeasararnob1810 2 years ago

    A few quick things to try: (1) action repeat - repeat the same action multiple times, e.g. 2, 3, or 4 steps; (2) update the network after every episode - updating only after a fixed number of episodes can accumulate more bad examples in the buffer; (3) try multiple actions, e.g. take 10 actions using the policy and then take the one the critic thinks is best, during training only (a sketch of this follows).
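
A minimal sketch of suggestion (3), assuming TD3-style PyTorch modules where actor(state) returns an action vector and critic(state, action) returns a Q-value, and a state tensor of shape (1, obs_dim); the candidate count and noise scale are illustrative:

```python
import torch

def best_of_n_action(actor, critic, state, n=10, noise_std=0.1):
    """Sample n noisy candidates around the actor's action and return the one
    the critic scores highest (an exploration trick for training only)."""
    with torch.no_grad():
        base = actor(state)                                # shape (1, action_dim)
        candidates = base + noise_std * torch.randn(n, base.shape[-1])
        states = state.expand(n, -1)                       # repeat the state for each candidate
        q_values = critic(states, candidates).squeeze(-1)  # shape (n,)
        return candidates[q_values.argmax()]
```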

  • @qzorn4440
    @qzorn4440 2 years ago

    Years ago I saw a video on the evolution of a desk lamp learning to hop; some just quivered and shook, then the offspring started hopping like Hopalong Cassidy. Mmm... 😍

  • @ianhaylock7409
    @ianhaylock7409 2 years ago +4

    It seems to me that to do a correct walking gait, you need all the legs to move through a similar range. Maybe you can reward those that have the most similar leg movements.

  • @Nedwin
    @Nedwin 2 years ago

    Wow, amazing!

  • @oplavevski
    @oplavevski 2 years ago

    Another route to try may be to reduce the jerkiness. When doing these micro movements, there's a lot of jerk happening. Do a reward system based on a lower number of jerks per second. If that works...

  • @MrBoubource
    @MrBoubource 2 years ago

    I love the word bittle.

  • @SimeonRadivoev
    @SimeonRadivoev 2 years ago +1

    Yeah, I still think that for functional and robust walking from machine learning you need a lot more sensors, at the very least potentiometer sampling for each joint.

  • @Stinosko
    @Stinosko 2 years ago +2

    When in doubt, you can download even more ram

  • @imdadood5705
    @imdadood5705 2 years ago

    Lollll! I don’t understand even a single thing you were explaining. I know something about RL. I just kept on watching 😅

  • @Diego0wnz
    @Diego0wnz 2 years ago +1

    Import WTF is a good one

  • @robomasticus
    @robomasticus 2 years ago

    It's the ministry of silly walks!

  • @swannschilling474
    @swannschilling474 2 years ago

    Hey, just came back to this episode and I was thinking that reducing the available joint angles could also be something that changes the outcome quite a bit...
    Also, since you did a bit with game engines in the past, did you ever take a look at Unity? ML-Agents and Articulation Bodies seem to be something that might be fun to explore. I would love to see you trying it out! 😊

  • @Stinosko
    @Stinosko 2 years ago +3

    To prevent the shaking of the legs: maybe divide the end reward by the total movement of the legs.
    Keep track of every time a servo switches direction and divide the end reward by the total number of direction switches. If the robot learns the shaking movement, it will get rewarded for shaking less, while it is in a good starting position to learn the "normal" dog movement? 😊 (A rough sketch follows this thread.)

    • @Stinosko
      @Stinosko 2 years ago

      In real life those robot servos usually are not equipped to handle lots of direction changes or sudden acceleration/deceleration, so having those negative rewards might be useful? 🤷‍♂️

    • @Amit9001
      @Amit9001 2 years ago

      👍👍👍
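
A minimal sketch of that direction-switch penalty, assuming the per-joint position history is logged over the episode; the scale factor is illustrative:

```python
import numpy as np

def count_direction_switches(joint_position_history):
    """Count sign changes in per-joint movement over an episode.
    joint_position_history: array of shape (timesteps, num_joints)."""
    deltas = np.diff(np.asarray(joint_position_history), axis=0)
    sign_changes = np.abs(np.diff(np.sign(deltas), axis=0)) > 0
    return int(sign_changes.sum())

def episode_reward(distance_travelled, joint_position_history, scale=0.01):
    """Scale the end-of-episode reward down by how often the servos reversed direction."""
    switches = count_direction_switches(joint_position_history)
    return distance_travelled / (1.0 + scale * switches)
```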

  • @patham9
    @patham9 2 years ago

    Cool work! On the other hand it's also interesting how RL fails to be practical. Meanwhile legged robots operate with Model Predictive Control based on first-principle engineering.

    • @sentdex
      @sentdex  2 years ago +3

      Careful though. It was but a couple years ago where people made similar claims against AI and language. Now look where we are ;)
      I think we'll likely see huge advances in robotics AI over the next few years. Not there yet, but I think we will be.

    • @patham9
      @patham9 2 years ago

      @@sentdex RL is flawed, unfortunately. I'm optimistic nevertheless; deep learning has made great progress for perception and signal processing. Control theory has also made great progress in practice: looking at Boston Dynamics' legged robots, for instance, there's no RL in there. Other robotics tools, such as SLAM, planning algorithms, and reasoning, have made progress even while out of fashion in ML (which doesn't change their key role and necessity).

  • @somethingwithbryan
    @somethingwithbryan 2 years ago +2

    I wonder if you have seen Robert Miles' video titled "Training AI Without Writing A Reward Function, with Reward Modelling". Would love to see this implemented and it might help here

  • @spsharan2000
    @spsharan2000 2 years ago +1

    Check out stable baselines-3 zoo. It would make your training work much easier.

  • @Nugget11578
    @Nugget11578 2 years ago +3

    Try including energy used in the reward. Because actual creatures generally move in ways that minimize energy usage.

  • @sefahinti5012
    @sefahinti5012 5 months ago

    I preordered my Rohorse 🤣

  • @judedavis92
    @judedavis92 2 years ago

    I think you know what we’re gonna ask. Where’s the next ep of nnfs? 😂

  • @kornelillyes2848
    @kornelillyes2848 2 years ago

    Here we go, poggers

  • @louth4740
    @louth4740 2 years ago +3

    Did you try stable baselines 3 PPO?

  • @xXKM4UXx
    @xXKM4UXx 2 years ago

    You should look into deep evolutionary reinforcement learning, or try a recurrent neural net. How do you know where you are going if you don't know where you've come from?

  • @tomaslapes5722
    @tomaslapes5722 2 years ago

    Maybe punish the robot when the body exceeds some tilt threshold (in order to keep the body sort of leveled).

  • @shmarvdogg69420
    @shmarvdogg69420 2 years ago +1

    So when are you gonna deploy 1000 of these in a park somewhere to spread the robot dog love?

    • @sentdex
      @sentdex  2 years ago +2

      Soon as I have a thousand of them

  • @Veptis
    @Veptis 2 years ago +1

    I am kinda scared to look at the maths of backpropagation with multiple continuous outputs. Like, what shape of data does the loss function even take?
    I feel like a delta over just one step might not say enough. Perhaps you need to take a delta over a few steps.
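
For what it's worth, here is a minimal sketch (illustrative dimensions, a generic DDPG/TD3-style actor-critic, not the repo's code) of how the shapes work out: the actor emits a whole vector of continuous joint targets, the critic maps (state, action vector) to a single Q-value, and the loss still reduces to one scalar before backpropagation.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, batch = 30, 8, 64   # illustrative sizes

actor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(), nn.Linear(256, 1))

states = torch.randn(batch, obs_dim)
actions = actor(states)                              # (batch, act_dim): a vector of joint targets
q_values = critic(torch.cat([states, actions], 1))   # (batch, 1): one value per sample
actor_loss = -q_values.mean()                        # a single scalar
actor_loss.backward()                                # gradients flow through every output jointly
```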

  • @ashu-
    @ashu- 2 years ago +1

    7:58 exploration vs exploitation

  • @robivlahov
    @robivlahov 4 months ago

    Make it move the legs in diagonal pairs, so front-left and rear-right at the same time, the same as BD does it with Spot.

  • @AJMansfield1
    @AJMansfield1 2 years ago

    Would there be any merit to a learning approach that uses NEAT to get an initial neural network that's started to bite on the problem, and then converts that network to a dense one in order to refine it by another approach? The connections NEAT makes between non-adjacent layers don't map _directly_ onto a dense network topology, but it should be possible to use some of the extra neurons to proxy those earlier neuron activations into later layers. (Even if you're using a sigmoid activation function, you could still make it work by using a very small 0.01× connection from the origin to the proxy, and then the inverse 100× connection from the proxy to the destination, in order to keep the proxy's value in the linear region of the activation function.)

  • @10xXxtailedxXxdemon
    @10xXxtailedxXxdemon 2 years ago

    Me: sees first line of code...*likes video*

  • @erdenkalilayev3877
    @erdenkalilayev3877 1 year ago

    Wow, so good man, but I can't understand one thing: why did you use your own PC as the hardware to train on when we could use Google Cloud, where our robot could learn to walk? It seems it's not as simple as it looks when we decide to go with cloud-based training, am I right?

  • @joshfoulkes5327
    @joshfoulkes5327 2 years ago

    How would you get a robot to walk without machine learning? Are there any guidelines, or would you just go for it and adapt?

  • @uskhan7353
    @uskhan7353 2 years ago

    Will it make it simpler if I use the Unity ML-Agents library?

  • @michpo1445
    @michpo1445 2 months ago

    Please remember to include a requirements.txt file with the packages and specify the exact Python version number when releasing source code. Thanks

  • @aadarshkumar2257
    @aadarshkumar2257 2 years ago

    Will the Neural Networks from Scratch in Python series continue? When will the next video in that series come?
    Please clarify this matter!

  • @zrebbesh
    @zrebbesh 2 years ago

    Is this sufficiently well-documented that I could take a neural network derived in a completely different environment and write a savefile that you could import it from?

    • @sentdex
      @sentdex  2 years ago

      I give their docs/forums a D or F right now. Too much is constantly changing, it's definitely a lot of tinkering to get stuff to work, at least for me so far.

  • @bennguyen1313
    @bennguyen1313 2 years ago

    Any thoughts on PerceptiLabs?

  • @dhruvdwivedy4192
    @dhruvdwivedy4192 2 years ago

    Hey 👋 isn’t it best to write the code in C++ for better optimisation and performance? 🤔

    • @sentdex
      @sentdex  2 years ago +1

      In reality, most of the Python code wraps highly optimized C++. I assure you it'd run much slower if I were writing the C++ :P

  • @krembananowy
    @krembananowy 2 years ago +3

    Often in environments that run at 60 steps per second, people run their models with action repeat - they ask their model for an action, say, only every 6th step, and otherwise repeat the last action. In this case it'd be even simpler to implement, i.e. try changing kit.update() from 1/60 to some other value, such as 1/10.
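
For the wrapper flavour of action repeat, here is a minimal sketch using the classic 4-tuple gym API (the Isaac sim environment would need an equivalent wrapper); the repeat count is illustrative:

```python
import gym

class ActionRepeat(gym.Wrapper):
    """Apply each policy action for `repeat` consecutive env steps, summing rewards,
    so the policy effectively runs at a lower control rate."""
    def __init__(self, env, repeat=6):
        super().__init__(env)
        self.repeat = repeat

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info
```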

  • @erbterb
    @erbterb 2 years ago

    I am seeing this with the thought:
    Is this not what movement would look like if you have no muscle fatigue and no account of the moment arm around a joint?
    I am certain we would all be scooting around like limbo dancers, if we did not have to account for lactic acid buildup.
    Send me the genius prize from 311 trillion when you add these features to the model environment variables.

  • @444haluk
    @444haluk 2 years ago

    Isn't Isaac Sim for ROS and Isaac Gym for RL?

    • @sentdex
      @sentdex  2 years ago

      That's the original argument, maybe. I used whatever I could figure out. The docs here are less than ideal, and I like designing things in the Sim UI more than hand-coding everything. First video showed my experiences and issues with Gym: ruclips.net/video/phTnbmXM06g/видео.html

  • @liquidpromo
    @liquidpromo 2 years ago

    frogbot

  • @fuba44
    @fuba44 2 years ago

    I'm "like" number 1000, just saying... I'm pretty special.

  • @elishashmalo3731
    @elishashmalo3731 2 years ago

    You mentioned that you used NEAT to test your input values. Has your stance on NEAT changed since - ruclips.net/video/ZC0gMhYhwW0/видео.html ?
    Because in that video you seemed a little uninterested/disappointed in NEAT. Any new insights?

    • @sentdex
      @sentdex  2 years ago

      Nope, my stance hasn't changed much. I think even back then I was pretty impressed with NEAT where it works, I just try to not oversell things. It cannot solve complex problems because it's essentially slightly better than brute force, but the argument I am trying to make here is it *can* be useful. It won't always work on your problems. It's extremely fast/easy to see if it does, and then to test other elements of your flow if it does work, and since that video I have worked more with it and come to respect it maybe a little more. I would certainly recommend everyone be familiar with this algorithm and try it out. It's actually quite fun to use on simpler projects. Having a model trained within minutes is... neat :D

    • @elishashmalo3731
      @elishashmalo3731 2 years ago

      @@sentdex How complex does a problem have to be before you think NEAT wouldn't work? Because it seems like this problem is quite complex.
      Also, smart idea using NEAT to test your work!! I've been trying to get a deep Q-learning AI to play Flappy Bird and I've been having some trouble. The fact that it takes so long makes it really hard to test things (I don't know how to set up TF on GPU yet). But today I just whipped up a NEAT program to do it and I had an essentially perfect model in literally minutes. I also found some clear issues in my code because of it.
      So thanks a ton for that hidden tip!!

  • @mannycalavera121
    @mannycalavera121 2 years ago

    Good, now teach it to update its design, 3D print itself, etc.

  • @nickrazes2720
    @nickrazes2720 2 years ago

    Please zoom in on the code more next time. I watched this video on my phone and it is mostly unreadable.

    • @sentdex
      @sentdex  2 years ago +1

      Code is linked in description to read at whatever zoom level you desire, but I will also try to zoom in more. Usually do, forgot this time :(

    • @nickrazes2720
      @nickrazes2720 2 years ago

      @@sentdex thank you :) it’s ok. Just a friendly reminder:)

  • @ramtinnazeryan
    @ramtinnazeryan 2 years ago

    Awesome video, but the text was very small to read. Can you punch in a little next time?

  • @dasoulking4190
    @dasoulking4190 2 years ago

    hmm

  • @CommunistBearFighter
    @CommunistBearFighter 2 years ago

    first

  • @strange_man
    @strange_man 2 years ago

    import WTF

  • @SoumyaMaitra2008
    @SoumyaMaitra2008 1 year ago

    Just loved it!! Can you DM me your contact details?