Reinforcement Learning with Quadruped Robot Bittle and NVIDIA Isaac Gym: Almost Walking

  • Published: 10 Sep 2024

Comments • 42

  • @Hardwareai
    @Hardwareai  4 months ago +1

    Support my work on making tutorials and guides on Patreon!
    www.patreon.com/hardware_ai

  • @PP-fh2mk
    @PP-fh2mk 2 years ago +5

    I'm trying this project too.
    Glad to see more information about Isaac Gym.
    I hope your channel gets bigger!

    • @tastlk6351
      @tastlk6351 2 years ago +1

      Hello, I am working on this project too, and I have achieved some results so far. I hope that we can discuss together and collaborate on this project.

    • @Hardwareai
      @Hardwareai  2 years ago +1

      I hope so too!

    • @PP-fh2mk
      @PP-fh2mk 2 years ago

      @@tastlk6351 Sorry for the late reply.
      I found legged_gym and tried to insert my own model, but it doesn't work well...
      This is a really hard project.

    • @TheChromePoet
      @TheChromePoet 2 years ago

      @@Hardwareai Can you please tell me something, is it possible to fast forward the learning process to extreme speeds, for example, 30 days would be equivalent to 30,000 years of learning?

  • @allanboughen
    @allanboughen 2 years ago +3

    Creating RL is going to continue to get faster & easier. It will become thinking like a robot not writing hundreds of lines of code. The T-800 will be here very soon & we'll need experts in edging & ball fondling at the ready.

    • @Hardwareai
      @Hardwareai  2 years ago +1

      What is "thinking like a robot"? xD

    • @allanboughen
      @allanboughen 2 years ago

      @@Hardwareai The task will be to set optimal reward functions. To do this you will need to think like your subject (the robot). Gym lets you see what he is thinking (or not thinking about) & adjust your functions to suit. The way you arrived at the IMU solution.

  • @ericblenner-hassett3945
    @ericblenner-hassett3945 2 years ago +1

    It is looking more and more like set poses, with their own reward, should also be added in the learning.

    • @Hardwareai
      @Hardwareai  2 years ago +1

      Well, I think what you're saying is that imitation learning can be used - that is true, although the setup is a bit cumbersome. What I want to experiment with is RL with feedback, as per deepmind.com/blog/article/learning-through-human-feedback

  • @mlcat
    @mlcat 2 years ago

    Good job and congrats on relocation!

  • @mdaamir9245
    @mdaamir9245 1 year ago

    0:01 Hello and welcome to my channel, Hardware.ai. This is my first video after trading East for West and moving to Switzerland. It is also going to be the least technical video of the ones I've made recently, and it will have none of my usual distracting hand waving, since I'm still setting up the studio and the green screen. So please sit back and relax while I tell you about reinforcement-learning-powered robots taking over the world. Nope, that's not happening anytime soon.
    0:43 [Music]
    0:50 Since I finished my TinyML course series, I wanted to focus a bit more on robotics and publish some of last year's projects that I was quietly working on. You remember that I made a few videos about the robotic dog from Petoi, Bittle. I discussed how to write a custom driver for it and perform teleoperation, and also how to do mapping with LIDAR and/or cameras. Subscribers to my Twitter knew that I was exploring reinforcement learning for Otto and Bittle.
    1:27 Here is where my perfectionism came into play: reinforcement learning is notoriously hard for real-world problems, and in my humble perfectionist opinion I did not achieve stellar results and thus did not have material to share. Well, watching a series on training Bittle from sentdex, I realized that even the path to the final project, and experiments, however unsuccessful they might be, are interesting and useful to other people. Worst case, people can just learn from me how not to do reinforcement learning for quadruped robots. Plus, being in academia now has taught me a thing or two about failures in scientific research having value of their own. Spoiler alert though: it wasn't a complete failure.
    2:22 First of all, if you haven't watched sentdex's videos, do watch them; he did a great job explaining many of the basic things that I won't be focusing on in this video.
    2:33 I was using NVIDIA Isaac Gym as the simulation environment for the experiments. It's fresh off the development bench and, in fact, is still in its beta phase. But the fact that it fully utilizes NVIDIA GPU capabilities for simulation makes it possible to keep most of the elements of your training pipeline, namely the environment, the agents, and the products of their interaction, on the GPU as tensors, which speeds up training by a lot. I tried OpenAI Gym before, and while it was the thing that possibly inspired the NVIDIA team to create Isaac Gym, now Isaac Gym can be strongly recommended over OpenAI's Gym. Speed of training means a great deal when testing different reward functions, verifying the correctness of your environment and robot model, and so on. It could very well be the difference between success and never getting past the point where your robot just flies across the environment like a crazed chickadee.
    3:46 [Music]
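    To make the tensor-based pipeline mentioned above concrete, here is a minimal sketch assuming Isaac Gym Preview's Python bindings (isaacgym.gymapi / isaacgym.gymtorch); the simulation handle `sim` is assumed to have been created elsewhere with the GPU pipeline enabled.

    ```python
    # Sketch only: assumes `sim` was created beforehand with gym.create_sim(...)
    # using use_gpu_pipeline=True, and that gym.prepare_sim(sim) has been called.
    from isaacgym import gymapi, gymtorch

    gym = gymapi.acquire_gym()

    # Raw GPU buffers owned by the simulator...
    _root_states = gym.acquire_actor_root_state_tensor(sim)
    _dof_states = gym.acquire_dof_state_tensor(sim)

    # ...wrapped as PyTorch tensors that stay on the GPU, with no host copies.
    root_states = gymtorch.wrap_tensor(_root_states)  # (num_actors, 13): pos, quat, lin vel, ang vel
    dof_states = gymtorch.wrap_tensor(_dof_states)    # (num_dofs, 2): position, velocity

    # After each physics step, refresh the buffers and compute observations and
    # rewards as batched tensor ops across all parallel environments at once.
    gym.refresh_actor_root_state_tensor(sim)
    gym.refresh_dof_state_tensor(sim)
    base_height = root_states[:, 2]  # z position of every robot, no Python loop needed
    ```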
    4:02 For my first try I adapted one of the example algorithms NVIDIA shipped with the first version of Isaac Gym, the ant walker, to Otto. It uses the PPO algorithm, which stands for Proximal Policy Optimization, an actor-critic method. It is one of the most commonly used baselines for new reinforcement learning tasks, and its variants have also been used to train a robot hand to solve a Rubik's Cube and to win Dota 2 against professional players. So it's a good place to get started.
    4:42 Experimenting with a simpler robot also allowed me to get the hang of creating somewhat complex URDF robot descriptions in Phobos, a more or less WYSIWYG (what you see is what you get) editor working as a Blender plugin. It was a success, and I was really happy to see that the virtual Otto learned a walking gait resembling the walking gait of a normal Otto.
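    To make "actor-critic" concrete, here is a minimal sketch of the kind of network PPO trains: an actor that parameterizes a distribution over joint actions and a critic that estimates the value of the current state. The architecture and layer sizes are illustrative only, not the ones used in the Isaac Gym examples.

    ```python
    import torch
    import torch.nn as nn

    class ActorCritic(nn.Module):
        """Toy actor-critic network: the actor outputs the mean of a Gaussian over
        joint actions, the critic estimates the value of the current observation."""
        def __init__(self, num_obs: int, num_actions: int, hidden: int = 128):
            super().__init__()
            self.actor = nn.Sequential(
                nn.Linear(num_obs, hidden), nn.ELU(),
                nn.Linear(hidden, num_actions),
            )
            self.critic = nn.Sequential(
                nn.Linear(num_obs, hidden), nn.ELU(),
                nn.Linear(hidden, 1),
            )
            # Learned log standard deviation of the action distribution.
            self.log_std = nn.Parameter(torch.zeros(num_actions))

        def forward(self, obs: torch.Tensor):
            mean = self.actor(obs)
            dist = torch.distributions.Normal(mean, self.log_std.exp())
            value = self.critic(obs).squeeze(-1)
            return dist, value
    ```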
    5:23 After a slight nudge from an Aussie friend of mine, I went on to tackle a more challenging task: teaching a quadruped robot how to walk, first in simulation and then, ideally, utilizing sim-to-real to transfer the learned knowledge to an actual physical robot. Creating the URDF for Bittle wasn't a cakewalk, but after some trial and error I was able to create a URDF with 3D models reverse-engineered by a third-party developer, and I shared it on GitHub for people to build on my work. I'm happy to see it was starred quite a few times and used by other people, including sentdex.
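    A minimal sketch of loading such a URDF into Isaac Gym follows; the asset root and file name are placeholders rather than the actual repository layout, and the import options are just common choices.

    ```python
    from isaacgym import gymapi

    gym = gymapi.acquire_gym()

    # Simulation parameters: PhysX with the GPU pipeline (sketch; tune for your setup).
    sim_params = gymapi.SimParams()
    sim_params.use_gpu_pipeline = True
    sim = gym.create_sim(0, 0, gymapi.SIM_PHYSX, sim_params)

    # Asset options control how the URDF is imported.
    asset_options = gymapi.AssetOptions()
    asset_options.fix_base_link = False          # the robot must be free to fall and walk
    asset_options.collapse_fixed_joints = True   # merge links connected by fixed joints
    asset_options.default_dof_drive_mode = gymapi.DOF_MODE_POS  # position-controlled servos

    # Placeholder path and file name for the Bittle URDF.
    asset = gym.load_asset(sim, "assets", "bittle.urdf", asset_options)
    print("DOFs:", gym.get_asset_dof_count(asset))
    ```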
    6:08 Reinforcement-learning-algorithm-wise, the first thing I tried was adopting the same ant walker approach. It did not work well, or at all. What was different with Bittle, apart from the inherently greater complexity coming from having more joints, is that its default pose is not stable. Changing the initial positions of the joints, a.k.a. the starting pose, however, just brought different but still unsatisfying gaits, like slight jumping on the knees, walking on the knees, and a lot, a lot of falls. Among the things that I also tried was tweaking the reward function to incentivize upright movement in a specific direction and staying above a certain height. That just brought more jumping.
    7:03 In hindsight, it seems the model was hopelessly overfitting when the only thing it was incentivized to learn was movement in a specific direction for the longest time possible without dying, or rather being reset, in this case mostly from falling or flipping onto its back. I mean, in the end, as often happens with reinforcement learning algorithms, it wasn't wrong. Perhaps jumping on its knees was the best way to move in a specific direction for the longest time possible without accidents. It's just not exactly what I wanted from it.
    7:44 With the second version of Isaac Gym, NVIDIA released code for quadruped walking for ANYmal robots, and by comparing my old code with it I immediately realized what the missing piece of the puzzle was. Instead of formulating the reward function as just movement in a specific direction for the longest time possible without dying, in order to avoid overfitting they formulated the reward function for ANYmal essentially as the difference between random angular and linear velocity commands and the actual angular and linear velocities the robot was moving with after being given these commands. That teaches the quadruped how to move in different directions and avoids the pitfalls of the previous approach, simply because jumping like a wounded cricket is no longer the best way to maximize the reward function, so a more generalized gait needs to be developed and held onto by the algorithm.
    8:56 The ANYmal code could not be used as a drop-in replacement for Bittle, and I had to make quite a few tweaks with respect to the initial joint positions, angular and linear velocities, and the reward function. However, in the end it worked reasonably well. I wasn't able to get a perfect walking gait, but for that, I suppose, more research is needed.
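    A sketch of what such a command-tracking reward can look like, written as batched PyTorch operations over all parallel environments. The exponential mapping mirrors the idea described above; the 0.25 temperature and the weighting are illustrative values, not the ones from NVIDIA's code or my tweaks.

    ```python
    import torch

    def command_tracking_reward(commands: torch.Tensor,
                                base_lin_vel: torch.Tensor,
                                base_ang_vel: torch.Tensor) -> torch.Tensor:
        """commands: (num_envs, 3) -> [vx, vy, yaw_rate], sampled randomly per episode.
        base_lin_vel / base_ang_vel: (num_envs, 3) actual velocities in the base frame."""
        # Penalize squared error between commanded and actual velocities,
        # mapped through exp() so the reward stays in (0, 1].
        lin_vel_error = torch.sum(torch.square(commands[:, :2] - base_lin_vel[:, :2]), dim=1)
        ang_vel_error = torch.square(commands[:, 2] - base_ang_vel[:, 2])
        rew_lin = torch.exp(-lin_vel_error / 0.25)
        rew_ang = torch.exp(-ang_vel_error / 0.25)
        return rew_lin + 0.5 * rew_ang  # illustrative weighting
    ```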
    9:21 Now for some final thoughts. When making the URDF model for Bittle, I had already contemplated how it would be possible to transfer the trained algorithm to a real robot, to bridge the gap between simulation and reality. The code for ANYmal, while working for robots in simulation, takes many observations that won't be accessible on Bittle. The only two sensors that are available are an accelerometer and a gyroscope, which are actually combined in a single MPU unit. I placed a virtual gyroscope and accelerometer in the center of the board when making the Bittle URDF, and this is where we can get rotation and speed values from the virtual accelerometer. One could try training an algorithm that takes these, plus velocity commands, and outputs angles or torques for the servos.
    10:17 The speed at which all of this needs to be executed means that the neural network very likely needs to be run on the edge, right on Bittle's main board. The standard NyBoard will not be sufficient, since it only has an ATmega328P chip, so the BiBoard with an ESP32 needs to be used. Fortunately, I got one loyal Bittle, equipped with a BiBoard, that followed me all the way to Switzerland.
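    A sketch of what such a reduced observation and inference step could look like, using only quantities obtainable from the MPU plus the velocity command and the previous action. The exact observation layout and the 8-joint action size are assumptions for illustration, not the code from the video; on the real robot the network would additionally have to be exported (e.g. to ONNX or TFLite) to run on the BiBoard's ESP32.

    ```python
    import torch

    @torch.no_grad()
    def policy_step(policy: torch.nn.Module,
                    gyro: torch.Tensor,         # (3,) angular velocity from the MPU
                    gravity_vec: torch.Tensor,  # (3,) gravity direction estimated from the accelerometer
                    command: torch.Tensor,      # (3,) desired vx, vy, yaw rate
                    last_action: torch.Tensor   # (8,) previous targets for Bittle's 8 leg servos
                    ) -> torch.Tensor:
        """Builds a minimal observation from sensors available on the real robot
        and returns joint angle targets for the servos."""
        obs = torch.cat([gyro, gravity_vec, command, last_action]).unsqueeze(0)
        return policy(obs).squeeze(0)  # (8,) joint angle targets
    ```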

    • @Hardwareai
      @Hardwareai  1 year ago

      I do need to start adding subtitles :)

  • @allanboughen
    @allanboughen 2 years ago

    I hope Bittle becomes a year-long series.

    • @Hardwareai
      @Hardwareai  2 years ago

      It is likely to become a trilogy :)

  • @olalekanisola8763
    @olalekanisola8763 1 year ago

    I quite like this video; it was very helpful. I downloaded your URDF and attempted to replicate the Petoi Bittle in Isaac Gym; however, several of the body parts started floating for no apparent reason. The Petoi robot appears to be in nearly a standing position and moves strangely even when I modify its joint positions during training.

    • @Hardwareai
      @Hardwareai  1 year ago

      Hmmm. I did publish example code, did you have a look? Does that work normally?

  • @levbereggelezo
    @levbereggelezo 2 years ago

    Very good

  • @allanboughen
    @allanboughen 2 years ago

    OMG, OMG, OMG! I'm one very excited Ozzy. If you & 100 followers all fail (partially succeed) and then share your results in an easy-to-observe environment (Isaac Gym), you will have a powerful & successful team.

    • @Hardwareai
      @Hardwareai  2 years ago

      I was trying to pronounce "Aussie". Did I fail? xD

  • @nicobohlinger7077
    @nicobohlinger7077 8 months ago

    Thanks for the great video. I was wondering where you got the PD gains from. Do we know for sure that Bittle uses a PD controller with those gains on the real robot? I didn't find any documentation on this

    • @Hardwareai
      @Hardwareai  8 months ago +1

      Great question! No, unfortunately PD gains are incorrect - they were tweaked to make it work in simulation, as there are quite a few other things needed to make it work on a real robot. Which project are you working on?
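      (A note on where such gains live: in Isaac Gym they are the per-DOF stiffness, i.e. Kp, and damping, i.e. Kd, in the asset's DOF properties. A minimal sketch follows; the numbers are placeholders, not the hand-tuned values mentioned above, and `gym` and `asset` are assumed to exist as in the URDF-loading sketch in the transcript.)

      ```python
      from isaacgym import gymapi

      # Assumes `gym` and a loaded `asset` already exist (see the URDF sketch above).
      dof_props = gym.get_asset_dof_properties(asset)
      dof_props["driveMode"][:] = gymapi.DOF_MODE_POS  # PD position control on every joint
      dof_props["stiffness"][:] = 5.0                  # Kp, placeholder value
      dof_props["damping"][:] = 0.1                    # Kd, placeholder value
      # Applied per actor after creating it in an environment:
      # gym.set_actor_dof_properties(env, actor_handle, dof_props)
      ```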

  • @andresmendez6151
    @andresmendez6151 2 years ago

    Could you please edit the description to contain the git repo? At the moment, it says "WIP" but I am not sure what that means. Nice video by the way.

    • @Hardwareai
      @Hardwareai  2 years ago

      Oh-oh, I do need to upload the code then. I'll put a reminder for myself.

  • @ilhamakbar531
    @ilhamakbar531 11 months ago

    Hello sir, do you have the code for your pondo as in the example in the video?

    • @Hardwareai
      @Hardwareai  10 months ago

      What is the pondo? There is a GH repository in the video description.

  • @maximeg3659
    @maximeg3659 1 year ago

    Strap an NRF24L01 on it, run the net on a big desktop GPU, add some random input delay in the gym to account for lag; I'm curious about the results.

    • @Hardwareai
      @Hardwareai  1 year ago

      Why would the NRF24L01 be necessary? The ESP32 on the BiBoard can be wirelessly connected to a PC, and it already has an accelerometer/gyro.

  • @abdurrahmanaliyu1512
    @abdurrahmanaliyu1512 1 year ago

    Hi. Can you share the github link for the URDF?

    • @Hardwareai
      @Hardwareai  1 year ago

      It's in the video description.

  • @pa-su6901
    @pa-su6901 2 years ago

    Thank you for the useful video. In my case I'm also trying hard to make my mobile robot car work. Could you please give me some example Python code or a helpful reference? I want to reduce my experiment time.

    • @Hardwareai
      @Hardwareai  2 years ago +1

      Right. As mentioned in another comment, I'm wrapped up at the moment, but I put a reminder to clean and upload the code!

  • @VishnuVardhan-vy1ve
    @VishnuVardhan-vy1ve 2 years ago

    Sir, can we make a custom reinforcement learning environment with Isaac Gym?

    • @Hardwareai
      @Hardwareai  2 years ago

      Of course. That is actually the point of Isaac Gym.

    • @VishnuVardhan-vy1ve
      @VishnuVardhan-vy1ve 2 years ago

      @@Hardwareai Can you make a video on that for us, please?

  • @aixle3590
    @aixle3590 1 year ago

    Hey lovely video, will you be sharing the code?

    • @aixle3590
      @aixle3590 1 year ago

      I apologise, I found the GitHub. Thank you for your hard work.

    • @aixle3590
      @aixle3590 1 year ago

      How did you actually transfer the model to the BiBoard?

    • @Hardwareai
      @Hardwareai  1 year ago

      That has not been done yet.