Geoffrey is all of us. Clocking in day and night; smiling with existential crisis, just to chase the all-important cube, in exchange for what? What is it all for? What is the cube? What's your cube? Who am I? I look in the mirror, and all I see is Geoffrey...
what else should I teach Geoffrey? :D
maybe a racetrack with obstacles?
Super Mario style jumping and hitting mushrooms?
Maybe you could make him a friend and have them play tag in a big maze?
Fuel/energy expenditure.
Getting there as fast as possible is one thing, getting there as fuel efficiently as possible is another.
(In the vid) It often tends to roll backwards, so teach it to turn instead.
Oh, and perhaps (a bit challenging) have it build domino setups!
At first I thought, "Huh, that last hop is defying gravity." Then I dropped that when you said
"Oh, you're learning to fly................. That will come in handy!" It's like it's trying
really hard to use its mind power to lift itself into the air xD.
I find this interesting for uses like unique secondary animations, that sort of stuff.
Edit: Btw, is that rendered in Eevee Next or Cycles? (Looks pretty good if it's Eevee.)
At the beginning I had no idea he could fly XDDD
This is amazing! Do you have any tutorials on how you implemented RL in Blender? I am currently working on a Sensor Simulation Tool in Blender, and it would be great to do some RL experiments as well :)
Hey! I used a very slightly modified version of BlendTorch that allowed multiple environments to train at the same time: github.com/MohamadZeina/pytorch-blender
For RL, I used Stable Baselines 3. Reach out to my email (in channel bio) if you have more questions
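For anyone curious what the plumbing looks like: before an RL library like Stable Baselines 3 can train on a Blender scene, the scene has to be exposed through a Gym-style interface. A minimal pure-Python sketch of that contract, with all class names, dimensions, and dynamics being illustrative placeholders (not BlendTorch's actual API):

```python
import random


class BlenderEnvSketch:
    """Hypothetical Gym-style wrapper around a physics scene.

    reset() returns the first observation; step(action) returns
    (observation, reward, done, info) — the contract an RL library
    such as Stable Baselines 3 expects an environment to follow.
    """

    def __init__(self, obs_dim=9, act_dim=5, horizon=200):
        self.obs_dim = obs_dim    # e.g. target pos, agent rotation, agent pos
        self.act_dim = act_dim    # e.g. 3 wheel torques + 2 forcefields
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        # In the real setup this would re-randomise the Blender scene
        # and read the initial observation back from the simulation.
        return [0.0] * self.obs_dim

    def step(self, action):
        assert len(action) == self.act_dim
        self.t += 1
        # Placeholder dynamics: a real wrapper would advance the physics
        # simulation one (or more) frames and compute the reward there.
        obs = [random.uniform(-1.0, 1.0) for _ in range(self.obs_dim)]
        reward = 0.0
        done = self.t >= self.horizon
        return obs, reward, done, {}


env = BlenderEnvSketch()
obs = env.reset()
obs, reward, done, info = env.step([0.0] * 5)
```

Once a scene is wrapped like this, swapping the RL algorithm (PPO, SAC, etc.) becomes a one-line change on the library side.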
@@MobyMotion Thank you very much. I will definitely check this out :)
It looks really cool! I've been working on DRL recently too. Could you share the agent's state, reward, actions, and the structure of the neural network?
Thanks :)
State: the 3D position of the target relative to the agent, the 3D rotation of the agent, the 3D position of the agent, and the sideways position of the moving furniture (when it's there). This is stacked, with the number of frames being optimised through a Bayesian search - I think the best was 4 or 8 frames. Frames are also skipped; again, this is searched for, and I think the best was 4 or 8 again.
Reward: positive for velocity towards the target, negative for velocity away from it, plus a large reward for touching it that's spread out over 10 frames or so, and a negative reward for being close to the lava (though this could have been stronger).
Actions: 5 continuous actions, 3 for the angular momentum of the wheel and 2 for the forcefields (up / down and forward / back).
The neural architecture is also searched for - I think the best was 1024 neurons x 3 layers, although in other projects much smaller networks are enough.
Let me know if you need anything else ;)
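The reward shaping described above can be sketched as a plain function. Only the structure comes from the comment (velocity toward the target rewarded, lava proximity penalised, touch bonus spread over roughly 10 frames); every weight and threshold here is a made-up placeholder:

```python
def shaped_reward(vel_toward_target, dist_to_lava, frames_since_touch,
                  lava_radius=1.0, touch_bonus=10.0, touch_window=10):
    """Toy reward: +/- for velocity toward/away from the target,
    a penalty near lava, and a touch bonus smeared over a window.

    All constants (lava_radius, touch_bonus, touch_window) are
    illustrative guesses, not the values used in the video.
    """
    r = vel_toward_target                  # positive toward, negative away
    if dist_to_lava < lava_radius:         # hypothetical proximity threshold
        r -= (lava_radius - dist_to_lava)  # stronger penalty when closer
    if frames_since_touch is not None and frames_since_touch < touch_window:
        r += touch_bonus / touch_window    # bonus spread over ~10 frames
    return r
```

Spreading the touch bonus over several frames, rather than giving it as a single spike, tends to make the learning signal less noisy for the value function.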
@@MobyMotion Thanks for your detailed info! I have a few more questions:
It seems like the dimension of the state is not fixed? The first and second levels of your video don't have any furniture, and the number of furniture pieces could change. How do you modify your state tensor to fix its dimension? If I understand correctly, your state is those geometry parameters over 4 frames? Why aren't velocity and angular velocity needed in the state? Do previous actions have no influence?
What's the frequency of giving actions? Does the agent take an action every frame, or does one action last for some number of frames?
Did you try out the 1024x3 structure yourself, or did you follow other people's work? Trying out a neural network structure that works for a new task is really time consuming.
Most runs were trained from scratch, so the input dimension was different to match the number of observations. I didn't think to include velocity; my thinking was that having a few frames over time provided the same velocity information implicitly. Including velocity explicitly would probably have been better, tbh.
One action lasts multiple frames - this is what I meant by frame skipping: one action is applied for multiple frames until the next decision step.
I found the 1024x3 through a Bayesian optimisation search, but if your compute is limited, the default (256x2) is probably enough for most environments. If it fails to learn, the problem is usually a bug or an issue with the observations / reward structure, not the architecture.
If you tell me about your scene I can try to give my thoughts on what I would try.
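The frame-skipping idea above — one policy decision held for several physics frames — is just an action-repeat loop around the environment step. A hedged sketch (the env interface, the skip of 4, and the DummyEnv internals are all illustrative):

```python
class DummyEnv:
    """Stand-in environment: reward 1 per frame, episode ends at frame 10."""
    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        return [self.t], 1.0, self.t >= 10, {}


def step_with_action_repeat(env, action, skip=4):
    """Apply one action for `skip` physics frames, summing the reward.

    The policy only sees the observation after the last repeated frame,
    so each decision step covers several frames of physics — cheaper to
    learn from, and the agent commits to each action for longer.
    """
    total_reward = 0.0
    obs, done = None, False
    for _ in range(skip):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:            # stop early if the episode ends mid-repeat
            break
    return obs, total_reward, done


env = DummyEnv()
obs, total_reward, done = step_with_action_repeat(env, action=0, skip=4)
```

The skip value itself is a hyperparameter — which is exactly why it showed up in the Bayesian search alongside the frame-stack size.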
@@MobyMotion Thanks for your kind explanation. So you trained different neural networks with different input dimensions for your different levels? That must have cost a lot of time. What do you mean by a problem with the observations / reward structure rather than the architecture?
I'm currently writing a paper using a DRL method in fluid dynamics; the task is controlling a 2D flexible jellyfish model to chase a target. It's a new task that no previous work has tried, so I struggled with the choice of network structure. I will look into Bayesian optimisation search to see what I can learn. Again, thanks for the info.
Yeah, that’s right - most levels were trained from scratch. It costs a lot of time, but each run only trained for about 12-24 hours until it converged. It was important for me to parallelise it, though (i.e. gather experiences from multiple agents at once), to speed up training.
I mean if your agent isn’t learning, the problem is unlikely to be the number of neurons, it’s more likely to be a poor choice of observations or rewards. That jellyfish problem sounds cool - what software are you using for the environment? I’m also a deep learning researcher, feel free to send me an email if you’d like to talk in more depth (email is on my channel).
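Parallelised experience collection just means stepping several environment copies per policy decision, so each update sees a larger, more decorrelated batch of transitions. A toy sketch of the batching idea (the CounterEnv internals are placeholders; in practice each Blender instance runs as its own process):

```python
class CounterEnv:
    """Placeholder env: the observation is just a running counter."""
    def __init__(self, seed):
        self.state = seed

    def step(self, action):
        self.state += action
        return self.state, float(self.state), False


def step_all(envs, actions):
    """Step every env copy with its own action; return batched results.

    A real vectorised setup would dispatch these steps to separate
    Blender processes; sequential stepping here just shows the shape
    of the data the learner receives.
    """
    results = [env.step(a) for env, a in zip(envs, actions)]
    obs, rewards, dones = map(list, zip(*results))
    return obs, rewards, dones


envs = [CounterEnv(seed=i) for i in range(4)]   # 4 parallel copies
obs, rewards, dones = step_all(envs, actions=[1, 1, 1, 1])
```

Stable Baselines 3 wraps this pattern in its vectorised-environment classes, which is what makes the multi-environment BlendTorch modification pay off.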
I would have introduced more "levels" before jumping to the big one (moving targets, obstacles, reward for touching safe ground etc.)
Thanks for the feedback :) there were lots of intermediate tests, but when you’re running them it’s hard to tell what’s worth including. Next video will definitely have more levels though because I’m trying to get an agent to learn something much harder
This is so cool!!!!
Man, this Rocket League update is wild :O