Geoffrey is all of us. Clocking in day and night; smiling with existential crisis, just to chase the all-important cube, in exchange for what? What is it all for? What is the cube? What's your cube? Who am I? I look in the mirror, and all I see is Geoffrey...
what else should I teach Geoffrey? :D
maybe a racetrack with obstacles?
Super Mario style jumping and hitting mushrooms?
Maybe you could make him a friend and have them play tag in a big maze?
Fuel/energy expenditure.
Getting there as fast as possible is one thing, getting there as fuel efficiently as possible is another.
(In the vid) It often tends to roll backwards, so teach it to turn instead.
Oh, and perhaps (a bit challenging) have it build domino setups!
At first I thought, "Huh, that last hop is defying gravity." Then I dropped that when you said
"Oh, you're learning to fly................. That will come in handy!" It's like it's trying
really hard to use its mind power to lift itself into the air xD.
I find this interesting for uses like unique secondary animations, that sort of stuff.
Edit: Btw, is that rendered in Eevee Next or Cycles? (Looks pretty good if it's Eevee.)
At the beginning I had no idea he could fly XDDD
This is amazing! Do you have any tutorials on how you implemented RL in Blender? I am currently working on a Sensor Simulation Tool in Blender, and it would be great to do some RL experiments as well :)
Hey! I used a very slightly modified version of BlendTorch that allowed multiple environments to train at the same time: github.com/MohamadZeina/pytorch-blender
For RL, I used Stable Baselines 3. Reach out to my email (in channel bio) if you have more questions
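For anyone curious what the plumbing looks like: before an RL library like Stable Baselines 3 can train on a Blender scene, the scene has to be exposed through a Gym-style interface. A minimal pure-Python sketch of that contract, with all class names, dimensions, and dynamics being illustrative placeholders (not BlendTorch's actual API):

```python
import random


class BlenderEnvSketch:
    """Hypothetical Gym-style wrapper around a physics scene.

    reset() returns the first observation; step(action) returns
    (observation, reward, done, info) — the contract an RL library
    such as Stable Baselines 3 expects an environment to follow.
    """

    def __init__(self, obs_dim=9, act_dim=5, horizon=200):
        self.obs_dim = obs_dim    # e.g. target pos, agent rotation, agent pos
        self.act_dim = act_dim    # e.g. 3 wheel torques + 2 forcefields
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        # In the real setup this would re-randomise the Blender scene
        # and read the initial observation back from the simulation.
        return [0.0] * self.obs_dim

    def step(self, action):
        assert len(action) == self.act_dim
        self.t += 1
        # Placeholder dynamics: a real wrapper would advance the physics
        # simulation one (or more) frames and compute the reward there.
        obs = [random.uniform(-1.0, 1.0) for _ in range(self.obs_dim)]
        reward = 0.0
        done = self.t >= self.horizon
        return obs, reward, done, {}


env = BlenderEnvSketch()
obs = env.reset()
obs, reward, done, info = env.step([0.0] * 5)
```

Once a scene is wrapped like this, swapping the RL algorithm (PPO, SAC, etc.) becomes a one-line change on the library side.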
@@MobyMotion Thank you very much. I will definitely check this out :)
It looks really cool! I've been working on DRL recently too. Could you share the agent's state, reward, actions, and the structure of the neural network?
Thanks :)
State: the 3D position of the target relative to the agent, the 3D rotation of the agent, the 3D position of the agent, and the sideways position of the moving furniture (when it's there). This is stacked, with the number of frames being optimised through a Bayesian search - I think the best was 4 or 8 frames. Frames are also skipped; again, this is searched for, and I think the best was 4 or 8 again.
Reward: positive for velocity towards the target, negative for velocity away from it, plus a large reward for touching it that's spread out over 10 frames or so, and a negative reward for being close to the lava (though this could have been stronger).
Actions: 5 continuous actions, 3 for the angular momentum of the wheel and 2 for the forcefields (up / down and forward / back).
The neural architecture is also searched for - I think the best was 1024 neurons x 3 layers, although in other projects much smaller networks are enough.
Let me know if you need anything else ;)
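The reward shaping described above can be sketched as a plain function. Only the structure comes from the comment (velocity toward the target rewarded, lava proximity penalised, touch bonus spread over roughly 10 frames); every weight and threshold here is a made-up placeholder:

```python
def shaped_reward(vel_toward_target, dist_to_lava, frames_since_touch,
                  lava_radius=1.0, touch_bonus=10.0, touch_window=10):
    """Toy reward: +/- for velocity toward/away from the target,
    a penalty near lava, and a touch bonus smeared over a window.

    All constants (lava_radius, touch_bonus, touch_window) are
    illustrative guesses, not the values used in the video.
    """
    r = vel_toward_target                  # positive toward, negative away
    if dist_to_lava < lava_radius:         # hypothetical proximity threshold
        r -= (lava_radius - dist_to_lava)  # stronger penalty when closer
    if frames_since_touch is not None and frames_since_touch < touch_window:
        r += touch_bonus / touch_window    # bonus spread over ~10 frames
    return r
```

Spreading the touch bonus over several frames, rather than giving it as a single spike, tends to make the learning signal less noisy for the value function.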
@@MobyMotion Thanks for your detailed info! I have a few more questions:
It seems like the dimension of the state is not fixed? The first and second levels of your video don't have any furniture, and the number of furniture pieces could change. How do you modify your state tensor to fix its dimension? If I understand correctly, your state is those geometry parameters over 4 frames? Why aren't velocity and angular velocity needed in the state? Do previous actions have no influence?
What's the frequency of giving actions? Does the agent take an action every frame, or does one action last for some number of frames?
Did you try out the 1024x3 structure yourself, or did you follow other people's work? Trying out a neural network structure that works for a new task is really time consuming.
Most runs were trained from scratch, so the input dimension was different to match the number of observations. I didn't think to include velocity; my thinking was that having a few frames over time provided the same velocity information implicitly. Including velocity explicitly would probably have been better, tbh.
One action lasts multiple frames - this is what I meant by frame skipping: one action is applied for multiple frames until the next decision step.
I found the 1024x3 through a Bayesian optimisation search, but if your compute is limited, the default (256x2) is probably enough for most environments. If it fails to learn, the problem is usually a bug or an issue with the observations / reward structure, not the architecture.
If you tell me about your scene I can try to give my thoughts on what I would try.
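The frame-skipping idea above — one policy decision held for several physics frames — is just an action-repeat loop around the environment step. A hedged sketch (the env interface, the skip of 4, and the DummyEnv internals are all illustrative):

```python
class DummyEnv:
    """Stand-in environment: reward 1 per frame, episode ends at frame 10."""
    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        return [self.t], 1.0, self.t >= 10, {}


def step_with_action_repeat(env, action, skip=4):
    """Apply one action for `skip` physics frames, summing the reward.

    The policy only sees the observation after the last repeated frame,
    so each decision step covers several frames of physics — cheaper to
    learn from, and the agent commits to each action for longer.
    """
    total_reward = 0.0
    obs, done = None, False
    for _ in range(skip):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:            # stop early if the episode ends mid-repeat
            break
    return obs, total_reward, done


env = DummyEnv()
obs, total_reward, done = step_with_action_repeat(env, action=0, skip=4)
```

The skip value itself is a hyperparameter — which is exactly why it showed up in the Bayesian search alongside the frame-stack size.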
@@MobyMotion Thanks for your kind explanation. So you trained different neural networks with different input dimensions for your different levels? That must have cost a lot of time. What do you mean by a problem with the observations / reward structure rather than the architecture?
I'm currently writing a paper using a DRL method in fluid dynamics; the task is controlling a 2D flexible jellyfish model to chase a target. It's a new task that no previous work has tried, so I struggled with the choice of network structure. I will look into Bayesian optimisation search to see what I can learn. Again, thanks for the info.
Yeah, that’s right - most levels were trained from scratch. It costs a lot of time, but each run only trained for about 12-24 hours until it converged. It was important for me to parallelise it, though (i.e. gather experiences from multiple agents at once), to speed up training.
I mean if your agent isn’t learning, the problem is unlikely to be the number of neurons, it’s more likely to be a poor choice of observations or rewards. That jellyfish problem sounds cool - what software are you using for the environment? I’m also a deep learning researcher, feel free to send me an email if you’d like to talk in more depth (email is on my channel).
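Parallelised experience collection just means stepping several environment copies per policy decision, so each update sees a larger, more decorrelated batch of transitions. A toy sketch of the batching idea (the CounterEnv internals are placeholders; in practice each Blender instance runs as its own process):

```python
class CounterEnv:
    """Placeholder env: the observation is just a running counter."""
    def __init__(self, seed):
        self.state = seed

    def step(self, action):
        self.state += action
        return self.state, float(self.state), False


def step_all(envs, actions):
    """Step every env copy with its own action; return batched results.

    A real vectorised setup would dispatch these steps to separate
    Blender processes; sequential stepping here just shows the shape
    of the data the learner receives.
    """
    results = [env.step(a) for env, a in zip(envs, actions)]
    obs, rewards, dones = map(list, zip(*results))
    return obs, rewards, dones


envs = [CounterEnv(seed=i) for i in range(4)]   # 4 parallel copies
obs, rewards, dones = step_all(envs, actions=[1, 1, 1, 1])
```

Stable Baselines 3 wraps this pattern in its vectorised-environment classes, which is what makes the multi-environment BlendTorch modification pay off.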
I would have introduced more "levels" before jumping to the big one (moving targets, obstacles, reward for touching safe ground etc.)
Thanks for the feedback :) there were lots of intermediate tests, but when you’re running them it’s hard to tell what’s worth including. Next video will definitely have more levels though because I’m trying to get an agent to learn something much harder
This is so cool!!!!
Man, this Rocket League update is wild :O