Exactly the "push" I needed to graduate to GAIL! Thanks!
Really great video! Is it possible to start training using this GAIL method, and then switch to a traditional RL approach (i.e. switch to PPO)?
Yes, actually they’ve designed it to work that way. All you have to do is specify how long to use demonstrations in the config file.
@@ImmersiveLimit How exactly does one do this? Is it just a max_steps parameter inside of the gail section?
I haven’t looked in a while, but if you look in the examples, I think you should find an example doing GAIL.
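To make that concrete, here is a rough sketch of the relevant parts of a trainer config. The section that phases demonstrations out over time is the behavioral cloning block (called pretraining in the ML-Agents version used in the video, behavioral_cloning in later releases); its steps value controls how long the demo keeps influencing training, while the gail reward signal's strength stays fixed. All keys and numbers below are illustrative placeholders, not values taken from the video:

```yaml
# Illustrative only - section names and values vary between ML-Agents versions.
PushBlock:
  trainer: ppo
  max_steps: 1.5e7
  reward_signals:
    extrinsic:
      strength: 1.0      # the environment reward keeps driving PPO
      gamma: 0.99
    gail:
      strength: 0.01     # small imitation reward alongside the extrinsic one
      gamma: 0.99
      demo_path: ./demos/ExpertPush.demo   # assumed path to your recording
  pretraining:           # renamed to behavioral_cloning in later releases
    demo_path: ./demos/ExpertPush.demo
    strength: 0.5
    steps: 150000        # imitation influence is annealed away by this step
```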
I have always had a question about ML-Agents: the agents randomly select actions at the beginning of training. Can we incorporate human intervention into the training process to make them train faster? Is there a corresponding method in ML-Agents? Looking forward to your answer.
Hello, how do I go about recording a demo file if I want to use imitation learning for a robot motion case? It's not easy to control the robot from the keyboard; can I use an animation in Unity to create demo files for the robot? 🤔
Just what I was looking for, thank you!
Thanks José! We love hearing this. 😀
My demo seems to have a negligible effect in comparison to regular model training. I tried swapping the built-in demo and model with my own, and the only real difference I could see, which was huge, was when using the ExpertPush model compared to my own. Changing the demo didn't change much that I could tell.
I would love some explanation of what exactly is recorded and how it is used by the agent. Is it purely a record of inputs, and if so, which inputs? Are the rewards used or not? Can you use multiple demos at the same time? For example, a demo for boundaries, points of interest, goals and rewards, etc.
I really did love this video and it helped me record my first demo. I know this is about a year old which seems like 10 years when looking at the development of Unity and ML Agents, but luckily not enough has changed that there should be any problem setting up your first demo. Thanks!
Unfortunately my experience with demonstrations has been similar. Not much better results than just regular training. I think maybe my understanding of how it works isn’t quite deep enough. A good place to ask these questions is the ML-Agents Forum: forum.unity.com/forums/ml-agents.453/ The engineers and researchers that built it are pretty active on there. -Adam
This tutorial is great! Have a question for you about an idea I have on extending this feature:
I am working on developing a "player vs AI" game that includes variation in the environments, NPC count, and (naturally) human play-style (since the human is the AI's opponent!), and I want the agents to adopt human-like behaviors. As such I am considering (mostly for fun and for the sake of experiment) outsourcing player recording by running a publicly accessible multiplayer server that instantiates instances of the "Area" prefab, so that multiple players can play simultaneously on that single server and it can technically record "24/7". The idea being: when a player joins, an "Area" prefab instantiates and starts recording their actions as they face off against the "current generation" agent(s), or other players. Perhaps so many demonstrations would yield diminishing returns and call for an iterative approach.
Technically this sounds doable (assuming instantiating recorders doesn't break anything), but I'd love to know what you think! The main hitch I see is that I'm not sure how to either 1. combine many unique demo files, or 2. train on multiple demo files. Might you have thoughts on that? It seems like something ML-Agents doesn't natively support, but perhaps there are workarounds...
Perhaps an alternative idea is to just have the instance running in training mode and activate (or instantiate) the agents when a new player drops in - perhaps more direct and reliable, but at the cost of the agent learning to be more "perfect" and "botlike" rather than human-like. What might be your opinion on that?
Last I checked, the demo recorder was pretty space-intensive and saving a long stream to a file took a long time. The issue, I think, is that you need to record enough observations and behaviors in a clip to show the desired outcome. Otherwise it can’t correlate behaviors with desired outcomes. That aside, I’m not sure if it would work well or not. Maybe try a tiny project where you have a few hours of human play and try to train with that before you set up a multiplayer server.
The Official Imitation Learning Docs link in your video is broken...
This ended a bit early, I was hoping you would show the training part as well
Thanks for the feedback. The training should be very similar to the other ML-Agents tutorials we’ve done. 🙂
@@ImmersiveLimit No worries, the tutorial itself was really good
Great video. Thanks so much!
I can't find the GAIL config file
Looks like they moved it into an 'imitation' folder. github.com/Unity-Technologies/ml-agents/blob/master/config/imitation/PushBlock.yaml
@@ImmersiveLimit So would I have to create a new .yaml file for my project?
Yes, or you can just add on to an existing one.
@@ImmersiveLimit Okay thanks
thanks for the great video!
will def give this a shot :)
Thank you! Very useful!
Hi. What is the difference between Imitation Learning and Inverse Reinforcement Learning?
Not sure what the second one is, so I don’t know 🤷🏻♂️
Imitation Learning is based on observing behavior from recordings (like the method in the video); it's used when the goal of the AI isn't so clear or when you want human-like behavior. Reinforcement Learning is the more traditional reward/punishment method, where at the beginning the AI does random things and gets a reward or a punishment (and then it learns to efficiently collect rewards and avoid punishments).
Updating the Unity Project window: right-click it -> Refresh
Great video, thanks
Hello Immersive Limit! I have a question for you. In this video you launch training based on the .demo that you have recorded. But when you launch the training (off-screen), the agent still follows the PushAgentBasic script, where there is a reward function. So... how is this really Imitation Learning? Does the agent ignore the reward function and just try to imitate you, receiving a reward based on how close it is to your recorded actions? Or should we speak about Imitation Learning on top of Reinforcement Learning?
There’s a weighting applied to how much to learn from the reward function vs imitation. I think in this case, it relies only a little on the demo and more on the reward.
@@ImmersiveLimit Oh yes, I saw it in the gail_config.yaml file. You have the reward_signals section and here you have:
- the "extrinsic" section, that is how much your training is based on the external reward (the reward function you have in your script). The more "strenght" you put (maximum 1.0) the more your training is based on your reward function.
- the "gail" section, that is how much your training is based on the demo you provided. The more "streght" you put, the more your training is based on your demo (maximum 1.0).
So if you combine like 0.5 extrinsic strength and 0.5 gail strength you will have a balanced training between your reward and your demo reward. I will try it now.
github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Configuration-File.md
Here they say that gail strength should be set lower than extrinsic strength if your demonstrations are suboptimal (e.g. from a human), so that a trained agent will focus on receiving extrinsic rewards instead of exactly copying the demonstrations. I guess if gail strength is too high you will get an overfitted brain that just copies your demo exactly.
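For reference, a reward_signals section with the 50/50 split described above might look roughly like the sketch below; the demo_path and gamma values are placeholders, and the exact keys depend on your ML-Agents version:

```yaml
# Sketch of an even split between the environment reward and the GAIL imitation reward.
reward_signals:
  extrinsic:
    strength: 0.5     # weight on the reward function in your agent script
    gamma: 0.99
  gail:
    strength: 0.5     # weight on matching the recorded demonstration
    gamma: 0.99
    demo_path: ./demos/ExpertPush.demo   # placeholder path to your .demo file
```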
OK, so I set up my GAIL config, but where do I go from there if I want to train my agent? I currently do not have a brain on my Behavior Parameters script, and I do not know how to create a brain, so I don't think that I can train.
Do you know how to use imitation learning outside Unity?
I haven’t explored that yet
You are awesome.
Thank you!
Do you know where I could find the GAIL config folder?
@@goldengamer8510 It depends on where you installed the ML-Agents library. I have it here:
C:\ML-Agents\config\gail_config.yaml
@@alessandro_picardi If I go to the config folder, there is no GAIL file
@@goldengamer8510 Looks like they moved it into an 'imitation' folder. github.com/Unity-Technologies/ml-agents/blob/master/config/imitation/PushBlock.yaml
7:30 Just right-click and hit Refresh (or Ctrl+R) in the Unity Assets window in the Project tab and it should show the newly created files/folders.
Thank you!
Very informative video. Can you please elaborate on the basic file structure, the language used, and the deep nets in the next video?
Thanks...
We'll be sure to keep posting more about ML-Agents this year!
Thanks :)
Hm... when I start training using the demonstration, I get this error:
IndexError: agent_id 0 is not present in the BatchedStepResult.
That means your Behavior Parameters are not correct. Compare the parameters with the values in your demo file in Unity.
My Python keeps crashing whenever I import the demo and try training from it. It's frustrating. I can't figure out what's causing it - the output messages in the command prompt mean nothing to me :(
I would recommend reaching out to the ML-Agents people on Github or on their official Unity ML-Agents support forum. They’re really helpful!
@@ImmersiveLimit Good shout. I think it's my code causing issues though, because I'm able to make a demo and use it to train with the examples, so I'm doing something wrong. I've just not really had a whole lot of time to investigate recently. It'll be something stupid, I'm sure 🙃
It is possible to debug the Python code with Visual Studio (though not as easy as C#). It’s probably a simple mismatch in versions or a small code change or something.
@@ImmersiveLimit I'll have to watch your video on debugging stuff, cos I wouldn't know where to start with that. All I'm trying to do is rotate a "bot" and shoot a target in a circle lol
@@ImmersiveLimit Turns out that in my Heuristic I was creating an action array 1 larger than my number of actions, and that was making it crash...
Any way to find out how to actually use these neural networks to train your own game?
I can't seem to figure it out =(
Sure, two ways. We have a free tutorial that is a little out-dated (ruclips.net/video/axF_nHHchFQ/видео.html) that you can use. I'm working on updates, but it takes time. The other way is a Udemy course that we made that is up to date with the latest version: www.udemy.com/course/ai-flight/?couponCode=JANUARY2020