I've been working with your Penguin agent and I definitely learned a huge amount from that. Thanks for sharing.
You’re welcome!
Just bought my very first Udemy course! Thank you for your work. Not that I needed a tutorial about ML-Agents; it's more of a thank you for your YT videos and also a way to learn about game logic, and the course seems like a very nice skeleton for my future ML experiments (racing, etc.). Just wanted to ask: in that course, do agents only learn on the same race track? If yes, do you plan to make a YT video or update the Udemy course to show how to generate race tracks on the fly, so that the agent learns on random tracks to help it generalize better and not overfit on the same track? (so a new track every agent.done()) Thanks again.
Thank you so much! In the course, I only teach how to train on one racetrack, but we train on 4 racetracks simultaneously, so it would be easy to expand to at least 4 static tracks. I'll have to consider how I would create random tracks. It seems challenging to generate a whole track procedurally, but maybe segments could be effective too. Thanks for the suggestion and support.
Maybe don't generate the whole track, but reshuffle existing elements, making each layout unique.
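In case it helps anyone thinking along these lines: here's a rough sketch of the reshuffle idea, written as engine-agnostic Python rather than Unity C#. The segment names and turn angles are made up for illustration; a real version would swap in prefab pieces and snap their endpoints together on each episode reset.

```python
import random

# Hypothetical library of hand-made track pieces (names are made up).
# Each piece turns the heading by some angle; a real version would also
# store a prefab reference and an exit transform to snap pieces together.
SEGMENTS = {
    "straight":   0,
    "left_45":   -45,
    "left_90":   -90,
    "right_45":   45,
    "right_90":   90,
}

def generate_track(num_segments=12, seed=None):
    """Shuffle existing segments into a new layout each episode.

    Keeps the running heading between -90 and +90 degrees so the track
    doesn't immediately fold back onto itself (a real generator would
    also need a proper self-intersection check).
    """
    rng = random.Random(seed)
    track, heading = [], 0
    for _ in range(num_segments):
        options = [name for name, turn in SEGMENTS.items()
                   if -90 <= heading + turn <= 90]
        choice = rng.choice(options)
        heading += SEGMENTS[choice]
        track.append(choice)
    return track

# New layout every episode, e.g. called from the agent's reset logic.
print(generate_track(seed=42))
```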
Great video! The update for the new version 14 was great! Have you tried playing around with the decision period of an agent? Does a smaller decision period induce longer training? I found that setting the decision period to 1 on the PushBlock example made the algorithm not learn, which is why I'm asking. Keep it up!
Thank you! I honestly haven’t experimented with it. I bet it depends on the task. I think I was reading about the DeepMind Capture The Flag research and saw that they used a decision period of 4 or 5, so I figured that was a good number to use and never questioned it.
Immersive Limit All right, thank you!
It depends on the task, but mainly you want to avoid faster decision making because it won't give enough time for the reward signal to evaluate each decision properly, which results in chaotic behavior. That's what I learned after a lot of experimentation.
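To make that tradeoff concrete, here's a toy Python loop (not the actual ML-Agents internals, just an illustration) of what a decision period of N means: the policy is only queried every N steps and the last action is repeated in between, so each decision has several steps' worth of reward attached to it instead of just one.

```python
import random

def run_episode(decision_period, episode_steps=20):
    """Toy illustration of how the decision period spaces out decisions.

    The 'policy' here is just a random choice; the point is that with a
    larger period each decision stays in effect for several steps, so the
    reward it earns is easier to attribute to that decision.
    """
    last_action = 0
    decisions = 0
    for step in range(episode_steps):
        if step % decision_period == 0:
            last_action = random.choice([-1, 0, 1])  # ask the policy for a new action
            decisions += 1
        applied_action = last_action  # every step repeats the most recent action
        _ = applied_action            # environment physics would consume this
    return decisions

print(run_episode(decision_period=1))  # 20 decisions, each judged on 1 step of reward
print(run_episode(decision_period=5))  # 4 decisions, each judged on 5 steps of reward
```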
Curiosity may be the future of learning! Since its algorithm can help with learning another task in future training, the model could move to another environment and learn a different task with the existing algorithm, without creating one from zero.
Thanks for your explanation!
It's good to know. Thanks!
BTW, are you planning to make a course or video on more advanced usage of ml-agents? I would love it.
Something that uses full vision input (meaning not just rays) with a CNN and RL. I wanted to participate in the Obstacle Tower Challenge, but it was a bit too complex to start with. If there were a course, I would start from that. Thanks again!
I’ve got some stuff that I’m working on; it may take a while before it’s ready. 🙂
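For anyone curious what "full vision input" looks like in practice: when the agent observes a camera image instead of rays, the network needs a small CNN front end. Here's a rough PyTorch sketch, not tied to any particular ML-Agents version; the 84x84 RGB shape and feature size are just assumptions for illustration.

```python
import torch
import torch.nn as nn

class VisualEncoder(nn.Module):
    """Small CNN that turns an 84x84 RGB observation into a feature vector.

    This is the classic 'Nature DQN'-style encoder; the feature vector would
    then feed whatever policy/value heads the RL algorithm uses.
    """
    def __init__(self, feature_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer the flattened size from a dummy image
            n_flat = self.conv(torch.zeros(1, 3, 84, 84)).shape[1]
        self.fc = nn.Sequential(nn.Linear(n_flat, feature_dim), nn.ReLU())

    def forward(self, obs):
        # obs: (batch, 3, 84, 84) float tensor scaled to [0, 1]
        return self.fc(self.conv(obs))

encoder = VisualEncoder()
dummy = torch.rand(1, 3, 84, 84)
print(encoder(dummy).shape)  # torch.Size([1, 256])
```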
9:00 - you can turn on the gizmos in the Game view too
Why doesn't my TensorBoard have a "Curiosity" chart? Also, what is the difference between curiosity and the Beta parameter?
I think it does if you expand one of the collapsed sections. I’m not sure about the second one; the docs would probably be helpful, though. 🙂
Perhaps the planes were doing the same thing as I did - crash on purpose so you don't have to fly back to get the checkpoint. It's faster to respawn, and then you're right in front of it (unless that wasn't the case for the agents).
During training they start over with zero reward when they crash, so that strategy wouldn’t work. It does work in race mode, though, but they wouldn’t know that, so I suppose that gives you an advantage!
@ImmersiveLimit I need that advantage lol
So, what is the explanation? This seems to suggest that we should not use curiosity. Is that really the advice? Or is the advice that we need to be careful to reduce the strength of the intrinsic rewards? A better strategy would possibly be to disable the curiosity networks and the intrinsic rewards once the external rewards are consistently good.
Without clear guidance on hyperparameter tuning, we are all just thrashing around in the dark and not developing good experience.
I think the takeaway is just to not assume that curiosity (or any other technique really) will always work well. I always encourage experimentation because I definitely don’t know the best way for your unique case. I probably don’t even know the best way for my own projects. 🙂
Hmm, interesting.
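For anyone who wants to experiment along those lines, the knob for the intrinsic reward lives in the trainer config under reward_signals. Here's a hedged sketch of writing such a config from Python so it's easy to script sweeps; the field names follow the ML-Agents trainer config of roughly that era, the behavior name is hypothetical, and you should double-check everything against the docs for your version.

```python
import yaml  # PyYAML

# Trainer config sketch: extrinsic reward plus a deliberately weak
# curiosity signal. Field names follow the ML-Agents trainer config of
# roughly the 0.14 era -- double-check against the docs for your version.
config = {
    "PilotBehavior": {            # hypothetical behavior name
        "trainer": "ppo",
        "max_steps": 5.0e6,
        "reward_signals": {
            "extrinsic": {"strength": 1.0, "gamma": 0.99},
            "curiosity": {
                "strength": 0.01,      # keep the intrinsic reward small...
                "gamma": 0.99,
                "encoding_size": 256,  # ...or drop this whole block to disable curiosity
            },
        },
    }
}

with open("trainer_config_sweep.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```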