For future reference, in this video the AI was not yet allowed to accelerate and brake at the same time, yet got a very decent time!
These self driving cars are getting out of hand
can you make a tutorial
why is it wiggling on straights, does that actually help?
The agent tries to predict the remaining racing time if it executes action A, B or C.
The impact of wiggling on overall racing time is probably so small that the agent is unable to differentiate the actions, as long as they accelerate.
There is no reward for the agent to press as few buttons as possible, or for it to keep the same action several frames in a row.
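A toy sketch of the point above, not the project's actual code: the action names, the predicted times, and the noise level are all made up for illustration. The agent greedily picks the action with the lowest predicted remaining time. When the accelerating actions are nearly tied, small estimation errors (modelled here as random noise, which is a simplification) flip the choice from frame to frame, and nothing in the objective penalizes the switching.

```python
import random

# Hypothetical predicted remaining race time (seconds) per action.
# The three accelerating actions differ by far less than the estimation noise.
TRUE_REMAINING = {
    "accel": 30.000,
    "accel_left": 30.001,
    "accel_right": 30.001,
    "brake": 31.500,
}

def pick_action(rng, noise=0.01):
    """Greedy choice: the action with the lowest noisy predicted remaining time."""
    estimates = {a: t + rng.gauss(0.0, noise) for a, t in TRUE_REMAINING.items()}
    return min(estimates, key=estimates.get)

def run(frames=200, seed=0):
    """Pick one action per frame and return the sequence of choices."""
    rng = random.Random(seed)
    return [pick_action(rng) for _ in range(frames)]

if __name__ == "__main__":
    actions = run()
    switches = sum(a != b for a, b in zip(actions, actions[1:]))
    print("actions used:", sorted(set(actions)))
    print("switches between consecutive frames:", switches)
```

In this sketch the clearly worse action (braking) is never chosen, while the choice among the accelerating actions keeps changing, which is what shows up as wiggling.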
@linesight-rl This can also be caused by a stochastic policy; during evaluation it's better to disable all randomness.
@linesight-rl cool project anyway!
@howuhh8960 In this case, there is no stochastic policy. The reinforcement learning algorithm is value-based.
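For readers unfamiliar with the distinction, here is a rough sketch. It assumes epsilon-greedy exploration during training, which is an assumption on my part: the thread does not say which exploration scheme the project uses, and the values below are hypothetical. With a value-based method, the evaluation-time action is just an argmax over predicted values, so with epsilon set to 0 there is no sampling left to disable.

```python
import random

def select_action(q_values, epsilon, rng):
    """Epsilon-greedy over a dict of action -> predicted value (higher is better)."""
    if rng.random() < epsilon:
        return rng.choice(sorted(q_values))  # explore: uniformly random action
    return max(q_values, key=q_values.get)   # exploit: greedy argmax

# Hypothetical predicted values for a single state.
Q = {"accel": 1.00, "accel_left": 0.99, "accel_right": 0.98, "brake": 0.10}

if __name__ == "__main__":
    rng = random.Random(0)
    greedy = {select_action(Q, 0.0, rng) for _ in range(100)}
    print("evaluation (epsilon=0):", greedy)
```

With epsilon at 0 the same state always yields the same action. Any wiggling then comes from the value estimates changing between frames, not from random sampling.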
@linesight-rl very cool, keep going!
Alas, still no "pin of shame". How will people know that AI is Yahweh?