What do you mean it doesn't have enough energy to go to the flag? No matter how much back-and-forth motion it does, the energy required to climb up to the flag is still the same.
I don't feel like I understand the principle from your video: what is the purpose of partitioning the state into tiles? How and when are they assigned a Q-value, and when is it modified? Are the Q-values just zero during the first epoch? Does this work for larger state spaces? Does the agent really learn anything substantial from a replay of a 40k-step epoch?
I partition the state into tiles to build a function that maps states to Q-values. Think of it this way: I need a relationship between states and future returns, and there is no obvious closed-form function that does the job. Instead, I break the state space into squares (partitions) and assign each square a random Q-value; that is the initialization. As the algorithm learns, each square's Q-value becomes more representative of the true Q-value (see the sketch below). This method doesn't scale to larger state spaces; at that point you would want a neural network as the function approximator. For this specific reinforcement learning problem, 40k steps can be helpful in the beginning for exploration. If your algorithm is still taking 40k steps after a few thousand epochs, that's a sign your parameterization may be incorrect. Hope this helped!
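To make the tiling idea concrete, here is a minimal sketch (not the exact code from the video): it assumes Gymnasium's MountainCar-v0, a 20x20 grid over the 2-D state (position, velocity), and illustrative hyperparameters.

```python
import numpy as np
import gymnasium as gym

env = gym.make("MountainCar-v0")

# Break the 2-D state space (position, velocity) into a 20x20 grid of tiles.
NUM_TILES = 20
low, high = env.observation_space.low, env.observation_space.high
tile_width = (high - low) / NUM_TILES

def to_tile(state):
    """Map a continuous state to the indices of the tile it falls in."""
    idx = ((state - low) / tile_width).astype(int)
    return tuple(np.clip(idx, 0, NUM_TILES - 1))

# Initialization: every tile gets a random Q-value per action.
q_table = np.random.uniform(-2, 0, size=(NUM_TILES, NUM_TILES, env.action_space.n))

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # illustrative choices

for epoch in range(5000):
    state, _ = env.reset()
    tile = to_tile(state)
    done = False
    while not done:
        # Epsilon-greedy: mostly exploit the current Q estimates, sometimes explore.
        if np.random.rand() < EPSILON:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[tile]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        next_tile = to_tile(next_state)
        # Q-learning update: nudge the tile's Q-value toward the observed return,
        # so it drifts from its random start toward the true Q-value.
        best_next = np.max(q_table[next_tile])
        q_table[tile][action] += ALPHA * (reward + GAMMA * best_next - q_table[tile][action])
        tile, done = next_tile, terminated or truncated
```

The Q-table replaces the function we couldn't write down by hand: every continuous state lands in exactly one square, and all states in that square share one Q-value per action.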
This is very cool progress! Do you have a code repo for your learning?
Honestly, I have a bunch of code stored on my computer for various projects. I need to organize it and upload it. Eventually, I will upload the code.