Loved it. Keep up the work. It was amazing.
You are awesome, dude.
Idea - Could you do reinforcement learning where you learn MNIST digits? A random path is optimized for the best fit when viewing the pixels in each sample, like moving a camera or moving an MNIST digit in the camera view: move the digit so a simple model gets the best recognition. Maybe a spring-constant function, one that resets its value but takes the right value when there is input to be processed.
That's a good idea, though I don't think it needs reinforcement learning; spatial transformer networks already work on this principle, I believe.
I'm really having a bad time understanding how the DRL model needs to be trained with regard to the Bellman equation. Simple Q-learning is easy to comprehend, with the Q-table and continuous updates to the Q-values per action. But in deep RL I can't seem to grasp how the model needs to be updated. My first thought was to use DRL like a normal supervised learning problem: first create a Q-table using Q-learning, then train a deep model on that data, so the model generalizes to data the Q-learning has not seen yet. But obviously this is not the case. Do you have any reading recommendations so I can understand in more detail what is supposed to happen in the model-update phase of training?
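The difference from fitting a precomputed Q-table can be sketched in a few lines. This is a hypothetical toy setup (a plain table standing in for the network's outputs, made-up state/action/reward values): the point is that DQN training looks like supervised regression, but the "label" is the Bellman target, rebuilt on the fly from the network's own current estimates rather than taken from a fixed dataset.

```python
import random

# Toy stand-in for a Q-network: a table of "predicted" Q-values.
# In a real DQN these numbers come out of a neural net, and the final
# subtraction below becomes one gradient step on an MSE loss.
random.seed(0)
n_states, n_actions = 4, 2
Q = [[random.uniform(-0.1, 0.1) for _ in range(n_actions)]
     for _ in range(n_states)]

gamma, lr = 0.99, 0.1
# One made-up transition (s, a, r, s', done) from a replay buffer:
s, a, r, s_next, done = 0, 1, 1.0, 2, False

# Bellman target: r + gamma * max_a' Q(s', a').
# This IS the supervised "label", and it changes as Q changes.
target = r + (0.0 if done else gamma * max(Q[s_next]))

# Squared TD error; update only the action actually taken.
pred = Q[s][a]
td_error = pred - target
Q[s][a] -= lr * td_error  # one SGD step on 0.5 * (pred - target)^2
```

So instead of "train Q-learning first, then fit a net to the finished table", each minibatch's labels are recomputed from the current network and one supervised step is taken; the two interleave forever.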
Next time maybe you should create a model with the Subclassing API, which lets you create any kind of custom losses, metrics, etc., instead of a plain Sequential model wrapped in one function lol. And for such models, have you tried Batch Normalization, or the SELU activation with LeCun kernel initialization, to get much faster convergence?
Unsure if your comment is about the old Keras API or the new post-TensorFlow-2 API. TF2 makes this a breeze (and I will continue to use Sequential models, as they continue to be useful).
Really useful, and I'm also struggling with DQN for solving mazes. Currently I'm not using a target network, since a lot of tutorials don't include one either. Does that really matter? I will try to add it now. Thanks a lot.
It matters a great deal. Check out my more recent streams where I do DQN from scratch in PyTorch.
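The reason it matters can be sketched with the same toy-table stand-in as above (hypothetical values throughout): without a target network, every gradient step also moves the labels you are regressing toward, which destabilizes training. The fix is to compute Bellman targets from a frozen copy of the weights, synced only every N steps.

```python
import copy
import random

# Toy stand-in for online and target Q-networks (tables of Q-values).
random.seed(1)
n_states, n_actions = 4, 2
q_online = [[random.uniform(-0.1, 0.1) for _ in range(n_actions)]
            for _ in range(n_states)]
q_target = copy.deepcopy(q_online)  # frozen copy, used ONLY for labels

gamma, lr, sync_every = 0.99, 0.1, 100

for step in range(300):
    # Made-up transition; a real agent would sample a replay buffer here.
    s = random.randrange(n_states)
    a = random.randrange(n_actions)
    r, s_next = 1.0, random.randrange(n_states)

    # Label comes from the TARGET copy, not the network being updated,
    # so the regression target stays fixed between syncs.
    label = r + gamma * max(q_target[s_next])
    q_online[s][a] -= lr * (q_online[s][a] - label)

    if step % sync_every == 0:
        q_target = copy.deepcopy(q_online)  # hard sync every N steps
```

Some implementations use a soft sync instead (target ← tau * online + (1 - tau) * target each step); both serve the same purpose of keeping the labels quasi-stationary.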
@@JackofSome Thanks a lot, will check it out :)
I was wondering if you completed the robot docking problem. I like robotics a lot.
Unfortunately no. I haven't had enough time to continue my RL work, sadly.
@@JackofSome Thanks a lot for your reply, I will give it a try. Is it okay if I ask you for some suggestions if I get stuck?
Yeah that's fine
What does this line do?
trainable_model.compile(optimizer=Adam(), loss=lambda yt, yp: yp)
What is lambda yt, and what is yp?
lambda is the keyword in Python to define a function inline. I recommend reading up on it; it's pretty useful.
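Concretely, `lambda yt, yp: yp` is an anonymous function taking the two arguments Keras passes to every loss (y_true and y_pred) and returning y_pred unchanged. That identity-loss trick is commonly used when the model's final output already is the loss value, computed inside the graph, so the "loss function" has nothing left to do. A quick sketch:

```python
# The lambda from the compile call, pulled out on its own:
loss_fn = lambda yt, yp: yp

# It is equivalent to this named function:
def loss_fn_named(yt, yp):
    return yp  # ignore the labels; the prediction IS the loss

# yt (y_true) is ignored entirely; yp (y_pred) passes straight through:
result = loss_fn(0, 3.5)  # -> 3.5
```

So minimizing this "loss" simply minimizes whatever scalar the model emits.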
@@JackofSome thanks!
Where is livestream number 2? I can't find it, and I NEED IT.
ruclips.net/video/_7D8W-uUSxw/видео.html
I made it unlisted because of the unmitigated disaster it was. Trust me you don't need it 😅
Why is this [DEPRECATED]? Is there a new method of maze solving with deep RL?