Deep Q-Learning - Combining Neural Networks and Reinforcement Learning

  • Published: 16 Oct 2024

Comments • 91

  • @deeplizard
    @deeplizard  6 years ago +21

    Check out the corresponding blog and other resources for this video at:
    deeplizard.com/learn/video/wrBUkpiRvCA

  • @conorheffernan4134
    @conorheffernan4134 3 years ago +21

    Genuinely thought this was going to be a massive leap from Q-learning. Really it's just the gluing together of two concepts. Thank God for that!

  • @zakariaabderrahmanesadelao3048
    @zakariaabderrahmanesadelao3048 4 years ago +9

    I'd been struggling to understand DQN until now; thanks to your clear explanation, it finally clicked. You're a gift from the ML gods.

  • @donrosenthal5864
    @donrosenthal5864 6 years ago +26

    in a different series, you had one of the best explanations of backprop I've seen. This is turning out to be one of the best I've seen for Reinforcement Learning. I'm understanding (at least at a high level) how deep neural nets can be used in environments where you can enumerate the states -- and I am hoping that later in this series you will show us techniques for environments (like the examples of OpenAI in Dota) where, because of the number of players, and the complex environment, the number of possible states is near infinite. Keep up the great work! We really appreciate these courses!

    • @deeplizard
      @deeplizard  6 years ago

      Hey Don - Thank you! I'm really glad to hear that you've liked and learned from both the Deep Learning Fundamentals series as well as this new Reinforcement Learning series. Thank you for commenting and letting me know!

    • @SandwichMitGurke
      @SandwichMitGurke 5 years ago +1

      This is so true! I found this series today, and I have been searching for tutorials for a long time now. It's a bummer that it's hard for people to find this channel. I have the feeling that I will watch most of this channel's videos over the next few days :D

  • @rafatrench
    @rafatrench 5 years ago +27

    I'm honestly surprised this doesn't have many more views; it's the best free resource I have seen for people just getting started with reinforcement learning!

  • @ArmanAli-ww7ml
    @ArmanAli-ww7ml 2 years ago +4

    I first studied supervised machine learning and found it pretty simple. But when I moved to reinforcement learning, with its states, actions, and rewards, I kept comparing it to supervised learning to try to understand it, and even after reading a lot I could not. Here you have explained the concept so simply that it has greatly cleared up my confusion. Thanks.

  • @leicesteryu
    @leicesteryu 3 years ago +2

    Truly amazing! I have full knowledge of Q-learning but found it very difficult to teach my students until I found your videos. Thank you for sharing such high-quality materials.

  • @aidanmclaughlin5279
    @aidanmclaughlin5279 2 years ago +1

    Dear YouTube algorithm,
    recommend more from this channel

  • @origamigirl11RK
    @origamigirl11RK 5 years ago +1

    I don't know who you are, but you're amazing. I'm a graduate student trying to build a program to segment MRI images, and your videos have been so incredibly helpful. I'm going to join your Patreon as soon as I can get the account set up, but I just wanted to say thank you. I have a background in electrical engineering and mathematics, and your videos have been a perfect combination of humor, info, and examples. Thank you so much.

    • @deeplizard
      @deeplizard  5 years ago +1

      You're so welcome! I'm very happy to hear that the content has been helpful for you. (And also that you pick up on the humor lol!)

  • @I3ornfly
    @I3ornfly 4 years ago +2

    This is a great thing. It would be even greater if you made a course covering the most popular deep learning papers of recent years (2017+), because with your explanations everything seems so easy!!! I bet that if you do, all the followers of Andrew Ng's courses will come to you! (since his courses are now outdated)

  • @hello3141
    @hello3141 6 years ago +1

    Nice job on motivating the setup for DeepQ-RL and congrats on your live test. The chat feature adds a nice sense of community. Was great hearing "Space Walk" during the countdown!

    • @deeplizard
      @deeplizard  6 years ago

      Thanks, Bruce! I thought the new premiere feature was cool and liked using the live chat as well. We'll definitely make use of it again! Thanks for attending!

  • @mateusbalotin7247
    @mateusbalotin7247 3 years ago +1

    Thank you for the video!

  • @satarupapanda7774
    @satarupapanda7774 5 years ago +1

    Thank you, team deeplizard. Excellent series... most of my fundamentals got cleared up... thank you so much. Hope to see more in the future.

  • @bqumba1
    @bqumba1 5 years ago +11

    I liked your explanation, but there is one thing that is bugging me (possibly a stupid question):
    How are you going to minimize the loss of your network if you don't have the target Q-values for a given problem?
    I assumed the purpose of using an ANN was to approximate the Q-function value for a given state-action pair, but then again, in order to train the network, you will need those true values as your target output.

    • @deeplizard
      @deeplizard  5 years ago +7

      The loss from the network is calculated by comparing the outputted Q-values to the target Q-values from the right hand side of the Bellman equation. I expand on this more in the following videos. There, you will see that there is actually another issue we could run into, and so we make an improvement to the training process using something called fixed Q-targets. Check out the following videos/blogs in this series, and let me know if this resolves your question!
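
      For anyone who wants to see that concretely, here is a minimal sketch of the loss calculation, assuming PyTorch and hypothetical policy_net / target_net modules (the separate target network is the fixed Q-targets idea mentioned above):

          import torch
          import torch.nn.functional as F

          GAMMA = 0.99  # discount factor (an assumed value)

          def dqn_loss(policy_net, target_net, states, actions, rewards, next_states, dones):
              # Q-values the policy network currently outputs for the actions actually taken
              # (actions is a LongTensor of action indices; dones is a float tensor of 0s and 1s)
              q_values = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

              # Target Q-values from the right-hand side of the Bellman equation,
              # computed with the separate (fixed) target network
              with torch.no_grad():
                  max_next_q = target_net(next_states).max(dim=1).values
                  targets = rewards + GAMMA * max_next_q * (1 - dones)

              # The loss compares the outputted Q-values to the target Q-values
              return F.mse_loss(q_values, targets)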

  • @tinyentropy
    @tinyentropy 5 years ago +2

    "Just a convolutional network." That's it. The plain idea is simple. Indeed, I was feeling exactly the same ;) Thanks for making this so transparent!

  • @suyashtrivedi6245
    @suyashtrivedi6245 4 years ago

    The best channel for deep learning by far! Just wanted to ask: what if actions can overlap? For example, if there are 2 actions like jump and left and they can occur simultaneously, do we need to make a separate action (jump-left) for that?

  • @nitinmalhotra2560
    @nitinmalhotra2560 3 years ago +2

    I was also feeling that deep neural networks would be much more complicated, but was relieved when I came to know that it was not that difficult to understand.

  • @tingnews7273
    @tingnews7273 5 years ago +3

    What I learned:
    1. Why the old Q-table doesn't work anymore: too big, too stochastic, too computationally expensive.
    2. Deep Q-learning: combining Q-learning with a deep neural network.
    3. Deep Q-network (DQN): a deep neural network that approximates a Q-function.
    4. How to combine them is still a little confusing; I think I will get it clear along with the course.
    5. The input gets some preprocessing, including shrinking, cropping, and stacking a set of images together.
    6. The network is not special, just a good old neural network (see the sketch below).
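
    A rough sketch of what such a plain network can look like, assuming PyTorch, 84x84 grayscale frames, and a stack of 4 frames per state (the layer sizes here are the classic DQN ones and are only illustrative, not necessarily the exact ones from the video):

        import torch.nn as nn

        class DQN(nn.Module):
            def __init__(self, num_actions):
                super().__init__()
                # Ordinary convolutional layers over the stack of 4 preprocessed frames
                self.conv = nn.Sequential(
                    nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
                    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                    nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
                )
                # Fully connected head: one output node (Q-value) per possible action
                self.head = nn.Sequential(
                    nn.Flatten(),
                    nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
                    nn.Linear(512, num_actions),  # no activation: raw Q-values
                )

            def forward(self, x):
                return self.head(self.conv(x))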

    • @deeplizard
      @deeplizard  5 years ago +1

      Love hearing the summary of what you learned!

    • @tingnews7273
      @tingnews7273 5 years ago +1

      @@deeplizard Merry Christmas

  • @andywatts
    @andywatts 5 years ago +3

    Great video.
    The NN input being time-series windows, but as images... that's cool.
    I guess you need to see the NN working independently with those images and a couple of layers before converting it to minimise the Bellman loss?

    • @deeplizard
      @deeplizard  5 years ago

      Thanks, Andrew! The network actually starts using the Bellman equation right away in its loss calculation. You'll see in the later episodes of the series how exactly this is implemented in code.

  • @AdityaAbhiram
    @AdityaAbhiram 4 years ago +1

    The neural network animation got me like 🤯🤯🤯

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 6 years ago +1

    One of the best explanations!

  • @hangchen
    @hangchen 5 years ago +6

    Yes, I feel the same way: we just use a CNN to solve another problem, Q-learning.

  • @hazzaldo
    @hazzaldo 5 years ago +1

    TY for the great explanation. Loving the series. I have a question on this video:
    1- How do we feed the 4 still frames/screenshots from the game that represent the input state into the policy network? Will all 4 frames be fed in as one flattened tensor (where one image ends and the next one starts)? Or will they be fed separately, one after the other, into the network?

  • @faisaldj
    @faisaldj 3 years ago +1

    great tutorial...

  • @prasadatluri
    @prasadatluri 5 years ago +2

    What are the target Q-values you are using to calculate the loss? I am asking because the network has to know which ideal Q-values it should converge to (like how we give labeled data in supervised learning). I am confused because you just compared it with CNNs (a supervised learning method with a labeled training set). What is the labeled training set that we are using here?

    • @GROOVEtheH3RO
      @GROOVEtheH3RO 5 years ago +1

      I'm stuck at the exact same question. @deeplizard

    • @GROOVEtheH3RO
      @GROOVEtheH3RO 5 years ago

      OK well, it becomes clear if you follow this tutorial further. The target Q-values will be approximated by the target network, given the next state as input.
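
      A tiny sketch of that, assuming PyTorch and the hypothetical policy_net / target_net names used above (the update interval is just an assumed value):

          def maybe_sync_target(step, policy_net, target_net, update_every=1000):
              # Every so often, copy the policy network's weights into the target
              # network so the Bellman targets stay fixed between these syncs
              if step % update_every == 0:
                  target_net.load_state_dict(policy_net.state_dict())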

  • @rohanbobby9913
    @rohanbobby9913 2 months ago

    Hey deeplizard, I had a question. Since the sequence of images is also important, doesn't the neural network require some sort of RNN layers too, to acknowledge the sequence?

  • @chadjensenster
    @chadjensenster 5 years ago +1

    Love your videos; by far the best explanation I have seen on machine learning. Two thoughts/ideas for videos: first, it would be cool to see Bayesian optimisation used to tune hyperparameters. I think you might get some good hits if you combine Bayesian optimisation and deep Q.
    Second, rewards for humans are never linear. I was thinking of applying a formula such as 0.01*x^3 to the reward: the higher the reward input into the formula, the higher the output, which will be fed to our machine.

    • @deeplizard
      @deeplizard  5 years ago +1

      Thanks for the feedback, chadjensenster! The second reward idea is interesting. Have you implemented it? If so, would love to hear your results.

    • @chadjensenster
      @chadjensenster 5 years ago

      @@deeplizard I haven't. I haven't had much time to code, as I have a partial differential equations final coming up. I have a 3-week break before the fall semester starts in which I can try it out. I will let you know how it goes.

    • @chadjensenster
      @chadjensenster 5 years ago +1

      @@deeplizard The coefficient would be the same order of magnitude as your maximum reward at any one step, to keep your function from blowing up. It's there to "stretch" the reward function to reasonable levels. If you were rewarding the number of steps survived, say 100, you might have a coefficient of 0.0001 for a max possible reward of 100 at the end of each game. That is one example of a function you could use; you might also be able to use c*tan(x), but you would need a very small coefficient to keep your reward from blowing up. c1*tanh(c2*x) could also work with a larger coefficient, and it would never blow up: c1 controls the magnitude and c2 controls the horizontal stretch. It might be good for open-ended rewards such as crypto or stock investing.
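
      A tiny sketch of that kind of bounded shaping in plain Python (the coefficients are just example values, not tuned for anything):

          import math

          def shaped_reward(raw_reward, c1=1.0, c2=0.01):
              # Saturating reward: c1 controls the magnitude, c2 controls the
              # horizontal stretch, and the output never blows up
              return c1 * math.tanh(c2 * raw_reward)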

    • @deeplizard
      @deeplizard  5 years ago +1

      Definitely! This type of reward surely makes sense for scenarios that are more complex than simple game environments, where the longer you go at something, the more reward you receive at a given step. Plus, the non-linearity factor makes sense too in certain situations… where once you’ve reached a certain threshold, the longer you go at something after that, the less *incremental* reward you receive. (Think of the graph of log(x) for x >= 1, for example.) If you’ve taken an econ course, this idea is referred to as the law of diminishing returns.
      Good luck on your DE final! Also, cool to see you on the vlog channel too :D

    • @chadjensenster
      @chadjensenster 5 years ago +2

      @@deeplizard I thought of that as well, but from a human learning perspective, I feel we learn with a function that is shaped like a right-skewed bell curve. It would have the diminishing returns you discussed. I am not sure if I am getting into learning-rate territory or reward territory; I am still new enough at deep learning and haven't been able to sit down and fully watch your videos yet. If the reward modifies the learning rate based on performance, adjusting the reward would affect the learning rate. If the reward is a separate function from the learning rate, my bell-curve analogy above might be more appropriate as a learning-rate modification. Well, I have an exam in 4 hours; I'd better quit stalling and start studying. 😁 And thanks for the well wishes.

  • @ugurkaraaslan9285
    @ugurkaraaslan9285 5 years ago +1

    Good explanation! Thanks a lot!

  • @dianaepureanu2129
    @dianaepureanu2129 5 years ago +1

    I am confused about something. When you define the input, you use 4 snapshots of the game. This is one single state, right? But then do you always need to create these snapshots later? Or do you only provide 4 and then the NN knows how to do the rest? I do not understand whether the input is generated only once with only 4 snapshots chosen by us. I hope the question is clear.

    • @deeplizard
      @deeplizard  5 years ago

      Hey Diana - Yes, 4 snapshots corresponds to one single state. For each state, 4 new snapshots will need to be generated to feed to the model. This may seem tedious, but we can automate this process in code, which we'll see in the upcoming videos where we code a DQN to play a game.
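
      A minimal sketch of how that automation might look, assuming NumPy and frames that are already grayscaled and resized (the helper names are hypothetical, not the ones from the series code):

          from collections import deque
          import numpy as np

          class FrameStacker:
              def __init__(self, stack_size=4):
                  self.frames = deque(maxlen=stack_size)

              def reset(self, first_frame):
                  # Start a new episode by repeating the first frame to fill the stack
                  for _ in range(self.frames.maxlen):
                      self.frames.append(first_frame)
                  return np.stack(self.frames)

              def step(self, new_frame):
                  # Drop the oldest frame, append the newest, return the new 4-frame state
                  self.frames.append(new_frame)
                  return np.stack(self.frames)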

    • @dianaepureanu2129
      @dianaepureanu2129 5 years ago +1

      @@deeplizard thank you😁

  • @Utkrisht123
    @Utkrisht123 4 years ago +1

    What if the number of actions keeps changing? For example, in games like chess or Go, the number of available moves varies with each board position. So how do you model the neural network for that?
    Btw, your series is great. I have watched your neural network series as well :)

  • @carchang4843
    @carchang4843 1 year ago +1

    5:32 when did you go over this?

  • @crykrafter
    @crykrafter 6 years ago +1

    Thanks. Great explanation

  • @korenucl3646
    @korenucl3646 3 years ago

    Say in modern games the actions you can take are continuous: you could gently hold the joystick left to move slowly, or push it all the way to go faster. Would the DQN output layer then explode in size?

  • @Laétudiante
    @Laétudiante 1 year ago

    Great video! But my question is: why are we certain that the output nodes correspond to each action precisely? Are we inputting states that have these four actions (as I thought they are just four different frames)?

  • @ufukaltan7693
    @ufukaltan7693 4 years ago

    If our goal is actually to predict the next action of the ball, is it possible to utilize LSTM networks instead of having 4 nodes or 4 different frames in the input layer? Since RNNs are well suited to analyzing a series of images (video).

  • @ArmanAli-ww7ml
    @ArmanAli-ww7ml 2 years ago

    Do we input states one at a time, or the whole state space all at once?

  • @sidddddddddddddd
    @sidddddddddddddd 2 years ago

    Does this mean that at the output, the "s" in "q(s,a)" refers to 4 different states? Also if you have four actions in the action space [left, right, up, down], how do you know which node of the output layer corresponds to which action?

  • @ArmanAli-ww7ml
    @ArmanAli-ww7ml 2 years ago

    How would we define whether the upper node corresponds to the action of turning right?

  • @iworeushankaonce
    @iworeushankaonce 4 years ago +1

    amazing!

  • @j_owatson
    @j_owatson 3 years ago

    Could you use an RNN instead of a CNN and then reduce the number of input frames?

  • @tripzero0
    @tripzero0 5 years ago

    Can you combine image and data features? For example, what if I have a robot that has orientation data as well as visual input? How would one combine the image data and orientation data as feature inputs to a neural network?

  • @vishalpoddar
    @vishalpoddar 4 years ago

    If reinforcement learning has time dependencies across different time steps, why don't we make use of LSTMs or other units that are ideal for capturing time dependencies?

  • @daviddisbrow2222
    @daviddisbrow2222 5 years ago

    So is the previous method using value iteration an example of machine learning? Trying to wrap my head around this term.

    • @deeplizard
      @deeplizard  5 years ago

      If you look at our example of Frozen Lake, for example, we can see that our agent is learning how to perform a specific task (reaching the frisbee) without using any explicit instructions. This task was accomplished using value iteration. Therefore, I'd say that this is indeed a machine learning problem.
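
      For contrast with the DQN approach in this video, the earlier tabular update looks roughly like this (a sketch assuming a NumPy Q-table for the 4x4 Frozen Lake environment and assumed alpha/gamma values):

          import numpy as np

          # Tiny tabular example: 4x4 Frozen Lake has 16 states and 4 actions
          q_table = np.zeros((16, 4))
          alpha, gamma = 0.1, 0.99  # assumed learning rate and discount factor

          def q_update(state, action, reward, next_state):
              # Weighted average of the old Q-value and the Bellman target
              q_table[state, action] = (1 - alpha) * q_table[state, action] + \
                  alpha * (reward + gamma * np.max(q_table[next_state]))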

  • @carchang4843
    @carchang4843 1 year ago

    3:32
    When did you go over that?

  • @McRookworst
    @McRookworst 3 years ago

    Great video series so far! I'm a bit confused about the multiple-states thing. I thought the benefit of Q-learning was to reinforce itself when it sees the same states over and over again. But what if you have a game where each frame is always slightly different? The algorithm won't have a previous entry in its Q-table for any of the states it receives, right? Does the machine learning help to solve this?

    • @honkhonk8009
      @honkhonk8009 1 year ago

      I know this is 2 years old, but DQN uses something called "experience replay".
      Experiences are stored in a replay buffer and sampled randomly from it, and the network is trained off that buffer.
      That way the network can keep learning from previous experiences, so it doesn't forget stuff while learning from new ones.
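
      A minimal sketch of such a replay buffer in plain Python (the names and capacity are just assumptions):

          import random
          from collections import deque

          class ReplayBuffer:
              def __init__(self, capacity=100000):
                  self.memory = deque(maxlen=capacity)

              def push(self, state, action, reward, next_state, done):
                  # Store one experience tuple; the oldest ones fall off when full
                  self.memory.append((state, action, reward, next_state, done))

              def sample(self, batch_size):
                  # Random sampling breaks the correlation between consecutive frames
                  return random.sample(self.memory, batch_size)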

  • @EranM
    @EranM 2 years ago +1

    I

  • @karolbielen2090
    @karolbielen2090 5 years ago +1

    I've watched the 5 previous episodes, and for that entire time I was wondering: what if I've got a non-static (i.e. changing) environment? And now I've finally learned that Q-learning alone is not enough anymore and it's necessary to use a CNN. Complications arise...
    *Reality is often disappointing*

  • @amadlover
    @amadlover 6 years ago

    Can you do one with the state of the entities in the scene, e.g. the velocity of the ball, and move the paddle accordingly?

    • @deeplizard
      @deeplizard  6 years ago

      Hey Nihal - Can you please elaborate a bit? I'm not sure that I follow the question :)

  • @carchang4843
    @carchang4843 1 year ago +1

    6:30 when did we see this?

    • @carchang4843
      @carchang4843 1 year ago

      6:58 OH okay. I'll be sure to look at this

  • @karthikd490
    @karthikd490 5 years ago

    Why can't we use a softmax activation function in the final layer and treat this as a classification problem, instead of using no activation function?

    • @deeplizard
      @deeplizard  5 years ago +2

      Because we want the raw outputs (the Q-values) from the network that correspond to each possible action from the given state.
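
      In other words, the output is treated as a regression over Q-values rather than a classification. A tiny sketch of greedy action selection over those raw outputs, assuming PyTorch and a hypothetical policy_net:

          import torch

          def select_greedy_action(policy_net, state):
              # The final layer has no activation, so these are raw Q-values, one per action
              with torch.no_grad():
                  q_values = policy_net(state.unsqueeze(0))  # add a batch dimension
              return int(q_values.argmax(dim=1).item())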

    • @karthikd490
      @karthikd490 5 years ago

      @@deeplizard Thank you. Please extend this series to include actor-critic methods and handling continuous action spaces using the Keras API.
      Edit: I pledge to donate $50 worth of ETH for the above tutorials. Thanks again.

  • @issafares1830
    @issafares1830 5 years ago +1

    {
    "question": "Which of the following is not true? (one choice)",
    "choices": [
    "Q-learning is a reinforcement learning method.",
    "Q-learning is an on-policy method.",
    "DQN involves the Q-learning method and deep neural networks.",
    "Training a DQN means updating Q-values by updating the weights of the neural network."
    ],
    "answer": "Q-learning is an on-policy method.",
    "creator": "fares",
    "creationDate": "2019-09-13T16:16:22.744Z"
    }

    • @deeplizard
      @deeplizard  5 years ago

      Thank you for the quiz question, Issa! First one for this video!
      I've just posted it, so it's now live below :D
      deeplizard.com/learn/video/wrBUkpiRvCA

  • @himanshuranjan9246
    @himanshuranjan9246 4 years ago

    It is a weird question, but whose voice is in the video?

    • @deeplizard
      @deeplizard  4 years ago

      ruclips.net/channel/UC9cBIteC3u7Ee6bzeOcl_Og

    • @himanshuranjan9246
      @himanshuranjan9246 4 years ago

      @@deeplizard Thanks for the tutorials, they were helpful.

  • @londonl.5892
    @londonl.5892 4 years ago

    What is the error function that this network is training on? In most networks, we have a dataset to train on. In this case, we don't seem to have a dataset unless q(s, a) is already precomputed. But I thought the goal of this was to help so that we didn't have to precompute q(s, a) for all states and actions.

  • @AwfulnewsFM
    @AwfulnewsFM 1 year ago

    I was imagining layers where nodes are tiny q-tables somehow 😂😂

    • @AwfulnewsFM
      @AwfulnewsFM 1 year ago

      In hindsight this is pretty irrational

  • @karthik-ex4dm
    @karthik-ex4dm 6 years ago +1

    Wow!! So the tunnelling in Atari games that makes the agent's work easier comes from the epsilon decay param, I suppose?

    • @deeplizard
      @deeplizard  6 years ago

      The agent starts to exploit what it's learned from exploring the environment in earlier steps!
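
      A minimal sketch of the epsilon-greedy schedule that drives that shift from exploring to exploiting (the constants are just example values):

          import math
          import random

          EPS_START, EPS_END, EPS_DECAY = 1.0, 0.01, 0.001  # assumed values

          def epsilon_at(step):
              # Exponentially decay the exploration rate over time
              return EPS_END + (EPS_START - EPS_END) * math.exp(-EPS_DECAY * step)

          def choose_action(q_values, step, num_actions):
              # Explore with probability epsilon, otherwise exploit the best known action
              if random.random() < epsilon_at(step):
                  return random.randrange(num_actions)
              return int(max(range(num_actions), key=lambda a: q_values[a]))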

  • @WahranRai
    @WahranRai 3 years ago +1

    The background music is distracting!

  • @escapefelicity2913
    @escapefelicity2913 3 years ago

    Get rid of the background noise