Q Learning Explained (tutorial)
- Published: Oct 4, 2024
- Can we train an AI to complete its objective in a video game world without needing to build a model of the world beforehand? The answer is yes, using Q learning! I'll go through several use cases and show some Python code of how Q learning works.
Code for this video:
github.com/llS...
Adnan's Winning code:
github.com/Adn...
Alberto's runner up code:
github.com/alb...
Please Subscribe! And like. And comment. That's what keeps me going.
Want more inspiration & education? Connect with me:
Twitter: / sirajraval
Facebook: / sirajology
More learning resources:
mnemstudio.org/...
ocw.mit.edu/co...
uhaweb.hartford...
/ deep-reinforcement-lea...
www.cs.cmu.edu...
cs.stanford.edu...
www.quora.com/...
www0.cs.ucl.ac....
Join us in the Wizards Slack channel:
wizards.herokua...
And please support me on Patreon:
www.patreon.co... Instagram: / sirajraval
Signup for my newsletter for exciting updates in the field of AI:
goo.gl/FZzJ5w
Hit the Join button above to sign up to become a member of my channel for access to exclusive content!
Join my AI community: chatgptschool.io/
Sign up for my AI sports betting bot, WagerGPT! (500 spots available):
www.wagergpt.co
Always love it when a programming-related video starts with "Hello, World."
Not even kidding.
exactly
It's a delicate balance between efficiency and curiosity
Hey Siraj, I want to thank you for bringing such epic videos on AI and mathematics that instill a hunger for learning. *A good teacher is one who ignites a spark in the student* ...Love from India.😂😢🤓❤
sending hugs
love your videos, great work buddy! I am starting to study AI next week and I am freaking excited as well as confident because your channel (and Brilliant and other stuff) is incredibly helpful :D
I think I should unsubscribe until I learn enough Deep Learning, as these videos are giving me existential crisis.
Aditya Patil 😂
Aditya Patil same here bro :(
nah, don't worry, I'm just going to get better at explaining
Siraj Raval +100 IOTA
yup
"Of course you are, you beautiful wizard."
haha its true
This video misses an important issue with Q learning. The Q function is based on possibly getting a reward in the future, even if a reward is not available right away. If the algorithm keeps getting into new states, it might posit that the lack of rewards is simply a case of it getting closer to a large reward in the far future. It won't know to "correct the value down" until it loops back to (and finds itself unable to escape) a previous state. It can become "stuck" with that bias as long as it's not sure it's in a closed state list. The more states are in the system, the larger this bias can become.
Man these reinforcement learning series are amazing !
where can i find a simpler video? most of the terms he uses, i don't understand
I think the key of Q-learning is to mimic human learning: we always learn under some motivation, otherwise we couldn't get too good at anything. The algorithm introduces the result of an action as feedback into shaping decision making.
omg, besides the video being really well made and offering a quality explanation (kudos!), you are the first guy from India with a totally nice pronunciation!
I have an AI test next week. It's like you uploaded this for me. Thanks!
Hi Siraj, thanks for introducing and teaching such great stuff every week. These are of great help. Recently I was studying actor-critic reinforcement learning and I found its methodology quite similar to a GAN, with the agent performing the role of the generator and the critic as the discriminator. Can you please share your thoughts on this?
Man, how do you gather so much information so quickly???😱
Deepak Surya AI
Def Tank Not sure but probably :P
Deep learning man...
AJ J that'll be crazy fast 😂😂
Internet
With all this talk about agents you should consider perhaps doing an extensive video on Multi-Agent Systems. Jason etc...
Someone has a supply of NZT.
i wasn't able to get anything from this
Except for the use of ‘less’ rather than ‘fewer’ but I still think you’re amazing
Man, the AI community can't thank you enough!
thanks for watching!
This is helping me to understand life strategies, lmao. Great video Siraj.
I watched this video TWICE !!!
My first time: With zero background, I understood nothing!
I Rage quit in the middle.
*A full semester passes by, in which I took a Deep Reinforcement Learning class*
My second time: Oh...! So that's what he was talking about!
I guess there's a special lingo that needs to be learnt, and making a 10-minute youtube video about it is absolutely pointless, because the ones with zero background won't understand jack shit, and the ones with background already know this shit. So... you know.
I have background in other types of ai, yet I want to learn q learning. I know the lingo but don’t know this. Boom, it’s helpful.
Awesome, I found a hidden gem on youtube
Siraj, you understand computer science far more deeply than I do. But I think you need to review clean coding practices. For developing an algorithm, it might not be a big deal. But if you want to share your code with others - for distribution, review, or learning - readable code will make the process much smoother.
Great video Siraj, thanks for bringing your knowledge into the video game world! :)
no problem driftwood this is fun!
I get the concepts of quite a few different models now (RNN, CNN, normal feedforward), but I have trouble putting them into code. Can you please point me to a good resource to learn TensorFlow itself?
Okay Siraj these explanations were amazing... intuitive and easily absorbed !
Nice effects too xD
Damn why u getting dislikes tho. Anyways what you think about creating a general game bot to rekk all online scores
Hey Siraj, great video (and your other videos are also great). I got a question, what happens if it takes a long time to complete an "episode"? How do you efficiently train the network in that case?
i did this for my college project :) thanks siraj
I know Q learning is a type of reinforcement learning. But I'm wondering if adding human feedback in the loop makes the model more accurate and less prone to mistakes.
You do a really great job Siraj, but your videos feel like just a lecture with no effective learning, only information that's flooded all over the internet. It would be better to explain exactly what you do on an example and walk through the entire code step by step. I hope you really do it; I have so many unsolved questions that no one can answer. If you explain your code for this video and other videos like it, you will help us so much.
thx alot man.
the title said "Q Learning Explained" not how to make Q Learning..you have to know the difference
I've found that for the 50 or so videos of his I've watched over the past year and a half, I have learnt absolutely nothing. I've probably wasted 50-100 hours on his channel, just following along, rewatching stuff, trying to understand stuff, but never actually understanding. Most of the time, watching his videos gave me the impression that I was learning when in actuality I wasn't learning crap. I can't even recall anything that I learnt from watching any of his videos, this video included. Most of what he says is just lost in the jargon, and at times I even feel like he's not trying to teach but just to give the illusion of teaching. His videos will probably be helpful for those that already know what he's talking about or are already deep in the field, but for individuals just starting off on this ML journey, I doubt they'll learn much from him.
@@bobsmithy3103 You need to do the coding. You won't learn programming by watching stuff.
My study partner and I literally refer to you as "our friend Siraj" when we are studying ML, because you never fail to help us understand all the concepts that were unclear before
What's 'Q'? The 'q' in q-learning stands for quality. Quality in this case represents how useful a given action is in gaining some future reward.
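The definition above can be made concrete with a minimal tabular sketch (my own illustration, not the code linked in the description): the "quality" estimate for a (state, action) pair is updated toward the immediate reward plus the discounted best quality available from the next state.

```python
import numpy as np

# Minimal tabular Q-learning update (illustrative sketch only).
# Q[s, a] estimates the quality of taking action a in state s.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))

alpha, gamma = 0.1, 0.9  # learning rate, discount factor

def update(s, a, reward, s_next):
    # Bellman update: immediate reward plus discounted best future quality.
    td_target = reward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

# One step: reaching a reward of 1.0 nudges Q[0, 1] up by alpha * 1.0.
update(0, 1, reward=1.0, s_next=2)
```

Repeating this update over many episodes is what propagates reward information backwards through the table.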
Dude, can you slow down a little so that I don't have to hear you gasping for air? Or so that I can figure out what you're saying?
Hi Siraj, I request that you make your videos in a manner where any layman can understand the logic behind them. I know it's sometimes very tough, but knowledge is exactly that: making complex things easy... cheers
Best teacher ever.
How do you expect one to grasp everything if you explain at light speed?
Playback speed 0.75x. I’m serious.
For Mario, think about the state space. Then think about the Markov chain approach, which needs an exact match of the state. I hope you get my point: no one is going to have enough RAM and time to train it for Mario. You can downsample as much as you want, but the approach still doesn't allow a reasonably sized state space.
TD is nice for Mario, because jumping a Goomba has little influence on the rest of a level, but MC is great for Go, where early moves are important later.
Nice :) Hope this will help me create an AI which solves "Plague Inc" xDDD
Nah teach it to infect Greenland first
4:05 "it's called the fuck you function" - that's what I heard
that helped me out understanding my issue with exploration thx.
Why is Siraj in the kitchen? Is he about to show us how to cook something?
Why are there no subtitles, at least in English? Are you not interested in foreign listeners?
Hey Siraj, thanks for the video. I have a probably naive question. With Q learning, an observation and score is given to an agent by the environment. Does this mean that q learning requires an environment that is perfectly informed? For gym, goals are clearly defined to the world such as getting to a destination. What if the goal is not so well defined?
Example: What if I want to use Q learning for exploration of a fixed geometry environment with the goal of finding resources that are not immediately known to the environment? Now it's unclear to me how to define the score for each frame, as the environment would not know where the resources are in the first place until the agent is close enough to spot them.
Sorry for the lengthy comment!!! I understand you are very busy and my comment might be extremely dumb so any help is greatly appreciated! Thanks again!
Great easily understandable video!
Love your videos!
Thanks so much for making Machine Learning interesting. :D
thanks!
What were you trying to cook with q learning?
haha needed my green screen
What kinds of approaches can you take when there isn't an obvious reward metric to feed to your algorithm? Let's say you wanted to make an AI to begin and finish a game that doesn't seem very linear such as Zelda or Metroid, or an analogous but unknown game. Do you just cram as many item counters as you can for measuring rewards?
Hi siraj, great video.
btw you should change your mic to a higher quality one. As a headphone user I'm not comfortable enough watching your videos; I can still hear noise in the background. Anyway, keep up the good work
Wow. You're really really good at explaining things in a super easy and fun way :) Amazing video! I love it!
Great video! Q Learning FTW!
And thanks for the shoutout :D
np alberto
4:20 Q functions seem great for speedrunning, but I wonder, if there is only limited computing power, whether a TD algorithm could learn quicker; and if you finish the learning process with a Q algorithm, it might have some cornerstones to find the best way in a better manner
Here in 2023 because ... reasons
Just so happens we all have the same reason 😉
No one ever seems to talk about how the agent knows which actions are available to it. Where are the options "left and right" defined?
From where should I start learning AI and Machine Learning
Pls help me guys
I am a beginner and I know Python programming
First learn calculus 1 and 2, and if possible calculus 3. After calculus 2 you can start learning probability and statistics. Make sure to learn statistics in full detail, not just basic stuff like normal distributions: learn many well-known distributions such as chi-square, gamma, etc., and learn inferential statistics such as point estimation and interval estimation. Spend 1 to 2 years learning the math while also brushing up your Python and R skills for data science packages. Then you may start looking into machine learning techniques. But first start with good ol' linear regression and logistic regression (more math + linear algebra), and then the statistical learning methods from 50 to 100 years ago that have been rebranded as machine learning methods. Once you're done with that, you can start learning more modern methods like reinforcement learning and neural networks. There is no one best tool in machine learning; every method is suitable for different cases. But yes, neural networks are cool, so everyone talks about them.
Dude, you're allowed to blink haha
Nice kitchen background, but the plain background with memes is easier to watch and focus on... btw did you slow down the video by 10-20%?
Quality of a certain action in a certain state. Bellman equation. Algorithm.
Hi Siraj, thanks for sharing. It seems the code is no longer working.
Hi Siraj. Thanks for the video. Can you tell me what tools do you use to edit those awesome videos? Thanks.
Great siraj, keep them coming
Can you make a video on setting up python with all its libraries needed for your videos. I am having a hard time knowing what all libs I need for your older videos.
Which version python do you use: 32bit or 64bit.
You need to check out the youtube channel 'sendex'. Also I am 99% confident that he is using the 64bit version.
Victor Gallagher thanks a lot
senTdex right?
Sorry, yes sentex is right.
got it wrong again, 'sentdex'
Amazing work Siraj!
You are damn right, I am a beautiful wizard!
Why is the q_table initialized as np.zeros((n_states, n_states, 3))? 3 is the number of actions, right (i.e. drive left, drive right, do nothing)? Why would we need two dimensions for the number of states?
One thing I’ve wondered is how to tackle AI for tasks that can’t be parallelized or sped up (I.e. model-free, real time task based AI)
Perryman1138 record some example data so you can pre-train your model. If I remember correctly it's called imitation learning: giving the agent some interesting action paths with already computed rewards. The more the better.
Ah I see. In particular, I was thinking of Dungeon Crawl Stone Soup, a color-graphics terminal roguelike for which recordings of thousands of games exist in text format, but I believe it might only record the game outputs, not the player inputs. Still, a fascinating concept! Thanks!
27 liberal arts majors watched this video
163 now.
204 man
237
Thanks for the videos man. However it seems that the Q Matrix cannot be used with a large number of states.
0:49 this is why I subbed
Surprisingly good....hmmmmm. Great job!
why does anyone talking about reinforcement learning do this only in Python? I want to see examples in C++, in C#, in Java... why only Python?
dude, just get the point: you can implement it in any programming language, and converting Python code to C# is easy. C# has all the things Python has. Maybe more lines, but it's easy to do.
I'm confused about how this is machine learning. It seems to me that the computer just creates a lookup table. Could you please clarify, for I'm sure I am missing something?
Well, he only talked about finding the optimal policy. But don't forget you have to generalize from the optimal policy, i.e. you drop that agent in an environment it doesn't know about. Or let me put it this way: suppose your agent has learned the policy that whenever the distance between the agent and an obstacle is precisely 5 meters it should apply the brakes. But here's the catch: the state space is not always discrete, it can be continuous. In fact, in real life it is continuous. So machine learning comes into play at this point: it takes the state vector and applies function approximation (which can be a neural network) to spit out an action. Hope that helps.. :)
PS: ML is nothing but function approximation or curve fitting..
Thanks! I think I understand now.
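The replies above can be made concrete with a small sketch (my own illustration, not from the video): replace the Q-table with a function approximator, here a simple linear model per action, so continuous states no longer need a lookup table.

```python
import numpy as np

# Sketch: approximate Q(s, a) with one linear model per action,
# so continuous state vectors don't need a discrete lookup table.
n_features, n_actions = 4, 3
W = np.zeros((n_actions, n_features))  # one weight row per action

alpha, gamma = 0.01, 0.99

def q_values(state):
    return W @ state  # vector of Q(state, a) for every action

def update(state, action, reward, next_state):
    td_target = reward + gamma * q_values(next_state).max()
    td_error = td_target - q_values(state)[action]
    W[action] += alpha * td_error * state  # gradient step on squared TD error

s = np.array([0.1, -0.2, 0.0, 1.0])
update(s, action=1, reward=1.0, next_state=s)
```

Swapping the linear model for a neural network gives the same scheme more capacity; that is the step from tabular Q-learning toward deep Q-networks.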
What is the difference between Q-function and Value-function? The formulas look very similar to me
The Q function is also called the action-value function. The main difference is that V(s) is the value of "being in state s", while Q(s, a) is the value of "being in state s and taking action a". Both have similar-looking Bellman equations and Bellman optimality equations, and they can be related to each other based on how the environment works and what the current policy is (no equation-writing in YouTube comments, so I cannot show you :-( ). Many model-free RL algorithms use Q because it gives you a way to select the next action (just pick a so that Q(s, a) is maximum out of all possible Q(s, ?)), whilst if you have V(s), the only way to maximise it is to look ahead to see what the next state will be, which you can only do if you have some way to predict the next state (i.e. a model).
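The relationship the reply couldn't typeset fits in a few lines (a toy Q-table of my own, purely for illustration): under a greedy policy, V(s) is just the best Q-value available in s, and the greedy action is the one that attains it.

```python
import numpy as np

# Toy Q-table: 3 states x 2 actions (made-up numbers, illustration only).
Q = np.array([[1.0, 2.0],
              [0.5, 0.1],
              [3.0, 3.0]])

# Under a greedy policy: V(s) = max_a Q(s, a),
# and the greedy action is argmax_a Q(s, a).
V = Q.max(axis=1)                 # best achievable value per state
greedy_policy = Q.argmax(axis=1)  # which action achieves it
```

Note the asymmetry the reply points out: from Q you can act directly via the argmax, while V alone tells you nothing about which action to take without a model of the next state.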
Must the reward be only discrete 1 or 0? Can it be an intermediate fraction or decimal?
How do we check the convergence of Q matrix?
Thanks for doing this Siraj. But I am running this in Ubuntu. I am not able to see anything though the code runs fine as i can see the iterations. Any idea how to fix it.
Can Q-learning be used for solving a classification problem? If so, how? Could you explain or make a video regarding this? @Siraj Raval
"So easy, a liberal art major can do it " lmaooo🤣🤣🤣🤣🤣🤣
this was pretty decent but you shouldnt have titled the video the way you did because i would not consider it that beginner friendly
much better than your earlier ones. The pace is good and there are fewer memes and gimmicks. Good job!
I am new here and dont know from where to start learning this stuff. Any suggestions
So is Python the language for deep learning?
This guy is too smart for me. I need someone who is on my low IQ to explain this.
I would suggest to look up Code Bullet. He does fun AI stuff in an easier to comprehend, sillier manner.
A.I Learns to DRIVE does Q-Learning and its bigger brother Deep Q network.
Not as in-depth, but fun to see and gives a good starting idea of what the AI does.
Does "model" here refer to a model of the environment/world?
You should have many more followers
Between this and a genetic algorithm, which one is better?
q-learning is model-free learning, not model-based learning just FYI
Does this mean that the player has to have already been in a state and taken some actions to make optimal decisions, or is there a technique to use past results to estimate future rewards, i.e. a neural network? With the state consisting of two different variables in this case, it seems like it would take a while for the car to find the best actions to take for each occurring state in a reasonable time. I'm a little confused.
"a technique to use past results to estimate future rewards, i.e. a neural network" — yes, Q learning is exactly that. Start with an untrained agent that knows nothing of the environment. Also strongly bias that agent to random actions at first in order to gather data. Next, allow the agent to take some number of actions in the environment while recording the entire session. At the end take the reward, which in this case could be "units to the right of the start." Now go back over the recording and apply a share of that score to every move (perhaps with a decay for the older actions). Finally, feed each instance of the replay, one at a time, into the network: you provide the state of the world + action and train it towards the score. If you keep doing this, your network will start to converge on an understanding of what moves will create what score. Once you gain some data, stop purely randomly sampling actions and start using predictions from your model to inform the next move (the rate at which you go from pure random to pure agent is an important hyperparameter). If all goes well, your agent learns better and better actions to take in each situation until it knows how to get very good scores. At least, that's how it should work! I'm struggling to get my model to converge on a similar toy example.
what dustin said
Thanks
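The recipe described in the reply above can be sketched roughly like this (a toy mock-up under my own assumptions; a real agent would feed these targets to a network rather than collect them in a list):

```python
# Sketch of the replay idea: record an episode, then assign a decayed
# share of the final reward to each recorded (state, action) step.
episode = [("s0", "right"), ("s1", "right"), ("s2", "jump")]  # session log
final_reward = 10.0   # e.g. "units to the right of the start"
decay = 0.9           # older actions receive a smaller share of the credit

targets = []
for age, (state, action) in enumerate(reversed(episode)):
    credit = final_reward * decay ** age
    targets.append((state, action, credit))

# Each (state, action, credit) triple is one training example for the
# network that predicts expected score from state + action.
```

The decay constant plays the same role as the discount factor: recent actions get most of the credit for the final score.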
Well that made my day. Thanks Siraj!
I didn't get the 'liberal arts major' reference. Who is the bearded man in the inset at 0:52?
This is top tier content man. Thank you so much!
great work sir
Great work man
Great video! Thx
Great video!
can anyone explain what the value of n_states represents in the program? Does this mean there are only 40 possible positions in the environment, or what? Thanks in advance
edit - I'm thinking that the "3" in "q_table = np.zeros((n_states, n_states, 3))" represents the fact that there are 3 possible actions for the car? I'm confused.
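A likely explanation (hedged, since this reconstructs the linked code from memory rather than quoting it): MountainCar's observation is a continuous pair (position, velocity), so each component is binned into n_states buckets, giving one table axis per component plus one axis for the 3 actions (push left, no push, push right).

```python
import numpy as np

# Sketch: discretizing MountainCar's 2-D continuous state for a Q-table.
# The bounds below are assumptions roughly matching gym's MountainCar-v0.
n_states = 40
pos_bins = np.linspace(-1.2, 0.6, n_states)    # car position range
vel_bins = np.linspace(-0.07, 0.07, n_states)  # car velocity range

q_table = np.zeros((n_states, n_states, 3))    # (pos bin, vel bin, action)

def to_indices(position, velocity):
    # Map a continuous observation to discrete table indices.
    p = min(int(np.digitize(position, pos_bins)), n_states - 1)
    v = min(int(np.digitize(velocity, vel_bins)), n_states - 1)
    return p, v

p, v = to_indices(-0.5, 0.0)
value_of_push_right = q_table[p, v, 2]
```

So the table does not hold 40 positions; it holds 40 × 40 discretized (position, velocity) cells, each with 3 action values.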
Hey! i am aiming to create an Ai, i was going to use a genetic algorithm.
What do you think the best type of algorithm would be for creating a bipedal balancing/walking robot?
i was thinking of using unity to simulate the physics
Thanks
cool. Thanks
You are brilliant. I only dream of having your ease of understanding of these processes!🎉
Can anyone please explain the if else used for selecting random action.. Thank you
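For the question above: that if/else is almost certainly an epsilon-greedy action selector, a standard pattern sketched here from memory rather than copied from the video's code. With probability epsilon the agent explores with a random action; otherwise it exploits the best known action.

```python
import random

import numpy as np

# Epsilon-greedy selection: explore with probability epsilon, else exploit.
def choose_action(q_table, state, epsilon, n_actions):
    if random.random() < epsilon:
        return random.randrange(n_actions)     # explore: uniform random action
    else:
        return int(np.argmax(q_table[state]))  # exploit: best known action

q = np.array([[0.0, 1.0, 0.5]])
action = choose_action(q, state=0, epsilon=0.0, n_actions=3)  # greedy pick
```

Epsilon is usually decayed over training, so the agent explores heavily early on and exploits its learned values later.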
0:50 For that joke you really deserve a subscription! :D