Q Learning Explained (tutorial)
- Published: Oct 4, 2024
- Can we train an AI to complete its objective in a video game world without needing to build a model of the world beforehand? The answer is yes, using Q learning! I'll go through several use cases and show some Python code of how Q learning works.
Code for this video:
github.com/llS...
Adnan's Winning code:
github.com/Adn...
Alberto's runner up code:
github.com/alb...
Please Subscribe! And like. And comment. That's what keeps me going.
Want more inspiration & education? Connect with me:
Twitter: / sirajraval
Facebook: / sirajology
More learning resources:
mnemstudio.org/...
ocw.mit.edu/co...
uhaweb.hartford...
/ deep-reinforcement-lea...
www.cs.cmu.edu...
cs.stanford.edu...
www.quora.com/...
www0.cs.ucl.ac....
Join us in the Wizards Slack channel:
wizards.herokua...
And please support me on Patreon:
www.patreon.co... Instagram: / sirajraval
Signup for my newsletter for exciting updates in the field of AI:
goo.gl/FZzJ5w
Hit the Join button above to sign up to become a member of my channel for access to exclusive content!
Join my AI community: chatgptschool.io/
Sign up for my AI sports betting bot, WagerGPT! (500 spots available):
www.wagergpt.co
Always love it when a programming-related video starts with "Hello, World."
Not even kidding.
exactly
It's a delicate balance between efficiency and curiosity
Hey Siraj, I want to thank you for bringing such epic videos on AI and mathematics that instill a hunger for learning. *A good teacher is one who ignites a spark in the student* ...Love from India.😂😢🤓❤
sending hugs
love your videos, great work buddy! I am starting to study AI next week and I am freaking excited as well as confident because your channel (and Brilliant and other stuff) is incredibly helpful :D
I think I should unsubscribe until I learn enough Deep Learning, as these videos are giving me existential crisis.
Aditya Patil 😂
Aditya Patil same here bro :(
nah, don't worry, I'm just going to get better at explaining
Siraj Raval +100 IOTA
yup
"Of course you are, you beautiful wizard."
haha its true
This video misses an important issue with Q learning. The Q function is based on possibly getting a reward in the future, even if a reward is not available right away. If the algorithm keeps getting into new states, it might posit that the lack of rewards is simply a case of it getting closer to a large reward in the far future. It won't know to "correct the value down" until it loops back to (and finds itself unable to escape) a previous state. It can become "stuck" with that bias as long as it's not sure it's in a closed state list. The more states are in the system, the larger this bias can become.
Man these reinforcement learning series are amazing !
where can i find a simpler video? most of the terms he uses, i don't understand
I think the key of Q-learning is to mimic human learning: we always learn under some motivation, otherwise we couldn't get too good at anything. The algorithm introduces the result of an action as feedback into shaping decision making.
omg, besides the video being really well made and offering a quality explanation (kudos!), you are the first guy from India with a totally nice pronunciation!
I have an AI test next week. It's like you uploaded this for me. Thanks!
Hi Siraj, thanks for introducing and teaching such great stuff every week. These are of great help. Recently I was studying actor-critic reinforcement learning and I found its methodology quite similar to a GAN, with the agent performing the role of the generator and the critic as the discriminator. Can you please share your thoughts on this?
Man, how do you gather so much information so quickly???😱
Deepak Surya AI
Def Tank Not sure but probably :P
Deep learning man...
AJ J that'll be crazy fast 😂😂
Internet
With all this talk about agents you should consider perhaps doing an extensive video on Multi-Agent Systems. Jason etc...
Someone has a supply of NZT.
i wasn't able to get anything from this
Except for the use of ‘less’ rather than ‘fewer’ but I still think you’re amazing
Man, the AI community can't thank you enough!
thanks for watching!
This is helping me to understand life strategies, lmao. Great video Siraj.
I watched this video TWICE !!!
My first time: With zero background, I understood nothing!
I Rage quit in the middle.
*A full semester passes by, in which I took a Deep Reinforcement Learning class*
My second time: Oh...! So that's what he was talking about!
I guess there's a special lingo that needs to be learnt, and making a 10-minute youtube video about it is absolutely pointless, because the ones with zero background won't understand jack shit, and the ones with background already know this shit. So... you know.
I have background in other types of ai, yet I want to learn q learning. I know the lingo but don’t know this. Boom, it’s helpful.
Awesome, I found a hidden gem on youtube
Siraj, you understand computer science far more deeply than I do. But I think you need to review clean coding practices. For developing an algorithm, it might not be a big deal. But if you want to share your code with others - for distribution, review, or learning - readable code will make the process much smoother.
Great video Siraj, thanks for bringing your knowledge into the video game world! :)
no problem driftwood this is fun!
I get the concepts of quite a few different models now (RNN, CNN, normal feedforward), but I have trouble putting them into code. Can you please point me to a good resource to learn TensorFlow itself?
Okay Siraj these explanations were amazing... intuitive and easily absorbed !
Nice effects too xD
Damn why u getting dislikes tho. Anyways what you think about creating a general game bot to rekk all online scores
Hey Siraj, great video (and your other videos are also great). I got a question, what happens if it takes a long time to complete an "episode"? How do you efficiently train the network in that case?
i did this for my college project :) thanks siraj
I know Q learning is a type of reinforcement learning. But I'm wondering if adding human feedback in the loop makes the model more accurate and less prone to mistakes.
You do a really great job Siraj, but your videos feel like just a lecture with no effective learning, only information that's flooded all over the internet. It would be better to explain exactly what you do on an example and walk through the entire code step by step. I hope you really do it; I have so many unsolved questions that no one can answer. If you explain your code for this video and other videos like it, you will help us so much.
thx alot man.
the title said "Q Learning Explained" not how to make Q Learning..you have to know the difference
I've found that for the 50 or so videos of his I've watched over the past year and a half, I have learnt absolutely nothing. I've probably wasted 50-100 hours on his channel, just following along, rewatching stuff, trying to understand stuff, but never actually understanding. Most of the time, watching his videos gave me the impression that I was learning when in actuality I wasn't learning crap. I can't even recall anything that I learnt from watching any of his videos, this video included. Most of what he says is just lost in the jargon, and at times I even feel like he's not trying to teach but just to give the illusion of teaching. His videos will probably be helpful for those that already know what he's talking about or are already deep in the field, but for individuals just starting off on this ML journey, I doubt they'll learn much from him.
@@bobsmithy3103 You need to do the coding. You won't learn programming by watching stuff.
My study partner and I literally refer to you as "our friend Siraj" when we are studying ML, because you never fail to help us understand all the concepts that were unclear before
What's 'Q'? The 'q' in q-learning stands for quality. Quality in this case represents how useful a given action is in gaining some future reward.
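The definition above can be made concrete with a minimal tabular sketch (my own illustration, not the code linked in the description): the "quality" estimate for a (state, action) pair is updated toward the immediate reward plus the discounted best quality available from the next state.

```python
import numpy as np

# Minimal tabular Q-learning update (illustrative sketch only).
# Q[s, a] estimates the quality of taking action a in state s.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))

alpha, gamma = 0.1, 0.9  # learning rate, discount factor

def update(s, a, reward, s_next):
    # Bellman update: immediate reward plus discounted best future quality.
    td_target = reward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

# One step: reaching a reward of 1.0 nudges Q[0, 1] up by alpha * 1.0.
update(0, 1, reward=1.0, s_next=2)
```

Repeating this update over many episodes is what propagates reward information backwards through the table.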
Dude, can you slow down a little so that I don't have to hear you gasping for air? Or so that I can figure out what you're saying?
Hi Siraj, I request that you make your videos in a manner where any layman can understand the logic behind them. I know it's sometimes very tough, but knowledge is exactly that: making complex things easy... cheers
Best teacher ever.
How do you expect one to grasp everything if you explain at light speed?
Playback speed 0.75x. I’m serious.
For Mario, think about the state space. Then think about the Markov chain approach, which needs an exact match of the state. I hope you get my point: no one is going to have enough RAM and time to train it for Mario. You can downsample as much as you want, but the approach still doesn't allow a reasonably sized state space.
TD is nice for Mario, because jumping a Goomba has little influence on the rest of a level, but MC is great for Go, where early moves are important later.
Nice :) Hope this will help me create an AI which solves "Plague Inc" xDDD
Nah teach it to infect Greenland first
4:05 "it's called the fuck you function" - that's what I heard
that helped me out understanding my issue with exploration thx.
Why is Siraj in the kitchen? Is he about to show us how to cook something?
Why are there no subtitles, at least in English? Are you not interested in foreign listeners?
Hey Siraj, thanks for the video. I have a probably naive question. With Q learning, an observation and score is given to an agent by the environment. Does this mean that q learning requires an environment that is perfectly informed? For gym, goals are clearly defined to the world such as getting to a destination. What if the goal is not so well defined?
Example: What if I want to use Q learning for exploration of a fixed geometry environment with the goal of finding resources that are not immediately known to the environment? Now it's unclear to me how to define the score for each frame, as the environment would not know where the resources are in the first place until the agent is close enough to spot them.
Sorry for the lengthy comment!!! I understand you are very busy and my comment might be extremely dumb so any help is greatly appreciated! Thanks again!
Great easily understandable video!
Love your videos!
Thanks so much for making Machine Learning interesting. :D
thanks!
What were you trying to cook with q learning?
haha needed my green screen
What kinds of approaches can you take when there isn't an obvious reward metric to feed to your algorithm? Let's say you wanted to make an AI to begin and finish a game that doesn't seem very linear such as Zelda or Metroid, or an analogous but unknown game. Do you just cram as many item counters as you can for measuring rewards?
Hi siraj, great video.
btw you should change your mic to a higher quality one. As a headphone user I'm not comfortable enough watching your videos; I can still hear noise in the background. Anyway, keep up the good work
Wow. You're really really good at explaining things in a super easy and fun way :) Amazing video! I love it!
Great video! Q Learning FTW!
And thanks for the shoutout :D
np alberto
4:20 Q functions seem great for speedrunning, but I wonder, if there is only limited computing power, whether a TD algorithm could learn quicker; and if you finish the learning process with a Q algorithm, it might have some cornerstones to find the best way in a better manner
Here in 2023 because ... reasons
Just so happens we all have the same reason 😉
No one ever seems to talk about how the agent knows which actions are available to it. Where are the options "left and right" defined?
From where should I start learning AI and Machine Learning
Pls help me guys
I am a beginner and I know Python programming
First learn calculus 1 and 2, and if possible calculus 3. After calculus 2 you can start learning probability and statistics. Make sure to learn statistics in full detail, not just basic stuff like normal distributions: learn many well-known distributions such as chi-square, gamma, etc., and learn inferential statistics such as point estimation and interval estimation. Spend 1 to 2 years learning the math while also brushing up your Python and R skills for data science packages. Then you may start looking into machine learning techniques. But first start with good ol' linear regression and logistic regression (more math + linear algebra), and then the statistical learning methods from 50 to 100 years ago that have been rebranded as machine learning methods. Once you're done with that, you can start learning more modern methods like reinforcement learning and neural networks. There is no one best tool in machine learning; every method is suitable for different cases. But yes, neural networks are cool, so everyone talks about them.
Dude, you're allowed to blink haha
Nice kitchen background, but the plain background with memes is easier to watch and focus on... btw did you slow down the video by 10-20%?
Quality of a certain action in a certain state. Bellman equation. Algorithm.
Hi Siraj, thanks for sharing. It seems the code is no longer working.
Hi Siraj. Thanks for the video. Can you tell me what tools do you use to edit those awesome videos? Thanks.
Great siraj, keep them coming
Can you make a video on setting up python with all its libraries needed for your videos. I am having a hard time knowing what all libs I need for your older videos.
Which version python do you use: 32bit or 64bit.
You need to check out the youtube channel 'sendex'. Also I am 99% confident that he is using the 64bit version.
Victor Gallagher thanks a lot
senTdex right?
Sorry, yes sentex is right.
got it wrong again, 'sentdex'
Amazing work Siraj!
You are damn right, I am a beautiful wizard!
Why is the q_table initialized as np.zeros((n_states, n_states, 3))? 3 is the number of actions, right (i.e. drive left, drive right, do nothing)? Why would we need two dimensions for the number of states?
One thing I’ve wondered is how to tackle AI for tasks that can’t be parallelized or sped up (I.e. model-free, real time task based AI)
Perryman1138 record some example data so you can pre-train your model. If I remember correctly it's called imitation learning: giving the agent some interesting action paths with already computed rewards. The more the better.
Ah I see. In particular, I was thinking of Dungeon Crawl Stone Soup, a color-graphics terminal roguelike for which recordings of thousands of games exist in text format, but I believe it might only record the game outputs, not the player inputs. Still, a fascinating concept! Thanks!
27 liberal arts majors watched this video
163 now.
204 man
237
Thanks for the videos man. However it seems that the Q Matrix cannot be used with a large number of states.
0:49 this is why I subbed
Surprisingly good....hmmmmm. Great job!
why does anyone talking about reinforcement learning do this only in Python? I want to see examples in C++, in C#, in Java... why only Python?
dude, just get the point: you can implement it in any programming language, and converting Python code to C# is easy. C# has all the things Python has. Maybe more lines, but it's easy to do.
I'm confused about how this is machine learning. It seems to me that the computer just creates a lookup table. Could you please clarify, for I'm sure I am missing something?
Well, he only talked about finding the optimal policy. But don't forget you have to generalize from the optimal policy, i.e. you drop that agent in an environment it doesn't know about. Or let me put it this way: suppose your agent has learned the policy that whenever the distance between the agent and an obstacle is precisely 5 meters it should apply the brakes. But here's the catch: the state space is not always discrete, it can be continuous. In fact, in real life it is continuous. So machine learning comes into play at this point: it takes the state vector and applies function approximation (which can be a neural network) to spit out an action. Hope that helps.. :)
PS: ML is nothing but function approximation or curve fitting..
Thanks! I think I understand now.
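The replies above can be made concrete with a small sketch (my own illustration, not from the video): replace the Q-table with a function approximator, here a simple linear model per action, so continuous states no longer need a lookup table.

```python
import numpy as np

# Sketch: approximate Q(s, a) with one linear model per action,
# so continuous state vectors don't need a discrete lookup table.
n_features, n_actions = 4, 3
W = np.zeros((n_actions, n_features))  # one weight row per action

alpha, gamma = 0.01, 0.99

def q_values(state):
    return W @ state  # vector of Q(state, a) for every action

def update(state, action, reward, next_state):
    td_target = reward + gamma * q_values(next_state).max()
    td_error = td_target - q_values(state)[action]
    W[action] += alpha * td_error * state  # gradient step on squared TD error

s = np.array([0.1, -0.2, 0.0, 1.0])
update(s, action=1, reward=1.0, next_state=s)
```

Swapping the linear model for a neural network gives the same scheme more capacity; that is the step from tabular Q-learning toward deep Q-networks.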
What is the difference between Q-function and Value-function? The formulas look very similar to me
The Q function is also called the action-value function. The main difference is that V(s) is the value of "being in state s", while Q(s, a) is the value of "being in state s and taking action a". Both have similar-looking Bellman equations and Bellman optimality equations, and they can be related to each other based on how the environment works and what the current policy is (no equation-writing in YouTube comments, so I cannot show you :-( ). Many model-free RL algorithms use Q because it gives you a way to select the next action (just pick a so that Q(s, a) is maximum out of all possible Q(s, ?)), whilst if you have V(s), the only way to maximise it is to look ahead to see what the next state will be, which you can only do if you have some way to predict the next state (i.e. a model).
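The relationship the reply couldn't typeset fits in a few lines (a toy Q-table of my own, purely for illustration): under a greedy policy, V(s) is just the best Q-value available in s, and the greedy action is the one that attains it.

```python
import numpy as np

# Toy Q-table: 3 states x 2 actions (made-up numbers, illustration only).
Q = np.array([[1.0, 2.0],
              [0.5, 0.1],
              [3.0, 3.0]])

# Under a greedy policy: V(s) = max_a Q(s, a),
# and the greedy action is argmax_a Q(s, a).
V = Q.max(axis=1)                 # best achievable value per state
greedy_policy = Q.argmax(axis=1)  # which action achieves it
```

Note the asymmetry the reply points out: from Q you can act directly via the argmax, while V alone tells you nothing about which action to take without a model of the next state.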
Must the reward be only discrete 1 or 0? Can it be an intermediate fraction or decimal?
How do we check the convergence of Q matrix?
Thanks for doing this Siraj. But I am running this in Ubuntu. I am not able to see anything though the code runs fine as i can see the iterations. Any idea how to fix it.
Can Q-learning be used for solving a classification problem? If so, how? Could you explain or make a video regarding this? @Siraj Raval
"So easy, a liberal art major can do it " lmaooo🤣🤣🤣🤣🤣🤣
this was pretty decent but you shouldnt have titled the video the way you did because i would not consider it that beginner friendly
much better than your earlier ones. The pace is good and there are fewer memes and gimmicks. Good job!
I am new here and dont know from where to start learning this stuff. Any suggestions
So is Python the language for deep learning?
This guy is too smart for me. I need someone who is on my low IQ to explain this.
I would suggest to look up Code Bullet. He does fun AI stuff in an easier to comprehend, sillier manner.
A.I Learns to DRIVE does Q-Learning and its bigger brother Deep Q network.
Not as in-depth, but fun to see and gives a good starting idea of what the AI does.
Does "model" here refer to a model of the environment/world?
You should have many more followers
Between this and a genetic algorithm, which one is better?
q-learning is model-free learning, not model-based learning just FYI
Does this mean that the player has to have already been in a state and taken some actions to make optimal decisions, or is there a technique to use past results to estimate future rewards, i.e. a neural network? With the state consisting of two different variables in this case, it seems like it would take a while for the car to find the best actions to take for each occurring state in a reasonable time. I'm a little confused.
"a technique to use past results to estimate future rewards, i.e. a neural network" — yes, Q learning is exactly that. Start with an untrained agent that knows nothing of the environment. Also strongly bias that agent to random actions at first in order to gather data. Next, allow the agent to take some number of actions in the environment while recording the entire session. At the end take the reward, which in this case could be "units to the right of the start." Now go back over the recording and apply a share of that score to every move (perhaps with a decay for the older actions). Finally, feed each instance of the replay, one at a time, into the network: you provide the state of the world + action and train it towards the score. If you keep doing this, your network will start to converge on an understanding of what moves will create what score. Once you gain some data, stop purely randomly sampling actions and start using predictions from your model to inform the next move (the rate at which you go from pure random to pure agent is an important hyperparameter). If all goes well, your agent learns better and better actions to take in each situation until it knows how to get very good scores. At least, that's how it should work! I'm struggling to get my model to converge on a similar toy example.
what dustin said
Thanks
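The recipe described in the reply above can be sketched roughly like this (a toy mock-up under my own assumptions; a real agent would feed these targets to a network rather than collect them in a list):

```python
# Sketch of the replay idea: record an episode, then assign a decayed
# share of the final reward to each recorded (state, action) step.
episode = [("s0", "right"), ("s1", "right"), ("s2", "jump")]  # session log
final_reward = 10.0   # e.g. "units to the right of the start"
decay = 0.9           # older actions receive a smaller share of the credit

targets = []
for age, (state, action) in enumerate(reversed(episode)):
    credit = final_reward * decay ** age
    targets.append((state, action, credit))

# Each (state, action, credit) triple is one training example for the
# network that predicts expected score from state + action.
```

The decay constant plays the same role as the discount factor: recent actions get most of the credit for the final score.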
Well that made my day. Thanks Siraj!
I didn't get the 'liberal arts major' reference. Who is the bearded man in the inset at 0:52?
This is top tier content man. Thank you so much!
great work sir
Great work man
Great video! Thx
Great video!
can anyone explain what the value of n_states represents in the program? Does this mean there are only 40 possible positions in the environment, or what? Thanks in advance
edit - I'm thinking that the "3" in "q_table = np.zeros((n_states, n_states, 3))" represents the fact that there are 3 possible actions for the car? I'm confused.
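A likely explanation (hedged, since this reconstructs the linked code from memory rather than quoting it): MountainCar's observation is a continuous pair (position, velocity), so each component is binned into n_states buckets, giving one table axis per component plus one axis for the 3 actions (push left, no push, push right).

```python
import numpy as np

# Sketch: discretizing MountainCar's 2-D continuous state for a Q-table.
# The bounds below are assumptions roughly matching gym's MountainCar-v0.
n_states = 40
pos_bins = np.linspace(-1.2, 0.6, n_states)    # car position range
vel_bins = np.linspace(-0.07, 0.07, n_states)  # car velocity range

q_table = np.zeros((n_states, n_states, 3))    # (pos bin, vel bin, action)

def to_indices(position, velocity):
    # Map a continuous observation to discrete table indices.
    p = min(int(np.digitize(position, pos_bins)), n_states - 1)
    v = min(int(np.digitize(velocity, vel_bins)), n_states - 1)
    return p, v

p, v = to_indices(-0.5, 0.0)
value_of_push_right = q_table[p, v, 2]
```

So the table does not hold 40 positions; it holds 40 × 40 discretized (position, velocity) cells, each with 3 action values.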
Hey! i am aiming to create an Ai, i was going to use a genetic algorithm.
What do you think the best type of algorithm would be for creating a bipedal balancing/walking robot?
i was thinking of using unity to simulate the physics
Thanks
cool. Thanks
You are brilliant. I only dream of having your ease of understanding of these processes!🎉
Can anyone please explain the if else used for selecting random action.. Thank you
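For the question above: that if/else is almost certainly an epsilon-greedy action selector, a standard pattern sketched here from memory rather than copied from the video's code. With probability epsilon the agent explores with a random action; otherwise it exploits the best known action.

```python
import random

import numpy as np

# Epsilon-greedy selection: explore with probability epsilon, else exploit.
def choose_action(q_table, state, epsilon, n_actions):
    if random.random() < epsilon:
        return random.randrange(n_actions)     # explore: uniform random action
    else:
        return int(np.argmax(q_table[state]))  # exploit: best known action

q = np.array([[0.0, 1.0, 0.5]])
action = choose_action(q, state=0, epsilon=0.0, n_actions=3)  # greedy pick
```

Epsilon is usually decayed over training, so the agent explores heavily early on and exploits its learned values later.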
0:50 For that joke you really deserve a subscription! :D