Reinforcement Learning via Q-Learning: Learning the Values of the Best Actions
- Published: 5 Feb 2025
- ** Apologies for the low volume. Just turn it up **
The idea of Temporal Difference learning is introduced, by which an agent can learn state/action utilities from scratch. The Q-learning algorithm is then discussed: the rule it uses to update Q values is shown, and its behavior is demonstrated in a grid world.
The program used in this video is part of the Pac-Man projects at: ai.berkeley.edu...
The specific project from which this program comes is available at this link: ai.berkeley.edu...
The grid world problem is from Artificial Intelligence: A Modern Approach, by Russell and Norvig
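A minimal sketch in Python of the tabular Q-learning loop described above; the environment interface (reset/actions/step), the function name, and the parameter defaults are illustrative assumptions, not the Berkeley Pac-Man project code:

import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.5, gamma=1.0, epsilon=0.1):
    # Learn Q(s, a) from scratch. env is assumed to expose:
    #   reset() -> start state
    #   actions(state) -> list of legal actions
    #   step(state, action) -> (next_state, reward, done)
    Q = defaultdict(float)  # every Q(s, a) starts at 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy: usually take the best known action, sometimes explore
            if random.random() < epsilon:
                a = random.choice(env.actions(s))
            else:
                a = max(env.actions(s), key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(s, a)
            # value of the best action from the next state (0 at a terminal state)
            best_next = 0.0 if done else max(Q[(s_next, act)] for act in env.actions(s_next))
            # temporal-difference update:
            # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q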
I don't know if anyone told you this, but this is a good video. You definitely lifted the fog for me a little, which is way better than where I started. Appreciate the vid.
I really like how you slowly wrote out the Q update equation and clearly said what each variable meant. It can be challenging to find that on the internet sometimes.
I don't get tired of praising your teaching skills, your explanation is excellent!
I think the part where you explain the Q-Learning equation is a nice example. You go slow, explaining what each part of the equation means, not only in a mathematical way, but also from the perspective of the agent. This is a subtle thing, but it makes all the difference.
At the end of the video, when you show, slowly and manually, how the agent would behave and how the values are updated, it is clear that you really want to make us fully understand the topic, without laziness or pretension. Thank you very much, please keep on doing what you are doing!
None of the videos on YouTube explain the process in an iterative way.. how the values actually change... they all just talk in the air.. this is the first one that has clarified the theory for me.
Best video on Q Learning. I have viewed so many videos but none showed how the Q values are calculated dynamically. Well done and thank you.
Jacob, you were brilliant. Nobody else explained it like you. As Albert Einstein said: "If you can't explain it simply, you don't understand it well enough."
I was searching for this kind of video for a long time, and finally found it! Thank you so much Jacob, looking forward to seeing more of your videos :)
You really have a knack for explaining. Thank you very much!!
Must say that your videos on Computer Science are awesome, keep on rolling!
Thank you! Yours is the only video I have found that really breaks it all down clearly and nicely!
Superb video.. at last, after watching so many videos on Q-learning, I finally got a grip on how exactly it works. Thanks.
I previously ignored your video due to the low audio, but this is by far the best video on Q-learning.
I wish you were my teacher. I wasn't able to sleep comfortably because my instructor just showed the equations and didn't explain them in detail.
Thank you so much Jacob, your video is the best demonstration of what Q-learning is that I found anywhere on Google! Please make new content about Machine Learning!
Very calm and composed explanation. It just makes things look easy. Thank you very much.
Here are the values for the parameters in the equation, for those who still cannot link up the action demonstration and the equation:
7:08
Given: gamma = 1, alpha = 0.5,
The cost refers to r_action = -0.04
Let
s = (bottom left corner), denoted as [[0,0,0,0], [0,0,0,0], [1,0,0,0]] (as long as you can identify the location)
a = UP
Q(bottom left, UP) = 0 + 0.5(-0.04 + 1*0 - 0) = -0.02
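For anyone who wants to verify that single step numerically, here is a quick check in Python (the variable names are just illustrative; the values are the ones listed above):

alpha = 0.5        # learning rate
gamma = 1.0        # discount factor
reward = -0.04     # movement cost r_{t+1}
q_old = 0.0        # Q(bottom left, UP) starts at 0
max_q_next = 0.0   # all Q values in the next state are still 0

q_new = q_old + alpha * (reward + gamma * max_q_next - q_old)
print(round(q_new, 2))  # -0.02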
Question: when you say at 02:57 that Q(s_t, a_t)
yes
Oh, ok, great. Perhaps you can specify in the description what each mathematical symbol represents in terms of your previous computations.
Finally a detailed explanation
Indeed a great video, from formulation to an actual practical example of a Q-learner.
Absolutely the best explanation 👌
Easily the best explanation of q learning
the best RL video, thank you, helped me before exams
really good job, thank you! I was looking for videos like this
This is an outstanding video. Kudos!
I still can't follow the process well enough to implement it myself... I will have to make an effort and look at other tutorials. I can barely see how you calculate the Q values from 6:26 onwards in the video.
Hello Jacob, thank you so much for this video. I tried some other RL courses but none were as clear as yours. Thank you for helping me.
Very educational simulation! Thanks Jacob :-)
Thank you so much!! Really a life saver
Why is the sound so quiet? I had to use earphones to hear it.
Can someone explain why the final value for the first iteration is 0.5? And why is the right action's value -0.02 and not a bigger number, since it found the reward?
This is a good video thank you so much.
At 9:55 when you go up, the value changed from -.02 to -.03. Can someone plug the numbers into the formula so I can see how it arrived at -.03?
The update formula for Q-learning is q(s, a) = (1 - learningRate)*q(s, a) + learningRate*(reward + discountFactor*max(q(s', a'))).
learningRate= 0.5
q(s,a)=-0.02
reward = -0.04
discountFactor=0.9
max(q(s',a)) = 0
So q(s, a) = 0.5*(-0.02) + 0.5*(-0.04 + 0) = -0.03
@wonderInNoWhere no, this is not correct; the discountFactor is 1, as mentioned in the video
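Plugging those numbers in (with gamma = 1, as the reply above points out the video uses), here is a quick Python check of the -.02 to -.03 step; the variable names are just illustrative:

alpha = 0.5        # learning rate
gamma = 1.0        # discount factor used in the video
reward = -0.04     # movement cost
q_old = -0.02      # current Q(s, a)
max_q_next = 0.0   # best Q value in the next state is still 0

q_new = (1 - alpha) * q_old + alpha * (reward + gamma * max_q_next)
print(round(q_new, 2))  # -0.03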
Did anyone figure out how to get the code for the display in this video? I tried to implement it with the code he provided in the description, but I can't get the numbers to update as they do in the video. I'd appreciate it if anyone could provide the code.
Thank you! Great explanation. What do you recommend to stop the process if it does not converge?
Why does the final state have a value? Shouldn't the final state always be 0, because it terminates all actions once the agent is in it?
Question: at ~4:22, is r_t+1 the reward for being in the state we are in (the one we are just about to leave), or is it the reward for the optimal state (the one we're getting the max of)?
It's really in between the states. It is the reward associated with performing the action. However, I guess you could think of it as the reward for entering state s_t+1
I wish I could take your college class!
Thanks for the upload.. but the sound is too quiet for me...
Beautiful!! Thanks a lot...
What if a state-action has the same value as another action, and both of them lead to a repeated step?
When the value is the same on all four sides, what makes the agent decide the next action?
Can someone explain how the computer gets from 0.75 to 0.88?
How have you visualized the numbers while moving?
I am still confused by the term "reward" here; is it the -0.04 that you described in the previous video as the movement cost?
it seems so
This video explains the Q-learning equation so clearly, thanks a lot. BTW, where can I get the code for the Gridworld display? That would be helpful for understanding how the theory works.
good stuff
good shit bro
Can't hear the audio, much too quiet
Go someplace quiet and increase the volume to the highest. 💯 Worth doing for the way he's explaining this, believe me.
Thank you
One question: is the movement cost of -0.04 the R_t+1 in the Q-learning formula?
Yes.
Thanks 4 this video
Can I contact you?
The voice is very low, Sir
Audio is too quiet to hear
R(t+1) ARE YOU SURE ??? I think it should be R(t)
couldn't hear a thing..
Fix the audio please :)
You sound like Sam Harris
Not at all audible
Hey, can you talk quieter? I'm covering my ears, and my neighbors can hear this through the house.
Really, really good and reasonably explained