Reinforcement Learning via Q-Learning: Learning the Values of the Best Actions

  • Published: Feb 5, 2025
  • ** Apologies for the low volume. Just turn it up **
    The idea of Temporal Difference learning is introduced, by which an agent can learn state/action utilities from scratch. The specific Q-learning algorithm is discussed by showing the rule it uses to update Q values and by demoing its behavior in a grid world.
    The program used in this video is part of the Pac-Man projects at: ai.berkeley.edu...
    The specific project from which this program comes is available at this link: ai.berkeley.edu...
    The grid world problem is from Artificial Intelligence: A Modern Approach, by Russell and Norvig
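
    As a rough illustration of the update rule discussed in the video, here is a minimal sketch in Python. This is not the Berkeley project code; the action names, parameter values, and epsilon-greedy exploration are assumptions made for the sketch.

        import random

        ALPHA, GAMMA, EPSILON = 0.5, 1.0, 0.1  # learning rate, discount, exploration
        ACTIONS = ['UP', 'DOWN', 'LEFT', 'RIGHT']
        Q = {}  # Q[(state, action)] -> estimated utility, default 0.0

        def q(s, a):
            return Q.get((s, a), 0.0)

        def choose_action(s):
            # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
            if random.random() < EPSILON:
                return random.choice(ACTIONS)
            return max(ACTIONS, key=lambda a: q(s, a))

        def update(s, a, reward, s_next, terminal):
            # Q(s_t, a_t) <- Q(s_t, a_t) + alpha*(r_t+1 + gamma*max_a' Q(s_t+1, a') - Q(s_t, a_t))
            best_next = 0.0 if terminal else max(q(s_next, a2) for a2 in ACTIONS)
            Q[(s, a)] = q(s, a) + ALPHA * (reward + GAMMA * best_next - q(s, a))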

Comments • 68

  • @iDKMuzick
    @iDKMuzick 9 years ago +30

    I don't know if anyone told you this, but this is a good video. You definitely lifted the fog for me a little, which is way better than where I started. Appreciate the vid.

  • @frazierbaker2982
    @frazierbaker2982 7 years ago +8

    I really like how you slowly wrote out the Q update equation and clearly said what each variable meant. It can be challenging to find that on the internet sometimes.

  • @EduardoYukio
    @EduardoYukio 6 years ago

    I never get tired of praising your teaching skills; your explanation is excellent!
    I think the part where you explain the Q-learning equation is a nice example. You go slowly, explaining what each part of the equation means, not only in a mathematical way but also from the perspective of the agent. This is a subtle thing, but it makes all the difference.
    At the end of the video, when you show, slowly and manually, how the agent would behave and how the values are updated, it is clear that you really want to make us fully understand the topic, without laziness or pretension. Thank you very much; please keep on doing what you are doing!

  • @physicsmadness
    @physicsmadness 5 years ago +4

    None of the videos on YouTube explain the process in an iterative way, showing how the values actually change; they all just talk in the air. This is the first one that has clarified the theory for me.

  • @kleemc
    @kleemc 7 years ago +5

    Best video on Q Learning. I have viewed so many videos but none showed how the Q values are calculated dynamically. Well done and thank you.

  • @Grkashani
    @Grkashani 5 years ago

    Jacob, you were brilliant. Nobody else explained it like you. As Albert Einstein said: if you can't explain it simply, you don't understand it well enough.

  • @Ip_man22
    @Ip_man22 6 years ago +4

    I was searching for this kind of video for a long time and finally found it! Thank you so much, Jacob; looking forward to seeing more of your videos :)

  • @YoussouphaSambe
    @YoussouphaSambe 9 years ago +6

    You really have a knack for explaining. Thank you very much!!

  • @MrTJK1492
    @MrTJK1492 8 years ago +1

    Must say that your videos on computer science are awesome. Keep on rolling!

  • @WugglersProductions
    @WugglersProductions 8 years ago +1

    Thank you! Yours is the only video I have found that really breaks it all down clearly and nicely!

  • @Skandawin78
    @Skandawin78 7 years ago

    Superb video. At last, after watching so many videos on Q-learning, I finally got a grip on how exactly it works. Thanks.
    I previously ignored your video due to the low audio, but this is by far the best video on Q-learning.

  • @AshK455
    @AshK455 4 years ago +2

    I wish you were my teacher. I wasn't able to sleep comfortably because my instructor just showed the equations without explaining them in detail.

  • @riderblack6401
    @riderblack6401 6 years ago

    Thank you so much, Jacob. Your video is the best demonstration of what Q-learning is that I have found anywhere online! Please make new content about machine learning!

  • @muzammilnxs
    @muzammilnxs 8 years ago

    Very calm and composed explanation. It just makes things look easy. Thank you very much.

  • @5743363
    @5743363 10 months ago

    Here are the values for the parameters in the equation, for anyone who still can't link up the action demonstration and the equation:
    7:08
    Given: gamma = 1, alpha = 0.5,
    The cost refers to r_action = -0.04
    Let
    s = (bottom left corner), denoted as [[0,0,0,0], [0,0,0,0], [1,0,0,0]] (as long as you can identify the location)
    a = UP
    Q(bottom left, UP) = 0 + 0.5*(-0.04 + 1*0 - 0) = -0.02
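
    In Python, that single update can be checked directly (a minimal sketch, using only the numbers stated above):

        gamma = 1.0        # discount factor, as given
        alpha = 0.5        # learning rate, as given
        r = -0.04          # movement cost (reward for the action)
        q_old = 0.0        # initial Q(bottom left, UP)
        max_q_next = 0.0   # all Q values in the next state are still 0

        q_new = q_old + alpha * (r + gamma * max_q_next - q_old)
        print(round(q_new, 4))  # -0.02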

  • @alfcnz
    @alfcnz 8 years ago

    Question: when you say at 02:57 that Q(s_t, a_t)

    • @JacobSchrum
      @JacobSchrum  8 years ago +1

      yes

    • @alfcnz
      @alfcnz 8 years ago

      Oh, ok, great. Perhaps you could specify in the description what each mathematical symbol represents in terms of your previous computations.

  • @dereck-2205
    @dereck-2205 2 years ago

    Finally a detailed explanation

  • @aeigreen
    @aeigreen 7 years ago

    Indeed a great video, from formulation to an actual practical example of a Q-learner.

  • @artem_isakow
    @artem_isakow 1 year ago

    Absolutely the best explanation 👌

  • @physicsmadness
    @physicsmadness 5 years ago

    Easily the best explanation of Q-learning.

  • @danielahmadzadeh4921
    @danielahmadzadeh4921 6 years ago

    The best RL video; thank you, it helped me before exams.

  • @rebeccaaben-athar7714
    @rebeccaaben-athar7714 3 years ago

    Really good job, thank you! I was looking for videos like this.

  • @walter_ullon
    @walter_ullon 6 years ago

    This is an outstanding video. Kudos!

  • @francesc8882
    @francesc8882 4 years ago

    I still don't understand the process well enough to implement it myself... I will have to make an effort and look at other tutorials. I can barely see how you calculate the Q values from 6:26 onward in the video.

  • @chihoxtra
    @chihoxtra 6 years ago

    Hello Jacob, thank you so much for this video. I tried some other RL courses but none were as clear as yours. Thank you for helping me.

  • @Quintenkonijn
    @Quintenkonijn 7 years ago

    Very educational simulation! Thanks Jacob :-)

  • @Chillos100
    @Chillos100 4 years ago

    Thank you so much!! Really a life saver

  • @hyejin1986
    @hyejin1986 8 years ago +6

    Why is the sound so quiet? I had to use earphones to hear it.

  • @davidjeon8132
    @davidjeon8132 4 years ago

    Can someone explain why the final value for the first iteration is 0.5? And why the right action value is -0.02 and not a bigger number since it found the reward?

  • @nawazishalvi9886
    @nawazishalvi9886 4 years ago

    This is a good video, thank you so much.

  • @KennTollens
    @KennTollens 4 years ago

    At 9:55 when you go up, the value changed from -.02 to -.03. Can someone plug the numbers into the formula so I can see how it arrived at -.03?

    • @wonderInNoWhere
      @wonderInNoWhere 4 years ago

      The Q-learning update formula is q(s,a) = (1 - learningRate)*q(s,a) + learningRate*(reward + discountFactor*max(q(s',a'))).
      learningRate = 0.5
      q(s,a) = -0.02
      reward = -0.04
      discountFactor = 0.9
      max(q(s',a')) = 0
      So q(s,a) = 0.5*(-0.02) + 0.5*(-0.04 + 0) = -0.03

    • @lovemormus
      @lovemormus 4 years ago

      @@wonderInNoWhere No, this is not correct; the discountFactor is 1, as mentioned in the video.
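
      Either way, the disputed discount factor does not change this particular step, since the best Q value in the next state is still 0. A quick check in Python (a sketch using only the numbers from this thread):

          alpha, q_sa, r, max_q_next = 0.5, -0.02, -0.04, 0.0
          for gamma in (0.9, 1.0):
              # same update as above, with both candidate discount factors
              q_new = (1 - alpha) * q_sa + alpha * (r + gamma * max_q_next)
              print(gamma, round(q_new, 4))  # -0.03 in both cases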

  • @wonderInNoWhere
    @wonderInNoWhere 4 years ago

    Did anyone figure out how to get the code for the display in this video? I tried to implement it with the code he provided in the description, but I can't get the numbers to update as they do in the video. I would appreciate it if anyone could provide the code.

  • @edgargutierrez2274
    @edgargutierrez2274 7 years ago

    Thank you! Great explanation. What do you recommend for stopping the process if it does not converge?

  • @riderblack6401
    @riderblack6401 6 years ago

    Why does the final state have a state value? Shouldn't the final state always be 0, since it terminates all actions once the agent is in it?

  • @toxicitysocks
    @toxicitysocks 8 years ago

    Question: at ~4:22, is the r_t+1 the reward for being in the state we are in (the one we are just about to leave), or is it the reward for the optimal state (the one we're taking the max over)?

    • @JacobSchrum
      @JacobSchrum  8 years ago

      It's really in between the states. It is the reward associated with performing the action. However, I guess you could think of it as the reward for entering state s_t+1

  • @imveryhungry112
    @imveryhungry112 6 years ago +1

    I wish I could take your college class!

  • @leetcode_hardcore
    @leetcode_hardcore 8 years ago +2

    Thanks for the upload, but the sound is too quiet...

  • @MrNaveenmn
    @MrNaveenmn 8 years ago

    Beautiful!! Thanks a lot...

  • @Zakiyfarhanfuad
    @Zakiyfarhanfuad 6 years ago

    What if one action has the same value as another action and both of them lead to a repeated step?

  • @Skandawin78
    @Skandawin78 7 years ago

    When the value is the same on all four sides, how does the agent decide the next action?

  • @mscherf94
    @mscherf94 2 years ago

    Can someone explain how the computer gets from 0.75 to 0.88?

  • @sashahelliwel
    @sashahelliwel 6 years ago

    How did you visualize the numbers while moving?

  • @NikeshBajaj
    @NikeshBajaj 7 years ago

    I am still confused by the term 'reward' here. Is it the -0.04 that you described in the previous video as the movement cost?

  • @qingli1422
    @qingli1422 7 years ago

    This video explains the Q-learning equation so clearly, thanks a lot. BTW, where can I get the code for the Gridworld display? That would be helpful for understanding how the theory works.

  • @kostyamamuli1999
    @kostyamamuli1999 2 years ago

    good stuff

  • @prateekarora8561
    @prateekarora8561 4 years ago

    good shit bro

  • @juleswombat5309
    @juleswombat5309 8 years ago +41

    Can't hear the audio, much too quiet

    • @vinaypursnani1903
      @vinaypursnani1903 4 years ago +1

      Go someplace quiet and turn the volume up to the highest setting. 💯 Worth doing for the way he explains this, believe me.

  • @heenashaikh8422
    @heenashaikh8422 4 years ago

    Thank you

  • @Skandawin78
    @Skandawin78 7 years ago

    One question: is the movement cost of -0.04 the R_t+1 in the Q-learning formula?

  • @vedantyadav3977
    @vedantyadav3977 7 years ago

    Thanks for this video

  • @Alysai-zy3yq
    @Alysai-zy3yq 2 years ago

    Can I contact you?

  • @ShameerBashir-o8m
    @ShameerBashir-o8m 4 months ago

    The voice is very low, sir.

  • @mbord7057
    @mbord7057 7 years ago

    Audio is too quiet to hear

  • @spellweavergeneziso
    @spellweavergeneziso 5 years ago

    R(t+1) ARE YOU SURE ??? I think it should be R(t)

  • @Skandawin78
    @Skandawin78 7 years ago

    Couldn't hear a thing...

  • @testtesttesttesttest884
    @testtesttesttesttest884 7 years ago

    Fix the audio please :)

  • @johnsmith-mp4pr
    @johnsmith-mp4pr 6 years ago

    You sound like Sam Harris

  • @AllAboutCode
    @AllAboutCode 7 years ago

    Not at all audible

  • @joesiu4972
    @joesiu4972 6 years ago

    Hey, can you talk quieter? I'm covering my ears, and my neighbors can hear this through the house.

  • @spamspamer3679
    @spamspamer3679 6 years ago

    Really, really good and reasonably explained.