Watch Q-learning Agent Play Game with Python - Reinforcement Learning Code Project

  • Published: 16 Oct 2024

Comments • 79

  • @deeplizard  6 years ago +8

    Check out the corresponding blog and other resources for this video at:
    deeplizard.com/learn/video/ZaILVnqZFCg

  • @DEEPAKSV99  4 years ago +36

    Great explanation, ma'am.
    Also, I have a doubt. In this video, it is noticeable that the agent took 3 different paths to the goal in its 3 episodes of play. My question is: even though the Q-table is not being updated while playing and the agent starts from the same state in every episode, what causes the randomness that makes it take a different path every time?
    EDIT: I found out the answer. It is the environment that introduces the randomness by being slippery. The randomness was completely removed when I initialized the environment as "env = gym.make("FrozenLake-v0", is_slippery=False)", and then all 3 episodes took the same path, which is the shortest path to the goal.
    Thank you. I'm not deleting this comment since someone else may find this useful too :-)
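    A minimal sketch of that check (my own reconstruction, assuming the old gym "FrozenLake-v0" API and a Q-table saved from training; the file name is hypothetical):

    import gym
    import numpy as np

    # Non-slippery transitions are deterministic, so greedy play follows
    # the same path in every episode.
    env = gym.make("FrozenLake-v0", is_slippery=False)
    q_table = np.load("q_table.npy")  # hypothetical file holding the trained Q-table

    for episode in range(3):
        state = env.reset()
        done = False
        while not done:
            action = int(np.argmax(q_table[state]))  # always greedy during play
            state, reward, done, info = env.step(action)
        print(f"Episode {episode + 1} finished with reward {reward}")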

    • @startrek3779  3 years ago +4

      Hi. Thank you for sharing this helpful guide with us!

    • @aashwinsharma8194  3 years ago +1

      Thanks. I had the same doubt.

  • @Moonz97  6 years ago +25

    I love how you include small snippets of talks at the end. Amazing!

  • @wendersonj  4 years ago +14

    I "think" this is one (or the) best series of videos about reinforcement learning for free !
    Thanks for all of your work and keep it going for the sake of us mortals !
    Cheers from Brazil.

  • @tamoorkhan3262  4 years ago +6

    Most in-depth, informative, and yet underrated channel on Machine Learning. I hope it grows faster and keeps up the good work....👍

  • @kebbatiyassine7076  3 years ago +5

    This is amazing, you are amazing! So nicely made. I love your tutorials, you're just the best.
    num_episodes = 25000
    max_steps_per_episode = 200
    learning_rate = 0.005
    discount_rate = 0.99
    exploration_rate = 1
    max_exploration_rate = 1
    min_exploration_rate = 0.0008
    exploration_decay_rate = 0.001
    These yielded 75%
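    For context, a rough sketch of where those settings plug into the epsilon-greedy Q-learning loop used in this series (reconstructed from memory, so details may differ slightly from the video):

    import random
    import gym
    import numpy as np

    env = gym.make("FrozenLake-v0")
    q_table = np.zeros((env.observation_space.n, env.action_space.n))

    num_episodes = 25000
    max_steps_per_episode = 200
    learning_rate = 0.005
    discount_rate = 0.99
    exploration_rate = 1
    max_exploration_rate = 1
    min_exploration_rate = 0.0008
    exploration_decay_rate = 0.001

    for episode in range(num_episodes):
        state = env.reset()
        for step in range(max_steps_per_episode):
            # Epsilon-greedy: exploit the Q-table or explore a random action.
            if random.uniform(0, 1) > exploration_rate:
                action = int(np.argmax(q_table[state]))
            else:
                action = env.action_space.sample()

            new_state, reward, done, info = env.step(action)

            # Q-learning update toward reward + discounted best future value.
            q_table[state, action] = (1 - learning_rate) * q_table[state, action] + \
                learning_rate * (reward + discount_rate * np.max(q_table[new_state]))

            state = new_state
            if done:
                break

        # Exponentially decay the exploration rate after each episode.
        exploration_rate = min_exploration_rate + \
            (max_exploration_rate - min_exploration_rate) * np.exp(-exploration_decay_rate * episode)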

  • @nitinmalhotra2560  3 years ago +3

    These videos are awesome. I'm genuinely thankful to you for these videos.

  • @rishabhsheoran6959  2 years ago +1

    I just found the best course on Reinforcement Learning!

  • @Mahmoud-li2xn  1 year ago +1

    We love you deeplizard, you're the best

  • @MuhammadArshad  2 years ago +1

    I was so rooting for the agent to fall in the hole :D I guess I'm on team White Walkers :P. Amazing tutorials. Thank you!

  • @ramasamyseenivasagan4174  4 years ago +1

    Concepts clearly explained. Best material on the web. Keep going!!!

  • @Akavall  5 years ago +4

    I was analyzing the Q-table, and it didn't make much sense to me: in some cases the highest value would be, say, "Right", which leads into a hole. But then I realized that the reason is that the algorithm has learned the probabilities of slipping. In other words, it learned that "the best we can do here is: try to go Right, and then slip in the correct direction".
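    A quick way to see that is to print the greedy action per state next to the tile type (a sketch; the action order Left/Down/Right/Up follows FrozenLake's 0-3 encoding, and the Q-table file name is hypothetical):

    import numpy as np

    q_table = np.load("q_table.npy")            # hypothetical: the trained table
    actions = ["Left", "Down", "Right", "Up"]   # FrozenLake's action encoding 0-3
    lake = "SFFFFHFHFFFHHFFG"                   # the default 4x4 map, flattened row by row

    for state in range(16):
        greedy = actions[int(np.argmax(q_table[state]))]
        print(f"state {state:2d} ({lake[state]}): {greedy}")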

    • @himanshuladia9099  5 years ago

      That's actually a very good observation!

    • @AkshayAradhya  5 years ago +2

      I don't think it would intentionally go into a hole. The chance of slipping is just 10%.

  • @malinamalinowska9184  3 years ago

    I really like this series! It's simple, and you can quickly code something and see how it works. Good job :)

  • @sounakbhowmik2841  4 years ago +3

    The red marker showing the agent's current position is not appearing. Can you help?

  • @christianjt7018  5 years ago +1

    Nice, it was very fun to code it with you.

  • @tallwaters9708  5 years ago +7

    So, the print on screen (e.g., left, right) is the way the agent wants to move, but the slippery ice could make it move somewhere else?

    • @deeplizard  5 years ago +2

      Correct!

    • @EranM  2 years ago

      I think, from how the AI played, that when you go up, you move up with the highest chance, or left/right with some chance, and I'm not sure about down. I think the opposite direction has zero chance. Judging by the example we've seen, of course.
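      If you want the exact slip probabilities rather than guessing from play, the toy-text FrozenLake exposes its transition table; a sketch of inspecting it (attribute layout as in classic gym's toy_text envs, so worth double-checking on your version):

      import gym

      env = gym.make("FrozenLake-v0")  # slippery by default
      state, action = 0, 2             # e.g. the start state and the action "Right"

      # P[state][action] is a list of (probability, next_state, reward, done) tuples.
      for prob, next_state, reward, done in env.unwrapped.P[state][action]:
          print(f"prob={prob:.2f} -> next_state={next_state}, reward={reward}, done={done}")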

  • @valentin.stamate  2 years ago +1

    I love the intro song

  • @aamir122a  6 years ago +3

    With a neural network we can approximate the Q-table inside the network; this comes in handy when either the state space is too large or its size is not known in advance. However, the key to solving sparse rewards and large state spaces is a better exploration algorithm. I will talk about value networks and policy networks some other time. Good effort, though.
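    For intuition, a bare-bones sketch of swapping the table for a small network that maps a one-hot state to one Q-value per action (PyTorch assumed; this is not code from the series):

    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        """Maps a one-hot encoded state to one Q-value estimate per action."""
        def __init__(self, num_states=16, num_actions=4, hidden=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(num_states, hidden),
                nn.ReLU(),
                nn.Linear(hidden, num_actions),
            )

        def forward(self, state_one_hot):
            return self.net(state_one_hot)

    q_net = QNetwork()
    state = torch.zeros(16)
    state[0] = 1.0                        # one-hot encoding of state 0
    q_values = q_net(state)               # 4 Q-value estimates
    action = int(torch.argmax(q_values))  # greedy action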

    • @deeplizard  6 years ago +3

      Exactly, Aamir! This is the direction we'll be going in when we learn deep reinforcement learning.

    • @aamir122a  6 years ago

      At some point, it would be good to cover topics like proximal policy optimization from OpenAI, and multi-agent RL.

  • @tinyentropy  5 years ago +7

    Do I understand it correctly that, without the slippery ice, this would be a deterministic process (unless there is a tie between actions yielding maximal Q-values), i.e., all three episodes would be played with exactly the same order of steps?

  • @suileungmak9325  5 years ago +2

    Thanks for your video. I would like to ask why the agent could not win 100% of the time. If the agent chooses the action based on the highest value in the Q-table, it should follow the same path in each episode and win every time. Would you mind giving me some advice? Thanks for all your time.

  • @shireeshkumar6631  4 years ago +1

    I haven't started the series yet... but one question arises after watching this video: once the model gets trained for whichever game we work on, can we store the weights of the model and launch a human-vs-computer game? Do you teach how to do that in your videos?

  • @ankurgupta2806  5 years ago

    Just a suggestion: in the code you could also handle the case where the number of steps exceeds the threshold and the episode ends neither in a hole nor at the goal.
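    Something like this in the play loop would cover that case (a sketch; the setup mirrors the video's style, but the names and the saved-table file are assumptions):

    import gym
    import numpy as np

    env = gym.make("FrozenLake-v0")
    q_table = np.load("q_table.npy")   # hypothetical: a previously trained table
    max_steps_per_episode = 100

    for episode in range(3):
        state = env.reset()
        for step in range(max_steps_per_episode):
            state, reward, done, info = env.step(int(np.argmax(q_table[state])))
            if done:
                print("Reached the goal!" if reward == 1 else "Fell in a hole.")
                break
        else:
            # The loop ran out of steps without hitting a terminal state.
            print("Ran out of steps: neither a hole nor the goal was reached.")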

  • @oswaldopaez2947  2 years ago

    I love your videos. It's the best series for this topic. However, I have a question: my agent only takes 2 actions, up and down, and it never changes the action; it's always the same. Any suggestion for this? Once again, thanks for uploading these videos :)

  • @rainfeedermusic  4 years ago +1

    This environment is "SFFF, FHFH, . . . "
    Any idea how I can change it? (The grid)
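    The layout can be passed in when making the env; a sketch (the desc keyword and the generate_random_map helper exist in gym's toy_text FrozenLake, but check your gym version):

    import gym
    from gym.envs.toy_text.frozen_lake import generate_random_map

    # Option 1: a hand-written layout (S = start, F = frozen, H = hole, G = goal).
    custom_map = [
        "SFFH",
        "FFFH",
        "HFFF",
        "HFFG",
    ]
    env = gym.make("FrozenLake-v0", desc=custom_map)

    # Option 2: a random 8x8 layout where roughly 80% of tiles are frozen.
    env = gym.make("FrozenLake-v0", desc=generate_random_map(size=8, p=0.8))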

  • @msbrdmr  10 months ago

    What would it be like if I changed the rewards or defined another environment for this problem? For example, if the state is a hole, the reward will be -10; for the goal, it will be -1; and for a frozen tile, it will be -2?
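    One way to try that is to wrap the env and rewrite the reward in step(); a sketch with those example numbers (the tile lookup relies on the toy_text env's desc attribute, so double-check it on your version):

    import gym

    class ReshapedRewards(gym.Wrapper):
        """Replaces FrozenLake's 0/1 rewards with custom per-tile values."""
        def step(self, action):
            state, reward, done, info = self.env.step(action)
            tile = self.env.unwrapped.desc.flatten()[state].decode()  # 'S', 'F', 'H', or 'G'
            if tile == "H":
                reward = -10
            elif tile == "G":
                reward = -1
            else:
                reward = -2
            return state, reward, done, info

    env = ReshapedRewards(gym.make("FrozenLake-v0"))

    With those particular numbers the goal is still the least costly outcome and every extra step costs -2, so the agent should still learn to head for the goal, just along shorter paths.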

  • @MichaelJamesActually  3 months ago +1

    2024: I had to install the latest version of pygame and also had to use render_mode="human". It opened up the more animated display.
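    For anyone else on the current API, a sketch of what that looks like (gymnasium-style; note that reset and step return different values than the old gym used in the video, and the Q-table file name is hypothetical):

    import gymnasium as gym
    import numpy as np

    # render_mode="human" opens the pygame window with the animated grid.
    env = gym.make("FrozenLake-v1", render_mode="human")
    q_table = np.load("q_table.npy")   # hypothetical: a previously trained table

    state, info = env.reset()
    done = False
    while not done:
        state, reward, terminated, truncated, info = env.step(int(np.argmax(q_table[state])))
        done = terminated or truncated
    env.close()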

  • @siddhantagarwal4941  1 year ago +1

    I am getting the error "No available video device". What should I do?

  • @maryammahdavi8923  2 years ago

    Hi, thanks for your great explanations. I have a question: when I try to run the code, I receive this message: "done == True" is an unexpected indent. Why?

  • @abcqer555  6 years ago +2

    Keep 'em coming :)

  • @mateusbalotin7247  3 years ago

    Great video, thank you!

  • @KaramAbuGhalieh  5 years ago +1

    What is the "scope" of the agent? Like, does it know what the surrounding blocks are? The state gives the current location only, right?

    • @mohanadahmed2819  4 years ago +1

      No. It does not know the environment. The environment (represented by a set of states) evolves probabilistically. The agent becomes more aware of the environment as it explores it more, but this is all in a probabilistic sense.

  • @AshishKumar-ny4cq  1 year ago +1

    Hi, please HELP....
    Hi, this was great...
    I almost completed it, but I can't see my agent play. I can only see "episode 1", then nothing comes on screen, and after some time, "You reached the goal". But I can't see my agent play or the game grid.
    Please help me out.
    I checked the code so many times; it's exactly the same.

    • @im-Anarchy  1 year ago +1

      I am having the same issue. Since you posted this comment a month ago, have you resolved it? Any help is appreciated.

  • @ehtesamulazim5887  2 years ago +1

    The render function doesn't show any error message for me; the code works fine and the agent does great, but I can't see the actions of the agent or the state of the environment. Using Jupyter on Firefox.

    • @siddhantagarwal4941  1 year ago +1

      Same with me

    • @im-Anarchy  1 year ago +2

      @siddhantagarwal4941 Have you guys resolved this issue? I'm having the same problem.

    • @JoJo777890  1 year ago

      I have the same problem.

  • @jaitiwari9624  4 months ago

    I did exactly the same thing, but somehow my game is not rendering. I can only see the episode number and the "You reached the goal" text.
    Help please.
    Also, I use version 1 of FrozenLake.

  • @joey101046  6 years ago +1

    Great stuff! When will you upload the deep reinforcement learning videos? Thanks again.

    • @deeplizard  6 years ago

      Thanks, joey! I'm aiming to release a new video to this series every few days! The next video starts deep reinforcement learning.

    • @joey101046  6 years ago

      Super cool. I am writing a book with O'Reilly on AI in which one of the chapters is on RL & deep RL. I am a practicing data scientist and work on ML projects for clients like Mercedes, Nissan, JCPenney, etc. I find your tutorials extremely useful and incredibly engaging as well. Looking forward to many more. Also, would it be alright if I take some of your content for my chapter in the book? Thanks in advance :) Cheers

    • @deeplizard  6 years ago +1

      I'm glad you're finding value in the tutorials! Also, thanks for sharing your background. That all sounds great!
      At this time, we're not authorizing any reproduction of deeplizard content, but I wish you all the best with your book!

  • @izzatbr1684  3 years ago

    you are the best, thanks a lot!

  • @user-rt6wc9vt1p  4 years ago

    How can this algorithm be applied to environments that change? For example, when you reach food you would eat it, thereby removing it from the environment and changing it.

  • @mohammedehsanurrahman1047  4 years ago

    Can you also explain the A2C and A3C algorithms?

  • @TheHunnycool  5 years ago +1

    FROZEN LAKE: How can the agent move to the left of the starting point? If it can't, then why does it have a large Q-value for the left action in the Q-table? Thanks for replying.

    • @deeplizard  5 years ago

      Note that the game is non-deterministic: although the agent can choose to move left from the starting state, since the ice is slippery, the ice may make the agent slip into another state rather than the state the agent chose to move to. This explains why the Q-value associated with such an action could be non-zero.

    • @cyborgx1156  3 years ago

      @deeplizard Does the slipperiness affect the performance of the agent?

  • @amee9442  1 year ago

    How is the grid of the game displayed? There is no command for it in the code, right?

  • @ilfi0re  5 years ago

    In your q_table's first line, [0.57804676 0.51767675 0.50499139 0.47330103], argmax takes the first element, 0.578..., so is that equivalent to always going left from the start point? Must the agent remain at the start point? On the other hand, it seems there are some strange (wrong) action results, for example: take.ms/EnAyhU. Am I doing something wrong, or is it a code error inside OpenAI Gym?

  • @faezeaghamirzaei5947  4 years ago

    Thank you, it's great.

  • @dulminarenuka8819  6 years ago +1

    Awesome, thanks a lot :-)

  • @Nandu369  5 years ago

    I am facing an issue with the above code implementation. When I'm testing the game with the trained Q-learning agent, none of my 3 episodes are winning. Is that normal, or is it a mistake in my code? The avg reward after 10000 episodes is 0.68.

    • @deeplizard  5 years ago

      Hm... With an average reward of 0.68, your agent should be winning 68% of the time. Try running 10 episodes as opposed to 3, and see how many (if any) episodes your agent wins. If it's still not winning at all, I would assume there is an error or misconfiguration with your code.
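      A quick way to check that is to run many greedy episodes and measure the win rate (a sketch, old gym API assumed; the Q-table file name is hypothetical):

      import gym
      import numpy as np

      env = gym.make("FrozenLake-v0")
      q_table = np.load("q_table.npy")   # hypothetical: the trained table

      wins = 0
      num_eval_episodes = 1000
      for _ in range(num_eval_episodes):
          state = env.reset()
          done = False
          while not done:
              state, reward, done, info = env.step(int(np.argmax(q_table[state])))
          wins += int(reward == 1)       # FrozenLake only gives reward 1 for reaching the goal

      print(f"Win rate: {wins / num_eval_episodes:.2%}")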

  • @samarthagarwal7219  3 years ago

    If the reward comes out greater than 1, what could have gone wrong?

  • @FirstNameLastName-fv4eu  4 years ago +1

    After I subscribed to your channel, I felt so small!!! So tiny!!! So little. Who are you guys??? Do you have any certifications where I pay money and get a part of your brain?

    • @deeplizard  4 years ago +2

      😂😂😂 Our names are Chris and Mandy. We share more about ourselves on our vlog channel and Instagram :)
      ruclips.net/channel/UC9cBIteC3u7Ee6bzeOcl_Og
      instagram.com/deeplizard
      No certifications available as of yet! 🧠

  • @Vanilla102  4 years ago

    pretty dope

  • @jastremblay3299  4 years ago

    What are you using to code in? I tried PyCharm and it can't find the stuff you imported.

    • @deeplizard  4 years ago

      I was using Jupyter Notebook.

    • @NityaStriker  4 years ago

      Cody
      You could either download Anaconda Navigator and launch JupyterLab / Jupyter Notebook from there. Or you could create an account on IBM Skills Network Labs and open a Jupyter Notebook there. For this course, I’d prefer the first option but both are viable options.

  • @wishIKnewHowToLove  1 year ago

    :)

  • @andrecalhoun1020  5 years ago +1

    My agent just keeps going Left. FeelsBadMan..

    • @maximind5677  5 years ago

      @Kai Rycroft If you've uploaded the code on github, please give me the link as I'm facing some problems...

    • @maximind5677  5 years ago

      @Kai Rycroft File "", line 22
      state=new_state
      ^
      SyntaxError: invalid syntax
      This is the error I'm getting. I don't know what I'm doing wrong...

    • @wendersonj  4 years ago

      My agent does it too.
      I think that it's because of the two holes close to him, so he keeps hitting the wall to not fall.
      I tried to use a discount in the reward for every step, making it go to the right in some executions... but it insists on going left. (:P)
      Cheers