Check out the corresponding blog and other resources for this video at:
deeplizard.com/learn/video/ZaILVnqZFCg
Great explanation, ma'am.
Also, I have a question. In this video, it's noticeable that the agent took 3 different paths to the goal in its 3 episodes of play. Even though the q-table is not being updated while playing and the agent starts from the same state in every episode, what causes the randomness that makes it take a different path every time?
EDIT: I found the answer. It is the environment that introduces the randomness by being slippery. The randomness was completely removed when I initialized the environment with "env = gym.make("FrozenLake-v0", is_slippery=False)", and then all 3 episodes took the same path, which is the shortest path to the goal.
Thank you, I'm not deleting this comment since someone else may find this useful too :-)
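On the slipperiness: in the slippery version of FrozenLake, the action the agent picks is only a request; a perpendicular direction can be executed instead. A self-contained sketch of that mechanic (a simplification for illustration, not Gym's actual source; the 1/3 split matches the commonly documented FrozenLake behavior):

```python
import random

# Actions on the 4x4 FrozenLake grid: 0=Left, 1=Down, 2=Right, 3=Up
# Each action's two perpendicular neighbors, which slipping can produce.
PERPENDICULAR = {0: (1, 3), 1: (0, 2), 2: (1, 3), 3: (0, 2)}

def actual_action(intended, slippery, rng=random):
    """Return the action that actually gets executed.

    With slippery ice, the intended action and its two perpendicular
    neighbors each occur with probability 1/3; without it, the move
    is deterministic.
    """
    if not slippery:
        return intended
    return rng.choice([intended, *PERPENDICULAR[intended]])

# Without slipping, the same greedy policy always traces the same path,
# which is why all 3 episodes looked identical with is_slippery=False.
assert all(actual_action(2, slippery=False) == 2 for _ in range(100))
```

This is why play looks random even with a frozen q-table: the policy is deterministic, but the environment's transitions are not.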
Hi. Thank you for sharing this helpful guide with us!
Thanks. I had the same question.
I love how you include small snippets of talks at the end. Amazing!
I "think" this is one of the best (or the best) free video series about reinforcement learning!
Thanks for all of your work, and keep it going for the sake of us mortals!
Cheers from Brazil.
Most in-depth, informative, and yet underrated channel on Machine Learning. I hope it grows faster and keeps up the good work....👍
This is amazing, and you are amazing! So nicely made. I love your tutorials; you're just the best.
num_episodes = 25000
max_steps_per_episode = 200
learning_rate = 0.005
discount_rate = 0.99
exploration_rate = 1
max_exploration_rate = 1
min_exploration_rate = 0.0008
exploration_decay_rate = 0.001
These settings yielded a 75% win rate.
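For anyone tuning numbers like these: the exploration rate in this series decays exponentially per episode. A minimal sketch of that schedule using the values above (variable names follow the course code; the formula is the standard exponential-decay schedule the series uses):

```python
import math

# Hyperparameters from the comment above
min_exploration_rate = 0.0008
max_exploration_rate = 1
exploration_decay_rate = 0.001

def exploration_rate(episode):
    """Exponentially decay epsilon from max toward min over episodes."""
    return min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * \
        math.exp(-exploration_decay_rate * episode)

# Early episodes explore almost always; by episode 25000 the rate
# has decayed essentially to the minimum.
```

With exploration_decay_rate = 0.001, epsilon has dropped to roughly a third of its range by episode 1000, so most of the 25000 episodes are spent exploiting.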
These videos are awesome. I'm genuinely thankful to you for these videos.
I just found the best course on Reinforcement Learning!
We love you deeplizard, you're the best
I was so rooting for the agent to fall in the hole :D I guess I am on team White Walkers :P . Amazing tutorials. Thank you!
Concepts clearly explained. Best material on the web. Keep going!!!
I was analyzing the q-table, and it didn't make much sense to me. In some cases the highest value would be, say, "Right", which lands in a hole. But then I realized the reason: the algorithm has learned the probabilities of slipping. In other words, it learned that "the best we can do here is: try to go Right, and then slip in the correct direction".
That's actually a very good observation!
I don't think it would intentionally go into a hole. The chance of slipping is just 10%.
I really like that series! It's simple, and you quickly can code something and see how it works. Good job :)
The red marker showing the agent's current position is not appearing. Can you help?
nice, it was very fun to code it with you
So, the print on screen (e.g., left, right) is the way the agent wants to move, but the slippery ice could make it move somewhere else?
Correct!
I think, judging from how the AI played, that when you go up, you most likely go up, but you might also go left or right; I'm not sure about down. I think the opposite direction has zero chance, based on the example we've seen, of course.
I love the intro song
With a neural network we can approximate the Q-table inside the network; this comes in handy when the state space is too large or its size is not known in advance. However, the key to solving sparse rewards and large state spaces is a better exploration algorithm. I will talk about value networks and policy networks some other time. Good effort, though.
Exactly, Aamir! This is the direction we'll be going in when we learn deep reinforcement learning.
At some point, it would be good to cover topics like proximal policy optimization, from OpenAI and multi-agent RL.
Do I understand it correctly that without the slippery ice this would be a deterministic process (unless there is a tie of actions yielding the maximal q-value), i.e. all three episodes would play out with exactly the same order of steps?
Correct!
Thanks for your video. I would like to ask why the agent could not win 100% of the time. If the agent chooses the action with the highest value in the Q-table, it should follow the same path in each episode and always win. Would you mind giving me some advice? Thanks for all your time.
I haven't started the series yet... but one question arises after watching this video: once the model is trained for whichever game we work on, can we store the model's weights and launch a human-vs-computer game? Do you teach how to do that in your videos?
Just a suggestion: in the code you could also include the condition where the number of steps exceeds the threshold but the episode ends neither in a hole nor at the goal.
I love your videos. It's the best series for this topic. However I have a question, my agent only takes 2 actions up and down and it never changes the action, it's always the same. Any suggestion for this? Once again, thanks for uploading these videos :)
This environment is "SFFF, FHFH, . . . "
Any idea how I can change it? (The grid)
What would happen if I changed the rewards or defined another environment for this problem? For example, if the state is a Hole the reward is -10, for the Goal it is -1, and for a Frozen tile it is -2?
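A custom reward scheme like that can be sketched as a thin remapping of what the environment returns per step. The tile letters and penalty values below are just the ones proposed above (a hypothetical shaping, not part of the course code):

```python
# Hypothetical reward shaping: remap FrozenLake's default rewards
# (0 per step, 1 at the goal) onto the scheme proposed above.
CUSTOM_REWARDS = {"H": -10, "G": -1, "F": -2, "S": -2}

def shaped_reward(tile):
    """Map the tile the agent lands on ('S','F','H','G') to a reward."""
    return CUSTOM_REWARDS[tile]

def episode_return(tiles, discount=0.99):
    """Discounted return for a sequence of visited tiles."""
    return sum(discount**t * shaped_reward(tile)
               for t, tile in enumerate(tiles))
```

One consequence of this particular scheme: since every tile, including the goal, is negative, the agent is pushed toward the shortest episode that avoids holes, because a longer path simply accumulates more penalty.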
2024: I had to install the latest version of pygame, and I had to use render mode "human". That opened up the more animated display.
I am getting the error "No available video device". What should I do?
Hi, thanks for your great explanations. I have a question: when I run the code, I receive a message that "done==True" is an unexpected indent. Why?
Keep 'em coming :)
Great video, thank you!
What is the "scope" of the agent? Does it know what the surrounding blocks are? The state gives the current location only, right?
No, it does not know the environment. The environment (represented by a set of states) evolves probabilistically. The agent becomes more aware of the environment as it explores it more, but all of this in a probabilistic sense.
Hi, please HELP...
This was great...
I almost completed it, but I can't see my agent play. I can only see "episode 1", then nothing appears on screen, and after some time, "you reached the goal". I can't see my agent play or the game grid.
Please help me out.
I checked the code so many times; it's exactly the same.
I am having the same issue. Since you posted this comment a month ago, have you resolved it? Any help is appreciated.
The render function doesn't show any error message for me; the code works fine and the agent does great, but I can't see the actions of the agent or the state of the environment. I'm using Jupyter on Firefox.
Same with me
@@siddhantagarwal4941 Have you guys resolved this issue? I'm having the same problem.
I have the same problem.
I did exactly the same thing, but somehow my game is not rendering. I can only see the episode number and the "you reached the goal" text.
Help please.
Also, I use version 1 of FrozenLake.
Great stuff! When will you upload the deep reinforcement learning videos? Thanks again.
Thanks, joey! I'm aiming to release a new video to this series every few days! The next video starts deep reinforcement learning.
Super kool. I am writing a book with O'Reilly on AI in which one of the chapters is on RL & deep RL. I am a practicing data scientist and work on ML projects for clients like Mercedes, Nissan, JCPenney, etc. I find your tutorials extremely useful and incredibly engaging as well. Looking forward to many more. Also, would it be alright if I used some of your content for my chapter in the book? Thanks in advance :) Cheers
I'm glad you're finding value in the tutorials! Also, thanks for sharing your background. That all sounds great!
At this time, we're not authorizing any reproduction of deeplizard content, but I wish you all the best with your book!
you are the best, thanks a lot!
How can this algorithm be applied to environments that change? For example, when you reach food you would eat it, thereby removing it from the environment and changing it.
Can you also explain A2C and A3C algorithms ?
FROZEN LAKE: How can the agent move to the left of the starting point? And if it can't, why does it have a large q-value for the left action in the q-table? Thanks for replying.
Note that the game is non-deterministic: although the agent can choose to move left from the starting state, the slippery ice may make it slip into a state other than the one it chose to move to. This explains why the Q-value associated with such an action can be non-zero.
@@deeplizard Does the slipperiness affect the performance of the agent?
How is the grid of the game displayed? There is no command for it in the code, right?
Yes, there's no explicit command to print the grid.
In the first line of your q_table, [0.57804676 0.51767675 0.50499139 0.47330103], argmax takes the first element, 0.578..., which is equivalent to always going left from the start point. Must the agent remain at the start point? On the other hand, there seem to be strange (wrong) action results, for example: take.ms/EnAyhU. Am I doing something wrong, or is it a code error inside OpenAI Gym?
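On the argmax question: yes, the greedy action for that row is index 0 (Left), but because of the slippery ice, choosing Left from the start square does not pin the agent there; it may slip perpendicular to the chosen direction instead. A quick check of the greedy pick, using the row quoted above (pure Python, no NumPy needed):

```python
# The q_table row for the start state, as quoted in the comment above
q_row = [0.57804676, 0.51767675, 0.50499139, 0.47330103]
actions = ["Left", "Down", "Right", "Up"]

# Greedy action = index of the largest Q-value in the row
greedy = max(range(len(q_row)), key=q_row.__getitem__)
print(actions[greedy])  # -> Left
```

So the agent does always *request* Left from the start state, but slipping moves it off the square anyway, which is how episodes still progress toward the goal.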
thank you its great
awesome thanx a lot :-)
I am facing an issue with the above code implementation. When I'm testing the game with the trained Q-learning agent, none of my 3 episodes are winning. Is that normal, or is there a mistake in my code? The average reward after 10000 episodes is 0.68.
Hm... With an average reward of 0.68, your agent should be winning 68% of the time. Try running 10 episodes as opposed to 3, and see how many (if any) episodes your agent wins. If it's still not winning at all, I would assume there is an error or misconfiguration with your code.
If the reward is coming out greater than 1, what could have gone wrong?
After I subscribed to your channel, I felt so small!!! So tiny!!! So little. Who are you guys ??? you have any certifications where I pay money and get a part of your brain?
😂😂😂 Our names are Chris and Mandy. We share more about ourselves on our vlog channel and Instagram :)
ruclips.net/channel/UC9cBIteC3u7Ee6bzeOcl_Og
instagram.com/deeplizard
No certifications available as of yet! 🧠
pretty dope
What are you using to code in? I tried PyCharm, and it can't find the stuff you imported.
I was using Jupyter Notebook.
Cody
You could either download Anaconda Navigator and launch JupyterLab / Jupyter Notebook from there. Or you could create an account on IBM Skills Network Labs and open a Jupyter Notebook there. For this course, I’d prefer the first option but both are viable options.
:)
My agent just keeps going Left. FeelsBadMan..
@Kai Rycroft If you've uploaded the code to GitHub, please give me the link, as I'm facing some problems...
@Kai Rycroft File "", line 22
state=new_state
^
SyntaxError: invalid syntax
This is the error I'm getting. I don't know what I'm doing wrong...
My agent does it too.
I think it's because of the two holes close to it, so it keeps hitting the wall to avoid falling.
I tried using a discounted reward for every step, which made it go right in some runs... but it insists on going left. (:P)
Cheers