This content is sponsored by my Udemy courses. Level up your skills by learning to turn papers into code. See the links in the description.
Time stamps for all the modules:
Intro 00:00:00
Intro to Deep Q Learning 00:01:30
How to Code Deep Q Learning in Tensorflow 00:08:56
Deep Q Learning with Pytorch Part 1: The Q Network 00:52:03
Deep Q Learning with Pytorch part 2: Coding the Agent 01:06:21
Deep Q Learning with Pytorch part 3: Coding the main loop 01:28:54
Intro to Policy Gradients 01:46:39
How to Beat Lunar Lander with Policy Gradients 01:55:01
How to Beat Space Invaders with Policy Gradients 02:21:32
How to Create Your Own Reinforcement Learning Environment Part 1 02:34:41
How to Create Your Own Reinforcement Learning Environment Part 2 02:55:39
Fundamentals of Reinforcement Learning 03:08:20
Markov Decision Processes 03:17:09
The Explore Exploit Dilemma 03:23:02
Reinforcement Learning in the Open AI Gym: SARSA 03:29:19
Reinforcement Learning in the Open AI Gym: Double Q Learning 03:39:56
Conclusion 03:54:07
Anyone else wake up to this 😅?
Yeah that was a wild jump from what I fell asleep to
Yes😂😂😂😂
I put on someone making leather shoes, didn't expect programming.
Me too
Went to sleep to pokemon, woke up to this 💀
Hi newcomer! Don't be scared off by the 4-hour-long video! It's really just several lessons concatenated, the first one containing a whole program in 52 minutes!
You also have Phil's GitHub in the video description if you prefer to study the code and only go to the video when you're having a hard time figuring something out.
Thank you Phil for such substantial content!
This is the best lecture on RL ever! Thank you so much!
Hey Phil, I want to thank you for sharing such good content for free. I have one question for you: are you planning to do a series on imitation learning techniques for continuous action and state spaces? An overview of how to achieve this task would also be great.
I hadn't planned on it, but I can add it to the list.
@@MachineLearningwithPhil Thanks! That would be great of you.
Hello Phil,
I appreciate your great videos.
I'm planning to develop an AI game bot for Dota 2 based on the method DeepMind used in their StarCraft bot, but I still have no idea how to start or what the components are. Could you please help me with that?
Hey Phil, do you have a video on how to set up your virtual environment for these tutorials? Conda/pip/gym/PyTorch/TensorFlow packages, etc., plus a linter and IntelliSense in Visual Studio Code? Thanks
I don't, sorry. I run Linux, which puts me in the minority, I think.
Thanks for the tutorials. They really helped. I saw this tutorial on YouTube and went on to get your intro to RL course at O'Reilly. I am really enjoying the course, especially how you create simple quizzes on the study material, which make it much easier to understand the subject.
I have a question regarding the "maze running robot" topic.
The actionSpace = {'U': (-1,0), 'D': (1,0), 'L': (0,-1), 'R': (0,1)}
and the maze is of size (6,6) in (x,y) coordinates. The "state" variable holds (x,y) coordinates. However, when we define the function isAllowedMove(self, state, action), we unpack y, x = state (essentially reversing x and y). I am not able to understand why I would need to invert maze[x,y] to maze[y,x]?
Regards,
Anirban
Great question. Sorry for the delayed reply, I was out of town and this comment escaped me.
It's because the x coordinate represents the columns and the y coordinate represents the row. In a right handed coordinate system, x is on the horizontal axis, and y is on the vertical axis. Hence, x is the column and y is the row. Since the indexing of numpy is row, column, we have to switch the two indices. I really should have used i and j instead of x and y to avoid confusion.
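If it helps to see it concretely, here's a tiny numpy sketch (toy values, not the actual course code):

import numpy as np

maze = np.zeros((6, 6))   # shape is (rows, columns)
x, y = 4, 2               # x is the column, y is the row
maze[y, x] = 1            # numpy indexes as [row, column], so y goes first
print(maze[2, 4])         # 1.0 -- same cell, addressed as [row, column]

So whenever a state is stored as (x, y), you flip the order when using it as a numpy index.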
This is a great vid Phil, thank you! BTW, at 2:44 I know it's just an example, but are those ballpark salaries accurate? Amazon for $350,000?!?
Hah! Nope, I just pulled them out of thin air. Glassdoor indicates starting compensation of around $170,000 with stocks included.
Thank you so much!
Wow awesome Phil. I'll take a look someday XD
I rewatched this video and didn't really understand why we need 2 NNs at 5:55, or what "eliminating bias in the estimates of the actions" means 🤔
Good questions! We need 2 neural networks because if we use 1 we are effectively chasing a moving target. We use the same network to learn the value of states as well as to choose the actions. The bias comes in because we are taking a max over actions, which implicitly biases the estimates.
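If it helps, here's a minimal PyTorch-style sketch of the target-network idea (build_q_network and the batch variables are just placeholders, not the exact code from the video):

import torch
import torch.nn.functional as F

q_eval = build_q_network()     # placeholder constructor: the network we train every step
q_target = build_q_network()   # same architecture, used only to compute bootstrap targets
q_target.load_state_dict(q_eval.state_dict())

# one learning step, assuming states, actions, rewards, next_states, dones are batched tensors
q_next = q_target(next_states).max(dim=1)[0]                     # bootstrap from the frozen copy
target = rewards + gamma * q_next * (1 - dones)
q_pred = q_eval(states).gather(1, actions.unsqueeze(1)).squeeze(1)
loss = F.mse_loss(q_pred, target.detach())

# every C steps, copy the online weights into the target network so the target moves slowly
q_target.load_state_dict(q_eval.state_dict())

That covers the "moving target" part. The max over actions is where the overestimation bias comes from, which is what double Q-learning (covered later in this video) addresses.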
Thank you sooo much 👍
Could you tell me your development environment? I use Win10 & Python 3.7 (Anaconda) but I cannot install all the gym environments... [cry]
I'm running Ubuntu 18.04 and Python 3.6.7. Which environments are giving you issues?
@@MachineLearningwithPhil gym does not support Python 3.7 very well...
@@liangyumin9405 You can do conda create -n NewEnvironment python=3.6
Then activate the environment and try installing gym to see if it works.
@@MachineLearningwithPhil Using a Python 3.6 virtual env may be a good idea~ Thank you
Hello, did you look into my problem?
I haven't forgotten you :) I'm working on it now. I finished up another project with a DQN and made some improvements that may benefit you. I'll do a new video on that this weekend, and will initiate a pull request on your repo if I get it working.
OK, I've gotten it to run on my local machine with some improvements. I've forked the repo and sent in a pull request with some suggestions on how to push the project forward. Let me know what you think!
@@MachineLearningwithPhil Hello man, it's really exciting that you looked into my project. As I write this, I'm training it myself. Your answer was very long, so I'll probably have to read it several times while trying different things. Is there some way to chat with you directly, for example on Discord? That would make things easier, if you're not busy.
I'll try a smaller action space; I've already tried it with 32x32, since the environment itself is somewhat generic.
I might also try totally rebuilding the environment and other ways of approaching my bigger problem.
If you've heard of tf.agents, they offer a policy gradient agent. I've tried that on my environment, but also didn't get a very good result :D
To be honest, I don't fully understand policy gradients yet either.
Sorry for my messy structure; I'm not very experienced with programming in a team or with git/GitHub.
I'd suggest making a Discord server; combined with Twitter, your website, and YouTube, you could probably reach more people there, and it takes like 2 minutes to set up.
It's somewhat unpleasant to write in the YouTube comment section :P
If you add me on Discord under the tag "Gotti#0140", I could probably communicate with you better.
Thank you for looking into my project and for the pull request.
I can set up a discord server, no problem. I'll get to that later this weekend. We can collab on the project and maybe something cool will come of it. Thanks!
@@MachineLearningwithPhil nice 😅
Great course, though TF is now TF2...
Actor critic in tf2 dropping today.
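For anyone updating the TF1 code from the video in the meantime: TF2 drops sessions and placeholders in favor of eager execution and GradientTape. A rough sketch of a Q-network training step in the TF2 style (illustrative only; states, actions, targets, and n_actions are assumed to already exist):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(n_actions)
])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

with tf.GradientTape() as tape:
    q_values = model(states)                                          # shape: [batch, n_actions]
    q_taken = tf.reduce_sum(q_values * tf.one_hot(actions, n_actions), axis=1)
    loss = tf.reduce_mean(tf.square(targets - q_taken))
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))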