Deep Q-Network Training Code - Reinforcement Learning Code Project

  • Published: 22 Aug 2024

Comments • 206

  • @deeplizard
    @deeplizard  5 years ago +5

    👀 Come say hey to us on OUR VLOG:
    🔗 ruclips.net/channel/UC9cBIteC3u7Ee6bzeOcl_Og
    👉 Check out the blog post and other resources for this video:
    🔗 deeplizard.com/learn/video/ewRw996uevM

    • @marsmars1992
      @marsmars1992 5 years ago

      I still cannot find what I asked for in the vlog

    • @deeplizard
      @deeplizard  5 years ago

      What did you ask for?

    • @marsmars1992
      @marsmars1992 5 years ago

      @@deeplizard videos on double DQN, duel DQN and prioritized experience replay.

    • @deeplizard
      @deeplizard  5 years ago +3

      I see. The vlog is a more casual channel where we talk more personally. Deep learning content is published on this channel, not the vlog. The topics you requested may be added to this series in the future.

    • @AmanPandey-dr3tz
      @AmanPandey-dr3tz 3 years ago +5

      @@marsmars1992 I don't get why you're so demanding, just be a bit more polite? We're getting this incredible content for free, at least be a bit grateful.

  • @shivbhatia2784
    @shivbhatia2784 5 years ago +59

    You're my favourite youtuber, I was so sad when this series stopped, please keep making these amazing videos!

  • @haahee8420
    @haahee8420 4 years ago +13

    This is easily the best source to learn DQL I've seen, really well done.

  • @pablovela2053
    @pablovela2053 5 years ago +22

    Don't worry robotic voice at the end. I hear ya, and I do quite enjoy the little insights you bring

  • @MShahbazKharal
    @MShahbazKharal 4 years ago +4

    You are amazing, I don't understand why thousands of people are watching the videos but aren't giving thumbs up. Your videos are so awesome and understandable that I learned reinforcement learning and deep reinforcement learning in one night. Thanks again.

    • @deeplizard
      @deeplizard  4 years ago

      Thank you, Muhammad! Glad to hear how much you're enjoying the content 😊

    • @davidak_de
      @davidak_de a month ago

      Not everyone has an account.

  • @davorm1430
    @davorm1430 2 years ago +1

    A few days ago I found the most brilliant diamond in a pile of lots of rubble, gems, other diamonds, and also some junk. There were some other really nice things in the pile... but this one was the brightest, shiniest, and best of them all. Then I clicked the subscribe button on it and put this comment on one of its sides.

    • @deeplizard
      @deeplizard  2 years ago

      Thanks so much Davor! Happy to hear that 😊

  • @lingfromhongkong
    @lingfromhongkong 2 years ago +1

    what an amazing series that I have just stumbled across. Will be purchasing the code as a show of support!

  • @TheJysN
    @TheJysN 4 years ago +5

    Watched through the whole series so far to help me with an University assignment. Very useful thank you! Would be great to see you continue the series.

  • @radcyrus
    @radcyrus 5 years ago +6

    This was simply the best code explanation that I have ever seen, thank you!

  • @girlspider2303
    @girlspider2303 2 years ago

    I watched every episode in this Reinforcement Learning series. Incredible job! You can explain complex ideas in such vivid, concise, and easy-to-follow ways. I have some PyTorch background, but nothing about OpenAI and Gym. I found that once I understood the theory you explained, the episodes on actual coding were just a breeze, almost no explanation needed. To know that this great tutorial is created by a fellow female programmer, I am doubly happy and proud. I will look into how to support your work any way I can.

    • @deeplizard
      @deeplizard  2 years ago

      So happy to hear all of this, thanks so much! ♀️💪

  • @michaelwjoyner
    @michaelwjoyner 5 years ago +8

    DL- This rocks!!!!! Everything that you have done in the series is absolutely amazing. I can't wait until the next video

  • @ricma9710
    @ricma9710 3 years ago +1

    First time learning this for my dissertation and have to say thank you for this series, they were good!

  • @housseynenadour2233
    @housseynenadour2233 3 years ago +3

    I finished this project, so happy, thank you!
    btw gamma = 0.999 drove me crazy; gamma = 0.99 worked fine. Astonishing that the tiny difference of 0.009 has an influence on stability.
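    A rough intuition for why such a tiny change matters: the discount factor sets an effective planning horizon of roughly 1/(1 - gamma), so gamma = 0.99 weights rewards over about the next 1/(1 - 0.99) = 100 timesteps, while gamma = 0.999 stretches that to about 1000 timesteps, which makes the bootstrapped targets larger and noisier and can hurt stability.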

  • @mikymuky1171
    @mikymuky1171 3 years ago +4

    This was an insane series!
    I have too much to express. All in all, thank you for the great knowledge. I could not be more excited to apply my knowledge to future projects.
    (I already have a few in mind)
    P.S. this was a fun binge till late at night. Completely worth it :)

  • @andread4721
    @andread4721 2 years ago +2

    This series is great! Thank you! What a pity that the "future episodes" haven't come.

  • @neogarciagarcia443
    @neogarciagarcia443 4 years ago +9

    Hi, congrats on your great job. I have some comments:
    - I would really love to see this series continue; it's been a while since your last video.
    - I think the main improvement could be to use a CNN instead of fully connected layers, which are not really the best idea for images.
    - Will you go on with more advanced reinforcement learning algorithms like actor-critic? It would also be very interesting to see other applications apart from games.

  • @neogarciagarcia443
    @neogarciagarcia443 4 years ago +9

    I liked the Mandela quote: "I never lose, I either win or learn"

  • @nitinmalhotra2560
    @nitinmalhotra2560 3 years ago +1

    I really liked that cute AI robot voice at the end, and the robot voice never gets depressed.

  • @DEEPAKSV99
    @DEEPAKSV99 4 years ago +1

    I have finally completed your series on Machine Learning, Keras & RL.
    The quality of your content is the best I have ever seen.
    But what amazes me even more than your content is your prompt response to all the comments, even 2 years after uploading the videos, and the continuous maintenance of your blog.
    Even though in your view I am just 1 among the million viewers of your channel, there isn't a single comment of mine that you haven't responded to. I really don't have words to describe the amount of dedication you are putting into all of this work.
    Once again, thank you so much for the content; it was a wonderful journey through your series.
    I am from one of the top 10 premier institutions of India, and honestly, the learning I got from this channel is the greatest I've had in such a short period.
    I really hope your work has a large positive impact on society/the world, and all the very best for your future plans. . .

    • @deeplizard
      @deeplizard  4 years ago +2

      Thank you very much, Deepak! Was happy to interact with you and your thoughtful comments as you went through the content :)

  • @fabiStgt
    @fabiStgt 4 years ago +4

    First of all: congratulations on this series! - great videos, great explanations!
    About the hyperparameter tuning: I just gave a little parameter sweep a go without changing the DQN structure, pure hyperparameter tuning. From what I can tell right now, it seems the net needs to do more learning from the experience memory. The easiest way to accomplish this is to increase the batch size, but I guess one could also simply keep the batch size as is and learn from more than one batch per iteration (see the sketch after the parameter list below).
    Anyway, here is an example hyperparameter set which gave me a 109.15 moving avg after 1000 episodes and a 147.92 moving avg after 2000 episodes:
    batch_size=2048
    gamma=0.999
    eps_start=1.0
    eps_end=0.01
    eps_decay=0.0007
    target_update=10
    memory_size=500000
    lr=0.0005
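    And for anyone who wants to try the "learn from more than one batch per iteration" idea instead of a huge batch, here is a minimal sketch of what that could look like inside the training loop from the series. updates_per_step is a made-up knob, and memory, extract_tensors, QValues, policy_net, target_net and optimizer are assumed to be defined exactly as in the videos:

    for _ in range(updates_per_step):  # e.g. 4 gradient updates per environment timestep
        if memory.can_provide_sample(batch_size):
            experiences = memory.sample(batch_size)
            states, actions, rewards, next_states = extract_tensors(experiences)
            current_q_values = QValues.get_current(policy_net, states, actions)
            next_q_values = QValues.get_next(target_net, next_states)
            target_q_values = (next_q_values * gamma) + rewards
            loss = F.mse_loss(current_q_values, target_q_values.unsqueeze(1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()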

    • @DEEPAKSV99
      @DEEPAKSV99 4 years ago +1

      Hi, thanks for sharing the values.
      I have a few doubts regarding them:
      - Did you train the policy net with the batch size of 2048 once in every timestep as shown in the video, or once in every episode?
      - How long did the 2000 episodes that you mentioned take to complete?
      - What GPU did you use, for performing the training?
      Also, if you still have the source code of this, it would be highly helpful if you could share it.

    • @fabiStgt
      @fabiStgt 4 years ago +3

      The code should be pretty close to what was shown in the video series, but I don't remember the details. However, I did find the code (hope this is the right file) I used back then, so feel free to check for yourself:
      gist.github.com/gooofy/c348699ac3762f4f19f143456a7a4286
      I have no records of training times. The GPU I used was a GeForce GTX 1080 Ti.

    • @DEEPAKSV99
      @DEEPAKSV99 4 years ago

      @@fabiStgt Thank you so much. Your code is working perfectly in my PC (with an RTX 2070) & has saved me a lot of time. . .

    • @fabiStgt
      @fabiStgt 4 years ago +1

      @@DEEPAKSV99 cool - thanks for the positive feedback, happy training! :)

    • @workhard7044
      @workhard7044 4 years ago +1

      Hi, did you find the best hyperparameters for this solution?

  • @lw833
    @lw833 3 years ago +1

    BEST DQN tutorial !!!

  • @neogarciagarcia443
    @neogarciagarcia443 4 years ago +2

    Hi DeepLizard, don't get depressed about no one listening to your existential insights; I really enjoy them and they make me think too. What kind of future are we facing with A.I. disrupting more and more sectors? Is it realistic that we will achieve the "Singularity" (machine intelligence overtaking human intelligence)? How far are we? What then?

    • @deeplizard
      @deeplizard  4 years ago +1

      Love these questions ❤️🤖

  • @Kazshmir
    @Kazshmir 5 years ago +4

    Thank you so much for doing these tutorials! You have a great style, and it's cool to see how your content just keeps getting better and better.

  • @JayanSarkar
    @JayanSarkar 3 years ago +1

    Getting such quality content for free.. Thank you very much.

  • @shirishhirekodi6913
    @shirishhirekodi6913 3 years ago +1

    Thanks for this series. I don't profess to have got it all, but I learnt a lot. Will be starting small, doing some small RL projects on a Raspberry Pi, then moving on to more complex problems. Will be coming back to this as a good handy-dandy resource.

  • @user-tj4ut8ox9r
    @user-tj4ut8ox9r 4 years ago +1

    Ahh, man. I have had so much trouble with this series... But I have done it! (with some minor tweaks). Thank you for releasing this content. The problems I had coding this thing only made me dig deeper and get more low-level insight.

  • @junhuangho6937
    @junhuangho6937 5 years ago +5

    love this series, waiting for the next 'episode' 😂😍
    oh and im hearing you at the end of each episode! reminds me of Ulka from Socratica

  • @ensabinha
    @ensabinha 3 years ago +2

    Congrats on the good work. Looking forward to seeing more videos about RL and DQN.

  • @MagixBox00
    @MagixBox00 5 years ago +2

    Great explanation, now I have a better understanding of DQN and tell that robot I'm very thankful for the insights.

  • @LS-df5fe
    @LS-df5fe 3 years ago

    First, thanks for ALL your course/classes. They are great.
    I have figured out my issue.

  • @impossibletalent9568
    @impossibletalent9568 4 years ago +1

    Thank you so much, I have been waiting two years for a reinforcement learning video.

  • @vuk6468
    @vuk6468 5 years ago +2

    Thank you very much for creating all this content about deep learning. It's really difficult to find RL code tutorials currently, especially ones as good as this. Btw the DL fundamentals series was pretty useful as well.

  • @mohamedhamed6655
    @mohamedhamed6655 5 years ago +2

    Great content, clearly explained. The only thing I don't like is that circular arrow (it's distracting, and sometimes it's not clear where it points); it would be much better if you went back to the regular arrow.

    • @deeplizard
      @deeplizard  5 years ago

      Thanks for the feedback, Mohamed!

  • @denismerigold486
    @denismerigold486 5 years ago +3

    Hello! Thank you for recommending the fastai course. Part 2 has a good explanation of how the main parts of the fastai library are built, but it's a pity that PyTorch itself is not explained much from the inside. Thank you. I hope you do something like this purely about PyTorch. And thanks for the second channel; it was interesting to see what you are like in real life. All the best.

    • @deeplizard
      @deeplizard  5 years ago +2

      You're welcome, Denis. Thanks for the feedback. Glad to see you're enjoying the second channel as well :)

  • @mauritzandreae4278
    @mauritzandreae4278 5 years ago +2

    Congratulations on hitting the 40k

    • @deeplizard
      @deeplizard  5 years ago

      Woohoo! 🎉
      Thank you Mauritz! ❤️

  • @fuma9532
    @fuma9532 4 years ago +6

    Hi: first of all, thank you for this amazing series. It's been quite a journey, but I'm glad I made it to the end. The only problem is, the code still has a bug: after implementing Lorenz Kapral's code, turning the Agent class's random actions into tensors (required to avoid the AttributeError: 'int' object has no attribute 'item'), a new one popped up:
    *'int' object has no attribute 'dim'*
    Being this close to a functioning program but not knowing how to solve this bug is so frustrating! I've gotten so much out of these lessons, but I know not solving this will haunt my dreams. Can anyone please help?
    PS: I love your logo and outro music, great job! I'll definitely check out your other series on Deep Learning next!

    • @kanavsingla7957
      @kanavsingla7957 4 years ago +5

      Hey, so I was working on it and I figured it out. It seems like video 16 has the wrong code in it for the agent's select_action function. Use this and it should start training:
      class Agent():
          def __init__(self, strategy, num_actions, device):
              self.current_step = 0
              self.strategy = strategy
              self.num_actions = num_actions
              self.device = device

          def select_action(self, state, policy_net):
              rate = self.strategy.get_exploration_rate(self.current_step)
              self.current_step += 1
              if rate > random.random():
                  action = random.randrange(self.num_actions)
                  return torch.tensor([action]).to(self.device)  # explore
              else:
                  with torch.no_grad():
                      return policy_net(state).argmax(dim=1).to(self.device)  # exploit

    • @fuma9532
      @fuma9532 4 years ago

      @@kanavsingla7957 Thanks for the input man, but the code still doesn't work, same error as before :(
      I've tried upgrading my pytorch-geometric and PyPI packages hoping it was just a version error, but that didn't fix it either.

  • @daniekpo
    @daniekpo 4 years ago +1

    And yes, I love the insights from the robot at the end!

  • @jakevikoren
    @jakevikoren 4 years ago

    100% yes on the existential insights! Looking forward to whatever content you choose to put out next. Much love

  • @fz8227
    @fz8227 3 years ago +1

    Great content!

  • @dmitriys4279
    @dmitriys4279 5 years ago +3

    I wish you would create some tutorials about RL with board or card games like chess, checkers, poker, Connect Six, and so on. We would be able to compete with you and with other community members.

  • @TheHunnycool
    @TheHunnycool 5 years ago +1

    Best Q-networks video on the internet. But I didn't understand why you take the current states from a random batch and then find the max Q-values from their next states in the batch, even though you had the current state after taking the action and recording the experience.

    • @deeplizard
      @deeplizard  5 years ago

      Thank you, Himanshu! Take a look at the Bellman equation that we use in our calculation of the loss. It requires the max Q-value from the next state. That is why we need it.
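      In symbols, the target in that loss comes from the Bellman optimality equation, q*(s, a) = E[ R + gamma * max over a' of q*(s', a') ]. So for each sampled experience (s, a, r, s'), the policy net's Q(s, a) is compared against r + gamma * (max over a' of the target net's Q(s', a')), and that max over the next state's Q-values is exactly why the next states from the batch are needed.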

  • @aravindk4967
    @aravindk4967 4 years ago

    Ahh I hope you continue this series! I'd like to hear the robot in the end again too

  • @marsmars1992
    @marsmars1992 5 years ago +2

    Please make more videos about the improvements to DQN like double DQN, dueling DQN and prioritized experience replay, because I don't fully understand them. And you could use them in the code. Thanks in advance.

  • @yusufmoallim4667
    @yusufmoallim4667 2 years ago

    To everyone who contributed to making this amazing playlist, thank you very much. You are great people. Well done. However,
    THERE IS A SERIOUS PROBLEM WITH YOUR CODE.
    memory.push(Experience(state, action, next_state, reward))
    states, actions, rewards, next_states = extract_tensors(experiences)
    Reward and next_state are swapped.
    Also, I think calculating the gradient after each time step is not efficient.

    • @deeplizard
      @deeplizard  2 years ago

      Thank you Yusuf! I've just checked the issue you raised. If you check the definition of extract_tensors(), you will notice that it extracts the states, actions, rewards, and next_states from the given Experiences based on attribute name. So it's fine that Experiences are passed in with the order (S,A,N,R) while extract_tensors() returns them as (S,A,R,N), since the function refers to and extracts each attribute by name and not by the order in which it is stored in the original Experience namedtuple.
      Regarding the gradient update, it is occurring at each timestep since the network is passed a new batch at each step. In other words, we are using mini-batch gradient descent, which is typical. What frequency for gradient updates would you have in mind as being more efficient for this problem?
      Appreciate your feedback!
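      A tiny illustration of that by-name access, assuming the namedtuple definition used in the series:

      Experience = namedtuple('Experience', ('state', 'action', 'next_state', 'reward'))
      e = Experience(state='s', action='a', next_state='n', reward='r')
      print(e.reward, e.next_state)   # r n  -- each field is looked up by name, not by position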

    • @yusufmoallim4667
      @yusufmoallim4667 2 years ago

      @@deeplizard
      Thank you again for your amazing explanatory comment. It is my bad I didn't notice it was extracted by name and not in order. Sorry for that.
      Regarding the gradient update, I meant that if the agent learns to play, the number of mini-batches per episode will be higher than the number of mini-batches per epoch compared to normal deep learning. And it will increase the better the agent gets. Won't that be inefficient? 😅
      Thank you.

  • @luckycatfinance
    @luckycatfinance 4 years ago +1

    Your channel is amazing, just stumbled into it. Really amazed!

  • @Mahesha999
    @Mahesha999 3 years ago

    Post credits:
    Robot voice: You still there even 3 min after video ended!!?? Great!! Congratulation, you just unlocked the secret insight!!!. The secret insight is: the narrator of this story was never able to tune hyperparameters, so she will never upload the next video in this series !!! Anyways thanks for joining!!! Bye for the last time!!!

  • @alanjohnstone8766
    @alanjohnstone8766 4 years ago

    I have the same comment as I made on your other cart-pole program. When you get to the end of an episode, the Q-value target is the reward only, not reward + estimated rewards for future states; i.e., you need to store done in your experiences and test it when working out the loss.
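    A rough sketch of that pattern, assuming a dones tensor (0 for non-terminal transitions, 1 for terminal ones) is stored in the replay memory and extracted along with the other fields:

    next_q_values = target_net(next_states).max(dim=1)[0].detach()
    target_q_values = rewards + gamma * next_q_values * (1 - dones)   # a terminal transition keeps only its reward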

  • @mobenamtois
    @mobenamtois 4 years ago +2

    Hey, I barely know anything but I like your vids. God bless, DeepLizard.

  • @supriyadevidutta
    @supriyadevidutta 4 years ago +1

    Beautifully explained, both this complex theory and the practical code.

  • @baqerghezi1342
    @baqerghezi1342 4 years ago +1

    I'm facing this problem:
    the agent's select_action method returns an int, but torch.cat does not work with integers...
    And I want to thank you for your wonderful course.

  • @houyahiya5729
    @houyahiya5729 2 years ago

    Thank you so much for this awesome explanation.. I did not find such an explanation in other resources.. I am very grateful to you.
    I have some questions and I would be very grateful if you answered them.
    1. Does the second step start with a new batch different from the previous one, i.e. is the old batch discarded?
    2. Does training start as soon as the replay memory is full, or as soon as the size of the replay memory equals the batch size?
    3. Does the training process take place while the experience replay process continues to store new experiences?
    Cordially

  • @nas-zu8xl
    @nas-zu8xl 3 years ago +1

    Hey, the videos are really concise, articulate and easy to understand, better than much paid content. A huge thanks. Couple of questions: will there be new videos added to the RL playlist? Or will teaching RL be continued elsewhere, in another playlist or something? Thanks for answering and all the free knowledge shared, GO COLLECTIVE INTELLIGENCE!

    • @deeplizard
      @deeplizard  3 years ago

      Great! We plan to create more RL content in the future. Not sure yet if it will continue in this course or be broken out into a separate one.

  • @EDeN99
    @EDeN99 4 years ago +1

    Nice work, but I have a question: why do we compare with 195, and how did we arrive at this value?

  • @lalithmovva1146
    @lalithmovva1146 4 years ago +2

    Your content is amazing. Love you 3000❤

  • @RadekMarko
    @RadekMarko 5 years ago

    Thank you very much for this example.
    I would need some help with my AI bot, where I do not have discrete actions in real life.
    Let's use an analogy to the example from the video. Let's assume that in our CartPole environment we have one of the following action spaces:
    Option 1
    1. Move left x% (where x is in the range from 0 - 100% as a 32-bit real, i.e. how fast or how hard we will move left)
    2. Move right x% (where x is in the range from 0 - 100% as a 32-bit real, i.e. how fast or how hard we will move right)
    3. Do not move (for consistency we can store a value as above, but it is meaningless)
    Option 2
    1. Move left by x% of capacity (x meets the following criteria: -1

  • @rampalgrihdhwajsingh5312
    @rampalgrihdhwajsingh5312 a year ago +1

    nice video

  • @machinelearningapplication6467
    @machinelearningapplication6467 2 years ago

    Great series, thank you for posting! One question though: when training the network, you calculate the MSE loss. Backpropagation requires the loss of each output node; however, with a DQN you only have the loss of the one output node corresponding to the action you took. So how do you backprop the DQN policy network if you only calculated the error for that one action?

  • @arturasdruteika2628
    @arturasdruteika2628 4 years ago +1

    Hi, this is the BEST channel that explains how ML works. BTW would this code run faster if you rendered the environment not all the time but like every 100 or more episodes?

    • @deeplizard
      @deeplizard  4 years ago +1

      Thanks, Arturas! We're required to render the environment at each timestep to get an updated image rendered to the screen.

  • @keyangke
    @keyangke 4 years ago

    look forward to next one, excellent videos!

  • @marcopozza5665
    @marcopozza5665 3 years ago +1

    Great videos and series!
    Hope to see the next episode soon, because I'm trying to tweak the parameters but the performance is still far from 195 :(

  • @neogarciagarcia443
    @neogarciagarcia443 4 years ago +2

    Will you go on in this series with more advanced algorithms like actor-critic? I'd really love to see that.

    • @deeplizard
      @deeplizard  4 years ago +1

      We plan to add more advanced RL topics in the future :)

  • @harshdalai911
    @harshdalai911 4 years ago +1

    Very good tutorial. loved it!

  • @AlienAI23
    @AlienAI23 5 years ago +3

    good job deeplizard 😍

  • @sudiptabhuyan8896
    @sudiptabhuyan8896 3 years ago

    Hi, thank you for your video on Deep Q-networks. Can you please share training code, or a brief idea of how to code robotic assembly using force feedback?

  • @quirozarte
    @quirozarte 5 years ago

    Your content is amazing... Thank you so much for doing it.

  • @raminbakhtiyari5429
    @raminbakhtiyari5429 3 years ago

    Thanks for this brilliant series. I'm very happy to have found your channel and afraid of reaching the last episode of this series. Thanks to you I'm so excited about RL. Could you please point me to some resources that are as good as you for learning more algorithms and deeper concepts?
    Thank you

  • @tarunbirgambhir3627
    @tarunbirgambhir3627 3 years ago

    at 1:40, the target_update (=x) updates the target network with the weights of the policy network every x episodes. But under the 9th step at 4:10, the target network is said to be updated every x timesteps for each episode.
    My question is which one is correct. Do we update the target network after x timesteps or x episodes?

  • @kushis7242
    @kushis7242 5 years ago +2

    Hi
    Great content and instruction style.
    I have been following and studying your videos on RL, Keras, and Deep Learning. In the Deep Learning/Keras series, you explain training your data, validating it, and then testing. Each of the three pieces - training, validation, and testing - is against different inputs.
    Does this same logic apply in the case of Reinforcement Learning? i.e., in the training phase we are building a model and we tweak the hyperparameters until we get the required performance. Next, will there be a testing class that will validate the model we built in the training phase?

    • @deeplizard
      @deeplizard  5 years ago +2

      Hey kushi - Glad you've been enjoying all of the content. For RL problems like this one, we do not have train/validation/test breakdown like we do for classification problems. Notice that the way in which we're using the network in deep Q-learning is a different approach from how even the same network (architecture-wise) would be used for classification problems.

    • @kushis7242
      @kushis7242 5 years ago +1

      @@deeplizard Thanks for the clarification

  • @davidak_de
    @davidak_de a month ago

    I get an error after a few seconds:
    RuntimeError: a Tensor with 12241 elements cannot be converted to Scalar

  • @alchemication
    @alchemication 4 years ago

    Absolutely fantastic series, probably the best material for learning RL. In the end the solution turned out to be quite complex, but the code is really neat, I love it! Unfortunately I am not yet familiar enough with PyTorch to work out when the network fitting is occurring for the policy_net (sorry, I only know Keras). Do we keep training the policy_net on each step within an episode? Thanks!

  • @techie1143
    @techie1143 2 years ago

    Hi deeplizard, Could you please help with the following 2 questions:
    1. When the code _, reward, self.done, _ = self.env.step(action.item()) is executed, how does the env object determine the reward and the state it's currently in? My understanding is that the state and action spaces are defined as the in_features of the first linear layer and the out_features of the output layer in the DQN object. How is this information, and the rewards, made known to the env object?
    2. How should I understand "To solve cart and pole, the average reward must be greater than or equal to 195 over 100 consecutive episodes"?

  • @Throwingness
    @Throwingness 3 years ago

    The difficulty in this playlist went from 4 to 1000 with this last notebook. I feel like I'm looking at something written in a foreign language. I need baby steps.

  • @nikhil-sethi
    @nikhil-sethi 4 years ago

    Hi! Thank you for the great series and simple explanations. I just have one question/suggestion.
    To get the max Q-values for the Bellman equation: what if we were to keep the 'done' value for each action in our replay memory tuple --> sample and zip a "done_all" tuple from the entire replay memory --> and use this (after some manipulation) as a boolean index into the target Q-values to set the True entries to zero and the False ones to the maximum (since done=True corresponds to an action that ended the episode)? Would this be helpful, as we wouldn't have to check each next state for all zeros?

  • @amirrezaheidari8041
    @amirrezaheidari8041 3 years ago

    Hi, after watching video #18 I was waiting for the next video to see how to tune the hyperparameters of a DQN, but I did not find any. Is it possible to explain how the hyperparameters should be tuned properly? I have written the code for a DQN temperature controller, but my problem is that on every run, even with exactly the same hyperparameters, I get very different results. Do you have any suggestions?

  • @johnmark-ps8jy
    @johnmark-ps8jy 4 years ago +1

    Thanks for the great work. Are you planning to continue the series to advanced levels like actor-critic, dueling and double DQN? If so, when are the release dates? Thanks again, I enjoyed the series.

    • @deeplizard
      @deeplizard  4 years ago +1

      Hey john - Glad you're enjoying the course!
      We plan to add more advanced topics to this course in the future but do not currently have the timeline for release dates :)

  • @mainakdeb9188
    @mainakdeb9188 4 years ago

    this series has been amazing so far, when can we expect new episodes?

  • @kchan8878
    @kchan8878 4 years ago

    Excellent tutorial!

  • @Mantor-Diver
    @Mantor-Diver 3 years ago

    Thank you for this video, but I have a question: is the above code written in Python with PyCharm or Anaconda? I also want to know which libraries are used.

  • @claudiogenovese2321
    @claudiogenovese2321 4 years ago

    Hi! Thank you very much for this course; it has been very instructive and fun, thanks also to the two coding exercises. I have one question on this example: for each step of evolution, once we have enough experiences in memory we do one step of optimisation. This clearly works nicely if we suppose that the optimization step and the evolution step have more or less the same cost. Is this an artefact of the exercise or is it a very common practice? Should one expect any particular, characteristic problem from changing this one-to-one balance for performance reasons?

  • @andread4721
    @andread4721 2 years ago

    One question: In the for loop, what does the count() function return? What does it count here?
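    For reference, count() here is presumably itertools.count(), which simply yields 0, 1, 2, ... forever; the inner loop runs once per timestep until the episode ends and the loop is broken out of. A toy example:

    from itertools import count

    for timestep in count():
        print(timestep)   # 0, 1, 2, ...
        if timestep == 2:
            break         # the training loop breaks similarly once the episode is done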

  • @harkus8831
    @harkus8831 3 years ago +1

    How long does it take to train for 1000 episodes? I built my own in Keras and it takes forever. : (

    • @michaelelkin9542
      @michaelelkin9542 3 years ago

      For me 1000 episodes takes about an hour. The better it does the longer it takes. How long does your take? I am trying to speed it up.

  • @MrTony2371
    @MrTony2371 4 years ago +1

    Hello, deeplizard! Thanks a lot for your awesome channel and knowledge!
    I've recently been exploring ways to improve DQN performance and read about Double DQN. After hours of thinking and reading I understood that the algorithm you use in this particular series is not a simple DQN but actually a Double DQN, because you use a second neural network to calculate the target Q-values.
    Am I right? I'm really confused, because you didn't use the words "Double DQN" in this series, which makes me think that there is something I'm missing about Double DQN.

    • @deeplizard
      @deeplizard  4 years ago +3

      Hey Tony - We are indeed using two networks in this example, but this particular technique is commonly called Deep Q-Learning with Fixed Q-targets. The technique of using Double DQNs is very similar, but with a subtle difference. We have not covered that approach in this course, but you can check the subtle difference between the two techniques as summarized in the post below.
      datascience.stackexchange.com/questions/32246/q-learning-target-network-vs-double-dqn
      For further understanding, you can research Double DQNs versus Fixed Q-targets.
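      As a rough sketch of the subtle difference, using the tensor names from this series (next_states, rewards, gamma, policy_net and target_net assumed as in the videos):

      # Fixed Q-targets (this series): the target net both picks and evaluates the best next action
      next_q = target_net(next_states).max(dim=1)[0].detach()

      # Double DQN: the policy net picks the action, the target net evaluates it
      best_actions = policy_net(next_states).argmax(dim=1, keepdim=True)
      next_q = target_net(next_states).gather(dim=1, index=best_actions).squeeze(1).detach()

      # Either way, the Bellman target is then:
      target_q = rewards + gamma * next_q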

    • @MrTony2371
      @MrTony2371 4 years ago +1

      @@deeplizard Thanks a lot!

  • @abhinavhada5854
    @abhinavhada5854 5 years ago +1

    Hi deeplizard,
    I want to make a Traffic Light Detection program using CNN.
    Do you have any recommendations for videos, or insights for me on how to progress?

  • @origamigirl11RK
    @origamigirl11RK 5 years ago +1

    I'd really love it if you could go into more detail about the reward system.

    • @deeplizard
      @deeplizard  5 years ago +2

      Thanks for the recommendation! What kind of info are you interested in understanding regarding the reward system?

    • @origamigirl11RK
      @origamigirl11RK 5 years ago +3

      @@deeplizard Mostly just more information on how the reward system can be set up in python and what makes a strong system. How does one approach deciding what is worth a reward and what is worth a punishment and then how much? Many of the examples are of games with clear rules but there are some issues where this approach could be applicable where the rules don't seem as clear to me like segmentation and image processing. I'd be happy to post in the discord if my question isn't good for a video. Thanks

    • @deeplizard
      @deeplizard  5 years ago +1

      Oh ok, I see. Deciding on the reward function in an RL problem is pretty variable from one problem to the next and somewhat arbitrary, given that there are no rules or strict specifications for how the reward function should behave (other than the obvious: you want it to positively affect the agent's return for doing what it's supposed to do and negatively affect it otherwise).
      Given this, defining your own reward function, or deciding on which one to use, is a relatively tricky task, especially when, like you said, the task at hand is not as straightforward as a simple game with a straightforward points system.
      I'll put this on our list of potential topics to cover in the future. It would make for interesting content, but I would need to spend some more time on it myself to cover it thoroughly. In the meantime, check out the highest-rated answer at the link below. I think it gives some good info on this topic:
      stats.stackexchange.com/questions/189067/how-to-make-a-reward-function-in-reinforcement-learning
      This medium article also has some quick tips to consider when defining your reward function:
      medium.com/@BonsaiAI/deep-reinforcement-learning-models-tips-tricks-for-writing-reward-functions-a84fe525e8e0

    • @origamigirl11RK
      @origamigirl11RK 5 years ago +2

      deeplizard Thank you. These links are great. I’m still fairly new to the world of AI so I really appreciate your channel and your advice 👍
      One last question, if I want to use DQN to segment a series of images (active contour/snake) that I have the solutions to can I use the known answers to inform the reward function or is that not good practice/is there a more practical approach for that type of problem?

    • @deeplizard
      @deeplizard  5 years ago +2

      Great, you're welcome! I have not actually heard of using a DQN for an image segmentation task until you mentioned it. I know that typical CNNs are used for these kinds of tasks, but I haven't heard of RL being integrated. After I read your comment, I did some brief searching and saw that is indeed a thing. It seems that using RL approaches to image segmentation may be relatively new, starting around 2018-ish from what I was seeing.
      Anyway, with limited knowledge on this use case, I believe you should be able to use the solutions to help inform your reward function in image segmentation. The paper linked below discusses a network called DeepOutline where "a human-like agent finds out object boundaries one after another." In section 3.2, they talk about using the true solutions to measure rewards.
      arxiv.org/pdf/1804.04603.pdf
      I hope this helps!

  • @griffithslittlebeauties7269
    @griffithslittlebeauties7269 3 years ago

    Idk if anyone will see this, but I would love some help if I can get it. My program only has a 100-episode moving avg of 9-10 throughout 1000 episodes. As far as I can tell I have all of the parameters set correctly. Any suggestions?

  • @RadekMarko
    @RadekMarko 4 years ago +1

    Hi Deeplizard,
    I'm trying to run the project on CUDA GPU and I'm getting some errors. I've cleared most of them so far, but I'm stuck on one of them in the following code:
    def extract_tensors(experiences):
        batch = Experience(*zip(*experiences))

        t1 = torch.cat(batch.state)
        t2 = torch.cat(batch.action)
        t3 = torch.cat(batch.reward)
        t4 = torch.cat(batch.next_state)

        return (t1, t2, t3, t4)
    In the line "t1 = torch.cat(batch.state)" I'm getting "Expected object of backend CUDA but got backend CPU for sequence element 0 in sequence argument at position #1 'tensors'". Any suggestions?
    Best regards,
    Rad

  • @xZerplinxProduction
    @xZerplinxProduction 5 years ago +2

    The thumbnail reminds me of one of those expanded mind memes

  • @taku8751
    @taku8751 4 years ago

    In the previous videos about Frozen Lake and Q-learning, we initialize all cells in the Q-table to zero, right?
    What I do not understand is, in DQN, why do we not set all params in the model to zero?
    Will the random Q-values output by the model affect our result?

  • @vida7561
    @vida7561 2 years ago

    Hi, thank you. When will the next episode come out? Please record it.

  • @GauravSharma-ui4yd
    @GauravSharma-ui4yd 4 years ago +1

    Hey deeplizard, such an amazing series! Are you done with this series? Because there has been no video on this for a long time.

    • @deeplizard
      @deeplizard  4 years ago +1

      Thanks, Gaurav! May start adding more advanced RL topics to this series in the future :)

  • @ahmarhussain8720
    @ahmarhussain8720 a year ago

    Hi, can you please let me know which playlist this video is a part of? Thanks

  • @roger_is_red
    @roger_is_red 3 years ago

    So do you have a video that shows how you get to a duration average of 190?

  • @michaelwjoyner
    @michaelwjoyner 5 years ago +1

    DL - Please consider putting a ConvNet on our DQN, pretty please !!!! (big smile)

    • @deeplizard
      @deeplizard  5 years ago +1

      Hey Michael 😁 - Planning on including conv layers in our DQN in future episodes 😁

    • @LoveEricaxinz
      @LoveEricaxinz 4 years ago

      @@deeplizard Looking forward to it and more algorithms like policy gradient, PPO, etc.
      Really good explanation.

  • @TheOfficialJeppezon
    @TheOfficialJeppezon 4 years ago

    I don't get the function where we calculate the Q-values. You say we pass in both the state and the action? I thought we pass in the state, get the Q-values, and from that select the action?

  • @ThePositiev3x
    @ThePositiev3x 4 years ago

    At 12:53, we pass the whole batch of states to the network, but we used to pass a single state, I guess two videos ago. I'm confused now. What does your network expect? A single state or a batch of states? Or both?
    Your net is expecting input whose size is height x width x 3, but you're passing a lot more than that by passing all the states. What is your net doing with that input?
    Edit: When I try to give it more inputs than it is expecting, it gives a size error as expected. How it works in your case, I have no idea.

  • @qixu1785
    @qixu1785 4 years ago

    I cannot understand the extract_tensors() function. For example, in Experience(*zip(*experiences)), what is the meaning of the symbol *? And t1 = torch.cat(batch.state) gives an error:
    "cat(): argument 'tensors' (position 1) must be tuple of Tensors, not Tensor".

    • @qixu1785
      @qixu1785 4 years ago

      I think t1 = torch.cat(batch.state) should be changed to t1 = torch.tensor(batch.state).
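      Roughly, here is what the two *s are doing, using the Experience namedtuple as defined in the series (a toy example with plain numbers standing in for tensors):

      from collections import namedtuple

      Experience = namedtuple('Experience', ('state', 'action', 'next_state', 'reward'))
      experiences = [Experience(1, 2, 3, 4), Experience(5, 6, 7, 8)]

      batch = Experience(*zip(*experiences))   # zip(*...) transposes the list, Experience(*...) repacks it
      print(batch.state)    # (1, 5)  -- all the states grouped together
      print(batch.reward)   # (4, 8)  -- all the rewards grouped together

      In the real code each field holds tensors, so torch.cat(batch.state) concatenates that tuple of per-experience state tensors into one batch tensor.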

  • @user-tj4ut8ox9r
    @user-tj4ut8ox9r 4 years ago +1

    This item() is a mess. Says 'int object has no attribute item'. Don't know what to do. I followed the whole code, twice.

    • @lorenzkapral4043
      @lorenzkapral4043 4 years ago +4

      class Agent():
          def __init__(self, strategy, num_actions, device):
              self.current_step = 0
              self.strategy = strategy
              self.num_actions = num_actions
              self.device = device

          def select_action(self, state, policy_net):
              rate = self.strategy.get_exploration_rate(self.current_step)
              self.current_step += 1
              if rate > random.random():
                  return torch.tensor([random.randrange(self.num_actions)]).to(self.device)
              else:
                  with torch.no_grad():
                      return policy_net(state).argmax(dim=1).to(self.device)
      The random actions have to be torch tensors.

    • @fuma9532
      @fuma9532 4 years ago +1

      @@lorenzkapral4043 Thanks, your code helped solve that problem. But now a new one popped up: " 'int' object has no attribute 'dim' ". Does anyone know how to solve this?

  • @mehedihasanshuvo4874
    @mehedihasanshuvo4874 4 years ago

    I got an error while running the main program, when it tries to execute the line
    next_q_values = QValues.get_next(target_net, next_states)
    which points to the line below and causes the error:
    values[non_final_state_locations] = target_net(non_final_states).max(dim=1)[0].detach()
    RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity
    Any solution?

  • @user-tj4ut8ox9r
    @user-tj4ut8ox9r 4 years ago

    Hello! This isn't done yet. Please make a video on checkpointing so we can save our work.

  • @dzundzanify
    @dzundzanify 5 years ago +2

    Hey, I'm watching all your videos from the beginning, but one playlist intersects with another and I can't tell where I should be at the moment.
    Can you please make another playlist showing the order of the videos?
    You are making awesome stuff.

    • @deeplizard
      @deeplizard  5 years ago +1

      Thanks, Dmitriy! Using the website may help you navigate the individual series better:
      deeplizard.com
      For deep learning material, the recommended order is:
      Deep Learning Fundamentals
      Keras
      TensorFlow.js
      PyTorch
      Reinforcement Learning
      For deeper programming fundamentals as well, you can do the Data Science series before Deep Learning Fundamentals.