Building a Custom Environment for Deep Reinforcement Learning with OpenAI Gym and Python

  • Published: 21 Oct 2024

Comments • 309

  • @Paul_Jeong96
    @Paul_Jeong96 3 years ago +16

    Thank you for your tutorial, I hope to see how you can visualize the environment in the upcoming tutorial!

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +9

      Me too! Keen to do a ton more stuff with RL and possibly PyGame!

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      @Eliseo Raylan awesome! Let me know how you go with it!

  • @tomtalkscars9494
    @tomtalkscars9494 3 years ago +8

    Love these. Building custom environments is one of the biggest areas missing with the OpenAI stuff imo.
    Would be cool to see one bringing in external data. Like predicting the direction of the next step of a Sine Wave or something simple like that.

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +2

      Definitely, got way more stuff on RL planned once the Python course is out!

  • @user___01
    @user___01 3 years ago +2

    Man you can't stop giving us this gold of a tutorial!

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +1

      Definitely! Two a week that's the goal man!

  • @laser7861
    @laser7861 2 years ago +9

    Great tutorial. Simple and to the point, especially for someone who is familiar with RL concepts and just wants to get the nuts and bolts of an OpenAI gym env.

  • @tawsifkamal88
    @tawsifkamal88 3 years ago +8

    Really informative video! As a high schooler self-learning RL, tutorials such as these are really helpful for showing applicability in RL.

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +2

      Ohhh man, going to have something awesome for you in a few days time then!

  • @Techyisle
    @Techyisle 3 years ago +11

    Your tutorials were awesome, and I just finished your 3-hour RL tutorial, and I would like to see a Pygame implementation as soon as possible :)
    If possible, try to create a different set of advanced videos where you explain the math and intuition behind RL, along with code implementations (to cater to a different audience).
    Something I like about you is that you respond to each and every comment, a characteristic which I don't see often from others. Kudos to you!
    Thanks again mate! Stay safe!

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +4

      Thanks @Techy Isle, I'm definitely going to be going into more detail. Been studying some hardcore DL stuff like crazy while producing the Python basics course!

  • @baronvonbeandip
    @baronvonbeandip 1 year ago

    This is way more useful than the last one. The more you can modify OpenAI's envs, it seems, the more that you can get out of the reinforcement learning schema.

  • @pratyushpatnaik4617
    @pratyushpatnaik4617 3 years ago +31

    Sir that was exceptionally good!!! 🔥
    I would really love to see the render function in play using pygame.
    Waiting eagerly for it!!!!

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +9

      Definitely, can't wait to finally do something with Pygame!

    • @viswanathansankar3789
      @viswanathansankar3789 3 years ago +2

      @@NicholasRenotte Yeah Please do it as soon as possible...

    • @albertsalgueda1036
      @albertsalgueda1036 2 years ago

      Yes! there is a need for Environment viz.

  • @prakhars962
    @prakhars962 3 years ago +1

    Just recommended this video to one of my coursemates. Your videos are worth sharing.

  • @MuazRazaq
    @MuazRazaq 3 years ago +1

    I can't stop myself from commenting on this exceptionally good tutorial.
    Sir, really amazing job. I must say you should continue this good work, the way you explain each and every line is something that is very rare in the material that is available till now.
    Much love from a Pakistani student currently in South Korea 😍

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +1

      Ohhhh thanks so much @Muaz! Soo glad you enjoyed it.

  • @oliverprislan3940
    @oliverprislan3940 2 years ago

    Thank you Nicholas,
    this is a very good example to give it a kick start.

  • @SatoBois
    @SatoBois 3 years ago +7

    Hello Nick! I love your tutorial and it's actually helping so much at university, especially considering the lack of documentation for OpenAI. I was actually building a custom environment for tic-tac-toe to practice, but for some reason when I run dqn.fit() like you did, with everything the same for the keras-rl training part, I get this:
    "ValueError: Error when checking input: expected dense_16_input to have 2 dimensions, but got array with shape (1, 1, 3, 3)"
    I don't quite understand why it got that shape because my tictactoe game's observation space is a np.array([Discrete(3)]*9) to represent the nine tiles and the three possibilites of what could be in them.
    Again, thank you for the helpful tutorials!

    • @myceliumbrick1409
      @myceliumbrick1409 1 year ago

      yep i have the same error. Did you manage to solve the issue?
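
      For anyone hitting this shape error: keras-rl's DQNAgent feeds each observation with an extra "window" dimension (window_length=1), so a (3, 3) board arrives as (1, 3, 3). One common workaround is to flatten the observation before the Dense layers. A rough sketch, assuming states = env.observation_space.shape is (3, 3) for the board (untested against that exact env):

      from tensorflow.keras.models import Sequential
      from tensorflow.keras.layers import Dense, Flatten

      def build_model(states, actions):
          model = Sequential()
          # keras-rl passes observations as (window_length, *obs_shape),
          # so flatten (1, 3, 3) into a single vector first
          model.add(Flatten(input_shape=(1,) + states))
          model.add(Dense(24, activation='relu'))
          model.add(Dense(24, activation='relu'))
          model.add(Dense(actions, activation='linear'))
          return model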

  • @julian.estevez
    @julian.estevez 8 months ago

    Thanks a lot for the clarity of explanation.

  • @charlesewing9772
    @charlesewing9772 2 years ago +4

    Hi, great video! I was just wondering what happens if, say for example, the temperature is at 100 and the model tries to add 1 to the temperature (so now outside the limits): does it then resample automatically, or would you have to implement this in the code yourself?
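
    Gym itself won't resample or clamp the state for you, so this usually has to be handled inside step(). A minimal sketch of one way to do it, loosely following the ShowerEnv from the video (the 0-100 bounds, reward band and shower_length attribute are assumptions taken from that example):

    import numpy as np

    def step(self, action):
        # action 0/1/2 maps to a temperature change of -1/0/+1
        self.state += action - 1
        # clip back into the Box bounds so 100 + 1 stays at 100
        self.state = int(np.clip(self.state, 0, 100))
        self.shower_length -= 1
        reward = 1 if 37 <= self.state <= 39 else -1
        done = self.shower_length <= 0
        return self.state, reward, done, {}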

  • @yxzhou5402
    @yxzhou5402 2 years ago +1

    As a beginner in RL, all your videos really help me a lot, so thank u!!! And I just wonder if there is any chance of seeing a tutorial on how to build an env with multi-dimensional actions?

  • @DreamRobotics
    @DreamRobotics 3 years ago +1

    Very simple and very nice. Good work.

  • @islam6916
    @islam6916 3 years ago +3

    Thank you for the video ⚡⚡⚡
    I hope you can make a Custom Agent Next time ✅
    Looking forward to seeing that ✨

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +2

      Heya! Definitely, code is 80% of the way there, I should have it up in the coming weeks!

    • @islam6916
      @islam6916 3 years ago

      @@NicholasRenotte Great !!!

  • @nilau8463
    @nilau8463 2 years ago

    Thank you for the guide, really gave me a good idea how to implement my own models!

  • @markusbuchholz3518
    @markusbuchholz3518 3 years ago +1

    Nicholas, as I mentioned some time ago, your YT channel is outstanding and your effort impressive. RL is my favourite branch of ML, so I especially enjoyed watching this one. Exceptionally, you also built a customised environment. The idea can easily be adapted and applied to other specific tasks. It is a great pleasure to watch your channel and I will recommend everyone to subscribe. Have a nice day!

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +1

      Thank you so much @Markus! Glad you enjoyed the RL videos, I think it's a super interesting field with a ton of interesting applications. I'm hoping later on this year we might be able to apply some of it into hardware applications with Raspberry Pi or ROS!

    • @markusbuchholz3518
      @markusbuchholz3518 3 years ago

      @@NicholasRenotte Thank you for the wonderful feedback! Yes, ROS/ROS2 is a great robotics framework. Now I am more inspired by the Nvidia Jetson Xavier since it is "slightly" more powerful. Good luck!!!

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +1

      @@markusbuchholz3518 oooh yeah, I took a look at that yesterday. Looks awesome! The OAK camera looks promising as well!

  • @stevecoxiscool
    @stevecoxiscool 2 years ago +1

    I guess what irks me the most about all the universe/retro/baselines gym examples is that it's not straightforward to get your bright, shiny, newly trained model to run in other environments. These gym examples have so many interdependencies and one does not really know what is going on inside the box. This is why I am glad you are doing the video on getting other environments to work with RL algos. Unreal is my choice since Unity already has ML examples.

    • @NicholasRenotte
      @NicholasRenotte 2 years ago

      100% I took a look into the Unity environment over the Christmas break and was gobsmacked. Well documented, and logging and training were clear. I love OpenAI Gym, but seriously, Unity ML-Agents appears to be so much easier to deal with.

    • @stevecoxiscool
      @stevecoxiscool 2 years ago

      @@NicholasRenotte I really wish Unreal were on par with Unity on the ML technology. I am using UnrealPythonPlugin to send images to a remote python client running opencv DNN. The video doing this on my youtube is a few years old. Your custom gym environment linked to Unreal is doable. Thanks for your videos !!!!

  • @mzadeh
    @mzadeh 3 years ago +1

    Thank you very much, very clear and clean.

  • @kushangpatel983
    @kushangpatel983 3 years ago +1

    Really useful tutorial, Nick! Keep it up, mate!

  • @Spruhawahane
    @Spruhawahane 2 years ago +1

    On my mac the kernel keeps dying when I run the basic cartpole example. Don't know how to troubleshoot. Pls help.

  • @sommojames
    @sommojames 2 years ago

    Great video, but what's the point of the observation space? It looks like your agent is not using it

  • @ProfSoft
    @ProfSoft 3 years ago +1

    Great job, thanks!
    I have a question: why, when building the model, did you set the last layer's activation function to 'linear'? I think we should make it softmax, because I think it is a classification problem?

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +1

      Hmmm, could definitely change the activation function there!

    • @ProfSoft
      @ProfSoft 3 years ago

      @@NicholasRenotte
      Thank you very much, I just wanted to make sure there was no specific reason for choosing the linear activation function. God bless you for this great effort
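
      Some context on the linear-vs-softmax question: a DQN's output layer estimates one Q-value per action, and Q-values are unbounded regression targets rather than class probabilities, which is why a linear activation is the usual choice. The action is then picked from the Q-values (argmax, or Boltzmann sampling as in the video). A tiny illustration with made-up numbers:

      import numpy as np

      # e.g. what the linear output layer might produce for the 3 shower actions
      q_values = np.array([-1.2, 0.4, 0.9])   # one return estimate per action (illustrative only)
      action = int(np.argmax(q_values))       # -> 2; softmax would only be needed for a policy network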

  • @Oriol.FernandezPena
    @Oriol.FernandezPena 3 years ago +6

    Your content is the best!! 🔥🔥

  • @TheNativeTwo
    @TheNativeTwo 1 year ago

    Great video, I like how you explain each line of code. My one complaint is not your fault... Getting the right environment and versions of the packages. I got right to the end... And couldn't get it working. A bit frustrating lol.

  • @hariprasad1168
    @hariprasad1168 3 years ago +1

    Thanks! This is really helping me a lot.

  • @frankkreher4832
    @frankkreher4832 2 years ago

    Thank you, once again, for the very educational video.
    Great work!

  • @ameerazam3269
    @ameerazam3269 3 years ago +1

    Again Best ever explanation Sir appreciate your work keep it up for us

  • @idrisbima5369
    @idrisbima5369 2 years ago +1

    Hello Nick, wonderful video. I am having the same error message you pointed out in the video and tried resolving it as shown but it is giving me a different error message stating the name model is not defined. Please help

  • @jugalyadav7110
    @jugalyadav7110 2 years ago

    Hello Nicholas, firstly this video helped me a lot to get my basics cleared up regarding RL.
    Currently I am working with my own custom env and building a SAC model over it. I wanted to plot the actor and critic losses, and from your video I get that it should be done within the render function. It would be great if you could post some video summarizing the plots in render function.
    Cheers !

  • @KEFASYUNANA
    @KEFASYUNANA 1 year ago

    Great videos. Any idea how to handle 2 or 3 states/observations in the code,
    say temperature and pressure or humidity?

  • @padisalashanthan98
    @padisalashanthan98 1 year ago

    Great video! I am a little confused about how to solve a multi-state problem. Can you please give some pointers on that?
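
    Both of the questions above are about observations with more than one value. One common pattern (a sketch, not from the video) is a single Box with one entry per measurement:

    import numpy as np
    from gym.spaces import Box

    # e.g. temperature (0-100), pressure (0-10 bar), humidity (0-100 %) in one observation
    observation_space = Box(
        low=np.array([0.0, 0.0, 0.0]),
        high=np.array([100.0, 10.0, 100.0]),
        dtype=np.float32,
    )
    # reset() and step() would then return arrays of shape (3,), e.g.:
    state = np.array([38.0, 1.2, 55.0], dtype=np.float32)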

  • @samuelebolotta8007
    @samuelebolotta8007 3 years ago +2

    Hi Nicholas, great great work! It would be interesting to see a parallel with ML agents from Unity, to see the differences with OpenAI Gym. Thanks!

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      YESS! I've been waiting for someone to ask for it, I've started testing it out already, should have a tutorial on it kinda soonish!

  • @ihebbibani7122
    @ihebbibani7122 3 years ago +1

    As usual , excellent content. Thank you so much :)

  • @jiajun898
    @jiajun898 3 years ago +1

    Great tutorial. A question though. What would be the benefit of transferring your reinforcement learning from the keras implementation to the openai gym environment implementation?

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      This is Gym, it's more the rl agents that I've started migrating (better stability, control and exporting).

  • @melikad2768
    @melikad2768 2 years ago

    Hi Nick. Thank you very much, I learned a lot. But I have a question: how can I see which action the shower should take? I mean, how can I understand which action the agent takes based on the reward?

  • @RafalSwiatkowski
    @RafalSwiatkowski 3 years ago +4

    Greetings from Poland. Excellent tutorial; it would be great if you showed how to combine pygame with reinforcement learning

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +2

      Woah Poland, what's happening! Definitely, I'll get cracking on it. Much love from Sydney!

    • @RafalSwiatkowski
      @RafalSwiatkowski 3 years ago +1

      @@NicholasRenotte Thank u master ;)

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      @@RafalSwiatkowski anytime!! 🙏

  • @AldorCrap
    @AldorCrap 1 year ago

    Very useful tutorial, although I would need some help with mine. I'm working to train a model for optimal path routing. I'm struggling with defining the observation_space: does it have to be the whole road graph (as a spaces.Graph), or a Box with the parameters (like current_coords, dest_coords, edges_max_car_speed), or maybe both? How should I approach this?

  • @fidelesteves6393
    @fidelesteves6393 3 years ago +1

    What an amazing tutorial! Thanks

  • @Sam-iy1kv
    @Sam-iy1kv 1 year ago

    Hi, very nice video! May I ask one question: what if I need continuous actions for the training task? Discrete actions won't be feasible, so what can I do?

  • @boonkhao
    @boonkhao 7 months ago

    Hi, it is a great video tutorial for customizing an environment. However, when I copied your code and ran it in a Jupyter notebook, I got stuck on a problem where rl.agents could not find a compatible version of Keras. I have tried many ways to solve this but still cannot. So, please help me.

  • @raihankhanphotography6041
    @raihankhanphotography6041 3 years ago +1

    Thank you for the tutorial. This was super helpful!

  • @saaddurrani8930
    @saaddurrani8930 2 years ago +1

    I am doing a project: RL for a smart car (prototype) using DQN or any other RL algorithm.
    So I am thinking to feed in images as the state (from the camera mounted on the car) and my car is able to take 3 actions (forward, right and left). I am keeping it quite simple, i.e. by keeping the car in front of our goal, and as the car sees the goal I want to reward it and take the next action; now if it takes such a random action that the goal is no longer in the vision of the camera, it gets a penalty (state, action, reward/penalty, next state and so on). The episode time is limited to 2 mins. My aim is that the car moves towards its goal (and the more it moves towards the goal, the larger that feature becomes, so it will get another reward because it's moving towards its goal). The goal would be an image (a triangle) at the end of the room in front of the car's initial position. Now, before implementing my DQN in the real-life prototype, I need to train it in OpenAI Gym (3D). I have no idea how I can build such an environment where I can train my DQN by simulation. Any help and suggestions are appreciated.

    • @NicholasRenotte
      @NicholasRenotte 2 years ago

      Take a look at how some of the video game driving environments are built! Should be a good start for how to kick it off!

  • @mohamadalifahim344
    @mohamadalifahim344 3 years ago +1

    With my TensorFlow version, the baselines 'import A2C' does not work.
    How do I solve this?

  • @svh02
    @svh02 3 years ago +1

    hey @Nicholas, awesome as usual !!
    Any reason why you chose to build your agent with Keras-RL and not with the ones provided by Stable-Baselines?
    Hope you keep making videos about custom environments. I think that's what's most useful.
    YouTube is already crowded with videos about the common environments for games and stuff like that.

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      This was a little early on; I've since transitioned most of my RL projects to Stable Baselines! Got plenty planned on custom environments, stay tuned!

  • @davidowusu1184
    @davidowusu1184 3 years ago +2

    Great Video. I was able to use this as a basis to create an environment for my specific needs.
    I have one question though
    Once you've trained your model and saved your weights, how do you use it? I mean actually pass values to the model to get an action as a response

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +1

      You can pass the new state to the model and actions are returned as the output. Can then go and plug it into the real model/iot suite etc.

    • @davidowusu1184
      @davidowusu1184 3 years ago

      @@NicholasRenotte Thanks so much for the wonderful content and thanks even more these replies. You're awesome.
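
      A small sketch of what "pass the new state to the model" can look like once training is done, assuming the model was built directly on the flat observation as in the video (adjust the reshape to your own observation shape):

      import numpy as np

      obs = np.array([38.0])                        # whatever your env/sensor currently reports
      q_values = model.predict(obs.reshape(1, -1))  # batch of one observation -> one row of Q-values
      action = int(np.argmax(q_values[0]))          # greedy action to send to the real system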

  • @sandeepagarwal8566
    @sandeepagarwal8566 2 years ago

    Thank you for the tutorial. Like sklearn has hyperparameter tuning, please let us know how we can tune hyperparameters in the case of DQN... any package, library or any kind of reference would be helpful... Thanks

  • @travelthetropics6190
    @travelthetropics6190 3 years ago +1

    Thanks for the informative series on reinforcement learning. Are you running this on CPU or GPU? At [23:23] I noticed that on your PC it is like 47-55 sec per 10000 steps. I am getting 118-120 sec with my GPU and 59-63 sec with my CPU only. It seems like this small model works better with CPU only, maybe due to the extensive copying time to the GPU :D

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      Yeah, I noticed that as well, with RL oftentimes the model won't benefit as much from GPU acceleration.
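
      For a fair comparison, TensorFlow can be pinned to the CPU for a run (standard TF 2.x API, nothing specific to this tutorial); it has to be called before any model is built:

      import tensorflow as tf

      tf.config.set_visible_devices([], 'GPU')   # hide all GPUs from TensorFlow for this process
      print(tf.config.get_visible_devices())     # should now list only CPU devices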

  • @aayusheegupta
    @aayusheegupta 2 years ago +1

    Hello Nicholas! Great tutorial on building customized environment with Gym. Could you please share any pointers on how to load our own dataset while building an environment? I want to load and train RL agent with natural language sentence embeddings and create a proof tree.

  • @TheOfficialArcVortex
    @TheOfficialArcVortex 2 years ago

    Any chance you could do a tutorial on how to use this for physical computing? For example, how would you implement two LEDs, say using GPIO on a Raspberry Pi, when the temp goes up or down, with an input sensor for temperature? Or say an accelerometer and a motor for balancing.

  • @ehrotraabhishekm4824
    @ehrotraabhishekm4824 1 year ago

    Fantastic video, thank you so much for it... I have one doubt regarding DQN in gym... can you please share some details on how to use DQN with a multi-dimensional state space (4D), which was 1D in your case (temp)?

  • @khaileng3020
    @khaileng3020 1 year ago +2

    To fix the Sequential error, just rearrange the order of the library imports:

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Flatten
    from tensorflow.keras.optimizers import Adam
    import tensorflow as tf
    from rl.agents import DQNAgent
    from rl.policy import BoltzmannQPolicy
    from rl.memory import SequentialMemory

    states = env.observation_space.shape
    actions = env.action_space.n
    print(actions)

    def build_model(states, actions):
        model = tf.keras.models.Sequential()
        model.add(Dense(24, activation='relu', input_shape=states))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(actions, activation='linear'))
        return model

    model = build_model(states, actions)
    print(model.summary())

    def build_agent(model, actions):
        policy = BoltzmannQPolicy()
        memory = SequentialMemory(limit=50000, window_length=1)
        dqn = DQNAgent(model=model, memory=memory, policy=policy,
                       nb_actions=actions, nb_steps_warmup=10, target_model_update=1e-2)
        return dqn

    dqn = build_agent(model, actions)
    dqn.compile(Adam(lr=1e-3), metrics=['mae'])
    dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)

    • @traze78
      @traze78 11 months ago

      thanksssssss!! it helped a lot

  • @farzamtaghipour509
    @farzamtaghipour509 3 years ago +1

    Thank you so much for the content. Thumbs up.

  • @jordan6921
    @jordan6921 3 years ago +1

    Ooo pygame would be such a cool thing to see. I wonder if Retro Gym environments work too!

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      IKR, Pygame is definitely on the list! I've tested with some of the Atari envs and they seem to work; they take a while to train but they work!

  • @emanuelepapucci59
    @emanuelepapucci59 2 years ago

    Hi Nicholas, I'm following your tutorial, but I don't know why, at the end of the tutorial when I run dqn.fit(env, nb_steps=50000, visualize=False, verbose=1) I get the following error: ValueError: Error when checking input: expected dense_76_input to have 2 dimensions, but got array with shape (1, 1, 1). I don't really know how to solve it, and I'm just following your tutorial... do you have an idea?
    In any case, thank you for your time!

  • @NuHoaNgonLa
    @NuHoaNgonLa 3 years ago +1

    what does the Env argument inside the ShowerEnv() class do?

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      Should be the parent class, I may have forgotten to run super().__init__() inside of the __init__ function.
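
      For reference, the inheritance being discussed looks roughly like this (a sketch; the spaces are placeholders from the shower example):

      import numpy as np
      from gym import Env
      from gym.spaces import Discrete, Box

      class ShowerEnv(Env):              # Env is the parent class being subclassed
          def __init__(self):
              super().__init__()         # initialise the gym.Env parent
              self.action_space = Discrete(3)
              self.observation_space = Box(low=np.array([0]), high=np.array([100]))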

  • @SuperHockeygirl98
    @SuperHockeygirl98 1 year ago

    Hey , thank you so much for this video. It really helped me. I have a question: can you define your observation space using CSV files and then iterate over it, so the agent needs to deal with differing environments?

  • @kayleechu931
    @kayleechu931 2 years ago

    Hi Nicholas, thanks a lot for your video! I wonder how I can find out what the agent's objective function is? Is there a way I can change the objective function myself? Thanks a lot!

  • @zitongstudio
    @zitongstudio 3 years ago +1

    Hi, I don't understand why during training the reward is around -0.5, while during testing the reward is around -60. Is it because the number of steps used for training and testing is different? For training it is over 10000 steps, for testing only 60 steps.

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      Different starting points without enough steps to get to the final result. Increasing testing steps would allow the agent to iterate closer.

  • @sahilahammed7386
    @sahilahammed7386 2 years ago

    Hi, are the observation space and the state the same thing? The observation space isn't used while training the model here, right?

  • @bananabatsy3708
    @bananabatsy3708 3 years ago +1

    I am just getting a NotImplementedError in the FOR LOOP cell. I looked it up; it has to do with inheritance, but I cannot figure out how.

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      Heya @BananaBatsy, whereabouts is the error being triggered?

  • @andreamaiellaro6581
    @andreamaiellaro6581 1 year ago

    Hi Nicholas, I found this tutorial of great help! Thank you. However, I'd like to ask you if what I have in mind is correct or not: can I update the observation state from within the step function?

  • @montraydavis
    @montraydavis 2 years ago

    Fantastic tutorial!
    I have a question though regarding the DQNAgent test.
    I noticed that in the test function, the only two actions being sent to step are the low and high values. Why is that?
    How would I go about this? I need the action to be equal to 0, 1 or 2 for my application.
    Thanks a lot for this resource!

  • @PedroAcacio1000
    @PedroAcacio1000 2 years ago

    Great video! Thank you very much, sir!
    What if my problem only allows me to determine the reward on the next step, after we've taken some action? Do you have any video talking about such problems?
    Thanks again, your content is helping a lot.

  • @sebatinoco
    @sebatinoco 3 years ago +1

    Hi Nicholas, amazing video! Quick question, how can I access the current action that the AI is taking? Thanks!

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      Heya @Sebastian, I don't believe it's easily accessible through keras-rl. If you're using StableBaselines, you can access it through DQN.predict(obs) e.g. github.com/nicknochnack/StableBaselinesRL/blob/main/Stable%20Baselines%20Tutorial.ipynb shown towards the end.

  • @Antonio-om4sg
    @Antonio-om4sg 2 years ago

    Would it be possible to see the evolution (a plot) of the temperature of the water when the agent is run on the scenario? For each episode we would see, for each step, the evolution of the water temperature
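
    One way to get that plot without touching render() is to record the state at every step while running an episode (a sketch, assuming the ShowerEnv from the video where the observation is the water temperature):

    import matplotlib.pyplot as plt

    temps = []
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()          # or the trained agent's action
        obs, reward, done, info = env.step(action)
        temps.append(obs)                           # the observation is the temperature here

    plt.plot(temps)
    plt.xlabel('step')
    plt.ylabel('water temperature')
    plt.show()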

  • @master231090
    @master231090 3 years ago

    Amazing video! I had a question about the activation layer and performing the final action. Your activation layer is a linear function. How does that link to picking the action?

  • @lukejames3570
    @lukejames3570 3 years ago

    Two questions: does the action that comes out of the network go to the next step of the env? And how can you be sure the output of the network is 0, 1 or 2 instead of some other random number?

  • @JJGhostHunters
    @JJGhostHunters 1 year ago

    I love these tutorials, however I have spent hours and hours trying to even get the most simple environment to "render". I have tried two computers with Spyder, Jupyter Notebooks and even the command line, and have never been able to even get a window to pop up with a rendered environment.
    I am continuing to learn the theory of RL, however it is very frustrating not to be able to follow along with these tutorials.

  • @talitaaraujo1327
    @talitaaraujo1327 3 years ago +1

    Man, your videos are so great! Congrats!!!!! I have one question: I can't install keras-rl2. Maybe you can help me.

  • @anamericanprofessor
    @anamericanprofessor 2 years ago

    Any good links to actually overriding the render function for showing our own custom visualization?

  • @vigneshpadmanabhan
    @vigneshpadmanabhan 1 year ago

    Is there a deep reinforcement learning algorithm we can experiment with on regression-based tabular data or sensor data etc.? If so, it would be much appreciated if you could make a video on it. Thanks!

  • @PhilippWillms
    @PhilippWillms 2 years ago +1

    Where does the 24 come from in defining the neural network layers?

    • @NicholasRenotte
      @NicholasRenotte 2 years ago +1

      Completely subjective, could change it to a larger or smaller value depending on the complexity of the problem you're trying to solve Philipp!

  • @zahrarezazadeh293
    @zahrarezazadeh293 2 years ago +1

    Thanks Nicholas for the nice tutorial! I have two questions. 1. I'm trying to implement this on PyCharm with Python 3.10, on a MacOS Monterey, Core i3, but with built-in Python 2 something. I can't install and import tensorflow. It says it can't find a satisfying version. Any idea where the problem comes from? different Python versions? any solutions? 2. I'm starting to build my own environment, which is not like any of the ones available. It's a 2D path an agent should try to stay close to, by going left and right, with some gravity. Any suggestions where to start coding it? or any environments you know similar to this? THANK YOU!

    • @NicholasRenotte
      @NicholasRenotte 2 years ago +1

      Woah fair few there, take a look at some of the existing Gym envs, I think there might be some path focused ones I might have seen a while ago.

  • @caio.cortada
    @caio.cortada 1 year ago

    Have you done any of those using MuJoCo environments? Do you have any bibliography on that?

  • @jossgm7480
    @jossgm7480 1 year ago

    How can I access the Q-Values? I see that in the training the "mean_q" variable is displayed. How can I access these values (mean q values)?

  • @monirimmi8616
    @monirimmi8616 2 years ago +1

    Hi Nick,
    Thank you very much for your nice explanation with outstanding implementation. To this end, I have a question:
    How can I check the model parameter updates, for example the weights of each layer, when each training episode is done? Is there any way to check those parameters?

    • @NicholasRenotte
      @NicholasRenotte 2 years ago

      I think you can export the final keras model, this should allow you to see the model weights etc
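
      A small sketch of what inspecting the exported Keras model can look like; get_weights() is plain Keras, and a true per-episode check would need a callback hooked into training:

      for layer in model.layers:
          weights = layer.get_weights()             # [kernel, bias] for Dense layers
          if weights:
              print(layer.name, weights[0].shape, weights[1].shape)

      model.save('dqn_shower_model.h5')             # or save the whole model and inspect it later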

  • @OmarAlolayan
    @OmarAlolayan 3 years ago +1

    Thank you Nicholas !
    Can you please advise me on how to use the step function if I have a multidiscrete action space?

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +1

      Heya @Omar, what does your output look like if you run env.action_space.sample()

    • @OmarAlolayan
      @OmarAlolayan 3 years ago +1

      @@NicholasRenotte Hi Nicholas, thank you for your reply.
      It is a big action space, a 3D action space of shape (3, 10, 10).
      array([[[2, 0, 2, 2, 1, 1, 0, 0, 1, 1],
      [1, 1, 2, 2, 2, 0, 1, 0, 2, 0],
      [1, 2, 0, 1, 0, 2, 0, 1, 1, 1],
      [1, 0, 1, 0, 0, 1, 2, 0, 1, 1],
      [1, 2, 0, 2, 2, 0, 1, 0, 0, 2],
      [2, 2, 0, 2, 0, 1, 1, 0, 2, 2],
      [2, 0, 1, 1, 0, 0, 1, 1, 1, 1],
      [0, 2, 2, 2, 2, 1, 0, 0, 0, 2],
      [1, 2, 2, 0, 1, 1, 1, 2, 2, 2],
      [2, 0, 0, 1, 1, 2, 1, 1, 0, 2]],
      [[1, 2, 1, 0, 1, 1, 1, 2, 0, 1],
      [0, 1, 0, 0, 1, 1, 2, 2, 1, 2],
      [0, 2, 1, 0, 2, 1, 2, 2, 2, 1],
      [1, 2, 2, 0, 0, 2, 0, 2, 2, 0],
      [0, 2, 0, 0, 0, 0, 1, 2, 1, 2],
      [1, 2, 1, 1, 1, 2, 0, 1, 2, 1],
      [1, 1, 1, 2, 2, 1, 2, 0, 0, 2],
      [2, 1, 0, 1, 1, 2, 0, 0, 0, 2],
      [0, 0, 1, 1, 1, 0, 1, 2, 2, 1],
      [2, 0, 2, 1, 1, 0, 0, 2, 1, 0]],
      [[2, 1, 1, 2, 1, 1, 2, 1, 0, 2],
      [0, 1, 2, 1, 0, 0, 1, 1, 0, 0],
      [0, 0, 0, 1, 1, 2, 1, 2, 0, 1],
      [2, 1, 0, 0, 0, 1, 2, 0, 1, 2],
      [2, 0, 2, 1, 0, 0, 2, 0, 2, 1],
      [0, 1, 0, 1, 1, 0, 2, 0, 0, 2],
      [1, 2, 0, 1, 0, 2, 2, 2, 2, 0],
      [0, 0, 0, 1, 1, 2, 2, 2, 0, 0],
      [1, 2, 2, 2, 1, 0, 2, 0, 1, 1],
      [2, 1, 0, 0, 1, 0, 2, 1, 2, 1]]])

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +1

      @@OmarAlolayan oh wow, can you try using this with stable-baselines instead? It might be easier to model as the algorithm will pick up the observation space without the need to define the neural network.
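
      For what it's worth, a (3, 10, 10) grid of three-way choices can be flattened into a MultiDiscrete space, which stable-baselines' on-policy algorithms (e.g. PPO/A2C) accept directly; keras-rl's DQNAgent, as far as I know, only handles a single flat Discrete action. A sketch:

      import numpy as np
      from gym.spaces import MultiDiscrete

      # 3 x 10 x 10 = 300 independent choices, each taking a value in {0, 1, 2}
      action_space = MultiDiscrete(np.full(3 * 10 * 10, 3))

      sample = action_space.sample()        # shape (300,), values in {0, 1, 2}
      grid = sample.reshape(3, 10, 10)      # reshape back inside step() if the grid layout is needed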

  • @candychebet896
    @candychebet896 10 months ago

    Hello. Did you get to doing the visualization?

  • @erfankhordad9403
    @erfankhordad9403 3 years ago +1

    Thanks Nicholas for the great explanation. I have tested this custom environment with PPO and MlpPolicy and got very low rewards around -40 (even with 200000 time steps for model.learn). Any idea why I get poor results? thanks

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      Same env as this one or custom one? Might need a little HPO or possibly an alternate algorithm, I think I did it with a slightly different model in the full RL course with better results!

  • @fernandomelo8460
    @fernandomelo8460 3 years ago +1

    First, your channel is amazing.
    Second, I tried to adapt your custom env to a trading env, but when I use Deep Learning/Keras-RL2 it doesn't look good; my reward is always the same (and the maximum).
    I think the problem is the NN architecture and/or the RL pieces (BoltzmannQPolicy/DQNAgent), because the loop before the deep learning part looks OK. Do you have any tips?

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      Check this out: ruclips.net/video/D9sU1hLT0QY/видео.html

  • @davidromens9541
    @davidromens9541 3 years ago +1

    Have the libraries for keras-rl or keras-rl2 been updated recently? I have been building a custom environment and training a NAF agent to solve it. Last week it was working, but when I came back this week, the code is throwing a KeyError: 0 when my NAF.fit line is run. Any suggestions or help would be greatly appreciated.

    • @davidromens9541
      @davidromens9541 3 years ago

      FYI I am using Google Colab, which requires me to reinstall all libraries every session. I know it's not ideal, but unfortunately I am on Windows.

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      Heya @David, not too sure, I haven't been using keras-rl2 lately; I've been working with stable baselines in its place. Did you have errors that you can share?

    • @davidromens9541
      @davidromens9541 3 years ago

      @@NicholasRenotte Turned out to be a weird bug with google colab. Restarted my computer and is working fine now. Thanks for the reply though!

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      @@davidromens9541 anytime, you're welcome. Weird though. Building anything interesting in the RL space?

  • @dhiyamdumur6245
    @dhiyamdumur6245 2 years ago +1

    Hi Nicholas!
    Very informative video! I would like to know if we can implement DDPG in the context of routing in a simulated networking environment to assess its performance in terms of network delay. Thank you

    • @NicholasRenotte
      @NicholasRenotte 2 years ago

      Yeah probably! I would think you would have different routes or paths for network load then reward based on latency or something of the like!

  • @summanthreddemulkkalla6786
    @summanthreddemulkkalla6786 2 years ago

    Sir, can you implement optimal placement of electric vehicle charging stations using DQN? Please, thank you

  • @kheangngov8005
    @kheangngov8005 2 years ago

    Hello Nicholas, your video is very helpful. I have some questions to ask. I wonder if it is possible to customize the action space for each state, with the reward only given at the terminal state. For example state 1 with 3 actions, state 2 with 5 actions, state 3 with 10 actions, and the reward can be calculated based on that action sequence, whether it is a win or a loss. Thank you.

  • @fufufukakaka
    @fufufukakaka 2 years ago +1

    keras-rl2 is already archived.

  • @vincentroye
    @vincentroye 3 years ago +1

    Excellent tutorial, thanks! Is it possible for a RL model to output a pair of ints or floats ( like [1.5, 2.8] ) instead of a discrete value? What would the output layer look like?

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +1

      I believe so, you would need your final layers to have a linear activation function and need a box state space! What's the use case if you don't mind me asking @Vincent?

    • @vincentroye
      @vincentroye 3 years ago

      @@NicholasRenotte thanks for answering. I'd be interested to see how a model could output the best geometrical coordinates for a given state. It could be a 2D game where the player would have to avoid bombs that pseudo-randomly hit a finite surface for example.

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      @@vincentroye oh got it! Might I suggest you approach it slightly differently. You would ideally store the state of the objects coordinates and just output the actions for your agent in response to those coordinates. It would be akin to your agent walking around using something like sonar.

    • @vincentroye
      @vincentroye 3 years ago

      @@NicholasRenotte could the actions in that case be to move x (left or right) and y (up or down) at the same time? That would be the reason for having 2 outputs. I'd be interested to see how a DQN agent would train the model in that case. In your video it takes a discrete value as nb_actions; how would that be done with 2 continuous outputs? That's where I'm a bit confused, as that would give rise to a huge number of possible actions.
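
      The usual way to express "move x and y at the same time" is a continuous Box action space with two components, trained with an actor-critic method (DDPG/TD3/SAC/PPO, e.g. from stable-baselines) rather than DQN, since DQN needs a finite discrete action set. A sketch with arbitrary bounds:

      import numpy as np
      from gym.spaces import Box

      # two continuous outputs per step: dx and dy, each in [-1, 1]
      action_space = Box(low=np.array([-1.0, -1.0]), high=np.array([1.0, 1.0]), dtype=np.float32)

      action = action_space.sample()   # e.g. array([ 0.42, -0.87]) -> move right and down this step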

  • @alirezaghavidel4594
    @alirezaghavidel4594 3 years ago +1

    Thank you for your amazing work. I have a question regarding the defined environment. I defined self.state as a vector in the __init__ function (self.state = np.zeros(shape=(5,), dtype=np.int64)), but when I want to recall self.state in the step function, it is an integer. How can I have the vector state in the step function as well?

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      I just took a look at my code and I think it's not perfect tbh. Try setting initial state inside of the reset method.

    • @alirezaghavidel4594
      @alirezaghavidel4594 3 years ago

      @@NicholasRenotte Thank you. I did

    • @alirezaghavidel4594
      @alirezaghavidel4594 3 years ago

      @@NicholasRenotte I have another question and I would appreciate it if you could help me. I defined the environment for a multi-component example and I defined the action space and observation space as vectors, and now I want to recall them for RL in Keras (you used states = env.observation_space.shape and actions = env.action_space.n as the input parameters of the build_model function). How can I recall them for my multi-component example? Do you have any example of a multi-component setup for RL in Keras? Thank you

  • @alessandroceccarelli6889
    @alessandroceccarelli6889 3 years ago +1

    Shouldn’t you use a softmax output function instead?

    • @NicholasRenotte
      @NicholasRenotte 3 years ago

      In retrospect, I think I could've done some tweaking to support an alternate activation.

  • @vts_22
    @vts_22 2 years ago +1

    Hi Nicholas,
    I watched all of your reinforcement learning videos and others on the internet (and I have been trying for 10 hours).
    What if my state is [10,20,30,40] and my action_space is Discrete(4)?
    I am getting "DQN expects a model that has one dimension for each action, in this case 4".
    My shape is (None, 1, 4) and I can't fix it.

    • @vts_22
      @vts_22 2 years ago

      I should probably change DQN and ADAM

    • @NicholasRenotte
      @NicholasRenotte 2 years ago +1

      Discrete(4) will return actions 0,1,2,3 as integer values. That sounds like your observation space is incorrect, looks like that would be (None, 4) if you've got [10,20,30,40]

    • @vts_22
      @vts_22 2 years ago

      @@NicholasRenotte I edited the Sequential layers by looking at your videos; my observation_space was the same as your observation_space in one video. I watched all your videos, thanks for the videos. I built a basic snake game but I'm aiming to design a 2D space travel game with gravitation and orbits. My problem was creating the wrong layers; I understand it better now. Thanks again for your excellent videos.

  • @watchme3086
    @watchme3086 5 months ago

    what happens if we do not include inheritance from gym Env?

  • @christiansiemering8129
    @christiansiemering8129 3 years ago +1

    Hi Nicholas, thanks for your great video! Is there an easy way to create customized multi-agent environments with OpenAI Gym? I want to create an AI that competes against another agent in a multiplayer "game".

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +1

      AFAIK it's not super straightforward with pre-built RL packages that are out there atm. Will probably get to it in future vids!

  • @Mesenqe
    @Mesenqe 2 years ago

    Nice tutorial, could you please make one tutorial on how to use RL in image classification.

  • @apreceptorswanhindi
    @apreceptorswanhindi 2 years ago

    Hey man, thanks for the wonderful video. I made a custom environment, and keras-rl2 is taking a lot of time and not utilizing the GPUs. How can I optimize the training of this or similar code using TensorFlow 2 on a remote GPU machine running Ubuntu 20.04?
    20.04.4 LTS (GNU/Linux 5.13.0-52-generic x86_64)
    NVIDIA-SMI 515.48.07 Driver Version: 515.48.07
    with four NVIDIA GeForce RTX 3080 GPUs 10 GB each

  • @vincentroye
    @vincentroye 3 years ago +1

    Could you extend and complexify this tutorial using a dictionary of boxes as observation space please? I can't find any tutorial for the creation of more advanced environments and training with Keras.

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +1

      Yup, I'm going to do a deeper dive into building environments. Got it planned @Vincent; ideally it'll be a 3 hour free short course on YouTube.

    • @vincentroye
      @vincentroye 3 years ago +1

      @@NicholasRenotte thanks a lot. The part that I don't find easy is the design of the RL model when the observation space is complicated. I tried a few things and I often get problems with input/output shapes and non-allowed operations between lists and dicts. My current alternative is to look at the OpenAI solved environments, but I haven't found any working code or environment for an observation space built with a dict.

    • @vincentroye
      @vincentroye 3 years ago

      I just found that gist.github.com/bklebel/e3bd43ce228a53d27de119c639ac61ee but they don't work for me.

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +1

      @@vincentroye agreed, there isn't a lot out there on building environments! I'm wrapping up my Object Detection series then will start on a deep dive into RL including environment building!

  • @tareklimem
    @tareklimem 2 years ago

    Hi Nick, I need a class on reinforcement learning. How can we get in contact, please?

  • @tomleyshon8610
    @tomleyshon8610 1 year ago

    Hey great tutorial! However, when I run your ipynb I get a TypeError when building the agent. TypeError: Keras symbolic inputs/outputs do not implement '__len__'. This error occurs when executing build_agent(model,actions). I'm wondering if anyone else had this issue when trying to run the notebook?

  • @hanswurst9667
    @hanswurst9667 3 years ago

    Cheers Nicholas! How would one go about adding another dimension to the state? For example, the velocity of the water?

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +1

      Heya @hans, you could add another Box that represents the velocity and group them with the temperature into a Dict space, e.g.:

      from gym.spaces import Discrete, Box, Dict
      spaces = {
          'temperature': Box(low=np.array([0]), high=np.array([100])),  # degrees
          'velocity': Box(low=np.array([0]), high=np.array([30]))       # litres/sec
      }
      dict_space = Dict(spaces)
      self.observation_space = dict_space

    • @hanswurst9667
      @hanswurst9667 3 years ago +1

      @@NicholasRenotte Thanks so much for the great reply! You wouldn't happen to know any great resources/tutorials concerning multi-agent RL?

    • @NicholasRenotte
      @NicholasRenotte 3 years ago +1

      @@hanswurst9667 nothing that I've tested out already unfortunately :( Will let you know when I find something that works seamlessly!