Q Learning simply explained | SARSA and Q-Learning Explanation
- Published: Feb 10, 2025
- This problem is from the book Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto. I found it to be a good way to introduce SARSA and Q-Learning. I am not an expert in reinforcement learning, but I find these kinds of ideas interesting. I thought it would be cool to explore reinforcement learning and make a video explaining a concept to the best of my ability. I will be making more videos about reinforcement learning in the future, and hopefully my explanations get better as time goes on.
Credits:
I used Manim for the animations.
All of the information on reinforcement learning came from the RL book by Sutton and Barto. I didn't explain the concepts well enough in the video to do the book justice. The book is very well written.
The environment is from OpenAI Gym.
GitHub:
github.com/mar...
Thanks for the intro video! I’m kinda trying to build an agent from scratch (without using any existing libraries), so I'm learning the fundamentals first.
Thanks for this introductory video. It helped me a lot.
Really nice approach to intuitively compare SARSA and Q-Learning, thanks!
Thanks for this video, helping me a lot with my uni work
this really boosted my understanding
I have written code that will absolutely trick and safeguard the AI so it never goes bad
Wow, great explanation
great work brother
Very interesting!
Nice explanation
Thanks
very well explained
thank you
Nice.
Hello, where could I find the code for that?
Hello. My github has the code under the "SARSA-and-Q_Learning" tab. Link to the github page is in the description.
What, so Q-learning tries to predict the future rewards?
does this mean it's not even using a neural network?
Q-Learning doesn't use neural networks. It's a table that the agent learns to fill in and then uses to solve a problem.
@manuelabarcacrespo8298 Is Q-learning also used to generate the training data for an NN?
This is a different kind of ML, reinforcement learning, where the problem is modeled as a Markov Decision Process.
This specific one doesn't use a neural network. We use NNs as learned models to predict (e.g.) the Q values of (s,a) pairs when the state space is so large that we can't get good estimates of Q(s, a) in the manner described in this video (because it would just take too long), or at other times we use them as the policies themselves. Look up stuff like Deep Q-Learning.
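For anyone building this from scratch, here is a minimal sketch of the "table" idea from the replies above, using only the Python standard library. It is not the code from the video's GitHub. The small cliff-style gridworld, the function names (`step`, `choose`, `train`, `greedy_path`) and the hyperparameters are all assumptions made for illustration. The only difference between the two methods is the target: SARSA bootstraps from the action it will actually take next, while Q-Learning bootstraps from the best action in the table.

```python
import random

# Hypothetical 4x12 cliff-style gridworld, assumed for illustration only.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def step(state, action):
    """Move one cell (walls clamp the move); return (next_state, reward, done)."""
    r = min(max(state[0] + ACTIONS[action][0], 0), ROWS - 1)
    c = min(max(state[1] + ACTIONS[action][1], 0), COLS - 1)
    if r == ROWS - 1 and 0 < c < COLS - 1:  # fell off the cliff: back to start
        return START, -100, False
    return (r, c), -1, (r, c) == GOAL


def choose(Q, state, epsilon, rng):
    """Epsilon-greedy action selection from the table."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = Q[state]
    return values.index(max(values))


def train(method, episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Fill in a Q table with method 'sarsa' or 'q_learning'."""
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state, done = START, False
        action = choose(Q, state, epsilon, rng)
        while not done:
            nxt, reward, done = step(state, action)
            # SARSA needs the next action before the update; for Q-Learning
            # it only decides what the agent does next, not the target.
            nxt_action = choose(Q, nxt, epsilon, rng)
            if done:
                target = reward
            elif method == "sarsa":
                target = reward + gamma * Q[nxt][nxt_action]  # on-policy
            else:
                target = reward + gamma * max(Q[nxt])         # off-policy
            Q[state][action] += alpha * (target - Q[state][action])
            state, action = nxt, nxt_action
    return Q


def greedy_path(Q, max_steps=100):
    """Follow the highest-valued action from START, with no exploration."""
    state, path = START, [START]
    for _ in range(max_steps):
        values = Q[state]
        state, _reward, done = step(state, values.index(max(values)))
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    for method in ("sarsa", "q_learning"):
        path = greedy_path(train(method))
        print(method, "greedy path length:", len(path) - 1)
```

The table here is just a dictionary from grid cell to a list of four numbers, one per action. That only works because there are 48 states; once the state space gets too large to visit every (s, a) pair many times, the table is what a neural network would replace, as in Deep Q-Learning.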