As someone studying in the field of neuromodulation, I find it quite hard to wrap my head around these tough topics. This course taught by MIT helps me better understand them. Thanks a lot!
Quite engaging and intellectually stimulating. Thanks, Alex, for the explicit analysis of the deep RL algorithms.
What a pedagogically terrific lecture. From the high-level explanations and visualizations to the subtle details and the state of the art. Happily watching!
Love the marriage of reinforcement learning with deep learning, and seeing a deep learning model explore and interact with its environment! Yet another masterpiece, Alexander. Thank you!
Thank you for such an amazing course! For the seemingly unintuitive optimal solution of the Atari game, I think it is due to human constraints: we may not be able to move fast and accurately enough to catch the ball when it speeds up once it breaks through the corners, which is not a problem at all for the computer. To verify this hypothesis, we could lower the accuracy of catching the ball and see how the optimal strategy changes.
Awesome video, Alexander. Reinforcement learning is yet another great and important step in deep learning. The range of applications it can be applied to is amazing! I really liked your explanation of the Q-values using Atari Breakout, it's so good! Thanks for this new class! I'm learning a lot with this course; it's surely complementing my studies in deep learning!
Thanks, Alexander, for this great lecture! I can't wait until next Friday for the next one!
In less than 60 minutes, it is very comprehensive and NOT boring.
The whole presentation is so clean and engaging. Thank you for the awesome introduction to RL🙂.
Thank you for the awesome introduction to RL
Thanks a lot for this lecture! You are a great instructor. Please keep uploading :)
Great intro video for reinforcement learning. Thank you so much!
Thanks a lot, Alexander!
Thanks, Alexander!!! Great learning resource.
Really excited!
I watch all of your videos, dear Mr. Amini, and they are excellent.
We would really appreciate it if you included more practical examples.
Thank you!
I haven't come across any other material on the topic that's as lucid as this. Thanks a lot for the exposition.
A natural question: is this field's advance going to depend on the development of simulators for real-world situations? Can you please provide some details on how you developed the simulator for autonomous vehicles?
Thanks! Definitely, there is a ton of amazing research on real-world robotics today. If you're interested in VISTA specifically, you can find all the technical details here: www.mit.edu/~amini/vista/ There you can also find the paper we published, which explains everything in greater detail.
What a great great great lecture!! Thank you for making this public!!
So much joy watching these videos. I feel like the lecturer has a great personality. Please keep uploading and making education equal for everyone.
This is much more engaging than Lex's lectures.
Thanks for this.
Really, thank you so much. My holidays are so much better and more interesting!
I think there's a mistake at 12:30: the formula for the return is not equal to the formula given on the next slide -- all the terms have been multiplied by gamma^t.
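For anyone comparing the two slides, here is the relationship the comment above is pointing at, written out. This is my reading of the slides (the exact indexing there may differ), assuming one expression discounts relative to the current step t and the other relative to step 0:

```latex
R_t \;=\; r_t + \gamma\, r_{t+1} + \gamma^2 r_{t+2} + \dots
    \;=\; \sum_{i=t}^{T} \gamma^{\,i-t}\, r_i,
\qquad
\gamma^t R_t \;=\; \sum_{i=t}^{T} \gamma^{\,i}\, r_i .
```

So the two expressions differ only by the constant factor gamma^t, which does not change which action looks best from the state at time t.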
Not even Meruem was able to beat the Go World Champion. Really amazing result.
Thanks for this great lecture
Please do PPO and DDPG ...
There aren't good lectures about them on YouTube.
Love this thanks
Thank you. Greetings from Paraguay.
Such a great lecture. Really enjoy it.
Awesome lecture, learned a lot about RL in just 57 minutes. Thanks for making it so effortless to learn such a difficult and complex topic.
How can we use deep neural networks to model the Q-function and learn it? Cool.
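In case a concrete picture helps: here is a minimal sketch of that idea in PyTorch, assuming a small state vector rather than raw Breakout frames (the layer sizes and dimensions are made up for illustration). The network maps a state to one Q-value per action, and acting greedily just means taking the argmax:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per possible action."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),  # Q(s, a) for every action a
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Greedy action selection: pick the action with the highest predicted Q-value.
q_net = QNetwork(state_dim=8, n_actions=3)
state = torch.randn(1, 8)               # a dummy state, just for illustration
action = q_net(state).argmax(dim=1)
print(action.item())
```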
Thank you.
Thanks for the awesome and super clean presentation. I have a question here. The discounted total reward definitions at 12:30 and 12:37 seem different to me. Is there anything I'm missing?
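A quick numeric sanity check of the same point raised a few comments above, as a sketch in Python (the reward values and gamma here are made up for illustration, not taken from the lecture):

```python
# Compare the two ways of writing the discounted return from step t.
gamma, t = 0.9, 3
rewards = [1.0, 0.5, 2.0, 0.0, 1.5, 3.0, 0.25]   # dummy rewards r_0 ... r_6

# Definition 1: discount measured relative to the current step t.
R_t = sum(gamma ** (i - t) * rewards[i] for i in range(t, len(rewards)))

# Definition 2: discount measured relative to step 0.
R_t_from_zero = sum(gamma ** i * rewards[i] for i in range(t, len(rewards)))

# The two differ by exactly the constant factor gamma**t, so the ranking of
# actions (and hence the greedy policy) is unchanged.
assert abs(R_t_from_zero - gamma ** t * R_t) < 1e-12
print(R_t, R_t_from_zero)
```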
This was very interesting. Thanks
Thanks for the great course! Is there a Slack channel, Discord channel, or community group where we can discuss the topics, ask questions, and link to resources we found helpful?
Great Lecture Alex! Thank you!
Awesome lecture!
Thanks a lot for this amazing lecture. As a beginner, it helped me a lot; very well explained.
Hello Professor Amini, I am an Italian robotics engineering student.
Sorry for my English.
I want to ask you some questions.
How much does MuZero know about the game? I think there is a supervisor that gives the AI a reward, but I don't fully understand it. For example, if MuZero tries to place 2 stones of Go on the board at once, the supervisor gives it a "penalty" and the policy gradient reduces the probability of that action.
But then how does it avoid diverging into "strange behaviors"?
I think there are two cases, and two questions about them:
1) The case of "suboptimal rules": this is the case where, for example, these actions sit one under the other in the search tree, so the nodes are in series with one another. MuZero stops travelling down that part of the action tree and so never again tries to increase the number of stones placed on the board; for example, it never tries to place 3 stones.
The first question is:
Suppose we had a variant of Go where you can place one stone or three stones on the board, but not two. With this type of implementation the AI would never go into that part of the tree and never take that type of action, especially if MuZero learns the rules by playing alone. Not even by playing against a human, because in that case MuZero would see the move of placing 3 stones but still not go into that part of the tree, since its policy had already been pushed away from it. How does MuZero solve this?
2) The case of "diverging into mad rules": in this case the actions (such as placing n stones) sit next to each other in the tree, so the nodes are in parallel, and MuZero could diverge into trying the n-th action of placing n stones.
The second question is:
How does MuZero solve this?
Excellent lecture, thank you.
But how do you calculate the target return in deep Q-learning? How do we find the best course of action without the agent? It would be nice if that was included as well.
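In case it helps: in standard DQN-style training (the usual recipe, not necessarily the exact variant in the lecture), nobody supplies the best course of action. The target bootstraps from the network's own estimate of the next state, taking the max over actions. A minimal sketch, reusing a Q-network like the one sketched a few comments above:

```python
import torch
import torch.nn.functional as F

def q_learning_target(reward, next_state, done, target_net, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q_target(s', a')."""
    with torch.no_grad():
        next_q = target_net(next_state).max(dim=1).values    # best value in the next state
        return reward + gamma * (1.0 - done) * next_q        # no bootstrapping past terminal states

def dqn_loss(q_net, target_net, state, action, reward, next_state, done, gamma=0.99):
    """Regress Q(s, a) for the action actually taken toward the bootstrapped target."""
    q_pred = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
    target = q_learning_target(reward, next_state, done, target_net, gamma)
    return F.mse_loss(q_pred, target)
```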
Thank you so much, very interesting lecture!
Thanks a lot for your videos.
Please, can you give us a code implementation?
The explanation is amazing!
Good content.
I loved the idea of training in a simulation; this could be great for aircraft, helicopters, and other things too :))
Lol!! Nope.
In the policy gradient formula, a higher reward will also result in a higher "loss", which deep learning algorithms typically try to minimize. Why minimize the higher rewards? Or is it a maximization of loss...
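The minus sign is the piece that resolves this: the usual policy-gradient loss is the negative of (log-probability times return), so minimizing the loss maximizes expected return. A minimal REINFORCE-style sketch (my own example values, not the lecture's):

```python
import torch

def policy_gradient_loss(log_probs, returns):
    """REINFORCE-style loss: L = -mean( log pi(a|s) * R ).

    Minimizing L pushes log pi(a|s) up more strongly for actions with larger
    return, so high rewards end up maximized, not minimized."""
    return -(log_probs * returns).mean()

# Two sampled actions: the second one earned a much larger return.
log_probs = torch.tensor([-0.5, -1.2], requires_grad=True)
returns = torch.tensor([0.1, 5.0])

loss = policy_gradient_loss(log_probs, returns)
loss.backward()
print(log_probs.grad)   # [-0.05, -2.5]: gradient descent raises the second log-prob the most
```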
This is a very good source of info, thank you! I'll continue taking all the following classes throughout the next month.
The approach to autonomous driving with reinforcement learning seems to be different from the one Tesla takes. There's no labeling at all.
I wonder if Tesla is also developing an RL model to be deployed in the future, since it seems more scalable and easier and faster to train?
Or, since I'm no expert, Tesla's approach could have some advantage long term too. Time will tell. And maybe Karpathy (or Elon) will enlighten us on this on AI Day.
44:40 In the loss function, in the case of high likelihood and small reward, the loss is small, so nothing would push down the likelihood of that small-reward action. Am I getting this right and it's a limitation, or am I not getting it right? In short, I'm thinking that the loss function works to increase the likelihood of big-reward actions but does not work to lower the likelihood of small-reward actions (edit) at least not directly.
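That reading is basically right for the vanilla form: if all returns are positive, every sampled action gets pushed up, just by different amounts, so low-reward actions only lose probability indirectly through normalization. A common remedy (not necessarily the one used in the lecture) is to subtract a baseline, so below-average returns get a negative weight and are pushed down directly. A sketch:

```python
import torch

def policy_gradient_loss_with_baseline(log_probs, returns):
    """REINFORCE with a simple baseline: weight each log-prob by (return - mean return).

    Actions that did worse than average get a negative weight, so minimizing
    this loss actively lowers their likelihood instead of only raising others."""
    advantages = returns - returns.mean()
    return -(log_probs * advantages.detach()).mean()
```

With the example values a few comments up, the small-return action would now get a negative advantage and its log-probability would be pushed down explicitly.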
My question is: is there a way to use AI to train a human to improve their life? And how could we help someone willing to work with an AI to improve their life, if that is possible? The AI would have to monitor what helps a person stay motivated and inspired to keep improving towards their goals, with an optimal environment and personal cues.
Thanks for this Amazing Lecture with Amazing Instructors provided for free! I will be donating $1M to MIT when I create my great AI Solution and become a billionaire
We will both work together for a solution
Can I join you haha?
We need Alexander to join our team! Big thanks for organizing this course; can't wait for the rest of the series.
You are one handsome nerd!
12:27 You're discounting incorrectly. Gamma should always be more than 1 and is computed this way:
Gamma = (1 + f)^t
where f is the discount factor and t is the nth period (or nth step, or nth state of the world).
Edit: forgot braces
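For anyone weighing the two conventions: I read the lecture's gamma as the RL convention, where the discount factor itself satisfies 0 <= gamma < 1 and multiplies future rewards, while the (1+f)^t form is the finance convention, where you divide by it. The two are consistent if gamma = 1/(1+f). A short note of that equivalence (my reading, not the lecture's wording):

```latex
\text{PV} \;=\; \sum_{t} \frac{r_t}{(1+f)^t} \;=\; \sum_{t} \gamma^{t} r_t
\quad\text{with}\quad \gamma \;=\; \frac{1}{1+f}, \qquad 0 < \gamma < 1 .
```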
You should have hired a Korean StarCraft player :)...