As someone studying in the field of neuromodulation, I find it quite hard to wrap my head around these tough topics. This course taught by MIT helps me better understand them. Thanks a lot!
Quite engaging and intellectually stimulating. Thanks, Alex, for the explicit analysis of the deep RL algorithms.
What a pedagogically terrific lecture. From the high-level explanations and visualizations to the subtle details and the state of the art. Happily watching!
Love the marriage of reinforcement learning with deep learning, and seeing a deep learning model explore and interact with its environment! Yet another masterpiece, Alexander. Thank you!
Thank you for such an amazing course! For the seemingly unintuitive optimal solution of the Atari game, I think it is due to human constraints: we may not be able to move fast and accurately enough to catch the ball when it speeds up once it breaks through the corners, which is not a problem at all for the computer. To verify this hypothesis, we could lower the accuracy of catching the ball and see how the optimal strategy changes.
Awesome video, Alexander. Reinforcement learning is yet another great and important step in deep learning. The range of applications it can be applied to is amazing! I really liked your explanation of the Q-values using Atari Breakout, it's so good! Thanks for this new class! I'm learning a lot with this course; it's surely complementing my studies in deep learning!
Thanks, Alexander, for this great lecture! I can't wait until next Friday for the next one!
In less than 60 minutes, it is very comprehensive and NOT boring.
The whole presentation is so clean and engaging. Thank you for the awesome introduction to RL🙂.
Thank you for the awesome introduction to RL
Thanks a lot for this lecture! You are a great instructor. Please keep uploading :)
Great intro video for reinforcement learning. Thank you so much!
Thanks a lot, Alexander!
Thanks, Alexander!!! Great learning resource.
Really excited!
I watch all of your videos, dear Mr. Amini, and they are excellent.
We would really appreciate it if you included more practical examples.
Thank you!
I haven't come across any other material on the topic that's as lucid as this. Thanks a lot for the exposition.
A natural question: is this field's advance going to depend on the development of simulators for real-world situations? Can you please provide some details on how you developed the simulator for autonomous vehicles?
Thanks! Definitely, there is a ton of amazing research on real-world robotics today. If you're interested in VISTA specifically, you can find all the technical details here: www.mit.edu/~amini/vista/ There you can also find the paper we published, which explains everything in greater detail.
What a great great great lecture!! Thank you for making this public!!
So much joy watching these videos. I feel like the lecturer has a great personality. Please keep uploading and making education equal for everyone.
This is much more engaging than Lex's lectures.
Thanks for this.
Really, thank you so much. My holidays are so much better and more interesting!
I think there's a mistake at 12:30: the formula for the return is not equal to the formula given on the next slide -- all the terms have been multiplied by gamma^t.
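For anyone comparing the two slides, here is the relationship the comment above is pointing at, written out. This is my reading of the slides (the exact indexing there may differ), assuming one expression discounts relative to the current step t and the other relative to step 0:

```latex
R_t \;=\; r_t + \gamma\, r_{t+1} + \gamma^2 r_{t+2} + \dots
    \;=\; \sum_{i=t}^{T} \gamma^{\,i-t}\, r_i,
\qquad
\gamma^t R_t \;=\; \sum_{i=t}^{T} \gamma^{\,i}\, r_i .
```

So the two expressions differ only by the constant factor gamma^t, which does not change which action looks best from the state at time t.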
Not even Meruem was able to beat the Go World Champion. Really amazing result.
Thanks for this great lecture
Please do PPO and DDPG ...
There aren't good lectures about them on YouTube.
Love this thanks
Thank you. Greetings from Paraguay.
Such a great lecture. Really enjoy it.
Awesome lecture, learned a lot about RL in just 57 minutes. Thanks for making it so effortless to learn such a difficult and complex topic.
How can we use deep neural networks to model the Q-function and learn it? Cool.
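In case a concrete picture helps: here is a minimal sketch of that idea in PyTorch, assuming a small state vector rather than raw Breakout frames (the layer sizes and dimensions are made up for illustration). The network maps a state to one Q-value per action, and acting greedily just means taking the argmax:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per possible action."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),  # Q(s, a) for every action a
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Greedy action selection: pick the action with the highest predicted Q-value.
q_net = QNetwork(state_dim=8, n_actions=3)
state = torch.randn(1, 8)               # a dummy state, just for illustration
action = q_net(state).argmax(dim=1)
print(action.item())
```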
Thank you.
Thanks for the awesome and super clean presentation. I have a question here. The discounted total reward definitions at 12:30 and 12:37 seem different to me. Is there anything I'm missing?
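A quick numeric sanity check of the same point raised a few comments above, as a sketch in Python (the reward values and gamma here are made up for illustration, not taken from the lecture):

```python
# Compare the two ways of writing the discounted return from step t.
gamma, t = 0.9, 3
rewards = [1.0, 0.5, 2.0, 0.0, 1.5, 3.0, 0.25]   # dummy rewards r_0 ... r_6

# Definition 1: discount measured relative to the current step t.
R_t = sum(gamma ** (i - t) * rewards[i] for i in range(t, len(rewards)))

# Definition 2: discount measured relative to step 0.
R_t_from_zero = sum(gamma ** i * rewards[i] for i in range(t, len(rewards)))

# The two differ by exactly the constant factor gamma**t, so the ranking of
# actions (and hence the greedy policy) is unchanged.
assert abs(R_t_from_zero - gamma ** t * R_t) < 1e-12
print(R_t, R_t_from_zero)
```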
This was very interesting. Thanks
Thanks for the great course! Is there a Slack channel, Discord channel, or community group where we can discuss the topics, ask questions, and link to resources we found helpful?
Great Lecture Alex! Thank you!
Awesome lecture!
Thanks a lot for this amazing lecture. As a beginner, it helped me a lot; very well explained.
Hello Professor Amini, I am an Italian robotics engineering student.
Sorry for my English.
I want to ask you some questions.
How much does MuZero know about the game? I think there is a supervisor that gives the AI a reward, but I don't fully understand it. For example, if MuZero tries to place 2 stones of Go on the board at once, the supervisor gives it a "penalty" and the policy gradient reduces the probability of that action.
But then how does it avoid diverging into "strange behaviors"?
I think there are two cases, and two questions about them:
1) The case of "suboptimal rules": this is the case where, for example, these actions sit one under the other in the search tree, so the nodes are in series with one another. MuZero stops travelling down that part of the action tree and so never again tries to increase the number of stones placed on the board; for example, it never tries to place 3 stones.
The first question is:
Suppose we had a variant of Go where you can place one stone or three stones on the board, but not two. With this type of implementation the AI would never go into that part of the tree and never take that type of action, especially if MuZero learns the rules by playing alone. Not even by playing against a human, because in that case MuZero would see the move of placing 3 stones but still not go into that part of the tree, since its policy had already been pushed away from it. How does MuZero solve this?
2) The case of "diverging into mad rules": in this case the actions (such as placing n stones) sit next to each other in the tree, so the nodes are in parallel, and MuZero could diverge into trying the n-th action of placing n stones.
The second question is:
How does MuZero solve this?
Excellent lecture, thank you.
But how do you calculate the target return in deep Q-learning? How do we find the best course of action without the agent? It would be nice if that was included as well.
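In case it helps: in standard DQN-style training (the usual recipe, not necessarily the exact variant in the lecture), nobody supplies the best course of action. The target bootstraps from the network's own estimate of the next state, taking the max over actions. A minimal sketch, reusing a Q-network like the one sketched a few comments above:

```python
import torch
import torch.nn.functional as F

def q_learning_target(reward, next_state, done, target_net, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q_target(s', a')."""
    with torch.no_grad():
        next_q = target_net(next_state).max(dim=1).values    # best value in the next state
        return reward + gamma * (1.0 - done) * next_q        # no bootstrapping past terminal states

def dqn_loss(q_net, target_net, state, action, reward, next_state, done, gamma=0.99):
    """Regress Q(s, a) for the action actually taken toward the bootstrapped target."""
    q_pred = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
    target = q_learning_target(reward, next_state, done, target_net, gamma)
    return F.mse_loss(q_pred, target)
```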
Thank you so much, very interesting lecture!
Thanks a lot for your videos.
Please, can you give us a code implementation?
The explanation is amazing!
Good content.
I loved the idea of training in a simulation; this could be great for aircraft, helicopters, and other things too :))
Lol!! Nope.
In the policy gradient formula, a higher reward will also result in a higher "loss", which deep learning algorithms typically try to minimize. Why minimize the higher rewards? Or is it a maximization of loss...
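The minus sign is the piece that resolves this: the usual policy-gradient loss is the negative of (log-probability times return), so minimizing the loss maximizes expected return. A minimal REINFORCE-style sketch (my own example values, not the lecture's):

```python
import torch

def policy_gradient_loss(log_probs, returns):
    """REINFORCE-style loss: L = -mean( log pi(a|s) * R ).

    Minimizing L pushes log pi(a|s) up more strongly for actions with larger
    return, so high rewards end up maximized, not minimized."""
    return -(log_probs * returns).mean()

# Two sampled actions: the second one earned a much larger return.
log_probs = torch.tensor([-0.5, -1.2], requires_grad=True)
returns = torch.tensor([0.1, 5.0])

loss = policy_gradient_loss(log_probs, returns)
loss.backward()
print(log_probs.grad)   # [-0.05, -2.5]: gradient descent raises the second log-prob the most
```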
This is a very good source of info, thank you! I'll continue taking all the following classes throughout the next month.
The approach to autonomous driving with reinforcement learning seems to be different from the one Tesla takes. There's no labeling at all.
I wonder if Tesla is also developing an RL model to be deployed in the future, since it seems more scalable and easier and faster to train?
Or, since I'm no expert, Tesla's approach could have some advantage long term too. Time will tell. And maybe Karpathy (or Elon) will enlighten us on this on AI Day.
44:40 In the loss function, in the case of high likelihood and small reward, the loss is small, so nothing would push down the likelihood of that small-reward action. Am I getting this right and it's a limitation, or am I not getting it right? In short, I'm thinking that the loss function works to increase the likelihood of big-reward actions but does not work to lower the likelihood of small-reward actions (edit) at least not directly.
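That reading is basically right for the vanilla form: if all returns are positive, every sampled action gets pushed up, just by different amounts, so low-reward actions only lose probability indirectly through normalization. A common remedy (not necessarily the one used in the lecture) is to subtract a baseline, so below-average returns get a negative weight and are pushed down directly. A sketch:

```python
import torch

def policy_gradient_loss_with_baseline(log_probs, returns):
    """REINFORCE with a simple baseline: weight each log-prob by (return - mean return).

    Actions that did worse than average get a negative weight, so minimizing
    this loss actively lowers their likelihood instead of only raising others."""
    advantages = returns - returns.mean()
    return -(log_probs * advantages.detach()).mean()
```

With the example values a few comments up, the small-return action would now get a negative advantage and its log-probability would be pushed down explicitly.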
My question is: is there a way to use AI to train a human to improve their life? And how could we help someone willing to work with an AI to improve their life, if that is possible? The AI would have to monitor what helps a person stay motivated and inspired to keep improving towards their goals, with an optimal environment and personal cues.
Thanks for this Amazing Lecture with Amazing Instructors provided for free! I will be donating $1M to MIT when I create my great AI Solution and become a billionaire
We will both work together for a solution
Can I join you haha?
We need Alexander to join our team! Big thanks for organizing this course; can't wait for the rest of the series.
You are one handsome nerd!
12:27 You're discounting incorrectly. Gamma should always be more than 1 and is computed this way:
Gamma = (1 + f)^t
where f is the discount factor and t is the nth period (or nth step, or nth state of the world).
Edit: forgot braces
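For anyone weighing the two conventions: I read the lecture's gamma as the RL convention, where the discount factor itself satisfies 0 <= gamma < 1 and multiplies future rewards, while the (1+f)^t form is the finance convention, where you divide by it. The two are consistent if gamma = 1/(1+f). A short note of that equivalence (my reading, not the lecture's wording):

```latex
\text{PV} \;=\; \sum_{t} \frac{r_t}{(1+f)^t} \;=\; \sum_{t} \gamma^{t} r_t
\quad\text{with}\quad \gamma \;=\; \frac{1}{1+f}, \qquad 0 < \gamma < 1 .
```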
You should have hired a Korean StarCraft player :)...