MIT 6.S191 (2019): Deep Reinforcement Learning

  • Published: 27 Dec 2024

Comments • 38

  • @kimzauto5045
    @kimzauto5045 5 years ago +21

    Clearly one of the best videos on the topic, the use of examples was really good.

  • @mmattb
    @mmattb 3 years ago +3

    Unusually clear presentation; well done Alexander.

  • @vinodpareek2268
    @vinodpareek2268 4 years ago +3

    You are one of the best teachers I have ever seen.

  • @Lezmonify
    @Lezmonify 5 years ago +2

    Is there a typo at 10:01? Intuitively, it seems like the exponent of γ should be (i - t), since, in the current formulation, the reward terms quickly go to 0 as t becomes large.

    • @brycejianchen2795
      @brycejianchen2795 5 years ago

      Yes, I think the coefficient of r[t] should be gamma^0, which is one here.
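
The corrected discounting can be checked numerically. A minimal sketch in plain Python, assuming the (i - t) exponent discussed above (the function name is mine):

```python
def discounted_return(rewards, gamma=0.9, t=0):
    """R_t = sum over i >= t of gamma**(i - t) * rewards[i].

    With exponent (i - t), the current reward rewards[t] is weighted
    by gamma**0 == 1, and later rewards decay geometrically.
    """
    return sum(gamma ** (i - t) * r for i, r in enumerate(rewards) if i >= t)
```

For example, with gamma = 0.5 and rewards [1, 1, 1], the return from t = 0 is 1 + 0.5 + 0.25 = 1.75.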

  • @shambles7409
    @shambles7409 3 years ago

    At 34:35, how do I calculate the log-likelihood of the action given the state?
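
One common reading of this: the policy network outputs one logit per action for the given state, and log π(a|s) is the log-softmax of those logits. A hedged sketch in plain Python (the function name is mine):

```python
import math

def log_likelihood(logits, action):
    """log pi(a | s) for a softmax policy.

    `logits` are the policy network's raw outputs for state s, one per
    action; the log-sum-exp trick keeps the computation numerically stable.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[action] - log_z
```

Exponentiating the result over all actions recovers a proper probability distribution.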

  • @AK47dev1_YT
    @AK47dev1_YT 2 years ago

    It was fantastic, Mr. Amini.

  • @Inviaz
    @Inviaz 5 years ago +2

    What is max Q(s', a')? When I have many future states and they are unknown, how can I determine max Q(s', a')? 24:00

    • @tracev9381
      @tracev9381 5 years ago +3

      Sample your network again with the new state.

    • @sarthaksg
      @sarthaksg 5 years ago

      Here, s' is the next upcoming state and a' is the next action. Max Q(s',a') would be the max of all the Q values for the next action and state. In that equation, the left term is an estimated "actual" value of the future reward which is the sum of the current reward and the reward of the next best action.

    • @imadeddineibrahimbekkouch11
      @imadeddineibrahimbekkouch11 5 years ago +2

      If your state space is uncountable or continuous, don't use Q models.
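
To make the replies above concrete: the Q-learning target for Q(s, a) is r + γ · max over a' of Q(s', a'), where the Q-values for s' come from running the same network again on the next state. A minimal sketch, assuming a discrete action space (the function name is mine):

```python
def q_target(reward, next_q_values, gamma=0.99, done=False):
    """Bellman target: r + gamma * max_a' Q(s', a').

    `next_q_values` holds one Q-value per action for the next state s',
    obtained by re-running the Q-network on s'; at a terminal state
    there is no future term, so the target is just the reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

This also shows why continuous action spaces are awkward for plain Q-learning, as the last reply notes: the max over a' is no longer a cheap scan over a finite list of network outputs.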

  • @nosachamos
    @nosachamos 3 years ago +2

    Fantastic, very clear and concise. Great work!

  • @samgears3937
    @samgears3937 4 years ago +1

    At 36:02, does anyone know what theta is? Is it a policy?

    • @AAmini
      @AAmini  4 years ago +1

      Theta represents all of the weights of the neural network policy (pi), which is a network that takes as input the state (s_t) and outputs the likelihood of taking each action (a_t).
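
The reply above can be sketched concretely: theta is the set of trainable weights that maps a state to action probabilities. A toy one-layer version in plain Python, assuming a linear-softmax policy (all names are mine; a real network would have more layers):

```python
import math

def policy(theta, state):
    """pi_theta(a | s): action probabilities from a linear-softmax policy.

    `theta` is a list of weight rows, one row per action; every entry
    of theta is a trainable weight of the policy network.
    """
    logits = [sum(w * x for w, x in zip(row, state)) for row in theta]
    m = max(logits)                      # subtract max to stabilize softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

Policy-gradient training then adjusts the entries of theta to raise the probability of actions that led to high return.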

  • @mehwishqazi4381
    @mehwishqazi4381 5 years ago

    Very well explained. How can I get the slides? The link in the bio says "coming soon"!

  • @r00t67
    @r00t67 5 years ago

    Very good lecture. Just one note: I didn't understand how the policy is created (maybe Alexander showed it with the laser pointer, but it isn't visible in the slides).

  • @romesh58
    @romesh58 3 years ago +1

    Great video. The whole series is very good

  • @ahmarhussain8720
    @ahmarhussain8720 2 years ago

    very good way of explaining

  • @davidsasu8251
    @davidsasu8251 2 years ago +1

    I love you guys!

  • @ycnim34
    @ycnim34 5 years ago +2

    Thank you all for these great videos. One thing I want to mention: the audio volume is a little too low.

  • @scottterry2606
    @scottterry2606 3 years ago +1

    Outstanding. Thank you.

  • @hhumar987
    @hhumar987 4 years ago +2

    Can you also teach how to write code for it?

  • @hullopes
    @hullopes 4 years ago +1

    It was very clear and helpful.

  • @SHUBHAMKUMAR-xe4is
    @SHUBHAMKUMAR-xe4is 5 years ago +2

    Amazing video... kind of reinforcement learning in a nutshell.

  • @vincentkaruri2393
    @vincentkaruri2393 4 years ago +1

    This is really good. Thank you!

  • @malekbaba7672
    @malekbaba7672 5 years ago +2

    Thank you so much guys.

  • @niazmorshedulhaque4519
    @niazmorshedulhaque4519 4 years ago

    Excellent tutorial indeed

  • @hanimahdi7244
    @hanimahdi7244 3 years ago +1

    Thank you!

  • @harrypotter1155
    @harrypotter1155 5 years ago +3

    What a really nice course!

  • @chicagogirl9862
    @chicagogirl9862 5 years ago +1

    Good course, thank you

  • @SphereofTime
    @SphereofTime 9 months ago

    7:00

  • @waqasaps
    @waqasaps 3 years ago

    wow, thanks.

  • @sitrakaforler8696
    @sitrakaforler8696 1 year ago

    Dude it's awesome T^T

  • @muhammadnajamulislam2823
    @muhammadnajamulislam2823 5 years ago +2

    Please increase the sound level.

  • @davidj1395
    @davidj1395 3 years ago +1

    WHAT?!

  • @canelbuino7087
    @canelbuino7087 3 years ago

    This is why Terminator is so unrealistic... The AI would learn not to miss a shot within 20 minutes.