Deep RL Bootcamp Lecture 4B Policy Gradients Revisited

  • Published: 16 Dec 2024

Comments • 35

  • @ayushthada9544
    @ayushthada9544 6 years ago +41

    He is the only guy for whom I have to watch a YouTube video at normal speed. The best thing about Andrej is that he gives a new perspective on concepts you already know very well.

  • @TheAIEpiphany
    @TheAIEpiphany 3 years ago +13

    Amazing lecture, thank you, Andrej! (I'm 4 years late but still hahaha).
    00:00 Intro
    01:05 Pong example high-level explanation (from his blog "Pong from pixels")
    11:10 Q&A
    19:13 Pong example detailed walk-through
    33:10 Q&A
    Some notes:
    * He calls the input to the sigmoid "logp" when it should actually be a logit, not a log probability.
    * The numbers on his chart before the positive reward are +0.27, +0.24, etc., but they should be +0.9, +0.81 instead (if gamma == 0.9); see the sketch after these notes.
    * Some minor typos, like "backpro prelu" instead of "backprop relu" xD, etc.
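
    For reference, a minimal sketch (not from the lecture itself) of the discounted-return bookkeeping that note refers to, in the spirit of the discount_rewards helper from the "Pong from Pixels" blog post; gamma = 0.9 and the reward layout are illustrative:

    ```python
    import numpy as np

    def discount_rewards(r, gamma=0.9):
        """Walk backwards through the episode, accumulating the gamma-discounted return.
        A nonzero reward (a point scored in Pong) resets the running sum."""
        discounted = np.zeros_like(r, dtype=np.float64)
        running = 0.0
        for t in reversed(range(len(r))):
            if r[t] != 0:                    # game boundary in Pong: reset the running sum
                running = 0.0
            running = running * gamma + r[t]
            discounted[t] = running
        return discounted

    # The steps right before a +1 reward get ..., +0.81, +0.9, +1.0 with gamma = 0.9.
    print(discount_rewards(np.array([0.0, 0.0, 0.0, 1.0])))  # -> [0.729 0.81 0.9 1.]
    ```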

  • @alexanderli9067
    @alexanderli9067 6 years ago +50

    Karpathy somehow manages to talk at 5x the speed of Mnih

  • @panchajanya91
    @panchajanya91 7 months ago

    Andrej is a superstar in the field. It's always a pleasure listening to Andrej.

  • @omegzable
    @omegzable 7 years ago +20

    This lecture is Gold

  • @PaAGadirajuSanjayVarma
    @PaAGadirajuSanjayVarma 4 years ago +3

    Andrej Karpathy is so humble that I want to be like him one day. Thanks for the great lecture, Andrej.

  • @adibhide9019
    @adibhide9019 5 years ago +5

    And I used to think that I had understood Policy Gradients. Nice lecture!

  • @mrbom3977
    @mrbom3977 6 years ago +4

    Best interpretation of Policy Gradients

  • @st3ppenwolf
    @st3ppenwolf 1 year ago +1

    Andrej should make money rolling out his own Khan Academy-like business; his explanations are just brilliant!

    • @rafaelsouza4575
      @rafaelsouza4575 1 year ago

      Totally, just an incredible piece-by-piece explanation.

  • @dubeya01
    @dubeya01 5 years ago +2

    What a lecture... blown away!

  • @hazemahmed8333
    @hazemahmed8333 4 years ago +3

    This is pure gold, thank you so much!!

  • @brishtiteveja
    @brishtiteveja 1 year ago

    A policy gradient explanation couldn't be simpler than this.

  • @naeemajilforoushan5784
    @naeemajilforoushan5784 7 months ago

    Great lecture, and he speaks very well.

  • @sezan92
    @sezan92 6 years ago +7

    I may be dumb, but how is np.dot(W2, h) a log probability? It can be seen at 2:59. (See also the sketch at the end of this thread.)

    • @arkoraa
      @arkoraa 6 years ago +7

      I think it's meant to be logit_p, suggesting that it's a quantity that is going to be mapped to a probability by the sigmoid function. It's certainly confusing to name a variable like this. He even wrote the comment wrong, as he briefly mentions at 3:09.

    • @stefanbschneider
      @stefanbschneider 4 years ago

      The input to the sigmoid function is often called a "logit". See Hands-On ML, 2nd edition, page 144.
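
      To make the point concrete, here is a rough sketch of the forward pass under discussion; the W1/W2 names and the 200-unit, 80x80-pixel shapes follow the lecture code, the rest is illustrative. np.dot(W2, h) is a raw score, i.e. a logit, and only becomes a probability after the sigmoid.

      ```python
      import numpy as np

      def sigmoid(x):
          return 1.0 / (1.0 + np.exp(-x))

      def policy_forward(x, W1, W2):
          """Two-layer policy net: ReLU hidden layer, then a single logit
          that the sigmoid maps to P(action = UP)."""
          h = np.dot(W1, x)
          h[h < 0] = 0                # ReLU nonlinearity
          logit = np.dot(W2, h)       # called "logp" on the slide, but it is a logit
          p_up = sigmoid(logit)       # probability of moving the paddle up
          return p_up, h

      # Illustrative shapes: 200 hidden units, 80*80 = 6400 input pixels.
      rng = np.random.default_rng(0)
      W1 = rng.standard_normal((200, 6400)) / np.sqrt(6400)
      W2 = rng.standard_normal(200) / np.sqrt(200)
      p_up, _ = policy_forward(rng.random(6400), W1, W2)
      ```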

  • @sibyjoseplathottam4828
    @sibyjoseplathottam4828 4 years ago +4

    I wish there were two Karpathys, one for Computer Vision and another for RL.

  • @rafaelsouza4575
    @rafaelsouza4575 1 year ago

    Maybe adding a moving average of the last predictions could make the moves smoother and less jittery (a rough sketch below).
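
    A rough sketch of that suggestion (not something from the lecture): exponentially smooth the policy's "move up" probability before acting on it, which damps jittery paddle moves. The smoothing factor beta is an illustrative choice.

    ```python
    import numpy as np

    def smooth_probs(p_history, beta=0.7):
        """Exponential moving average over a sequence of predicted probabilities."""
        smoothed = float(p_history[0])
        out = []
        for p in p_history:
            smoothed = beta * smoothed + (1 - beta) * p   # EMA update
            out.append(smoothed)
        return np.array(out)

    print(smooth_probs([0.9, 0.1, 0.8, 0.2]))  # noisy raw probabilities -> smoother sequence
    ```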

  • @florentinrieger5306
    @florentinrieger5306 2 years ago

    Ah, and great lecture! :)

  • @ryanmckenna2047
    @ryanmckenna2047 10 months ago

    Why is the learning rate -alpha in the RMSProp formula, but +alpha in Andrej's code?
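
    One reading of the sign difference: textbook RMSProp subtracts the step because it minimizes a loss, whereas the pong code accumulates the gradient of expected reward and therefore takes a gradient-ascent step, adding it. A minimal sketch assuming that convention (parameter names are illustrative):

    ```python
    import numpy as np

    def rmsprop_ascent_step(param, grad, cache, lr=1e-3, decay=0.99, eps=1e-5):
        """One RMSProp step used for gradient ASCENT on expected reward."""
        cache = decay * cache + (1 - decay) * grad**2         # running mean of squared grads
        param = param + lr * grad / (np.sqrt(cache) + eps)    # '+' = ascent, not descent
        return param, cache
    ```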

  • @JohnGFisher
    @JohnGFisher 6 years ago

    This is fantastic.

  • @aryan_kode
    @aryan_kode 4 years ago +1

    Just found out that he is the Director of AI at Tesla.

  • @pauldacus4590
    @pauldacus4590 2 years ago +1

    Andrej simultaneously seems to want to answer questions and to stop answering them as soon as possible.

  • @ProfessionalTycoons
    @ProfessionalTycoons 6 years ago +4

    Great talk if played at 0.75x speed.

  • @antonio.7557
    @antonio.7557 5 years ago

    How does it work for a continuous action space? For example, if the steering angle is 23 degrees and it's a successful episode, then there is no error between 23 (label) and 23 (output), and therefore there's no gradient. (See the sketch at the end of this thread.)

    • @antonio.7557
      @antonio.7557 5 years ago

      In a discrete one you can minimize the error between 0.7 (probability of up) and 1 (up), I get that. But if it's a continuous action space, I don't see how this works. Do you have to discretize, so that instead of one steering angle you have a probability value for each of the 360 degree angles?
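
      Not part of this Pong example, but the usual answer: for a continuous action the network outputs the parameters of a distribution (e.g. the mean of a Gaussian over the steering angle), the executed action is sampled from it, and REINFORCE uses the gradient of the log-probability of that sample, so the mean is pulled toward actions that led to high return; no discretization into 360 bins is needed. A minimal sketch with illustrative numbers:

      ```python
      import numpy as np

      def gaussian_policy_grad(mu, sigma, action, advantage):
          """Gradient of log N(action | mu, sigma) w.r.t. the mean mu, scaled by the advantage."""
          return advantage * (action - mu) / sigma**2

      rng = np.random.default_rng(0)
      mu, sigma = 23.0, 5.0             # network output: mean angle and exploration noise
      action = rng.normal(mu, sigma)    # sampled action that is actually executed
      advantage = 1.0                   # e.g. +1 for a successful episode
      grad_mu = gaussian_policy_grad(mu, sigma, action, advantage)
      # mu is then nudged a small step in the direction of grad_mu (gradient ascent).
      ```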

  • @nanthakr8378
    @nanthakr8378 4 years ago

    Can somebody give me the link to the code?

  • @soylentpink7845
    @soylentpink7845 10 months ago

    Sheldon :D

  • @dr.mikeybee
    @dr.mikeybee 10 months ago

    Of course, using an abstract representation rather than raw pixels would be helpful. Pong can be looked at as a simple Newtonian physics environment.

  • @Metalwrath2
    @Metalwrath2 5 years ago +1

    Pepega

    • @Nickben89
      @Nickben89 5 years ago +1

      Pepega indeed my friend...