Deep RL Bootcamp Lecture 1: Motivation + Overview + Exact Solution Methods

  • Published: 16 Dec 2024

Comments • 45

  • @zenchiassassin283
    @zenchiassassin283 3 years ago +2

    Some timestamps:
    - Exercise 1: Effect of discount (factor/rate) and noise: at 32:41
    - Exercise 2: Policy evaluation with a stochastic policy: at 45:22
    - Policy Improvement Idea: 49:21 to 50:10 and 52:55 to 54:12
    - Infinite actions (exact methods barely ever work): at 54:25

  • @MsgrTeves
    @MsgrTeves 7 years ago +13

    This RL bootcamp is incredible.

  • @afrozenator
    @afrozenator 6 years ago +2

    Starts at 1:00

  • @sunderrajan6172
    @sunderrajan6172 7 years ago +5

    Great lecture. It would be even better if the questions asked were repeated for the recording.

  • @nathanfitzpatrick9953
    @nathanfitzpatrick9953 4 years ago

    This guy is not messing around.

  • @johnhart1790
    @johnhart1790 6 years ago +3

    Great lecture. At 44:04 shouldn't the s in V^(pi)_(k-1) (s) be s'?

    • @coolmig
      @coolmig 5 years ago

      I wonder the same.. ^_^

    • @emamulmursalin9181
      @emamulmursalin9181 4 years ago

      Yes. The prime on the "s" is missing.

    • @bobsmithy3103
      @bobsmithy3103 3 years ago

      Yes, as it's the discounted value of the next/future state.

  • @shubhanshawasthi4319
    @shubhanshawasthi4319 5 years ago

    At 45:13, in the update equations (the last two on that slide), shouldn't s' be in place of s in gamma * V^pi_(k-1)(s) and gamma * V^pi(s)?
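
    A minimal sketch of the backup the two comments above are describing, assuming a small tabular MDP with known arrays P[s, a, s'] and R[s, a, s']; the variable names are illustrative, not from the slides. The point being made is that the discounted, bootstrapped term is evaluated at the next state s':

    ```python
    import numpy as np

    def policy_evaluation_sweeps(P, R, policy, gamma=0.9, n_iters=100):
        """Iterative policy evaluation for a tabular MDP.

        P[s, a, s2] -- transition probabilities, R[s, a, s2] -- rewards,
        policy[s]   -- deterministic action taken in state s.
        """
        n_states = P.shape[0]
        V = np.zeros(n_states)
        for _ in range(n_iters):
            V_new = np.zeros(n_states)
            for s in range(n_states):
                a = policy[s]
                # Bellman expectation backup: the discounted value is V[s2],
                # the value of the *next* state, not V[s].
                V_new[s] = sum(P[s, a, s2] * (R[s, a, s2] + gamma * V[s2])
                               for s2 in range(n_states))
            V = V_new
        return V
    ```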

  • @kleemc
    @kleemc 7 years ago +28

    Great lecture. It would be better if the questions were repeated; we can only guess what they are.

    • @JadtheProdigy
      @JadtheProdigy 7 years ago +6

      I never thought a UFC fighter would be watching this. Props, bro.

  • @elzilcho222
    @elzilcho222 6 years ago +2

    At 20:50, isn't the V*(3,3) supposed to be V*(2,3)?

    • @SayanGHD
      @SayanGHD 6 years ago +2

      Juna No, if you hit the wall, you stay in that same state.

    • @HM-wn9on
      @HM-wn9on 4 years ago +1

      @@SayanGHD I can't understand why there are only three possible moves, with going west to (2,3) left out, and why the probabilities of going north and south are 0.1 each.

    • @JensOO7
      @JensOO7 3 years ago

      I think Juna made a good point, since it seems more natural to include (2,3) and (3,2) as possible next states rather than to consider walking into a wall while leaving out the move to (2,3).
      But still, I am not certain about it.
      EDIT: At 19:54 he explains it: an 80% chance of going where you intended, and 10% each of slipping to the right or left of that direction. So the robot never goes backwards, and bumping into the walls as explained seems right.
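
      To make that noise model concrete, here is a small sketch of the transition described at 19:54, assuming the usual 4x3 grid with a blocked cell at (2,2) (the layout and helper names are assumptions for illustration):

      ```python
      # Noisy gridworld move: 80% intended direction, 10% each perpendicular
      # direction, never backwards; bumping into a wall means staying put.
      DIRS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
      PERP = {"N": ("E", "W"), "S": ("E", "W"), "E": ("N", "S"), "W": ("N", "S")}

      def transition_probs(state, action, width=4, height=3, blocked=frozenset({(2, 2)})):
          """Return {next_state: probability} for one noisy gridworld move."""
          def move(s, d):
              dx, dy = DIRS[d]
              nxt = (s[0] + dx, s[1] + dy)
              # Off the grid or into the blocked cell: bump and stay put.
              if not (1 <= nxt[0] <= width and 1 <= nxt[1] <= height) or nxt in blocked:
                  return s
              return nxt

          probs = {}
          for d, p in [(action, 0.8)] + [(slip, 0.1) for slip in PERP[action]]:
              nxt = move(state, d)
              probs[nxt] = probs.get(nxt, 0.0) + p
          return probs

      # From (3,3), trying to go East: slipping North hits the wall, so 0.1 of
      # the probability mass stays on (3,3) itself -- which is why V*(3,3)
      # appears in its own backup instead of V*(2,3).
      print(transition_probs((3, 3), "E"))  # {(4, 3): 0.8, (3, 3): 0.1, (3, 2): 0.1}
      ```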

  • @babamam1025
    @babamam1025 7 years ago +2

    Awesome lectures! Does anyone know where to download the slides?

  • @waleedalzamil2228
    @waleedalzamil2228 6 months ago

    How can I get the slides for this awesome bootcamp? I am still a student and have been studying RL for a while; having the slides would help me refer back to them whenever I forget something.

  • @gaaligadu148
    @gaaligadu148 5 years ago +1

    Does anyone know if there are transcripts for these lectures? I especially can't hear the students' questions.

  • @mingsumsze6026
    @mingsumsze6026 1 year ago

    Thank you for the lecture. But I don't get how the evaluation of V in policy iteration can be solved as a linear system of equations. It looks like the unknowns (i.e. V) are on both sides of the equation, so the equations seem nonlinear.
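
    For what it is worth, the system is in fact linear: V^pi appears on both sides, but only to the first power, so the Bellman equations rearrange to (I - gamma * P^pi) V^pi = R^pi and can be solved in one shot. A minimal sketch, assuming tabular arrays P[s, a, s'] and R[s, a, s'] with illustrative names:

    ```python
    import numpy as np

    def policy_evaluation_exact(P, R, policy, gamma=0.9):
        """Solve for V^pi exactly as a linear system.

        P[s, a, s2] -- transition probabilities, R[s, a, s2] -- rewards,
        policy[s]   -- deterministic action taken in state s.
        """
        n = P.shape[0]
        idx = np.arange(n)
        P_pi = P[idx, policy]                         # P_pi[s, s2] = P(s2 | s, pi(s))
        R_pi = np.sum(P_pi * R[idx, policy], axis=1)  # expected one-step reward
        # V = R_pi + gamma * P_pi @ V is linear in V, so rearrange to
        # (I - gamma * P_pi) @ V = R_pi and solve directly.
        return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)
    ```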

  • @chaucao9725
    @chaucao9725 6 years ago +1

    53:30 poliception

  • @roboticsresources9680
    @roboticsresources9680 6 years ago +1

    Best lecture in Deep Reinforcement learning

    • @bajdoub
      @bajdoub 6 years ago +7

      Except there is no Reinforcement Learning in this lecture, only solving a Markov Decision Process for the optimal policy by value/policy iteration. So no Reinforcement Learning, and certainly no Deep Reinforcement Learning. Reinforcement Learning is an approach to solving an MDP without knowing the model; here the model is known.

  • @rajeev1071
    @rajeev1071 5 years ago +2

    Some more typos in various places: in the equation for policy iteration, the last term should contain s' and not s.

    • @bafrot
      @bafrot 5 years ago

      exactly

  • @marloncajamarca2793
    @marloncajamarca2793 6 years ago +1

    Awesome lecture!!!

  • @wuzhai2009
    @wuzhai2009 6 years ago +1

    Outstanding lecture. Very comparable to David Silver's lectures.

    • @volodscoi
      @volodscoi 5 years ago

      Which one would you recommend? This Bootcamp playlist or David Silver's lectures?
      Thank you in advance!

    • @bafrot
      @bafrot 5 years ago

      @@volodscoi See this first, then go to David Silver's.

  • @miyashitahikaru1952
    @miyashitahikaru1952 7 years ago

    Awesome lecture

  • @XinHeng
    @XinHeng 7 years ago +1

    This is an excellent lecture

  • @ProfessionalTycoons
    @ProfessionalTycoons 6 years ago

    great video.

  • @ethanjyx
    @ethanjyx 5 years ago +1

    Very well taught lecture!

  • @AndrewJongOnline
    @AndrewJongOnline 5 years ago

    Could you put this series in a YouTube playlist, please?

    • @MyBlenderDay
      @MyBlenderDay 5 years ago +1

      Here is the summary: sites.google.com/view/deep-rl-bootcamp/lectures

  • @bofeng6910
    @bofeng6910 5 years ago

    Great lecture +1

  • @nicolorubattu9816
    @nicolorubattu9816 4 years ago

    24:41

  • @phol5082
    @phol5082 20 days ago

    exercise 1: 4123

  • @muratcan__22
    @muratcan__22 5 years ago

    nice lecture

  • @HangyeolKim-b3m
    @HangyeolKim-b3m 4 years ago

    He talks so damn fast, seriously.

  • @Seff2
    @Seff2 5 years ago +3

    Bad lesson... So many formulas with no hints about what the terms mean. From the point "Policy Evaluation" onward I understood nothing. Before that I could follow, because the graphs gave some sense of what it's even about. But I don't even know what a policy is, and suddenly there are no graphs, just bare formulas and unexplained terminology. It started okay but ended up confusing.

  • @purelogic4533
    @purelogic4533 6 years ago

    Poor motivation in this lecture. The idea behind value iteration is itself a look back from achieving a goal: the look-back is simply a step taken along an episodic path to determine which actions are best taken to achieve the goal one step back from the termination point. That gives rise to value iteration, since the value is determined iteratively over the many steps taken to carve out the optimal path.
    Nevertheless, a superb introduction!
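
    To connect that description to the algorithm itself, here is a minimal tabular value-iteration sketch, assuming the same illustrative P[s, a, s'] and R[s, a, s'] arrays as in the sketches above:

    ```python
    import numpy as np

    def value_iteration(P, R, gamma=0.9, tol=1e-8):
        """Tabular value iteration: each sweep backs values up one more step
        from the goal, the 'look back' described in the comment above."""
        n_states, n_actions, _ = P.shape
        V = np.zeros(n_states)
        while True:
            # Q[s, a] = sum_{s2} P(s2 | s, a) * (R(s, a, s2) + gamma * V(s2))
            Q = np.einsum("ijk,ijk->ij", P, R + gamma * V[None, None, :])
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)  # optimal values and a greedy policy
            V = V_new
    ```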