RL 6: Policy iteration and value iteration - Reinforcement learning

  • Published: 22 Oct 2024

Comments • 66

  • @johnhostetter438
    @johnhostetter438 4 years ago +13

    This video definitely helped me understand policy iteration and value iteration! Been reading Sutton and Barto's Intro to RL, and was having trouble fully wrapping my head around the ideas talked about in chapter 4 for dynamic programming. After watching this I see it isn't as bad as I thought - thanks a lot!

  • @Theo-cn2cy
    @Theo-cn2cy 9 days ago

    Almost everything that I was confused about was explained really well. Explaining with examples was also very effective. Thank you!

  • @priyankakaswan2114
    @priyankakaswan2114 1 year ago +1

    This video was so good. I can't believe for 3 years I kept avoiding reading MDP papers and that Puterman book was so difficult for me to read. Thank you so much and the world needs people like you more.

  • @R34LxxDiAbLo
    @R34LxxDiAbLo 3 years ago +8

    "I hope you are doing absolutely fantastic in your life!" instantly clicked like. I'm not but I appreciate the sentiment!

  • @gyashasvi9748
    @gyashasvi9748 5 years ago +5

    Sir, I love your explanation. In the whole video you cover most of the important points that I had difficulty with. Thank you, sir, for your generous support.

  • @tungnguyendinh331
    @tungnguyendinh331 4 years ago +2

    Great explanation. I have been confused with value iteration and policy iteration for a long time. Thank you for making such a wonderful video

  • @purreshgoddard1625
    @purreshgoddard1625 4 years ago +2

    Man, I cannot believe that you have so few subscribers. With the kind of quality you provide, the count should be in the tens of thousands.

  • @AmberK296
    @AmberK296 4 years ago +2

    How clear your explanations are! Thank you, sir.

  • @홍성의-i2y
    @홍성의-i2y 3 years ago

    Bellman Equation: 6:58 of this video, Sutton's book Equation (3.14).
    Bellman Optimality Equation: 12:19 of this video, equations are in the textbook p.85 (pdf page).
    In the equation at 20:00, the term \sum_a \pi(a|s) disappears because the policy is deterministic. Note that p(s', r | s, a) turns into p(s', r | s, \pi(s)).
    22:07 of this video contains the summary of the algorithm. It is important to note that policy evaluation has a loop, but policy improvement is just one step.
    The Bellman equation plays its role in the policy evaluation step, and the Bellman optimality equation in the improvement step.
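
A minimal sketch of the loop structure summarised in the comment above: policy evaluation repeatedly applies the Bellman equation for the current (deterministic) policy until the values stop changing, while policy improvement is a single greedy step over the one-step look-ahead. The transition model P[s][a], returning (probability, next_state, reward) triples, is an assumed interface for illustration, not something defined in the video.

    import numpy as np

    def policy_iteration(P, n_states, n_actions, gamma=0.9, theta=1e-6):
        # P[s][a] is assumed to be a list of (prob, next_state, reward) triples.
        V = np.zeros(n_states)
        policy = np.zeros(n_states, dtype=int)       # deterministic policy pi(s)
        while True:
            # Policy evaluation: loop until V converges for the current policy.
            while True:
                delta = 0.0
                for s in range(n_states):
                    v_old = V[s]
                    # Bellman equation with p(s', r | s, pi(s)).
                    V[s] = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
                    delta = max(delta, abs(v_old - V[s]))
                if delta < theta:
                    break
            # Policy improvement: one greedy step (Bellman optimality backup).
            stable = True
            for s in range(n_states):
                old_action = policy[s]
                q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                     for a in range(n_actions)]
                policy[s] = int(np.argmax(q))
                if policy[s] != old_action:
                    stable = False
            if stable:
                return policy, V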

  • @SimplyBetter
    @SimplyBetter 2 years ago +1

    Great visual explanations. Thank you very much for such content. Looking forward to watching more videos on RL and Deep RL.

  • @heinb345
    @heinb345 5 years ago +1

    Hi Rituraj! Question: when applying the Bellman equation V(s) = max_a ( R(s,a) + γ Σ_{s'} T(s,a,s') V(s') ), for the V(s') at the end of the equation, do we only look at the next state, or do we recursively go back until we reach the beginning of the maze or MDP?

    • @aiinsights-riturajkaushik1618
      @aiinsights-riturajkaushik1618 5 years ago +1

      Hey, we look at the next state only. In every iteration of the algorithm we update the value of all the states using the old v(s') values.

    • @heinb345
      @heinb345 5 years ago +1

      @@aiinsights-riturajkaushik1618 Great, that helps a lot!
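
A small sketch of what the reply above describes: each sweep of value iteration updates every state from a frozen copy of the previous values, looking only one step ahead at the immediate successor states, never recursing back to the start of the maze. The transition model P[s][a] as a list of (probability, next_state, reward) triples is again an assumed interface.

    import numpy as np

    def value_iteration_sweep(V_old, P, n_states, n_actions, gamma=0.9):
        # V_new[s] = max_a sum_{s'} p(s' | s, a) * (r + gamma * V_old[s'])
        # Only the old values of the immediate next states are read.
        V_new = np.zeros(n_states)
        for s in range(n_states):
            V_new[s] = max(
                sum(p * (r + gamma * V_old[s2]) for p, s2, r in P[s][a])
                for a in range(n_actions)
            )
        return V_new

    # The sweep is repeated until the values stop changing, e.g.:
    #     V_new = value_iteration_sweep(V, P, n_states, n_actions)
    #     stop when np.max(np.abs(V_new - V)) < 1e-6, otherwise set V = V_new and repeat.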

  • @kritikshivanshu
    @kritikshivanshu 4 years ago +1

    Sir, one question, please answer it:
    Ques. Which of the following does policy iteration do?
    Options are:
    (A) produce random data
    (B) allocate benefits (add positive value)
    (C) allocate only rewards (positive rewards)
    (D) allocate both positive and negative rewards
    Note: please explain the answer.

  • @veronikasukhot6508
    @veronikasukhot6508 4 years ago +1

    Bravo! Short and clear information, very useful for exam preparation.

  • @machinevidhya8608
    @machinevidhya8608 4 years ago +1

    Thanks for the wonderful videos on RL. Please post any Git snippets if possible.

  • @ORagnar
    @ORagnar 2 years ago +1

    I think you do a good job of making things clear. This is pretty abstract stuff to teach.

  • @SASIKUMAR-nr7gc
    @SASIKUMAR-nr7gc 4 years ago +1

    Thank you very much for the clear explanation of policy and value iteration.

  • @bobhu5185
    @bobhu5185 3 years ago

    24:09 I don't know if I am getting it right, but I think near the end of the convergence step in the first loop, we have already found the best (or nearly best) action given state s. The reason is that to calculate the maximum of V(s), all actions have to be iterated and the values have to be compared. Is it still necessary to backtrack the best action again at the end using V*(s)?
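
One way to make the question above concrete: after value iteration converges, you can either cache the maximising action during the final sweep or recover it afterwards with a single extra argmax pass over the same one-step backup; both yield the same greedy policy, so the extraction step at the end is just the conventional way of writing it. A sketch of that extraction pass, with P[s][a] as the assumed (probability, next_state, reward) interface used earlier:

    def greedy_policy_from_values(V, P, n_states, n_actions, gamma=0.9):
        # One extra pass: pick the action whose one-step backup attains V*(s).
        policy = []
        for s in range(n_states):
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                 for a in range(n_actions)]
            policy.append(max(range(n_actions), key=lambda a: q[a]))
        return policy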

  • @miguelrodriguezmillan2322
    @miguelrodriguezmillan2322 4 years ago +1

    Awesome video, thank you so much!! It was very helpful for understanding it perfectly. One question: what software do you use to create the slides? Thank you in advance!

  • @testshade2554
    @testshade2554 4 years ago +1

    Very clear and intuitive explanation. Tysm.

  • @nikhilkumarjha
    @nikhilkumarjha 5 years ago +1

    A very informative video. Thanks for creating such helpful content.

  • @SamanviKhushi
    @SamanviKhushi 4 years ago

    Hi sir, thank you for this video. I have a question: how do I find the optimal policy
    and the corresponding state values when R(s) = -0.01?

  • @arkadipbasu828
    @arkadipbasu828 2 years ago

    Thank you for the concept

  • @yuyuko8925
    @yuyuko8925 2 years ago

    Why is there not a single video on this subject that actually works out an example with all the numbers /:
    There are many with examples for value iteration, but not a single one for policy iteration/evaluation.
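
For what it is worth, here is a toy numeric run of iterative policy evaluation, made up purely for illustration: two states A and B, a fixed deterministic policy (A moves to B with reward 0, B moves to A with reward 1), and gamma = 0.5. The Bellman equations for this policy are V(A) = 0 + 0.5*V(B) and V(B) = 1 + 0.5*V(A), with exact solution V(A) = 2/3 and V(B) = 4/3.

    gamma = 0.5
    V = {"A": 0.0, "B": 0.0}
    for sweep in range(20):
        V_old = dict(V)                   # synchronous update: read the old values
        V["A"] = 0.0 + gamma * V_old["B"]
        V["B"] = 1.0 + gamma * V_old["A"]
        print(sweep + 1, round(V["A"], 4), round(V["B"], 4))
    # The sweeps approach (0.6667, 1.3333), matching the exact solution.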

  • @sachinnegi2161
    @sachinnegi2161 4 years ago +2

    Sir, you have a very good future on YouTube... keep posting.

  • @rishabhkapoor1160
    @rishabhkapoor1160 4 years ago

    Why did you use V_old at 22:26? As per the Markov decision process, we are not relying on past values at all.

  • @dilhanichathurika6801
    @dilhanichathurika6801 5 years ago +1

    Could you upload some examples of using the two algorithms? It is hard to visualise them from just the equations. Maybe just show some of the iteration process. I'm having a hard time understanding from the equations alone, and my school lecture notes have not clearly defined it either. Hope you can show some examples real soon, thanks.

    • @aiinsights-riturajkaushik1618
      @aiinsights-riturajkaushik1618 5 years ago

      Yes. I am thinking about creating some example videos soon. Currently I am mostly focusing on finishing the background topics so that I can move towards more useful topics such as Q-learning, policy gradient methods, etc.

    • @dilhanichathurika6801
      @dilhanichathurika6801 5 years ago

      @@aiinsights-riturajkaushik1618 Sure thanks

    • @animeshgoyal9583
      @animeshgoyal9583 4 years ago

      @@aiinsights-riturajkaushik1618 Yes, it would be great if you can explain DQN and PPO as well in upcoming videos.

  • @BringMe_Back
    @BringMe_Back 2 years ago

    Thank you, well explained.

  • @emamulmursalin9181
    @emamulmursalin9181 4 years ago +1

    The beginning is quite good. If the presentation were a bit more engaging, it would be livelier to watch, and the subscriber count would grow too. Best wishes.

  • @muke5hy
    @muke5hy 5 years ago +1

    Really liked the videos, hope to get new videos soon.

  • @worldofgoblins
    @worldofgoblins 4 years ago

    Great explanation. Makes the equations easy to understand

  • @Mawnzie
    @Mawnzie 4 years ago +1

    Your videos are very nice! I think there's a typo in the q function at 9:47 (it says "A_t =s").

  • @geronimo_one
    @geronimo_one 3 years ago

    thank you sir!
    very much appreciated

  • @pragyakumari9141
    @pragyakumari9141 4 years ago +1

    Very good sir, thank you.

    • @aiinsights-riturajkaushik1618
      @aiinsights-riturajkaushik1618 4 years ago

      Thanks for the.....stay tuned. I'll bring interesting AI stuff soon with real world applications...and top AI lab visits...

  • @Mohammad-gj8ir
    @Mohammad-gj8ir 3 years ago

    Great..!

  • @mariamngah5
    @mariamngah5 5 years ago

    Thumbs up for such an awesome video. I have a question: can I solve a simple game where my action depends on the state, plus an external variable coming from the opponent?

    • @aiinsights-riturajkaushik1618
      @aiinsights-riturajkaushik1618 5 years ago

      Hey..thanks. If your external variable can take a discrete set of values then it must be included in the state. Everything should be in the state in this framework.

    • @mariamngah5
      @mariamngah5 5 years ago

      @@aiinsights-riturajkaushik1618 In that case my state space will be very big. Is there an upper limit on the number of states if I code it in Python?

    • @mariamngah5
      @mariamngah5 5 years ago

      @@aiinsights-riturajkaushik1618 I have a problem with 2000 states, each state consisting of 3 variables. Can I solve it with policy iteration?

    • @aiinsights-riturajkaushik1618
      @aiinsights-riturajkaushik1618 5 years ago +1

      2000 is not that big. You can do it.

    • @mariamngah5
      @mariamngah5 5 years ago

      @@aiinsights-riturajkaushik1618 thank you
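
A tiny illustration of the point made in the replies above, that any external variable has to live inside the state: with discrete components, the composite state space is just the cross product, and a tabular index over it can back the value and policy arrays. The component names and sizes here are hypothetical.

    from itertools import product

    my_positions = range(40)                        # hypothetical: 40 board positions
    opponent_moves = ["rock", "paper", "scissors"]  # hypothetical discrete external variable
    states = list(product(my_positions, opponent_moves))   # 120 composite states
    state_index = {s: i for i, s in enumerate(states)}     # index into tabular V / policy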

  • @jiazhenhu5959
    @jiazhenhu5959 3 years ago

    Why did you stop uploading videos? You provide such great explanations!

  • @djfl58mdlwqlf
    @djfl58mdlwqlf 4 years ago +1

    great vid

  • @shivampip
    @shivampip 5 years ago

    Awesome video, Thanks

  • @rajansharma9101
    @rajansharma9101 4 years ago +1

    I understand the whole concept but I can't understand all the equations... I think I should memorise all the equations for my theory exam 😀😀

    • @rituraj_finland
      @rituraj_finland 4 years ago +1

      I think I should have provided the slides...I'll try to do better from the next videos...

  • @ryans6423
    @ryans6423 4 years ago +1

    Can't read your handwriting

    • @aiinsights-riturajkaushik1618
      @aiinsights-riturajkaushik1618 4 years ago +2

      I agree! I'll try to make it clearer next time. The channel was not well planned initially and I did not have a good setup for writing on screens.