Markov Decision Processes 1 - Value Iteration | Stanford CS221: AI (Autumn 2019)

  • Published: Jan 11, 2025

Comments • 139

  • @nsubugakasozi7101
    @nsubugakasozi7101 8 months ago +12

    This lecturer is world class...and this is also the most confident live coding I have seen in a while...she is really really good. Universities are made by the lecturers...not so much the name

  • @foufayyy
    @foufayyy 2 years ago +30

    thank you for posting this. MDPs were really confusing and this lecture really helped me understand it clearly.

    • @-isotope_k
      @-isotope_k 2 years ago

      Yes, this is a very, very confusing topic

  • @pirouzaan
    @pirouzaan 1 year ago +3

    this was by far the most impressive lecture with live coding that I had seen! I am leaving this virtual lecture room with awe and respect...

  • @dawn-of-newday
    @dawn-of-newday 2 years ago +9

    I want to say I appreciate this lecture; it's good. I had a difficult time and a mental block with this topic. Thanks for all your efforts.

  • @meharjeetsingh5256
    @meharjeetsingh5256 1 year ago +2

    This teacher is really, really good. I wish you were at my uni so that I could enjoy machine learning

  • @iiilllii140
    @iiilllii140 2 years ago +5

    Thank you for this lecture and the course order. The past lectures about search problems really help you to better understand MDPs.

  • @WojciechBrzoska-w2s
    @WojciechBrzoska-w2s 1 year ago +1

    It was my n-th iteration of MDPs (where n > 10), but, to use MDP terminology, my knowledge finally started to converge in the proper direction. Thank you for the lecture🙂

  • @vishalsunkapaka7247
    @vishalsunkapaka7247 2 years ago

    professor is so talented can’t say anything just feared over her, can’t take anymore

  • @snsacharya1737
    @snsacharya1737 1 year ago

    At 29:36, a policy is defined as a one-to-one mapping from the state space to the action space; for example, the policy when we are in station-4 is to walk. This definition differs from the one in the classic RL book by Sutton and Barto; they define a policy as "a mapping from states to probabilities of selecting each possible action." For example, the policy when we are in station-4 is a 40% chance of walking and a 60% chance of taking the tram. The policy evaluation algorithm presented in this lecture also ends up being slightly different, because it does not loop over the possible actions. It is nice of the instructor to highlight that point at 55:45

    • @aojing
      @aojing 10 months ago +2

      Action is determined from the beginning independent of states in this class...This will mislead beginners to confuse Q and V, as by this definition @47:20. In RL, we take action by policy, which is random and can be learned/optimized by iterating through episodes, i.e., parallel worlds.
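The distinction raised in this thread can be sketched in code. This is a minimal sketch, not the lecture's code: it assumes a tram-style MDP with 10 stations, walking (cost 1, moves to s + 1), a tram (cost 2, moves to 2s with probability 0.5, otherwise stays put), and no discounting. The names `N`, `FAIL_PROB`, `transitions`, `evaluate`, `always_walk` and `mixed_at_4` are all hypothetical. A policy is represented as a distribution over actions, so a deterministic policy is just the special case with one action at probability 1.

```python
# Policy evaluation where a policy maps a state to {action: probability}.
# All constants below are illustrative assumptions, not values from the lecture.
N = 10           # assumed number of stations; station N is the end state
GAMMA = 1.0      # assumed discount factor (no discounting)
FAIL_PROB = 0.5  # assumed probability that the tram fails and we stay put


def transitions(state, action):
    """Return a list of (next_state, probability, reward) triples."""
    if action == "walk":
        return [(state + 1, 1.0, -1.0)]
    # "tram": costs 2 whether or not it succeeds
    return [(state * 2, 1.0 - FAIL_PROB, -2.0), (state, FAIL_PROB, -2.0)]


def q_value(values, state, action):
    """Expected reward plus discounted future value for one action."""
    return sum(p * (r + GAMMA * values[s2])
               for s2, p, r in transitions(state, action))


def evaluate(policy, iterations=200):
    """Iterative policy evaluation; policy(state) -> {action: probability}."""
    values = {s: 0.0 for s in range(1, N + 1)}
    for _ in range(iterations):
        new_values = {}
        for s in values:
            if s == N:  # end state has value 0
                new_values[s] = 0.0
                continue
            # Weighted sum over actions; a single term for a deterministic policy.
            new_values[s] = sum(prob * q_value(values, s, a)
                                for a, prob in policy(s).items())
        values = new_values
    return values


def always_walk(state):
    """Deterministic policy: one action with probability 1."""
    return {"walk": 1.0}


def mixed_at_4(state):
    """Stochastic policy: 40% walk / 60% tram at station 4, walk elsewhere."""
    if state == 4:
        return {"walk": 0.4, "tram": 0.6}
    return {"walk": 1.0}


# Always walking from station 1 takes N - 1 = 9 steps at cost 1 each.
v_walk = evaluate(always_walk)
v_mixed = evaluate(mixed_at_4)
```

With a deterministic policy the inner sum has one term, which matches the lecture's version that does not loop over actions.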

  • @chanliang5725
    @chanliang5725 1 year ago

    I was lost on MDPs. Glad I found this awesome lecture; it clears up all the concepts in MDPs! Very helpful!

  • @seaotterlabs1685
    @seaotterlabs1685 2 years ago +7

    Amazing lecture! I was having trouble finding my footing on this topic and now I feel I have a good starting point of the concepts and notations! I hope Professor Sadigh teaches many more AI topics!

  • @shaheerkashif5908
    @shaheerkashif5908 4 days ago

    Professor Sadigh, the legend you are

  • @muheedmir7385
    @muheedmir7385 2 years ago +5

    Amazing lecture, loved every bit of it

  • @joshuat6124
    @joshuat6124 9 months ago

    Thank you professor! I learnt so much from this, especially the live coding bits.

  • @quannmtt3110
    @quannmtt3110 1 year ago +1

    Thanks for the awesome lecture. Very good job at explanation by the lecturer.

  • @carlosloria-saenz6760
    @carlosloria-saenz6760 1 year ago

    Great videos, thanks! At 47:20 there is a small typo on the board; I guess it should be: V_{\pi}(s) = Q_{\pi}(s, \pi(s)) if s is not the end state.
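For reference, the pair of recurrences this correction refers to can be written out as follows. This is a sketch in the usual CS221-style notation, which is an assumption about what was on the board: $T$ is the transition probability, $\text{Reward}$ the per-transition reward, and $\gamma$ the discount factor.

```latex
V_{\pi}(s) =
\begin{cases}
0 & \text{if } s \text{ is an end state} \\
Q_{\pi}(s, \pi(s)) & \text{otherwise}
\end{cases}
\qquad
Q_{\pi}(s, a) = \sum_{s'} T(s, a, s') \left[ \text{Reward}(s, a, s') + \gamma \, V_{\pi}(s') \right]
```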

  • @tosinadekunle646
    @tosinadekunle646 2 months ago

    Gamma is to avoid the neutrality of using 1 in the computation of Utility (The Return). 0.9^3 is not neutral compared to 1^3 which is neutral.
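The effect of the discount can be checked with a few lines of arithmetic. This is a minimal sketch; `discounted_utility` is a hypothetical helper, and the reward sequence of four payments of 4 is made up for illustration.

```python
def discounted_utility(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


rewards = [4, 4, 4, 4]  # illustrative reward sequence (an assumption)

# gamma = 1 weights every step equally: 4 + 4 + 4 + 4
undiscounted = discounted_utility(rewards, 1.0)

# gamma = 0.9 shrinks later rewards: 4 * (1 + 0.9 + 0.81 + 0.729)
discounted = discounted_utility(rewards, 0.9)
```

With gamma = 1 the fourth reward counts in full (weight 1^3 = 1); with gamma = 0.9 it counts at weight 0.9^3 = 0.729.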

  • @yesodabhargava8776
    @yesodabhargava8776 2 years ago +2

    This is an awesome lecture! Thank you so much.

  • @aojing
    @aojing 10 months ago

    @47:20 the definition of Q function is not right and confuses with Value function. Specifically, take immediate reward R out of summation. The reason is Q function is to estimate the value of a specific Action beginning with current State.

    • @aojing
      @aojing 10 months ago

      or we may say the Value function here is not properly defined without considering policy, i.e., by taking action independent of states.

  • @sukhjinderkumar2723
    @sukhjinderkumar2723 2 years ago +2

    Great Lecture, Thank you Professor :)

  • @ammaraboklam2487
    @ammaraboklam2487 2 years ago +3

    Thank you very much
    This is a really great lecture; it's really helpful

    • @stanfordonline
      @stanfordonline 2 years ago

      Hi Ammar, glad it was helpful! Thanks for your feedback

  • @adityanjsg99
    @adityanjsg99 2 years ago +2

    A thorough lecture!!

  • @Amit1994-g9i
    @Amit1994-g9i 2 years ago +8

    FYI I'm a theoretical physics major, and I have no business in CS whatsoever

  • @alemayehutesfaye463
    @alemayehutesfaye463 1 year ago

    Thank you for your interesting lecture; it really helped me understand the topic well.

    • @stanfordonline
      @stanfordonline 1 year ago

      Hi Alemayehu, thanks for your comment! Nice to hear you enjoyed this lecture.

    • @alemayehutesfaye463
      @alemayehutesfaye463 1 year ago

      @@stanfordonline Thanks for your reply. I am following you from Ethiopia and am interested in the subject area. Would you mind suggesting the best texts and supporting videos for gaining in-depth knowledge of Markov processes and decision making, especially as related to manufacturing industries?

  • @thalaivarda
    @thalaivarda 2 years ago +4

    I will be conducting a test for those watching the video.

  • @vimukthirandika872
    @vimukthirandika872 2 years ago +6

    Thanks for the amazing lecture!

  • @farzanzeinali7398
    @farzanzeinali7398 2 years ago

    The transportation example has a problem. The states are discrete. If you take the tram, the starting state equals 1, and with state*2, you will never end up in state=3. Let's assume the first action was successful; therefore, the next state is 2. If the second action is successful too, you will end up in state = 4. You will never end up in state = 3.

    • @faiber49
      @faiber49 2 months ago

      That is why she used this line of code when the actions were defined:
      if state * 2
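The reachability question in this thread can be checked with a short search. This is a minimal sketch under assumptions: 10 stations, walking moves from s to s + 1, and a successful tram ride moves from s to 2s. The bounds checks in `successors` are this sketch's own guards, not a reconstruction of the truncated line quoted above, and `successors`, `reachable` and `path_to` are hypothetical names.

```python
from collections import deque


def successors(state, n):
    """Possible (action, next_state) pairs, ignoring tram failures."""
    result = []
    if state + 1 <= n:      # assumed guard: cannot walk past the last station
        result.append(("walk", state + 1))
    if state * 2 <= n:      # assumed guard: cannot ride past the last station
        result.append(("tram", state * 2))
    return result


def reachable(start, n):
    """Breadth-first search; maps each reachable state to (parent, action)."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for action, next_state in successors(state, n):
            if next_state not in parent:
                parent[next_state] = (state, action)
                queue.append(next_state)
    return parent


def path_to(target, parent):
    """Reconstruct the (state, action, next_state) steps leading to target."""
    steps = []
    state = target
    while parent[state] is not None:
        previous, action = parent[state]
        steps.append((previous, action, state))
        state = previous
    return list(reversed(steps))


parents = reachable(1, 10)
route_to_3 = path_to(3, parents)  # [(1, 'walk', 2), (2, 'walk', 3)]
```

Taking only the tram does skip station 3, but because walking is also available in every state, station 3 is reached by walking twice from station 1.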

  • @karimdarwich1913
    @karimdarwich1913 6 months ago

    How can I choose the "right" gamma for my problem? Like how can I know that the gamma I choose is good or not ?

  • @marzmohammadi8739
    @marzmohammadi8739 2 years ago

    I enjoyed it, Ms. Sadigh. I had a great time... thaaanks

  • @RojinaPanta1
    @RojinaPanta1 1 year ago

    Wouldn't removing the constraint increase the search space, making it computationally inefficient?

  • @msfallah
    @msfallah 2 years ago

    I think the given definition for value-action function (Q(s, action)) is not correct. In fact value function is the summation of value-action functions over all actions.
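For comparison, the relationship this comment seems to have in mind holds for a stochastic policy, where the state value is a probability-weighted sum of action values rather than a plain sum. This is a sketch using the notation $\pi(a \mid s)$ from Sutton and Barto, which is an assumption since the lecture uses a deterministic $\pi(s)$.

```latex
V_{\pi}(s) = \sum_{a} \pi(a \mid s) \, Q_{\pi}(s, a)
```

When the policy is deterministic, $\pi(a \mid s)$ is 1 for the single action $a = \pi(s)$ and 0 otherwise, so the sum collapses to $Q_{\pi}(s, \pi(s))$.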

  • @alphatensor
    @alphatensor 1 year ago

    Thanks for the good lecture

  • @dungeon1163
    @dungeon1163 2 years ago +59

    Only watching for educational purposes

    • @-isotope_k
      @-isotope_k 2 years ago +4

      😂😂

    • @mango-strawberry
      @mango-strawberry 7 months ago +1

      😂😂. You know it.

    • @reyy9220
      @reyy9220 1 month ago

      can yall ever rest?? give women a break ffs

  • @FalguniDasShuvo
    @FalguniDasShuvo 3 months ago +1

    Great!👍

  • @vikasshukla831
    @vikasshukla831 2 years ago

    In the dice game, if I choose to stay in step 1 and then quit in the second stage, will I get 10 dollars for quitting in stage 2? If I am lucky enough to reach the second stage, i.e. the dice doesn't roll 1 or 2, then I am in the "In" state, and by the diagram I have the option to quit, which might give me 10 dollars, but for that I need to succeed in stage 1. Then the best strategy might change. Let me know what your comments are.

    • @fahimullahkhan775
      @fahimullahkhan775 2 years ago

      You are right according to the figure and the flow of the states, but from the scenario one gets the impression that one can only choose to either quit at the start or stay in the game.
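The strategies discussed in this thread can be compared numerically. This is a minimal sketch that assumes the usual statement of the dice game: quitting pays 10 and ends the game; staying pays 4, after which the game ends with probability 1/3 (a roll of 1 or 2) and continues otherwise. The function names are hypothetical, and no discounting is applied.

```python
def transitions(action):
    """(next_state, probability, reward) triples from the 'in' state."""
    if action == "quit":
        return [("end", 1.0, 10.0)]
    # "stay": collect 4, then the game ends on a roll of 1 or 2
    return [("in", 2.0 / 3.0, 4.0), ("end", 1.0 / 3.0, 4.0)]


def value_iteration(gamma=1.0, tolerance=1e-10):
    """Return (optimal value of 'in', best action) for the dice game."""
    values = {"in": 0.0, "end": 0.0}
    while True:
        q = {action: sum(p * (r + gamma * values[s2])
                         for s2, p, r in transitions(action))
             for action in ("stay", "quit")}
        new_value = max(q.values())
        if abs(new_value - values["in"]) < tolerance:
            return new_value, max(q, key=q.get)
        values["in"] = new_value


def stay_once_then_quit():
    """Expected utility of staying one round, then quitting if still in."""
    return 4.0 + (2.0 / 3.0) * 10.0


best_value, best_action = value_iteration()
mixed_value = stay_once_then_quit()
```

Under these assumptions, staying once and then quitting is worth 4 + (2/3)(10), about 10.67, which beats quitting immediately (10) but not always staying, whose value satisfies V = 4 + (2/3)V, so V = 12.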

  • @camerashysd7165
    @camerashysd7165 7 months ago

    Wow this account crazy 😮

  • @pythonmini7054
    @pythonmini7054 2 years ago +2

    Is it me or she looks like callie torres from grays anatomy 🤔

  • @henkjekel4081
    @henkjekel4081 2 years ago

    U should look at andrew ng's lecture, he explains it way better

  • @aswinbiju4038
    @aswinbiju4038 2 years ago +11

    Only watching for educational purposes.

  • @divyanshuy007
    @divyanshuy007 2 years ago +4

    16:42 thumbnail

  • @md.naimul8544
    @md.naimul8544 1 year ago +1

    why is she so beautiful 😳😳

  • @rahulkelkar1246
    @rahulkelkar1246 2 years ago

    Does anyone think she look like Zoe Kazan?

  • @mnnuila
    @mnnuila 3 months ago

    Seems simple

  • @vikranthrana3019
    @vikranthrana3019 2 years ago +16

    Professor is quite cute ❤️

  • @HolyRamanRajya
    @HolyRamanRajya 2 years ago +1

    Beauty and brainy.

  • @asawriter-f1v
    @asawriter-f1v 1 year ago +1

    I'm Indian and belongs to Bihar State 🇮🇳🇮🇳

  • @chamangupta4624
    @chamangupta4624 2 years ago

    637

  • @Naentrikakudapikalev
    @Naentrikakudapikalev 2 years ago +2

    Cute lecture by cute lady

  • @saisriteja5290
    @saisriteja5290 2 years ago +1

    i love you