Reinforcement Learning, by the Book

  • Published: 11 Jul 2024
  • The machine learning consultancy: truetheta.io
    Join my email list to get educational and useful articles (and nothing else!): mailchi.mp/truetheta/true-the...
    Want to work together? See here: truetheta.io/about/#want-to-w...
    Part one of a six-part series on Reinforcement Learning. If you want to understand the fundamentals in a short amount of time, you're in the right place.
    SOCIAL MEDIA
    LinkedIn : / dj-rich-90b91753
    Twitter : / duanejrich
    Github: github.com/Duane321
    Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
    SOURCES
    [1] R. Sutton and A. Barto. Reinforcement Learning: An Introduction (2nd Ed). MIT Press, 2018.
    [2] H. van Hasselt, et al. RL Lecture Series, DeepMind and UCL, 2021, • DeepMind x UCL | Deep ...
    [3] D. Silver. Lecture 1: Introduction to Reinforcement Learning, DeepMind, 2015, • RL Course by David Sil...
    [4] Y. Wang. Pricing at Lyft, Lyft, 2022, eng.lyft.com/pricing-at-lyft-...
    [5] A. Irpan. Deep Reinforcement Learning Doesn't Work Yet, 2018, www.alexirpan.com/2018/02/14/...
    [6] D. Silver, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, DeepMind, 2017.
    [7] J. Schrittwieser, et al. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, DeepMind, 2020.
    [8] R. Roy, et al. PrefixRL: Optimization of Parallel Prefix Circuits using Deep Reinforcement Learning, Nvidia, 2022.
    [9] J. Degrave, et al. Magnetic control of tokamak plasmas through deep reinforcement learning, Nature, 2021.
    SOURCE NOTES
    [1] is my primary source for this series. It largely determined the notation and the set of topics and motivated much of the commentary. This series could not be what it is without the comprehensive and consistent presentation of this vast subject provided by this text. If you're interested in learning more, I highly recommend this text.
    In preparation, I also took DeepMind's course [2], with the intent of getting a different perspective. The material is similar, though not identical. Notably, DeepMind's problem statement involves agents receiving observations, which are summarized into an agent-specific state. This is a more application-ready problem statement. Early drafts of the series included this version of the problem, but it was cut because it complicated the topics that followed. Overall, I learned a substantial amount about RL from this course.
    The Maze demonstration of the value function comes from David Silver's 2015 lecture [3].
    [4]–[9] are the papers and blog posts referenced at the start of the video. They include a blog post [4] in which Lyft describes how RL is used in its pricing algorithms.
    NOTES
    Regarding the statement "RL won't be the same revolution that Neural Networks were. That's OK - NNs are quite a high bar": I should elaborate. I anticipate the response, "You can't separate RL and NNs, since many of the most impactful applications of RL involve NNs." Yes, comparing these technologies is fraught. In effect, I'm presuming that if widespread adoption of RL's problem statement is enabled by more performant NNs, that counts as part of the RL migration. That isn't fair if we want to attribute successes to either RL or NNs individually. In my view, that attribution isn't important. I'm merely claiming that RL will become a primary component in many production systems. The RL-vs-NN comparison is an accidental symptom of how I pitched the RL trend. In retrospect, I would have phrased it differently.
    TIMESTAMPS
    0:00 The Trend of Reinforcement Learning
    2:46 A Six Part Series
    3:24 A Finite Markov Decision Process and Our Goal
    9:02 An Example MDP
    12:49 State and Action Value Functions
    15:00 An Example of a State Value Function
    16:28 The Assumptions
    17:58 Watch the Next Video!
    CORRECTIONS
    1) In the maze, I have a -16 where there should be a -14 (thank you, rogiervdw).

Comments • 166

  • @mCoding
    @mCoding 1 year ago +54

    Let's take a quick break -- immediate cut and continues to next section. Got me good :)

  • @polecat3
    @polecat3 1 year ago +10

    I found your plain English explanations of the equations particularly helpful. Thank you!

  • @ryandaniels3258
    @ryandaniels3258 1 year ago +17

    After the release of the AlphaTensor paper, this is a pretty timely video, and I wholeheartedly agree with your statement about RL becoming much more relevant outside the realms of simply solving games. Great start to the series, and I look forward to the rest.

  • @ajsindri2
    @ajsindri2 1 year ago +2

    Reinforcement learning is so fascinating, I'm so looking forward to the next videos!

  • @timothytyree5211
    @timothytyree5211 1 year ago +1

    Reinforcement learning is so fascinating, I'm so looking forward to the next videos, as well!

  • @khaliliskarous2225
    @khaliliskarous2225 11 months ago +3

    I really like how you go through a concrete episode. That makes the formalism come to life. Best into to RL I've gone through. Thanks!

    • @Mutual_Information
      @Mutual_Information  11 months ago

      Thanks - it's nice to hear from people who appreciate the same things I do

  • @gnorts_mr_alien
    @gnorts_mr_alien 3 months ago +14

    how are you so clear with your word choice, expressions and tone? it's like you're uploading into my brain directly. I'm spreading the word, you deserve to be an educational youtube superstar.

  • @andrewwalker8985
    @andrewwalker8985 1 year ago +33

    Absolutely phenomenal clarity in your explanation. A million thanks for producing this. I’m about to binge watch the series :-)

    • @Mutual_Information
      @Mutual_Information  1 year ago

      Love it Andrew! Thank you!
      And if you know anyone else who is interested in RL, it would be huge for me if you shared it with them :)

    • @kiffeeify
      @kiffeeify 1 year ago

      I can only agree! Very, VERY, very! nice!

  • @_tnk_
    @_tnk_ 1 year ago +1

    amazing work, looking forward to all the parts

  • @AllemandInstable
    @AllemandInstable 1 year ago +1

    I discovered your channel looking for Fisher Information explanations, and I can tell your channel's content is really good for overviews and has nice explanations. I love that you mention that you need to get into books and maths to go into the details.
    Love your content.
    Keep it up!

  • @pluviophilexing2580
    @pluviophilexing2580 1 year ago +70

    You deserve a million followers

    • @grahamjoss4643
      @grahamjoss4643 1 year ago +3

      Agreed. Help spread the word.

    • @akshaygulabrao4423
      @akshaygulabrao4423 1 year ago

      The content is too specialized; this is grad-school-level CS/stats. I doubt there are a million of those.

    • @terjeoseberg990
      @terjeoseberg990 7 months ago +3

      @@akshaygulabrao4423, Have you been to Google around lunchtime? I believe there might be a million of them right there.

    • @umairm.5662
      @umairm.5662 2 months ago

      He has some great intuitions. I am lucky to have found him.

  • @NicolasChanCSY
    @NicolasChanCSY 1 year ago +3

    This is one of the best, if not the best, explanation I have seen!
    Can't wait to watch the upcoming videos in the series!

  • @AlisonStuff
    @AlisonStuff 1 year ago +1

    Ooooooooo look at these energetic edits and the new lighting! Very nice!!

  • @qqq33
    @qqq33 1 year ago

    I wish these videos were available when I started learning RL years ago. Nice work!

  • @Marceloruiz
    @Marceloruiz 1 year ago +10

    I found your channel looking for RL, I started watching your videos and you are amazing, your explanations and examples are very clear, I hope you continue making many more videos!

  • @dashwudt8369
    @dashwudt8369 1 year ago +1

    Big thanks for the comeback!

  • @dmitrideklerk5701
    @dmitrideklerk5701 18 days ago +1

    Thank you so much for this video series, and all the useful resources you make available. thank you thank you thank you.

    • @dmitrideklerk5701
      @dmitrideklerk5701 17 days ago

      You should create a course, I would buy it, you've given so much value in these free videos. And you have a gift for teaching and breaking things down.

  • @tobiasopsahl6163
    @tobiasopsahl6163 1 year ago +2

    Starting this series now, really looking forward to it! Your videos have been great so far.

  • @heyna88
    @heyna88 1 year ago +1

    Always the best. I was looking forward to your next block of videos... and they didn't disappoint! :)

  • @tower1990
    @tower1990 1 year ago +3

    What a great lecture! I have been reading Sutton’s book, however the material is often dense and abstract. I like how you visualise the computation processes for us, it really helps understanding the concepts clearly.
    Thank you. Looking forward to the rest of the series.

    • @Mutual_Information
      @Mutual_Information  1 year ago +3

      When I was reading the book originally, I was trying to think of these processes in my head. I was sure many others were trying the same.. and that's a big motivator for this series

  • @MathVisualProofs
    @MathVisualProofs 1 year ago +1

    Great video! Excited to see the rest of this series. Nice work!

  • @coconut_camping
    @coconut_camping 1 year ago +4

    My 3-year-old son could probably understand what RL is by watching this video. The clearest, most distinct video on RL I've found on YouTube so far. Thanks for sharing it!

  • @simonhradetzky7055
    @simonhradetzky7055 1 year ago +1

    My Man! Right on time as my RL project in university is starting haha ty♥️

  • @hedgineering6547
    @hedgineering6547 9 months ago +2

    AWESOME VIDEO! Best I've seen on the topic BY FAR

  • @BadiArt
    @BadiArt 4 months ago +1

    Hands down the best explanation/teacher video I've seen in my entire life.
    Well done!

  • @dominichafliger4974
    @dominichafliger4974 2 months ago +1

    Props to you, man!! This video was absolutely great and explains the topic so well. Thanks a lot, and I hope I will understand the next video as easily as this one...

  • @joehindley6185
    @joehindley6185 1 year ago

    This is phenomenal maths communication. Better than anything I have received in my three-year undergraduate degree.

  • @munzutai
    @munzutai 7 months ago +1

    This video is criminally under-watched. It perfectly clears up the core taxonomy of RL that I was confused about up until now.

  • @mirolator
    @mirolator 1 year ago

    These videos are really quite good. I'm going through David Silver's course, then using your videos as a great review and to reinforce my understanding. You really do a great job of identifying the concepts that noobs like me would have trouble with and then thoroughly explaining them. I also appreciate that there's no fluff, but lots of substance. Keep up the great work!

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      Thank you very much Miro - glad they're helping. David Silver's lectures have been a solid source for these videos. Also, Hado Van Hasselt's recent series is excellent. I've put in the time for animations and info-density, but those guys are the pros!

  • @Laétudiante
    @Laétudiante 1 year ago +1

    Truly great video!!!

  • @iamlegend3964
    @iamlegend3964 7 days ago

    Very satisfying and simple explanations.

  • @gorini
    @gorini 1 year ago +1

    University of Alberta's Coursera course and Sutton & Barto's book hurt my ignorant self; but you ease some of the pain. Thank you.

  • @tamirtsogbayar3912
    @tamirtsogbayar3912 7 months ago

    In order to learn deep learning, especially RL, I've been revising my algebra and calculus for 2 months. Extraordinarily, I found your channel.

  • @devnachi
    @devnachi 1 year ago +1

    Awesome Content, Clearly Explained🔥🔥

  • @miriamshahidi7089
    @miriamshahidi7089 2 months ago +1

    Found this channel too late. This is an amazing overview. I said "extremely cool" exactly when the instructor did in the video, bonus points!! :D

  • @offensivearch
    @offensivearch 8 months ago +1

    I highly recommend people here interested in RL study from Sutton and Barto. I own three RL books (including S&B) and 5-6 if you count books that include some RL methods but aren't entirely about it. S&B is honestly one of the best self-study textbooks I have encountered, it is a classic for a reason. It is much better and easier to learn RL from it than the other books I have (and indeed any source). S&B does a great job in building intuitions and motivations and it orders topics in the perfect order from start to finish (at least for parts I and II, I am less familiar with part III). MI's videos are great, but it is hard to grasp without reading S&B first. If you really want to understand this stuff, I highly recommend reading from S&B directly in addition to videos like these.

    • @Mutual_Information
      @Mutual_Information  8 months ago +1

      Agreed. There's no competing with the original source - it's a phenomenal text

  • @redoxepk
    @redoxepk 1 year ago +1

    Really like how you break problems down to a low level, including spelling out explicitly what all the variables & their symbology mean. Ty!

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      Yea, that's where the confusion happens. I do it because it confuses me when people don't!

    • @connorkapooh2002
      @connorkapooh2002 11 months ago

      @@Mutual_Information You've also nailed the balance. It's equally tedious when people read everything out and over-explain.

  • @des6309
    @des6309 1 year ago +1

    Looking forward!!

  • @lenishpandey192
    @lenishpandey192 2 months ago +1

    Wow! Just beautiful.

  • @matveyshishov
    @matveyshishov 4 months ago +2

    I'm doing a quick refresher, and I am loving your videos, thank you so much, you've clearly done a lot of hard work, both understanding and explaining in a structured and logical way, and it is beautiful!
    Also, if I may, I want to leave a note for those who are just starting, while I still remember the parts that caused me trouble years ago when I myself was learning this.
    Despite my great respect for Sutton, neither the book nor the historical development of RL has been straightforward; both suffer from idiosyncrasies, ad-hoc solutions and sometimes confusion.
    My advice to a student would be: it's not just you, it's the topic. Take a deep breath and DECOMPOSE what you are reading.
    Take, for example, the simplest formula in RL, `P(s',r|s,a)`. It often throws people off, because s',r is a joint distribution. Forget it for a moment, assume the reward given s' is deterministic (that's what many coding examples do), and you'll be left with a much clearer form `P(s'|s,a)`, plus a deterministic function `R(s)`.
    Another similar situation is the term "policy", and specifically the policy being probabilistic. I would advise a student reading the book for the first time to forget about the probabilistic nature of the policy. Having done that, we have a much clearer implementation of the policy function; in fact, it's the standard coding-interview dynamic programming setup, like the knapsack problem, where we populate a memoization table. That table is nothing other than a policy. And if, instead of storing it, we replace the table with a neural net, you get the idea..
    Having decomposed things this way, you'll see how everything downstream is just this or that block augmented with one more variable or replaced with a neural net.
    For more advanced students, I'd strongly advise looking into optimal control, as it's like a "smarter big brother" of RL: the environment is known, opening the door to proper analytical solutions. You'll see RL better after visiting this parallel universe and, as a bonus, fall in love with the Laplace transform.
    PS: "Dynamic programming" is a misnomer; there is nothing dynamic about it, nor is it about the modern meaning of the word "programming" 🤦🏻‍♂ .

    • @kariminem
      @kariminem 4 months ago +1

      Thanks for the comment, Mat! I am currently doing my master's in RL, and I do get the concepts, but the real problem is the code itself; trying to implement it from scratch is like reinventing the wheel, I should say, but I think this is an important step for me to actually understand the underlying concepts. Understanding concepts from books is one thing, and implementing them is literally something ELSE.

  • @slowloris4346
    @slowloris4346 1 year ago +1

    My goodness you are an amazing teacher, keep it up. I have read up to the end of chapter 3 in Sutton and Barto so far - and it was all a bit nebulous in my mind but these videos really helped collate my intuitions on the basics of the topic.

    • @Mutual_Information
      @Mutual_Information  1 year ago

      That's excellent to hear. You're exactly the audience I'm going for. Sometimes I wonder if these concepts are only clear as I write them, but not necessarily clear to the listener. So it's nice to know it lands.

  • @lawson1310
    @lawson1310 1 year ago

    Wow, one of the most explanatory videos on RL.

  • @prithvidhyani2002
    @prithvidhyani2002 1 month ago

    wonderfully explained, had to watch it twice but I'm glad I did.

  • @mrnogood5326
    @mrnogood5326 5 months ago +1

    Thank you ! Very good video 👍

  • @cizbargahjr.8401
    @cizbargahjr.8401 9 months ago +1

    This is so detailed. I love it! :)

    • @Mutual_Information
      @Mutual_Information  9 months ago +1

      As you know, the details matter - glad you like it!

  • @yuktikaura
    @yuktikaura 1 year ago +1

    Keep up the awesome work

  • @nishanthshetty435
    @nishanthshetty435 1 year ago +5

    At 05:24, I think $r \in \mathcal{R} \subseteq \mathbb{R}$ is more appropriate than $r \in \mathcal{R} \in \mathbb{R}$. Loved the video. Looking forward to watching the whole series.

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      Ah yes, that is better. Silly mistake. Not quite a big enough error to warrant a re-upload, but I appreciate the note

  • @123ming1231
    @123ming1231 1 year ago +1

    It is amazing work!!!! Please keep publishing RL videos, we are desperate to see them!!!!

  • @hehehe5198
    @hehehe5198 1 month ago +1

    thanks man, this is very good and clear

  • @marcin.sobocinski
    @marcin.sobocinski 1 year ago +1

    Thank you.

  • @alvinjamur1
    @alvinjamur1 1 year ago +1

    your channel deserves 10 million followers. very good content! ⚡️

  • @user-co6pu8zv3v
    @user-co6pu8zv3v 8 months ago +1

    Thank you!

  • @azizjedidi1180
    @azizjedidi1180 1 year ago +2

    Thank you.

  • @azaih
    @azaih 11 months ago +1

    Thank you for your excellent content

  • @karthikmurthy2511
    @karthikmurthy2511 11 months ago

    Thanks for this series.....@9:50, you nailed it.

  • @rudolfromisch954
    @rudolfromisch954 1 year ago +1

    Very nice introduction to this topic! I was just about to start on RL, and I find this video extremely motivating.

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      The hope is to lower the cost of learning (from reading a textbook to watching a video) - sounds like it's working! Though, you should def still read the book lol

    • @rudolfromisch954
      @rudolfromisch954 1 year ago

      @@Mutual_Information You are doing a great service. I know i know, i will read it haha

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +1

    Awesome!

  • @finlandvickahe-ty2vu
    @finlandvickahe-ty2vu 1 year ago +1

    I do hope you can keep this up when you become famous

  • @TimScarfe
    @TimScarfe 1 year ago +1

    Great production value!!

  • @gustavojuantorena
    @gustavojuantorena 1 year ago +1

    Wow. Really amazing topic.

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      Thank you - yea it's quite a hot topic (though not as hot as it was 4 years ago).

  • @jks234
    @jks234 10 months ago +1

    17:00 Yes, it’s true that the assumptions about being able to reference a world state do not really apply.
    But I feel that essentially, this is a large part of what we learn when we develop a skill.
    Your memory and experience is a gradually developing world state. And thus, experts do repeatedly reference an internal world state as they conduct the task.
    The trained agent is like an experienced expert when we analogize this over to real life.

  • @theHDfiremaster
    @theHDfiremaster 4 months ago +1

    Honestly, I rarely comment, but this is an amazing video!

  • @glitchAI
    @glitchAI 9 months ago +2

    I don't comment on YT, but sir, you are a great teacher. Please keep going.

    • @Mutual_Information
      @Mutual_Information  9 months ago

      Thanks, editing a video now, so no plans of slowing down

  • @iamr0b0tx
    @iamr0b0tx 1 year ago +1

    Thanks!

  • @grahamjoss4643
    @grahamjoss4643 1 year ago +1

    Keep pushing through. I bet your videos will hit big soon!
    So, your videos are very concrete and technical. Do you have any video ideas for the big picture around reinforcement learning or other data science topics? Perhaps something like how to think like a data scientist, or how to foster creativity when working with a data set?

    • @Mutual_Information
      @Mutual_Information  1 year ago

      Fortunately I've become quite good at being patient with growth. I'm keeping my expectations quite reasonable.
      Regarding big-picture stuff - I'd like to do advice pieces, but they aren't high in the queue currently. I'm eager to have some solid YouTube wins under my belt before I speak very generally.. but I will one day. Maybe starting early next year.

  • @IgorAherne
    @IgorAherne 1 year ago +2

    This is such a beautiful explanation. I spent several years getting my head around RL, q-learning, policy gradients, etc and have several "gaps" in my understanding. Your way of explaining is so much on point and yet simple to understand, - a mark of real knowledge. I'm looking forward to watching these lessons. Thank you!
    PS. came here from Yannic Kilcher :) ruclips.net/video/r8wiBA3ZaQE/видео.html

    • @Mutual_Information
      @Mutual_Information  1 year ago

      Glad you enjoyed it Igor ! Yea, Yannic's shoutout was a nice boost :)

  • @fedahussainmuzaffari1910
    @fedahussainmuzaffari1910 1 year ago +1

    Finally 🫡

  • @japedr
    @japedr 1 year ago +6

    Thanks for making this, the educational value is miles ahead of most material you can find online.
    Just a nitpick, if you don't mind: at 5:15 the second ∈ should be ⊂ to denote subset (or ⊆, depending on the convention)

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      Yea someone else pointed that out. Oops! Not quite a big enough deal to warrant a re-upload, but thanks anyway

    • @AlisonStuff
      @AlisonStuff 1 year ago +1

      @@Mutual_Information I demand a re-upload.

    • @Mutual_Information
      @Mutual_Information  1 year ago

      @@AlisonStuff lol I accept demands from anyone other than my sister

  • @NoNTr1v1aL
    @NoNTr1v1aL 1 year ago +5

    Absolutely amazing playlist! Also, at 5:11, calR is a subset of the real numbers; not an element of the real numbers.

    • @Mutual_Information
      @Mutual_Information  1 year ago +3

      ha yea you found it. That's a mistake, but too subtle to warrant a re-upload

  • @marcin.sobocinski
    @marcin.sobocinski 1 year ago +1

    Oh gosh, I had no clue what you were talking about while watching the Fisher Information video... but this RL one is pure gold! I read the aforementioned RL brick (e.g. the book by Sutton and Barto) and I can hardly imagine a better explanation and better summary of the MDP and the general RL concept. I hope you can follow up with some Python code. If that's going to be as simple, concise, precise and informative as the video, it will be wonderful. Thank you!

    • @Mutual_Information
      @Mutual_Information  1 year ago

      Thank you! There is *some* Python code, but not that much. The Python code gets light usage, so I haven't invested in it very much.
      But! I do have a strong habit of answering questions here. So ask away if you have questions. And you can ask many - totally happy to get you to a place where you think you have a good grasp of the subject.

    • @marcin.sobocinski
      @marcin.sobocinski 1 year ago

      @@Mutual_Information Sorry for a stupid question: what do you mean by "Python code gets light usage"? It's my English 🙁

    • @Mutual_Information
      @Mutual_Information  1 year ago

      @@marcin.sobocinski Ah sorry - I meant "in the past, people haven't used the Python code very much, so I haven't spent a lot of time on Python code for these videos."

  • @user-su6oi2ip5o
    @user-su6oi2ip5o 2 months ago

    Finally, someone understands his audience.

  • @patrickorone1149
    @patrickorone1149 1 year ago +1

    I already subscribed

  • @santielewaut
    @santielewaut 1 year ago +1

    Excellent series, but I do need to get that wallpaper haha

    • @Mutual_Information
      @Mutual_Information  1 year ago

      Here you go: github.com/Duane321/mutual_information/blob/main/computer_background/background.png

  • @pradiptahafid
    @pradiptahafid 1 year ago +1

    I bought this book 2 months ago. I just couldn't understand it; something was a bit off. You don't know how valuable your video is in giving me an idea of what the book is trying to tell me. Thank you! I am so excited to continue my learning.

    • @Mutual_Information
      @Mutual_Information  1 year ago

      You are welcome! That's exactly the circumstance I was trying to solve for :)

  • @rogiervdw
    @rogiervdw 1 year ago +2

    Terrific, very well explained and well paced. Minor typo: @16:02, where it says -16 it should say -14 (doesn’t affect the best policy but might avoid confusion of the careful observer)

    • @Mutual_Information
      @Mutual_Information  1 year ago +2

      Ah damn, you're right.. man I really thought I checked the sh*t out of that. lol oh well. I've included a note in the description - thank you!

  • @alexanderskusnov5119
    @alexanderskusnov5119 1 year ago +1

    How about using RL instead of PID regulation? (like adaptive control)

    • @Mutual_Information
      @Mutual_Information  1 year ago

      Before I make that comparison, I'd do a quick video on PID controllers. It would be harder to discuss using RL *instead* of PID because I'm not familiar with any data/papers which discuss what happens when that migration happens.
      But PID controls themselves are quite cool. Very simple and very effective tools. Now I want to do that video.. hmm

  • @RS-303
    @RS-303 1 year ago +2

    It's finally here!

  • @jiaqint961
    @jiaqint961 1 year ago +1

    6:27 Does anyone know a use case for understanding the policy pi as a probability distribution? In most RL cases I've encountered, the policy pi is a specific choice of action. Thanks in advance.

    • @Mutual_Information
      @Mutual_Information  1 year ago +1

      There are some problems where the optimal strategy is not deterministic. The book gives a toy example (I think in the policy gradient chapter). Outside of that, I could imagine poker being an environment where it pays to be genuinely random at points.
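
      A minimal sketch of that idea, with made-up numbers: a stochastic policy pi(a|s) is just a distribution over actions for each state, which the agent samples from.

      import random

      # Made-up probabilities: pi[state][action] = probability of picking that action.
      pi = {
          "s1": {"up": 0.7, "down": 0.3},
          "s2": {"up": 0.1, "down": 0.9},
      }

      def sample_action(state):
          actions, probs = zip(*pi[state].items())
          return random.choices(actions, weights=probs)[0]

      print(sample_action("s1"))  # "up" about 70% of the time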

    • @jiaqint961
      @jiaqint961 1 year ago

      @@Mutual_Information Thanks so much for the response, never expected to get a response from the man himself, never mind this fast... :D Thanks for the clarification.

  • @hudhuduot
    @hudhuduot 1 year ago

    As a control systems researcher, how can I make use of this to make a contribution and produce research papers?

    • @Mutual_Information
      @Mutual_Information  1 year ago

      Hm, well this is just an intro series - you won't find anything cutting edge here, but maybe it'll inspire some directions?

  • @RS-303
    @RS-303 1 year ago

    Anyone know this algorithm?
    "Generalized Data Distribution Iteration"

  • @piero8284
    @piero8284 9 months ago +1

    Great content. The only thing that irritates me a little bit, and even the book does not develop it much, is what the expected value function exactly expresses mathematically. I mean, what is taking the expectation E_pi[G | s], since G can be an infinite sum of rewards that depend on previous states and actions? For me this notation just expresses the concept in a high-level way.

    • @Mutual_Information
      @Mutual_Information  9 months ago

      I know what you're referring to. E[] is an operator where you have to imagine an integration and the specifics of that integration are omitted, but they matter! Ultimately, you just get used to it..
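
      To make the omitted integration concrete, here is the state-value definition written out as the sums it abbreviates (standard finite-MDP notation from [1]; v_pi is defined self-referentially via the Bellman equation):

      \begin{align}
      v_\pi(s) &= \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]
                = \mathbb{E}_\pi\left[ R_{t+1} + \gamma G_{t+1} \mid S_t = s \right] \\
               &= \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)
                  \left[ r + \gamma \, v_\pi(s') \right]
      \end{align}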

  • @TuemmlerTanne11
    @TuemmlerTanne11 1 year ago

    Are you still working at Lyft full-time? Can't imagine how much work goes into these videos... no way you are able to do this in your free time?! Anyways, happy to see new videos from you, keep it up :)

    • @Mutual_Information
      @Mutual_Information  1 year ago

      Ha thank you - yea I have a full time job. That’s why I hadn’t posted all year - it does take time!

  • @lukasbieberich1501
    @lukasbieberich1501 3 months ago

    Well explained! But at 13:40, in the episodic case, when we have a time limit T, I think the action- and state-value functions are time (t) dependent, because the number of expected rewards starting in state s really depends on how many possible rewards are left. So in this case t should be another argument of the functions. Independence only makes sense for the infinite case, I think.

    • @lukasbieberich1501
      @lukasbieberich1501 2 months ago

      I thought about it, and it also makes sense when T is the stopping time defined as the first t for which S_t enters a final state. But this is not completely trivial, I think. If T is some finite constant, the value should be time dependent. Just in case anybody else stumbles across the same problem..

  • @definitelynorandomvideos24
    @definitelynorandomvideos24 11 months ago +1

    Great Intro! I ended up with a small question though: At 8:16 you present the formula of the return with the steps of the summation being k=t+1, while t should be an element of (let's just say it like that for the sake of this question) whole number {0,1,2,...}. Now, would that mean that k=0 in all cases? Thereby reducing gamma to 1, since it's always calculated to the power of 0?
    E.g.: t=5, k=5+1=6, gamma^(6-5-1)=1
    Hope I didn't express myself too confusingly...

    • @Mutual_Information
      @Mutual_Information  11 months ago +2

      G_t is the sum of future rewards at time t. So if there are 4 periods in an episode and the reward is always 1.2, then e.g. G_1 = gamma^(2-1-1)*1.2 + gamma^(3-1-1)*1.2 + gamma^(4-1-1)*1.2. In other words, k changes from 2 to 3 to 4 in the exponent of gamma and t is fixed over the sum, so gamma isn't always a fixed power
      Does that answer your Q?
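
      A minimal sketch of that calculation (the 1.2 rewards are from the reply above; the discount value is just an arbitrary choice for illustration):

      # Return G_t = sum_{k=t+1}^{T} gamma^(k-t-1) * R_k for a finite episode.
      def discounted_return(rewards, t, gamma):
          """rewards[k] holds R_k for k = 1..T (index 0 is unused)."""
          T = len(rewards) - 1
          return sum(gamma ** (k - t - 1) * rewards[k] for k in range(t + 1, T + 1))

      rewards = [None, 1.2, 1.2, 1.2, 1.2]  # R_1..R_4, all equal to 1.2
      gamma = 0.9                           # arbitrary discount factor
      print(discounted_return(rewards, t=1, gamma=gamma))
      # = 1.2*gamma^0 + 1.2*gamma^1 + 1.2*gamma^2, matching G_1 above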

    • @definitelynorandomvideos24
      @definitelynorandomvideos24 11 months ago

      @@Mutual_Information yes, thank you very much. I didn't watch the full video yet, cause I looked at it in my lunch break and time was up, but I'll make sure to watch the rest today. Thanks a lot for the swift response!

  • @paulhofmann3798
    @paulhofmann3798 2 months ago

    Assuming that a process is Markov is not a strong assumption. All processes can be made Markov by including all relevant (past) variables/states in the state variables.

    • @Mutual_Information
      @Mutual_Information  2 months ago +1

      That's true in principle, but it's hard or impossible in practice for real-world environments. E.g. Markovian models are used a lot for financial prediction, but no one has written down a state space that fully captures the history of the economy and stock market. If you just say the latent space *is* all historical observations, that's a huge state space, and so you're not getting anything out of the Markov assumption.

  • @anshuraj4277
    @anshuraj4277 1 year ago +1

    Hey... How can we connect bro

  • @markusdegen6036
    @markusdegen6036 22 days ago

    Very good explanation. For the initial distribution p0: why is it 1/3 and not 1/9, as it would be for p(s',r|s,a)?

    • @markusdegen6036
      @markusdegen6036 22 days ago

      Ah, different distributions. p0 is not the initial heatmap; it is just the distribution over the starting state (this will not change, right?)

    • @markusdegen6036
      @markusdegen6036 22 days ago

      Awesome

    • @Mutual_Information
      @Mutual_Information  21 days ago

      Yep you got it

  • @TheDoc-Worker
    @TheDoc-Worker 1 month ago +1

    I subscribed, but if you fix the backgrounds on your monitors to actually align continuously, I'll share your channel with a few friends. As it stands, I have to keep this between us, I can't be associated with this kind of thing.

  • @forheuristiclifeksh7836
    @forheuristiclifeksh7836 3 months ago

    1:00

  • @Ghost_Bear_Trader
    @Ghost_Bear_Trader 5 months ago +1

    I came for the money. I left with a triple PhD in astrophysics and quantum mechanics.

  • @Torqable
    @Torqable 9 months ago +1

    If the results for each action are random, aren't they meaningless? How could you possibly learn anything?

    • @Mutual_Information
      @Mutual_Information  9 months ago

      They have a random component, but they aren't completely random. If I say action A gives you a 100 dollars plus or minus 10 dollars with equal probability and action B gives you 20 dollars plus or minus 2 dollars with equal probability, which would you prefer? Both outcomes are random, but one is clearly better.

    • @Torqable
      @Torqable 9 months ago

      @@Mutual_Information but in your example the action has nothing to do with the reward that is received, it's completely random. There is no $100 or $20 comparison to be made, the rewards are all -1, 0, 1, and chosen at random regardless of what action is taken.

    • @Mutual_Information
      @Mutual_Information  9 months ago

      @@Torqable Oh, I see. In that example, I'm just trying to show the mechanics of the interaction between the policy and the environment. But there are some slight differences. The up action makes 1 a bit more likely than -1 as compared to the down action.

    • @Torqable
      @Torqable 9 months ago

      @@Mutual_Information well I'm very new to this so apologies if I'm just not getting it. But I thought the probabilities were supposed to be a sort of confidence in receiving a reward vs a penalty and therefore used to select an action, not used to select a reward after an action is taken. Is that incorrect? Also I know it's just an example but wouldn't down sometimes be correct? Am I over-thinking it?

    • @Mutual_Information
      @Mutual_Information  9 months ago

      @@Torqable I don't think you're overthinking it. It's not an easy subject. What you're thinking about is the quality of the policy, and this is just a very weak policy.. and so it looks like nothing is happening. But that's something specific to this case. I'd just continue to consider examples and eventually it'll make sense. In other words, it's hard to form a full view with just this example.
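
      A minimal sketch of that point, with made-up reward probabilities (not the video's exact numbers): each action's reward is random, but averaging many samples per action still reveals which action is better on average.

      import random

      # Made-up distributions: "up" tilts toward +1, "down" toward -1.
      reward_probs = {
          "up":   {+1: 0.5, 0: 0.3, -1: 0.2},
          "down": {+1: 0.2, 0: 0.3, -1: 0.5},
      }

      def sample_reward(action):
          rewards, probs = zip(*reward_probs[action].items())
          return random.choices(rewards, weights=probs)[0]

      # Sample-average estimate of each action's expected reward.
      n = 10_000
      for action in reward_probs:
          estimate = sum(sample_reward(action) for _ in range(n)) / n
          print(action, round(estimate, 3))  # "up" comes out higher on average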

  • @Timotheeee1
    @Timotheeee1 1 year ago +1

    came here from ML news

    • @Mutual_Information
      @Mutual_Information  1 year ago

      Getting Yannic's shoutout is a big help. Otherwise it's hard to get attention on such in-the-weeds topics.

  • @JasonMitchellofcompsci
    @JasonMitchellofcompsci 11 months ago

    "They use RL to contain nuclear plasma, which is really cool, and not just practical." I think you're pretty wrong here.

    • @Mutual_Information
      @Mutual_Information  11 months ago

      Well if I said that, that would be wrong. I try to be careful with my words. I said the paper shows they can do this, not that this is actually done.
      But the audience probably doesn't interpret it with such subtlety. And so I see your point. A caveat would have been appropriate.

  • @snapman218
    @snapman218 1 month ago +2

    If you actually wrote this out as Python code, rather than a bunch of fancy-looking math, it wouldn't be that impressive.

  • @rockapedra1130
    @rockapedra1130 2 months ago +1

    Thanks!