Reinforcement Learning: Machine Learning Meets Control Theory

  • Published: 28 Sep 2024

Comments • 252

  • @steven-bt7ud
    @steven-bt7ud 3 years ago +45

    I wish I had known about this channel at the start of quarantine

    • @hysenndregjoni853
      @hysenndregjoni853 3 years ago

      I found out about the channel just as quarantine had started. It was quite the treat.

  • @baseljamal8907
    @baseljamal8907 3 years ago +15

    I just cannot express how grateful I am to prof Steve Brunton for posting these videos. Waking up at 6am to watch him explain is the most satisfying thing ever. Thank you! We all are grateful.

  • @teegnas
    @teegnas 3 years ago +3

    As a CS grad student who took RL last semester... this is truly the best refresher I have seen so far. Thanks a lot for uploading.

  • @Spiegeldondi
    @Spiegeldondi 3 years ago +4

    I love how you emphasize the intersection between machine learning and control (theory). That's exactly what sparks my interest in reinforcement learning!

    • @Eigensteve
      @Eigensteve  3 years ago +1

      Glad you like it! I always found this connection fascinating and a very natural way to merge the two fields.

  • @thelazygardener9493
    @thelazygardener9493 3 years ago +1

    I've been seriously considering starting a degree in A.I./Machine learning but with videos of this quality available for free, it is hard to justify the cost. Subscribed and liked!

    • @thelazygardener9493
      @thelazygardener9493 3 years ago +1

      Just in case you read this and have time to reply... Do you have any suggestions for an education path to your level of understanding? There are degrees for data science, computer science, artificial intelligence, software engineering, etc. They all seem so inter-related. I want to know them all, but I'm struggling to pick a starting point.
      My current level of related education is high-school-level advanced maths and a year of teaching myself MQL4/5 and R, mostly from free resources online. Just so you know my starting point (or state haha).

  • @thiagocesarlousadamarsola3990
    @thiagocesarlousadamarsola3990 3 years ago +2

    This sweet spot between control theory and machine learning definitely interests me, especially applied to astrodynamical systems. Please, continue making these videos, Professor Brunton!

  • @wizardOfRobots
    @wizardOfRobots 3 years ago +12

    Wow! I would love to see Prof take on RL topics!

  • @ronniec8805
    @ronniec8805 3 years ago +423

    Steve is a phenomenal lecturer, isn't he?

    • @cnbrksnr
      @cnbrksnr 3 years ago +5

      never seen a better one

    • @uforskammet
      @uforskammet 3 years ago

      very much so

    • @flybekvc
      @flybekvc 3 years ago

      He is!

    • @mail2cmin
      @mail2cmin 3 years ago

      Yessss

    • @reocam8918
      @reocam8918 3 years ago

      no, he is the most phenomenal one!! Respect

  • @pradiptahafid
    @pradiptahafid 2 years ago

    What I like is that I don't pay for this knowledge. I was planning to take a data science certificate, but you know what, let me spend 6 months learning by myself instead. I have spent a solid month on your videos alone, starting from SVD. It has been amazing; I love when small things build up into a bigger thing. Soon I will make a sample project based on what I have learned from your videos.

  • @Firestorm-tq7fy
    @Firestorm-tq7fy 2 years ago

    I like this video, but there is already a very common way to address the reward-density problem. It's called actor-critic, where you basically get a second network that learns what reward it is expecting (Q-learning), while the actor is a policy-gradient network.
    It works fine so far, and I know it's not enough to get away from "semi-supervised", but let's be honest: the "semi" is what really defines the technique, because the agent needs to learn by itself what "could" be good in the future, without a supervisor. That's how animals and humans learn too, so a fully supervised agent wouldn't be exploring the world on its own anymore.
    Greetings, Firestorm
    (ETH Zurich student)
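
    A minimal tabular sketch of the actor-critic idea described above, in Python (the chain environment, variable names, and hyperparameters are illustrative assumptions, not Firestorm's setup or anything from the video): the critic learns state values V, and the actor adjusts softmax action preferences H using the critic's TD error.

      import numpy as np

      n_states, n_actions = 5, 2                 # toy chain: 0 = left, 1 = right, goal at right end
      H = np.zeros((n_states, n_actions))        # actor: action preferences -> softmax policy
      V = np.zeros(n_states)                     # critic: learned state-value estimates
      alpha_a, alpha_c, gamma = 0.1, 0.2, 0.9

      def softmax(h):
          e = np.exp(h - h.max())
          return e / e.sum()

      def step(s, a):                            # reward +1 only on reaching the goal state
          s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
          return s2, float(s2 == n_states - 1)

      for episode in range(1000):
          s = 0
          while s != n_states - 1:
              pi = softmax(H[s])
              a = np.random.choice(n_actions, p=pi)
              s2, r = step(s, a)
              delta = r + gamma * V[s2] - V[s]   # TD error: the critic's "surprise"
              V[s] += alpha_c * delta            # critic update
              H[s] += alpha_a * delta * ((np.arange(n_actions) == a) - pi)  # actor update
              s = s2

      print(softmax(H[0]))                       # probability mass shifts toward "right"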

  • @subramaniannk3364
    @subramaniannk3364 3 years ago +2

    Yay! Hero has decided to teach Reinforcement Learning

  • @carriefu458
    @carriefu458 2 years ago

    Prof Brunton: You are one bad-ass teacher!!!🤓

  • @sounghwanhwang5422
    @sounghwanhwang5422 3 years ago

    The most fantastic lecture I've ever seen...

  • @fahimehjabbarinia401
    @fahimehjabbarinia401 2 years ago

    The best one I have ever seen

  • @Physicsandmathswithpraveen
    @Physicsandmathswithpraveen 3 years ago +1

    It would be an honor to be supervised for a PhD by him.

  • @timanb2491
    @timanb2491 3 years ago

    It's brilliant! Keep working on this topic, please.

  • @SiriGadipudi
    @SiriGadipudi 1 year ago +2

    All of your lecture series are very good and very helpful. A series on convex optimization problems would be good. Any thoughts about it?

  • @pardonchawatama941
    @pardonchawatama941 3 years ago +1

    Great lesson.. Thank you

  • @gamuchiraindawana2827
    @gamuchiraindawana2827 10 months ago

    It would be nice to see you implement some of these algorithms in Python, or whatever you're comfortable with. But great video nonetheless.

  • @jazonsamillano
    @jazonsamillano 2 years ago

    This guy is super smart!

  • @hoaxuan7074
    @hoaxuan7074 3 years ago

    The central limit theorem applies not just to sums but equally to sums and differences. The fast Hadamard transform is done using patterns of addition and subtraction, hence the CLT applies. If you transform a vector of random numbers from the uniform distribution, the result is a vector of random numbers from the Gaussian distribution. There is a 1969 paper about it. One technical problem is that the transform leaves the vector length unchanged; the Gaussians are ever so slightly entangled with each other through that. Anyway, it is extremely fast. I think that after a 50-year break everyone should study the fast Hadamard transform again. It has too many uses in machine learning and neural nets to ignore.
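
    A small Python sketch of the construction described above, assuming the standard in-place Walsh-Hadamard butterfly with orthonormal scaling (variable names are mine): transforming zero-mean uniform random numbers gives approximately Gaussian outputs by the CLT, with the vector length preserved.

      import numpy as np

      def fwht(x):
          # fast Walsh-Hadamard butterfly; len(x) must be a power of 2
          n, h = len(x), 1
          while h < n:
              for i in range(0, n, 2 * h):
                  for j in range(i, i + h):
                      x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
              h *= 2
          return x / np.sqrt(n)              # orthonormal scaling preserves vector length

      u = np.random.rand(1024) - 0.5         # zero-mean uniforms, variance 1/12
      g = fwht(u.copy())                     # each output is a +/- weighted sum -> ~Gaussian
      print(g.std(), np.sqrt(1 / 12))        # sample std vs. the CLT prediction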

  • @basicmachines
    @basicmachines 3 years ago

    When he talks about the environment being probabilistic, he uses the example of backgammon, which involves the roll of dice, unlike chess. But he could also have pointed out that the random element in chess is not the game itself but the opponent: you never know for sure what your opponent will do, so you need a probabilistic approach to chess too.

  • @MarinaOvchinnikov
    @MarinaOvchinnikov 9 months ago

    16:30 - wonder if it's differential or differentiable programming? Great video.

  • @aiahmed608
    @aiahmed608 2 years ago

    Thank you, professor!

  • @matthewjames7513
    @matthewjames7513 3 years ago

    Great talk, but I don't understand the difference between the quality function Q(s,a) and the policy pi(s,a). They seem to do the same thing?
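
    For what it's worth, the two objects differ even though both take (s, a): Q(s,a) estimates the expected future reward of taking action a in state s, while pi(s,a) gives the probability of choosing a in s. A greedy policy can be derived from Q; a tiny Python illustration with made-up numbers:

      import numpy as np

      Q = np.array([[1.0, 3.0],     # state 0: action 1 has the higher expected return
                    [2.0, 0.5]])    # state 1: action 0 has the higher expected return
      greedy_pi = Q.argmax(axis=1)  # a deterministic policy *derived from* Q
      print(greedy_pi)              # -> [1 0]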

  • @billfero
    @billfero 3 years ago +3

    Apart from robotics, games and finance, which fields do you see reinforcement learning having the biggest impact in the next 2-3 years?

    • @Eigensteve
      @Eigensteve  3 years ago +2

      Good question... it is definitely becoming big in other fields of scientific computing. We are using it for laser tuning and fluid flow control. Self-driving car researchers are definitely using it a lot too for autonomy applications.

  • @enesbilgin5010
    @enesbilgin5010 2 years ago +1

    Thank you for putting all this material together. A comment about 8:12: the reason you'd want to learn a probabilistic policy is not exactly that the environment is stochastic. For any environment that is fully observable by the agent, there is a deterministic policy that is optimal (except in game settings, such as rock-paper-scissors). Probabilistic policies are needed in certain partially observable environments where there are aliased states that look the same to the agent, i.e. the observation is the same, but the states are actually different.
    For more details:
    ruclips.net/video/MzVlYYGtg0M/видео.html
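
    A toy Python simulation of the aliased-state point (the corridor environment is a hypothetical illustration, not from the linked video): states 1 and 3 emit the same observation, so a deterministic observation-based policy must act identically in both and can get stuck, while a stochastic policy eventually escapes.

      import random

      def run(policy, start, goal=2, steps=100):
          s = start
          for t in range(steps):
              if s == goal:
                  return t
              obs = "aliased" if s in (1, 3) else s  # states 1 and 3 look identical
              s = min(max(s + policy(obs), 0), 4)    # action: -1 = left, +1 = right
          return None                                # goal never reached

      deterministic = lambda obs: 1 if obs == "aliased" else -1
      stochastic = lambda obs: random.choice([-1, 1]) if obs == "aliased" else -1

      print(run(deterministic, start=3))  # None: bounces between states 3 and 4 forever
      print(run(stochastic, start=3))     # finite: a random move eventually heads left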

    • @Eigensteve
      @Eigensteve  2 years ago +1

      Thanks for your thoughts on this. That is a really good point. I was just thinking of making a short video on POMDPs to start sorting some of this out.

    • @enesbilgin5010
      @enesbilgin5010 2 years ago

      @@Eigensteve thank you, looking forward to your new videos!

  • @youtubeenjoyer199
    @youtubeenjoyer199 1 year ago

    Love the videos

  • @emmanuelameyaw9735
    @emmanuelameyaw9735 3 years ago

    Hi Steve, would you agree if economists say the economy is a dynamical system? 🙂
    In macroeconomics, all economic agents (central bank, government, households, etc.) choose policies to maximize the value function of the household. One problem, though: maybe the economy is so complex that it cannot be captured by any dynamical system 🙂, and so no robot can learn how to make good monetary and fiscal policies to maximize household welfare (the value function). Or perhaps a robot can actually be the next chair of the central bank, or even the president... if all that matters is making good fiscal and monetary policies 😀. Nice lecture btw...

  • @pliniocastro1546
    @pliniocastro1546 3 years ago

    Kudos on the awesome lecture

  • @edgostyn
    @edgostyn 3 years ago +1

    Thank you for the excellent class! Is this lecture in your book?

    • @Eigensteve
      @Eigensteve  3 years ago

      Thanks! This is not in the first edition, but we have a new chapter on reinforcement learning in the upcoming 2nd edition.

  • @gama3181
    @gama3181 3 years ago

    Wow! Thank you so much.
    Maybe the next lecture can be about UMAP please :D?

  • @chrism95
    @chrism95 3 years ago

    I just finished my Masters last year in control theory, and only in my last class did I learn (or maybe realize) that reinforcement learning is the AI form of control theory. It's so disappointing, because I would've loved to dedicate my degree to it; it's incredibly interesting.

  • @Avalanchanime
    @Avalanchanime 3 years ago +1

    Wawawawait, so I can actually use machine learning to optimize/design an AI for ANY purpose that meets the control theory methodology?

  • @MertCan-fg5jn
    @MertCan-fg5jn 2 years ago

    Hello,
    I really enjoy learning from your lectures. You're doing a great job, and thank you for making all these great things free.
    I have been wondering: if I designed an environment like a Scrabble game and tried to teach the agent reading, writing, and more, how could I be sure the agent is really learning from reading books?
    Is this a paradox? I'm not sure :) What I want to understand is whether an agent learns the way humans learn new things, and at which point I can say the agent is learning like a human.
    Have you ever thought about this? Please share your opinions; I would really like to hear them.
    Best regards,

  • @quantumfeet
    @quantumfeet 2 years ago

    Hello Steve, I would like to learn about the mathematics and physics of the movement of flies. Do you know where I could start?

  • @usmleck7000
    @usmleck7000 3 years ago

    Amazing 🤩

  • @yariiijan8225
    @yariiijan8225 3 years ago

    *Does the Ryzen CPU and Nvidia GPU driver have issues? I heard about it from the Tom's Hardware website... Can I run mechanical software like OpenFOAM, HyperMesh, Ansys (simulation), Converge CFD, and ANSA smoothly on Ryzen, and what about Linux? Can I go for the HP Omen (R5 4600H, GTX 1650)?*

  • @sarvagyagupta1744
    @sarvagyagupta1744 3 years ago

    Can you cover some differences between model-based and model-free learning?

  • @tommclean9208
    @tommclean9208 3 years ago

    Is the code on RPCA of the flow around a circle available? I'm really interested in that

  • @bleacherz7503
    @bleacherz7503 3 years ago

    What tools are used to film this presentation style?

  • @StEvUgnIn
    @StEvUgnIn 1 year ago

    Matrix agents being programs makes them agents of the Matrix. The movie is correct.

  • @1ssbrudra
    @1ssbrudra 3 years ago

    Yeah just found my MPhil field.

  • @SamuelOrjiM
    @SamuelOrjiM 3 years ago

    Crap, wanted to get here first

  • @jeetenderkakkar7570
    @jeetenderkakkar7570 3 years ago

    Pakistan

  • @whasuklee
    @whasuklee 3 years ago +56

    *"WELCOME BACK"*

  • @sankalp1391
    @sankalp1391 3 years ago +41

    Would love a full series on how we can use RL to control real-world dynamical systems!

  • @ethanspastlivestreams
    @ethanspastlivestreams 3 years ago +33

    Viewing reinforcement learning as time-delayed supervised learning is a really good way of looking at it.

    • @JousefM
      @JousefM 3 years ago +3

      Indeed!

  • @Globbo_The_Glob
    @Globbo_The_Glob 3 years ago +83

    Just wanted to comment about how much I love these videos. Last year, while applying for PhDs, I was searching for my passion. In a discussion with my friend (a computer scientist), I accidentally outlined genetic programming without knowing it. My friend told me so, and I went researching, found these videos, and became enthralled. Now I have a PhD studentship in soft robotics, plan to use SINDy to help with modelling and control, and honestly think that giving machines brains may be my future work too. Thanks Brunton, my passion was helped by your own.

    • @Eigensteve
      @Eigensteve  3 years ago +32

      That is amazing to hear! Helping people develop their passions is exactly why I do this!

  • @christiankraghjespersen994
    @christiankraghjespersen994 3 years ago +60

    I still have no idea who could possibly dislike these videos

    • @phaZZi6461
      @phaZZi6461 3 years ago

      u

    • @devashishbose1521
      @devashishbose1521 3 years ago

      @@phaZZi6461 I wanted to add a comment, but 69 looks so good

    • @Arocarus
      @Arocarus 1 month ago

      It could have been someone who only believes in deterministic models.

  • @s3pi0n
    @s3pi0n 2 years ago +5

    This is THE BEST explanation of reinforcement learning out of all the articles, books, and YouTube videos I've seen so far. Period.

  • @subramaniannk3364
    @subramaniannk3364 3 years ago +6

    Steve, can I make a suggestion? Could you make a few videos on Markov decision processes and Markov chains before you get into RL?

    • @Eigensteve
      @Eigensteve  3 years ago +12

      Good idea, MDP and Markov chains are super interesting, and are one of my favorite topics... I'll definitely add it to the list.

  • @sistemasecontroles
    @sistemasecontroles 3 years ago +8

    Great channel! Please record more videos at the intersection of reinforcement learning and control theory. Congrats on your work.

  • @tanujajoshi1901
    @tanujajoshi1901 2 years ago +1

    You described RL as semi-supervised learning with a time-delayed reward. But the agent gets the reward only at the end of an episode in policy-iteration algorithms, right? In actor-critic algorithms, the agent gets a reward at each time step for the associated action. So can we call RL semi-supervised learning with a time-delayed reward? Just want to clear this up.

    • @Eigensteve
      @Eigensteve  2 years ago

      How and when the agent gets rewards depends on the environment and the reward landscape. So not necessarily just at the end of an episode, and not just for policy iteration. Many algorithms will also artificially create denser intermediate rewards, although these are often proxies for a more sporadic reward. So short answer is that it is complicated (which is why RL is often considered its own distinct brand of ML), but sometimes it can be thought of as "partially" or "semi" supervised. But you are right, this label doesn't always fit.

  • @Optinix-gz1qg
    @Optinix-gz1qg 3 years ago +6

    Never clicked on a video that fast 😆. Great content as always, prof, love it!

  • @msauditech
    @msauditech 9 months ago +1

    That's an awesome video indeed. A great introduction to RL!

  • @elultimopujilense
    @elultimopujilense 3 years ago +4

    Is there something you don't know, dude? You seem to be an expert on everything. You are such an inspiration.

  • @alexanderschiendorfer2203
    @alexanderschiendorfer2203 3 years ago +2

    At 16:26, did you by any chance mean "dynamic programming" (value iteration, Q-value iteration, etc.) instead of "differential programming"? I couldn't make sense of the combination of TD, MC, and DP otherwise.

    • @alexanderschiendorfer2203
      @alexanderschiendorfer2203 3 years ago +3

      Also, it would be awesome if you could elaborate on a comparison of control theory and reinforcement learning. When to use CT, when to use RL, etc.

    • @Eigensteve
      @Eigensteve  3 years ago +1

      Good catch, thanks!

  • @codekomali1760
    @codekomali1760 3 years ago

    Excellent lecture... but you kind of glossed over MDPs and did not talk about the Markov property. I think it's kind of important, right?

  • @aakashdewangan7313
    @aakashdewangan7313 1 month ago

    A video on ADRC (active disturbance rejection control) please, with an implementation in Simulink.
    The videos by others are not good.

  • @RGDot422
    @RGDot422 1 year ago

    Hi. Do you believe that AI techniques will make control theory techniques obsolete in a few years? I mean, will control theory engineers and researchers be completely replaced by AI people?

  • @lazyoneswapples2962
    @lazyoneswapples2962 1 year ago

    A very well done lecture. Bravo!
    I'd like to make a suggestion, if I may: define the policy function as
    pi(s,a) = Pr(A = a | S = s), where A is the placeholder for an action and a is the action taken, and S is the placeholder for the state and s is the given state.
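
    For reference, the standard textbook form of this definition (as in Sutton and Barto) conditions on the state and uses time-indexed random variables; in LaTeX:

      \pi(a \mid s) = \Pr(A_t = a \mid S_t = s)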

  • @ItsNotAllRainbows_and_Unicorns
    @ItsNotAllRainbows_and_Unicorns 11 months ago

    It seems that RL doesn't work well for drunk people, such as going down the stairs or just plain walking. Otherwise, love your video series, especially this one.

  • @tabindahayat3492
    @tabindahayat3492 2 years ago

    I love u, Steve! I am currently working on Machine Teaching and Project Bonsai. I really needed to know this.

  • @luismora1676
    @luismora1676 3 years ago +1

    Steve, PLEASE teach us how to code a machine learning control algorithm with Q-learning. I need it for my thesis (sobs)
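
    Not the lecture's code, but a minimal tabular Q-learning sketch in Python on a made-up 5-state chain (all names and hyperparameters are illustrative) showing the core update rule:

      import numpy as np

      n_states, n_actions = 5, 2                # chain MDP: 0 = left, 1 = right, goal at right end
      Q = np.zeros((n_states, n_actions))
      alpha, gamma, eps = 0.1, 0.9, 0.1

      def step(s, a):
          s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
          return s2, float(s2 == n_states - 1)  # reward +1 only at the goal

      for episode in range(500):
          s = 0
          while s != n_states - 1:
              if np.random.rand() < eps:        # explore
                  a = np.random.randint(n_actions)
              else:                             # exploit, breaking ties randomly
                  a = np.random.choice(np.flatnonzero(Q[s] == Q[s].max()))
              s2, r = step(s, a)
              # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
              Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
              s = s2

      print(Q.argmax(axis=1))                   # greedy policy: "right" in every non-terminal state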

  • @timeflex
    @timeflex 7 months ago

    Would it be reasonable to use, as the reward function, the reduction in complexity of the environment model, measured by how well it predicts future states? Basically, the lower the entropy of the system, the better.

  • @souravjha2146
    @souravjha2146 3 years ago +2

    I have been binge-watching this channel for the past 3 hours

  • @alanzom1503
    @alanzom1503 2 years ago

    How do I solve a continuous-action problem (e.g., a voltage command to a motor that I want to evolve smoothly, without sharp changes)?

  • @richardfredlund3802
    @richardfredlund3802 2 years ago

    Actually, working the other way round and considering learning as a human from an RL perspective: there might be some advantage to faster games in chess, where certain things can be learned with more and faster feedback. (Certainly the faster time controls have become increasingly popular.)

  • @anassben3333
    @anassben3333 3 years ago

    I have trouble finding the playlist this video lecture belongs to, and I have this problem with all your videos. Can you please put a link to the playlist in the description? Thank you.

  • @LucasDimoveo
    @LucasDimoveo 3 years ago +2

    Did you get a new camera? It seems like there was a jump in video quality!
    Do you think Control Theory has any use for Memory Shape Materials?

    • @Eigensteve
      @Eigensteve  3 years ago

      Glad you noticed -- we got a new camera setup about 2 years ago, and we have been loving it!
      That is a really interesting idea about MSMs... when you can make a controller "passive" so that the material itself enacts the control law, things typically get more robust and cheaper... interested to learn more about MSMs

  • @darwin6984
    @darwin6984 2 years ago

    Is there much research on agent-based models? They seem quite similar to, but also different from, reinforcement learning. Does anyone know the true difference between agent-based models and reinforcement learning?

  • @vicktorioalhakim3666
    @vicktorioalhakim3666 3 years ago

    Finally, lectures with actual theory, free of the trendy ML and data-science buzzword nonsense.

  • @azizutkuozdemir
    @azizutkuozdemir 3 years ago

    Can the thing that they learn be converted into a formula, so that we get approximate formulations of problems that are otherwise hard to solve?

  • @zeys3316
    @zeys3316 2 years ago

    11:56 I don't quite understand how you define the state for a chess game. Does one state mean the occupancy of all positions on the board? You would have too many different states in that case, right?
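
    Roughly, yes: a chess state is usually the full board occupancy (plus side to move, castling rights, and so on), and the state space is astronomically large, which is why tabular methods give way to function approximation. A crude back-of-the-envelope over-count in Python (my own illustration, not from the video):

      # 6 piece types x 2 colors + empty = 13 symbols per square; 13**64 ~ 2e71
      # board configurations (a big over-count, since it ignores piece-count limits)
      print(13 ** 64)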

  • @cuongnguyentranhuu4616
    @cuongnguyentranhuu4616 3 years ago +1

    You have created such high-quality content that I just really enjoy watching it instead of playing games :)))

  • @ryanmckenna2047
    @ryanmckenna2047 8 months ago

    When should a system be modelled deterministically vs probabilistically?

  • @____2080_____
    @____2080_____ 3 years ago

    I would suggest that the researchers study how Neanderthals walked. They were far, far more efficient walkers.

  • @KRX101
    @KRX101 3 years ago

    The reward signal tells the agent how well it did, but not whether it was right or wrong, so it's completely different from supervised or semi-supervised learning.

  • @tai94bn
    @tai94bn 2 years ago +1

    It's really interesting to watch this video. I have studied and read this material a few times before, and its boredom was hard to describe.
    Thank you, teacher

  • @SRIMANTASANTRA
    @SRIMANTASANTRA 3 years ago +3

    Hi Professor Steve, lovely presentation.

  • @kennettallgren640
    @kennettallgren640 3 years ago

    In Sweden they have "Bellman jokes". I do not think it is the same guy, though...

  • @TheAIEpiphany
    @TheAIEpiphany 3 years ago +1

    Hey Steve! Loved your lecture! Could you tell me what your setup is? I love your production, setup, and content of course!
    Some questions:
    1. Do you have a screen/script in front of you and a green screen behind?
    2. Which cam and mic do you use? Is it only a lav mic? I assume it's not shotgun since you're far away from any particular point of the frame.
    3. How much time does it take to create a video like this one?
    4. How many dry runs do you usually do? Or for this video in particular?
    You're setting a new standard for production (and beyond haha), keep up the good work!
    I'd really appreciate your answers, thank you in advance!

    • @Eigensteve
      @Eigensteve  3 years ago

      Thanks, glad you like it! No script, but I have a screen so I can see where I am relative to the presentation. I use a lav mic and a Canon 4K camera. I usually do everything in one run; sometimes I redo the intro a couple of times until I'm happy with it.

    • @TheAIEpiphany
      @TheAIEpiphany 3 years ago

      @@Eigensteve thanks Steve!

  • @XecutionStyle
    @XecutionStyle 3 years ago +1

    Is an agent only as good as its reward? If that's the only way to communicate the objective to the agent, a proper reward seems really important, yet confusing and often neglected. Could you please cover this in more detail?

    • @Eigensteve
      @Eigensteve  3 years ago +1

      Excellent point. And, in biological systems, the ultimate reward (dopamine) comes from inside the agent, not from the external environment.

    • @XecutionStyle
      @XecutionStyle 3 years ago

      @@Eigensteve Yeah that's a mind-blow. If the amygdala is part of the environment, then the agent is our consciousness (an immeasurable level of abstraction, a separation from nature). If it isn't part of it, then the reward is coming from within the agent, which ties us deeply to the environment and nature. Google cleverly leaves out objective when defining intelligence. At least Piaget was more helpful by mentioning a mental world model. What is it to you?

  • @JousefM
    @JousefM 3 years ago +2

    Theeeere we go Steve! Waited for this :)

  • @carlphilip4393
    @carlphilip4393 3 years ago +1

    Dear Steve,
    I'm very, very grateful that I get to watch such extraordinarily instructive videos for free!!! Thinking that elsewhere in the world people are killing each other right now (as in Kabul), it gives me a lot of hope to see how people like you just make the world a little better, and it almost brings tears to my eyes. You have such a great talent for teaching. Thank you!

  • @riccvven2078
    @riccvven2078 1 month ago

    I bought your books. I hope you'll create a full course on this subject. So cool, thanks!

  • @subhikshaniyer613
    @subhikshaniyer613 2 years ago +1

    Every time he said "good", I felt appreciated for not giving up on a lecture whose subject is far, far away from mine, and I'm pushing myself to try to learn the concepts. Thank you, Steve, much love!

  • @bugrahana3395
    @bugrahana3395 3 years ago +1

    function eigensteve = eig(steve), disp("best lecturer & scientist on YouTube"), end

  • @minglee5164
    @minglee5164 2 years ago

    Amazing that RL can be interpreted from this perspective.

  • @abdullahmeaad8232
    @abdullahmeaad8232 3 years ago +1

    WOW

  • @loopuleasa
    @loopuleasa 3 years ago +1

    Top quality.
    This is what they said about education on the internet: that "the best teacher can teach everyone".
    This is that video for this topic.

  • @joshuaattih2091
    @joshuaattih2091 9 months ago

    Can you do a separate lecture on agent-based models?

  • @zrmsraggot
    @zrmsraggot 2 years ago

    Is there any way a rat can smell a fruit loop from afar and follow the scent until it finds it?

  • @pierredubois8715
    @pierredubois8715 3 years ago +1

    Thank you so much for this lecture. I really enjoy your videos; they are helpful as a PhD student. I also bought your book "Data-Driven Science and Engineering", which has nice explanations of the tools I use. Keep up this awesome work! Greetings from France!

  • @fzigunov
    @fzigunov 3 years ago +1

    Looks like I'm not the only one working on a video early in the morning! Really cool stuff, love the doggie!!

  • @MohammadVahidPoorhosseini
    @MohammadVahidPoorhosseini 9 months ago

    Please tell us more about reinforcement learning for robot arms. Thanks!

  • @hudhuduot
    @hudhuduot 3 years ago +2

    Steve is one of those gifted teachers. I wish you could guide postgraduates toward good publications in control and learning by highlighting hot topics and promising research directions.

  • @aliakbari4323
    @aliakbari4323 3 years ago +1

    Yes, indeed, he's great

    • @Eigensteve
      @Eigensteve  3 years ago

      Thanks so much! All of the faithful viewers are great!

  • @kouider76
    @kouider76 3 years ago +1

    Simply a great subject and an excellent presentation. Thank you, prof, for all your efforts!