I wish I had known about this channel at the start of quarantine.
I found out about the channel just as quarantine had started. It was quite the treat.
I just cannot express how grateful I am to Prof. Steve Brunton for posting these videos. Waking up at 6am to watch him explain is the most satisfying thing ever. Thank you! We are all grateful.
As a CS grad student who took RL last semester... this is truly the best refresher I have seen so far. Thanks a lot for uploading.
Great to hear!
I love how you emphasize the intersection between machine learning and control (theory). That's exactly what sparks my interest in reinforcement learning!
Glad you like it! I always found this connection fascinating and a very natural way to merge the two fields.
I've been seriously considering starting a degree in A.I./Machine learning but with videos of this quality available for free, it is hard to justify the cost. Subscribed and liked!
Just in case you read this and have time to reply: do you have any suggestions for an education path to your level of understanding? There are degrees in data science, computer science, artificial intelligence, software engineering, etc. They all seem so interrelated. I want to know them all, but I'm struggling to pick a starting point.
My current level of related education is high-school-level advanced maths and a year of teaching myself MQL4/5 and R, mostly from free resources online. Just so you know my starting point (or state, haha).
This sweet spot between control theory and machine learning definitely interests me, especially applied to astrodynamical systems. Please, continue making these videos, Professor Brunton!
Wow! I would love to see the Prof take on RL topics!
Steve is a phenomenal lecturer, isn't he?
never seen a better one
very much so
He is!
Yessss
no, he is the most phenomenal one!! Respect
What I like is that I don't pay for this knowledge. I was planning to take a data science certificate, but you know what, let me spend 6 months learning by myself. I have spent a solid month just on your videos, starting from the SVD, and it has been amazing. I love when a small thing builds up into a bigger thing. Soon I will make a sample project based on what I have learned from your videos.
I like this video, but there is already a very common way to tackle the reward-density problem. It's called actor-critic, where you basically get a second network that learns what reward it is expecting (Q-learning), while the actor is a policy-gradient network.
It works fine so far, and I know it's not enough to get away from the "semi-supervised" label, but let's be honest: the "semi" is what really defines the technique, because the agent needs to learn by itself what "could" be good in the future, without a supervisor. That's how animals and humans learn too, so a fully supervised agent wouldn't be exploring the world on its own anymore.
Greetings Firestorm
(ETH-Zurich Student)
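A minimal sketch of the actor-critic idea described above, reduced to a toy single-state problem so the critic becomes just a learned baseline (the environment and all names here are my own invention, not from the video):

```python
# Toy one-step actor-critic: the critic learns a value estimate, and the
# actor does a policy-gradient update weighted by the critic's TD error.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)              # actor parameters: action preferences
v = 0.0                          # critic: value of the single state
alpha_actor, alpha_critic = 0.1, 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    # Hypothetical environment: action 1 pays ~1.0, action 0 pays ~0.2
    r = rng.normal(1.0 if a == 1 else 0.2, 0.1)

    td_error = r - v                        # critic's "surprise"
    v += alpha_critic * td_error            # critic update
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0                   # d log pi(a)/d theta for softmax
    theta += alpha_actor * td_error * grad_log_pi   # actor update

print(softmax(theta))   # most probability should end up on action 1
```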
Yay! My hero has decided to teach reinforcement learning!
Prof Brunton: You are one bad-ass teacher!!!🤓
The most fantastic lecture I've ever seen...
The best one I have ever seen.
It would be an honor to be supervised for a PhD by him.
It's brilliant! Keep working on this topic, please.
All of your lecture series are very good and very helpful. A series on convex optimization problems would be a great addition. Any thoughts about it?
Great lesson. Thank you!
It would be nice to see you implement some of these algorithms in Python, or whatever you're comfortable with. But great video nonetheless.
This guy is super smart!
The central limit theorem applies not just to sums but equally to sums and differences. The fast Hadamard transform is done using patterns of addition and subtraction, hence the CLT applies. If you transform a vector of random numbers from the uniform distribution, the result is a vector of random numbers from the Gaussian distribution. There is a 1969 paper about it. One technical problem is that the transform leaves the vector length unchanged; the Gaussians are ever so slightly entangled with each other through that. Anyway, it is extremely fast. I think that after a 50-year break everyone should study the fast Hadamard transform again. It has too many uses for machine learning and neural nets to ignore.
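A minimal sketch of that idea, assuming a power-of-two length (my own illustration; the 1/sqrt(n) scaling is what keeps the vector length unchanged, which is also the source of the slight entanglement mentioned above):

```python
# Fast Walsh-Hadamard transform of uniform randoms: each output is a
# +/- sum of all inputs, so by the CLT it is approximately Gaussian.
import numpy as np

def fwht(x):
    """Butterfly FWHT; len(x) must be a power of 2."""
    x = x.copy()
    h, n = 1, len(x)
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x

n = 1024
u = np.random.rand(n) - 0.5        # zero-mean uniforms, variance 1/12
g = fwht(u) / np.sqrt(n)           # orthonormal scaling preserves variance
print(g.mean(), g.var())           # roughly 0 and 1/12, now ~Gaussian
```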
When he talks about the environment being probabilistic, he uses the example of a backgammon game, which involves the roll of dice, unlike chess. But he could also have pointed out that the random element in the game of chess is not the game itself but the opponent. You never know for sure what your opponent will do, so you need a probabilistic approach to the game of chess too.
16:30 - I wonder if it's differential or differentiable programming? Great video.
Thank you, professor!
Great talk, but I don't quite understand the difference between the quality function Q(s,a) and the policy pi(s,a). They seem to do the same thing?
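Since this confusion is common, here is a toy contrast (my own sketch, not from the lecture): Q(s,a) scores actions by expected future reward, while pi(s,a) assigns probabilities of choosing them; a policy can be derived from Q, e.g., greedily or via softmax.

```python
import numpy as np

# Hypothetical Q-values for one state: expected future reward per action
Q = {'left': 0.2, 'stay': 0.5, 'right': 1.3}

# A policy derived from Q: softmax turns scores into action probabilities
vals = np.array(list(Q.values()))
probs = np.exp(vals) / np.exp(vals).sum()
pi = dict(zip(Q, probs))

print(pi)                    # pi(s, .) sums to 1 over the actions
print(max(Q, key=Q.get))     # greedy policy: argmax_a Q(s, a) -> 'right'
```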
Apart from robotics, games and finance, which fields do you see reinforcement learning having the biggest impact in the next 2-3 years?
Good question... it is definitely becoming big in other fields of scientific computing. We are using it for laser tuning and fluid flow control. Self-driving car researchers are definitely using it a lot too for autonomy applications.
Thank you for putting all this material together. A comment about 8:12: the reason you'd want to learn a probabilistic policy is not exactly that the environment is stochastic. For any environment that is fully observable by the agent, there is a deterministic policy that is optimal (except in game settings, such as rock-paper-scissors). Probabilistic policies are needed in certain partially observable environments, where there are aliased states that look the same to the agent, i.e., the observation is the same but the underlying states are actually different.
For more details:
ruclips.net/video/MzVlYYGtg0M/видео.html
Thanks for your thoughts on this. That is a really good point. I was just thinking of making a short video on POMDPs to start sorting some of this out.
@@Eigensteve thank you, looking forward to your new videos!
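A toy illustration of the aliased-state point above (my own sketch, not from the linked video): two states produce the same observation, but the action that reaches the goal differs, so any deterministic memoryless policy gets stuck in one of them while a stochastic one escapes.

```python
import random

def step(state, action):
    # States 0 and 1 look identical to the agent, but need opposite actions.
    if state == 0:
        return 'goal' if action == 'left' else 0
    return 'goal' if action == 'right' else 1

def steps_to_goal(policy, start, max_steps=100):
    s = start
    for t in range(max_steps):
        if s == 'goal':
            return t
        s = step(s, policy())
    return max_steps   # never reached the goal

deterministic = lambda: 'left'                           # stuck from state 1
stochastic = lambda: random.choice(['left', 'right'])    # escapes eventually

print(steps_to_goal(deterministic, start=1))   # 100 (trapped forever)
print(steps_to_goal(stochastic, start=1))      # small (~2 on average)
```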
Love the videos
Hi Steve, would you agree if economists say the economy is a dynamical system? 🙂
In macroeconomics, all economic agents (central bank, government, households, etc.) choose policies to maximize the value function of the household. One problem, though: maybe the economy is so complex that it cannot be captured by any dynamical system 🙂, and so no robot can learn how to make good monetary and fiscal policies to maximize household welfare (the value function). Or perhaps a robot can actually be the next chair of the central bank, or even the president... if all that matters is making good fiscal and monetary policies. 😀 Nice lecture, btw.
Kudos on the awesome lecture
Thank you for the excellent class! Is this lecture in your book?
Thanks! This is not in the first edition, but we have a new chapter on reinforcement learning in the upcoming 2nd edition.
Wow! Thank you so much.
Maybe the next lecture can be about UMAP please :D?
I just finished my Masters last year in control theory, and only in my last class did I learn (or maybe realize) that reinforcement learning is the AI form of control theory. It's so disappointing, because I would've loved to dedicate my degree to it; it's incredibly interesting.
Wawawawait, so I can actually use machine learning to optimize/design an AI for ANY purpose that fits the control theory methodology?
Hello man,
I really do enjoy learning from your lectures. You're doing a great job, and thank you for making all these great things free.
I have been wondering: if I designed an environment like a Scrabble game and tried to teach the agent reading, writing, and so on, how could I be sure the agent is really learning from reading books?
Is this a paradox? I'm not sure :) But I want to understand whether an agent learns the way humans learn new things; at which point can I say the agent is learning like a human?
Have you ever thought about it? Please share your opinions; I really want to hear them.
Best regards,
Hello Steve, I would like to learn about the mathematical and physical modeling of the movement of flies. Do you know where I could start?
Amazing 🤩
*Do Ryzen CPUs and Nvidia GPU drivers have issues together? I heard about it from the Tom's Hardware website... Can I run mechanical software like OpenFOAM, HyperMesh, Ansys (simulation), Converge CFD, and ANSA smoothly on Ryzen, and what about Linux? Can I go for an HP Omen (R5 4600H, GTX 1650)?*
Can you cover some differences between model-based and model-free learning?
Is the code on RPCA of the flow around a circle available? I'm really interested in that
What tools are used to film this presentation style?
Matrix agents being programs makes them agents of the Matrix. The movie is correct.
Yeah just found my MPhil field.
Crap, wanted to get here first
Pakistan
*"WELCOME BACK"*
Would love a full series on how we can use RL to control real-world dynamical systems!
Viewing reinforcement learning as time-delayed supervised learning is a really good way of looking at it.
Indeed!
Just wanted to comment about how much I love these videos. Last year while applying for PhDs I was searching for passions. In a discussion with my friend (a computer scientist), I accidentally outlined genetic programming without knowing it. My friend told me so and I went researching. Found these videos and became enthralled. Now I have a PhD studentship in soft robotics and plan to use SINDy to help with modelling and control and honestly think that giving machines brains may be my future work too. Thanks Brunton, my passion was helped by your own.
That is amazing to hear! Helping people develop their passions is exactly why I do this!
I still have no idea as to who could possibly dislike these videos
u
@@phaZZi6461 I wanted to add a comment but 69 looks so good
It could have been someone who only believes in deterministic models.
This is THE BEST explanation of reinforcement learning of all the articles, books, or YouTube videos that I've seen so far. Period.
Steve, can I make a suggestion? Could you make a few videos on Markov decision processes, Markov chains before you get into RL?
Good idea, MDP and Markov chains are super interesting, and are one of my favorite topics... I'll definitely add it to the list.
Great channel! Please record more videos at the interface of reinforcement learning and control theory. Congrats on your work.
You viewed RL as semi-supervised learning with a time-delayed reward. But the agent gets the reward only at the end of the episode in policy iteration algorithms, right? In actor-critic algorithms, the agent gets a reward at each time step for the associated action. So can we still call RL semi-supervised with a time-delayed reward? Just want to clear this up.
How and when the agent gets rewards depends on the environment and the reward landscape. So not necessarily just at the end of an episode, and not just for policy iteration. Many algorithms will also artificially create denser intermediate rewards, although these are often proxies for a more sporadic reward. So short answer is that it is complicated (which is why RL is often considered its own distinct brand of ML), but sometimes it can be thought of as "partially" or "semi" supervised. But you are right, this label doesn't always fit.
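One standard way to create those denser intermediate rewards is potential-based shaping (Ng, Harada, and Russell, 1999), which provably preserves the optimal policy. A minimal sketch on a hypothetical 1-D chain where the true reward appears only at the goal:

```python
GOAL = 10

def true_reward(s_next):
    return 1.0 if s_next == GOAL else 0.0    # sparse: only at the end

def phi(s):
    return -abs(GOAL - s)                    # potential: closer is better

def shaped_reward(s, s_next, gamma=0.99):
    # Adding gamma*phi(s') - phi(s) densifies the signal without
    # changing which policy is optimal.
    return true_reward(s_next) + gamma * phi(s_next) - phi(s)

print(shaped_reward(3, 4))    # positive: moving toward the goal is rewarded
print(shaped_reward(4, 3))    # negative: moving away is penalized
```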
Never clicked a video that fast 😆. Great content as always, prof. Love it!
That's an awesome video indeed. A great introduction to RL!
Is there anything you don't know, dude? You seem to be an expert on everything. You are such an inspiration.
At 16:26, did you by any chance mean "dynamic programming" (value iteration, Q-value iteration, etc.) instead of "differential programming"? I couldn't make sense of the combination of TD, MC, and DP otherwise.
Also, it would be awesome if you could elaborate on a comparison of control theory and reinforcement learning. When to use CT, when to use RL, etc.
Good catch, thanks!
Excellent lecture... but you kind of glossed over MDPs and did not talk about the Markov property. I think it's kind of important, right?
A video on ADRC (active disturbance rejection control), please, with an implementation in Simulink.
The other videos on this topic, by others, are not as good.
Hi. Do you believe that AI techniques will make control theory techniques obsolete in a few years? I mean, will control theory engineers and researchers be completely replaced by AI people at some point?
A very well done lecture. Bravo!
I'd like to make a suggestion, if I may: modify the policy function to
pi(s, a) = Pr(A = a | S = s), where A is the placeholder for an action and a is the action actually taken, and S is the placeholder for the state and s is the given state.
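For reference, the standard conditional form, written out (a small LaTeX snippet):

```latex
% Policy as a conditional distribution over actions, given the state
\pi(s, a) \;=\; \Pr\left( A_t = a \,\middle|\, S_t = s \right),
\qquad \sum_{a} \pi(s, a) = 1 \quad \text{for every state } s .
```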
It seems that RL doesn't work well for drunk people, for things such as going down the stairs or just plain walking. Otherwise, love your video series, especially this one.
I love you, Steve! I am currently working on Machine Teaching and Project Bonsai. I really needed this.
Steve, PLEASE teach us how to code a machine learning control algorithm with Q-learning. I need it for my thesis (sobs).
Would it be reasonable to suggest using the reduction of the complexity of the model of the environment, in terms of how well it predicts future states, as the reward function? Basically, the lower the entropy of the system, the better.
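Something in this spirit exists in the curiosity/intrinsic-motivation literature. A hedged sketch of the idea as stated above (the `model` object and its `predict` method are hypothetical placeholders, not a real library API):

```python
import numpy as np

def intrinsic_reward(model, s, a, s_next):
    # Reward the agent for reaching states its learned world model
    # predicts well: low surprise -> high reward, as suggested above.
    s_pred = model.predict(s, a)             # hypothetical dynamics model
    return -np.sum((np.asarray(s_next) - np.asarray(s_pred)) ** 2)
```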
I have been binge-watching this channel for the past 3 hours.
How do I solve a continuous-action problem (e.g., a voltage command to a motor that I want to evolve smoothly, without sharp changes)?
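One common trick (an assumption on my part, not from the lecture) is to penalize the action rate in the reward, so the learned policy prefers smooth commands:

```python
def smooth_reward(tracking_error, a, a_prev, lam=0.1):
    # Penalizing (a - a_prev)^2 discourages sharp changes in the
    # voltage command; lam trades off smoothness against tracking.
    return -tracking_error**2 - lam * (a - a_prev)**2
```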
Actually, working the other way round and considering learning as a human from an RL perspective, it might be that there is some advantage to faster games in chess, where certain things can be learned with more and faster feedback. (Certainly the faster time controls have become increasingly popular.)
I have trouble finding the playlist to which this video lecture belongs, and I encounter this problem with all your videos. Can you please put the link to the playlist in the description? Thank you.
Did you get a new camera? There seems like there was a jump in video quality!
Do you think that Control Theory has any use for Memory Shape Materials?
Glad you noticed -- we got a new camera setup about 2 years ago, and we have been loving it!
That is a really interesting idea about MSMs... when you can make a controller "passive" so that the material itself enacts the control law, things typically get more robust and cheaper... interested to learn more about MSMs
Is there much research on agent-based models? It seems to me this is quite similar to, but also different from, reinforcement learning. Does anyone know the true differences between agent-based modeling and reinforcement learning?
Finally, lectures free of the trendy ML buzzword nonsense and data-science hype, with actual theory.
Can the thing that they learn be converted into a formula? That way we'd get an approximate formulation of problems that are hard to solve otherwise.
11:56 I don't quite understand how you define the state for a chess game. Does one state mean the occupation of all the positions on the board? You would have too many different states in that case, right?
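Yes, and the state space is astronomically large, which is exactly why function approximation is used instead of a table. One common encoding (a simplified sketch in the spirit of the AlphaZero convention, my own illustration) represents the board as 12 binary 8x8 planes, one per piece type and color:

```python
import numpy as np

PIECES = 'PNBRQKpnbrqk'   # white pieces, then black pieces

def board_to_state(board):
    """Encode an 8x8 board (strings, '.' = empty) as a 12x8x8 tensor."""
    state = np.zeros((12, 8, 8), dtype=np.float32)
    for r, row in enumerate(board):
        for c, piece in enumerate(row):
            if piece in PIECES:
                state[PIECES.index(piece), r, c] = 1.0
    return state

start = ['rnbqkbnr', 'pppppppp', '........', '........',
         '........', '........', 'PPPPPPPP', 'RNBQKBNR']
print(board_to_state(start).shape)   # (12, 8, 8) -- one single state
```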
You have created such high-quality content that I really enjoy watching it instead of playing games :)))
When should a system be modelled deterministically vs probabilistically?
I would suggest that researchers study how Neanderthals walked. They were far, far more efficient walkers.
The reward signal tells the agent how well it did, but not whether it was right or wrong, so it's completely different from supervised or semi-supervised learning.
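That distinction is often summarized as evaluative versus instructive feedback; a toy contrast (my own illustration):

```python
action, correct = 'left', 'right'

# Supervised learning: instructive feedback, it names the right answer.
supervised_feedback = correct                 # "you should have gone right"

# RL: only evaluative feedback, a score with no correction attached.
reward = 1.0 if action == correct else 0.0    # 0.0 -- but why?
print(supervised_feedback, reward)
```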
It's really interesting to watch this video; although I have studied and read this material a few times elsewhere, the boredom there was hard to describe.
Thank you, teacher!
Hi Professor Steve, Lovely presentation.
In Sweden they have "Bellman jokes". I do not think it is the same guy though...
Hey Steve! Loved your lecture! Could you tell me what your setup is? I love your production, setup, and content of course!
Some questions:
1. Do you have a screen/script in front of you and a green screen behind?
2. Which cam and mic do you use? Is it only a lav mic? I assume it's not a shotgun mic, since you're far away from any particular point of the frame.
3. How much time does it take to create a video like this one?
4. How many dry runs do you usually do? Or for this video in particular?
You're setting a new standard for production (and beyond haha), keep up the good work!
I'd really appreciate your answers, thank you in advance!
Thanks, glad you like it! No script, but I have a screen so I can see where I am relative to the presentation. I use a lav mic and a Canon 4K camera. I usually do everything in one run; sometimes I redo the intro a couple of times until I'm happy with it.
@@Eigensteve thanks Steve!
Is an agent only as good as the reward? If that's the only way to communicate the objective to the agent, a proper reward seems really important, and confusing, and often neglected. Could you please cover this in more detail?
Excellent point. And, in biological systems, the ultimate reward (dopamine) comes from inside the agent, not from the external environment.
@@Eigensteve Yeah, that's mind-blowing. If the amygdala is part of the environment, then the agent is our consciousness (an immeasurable level of abstraction, a separation from nature). If it isn't part of it, then the reward is coming from within the agent, which ties us deeply to the environment and nature. Google cleverly leaves out the objective when defining intelligence. At least Piaget was more helpful by mentioning a mental world model. What is it to you?
Theeeere we go Steve! Waited for this :)
Dear Steve,
I'm very, very grateful that I get to watch such extraordinarily instructive videos for free!!! Thinking that elsewhere in the world people are killing others at the moment (as in Kabul), it gives me a lot of hope seeing how people like you just make the world a little better; it almost brings tears to my eyes. You have such a great talent for teaching, thank you!
I bought your books; I hope you'll create a real course about this subject. So cool, thanks!
Every time he said "good", I felt appreciated for not giving up on a lecture whose subject is far, far away from mine, and I'm pushing myself to try and learn the concepts. Thank you, Steve, much love!
function eigensteve = eig(steve), disp('best lecturer & scientist on youtube'), end
RL can be interpreted from this perspective, amazing
WOW
top quality
This is what they said about education on the internet: "the best teacher can teach everyone." This is that video for this topic.
Can you do separate teaching on Agent Based Models?
Is there any way a rat can smell a fruit loop from afar and follow it until it finds it?
Thank you so much for this lecture. I really enjoy your videos; they're helpful for me as a PhD student. I also bought your book "Data-Driven Science and Engineering", which has nice explanations of the tools I use. Keep up this awesome work! Greetings from France!
Looks like I'm not the only one working on a video early in the morning! Really cool stuff, love the doggie!!
Tell us more about reinforcement learning for robot arms. Thanks!
Steve is one of those gifted teachers. I wish you could guide postgraduates toward making good publications in control and learning by highlighting hot topics and promising research directions.
Thanks so much!
Yes, indeed, he's great.
Thanks so much! All of the faithful viewers are great!
Simply a great subject and an excellent presentation. Thank you, prof, for all your efforts!