Deep Reinforcement Learning: Neural Networks for Learning Control Laws

  • Published: 16 Jun 2024
  • Deep learning is enabling tremendous breakthroughs in the power of reinforcement learning for control. From games like chess and Go to robotic systems, deep neural networks are providing a powerful and flexible representation framework that fits naturally with reinforcement learning. In this video, we provide an overview of developments in deep reinforcement learning, along with leading algorithms and impressive applications.
    Citable link for this video: doi.org/10.52843/cassyni.9tngmc
    @eigensteve on Twitter
    eigensteve.com
    databookuw.com
    This video was produced at the University of Washington
  • Science

Comments • 85

  • @v1e2ridisQ3u4o
    @v1e2ridisQ3u4o 3 years ago +74

    Shouting out Two Minute Papers in a Brunton video - best crossover episode ever!

    • @greykeith
      @greykeith 3 years ago +16

      what a time to be alive

    • @bluestar2253
      @bluestar2253 3 years ago +2

      I love the Two Minute Papers series on YT, but it is so hard to pronounce that dude's name! lol

    • @soutrikband
      @soutrikband 3 years ago +3

      I held on to my papers hearing that

    • @harrysvensson2610
      @harrysvensson2610 3 years ago +1

      Shouldn't his channel be called 5 minute papers?

  • @1996butuq
    @1996butuq 3 years ago +12

    Really love your videos!
    Would love to see more about deep reinforcement learning used in the field of Robotics.

  • @Bill0102
    @Bill0102 4 months ago +1

    I'm immersed in this. I read a book with a similar theme, and I was completely immersed. "The Art of Saying No: Mastering Boundaries for a Fulfilling Life" by Samuel Dawn

  • @MathematicsOptimization
    @MathematicsOptimization 3 years ago +7

    Very valuable resources that get uploaded here, thank you!

  • @Raven-bi3xn
    @Raven-bi3xn 1 year ago +1

    Really great video.
    The most complicated problem that I've seen RL solve is OpenAI's Dota 2 bot. Mind-blowing.

  • @saitaro
    @saitaro 3 years ago +3

    Your work is so inspiring, Steve.

  • @ajaykumar-rh2gz
    @ajaykumar-rh2gz 2 years ago +1

    Thanks Steve, very beautifully explained. From my point of view, you are the best teacher I have ever seen.
    Please upload some lectures on designing your own custom environment.

  • @user-qp2ps1bk3b
    @user-qp2ps1bk3b 3 years ago +1

    thank you for your amazing content! I learn so much about the world with your videos

  • @shanwuli745
    @shanwuli745 3 years ago +2

    Thanks Steve! I believe it would also be interesting to compare model-based and model-free RL.

  • @kouider76
    @kouider76 3 years ago +11

    Thank you Prof. Brunton for the valuable content. I will be a bit greedy and ask if you can upload a video including an example coded in MATLAB or Python.
    Thank you again for all your efforts

  • @arpitshrimankar7787
    @arpitshrimankar7787 3 years ago +2

    Great video! Would like to see a video about "Hindsight Experience Replay" in reinforcement learning.

  • @swk229
    @swk229 2 years ago +1

    Thank you! I've learned so much from your videos. Please cover machine learning, reinforcement learning, and MPC with fuzzy logic.

  • @samurock100
    @samurock100 1 year ago

    Easy to digest information. Enjoying learning from this account.

  • @matthewchunk3689
    @matthewchunk3689 3 years ago

    Thumbs up and subscribed. Keep up the valuable work 👍

  • @ianzhang862
    @ianzhang862 3 years ago +4

    Thank you Professor! Are we starting a new video series on machine learning control?

  • @motbus3
    @motbus3 3 years ago +1

    Really nice expository video!
    As always, your explanations are fantastic!
    Do you think you can give some practical examples with code too?

  • @comvnche
    @comvnche 3 years ago

    Great videos! Thanks. The only thing I think could be improved is the sound.

  • @apurvdhir7062
    @apurvdhir7062 3 years ago +1

    Someday I would love to attend your lectures live. :))))

  • @faqeerhasnain
    @faqeerhasnain 11 months ago

    Great Insight. Thank you

  • @hoaxuan7074
    @hoaxuan7074 3 years ago +2

    There were only around 55000 transistors in the quite useful Z80 CPU that was already available in the late 1970s. That would certainly have been enough for a specialized fast Hadamard transform chip, and possibly even a fast neural network chip based on that transform. Lost opportunities to do things early. Certain realizations about randomization, distribution, and dot products could also have made associative memory a thing in general use today, allowing efficient experimentation with external memory blocks for neural nets.
    Also, even if it is not very popular, I want to mention training by evolution rather than back-propagation. The continuous gray code optimization algorithm works rather well. You can split the training over many cores very easily, and it appears you get much better generalization, which is perhaps due to the full training set being used in its entirety rather than in batches. Obviously a negative point is that it works far better with fast networks. However, the easy distribution over (say, cheap ARM) cores offsets some of the pain. Also, some problems are better framed using evolution.
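
    For the curious, a minimal sketch of what training by evolution (rather than back-propagation) can look like: a plain (1+1) evolution strategy with Gaussian mutations on a toy regression task. This is not the continuous gray code algorithm mentioned above, and every size and rate here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: fit y = sin(x) with a tiny one-hidden-layer network (49 parameters).
x = np.linspace(-3, 3, 64)
y = np.sin(x)

def unpack(w):
    """Split the flat parameter vector into layer weights and biases."""
    return w[:16].reshape(16, 1), w[16:32], w[32:48], w[48]

def loss(w):
    w1, b1, w2, b2 = unpack(w)
    h = np.tanh(w1 @ x[None, :] + b1[:, None])  # hidden layer, shape (16, 64)
    pred = w2 @ h + b2                          # network output, shape (64,)
    return float(np.mean((pred - y) ** 2))

# (1+1) evolution strategy: mutate the parent, keep the child only if it improves.
parent = rng.normal(scale=0.5, size=49)
best = loss(parent)
for step in range(20000):
    child = parent + rng.normal(scale=0.05, size=49)
    child_loss = loss(child)
    if child_loss < best:
        parent, best = child, child_loss
print(best)  # the loss decreases steadily without computing any gradients
```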

  • @TheAIEpiphany
    @TheAIEpiphany 3 years ago

    Hey Steve! Nice video! Could you please link the things you mentioned in the description?

  • @evanparshall1323
    @evanparshall1323 3 years ago +2

    Steve I would love to see an example of how to implement reinforcement learning! Discrete control and z-transform videos would also be great!

  • @skeletonrowdie1768
    @skeletonrowdie1768 2 years ago

    lol, you're the eigensteve. All other Steves are just linear combinations of your properties XD. Love it.

  • @HarrierBr
    @HarrierBr 3 years ago +1

    Very good material. Thank you

  • @mananpatel8025
    @mananpatel8025 2 years ago

    This man is just awesome!

  • @chanochbaranes6002
    @chanochbaranes6002 3 years ago

    Your videos are amazing

  • @vasylcf
    @vasylcf 3 years ago

    Thanks for the video

  • @medomed1105
    @medomed1105 3 years ago

    Can you complete this RL series with your great explanation?

  • @mbonuchinedu2420
    @mbonuchinedu2420 1 year ago

    Thanks Steve.

  • @justaqmal2379
    @justaqmal2379 3 years ago

    nice video! well done!

  • @ashrafbeshtawi3556
    @ashrafbeshtawi3556 3 years ago

    Amazing video

  • @TheKhakPakflyest
    @TheKhakPakflyest 3 years ago

    WE LOVE YOU STEVE!!!

  • @herb.420
    @herb.420 11 months ago

    I think the issue stems from the bot's inability to recognize the dynamics and nonlinearities in the state, and to apply these predictive dynamic patterns to other environments. Using SINDy and Q-learning in conjunction may provide a path to an adaptive model (a minimal Q-learning sketch follows this thread). The issue, of course, is understanding how to apply the learned dynamics of the system to the Q-values in other models. Going from tic-tac-toe to, say, Connect Four would mean aligning inputs, recognizing the opponent's position and maneuvering around it, and needing to learn the new vertical structure. The dynamics of needing to align pieces would already be learned, but the new upright, vertically stacking environment would need to be learned. Some rules of certain game elements would obviously transfer more readily for the bot.

    • @herb.420
      @herb.420 11 months ago

      There needs to be some way for the bot to map what it knows it can already control onto what is observed, and to recognize the area that needs to be explored, maybe with classical control systems or something BMR
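
      As a reference point for the Q-learning half of that idea, here is a minimal tabular Q-learning loop. The environment interface (env.actions, env.reset(), env.step()) is hypothetical, and wiring SINDy-style learned dynamics into the Q-values, as suggested above, is left open:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=5000, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning. Assumes a hypothetical env with: env.actions (list),
    env.reset() -> state, env.step(a) -> (next_state, reward, done),
    and hashable states."""
    Q = defaultdict(float)  # Q[(state, action)], defaults to 0.0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy: explore occasionally, otherwise act greedily.
            if random.random() < eps:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            # One-step temporal-difference update toward r + gamma * max_a' Q(s', a').
            best_next = 0.0 if done else max(Q[(s2, a2)] for a2 in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```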

  • @tiberium87
    @tiberium87 1 year ago

    I love that your sequence for bringing up children is: Tic-Tac-Toe --> Checkers --> Chess --> ready for the Real World.

  • @keleviolet
    @keleviolet 3 years ago

    How did you record the videos that show you behind the transparent slides of your presentations?

  • @rforrohitarorasirclasses6500
    @rforrohitarorasirclasses6500 3 years ago

    Very helpful

  • @wydadiyoun
    @wydadiyoun 3 years ago +1

    I was expecting more technical details! Nice video though.

  • @therocketmanprince682
    @therocketmanprince682 3 years ago

    Amazing

  • @JousefM
    @JousefM 3 years ago +2

    Thumbs up Steve! :)

  • @amiralizadeh6621
    @amiralizadeh6621 1 year ago

    What does the expectation in the value function represent? Over what random variable?
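
    For reference, in the standard definition the expectation is taken over the random trajectory (s_0, a_0, r_0, s_1, ...) induced jointly by the (possibly stochastic) policy and the environment's transition probabilities, with gamma the discount factor:

```latex
V^{\pi}(s) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s\right],
\qquad a_t \sim \pi(\cdot \mid s_t), \quad s_{t+1} \sim P(\cdot \mid s_t, a_t).
```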

  • @rawanahmad5905
    @rawanahmad5905 3 years ago

    I know this is not related to this topic, but can you please explain sliding mode control?

  • @turhancan97
    @turhancan97 1 year ago +2

    I wonder if Steve, after seeing the current major advances in AI such as GPT-4 or Stable Diffusion, still thinks that general AI is a problem that will be solved hundreds of years from now?

  • @billfero
    @billfero 3 years ago +1

    Referring to your point about a trained agent only being good at the game it was trained on: are they actually learning, or simply memorizing?

    • @Diego0wnz
      @Diego0wnz 3 years ago

      They are "learning" by estimating how good a state is. Thus they can estimate the value of unobserved states based on previously seen data.

  • @amiralizadeh6621
    @amiralizadeh6621 1 year ago

    Thank you for your amazing lecture. I am confused about the policy: if it's a neural network that takes states as input and sends an action to the environment, why do you represent it as pi, a function of a state and an ACTION? Is the action an input or an output for the agent?

    • @zanzi_
      @zanzi_ 1 year ago

      It's confusing, but the action generated by the neural network will lead to a new state that will then be fed to the neural network again. So in a way you could say it depends on an action too, except for the very first state, which wouldn't have any action that led to it, unless you implicitly define a "do nothing" action.
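
      Another way to reconcile the notation: for a stochastic policy, pi(s, a) = pi(a | s) is the probability of choosing action a in state s. The network takes only the state as input and outputs that distribution; the action is an output, sampled from it. A minimal sketch in plain NumPy (all sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4-dimensional state, 2 discrete actions, 16 hidden units.
STATE_DIM, N_ACTIONS, HIDDEN = 4, 2, 16
W1 = rng.normal(scale=0.1, size=(HIDDEN, STATE_DIM))
W2 = rng.normal(scale=0.1, size=(N_ACTIONS, HIDDEN))

def policy(state):
    """pi(. | s): map a state to a probability distribution over actions."""
    h = np.tanh(W1 @ state)
    logits = W2 @ h
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

state = rng.normal(size=STATE_DIM)
probs = policy(state)                    # pi(s, a) for every action a
action = rng.choice(N_ACTIONS, p=probs)  # the action is an OUTPUT: sampled
print(probs, action)
```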

  • @herstar9510
    @herstar9510 2 years ago

    It takes a lot of effort to work out what we can use this for, and how.

  • @tamirtsogbayar3912
    @tamirtsogbayar3912 2 months ago

    Hello Steve,
    I appreciate your videos.
    Is it possible to create a video about architectures that combine RL and supervised learning, like the methods Google used in the AlphaZero and MuZero game bots, please?
    Thank you

  • @alfredomaussa
    @alfredomaussa 3 years ago

    Is "real general artificial intelligence" hard to achieve mainly because of computing resources, or because of reinforcement learning techniques?
    Which world would get there faster, in years:
    1) we have supercomputers, but humans can't model "real intelligence" yet;
    2) we have powerful, scalable reinforcement learning techniques, but current computers can't run them.

  • @user-mh5hr4cb3o
    @user-mh5hr4cb3o 2 years ago +2

    Hi Prof., can you please tell me what the difference is between MPC and RL?

    • @skeletonrowdie1768
      @skeletonrowdie1768 2 years ago

      From what I gather, MPC (model predictive control) uses a model of the system to optimize a sequence of actions over a finite horizon, applies the first action, and re-plans at the next step, whereas RL learns a policy or value function from interaction data, often without an explicit model. Quora has some answers, which do challenge each other but will give you an idea. www.quora.com/What-is-the-difference-between-Machine-Learning-and-Model-Predictive-Control
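
      To make the contrast concrete, here is a minimal receding-horizon MPC sketch for a toy double integrator (the model, horizon, and costs are all illustrative; an RL agent would instead learn its behavior from reward samples, without this explicit model):

```python
import numpy as np
from scipy.optimize import minimize

# Toy double-integrator model: x = [position, velocity], scalar force input.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([0.005, 0.1])
HORIZON = 15

def rollout_cost(u_seq, x0):
    """Simulate the model forward and accumulate a quadratic cost."""
    x, cost = x0, 0.0
    for u in u_seq:
        x = A @ x + B * u
        cost += x @ x + 0.1 * u**2  # penalize distance from origin and effort
    return cost

def mpc_step(x0):
    """Optimize a whole action sequence, but apply only the first action."""
    res = minimize(rollout_cost, np.zeros(HORIZON), args=(x0,))
    return res.x[0]

x = np.array([1.0, 0.0])  # start 1 unit from the origin, at rest
for t in range(50):
    u = mpc_step(x)       # re-plan at every step (receding horizon)
    x = A @ x + B * u
print(x)                  # should be driven close to [0, 0]
```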

  • @hamidkhb3903
    @hamidkhb3903 2 years ago

    Please add the suggested sources in the comments

  • @flopyarcade4407
    @flopyarcade4407 3 years ago +1

    As always: almost uncanny production value. Will certainly recommend to others. I'd love it if you'd follow up on this and get more technical, take us by the hand, and explore the content in the niches and pockets within the field - maybe driven by curiosity? :p

  • @justinwhite2725
    @justinwhite2725 3 years ago +4

    @6:59 "only a small percentage of humans learn"... I thought that was the whole point of the game...

  • @evandelaney1423
    @evandelaney1423 2 years ago

    @12:44 - "because they are extremely expressive" ... should that be "expensive" instead of "expressive"?

  • @self-drivingscientist512
    @self-drivingscientist512 7 months ago

    Very interesting video! I had a couple of questions related to physical systems. Is there a way to quantify the robustness of such a controller to disturbances? Also, is there a way to “tune” the performance of such a controller without having to re-run the entire training step? Thanks!

  • @veloenoir1507
    @veloenoir1507 3 years ago

    Does transfer learning not translate to reinforcement learning?

  • @XecutionStyle
    @XecutionStyle 3 years ago

    If expert knowledge is inherently limiting how can we ever reach AGI?

  • @balazskecskemeti
    @balazskecskemeti 3 years ago

    I wonder if humans are really that good at transfer learning, or if there is something else at play.
    Transfer learning isn't the only explanation for the fact that humans can learn new things from just a few examples.
    I mean, it could be that we are strongly pre-wired for certain tasks like language or movement, so in reality what we see as "learning" could just be the finishing touch on something that is mostly already within us.
    I am no behavioral scientist, but I would be curious what an expert in human learning would say.

    • @MrHaggyy
      @MrHaggyy 1 year ago +1

      My gut says that neural nets are extremely good at "training," but I doubt that's all there is to "learning."
      Movements, for example, are really difficult. You can show or tell someone a movement and many can pull it off in a few attempts, but mastering it takes years of training.
      I think we use biological neural nets to do all the basic stuff.
      But there is something else that lets us design, evaluate, and test a cost function while we are doing something for the first time. There is something that lets us use information from others to discard most possibilities and train in a much smaller solution space. Also, I think we have a pretty good gut feeling about whether we are "under- or over-fitted," even without a test set.

    • @Thyme-on_your_sidedish
      @Thyme-on_your_sidedish 1 year ago

      I took 6 semesters of Spanish, was married to a native Mexican (aka illegal alien), spent a great deal of time in Mexico, and barely know a few phrases.

  • @kevincortez6227
    @kevincortez6227 2 years ago

    I really wish math notation were better. "Just grab whatever symbol from the Greek alphabet and use it for whatever variable you want," relying on the reader to know what the asinine mapping of variables to Greek letters stands for.
    Nice video though.

  • @SamuelOrjiM
    @SamuelOrjiM 3 years ago

    Steve senpai!!!
    Uploaded. Is it possible to set up real-time data sharing between two Bayesian reinforcement learning algorithms to explore the same data space?

    • @SamuelOrjiM
      @SamuelOrjiM 3 years ago

      Is there a stack for this? Can I get a grant for this? 😏😔

  • @judgeomega
    @judgeomega 3 years ago

    I'm not so sure about your claim that transfer learning / general intelligence is 100 years off.
    I think all it will take is a single key insight into model generation (which might come in months... or may even have already been made and just hasn't gotten into the right hands yet) to really make a breakthrough into AGI.

  • @rforrohitarorasirclasses6500
    @rforrohitarorasirclasses6500 3 years ago

    Nice

  • @PhilTomson
    @PhilTomson 3 years ago

    Do humans really take what they learn from playing Go to make them better at some other non-Go related task?

  • @fast_harmonic_psychedelic
    @fast_harmonic_psychedelic 3 years ago

    GPT-3 Transformers can generalize and think abstractly. I taught an agent how to apply a dialectical analysis to any given political or philosophical question and extract the Marxist position on it -- and it did it well. It can apply the method of dialectical deconstruction to ANY topic, ask the right questions, uncover the contradictions, and arrive at a dynamic understanding of the topic. Transformers are the future.

  • @alfatti1603
    @alfatti1603 3 years ago

    Hi Steve, come to Clubhouse please!

  • @somethingnew7538
    @somethingnew7538 3 years ago +3

    The "long story short" starts at 19:00.

  • @user-gp8fr1nd3w
    @user-gp8fr1nd3w 2 years ago

    So your videos of Google's robots walking are actually based on evolutionary algorithms, not RL…

  • @satriomuhammad8380
    @satriomuhammad8380 1 year ago

    Ahh... so that's where the neural network is implemented.

  • @merv893
    @merv893 1 year ago

    Why do we want these to transfer to other games? So they can beat us at everything?

  • @JasonWee
    @JasonWee 2 months ago

    Would be awesome if the talk were spent on the ML coding stuff... talking about the news is boring when it's reported almost everywhere...

  • @andrewalbright622
    @andrewalbright622 3 years ago

    I'll be honest, this was less informative than the last RL video. It definitely took the right direction, working towards implementing neural networks into RL, but it didn't really explain a whole lot. The title could have been "What Has Been Accomplished with Neural Networks for Deep RL".

  • @andyl9900
    @andyl9900 3 years ago

    No one knows when AGI will happen: 10, 20, 30 years. Giving a speculative timeline like this is harmful to the public.

  • @subucn1
    @subucn1 1 year ago

    Not impressed with this. Very little information; it mostly says "this is interesting" or "this paper is interesting".