- Videos: 41
- Views: 40,217
Sanjiban Choudhury
Joined Jul 19, 2007
Assistant Professor at Cornell CS. I lead the Portal group, where we work on everyday robots for everyday users: youtube.com/@PortalCornell. Find out more at www.sanjibanchoudhury.com
5 Levels of Robot Learning
This is a sneak preview of the concepts covered in my course on Learning for Robot Decision Making, which I am teaching at Cornell in Fall 2022. Learn more at: www.cs.cornell.edu/courses/cs6756/2022fa/
What does learning mean to a robot? This video takes you on a journey through 5 increasingly rich levels of robot learning, from the simplest level (learning what the human wants you to do) through interactive no-regret learning, the Bellman equation, and building a value estimator, to a final unified game-theoretic framework (between a robot player and a value player). Much of what we know today in the fields of reinforcement learning, imitation learning, and mod...
Views: 11,480
Videos
Lecture 3: Generalized Weighted Majority -- The Most Versatile Algorithm
907 views · 2 years ago
In this third lecture, we discuss one of the most powerful algorithms in learning and decision making: Generalized Weighted Majority. Given a set of N options, how do you optimally hedge between these options? GWM not only answers this question, but it also provides an algorithmic template that shows up again and again in various fundamental problems in computer science: machine learning, optimi...
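Since the description stops short, here is a minimal sketch of the multiplicative-weights update at the heart of GWM; the exponential penalty, the learning rate eta, and the toy loss data are illustrative assumptions, not code from the course:

    import numpy as np

    def hedge(losses, eta=0.5):
        # Multiplicative weights over N options: play each option in proportion
        # to its weight, then exponentially downweight options with high loss.
        T, N = losses.shape
        w = np.ones(N)                      # start uniform over all options
        plays = []
        for t in range(T):
            plays.append(w / w.sum())       # probability of picking each option
            w *= np.exp(-eta * losses[t])   # penalize options by their loss
        return np.array(plays)

    # Toy run: option 0 is consistently cheaper, so weight concentrates on it.
    rng = np.random.default_rng(0)
    losses = rng.uniform(0, 1, size=(100, 3))
    losses[:, 0] *= 0.2
    print(hedge(losses)[-1])

With eta tuned on the order of sqrt(log N / T), this scheme is known to achieve O(sqrt(T log N)) regret against the best fixed option.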
Lecture 2: Prediction with Expert Advice
607 views · 2 years ago
In this second lecture, we look at a simple, fundamental setting of interactive learning - prediction with expert advice. You have a set of N experts that make predictions at each round. How do you combine their predictions so you do as well as the best expert? We discuss a general class of algorithms, weighted majority, that plays the majority vote. Either the majority is right, and no mistake...
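A minimal sketch of the weighted majority vote described above, assuming binary predictions and the textbook halving penalty (the data format is my own illustration):

    def weighted_majority(expert_predictions, outcomes, beta=0.5):
        # expert_predictions: per round, a list of N binary predictions (0/1).
        # outcomes: the true binary label for each round.
        n_experts = len(expert_predictions[0])
        w = [1.0] * n_experts
        mistakes = 0
        for preds, y in zip(expert_predictions, outcomes):
            # Predict with the weighted majority vote.
            vote_one = sum(wi for wi, p in zip(w, preds) if p == 1)
            vote_zero = sum(wi for wi, p in zip(w, preds) if p == 0)
            if (1 if vote_one >= vote_zero else 0) != y:
                mistakes += 1
            # Cut the weight of every expert that was wrong this round.
            w = [wi * beta if p != y else wi for wi, p in zip(w, preds)]
        return mistakes, w

    # Example: 3 experts over 4 rounds; expert 0 is always right.
    preds = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [1, 0, 0]]
    truth = [1, 0, 1, 1]
    print(weighted_majority(preds, truth))

The argument the lecture alludes to: whenever the algorithm errs, at least half the total weight gets multiplied by beta, which is what yields a mistake bound of the form O(log N + mistakes of the best expert).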
Lecture 1: Interactive Online Learning -- One Ring To Rule Them All
443 views · 2 years ago
In this series, we will try to understand the fundamental fabric that ties together all of robot learning: "How can a robot learn from online interactions?" Our quest is to build up a unified mathematical framework that we will wield to conquer recurring problems in reinforcement learning, imitation learning, model predictive control, and planning. Let's begin! For more information about me and my work, ...
Core Concepts: Interactive No-Regret Learning
1.1K views · 2 years ago
We explore the concept of interactive learning. Robots must interact with the world to gather data on which they learn. The principled way to learn when your data can be changing, possibly adversarially, is by striving to be "no regret", i.e., do as well as the best policy in hindsight. But greedily picking the best policy in hindsight fails, even on the simplest of examples! Join us as we jour...
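The failure of greedily picking the best policy in hindsight is easy to reproduce; below is a sketch of the standard two-expert counterexample (an assumed alternating loss sequence, not necessarily the example in the video):

    import numpy as np

    # Two experts whose losses alternate: (1,0), (0,1), (1,0), ...
    # Follow-the-Leader greedily picks the best expert in hindsight, so it
    # switches every round and pays ~T, while the best fixed expert pays ~T/2.
    T = 100
    losses = np.array([[1.0, 0.0] if t % 2 == 0 else [0.0, 1.0] for t in range(T)])

    cum = np.zeros(2)
    ftl_loss = 0.0
    for t in range(T):
        pick = int(np.argmin(cum))       # ties broken toward expert 0
        ftl_loss += losses[t][pick]
        cum += losses[t]

    regret = ftl_loss - cum.min()
    print(ftl_loss, cum.min(), regret)   # regret grows linearly with T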
Core Concepts: Linear Quadratic Regulators
3.5K views · 3 years ago
We explore the concept of control in robotics, notably Linear Quadratic Regulators (LQR). We see that a powerful way to think about control is as a dynamic optimization, where the goal is to compute a mapping from states to actions that minimizes a user-specified cost, e.g. land a rocket without exploding. Moreover, you can do this efficiently thanks to Bellman’s insight that the optimal value of...
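A minimal sketch of the Bellman (Riccati) backup behind finite-horizon LQR, with a made-up double-integrator system and cost weights for illustration:

    import numpy as np

    def lqr_gains(A, B, Q, R, T):
        # Finite-horizon LQR by backward Bellman (Riccati) recursion.
        # Dynamics x' = A x + B u, stage cost x^T Q x + u^T R u.
        P = Q.copy()                       # terminal value is just the state cost
        gains = []
        for _ in range(T):
            # One Bellman backup: minimize u^T R u + (Ax + Bu)^T P (Ax + Bu).
            K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
            P = Q + A.T @ P @ (A - B @ K)
            gains.append(K)
        return gains[::-1]                 # K_0, ..., K_{T-1}; optimal u_t = -K_t x_t

    # Double integrator (position + velocity, force input) as a toy system.
    dt = 0.1
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.0], [dt]])
    K = lqr_gains(A, B, Q=np.eye(2), R=np.array([[0.1]]), T=50)
    print(K[0])                            # first-step feedback gain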
Core Concepts: Imitation Learning
2.1K views · 3 years ago
We explore the concept of imitation learning in robotics. We see that imitation learning is a powerful way to implicitly program robots. Instead of tediously tinkering with rules or tuning reward functions, just demonstrate how you would like the robot to behave. But naively treating imitation learning as mere supervised learning, even on the simplest of examples, leads to very interesting fail...
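A minimal behavior-cloning sketch of that supervised reduction, assuming a linear ridge-regression policy purely for illustration (the lecture's treatment and failure analysis are richer):

    import numpy as np

    def behavior_cloning(states, expert_actions, lam=1e-3):
        # Fit a linear policy a = s @ W to expert data by ridge regression --
        # imitation reduced to plain supervised learning.
        S = np.asarray(states, dtype=float)
        A = np.asarray(expert_actions, dtype=float)
        W = np.linalg.solve(S.T @ S + lam * np.eye(S.shape[1]), S.T @ A)
        return lambda s: np.asarray(s, dtype=float) @ W

    # Synthetic "expert" demonstrations for a quick check.
    rng = np.random.default_rng(0)
    S = rng.normal(size=(200, 4))
    A = S @ np.array([[1.0], [0.5], [-0.3], [0.2]])
    policy = behavior_cloning(S, A)
    print(policy(S[0]), A[0])

    # The catch the video analyzes: the policy is only trained on states the
    # expert visits, so at test time its own small errors drift it into
    # unfamiliar states where mistakes compound (covariate shift).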
Lecture 1: What is Imitation Learning?
6K views · 3 years ago
In this series, we will journey into the depths of imitation learning. The purpose of our quest is to answer at a deep, mathematical level a single question: "What does it mean to imitate?" The answer, as we shall see, is simple and profound but there are many twists and turns along the way. Let's begin. For more information about me and my work, check out www.sanjibanchoudhury.com/ 1. Swamy et...
Lecture 10: Imitation Learning Finale: The Beginning
341 views · 3 years ago
Over the past 9 lectures, we have journeyed through the depths of imitation learning. We compressed all knowledge down to a single, game-theoretic framework. Armed with just this knowledge, in this series finale, we finally lift off and set our sights on new and distant frontiers. In a sense, we have only just begun. I hope you enjoy this preview of exciting ideas to come. For more information ...
Lecture 9: Imitation Learning -- It's Only A Game!
482 views · 3 years ago
In this ninth lecture, we finally look at imitation learning in its most fundamental form: as a game. This is a game between two players: a learner that generates a policy, and an adversary that discriminates between the values of the learner and the human expert. We'll see how this simple game-theoretic framework unifies all existing imitation learning algorithms, as well as giving us brand new algorith...
Lecture 8: Imitation Learning as Distribution Matching
725 views · 3 years ago
In this eighth lecture, we look at imitation learning as simply a distribution matching problem, i.e., generate trajectories that look like that of the expert. At the heart of the problem lies a question: "What does it mean for two distributions to be close, and how can we measure closeness?". We derive an estimator for an entire class of f-divergence and show that it ultimately reduces to solv...
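Here is a sketch of that classification reduction for one member of the f-divergence family, Jensen-Shannon; using scikit-learn's LogisticRegression as the discriminator is my illustrative choice, not the lecture's:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def js_lower_bound(expert_samples, learner_samples):
        # Train a classifier to tell expert states from learner states; plug
        # its probabilities into the variational bound on Jensen-Shannon:
        # JS(P, Q) >= log 2 + 0.5 E_P[log D] + 0.5 E_Q[log(1 - D)].
        X = np.vstack([expert_samples, learner_samples])
        y = np.concatenate([np.ones(len(expert_samples)),
                            np.zeros(len(learner_samples))])
        D = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]
        eps = 1e-12
        return (np.log(2)
                + 0.5 * np.mean(np.log(D[y == 1] + eps))
                + 0.5 * np.mean(np.log(1 - D[y == 0] + eps)))

The bound is tight at the Bayes-optimal discriminator, which is why a better classifier gives a sharper estimate of how far the learner's trajectories are from the expert's.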
Lecture 7: Imitation Learning Through a Bayesian Lens
551 views · 3 years ago
In this seventh lecture, we look at imitation learning in a Bayesian setting where we have a prior over possible cost functions the human may prefer. We show that the problem, fundamentally one of exploration vs. exploitation, is intractable, and explore a couple of remedies. The first is to simplify the problem down to Bayesian active learning, where we show efficient greedy algorithms can be near-op...
Lecture 6: Inverse Reinforcement Learning -- From Maximum Margin to Maximum Entropy
2.7K views · 3 years ago
In this sixth lecture, we look at the problem of recovering the underlying reward or cost function that explains human demonstrations. We show that there are two fundamentally different directions. The first is to view the human as an optimal planner and recover a cost function that they must be optimizing. The second is to view the human as a stochastic process and recover the underlying distr...
Lecture 5: Imitation as a Stairway to Self-Improvement
682 views · 3 years ago
In this fifth lecture, we look at the role of values in imitation. Not all imitation errors are equal; some have a greater impact on values than others. Providing imitation learning algorithms with values opens the door to algorithms that can actually outperform the human expert in terms of their own values. We climb the staircase of algorithms that bootstrap imitation learning to ultimately so...
Lecture 4: Imitation from Interventions
945 views · 3 years ago
In this fourth lecture, we look at a natural way by which humans teach and learn: interventions. We show that naively imitating interventions can go horribly wrong. Instead, our key insight is that interventions are laden with implicit information about the human's value function. We take a look at how one may recover the value function from both deterministic and probabilistic paradigms. For mo...
Lecture 3: Interaction in Imitation Learning
1.2K views · 3 years ago
Lecture 2: Feedback in Imitation Learning -- The Three Regimes of Covariate Shift
2.3K views · 3 years ago
Respecting helicopter performance charts for safe flight
77 views · 5 years ago
[RSS 2015] Theoretical limits of speed and planning for forest flight
35 views · 5 years ago
[ICRA 2015, AHS 2014] Guaranteed Safe Flight of a Full Scale Helicopter
26 views · 5 years ago
Guaranteed Safe Flight: Simulation of flying in the Grand Canyon
15 views · 5 years ago
[AHS 2013] Autonomous Emergency Landing of a Helicopter Pitch
48 views · 5 years ago
[JFR'19] High Performance and Safe Flight of Full-Scale Helicopters from Takeoff to Landing
56 views · 5 years ago
[ICRA'15] The Dynamics Projection Filter
22 views · 5 years ago
[ICRA'13] RRT*-AR: Sampling-Based Alternate Routes Planning
85 views · 5 years ago
[ICRA'13] SPARTAN: Flying in Robot City
36 views · 5 years ago
[ICRA'16] RABIT* : Interleaving local and global search
64 views · 5 years ago
Learning to Gather Information via Imitation of Clairvoyant Oracles
14 views · 5 years ago
What kind of shitty video series is this? Is it read from a ChatGPT script?
Sir, you teach so well... can you please teach us deep learning, transformers, and multimodal LLMs too... I really loved your videos.
Sir, I am a robotics software engineer and want to move into the field of imitation and reinforcement learning for manipulation robots. Do you recommend I start learning ML first and then RL and IL? Can you please suggest some resources too, if possible?
This is so far the best lecture I have ever heard on imitation learning.
Thank you so much, professor. It's a really good place for me to start. Thank you so much for sharing the knowledge.
Great
I really enjoyed these lectures. If you ever came back to it, I'd definitely watch the new content. One minor piece of feedback though: the speed of presentation is not really adapted to how much time is needed to digest the material under consideration. Spending a bit more time on the trickier concepts when they are introduced could be helpful.
Can you explain the formulas at 15:19? (That whole slide.)
Very, very well explained!
Helps me a lot. Thank you Sanjiban!
This helped me a lot in my research. Thank you, Sanjiban ji.
Thank you so much for this insightful explanation of LQR! :)
Great series of lectures and resources. Thanks a lot.
Hi Sanjiban, thank you greatly for the lecture! I have a question at 15:28. As for the first inequality, as long as all possible policies don't incur the same loss value, the equality wouldn't hold. Correct? Also, in the last inequality terms, isn't that simply showing that for any policy the regret is lower-bounded by 0? How can one conclude that at least one policy must be pretty good as written in the lecture note? Thanks.
Would a neural network-based policy trained with standard gradient descent, replacing the dataset at each batch rather than aggregating it, still be considered a no-regret learner?
Great question! So online gradient descent over a convex loss function is no-regret. Neural networks are, unfortunately, not convex so the theory doesn't hold for them. But the theory does hold for kernels (like RKHS) and there is work that shows deep networks are approximately equivalent to kernel machines (such as arxiv.org/pdf/2012.00152.pdf)
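For readers who want to see the no-regret learner referenced in this exchange, here is a minimal sketch of projected online gradient descent; the 1/sqrt(t) step size and unit-ball projection are standard assumptions behind the O(sqrt(T)) regret bound, not details from the reply:

    import numpy as np

    def online_gradient_descent(grad_fns, x0, radius=1.0):
        # Projected online gradient descent with eta_t = 1/sqrt(t); over convex
        # losses on a bounded set this achieves O(sqrt(T)) regret, i.e. no-regret.
        x = np.asarray(x0, dtype=float)
        iterates = [x.copy()]
        for t, grad in enumerate(grad_fns, start=1):
            x = x - (1.0 / np.sqrt(t)) * grad(x)
            norm = np.linalg.norm(x)
            if norm > radius:              # project back onto the feasible ball
                x = x * (radius / norm)
            iterates.append(x.copy())
        return iterates

    # Toy stream of convex losses f_t(x) = ||x - c_t||^2 with random targets.
    rng = np.random.default_rng(0)
    grads = [(lambda x, c=rng.normal(size=2) * 0.5: 2 * (x - c)) for _ in range(200)]
    print(online_gradient_descent(grads, x0=np.zeros(2))[-1])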
No minions 🤯🤯🤯
This is a great resource Sir! Your way of explanation with the animations is exemplary!
Can you make some videos on the coding part of imitation learning as well? Can't find anything online!! Thanks in advance.
Great suggestion, will definitely try!
I am sorry, but does your series include behavioral cloning somewhere?
It does! Lecture 2 talks about behavior cloning, where it works and where it fails.
@@sanjibanc Thank you so much, professor. I really appreciate your work! Thank you from Vietnam :D
@@tuongnguyen9391 Thank you! Of course! I'll put out more this semester as I am teaching www.cs.cornell.edu/courses/cs6756/2023fa/
@@sanjibanc Thank you professor! Looking forward to the lecture videos of this course!
Thank you so much, sir. We have only learnt eigenvalue placement in school; this is very intuitive to understand.
This is extremely informative, sir. Thank you.
5:40 Where do the upper bound equations come from? In all the notes and videos I see, people just bring them up like they're something super obvious.
You can get a lot of them by using the Taylor expansion and looking at what happens when you discard some of the terms
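A hedged guess at the specific bound, assuming 5:40 refers to the usual weighted-majority analysis: the inequality in question is typically 1 - x <= e^{-x}. From the Taylor expansion e^{-x} = 1 - x + x^2/2! - x^3/3! + ..., dropping the quadratic and higher terms suggests the bound, and it in fact holds for all x because e^{-x} is convex and 1 - x is its tangent line at x = 0. In the mistake-bound proof, this converts each multiplicative (1 - eta) penalty into an e^{-eta} factor, which telescopes cleanly across rounds.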
Wish the entire lecture series was available 😃
Great lecture!
amazing video! thanks
Nice lecture, Sanjiban! Thanks for presenting complicated ideas in an intuitive form, it's very helpful and inspiring!!
Brilliant video. Helped me a lot!
This channel is a hidden gem :)
Thanks!
Beautifully explained! Thanks!
Love the floating head thing you've got going on
Such a brilliant lecture
Amazing
Super
Well done Sanjiban, congratulations
These lectures are great ! Thank you Sanjiban !