Prof Steve.... Just keep publishing these videos forever :)
If you apply LASSO to lectures on this topic, only Steve's videos will survive.
Thanks for publishing these videos. I'm more of a programmer than a maths person, but it's really nice to have an idea about what algorithms there are out there to interpret datasets.
A wonderful book! I never saw such a combination of book, video, and codes from the author. Everything is clearly explained. I don't know how to express my gratitude in words!
Wow - the best visualization of the topic I have seen so far. It's just amazing how the world learns today, virtually from anywhere - online.
These videos are so much better than any lecture that I had at the university!
Since the Covid crisis confined me to home, you have become one of my favorite YouTubers. Great succinct explanations with real applicability to problems both abstract and practical. THANK YOU!!
Fantastic lecture, Steve! Probably my favourite one to date...
Loved this. So sad I discovered this channel so late! Finally, a channel that doesn't dumb things down and really helps improve rigor, mathematically as well as conceptually, without being daunted by research papers' notation and lingo.
I request videos on Optimization as a series - how it works in different algorithms across Supervised, Semi, Unsupervised, Reinforcement.
My favourite channel of all time. I hope we're going to get videos on Interpretability for machine learning.
Thank you for the crystal clear lecture. And the topic is fantastic because: a) linear model (simplicity), b) interpretability (for the reasons you have clearly explained yourself). I am looking forward to more content, and I am ready to buy yet another of your books, professor.
Hi Professor Steve, thank you so much ❤️.
This is pure gold...
You make these topics engaging. Thanks.
I used LASSO & Elastic Net for a sports betting prediction model this year in college basketball. The LASSO model did better than EN. Thanks for the explanation! It was very timely for me. :)
Thanks for the clear explanation and ample good examples.
I have learned a lot from your videos, Prof. Brunton. Thank you!
GREAT lecture. Knew most of the content, but had to watch it to the end anyways.
Nice job! Great visuals. Looking forward to seeing more topics! Thanks for putting your content online.
Excellent presentation Steve.
Thank you for always publishing amazing videos!
Excellent, as always. Extremely good content.
my fav one, just keep publishing
Thanks!
Such a great lecture! Deep but enjoyable on a Saturday morning :) Thank you, professor.
Hi Prof Brunton, please correct me if I'm wrong: at 25:43 the least squares solution is at lambda = 1, not 0, right? Since 1/0 would throw an error.
Thanks for the comment. Yes, I see the confusion. The x-axis label "1/lambda" is not technically correct. It is just a trend that this increases as lambda decreases, but we shouldn't read this literally as 1/lambda. What I mean is that when lambda->0 in the upper right optimization problem, then there is no sparsity penalization and the optimization will return the least squares solution.
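To see this concretely: a minimal sketch (using scikit-learn, which calls the LASSO penalty alpha rather than lambda; the data is synthetic and purely illustrative) of how the LASSO solution approaches the least-squares solution as the penalty shrinks toward zero:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.normal(size=100)

ols = LinearRegression().fit(X, y)
for alpha in [1.0, 0.1, 0.01, 1e-6]:
    lasso = Lasso(alpha=alpha).fit(X, y)
    gap = np.max(np.abs(lasso.coef_ - ols.coef_))
    print(f"alpha={alpha:g}: max |lasso - ols| = {gap:.4f}")
# As alpha (the lambda in the video) shrinks toward 0, the LASSO
# coefficients converge to the ordinary least-squares coefficients.
```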
Thank you for the great contribution.
Very interesting video, Professor. As you mentioned, the Elastic Net algorithm combines the benefits of the Ridge Regression and the LASSO algorithms. Is there a circumstance in which one would specifically use LASSO, rather than simply always going with Elastic Net? Does Elastic Net require significantly more computation to implement? Are there issues that come with the greater generality of Elastic Net that LASSO doesn't suffer from?
I want to know this as well!
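For anyone experimenting with this question: in scikit-learn, ElasticNet with l1_ratio=1 reduces exactly to LASSO, so one pragmatic answer is to treat the mixing ratio as one more hyperparameter and let cross-validation decide (the extra generality mainly costs you that additional tuning dimension). A minimal sketch, with synthetic data and an illustrative grid:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
coef = np.zeros(20); coef[:4] = [2.0, -1.0, 0.5, 1.5]   # sparse ground truth
y = X @ coef + 0.1 * rng.normal(size=200)

# l1_ratio=1.0 is pure LASSO; smaller values blend in the ridge penalty.
model = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8, 1.0], cv=5).fit(X, y)
print("chosen l1_ratio:", model.l1_ratio_)
print("chosen alpha:", model.alpha_)
print("nonzero coefficients:", int(np.sum(model.coef_ != 0)))
```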
Thanks for this great lecture.
Please make a video explaining ARMAX model estimation method. Thank you.
Thank you for this very helpful video. I was looking for a method for sparse regression and directly used PySINDy. Unfortunately, however, our data is not suited to being interpreted as a dynamical system. Long story short: from the big selection of possible regression techniques, I now have some kind of overview, and SR3 should be the next step.
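For readers who, like this commenter, want SR3 for plain sparse regression rather than SINDy: below is a hand-rolled sketch of the SR3 iteration (a relaxed copy w of the coefficients, an exact least-squares update for xi, and a soft-threshold prox step for the l1 penalty). The function name, data, and parameter values are illustrative assumptions, not the authors' reference implementation:

```python
import numpy as np

def sr3_lasso(X, y, lam=0.1, nu=1.0, n_iter=200):
    """Minimal SR3 sketch: min over (xi, w) of
    0.5*||y - X xi||^2 + lam*||w||_1 + (1/(2*nu))*||xi - w||^2."""
    n, d = X.shape
    w = np.zeros(d)
    # The xi-update solves a fixed linear system each iteration.
    H = X.T @ X + np.eye(d) / nu
    Xty = X.T @ y
    for _ in range(n_iter):
        xi = np.linalg.solve(H, Xty + w / nu)
        # Prox of lam*nu*||.||_1 is soft thresholding at lam*nu.
        w = np.sign(xi) * np.maximum(np.abs(xi) - lam * nu, 0.0)
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
true = np.zeros(10); true[[0, 3]] = [2.0, -1.0]
y = X @ true + 0.05 * rng.normal(size=100)
print(np.round(sr3_lasso(X, y), 2))  # should recover the two active terms
```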
just amazing
great lectures!!!!!
many thanks!
Is there a talk on SR3? Sounds really cool! Will check out the paper
Most of the stuff (not all, but most) this guy is talking about is very cool, and his presentation is very good and constructive, so a big thank you. But what he is talking about has actually been known for a pretty long time (an extra non-quadratic term in the minimization was discussed by mathematicians even in the 19th century, and Tibshirani is not the first to discover its consequences; Americans always think that when they find something they are the first people to have discovered it), and it has little to do with why learning algorithms and data-driven stuff are powerful. What this guy is talking about is actually classical linear algebra put into some nice algorithmic iterations. That is not the center of gravity of data-driven science. I mean, you have to know this stuff, of course, and if you studied science in Europe (not the USA, but Europe) you knew this linear algebra and much more by the time you finished undergrad (in Italy you have to finish whole books about quadratic forms to pass an undergraduate linear algebra exam). The power lies in the probabilistic stuff based fundamentally on the theory developed by the Soviet mathematicians Vapnik and Chervonenkis, which really made the distinction between classical statistical and probabilistic decision theory and what people nowadays call AI.
Everything's great here, only one thing: the side-by-side images at 15:00 aren't selling it for me. I get that l1 would be pointy while l2 would be spherical. But you say, and the consensus says, that l2 can intersect at multiple points... yet the image shows a tangent. Are we talking about the not-shown possibility of that blue line cutting through and forming a secant? But if that's the case, then the same could happen for the diamond. This is unclear to me.
EDIT (20 seconds later lol): AH! The idea is that the dimensionality of the point of intersection is what matters.
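A quick numerical check of that intuition: ridge (l2) shrinks coefficients toward zero but essentially never lands exactly on zero, while LASSO's corners pin solutions to the axes and produce exact zeros. A minimal sketch (synthetic data; the penalty strengths are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 30))
coef = np.zeros(30); coef[:3] = [3.0, -2.0, 1.0]  # only 3 active features
y = X @ coef + 0.1 * rng.normal(size=150)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("ridge exact zeros:", int(np.sum(ridge.coef_ == 0)))  # typically 0
print("lasso exact zeros:", int(np.sum(lasso.coef_ == 0)))  # typically ~27
```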
Thank you very much Dr. Steve.
Awesome, thank you 👍
Why is the SINDy spot not located at the minimum of the test curve? You put it instead at the knee of the Pareto curve. In ML, we usually use cross validation to locate the minimum of the loss function for the test dataset.
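A minimal sketch of the cross-validation view this comment describes: sweep the penalty, track held-out error, and compare its minimizer with a sparser knee-style choice (the data, grid, and tolerance are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 15))
coef = np.zeros(15); coef[:3] = [2.0, -1.5, 1.0]
y = X @ coef + 0.2 * rng.normal(size=300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

alphas = np.logspace(-3, 0, 30)
test_err, n_terms = [], []
for a in alphas:
    m = Lasso(alpha=a).fit(X_tr, y_tr)
    test_err.append(np.mean((m.predict(X_te) - y_te) ** 2))
    n_terms.append(int(np.sum(m.coef_ != 0)))

best = int(np.argmin(test_err))
print(f"CV-style minimum: alpha={alphas[best]:.4f}, "
      f"{n_terms[best]} active terms, test MSE={test_err[best]:.4f}")
# A knee-based choice would instead pick the sparsest model whose
# test error stays within some tolerance of this minimum.
```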
Thank you so much for your clear presentations. Have you been working with causal inference? I have been reading the work of Judea Pearl, and I find it not very accessible. If you have experience with causal inference, it would be great to know about your insights.
Dear Steven, it appears that you have (partially) reinvented kernel-based system identification, popularized by Dr. Lennart Ljung as ReLS. Essentially, it uses inv(x*x') instead of Tikhonov diagonal loading, which is as optimal a solution as it can get. IMHO, it is all about how to formalize your "physical" knowledge of the system. BTW, ReLS's FLOPS are orders of magnitude lower than for biased estimation, compressed sensing, LASSO, etc.
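For readers unfamiliar with the jargon: "Tikhonov diagonal loading" is ordinary ridge regression, and the inv(x*x') expression suggests the dual (kernel) form, which solves an n-by-n system built from X X^T instead of the d-by-d system built from X^T X. A minimal sketch showing that the two forms give identical coefficients:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, lam = 50, 8, 0.5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Primal (Tikhonov / "diagonal loading") form: solve a d x d system.
primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
# Dual (kernel) form: solve an n x n system built from X X^T.
dual = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
print(np.allclose(primal, dual))  # True: identical ridge solutions
```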
Thank you so much, Sir. A very insightful video.
Could you please throw some light on how to decide the threshold value of lambda in LASSO Regression? Is it dependent on the number of features?
Thanks again, Sir.
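In practice, lambda is usually chosen by cross-validation rather than derived from the number of features. A minimal sketch using scikit-learn's LassoCV (which names the penalty alpha; the data here is synthetic and illustrative):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 25))
coef = np.zeros(25); coef[:5] = rng.normal(size=5)
y = X @ coef + 0.1 * rng.normal(size=200)

# LassoCV sweeps a grid of alphas and keeps the one with the
# best mean cross-validated error.
model = LassoCV(cv=5, n_alphas=100).fit(X, y)
print("selected alpha (lambda):", model.alpha_)
print("surviving features:", int(np.sum(model.coef_ != 0)))
```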
Thank you Prof :)
Thanks for watching!
Amazing Math visualizations!!! In particular, what software/programming language did you use to create the 3D versions of the Tibshirani plots? (minute 20:00).
I think that the intuition behind the Sparsity induced by the L1 norm is much clearer in higher dimensions. It's a shame that we have to stop at 3 dimensions. Still many thanks for the visualization!
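I can't say what software the video used, but a comparable 3D picture of the unit L1 ball (an octahedron, whose corners sit on the coordinate axes) can be drawn with matplotlib. A minimal sketch:

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
from itertools import product

# The unit L1 ball in 3D is an octahedron: one triangular face
# per sign octant, with vertices on the coordinate axes.
faces = []
for sx, sy, sz in product([1, -1], repeat=3):
    faces.append([(sx, 0, 0), (0, sy, 0), (0, 0, sz)])

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.add_collection3d(Poly3DCollection(faces, alpha=0.4, edgecolor="k"))
ax.set_xlim(-1, 1); ax.set_ylim(-1, 1); ax.set_zlim(-1, 1)
ax.set_title(r"Unit $\ell_1$ ball in $\mathbb{R}^3$")
plt.show()
```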
Hello Sir
It was a great video. Thank you for this.
Could you also make a video on SISSO?
What is the reference paper that connects SVM and the elastic net?
Here is the paper: arxiv.org/abs/1409.1976
Thanks
Hi Steve, could we get a lecture on SGD (stochastic gradient descent) and backpropagation!
Sir, can you please make a video on the restricted isometry property?
can anyone download the book?
Love these videos - so easy to watch and understand, even over morning coffee ☕
wow!!
What kind of app do you use in your videos?
You can get a similar effect with OBS Studio: add a PowerPoint presentation with a blue background, and use a blue chroma key to make the blue transparent.
@@zhanzo thank you!
Economic models are typically dynamic systems of difference equations, not differential equations... is SINDy applicable to difference equations? If we can discover nonlinear systems that generate economic data, that would be awesome... but I guess interpretability would still be limited... :)
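SINDy does extend to discrete-time maps: the model becomes x_{k+1} = f(x_k) instead of dx/dt = f(x). A minimal sketch assuming PySINDy's discrete_time flag (present in recent versions of the library), recovering the logistic map:

```python
import numpy as np
import pysindy as ps

# Simulate the logistic map x_{k+1} = r * x_k * (1 - x_k).
r, n = 3.6, 1000
x = np.empty(n); x[0] = 0.5
for k in range(n - 1):
    x[k + 1] = r * x[k] * (1 - x[k])

# discrete_time=True fits x_{k+1} = f(x_k) instead of dx/dt = f(x).
model = ps.SINDy(discrete_time=True,
                 feature_library=ps.PolynomialLibrary(degree=2))
model.fit(x.reshape(-1, 1))
model.print()  # should recover roughly 3.6*x - 3.6*x^2
```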
Hi Professor, please tell us how we can support this channel - shall we just buy the book, or would you set up a Patreon account?
I Like Someone Who Looks Like You
I Like To Be Told
I Like To Take Care of You
I Like To Take My Time
I Like To Win
I Like You As You Are
I Like You, Miss Aberlin
I want to do a PhD again :)
another comment for the algorithm
Thanks for this excellent lecture!