I like how he thanks us at the end of every video when WE should be the ones thanking him.
That's how a master differs from ordinary teachers. He treats teaching more as a performance 😉
Need a detailed lecture series on RPCA, you are a gem sir
Thank you for such an amazing explanation
Finally people can now distinguish Clark Kent from Superman! I thought it was never gonna happen
Imagine doing a lot of math to reveal Superman's face while he just uses his X-ray vision to undress you at O(1) computational cost
Huge fan Mr Steve.
The world needs more people like you
Thanks for these videos! Nitpick: The error lines at 3:50 should be vertical as you're talking about regression. (For PCA they would be perpendicular as shown.)
Good catch! Agreed, should be vertical for standard SVD. Updated in my most recent slides :)
i love your explanations, they are so eloquent and fluent! thank you!
You are amazing! Your explanations are impeccable! Thank you!
I like how you can view this as reconstructing the missing data on one hand, or filtering out the outliers on the other. Naively, those seem like conceptually very different tasks to me, but I guess they're really not.
Fluid mechanics is certainly a very interesting topic! Many thanks for sharing it.
Hi, I stumbled upon this video randomly but ended up watching all of it! I like the way you explain this complex topic in an easy-to-understand way without bombarding us with maths. I've always wondered how Netflix and YouTube are so good at recommendations, and now I know: they find a solution to an ill-posed inverse problem by minimizing rank(L) + abs(S) in a convex minimization regime. I have a question: what is the origin of the 'low rank' terminology?
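A note on the objective mentioned in this comment, written out as a sketch. Strictly speaking, minimizing rank(L) plus the number of nonzero entries of S is not a convex problem; the convex regime comes from relaxing the rank to the nuclear norm (the sum of the singular values of L) and the nonzero count to the L1 norm (the sum of the absolute values of the entries of S). "Low rank" refers to L having only a few linearly independent columns, so a handful of patterns explains the whole matrix. The symbols X (the data matrix) and λ (a positive weight) are assumed notation here:

```latex
\begin{aligned}
\text{ideal problem:}\quad
  & \min_{L,S}\ \operatorname{rank}(L) + \|S\|_0
  && \text{subject to } L + S = X,\\
\text{convex relaxation:}\quad
  & \min_{L,S}\ \|L\|_* + \lambda \|S\|_1
  && \text{subject to } L + S = X.
\end{aligned}
```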
Glad you liked it!
YAY!! I was waiting for the video on RPCA
I think you are doing a really really great job here!
Thank you so much for this video. It was very eye opening for getting into ML
Brilliant! Would this work with kernel PCA as well?
Good question... I found this interesting NeurIPS paper on this topic: papers.nips.cc/paper/2008/file/8f53295a73878494e9bc8dd6c3c7104f-Paper.pdf
Great content. But aren’t there more effective solutions? There is a whole field called robust statistics; one can for example estimate the covariance matrix using Maronna’s M estimators prior to computing the eigenvectors and eigenvalues
Thanks. Then is there any reason to use regular PCA at all?
Very informative. Great video.
From a math perspective, why can the low-rank decomposition handle the outlier shown at 4:40?
Great video, Sir!
Nice video. It is amazing how RPCA introduces robustness in the face of huge differences. I have a question regarding your choice of mu. In your code you choose mu as mu = n1*n2/(4*sum(abs(X(:)))); where does this expression come from?
Nice video. I like the Netflix and POD examples, and I’ll give it a go in my own DMD work. I think the first example could be better motivated by discussing the difficulties that FaceID is having with identifying masked faces and unlocking phones in this pandemic. There have been suggestions that Apple will bring back Touch ID with the iPhone 13 because of this. Not just cops and robbers but issues with everyday people and technology in their pocket.
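For what it's worth, the expression for mu can be unpacked with a little arithmetic. Dividing sum(abs(X(:))) by n1*n2 gives the mean absolute entry of X, so mu is the reciprocal of four times that mean. If 1/mu is used as the singular value threshold (an assumption about how the code uses mu, not something stated in this thread), then the threshold scales with the magnitude of the data. As far as I recall, the same heuristic appears in the 2011 robust PCA paper by Candès, Li, Ma, and Wright, but treat that attribution as unverified.

```latex
\mu = \frac{n_1 n_2}{4 \sum_{i,j} |X_{ij}|}
    = \frac{1}{4\,\overline{|X|}},
\qquad
\overline{|X|} = \frac{1}{n_1 n_2} \sum_{i,j} |X_{ij}|,
\qquad
\frac{1}{\mu} = 4\,\overline{|X|}.
```

For example, a 10-by-10 matrix whose entries average 5.5 in absolute value gives mu = 1/22, so the assumed threshold 1/mu would be 22.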
One question, would you get similar results regularizing by the TV norm? Just a hot take. But I feel you should get similar results.
Maybe I’m getting my wires crossed here. Is the TV norm the same as l1? Coming into this field from pure operator theory has me consulting glossaries more often than not
Good point. I actually filmed this before the pandemic ;) who knew that partially masked faces were going to be such a thing!
@@JoelRosenfeld TV norm is a bit different. But it does have some similarities in how it tries to regularize problems by reducing overfitting. TV regularization made the most sense to me in the context of differentiation (great paper by Chartrand: www.hindawi.com/journals/isrn/2011/164564/)
@@Eigensteve I had wondered if that was the case. :) How far ahead do you record these? That’s quite the lead time!
@@JoelRosenfeld Totally depends... some videos are recorded a while in advance and I just sit on them, and others come out a bit faster... looks like now I am ahead about 2-3 months on most videos, but I have at least one from a year ago. :)
Awesome as usual
Thanks a lot for sharing your knowledge.
My pleasure
thanks
I really appreciate your help!
Thank you for the great video. I am very interested in the Netflix example (sounds like a missing value imputation problem) but couldn't find any resources/papers explaining it. I am mostly interested in using RPCA for missing value imputation in time-series. Could you please share some materials on that subject?
Hello Mr. Brunton, in your book "Data Driven Science & Engineering", on page 124, in the RPCA code, why do you use "count < 1000" in the "while" statement? What does the 1000 mean?
Great video as always, but every time I've wanted to know your recording setup and software
You can glean a lot from the video itself. He is behind a glass window of some kind and he has a lapel mic to capture sound. He records everything and then flips the video in post. You can tell that because his part is opposite in his whiteboard videos early on. To ensure legibility he wears a black sweater and has a black background, which is clever. In his book he uses Python and MATLAB. I’m guessing everything is assembled in Adobe Premiere or Final Cut Pro. Though, there are lots of free options out there.
A paper from 10 years ago is recent? Thanks for this very illuminating series
Yes, a paper from 10 years ago is fairly recent. It takes time for algorithms and methods to be adopted by the greater community.
Yep, depends on the field. In applied mathematics and statistics, a decade isn't that long. In computer vision and deep learning, a decade feels like a longer time
I studied PCA last week. And now this. 😆
Hello Mr. Steve, what are the features that RPCA extracts from an image?
Wow great video
Do I understand correctly that this method does not help reduce the data dimension?
Thanks, great video!
Could this be used to solve sudokus? Teach it with lots of completed puzzles, and the uncompleted puzzle is just a sparse sampling.
there are more robust ways to solve sudoku :d
Cool idea!
How exactly is this algorithm trained? I mean, nowhere in the given calculations was it required to have several observations; one matrix was enough. Why can't we just take a picture and extract the right components from it?
A single image is not typically a low rank matrix. If I'm not wrong the low rank only makes sense when we have a set of images.
How do you create "allFaces.mat" from the Yale database so I can follow along in the book? I got the database, but am not sure how to easily import it into MATLAB.
I'm doing POD, which is based on PCA. Is there constrained PCA?
Sure, in lots of these least-squares regression algorithms it is possible to add constraints. I think of POD and PCA as being essentially the same algorithmically.
Why does a low-rank matrix represent normal data?
I can't download or open the PDF book. Is anyone else having the same problem?
Haha, video compression failed due to the salt and pepper noise after 15:40. Not very robust.
That is so cool! Nice meta observation!
Is a 2011 pub recent? Appreciate the video but couldn't help but ask.
Depends on the field. In applied math, definitely. In computer graphics or deep learning, maybe not. Although, the seminal works from both fields are still important.
By 0-norm do you mean the number of non-zero entries? Thanks
Yes. When he talks about ||S||, S should be a *sparse* matrix; the nonzero entries act as a loss for the algorithm.
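To make the distinction concrete, here is a minimal sketch of the two quantities discussed above, using plain Python lists. The function names `l0_norm` and `l1_norm` and the example matrix are made up for illustration. The "0-norm" counts nonzero entries, while the 1-norm sums absolute values and is the one usually used in practice because it is convex.

```python
def l0_norm(matrix):
    """Count the nonzero entries (the so-called 0-norm; not a true norm)."""
    return sum(1 for row in matrix for value in row if value != 0)


def l1_norm(matrix):
    """Sum the absolute values of all entries (the entrywise 1-norm)."""
    return sum(abs(value) for row in matrix for value in row)


# A hypothetical sparse matrix S with two nonzero outliers.
S = [
    [0, 0, 5],
    [0, -2, 0],
    [0, 0, 0],
]

nonzero_count = l0_norm(S)   # two entries are nonzero
absolute_sum = l1_norm(S)    # |5| + |-2|
```

Making an outlier larger changes the 1-norm but not the 0-norm, which only cares about how many entries are nonzero.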
@@skeletonrowdie1768 Yes indeed
Is the L1 norm PCA considered RPCA? In essence, is RPCA a subclass of robust optimization?
||A||_* + λ||E||_1 is convex; can you explain that?
Is there any available implementation in Python? Kind regards.
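One way to see the convexity, sketched under the assumption that A plays the role of the low-rank part, E the sparse part, and λ is nonnegative: both the nuclear norm and the entrywise L1 norm are genuine norms, and every norm is convex by the triangle inequality plus homogeneity. A nonnegative weighted sum of convex functions is again convex. For any t between 0 and 1:

```latex
\begin{aligned}
f(A, E) &= \|A\|_* + \lambda \|E\|_1, \qquad \lambda \ge 0,\ t \in [0, 1],\\
f\bigl(tA_1 + (1-t)A_2,\ tE_1 + (1-t)E_2\bigr)
  &= \|tA_1 + (1-t)A_2\|_* + \lambda \|tE_1 + (1-t)E_2\|_1\\
  &\le t\|A_1\|_* + (1-t)\|A_2\|_* + \lambda\bigl(t\|E_1\|_1 + (1-t)\|E_2\|_1\bigr)\\
  &= t\,f(A_1, E_1) + (1-t)\,f(A_2, E_2).
\end{aligned}
```

A linear constraint tying A and E to the data, such as A + E = X, describes a convex set, so the whole problem stays convex.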
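Here is a minimal NumPy sketch of one common way to compute the decomposition, an alternating scheme with a Lagrange multiplier Y. It is not the book's code and not necessarily what the video uses. The function names (`shrink`, `svt`, `rpca`) are hypothetical, mu follows the expression quoted elsewhere in this thread, the 1000-iteration cap mirrors the "count < 1000" mentioned in another comment, and lambda = 1/sqrt(max(n1, n2)) and the 1e-7 tolerance are assumptions.

```python
import numpy as np


def shrink(X, tau):
    """Soft-threshold: move every entry of X toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svt(X, tau):
    """Singular value thresholding: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def rpca(X, max_iter=1000, tol=1e-7):
    """Split X into low-rank L plus sparse S (sketch, assumes X is nonzero)."""
    X = np.asarray(X, dtype=float)
    n1, n2 = X.shape
    mu = n1 * n2 / (4.0 * np.sum(np.abs(X)))   # expression quoted in the thread
    lam = 1.0 / np.sqrt(max(n1, n2))           # assumed sparsity weight
    thresh = tol * np.linalg.norm(X, "fro")    # assumed stopping tolerance
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    Y = np.zeros_like(X)                       # Lagrange multiplier
    for _ in range(max_iter):                  # cap guards against slow convergence
        L = svt(X - S + Y / mu, 1.0 / mu)
        S = shrink(X - L + Y / mu, lam / mu)
        Y = Y + mu * (X - L - S)
        if np.linalg.norm(X - L - S, "fro") <= thresh:
            break
    return L, S


# Usage on a clean rank-1 matrix: every row is 1, 2, ..., 10.
X = np.outer(np.ones(10), np.arange(1.0, 11.0))
L, S = rpca(X)
```

On this clean example the low-rank part should reproduce X and the sparse part should stay at zero. With real data, the iteration cap simply stops the loop if the tolerance has not been reached.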
Thanks for the video
Have you done any topological data analysis? It’s very intriguing
Hey, this may sound creepy, but I looked at the channels you have subscribed to in order to find channels I might like. I am basically interested in all kinds of fields that require advanced math in one way or another. If you have some time, would you answer some of my questions regarding channels you might recommend? Thanks.
I don't think the L1 regression is actually uniquely defined... you can shift it up and down and as long as the line doesn't cross any data points the norm doesn't increase.
Indeed, L1-norm minimization is not unique, as shown in Boyd's book "Convex Optimization".
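A small worked example of the non-uniqueness, as a sketch with made-up data. The four points form a parallelogram, and every line y = x + c with c between 0 and 1 passes between them with two points above and two below. Shifting the line within that band leaves the sum of absolute residuals unchanged. No line can do better here, because the two points at each x value are one unit apart and so contribute at least 1 each to the cost.

```python
def l1_cost(points, slope, intercept):
    """Sum of absolute vertical residuals for the line y = slope * x + intercept."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points)


# Hypothetical data: two points stacked at x = 0 and two at x = 1.
points = [(0, 0), (0, 1), (1, 1), (1, 2)]

# Shift the line y = x + c up and down without crossing any data point.
shifts = [0.0, 0.25, 0.5, 0.75, 1.0]
costs = [l1_cost(points, 1.0, c) for c in shifts]

# Once the line moves past the data, the cost increases.
cost_outside = l1_cost(points, 1.0, 1.5)
```

All five shifted lines have the same cost, so the L1 fit is a whole band of lines rather than a single one. This happens when equal numbers of points sit above and below the line.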
@@vicktorioalhakim3666 Right, I would agree with that, but here aren't we just using L1 regularization of a least squares problem? Or am I missing something?
@@JoelRosenfeld Not sure why you're talking about L1 regularization, as the original poster is talking about L1 *regression*, however L1 regularization is just adding a L1-norm term to the original objective function -> not unique on the Pareto curve.
@@JoelRosenfeld BTW, great videos!
Thanks! I’m enjoying putting my videos together. :) I mention regularization because I thought that’s how these sparse regression approaches worked. Maybe it’s late and I’m just not connecting the dots right now.
How do I go about improving my mathematical know-how? I can do mathematical operations, but I sometimes struggle to understand high-level stuff intuitively.
When you do your reading, look up anything you don't know how to do, or any terms that may feel unfamiliar. It's easier to do this in a university course setting, but it's also possible to do when self studying. Don't give up! The more you read and study, the more often certain themes come up.
If you don't understand anything, treat that concept as a black box and try to understand at a high level what we are trying to achieve. Then slowly work through the black box until you get to a level that you feel satisfied with :)
What even is that? Calculus? Statistics? Geometry? What do I google if I wanna learn that maths?
Yes
Probably linear algebra first, statistics second, and for high-dimensional data, it is related to geometry. And all of the algorithms involve optimization.
@@Eigensteve Thank you, will start with Linear Algebra.
Can I ask what the brand of the black T-shirt is?
I am searching for a good-quality T-shirt to stick with
"very cool, a little bit alarming, but I'm going to walk you through it." Wait, doesn't that mean he might be admitting he's irresponsible?? Good grief. What does he expect people to think?.. "well he's telling us how to do potentially really bad stuff but that's okay cuz he's also telling us it might be bad."
Lots of powerful technologies have good and bad uses. And the cat's out of the bag with this one... really this is standard linear algebra. But I thought it was important to point out that we should at least be aware of the implications.
1:10 The math behind this is intriguing but if the goal is to build more robust surveillance technology is it really worth it? Sure the money's probably good but is helping build out an Orwellian police state morally sound?
There are good and bad applications of most powerful technologies and algorithms. This is by no means the only application of robust statistics, but is one of the easiest to understand and relate to.
@@Eigensteve All new technology is revolutionary but when humanity discovered nuclear fission it took a long time to rein it in. I suspect AI is in the same predicament: www.vice.com/en/article/y3gjjw/the-nypd-sent-a-creepy-robotic-dog-into-a-bronx-apartment-building
Someone does not like The Big Lebowski?
Hi Professor you are so handsome that I really enjoy your video like a TV drama!
Could you talk about architecture, interactive/creative robots, and AI?
It's like a ship, but one person is absurdly fat