I'm doing a PhD in ML and was recently getting into functional analysis and measure theory. Great to see this kind of content at this specific time 😅
I’m glad I can be of service! Are there any topics you are wanting to learn more about in particular?
This content is what I need as a grad student studying machine learning
I’m glad I can help! The next several videos will talk about the Riesz theorem, kernel spaces, and the representer theorem. Then we’ll stop and discuss SVMs and VC dimensions. Then back to more Hilbert space stuff, like Fourier series, bases, etc.
Tell all your friends, lol!
Do you have any machine learning theory courses that follow this series of videos? I would be happy to see a machine learning theory course from you.
Not a formal course yet. It’s something I plan on doing in the future. For now, I am working on a sort of complete series here on YouTube.
Ack! Left that note for myself in the video at 0:49. Whoops!
How does this method work on extrapolation? You had defined a domain for the function, but if you then go outside it, the polynomial is likely nowhere near the function, since it is a large-degree polynomial. I think the method is really cool, and the moment matching is a neat probabilistic tie. What is the method like when your domain is really large?
Threw together this short to answer your question: ruclips.net/user/shortsMDrjOcmWQA8
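For readers who want to try the extrapolation question numerically, here is a minimal sketch. It is not the code from the video or the short. It assumes the domain is [0, 1], the basis is the monomials 1, x, ..., x^n, and the target is sin(2πx); the helper names `moments` and `hilbert_gram` are made up for illustration.

```python
import numpy as np

def moments(f, degree, n_quad=64):
    """m_k = integral over [0, 1] of f(x) * x**k, by Gauss-Legendre quadrature."""
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    x = 0.5 * (nodes + 1.0)   # map nodes from [-1, 1] to [0, 1]
    w = 0.5 * weights
    return np.array([np.sum(w * f(x) * x**k) for k in range(degree + 1)])

def hilbert_gram(degree):
    """Gram matrix of the monomials in L^2[0, 1]: G[i, j] = 1 / (i + j + 1)."""
    idx = np.arange(degree + 1)
    return 1.0 / (idx[:, None] + idx[None, :] + 1.0)

f = lambda x: np.sin(2 * np.pi * x)   # hypothetical target function
degree = 7                            # assumed degree, chosen for illustration
coeffs = np.linalg.solve(hilbert_gram(degree), moments(f, degree))
p = lambda x: np.polynomial.polynomial.polyval(x, coeffs)

grid = np.linspace(0.0, 1.0, 501)
inside_err = np.max(np.abs(f(grid) - p(grid)))   # worst error on the fitted domain
outside_err = abs(f(2.0) - p(2.0))               # error one unit outside the domain
print(f"max error on [0, 1]: {inside_err:.2e}")
print(f"error at x = 2:      {outside_err:.2e}")
```

The fit only minimizes the error over [0, 1], so nothing constrains the polynomial outside that interval, and the high-degree terms dominate quickly once x leaves it.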
Thanks for the video, it is helpful! One small nitpick, the background music obscured your voice, and was distracting.
I'm sorry you found it distracting. I try to turn it off before we get to any real details. Which part of the video did you find it the worst?
you should reduce the saturation
maybe get a better display to see the actual colors
Interesting, I'll give it some thought. No one has ever mentioned it before.
This is amazing.
Thank you!
New to this content and I'm not that strong with matrices, but
essentially this is just polynomial regression.
Not sure why the Weierstrass thing is so bad; even a piecewise linear function would do much better.
Interestingly ~6 years ago when I was a kid and obsessed with memorizing sin(x) I tried to come up with the best order ~3 polynomial for approximating sin from 0 to 45 degrees that I could compute in my head
Yeah, when the Weierstrass approximation method was first conceived, Weierstrass was much more concerned with showing that it CAN work than with making it work well. And I think this version by Rudin even works better, but I'd have to double-check that to be sure.
This is polynomial regression, but where we are using moments as the data rather than point samples. We will get into regression methods a lot more in future videos. This was an easy lift for where my Hilbert space/Machine Learning course is right now.
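A rough sketch of what "polynomial regression with moments as the data" can look like, under stated assumptions: the domain is [0, 1], the basis is the monomials, and the target is sin(x). The function names are hypothetical, and this is one way to set it up rather than the video's own code. The coefficients c solve G c = m, where G[i, j] = 1/(i + j + 1) is the Gram matrix of the monomials and m_k is the k-th moment of f.

```python
import numpy as np

def moments(f, degree, n_quad=64):
    """m_k = integral over [0, 1] of f(x) * x**k, by Gauss-Legendre quadrature."""
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    x = 0.5 * (nodes + 1.0)   # map nodes from [-1, 1] to [0, 1]
    w = 0.5 * weights
    return np.array([np.sum(w * f(x) * x**k) for k in range(degree + 1)])

def best_poly_coeffs(f, degree):
    """Coefficients (lowest degree first) of the L^2[0, 1] projection of f."""
    idx = np.arange(degree + 1)
    gram = 1.0 / (idx[:, None] + idx[None, :] + 1.0)   # <x^i, x^j> on [0, 1]
    return np.linalg.solve(gram, moments(f, degree))

coeffs = best_poly_coeffs(np.sin, degree=3)   # cubic fit, chosen for illustration
grid = np.linspace(0.0, 1.0, 201)
approx = np.polynomial.polynomial.polyval(grid, coeffs)
max_err = np.max(np.abs(np.sin(grid) - approx))
print("coefficients:", coeffs)
print(f"max error on [0, 1]: {max_err:.2e}")
```

The only data the solver sees are the numbers m_0, ..., m_3; no point samples of f enter the linear system directly. The monomial Gram matrix becomes badly conditioned as the degree grows, so this sketch is only meant for small degrees.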
Great video! Thanks for putting it together.
If I may, I’d like to suggest slowing down a bit, you sound out of breath! We have plenty of time to learn.
Thanks again.
I'm glad you liked the video! I'll work on slowing it down a bit. I honestly had 30 minutes to set up and record the first half.
Well that's not the BEST best approximation now, is it? For that you ought to take into account how the approximation-method behaves in the face of noisy data (function-values) and then one has to balance the accuracy of the estimate against overfitting to noise. The best methods for that are the AIC/BIC model-selection methods...
It gets its name because it arises from a projection onto a closed subspace of a Hilbert space. It's the best according to the Hilbert space metric.
I'd wager that this actually will still do OK in the face of noisy data. This is because the moments arise as integrals rather than point samples, so mean zero white noise will likely be integrated away.
But sure, there are better ways to do this. I'm building up a Hilbert space course on my channel now, and so this is what we have available to us now at this point in the course.
I'll look into your suggestion and see if I can integrate it into the course later. I appreciate it!
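The "integrated away" intuition can be checked with a small experiment. This is a sketch under assumptions that are not from the video: the signal is sin(x) on [0, 1], the noise is i.i.d. Gaussian with standard deviation 0.1, and each moment is approximated by a midpoint Riemann sum over 10,000 noisy samples.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin                                   # hypothetical clean signal on [0, 1]
n_samples, sigma, degree = 10_000, 0.1, 5    # assumed sizes, for illustration

# Midpoints of a uniform partition of [0, 1], with noisy function values.
x = (np.arange(n_samples) + 0.5) / n_samples
noisy = f(x) + sigma * rng.standard_normal(n_samples)

powers = x[None, :] ** np.arange(degree + 1)[:, None]   # row k holds x**k
clean_moments = powers @ f(x) / n_samples               # midpoint Riemann sums
noisy_moments = powers @ noisy / n_samples

moment_err = np.max(np.abs(noisy_moments - clean_moments))
print(f"per-sample noise level: {sigma}")
print(f"largest moment error:   {moment_err:.2e}")
```

Averaging N independent mean-zero errors shrinks their standard deviation by roughly a factor of sqrt(N), which is why the moment errors come out far smaller than the per-sample noise. This says nothing yet about the coefficients: the monomial Gram matrix is ill-conditioned, so even small moment errors can be amplified when the system is solved at higher degrees.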
@@JoelRosenfeld The level and objective you explain are about what I grasped from the presentation, which I really liked. Looking forward to the next ones.
Regarding the best fit: in any real situation the integrals turn into Riemann sums, and your polynomials will start to fit the noise in the data. This is the problem with any sequence of least-squares fits with an increasing number of basis functions spanning the Hilbert space.
The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) add a cost for the free parameters of the model, and that cost increases with the number of parameters. This should give a reasonably good balance between fit and robustness.
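To make the AIC/BIC suggestion concrete, here is a minimal sketch of degree selection for an ordinary least-squares polynomial fit to noisy point samples. It uses the common Gaussian-noise forms, up to additive constants: AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n), with k = degree + 1 coefficients. The data, noise level, and function name are assumptions made for illustration.

```python
import numpy as np

def information_criteria(x, y, max_degree):
    """Return (degree, AIC, BIC) for least-squares fits of degree 0..max_degree."""
    n = len(x)
    rows = []
    for d in range(max_degree + 1):
        fit = np.polynomial.Polynomial.fit(x, y, d)   # least squares, scaled domain
        rss = float(np.sum((y - fit(x)) ** 2))        # residual sum of squares
        k = d + 1                                     # number of free coefficients
        aic = n * np.log(rss / n) + 2 * k
        bic = n * np.log(rss / n) + k * np.log(n)
        rows.append((d, aic, bic))
    return rows

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)   # hypothetical data

rows = information_criteria(x, y, max_degree=12)
best_aic = min(rows, key=lambda r: r[1])[0]
best_bic = min(rows, key=lambda r: r[2])[0]
print("degree chosen by AIC:", best_aic)
print("degree chosen by BIC:", best_bic)
```

The residual term always improves as the degree grows, so the penalty is what stops the selection from running to the maximum degree. Because ln(n) exceeds 2 once n is at least 8, BIC penalizes each extra coefficient more heavily than AIC and never picks a higher degree than AIC does on the same fits.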
@@KitagumaIgen Yep, this method will only take us so far. I think the closest thing to what you are describing that's currently in the works for a future video is regularized regression, where we penalize the size of the weights. That's about two or three videos out, if I stick to my plan.
I haven't investigated the AIC/BIC methods you are talking about, but I will certainly try to interweave them when I discuss model selection.
Most of my own experience comes from pure Functional Analysis, Operator Theory, and Dynamical Systems. I'm always happy to learn new things.
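As a guess at what "penalizing the size of the weights" could look like in the moment setting (the future video may well do it differently), one option is a ridge-style penalty: minimize ||f - p||^2 in L^2[0, 1] plus lam * ||c||^2 over the coefficient vector c, which leads to the linear system (G + lam I) c = m. The domain, basis, target function, and the value of `lam` are all assumptions here, and the function names are hypothetical.

```python
import numpy as np

def moments(f, degree, n_quad=64):
    """m_k = integral over [0, 1] of f(x) * x**k, by Gauss-Legendre quadrature."""
    nodes, weights = np.polynomial.legendre.leggauss(n_quad)
    x = 0.5 * (nodes + 1.0)   # map nodes from [-1, 1] to [0, 1]
    w = 0.5 * weights
    return np.array([np.sum(w * f(x) * x**k) for k in range(degree + 1)])

def ridge_poly_coeffs(f, degree, lam):
    """Solve (G + lam * I) c = m; lam = 0 gives the plain projection."""
    idx = np.arange(degree + 1)
    gram = 1.0 / (idx[:, None] + idx[None, :] + 1.0)   # <x^i, x^j> on [0, 1]
    return np.linalg.solve(gram + lam * np.eye(degree + 1), moments(f, degree))

f = lambda x: np.sin(2 * np.pi * x)                 # hypothetical target function
c_plain = ridge_poly_coeffs(f, degree=7, lam=0.0)
c_ridge = ridge_poly_coeffs(f, degree=7, lam=1e-3)  # assumed penalty strength

print(f"coefficient norm, no penalty:   {np.linalg.norm(c_plain):.3e}")
print(f"coefficient norm, with penalty: {np.linalg.norm(c_ridge):.3e}")
```

Adding lam to the diagonal lifts the tiny eigenvalues of the Gram matrix, so the solve is better conditioned and the coefficients are smaller, at the price of a fit that is no longer the exact L^2 projection.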