Ayo the legend keeps on giving
I love your videos! I'm the person who posted that I will be starting grad school in the fall at the age of 55. I registered for classes at Montclair State University. Combinatorics, numerical analysis, linear algebra. And I'll be a TA. I am looking forward to being a graduate student in mathematics!
That’s awesome! It sounds like you have a fun schedule too. Congrats and let me know how it goes!
The third book at @0:22 is "Meshfree Approximation Methods with MATLAB by Gregory Fasshauer. A good resource for Radial Basis Functions and similar topics..
Yeah it really is. I find Fasshauer does a great job at explaining the topic. Wendland goes into more of the theory, if you want to go deeper.
I met Fasshauer at a conference last year. Great guy
0:44 "if we have the time... and space" lol
I am missing something important at around 13:39 in the video. I understand the line h_i(t) = \langle h_i, K(\cdot, t) \rangle, which uses the reproducing kernel to evaluate h_i at time t. The inner product must be the inner product for H for this to work. The next line looks like the standard property of the inner product, i.e. the complex conjugate of the inner product with entries swapped. What confuses me is the next line, which seems to expand out the definition of the inner product, but rather than the inner product for H, it looks like the inner product for L^2 and I can't figure out why that is. Was there an adjoint lurking about somewhere (rather than a property of the inner product)? How do I see it?
It’s not an inner product. Each h_i represents some functional through the inner product. That expansion is the functional that h_i represents being applied to the kernel function.
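Maybe it helps to spell that chain out in symbols. Writing \lambda_i for the functional that h_i represents, and using a moment functional \lambda_i(g) = \int_0^1 s^i g(s) ds as a stand-in example (my choice, not necessarily the exact functional in the video), with the Riesz convention \langle g, h_i \rangle_H = \lambda_i(g):

h_i(t) = \langle h_i, K(\cdot, t) \rangle_H  (reproducing property)
       = \overline{\langle K(\cdot, t), h_i \rangle_H}  (conjugate symmetry)
       = \overline{\lambda_i(K(\cdot, t))}  (Riesz representation)
       = \int_0^1 s^i K(s, t) ds  (for this real-valued moment example)

So the integral in the last line only looks like an L^2 pairing because \lambda_i happens to be an integral; it is \lambda_i applied to K(\cdot, t), not the H inner product being expanded.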
@@JoelRosenfeld Thanks for your response! It finally sunk in. I didn't understand why you were swapping the entries, but it was to make the entries of the inner product match what was used with the Riesz Representation theorem a few lines above.
In the next part of the video you do a numerical example where you generate a set of basis functions that are representations of the moment functionals, and then project the function f onto those basis functions. By my understanding, the inner product used in your normal equations is the inner product for H, associated with the RBF kernel (defined at 9:00)? Is there an intuitive relationship between the best approximation using the norm associated with that inner product compared to, say, L^2? (I haven't watched your best approximation video; perhaps that question is answered there?)
The “best” approximation depends on the selection of the inner product and Hilbert space. In the best approximation video for L^2 we started with a basis, polynomials, then we selected a space where they reside and computed the weights for the best approximation in that setting. If we change the Hilbert space, then we need to find new weights.
However, there is a difficulty that can arise. Perhaps it’s not so easy to actually compute the inner product for that basis in that particular Hilbert space. This approach avoids that because we take the measurements and THEN select the basis. So we end up with these h’s rather than polynomials. The advantage here is that we never actually have to compute an inner product; we just leverage the Riesz theorem to dodge around it.
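If it helps anyone reading along, here is a rough numpy sketch of that "measure first, then pick the basis" idea. The Gaussian kernel, the interval [0, 1], and the moment functionals lambda_i(g) = \int_0^1 t^i g(t) dt are my own choices for illustration (not necessarily the ones in the video), and every integral is plain trapezoidal quadrature:

import numpy as np

# Assumed ingredients: a Gaussian RBF kernel, moment functionals, and an example target f.
mu = 10.0
K = lambda s, t: np.exp(-mu * (s - t) ** 2)
f = lambda t: np.sin(2 * np.pi * t)

t = np.linspace(0.0, 1.0, 401)
dt = t[1] - t[0]
wq = np.full_like(t, dt)
wq[0] = wq[-1] = dt / 2              # trapezoidal quadrature weights on [0, 1]
n = 4                                # number of moment functionals

# Riesz representers: h_i(t) = lambda_i(K(., t)) = \int_0^1 s^i K(s, t) ds
S, T = np.meshgrid(t, t, indexing="ij")
Kmat = K(S, T)
H = np.array([(wq * t ** i) @ Kmat for i in range(n)])    # H[i] = h_i sampled on the grid

# Gram matrix without ever forming an H inner product directly:
# <h_j, h_i>_H = lambda_i(h_j) = \int_0^1 t^i h_j(t) dt
G = np.array([[(wq * t ** i) @ H[j] for j in range(n)] for i in range(n)])

# Right-hand side: <f, h_i>_H = lambda_i(f) = \int_0^1 t^i f(t) dt
b = np.array([(wq * t ** i) @ f(t) for i in range(n)])

w = np.linalg.solve(G, b)            # normal equations G w = b
approx = w @ H                       # projection of f onto span{h_1, ..., h_n}, on the grid

# Sanity check: the projection reproduces every measured moment of f.
moments = np.array([(wq * t ** i) @ approx for i in range(n)])
print(np.allclose(moments, b))

The point is that G and b only ever require applying the lambda_i's, which are integrals you can actually compute, never an explicit H inner product; solving G w = b then gives the projection sum_j w_j h_j.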
What I’m setting up here is the Representer Theorem, which we will get to down the line (maybe 5 videos from now?). There it turns out that the functions you obtain from the Riesz theorem are the best basis functions to choose for a regularized regression problem. This was a result of Wahba back in the (80s?)
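To make that concrete, my (hedged) understanding of the result being previewed is roughly this: if you minimize

\sum_{i=1}^n (\lambda_i(f) - y_i)^2 + \gamma \| f \|_H^2 over f in H,

then a minimizer can be written as f^* = \sum_i \alpha_i h_i, where the h_i are exactly the Riesz representers of the measurement functionals \lambda_i. So the h's above aren't just convenient; they span the right finite-dimensional subspace for the regularized regression problem.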
Sorry to be that guy, but is this applicable to a finite number of measurements, say, R^n->R measurements?
I've been studying functional analysis since last week, and I skim ahead to know what I'm getting to. I read real analysis up to Hilbert spaces, which was very brief.
But my final objective is optimization and machine learning. I moved from the math department to the CS department of my college, but I want to work advanced math into my training, as I like analysis a lot, which is why I like your channel.
I was reading Rudin's FA; the first chapter was very fun until I hit completeness, which was very, very boring and unmotivated. Admittedly I skimmed it and went on to the next two chapters, which are much more interesting IMO.
But then at some point it hit me: isn't this way too general? Why am I even considering topologies where the local base could be non-convex if data comes from finite-dimensional spaces? Shouldn't I already be moving into optimization and machine learning after linear algebra and basic analysis (like PMA)?
I want to keep going and study applied Hilbert spaces and analysis like you do at some point, but for the time being I'm trying to make the best use of my time to get something going in the CS department. I'm going to write an undergraduate thesis and I wanted to put some of my math background into it. At first I thought just linear algebra and basic analysis, then I got ambitious and learned measure theory, L^p spaces, and basic Hilbert space theory, but since I'm reading Rudin (I like his style), I'm blindfolded to the applications.
Love your videos man
@@gustavoholo1007 the objective of approximation theory is to determine how close to an objective function you are when estimating with a finite number of data points. The representer theorem tells us, within an RKHS, how to get the best possible approximation with a finite amount of data. Does this get us close to our objective function? Maybe, but we need additional assumptions, which are often context dependent.
You always need additional assumptions to guarantee convergence; in classical numerical analysis it is usually control on higher-order derivatives. In this case it depends on the Hilbert space norm.
How all this hangs together is mathematics, and if you learn more functional analysis, then you’ll be more flexible than your peers. In my research, I have found a lot of innovations and proofs in unexpected places that others missed. But if you only want implementations of existing tools, then you’ll probably be fine with what you have.
@@JoelRosenfeld I don't really want to just implement what already exists (I find it rather trivial). What I know will be enough for understanding the literature for my thesis, but I have quite a lot of free time. So I want to ask you a few things. I want to find relations between mathematical analysis and machine learning. I've been surveying the literature and doubting if my approach is okay.
1. Which books or concepts should I look for? I'm reading Rudin's trilogy to get foundations; I am currently at the Banach space material in Functional Analysis and want to get to his Hilbert spaces chapter (RCA was lacking in Hilbert space theory). The applications are rather thin, since these books are from before the machine learning boom.
2. One of my practical concerns is overfitting; it seems that these mathematical methods try to fit the data maybe too well. How effective is regularization with these methods?
3. Efficiency and quality. Apart from being cool AF and interesting, do they offer something in terms of efficiency or quality of results?
Thank you for answering so early; you're some kind of idol to me. I have found NO ONE in my college (including professors) who has the same interests as me, but you do
@@JoelRosenfeld I want to be an interdisciplinary mathematician who leverages mathematical analysis concepts in machine learning, optimization, and control theory. You seem like a hero to me with the content you make. What references or concepts do you recommend I study? As I said earlier, I've been following Rudin's trilogy, as I love his style, but after being done with the theoretical basics I am lost as to where to go next. I'm not in a college where the mathematics department does anything related to advanced machine learning, or anything interdisciplinary for that matter, and in CS I'm about the only person who knows advanced math. I want to apply math to CS problems, especially through mathematical analysis. Thank you for your answer. I wish I had a professor like you
@ it’s hard to say and really depends on what you want. I got my PhD in operator theory and functional analysis. My machine learning knowledge back then was pretty nil. I picked up all this extra machine learning stuff when I was a postdoc in engineering departments. I found that having a solid foundation in fundamental math was really helpful.
For control theory, I studied Nonlinear Systems by Khalil.
Machine learning is harder to pin down, and my knowledge stems from a lot of diverse places. Wendland’s Scattered Data Approximation is an excellent reference for approximation theory. Bishop’s Pattern Recognition and Machine Learning is a bible. And Support Vector Machines by Steinwart and Christmann is excellent.
Kirsch’s An Introduction to the Mathematical Theory of Inverse Problems goes deep into basically the problems that NNs try to solve, and connects them to operator theory.
Very interesting video, but can you talk a little slower? Often it is not clear what words you are pronouncing, in particular for theorems.
Man, just change the video’s playback settings to watch it slower.
Sorry if I talk too fast. I’ll work on it
@@JoelRosenfeld you don't. You talk just fine.
@@JoelRosenfeld Putting real subtitles on your videos would solve the issue. It's pretty easy these days: just put a transcript in and YouTube will match up the times for you. It's probably even quicker to start with the automatic transcription and just fix the errors.
@@HEHEHEIAMASUPAHSTARSAGA in the past, the transcripts that YouTube produced were pretty bad. Premiere has a new AI feature that is actually pretty good at catching math terminology. I’ll give it some thought. It just takes more time.
You really need to improve your communication skills. This is a terrible exposition.
@@jfndfiunskj5299 I’m always open to input. What could I change to improve it?