What is a BEST approximation? (Theory of Machine Learning)

  • Published: Jan 7, 2025

Comments • 25

  • @mohammedbelgoumri 9 months ago +9

    I'm doing a PhD in ML and was recently getting into functional analysis and measure theory. Great to see this kind of content at this specific time 😅

    • @JoelRosenfeld 9 months ago +5

      I’m glad I can be of service! Are there any topics you are wanting to learn more about in particular?

  • @DistortedV12 9 months ago +4

    This content is what I need as a grad student studying machine learning

    • @JoelRosenfeld 9 months ago +3

      I’m glad I can help! The next several videos will talk about the Riesz theorem, kernel spaces, and the representer theorem. Then we’ll stop and discuss SVMs and VC dimensions. Then back to more Hilbert space stuff, like Fourier series, bases, etc

    • @JoelRosenfeld 9 months ago

      Tell all your friends, lol!

  • @octaviusp 6 months ago +1

    Do you have any machine learning theory course that follows this series of videos? I would be happy to see a machine learning theory course from you.

    • @JoelRosenfeld 6 months ago

      Not a formal course yet. It’s something I plan on doing in the future. For now, I am working on a sort of complete series here on YouTube.

  • @JoelRosenfeld 9 months ago +3

    Ack! Left that note for myself in the video at 0:49. Whoops!

  • @noahsegal1334 9 months ago +2

    How does this method behave under extrapolation? You defined a domain for the function, but if you then go outside it, the polynomial is likely nowhere near the function, since it is a large-degree polynomial. I think the method is really cool, and the moment matching is a neat probabilistic tie-in. What is the method like when your domain is really large?

    • @JoelRosenfeld 9 months ago +2

      Threw together this short to answer your question: ruclips.net/user/shortsMDrjOcmWQA8

  • @tux1968 8 months ago +3

    Thanks for the video, it is helpful! One small nitpick, the background music obscured your voice, and was distracting.

    • @JoelRosenfeld 8 months ago +1

      I'm sorry you found it distracting. I try to turn it off before we get to any real details. Which part of the video did you find worst?

  • @cunningham.s_law 9 months ago +1

    you should reduce the saturation
    maybe get a better display to see the actual colors

    • @JoelRosenfeld 9 months ago

      Interesting, I'll give it some thought. No one has ever mentioned it before.

  • @smolboye1878 9 months ago +1

    This is amazing.

  • @ckq 9 months ago +1

    New to this content and I'm not that strong with matrices, but
    essentially this is just polynomial regression.
    Not sure why the Weierstrass approximation is so bad; even a piecewise linear function would do much better

    • @ckq 9 months ago +2

      Interestingly, ~6 years ago, when I was a kid and obsessed with memorizing sin(x), I tried to come up with the best order ~3 polynomial for approximating sin from 0 to 45 degrees that I could compute in my head
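As a quick check on that idea, here is the least-squares cubic for sin on [0, π/4], fit over a dense grid (a stand-in for the L² projection; this is not the actual polynomial ckq used, just an illustration of how accurate a cubic can be on that range):

```python
import numpy as np

# Dense grid on [0, pi/4]; polyfit gives the least-squares cubic,
# a discrete approximation to the L2 projection onto cubics.
x = np.linspace(0.0, np.pi / 4, 10_001)
coeffs = np.polyfit(x, np.sin(x), 3)  # highest power first
max_err = np.max(np.abs(np.polyval(coeffs, x) - np.sin(x)))
print(max_err)  # well below 1e-3 on this range
```

On an interval this short, a cubic already tracks sin almost perfectly, which is why a mentally computable formula is plausible.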

    • @JoelRosenfeld 9 months ago

      Yeah, when Weierstrass first conceived of his approximation theorem, he was much more concerned with showing that it CAN work than with making it work well. I think this version from Rudin works even better, but I'd have to go double-check that to be sure.
      This is polynomial regression, but where we are using moments as the data rather than point samples. We will get into regression methods a lot more in future videos. This was an easy lift for where my Hilbert space/Machine Learning course is right now.
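The moment-based projection Joel describes can be sketched in a few lines of numpy (an illustration with an arbitrarily chosen function and degree, not code from the video): the best L²([0,1]) polynomial approximation solves a linear system whose matrix holds monomial inner products and whose right-hand side holds the moments of f.

```python
import numpy as np

def best_poly_coeffs(f, degree, num=20_000):
    """L2([0,1]) projection of f onto polynomials of degree <= `degree`.

    Solves G c = b, where G is the Hilbert matrix of monomial inner
    products and b holds the moments of f against the monomials.
    """
    x = (np.arange(num) + 0.5) / num                 # midpoint grid on [0, 1]
    i = np.arange(degree + 1)
    G = 1.0 / (i[:, None] + i[None, :] + 1)          # integral of x^i * x^j = 1/(i+j+1)
    b = np.array([np.mean(f(x) * x**k) for k in i])  # moments: integral of f(x) * x^k
    return np.linalg.solve(G, b)                     # coefficients c_0 .. c_degree

c = best_poly_coeffs(np.exp, degree=4)
x = np.linspace(0.0, 1.0, 1001)
approx = sum(ck * x**k for k, ck in enumerate(c))
print(np.max(np.abs(approx - np.exp(x))))            # uniform error well below 1e-2
```

Note that the data entering the solve are the moments b, not point samples of f, which is the distinction Joel draws from ordinary polynomial regression.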

  • @skillick 9 months ago +2

    Great video! Thanks for putting it together.
    If I may, I’d like to suggest slowing down a bit; you sound out of breath! We have plenty of time to learn.
    Thanks again.

    • @JoelRosenfeld 9 months ago +1

      I'm glad you liked the video! I'll work on slowing it down a bit. I honestly had 30 minutes to set up and record the first half.

  • @KitagumaIgen 9 months ago

    Well, that's not the BEST best approximation now, is it? For that, you ought to take into account how the approximation method behaves in the face of noisy data (function values), and then one has to balance the accuracy of the estimate against overfitting to noise. The best methods for that are the AIC/BIC model-selection methods...

    • @JoelRosenfeld 9 months ago +1

      It gets its name from arising from a projection in a closed subspace of a Hilbert space. It's the best according to the Hilbert space metric.
      I'd wager that this actually will still do OK in the face of noisy data. This is because the moments arise as integrals rather than point samples, so mean zero white noise will likely be integrated away.
      But sure, there are better ways to do this. I'm building up a Hilbert space course on my channel, and this is what we have available at this point in the course.
      I'll look into your suggestion and see if I can integrate it into the course later. I appreciate it!
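Joel's claim that integration averages away zero-mean noise is easy to check numerically (a toy sketch; the signal, noise level, and grid size here are made up for illustration):

```python
import numpy as np

# Compare moments computed from clean vs. noisy samples of a signal.
# Averaging over many samples stands in for the moment integrals.
rng = np.random.default_rng(0)
x = (np.arange(100_000) + 0.5) / 100_000        # midpoint grid on [0, 1]
f = np.sin(2 * np.pi * x)
noisy = f + rng.normal(0.0, 0.5, size=x.size)   # heavy zero-mean noise

for k in range(4):
    clean = np.mean(f * x**k)                   # moment from clean samples
    noised = np.mean(noisy * x**k)              # moment from noisy samples
    print(k, abs(noised - clean))               # each difference ~ 0.5 / sqrt(100000)
```

Even with noise whose standard deviation is half the signal's amplitude, the moments move by only a few thousandths, since the noise contribution shrinks like 1/sqrt(N).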

    • @KitagumaIgen 9 months ago +1

      @@JoelRosenfeld The level and objective you explain at is about what I grasped from the presentation, which I really liked. Looking forward to the next ones.
      Regarding the best fit: in any real situation the integrals become Riemann sums, and your polynomials will start to fit the noise in the data; this is the problem with any sequence of least-squares fits with an increasing number of basis functions spanning the Hilbert space.
      The Akaike Information Criterion / Bayesian IC adds a cost for the free parameters of the model that increases with the number of parameters; this should give a reasonably good balance between fit and robustness.
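The penalty KitagumaIgen describes can be made concrete for least-squares polynomial fits (a toy example with a made-up signal and noise level, using the standard Gaussian-error form AIC = n·ln(RSS/n) + 2k):

```python
import numpy as np

def aic_for_degree(x, y, degree):
    # AIC for a least-squares polynomial fit with Gaussian errors:
    # AIC = n * ln(RSS / n) + 2k, with k = degree + 1 free parameters.
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    return len(x) * np.log(rss / len(x)) + 2 * (degree + 1)

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 200)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.1, x.size)  # smooth signal + noise

best = min(range(1, 14), key=lambda d: aic_for_degree(x, y, d))
print(best)  # the AIC-selected degree
```

Past the degree needed to capture the signal, each extra coefficient mostly fits noise, so the 2k penalty stops the selection short of the largest degree tried.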

    • @JoelRosenfeld 9 months ago

      @@KitagumaIgen Yep, this method will only take us so far. I think the closest thing to what you're describing that's currently in the works for a future video is regularized regression, where we penalize the size of the weights. That's about two or three videos out, if I stick to my plan.
      I haven't investigated the AIC/BIC methods you're talking about, but I'll certainly try to interweave them when I discuss model selection.
      Most of my own experience comes from pure Functional Analysis, Operator Theory, and Dynamical Systems. I'm always happy to learn new things.