how to find the median ... of a function!

Michael Penn

27K views

Published: Sep 11, 2024

Comments • 94

@TheEternalVortex42 10 months ago ⁺⁷⁹
g(t) here is calculating the mean absolute error (in statistics), and it's well-known that the median is the value that minimizes MAE. Similarly, the mean minimizes the MSE. So this is actually a pretty common thing to do in statistics.
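A quick numerical check of this claim (a sketch assuming NumPy; the helper `loss_minimizer` and the example function x^3 on [0, 1] are illustrative choices, not from the video). It estimates the integral of loss(t - f(x)) by averaging over a uniform grid and brute-forces the minimizing t, once with absolute error and once with squared error:

```python
import numpy as np

def loss_minimizer(f, a, b, loss, n=20001, grid=4001):
    """Approximate argmin over t of the integral of loss(t - f(x)) on [a, b].

    The integral is estimated as (b - a) times the average over n uniform samples,
    and t is searched by brute force over `grid` values spanning the range of f.
    """
    x = np.linspace(a, b, n)
    y = f(x)
    ts = np.linspace(y.min(), y.max(), grid)
    costs = [(b - a) * np.mean(loss(t - y)) for t in ts]
    return float(ts[int(np.argmin(costs))])

def cube(x):
    return x**3  # illustrative increasing function on [0, 1]

t_abs = loss_minimizer(cube, 0.0, 1.0, np.abs)     # absolute error -> median
t_sq = loss_minimizer(cube, 0.0, 1.0, np.square)   # squared error -> mean
print(t_abs, t_sq)  # approximately 0.125 (= f(1/2)) and 0.25 (= integral of x^3 on [0, 1])
```

Because x^3 is increasing, its median on [0, 1] is f(1/2) = 1/8, while its mean is 1/4, so the two losses visibly pick out different answers.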
@ddognine 10 months ago ⁺²
To be more precise, the mean is the value that minimizes the sum of the square errors while the median is the value that minimizes the sum of the absolute errors.
@baerlauchstal 10 months ago ⁺³
I was going to say, if we perform the same calculation using not the integral of |t - f(x)| but that of (t-f(x))^2, we get not the median but the mean.
Another definition of the median is that it's the value of t such that the set of x with f(x) < t and the set of x with f(x) > t have the same total length (or, more formally, the same *measure*). To me it's not screamingly obvious at first glance that this is the same as the integral definition, but it starts to make perfect sense when you think about it a bit. The key thought is "Will both raising or lowering the value of t a tiny bit add more to the total area than it removes?"
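The measure definition also gives a direct way to compute t: bisect on the sign of |{f < t}| - |{f > t}|. A minimal sketch, assuming NumPy and estimating both lengths from uniform samples (the helper name `median_by_measure` and the test function x^2 on [-1, 2] are illustrative):

```python
import numpy as np

def median_by_measure(f, a, b, n=200001, tol=1e-10):
    """Bisection for t with |{x : f(x) < t}| = |{x : f(x) > t}| on [a, b].

    Both lengths are estimated as (b - a) times the fraction of n uniform samples.
    """
    x = np.linspace(a, b, n)
    y = f(x)
    lo, hi = float(y.min()), float(y.max())
    while hi - lo > tol:
        t = 0.5 * (lo + hi)
        below = (b - a) * np.mean(y < t)
        above = (b - a) * np.mean(y > t)
        if below < above:
            lo = t  # more of [a, b] maps above t, so the median is higher
        else:
            hi = t
    return 0.5 * (lo + hi)

print(median_by_measure(lambda x: x**2, -1.0, 2.0))  # about 0.5625, since 2*sqrt(t) = 3/2
```

For x^2 on [-1, 2] the set {x^2 < t} has length 2√t when t ≤ 1, and setting that equal to half of the interval length 3 gives t = 9/16.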
@pianoman2197 10 months ago
@@ddognine argmin of SSE is the same as argmin of MSE, and argmin of SAE is the same as argmin of MAE, so actually your statement is no more precise than the statement by @TheEternalVortex42.
@adiaphoros6842 10 months ago ⁺¹
@@baerlauchstal What would the analogue for the mode be? Also, talking about |t - f(x)| and (t-f(x))^2 reminds me of L^p metrics, so what do we get if we instead integrate |t - f(x)|^n where n > 2?
@baerlauchstal 10 months ago
@@adiaphoros6842 I'm not an expert by any means, but it feels to me as if a mode of a function would have to be approached something like this.
We're after, in some sense, the most commonly occurring y-value. But the y-value is, in general, continually varying, rather than remaining constant. So I think we might have to choose a point where the rate of this variation is very small: in other words, a y-value whose *neighbourhoods* correspond to the largest possible (that is, maximum-measure) sets of x-values.
So how about this, at least for differentiable f? For a given value of t, calculate all values of x such that f(x) = t; in other words, the set f^(-1)({t}). Then, sum the absolute values of 1/f'(x) at all those points (this sum might be plus infinity, note). Call this function S(t). S corresponds to, loosely speaking, the total size of all the ranges of x-values corresponding to small ranges of y values around y=t (or rather, the total length of the x-ranges divided by the length of the y-range; differentiability of f implies that this ratio either tends to a well-defined limit, or diverges to infinity, as the length of the y-interval tends to zero).
Then the mode is any value of t such that S(t) is locally maximised, including any points where it goes off to infinity (which will be the stationary points of f if it has any).
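One crude way to approximate this S(t) numerically (a sketch assuming NumPy; the helper name and the bin count are illustrative choices, and this is a histogram approximation rather than the exact preimage sum): sample f on a fine uniform grid and histogram its values. The number of samples falling in a small bin around t is roughly the length of the preimage of that bin divided by the grid spacing, i.e. the integral of S over the bin, so the fullest bin is a rough stand-in for a maximum of S. For sin on [0, π] the peak should sit next to the stationary value 1.

```python
import numpy as np

def mode_by_histogram(f, a, b, bins=200, n=1_000_001):
    """Crude mode estimate: center of the most populated bin of f's values.

    Each bin count is about (length of the preimage of the bin) / (grid spacing),
    i.e. the integral of S(t) over the bin divided by the spacing.
    """
    x = np.linspace(a, b, n)
    counts, edges = np.histogram(f(x), bins=bins)
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1])

print(mode_by_histogram(np.sin, 0.0, np.pi))  # close to 1, where f'(pi/2) = 0
```

The answer depends on the bin width, which echoes the "binning" caveat raised further down the thread.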
That could be nonsense, though. It would be nice to get the opinion of someone who knows what they're talking about. (And I'm probably gonna wake up tomorrow in a cold sweat thinking of an obvious blunder in the above.) Also, it requires differentiability, whereas our median and mean definitions only required (Lebesgue) integrability, which is a much lower bar.
I might need to give this more thought...
@vadimkhudiakov526 10 months ago ⁺¹¹¹
This video raises the question of how to "sort" a function.
@minamagdy4126 10 months ago ⁺¹²
I'd imagine, for a function on [x,y] to [a,b] (where there exist u,v such that f(u)=a and f(v)=b), that we can define the reordering of f being g such that g(x)=a, g(y)=b. Then, for a
@jakobr_ 10 months ago ⁺⁷
I’m imagining a process that iteratively rearranges intervals of the function’s domain so that in the limit you’re left with an increasing function.
Something like this: Choose a natural number n, and partition the codomain of the function into n intervals. For each codomain interval, identify all intervals of the domain that f sends to each codomain interval. Rearrange the domain intervals so that their respective codomains are in non-decreasing order. (there is likely more than one way to do this). Repeat as needed with bigger n and smaller max codomain partition size.
Then the median is this new function evaluated at (a+b)/2
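On a uniform grid, the limit of this rearranging process can be approximated simply by sorting the sampled values: sorting keeps how much of the domain each value occupies but puts the values in order (the non-decreasing rearrangement). A minimal sketch assuming NumPy; the helper name and the example sin on [0, π] are illustrative. Since the set {sin x < t} in [0, π] has length 2·arcsin t, the rearranged function should be sin(s/2), and its value at the midpoint π/2 should be sin(π/4) ≈ 0.7071:

```python
import numpy as np

def increasing_rearrangement(f, a, b, n=100001):
    """Approximate the non-decreasing rearrangement of f on [a, b].

    Returns the grid and the sorted sampled values of f, read as a function on that grid.
    """
    x = np.linspace(a, b, n)
    return x, np.sort(f(x))

x, g = increasing_rearrangement(np.sin, 0.0, np.pi)
median = g[len(g) // 2]  # rearranged function at the midpoint (a + b) / 2
print(median)            # close to sin(pi/4) = 0.7071...
```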
@jakobr_ 10 months ago ⁺³
There’s probably a more direct way to do this using the derivative f’ to “weight” each output value. With the new f’ being equal to the reciprocal of the sums of reciprocals of all of the derivatives at points where the function evaluates to the same value. Like resistors in parallel
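A worked check of the "resistors in parallel" rule on one example (f(x) = x^2 on [-1, 1] is an illustrative choice; g is the non-decreasing rearrangement on [-1, 1] and t = g(s)). Both branches ±√t contribute, and the parallel combination of |f'| reproduces g' exactly:

```latex
\begin{aligned}
&f(x) = x^2 \text{ on } [-1,1]:\quad \big|\{x : f(x) < t\}\big| = 2\sqrt{t}
  \;\Rightarrow\; g(-1 + 2\sqrt{t}) = t
  \;\Rightarrow\; g(s) = \Big(\tfrac{s+1}{2}\Big)^2, \\
&g'(s) = \tfrac{s+1}{2} = \sqrt{t}
  = \Big(\tfrac{1}{|f'(\sqrt{t})|} + \tfrac{1}{|f'(-\sqrt{t})|}\Big)^{-1}
  = \Big(\tfrac{1}{2\sqrt{t}} + \tfrac{1}{2\sqrt{t}}\Big)^{-1}, \\
&\text{median} = g(0) = \tfrac{1}{4}.
\end{aligned}
```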
@BJ-sq1si 10 months ago
Take its inverse
@minamagdy4126 10 months ago
@@jakobr_ I believe one way is to split the domain of f, [x,y], into intervals [x_i,y_i] such that f is monotonic on each of them. This can be done by splitting where the derivative of f is zero (note: we can skip filtering out saddle points, as having too many intervals doesn't affect the correctness of the calculation).
We then define g_i with domain [x_i,y_i] equalling f if f is increasing in that domain, and reversed in each decreasing interval x_i, y_i when generating g_i such that f(x_i+t)= g_i(y_i-t) for 0
@DirtShaker 10 months ago ⁺⁹
19:27 - I think the second integral here, the one that goes up to 4, is also meant to be * dx and not * dt.
@InverseTachyonPulse 10 months ago ⁺²
You are correct, he's just splitting the integral from 0 to 4 into two integrals, both are still with respect to x 💁🏻‍♂️
@journeymantraveller3338 10 months ago ⁺³
A nice exercise is to do this for sin() on [0,2]. I got f.med = sin(1), which is f((a+b)/2). A bit harder is sin() on [1,2], where the median level is hit twice. I got f.med = sin(pi/2 - 1/4), which is NOT f((a+b)/2), attained at x = pi/2 - 1/4 and x = pi/2 + 1/4.
I also did this in R. You can try it for any function:
f
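The R code above is cut off in this copy of the thread. As an independent check of the two stated results, here is a short Python sketch (assuming NumPy; `sampled_median` is a hypothetical helper) that takes the median of sin over a fine uniform grid:

```python
import numpy as np

def sampled_median(f, a, b, n=200001):
    """Median of f on [a, b], approximated by the median of f on a uniform grid."""
    return float(np.median(f(np.linspace(a, b, n))))

m02 = sampled_median(np.sin, 0.0, 2.0)
m12 = sampled_median(np.sin, 1.0, 2.0)
print(m02, np.sin(1.0))               # both about 0.8415
print(m12, np.sin(np.pi / 2 - 0.25))  # both about 0.9689
print(np.sin(1.5))                    # f((a+b)/2) on [1, 2], about 0.9975
```

On [1, 2] the set {sin x > t} is the interval (arcsin t, π - arcsin t) of length π - 2·arcsin t; setting it to 1/2 gives arcsin t = π/2 - 1/4, matching the comment.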
@Alan-zf2tt 10 months ago
This led to a very interesting discussion touching on Riemann/Lebesgue ways of doing things, and so, naively, I wonder if it is a natural consequence of the fact that measuring things in a space depends upon the space itself.
So in probability, to me, it suggests an assumption that these things happen, often in a Gaussian way or ways, and that sweeps away irregularities. In other words, as Michael said in the video, these things are nice.
We know monsters exist in math - but those are not allowed into probability theories and methods?
@kajdronm.8887 10 months ago ⁺⁷
The median could also be defined by Lebesgue measure.
The median of a function with domain (a;b) is t such that the set of all x (a
@divisix024 10 months ago
It should be b-a instead of a-b since a
@kajdronm.8887 10 months ago
@@divisix024 Ah, yes. You're right. Thanks for the hint.
@nadavslotky 10 months ago ⁺¹
Exactly what I wanted to write, but phrased way better than I would have.
Intuitively, the Lebesgue measure does exactly what we want here: it "counts" the "number" of points for each level of the function, and then "adds them up".
This doesn't, however, provide a way to actually calculate t for some arbitrary function, unless I'm not seeing something...
@Nikolas_Davis 10 months ago ⁺¹
@@nadavslotky
I had the same thought. By the way, Lebesgue himself had described very vividly the way his measure works: imagine two bank clerks, one experienced and one inexperienced, both faced with the task of counting a number of coins. The inexperienced one counts the coins as they come to him in random order. The experienced one first sorts them into pennies, dimes, quarters, etc., and then says: "I have mes(E_1) pennies, mes(E_2) dimes," etc. The value of the coin corresponds to the function value, and the measure of each coin type corresponds to how many coins of this type there are.
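The two clerks can be mimicked numerically. The "experienced clerk" corresponds to the layer-cake formula: for non-negative f, the integral equals the integral over levels s of the measure of {x : f(x) > s}. A sketch assuming NumPy, with f(x) = x^2 on [0, 1] as an illustrative example (both results should be near 1/3):

```python
import numpy as np

# Integral of the non-negative f(x) = x**2 on [0, 1], computed two ways.
a, b = 0.0, 1.0
x = np.linspace(a, b, 20001)
y = x**2
dx = x[1] - x[0]

# "Inexperienced clerk": add the values in the order the x's arrive (left Riemann sum).
riemann = np.sum(y[:-1]) * dx

# "Experienced clerk": for each level s, measure {x : f(x) > s} (estimated as the
# fraction of grid points times b - a), then add those measures up over s.
levels = np.linspace(0.0, y.max(), 2001)
ds = levels[1] - levels[0]
measure_above = np.array([np.mean(y > s) for s in levels]) * (b - a)
lebesgue = np.sum(measure_above[:-1]) * ds

print(riemann, lebesgue)  # both close to 1/3
```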
@LucasDimoveo 10 months ago ⁺²⁵
What area of mathematics do you specialize in? I don't think I've ever seen someone jump between Analysis and Algebra as often as you do. It is really cool to see someone so well rounded. People who like one topic tend not to like the other
@Debg91 10 months ago ⁺⁸
I think he specialises in Lie algebras and vertex operator algebras, an algebraic structure with applications in mathematical physics and pure mathematics as well.
@hybmnzz2658 10 months ago ⁺¹
He does vertex algebras and Lie theory stuff under Richard Borcherds if I recall correctly.
That "explains" your question but honestly, there is no reason an analyst can't lecture algebraic topics or vice versa. We aren't exactly doing off the press arxiv research papers on this channel, and all math is pretty cool.
@doctorb9264 10 months ago
He also excels at Number Theory.
@Alan-zf2tt 10 months ago ⁺²
I have to agree with you. Too often math education is presented as a very strict, limited view of how to do things ... perhaps that is because the learning outcomes for the course or branch of math require it, in order for students to show those learning-outcome skills in assessments. But the downside is that students often go away with a limited perspective due to the intense focus placed on the methods required for learning-outcome assessments. It is a vicious circle in many ways.
But that is what a successful student has to do in order to obtain pass marks - it is what it is.
It is very refreshing to see Michael often using all math skills to solve problems. Some of his differential equations videos are incredible for that reason.
His view seems to be: each branch of math (as taught) has different tools. Within reason (that is a biggie!) those can be used to build a math toolbox for solving complicated problems.
Safety warning: look out for his sleight-of-hand isomorphisms (hint: after all, x and y are just dummy variables (implication, WLOG: they can be swapped around))
Math is great!
@randomtiling4260 10 months ago ⁺²
@jojijoestar4762 I don't think he works under Borcherds; they're at different universities. I think they just collabed on a video once, since Borcherds invented vertex algebras.
@mattgarber8658 10 months ago ⁺⁴
This is a great video on a topic I've actually been thinking a lot about recently!
For those interested in machine learning and/or statistics, the alternate definition of median Michael uses here is why we say minimizing mean absolute error returns median predictions (as opposed to MSE returning mean). Very useful to understand if you find unusually distributed data in the wild and need to build a useful model of it.
Thanks Michael!
@journeymantraveller3338 10 months ago
I have also done a lot of work in R recently on this. I may have found another definition of median of a function.
@Jeathetius 10 months ago ⁺⁶
It seems to me that the definitions of both the mean of a function and the median of a function are more natural in the context of Lebesgue integration than of Riemann integration. With Lebesgue integration you can directly extend the intuition we have for the mean and median of a finite, discrete set of values to a continuous case that relies on an integral, because we are able to keep using the language of sets of values and have a direct notion of the size of these sets. I’d be tempted to first answer the question in that form and then work backwards and show that the definitions agree when everything is Riemann integrable.
@afuzzycreature8387 10 months ago ⁺²
the traditional definitions rely on the idea that [a,b] carries a uniformly distributed probability measure, that is, the probability density is the same for all t, t' in [a,b]. At that point the median is nothing more than the minimizer of the expected loss E[|t-f(x)|] on that space
@joeyhardin5903 10 months ago ⁺⁴
Haven't watched yet, but with some memory of GCSE statistics, my guess would be something like you integrate the function between your lower bound and a variable upper bound. Then find the x-value that corresponds to a y-value exactly halfway between the lower and upper bounds. Then plug that x-value back into the original function and that should be your median. This would also work for something like the quartiles. The idea comes from the cumulative frequency distribution.
@PaulEfremoff 10 months ago ⁺¹¹
How to find the sorted-function of a function? (continuous version of sorting)
@FireStormOOO_ 10 months ago ⁺³
Also where my mind immediately jumped
@michaelguenther7105 10 months ago ⁺¹
If we use the Leibniz rule to evaluate dg/dt, we see that we will evaluate the integrand |t-f(x)| where f(x)=t, and get zero for those terms, leaving only the integral of the partial derivative of |t-f(x)| wrt t, which is either x or -x (depending on whether the integrand was (t-f(x)) or (f(x)-t) ) evaluated at the appropriate limits. Determining those limits of integration are what make this difficult, i.e. solving f(x)=t for x.
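Written out in measure form, the Leibniz-rule computation gives the balance condition directly. A sketch, assuming the level set {x : f(x) = t} has measure zero, so that the t-derivative of the integrand, ±1, exists almost everywhere and can be taken under the integral sign:

```latex
g(t) = \int_a^b |t - f(x)|\,dx,
\qquad
g'(t) = \int_a^b \operatorname{sgn}\big(t - f(x)\big)\,dx
      = \big|\{x \in [a,b] : f(x) < t\}\big| - \big|\{x \in [a,b] : f(x) > t\}\big|.
```

Setting g'(t) = 0 says that f spends as much length below t as above it; the x-values solving f(x) = t are exactly the endpoints of those sets.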
@insouciantFox 7 months ago ⁺¹
When I was in school, I was taught that the median was the value of t such that the normalized integral of the function equals ½ at t; viz.:
If q = int_a^b f(x) dx
and 1/q * int_a^t f(x) dx = ½,
then t is the median.
Where does this reasoning come from and what does it actually represent?
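The two notions can be compared side by side. If f ≥ 0 is read as a probability density (after dividing by q), the area-bisecting t is the median of a random variable with that density: a point of the domain. The video's median is instead a value of f. A sketch assuming NumPy, with f(x) = x on [0, 1] as an illustrative example and `area_bisector` a hypothetical helper; it assumes f > 0 away from a so that the cumulative integral is increasing:

```python
import numpy as np

def area_bisector(f, a, b, n=200001):
    """Point m in [a, b] where the area under f (assumed positive) is split in half."""
    x = np.linspace(a, b, n)
    y = f(x)
    # cumulative trapezoid rule: F[i] approximates the integral of f from a to x[i]
    F = np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))
    return float(np.interp(0.5 * F[-1], F, x))

def identity(x):
    return x

m = area_bisector(identity, 0.0, 1.0)                          # about 1/sqrt(2): a point of the domain
med = float(np.median(identity(np.linspace(0.0, 1.0, 200001))))  # 0.5: a value of f
print(m, med)
```

Here t^2/2 = 1/4 gives t = 1/√2 ≈ 0.7071, while the median of the values of f(x) = x on [0, 1] is 0.5.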
@pierreabbat6157 10 months ago ⁺²
How would you compute the median of a set of points in the plane, 3-space, etc.?
@Alan-zf2tt 10 months ago
I may be wrong but I read this video as fitting for a first level graduate math course considering events happening in x-y plane, continuity on domain, differentiability on domain and such things. Also suggesting it may be a requisite for students not taking a major in math but needing a math base anyway. Physics? Engineering? Electronics?
That is to say your question is astounding and probably beyond scope of the video?
@journeymantraveller3338 10 months ago
I think they do this sort of thing in cluster analysis. k-means. Centroid etc.
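One standard answer, not taken from the video: the geometric median, which minimizes the sum of Euclidean distances to the points and so generalizes the "minimize total absolute error" characterization (the k-means centroid, i.e. the mean, minimizes the sum of squared distances instead). It has no closed form in general; Weiszfeld's iteration is a classic approximation. A sketch assuming NumPy; `geometric_median` is a hypothetical helper, and the eps clamp is a crude guard for iterates that land on a data point:

```python
import numpy as np

def geometric_median(points, iters=500, eps=1e-12):
    """Weiszfeld iteration for argmin over y of sum_i ||points[i] - y||."""
    y = points.mean(axis=0)  # start from the centroid (the "mean" answer)
    for _ in range(iters):
        d = np.linalg.norm(points - y, axis=1)
        w = 1.0 / np.maximum(d, eps)  # inverse-distance weights
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        converged = np.linalg.norm(y_new - y) < 1e-12
        y = y_new
        if converged:
            break
    return y

pts = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
gm = geometric_median(pts)
print(gm, pts.mean(axis=0))  # geometric median vs. centroid
```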
@scottmiller2591 10 months ago ⁺³
This made me think about weighted medians of functions by putting the weighting function into the measure.
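A minimal sketch of that idea, assuming NumPy: replace dx by w(x) dx and look for the t at which the w-weighted length of {f < t} reaches half the total weight. The helper name and the weight w(x) = 2x are illustrative; with f(x) = x on [0, 1] the condition is t^2 = 1/2, so the weighted median should be 1/√2, while w ≡ 1 recovers the ordinary median 0.5:

```python
import numpy as np

def weighted_function_median(f, w, a, b, n=200001):
    """Median of f on [a, b] with respect to the measure w(x) dx (w >= 0, not all zero).

    Sorts the sampled values of f and returns the first one at which the
    cumulative weight reaches half of the total weight.
    """
    x = np.linspace(a, b, n)
    y, wt = f(x), w(x)
    order = np.argsort(y)
    cum = np.cumsum(wt[order])
    k = int(np.searchsorted(cum, 0.5 * cum[-1]))
    return float(y[order][k])

print(weighted_function_median(lambda x: x, lambda x: 2 * x, 0.0, 1.0))  # about 0.7071
print(weighted_function_median(lambda x: x, np.ones_like, 0.0, 1.0))     # about 0.5
```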
@krisbrandenberger544 10 months ago ⁺³
Hey, Michael! The values of a and b should be x_0 and x_(2*N-2), respectively, making the formula for a_n @10:25 equal to a+(b-a)/(2*N-2)*n, and the median value of the function f(x_(N-1)).
@journeymantraveller3338 10 months ago ⁺¹
Also, no limit is then needed if you continue the subbing.
@krisbrandenberger544 10 months ago
@@journeymantraveller3338 No, because you have (N-1)/(2*N-2) which simplifies to 1/2.
@journeymantraveller3338 10 months ago
Exactly. @@krisbrandenberger544
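The corrected indexing can be checked directly: with the 2N - 1 points x_n = a + (b - a)n/(2N - 2), n = 0, ..., 2N - 2, the middle point x_(N-1) is (a + b)/2 for every N, so for an increasing f the sample median is f((a + b)/2) without any limit. A sketch assuming NumPy; `grid_median` is a hypothetical helper and exp on [0, 2] an illustrative increasing function:

```python
import numpy as np

def grid_median(f, a, b, N):
    """Median of f at x_n = a + (b - a) n / (2N - 2), n = 0, ..., 2N - 2, and the point x_(N-1)."""
    n = np.arange(2 * N - 1)
    x = a + (b - a) * n / (2 * N - 2)
    return float(np.median(f(x))), float(x[N - 1])

med, mid = grid_median(np.exp, 0.0, 2.0, N=50)  # exp is increasing on [0, 2]
print(med, mid)  # med equals exp(mid), and mid = (a + b) / 2 = 1.0
```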
@rinner2801 10 months ago ⁺⁵
I wish I had the math skills to understand more of your content - it really makes me want to go back to school (no luck there though).
@masonskiekonto590 10 months ago ⁺³
Watch the videos and pay attention, in a few months you will find yourself understanding all of the important stuff without even opening a book.
@snared_ 10 months ago ⁺¹
@@masonskiekonto590 this is awful advice lmao. Do exercises out of textbooks, watching videos, even attentively, will not really help you develop. It does get more comfortable though I'll give you that, but to understand I feel you should actually act not just observe.
@2eanimation 10 months ago ⁺¹
Grab yourself a textbook and start learning, at least that's what I do whenever I find something interesting. The Genesis Library is always an option if you are low on cash :)
@rgqwerty63 10 months ago ⁺⁹
Would the median not be m, such that the integral from a to m is half the total area from a to b?
@landsgevaer 10 months ago ⁺¹
No. 😉
@gamerpedia1535 10 months ago ⁺³
Integral from a to m = integral from a to b ÷ 2
F(m)-F(a) = (F(b)-F(a))/2
F(m) = [F(a)+F(b)]/2
That's the mean value theorem!
@johnchessant3012 10 months ago ⁺⁵
you're thinking of probability distributions
@TehDaddyShark 10 months ago ⁺²
This was a tremendously helpful video. Thanks!
@marc-andredesrosiers523 10 months ago ⁺¹
Looking at indicator functions is also nice.
@ilyafoskin 10 months ago ⁺³
0:00 I thought he said “So shortly after a student learns about the death of an integral…”
@Kapomafioso 10 months ago
Now vid on how to get the mode pls :P
(probably not computationally possible, since that probably depends on the "binning", at least for finitely many points in a set. For a function it's probably its "most constant" part.)
@Chalisque 10 months ago
Only started this video. I'm guessing that the median of f on [a,b] is best defined as the value y such that the measure of { x : f(x) >= y } is equal to (b-a)/2.
@JosuaKrause 10 months ago ⁺²
could you, as a less practical definition, use 0 = \int sgn(f(x) - t) dx where t is the median if the integral is 0?
@afuzzycreature8387 10 months ago ⁺¹
you get this when you take the derivative of the \int |f(x)-t| dx with respect to t.
@89alcatraz89 10 months ago
@@afuzzycreature8387 I was also thinking about it, but in more descriptive terms: this would be equivalent to having the same total length of intervals where the function is above and below the median, which is exactly what we expect based on the discrete example
@Happy_Abe 10 months ago ⁺²
Does this require continuity?
@Alan-zf2tt 10 months ago
0:23 on the board states it does.
What happens around discontinuities probably has been defined and figured out if discontinuities are allowed at all?
@nathanisbored 10 months ago
my intuition would be to first sort the function values in increasing (or non-decreasing) order and then check the value at the halfway point. that's what I would do for a finite set, after all
@snared_ 10 months ago
yeah, now it's up to us to understand how this definition of median plays with the one he defines in the video - I think they are similar/maybe the same though I would actually need to think about it
@afuzzycreature8387 10 months ago
@@snared_ the discrete version probably came first (Galileo was using it, iirc; a great scientific invention, the median!) and the continuous version likely came later... however, defining either on a measure-theoretic basis is a much more complicated thing. One can establish a formula for the median of a distribution easily. It is another matter to define it as an estimator, or to show how the data-driven version arises out of mean absolute error (absolute loss).
@TheMichaelmorad 10 months ago
I remember when I was 12 or 13 I invented the Riemann sum of an integral using the average; I basically said that the area equals the median times the length of the interval
@AJ-et3vf 10 months ago ⁺¹
Awesome video. Thank you
@d.h.y 10 months ago
This video was really helpful 🤩🤩
@dominicellis1867 10 months ago
How do you use this formula to prove that the skew index provides an inequality on the 3 central tendencies mean median and mode?
@natepolidoro4565 10 months ago ⁺¹
Then you can talk about a function being skew right or left and then use statistical concepts
@JacobHa 10 months ago ⁺¹
Originally I thought it is something similar to the median of a continuous random variable🤔
@86congtymienbac80 10 months ago ⁺¹
Yes!
@aneeshsrinivas9088 10 months ago
How do you extend the 2nd median definition to finding the median of a function over (-∞,∞)?
@lucanina8221 10 months ago
Wow the MAE is the constant function which approximates the L1 norm of f
@Mathymagical 10 months ago
Define the continuous version of the mode computation.
@alexanderf8451 10 months ago
Consider how we would find the mode of a strictly increasing/decreasing function . . .
@minwithoutintroduction 10 months ago
22:07
@jamesfortune243 10 months ago
Nice generalization.
@BerndSchnabl 10 months ago ⁺²
backward flip **** backward flip **** backward flip
@szymonraczkowski9690 10 months ago
cool
@Double_U_tau_Phi 10 months ago ⁺³
First