Suggestion:
1. There were times when you introduced a formula by saying it out loud without showing it first. Introducing a complicated formula with words alone is very unclear, and audience members like me can get lost in the words. It is better to show the formula on screen before (or while) introducing it, so the content is clear to the whole audience.
2. For the algebra autopilot, it would be better to show each step on screen with the reasoning alongside it. That would be much clearer for the audience, without requiring them to write down the formula and work through it themselves.
Overall it is a very good video; keep up the good work.
There's a lot to be said for the audience writing it down. We learn more from doing than from being told.
Incredible video! I'm mindblown by your conciseness and efficiency as you detail the reasoning step by step.
The more I learn, the more it seems that
f(x)f(y) = g(x²+y²)
is probably the most important property of the Gaussian. I wish this had been emphasized more in my first statistics course.
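To make the property above concrete: the (unnormalized) Gaussian f(x) = e^{-x²} satisfies it, since

```latex
f(x)\,f(y) \;=\; e^{-x^2}\,e^{-y^2} \;=\; e^{-(x^2+y^2)} \;=\; g\!\left(x^2+y^2\right),
\qquad \text{where } g(u) = e^{-u}.
```

Conversely, under mild regularity assumptions it is essentially the only f with this property, which is why the Gaussian keeps showing up.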
Wow, it took a long time for me to find this video but this is awesome! I wanted to get into the higher dimensional cases and the curse of dimensionality but I never had the time, so I’m really glad you made this!
One time this same problem came up in a project I was working on. I managed to work it out as far as this video goes, and it worked fine for real numbers in Euclidean space. But I wanted to extend it to non-Euclidean spaces (such as integers in a ball of radius r under the Manhattan metric, or binary values in a ball of radius r under the Hamming metric) and never figured out whether that was possible.
Immaculate. James == Goat. Can't wait to see your vid next year!
I think this exercise is an amazing introduction to the math of probability, and to probability in general. I'm trying to learn math and computer science on my own, with English being my second language 😢 It's this kind of video that I find helpful and that clears the road for me 😍😍 Thanks, I'll attempt this exercise myself as a challenge 🥰
Hi, also: at 8:00, r can't be chosen uniformly, because then we would expect to see the same number of points in each ring (in 2D) or on each spherical surface (in 3D). These things are tricky and lead to paradoxes. Related: there are a few very good lectures from Richard Hamming about sphere packing on YouTube.
A great example of a simple question having a non-trivial answer all explained in an enjoyable and informative way while connecting areas of maths in new (at least for me) interesting ways. You have a real talent for this medium and I look forward to seeing what you come up with next.
Great job! Love the animations. Good stuff.
Cool stuff! It might be interesting to graph the "pairwise scatter plot"/ "scatter plot matrix" in higher dimensions to try visualizing all the points near the surface of the ball.
Beautifully explained!
The choice of a normal distribution could also be motivated by this fact:
e^-r² = e^-(x1²+x2²+...) = e^-x1² • e^-x2² • ..., showing both the rotational invariance (it depends only on r) and the property that it can be expressed as a product of functions, each of which depends on only one coordinate.
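A quick numerical sanity check of this factorization and the invariance it implies (a sketch using numpy; the random orthogonal matrix comes from a QR decomposition):

```python
import numpy as np

# Check that e^{-r^2} = prod_i e^{-x_i^2}, and that the density is
# rotationally invariant (depends only on r = |x|).
def density(x):
    """Unnormalized isotropic Gaussian density at point x."""
    return np.exp(-np.dot(x, x))

rng = np.random.default_rng(0)
x = rng.standard_normal(3)

# The product of per-coordinate factors equals the joint density.
per_coordinate = np.prod(np.exp(-x**2))
assert np.isclose(per_coordinate, density(x))

# Any orthogonal matrix (here, from a QR decomposition of a random
# matrix) preserves the norm, hence the density.
q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
assert np.isclose(density(q @ x), density(x))
```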
don't wait until the next summer. The video is great, keep it up :)
A funny thing is that Box-Muller produces a pair of independent values, while normal() only returns one of them. When you're generating a lot of them, it would be computationally cheaper to use the whole pairs.
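For illustration, a minimal Box-Muller sketch that keeps both values of each pair (function name is mine, not from the video):

```python
import numpy as np

def box_muller(rng):
    """One Box-Muller step: two uniforms in -> a PAIR of independent N(0,1) out."""
    u1 = 1.0 - rng.random()            # shift away from 0 so log(u1) is finite
    u2 = rng.random()
    r = np.sqrt(-2.0 * np.log(u1))
    theta = 2.0 * np.pi * u2
    return r * np.cos(theta), r * np.sin(theta)

rng = np.random.default_rng(42)
pairs = [box_muller(rng) for _ in range(50_000)]
samples = np.asarray(pairs).ravel()    # use both halves of every pair
print(samples.mean(), samples.std())   # both should be close to 0 and 1
```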
Excellent video!
Awesome stuff. Such a rollercoaster ride.
Just check whether its norm is greater than R, the radius. I think this generalizes pretty well.
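A sketch of the generalized rejection method this comment describes (with the caveat that the acceptance rate shrinks rapidly as the dimension grows, which is the video's whole point):

```python
import numpy as np

def sample_ball_rejection(n_dim, radius=1.0, rng=None):
    """Uniform point in the n-ball: sample the bounding cube, reject outside."""
    rng = rng or np.random.default_rng()
    while True:
        x = rng.uniform(-radius, radius, size=n_dim)
        if np.linalg.norm(x) <= radius:   # the norm check from the comment
            return x

rng = np.random.default_rng(0)
point = sample_ball_rejection(3, rng=rng)
assert np.linalg.norm(point) <= 1.0
```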
Wait, a 3D ball’s random points should appear denser near the center, since more of the ball’s depth is visible near the center (similar to looking at a circle from within the plane of the circle, but from outside it). You didn’t let the 3D projection run long enough to really see this happen.
Also, you can indeed show a 2D shadow of the 3D shadow of a 4D object. If you can show a 3D object in 2D, you can go from 4 to 3. While much fidelity will be lost, it would be interesting to see the densities near the center progressively go up (very quickly in each dimension!).
That's a nice observation. However, I think we can't just show a slice of a 4D ball in 3D because the probability that a point falls on any particular 3D slice is 0.
@@pythagoras_j4646 Not a slice, the shadow.
😍 love the content!
Excellent work.
Great presentation in the middle and end, well written, and the math is aesthetically pleasing. I think you could do more to motivate the premise, though. You reference another video, but the vast majority of people will not have seen it, and won't want to unless you give them a reason other than "my video is based on this." Kinda circular. After a few minutes, I felt a little disappointed that there wasn't even an example where this kind of algorithm would come up. It doesn't have to be a practical example; just justify why we're sampling randomly in a square rather than just normalizing n-dimensional vectors. This is a very unnatural restriction and it needs to be acknowledged.
A reference being circular means this refers to that, but that refers back to this. Him referring to another video is *not* circular. Perhaps you meant some other word. As for "the vast majority", it's fairly reasonable that most people watching SoME2 videos watched a lot of the SoME1 videos, or may desire to.
The reason the naive method involves uniform sampling in a square then rejecting anything outside the circle is that it is *incredibly* easy to program and efficient. Your standard random number generator samples uniformly on the interval 0 to 1. Two of those sample uniformly on a square. The rejection only loses 21.5% for a circle, so you don't even have to try often.
I don't get your "just normalizing n dimensional vectors" comment. The whole point is to *generate* n dimensional vectors.
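The 21.5% figure mentioned above is easy to confirm empirically; a quick check (my own numpy sketch):

```python
import numpy as np

# Empirically confirm the 2D acceptance rate:
# area(circle) / area(square) = pi/4 ~ 0.785, so ~21.5% of draws are rejected.
rng = np.random.default_rng(1)
n = 1_000_000
pts = rng.uniform(-1.0, 1.0, size=(n, 2))
accepted = np.hypot(pts[:, 0], pts[:, 1]) <= 1.0
rate = accepted.mean()
print(rate)  # ~0.785
```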
Great video.
Great video - nicely done! For the rotational invariance of the standard multivariate normal, there is also an easier way to show this: it has p.d.f. proportional to exp(-norm(x) ** 2 / 2) and by definition norm(x) is preserved under orthogonal matrices. The p.d.f. of n independent normals is also easy to derive as it must be the product exp(-x_1 ** 2 / 2) * ... * exp(-x_n ** 2 / 2), by independence. Your computational approach was more direct, but it is also interesting to see that normal variables are the most natural choice here - it's basically the distribution you would come up with if you wanted to construct a rotationally invariant higher dimensional distribution.
Great video! Makes me wanna learn Manim now ...
This video is absolutely top notch quality!!! Way to go
Great video! Keep it up
Nice job. A year ago I stumbled upon this problem by chance. For the n-dimensional ball, my method was to take a random point from the (n−1)-dimensional ball, rescale it, and then give it a random height in the new dimension. I got SOME crazy integrals for the rescaling part; I hoped to get polynomial time, and I wrote Python code to calculate these integrals iteratively. And a while ago it f**king dawned on me that the multivariate normal distribution has rotational symmetry (I knew about it, but it had just slipped my mind). I was so disappointed.
You could also pick a random point on the sphere and throw away 2 coordinates.
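This is a real (and lovely) fact: a uniform point on the sphere S^{n+1} in R^{n+2}, with two coordinates dropped, is uniform in the n-ball. A sketch, with the sphere point obtained by normalizing a Gaussian (helper name is mine):

```python
import numpy as np

def sample_ball_via_sphere(n_dim, rng):
    """Uniform point in the n-ball via a uniform point on the (n+1)-sphere."""
    g = rng.standard_normal(n_dim + 2)
    s = g / np.linalg.norm(g)      # uniform on the sphere S^{n+1} in R^{n+2}
    return s[:n_dim]               # throw away two coordinates

rng = np.random.default_rng(7)
pts = np.array([sample_ball_via_sphere(2, rng) for _ in range(10_000)])
assert np.all(np.linalg.norm(pts, axis=1) <= 1.0)
# Uniformity check in the disk: P(r <= 1/2) should be (1/2)^2 = 1/4.
print(np.mean(np.linalg.norm(pts, axis=1) <= 0.5))  # ~0.25
```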
0:58 Hm... To my understanding, a ball has an interior and a sphere is the boundary of a ball. Sometimes a ball is also called a disk. I guess you meant to use "≤"?
You're correct. I made a mistake there and meant less than or equal to.
Question. You required a multivariate distribution that was rotationally invariant, chose the multivariate normal distribution, showed that it was rotationally invariant, and used it. Nice. But ... are there any other multivariate distributions that are rotationally invariant? If so, what are they, and is selecting random points from them more or less efficient than selecting random points from the multivariate normal distribution? If not, how do you prove that the multivariate normal distribution is the only multivariate distribution that has this property?
Around 14:23, you normalize the Gaussian multivariate, and later compute a new, random length for the vector. I see how this is the computationally simplest approach, but what about re-using the multivariate vector's length for that? After all, the PDF of the length follows directly from the N(0,1) coordinates, and we know the desired PDF, so one could be massaged into the other.
Again, I see that this is more difficult in code, but it feels mathematically "purer" because it doesn't "throw away" any of the randomness it takes in.
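For reference, the direction-plus-radius method under discussion can be sketched like this (function name is mine; the radius uses inverse transform sampling, so that P(R ≤ r) = rⁿ matches how volume grows with radius):

```python
import numpy as np

def sample_ball(n_dim, rng):
    """Uniform point in the unit n-ball: Gaussian direction + transformed radius."""
    g = rng.standard_normal(n_dim)
    direction = g / np.linalg.norm(g)     # uniform on the unit sphere
    r = rng.random() ** (1.0 / n_dim)     # inverse transform: P(R <= r) = r^n
    return r * direction

rng = np.random.default_rng(3)
pts = np.array([sample_ball(3, rng) for _ in range(20_000)])
assert np.all(np.linalg.norm(pts, axis=1) <= 1.0)
# Sanity check in 3D: P(r <= 1/2) should be (1/2)^3 = 1/8.
print(np.mean(np.linalg.norm(pts, axis=1) <= 0.5))  # ~0.125
```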
couldn't you instead of separately choosing a point on the unit sphere and then separately sampling r from a suitable distribution, just sample a gaussian and then scale it suitably in a direct way? that way we have one less random number to generate.
it probably wouldn't even be that hard. the squared norm of a standard multivariate gaussian random variable is distributed according to a chi-squared distribution. so we would need to work out the distribution of the (non-squared) norm - which amounts to transforming the argument of the cumulative distribution function. as soon as we have the cdf of our point's norm, we can just evaluate the cdf at that norm and we get a uniform random variable, and we can proceed as you outlined with inverse transform sampling.
the cdf contains a gamma function though, so maybe we're better off just generating an additional uniform random number...
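The idea above does work, and for n = 3 the chi CDF even has a closed form (the Maxwell–Boltzmann CDF), so no gamma function is needed. A sketch reusing the Gaussian's own norm as the radial randomness (names are mine):

```python
import math
import numpy as np

def chi3_cdf(x):
    """CDF of the chi distribution with 3 degrees of freedom (Maxwell-Boltzmann)."""
    return math.erf(x / math.sqrt(2.0)) - x * math.sqrt(2.0 / math.pi) * math.exp(-x * x / 2.0)

def sample_ball3_reuse_norm(rng):
    """Uniform point in the 3-ball, reusing the Gaussian's norm (no extra uniform)."""
    g = rng.standard_normal(3)
    norm = np.linalg.norm(g)
    u = chi3_cdf(norm)                  # probability integral transform: u ~ U[0,1]
    return (u ** (1.0 / 3.0)) * g / norm

rng = np.random.default_rng(5)
pts = np.array([sample_ball3_reuse_norm(rng) for _ in range(20_000)])
assert np.all(np.linalg.norm(pts, axis=1) <= 1.0)
print(np.mean(np.linalg.norm(pts, axis=1) <= 0.5))  # ~0.125 if uniform
```

This works because, for a standard multivariate Gaussian, the norm and the direction are independent.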
"it probably wouldn't even be that hard."
🤣The phrase "famous last words" springs to mind.
3:42 Hubbard and Hubbard has videos???
Nah, the proof is from the book. I just animated it.
Might be worth spending a second to mention LCG PRNGs and Marsaglia's planes --- or at least how to check which PRNG your numpy library is using, and how to seed it; otherwise you are not really doing random sampling, you only _think_ you are. All the fancy algorithms in the world are not generating random samples if the PRNG is broken or incorrectly used.
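For what it's worth, checking and seeding numpy's generator is a one-liner each; modern numpy defaults to PCG64, not an LCG:

```python
import numpy as np

rng = np.random.default_rng(12345)        # explicit seed -> reproducible runs
print(type(rng.bit_generator).__name__)   # which PRNG is in use; PCG64 by default

# Same seed, same stream: a quick reproducibility check.
a = np.random.default_rng(12345).random(5)
b = np.random.default_rng(12345).random(5)
assert np.array_equal(a, b)
```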