Grant (3b1b) and Matt Parker (standup maths) recently did a video together on why max(random(), random()) has the same distribution as sqrt(random()), and honestly, I really like your video here better! Having the classic challenge of plotting a uniform distribution of points in a circle as a motivating example made the whole thing a lot more concrete for me. Not to mention that you made this 3 years ago! It also helped me intuitively understand why you need to multiply or divide by r when integrating or taking a derivative in polar coordinates. Great video! ❤
Really good video. When it started I thought the rejection sampling method was really dumb: “why would he use Cartesian?”. So I paused the video and went ahead to implement it in polar coordinates. Ran the test and most of the points were in the center “well sh*t why is it like this?”. I then decided to keep watching and I am so grateful because this is one of the best videos I’ve seen in this vein. Definitely on par with the 3b1b videos but with a special you in it
@@nubdotdev Honestly I immediately found myself saying nope it will need to be sqrt(r) but only because it hit me that the reality of what the algorithm is doing is selecting a random point on a random circle. After all r defines the radius of the random circle and theta defines a random point on said circle. Worse I realised that this problem cannot be fully solved on a machine with finite precision, the space of all possible points this method can generate on a machine with finite precision is denser towards the centre. Using the sqrt technique to compensate for this uneven density of possible points may solve the issue visually and work for most purposes but it does not change the fact that no two points at r=0.8 can be as close together as those at r=0.1 etc the possible points at r=0.8 are simply more sparse due to the finite angular resolution of theta when this number is limited by a machine with limited precision.
Doing a whole lot of math only to arrive at the conclusion that the easiest solution was already the best... If you're not even mad about that: Congratulations, you're thinking like a mathematician.
@@aescafarstad Nah. The algorithm is so simple that either you're running a small number of cases and it doesn't matter if you're unlucky, or you're running a large number of cases and the chance of getting noticeably unlucky is so low that it'll never happen in a trillion years
@@ObsessiveClarity There are case when you can't think like that, i work on real time code, you usually don't use code that can have a unpredictable time execution variability (even if it is rare) , constant time if the best way to go in that line of work
I don't know if you're being serious, or are just trying to be funny. Assuming that you're being serious: Intuition may give you the best result without any conscious effort and without you taking a lot of time. But if you wanna be sure about things, and don't wanna live in a castle with pillars of salt then _Mathematics is the way to go_ .
@@aescafarstad Readability of code is an important part of software engineering. We just saw a 18 min video that took a lot of time and effort to make. It was also aimed at an audience with a special interest in math. Documenting that type of code yourself is gonna be a pain in the ass for such a simple task. It is also possible to make an upper bound of say 100 loops. Then you get constant time without any problems before the heat death of the universe.
A couple of years ago I was trying to make an animation for my presentation. I wanted to show electrons on the surface of nanoparticles to teach how they are influenced by the electron magnetic field. I did a 3D sphere and put a bunch of random electrons. For some reason I wasn't able to understand UNTIL TODAY, they were all clustered in the center of the sphere. It was so annoying, I ended up doing something like the rejection sampling. Thank you!
Similar thing happened to me. Was making a brownian dynamics simulation and had to randomly distribute molecules on the surface of a sphere. I learned you can use a uniform distribution for phi (angle from x-axis, 0 to 2pi) but then you can't for theta, it'll be grouped at the poles. Instead, choose a uniform distribution for z. (Or equivalently, cos(theta): get a random number from -1 to 1 then do acos to get theta.) But that was for a single r value. To fill the whole sphere, since each shell has an area dependent on r², I guess you'd probably take the inverse of the integral like in the video and you'd do ³√(U) to get a random r.
@@pfeilspitze yeah in some high dimension, the sum of squares being less than 1 is very unlikely. I can think of a big sphere in a big cube all I want, it just works much differently in, say, 9D.
The raw polar coordinate generation could be incredibly useful for calculating a random bullet spread in a shooter, that cleanly increases the longer you shoot.
@@lukasf3070 no its not necessarily realistic, human hands provide better stability in a certain dimension and the gun tends to recoil upwards, so its neither a simple x-y spread nor a radius-angle spread
@@nathariel666 interesting. I bet competitive small bore shooters will know from experience about any non uniform distribution caused by other factors. Love the video :)
GPUs do have fast sin/cos, way faster than the ones on CPU, but the accuracy of sin/cos on GPU is usually way lower (like crazy lower). So you need to know exactly what are you doing to select one over the other.
They're still more accurate than things like lookup tables. And you can use either the high precision/slow algorithm or the low precision/fast hardware accellerated instruction as needed. Actually, I think their high precision algorithm may use the hardware accellerated instruction to shortcut the evaluation, using the magic of polynomial approximations of increasing order (the hardware does a low order polynomial approximation, and then you can extend it "by hand"). In any case, a 32-wide SIMD pretty much guarantees that at least one sample will be rejected on the first round, meaning you will almost always need multiple iterations. Also, any halfway decent random number generator is somewhat expensive (probably comparable to a high precision trig function.), so the retry is going to be more expensive that it appears at a glance. I'd be surprised if the rejection sampling algoritm outperformed any of the other algorithms on GPUs.
@@Keldor314 Amusingly, "halfway decent RNG" might be a stretch for most GPU rendering applications... Ex. for a shader operating once per pixel [x,y], a "random number" might just be an arbitrary hardcoded dot product ([p, q, k] .* [x, y, 1] mod m) for each "generated" random number, with different out-of-butt values for p,q,k,m each time. (And then, you might norm a uniformly random 3-vector using the fast-inverse-sqrt trick, and just accept the resulting density distortion.)
@@AySz88 You can write a perfectly good RNG for use in compute shaders. Essentially any algorithm that works on a CPU will work perfectly fine on a GPU, but there are some differences in performance and memory characteristics that should inform you what algorithm to use. The tricky thing is that GPUs run 100's of thousands of threads in parallel, and each thread is probably going to need its own RNG state. Moreover, if the RNG is in a hot code loop, its state really needs to live in registers, the cache or in local shared memory for performance reasons. Thus, you want to choose a RNG with a relatively small state footprint. Another thing to consider is that integer division is fairly slow (similar to transcendential functions), so it's best to avoid algorithms that use it, although I think it's fallen out of favor for RNGs anyway. The RNG I used for my work is a variant of a XORshift generator. For each RNG, there were 4 independent XORshift generators, which would be combined together to produce the next random number. For performance, I had the generators overlap between RNGs for different threads, so a SIMD warp of 32 threads would have each thread would update a single XORShift generator, and combine it with the states from 3 adjacent threads, which imploves the properties without needing extra state over a fairly basic RNG such as LCG. It's a fairly solid RNG, probably usable for anything other than cryptography.
Or in situations where branch divergence is an issue. Not sure about the newest GPUs, but at least in the past, just one thread in a group of threads rejection a sample would be equivalent in performance to all of them rejecting it, as they have to wait for that one thread to complete.
I haven't had any problems with rejection sampling on spheres in 3d as the equation is xx+yy+zz=rr. Haven't had the need to try in higher dimensions than 3 yet.
@@kylefillingim6258 he is probably talking about high-dimensional problems. Could occur in machine learning applications or other data analysis, you can have 1000s of dimensions. If i am not mistaken, the volume of an n-dimnesional hypersphere converges to 0 with n approaching infinity, whereas the volume of a hypercube of side length 1 is always 1.
@@kylefillingim6258 The probability to accept a sample in rejection sampling scales down exponentially with the dimension. The dimension is the number of variables, so this is a big deal for problems with hundreds or thousands of variable. Although rejection sampling becomes inefficient, it's not hard to sample points uniformly in a high dimensional sphere because other methods scale efficiently. If you take an N-component vector (x_1,...,x_N) where each component is a Gaussian random variable, then normalize the vector, it will be a uniformly random point on the N-sphere. In general sampling points of a N-dimensional volume (which is equivalent to estimating the volume of the shape) is believed not to be possible in poly(N) time without additional assumptions like convexity of the volume.
@@kylefillingim6258 It looks like for a sphere in three dimensions, you'd be rejecting 48% of your samples -- i.e. keeping a little over half -- and needing to run the algorithm on average about twice. Success rate = (volume of circle radius=1 / volume of cube 2x2x2) = (4π/3)/8 = π/6 ≈ 0.5236. And, your E(x) would be 1 / success rate = 6/π ≈ 1.909
As a mathematics and a computer engineering student, i have to say I loved this video so much, even though it has some imperfections you already mentioned in your comment. This is the stuff that makes me love my studies
18:00 "Sine and Cosine operation" Not entirely correct (although in python it probably is), this is an entirely different rabbit hole. While sin() and cos() pretty much take the same time, calculating both of them for the same angle does not have to take twice the time. GCC for example does an optimization where it detects such cases at compile time and calls a sincos() function instead, which is specifically optimized to calculate both the sine and cosine and is closer in time to just a single call of either sin() or cos(), than to calling both sin() and cos(). Another thing with compiled languages that I've not tested but which might be interesting would be to see how much these are effected by branching. Rejection sampling will inevitably contain branching. A maximum can be done as it's own instruction so that shouldn't need any and the reflecting sum could be transformed into a branchless version like 1 - abs(rand() - rand()).
@@InfoTeddy Not necessarily. I tried a few examples, a maximum like if(b > a){ a = b; } is optimized into a maximum by most compilers. However, the second case if(r > 1) { r = 2 - r; } most compilers don't detect. Even a simpler version like writing the absolute value like if(r < 0) { r = -r; } is not caught by most compilers. Clang does optimize both of the these, although the resulting assembly code is much more complicated than what you would get if you just use an abs() function.
@@wedding_photography In a floating point number the first bit is used for the sign after that comes the exponent and then a significand field (Look up Floating-point arithmetic if you want a more in depth explanation). What this means for an absolute value is that only the first bit needs to be set to zero, this is usually done by performing a bit-wise and operation.
When I needed to generate uniform random points in a circle, the method I came up with was to first generate random points in a triangle with base r and height 2*π*r. Then I applied a transformation to "roll" the triangle around itself to form the circle. Each vertical slice maps to the ring with the same length.
Interesting, can you please tell what transformation do you apply ? Also, as for the 'rolling' of the triangle, I guessed 'rotating' the triangle, then the point joining the base and height traces the circle, but that doesn't tell the importance of 2πr height, do tell if it's wrong 😅
@@kg3217 Picture a right triangle sitting on the x-axis with the right angle at (R,0), one vertex at the origin, and the third vertex at (1, 2πR). The transformation for a point (x, y) in the triangle is r = x and θ = y/x. The height is 2πR because points along the vertical edge map to the outer circumference of the disk. You could do it with a 1x1 triangle instead, in which case the transformation would be r = x*R and θ = 2π*y/x, but the way I did it the "rolling" step is an area-preserving transformation.
As for the "rolling", picture slicing the triangle into lots of vertical strips, then bend each strip around into a ring so that it connects with itself at the base. Each strip has height 2πx and distance x from the origin, so the rings together form a complete disk.
It might be a bit faster that way. I'm not sure though. You'd need to adjust the final transformation, but maybe you could combine a couple of the multiplication steps that way. For points in the upper half of the triangle, I just rotated them into the bottom half before applying the transformation rather than using rejection sampling.
I have watched almost every SoME1 video that I could find, and this is the only one that seems to actually construct an algorithm out of a mathematical abstraction, bravo for that...very relevant. I added this to the list of all SoME1 videos that I could find. @
I made a mediocre SoME1 video for your list (well, I didn't make it for your list, but, you know): ruclips.net/video/bkjpn9M7Cfg/видео.html #shamelessplug
"instead of transforming a distribution, reject any results that fall outside of the intended range." StalinSort: remove any elements that are not in sorted order. Congratulations, your list is now sorted.
As a computer graphics artist, these concepts are indeed very useful in real world implementations, scattering points in various controlled ways is a super common task. The issue with polar coordinates came intuitively to me from experience of manipulating geometry on a daily basis, but seeing more of the theory of why gave me deeper understanding past the intuition, cheers for that. Rejection sampling seemed like a costly prospect at first, but I'm always worried about performance when there's sqrt's in the algorithm, cool to see that the performance justified the brute force algorithm after all!
What a great video! Asking all the right questions at the right time. Two things come to my mind. 1) The "curse of dimensionality" [1]. When sampling points on a hypersphere in N dimensions, the rejection method becomes worse and worse, because the ratio of volumes between hypersphere and hypercube shrinks. It would be interesting to see a comparison of the different methods with the dimension as the parameter. 2) If it is the cos/sin calculation that slows the computation down, would it be possible to speed up the algorithms using a precomputed table (with linear interpolation)? [1]en.wikipedia.org/wiki/Curse_of_dimensionality
I thought about adding a segment on higher dimensions but I ran out of time for the competition. I would like to explore that in a future video. I'm willing to bet that using a trig table could really speed up the other three algorithms and I'll definitely try it out. Thanks so much for your feedback!
If you want to sample a point in a 100 or 1000 dimensional hypersphere. Generate a vector from 100 normal random variables. Normalize it to unit length. Scale it by the 100th root of a uniform random variable.
@@donaldhobson8873 No, you wouldn't get a random direction in that way even in 2D. This chooses a random direction from a Hypercube, which overly prioritises pointing to the corners. To generate the direction you will have to use the generalisation of polar/spherical coordinates and work through their respective taylor coefficients. In 2D the taylor coefficient of polar coordinates is just r, which is where the uniform distribution for r of f(r)=2r comes from. In 3D it is sin(θ)r², so you would take the first angle from 0° to 180° using the PDF f(θ)=sin(θ), then generate the second angle from 0° to 360°, then generate r like you mentioned. The exponent of r will allways be d-1 for d-Dimensional spheres, which is why your 100th root formula works for r, but the different angles escalate quickly: The first (call it zeroth) angle is choosen from 0 to 360° uniformly. All other 1 to d-2 angles are chosen between 0° and 180°. The 1st (2nd) angle is chosen from the normalised version of sin(x), which is f(x)=sin(x)/2 The 2nd (3rd) angle is chosen from the normalised version of sin²(x), which is f(x)=sin²(x)/(π/2) The 8th (9th) angle is chosen from the normalised version of sin⁸(x), which is f(x)=sin⁸(x)/(35π/128) You may notice that the sins themselves just go up in exponent. What you will find when either trying to find it's inverse PDF, or just trying to find the normalisation constant, is that this is kinda hard to integrate. In fact the general indefinite integral of sin^n(x) is -cos(x)₂F₁(1/2;(1-n)/2;3/2;cos²(x)) uppon realising which the appropriate reaction is looking up what ₂F₁ is, followed by looking for a different problem to solve. This is btw. related to why it is super hard to find the volumes of hyperspheres, except this is a lot harder than even that. The conclusion is don't try to solve this in nD, unless you are looking for a topic for your masters thesis in mathematics. Also whatever you would come up with would likely just be slower than rejection sampling, to a more and more extreme extent the higher the dimension.
@@Redjard I specified normal random variables. Ie Gaussian bell curves. I know that using N uniform random variables won't work. When you take n independent gaussians, you get a rotationally symmetric N dimensional bell curve distribution. Scale the vector to unit length and you have a random point on the surface of a sphere. Then you just need to scale it by the Nth root of a uniform random.
I haven’t watched the entire video yet, but on the explanation of the problem my immediate first thought is since this is a circle, choose a a random angle where 0
I'm so glad the algorithm handed me this video. I've long thought about a random point in a square being easy, but not easy in a circle. I even thought to myself that just picking r & theta randomly was going to not work, but I wasn't able to fully quantify what the solution would be. This video was awesome! I think one more fun way to show that the pdf distribution of sqrt(rnadom) being the same as random+random but 2-r if r>1, would be to just graph both of them with random() on the x and r on the y. And then you can see with the second one that you flip the right half back over, double the heights, and they should match!
I came across this whilst trying to figure out how to plot shapes within a circle using vanilla JS and your video gave me the answer, whilst helping me understand how to get there. Thank you
I used PDF and CDF all the time in my statistics class, and I never really found it satisfying. I like them a lot more having seen these elegant geometric representations. Thanks!
Very nice video! Just a point to add about performance: Sometimes you really want to avoid an algorithm that relies on conditionals like an if statement. An if statement in the middle of a loop could very well throw a big wrench into your performance. So with that in mind, even if you disregard the learned lessons about CDFs and PDFs, the various methods are all quite useful in their own right :)
If you're worried about speed, remember that rejection sampling completely breaks down as n=>oo, since the volume of a hypersphere eventually approaches zero as the dimension grows. So the probability of a missed sample increases until it's 1. So if I wanted to randomly sample 20 dimensional vectors, my hunch is that a CDF is a better fit, and would perform much faster than a rejection sampling approach.
After watching, I was surprised to see this was a small channel with only one video, haha. Your structure and presentation are perfect. I'd thought about this before and done the math to figure out a square root was appropriate, but I had no idea about the reflected sum and maximum methods. I'll remember those if I ever need a random function with a distribution like this. (In most programming environments, either one should be much faster than a square root. I have no idea why the reflected sum was so slow in your Python benchmarks.) Even for finding random points in a circle specifically, these methods are potentially useful in practice if results are expected in polar coordinates, or if loops with unbounded iteration counts are unavailable or very inefficient.
Thank you so much for the kind words! I definitely should have mentioned how Python may perform differently compared to other languages and that it's not uncommon to need polar coordinates. In the future, I want to make a follow-up video that will touch upon this and maybe even expand into higher dimensions.
@@nubdotdev I have just benchmarked the four functions in Rust (because it's the only "native compiled" language in which I know how to write benchmarks). I got almost the same results: rejection well ahead (8ns), then square root (18ns), then maximum (19ns) and reflecting sum (19ns) (note: the time measures are worthless except for comparing the run time of the four functions between them) . After doing more tests, I found out that the sqrt method is only slower because it does one less call to "random". Using a different random number generator than the standard one or different float size gives the same result: rejection seems to always be the fastest method.
From the title and thumbnail I went through this entire thought process of rejection sampling, to polar sampling, realising the distribution would be uneven, to thinking about how to make it even, to returning to the cartesian rejection sampling, in the space of just a few seconds. I clicked on the video hoping you were going to show me a better way. You did a great job of exploring them thoroughly but it's refreshing that the simplest seems to be the best. Nice video!
Whenever i watch these kinds of vids i think about all those people passioned about maths and other subjects that had a hard time finding a book, not even dreaming about this kind immense repository of information, incredibly fast search and access, intuitive means of visualization, computing power and programs, easy communication within a global community. Today’s context for developing minds is even more impressive than today’s context for developing athletic performance.
I've watched 3 of these videos and they are all so amazing since they take ideas I've thought about in my free time and make them more rigorous and extend them.
Great video. I have to do a lot of sampling of different distributions for different computers over time. One thing I learned is what is fast for one coding language/computer may change over time. For example, I have seen people write less precise or faster version of sqrt, cos, or getting a random number. Like if your sqrt is fast, you can compute cos theta and then use sqrt(1-cos theta^2) for the sine.
Before watching this video, I spent a minute thinking about how I would do it. The sqrt method (shown at 10:10) is what I came up with first, but then I wanted to avoid the potential slowness of sqrt() and came up with a right triangle method: # Get random X and Y coordinates within a unit square x = random() y = random() # Confine coordinates to a right triangle within the square, by transposing (swapping) them if they're in the wrong half of the square if x > y: tmp = x x = y y = tmp if y != 0: theta = x / y * 2 * pi else: theta = 0 r = y return r * cos(theta), r * sin(theta) Compared to the methods shown in the video, this has the advantage of only calling random() twice instead of three times (useful if the random function is especially slow), while not calling sqrt(). It works by picking a random coordinate in a square, dividing the square into two right triangles, and transposing the coordinates if they're not in the desired triangle. Then the X coordinate is scaled from the width of the triangle at that Y coordinate (which is equal to Y), to the correct range for theta, and Y is treated as r. The function can be streamlined to the following: theta = random() # X coordinate in a unit square r = random() # Y coordinate in a unit square # Confine coordinates to a right triangle within the square, by transposing them if they're in the wrong half of the square if theta > r: tmp = theta theta = r r = theta # Treat Y coordinate as r, and convert X into theta if r != 0: theta = theta / r * 2 * pi return r * cos(theta), r * sin(theta) Plotted out with 1,000,000 and 1,000,000,000 samples: kingbird.myphotos.cc/circle_of_uniform_randomness_-_1e6_samples.png kingbird.myphotos.cc/circle_of_uniform_randomness_-_1e9_samples.png And although mathematically less pretty, it can be streamlined further: theta = random() # X coordinate in a unit square r = random() # Y coordinate in a unit square # Confine coordinates to a right triangle within the square, by transposing them if they're in the wrong half of the square if theta > r: r = 1 - r # the phase of theta doesn't actually matter; it doesn't have to be within [0, 2*pi], as long as the difference between its min and max values (for this particular r value) is 2*pi # Treat Y coordinate as r, and convert X into theta if r != 0: theta = theta / r * 2 * pi return r * cos(theta), r * sin(theta) Edit: This is actually a little bit slower than the sqrt method (10% slower with a very fast random() function). I still find it interesting, though.
I think your post summarises very well why micro-optimisation is root ;-) of all evil. Square root is one of the fast functions on (very likely) all modern processors complicated enough to contain sine and cosine. To the extent at which, for example, informed code optimisation within the compilers does leverage square roots and multiplication to avoid low-order half-integer exponents. (As a nice exercise on algorithm development, your method of course is nice and fun. Though, there might be another potential pitfall with odd behaviour due to the 'x/y' term and thus further coupling consecutive random numbers in the process-at least with some RNGs.)
An equiareal (area-preserving) mapping from the square to the circle that involves no cutting or overlapping is the Shirley-Chiu concentric mapping. Basically, it maps every concentric square into a concentric circle. This could be useful if you need your mapping from square to circle to be continuous and bijective.
One way to improve rejection sampling is to put copies of the circle in the corners of the square. You can even recurse that approach. Of course that takes extra computation time, but might be a useful trick if you are more concerned about efficiently using your entropy vs. computation cycles
pre-video guess: for the angle make it random(0, 2*pi), and your dist is 1 - random(0, 1)^2 to account for square law edit: pre-video desmos-screwing around: sqrt(5) * (x^(-x) - 1) is like super close but doesn't look completely right imma watch the vid now lol edit 2: it was literally sqrt(x) are you kidding me
I became so excited I found one more brilliant math channel. I anticipated I spend next hour whatching other videos on this channel and boom, found this is the only one yet! So I'm looking forward for new videos. This is so exciting to join a good channel at the very beginning! After a years you can tell, that you was subscribed to this million viewers channel from the very first video like telling you saw the first rocket launch by yourself!
Really interesting! There were more possible approaches than I expected, and I like that you explained the math and theory behind each. Curious that rejection sampling worked out to be so fast, but no trig calls and fewer random calls makes sense.
Cool stuff! Another reason it's useful to investigate alternatives to rejection sampling is that often, having a predetermined number of random numbers required to generate a sample is useful, or even having continuity of the output relative to changes in the input random numbers.
still, in the general case of the target space being irregular, i.e. not a circle, sphere, square, or any simple convex shape, but a space with a complex definition, the boundary check method is still the simplest to implement, execute, and modify : you just need to define the boundaries and the inside/outside check function
Although the average time for choosing a point in the disc is minimized by rejection sampling, it is also the method whose times have the highest standard deviation. If consistent performance matters (and sometimes it does in software), then there is a case for using the polar coordinate method.
Yes instead of just providing a total runtime of generating a few million samples it would have been more interesting to measure the standard deviation of individual sample times for each algorithm. Of course that requires a decent microbenchmark analysis. So there is more "left to do".
When you first showed using rejection sampling I thought to myself "why not use polar coordinates? but I don't know how to account for the results being more dense in the center". Then when you showed the maximum method gets the same results and proved it my mind was blown. Absolutely crazy amazing stuff
The Path of Exile developers have talked about how they ended up using the square root method for selecting where to have meteors fall in a circular area.
I saw the thumbnail and before I let the video play I thought about how I'd approach it. I landed on using vectors, where you would have a random number strictly less than radius as magnitude and and a random value between 0 and 1 to represent the angle. Needless to say I was quite happy when I saw you switch to polar coordinates.
Another (perhaps more) intuitive way is to match a random roll for radius to a corresponding percentage area of the circle. For example, a lets say random() gives 0.5 -> we then map the point at a radius (R) such that a circle with that R will have half of the maximum area. The probability distribution is the same.
What came to my mind is real time systems, where all steps need to take a precise time. No random delays are allowed, so the rejection sampling is simply not an option.
@@piteoswaldo It would also probably not be good if you needed to do this a lot and thus were using a SIMD processor like a GPU to run it. To use SIMD efficiently you want each processor in the group to take the input data and perform exactly the same instructions on it in lockstep with the rest of the group. If it is even allowed that one ends up taking longer the entire rest of the group will sit idle waiting for it to complete so it can return the results of all operations in parallel. That is just simply because of how SIMD works it reads in parallel and writes in parallel so branching code that stops the intermediate processing happening in lock step is inefficient.
Matt Parker recently did a video about the sqrt(random) and max(random) methods having the same distribution and the whole time I was thinking "where have I heard this before"
I love all the new SoME1 videos that are randomly showing up in my recommended and this is one of my favorites. As a fellow programmer, I've always wondered if this problem had a more elegant solution, and now I know!
I remember trying to figure out this EXACT problem recently, and I wasn't able to find any nice StackOverflow solutions, so that was really frustrating to me as I wasn't able to figure it out by myself either! This video is satisfying to me on a whole different level because of that struggle! :)
I needed it one time and i kjnew stuff about integrals and distribution functions, so i figured it out :) The first method was so boring. :D Wouldn't figure out the third and forth method. They were nice solutions.
This was really interesting to watch. Thankfully most usecases I've found for random circle positions are for random positions on the edge of a unit circle, which can be derived by just randomly rolling a number between 0 and 2π. I was kind of expecting rejection sampling to be the fastest, even if technically its runtime is unknowable.
Perfect example how the simplest solution is simply the best. I always used rejection sampling because it was very intuitive and I'm happy that it's also the most efficient.
Really great video; loved the animations! I wasn't aware of there being so many ways to randomly generate points on a disc. It looks like you put a lot of work and preparation into producing and reporting your findings. You sounded like a well-trained scientist/mathematician. It felt like I was being guided through a series of enjoyable, thought-provoking experiments. I think I may have found a way of making your other methods more competitive with the rejection sampling. I am curious whether the other methods could have beaten rejection sampling if we had optimized or done away with the sine and cosine function calls in the random-angle part of the code since the trigonometric functions seemed to be the bottleneck in the non-rejection-sampling approaches. This question has gotten me thinking of ways to improve the random angle part. At first, I thought about utilizing the Pythagorean identity for sines and cosines: 1 = cos(theta)^2 + sin(theta)^2. That alone would cut the number of trig function calls in half by allowing us to recycle one function call: sin(theta) = ±sqrt(1 - cos(theta)^2), where the ± would be positive for theta between 0 and pi and negative between pi and 2*pi. But then I thought about the possibility of using vector math to implicitly compute cos and sin. For instance, there are relationships between two of the ways of performing vector multiplication and trig functions: cos(theta) = dot_product(u,v) and sin(theta) = cross_product(u,v), where u and v are unit vectors. (Actually, the cross product results in a vector perpendicular to u and v, but the implementation is trivial for axis-aligned vectors.) But then I finally realized a simpler method. Note that randomly generating the ordered pair (cos(theta), sin(theta)) by using a uniformly distributed theta is just one way of picking a random point on the unit circle. You can also do this by generating a point directly on the unit circle just by generating a random unit vector, avoiding extra vector math, and skipping the trigonometric function calls entirely. Then you just scale this point using r to get a random point in the disc. Let x and y be two uniformly distributed random numbers between -1 and +1. Using the Pythagorean theorem to find the length of the hypotenuse of the random right triangle formed with the origin (0,0) and the two random axis-aligned points (x,0) and (0,y), you can normalize this random vector by dividing by the magnitude, turning it into a unit vector whose coordinates lie on the unit circle. The random unit vector is given as = /sqrt(x^2 + y^2). So the return statement would become "return r * u, r * v". One caveat: There is a nonzero probability on a computer of generating the zero vector (when both x and y are 0), albeit this is astronomically small, and this will result in a zero divided by zero error (NaN). It's a bit inelegant, but you could add a branch in your code checking for whether (x,y) = (0,0), and if so, return (0,0). Other solutions not requiring branching are possible too. You could instead divide the random vector by max(epsilon, sqrt(x^2 + y^2)) or divide by (epsilon + sqrt(x^2 + y^2)), where epsilon is some very small positive value. Interesting aside: The max function can be implemented without a branch (if statement) if you exploit the absolute-value function: max(a,b) = (a+b)/2 + abs(b-a)/2. Similarly, min(a,b) = (a+b)/2 - abs(b-a)/2. The abs() function doesn't require a branch for floating-point numbers since simply setting the sign bit of a floating-point number to zero makes it positive. Branches are important to avoid because they impede instruction-level parallelism and don't play well with SIMD/GPU architectures. Alas, I'm not familiar with Python's implementation, so you should always use a timer to determine what's most efficient.
Your methods will scale very differently when you start to scale them to higher dimensions, starting a sphere which only fills 52% of a cube, so the cost of rejection goes up to 1.9. And when you get away from geometrical realities and start looking at problems with much more dimensions, rejection becomes very rapidly very exensive. For a 10-sphere (the equivalent of a sphere in a mathematical system with 10 independant variables) you are already rejecting 99.75%, so only every 400th sample is usefull. The other methods get more complex as well, of course, but generally not as badly as the rejection approach.
There are analogous results for picking points in n-balls and on their surfaces (i.e. on (n - 1)-spheres). They all revolve around finding a suitable scaling for the volume or surface element (basically, the Jacobian). They fall under the general rubric of disc and sphere point-picking (q.v.). For the rejection sampling technique (a class of Monte Carlo method), there is also a related question about the probability of the sum of n uniform variates on [0, 1] being
The primary reason is that computing sin/cos to full double precision accuracy over full range is really expensive. In fact most modern processors do not implement it anymore, and it is done in software using CORDIC algorithm, with many steps. Even if implemented in microcode (i.e. fsincos function of x87), it takes between 190 and 240 cycles. If you know the input is in small range, and you are willing to sacrafice accuracy (i.e. float, not double, and bigger errors in result than 1 ULP), then vectorized approximation or GPU (which most of them do not have guarantees about sin/cos - but they are fast, sometimes just 4-5 cycles), can do it faster. The branching is not a big issue, on CPUs, because of the time this backward branch is not taken, and CPU can speculate without knowing the outcome of the branch. On GPU and CPU probably quality and speed of your random number generator could play a big role too.
Truly a very interesting problem. It also nicely connects to multi-variable calculus. Specifically the Jacobean determinant which can be used to verify that the function has the desired area-preserving property.
Thanks. I won't go into details, but this has given me an insight into a completely different problem: creating a random playlist from a list of audio tracks (or shuffling a pack of 52 cards).
Me just putting points in a square randomly and checking the distance from the point to the center xD and if that is larger than the radius of the spehre I move them
Great job! Very interesting video. I love that you actually coded each solution up, to make it concrete. I just wanted to point out that even though rejection sampling is faster on average, it is less deterministic, so for an application where maximum latency is important, it might be worth going with one of the other solutions.
I had to scroll down way too far for this 😅 It's not just less deterministic, in theory it's not deterministic at all. In theory it's possible this algorithm will never finish. But also in many use cases users will accept a longer response as long as it's consistent. It always depends on your use case, there is no single true or even better answer 👌
I dabble in programming sometimes due to my studies, and I am glad to see that rejection sampling, which I immediately thought of when finding a random point on a sphere was mentioned, turned out to in fact be the fastest. I completely would've fallen into the trap of not realizing that I need to take the square root of r in polar coordinates, though. Stresses the importance of checking that the code really does give the results it was supposed to especially when dealing with probabilities and randomness. I wonder if there are games out there with an incorrect implementation of the probability due to someone doing that mistake with r.
Man I really loved it. I had come across this problem once, but left it at square root method, considering it to be done and dusted. I am amazed at the elegance of the other methods you showed out here. Thanks a lot...
I’m not really good at math or comp sci but I thought of thing where you take concentric circles then unwrap them and it makes a triangle with the height of r and base of 2pir. I sort of aligned the height along the y axis. So you get a random x and y coordinate between those bounds. You’d still have to flip half of it along the diagonal or rotate it some how. Seems like it’d be fast, but I’m not sure if that’s basically one of the ones you did or how to implement it.
Yup, that is another option! Interestingly, although the logic isn't the same as the "lots of tiny triangles" logic, the resulting algorithm is essentially the same. There are lots of ways to flip things along the diagonal or rotate them or whatnot to produce a distribution within a triangle; some may be faster than others but they're all pretty similar.
My favorite thing about this solution (maximum-comparison) is that it makes a function of points in a cube onto points in a circle, which isn't something I would normally think of as possible! This was a great prompt and a fantastic working out.
@@davidjames1684 depends a lot on how fast the thing you're trying to precompute is since randomly accessing memory is normally quite slow due to cache misses. in this case I think doing the rejection sampling again every time might actually be faster if the rng isn't slow
It's safe to assume that std functions like that are really well optimized. That said, they have a very specific purpose. Depending on you application you may be able to get away with using an approximation (such as a Taylor series expansion).
@@pvic6959 Well, floating point numbers are weird. They can have values such as infinity, negative infinity and a load of Not a Number values. My guess is it is slower in order to deal with such edge cases.
@@nubdotdev you're really good at it! It looks like you don't use the default fonts or text reveals. Is your project saved on GitHub? I'd love to see it :)
Very well presented! For why rejection sampling is faster, consider the time for and number of function calls. Per point returned, rejection sampling needs on average 4/π calls to sqrt() and 4/π to rnd(), and the other methods all need 2 trig. calls and 2 rnd(). Since sqrt() and trig. functions have roughly similar execution time on FPU, that means rejection sampling needs a time of t_point = 4/π × (t_FPU + t_rnd) and the others t_point = 2 × (t_FPU + t_rnd), so the expected ratio of execution times is 2/π ≈ 0.64. For total runtime, add a certain base time for loop overhead, equal across all methods. A small-ish t_overhead < t_point then explains your observed execution times.
This is a nice video to understand the Box-Muller transform - which needs as a starting point a uniformly distributed point in a unit circle and then generates 2 normally distributed random numbers from that.
It was such a relief to see you finally get around to mentioning execution time... I spent the whole video looking at all those trig functions and square roots internally screaming about how slow those methods were :)
Hmm, I did exactly this (the first 100% success method) about a year ago, but I wasn't able to explain it to anyone with my explaining skills. xD (and it also took me about 6 hours to make it)
BEFORE YOU COMMENT: If you're about to ask why you can't just pick a random radius r and angle theta between 0 and 2pi, please watch the video. I address this point at 4:18. This video has gotten so much more popular than I thought it ever would! This has been really inspiring and I will definitely continue to make content. Thank you all so much for your questions, comments, and criticisms. I'll respond to some of them here: 1. Some of you have suggested a way to pick 2 cartesian coordinates without rejection by setting x to a random value on the interval [-1, 1] and y to a random value on the interval [-sqrt(1-x^2), sqrt(1-x^2)]. While this does select a random point in the disk, it is not uniform. Points will be much more densely packed on the left and right edges. 2. My final conclusion about which algorithms were the fastest was kind of hasty. Firstly, polar coordinates are preferred over cartesian, which would make rejection sampling far slower than the other algorithms. Secondly, the three algorithms that do use trig functions could be greatly optimized with the use of precomputed trig tables and linear interpolation. Finally, I probably should have used a better language for speed than Python when testing everything.
The thing you REALLY need to worry about is (quickly) getting truly random numbers -- good luck going down that rabbit hole!!! :-) [ oh, and bring your checkbook... ]
You have a real knack for concisely and accurately explaining complex concepts in easy to understand language, and your graphics are excellent! You should make more videos like this one!
@@davidjames1684 Hahaha. Oh wait... I should explain: Truly random numbers (with the emphasis on TRULY random) are surprisingly hard to generate with a deterministic machine such as a computer. That is indeed a rabbit hole to go down. The checkbook is probably for getting an extension card with a radioactive sample as a non-deterministic data source. That or for 1 HD-camera and 256 lava lamps.
@@TheAgamemnon911 Not really. Are you forgetting that computers have excellent memory capabilities, therefore, a bunch of people could flip coins a bunch of times, record the results in the computer, and use that as the random number generator (starting at some random point in the recorded sequence)? Other methods have been shown to be quite random as well. I have done many Monte Carlo type simulation programs involving the use of pseudo random numbers, and have always gotten very close to the mathematically correct answer, therefore confirming the PRNG is quite good, basically good enough.
Very nice video man, and clearly explained. I have a maths background but have been learning Python, so the Python implementations were a bonus for me. Keep up the good work - subscribed!
Your voice is sharp and clear, not too fast, not too slow, not slurred and not intimidating. You have a dynamic way of speaking, which is very engaging. Basically I'm saying you deserve the attention because this is a high-quality video. I love that 3b1b's project is inspiring content like this.
Extremely good video! I loved pausing and trying to figure out why certain things were true and how I personally would have tried going about it. The feeling of having everything click is great. I also love how in the end rejection sampling ended up being the fastest method lol
I never really did math competitions, and now as a computer science student I wish I was better at math since it is so important especially when it comes to algorithms (so many competitive programming problems are just math now), and the logical mindset that math encourages is extremely valuable. It's really daunting imagining how much I don't yet know and need to learn, but the beautiful solutions presented in these types of videos inspire me to keep pushing forwards (:
My guess what this video is going to show: The simple method you should use: random point in square, reject if x²+y²>1. But what's probably gonna be mentioned: theta = random; r = sqrt(random). However, in practice, the first method will be more resilient to numerical precision issues, and runs faster because you generally don't need to do many passes. The main value of alternative methods is being able to implement it branch free/constant time, which can be beneficial for security applications.
The issue with square root method, is not the square. It is the cos and sin. Modern hardware does not have functions to compute them as single instruction, because it would be still approximated using series expansion and looping and take time. Square root is implemented similar way, but it is way faster to do (using initial approximation using tables, then doing one or two iterations of rapidly converging method, like Newton-Raphson method), no matter the size of the input value, but with trig functions it is harder. So even if you compute sin and cos at the same way (they do have same series coefficients and values, just with different sign), it will be pretty slow, if you want to compute to high accuracy and for any input value. However, if you know that your input is 0 to 2pi, and you are find with reduced precision, there are tricks to compute sin and cos faster. Usually Taylor series, even with range reduction first, is not used for this, because these series converge slowly, especially far from 0. Usually a CORDIC algorithm is used with precomputed rotations, or few terms using Chebyshev interpolation. In some cases, where the input is small and the accuracy does not matter (games, UI toolkits), a lookup table with simple interpolation is used, or very crude approximation using just 2 terms). For sin and cos on GPUs is very fast, but has reduced accuracy, both of input and output result. I tried, the sqrt + sincos method, in C, and compiler flags to emit x87 fsincos instruction to hardware directly (-Ofast -mfpmath=387), and computer 100 million random points (and make sure it is not optimized away). sqrt+sincos method is about 3.55 seconds. Using rejection sampling, with x87 gives me 2.20 seconds, but if using sse2 (standard for a decade), it is 1.92s. The primary reason is the vectorization, and branch prediction, that see that the loop is taking usually 1 iteration, as we almost always land inside the disk. I used drand48 for random numbers, which is not the fastest method, but both approaches uses 2 calls to them, so i decided to consider that aspect not important. There is however another aspect to consider. Even if the sqrt, sin and cos were ultra fast, the resulting points will in fact NOT be uniformly distributed over the disk. This is due to the rounding errors, and non-linearities in all these functions. The effects would be very small, but would be there. The rejection sampling method does not have this issue (one could argue that there might minor issue extremely close to the edge, but it would be hyper small).
This should technically be titled the sexy title of: "The time optimal way to generate a random point uniformly in a circle". Finding a point is a geometric search problem. Fun video. Well produced, good microphone, and made me realize I should brush up on my statistics!
Great video, you really dove deep on the details! I've used the first method in game programming. With uniform random number generator from a simple PRNG, rejecting any outside of the circle. The additional twist is I only used X (or Y) and allowed me to generate numbers with a Normal distribution. Very handy for role-playing games, enemy AI, and rogue-like map generation. And can be done with just integer math and relatively cheaply even on the old 16-bit system I was programming on.
The problem of unevenly distributed samples using polar coordinates at 4:32 comes up in the physics behind CT scans in medical imaging. Without getting too technical, a common workaround is to weight points further away from the center more to even out the distribution, or to simply interpolate between samples. Edit: now that I think about it, disks also have this problem (CDs, HDDs, and even vinyls). Either the disk rotation needs to be able to change speeds for data to be stored evenly spaced across different radii, or the data spacing needs to be different across different radii for the rotation speed to be constant.
If we assume, that factory recorders scratch the image discs with constant precision, vinyls are the only music format with the decreacing sound quality (more and more wavepeaks per cm of track).
for some reason having a infinite while loop that breaks when a random chance hits always gives me stress that it might hang even tho mathematically it's very low chance
Please make more like this. If you become 3b1b for programming you will be my new favorite youtuber. This was wondering and engaging to watch. Just subbed!
Love and respect the math, but feel like sometimes programming for performance is about cheating the calculation of functions. I might be curious how well something like the fast sqrt constant might work to optimize the performance of operations
I think the square root calculation is based on Newtons method, and sine amd cosine on taylor series. A computer is fast, but these operations naturally slow things down. Multiple multiplications, additions and even taking derivatives in the case of the sq root are needed.
Can you explain this in shapes and colors?
yellow triangle
[Green triangle] [orange star] fractal point [circle sector] circle
Grant (3b1b) and Matt Parker (standup maths) recently did a video together on why max(random(), random()) has the same distribution as sqrt(random()), and honestly, I really like your video here better! Having the classic challenge of plotting a uniform distribution of points in a circle as a motivating example made the whole thing a lot more concrete for me. Not to mention that you made this 3 years ago!
It also helped me intuitively understand why you need to multiply or divide by r when integrating or taking a derivative in polar coordinates.
Great video! ❤
Really good video. When it started I thought the rejection sampling method was really dumb: “why would he use Cartesian?”. So I paused the video and went ahead to implement it in polar coordinates. Ran the test and most of the points were in the center “well sh*t why is it like this?”. I then decided to keep watching and I am so grateful because this is one of the best videos I’ve seen in this vein. Definitely on par with the 3b1b videos but with a special you in it
Thanks, that means a lot! I went through the same experience when I was first exploring the problem.
Rare praise indeed. Justified too. Old UK duffer here :)
Thanks for censoring shit
Exactly my thoughts too! Great video indeed.
@@nubdotdev Honestly I immediately found myself saying nope it will need to be sqrt(r) but only because it hit me that the reality of what the algorithm is doing is selecting a random point on a random circle. After all r defines the radius of the random circle and theta defines a random point on said circle. Worse I realised that this problem cannot be fully solved on a machine with finite precision, the space of all possible points this method can generate on a machine with finite precision is denser towards the centre. Using the sqrt technique to compensate for this uneven density of possible points may solve the issue visually and work for most purposes but it does not change the fact that no two points at r=0.8 can be as close together as those at r=0.1 etc the possible points at r=0.8 are simply more sparse due to the finite angular resolution of theta when this number is limited by a machine with limited precision.
Doing a whole lot of math only to arrive at the conclusion that the easiest solution was already the best... If you're not even mad about that: Congratulations, you're thinking like a mathematician.
And if you're thinking like a software engineer you'd often prefer a constant_time algorithm to a faster_on_average one.
@@aescafarstad Nah. The algorithm is so simple that either you're running a small number of cases and it doesn't matter if you're unlucky, or you're running a large number of cases and the chance of getting noticeably unlucky is so low that it'll never happen in a trillion years
@@ObsessiveClarity There are case when you can't think like that, i work on real time code, you usually don't use code that can have a unpredictable time execution variability (even if it is rare) , constant time if the best way to go in that line of work
I don't know if you're being serious, or are just trying to be funny.
Assuming that you're being serious:
Intuition may give you the best result without any conscious effort and without you taking a lot of time. But if you wanna be sure about things, and don't wanna live in a castle with pillars of salt then _Mathematics is the way to go_ .
@@aescafarstad Readability of code is an important part of software engineering. We just saw a 18 min video that took a lot of time and effort to make. It was also aimed at an audience with a special interest in math. Documenting that type of code yourself is gonna be a pain in the ass for such a simple task.
It is also possible to make an upper bound of say 100 loops. Then you get constant time without any problems before the heat death of the universe.
Bro, ol mate just randomly decided to drop a professional level video for no reason. Legend
Lol yeah. I looked too
And everything about these solutions and the fact that he has made this video is wholly imbecilic
For 5000$, actually
@@ivanjermakov huh
@@official-obama This was a submission to 3blue1brown's Summer of Math Exposition.
A couple of years ago I was trying to make an animation for my presentation. I wanted to show electrons on the surface of nanoparticles to teach how they are influenced by the electron magnetic field. I did a 3D sphere and put a bunch of random electrons. For some reason I wasn't able to understand UNTIL TODAY, they were all clustered in the center of the sphere. It was so annoying, I ended up doing something like the rejection sampling. Thank you!
Note that rejection sampling gets worse and worse as you increase the dimension, which is also an interesting math factoid to work through.
Similar thing happened to me. Was making a brownian dynamics simulation and had to randomly distribute molecules on the surface of a sphere. I learned you can use a uniform distribution for phi (angle from x-axis, 0 to 2pi) but then you can't for theta, it'll be grouped at the poles. Instead, choose a uniform distribution for z. (Or equivalently, cos(theta): get a random number from -1 to 1 then do acos to get theta.)
But that was for a single r value. To fill the whole sphere, since each shell has an area dependent on r², I guess you'd probably take the inverse of the integral like in the video and you'd do ³√(U) to get a random r.
@@pfeilspitze yeah in some high dimension, the sum of squares being less than 1 is very unlikely. I can think of a big sphere in a big cube all I want, it just works much differently in, say, 9D.
The raw polar coordinate generation could be incredibly useful for calculating a random bullet spread in a shooter, that cleanly increases the longer you shoot.
For the shooter case, I would use two gauss-distributed random values for the x and y coordinate.
I assume that to be the most realistic approach.
@@lukasf3070 no its not necessarily realistic, human hands provide better stability in a certain dimension and the gun tends to recoil upwards, so its neither a simple x-y spread nor a radius-angle spread
@@nathariel666 interesting. I bet competitive small bore shooters will know from experience about any non uniform distribution caused by other factors. Love the video :)
@@nathariel666 But probably described rather well by using two gauss-functions with different variances in x & y.
@@ShieTar_ if by "rather well" you mean not even approximately good then yes
The other methods are still useful in SIMD architectures aka (GPUs), where arbitrary length loops cause significant performance issues.
GPUs do have fast sin/cos, way faster than the ones on CPU, but the accuracy of sin/cos on GPU is usually way lower (like crazy lower). So you need to know exactly what are you doing to select one over the other.
They're still more accurate than things like lookup tables. And you can use either the high precision/slow algorithm or the low precision/fast hardware accellerated instruction as needed. Actually, I think their high precision algorithm may use the hardware accellerated instruction to shortcut the evaluation, using the magic of polynomial approximations of increasing order (the hardware does a low order polynomial approximation, and then you can extend it "by hand").
In any case, a 32-wide SIMD pretty much guarantees that at least one sample will be rejected on the first round, meaning you will almost always need multiple iterations. Also, any halfway decent random number generator is somewhat expensive (probably comparable to a high precision trig function.), so the retry is going to be more expensive that it appears at a glance.
I'd be surprised if the rejection sampling algoritm outperformed any of the other algorithms on GPUs.
@@Keldor314 Amusingly, "halfway decent RNG" might be a stretch for most GPU rendering applications... Ex. for a shader operating once per pixel [x,y], a "random number" might just be an arbitrary hardcoded dot product ([p, q, k] .* [x, y, 1] mod m) for each "generated" random number, with different out-of-butt values for p,q,k,m each time. (And then, you might norm a uniformly random 3-vector using the fast-inverse-sqrt trick, and just accept the resulting density distortion.)
@@AySz88 You can write a perfectly good RNG for use in compute shaders. Essentially any algorithm that works on a CPU will work perfectly fine on a GPU, but there are some differences in performance and memory characteristics that should inform you what algorithm to use.
The tricky thing is that GPUs run 100's of thousands of threads in parallel, and each thread is probably going to need its own RNG state. Moreover, if the RNG is in a hot code loop, its state really needs to live in registers, the cache or in local shared memory for performance reasons. Thus, you want to choose a RNG with a relatively small state footprint. Another thing to consider is that integer division is fairly slow (similar to transcendential functions), so it's best to avoid algorithms that use it, although I think it's fallen out of favor for RNGs anyway.
The RNG I used for my work is a variant of a XORshift generator. For each RNG, there were 4 independent XORshift generators, which would be combined together to produce the next random number. For performance, I had the generators overlap between RNGs for different threads, so a SIMD warp of 32 threads would have each thread would update a single XORShift generator, and combine it with the states from 3 adjacent threads, which imploves the properties without needing extra state over a fairly basic RNG such as LCG. It's a fairly solid RNG, probably usable for anything other than cryptography.
What about GPU's made for precision?
Worth mentioning that rejection sampling simply isn't practical in higher dimensions. The odds work against you and you reject most samples.
Or in situations where branch divergence is an issue. Not sure about the newest GPUs, but at least in the past, just one thread in a group of threads rejection a sample would be equivalent in performance to all of them rejecting it, as they have to wait for that one thread to complete.
I haven't had any problems with rejection sampling on spheres in 3d as the equation is xx+yy+zz=rr. Haven't had the need to try in higher dimensions than 3 yet.
@@kylefillingim6258 he is probably talking about high-dimensional problems. Could occur in machine learning applications or other data analysis, you can have 1000s of dimensions. If i am not mistaken, the volume of an n-dimnesional hypersphere converges to 0 with n approaching infinity, whereas the volume of a hypercube of side length 1 is always 1.
@@kylefillingim6258 The probability to accept a sample in rejection sampling scales down exponentially with the dimension. The dimension is the number of variables, so this is a big deal for problems with hundreds or thousands of variable. Although rejection sampling becomes inefficient, it's not hard to sample points uniformly in a high dimensional sphere because other methods scale efficiently. If you take an N-component vector (x_1,...,x_N) where each component is a Gaussian random variable, then normalize the vector, it will be a uniformly random point on the N-sphere. In general sampling points of a N-dimensional volume (which is equivalent to estimating the volume of the shape) is believed not to be possible in poly(N) time without additional assumptions like convexity of the volume.
@@kylefillingim6258 It looks like for a sphere in three dimensions, you'd be rejecting 48% of your samples -- i.e. keeping a little over half -- and needing to run the algorithm on average about twice.
Success rate = (volume of circle radius=1 / volume of cube 2x2x2) = (4π/3)/8 = π/6 ≈ 0.5236.
And, your E(x) would be 1 / success rate = 6/π ≈ 1.909
As a mathematics and a computer engineering student, i have to say I loved this video so much, even though it has some imperfections you already mentioned in your comment. This is the stuff that makes me love my studies
I could not agree more.
18:00 "Sine and Cosine operation"
Not entirely correct (although in python it probably is), this is an entirely different rabbit hole.
While sin() and cos() pretty much take the same time, calculating both of them for the same angle does not have to take twice the time.
GCC for example does an optimization where it detects such cases at compile time and calls a sincos() function instead, which is specifically optimized to calculate both the sine and cosine and is closer in time to just a single call of either sin() or cos(), than to calling both sin() and cos().
Another thing with compiled languages that I've not tested but which might be interesting would be to see how much these are effected by branching.
Rejection sampling will inevitably contain branching. A maximum can be done as it's own instruction so that shouldn't need any and the reflecting sum could be transformed into a branchless version like 1 - abs(rand() - rand()).
Wouldn't the compiler optimize it to a branchless version if possible?
@@InfoTeddy Not necessarily. I tried a few examples, a maximum like if(b > a){ a = b; } is optimized into a maximum by most compilers.
However, the second case if(r > 1) { r = 2 - r; } most compilers don't detect. Even a simpler version like writing the absolute value like if(r < 0) { r = -r; } is not caught by most compilers. Clang does optimize both of the these, although the resulting assembly code is much more complicated than what you would get if you just use an abs() function.
Would a JIT, like Numba, give any advantage to a particular method?
but abs() also requires branching
@@wedding_photography In a floating point number the first bit is used for the sign after that comes the exponent and then a significand field (Look up Floating-point arithmetic if you want a more in depth explanation). What this means for an absolute value is that only the first bit needs to be set to zero, this is usually done by performing a bit-wise and operation.
When I needed to generate uniform random points in a circle, the method I came up with was to first generate random points in a triangle with base r and height 2*π*r. Then I applied a transformation to "roll" the triangle around itself to form the circle. Each vertical slice maps to the ring with the same length.
Interesting, can you please tell what transformation do you apply ?
Also, as for the 'rolling' of the triangle, I guessed 'rotating' the triangle, then the point joining the base and height traces the circle, but that doesn't tell the importance of 2πr height, do tell if it's wrong 😅
@@kg3217 Picture a right triangle sitting on the x-axis with the right angle at (R,0), one vertex at the origin, and the third vertex at (1, 2πR). The transformation for a point (x, y) in the triangle is r = x and θ = y/x. The height is 2πR because points along the vertical edge map to the outer circumference of the disk. You could do it with a 1x1 triangle instead, in which case the transformation would be r = x*R and θ = 2π*y/x, but the way I did it the "rolling" step is an area-preserving transformation.
As for the "rolling", picture slicing the triangle into lots of vertical strips, then bend each strip around into a ring so that it connects with itself at the base. Each strip has height 2πx and distance x from the origin, so the rings together form a complete disk.
Could you use barymetric coordinates for this? i.e. make the triangle vertices 0,0 0,1 and 1,0 with the test for containing the value being x+y
It might be a bit faster that way. I'm not sure though. You'd need to adjust the final transformation, but maybe you could combine a couple of the multiplication steps that way. For points in the upper half of the triangle, I just rotated them into the bottom half before applying the transformation rather than using rejection sampling.
I have watched almost every SoME1 video that I could find, and this is the only one that seems to actually construct an algorithm out of a mathematical abstraction, bravo for that...very relevant. I added this to the list of all SoME1 videos that I could find. @
ruclips.net/video/MsNQtj3zVs8/видео.html
I made a mediocre SoME1 video for your list (well, I didn't make it for your list, but, you know): ruclips.net/video/bkjpn9M7Cfg/видео.html #shamelessplug
"instead of transforming a distribution, reject any results that fall outside of the intended range."
StalinSort: remove any elements that are not in sorted order. Congratulations, your list is now sorted.
As a computer graphics artist, these concepts are indeed very useful in real world implementations, scattering points in various controlled ways is a super common task. The issue with polar coordinates came intuitively to me from experience of manipulating geometry on a daily basis, but seeing more of the theory of why gave me deeper understanding past the intuition, cheers for that. Rejection sampling seemed like a costly prospect at first, but I'm always worried about performance when there's sqrt's in the algorithm, cool to see that the performance justified the brute force algorithm after all!
This was fascinating, especially as someone who's often felt that a random sampler wasn't doing what I expected it to!
3b1b has done a wonderful thing with that challenge
What a great video! Asking all the right questions at the right time.
Two things come to my mind.
1) The "curse of dimensionality" [1]. When sampling points on a hypersphere in N dimensions, the rejection method becomes worse and worse, because the ratio of volumes between hypersphere and hypercube shrinks. It would be interesting to see a comparison of the different methods with the dimension as the parameter.
2) If it is the cos/sin calculation that slows the computation down, would it be possible to speed up the algorithms using a precomputed table (with linear interpolation)?
[1]en.wikipedia.org/wiki/Curse_of_dimensionality
I thought about adding a segment on higher dimensions but I ran out of time for the competition. I would like to explore that in a future video.
I'm willing to bet that using a trig table could really speed up the other three algorithms and I'll definitely try it out.
Thanks so much for your feedback!
@@nubdotdev looking forward to it :)
If you want to sample a point in a 100 or 1000 dimensional hypersphere. Generate a vector from 100 normal random variables. Normalize it to unit length. Scale it by the 100th root of a uniform random variable.
@@donaldhobson8873 No, you wouldn't get a random direction in that way even in 2D. This chooses a random direction from a Hypercube, which overly prioritises pointing to the corners. To generate the direction you will have to use the generalisation of polar/spherical coordinates and work through their respective taylor coefficients.
In 2D the taylor coefficient of polar coordinates is just r, which is where the uniform distribution for r of f(r)=2r comes from. In 3D it is sin(θ)r², so you would take the first angle from 0° to 180° using the PDF f(θ)=sin(θ), then generate the second angle from 0° to 360°, then generate r like you mentioned.
The exponent of r will allways be d-1 for d-Dimensional spheres, which is why your 100th root formula works for r, but the different angles escalate quickly:
The first (call it zeroth) angle is choosen from 0 to 360° uniformly.
All other 1 to d-2 angles are chosen between 0° and 180°.
The 1st (2nd) angle is chosen from the normalised version of sin(x), which is f(x)=sin(x)/2
The 2nd (3rd) angle is chosen from the normalised version of sin²(x), which is f(x)=sin²(x)/(π/2)
The 8th (9th) angle is chosen from the normalised version of sin⁸(x), which is f(x)=sin⁸(x)/(35π/128)
You may notice that the sins themselves just go up in exponent. What you will find when either trying to find it's inverse PDF, or just trying to find the normalisation constant, is that this is kinda hard to integrate. In fact the general indefinite integral of sin^n(x) is -cos(x)₂F₁(1/2;(1-n)/2;3/2;cos²(x)) uppon realising which the appropriate reaction is looking up what ₂F₁ is, followed by looking for a different problem to solve.
This is btw. related to why it is super hard to find the volumes of hyperspheres, except this is a lot harder than even that.
The conclusion is don't try to solve this in nD, unless you are looking for a topic for your masters thesis in mathematics.
Also whatever you would come up with would likely just be slower than rejection sampling, to a more and more extreme extent the higher the dimension.
@@Redjard I specified normal random variables. Ie Gaussian bell curves.
I know that using N uniform random variables won't work. When you take n independent gaussians, you get a rotationally symmetric N dimensional bell curve distribution. Scale the vector to unit length and you have a random point on the surface of a sphere. Then you just need to scale it by the Nth root of a uniform random.
Damn, this was actually pretty useful.
You might have fallen down a rabbit hole, but you also created a path out for those who follow behind you!
I haven’t watched the entire video yet, but on the explanation of the problem my immediate first thought is since this is a circle, choose a a random angle where 0
I'm so glad the algorithm handed me this video. I've long thought about a random point in a square being easy, but not easy in a circle. I even thought to myself that just picking r & theta randomly was going to not work, but I wasn't able to fully quantify what the solution would be. This video was awesome! I think one more fun way to show that the pdf distribution of sqrt(rnadom) being the same as random+random but 2-r if r>1, would be to just graph both of them with random() on the x and r on the y. And then you can see with the second one that you flip the right half back over, double the heights, and they should match!
sorry about spacing, youtube comments are hardcore buggy
I came across this whilst trying to figure out how to plot shapes within a circle using vanilla JS and your video gave me the answer, whilst helping me understand how to get there. Thank you
I used PDF and CDF all the time in my statistics class, and I never really found it satisfying. I like them a lot more having seen these elegant geometric representations. Thanks!
Very nice video! Just a point to add about performance: Sometimes you really want to avoid an algorithm that relies on conditionals like an if statement. An if statement in the middle of a loop could very well throw a big wrench into your performance. So with that in mind, even if you disregard the learned lessons about CDFs and PDFs, the various methods are all quite useful in their own right :)
If you're worried about speed, remember that rejection sampling completely breaks down as n=>oo, since the volume of a hypersphere eventually approaches zero as the dimension grows. So the probability of a missed sample increases until it's 1. So if I wanted to randomly sample 20 dimensional vectors, my hunch is that a CDF is a better fit, and would perform much faster than a rejection sampling approach.
I'll keep this in mind next time I have to randomly sample from an infinite dimesional hypersphere
@@pehdfms8621turns out there is no such thing as a uniform measure on a infinite dimensional ball :(
After watching, I was surprised to see this was a small channel with only one video, haha. Your structure and presentation are perfect.
I'd thought about this before and done the math to figure out a square root was appropriate, but I had no idea about the reflected sum and maximum methods. I'll remember those if I ever need a random function with a distribution like this. (In most programming environments, either one should be much faster than a square root. I have no idea why the reflected sum was so slow in your Python benchmarks.) Even for finding random points in a circle specifically, these methods are potentially useful in practice if results are expected in polar coordinates, or if loops with unbounded iteration counts are unavailable or very inefficient.
Thank you so much for the kind words! I definitely should have mentioned how Python may perform differently compared to other languages and that it's not uncommon to need polar coordinates. In the future, I want to make a follow-up video that will touch upon this and maybe even expand into higher dimensions.
@@nubdotdev I have just benchmarked the four functions in Rust (because it's the only "native compiled" language in which I know how to write benchmarks).
I got almost the same results: rejection well ahead (8ns), then square root (18ns), then maximum (19ns) and reflecting sum (19ns) (note: the time measures are worthless except for comparing the run time of the four functions between them) . After doing more tests, I found out that the sqrt method is only slower because it does one less call to "random". Using a different random number generator than the standard one or different float size gives the same result: rejection seems to always be the fastest method.
From the title and thumbnail I went through this entire thought process of rejection sampling, to polar sampling, realising the distribution would be uneven, to thinking about how to make it even, to returning to the cartesian rejection sampling, in the space of just a few seconds. I clicked on the video hoping you were going to show me a better way. You did a great job of exploring them thoroughly but it's refreshing that the simplest seems to be the best. Nice video!
It's refreshing to see that I'm not the only one who does this
Whenever i watch these kinds of vids i think about all those people passioned about maths and other subjects that had a hard time finding a book, not even dreaming about this kind immense repository of information, incredibly fast search and access, intuitive means of visualization, computing power and programs, easy communication within a global community. Today’s context for developing minds is even more impressive than today’s context for developing athletic performance.
I've watched 3 of these videos and they are all so amazing since they take ideas I've thought about in my free time and make them more rigorous and extend them.
So far I've learned about
- Bernstein polynomials
- Totient sum
- Irwin-Hall Distribution
Great video. I have to do a lot of sampling of different distributions for different computers over time. One thing I learned is what is fast for one coding language/computer may change over time. For example, I have seen people write less precise or faster version of sqrt, cos, or getting a random number. Like if your sqrt is fast, you can compute cos theta and then use sqrt(1-cos theta^2) for the sine.
Before watching this video, I spent a minute thinking about how I would do it. The sqrt method (shown at 10:10) is what I came up with first, but then I wanted to avoid the potential slowness of sqrt() and came up with a right triangle method:
# Get random X and Y coordinates within a unit square
x = random()
y = random()
# Confine coordinates to a right triangle within the square, by transposing (swapping) them if they're in the wrong half of the square
if x > y:
tmp = x
x = y
y = tmp
if y != 0:
theta = x / y * 2 * pi
else:
theta = 0
r = y
return r * cos(theta), r * sin(theta)
Compared to the methods shown in the video, this has the advantage of only calling random() twice instead of three times (useful if the random function is especially slow), while not calling sqrt(). It works by picking a random coordinate in a square, dividing the square into two right triangles, and transposing the coordinates if they're not in the desired triangle. Then the X coordinate is scaled from the width of the triangle at that Y coordinate (which is equal to Y), to the correct range for theta, and Y is treated as r.
The function can be streamlined to the following:
theta = random() # X coordinate in a unit square
r = random() # Y coordinate in a unit square
# Confine coordinates to a right triangle within the square, by transposing them if they're in the wrong half of the square
if theta > r:
tmp = theta
theta = r
r = theta
# Treat Y coordinate as r, and convert X into theta
if r != 0:
theta = theta / r * 2 * pi
return r * cos(theta), r * sin(theta)
Plotted out with 1,000,000 and 1,000,000,000 samples:
kingbird.myphotos.cc/circle_of_uniform_randomness_-_1e6_samples.png
kingbird.myphotos.cc/circle_of_uniform_randomness_-_1e9_samples.png
And although mathematically less pretty, it can be streamlined further:
theta = random() # X coordinate in a unit square
r = random() # Y coordinate in a unit square
# Confine coordinates to a right triangle within the square, by transposing them if they're in the wrong half of the square
if theta > r:
r = 1 - r # the phase of theta doesn't actually matter; it doesn't have to be within [0, 2*pi], as long as the difference between its min and max values (for this particular r value) is 2*pi
# Treat Y coordinate as r, and convert X into theta
if r != 0:
theta = theta / r * 2 * pi
return r * cos(theta), r * sin(theta)
Edit: This is actually a little bit slower than the sqrt method (10% slower with a very fast random() function). I still find it interesting, though.
I think your post summarises very well why micro-optimisation is root ;-) of all evil. Square root is one of the fast functions on (very likely) all modern processors complicated enough to contain sine and cosine. To the extent at which, for example, informed code optimisation within the compilers does leverage square roots and multiplication to avoid low-order half-integer exponents. (As a nice exercise on algorithm development, your method of course is nice and fun. Though, there might be another potential pitfall with odd behaviour due to the 'x/y' term and thus further coupling consecutive random numbers in the process-at least with some RNGs.)
An equiareal (area-preserving) mapping from the square to the circle that involves no cutting or overlapping is the Shirley-Chiu concentric mapping. Basically, it maps every concentric square into a concentric circle. This could be useful if you need your mapping from square to circle to be continuous and bijective.
One way to improve rejection sampling is to put copies of the circle in the corners of the square. You can even recurse that approach. Of course that takes extra computation time, but might be a useful trick if you are more concerned about efficiently using your entropy vs. computation cycles
pre-video guess: for the angle make it random(0, 2*pi), and your dist is 1 - random(0, 1)^2 to account for square law
edit: pre-video desmos-screwing around: sqrt(5) * (x^(-x) - 1) is like super close but doesn't look completely right imma watch the vid now lol
edit 2: it was literally sqrt(x) are you kidding me
Bruh go back to rocket league
@@ca1lumunger ?
@@ca1lumunger Okay
I became so excited I found one more brilliant math channel. I anticipated I spend next hour whatching other videos on this channel and boom, found this is the only one yet! So I'm looking forward for new videos. This is so exciting to join a good channel at the very beginning! After a years you can tell, that you was subscribed to this million viewers channel from the very first video like telling you saw the first rocket launch by yourself!
Really interesting! There were more possible approaches than I expected, and I like that you explained the math and theory behind each. Curious that rejection sampling worked out to be so fast, but no trig calls and fewer random calls makes sense.
Cool stuff! Another reason it's useful to investigate alternatives to rejection sampling is that often, having a predetermined number of random numbers required to generate a sample is useful, or even having continuity of the output relative to changes in the input random numbers.
still, in the general case of the target space being irregular, i.e. not a circle, sphere, square, or any simple convex shape, but a space with a complex definition, the boundary check method is still the simplest to implement, execute, and modify : you just need to define the boundaries and the inside/outside check function
Although the average time for choosing a point in the disc is minimized by rejection sampling, it is also the method whose times have the highest standard deviation. If consistent performance matters (and sometimes it does in software), then there is a case for using the polar coordinate method.
Yes instead of just providing a total runtime of generating a few million samples it would have been more interesting to measure the standard deviation of individual sample times for each algorithm. Of course that requires a decent microbenchmark analysis. So there is more "left to do".
Hey Justin, you are the best!
I've never saw someone blending math and python in one video so well, please continue your channel :)
When you first showed using rejection sampling I thought to myself "why not use polar coordinates? but I don't know how to account for the results being more dense in the center". Then when you showed the maximum method gets the same results and proved it my mind was blown. Absolutely crazy amazing stuff
The Path of Exile developers have talked about how they ended up using the square root method for selecting where to have meteors fall in a circular area.
I saw the thumbnail and before I let the video play I thought about how I'd approach it.
I landed on using vectors, where you would have a random number strictly less than radius as magnitude and and a random value between 0 and 1 to represent the angle.
Needless to say I was quite happy when I saw you switch to polar coordinates.
Saw Matt Parkers newest video and thought of this video
I got recommended this video after I watched Matt Parkers video, so I guess the RUclips algorithm worked well in this case.
Another (perhaps more) intuitive way is to match a random roll for radius to a corresponding percentage area of the circle.
For example, a lets say random() gives 0.5 -> we then map the point at a radius (R) such that a circle with that R will have half of the maximum area. The probability distribution is the same.
Well sometimes it's not about the average time. I can imagine a guaranteed upper bound could be desired in some cases like high parallelization.
What came to my mind is real time systems, where all steps need to take a precise time. No random delays are allowed, so the rejection sampling is simply not an option.
@@piteoswaldo It would also probably not be good if you needed to do this a lot and thus were using a SIMD processor like a GPU to run it. To use SIMD efficiently you want each processor in the group to take the input data and perform exactly the same instructions on it in lockstep with the rest of the group. If it is even allowed that one ends up taking longer the entire rest of the group will sit idle waiting for it to complete so it can return the results of all operations in parallel. That is just simply because of how SIMD works it reads in parallel and writes in parallel so branching code that stops the intermediate processing happening in lock step is inefficient.
This was very informative. I'm glad that you took the time to compare their efficiency at the end.
That is a well made video. I was surprised to see its your only one published.
Matt Parker recently did a video about the sqrt(random) and max(random) methods having the same distribution and the whole time I was thinking "where have I heard this before"
If you know numpy, you can vectorize the process of generating 3141592 points, which would be much faster.
I love all the new SoME1 videos that are randomly showing up in my recommended and this is one of my favorites. As a fellow programmer, I've always wondered if this problem had a more elegant solution, and now I know!
I remember trying to figure out this EXACT problem recently, and I wasn't able to find any nice StackOverflow solutions, so that was really frustrating to me as I wasn't able to figure it out by myself either! This video is satisfying to me on a whole different level because of that struggle! :)
Glad it could be of service! But in the description, I link to a StackOverflow thread that was actually really helpful for my research.
@@nubdotdev Ah, I guess I didn't look hard enough at the time. :)
I needed it one time and i kjnew stuff about integrals and distribution functions, so i figured it out :) The first method was so boring. :D Wouldn't figure out the third and forth method. They were nice solutions.
This was really interesting to watch.
Thankfully most usecases I've found for random circle positions are for random positions on the edge of a unit circle, which can be derived by just randomly rolling a number between 0 and 2π.
I was kind of expecting rejection sampling to be the fastest, even if technically its runtime is unknowable.
Bruh in the end the first and easiest method was the fastest, what a plot twist
Perfect example how the simplest solution is simply the best.
I always used rejection sampling because it was very intuitive and I'm happy that it's also the most efficient.
Really great video; loved the animations! I wasn't aware of there being so many ways to randomly generate points on a disc. It looks like you put a lot of work and preparation into producing and reporting your findings. You sounded like a well-trained scientist/mathematician. It felt like I was being guided through a series of enjoyable, thought-provoking experiments. I think I may have found a way of making your other methods more competitive with the rejection sampling.
I am curious whether the other methods could have beaten rejection sampling if we had optimized or done away with the sine and cosine function calls in the random-angle part of the code since the trigonometric functions seemed to be the bottleneck in the non-rejection-sampling approaches. This question has gotten me thinking of ways to improve the random angle part.
At first, I thought about utilizing the Pythagorean identity for sines and cosines: 1 = cos(theta)^2 + sin(theta)^2. That alone would cut the number of trig function calls in half by allowing us to recycle one function call: sin(theta) = ±sqrt(1 - cos(theta)^2), where the ± would be positive for theta between 0 and pi and negative between pi and 2*pi. But then I thought about the possibility of using vector math to implicitly compute cos and sin. For instance, there are relationships between two of the ways of performing vector multiplication and trig functions: cos(theta) = dot_product(u,v) and sin(theta) = cross_product(u,v), where u and v are unit vectors. (Actually, the cross product results in a vector perpendicular to u and v, but the implementation is trivial for axis-aligned vectors.)
But then I finally realized a simpler method. Note that randomly generating the ordered pair (cos(theta), sin(theta)) by using a uniformly distributed theta is just one way of picking a random point on the unit circle. You can also do this by generating a point directly on the unit circle just by generating a random unit vector, avoiding extra vector math, and skipping the trigonometric function calls entirely. Then you just scale this point using r to get a random point in the disc.
Let x and y be two uniformly distributed random numbers between -1 and +1. Using the Pythagorean theorem to find the length of the hypotenuse of the random right triangle formed with the origin (0,0) and the two random axis-aligned points (x,0) and (0,y), you can normalize this random vector by dividing by the magnitude, turning it into a unit vector whose coordinates lie on the unit circle. The random unit vector is given as = /sqrt(x^2 + y^2). So the return statement would become "return r * u, r * v".
One caveat: There is a nonzero probability on a computer of generating the zero vector (when both x and y are 0), albeit this is astronomically small, and this will result in a zero divided by zero error (NaN). It's a bit inelegant, but you could add a branch in your code checking for whether (x,y) = (0,0), and if so, return (0,0). Other solutions not requiring branching are possible too. You could instead divide the random vector by max(epsilon, sqrt(x^2 + y^2)) or divide by (epsilon + sqrt(x^2 + y^2)), where epsilon is some very small positive value.
Interesting aside: The max function can be implemented without a branch (if statement) if you exploit the absolute-value function: max(a,b) = (a+b)/2 + abs(b-a)/2. Similarly, min(a,b) = (a+b)/2 - abs(b-a)/2. The abs() function doesn't require a branch for floating-point numbers since simply setting the sign bit of a floating-point number to zero makes it positive.
Branches are important to avoid because they impede instruction-level parallelism and don't play well with SIMD/GPU architectures. Alas, I'm not familiar with Python's implementation, so you should always use a timer to determine what's most efficient.
I really like this explanation. Could I ask what sort of schooling you underwent to be able to write such an explanation? Thank you.
Your methods will scale very differently when you start to scale them to higher dimensions, starting a sphere which only fills 52% of a cube, so the cost of rejection goes up to 1.9.
And when you get away from geometrical realities and start looking at problems with much more dimensions, rejection becomes very rapidly very exensive. For a 10-sphere (the equivalent of a sphere in a mathematical system with 10 independant variables) you are already rejecting 99.75%, so only every 400th sample is usefull. The other methods get more complex as well, of course, but generally not as badly as the rejection approach.
"You couldn't spare a few miliseconds, where did that bring you? Back to me!"
Rejection Sampling Thanos
There are analogous results for picking points in n-balls and on their surfaces (i.e. on (n - 1)-spheres). They all revolve around finding a suitable scaling for the volume or surface element (basically, the Jacobian). They fall under the general rubric of disc and sphere point-picking (q.v.). For the rejection sampling technique (a class of Monte Carlo method), there is also a related question about the probability of the sum of n uniform variates on [0, 1] being
I would expect the other methods to work better in SIMD/GPU environments, due to no backwards branch being employed?
The primary reason is that computing sin/cos to full double precision accuracy over full range is really expensive. In fact most modern processors do not implement it anymore, and it is done in software using CORDIC algorithm, with many steps. Even if implemented in microcode (i.e. fsincos function of x87), it takes between 190 and 240 cycles. If you know the input is in small range, and you are willing to sacrafice accuracy (i.e. float, not double, and bigger errors in result than 1 ULP), then vectorized approximation or GPU (which most of them do not have guarantees about sin/cos - but they are fast, sometimes just 4-5 cycles), can do it faster. The branching is not a big issue, on CPUs, because of the time this backward branch is not taken, and CPU can speculate without knowing the outcome of the branch. On GPU and CPU probably quality and speed of your random number generator could play a big role too.
Truly a very interesting problem. It also nicely connects to multi-variable calculus. Specifically the Jacobean determinant which can be used to verify that the function has the desired area-preserving property.
Just dropped my new mixtape, The Square Root of Random.
🔥
Anyone here after the standupmaths video? It's probably the algorithm, but it's funny seeing stuff pop up after you learn about it!
I saw that video but haven't finished watching it yet, and somehow watched this first.
this is so well edited wtf
Manim
I already had the square root and rejection sampling methods in mind before I clicked on this video, but I'm glad I watched it. That was well done.
Very underrated channel and video
Thanks. I won't go into details, but this has given me an insight into a completely different problem: creating a random playlist from a list of audio tracks (or shuffling a pack of 52 cards).
Me just putting points in a square randomly and checking the distance from the point to the center xD and if that is larger than the radius of the spehre I move them
Great job! Very interesting video. I love that you actually coded each solution up, to make it concrete.
I just wanted to point out that even though rejection sampling is faster on average, it is less deterministic, so for an application where maximum latency is important, it might be worth going with one of the other solutions.
I had to scroll down way too far for this 😅 It's not just less deterministic, in theory it's not deterministic at all. In theory it's possible this algorithm will never finish. But also in many use cases users will accept a longer response as long as it's consistent. It always depends on your use case, there is no single true or even better answer 👌
I dabble in programming sometimes due to my studies, and I am glad to see that rejection sampling, which I immediately thought of when finding a random point on a sphere was mentioned, turned out to in fact be the fastest. I completely would've fallen into the trap of not realizing that I need to take the square root of r in polar coordinates, though. Stresses the importance of checking that the code really does give the results it was supposed to especially when dealing with probabilities and randomness. I wonder if there are games out there with an incorrect implementation of the probability due to someone doing that mistake with r.
Man I really loved it. I had come across this problem once, but left it at square root method, considering it to be done and dusted. I am amazed at the elegance of the other methods you showed out here. Thanks a lot...
Keep on making videos ♡
I’m not really good at math or comp sci but I thought of thing where you take concentric circles then unwrap them and it makes a triangle with the height of r and base of 2pir. I sort of aligned the height along the y axis. So you get a random x and y coordinate between those bounds. You’d still have to flip half of it along the diagonal or rotate it some how. Seems like it’d be fast, but I’m not sure if that’s basically one of the ones you did or how to implement it.
Yup, that is another option!
Interestingly, although the logic isn't the same as the "lots of tiny triangles" logic, the resulting algorithm is essentially the same. There are lots of ways to flip things along the diagonal or rotate them or whatnot to produce a distribution within a triangle; some may be faster than others but they're all pretty similar.
best 4 minutes of my life
My favorite thing about this solution (maximum-comparison) is that it makes a function of points in a cube onto points in a circle, which isn't something I would normally think of as possible!
This was a great prompt and a fantastic working out.
It really, REALLY bugs me that the rejection sampling is the fastest. I wonder how optimized the native sine function is...
@@davidjames1684 depends a lot on how fast the thing you're trying to precompute is since randomly accessing memory is normally quite slow due to cache misses. in this case I think doing the rejection sampling again every time might actually be faster if the rng isn't slow
It's safe to assume that std functions like that are really well optimized. That said, they have a very specific purpose. Depending on you application you may be able to get away with using an approximation (such as a Taylor series expansion).
I also wonder why the built in max() is slow... like wouldnt/shouldnt it do the exact same thing?!
@@pvic6959 Well, floating point numbers are weird. They can have values such as infinity, negative infinity and a load of Not a Number values. My guess is it is slower in order to deal with such edge cases.
@@ThomasdenHollander but python has things for that that should be constant time checks
standupmaths and grant just made a video that have the same theme as the second half of your video. you got there first, congrats!
this is so well done! Did you do the animations using manim?
Yup! It's a really convenient library.
@@nubdotdev you're really good at it! It looks like you don't use the default fonts or text reveals. Is your project saved on GitHub? I'd love to see it :)
Very well presented! For why rejection sampling is faster, consider the time for and number of function calls. Per point returned, rejection sampling needs on average 4/π calls to sqrt() and 4/π to rnd(), and the other methods all need 2 trig. calls and 2 rnd(). Since sqrt() and trig. functions have roughly similar execution time on FPU, that means rejection sampling needs a time of t_point = 4/π × (t_FPU + t_rnd) and the others t_point = 2 × (t_FPU + t_rnd), so the expected ratio of execution times is 2/π ≈ 0.64. For total runtime, add a certain base time for loop overhead, equal across all methods. A small-ish t_overhead < t_point then explains your observed execution times.
Very nice video, I only have a remark: this is not the way to "find" a random point in a circle, it's the way to "generate" a random point in a circle
Generate a random point in a disk.
@@djmips Generate a random point on a circle
For the purposes of this problem space, the two verbs are semantically indistinguishable
This is a nice video to understand the Box-Muller transform - which needs as a starting point a uniformly distributed point in a unit circle and then generates 2 normally distributed random numbers from that.
so super cool
It was such a relief to see you finally get around to mentioning execution time... I spent the whole video looking at all those trig functions and square roots internally screaming about how slow those methods were :)
Hmm, I did exactly this (the first 100% success method) about a year ago, but I wasn't able to explain it to anyone with my explaining skills. xD (and it also took me about 6 hours to make it)
Imagine how long it took to make this video! Also I used to make spigot plugins too!
@@nubdotdev Probably years xd. Btw, funfact is, that I used it in a spigot plugin (when i needed to spawn block at a random place in a cylinder)
This is the second fresh math channel that i see popping out of nothing with their first video from about a month ago and I really like this one too
BEFORE YOU COMMENT: If you're about to ask why you can't just pick a random radius r and angle theta between 0 and 2pi, please watch the video. I address this point at 4:18.
This video has gotten so much more popular than I thought it ever would! This has been really inspiring and I will definitely continue to make content. Thank you all so much for your questions, comments, and criticisms. I'll respond to some of them here:
1. Some of you have suggested a way to pick 2 cartesian coordinates without rejection by setting x to a random value on the interval [-1, 1] and y to a random value on the interval [-sqrt(1-x^2), sqrt(1-x^2)]. While this does select a random point in the disk, it is not uniform. Points will be much more densely packed on the left and right edges.
2. My final conclusion about which algorithms were the fastest was kind of hasty. Firstly, polar coordinates are preferred over cartesian, which would make rejection sampling far slower than the other algorithms. Secondly, the three algorithms that do use trig functions could be greatly optimized with the use of precomputed trig tables and linear interpolation. Finally, I probably should have used a better language for speed than Python when testing everything.
The thing you REALLY need to worry about is (quickly) getting truly random numbers -- good luck going down that rabbit hole!!! :-)
[ oh, and bring your checkbook... ]
You have a real knack for concisely and accurately explaining complex concepts in easy to understand language, and your graphics are excellent! You should make more videos like this one!
@@davidjames1684 Hahaha. Oh wait... I should explain: Truly random numbers (with the emphasis on TRULY random) are surprisingly hard to generate with a deterministic machine such as a computer. That is indeed a rabbit hole to go down. The checkbook is probably for getting an extension card with a radioactive sample as a non-deterministic data source. That or for 1 HD-camera and 256 lava lamps.
dude you're like 3blue1brown 2.0. When I saw the way the first polar coordinate graph changed I knew this vid was special. subbed
@@TheAgamemnon911 Not really. Are you forgetting that computers have excellent memory capabilities, therefore, a bunch of people could flip coins a bunch of times, record the results in the computer, and use that as the random number generator (starting at some random point in the recorded sequence)? Other methods have been shown to be quite random as well. I have done many Monte Carlo type simulation programs involving the use of pseudo random numbers, and have always gotten very close to the mathematically correct answer, therefore confirming the PRNG is quite good, basically good enough.
Very nice video man, and clearly explained. I have a maths background but have been learning Python, so the Python implementations were a bonus for me. Keep up the good work - subscribed!
Your voice is sharp and clear, not too fast, not too slow, not slurred and not intimidating. You have a dynamic way of speaking, which is very engaging. Basically I'm saying you deserve the attention because this is a high-quality video.
I love that 3b1b's project is inspiring content like this.
Extremely good video! I loved pausing and trying to figure out why certain things were true and how I personally would have tried going about it.
The feeling of having everything click is great.
I also love how in the end rejection sampling ended up being the fastest method lol
learned so much
I never really did math competitions, and now as a computer science student I wish I was better at math since it is so important especially when it comes to algorithms (so many competitive programming problems are just math now), and the logical mindset that math encourages is extremely valuable. It's really daunting imagining how much I don't yet know and need to learn, but the beautiful solutions presented in these types of videos inspire me to keep pushing forwards (:
My guess what this video is going to show:
The simple method you should use: random point in square, reject if x²+y²>1.
But what's probably gonna be mentioned: theta = random; r = sqrt(random).
However, in practice, the first method will be more resilient to numerical precision issues, and runs faster because you generally don't need to do many passes.
The main value of alternative methods is being able to implement it branch free/constant time, which can be beneficial for security applications.
The issue with square root method, is not the square. It is the cos and sin. Modern hardware does not have functions to compute them as single instruction, because it would be still approximated using series expansion and looping and take time. Square root is implemented similar way, but it is way faster to do (using initial approximation using tables, then doing one or two iterations of rapidly converging method, like Newton-Raphson method), no matter the size of the input value, but with trig functions it is harder. So even if you compute sin and cos at the same way (they do have same series coefficients and values, just with different sign), it will be pretty slow, if you want to compute to high accuracy and for any input value. However, if you know that your input is 0 to 2pi, and you are find with reduced precision, there are tricks to compute sin and cos faster. Usually Taylor series, even with range reduction first, is not used for this, because these series converge slowly, especially far from 0. Usually a CORDIC algorithm is used with precomputed rotations, or few terms using Chebyshev interpolation. In some cases, where the input is small and the accuracy does not matter (games, UI toolkits), a lookup table with simple interpolation is used, or very crude approximation using just 2 terms). For sin and cos on GPUs is very fast, but has reduced accuracy, both of input and output result.
I tried, the sqrt + sincos method, in C, and compiler flags to emit x87 fsincos instruction to hardware directly (-Ofast -mfpmath=387), and computer 100 million random points (and make sure it is not optimized away). sqrt+sincos method is about 3.55 seconds. Using rejection sampling, with x87 gives me 2.20 seconds, but if using sse2 (standard for a decade), it is 1.92s. The primary reason is the vectorization, and branch prediction, that see that the loop is taking usually 1 iteration, as we almost always land inside the disk.
I used drand48 for random numbers, which is not the fastest method, but both approaches uses 2 calls to them, so i decided to consider that aspect not important.
There is however another aspect to consider. Even if the sqrt, sin and cos were ultra fast, the resulting points will in fact NOT be uniformly distributed over the disk. This is due to the rounding errors, and non-linearities in all these functions. The effects would be very small, but would be there. The rejection sampling method does not have this issue (one could argue that there might minor issue extremely close to the edge, but it would be hyper small).
I also thought of that. Surely doing the rejection sampling five times (only squaring) is passer than doing one trig function...
This should technically be titled the sexy title of: "The time optimal way to generate a random point uniformly in a circle". Finding a point is a geometric search problem. Fun video. Well produced, good microphone, and made me realize I should brush up on my statistics!
Great video, you really dove deep on the details!
I've used the first method in game programming. With uniform random number generator from a simple PRNG, rejecting any outside of the circle. The additional twist is I only used X (or Y) and allowed me to generate numbers with a Normal distribution. Very handy for role-playing games, enemy AI, and rogue-like map generation. And can be done with just integer math and relatively cheaply even on the old 16-bit system I was programming on.
The problem of unevenly distributed samples using polar coordinates at 4:32 comes up in the physics behind CT scans in medical imaging.
Without getting too technical, a common workaround is to weight points further away from the center more to even out the distribution, or to simply interpolate between samples.
Edit: now that I think about it, disks also have this problem (CDs, HDDs, and even vinyls). Either the disk rotation needs to be able to change speeds for data to be stored evenly spaced across different radii, or the data spacing needs to be different across different radii for the rotation speed to be constant.
If we assume, that factory recorders scratch the image discs with constant precision, vinyls are the only music format with the decreacing sound quality (more and more wavepeaks per cm of track).
quality of the visuals is so simple, yet so good! thank you!
Programmer thinking: "I don't want to do one simple extra calculation... I'll do 10 complex calculations instead..."
for some reason having a infinite while loop that breaks when a random chance hits always gives me stress that it might hang even tho mathematically it's very low chance
@@themisir throw a counter in there with a classic: if counter == 10: return 0,0
Please make more like this. If you become 3b1b for programming you will be my new favorite youtuber. This was wondering and engaging to watch. Just subbed!
Love and respect the math, but feel like sometimes programming for performance is about cheating the calculation of functions. I might be curious how well something like the fast sqrt constant might work to optimize the performance of operations
I think the square root calculation is based on Newtons method, and sine amd cosine on taylor series. A computer is fast, but these operations naturally slow things down. Multiple multiplications, additions and even taking derivatives in the case of the sq root are needed.
I'm very impressed by how well this problem and it's solutions are described and explained. Well done and don't be afraid to take us on more journeys.