Probability isn’t always about randomness
- Published: 11 Aug 2022
- CORRECTION : at 3:15, the edge lengths of the parallelotope must of course be less than 1 for the result to hold, my bad.
vvv NO MUSIC, SCROLL DOWN FOR INFO vvv
An excursion into a problem of higher-dimensional geometry and its surprising probabilistic solution.
Special thanks to my brother for greatly helping me out with the audio side of this project, and to the SoME2 Discord server for being a nice community in which to have worked on this.
Image credit :
Photo at 06:01 : Myrabella / Wikimedia Commons / CC BY-SA 3.0 & GFDL
Chapters :
00:00 Parallelotopes
05:53 Probability??
08:34 A Monte Carlo proof
16:55 …and it works!
19:16 Tying up loose ends
20:43 Conclusion : Gratitude
Music :
I didn't bother adding music because the contest is really stringent when it comes to copyright, and also because it isn't that necessary. Though if you feel like silence is going to be too heavy, here's something to listen to at low volume : • Klaus Schulze - Sequen...
Wow, that's an incredible video! One of the best SoME2 submissions in my opinion.
Beautiful video and brilliantly done. I paused at points so I could really drink it in. Great job. I’m in awe. And that was a heartwarming tribute to a great teacher.
Makes it so beautiful once you see it all come together. Brilliant work!
17:45 Since the expected value of V is p, on the left-hand side we have the variance of V, which can be computed by summing the variances of the individual independent Bernoulli variables C_i, namely p_i(1 - p_i), which is what we have at 18:43.
Nice Video!
(Or something like that. We have random vectors, but I think we can make an argument along these lines)
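The variance computation described in this comment can be checked exactly by enumerating all 2^n vertices of the unit cube. A minimal sketch (the point p and its coordinates p_i follow the video's notation; the helper name and the example values are illustrative):

```python
import itertools


def expected_sq_dist(p):
    """Exact E[|V - p|^2] where V has independent Bernoulli(p_i) coordinates."""
    total = 0.0
    for v in itertools.product((0, 1), repeat=len(p)):
        # Probability of this particular vertex under independent coordinates.
        prob = 1.0
        for vi, pi in zip(v, p):
            prob *= pi if vi == 1 else (1 - pi)
        total += prob * sum((vi - pi) ** 2 for vi, pi in zip(v, p))
    return total


p = (0.3, 0.7, 0.5)
closed_form = sum(pi * (1 - pi) for pi in p)  # sum of Bernoulli variances
assert abs(expected_sq_dist(p) - closed_form) < 1e-9
```

The enumeration agrees with the closed form Σ p_i(1 - p_i), which is the quantity appearing at 18:43.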
lately been reading more about probabilistic proofs and honestly they are so beautiful.
Absolutely amazing video! Subscribed.
Very nice video. I always enjoyed probabilistic proofs when they arose in surprising areas. Erdős' bound on Ramsey numbers is a simple one, but the argument is still charming. More complicated subjects in discrete math often hinge on probabilistic tools like the Lovász local lemma or Szemerédi's regularity lemma (used in the proof of the Green-Tao theorem).
What a lovely epilogue!
Each time we add a new vector to the basis, the longest distance between opposite vertices grows to at least sqrt(n): by induction there were two opposite vertices at distance at least sqrt(n-1); adding the new basis vector to both of them gives two more points, and all four together form a parallelogram whose longest diagonal (the one opposite the obtuse angle) is at least sqrt((sqrt(n-1))^2 + 1^2) = sqrt(n).
Now take any point together with these two opposite vertices: it lies at least sqrt(n)/2 away from one of them (if both distances were shorter, the triangle inequality would be violated). So the maximum distance is at least sqrt(n)/2.
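For the tight case of the unit hypercube, this lower bound is easy to confirm numerically: the center is at distance exactly sqrt(n)/2 from every vertex. A quick sketch (the dimension n = 4 is an arbitrary choice):

```python
import itertools
import math

# In the unit hypercube, the center is equidistant from all 2^n vertices,
# so even the nearest vertex is at distance sqrt(n)/2: the bound is attained.
n = 4
center = [0.5] * n
nearest = min(math.dist(center, v) for v in itertools.product((0, 1), repeat=n))
assert abs(nearest - math.sqrt(n) / 2) < 1e-12
```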
Awesome! I remember reading some stuff on how the Kolmogorov probability axioms are just a set of rules, which we link to randomness because of the way we often use them. This video showed me at least what the writers of that statement meant: our analysis of the axioms of probability can be applied in any situation where those axioms hold, even if the object of interest has nothing to do with randomness!
@13:28 a pop team epic screenshot, I see you're a man of culture
Deserves more views!
Love this it's very visual.
What a cool proof!
I think the best way to describe probability to a layperson is that it is a way to quantify how much we do not know about a system. It could be actually random, or we may simply not know enough, so that we perceive it as random. This way of thinking ties in nicely with information theory and with direct applications such as telecommunications and the like.
The Bayesian interpretation of probability is very reasonable; I just felt it wasn't "hands-on" enough for my purposes here. I'm more comfortable with the frequentist approach for an explanation.
Very interesting, great job!
Nice video!
Bravo!
Well, this is a video I'll have to watch a second time to fully understand
But it is really cool
Awesome video! +1 sub
At 17:40 Jensen's inequality should probably be mentioned. The modification from proving a bound on E[d] to proving one on E[d^2] is not trivial, since in general (E[d])^2 does not equal E[d^2]. Fortunately for this problem, by Jensen's inequality (the square is convex) we have (E[d])^2 ≤ E[d^2].
Actually, the reason for the equivalence is simpler and hinted at in the video. If 𝔼[D²] ≤ A², then there exists a vertex for which D² ≤ A² (I can't be bothered to properly typeset the bound), which means that D ≤ A, which is what we had set out to prove.
So the two formulations are equivalent because "D² ≤ A² ⇔ D ≤ A" in this context.
@@syst3ms413 Yes, I agree the two formulations are equivalent for the proof of this particular problem. Of course the two expectations are not equal, just wanted to highlight that. I’m sure you know this 🙂 Nice video and great proof!
Thanks for mentioning it, I came looking for such a comment.
Btw you can also use variance instead of Jensen: we know Var(d) ≥ 0, and Var(d) = E[d²] - E[d]², so E[d²] ≥ E[d]². We even get a strict inequality because d is not constant, cheers
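The inequality E[d²] ≥ (E[d])² holds for any sample, since the empirical variance is nonnegative. A tiny illustrative check (the data here is arbitrary):

```python
import random

random.seed(0)
d = [random.random() for _ in range(10_000)]  # stand-in for samples of d

mean = sum(d) / len(d)
mean_sq = sum(x * x for x in d) / len(d)

# Var(d) = E[d^2] - (E[d])^2 >= 0, so E[d^2] >= (E[d])^2;
# strict as soon as the sample is not constant.
assert mean_sq > mean ** 2
```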
Basically, probability is used to make the proof more intuitive. But it's not really probability, it's just measuring some abstract volumes.
Great video, and your professor sounds amazing!
Sort of unrelated, but one of my favorite counterintuitive facts: Which point inside the unit square maximizes the product of distances to the four vertices? (It's not the center.)
Halfway in between two of the vertices?
As in, on the perimeter
@@iamrepairmanman I agree. Center gives you 1/4, midpoint of a side gives you 5/16 (which is 1/16 more).
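A crude grid search backs this up numerically. A quick sketch (the grid step and tolerances are arbitrary choices):

```python
import math

corners = [(0, 0), (1, 0), (0, 1), (1, 1)]


def prod_dist(x, y):
    """Product of distances from (x, y) to the four corners of the unit square."""
    r = 1.0
    for cx, cy in corners:
        r *= math.hypot(x - cx, y - cy)
    return r


# Center vs midpoint of a side: 1/4 vs 5/16.
assert abs(prod_dist(0.5, 0.5) - 1 / 4) < 1e-12
assert abs(prod_dist(0.5, 0.0) - 5 / 16) < 1e-12

# Grid search over the closed square: the best value found matches 5/16,
# i.e. no grid point beats the side midpoints.
best = max(prod_dist(x / 100, y / 100) for x in range(101) for y in range(101))
assert abs(best - 5 / 16) < 1e-9
```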
excellent video and effort!
one note: at 3:30, shouldn't the conjecture be "the furthest a point inside of it can be from its CLOSEST vertex is sqrt(n)/2"?
You can interpret as being kind of the same thing. If a point is at a distance x from its closest vertex, then it's at a distance (at least) x from every other vertex. Maybe that wasn't really explicit though.
simple: the maxidistant point is always equidistant to opposing vertices, and the bisector point between opposing vertices of a parallelotope is by definition the center; the average position of all vertices. Proof by symmetry
alright aside from a brilliant illustration of a topic i've almost never seen discussed, your memes are fucking HILARIOUS keep it up m8
good video. one suggestion: around 4:30, it would be better to place the equations where the subtitles won't cover them. it would greatly benefit the hearing impaired to be able to read the equations and the subtitles at the same time
cool
Should the conjecture at 3:30 also specify a maximal edge length of 1 in the parallelotope?
Yes, it should, my bad.
Funny yellow dog does wacky and uncharacteristic math problem
Without getting very far into this, I'm gonna guess that you get an unexpected answer in higher dimensions due to the same effect that can cause the hyperspheres at the corners of a hypercube to become bigger than the cube itself. I forget the full description of the phenomenon but I do know there are a few videos about it.
Well no, for once the result in higher dimensions is actually a clean generalization of the lower-dimensional cases.
@@syst3ms413 found that out lol 😆
1:13 Wouldn't there be two off-center points where the minimum distance would be maximized?
The distance from the points on the shorter diagonal would increase if I move the point on the longer diagonal, so it can't be at a maximum.
What did I misunderstand?
You are right, but the max distance is less than in a square
Aaah, the argument was about the upper bound being sqrt(2)/2. Now the sides being
I'd say this can be understood as a rounding argument. Instead of rounding to the closest vertex, round the coordinates probabilistically using Bernoulli random variables: a value closer to 1 has a higher probability of being rounded to 1, and so on.
I think that here by rounding you actually mean "finding the closest vertex", which yes, is the object of the problem.
@@syst3ms413 No. I meant a random rounding. Let p = (p_1, ..., p_n). Let X_i ~ Be(p_i) be independent. Consider the random vertex V = (X_1, ..., X_n) of the parallelotope. Then E[|V - p|^2] can be computed as you did. The interpretation is that V is a "random rounding" of p: the i-th coordinate becomes 1 with probability p_i, so a coordinate close to 1 has a high propensity to be rounded to 1. It makes intuitive sense to me that this way one will get a vertex close (not necessarily closest) to p with high probability.
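A quick Monte Carlo sketch of this randomized-rounding view (the point p, trial count, seed, and tolerance are arbitrary choices for illustration):

```python
import random

random.seed(1)


def round_random(p):
    """Randomized rounding: coordinate i becomes 1 with probability p_i."""
    return [1 if random.random() < pi else 0 for pi in p]


p = [0.9, 0.8, 0.1, 0.5]
trials = 20_000
avg_sq = sum(
    sum((vi - pi) ** 2 for vi, pi in zip(round_random(p), p))
    for _ in range(trials)
) / trials

# Matches the closed form sum p_i (1 - p_i) = 0.09 + 0.16 + 0.09 + 0.25 = 0.59
expected = sum(pi * (1 - pi) for pi in p)
assert abs(avg_sq - expected) < 0.05
```

The empirical average of |V - p|² concentrates around Σ p_i(1 - p_i), the same quantity computed in the video.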
As a general philosophy, I think of probability as the mathematics of the very complex, rather than mathematics of randomness (of course this is just philosophy). Thus, no wonder, it appears as a method rather than just as a subject.
Okay, I see your point. Sure, why not!
Probability is real. The math you use to operate on it is hard and true. As long as you don't actually have to work with a dataset somewhere, you can give a perfectly accurate analysis. The expected value of a Bernoulli(p) variable *is* p, because that's what it's defined as. You only get unsure if your sample size is anything below infinity.
Just because matrix operations are all over video processing doesn't mean we can't use Graphics Processing Units for machine learning. It's just branding at that rate.
At 11:08, without looking up the formula somewhere else, should there be a division by n? And maybe a limit as n goes to infinity?
Wait, I bet n is the number of dimensions we’re using.
I follow the proof, but I don't quite follow how having at least one vertex at distance at most √n/2 shows that the maximum distance is √n/2. Wouldn't you want to show there are no points with distance greater than √n/2?
You fix an arbitrary point inside the parallelotope, then choose a vertex of the parallelotope at random. If on average the distance to the fixed point is at most sqrt(n)/2, then at least one of the vertices is close enough. So for an arbitrary point we have shown that there is a vertex such that the distance between them is at most sqrt(n)/2.
As the previous comment says, the point I fixed in the beginning is perfectly arbitrary. If I can show the theorem for an arbitrary point, it means it's true for all points.
There's promise here, but honestly I was very lost. Lots of confusing words. It seems like it would be interesting if I had a bachelor's or was working on a master's in mathematics. Or maybe I'm just dumb; I'm no genius, lol. But I feel like things were difficult to relate to each other.
Don’t feel bad. I majored in math years ago and I didn’t follow everything. There are a lot of prerequisites to be able to follow everything.
Yes, this is indeed slightly more advanced than many of the other submissions