Next video, explaining the π and how the function e^(-x^2) arises: ruclips.net/video/cy8r7WSuT1I/видео.html As many helpful commenters have pointed out, at 6:37 and 7:15 the narration should say "skews left" instead of "right". In standard terminology, the skew direction refers to the direction of the longer tail.
I'm so excited for a visual and intuitive explanation for it! In my stats class, CLT was proven rigorously but I couldn't "see" why. Later in your Discord, I believe someone explained it to me intuitively in terms of moment generating functions, but I'm excited to see if you can leverage visuals for a more elementary intuitive explanation
I have a PhD in applied mathematics, I work in numerical weather prediction as a research scientist. Gaussianity is this hardcore part of the basics of forecasting the weather (even though most atmospheric variables, and their errors, are actually non-Gaussian). This video did a great job at teaching the CLT. I have never seen it explained so well.
As an assumption, it may be because, just as with the Galton board, the errors and correlations add up in a "normal distribution" kind of way too, cancelling out their effects on the overall distribution? (Haven't seen this topic in years).
@@mlucasl I don't know the weather field, but in geology (mineral resource estimation) the non-Gaussian variables can be transformed into Gaussian ones to work with them properly
Yeah, meteorological variables often follow extreme value distributions. I remember that from taking a minor in applied agrometeorology while majoring in stats.
@@mlucasl I thought it was because the weather is more like what is described in chaos theory - small deviations lead to wildly different results. So they don't tend towards something nice like the Gaussian.
I can't tell you how insanely brilliant you are at taking a universal concept that is vaguely understood and illuminating all the nuance hidden in plain daylight to make this understood on a higher level!!! Genius
Dealing with CLT pretty much every day here. Really impressed with how easily you explain it. By far the most intuitive and easily understood explanation of CLT. Salute!
As someone who works with Kalman filtering on a regular basis, this is a very nice video to see. One of the core principles behind the Kalman filter is that all random variables involved must be Gaussian, which seems overly restrictive on the surface. I think this provides an excellent, succinct explanation for why that's actually a reasonable assumption for many systems, since every random process we can directly observe is really just a combination of many smaller processes. I look forward to the next one!
Yeah, I think it's worth remembering that assuming an RV is normally or lognormally distributed is a pretty minimal assumption, since you're basically only saying that your observations are the result of a linear combination of an unknown number of RVs that may or may not be orthogonal to each other, and that there's _some_ kind of minimal number of RVs, depending on their individual distributions, that your measurements are in excess of. If you find that the distribution isn't normal, that actually gives you some information about the individual distributions themselves.
@@Eta_Carinae__ Some types of data can be assumed to be normally distributed, but not all. Some data is naturally uniformly distributed. Other data is naturally exponentially distributed. For instance, let's say I looked at the distances of home runs in the MLB. That is certainly not normally-distributed, since a lot of home runs are very close to the minimum possible distance. Or let's say I looked at the speeds of atoms of an ideal monatomic gas in thermodynamic equilibrium. These won't be normally-distributed. In fact, they will have a χ distribution with 3 degrees of freedom. Or how about gasoline usage? A lot of the population would be around 0, while the rest would probably look roughly normally-distributed, because a lot of people don't own a car. It's generally not a good idea to just assume data should be normally distributed because it depends on many different factors. Those factors are not necessarily equally important or identically-distributed or independent at all. Typically, you can expect to find normally-distributed data when measurements can span a very large range of values relative to the standard deviation, when no particular special values are preferred, when the distribution should be symmetric with respect to the mean, and when data is clustered around the mean. In other words, it's normal if it's normal.
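The χ(3) claim above is easy to check numerically. A quick plain-Python sketch (unit-variance velocity components are an arbitrary choice, not real gas physics): sample three Gaussian components per atom and look at the speeds; their sample mean matches the chi(3) mean rather than anything symmetric around zero.

```python
import math
import random

random.seed(1)
# Each velocity component is modeled as Gaussian, but the *speed* -- the
# Euclidean norm of the 3-component velocity vector -- follows a chi
# distribution with 3 degrees of freedom (Maxwell-Boltzmann), not a normal one.
speeds = [
    math.sqrt(sum(random.gauss(0, 1) ** 2 for _ in range(3)))
    for _ in range(100_000)
]
sample_mean = sum(speeds) / len(speeds)
chi3_mean = 2 * math.sqrt(2 / math.pi)  # theoretical mean of chi(3), about 1.596
```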
As an actuary, I'd say this is perhaps the best descriptive video/lecture I've ever seen on the CLT. I wish I had seen this when taking my classes for exam P, because the visualizations are so useful for understanding what can be a very dense topic when it's hastily spewed from a chalkboard/overhead screen.
Describing the mean of the weights as the center of mass of the distribution was just incredible. And the intuitive matrix multiplication without even mentioning it. You are a great teacher!
I would absolutely love to see a series on probability/combinatorics/statistics on this channel. It's the subject I've struggled the most with in math by far. I think your ability to take the time to really think through and understand what the basic building blocks really mean will become a very valuable resource in my and many other people's math journeys.
I studied probability and statistics in university and learned about the central limit theorem, then totally forgot it. When I saw this video title I knew I had heard of it, but it took a while to remember it, for the first time in probably twenty-five years. Thank you for explaining it much better than our textbook did!
Ngl, I could see China or India doing it, and then half the world learning from Chinese- or Indian-sourced lesson plans and curricula. Or like, Finland. Hopefully Canada.
As I was watching this video, I wanted to say that videos from your channel inspire me to learn. Not just mathematics, but anything worth doing. Although I am an engineer and enjoy doing what I do, I have never been a huge fan of pure mathematics. But the way you explain concepts just makes it so easy to understand. Even though I might have to rewatch some videos to fully comprehend the meaning, I really enjoy it and it never feels like a chore. I have watched your videos more than my university lectures. I wish there were more teachers like you in this world. Thank you so much 3Blue1Brown!
I really like your style of teaching. The way you help us discover things by slowly revealing them, instead of just telling us the result, is awesome. Like when you were making the formula for the bell curve: you started with e^x and then slowly, step by step, by encountering problems and then solving them, you finally reached the formula. That was an awesome mathematical journey. And I enjoyed the ride! Woohoo!
One of the most interesting things I've learned in my math undergrad so far is that Brownian motion follows a normal distribution over time (at least, this was shown in the context of diffusion), which you elegantly explained in the first few minutes of the video. We had derived the diffusion equation from a formula modelling simple Brownian motion. I had never seen the connection between abstracted physical science and pure probability theory until then. Great topic!
A series on probability and statistics would be awesome! Everyone in my university hates prob & stat because our teachers are pretty bad, but I'm sure you could explain it really well.
This video is a great presentation of some of the most important ideas. I could have really used this video before taking one of my hardest classes: college senior level probability and statistics, which I took in 1973. All quarter, I kept asking myself: what were the prerequisites that I was supposed to take but that I must have missed. I have used this information almost every day of my life since then. The world would be so much better off if it was a required class just as freshman algebra class is. Alas, maybe we will be able to teach it better in the future.
You are a crazy good educator my friend, this video was a work of art, masterfully crafted, delightfully beautiful while still highly informative and surprisingly understandable on many levels, thank you very much for it! You're very talented and experienced in highlighting the main concepts after building them up perfectly while hinting at a couple of very interesting consequences or more complex aspects coming up later, balancing these with immaculate skill, hats off to you!
When I started watching this channel the things he was explaining were completely new to me, and I was watching to learn those new things. Now, after so many years, a few exams away from a degree in software engineering, I'm still watching these videos, not because I don't know the subject, but because I'm sure he is going to get to conclusions in such a human and reasonable way, giving lots of insights and new points of view that I surely never got in a university course... Damn, I love this channel❤️
The CLT is a classic and beautiful result. What's more mind-boggling is that nowadays you can drop the identically-distributed assumption and still get a general CLT (Lindeberg, Lyapunov). Probabilists keep finding versions of the CLT in different settings that do not even converge to a Gaussian, but to a different distribution, like Tracy-Widom, Wigner's semicircle, etc.
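A sketch of the dropped-iid point: sum terms that are independent but deliberately *not* identically distributed, and the result still looks Gaussian. The particular mix of distributions below is arbitrary, chosen only so no single term dominates.

```python
import random

random.seed(2)

def mixed_sum():
    # Independent but NOT identically distributed terms: the classical iid
    # assumption is dropped, in the spirit of the Lindeberg/Lyapunov CLTs.
    return (sum(random.uniform(0, 1) for _ in range(30))
            + sum(random.expovariate(1.0) for _ in range(30))
            + sum(random.random() < 0.3 for _ in range(30)))

samples = [mixed_sum() for _ in range(20_000)]
n = len(samples)
mu = sum(samples) / n
sd = (sum((x - mu) ** 2 for x in samples) / n) ** 0.5
within_1sd = sum(abs(x - mu) < sd for x in samples) / n
# For a Gaussian, about 68.3% of the mass lies within one standard deviation.
```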
Can't wait to see 3b1b's take on the computation of the Gaussian integral, still one of the craziest places for pi to show up (maybe second to the Basel problem which he already covered). Even though the trick is very well-known, I am sure he'll have something new to say. Happy pi day!
I just learned about the Central Limit Theorem in my AP Statistics class, but my teacher didn't explain why it was true. Thank you, as always, for teaching at a deep level but still making it understandable 🙏
What I find great about easy-to-access videos like yours is that they'll make it easier for anyone to understand the intuition behind what they learn at school. Over time, I think the overall level of everyone will increase thanks to that, and we'll have more and more people who can make these fields progress. It might be a bit idealistic of a view, but I sure hope it's true in the long run
Your assumption is probably correct. I slogged through statistics over 40 years ago and never got the intuitive feel, despite some good teachers. After a few years working in statistics, I lacked the confidence to continue, and switched tracks entirely (to translation). While the best minds will grasp this field quickly, the rest would benefit from seeing it from other angles, whereupon understanding might click.
This couldn't have come at a better time. We just hit on the CLT a couple weeks ago in my engineering probability class. Your videos are always my go-to for a deeper understanding of the material, and I would say anyone not watching 3B1B is at a disadvantage in STEM. Unmatched visuals and eloquent explanations. Thank you Grant.
Probability and statistics are probably my weakest points in math (math that I've specifically learned about in school, anyways) so a full series would be great. Also this is a really good video as usual and I found it to be pretty easy to understand. Of course I would need practice and to re-watch some bits to clear some areas of misunderstanding I have but that's not an issue. Overall, this was very engaging!
I could show this video to my dog and he would understand it. David Hilbert would surely be impressed with your teaching abilities. Thanks for bringing a smile to this old face:-)
That was the best description and comparison of the difference between variance and standard deviation that I have ever seen. The graphical depiction of variance (as a square shape) versus standard deviation (the square root of variance), producing a line, was a revelation to me.
Hey Grant, awesome video! One point however: I believe at 7:15 you meant to say that the distribution is skewed slightly to the left, as using skew as a descriptor should follow the direction of the tailing data, not the direction of the majority of the data. That is, the direction the mean is being "pulled."
You don't know how glad I am that you made a video on this topic. I have been trying to understand the subject for a long time and this video helped me in an incredible way
Wow, the level of insight that can be gained by watching your videos is truly astonishing. Also, thank you for making me realise that "dice" can also be pronounced as "die" in English
A single one is called a "die" but two or more are called "dice", pronounced how you'd expect. Native English speakers frequently mess those terms up though, usually defaulting to calling everything "dice".
I'm very excited to hear the idea for a video delving into variance. Explaining my trouble with variance: looking at the exponential function, we could absolutely choose to use 2^x or 10^x everywhere and just live with the correction terms ln(2) and ln(10) showing up (sorta like the π vs τ dealio). Out of convenience, e makes a 'better' base. But I can point out a number of 'deeper' reasons to use e as the base for exp than just convenience, and I can point to enough such reasons that using any other number seems 'wrong.' Contrasting this with variance, I'm aware that taking the square of the differences (x-μ) is more convenient, but I can't tell you why it's the 'obvious' or 'correct' choice based on that 'deeper' reasoning. Maybe |x-μ| and |x-μ|^3 use abs() which isn't smooth, but then why not (x-μ)^4, or any other even power? On a desert island, building stats from the start, I don't know how to make that choice for (x-μ)^2 well motivated.
this is probably not the right way to motivate it but perhaps it has something to do with using the “root mean square” (before Bessel’s correction) rather than the arithmetic mean as the “average” deviation? idk im clearly talking out of my ass here and would love to see this properly explained in a future video
or, another thought; maybe think of variance geometrically, as the square of the n-dimensional “distance” between the point (mean,mean,mean,…,mean) and the point whose coordinates are your data points?
@@elrichardo1337 I don't know why I never saw it, it's pretty glaring now that you mention it, but I think you're onto something with the idea of a norm. Almost like I'm asking "if any p-norm works, why do we choose p=2, the Euclidean norm?" For geometry in a flat space, I know (more or less) how to answer that question. If this translates cleanly to variance in stats, I'ma be annoyed that I haven't seen it before.
@@nylonco7134 It's completely to do with a Euclidean norm. In fact, if you think about it, the standard deviation is exactly the Euclidean norm on the space of centered (mean=0) random variables (up to equality almost everywhere if you know these sorts of things, otherwise don't worry about it).
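To make the norm picture concrete, a tiny sketch with made-up data: the population standard deviation is exactly the Euclidean norm of the mean-centered data vector, rescaled by sqrt(n).

```python
import math

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # arbitrary example values
n = len(data)
mu = sum(data) / n

# Center the data; then the (population) standard deviation is just the
# Euclidean norm of the centered vector, divided by sqrt(n).
centered = [x - mu for x in data]
euclidean_norm = math.sqrt(sum(c * c for c in centered))
sd = euclidean_norm / math.sqrt(n)  # equals 2.0 for this data
```

This is exactly the p=2 choice in the p-norm question above: variance is the squared 2-norm of the centered data, averaged over n.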
I remember when spreadsheets became a "thing" - Lotus123 - yeah.. I know.. I'm old... and playing around with the dice example in this video and how much it helped me understand Probability and Statistics. This video brings back memories of those days. Being able to visualize some of this stuff makes it so much more intuitive and makes me admire those geniuses from the past who figured it all out without the computing power we have today.
I think I ran into the CLT in the wild while trying to understand Stable Diffusion model training. I have two models with the sole difference being the weight loss function, and I wanted to quantify how this affects the models' ability to reproduce the likeness of the trained subject. To do this, I generated hundreds of random images using each model, randomly batched these ensembles into smaller sets, and calculated a "closeness value" for each batch with respect to a given reference image. By using a fixed set of reference images I'm able to generate a large number of random data sets of closeness values for assessment. Here's where the CLT may have shown up. I then plotted these closeness values in histograms to get a sense of how they are distributed for each batch. These small batch sized distributions were rather asymmetrical, and I didn't expect anything to happen when I combined data from multiple batches (for a given model). But, lo-and-behold, as I added more data to each histogram the shapes began to become much more symmetric, and bell shaped. As soon as I noticed this, this specific 3Blue1Brown video shot into my mind like a bolt of lightning as I began to finally feel the CLT in my bones. One way or the other (need more data), even if I'm imagining it, moments like this are what keep me getting up at the crack of dawn. Thank you very much for the hard work, time, and effort you put into your videos. It has been almost a year since I first watched this one, but its effect has stayed with me this entire time. :D
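The effect described above is easy to reproduce with a stand-in for the "closeness value". Below, a deliberately skewed exponential plays that role (an assumption, since the real scores aren't specified), and batching plus averaging visibly shrinks the skewness toward Gaussian symmetry.

```python
import random

random.seed(3)

def skewness(xs):
    # Standardized third central moment: 0 for a symmetric distribution.
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# A deliberately lopsided stand-in score: exponential, skewness about 2.
raw = [random.expovariate(1.0) for _ in range(40_000)]

def batch_means(xs, size):
    return [sum(xs[i:i + size]) / size for i in range(0, len(xs), size)]

skew_raw = skewness(raw)
skew_batched = skewness(batch_means(raw, 40))
# Averaging within batches of 40 shrinks the skew by roughly sqrt(40).
```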
I am an Electrical engineer and studied probability and statistics in Signals and Systems coursework. I wish we had such an intuitive explanation at the time! Last year I looked at the electricity consumption of a large factory, with many processes happening at once, but with random variations. I was amazed when drawing a histogram of the frequencies of the difference values of the electrical demand, that the shape of the histogram was very close to a bell curve, except for the spike at zero that corresponds to electrical outages. The processes aren't even completely independent.
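A hypothetical toy version of that observation (machine counts, duty cycles, and loads are all invented, and outages are ignored): total demand modeled as a sum of many independent on/off machines already shows Gaussian-like coverage.

```python
import random

random.seed(8)

# Invented factory: 80 machines, each randomly on or off with its own duty
# cycle, drawing its own load when on.
duty_cycles = [random.uniform(0.1, 0.9) for _ in range(80)]
loads = [random.uniform(1.0, 10.0) for _ in range(80)]  # kW per machine

def total_demand():
    return sum(load for load, p in zip(loads, duty_cycles) if random.random() < p)

samples = [total_demand() for _ in range(20_000)]
n = len(samples)
mu = sum(samples) / n
sd = (sum((x - mu) ** 2 for x in samples) / n) ** 0.5
within_2sd = sum(abs(x - mu) < 2 * sd for x in samples) / n
# A bell curve predicts about 95.4% of samples within two standard deviations.
```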
When I clicked on this video, I thought it would be about the Central Limit Theorem from calculus, so was surprised at its length. I didn't realize there was an identically named theorem in statistics! I've used the standard distribution a fair bit in the realm of data clustering and compression, but didn't understand all the nuance. This really expanded my understanding; great job!
This has been the most exciting math video I've ever watched! I'm in engineering and I've always hated statistics because it's so unintuitive. You just plug values in and get values out, and you memorize what they mean. For the first time, I actually understand why we do the things we do in stats. I especially like how this video implicitly explains why we need a minimum number of samples. Really great video
it's interesting that all of the ideas I came up with while learning statistics in my college are being shown here. My teacher was much less than adequate and only read from the textbook, but I think that might have actually helped me understand distributions as it forced me to learn on my own. I would love to have been able to show this video to my friends who struggled in that class. but, looking at how math-focused this video was, I doubt it would have helped them much.
That formula for the standard normal distribution is truly beautiful, and now that I know why all those factors are in there, keeping it "in place", it's so much easier to memorise!
Only critique: given how prominent the definition is here, remind us earlier what makes a valid distribution, when you first start talking about it. Reminds me of how amazing I found this theorem in grad school; the visuals are fantastic. Thanks Grant!
Thanks Grant, I've wanted to see probability/statistics videos on your channel for years, and I was just doing a refresher on the topic and found your new videos! I think you should combine them into a statistics playlist just like your "Essence of" series
Just a brilliant description of basic(?) probability. I thought I understood this already... I didn't. Thanks, so much, for taking the time to put this video together... I can't imagine how long it took... what a labor of love.
In no statistics course I have taken have I learned how the formula of the standard normal distribution is derived. It seemed teachers either did not know or saw it as a "given". I have therefore always viewed the normal distribution as unnecessarily complex and "unfathomable" (and, as a consequence, hard). Now after this video it is clear as day. I love the explanation.
Thank you for this great video. It was greatly intuitive as well as engrossing. I didn't even feel that 30 minutes had passed by the end of the video.
I'm thinking about a die where the face with the value "1" has a probability of 1 and the other faces have a probability of 0. There is no bell curve anymore, just a single spike - no matter how often I roll it. Now it doesn't matter how I manipulate the other factors in the experiment. I forced a certain pattern to appear by manipulating just the probability distribution. I can force other patterns by manipulating the experiment even further. The bell curve happens to be an easy one to produce. *Math does not care about whether you manipulate the experiment in "this way" or "that way".* But humans do. "You like bell curves? - You can make it happen!" How about making waves by adding evenly spaced holes where the balls come out? If you take away your ability to manipulate the experiment, then it becomes harder and harder to enforce a bell curve. You only need a handful of changes to make it happen though. This video does a really good job at explaining how you can do that. You can use the insights in this video to bias yourself or debias yourself. I'm sorry. Sometimes I'm thinking backwards. I mean, we know intuitively that deeper understanding can emerge from changing the perspective. But we tend to forget that we prefer certain perspectives.
I did pick up one of these once you showed it off! And, as I was driving around doing chores, I finally realized that the binomial distribution is a discrete version of the Gaussian distribution. So many things that I didn’t learn (or was sleeping) during school 😂
The reason why CLT for sums and CLT for means behave differently in a sense was one of my favorite observations, the intuition is so elegant yet it's fundamental for quite a lot of science, the hints to ponder are all in the video but if anyone wants a more concrete direction - where does the N, number of rolls, show up in the mean of the sum of rolls? Where in its standard deviation? And what does it mean to take a mean rather than a sum?
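For anyone who wants to check the pondering prompts above empirically, a small simulation (the numbers N and the trial count are arbitrary): the mean of the sum of N fair-die rolls is 3.5N, while its variance is N times the single-die variance of 35/12, so the standard deviation grows only like sqrt(N).

```python
import random

random.seed(4)
N = 100
TRIALS = 20_000

# One fair die has mean 3.5 and variance 35/12.
sums = [sum(random.randint(1, 6) for _ in range(N)) for _ in range(TRIALS)]
mean_of_sum = sum(sums) / TRIALS                                  # near 3.5 * N
var_of_sum = sum((s - mean_of_sum) ** 2 for s in sums) / TRIALS   # near N * 35/12

# Dividing by N to get the *average* roll therefore squeezes the sd down to
# sigma / sqrt(N), which is why sample means concentrate around 3.5.
```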
Eagerly waiting for an "Essence of Probability and Statistics" series... it will change the lives of many data professionals like me. Also, Grant, consider adding a paid YouTube membership to your channel; with a couple of clever badges and emoticons you'll get more support along with Patreon.
Oh man! I bought that same Galton board for my math classroom at MoMath when I visited last year! I wish I had run into you there, that would have been awesome. Your videos were incredibly inspirational for the way I taught, so thank you for that! I have a new job now, but I think I'll be able to carry over a lot of the inspiration.
This channel always shows the best way of making people understand how things work, with unbelievable animation. Thank you. I suggest you show, behind the scenes, how you make the videos. That would be so useful for mankind, since teachers in every circumstance could then make good videos that help their students understand easily.
I've been waiting for a probability series for a while now! Glad to hear it's being worked on :D. Thanks for the amazing content, and as always, cheers for blowing our minds!
To me, I'd explain this to myself as "increasing the resolution of results to find the natural occurring standard normal distribution within it". It's simply impossible to do that with 2 options summing up, or 6 options as you've pointed out with the dice. But 50 would absolutely work. In other words, if I increase the resolution on the old blocky Mario from the late 80s on Nintendo, Mario suddenly becomes clearer like Mario from Nintendo games in the present. However, this isn't free. In Mario's case, we need more processing power, or in a weird way... time. In the case for results where something only occurs once a day, we need to wait around 50 days for one simple "roll of the dice sum" example you've shown. We'd need years of data in that case to start seeing any normal distribution. And at that point, are we even able to make the assumptions we need to which standard deviation on a bell curve allows us to, in time for it to matter? However, I wish you would have started with the 3.5 decimal you show at the end of the video for the dice, in the very beginning. It's very obvious to understand/feel that you can't roll a literal 3.5 on a die, so you need to find a way to "increase the resolution" by multiplying by 10 or 100.
I highly appreciate the math behind your videos. A video on Taylor's remainder theorem would complement your existing videos on Taylor series and enhance our intuition in calculus.
At 26:56, when you show the moving distribution, I cannot unsee the time evolution of a wavefunction representing a free electron (moving and spreading)! I'm just imagining the connection you made in that dice example with the wavefunction now.
A theorem that describes one of the most powerful tendencies in nature! Without it, things around us (possibly including us) would not be the same as we know them.
This is the best content that explains the CLT and all the concepts behind it, really outstanding! I would like to ask a question about something that confused me. In some real-world examples it has been explained that the CLT is used to find out some proportion of a population with some characteristic, for example the percentage of people within a certain height interval. If we think about a similar question in the dice-rolling example, I intuitively think that the distribution graph at @27:32 infers that the percentage of rolls that are 3 or 4 is higher than for other numbers if we roll the dice 100 times. I understand it actually means that the average of any hundred dice rolls being between 3.16 and 3.84 is more likely. But still, in the heights example, when we say that 40% of people are between 1.75 and 1.85 m tall, it can also be interpreted as: if we pick a person, the probability of his height being between 1.75 and 1.85 m is 0.4. Yet an assumption like this would be wrong in the dice example. Why don't these two things work together, or what is the point that I don't understand? It would be very informative if you could also give a small real-world example.
A statement like "40% of people are between 1.75 and 1.85 m tall" coming out of the CLT is only accurate if you started off with a normal distribution (in which case the mean has the same distribution shape as a single result). In a way there are 2 applications of the CLT hidden here: 1) You argue that a person's height, or say a measurement error in physics, is the result of a lot of independent factors whose influence averages out to a normal distribution due to the CLT (obviously the conditions do not strictly hold here, kinda like with the Galton board). 2) Then you can make a statement like this for a bunch of measurements by applying the CLT to the actual measurement results. In the case of a dice roll, 1) is clearly not the case, so the statement becomes clearly untrue; in the case of heights it is much more plausible that 1) at least approximately holds.
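A quick numerical illustration of the difference (all parameters arbitrary): a single die roll is uniform, so it lands on 3 or 4 only a third of the time, while the *average* of 100 rolls lands in [3.16, 3.84] about 95% of the time, because the CLT statement is about the mean, not about one observation.

```python
import random

random.seed(5)
TRIALS = 10_000

# A single die roll is uniform: "3 or 4" happens with probability 1/3.
single_hits = sum(random.randint(1, 6) in (3, 4) for _ in range(TRIALS)) / TRIALS

# But the average of 100 rolls concentrates tightly around 3.5.
def avg100():
    return sum(random.randint(1, 6) for _ in range(100)) / 100

mean_hits = sum(3.16 <= avg100() <= 3.84 for _ in range(TRIALS)) / TRIALS
```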
In industry we have a multitude of situations where this phenomenon occurs. Best known to everyone is rain distribution, or for instance seeds, but reactors for chemical production have a lot of distribution problems too. It is an interesting field
I've read the statements and proofs of the CLT and Law of Large Numbers many times, but only after watching this did I finally connect that the sum of N iid samples has standard deviation growing like sqrt(N). So, if you take the mean, then you end up with a sequence whose sd goes to zero, hence convergence in the LLN. Amazing explanation, thank you! (Thanks to Entropie for correcting me!)
Actually the variance of a sum of iid random variables grows with n. (maybe you meant the standard deviation) It is still true that the variance of the mean goes to 0 with rate 1/n nonetheless. The sqrt(n) specifically comes in since it is the rescaling necessary for stabilizing the variance at a constant value > 0 such that the convergence to a normal distribution of same variance becomes possible.
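The 1/sqrt(n) scaling of the mean's standard deviation described in this thread is simple to verify empirically, e.g. with fair coin flips (a minimal sketch; the sample sizes 25 and 100 are arbitrary):

```python
import random

random.seed(9)

def sd_of_mean(n, trials=20_000):
    # Empirical standard deviation of the mean of n fair coin flips (0/1).
    means = [sum(random.random() < 0.5 for _ in range(n)) / n for _ in range(trials)]
    mu = sum(means) / trials
    return (sum((m - mu) ** 2 for m in means) / trials) ** 0.5

# sd of the mean scales like 1/sqrt(n): quadrupling n should halve it.
ratio = sd_of_mean(25) / sd_of_mean(100)
```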
Awesome video. Thanks for posting. Even though I aced my stats classes, I had a huge gnawing discomfort. I'd always ask who came up with these formulae - what do they even mean. I was an outlier (pun intended). This video has finally quenched my curious thirst, after 25 years. Thanks.
Another nice theorem in probability is that pushing any continuous distribution through its own CDF gives a uniform distribution (the probability integral transform), and this is the root of generating random numbers from different distributions. Hope to watch a clear interpretation from 3B1B.
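A sketch of both directions of that idea, using Exponential(1) as an arbitrary example target:

```python
import math
import random

random.seed(6)

# Inverse transform sampling: push Uniform(0,1) samples through the inverse
# CDF of the target distribution. For Exponential(1), the inverse CDF of
# F(x) = 1 - exp(-x) is -ln(1 - u).
u = [random.random() for _ in range(50_000)]
exp_samples = [-math.log(1 - x) for x in u]
exp_mean = sum(exp_samples) / len(exp_samples)  # should be near 1

# The reverse direction (probability integral transform): applying the
# distribution's own CDF to its samples gives back Uniform(0,1).
back_to_uniform = [1 - math.exp(-s) for s in exp_samples]
frac_below_half = sum(x < 0.5 for x in back_to_uniform) / len(back_to_uniform)
```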
Interestingly, it is because of the CLT that some real-world phenomena behave as Gaussian: they are implicitly sums of i.i.d. random processes. Amazing video btw 😄
Professionally, in distributed computing, I encounter log-normal distributions far more frequently. I'd love a similarly elegant explanation of how log-normal distributions arise when the CLT assumptions are violated in just the right way. For instance when the number of terms is not infinite and X is extremely skewed. Or when Xi and Xj are not completely independent, but their influence is much greater when Xi is large. In other words, big values skew future sampling towards big values, while small values appear independent.
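One well-known route to log-normality, sketched as a hedged illustration (the factor distribution Uniform(0.5, 1.5) is an arbitrary stand-in, not a model of real latencies): when many independent positive factors *multiply* rather than add, the CLT applies to their logs, so the product tends toward a log-normal distribution.

```python
import math
import random

random.seed(7)

def product_of_factors(n=50):
    # Multiplying many independent positive factors: the CLT acts on the
    # *logs* of the factors, so the product tends toward log-normal.
    return math.prod(random.uniform(0.5, 1.5) for _ in range(n))

logs = [math.log(product_of_factors()) for _ in range(20_000)]
n = len(logs)
mu = sum(logs) / n
m2 = sum((x - mu) ** 2 for x in logs) / n
m3 = sum((x - mu) ** 3 for x in logs) / n
log_skew = m3 / m2 ** 1.5
# Skewness of the log-values near 0 means the product is roughly log-normal.
```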
Thank you for this! This deals with the biggest issue I had when introduced to statistics and probability in my undergrad degree. I hated it, because it always felt like a bunch of random formulas and manipulations thrown at us. I had to wait until postgrad to get a feeling for what was going on. Best part is that you managed to get there without measure theory 😄
Great
Please make a video on the Laplace transform
It would be cool to see the de Moivre-Laplace theorem (the original CLT) mentioned.
Cooool! I like the explanation of the WHY
Please consider doing an entire series on probability theory and/or combinatorics.
I second that!
Grant replied to a comment in his last video and said that it'd be surprising if he doesn't make it by next year.
@@harshsharma03 And I stand by that comment. I made this video in part with the intent of inserting it into that series.
@@3blue1brown It did seem that way. Thanks for the amazing work Grant, you've helped me more than I can put in words.
@@3blue1brown I'm waiting for you to do a series on theoretical inferential statistics ✨
High praise -- I watched in its entirety because of your comment. Thank you.
Yeah, meteorological variables often follow extreme value distributions. I remember this from taking a minor in applied agrometeorology while majoring in stats.
@@mlucasl I thought it was because the weather is more like what is described in chaos theory - small deviations lead to wildly different results. So they don't tend towards something nice like the Gaussian.
I can't tell you how insanely brilliant you are at taking a universal concept that is vaguely understood and illuminating all the nuance hidden in plain sight to make it understood on a higher level!!! Genius
Dealing with CLT pretty much every day here.
Really impressed with how easily you explain it.
By far the most intuitive and easily understood explanation of CLT.
Salute!
We are all dealing with the CLT every day, everywhere. That is Mother Nature's law :)
awesome! What field are you in if i may ask?
As someone who works with Kalman filtering on a regular basis, this is a very nice video to see. One of the core principles behind the Kalman filter is that all random variables involved must be Gaussian, which seems overly restrictive on the surface. I think this provides an excellent, succinct explanation for why that's actually a reasonable assumption for many systems, since every random process we can directly observe is really just a combination of many smaller processes. I look forward to the next one!
Yeah, I think it's worth remembering that assuming an RV is normally or log-normally distributed is a pretty minimal assumption, since you're basically only saying that your observations are the result of a linear combination of an unknown number of RVs that may or may not be orthogonal to each other, and that there's _some_ minimal number of RVs, depending on their individual distributions, that your measurements are in excess of. If you find that the distribution isn't normal, that actually gives you some information about the individual distributions themselves.
Rudolf E. be all, like, "Dude, the product or convolution of two Gaussian PDFs is Gaussian."
@@Eta_Carinae__ Some types of data can be assumed to be normally distributed, but not all. Some data is naturally uniformly distributed. Other data is naturally exponentially distributed. For instance, let's say I looked at the distances of home runs in the MLB. That is certainly not normally-distributed, since a lot of home runs are very close to the minimum possible distance. Or let's say I looked at the speeds of atoms of an ideal monatomic gas in thermodynamic equilibrium. These won't be normally-distributed. In fact, they will have a χ distribution with 3 degrees of freedom. Or how about gasoline usage? A lot of the population would be around 0, while the rest would probably look roughly normally-distributed, because a lot of people don't own a car. It's generally not a good idea to just assume data should be normally distributed because it depends on many different factors. Those factors are not necessarily equally important or identically-distributed or independent at all.
Typically, you can expect to find normally-distributed data when measurements can span a very large range of values relative to the standard deviation, when no particular special values are preferred, when the distribution should be symmetric with respect to the mean, and when data is clustered around the mean. In other words, it's normal if it's normal.
Or if you combine the data, e.g, by computing the mean or sum, which can often simplify modelling.
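To see both sides of this thread in one place, here's a rough sketch in Python (sample sizes and the choice of an exponential are arbitrary): raw skewed data stays skewed, but means of batches flatten toward a bell curve.

```python
import random
import statistics

random.seed(0)

def skewness(xs):
    # Sample skewness: third standardized moment.
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([(x - m) ** 3 for x in xs]) / s ** 3

# Raw exponential data: strongly right-skewed, nowhere near normal.
raw = [random.expovariate(1.0) for _ in range(20000)]

# Means of batches of 50 draws: the CLT pushes the skew toward 0.
batch_means = [statistics.fmean(random.expovariate(1.0) for _ in range(50))
               for _ in range(2000)]

print(round(skewness(raw), 2))          # near 2, the exponential's skewness
print(round(skewness(batch_means), 2))  # much closer to 0
```

So "combine the data" really does buy you approximate normality, even when the underlying data is anything but.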
As an actuary, I'd say this is perhaps the best descriptive video/lecture I've ever seen on the CLT. I wish I had seen this when taking my classes for exam P, because the visualizations are so useful in understanding what can be a very dense topic when it's spewed hastily from a chalkboard/overhead screen.
Describing the mean of the weights as the center of mass of the distribution was just incredible. And the intuitive matrix multiplication without even mentioning it. You are a great teacher!
I would absolutely love to see a series on probability/combinatorics/statistics on this channel. It's the subject I've struggled the most with in math by far. I think your ability to take the time to really think through and understand what the basic building blocks really mean will become a very valuable resource in my and many other people's math journeys.
I studied probability and statistics in university and learned about the central limit theorem, then totally forgot it. When I saw this video title I knew I had heard of it, but it took a while to remember it, for the first time in probably twenty-five years. Thank you for explaining it much better than our textbook did!
Seriously, imagine if all our stem subjects had teaching material like this.
Then imagine if we made it and exported it to the world.
Ngl, I could see China or India doing it, and then half the world learning from Chinese- or Indian-sourced lesson plans and curricula. Or, like, Finland. Hopefully Canada.
As I was watching this video, I wanted to say that videos from your channel inspire me to learn. Not just mathematics, but anything worth doing. Although I am an engineer and enjoy doing what I do, I have never been a huge fan of pure mathematics. But the way you explain concepts just makes it so easy to understand. Even though I might have to rewatch some videos to fully comprehend the meaning, I really enjoy it and it never feels like a chore. I have watched your videos more than my university lectures. I wish there were more teachers like you in this world. Thank you so much 3Blue1Brown!
I really like your style of teaching. The way you help us discover things by slowly unrevealing it, instead of just telling the result, is awesome.
Like when you were building the formula for the bell curve: you just started with e^x and then slowly, step by step, by encountering problems and then solving them, you finally reached the formula. That was an awesome mathematical journey.
And I enjoyed the ride! Woohoo!
One of the most interesting things I've learned in my math undergrad so far is that Brownian motion follows a normal distribution over time (at least, this was shown in the context of diffusion), which you elegantly explained in the first few minutes of the video. We had derived the diffusion equation from a formula modelling simple Brownian motion. I had never seen the connection between abstracted physical science and pure probability theory until then. Great topic!
I am constantly impressed by how Grant's videos extract the art that is inherent in certain mathematical concepts. What a great video!
A series on probability and statistics would be awesome! Everyone in my university hates prob & stat because our teachers are pretty bad, but I'm sure you could explain it really well.
This video is a great presentation of some of the most important ideas.
I could have really used this video before taking one of my hardest classes: college senior level probability and statistics, which I took in 1973.
All quarter, I kept asking myself: what were the prerequisites that I was supposed to take but must have missed? I have used this information almost every day of my life since then. The world would be so much better off if it were a required class, just as freshman algebra is. Alas, maybe we will be able to teach it better in the future.
You are a crazy good educator, my friend. This video was a work of art, masterfully crafted, delightfully beautiful while still highly informative and surprisingly understandable on many levels. Thank you very much for it! You're very talented and experienced at highlighting the main concepts after building them up perfectly, while hinting at a couple of very interesting consequences or more complex aspects coming up later, balancing these with immaculate skill. Hats off to you!
When I started watching this channel, the things he was explaining were completely new to me, and I was watching to learn those new things.
Now, after so many years, a few exams away from a degree in software engineering, I'm still watching these videos, not because I don't know the subject, but because I'm sure he is going to get to conclusions in such a human and reasonable way, giving lots of insights and new points of view that I surely never got in a university course...
Damn, I love this channel ❤️
The CLT is classic and beautiful. What's more mind-boggling is that nowadays you can drop the identically distributed assumption and still get a general CLT (Lindeberg, Lyapunov). Probabilists keep finding versions of the CLT in different settings that do not even converge to a Gaussian, but to a different distribution, like Tracy-Widom, Wigner's semicircle, etc.
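Not a proof of Lindeberg's condition, of course, but it's fun to watch the "identically distributed" assumption drop away in a quick simulation (the particular mix of distributions below is made up):

```python
import random
import statistics

random.seed(1)

def mixed_sum():
    # Independent but NOT identically distributed terms.
    total = 0.0
    for i in range(60):
        if i % 3 == 0:
            total += random.uniform(-1, 1)
        elif i % 3 == 1:
            total += random.expovariate(1.0) - 1.0  # centered exponential
        else:
            total += random.choice([-0.5, 0.5])     # fair coin
    return total

samples = [mixed_sum() for _ in range(5000)]
mu = statistics.fmean(samples)
sd = statistics.pstdev(samples)

# For a Gaussian, about 68.3% of the mass lies within one sd of the mean.
within_1sd = sum(abs(x - mu) < sd for x in samples) / len(samples)
print(round(within_1sd, 2))
```

Three completely different distributions in the sum, and the one-sd mass still lands right where a Gaussian puts it.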
Can't wait to see 3b1b's take on the computation of the Gaussian integral, still one of the craziest places for pi to show up (maybe second to the Basel problem which he already covered). Even though the trick is very well-known, I am sure he'll have something new to say. Happy pi day!
Depends on the method I guess
WAIT, so normal distribution of all normal distributions is a normal distribution?
I just learned about the Central Limit Theorem in my AP Statistics class, but my teacher didn't explain why it was true. Thank you, as always, for teaching at a deep level but still making it understandable 🙏
I feel like this video -- hell, this channel -- can be held up as an example of what good can come from the internet.
What I find great about easy-to-access videos like yours is that they'll make it easier for anyone to understand the intuition behind what they learn at school. Over time, I think the overall level of everyone will increase thanks to that, and we'll have more and more people who can make these fields progress
It might be a bit of an idealistic view, but I sure hope it's true in the long run
Your assumption is probably correct. I slogged through statistics over 40 years ago and never got the intuitive feel, despite some good teachers. After a few years working in statistics, I lacked the confidence to continue, and switched tracks entirely (to translation). While the best minds will grasp this field quickly, the rest would benefit from seeing it from other angles, whereupon understanding might click.
There are very few results as beautiful as the central limit theorem. Thanks so much for the explainer vid!
As an AP Stats student in high school, all I have to say thank you this is amazing.
This couldn't have come at a better time. We just hit the CLT a couple of weeks ago in my engineering probability class. Your videos are always my go-to for a deeper understanding of the material, and I would say anyone not watching 3B1B is at a disadvantage in STEM. Unmatched visuals and eloquent explanations. Thank you Grant.
Probability and statistics are probably my weakest points in math (math that I've specifically learned about in school, anyways) so a full series would be great. Also this is a really good video as usual and I found it to be pretty easy to understand. Of course I would need practice and to re-watch some bits to clear some areas of misunderstanding I have but that's not an issue. Overall, this was very engaging!
I feel like this is one of those videos where I will be pausing more than watching
I could show this video to my dog and he would understand it. David Hilbert would surely be impressed with your teaching abilities. Thanks for bringing a smile to this old face:-)
That was the best description and comparison of the difference between variance and standard deviation that I have ever seen.
The graphical depiction of variance (as a square shape) versus standard deviation (the square root of variance), producing a line, was a revelation to me.
Hey Grant, awesome video! One point, however: I believe at 7:15 you meant to say that the distribution is skewed slightly to the right, as using skew as a descriptor should follow the direction of the tailing data, not the direction of the majority of the data. That is, the direction the mean is being "pulled."
You don't know how glad I am that you made a video on this topic. I have been trying to understand the subject for a long time and this video helped me in an incredible way
Just a curiosity: almost everything in telecommunications depends on this theorem! It is extremely important!
Absolutely, and everyday I see new applications of it in the world.
this is by far one of the most MUST KNOW channels on youtube, actually on the whole internet.
Wow, the level of insight that can be gained by watching your videos is truly astonishing. Also, thank you for making me realise that "dice" can also be pronounced as "die" in English
A single one is called a "die" but two or more are called "dice", pronounced how you'd expect. Native English speakers frequently mess those terms up though, usually defaulting to calling everything "dice".
Yep. One die. Two dice. 🎲
Like one mouse. Two mice. 🐭
It can't be expressed in words just how grateful I am to you for the work you do. Thank you, Grant
Your videos are appreciated all over the world 🌍
Best greetings from Germany 🇩🇪
I'm very excited to hear the idea for a video delving into variance.
Explaining my trouble with variance: looking at the exponential function, we could absolutely choose to use 2^x or 10^x everywhere and just live with the correction terms ln(2) and ln(10) showing up (sorta like the π vs τ dealio). Out of convenience, e makes a 'better' base. But I can point out a number of 'deeper' reasons to use e as the base for exp than just convenience, and I can point to enough such reasons that using any other number seems 'wrong.'
Contrasting this with variance, I'm aware that taking the square of the differences (x-μ) is more convenient, but I can't tell you why it's the 'obvious' or 'correct' choice based on that 'deeper' reasoning. Maybe |x-μ| and |x-μ|^3 use abs() which isn't smooth, but then why not (x-μ)^4, or any other even power? On a desert island, building stats from the start, I don't know how to make that choice for (x-μ)^2 well motivated.
This is one thing I'm still confused about after watching the video as well. Hopefully it becomes clearer in the next video
this is probably not the right way to motivate it but perhaps it has something to do with using the “root mean square” (before Bessel’s correction) rather than the arithmetic mean as the “average” deviation?
idk im clearly talking out of my ass here and would love to see this properly explained in a future video
or, another thought; maybe think of variance geometrically, as the square of the n-dimensional “distance” between the point (mean,mean,mean,…,mean) and the point whose coordinates are your data points?
@@elrichardo1337 I don't know why I never saw it, it's pretty glaring now that you mention it, but I think you're onto something with the idea of a norm. Almost like I'm asking "if any p-norm works, why do we choose p=2, the Euclidean norm?" For geometry in a flat space, I know (more or less) how to answer that question. If this translates cleanly to variance in stats, I'ma be annoyed that I haven't seen it before.
@@nylonco7134 It's completely to do with a Euclidean norm. In fact, if you think about it, the standard deviation is exactly the Euclidean norm on the space of centered (mean=0) random variables (up to equality almost everywhere if you know these sorts of things, otherwise don't worry about it).
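That identification is easy to check numerically. A tiny sketch (the data here is made up): the population standard deviation is exactly the Euclidean norm of the centered data vector, scaled by 1/sqrt(n).

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = statistics.fmean(data)
centered = [x - mean for x in data]

# Euclidean norm of the centered data vector, scaled by 1/sqrt(n),
# equals the population standard deviation exactly.
norm_based = math.hypot(*centered) / math.sqrt(len(data))
print(norm_based)                 # 2.0
print(statistics.pstdev(data))    # 2.0
```

So choosing variance over |x-μ| or (x-μ)^4 really is choosing the p=2 norm, with all the geometric niceties (inner products, Pythagoras for independent variables) that come with it.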
This is the best explanation for the use of the standard distribution I've ever seen
This is my favorite maths video ever. Another spectacular piece of work, Grant! Hope your 2023 is happy and fulfilling!
Where can we make these mathematical illustrations?
Look up ‘manim’.
I remember when spreadsheets became a "thing" - Lotus123 - yeah.. I know.. I'm old... and playing around with the dice example in this video and how much it helped me understand Probability and Statistics. This video brings back memories of those days. Being able to visualize some of this stuff makes it so much more intuitive and makes me admire those geniuses from the past who figured it all out without the computing power we have today.
This channel is becoming more and more essential for my studies. Basically, if I'll manage to get my degree on time it will be thanks to you.
I think I ran into the CLT in the wild while trying to understand Stable Diffusion model training. I have two models with the sole difference being the weight loss function, and I wanted to quantify how this affects the models' ability to reproduce the likeness of the trained subject.
To do this, I generated hundreds of random images using each model, randomly batched these ensembles into smaller sets, and calculated a "closeness value" for each batch with respect to a given reference image. By using a fixed set of reference images I'm able to generate a large number of random data sets of closeness values for assessment. Here's where the CLT may have shown up.
I then plotted these closeness values in histograms to get a sense of how they are distributed for each batch. These small batch sized distributions were rather asymmetrical, and I didn't expect anything to happen when I combined data from multiple batches (for a given model). But, lo-and-behold, as I added more data to each histogram the shapes began to become much more symmetric, and bell shaped. As soon as I noticed this, this specific 3Blue1Brown video shot into my mind like a bolt of lightning as I began to finally feel the CLT in my bones.
One way or the other (need more data), even if I'm imagining it, moments like this are what keep me getting up at the crack of dawn. Thank you very much for the hard work, time, and effort you put into your videos. It has been almost a year since I first watched this one, but its effect has stayed with me this entire time. :D
I am an Electrical engineer and studied probability and statistics in Signals and Systems coursework. I wish we had such an intuitive explanation at the time!
Last year I looked at the electricity consumption of a large factory, with many processes happening at once, but with random variations. I was amazed when drawing a histogram of the frequencies of the difference values of the electrical demand, that the shape of the histogram was very close to a bell curve, except for the spike at zero that corresponds to electrical outages. The processes aren't even completely independent.
When I clicked on this video, I thought it would be about the Central Limit Theorem from calculus, so was surprised at its length. I didn't realize there was an identically named theorem in statistics! I've used the standard distribution a fair bit in the realm of data clustering and compression, but didn't understand all the nuance. This really expanded my understanding; great job!
This has been the most exciting math video I've ever watched! I'm in engineering, and I've always hated statistics because it's so unintuitive. You just plug values in and get values out, and you memorize what they mean. For the first time, I actually understand why we do the things we do in stats. I especially like how this video implicitly explains why we need a minimum number of samples. Really great video
It's interesting that all of the ideas I came up with while learning statistics in college are being shown here. My teacher was much less than adequate and only read from the textbook, but I think that might have actually helped me understand distributions, as it forced me to learn on my own.
I would love to have been able to show this video to my friends who struggled in that class. but, looking at how math-focused this video was, I doubt it would have helped them much.
This is by far the best video I have seen on the CLT so far. This made me subscribe to the channel!
This helped me understand the CLT so much. As a medical professional, we don't go into details like this, but this is really helpful. Thanks!
That formula for the standard distribution is truly beautiful and now that I know why all those factors are in there, keeping it "in place", it's so much easier to memorise!
Statistics graduate student here. Really well done. It's great seeing the fundamentals displayed so cleanly.
My only critique is to remind viewers earlier of what makes a valid distribution, when you first start talking about it again, given how prominent that definition is here.
Reminds me how amazing this theorem is from grad school, the visuals are fantastic. Thanks Grant!
Thanks Grant, I've wanted to see probability/statistics videos on your channel for years, and I was just doing a refresher on the topic and found your new videos!
I think you should combine them in a statistics playlist just like your "Essence of" series
Most clear explanation of CLT I have ever seen
Just a brilliant description of basic(?) probability. I thought I understood this already... I didn't. Thanks, so much, for taking the time to put this video together... I can't imagine how long it took... what a labor of love.
It blows my mind that all this is freely available for anyone to watch 🤯
3B1B never disappoints. I never figured out the derivation of Normal Distribution...
In no statistics course I have taken have I learned how the formula of the standard normal distribution is derived. It seemed teachers either did not know or saw it as a "given". I have therefore always viewed the normal distribution as unnecessarily complex and "unfathomable" (and as a consequence, hard). Now after this video it is clear as day. I love the explanation.
Thank you for this great video. It was greatly intuitive and engrossing. Didn't even feel that 30 minutes had passed by the end of the video.
Thank you for this video. I always wondered why people talk about how important the CLT is.
I'm thinking about a die where the face with the value "1" has a probability of 1 and the other faces have a probability of 0. There is no bell curve anymore, just a single spike, no matter how often I roll it. Now it doesn't matter how I manipulate the other factors in the experiment. I forced a certain pattern to appear by manipulating just the probability distribution. I can force other patterns by manipulating the experiment even further. The bell curve happens to be an easy one to produce.
*Math does not care about whether you manipulate the experiment in "this way" or "that way".*
But humans do. "You like bell curves? - You can make it happen!" How about making waves by adding evenly spaced holes where the balls come out? If you take away your abilities of manipulating the experiments then it becomes harder and harder to enforce a bell curve. You only need a handful of changes to make it happen though. This video does a really good job at explaining how you can do that. You can use the insights in this video to bias yourself or debias yourself.
I'm sorry. Sometimes I'm thinking backwards. I mean, we know intuitively that deeper understanding can emerge from changing the perspective. But we tend to forget that we prefer certain perspectives.
I learned this in high school, university and grad school. But this is the first time I felt like I understood it. Amazing work GS!
I did pick up one of these once you showed it off!
And, as I was driving around doing chores, I finally realized that the binomial distribution is a discrete version of the Gaussian distribution. So many things that I didn’t learn (or was sleeping) during school 😂
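That realization is easy to see numerically; here's a small sketch (n = 100 coin flips, chosen arbitrarily) comparing the binomial pmf against a Gaussian pdf with matching mean and standard deviation:

```python
import math

def binom_pmf(n, k, p=0.5):
    # P(exactly k heads in n fair-ish flips)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

n, p = 100, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))  # mean 50, sd 5

# Near the center, the binomial pmf tracks the Gaussian pdf closely.
for k in (40, 45, 50, 55, 60):
    print(k, round(binom_pmf(n, k), 4), round(normal_pdf(k, mu, sigma), 4))
```

The two columns agree to about three decimal places near the mean, which is the de Moivre-Laplace special case of the CLT in action.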
The reason why CLT for sums and CLT for means behave differently in a sense was one of my favorite observations, the intuition is so elegant yet it's fundamental for quite a lot of science, the hints to ponder are all in the video but if anyone wants a more concrete direction - where does the N, number of rolls, show up in the mean of the sum of rolls? Where in its standard deviation? And what does it mean to take a mean rather than a sum?
Eagerly waiting for "Essence of Probability and Statistics" series...it will change the lives of many data professionals like me.
Also, Grant, consider adding paid YouTube membership to your channel; with a couple of clever badges and emoticons, you'll get more support along with Patreon.
I don't know how you come up with these examples, but they totally hit home. I have never understood this theorem in this way 😮
Oh man! I bought that same Galton board for my math classroom at MoMath when I visited last year! I wish I had run into you there; that would have been awesome. Your videos were incredibly inspirational for the way I taught, so thank you for that! I have a new job now, but I think I'll be able to carry over a lot of the inspiration.
Amazing video! I wish I could have watched this series of videos 20 years ago.😄
I have been in this business for so long... But you keep surprising me with these playful approaches. Thank you!
Honestly man I really appreciate your videos. I'm going to get a job just so I can support your channel.
This channel always shows the best way of making people understand how things work, through unbelievable animation.
Thank you.
I suggest you show behind the scenes how you make these videos.
That would be so useful for mankind.
Then teachers in every circumstance could make good videos to help their students understand easily.
I've been waiting for a probability series for a while now! Glad to hear it's being worked on :D.
Thanks for the amazing content, and as always, cheers for blowing our minds!
Always a good day when 3Blue1Brown posts
this video is a piece of art. absolutely amazing quality
The sound of the ball bouncing is extremely satisfying...
This is one of my favorite videos so far, the clarity of your explanations is astounding!
This is... WOW... Math really is mysterious yet our mind can actually grapple it, tantalizing, Thanks Grant!
I have never commented on any video before, but I think this video is worth a thank you and a congratulations. So well explained.
To me, I'd explain this to myself as "increasing the resolution of results to find the natural occurring standard normal distribution within it".
It's simply impossible to do that with 2 options summing up, or 6 options as you've pointed out with the dice. But 50 would absolutely work.
In other words, if I increase the resolution on the old blocky Mario from the late 80s on Nintendo, Mario suddenly becomes clearer like Mario from Nintendo games in the present.
However, this isn't free. In Mario's case, we need more processing power, or in a weird way... time. In the case for results where something only occurs once a day, we need to wait around 50 days for one simple "roll of the dice sum" example you've shown.
We'd need years of data in that case to start seeing any normal distribution. And at that point, are we even able to make the assumptions we need to which standard deviation on a bell curve allows us to, in time for it to matter?
However, I wish you would have started with the 3.5 decimal you show at the end of the video for the dice, in the very beginning. It's very obvious to understand/feel that you can't roll a literal 3.5 on a die, so you need to find a way to "increase the resolution" by multiplying by 10 or 100.
Superb explanation. This video helped me understand the concepts easily. Thanks a lot
I wish there had been videos like these during my stats courses in uni. Really great quality. 👏🏻
just two weeks before my econometrics exam, perfect timing!
I highly appreciate the math behind your videos. A video on Taylor's remainder theorem would complement your existing videos on Taylor series and enhance our intuition in calculus.
23:49 "The more things change, the more they stay the same" the math interpretation of this English idiom.
I am a student in Statistics, thank you for this awesome video!!!
clearly explained like a real teacher
At 26:56, when you show the moving distribution, I cannot unsee the time evolution of a wavefunction representing a free electron (moving and spreading)! I'm just imagining the connection you made in that dice example with the wavefunction now.
A theorem that describes one of the most powerful tendencies in nature!
Without it, things around us (possibly including us) would not be the same as we know them.
This is the best content explaining the CLT and all the concepts behind it, really outstanding! I would like to ask about something that confused me. In some real-world examples, it is explained that the CLT is used to find the proportion of a population with some characteristic, for example the percentage of people within a certain height interval. If we think about a similar question in the dice-rolling example, I intuitively read the distribution graph at 27:32 as saying that the percentage of rolls that are 3 or 4 is higher than for other numbers if we roll the dice 100 times. I understand it actually means that the average of any hundred dice rolls is more likely to be between 3.16 and 3.84. But in the population-heights example, when we say that 40% of people are between 1.75 and 1.85 m tall, it can also be interpreted as: if we pick a person, the probability of their height being between 1.75 and 1.85 m is 0.4, whereas an assumption like this would be wrong in the dice example. Why don't these two things work together, or what is the point I don't understand? It would be very informative if you could also give a small real-world example.
A statement like "40% of people are between 1.75 and 1.85 m tall" coming out of the CLT is only accurate if you started off with a normal distribution (in which case the mean has the same distribution shape as a single result). In a way, there are 2 applications of the CLT hidden here:
1) You argue that a person's height, or say a measurement error in physics, is the result of a lot of independent factors whose influence averages out to a normal distribution due to the CLT (obviously the conditions do not strictly hold for this one, kinda like with the Galton board)
2) Then you can make a statement like this for a bunch of measurements by applying the CLT on the actual measurement results.
In the case of a dice roll, 1) is clearly not the case, so the statement becomes clearly untrue; in the case of heights, it is much more feasible that 1) is at least approximately the case.
In industry we have a multitude of situations where this phenomenon occurs. Most familiar to everyone is rain distribution, or for instance seeds, but reactors for chemical production have a lot of distribution problems too. It is an interesting field
I've read the statements and proofs of the CLT and Law of Large Numbers many times, but only after watching this did I finally connect that the sum of N iid samples has standard deviation growing like sqrt(N). So, if you take the mean, then you end up with a sequence whose sd goes to zero, hence convergence in the LLN. Amazing explanation, thank you!
(Thanks to Entropie for correcting me!)
Actually the variance of a sum of iid random variables grows with n. (maybe you meant the standard deviation)
It is still true that the variance of the mean goes to 0 with rate 1/n nonetheless.
The sqrt(n) specifically comes in since it is the rescaling necessary for stabilizing the variance at a constant value > 0 such that the convergence to a normal distribution of same variance becomes possible.
@@entropie-3622 yes! I did mean sd. Thank you for pointing that out, I'll update.
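This thread's sqrt(n) bookkeeping is easy to verify empirically. A quick dice simulation (trial counts picked arbitrarily): the sum's sd grows like sqrt(n), so the mean's sd shrinks like 1/sqrt(n), which is the LLN convergence being discussed.

```python
import math
import random
import statistics

random.seed(2)
SIGMA_DIE = math.sqrt(35 / 12)  # exact sd of one fair die, about 1.71

def sd_of_sum(n, trials=5000):
    # Empirical sd of the sum of n die rolls, over many trials.
    sums = [sum(random.randint(1, 6) for _ in range(n)) for _ in range(trials)]
    return statistics.pstdev(sums)

for n in (4, 16, 64):
    est = sd_of_sum(n)
    # Columns: n, measured sd of sum, predicted SIGMA_DIE * sqrt(n),
    # and the sd of the MEAN (est / n), shrinking like 1/sqrt(n).
    print(n, round(est, 2), round(SIGMA_DIE * math.sqrt(n), 2), round(est / n, 3))
```

The measured column tracks SIGMA_DIE * sqrt(n) within a couple of percent, while the last column marches toward zero.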
Thank you.. This is so helpful. Not just this one..... You rock in every single video..... Thank you again
Awesome video. Thanks for posting. Even though I aced my stats classes, I had a huge gnawing discomfort. I'd always ask who came up with these formulae - what do they even mean. I was an outlier (pun intended). This video has finally quenched my curious thirst, after 25 years. Thanks.
Another nice theorem in probability is that pushing any continuous distribution through its own CDF gives a uniform distribution, and this is the root of generating random numbers from different distributions. Hope to watch a clear interpretation from 3B1B.
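The theorem being gestured at here is the probability integral transform, and the "root of generating random numbers" is inverse-transform sampling. A minimal sketch with an exponential (the rate is chosen arbitrarily):

```python
import math
import random
import statistics

random.seed(3)

# Inverse-transform sampling: if U is uniform on (0,1), then F^-1(U)
# has CDF F. For Exp(lam), F^-1(u) = -ln(1 - u) / lam.
lam = 2.0
samples = [-math.log(1 - random.random()) / lam for _ in range(50000)]
print(round(statistics.fmean(samples), 2))  # should approach 1/lam = 0.5

# And the reverse direction: pushing the samples through their own CDF,
# F(x) = 1 - exp(-lam * x), lands them back on a uniform distribution.
u = [1 - math.exp(-lam * x) for x in samples]
print(round(statistics.fmean(u), 2))        # should approach 0.5
```

The same two lines work for any distribution whose CDF you can invert, which is why it's the workhorse behind so many random number generators.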
Interestingly, it is because of the CLT that some real-world phenomena behave as Gaussian. They are implicitly sums of i.i.d. random processes. Amazing video btw 😄
Professionally, I encounter log-normal distributions far more frequently (distributed computing). I'd love a similarly elegant explanation of how log-normal distributions arise when the CLT assumptions are violated in just the right way.
For instance when X is not infinite but extremely skewed.
Or when Xi and Xj are not completely independent, but their influence is much greater when Xi is large. In other words, big values skew future sampling towards big values, while small values appear independent.
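One classic route to log-normals, sketched below under made-up parameters: when effects multiply instead of add, the CLT applies to the logs, so the product ends up approximately log-normal.

```python
import math
import random
import statistics

random.seed(4)

# Multiplicative analogue of the CLT: multiplying many small positive
# factors is adding their logs, so the product tends toward log-normal.
def product_of_factors(n=100):
    prod = 1.0
    for _ in range(n):
        prod *= random.uniform(0.9, 1.1)  # arbitrary positive factors
    return prod

logs = [math.log(product_of_factors()) for _ in range(4000)]

# The logs should look roughly Gaussian: ~68% within one sd of the mean.
mu, sd = statistics.fmean(logs), statistics.pstdev(logs)
within = sum(abs(x - mu) < sd for x in logs) / len(logs)
print(round(within, 2))
```

Latency in a chain of services is a plausible real-world version of this: each hop scales the total rather than adding a fixed amount, which fits the log-normal shapes you see in distributed computing.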
Thank you for your super clear explanation of the standard distribution function. I've always wondered where it comes from!
Thank you for this! This deals with the biggest issue I had when introduced to statistics and probability in my undergrad degree. I hated it because it always felt like a bunch of random formulas and manipulations thrown at us. I had to wait until postgrad to get a feeling for what was going on. The best part is that you managed to get there without measure theory 😄