You are not aware how much I appreciate this video
❤️❤️❤️🙏🙏🙏
Excellent delivery. Well articulated.
I can't express how much I appreciate this video
❤️❤️❤️🙏
Very well done. Rarely encounter something so fundamental so simply explained.
❤️❤️❤️
You literally opened my eyes. I had searched through many explanations on the internet of how to derive this, and I finally get it.
❤️❤️❤️
Sounds very aggressive
In the video at 15:05 there should be 1/2. The (-1/2) appears only after integrating the exponential e^(-u).
Thanks
Excellent!
very thorough nice video!
The differential equation is interesting, but to me, the defining feature is sum stability. Namely, the convolution of two normals is a normal. Stability is necessary for an attracting fixed point, and while that is not a proof of the CLT, it already makes the normal distribution a strong candidate for the limit.
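The stability claim can be checked numerically. The sketch below is only an illustration: the helper names `normal_pdf` and `convolve_at`, the parameter values, and the integration range are all assumptions chosen for the example. It approximates the convolution integral with a midpoint rule and compares it to a normal density whose mean is the sum of the means and whose variance is the sum of the variances.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def convolve_at(z, mu1, s1, mu2, s2, lo=-30.0, hi=30.0, n=6000):
    """Midpoint-rule approximation of the convolution integral of f1(t) * f2(z - t) dt."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        total += normal_pdf(t, mu1, s1) * normal_pdf(z - t, mu2, s2)
    return total * h

# Hypothetical parameters, chosen only for illustration.
mu1, s1, mu2, s2 = 1.0, 2.0, -0.5, 1.5
for z in (-3.0, 0.5, 4.0):
    numeric = convolve_at(z, mu1, s1, mu2, s2)
    closed = normal_pdf(z, mu1 + mu2, math.sqrt(s1**2 + s2**2))
    print(f"z={z:+.1f}  numeric={numeric:.6f}  closed form={closed:.6f}")
```

The two columns should agree closely, which is what "the convolution of two normals is a normal" predicts.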
I really appreciate your effort
The natural emergence of the normal distribution from practically any sort of underlying random process is one of the most important parts of really understanding how the world works. The "shape" of the individual low-level process doesn't matter much at all - it's the way the aggregation of large numbers of them behaves that leads to the normal distribution.
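A small simulation can illustrate this. It is a sketch under assumptions that are not from the video: the low-level process is taken to be an exponential random variable (deliberately skewed), 50 of them are summed per sample, and the function name `standardized_sum` is made up for the example.

```python
import math
import random

def standardized_sum(n, rng):
    """Sum n Exp(1) draws (mean 1, variance 1) and standardize the total."""
    total = sum(rng.expovariate(1.0) for _ in range(n))
    return (total - n) / math.sqrt(n)

rng = random.Random(0)  # fixed seed, only so the run is repeatable
samples = [standardized_sum(50, rng) for _ in range(20000)]

# For a standard normal, about 68.3% of the mass lies within one standard deviation.
within_one_sd = sum(abs(z) < 1.0 for z in samples) / len(samples)
print(f"fraction within one standard deviation: {within_one_sd:.3f}")
print("standard normal value: 0.683")
```

Even though each individual draw is strongly skewed, the aggregated sums should land close to the normal value.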
Very well said❤️
Thank you for your precious time.
❤️❤️❤️🙏
God bless you
Very insightful derivation of the Normal distribution. Thanks.
❤️❤️❤️
May I ask why the integral of e^-u is not -e^-u? Where did the negative sign go? Thank you for the great video!
I think there is actually another error in the video which cancels out the one you are talking about. At 15:01 I don't think there should be any minus sign when factoring out the 1/2. Then, when he integrates e^-u, he also drops the minus sign there, so the two perfectly compensate. I think he simply made a mistake while shooting the video. Don't hesitate to tell me if my explanation is wrong.
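For reference, here is how the signs work out in the standard polar-coordinate evaluation of the Gaussian integral. This is a sketch that assumes the integrand is e^{-r^2} and the substitution is u = r^2; the constants used in the video may differ.

```latex
\begin{align*}
I^2 &= \int_0^{2\pi}\!\int_0^{\infty} e^{-r^2}\, r \,dr\,d\theta,
  \qquad u = r^2,\quad du = 2r\,dr \;\Rightarrow\; r\,dr = \tfrac{1}{2}\,du \\
\int_0^{\infty} e^{-r^2}\, r\,dr
  &= \tfrac{1}{2}\int_0^{\infty} e^{-u}\,du
   = \tfrac{1}{2}\Big[-e^{-u}\Big]_0^{\infty}
   = \tfrac{1}{2}\big(0 - (-1)\big) = \tfrac{1}{2} \\
I^2 &= 2\pi \cdot \tfrac{1}{2} = \pi
\end{align*}
```

The antiderivative of e^{-u} is indeed -e^{-u}; the minus sign disappears only when the limits are evaluated, and the factor in front stays at +1/2.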
Thanks
Wow, that was surprisingly way simpler than I imagined. Next do the Boltzmann distribution please!!
I second this!
How would you know this differential equation describes a normal distribution without already knowing the formula for a normal distribution? You are not actually deriving the normal distribution from a differential equation; you are using circular reasoning: knowing in advance what a normal distribution is, you arrive at a differential equation it satisfies and then use that to justify the formula you already know. Deriving it means arriving at the formula for a normal distribution without knowing it in advance.
No
No, the differential equation comes from the desired qualitative definition. We want a function whose frequency falls off proportionally to its distance from the mean. Without knowing the function, it can be reasoned that it will look something like a bell curve, with positive probability density everywhere, but rapidly falling away from a center peak value.
If you let y be the distribution, M the mean, and x be the data values, then the question translates to: "For what function y is it true that the rate of change of y at a given point is proportional to the distance we currently are from M?"
Directly writing this as a differential equation, you get what is in the video. Solving it gets you the specific equation.
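Written out, the steps look roughly like this. The sketch assumes the equation has the form dy/dx = -k(x - M)y with k > 0, which is what the separation-of-variables questions elsewhere in this thread suggest; the symbols A, C, and sigma are introduced here for illustration.

```latex
\begin{align*}
\frac{dy}{dx} &= -k\,(x - M)\,y
  \quad\Longrightarrow\quad \frac{dy}{y} = -k\,(x - M)\,dx \\
\ln y &= -\frac{k\,(x - M)^2}{2} + C
  \quad\Longrightarrow\quad y = A\,e^{-k (x - M)^2 / 2}, \qquad A = e^{C} \\
1 &= \int_{-\infty}^{\infty} y\,dx = A\sqrt{\frac{2\pi}{k}}
  \quad\Longrightarrow\quad A = \sqrt{\frac{k}{2\pi}} \\
k &= \frac{1}{\sigma^2}
  \quad\Longrightarrow\quad
  y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - M)^2 / (2\sigma^2)}
\end{align*}
```

The third line is where the Gaussian integral is needed, and identifying k with 1/sigma^2 follows from computing the variance of the resulting density.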
Silly. It is not circular reasoning to show that the solution to a particular differential equation results in the normal distribution. It provides much insight, in fact. If someone “derives” Newton’s laws from Lagrangian mechanics, it’s not circular reasoning to note that it results in Newton’s laws, which you know in advance. One has then shown equivalent formulations which are far from obvious.
The differential equation was set up to interpolate the binomial and Poisson distributions as n tends to infinity. The main reason was that, for large n, the binomial and Poisson distributions involve factorial terms that are computationally intensive, so the mathematicians of that time wanted to approximate them by a continuous function, since the rectangles get finer as n grows. So the first time they ran into the normal distribution was through interpolation of these discrete distributions.
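To see how close the approximation is, the exact binomial probabilities can be compared with the normal density of matching mean and variance. This is a minimal sketch; the values n = 200 and p = 0.5 and the helper names are assumptions picked for the example.

```python
import math

def binomial_pmf(n, p, k):
    """Exact binomial probability; the factorials are hidden inside math.comb."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

n, p = 200, 0.5  # hypothetical example values
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

for k in (85, 100, 115):
    exact = binomial_pmf(n, p, k)
    approx = normal_pdf(k, mu, sigma)
    print(f"k={k}  binomial={exact:.5f}  normal={approx:.5f}")
```

Each bar of the binomial histogram has width 1, so its height can be compared directly with the density at the same point.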
❤️❤️❤️
At the beginning, when you do separation of variables and integrate, why does -k(x-mu) become (-k(x-mu)^2)/2? Isn't mu just a constant? So it should just be -k(x^2 -mu*x)/2 ... ?
Ah, so you just add a constant term and absorb it into k???
No, actually I don't understand why. Can someone explain?
You are an excellent teacher
Thank you! 😃❤️❤️❤️
Nice! I loved the conversion to polar--it's much more succinct than the methods of solving the Gaussian integral that I've seen before, though I think that says more about my mathematical experience than anything.
❤️❤️❤️
16:06 Why isn’t it negative e to the negative u, instead of just e to the negative u? When you differentiate e^-x you don’t get e^-x, you get -e^-x.
Thanks
@ 14:38 Why is it -1/2, it should be +1/2 surely?
Thanks
Why have you used delta to represent the standard deviation rather than sigma?
Thanks for noticing the Greek symbols. It's a symbol encoding issue.
This is a quite comprehensive video overall, but I do think it lacks a mention of /why/ that definition of normally distributed data appropriately describes the behavior of the balls-and-pegs simulation.
wow, this was very easy to understand, thank you!
❤️❤️❤️
y' being proportional to y (edit: yx) is an interesting property of a normal distribution, but I don't think it's the a priori defining trait.
Probably the way to do it is to show that the characteristic function of the sample average converges as n increases, and then compute the corresponding density. Of course it's far more usual to use MGFs, but the Fourier transform is directly invertible, so you don't need to "already know" the normal distribution.
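A sketch of that route, under assumptions added here for concreteness: the X_i are i.i.d. with mean mu and finite variance sigma^2, the sum is standardized (rather than the raw sample average) so the limit is non-degenerate, and phi_Y denotes the characteristic function of Y = (X - mu)/sigma.

```latex
\begin{align*}
Z_n &= \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}},
  \qquad
  \varphi_{Z_n}(t) = \Big[\varphi_Y\!\big(t/\sqrt{n}\big)\Big]^{n} \\
\varphi_Y(s) &= 1 - \frac{s^2}{2} + o(s^2)
  \quad\Longrightarrow\quad
  \varphi_{Z_n}(t) = \Big[1 - \frac{t^2}{2n} + o\!\big(\tfrac{1}{n}\big)\Big]^{n}
  \;\longrightarrow\; e^{-t^2/2} \\
f(x) &= \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\, e^{-t^2/2}\,dt
  = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}
\end{align*}
```

The last line is the Fourier inversion step, which produces the density without assuming its form in advance.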
Thanks for the insight❤️❤️❤️
On the Definition, how does the differential equation apply the curve without any approval?
I am not convinced by the step where you integrate at the separation of variables step and say that (x-mu) should go to (1/2)(x-mu)^2. Applying the power rule would not give that result, since mu is just a constant term. Where did the mu squared come from, out of thin air?
You can use the substitution t = x - μ and get this result, or just integrate plainly and get k(x²/2 - μx) + C. Here C is an arbitrary constant, so we can write C = kμ²/2 + C1 for some other constant C1, and thus we get k(x² - 2xμ + μ²)/2 + C1 = k(x - μ)²/2 + C1.
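A quick numeric check of that argument: the two antiderivatives should differ by the same constant, kμ²/2, at every x. The function names and the values of k and μ below are made up for the illustration.

```python
def antiderivative_square(x, k, mu):
    """k(x - mu)^2 / 2, the form obtained with the substitution t = x - mu."""
    return k * (x - mu) ** 2 / 2

def antiderivative_plain(x, k, mu):
    """k(x^2/2 - mu*x), the form obtained by integrating term by term."""
    return k * (x**2 / 2 - mu * x)

k, mu = 0.7, 3.0  # hypothetical values
diffs = [
    antiderivative_square(x, k, mu) - antiderivative_plain(x, k, mu)
    for x in (-2.0, 0.0, 1.5, 10.0)
]
print(diffs)             # the same number at every x, up to rounding
print(k * mu**2 / 2)     # the constant that gets absorbed into C
```

Since the difference does not depend on x, both expressions have the same derivative, and either is a valid antiderivative.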
Just because of the 2π in the formula, we can all agree, from every part of the world, that somehow (ironically) the bell curve rests on the shoulders of a circle. Your approach is very informative and can benefit neurodiversity in the mathematical realm of academia with an explanation as simple and succinct as this.
Thanks❤️❤️❤️🙏🙏🙏
Brilliant
The Galton board is based on extracting coefficients via the binomial theorem, in your case from the expansion of (1+x)^13. It has a bell-curve shape, but the Gaussian normal distribution is an integral over the real number line. Dice, coin flips, and Galton board pegs, which can also be obtained from Pascal's triangle, are integer-valued and have nothing to do with the Gaussian normal distribution formula.
Thanks
Beautiful explanation!
❤️❤️❤️
I like this derivation starting from the differential equation. The differential equation provides an interesting insight into the nature of the curve that I had never appreciated. However I don’t think I’ve ever seen it before as a way to define the normal distribution. Any derivation I’ve seen is a practical one usually starting from the binomial distribution and applying the central limit theorem. Thus the differential equation would be a derived result.
Can you point to a source which defines the normal distribution in this way? What would motivate it?
Here is one link
spoudai.unipi.gr/index.php/spoudai/article/download/853/932
I’m currently taking diff eqs, which I’ve been told is not normally a prerequisite for statistics. But it looks like in this case, being able to start the derivation with dy/dx and then separate variables is a huge time saver.
@@ritardstrength5169 Perhaps, but my point is it’s not a definition of normality. It makes no connection to empirical results, namely the application of the central limit theorem to binomial experiments. It merely reverse engineers the derivative of the normal curve.
This was awesome!
Is this related to elliptic curves??
Thanks
@@BecauseMaths i mean is it related it seems to be related to
Thanks
How did you get the left-hand side of the differential equation? How did you know it was of first degree? Did you just guess?
Thanks for the suggestion, I should have done that.
Informative.
❤️❤️❤️
Why are you using delta instead of sigma??
Should be sigma. Typing issue
You should explain how the formula for dy/dx comes about. It is straightforward, but a layman has no clue. You could talk, e.g., about the change in size between two consecutive areas underneath the curve, so you get the difference between (X1-m).(Y1-0) and (X2-m).(Y2-0). Any kid can understand that.
Thanks for the suggestions❤️
The true question is why a normal distribution (based off the differential equation definition you gave) is representative of many real world distributions (even the marble experiment you showed)?
The definition of a Gaussian distribution given in the video (values further away from the mean have less height in the graph) gives a way to interpret the marble example:
There is only a single path that a marble can take to land at the far right or far left, but many paths will land it in the middle. Another way to say it is the probability of a marble landing in a certain spot is given by the number of paths the marble can take to land in that spot out of the total number of paths.
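The path-counting argument can be made concrete with binomial coefficients. This is a sketch assuming an idealized board where the marble goes left or right with equal probability at each peg; the row count of 13 and the function name `landing_probabilities` are assumptions for the example.

```python
import math

def landing_probabilities(rows):
    """Probability of each bin: (number of paths into the bin) / (total number of paths)."""
    total_paths = 2**rows
    return [math.comb(rows, k) / total_paths for k in range(rows + 1)]

rows = 13  # hypothetical board size
probs = landing_probabilities(rows)

print("paths to the far-left bin:", math.comb(rows, 0))
print("paths to a middle bin:", math.comb(rows, rows // 2))
for k, p in enumerate(probs):
    print(f"bin {k:2d}: {p:.4f}")
```

Bin k is reached by every path with exactly k moves to the right, so the edge bins have a single path each while the middle bins collect the most.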
❤️❤️🎁
Need measure ultrasound force to see background biased
Accepting Hubble's finding that space is expanding and hence, our universe has a spatio-temporal origin and boundary, is apparently the conventional world-view in empirically focused natural and sub-space social science, excluding any contingent statement of form, "this may happen", since its temporal domain doesn't coincide with this, but you may be able to 'have your cake and eat it', through dedicated notation like < as the opposite of >, instead of concatenated non-number-numeral - < number-numeral say, because the law of non-contradiction : nothing is its opposite, is irrelevant in an instrumentalism consistent predicted world.
Thanks
mindblown
just a small thing to mention. [δ=delta= "δέλτα"] and [σ=sigma ="σίγμα"].
Thanks❤️
Subscribed!😊
❤️❤️❤️
the symbol you are using for sigma is actually delta 😂
Thanks.
One does not “derive” normal equation. The presenter is misinformed.
You're misinformed. The statement "one does not derive the normal equation" is not accurate. The normal equation can indeed be derived, and it is a crucial aspect of linear regression analysis. The normal equation provides an analytical solution to the linear regression problem, specifically for finding the values of the parameters that minimize the cost function.
Here's a brief overview of the derivation:
Given a hypothesis function \(h_\theta(x)\) and a cost function \(J(\theta)\), the goal is to minimize \(J(\theta)\). In matrix notation, this problem can be represented as minimizing the function \((X\theta - y)^T (X\theta - y)\), where \(X\) is the design matrix of input features, \(\theta\) is the parameter vector, and \(y\) is the vector of output values.
To find the minimum, one takes the derivative of the cost function with respect to \(\theta\) and sets it to zero. This results in the normal equation: \(X^T X \theta = X^T y\). If \(X^T X\) is invertible, we can solve for \(\theta\) by multiplying both sides by \((X^T X)^{-1}\), yielding \(\theta = (X^T X)^{-1} X^T y\), which is the solution that minimizes the cost function.
This derivation is a standard procedure in machine learning for obtaining the least squares estimates of the regression coefficients.
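As a small illustration of the regression normal equation described here (which concerns least squares, not the normal distribution), the sketch below solves it for a single feature plus an intercept. The data points and the function name `fit_line_normal_equation` are made up for the example, and the 2x2 system is solved with Cramer's rule instead of a general matrix inverse.

```python
def fit_line_normal_equation(xs, ys):
    """Solve X^T X theta = X^T y for y = theta0 + theta1 * x, where X has rows [1, x]."""
    n = len(xs)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # X^T X = [[n, sum_x], [sum_x, sum_xx]] and X^T y = [sum_y, sum_xy]
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("X^T X is not invertible (all x values are identical)")
    theta0 = (sum_y * sum_xx - sum_x * sum_xy) / det
    theta1 = (n * sum_xy - sum_x * sum_y) / det
    return theta0, theta1

# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta0, theta1 = fit_line_normal_equation(xs, ys)
print(theta0, theta1)
```

Because the example data lie exactly on a line, the recovered intercept and slope should match that line.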
❤️❤️❤️