Blessings. I'm a 79 yr old grad student. This stuff is rocking my self-image
I'm so impressed with how in depth and without choking you are capable to teach statistics. They are a pretty complex subject, and thanks to you we are getting to understand them, and who knows, maybe some of us even like them. THANKS!
Hi! Your teaching style is exceptional! You really know how to address students' weak points! Keep up the good work!
Beautifully explained! This is a topic that gets glossed over a lot in statistics courses and I really appreciate the amount of time you devoted to it.
Just commented on your other video, but ended up watching this too by accident when I was searching for information on degrees of freedom. As a medical student doing research on a biostatistics-heavy subject, you truly are a lifesaver! Stuff like this really helps me keep going instead of needing to go through multiple statistics courses at uni on top of all my other studies and research project work!
if you know maths and descriptive statistics already, except degrees of freedom and the use of (n-1) instead of n, the critical explanation for you starts at 13:11 and ends around the 15th minute, but it's kind of explained away without a real explanation. let me look for the sections about regression etc.
Thank you for actually explaining the meaning behind DF clearly before jumping in to any abstract analysis of numbers. It is frustrating trying to find clear content so thank you for that.
Probably one of the BEST instructors I have run into in my career or lifetime - most people cannot teach statistics - this gentleman is awesome!
Thanks for putting these together man. I'm not a student; I'm just brushing up on my stats. Your explanations are spot on. Had I had a teacher like you, I wouldn't be brushing up on my stats now (and I had some great mathematics teachers).
so good. also his choice of words to explain concepts is really good
Liked the video as soon as I heard his introduction. Summed my feelings about the topic up perfectly
Big fan Justin, you really don't know how helpful and mind-blowing these videos are for, like, the 'whole world'. I wish you health and happiness in these extraordinary times. Where are you from, as in country and city? I would be fortunate to meet you someday, you're a great guy!
Your pace of delivery is perfect. Can't thank you enough!
It is even cool to watch stats with you as a teacher! Bravo.
1:26 - Relief, I thought that music was going all the way through the video. Awesome video. Best explanation I've found so far.
I'm grateful for your lectures, and can say that this specific topic was always somehow incomplete for me, until now! I'm studying calculus, and statistics is a challenge for me. Thank you, and stay healthy!
DUUUUUDE!!!!! THANKS FOR ALL THE SIMPLE EXPLANATIONS. Appreciate it a lot.
I have never seen anyone describe degrees of freedom so clearly, thanks!
this is the best video i have found online to explain DF: it is the number of independent pieces of information that exist in a sample to estimate the population. if we want to predict, we must know the minimum number of independent pieces of the sample needed to make a prediction about the population. generally, the more df, the more accurate the prediction from the sample
This is a very helpful explanation on a topic that is all too easily glossed over, but I think it is essential to getting a firm grasp of what we are doing with statistics. Thank you for taking the time to post it.
Detailed coverage, kudos. Finally what I have been looking for. Appreciated.
Absolutely amazing explanation of degrees of freedom. You gave a very good, simple and easy example with the urchins, through which what df is and how it's calculated could be immediately understood. Great job.
That is a very explanatory, cool-sounding, step-by-step lecture. I really like it because it is very easy even for complete beginners in statistics. Good job! Many thanks.
Best explanation of Chi Square so far. Best use of my 27 minutes
Happy 2021...Thank you Justin for the immense effort you put into this video...Love from Kerala...🙂
Thank you so very much for this thorough and well delivered explanation of a complex concept that many educators try to breeze over. The type of explanation you provided is rare and I can't believe how smoothly and clearly you delivered your content. I am super impressed and very inspired as I tutor statistics to my fellow students. Thank you again. Liked, Subscribed and hit the bell 😊🙏
Probably the best explanation about df on YouTube, well done!
yeah? but what's the explanation really?
The three-dimensional explanation of degrees of freedom in regression was really a light bulb moment. Awesome stuff.
Wow... this is really awesome... you did in 30 mins what my lecturer couldn't do over the whole semester... LOL. THANK YOU!!
🤣🤣🤣
You are really an excellent teacher! I love the way you explain these concepts!
From 17:00, for the next minute: that's where it clicked for me, and I (somewhat) understood what degrees of freedom means. Thank you! Great breakdown.
Superb explanation. This had been stuck in my head; you just cleared up the concepts. Grateful to you.
you are a life saver for stats students...
Very helpful. Thank you for helping me master this subject.
you changed n to n-1 without any mathematical proof. is there a proof? if we want to inflate the estimate, why not make it n-2? i am assuming there is some robust mathematical reasoning.
We use df primarily when estimating variances, because we know that dividing by n underestimates the population variance. To my knowledge, there is no formal mathematical proof that n-1 is necessarily "correct". We can never know it is "correct", because in the real world we never know the true population values.
That said, statisticians have demonstrated the concept using toy data sets for which the population values are defined. With these models, n-1 reliably made the variance estimates better. If n-2 had been better at the job, they would have chosen that instead.
It's the mathematical equivalent of having thermometers that reliably read 10% colder than reality. We are simply compensating for a known bias in the tool.
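To make that toy-data demonstration concrete, here is a minimal Python sketch (NumPy assumed; the population parameters, sample size and trial count are illustrative, not from the video) comparing the two divisors:

```python
# Minimal simulation sketch: draw many small samples from a population with
# a known variance, then compare the average variance estimate when dividing
# by n versus n-1. All numbers here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0           # true population variance (sd = 2)
n = 5                  # sample size
trials = 200_000

samples = rng.normal(loc=10.0, scale=2.0, size=(trials, n))
sq_dev = (samples - samples.mean(axis=1, keepdims=True)) ** 2

var_n  = sq_dev.sum(axis=1) / n        # divide by n: biased low
var_n1 = sq_dev.sum(axis=1) / (n - 1)  # divide by n-1: Bessel's correction

print(var_n.mean())    # ~3.2, i.e. (n-1)/n * sigma2 - underestimates 4.0
print(var_n1.mean())   # ~4.0 - unbiased on average
```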
The no. of degrees of freedom of a sum of squares = the no. of independent variables in that sum of squares. Let SS = sum of (yi - ybar)^2, i = 1, 2, ..., n. Here SS is a sum of squares of n elements: (y1 - ybar), (y2 - ybar), ..., (yn - ybar). These elements are not all independent, because sum(yi - ybar) = 0 (which is a condition of dependence among the variables). So we use n-1 degrees of freedom for SS instead of n.
Please correct me if I am wrong.
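A quick numeric check of that constraint, as a sketch with made-up data:

```python
# The deviations from the sample mean always sum to zero, so any n-1 of
# them determine the last one - only n-1 are independent.
import numpy as np

y = np.array([3.0, 7.0, 8.0, 12.0])    # illustrative data, ybar = 7.5
dev = y - y.mean()

print(dev)                       # [-4.5 -0.5  0.5  4.5]
print(dev.sum())                 # 0.0 (up to floating-point error)
print(-dev[:-1].sum(), dev[-1])  # 4.5 4.5 - last deviation recovered
```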
@@niemand262 there are mathematical proofs as to what gives the best estimate on average (i.e. an unbiased estimate). In the case of standard deviations, it's called "Bessel's correction" and there is a proof as to why we use n-1. As for using n-p, i.e. some other number of degrees of freedom, I THINK these are calculated by seeing how many of the data points are free to vary while still giving us the same statistic. For example, if we calculate the mean of [x1, x2, x3] and vary any one of them, we can just move another one so that the mean stays the same. As we can move any of them, we have n=3 degrees of freedom. If we are estimating the population variance from a sample without knowing the population mean, we are solving 2 equations (one for the mean and one for the variance) with n unknowns. As such, we can "replace" one of the n data points in the equation for the standard deviation with some function of the sample mean while still technically expressing the standard deviation in the same way. As we can do this replacement of 1 of the data points with one of our statistics, we have n-1 degrees of freedom. This could be slightly wrong (I came to this video hoping for a full mathematical explanation) but I'm fairly sure it's the gist of it.
@@henrysorsky thanks for pointing out "Bessel's correction". Your intuitive explanation for degrees of freedom makes sense, but why doesn't the same intuition also apply to the standard deviation of the population? After all, given a set of N observations, if you know the deviations of N-1 points around the mean, you know the value of the Nth deviation.
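For reference, a short sketch of the expectation argument behind Bessel's correction, using the standard identity that splits deviations from the sample mean into deviations from the population mean:

```latex
% Sketch of the usual unbiasedness argument for the n-1 divisor.
\begin{align*}
\sum_{i=1}^{n}(X_i-\bar{X})^2
  &= \sum_{i=1}^{n}(X_i-\mu)^2 - n(\bar{X}-\mu)^2, \\
\mathbb{E}\!\left[\sum_{i=1}^{n}(X_i-\bar{X})^2\right]
  &= n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2, \\
\text{so}\quad
\mathbb{E}\!\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2\right]
  &= \sigma^2 .
\end{align*}
```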
So, I have a question. What if the population mean, by chance, happens to lie right on the place where our sample mean is calculated? Then wouldn't we be unnecessarily inflating the variance? Might be a silly question, but I am understanding the concept so well that I just wanted to ask. :)
Loving your videos so far... Really helpful 🎉
Thaaaankk you! No one ever properly explained it to me!
This was very informative! I will be sharing this with my students.
Thank you so much sir. Please keep up the good work. I'm learning a lot.
Could you please explain why we use degrees of freedom to adjust the difference between sample statistics and population parameters? What does that have to do with "independent pieces of information"?
I wish you were my professor
That's something most of us Indians want - better education
It's better if we adapt the Vedic methods😂😂
He is your professor by choice.
Support the idea
@@pk-uk5lc Hey there, buddy! It ain't about being Indian or American, it's about the individual's ability to simplify things and make them more understandable. There are tons of examples of amazing Indian educators out there, not just on YouTube.
this explanation is simply fantastic, thank you so much!
Super clear explanation!
Is that you? Nice to see a face to a name. Been watching a few of your videos. Thank you. Hope to use the skills in my retirement... touching 60.
Hello, great job on all the explanations. But my question is: I understand that we need to "inflate" the variance computed from x bar (our estimate of the population mean), but why should this "inflation" be dividing by n-1? Why not divide by (n/2)? I did not see the answer to: where did n-1 come from? I will listen to the rest of the video in case the answer is in the remaining part...
Same exact question! Thank you
Search for Bessel's correction; there's a tough mathematical derivation behind n-1.
17:13 Only with the third observation do we have a "degree of freedom", such that the regression line can cut through the points to give us the errors and the parameters.
the best stats explanation that I ever had!!!
This is like the 8th video am watching on this channel today !! Where had you been all this while !!!!?
Awesome explanation! thanks!
@14:25 OMG, all this time, I took DoF as an abstract peculiarity of the equation.
So the Sample Mean is just an estimate of the Population Mean. Reducing n by 1, you decrease the denominator and therefore inflate the variance to get a better estimate of the Population Variance, thereby including sampling error. Even though I still don't know why you would use 2, 3, 4 DoF, I feel much more relieved now that this major stumbling block is removed. Thank you for breaking it down!
Brilliantly explained
9:10 you said the standard deviation is undefined, but mathematically both the numerator and denominator are zero, so why is it still undefined?
0/0 = undefined, not zero, believe it or not!
I'm very glad I subscribed to this channel
Loved the line about dividing by zero, "mathematically speaking, an explosion." Made me laugh out loud.
I noticed that too but it didn’t cause me to lol...just a chuckle.😏
Clearly explained, excellent.
17:45 The explanation of having k "X" variables was a bit confusing, I had to go through a second time to understand. I am not sure calling all the variables "X" variables is correct???
Wouldn't we just say "the number of variables?" One of the independent variables is perhaps on an X axis, the other independent variable on the Y. The dependent variable is on the Z.
Good explanation everywhere else - the Chi-square examples were especially interesting.
At 10:00, isn't it the case that the numerator must be zero as well as the denominator, since x - x_bar is also zero? So it's not "an explosion" but undefined.
A sea urchin has so many spikes?!
Great explanation
14:20 why do we inflate the estimate by choosing to go from n to n-1? Why not n to n-2, for example?
I'm baffled at all the responses saying how everything is clear to them, just because the denominator n-1 is smaller than n. Well n-2 and n-3 etc are also smaller than n.
There's a mention of Bessel's correction in the comments below but nothing else anywhere in the video about what's really going on here.
Great concise presentation!
Much appreciated!👍
23:38
What confuses me about this line of reasoning is that it requires you to know the total number of samples. You can't derive the number of samples for the missing category if you don't know the total. So isn't the total a piece of information in itself? To me it looks like you have four pieces of information: category 1, category 2, category 3 and the total. So you have 4 pieces of information. But one of those doesn't contribute anything new and is therefore obsolete which would leave you with 4 - 1 = 3 independent pieces of information and not 2. What am I missing?
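One hedged way to see it in code (SciPy assumed; the counts are made up, not from the video): the total is fixed by the design of the test, so it is not a free piece of information, and with k = 3 categories only k - 1 = 2 cell counts are independent.

```python
# Goodness-of-fit sketch: the total N is fixed, so the third cell count is
# implied by the other two, and the test uses df = k - 1 = 2.
from scipy.stats import chisquare

observed = [45, 35, 20]   # illustrative counts summing to N = 100
expected = [40, 40, 20]   # expected counts must sum to the same N

stat, p = chisquare(f_obs=observed, f_exp=expected)  # df = k - 1 by default
print(stat, p)
```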
In a regression, when you have three observations (n = 3), based on the formula the degrees of freedom is 1. What does the number 1 mean? (based on your definition at the 6-minute mark in the video)
First time I understood why it's n-k-1. Thanks!
Incredible explanation!
Fascinating Work you are doing ... Keep it up Plz
The reason n-1 is used in calculating s^2 is that dividing by n-1 gives an unbiased estimator of sigma^2, the variance. You did some wishy-washy hand-waving. If you don't want to work through the math, just say it can be shown that if n is used to calculate s^2, the expected value is ((n-1)/n)*sigma^2, so by using n-1 you instead end up with ((n-1)/(n-1))*sigma^2, which is unbiased.
Great video, as always. I could not find anything specifically about the F-distribution, is it in the pipeline? Thank you
great video, actually for the first time I can say I understand DFs
For Descriptive Statistics:
1) Why does the variance use the squares of the deviations instead of their far more intuitive absolute values? The variance considers points farther from the mean as much more important for no particular reason I can see.
2) You say the n-1 term is included to make the sample standard deviation "bigger". But that explains nothing. Why not sqrt(n)-1? (you still get undefined for the case where n=1). Why is it n-1 specifically? In short, how can I look at any system and determine the degrees of freedom, even when they are other than n-1?
Shouldn't the DF of x bar in the descriptive statistics section be 4 instead of 5? When calculating the mean of 5 values, only 4 of those can vary. The last one can only take one possible value.
Justin, I had a doubt.
Is it okay to say, in simple regression analysis, y hat has n-2 dof because beta1 and beta2 are estimates of the population coefficients that we don't know ??
At 25:20, what is the purpose of the marginal values? The calculation will change if the row total and column total are different, right?
Seriously, u are the greatest, I love u man 🤍🤍
Amazing content.
But can you say why, at 7:03, there are 5 df for the mean but 4 df for the std? I am a newbie to stats
Kudos to you Bud! Great Explanation!
Quite clearly explained.
Noice explanation, but I am confused. You mentioned that skewness has DF=3 and kurtosis has DF=2 given that n=5. But I think both skewness and kurtosis have DF=3 because, in your other video on moments, you were explaining the higher moments. For the second moment, we need to standardize by the first moment, and for the third moment, we need to standardize by the second moment. Up to here all is good, but you mentioned that for the fourth moment (i.e. kurtosis), we don't need to standardize by the third moment; it can be derived from the second moment itself.
Am I going in the right direction? 🤣🤣🤣🤣 Hope so. Just reply to me once: what should the DF of kurtosis be with 5 data points?
Great content. I am starting a statistics channel. Any recommendations?
great explanation. Keep up your good work!
23:53 Ooh, that's a dangerous assumption in 2022 haha. Thanks for the great lecture!
This is very helpful 😃
@zedstatistics
Dear Sir, when you explained the n-1 in the denominator of the S.D., it was more of an empirical observation. I would like to know if there is a mathematical derivation of this formula.
Thank you
When the person who was, in fact, seated in the front row asked this question, the professor said something like this: "I don't know how to explain this to you, because you don't know enough statistics. I don't want to insult you, but you won't understand this."
He was an ass, in my opinion.
13:53 so you're saying we arbitrarily use n-1 to inflate the variance because the population variance (and thus the std dev) may be, or is likely to be, larger? Why not use n-2 then? I guess because you need two data points to get a standard deviation.
I still don't understand why it's (n-1) to inflate your variance. Why isn't it (n-N), where N = 1, 2, ...?
How does subtracting 1 turn a sample variance into a population variance?
It's so good!!!! It's the way of how statistics should be taught!
Hi, thanks for the nice explanation!
I have a question about calculating degrees of freedom in the chi-squared test. In population genetics, the degrees of freedom are calculated as "the number of categories - the number of parameters" when we do a chi-squared test for Hardy-Weinberg equilibrium. For example, if there are 6 genotypes (AA, BB, CC, AB, BC, AC), the number of categories (genotypes) is 6, and the number of parameters (alleles) is 3 (A, B, C). So the degrees of freedom is 3. Do you know why this is?
Additional description: the sum of the allele frequencies is 1, and the expected genotype counts are calculated from products of allele frequencies.
If the observed genotype counts are 30, 40, 30 for AA, AB, BB respectively, the allele frequencies are 0.5, 0.5 for A, B respectively. So the expected counts will be 25 (0.5*0.5*100), 50 (2*0.5*0.5*100), 25 (0.5*0.5*100) for AA, AB, BB.
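A small plain-Python sketch of that 2-allele worked example (the counts are the ones given above; the df bookkeeping in the comments is one interpretation, not from the video): estimating the allele frequency from the same data costs a parameter, which is where "categories - parameters" comes from.

```python
# Hardy-Weinberg sketch with 2 alleles: 3 genotype categories, 1 constraint
# from the fixed total, 1 estimated allele frequency (p; q = 1 - p is not
# free), leaving df = 3 - 1 - 1 = 1, i.e. genotypes - alleles.
observed = {"AA": 30, "AB": 40, "BB": 30}
N = sum(observed.values())                           # 100 individuals

p = (2 * observed["AA"] + observed["AB"]) / (2 * N)  # freq(A) = 0.5
q = 1 - p                                            # freq(B) = 0.5

expected = {"AA": p * p * N, "AB": 2 * p * q * N, "BB": q * q * N}
chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

print(expected)  # {'AA': 25.0, 'AB': 50.0, 'BB': 25.0}
print(chi2)      # 4.0, referred to a chi-squared distribution with df = 1
```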
I would be interested in your perspective on how degrees of freedom should be considered for nuisance parameters.
What software are you using to make these excellent videos? They are amazingly crisp and clean! With the current pandemic, and wanting to create better videos, I would love to know what tools you use. If you could share, that would be great! Thank you for making these videos!
Obviously it's Prezi software
@@gazzzada Is it? I have used Prezi and wouldn't have guessed that's what was used to create this video. Thanks.
the current version gives even more options
11:40 "using absolute values is clunky, statistically [which is why we use the variance instead]". I'm curious what the clunkiness you're alluding to is.
I was taught that variance/stddev weights outliers a bit more than if we used absolute values, and that that was typically something we wanted in statistics, but that never made much sense to me.
Great video, thank you! I'm a stats tutor in college and the prof for this class definitely handwaved DF.
I think it has to do with absolute value functions not being differentiable everywhere. Finding optima is not as easy as with squared functions, which are differentiable everywhere.
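A tiny sketch of that point (NumPy assumed; data illustrative): the sum of squared deviations is a smooth parabola minimised at the mean, while the sum of absolute deviations has kinks at the data points and is minimised at the median.

```python
# Compare the two objectives on a small data set by brute-force search.
import numpy as np

x = np.array([1.0, 2.0, 7.0])
grid = np.linspace(0.0, 8.0, 801)     # candidate centre values, step 0.01

sq_loss  = np.array([((x - c) ** 2).sum() for c in grid])   # smooth
abs_loss = np.array([np.abs(x - c).sum() for c in grid])    # kinked

print(grid[sq_loss.argmin()])   # ~3.33, the mean of x
print(grid[abs_loss.argmin()])  # 2.0, the median of x
```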
Nice & illustrative
May I ask what software do you use to produce such attractive and informative video ?
great explanations at the beginning... I would suggest defining what things are at the start of each section, such as what the heck a Chi-square test is for. I had to stop the video and go google to see that it's useful when you have a model or hypothesis to compare against observed data. So apparently you use it when you have a hypothesis. Also, the table at the end seems backwards - I would visualize it with the unbleached coral on the left if that's showing the expected healthy distribution, because it seems backwards explained the other way... maybe I'm wrong thinking about this... Now I have no idea why you are multiplying 60 x 50... what exactly do you mean by the marginal values?... Oh well, people without good math sense need all the help we can get. This is why I stopped paying attention in math when I was in middle school: the teacher would say something and go right on as if we all knew what they were talking about. Prior to the Chi-square section everything was making sense. Thank you. I probably need to go watch the actual Chi-square video if I want to know about that.
Thank you! Intuitive explanations.
Why isn't the total counted as one of the independent pieces of information? After all, we need that too??
What a great explanation! Thank you.
It couldn't be a better explanation!
Unfortunately, when we ask ourselves "what is a 'degree of freedom' again?", a half-hour response would not come to mind... 😢
Your videos are so useful, thank you so much! One thing I can't get my head around here, though. So, we divide by n-1 (as opposed to n) to account for the variance needing to be larger, as our sample mean is just an approximation of the population mean, and the variance around the sample mean is as small as it can be. But we don't know the population mean, so our sample mean could be the same as the population mean, and thus we would be overestimating the variance by dividing by n-1 and not n. Is this true?
Overestimation can occur only when n, the sample size, is greater than the population size, which by definition is impossible. You can't gather more units to measure than the absolute best-case scenario, assuming you have 100% of the information available. For example, you can't run a questionnaire through all 8 billion people. And even if you could, you can't hope they will all have answered 100% honestly, without biases. If you divided your result by 8 billion and 1, your result would by default be smaller than the true value. Dividing by a larger value gives you a smaller value. Anything we measure is less than 8 billion people; hence, all of our results will be larger or smaller than the true value. If we hadn't measured 1 person out of the total population, that 1 person would be the final measurement before the sample mean becomes the population mean. That one last piece of information increases or decreases the sample mean into the population mean.
We divide by n-1, or n-k-1, or (c-1)(r-1), to have our guess be as close to the actual population as possible. We know that any guess we make is not spot-on the population value. Look at what happens with the variance in a perfect population, (x-mu)/n, with random numbers, i.e. mean at zero:
(3-0) + (2-0) + (1-0) + (0-0) + (-1-0) + (-2-0) + (-3-0) = 3 + 2 + 1 + 0 - 1 - 2 - 3 = 0
All of the values cancel out. We square them to get rid of the negative sign and to have any kind of usable metric to assess reality. We probably could have used |absolute bars| with a similar result.
When you pass 0, our sample mean passes the population mean.
Statistics under/overestimates the true mean mathematically by default.
Once your guess is higher than the mean, the now-squared distance starts to grow again:
9 + 4 + 1 + 0 + 1 + 4 + 9
It does not collapse on itself. Graphically it's a parabola, and you've estimated the guess at its bottom point.
I guess what you're asking is: will we overestimate the variance when our guess matches the true population value perfectly and no better guess can be made?
Firstly, you should never assume you can guess the true population value. We have to live in real life, where nothing is perfect. Statistics is not perfect; it simply strives for perfection. It is improbable to guess any single continuous value precisely. What is the chance that the next person has a height of exactly 173.48763498456437654405 cm? Impossible to pick a sample this precise.
For a sample of n=1 you'd have an undefined sample variance, because with n-1 the variance is divided by zero. How far away is a sample from itself? No distance; it is the mean value of itself. Statistics doesn't deal with absolutes and cannot PROVE anything IS, because we never know the population at any given moment. Something is always left unaccounted for. If n=0, the denominator would be negative. A negative value means no sample has been measured.
Imagine the equation without the -1 part, i.e. a sample of me (n=1) writing an amount of this comment (x=1). The average would be 1. The variance, (1-1)/1 = 0: no spread from itself. It's nonexistent either way: either you say it's 0, or infinity, because you divided the spread by 0.
Your alternative hypothesis defined by the sample size IS the null hypothesis.
You've "proven" the alternative hypothesis is the null hypothesis.
The null hypothesis is not rejected within a significance level of 1; we can have an underestimation or an overestimation of the true population mean.
Having written all this, I still don't think I have answered the question of why an overestimation happens as the sample mean becomes the population mean.
Perfect! But I still want to know: why is the DF for skewness N-3, and the DF for kurtosis N-2?