I have infinite respect for the incredibly selfless mathematicians like you who go out of your way to help people out. Thank you so much!
You are very welcome Marko, and thank you very much for the kind words.
Don't we have to take the sample mean x bar that corresponds to 1.96, or else we will not get the correct population mean? Could someone please make this clear?
He’s not selfless. I would like to think he enjoys it and that this isn’t something he cares nothing about. Then he’d be truly selfless. I hope he’s selfish about it! That it is his value and that he gets selfish pleasure in doing what he does.
Very true, though I was hoping the explanation would be clearer and less technical.
"Student" was a pseudonym of a fellow named Gosset, who worked at Guinness breweries in the early 1900s. He derived the t distribution (with some gaps in the derivation) in a 1908 article “The Probable Error of a Mean”. Guinness did not want workers publishing their findings (to keep a competitive edge), but allowed him to publish under the pseudonym Student. The name stuck.
Don't we have to take a sample mean x bar from the distribution of the mean whose z score corresponds to 1.96? Otherwise, won't we fail to get the correct population mean from the confidence interval formula when sigma is known? Could someone please make this clear?
What if we take a random sample whose mean's z score doesn't correspond to 1.96? Will we still get the correct population mean?
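A small simulation may help with the question above. This is only a sketch under assumed values (a normal population with mu = 50, sigma = 10 and n = 25, all hypothetical and not taken from the video). Every sample gives its own interval X bar ± 1.96·sigma/sqrt(n), and that interval captures mu exactly when the sample mean's z score lies within ±1.96, which should happen for roughly 95% of samples.

```python
import math
import random


def z_interval_coverage(mu=50.0, sigma=10.0, n=25, reps=20000, seed=1):
    """Fraction of 95% z intervals (sigma known) that contain the true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)  # same width for every sample
    hits = 0
    for _ in range(reps):
        # Draw one sample and compute its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # The interval covers mu exactly when |xbar - mu| <= half_width,
        # i.e. when the z score of xbar is within +/- 1.96.
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


coverage = z_interval_coverage()
print(f"Share of intervals containing mu: {coverage:.3f}")
```

So a single interval is not guaranteed to contain mu. The 95% describes how often the procedure succeeds over repeated samples.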
No mucking about. Concise and on the money. Excellent.
Thanks!
Your profile pic is from Engineers Australia 😂
Man! The way you speak and explain, you should be a commentator on National Geographic. Excellently done video and superbly explained. Thanks a lot, the t distribution will not confuse me anymore.
Thanks for your videos, my biostats professor can be fairly unclear and his exams are incredibly challenging. Your videos are very clear and concise, and are analogous to an oasis in a desert of confusion.
Keep up the good work, helps a lot of students like myself.
+InfinityBeard Thanks! I'm very happy that I can be such an oasis :)
Been moved to online classes due to corona, and this is the video my teacher gave for class.
Can't stress enough how thankful I am for these videos. There are many videos on statistics here on RUclips, but few really take the time to thoroughly explain the concepts, and most seemingly expect students to take certain things/steps for granted. Your videos, on the other hand, really provide clarity. THANK YOU!!!
Just made my final exam 40x easier - thank the lord that you were born
Your videos are so incredibly clear. I am a statistics graduate student, and watching even very basic videos like this one is still helpful to solidify concepts because of how well you communicate and visualize concepts. Thank you!
Thank you so much for the very kind words. I'm very glad to be of help!
This is one of the only videos I have seen that advocates against the "thumb rule" of blindly using the standard normal distribution instead of the t distribution when the sample size is greater than 30 and makes it crystal clear why it is imprecise to do so. I challenged my statistics professor on this point a few weeks ago and was simply told to use the z table when n > 30. Thanks to you, I now understand when it is appropriate to use the standard normal distribution and when to use the t distribution.
I'm glad to be of help! I am strongly against using the hard-and-fast n>30 rule.
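To put rough numbers on the n > 30 discussion, here is a sketch that compares the 0.975 quantile of the t distribution with the corresponding z value of about 1.96. The standard library has no t quantile function, so the helpers `t_pdf`, `t_cdf` and `t_quantile` below are hypothetical names for a simple numerical integration and bisection. They are not a library API and not the method used in the video.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_crit = NormalDist().inv_cdf(0.975)
t_crits = {df: t_quantile(0.975, df) for df in (4, 9, 29, 99, 999)}
for df, t_crit in t_crits.items():
    print(f"df = {df:4d}: t = {t_crit:.3f}  (z = {z_crit:.3f})")
```

The t value is larger than the z value at every df and only approaches it gradually, so nothing special happens at n = 30.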
I finally understood T-distribution after 3 videos. This video explained it the best!
then there must be some other distribution for that ig
You're very welcome Bonnie! I'm glad to hear they helped you out. Cheers.
"If you take statistics from me, forget you ever heard such a notion [if n>30 just use Z]" thanks for teaching us why. Yes Prof!
If anyone is curious, I've been struggling with t scores for the last week and, out of the many videos I have watched, this is the one that has helped me the most. 10/10, would recommend to anyone.
thank you so much. I was struggling to understand why a t-distribution was required and my lecturer's explanations were too technical. Within two minutes of this video, I understood. Thank you again, this is really helpful!
was writing my Statistics exam today. Thanks to these videos, I did very well
I'm glad to hear it! All the best.
You helped me 7 years after publishing this. Thank you very much. These videos will serve in the years to come.
The way you present lessons with 2 fonts at most && black bg is immaculate.
Thank you so much for this video and all the time you spent making it! I was super confused but now finally understand t-distributions. You are an excellent teacher
You are very welcome! Thanks for the compliment!
You are the First Person to knock some sense into me when it comes to Statistics.
Thank you.
You are very welcome! I'm glad I could be of help.
Couldn't resist but to thank you for this great lesson -- Very high quality!
You are very welcome, and thanks for the compliment!
Thank you very much for your videos!!! You cannot imagine how many times these videos have saved me!!! I really like your approach, always simple and clear! Many thanks!
+Svetlana Gromova You are very welcome Svetlana!
I would like to thank you a lot for your pedagogical skills. Now I am beginning to understand the t distribution.
A heroic explanation of high quality. Thanks for help with the FE exam!
You are great, you saved my life with your videos. I hope I can find all of the subjects that my prof teaches on your channel.
I'm glad to be of help!
An excellent help! I think the problem with prob/stat is there are so many different ways to teach it. You provide very clear structure.
justin bieber statistics is the best
.....
😆
haha! but srsly, Jeremy Balka, we'll remember your name.
Great video! I love your in-depth teaching methods and clarity in explanation.
so far the best mathematics instruction video ever seen! Appreciate!
Thanks, and you are very welcome. I'll try to beat it on the next video!
Quick Q, Sir: when you said "we've previously learned that ..." at 0:26, which video were you referring to? Many thanks!
I'm referring to the Z random variable as given on that slide, and how it has the standard normal distribution (under the conditions given on that slide).
Would you mind giving me the link to your video? Sorry to bother you again, Sir. Many thanks!
Good at every point, your discrete explanation gives good understanding. thank you for making this out.
This is probably the best prof I've ever had and I haven't even met him! (distance education course).
Your explanations are so clear and to the point, man! Thank you.
I am using your videos to prep for a Data Scientist interview. I am getting more and more confident as I watch your videos on a daily basis. 😊 Thanks for helping, mate. 👊🏻🎉
I'm glad to be of help. Best of luck on your interview!
That is an outstanding discussion of the t-distribution, how it differs from the Z-distribution and why the t should be used instead of the Z.
you explained this better than khan academy! thank you so much :)
Wow hands down best video for T distribution out there ... Thanks
You're very welcome, and thanks for the compliment!
I have been confused about this for so long, since there are plenty of different explanations from different resources. But you give a really good summary, which helped me figure out when it is appropriate to use Z or t statistics. Thank you so much.
I'm glad to be of help. It's not surprising that there is so much confusion, as many confused people make videos on it and post them. There's lots of truly terrible stuff out there on this topic.
@@jbstatistics I have a doubt. If we want to estimate the population mean, we need to know the sample size, sample mean and sample standard deviation, and then we calculate Z. But how would the population SD be known to us when we use it to calculate the Z value, given that we are trying to estimate the population mean? How is the population SD calculated before estimating the population mean? We would only get the population SD after calculating the population mean, right?
@@vinaysai9788 Yes, pretty much. As I bring up in the video, the population standard deviation is almost always unknown, and so we need to use the sample standard deviation, and that leads to the t distribution and t statistic.
It's conceptually possible that we might have some really, really good estimate of sigma from a large body of past experience, in, say, a manufacturing scenario where the variance is roughly constant for any given mean, but the mean changes. We might consider sigma known but mu unknown in a spot like that. But yes, that's always a bit of a stretch, and why in practice we end up using t rather than z in inference for means.
@@jbstatistics Why is it "almost"? There is no chance to estimate the population SD before estimating the population mean, so we always have to use the t distribution, right?
@@vinaysai9788 What part of the example I gave in my response is problematic? Why is that situation not "conceptually possible"? I say it's extremely rare. I say that's a bit of a stretch. I say the population standard deviation is almost always unknown.
There's a random variable X. I know its distribution but you don't. Its standard deviation is 3. What is its mean?
Sure, if you're sitting down to calculate the standard deviation of a random variable then you need to know its mean first. But it's conceptually possible to have information about the variance of a random variable without having information about its mean.
The "almost" in "almost always" is intentional and needed.
Such a simple and lucid explanation. Thank you so much!
Thank you so much for your work. I have been struggling with statistics and, although I am still struggling, your videos did help to clarify some concepts.
I'm glad I could be of help!
You are welcome. I'm glad my video helped!
This was so helpful! Thank you so much!
You're welcome!
Great videos. I think that the idea of using the normal distribution to approximate the Student t distribution for large sample sizes comes from the days before computer software, when statisticians had to rely on mathematical tables. Such tables had to have different entries for each degree of freedom, and would be computationally expensive to produce if they included entries for degrees of freedom beyond a certain threshold. Hence the rule of thumb for sample sizes greater than 30.
Yes, that's definitely a very big contributing factor. But there's no legitimate reason for us to hold on to that forever, and I think using that rule is problematic for a number of reasons.
That's some extreme clarity. Thank you so much, sir!
What a video. You are very very good! No more confusion for me.
Thanks. The last statement cleared so much confusion
Thanks for the great video! But a question: at 0:29, doesn't the Z statistic only divide by sigma (the population std dev)? I'm confused about why you divide by sigma-over-sqrt(n). If you could explain, thanks!
Tttt Y this is for a sample chosen from a population. A little different from what you have seen before.
@@zanyarrouf5740 Still confused. For a sample you should use the t distribution, shouldn't you? Also, doesn't the z statistic always divide by sigma? Would you please explain in a bit more detail? Thanks!
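On the sigma versus sigma/sqrt(n) question in this thread: the two formulas standardize different things. A single observation X has standard deviation sigma, so its z score is (X - mu)/sigma. A sample mean X bar has standard deviation sigma/sqrt(n), so its z score is (X bar - mu)/(sigma/sqrt(n)). The sketch below checks the second fact by simulation. The values of mu, sigma and n are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(2)
mu, sigma, n, reps = 100.0, 15.0, 36, 10000  # hypothetical values

# Draw many samples of size n and record each sample mean.
sample_means = [
    sum(rng.gauss(mu, sigma) for _ in range(n)) / n
    for _ in range(reps)
]

# The spread of the sample means should be close to sigma / sqrt(n),
# which is much smaller than the spread sigma of single observations.
sd_of_means = statistics.stdev(sample_means)
theoretical = sigma / math.sqrt(n)
print(f"SD of sample means: {sd_of_means:.3f}")
print(f"sigma / sqrt(n):    {theoretical:.3f}")
```

Whether the statistic is z or t is a separate matter. It depends on whether sigma is known or is replaced by the sample standard deviation.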
Watching these videos, I've had several "oh syiiiiiiiit so that's why!" moments, cuz they answer a lot of questions in my mind. Thank you!
Thanks Christie! I'm glad to be of help!
Very very very very nice Sir .. absolutely clear very nice
Excellent way of explaining... Great experience....
Thanks!
this was really very helpful to understand the principle behind it! thank you very much
Hi Vinayak. I do not yet have a video that discusses degrees of freedom in detail. One of these days.
Thank you very much. Now I understand the central limit theorem. It is the basis.
Awesome walkthrough, I'm finally learning something lol
Simple & Brief- the way i like. Thank u vry much !
your explanation is very clear... thank you
Yep. Pretty standard in statistics courses to use the n >= 30 rule because of the central limit theorem as well. The heuristic put forward is that the sampling distribution of the sample mean is close enough to a normal distribution centered at the population mean with its corresponding standard error. But I saw how some of those histograms look for the sampling distribution for n around 30, and what the rule doesn’t tell you is that, if your underlying population was pretty close to normal already, then of course the n > 30 sampling distribution would be close to a normal distribution too! But if you had something heavily skewed, even with n > 100 the sampling distribution is nowhere near that bell-shaped curve we all know and love. So I actually agree with you here, I’d rather use the Student t distribution, when I can assume normality, regardless of the sample size. It’s just more accurate!
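The point about skewed populations can be checked with a rough simulation. This is a sketch under assumed settings: a lognormal population with shape parameter 1.5 stands in for "heavily skewed", n = 100, and the helper names are made up for the illustration. It estimates the skewness of the sampling distribution of the mean, which would be near 0 if that distribution were close to bell shaped.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness; roughly 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def sample_mean_skewness(draw, n, reps, rng):
    """Skewness of the simulated sampling distribution of the mean."""
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]
    return skewness(means)


rng = random.Random(3)
n, reps = 100, 4000

# Normal population: sample means should look symmetric.
skew_normal = sample_mean_skewness(lambda r: r.gauss(0.0, 1.0), n, reps, rng)
# Heavily right-skewed population (assumed lognormal with sigma = 1.5).
skew_lognormal = sample_mean_skewness(
    lambda r: r.lognormvariate(0.0, 1.5), n, reps, rng
)

print(f"Skewness of sample means, normal population:    {skew_normal:.2f}")
print(f"Skewness of sample means, lognormal population: {skew_lognormal:.2f}")
```

How large n has to be depends on how skewed the population is, so a single cutoff such as 30 cannot be right for every case.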
endless thanks for saving me from final exam
You are very welcome. Best of luck on your exam!
Was fantastic explanation. Thank you!
It was so helpful for me. Great video
Thank you! The explanation was very good.
Nice way of presenting topics.
Nice IRL examples.
(Bonus: Nice voice.)
Thanks for the kind words!
extremely well explained, thank you so much
Blessed to have a concept clearer like you...(Don't go for the grammar😋😅)
Your videos are so helpful, thank you!
You are very welcome!
Well done. Excellent presentation - thanks!
I love you khanacademy, but this was soo much better.
Thanks!
Thanks for the videos, they really helped me to understand. But I have one question: at 1:05 you introduce t, explaining how we substitute sigma with s since we don't know the population parameters. But you didn't mention mu. It is also an unknown quantity (like sigma), and I don't understand what is happening with it or why we don't substitute it with x bar.
Excellent question (and note that I don't always say that!). First, it wouldn't be at all helpful to substitute mu with X bar in this situation, as that would simply turn the numerator to 0. We use the formula in a couple of ways: 1) In hypothesis testing scenarios, the t test statistic substitutes the *hypothesized* value of mu (in place of mu). Then, if the null hypothesis is true, this test statistic has the t distribution with the appropriate degrees of freedom. 2) We can rework the formula to derive a confidence interval for mu.
So even though we can't possibly know the *value* of this t quantity in any given scenario, we do know its distribution (under certain assumptions), and that is very helpful in constructing appropriate confidence intervals and hypothesis tests.
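The two uses described in the reply above can be sketched as follows. The data and the hypothesized mean `mu_0` are made up for the illustration, and the critical value 2.262 is the usual t table entry for an upper-tail area of 0.025 with 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical sample of n = 10 measurements (not from the video).
data = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0, 10.5, 10.1]
mu_0 = 10.0     # hypothesized value of mu (an assumption for this example)
t_crit = 2.262  # t table value: 0.025 upper-tail area, 9 degrees of freedom

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)       # estimated standard error of the sample mean

# 1) Hypothesis test: put the hypothesized mu_0 in place of mu.
t_stat = (xbar - mu_0) / se

# 2) Confidence interval: rework the same formula to solve for mu.
ci = (xbar - t_crit * se, xbar + t_crit * se)

print(f"x bar = {xbar:.3f}, s = {s:.4f}, t = {t_stat:.3f}")
print(f"95% confidence interval for mu: ({ci[0]:.3f}, {ci[1]:.3f})")
```

In both uses mu is never replaced by X bar. It is either set to the hypothesized value or left as the unknown that the interval is solved for.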
Your videos are impressive. Kindly check at 8:19: wouldn't t_0.025 be for 97.5% confidence rather than 95%?
Loved your explanation
Knowledge is valuable; what's more valuable is the action taken to expel the popular wrongdoings in the realm of knowledge. Those are decisions of courage and decency.
Use t test no matter how big your sample is!
You are my lifesaver! Thank you so much ;)
thank you, very simple and informative
no Khan academy was harmed in this video 7:50 hahaha
amazingly helpful video. thank you so much.
You are very welcome!
Brilliant, please write a book.
Pure gold! Thank you so much!!
You are very welcome!
Awesome! It seems so simple now! Thank you :D
I love you! You and your videos are amazing! =)
Thanks again! I'm glad you like them.
Very clear and concise!
Thank you! Your videos help me a lot!
I loveeee your videos, but when you say you cover something in another video, can you please add a link to it (people usually put one in the top right corner) so we can get to that other video easily?
I really love your videos and thank you soooo much
So that means if we know the standard deviation of a population, we can get away with a smaller sample size (we just have to repeat the process a large number of times, courtesy of the law of large numbers), but if it's not known, then the bigger the sample size the better?
Unrelated to t distributions, but why do we center confidence intervals at the population mean? Why are they symmetric around mu? Is it just to make things easier?
You're welcome!
Thank you! It makes so much more sense!
If somebody had told me that I need this video in the future 10 years ago, I'd run to the end of the world and never come back #boyithurt
thank you! loved the intro
You are very welcome!
Wow, this was amazing, thank you! But I have a question: I've seen the z statistic formula as divided by the SD only (not by SD / square root of n)... why is that?
I am very much grateful to you
Thank you! You explained it very clearly!
What would the *destandardized* original sampling distribution look like? That is, X = T*(s/sqrt(n)) + mu?
Great explanation!
You know how the rule of thumb seems to be that n = 30 samples is an acceptable condition for using the t-test? Is that because 29 degrees of freedom makes the t distribution close enough to standard normal?
Edit: Whoops, I should've just kept watching. He answers my question around the 8 minute mark. Thank you based jbstatistics!
Awesome video. Thank You!
Made it so much easier
thanks
The only thing I don't understand is how the probability distribution of (Xbar-μ)/(s1/√n), a variable whose value would depend on a single sample's statistic s1, can be a t distribution which is a fixed curve for a given dof (n-1). There is nothing fixing s1, and it can be any s1 from any single sample of size n. So wouldn't choosing a different s1 yield different distributions for a given sample size?
The standard deviation S is a random variable, as is X bar, as is (X bar - mu)/(S/sqrt(n)). All of those quantities are random variables, and they all have probability distributions. That the quantity (X bar - mu)/(S/sqrt(n)) has a t distribution with n-1 degrees of freedom is harder to show, but not too bad (it's covered in a typical intro math stats course). Sure, if we condition on a given value of S, then (X bar - mu)/(S/sqrt(n)) has a different distribution (a normal distribution if we're sampling from a normal distribution), but when S is viewed as the random variable it is, then we end up with a t distribution.
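The reply above can be illustrated by simulation. This is a sketch with hypothetical settings: a standard normal population and n = 5. Each sample produces its own X bar and its own S, and the resulting values of (X bar - mu)/(S/sqrt(n)) together follow one fixed curve, the t distribution with 4 degrees of freedom. The cutoff 2.776 is the usual t table value for an upper-tail area of 0.025 with 4 degrees of freedom.

```python
import math
import random

rng = random.Random(4)
mu, sigma, n, reps = 0.0, 1.0, 5, 50000  # hypothetical settings

t_values = []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    # S varies from sample to sample; it is a random variable too.
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    t_values.append((xbar - mu) / (s / math.sqrt(n)))

# Two-sided tail fractions: the z cutoff leaves too much in the tails,
# the t cutoff with 4 degrees of freedom leaves about 5%.
beyond_z = sum(abs(t) > 1.96 for t in t_values) / reps
beyond_t = sum(abs(t) > 2.776 for t in t_values) / reps

print(f"Fraction with |T| > 1.96:  {beyond_z:.3f}")
print(f"Fraction with |T| > 2.776: {beyond_t:.3f}")
```

No particular value of S is held fixed here. The t curve describes the statistic across all samples of size n.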
I am learning statistics, and your videos have been immensely helpful.
Could you please refer me to the video where you talked about the relationship between the degrees of freedom of t and S^2? It was mentioned at 2:12.
I too had doubt that I can use normal dist for sample size >30. Thanks.
Could you please tell me how we find S here? Is it going to be a value we get from the degrees of freedom? I am a little confused about it.
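On the last question: S is the sample standard deviation, computed from the data with an n - 1 divisor. It is not looked up from the degrees of freedom. The degrees of freedom (n - 1) only determine which t curve applies. A minimal sketch with made-up data:

```python
import math
import statistics

sample = [4.1, 5.3, 3.8, 4.9, 5.0, 4.4]  # hypothetical data
n = len(sample)
xbar = sum(sample) / n

# Sample standard deviation: squared deviations from x bar, divided by n - 1.
s_manual = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))

# The standard library's stdev uses the same n - 1 divisor.
s_library = statistics.stdev(sample)

df = n - 1  # degrees of freedom used to pick the t distribution
print(f"S = {s_manual:.4f}, degrees of freedom = {df}")
```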