This video slaps.
I'm also a PhD student (math finance, which is mostly statistics), but my background is applied math. Your videos have been very helpful for filling in the knowledge gaps that an undergrad in stats would have covered! Thank you!
The past half year has been my sprint into grad school prep with an emphasis on ecological modeling: lots of time spent in R and side quests to more fully grasp statistical intuition. You have continued to fuel the fire with these easy-to-understand, well-thought-out videos. I hope this gets a 1/10!
I'm currently writing my master's thesis in CS, and this video is an excellent refresher on my statistics class!
Great video! Thanks!
Thanks! Good luck with your thesis!
@@very-normal Thank you! Best regards from Germany.
My god man, at this point you should do this full time. You have a gift; the content and presentation are arguably immaculate. I am trying to learn statistics, but so often it is assumed that the learner is intuitively familiar with some statistical/mathematical principles, and those are never properly explained. For example, when I try to learn what a standard error is, most sources talk about how to calculate the standard error of the mean and not about the concept itself. Likewise, the mathematical notation is all over the place. Can you tell me why regression models are often written as y_i = b1*x_i + b0 + ε_i and not as Y = beta0 + beta1*X + ε? I believe using the small letters with an index indicates that we use realizations of the random variables Y and X for observation i, but that does not accurately illustrate the general model, in my opinion. I am so confused about why it is so widespread.
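(For reference, here are the two conventions the comment contrasts, written out consistently in LaTeX; this is my rendering of the comment's equations, not notation taken from the video:)

% observation-level form: indexed lowercase letters denote the values for observation i
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
% general form: uppercase letters denote the random variables themselves
Y = \beta_0 + \beta_1 X + \varepsilon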
This was amazing, I’ve never seen the p-value being explained that clearly.
Couldn't have come at a better time, I've just started studying this
update it was an amazing video!
Thanks a lot for this video.
I’ll share this to my students by posting in our Canvas LMS. :)
Thank you!
great video, an interesting perspective on the problem
Someday I hope you do a video on fixed and random effects and mixed models. I think you may be the only man alive that can help me understand those clearly 😂
I talk about it a little bit in my video on “the biggest prize in statistics” actually. You can jump to the section on Nan Laird to get what you want
Fabulous video. Thank you very much for the clear explanation. Will now check out your other videos as well.
This is a golden masterpiece! Subscribing, no doubt about it. Thank you!
terrific video! I want to know more about multiple hypothesis testing.
Awesome video. While in university, I struggled a lot to understand the meaning of the p-value aside from its "straightforward" definition as a conditional probability. The way you phrase your results at the end makes things so much clearer.
My suggestion would be to communicate (video time: 14:44) the CI as a range for the population parameter rather than a range for the test statistic. Great videos, and keep up the good work.
the CI is what’s communicated there tho?
@@very-normal So given H0, we're 95% confident that the 10% falls between 0.5% and 45.8%; but that can happen only 2.6% of the time, hence we reject H0? I thought we do not reject if the 10% is in the CI.
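For anyone else stuck at this point: the interval at 14:44 is a CI for the true probability of heads, not for the observed 10%, and the rejection comes from the null value of 50% falling outside it. A minimal sketch in R, assuming the video's data were 1 head in 10 flips (my reading of the numbers quoted, not confirmed in the video):

res <- prop.test(x = 1, n = 10, p = 0.5)   # proportion test against a fair coin
res$estimate   # 0.1  -> the observed 10%
res$conf.int   # roughly 0.005 to 0.459 -> plausible values for the TRUE heads probability
res$p.value    # roughly 0.027 -> reject, because the null value 0.5 is outside that range

So we reject precisely because 50% is not in the CI; whether the observed 10% sits inside its own interval is not the criterion.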
Great video! Well done.
This was very good and clearly communicated. I would only have liked some more detail on calculating the confidence intervals, and maybe a step-by-step solution for a concrete example.
Thank You
You’re Welcome
Omg you’re the goat, I have a review quiz on hypothesis testing this week
Good luck!
Loved your videos, man! Do you recommend those two books at @4:33?
Thanks! I don’t think I’d recommend those for learning. They’re more like a reference for the most common hypothesis tests
Would you consider making a video on a suggested topic? Regression analysis, given how important it is to many fields, including decision-making.
yeah! Next next video is on linear regression actually
@@very-normal That's great. Thank you!
You rightly said that seeing a p-value alone should always make you wonder what the null hypothesis was, so wouldn't it be better if we presented the result in the order:
1) Null hypothesis
2) Test results
3) CI and p-value
4) Conclusion
?
Yeah that can work as well, the main takeaway was that there’s a set of things that should be mentioned to best communicate about a test. The order may change depending on where this communication takes place
Great video as usual!
I can't really understand the importance of the CI. I understand what you said in the video, but for instance, in my regression class my professor always wanted me to draw the CI on the graph, and I still do not understand its importance.
The CI can be helpful because it’s easier to see when the null hypothesis should be rejected.
In linear regression, the null hypothesis of interest is usually that the parameter associated with the covariate is 0. When your teacher asks you to draw the CI, you’re getting a visual check that 0 is not included as a possible value (ie there is a discernible association and the “line” isn’t flat). If you’re talking about logistic regression, the usual null is that the odds ratio is 1 (ie the coefficient is 0), but the same logic applies
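If it helps, here's a minimal sketch in R of that visual check, using made-up data (the variables and the true slope of 0.8 are purely hypothetical):

set.seed(1)
x <- rnorm(50)
y <- 2 + 0.8 * x + rnorm(50)   # simulated data with a genuine slope
fit <- lm(y ~ x)
confint(fit, "x")              # 95% CI for the slope
# If this interval excludes 0, you reach the same conclusion as a p-value below 0.05:
# there is a discernible association, and the fitted line isn't flat.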
Great video, as always! Can you use a two-sided t-test instead of the proportion test? The sample mean under the null hypothesis would be 0.5 if heads == 1 and tails == 0
You technically can, but with only 10 flips, the results wouldn’t be very reliable. The same thing can be said for the proportion test, but I just used it for teaching purposes
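A quick sketch of that comparison in R, again assuming the data were 1 head in 10 flips, with heads coded as 1 and tails as 0 (my assumption, for illustration):

flips <- c(1, rep(0, 9))                        # 1 head, 9 tails
t.test(flips, mu = 0.5)                         # one-sample t-test against a fair-coin mean of 0.5
prop.test(sum(flips), length(flips), p = 0.5)   # the proportion test used in the video
# Both give small p-values here, but with n = 10 neither approximation is very trustworthy.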
I am wondering if it actually makes sense to report the confidence interval, given its non-intuitive definition. With Bayesian credible intervals it's easier, I think.
Yeahhhhh that’s what I think too, but it’s kinda like a “how I think it should work” vs “how the world works” type of thing. People will need to use the confidence interval til the end of time, so might as well make sure they know what it is
Booooo. Integrals can be challenging for students; should we not use them?
if I had a nickel for every time someone got confused by a confidence interval, I’d have a nickel 95% of the time
@very-normal I don't doubt it. I'm willing to bet they were poorly taught by professors who were poorly taught.
I don't think a Chi² test is the appropriate test for the coin flip, since it's a very simple experiment with only two possible results (heads or tails), meaning a binomial test is much more appropriate. Since it makes more assumptions, it should also be able to reject the null hypothesis with less data.
It's also worth noting that the sample size is quite literally on the border of what you can (at least by common agreement) properly use the Chi² test for: the expected number of occurrences under H0 should be >= 5 for every bucket (or >= 5 for at least 80% of the buckets, with no bucket below 1).
you’re not wrong, that’s what I was getting at when I said there’s a better way to approach the test than the proportion test / chi-squared test
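For concreteness, a short R sketch of both options, again assuming 1 head in 10 flips:

binom.test(1, 10, p = 0.5)   # exact binomial test: p ≈ 0.021
prop.test(1, 10, p = 0.5)    # proportion / chi-squared test with continuity correction: p ≈ 0.027
# Expected counts under H0 are 10 * 0.5 = 5 per outcome, right at the ">= 5 expected per bucket"
# rule of thumb mentioned above, which is why the exact test is the safer choice here.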
I thought this video was going to be about the Hubble constant lol :P
What? You will judge me for using Excel! First of all, I use MATLAB when needed, but I teach students to use Excel because it doesn't need any programming and it is likely on every office computer, so if they need to do a statistical test in the future, they can.
BTW, great videos, keep them coming. In class (research methods for aviators) I tell students to watch your videos if they need more info.
You can use Python in Excel now too.
@@quillaja another reason just using excel is fine 😁
@@aerospacedoctor Agreed. Doing stats in Excel is actually fine! Using Excel as a database management system, which is usually what I see, is a travesty.
I was a bit confused about p = 0.026, since an exact probability of the observed and more extreme outcomes can be calculated mentally: all heads, all tails, 10 cases of 1 tail, and 10 cases of 1 head. That's 22 out of 2^10, something definitely below 0.022 since 1024 > 1000. A calculator helps and gives 0.021.
Yeah, technically you can calculate an exact p-value for this because you know the number of heads is binomial. I used the Z-test approximation even though it’s not really appropriate here because I wanted to focus on getting the interpretation down
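The mental arithmetic above checks out; a quick sketch in R (assuming 10 flips with 1 head observed):

sum(dbinom(c(0, 1, 9, 10), size = 10, prob = 0.5))   # 22/1024 ≈ 0.0215, the exact two-sided p-value
prop.test(1, 10, p = 0.5)$p.value                    # ≈ 0.026, the approximation reported in the video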
@@very-normal Once you’ve named the test being used, I’ve got it.
In general, a very good video 👍
IMHO, only one important point is missing: the significance level alpha should be chosen prior to testing, based on the “costs” of type I and type II errors. You’ve got a very nice example for that with a coin. If those two people were just buddies, then the cost of the type I error, when the coin was indeed fair, is just a bit of temporary tension in the relationship, so 5% is reasonable. But if the punishment for the unfair coin is an execution, then the guilt should be proved beyond any reasonable doubt, and an alpha of 10^-6 does not seem unrealistic.
Yeah you’re right, that’s a great point. I ended up being so focused on trying to avoid jargon that I forgot to make that point explicit
@@very-normal Make an additional short about it.
Shouldn't you have a correction factor? You studied the coin flips after the sample was already taken, because it appeared unlikely from the beginning. If I were cheating and didn't want to get caught, I'd also call into question your ability to decide on a reasonable p-value post hoc.
For my example, there’s no need for a correction, but I think I understand what you’re getting at: that I only had a hypothesis after seeing the data, so I’m doing post-hoc reasoning.
In this case, it’s not unreasonable to think that a coin would be equally likely to land tails or heads. From a Bayesian perspective, nothing tells me I should believe otherwise. You could say I “had” the hypothesis before I saw the data, even though I wasn’t really thinking about it.
However, corrections would be needed if I wanted to test multiple hypotheses. The test I performed was for a null of a fair coin, but nothing would stop me from doing another test where the null is that the coin lands tails 60% of the time. In that case, I would need to adjust for multiple hypothesis tests because the probability of getting at least one type I error grows with more tests.
Hopefully that helps a bit
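To put a number on how fast that “at least one” probability grows, a quick sketch (assuming independent tests, each run at level 0.05):

fwer <- function(m, alpha = 0.05) 1 - (1 - alpha)^m   # P(at least one type I error) across m true nulls
fwer(c(1, 5, 20, 100))                                # ≈ 0.05, 0.23, 0.64, 0.99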
I think @ethandavis7310 is hinting at post-selection inference, not multiple testing correction. Deciding to test a hypothesis after looking at the data is data peeking, which invalidates the coverage guarantees (or equivalently, the Type 1 error guarantee) provided by the frequentist test.
The argument that a hypothesis already exists out there, just not thought about, is not valid. This issue comes up a lot in, say, p-hacking: one can look at the data and keep looking for hypotheses to test, and just by chance one of these hypotheses may come out significant even if none of them are true (for instance, testing 100 true-null hypotheses at 5% significance would yield approximately 5 significant results). One can then say that this hypothesis already existed, I just wasn't thinking about it, but that would not be a valid argument.
Choosing a hypothesis after peeking at the data requires correction factors or alternate procedures for inference. For example, an easy fix in the slapping example would be: once you become suspicious that the coin is unfair, you play the game again and then use the new data to conduct the test. Since the hypothesis you chose was, although random, a function of your old data, which is independent of the new data, the inference obtained from the new data would be valid.
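A small simulation sketch of that peeking effect (my own illustration, not something from the video): flip a fair coin 10 times, only bother testing when the run already looks lopsided, and check how often those tests reject.

set.seed(42)
heads <- rbinom(20000, size = 10, prob = 0.5)           # the coin really is fair every time
suspicious <- heads <= 2 | heads >= 8                   # "peek" and only test lopsided-looking runs
pvals <- sapply(heads[suspicious], function(h) binom.test(h, 10, p = 0.5)$p.value)
mean(pvals < 0.05)                # ≈ 0.20 among the runs we chose to test, far above the nominal 0.05
mean(heads %in% c(0, 1, 9, 10))   # ≈ 0.02 unconditionally, so the inflation comes from the selection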
I’ve hit an absolute wall with stats. How in the world do y’all understand this?
a lot of time and examples
A person who cheats is a 'cheat' NOT a cheater!
I learned something today
:( sowwie
it’s okay
Great video. I just wish Bayesian methods were more widespread🥲 (I know that's out of scope for this video, but I do wish the world changes in a way that they'll be in scope in the future)
Also, I wonder whether all these hypothesis test methods are at least partially applicable in a Bayesian context, or are they fundamentally frequentist?
Next video should sate your curiosity 👌
I rolled ⚂⚂ and concluded the dice were loaded, P(⚂⚂) = 1/36 ≃ .0278
But how do you know if something is more extreme than ⚂⚂?
@@very-normal I'm just teasing. Great video, thanks for making it.
lol yeah I know, I appreciate you giving time to the channel. I spent too much time trying to figure out how you type the dice character