The degrees of freedom for the same test statistic on bigger contingency tables, with I rows and J columns, is (I-1)×(J-1), for those wondering.
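A quick way to check this in R, with an arbitrary 3x4 table of counts:

```r
tab <- matrix(10 + 1:12, nrow = 3)  # any 3x4 table of counts
chisq.test(tab)$parameter           # df = (3 - 1) * (4 - 1) = 6
```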
this guy knows
you have no idea how long I've waited for this
This was such a clear and nice video! Keep it going!
Wake up babe. Very Normal uploaded a new video!
I liked this video. I just would have felt very confused by what "expected" means in this context had I not already known it. I think a good place to quickly explain it would have been when you were explaining why a 2x2 table has one degree of freedom. On the other hand, leaving it unexplained might push someone who is still confused to research it themselves. On the other other hand, I am not sure where they should go to find something like that out, as most resources on the math are not easy for a beginner to approach.
One of my favorite channels, thanks a lot.
Lost from 8:28 onward; until then, fantastic, especially with Claude for remediation.
That's explained in a much better way than most colleges in my country.
Good video, I was just confused by the expected table numbers; I thought any of the tables you displayed would do for calculating this by hand. I ended up learning to multiply the margins and apply the Yates correction, and that was enough to replicate the result you got from R.
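In case it helps anyone else, here's roughly what that by-hand calculation looks like in R; the counts are placeholders, not the actual table from the video:

```r
# Placeholder 2x2 table (rows = groups, columns = outcome yes/no)
tab <- matrix(c(30, 20, 22, 28), nrow = 2, byrow = TRUE)

# Expected count per cell = row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Yates correction: shrink each |observed - expected| by 0.5 (at most)
yates <- pmin(0.5, abs(tab - expected))
stat  <- sum((abs(tab - expected) - yates)^2 / expected)
pchisq(stat, df = 1, lower.tail = FALSE)

# Should agree with R's default, which applies the correction to 2x2 tables
chisq.test(tab)
```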
Just one question
At the end when the p value is less than 5% we fail to reject the null hypothesis.
Means our drug is not effective.
Right?
😔 yeah you’re right, my company is going to need to fictionally downsize
The p-value is actually 15%, i.e., greater than 5%.
And yes, we don't reject the null.
Awesome content! You should definitely do a video about survival analysis
Thanks! I do have a small bit of survival in another video about the “biggest award in statistics,” but it’s definitely worth its own video
Great vid, thanks a ton!🏆
It would’ve been great if you had shown how the expected frequencies under the independence assumption are calculated.
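Agreed, for anyone reading: the rule is expected count = (row total × column total) / grand total per cell, and R will hand the whole table back. A quick sketch with made-up counts:

```r
tab <- matrix(c(30, 20, 22, 28), nrow = 2, byrow = TRUE)
chisq.test(tab)$expected  # expected counts under independence, per cell
```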
😩😩😩 10/10 training without even having to apply to the job
Thanks!
This is a great video, but I'd like to make two comments for everybody:
- "Chi Squared test" is an awful name, because there are many, many different statistical tests that have Chi-squared(n) as its null-distribution. As a group, let's all try to phase out the use of this terminology.
- The test presented in this video is increasingly replaced by the G-test. The test statistic in this video is an asymptotic approximation of the G-test statistic. The asymptotic distribution of the G-test is Chi-squared (which comes back to the first point).
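For anyone curious, the G statistic is straightforward to compute by hand; a minimal sketch in R with invented counts:

```r
tab <- matrix(c(30, 20, 22, 28), nrow = 2, byrow = TRUE)
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# G = 2 * sum(observed * log(observed / expected)), chi-squared asymptotically
G <- 2 * sum(tab * log(tab / expected))
pchisq(G, df = (nrow(tab) - 1) * (ncol(tab) - 1), lower.tail = FALSE)
```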
But how does that actually compare to using logistic regression to estimate the log odds of success given the group label (which we can estimate by MLE, MAP, or even fully Bayesian methods)?
The chi-squared test/proportion test looks directly at the response rate, which you can theoretically transform into odds or log-odds. The reverse can be said of logistic regression, but with regression we also get the added benefit of adjusting for additional variables beyond a particular covariate of interest.
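As a rough sketch of that logistic regression route in R (the data and variable names here are invented for illustration):

```r
set.seed(1)
# Hypothetical long-format data: one row per subject
df <- data.frame(
  group    = rep(c("control", "treatment"), each = 50),
  response = c(rbinom(50, 1, 0.4), rbinom(50, 1, 0.6))
)

# The coefficient for group is the log odds ratio, fit by maximum likelihood
fit <- glm(response ~ group, data = df, family = binomial)
summary(fit)

# Adjusting for another covariate would just be one more term, e.g.
# glm(response ~ group + age, data = df, family = binomial)
```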
I was confused because calculating the statistic gave me a different value than in the video. Reading about this, I think it's because of the Yates correction. Is it necessary in this case, as the frequencies are >5? The result changes by quite a bit. Thanks for your videos.
Yes, the different value is because of the correction. The correction helps when the counts are small, but doesn’t change much if the size is large. When you say the result changes, which result are you referring to?
@very-normal I was referring to the p-value (0.153 vs 0.116). I was thinking of a scenario where using the correction changes the analysis for a given alpha, and what to do then.
Ah okay I get what you mean
Let’s say that the significance level is 0.05. And hypothetically, doing it without correction produces a p-value of 0.04, and with it produces a p-value of 0.06.
In an ideal world, an author with this result will have declared everything they plan to do before the analysis is done. This also includes whether or not they apply the correction. If you have stated in the protocol or paper that you will do it with the correction, you should report the p-value that came with the correction. Choosing the more favorable p-value while ignoring this plan constitutes p-hacking and will get a paper retracted if it’s found out.
It’s important to note that p-value thresholds are arbitrary; we just choose them to be low according to our needs. If a p-value is on the borderline, then a good reviewing statistician will note that it almost didn’t fall below the threshold and might not allow the paper to be published, or will force the author to note the borderline non-significance of the result. The 5% threshold is often viewed as a “magic” number to get below, but for a statistician it pretty much describes the same situation: the probability that the data (via the test statistic) would look like this, assuming the null hypothesis is true, is low.
Of course, that’s in an ideal world. Unfortunately, other interests can get in the way and allow falsely optimistic results to be published. Statistics is a nuanced field in a world that wants black-and-white results.
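If you want to see both numbers side by side, it’s a single flag in R (the table here is a stand-in, not the one from the video):

```r
tab <- matrix(c(30, 20, 22, 28), nrow = 2, byrow = TRUE)

chisq.test(tab, correct = TRUE)$p.value   # with the Yates correction (R's default)
chisq.test(tab, correct = FALSE)$p.value  # without it
```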
Damn dude, what is the frequency of you hitting the gym? Your arms are BIG
weekdays
and ty lol
How do you choose between using a two-sample t-test and a chi-squared test?
Are there any examples where one would be suitable and one wouldn't?
I think you mean the two-sample proportion test; the t-test is technically for continuous outcomes.
The chi-squared test (in this video) is actually equivalent to the two-sample proportion test, assuming everything I did in the video. The conclusions would be the same, no matter which you use.
If you run the proportion test in R, you’ll actually see it uses the chi-squared test to calculate a p-value.
You would want to use something else if your sample size is small or your samples aren’t mutually exclusive. A usual substitution is Fisher’s test for small sample sizes. For paired data, there’s also McNemar’s test.
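Here’s a quick sketch in R of that equivalence, with made-up counts:

```r
tab <- matrix(c(30, 20, 22, 28), nrow = 2, byrow = TRUE)

# Same p-value: prop.test runs the chi-squared test under the hood
prop.test(x = tab[, 1], n = rowSums(tab))$p.value
chisq.test(tab)$p.value

# Small-sample alternative mentioned above
fisher.test(tab)$p.value
```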
@very-normal Sorry yes, I did mean the 2 sample proportion test. Thank you for the reply.
best youtuber
Shit just got real.
What about the chi-squared goodness-of-fit test?
It kinda follows the same logic. The null hypothesis is that your data come from some specific distribution, so your data would actually be a contingency table with one row. Based on that specific distribution, you can calculate expected counts, and from there you calculate the statistic in the same way.
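A minimal goodness-of-fit sketch in R, with invented die-roll counts:

```r
# Observed rolls of a die vs. the fair-die hypothesis (equal probabilities)
observed <- c(18, 22, 16, 14, 17, 13)
chisq.test(observed, p = rep(1/6, 6))  # df = 6 - 1 = 5

# Expected counts are just n * p under the hypothesized distribution
sum(observed) * rep(1/6, 6)
```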
I came here for the math. Disappoint.
sorry bud