Explaining the Chi-squared test

  • Published: Jan 13, 2025

Comments • 38

  • @pfizerpflanze
    @pfizerpflanze 4 months ago +30

    For those wondering: with the same test statistic on a bigger contingency table with I rows and J columns, the degrees of freedom should be (I-1)×(J-1).
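A minimal sketch of the (I-1)×(J-1) rule from the comment above. The function name and both tables are hypothetical, chosen only for illustration.

```python
def degrees_of_freedom(table):
    """Degrees of freedom for a test of independence on an I x J table."""
    n_rows = len(table)
    n_cols = len(table[0])
    return (n_rows - 1) * (n_cols - 1)


# Hypothetical tables; only their shapes matter here.
two_by_two = [[10, 20], [30, 40]]
three_by_four = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

print(degrees_of_freedom(two_by_two))     # 1
print(degrees_of_freedom(three_by_four))  # 6
```

The 2x2 case gives one degree of freedom because, once the margins are fixed, a single cell determines the other three.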

  • @komethtauch5151
    @komethtauch5151 4 months ago +5

    you have no idea how long I've waited for this

  • @psl_schaefer
    @psl_schaefer 2 months ago +1

    This was such a clear and nice video! Keep it going!

  • @fg786
    @fg786 4 months ago +18

    Wake up babe. Very Normal uploaded a new video!

  • @HesderOleh
    @HesderOleh 4 months ago +5

    I liked this video. I just would have felt very confused by what "expected" means in this context had I not already known it. I think that a good place to quickly explain it would have been when you were explaining why a 2x2 table has one degree of freedom. On the other hand it might get someone still confused to research it more themselves. On the other other hand, I am not sure where they should go to find something like that out, as most resources are not easy for a beginner to approach for math.

  • @IUT-e8x
    @IUT-e8x 4 months ago

    One of my favorite channels thanks a lot.

  • @mark110292
    @mark110292 4 months ago +1

    Lost from 8.28; until then, fantastic especially with Claude for remediation.

  • @bhomiktakhar8226
    @bhomiktakhar8226 28 days ago

    That's explained in a much better way than most colleges in my country.

  • @spaceotter6218
    @spaceotter6218 3 months ago

    Good video. I was just confused by the expected table numbers; I thought any of the tables you displayed would work for calculating this by hand. I ended up learning to multiply the margins and apply Yates's correction, and that was enough to replicate the result you got from R.

  • @bilal_ali
    @bilal_ali 4 months ago +5

    Just one question
    At the end when the p value is less than 5% we fail to reject the null hypothesis.
    Means our drug is not effective.
    Right?

    • @very-normal
      @very-normal  4 months ago +7

      😔 yeah you’re right, my company is going to need to fictionally downsize

    • @pfizerpflanze
      @pfizerpflanze 4 months ago +4

      The p-value is actually 15%, namely greater than 5%.
      And yes, we don't reject the null.

  • @sotirisbekiaris3580
    @sotirisbekiaris3580 4 months ago +1

    Awesome content! You should definitely do a video about survival analysis

    • @very-normal
      @very-normal  4 months ago +1

      Thanks! I do have a small bit of survival in another video about the “biggest award in statistics” but it’s definitely worth its own video

  • @tr0wb3d3r5
    @tr0wb3d3r5 4 months ago

    Great vid, thanks a ton!🏆

  • @axscs1178
    @axscs1178 4 months ago

    It would've been great if you had shown how the expected frequencies under the independence assumption are calculated.
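One common way to get the expected frequencies under independence is to multiply the margins: each expected count is (row total × column total) / grand total. A minimal sketch, assuming a hypothetical 2x2 table (not the data from the video):

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical table: rows are groups, columns are (response, no response).
observed = [[30, 70],
            [45, 55]]

expected = expected_counts(observed)
print(expected)  # [[37.5, 62.5], [37.5, 62.5]]
```

Both rows get the same expected counts here because the two hypothetical groups have the same size, so independence implies the same response rate in each.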

  • @Matthew-eb3di
    @Matthew-eb3di 4 months ago +1

    😩😩😩 10/10 training without even having to apply to the job

  • @braineaterzombie3981
    @braineaterzombie3981 4 months ago

    Thanks!

  • @fibonacci112358steve
    @fibonacci112358steve 4 months ago +3

    This is a great video, but I'd like to make two comments for everybody:
    - "Chi-squared test" is an awful name, because there are many, many different statistical tests that have Chi-squared(n) as their null distribution. As a group, let's all try to phase out the use of this terminology.
    - The test presented in this video is increasingly replaced by the G-test. The test statistic in this video is an asymptotic approximation of the G-test statistic. The asymptotic distribution of the G-test is Chi-squared (which comes back to the first point).
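To make the comparison concrete, here is a small sketch that computes both the Pearson statistic and the G statistic, G = 2 Σ O ln(O/E), on the same table. The table is hypothetical and the function names are made up for illustration.

```python
import math


def expected_counts(observed):
    """Expected cell counts under independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_statistic(observed):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))


def g_statistic(observed):
    """G statistic: 2 * sum of O * ln(O / E); empty cells contribute zero."""
    expected = expected_counts(observed)
    return 2 * sum(o * math.log(o / e)
                   for o_row, e_row in zip(observed, expected)
                   for o, e in zip(o_row, e_row)
                   if o > 0)


observed = [[30, 70], [45, 55]]  # hypothetical counts
pearson = pearson_statistic(observed)
g = g_statistic(observed)
print(pearson, g)  # the two values should be close for a table like this
```

When observed and expected counts are close relative to their size, the two statistics tend to agree, and both are compared against the same chi-squared reference distribution.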

  • @psl_schaefer
    @psl_schaefer 2 months ago

    But how does that actually compare to using logistic regression to estimate the log odds of success given the group label (which we can estimate by MLE, MAP, or even fully Bayesian)

    • @very-normal
      @very-normal  2 months ago

      Chi-squared test/proportion test looks directly at the response rate, which theoretically you can transform into odds or log-odds. The reverse can be said of logistic regression, but we also get the added benefit of adjusting for additional variables beyond a particular covariate of interest
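A rough sketch of the transformation mentioned in the reply above: response rate to odds to log-odds. The counts are hypothetical. With only a single binary group indicator as the covariate, the maximum-likelihood slope of a logistic regression equals the log odds ratio computed here.

```python
import math


def odds(rate):
    """Convert a response rate (probability) to odds."""
    return rate / (1 - rate)


def log_odds(rate):
    """Convert a response rate to log-odds (the logit)."""
    return math.log(odds(rate))


# Hypothetical counts: 30 of 100 respond in one group, 45 of 100 in the other.
rate_control = 30 / 100
rate_treatment = 45 / 100

# Difference in log-odds between groups, i.e. the log odds ratio.
log_odds_ratio = log_odds(rate_treatment) - log_odds(rate_control)
print(odds(rate_control), odds(rate_treatment), log_odds_ratio)
```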

  • @AM-kp3mv
    @AM-kp3mv 2 months ago

    I was confused because calculating the statistic gave me a different value than in the video. Reading about this, I think it's because of the Yates Correction. Is it necessary in this case as the frequencies are >5? The result changes by quite a bit. Thanks for your videos.

    • @very-normal
      @very-normal  2 months ago

      Yes, the different value is because of the correction. The correction helps when the counts are small, but doesn’t change much if the size is large. When you say the result changes, which result are you referring to?
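To see the effect of the correction, here is a minimal sketch that computes the 2x2 statistic with and without Yates's continuity correction. The table is hypothetical (it is not the video's data, so the p-values will not match 0.153 or 0.116). The p-value uses the fact that, for one degree of freedom, P(χ² > x) = erfc(√(x/2)).

```python
import math


def expected_counts(observed):
    """Expected cell counts under independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_2x2(observed, yates=False):
    """Return (statistic, p_value) for a 2x2 table, optionally Yates-corrected."""
    expected = expected_counts(observed)
    statistic = 0.0
    for o_row, e_row in zip(observed, expected):
        for o, e in zip(o_row, e_row):
            diff = abs(o - e)
            if yates:
                # Shrink each |O - E| by 0.5, but never below zero.
                diff = max(0.0, diff - 0.5)
            statistic += diff ** 2 / e
    # Survival function of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


observed = [[30, 70], [45, 55]]  # hypothetical counts
stat_plain, p_plain = chi_squared_2x2(observed)
stat_yates, p_yates = chi_squared_2x2(observed, yates=True)
print(stat_plain, p_plain)
print(stat_yates, p_yates)
```

The correction always makes the statistic smaller, so the corrected p-value is the larger of the two.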

    • @AM-kp3mv
      @AM-kp3mv 2 months ago

      @@very-normal I was referring to the p-value (0.153 vs 0.116). I was thinking of a scenario where using the correction changes the analysis for a given alpha, and what to do then.

    • @very-normal
      @very-normal  2 months ago +1

      Ah okay I get what you mean
      Let’s say that the significance level is 0.05. And hypothetically, doing it without correction produces a p-value of 0.04, and with it produces a p-value of 0.06.
      In an ideal world, an author with this result will declare everything they will do ahead of time before the analysis is done. This also includes whether or not they do the correction. If you have stated in the protocol or paper that you will do it with the correction, you should report the p-value that came with the correction. Choosing the more favorable p-value while ignoring this plan constitutes p-hacking and will get a paper retracted if this is found out.
      It’s important to note that p-value thresholds are arbitrary; we just choose them to be low according to our needs. If a p-value is on the borderline, then a good reviewing statistician will note that it almost didn’t fall below the threshold and might not allow the paper to be published, or force the author to note the borderline non-significance of the result. The 5% threshold is often viewed as a “magic” number to get below, but for a statistician it pretty much describes the same situation: the probability that the data (via test statistic) would look like that, assuming the null hypothesis is true, would be low.
      Of course, that’s in an ideal world. Unfortunately, other interests can get in the way and allow falsely optimistic results to be published. Statistics is a nuanced field in a world that wants black and white results

  • @yahlimelnik4483
    @yahlimelnik4483 3 months ago

    Damn dude, what is the frequency of you hitting the gym? Your arms are BIG

  • @sajanator3
    @sajanator3 4 months ago

    How do you choose between using a two-sample t-test and a chi-squared test?
    Are there any examples where one would be suitable and one wouldn't?

    • @very-normal
      @very-normal  4 months ago +1

      I think you mean the two-sample proportion test; the t-test is technically for continuous outcomes.
      The chi-squared test (in this video) is actually equivalent to the two-sample proportion test, assuming everything I did in the video. The conclusions would be the same, no matter which you use.
      If you run the proportion test in R, you’ll actually see it uses the chi-squared test to calculate a p-value.
      You would want to use something else if your sample size is small or your groups aren’t mutually exclusive. A usual substitution is Fisher’s test for small sample sizes. For paired data, there’s also McNemar’s test.
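The equivalence can be checked numerically: without a continuity correction, the square of the pooled two-sample proportion z statistic equals the Pearson chi-squared statistic on the matching 2x2 table. A minimal sketch with hypothetical counts and made-up function names:

```python
import math


def two_proportion_z(successes_1, n_1, successes_2, n_2):
    """Pooled two-sample z statistic for a difference in proportions."""
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    return (p_1 - p_2) / standard_error


def pearson_statistic(observed):
    """Uncorrected Pearson chi-squared statistic for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    return statistic


# Hypothetical data: 30 of 100 respond in group 1, 45 of 100 in group 2.
observed = [[30, 70], [45, 55]]
z = two_proportion_z(30, 100, 45, 100)
chi2 = pearson_statistic(observed)
print(z ** 2, chi2)  # the two numbers agree up to floating-point error
```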

    • @sajanator3
      @sajanator3 4 months ago

      @@very-normal Sorry yes, I did mean the 2 sample proportion test. Thank you for the reply.

  • @Abhishek-bz5is
    @Abhishek-bz5is 4 months ago

    best youtuber

  • @pipertripp
    @pipertripp 4 months ago

    Shit just got real.

  • @duckymomo7935
    @duckymomo7935 4 months ago

    What about the chi-squared goodness-of-fit test?

    • @very-normal
      @very-normal  4 months ago

      it kinda follows the same logic. The null hypothesis is that your data comes from some specific distribution. Your data would actually be a contingency table with one row because a goodness of fit test looks at whether or not your data conceivably comes from a given distribution. Based on this specific distribution, you can calculate expected counts. From there you calculate the statistic in the same way.
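A small sketch of that logic for a goodness-of-fit test, assuming a hypothetical example of 120 die rolls tested against a fair die. The expected counts come from the hypothesized distribution, and the statistic is the same sum of (O - E)²/E.

```python
def goodness_of_fit_statistic(observed, probabilities):
    """Chi-squared goodness-of-fit statistic against hypothesized probabilities."""
    total = sum(observed)
    expected = [total * p for p in probabilities]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: counts of faces 1..6 over 120 rolls.
observed = [18, 22, 16, 25, 19, 20]
fair_die = [1 / 6] * 6  # null hypothesis: every face is equally likely

statistic = goodness_of_fit_statistic(observed, fair_die)
# With k categories and a fully specified distribution, there are k - 1
# degrees of freedom.
dof = len(observed) - 1
print(statistic, dof)  # statistic is about 2.5, dof is 5
```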

  • @AER9095
    @AER9095 4 months ago +2

    I came here for the math. Disappoint.