The Multiple Comparisons Problem

  • Published: 29 Dec 2024

Comments • 37

  • @etiennefrancois5884 • 4 years ago +4

    Videos like this show why stats should be taught by people like you, with good communication skills, rather than by people who can only describe things mathematically.

  • @create_space812 • 6 years ago +9

    Thank you for the clear analogy of the Monopoly dice roll! It's so intuitive.

  • @KristinHlebowitsh • 7 years ago

    Studying for my medicine boards and this was a great, quick, helpful explanation of the multiple testing problem which I had never heard of. Thanks.

    • @SprightlyPedagogue • 7 years ago

      Thanks! As I'm trying to get this channel going, feel free to make topic requests. I'd be curious to know what else is on your boards. Appreciate the support!

  • @MrChaluliss • 2 years ago

    Other than the microphone sounding a bit cheap, this is an amazing video! Thanks so much for the concise thoughts and insights.

  • @tb6605 • 5 months ago

    "Thanks for the explanation. I am a little bit confused about FWER.
    For example, my research question is: 'Does the medication have an effect on diabetes patients?' I have one independent variable: Group (Diabetes patients and Control) and four dependent variables: Blood Sugar Levels, Hemoglobin A1c Levels, Insulin Sensitivity, and BMI.
    1. If I perform 4 t-tests with these dependent variables, will the Type 1 error rate increase?
    2. If I would like to compare clinical features between Diabetes vs. Control (Gender, Age, Educational Level, etc.), will the FWER further increase?"
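    For a rough sense of scale on question 1, here is a minimal sketch assuming the four tests are independent (correlated clinical outcomes like these would inflate somewhat less, so read it as an upper bound):

    ```python
    # Family-wise error rate (FWER) for m independent tests at alpha = 0.05:
    # FWER = 1 - (1 - alpha)**m. Adding the clinical-feature comparisons from
    # question 2 simply raises m, pushing the FWER higher still.
    alpha = 0.05
    for m in (1, 4, 10):
        fwer = 1 - (1 - alpha) ** m
        print(f"{m:2d} tests -> FWER = {fwer:.3f}")
    #  1 tests -> FWER = 0.050
    #  4 tests -> FWER = 0.185
    # 10 tests -> FWER = 0.401
    ```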

  • @mayanklal9892 • 2 years ago

    Thanks for this simple explainer. Hope you will be back making these simple explainer videos.

    • @SprightlyPedagogue • 2 years ago +1

      Thanks for the encouragement! It's sure tough to get to them. Not really any monetary incentive unfortunately. I'd love to do something more in the future, but don't have nearly as much time for the hobby as I used to.
      I really appreciate the appreciation though! I hope I can find some time soon/eventually.

    • @mayanklal9892 • 2 years ago

      @SprightlyPedagogue Thanks for the reply. I understand your point. I wouldn't mind your explanations in the form of blog posts as well. Your lucid explanation in one of the replies illustrating the difference between Fisherian and Bayesian statistics was also a joy to read.
      Glad to have come across your channel. Wishing you the best in your future endeavors.

  • @vman049 • 6 years ago +2

    Thanks for the video. Could you please elaborate on why Bayesian methods do not suffer from the multiple comparisons problem?

    • @SprightlyPedagogue • 6 years ago +5

      Yeah, I'll do my best! Warning, this gets into the weeds a bit.
      In traditional (Fisherian) statistics, we use p-values to determine whether or not an effect is significant. Each of those decisions is a dichotomous, all-or-nothing, discrete choice. The question is whether those choices reflect the real world. Multiple comparisons becomes an issue because random things happen. Rarely, yes. But they do happen. So if we run a bunch of tests, eventually one of them will be labeled as significant... because... random things happen randomly. That's Type I error btw.
      So, the multiple comparisons problem is a result of our making these significance decisions. It exists because we use these threshold-style decisions as our basis for determining whether a pattern in our data is important/different/rare.
      Bayesian stats don't operate like that. In Bayes, you aren't trying to determine whether there is a significant effect. You're just estimating how big the effect size is (oversimplifying a bit). It is arguably (though, I'd say, convincingly) a better approach for many reasons, but this is one of them. There are no significance decisions to inflate.
      Take an independent samples t-test for example. Fisherian stats (as a gross overgeneralization) are trying to determine whether the differences in means (relative to the standard error) are big enough to care about. Thus, Fisherian statisticians will try to answer the question: "Is this difference big enough to matter?"
      Bayesian stats are trying to use those same bits of information (sample size, standard deviation, observed mean difference, as well as our prior beliefs about the expected differences) to answer the following question: "What is our best guess as to *how* different these two groups are?" The answer to the Fisherian question might be "yes" or "no". The answer to the Bayesian question might be, for example, "5" or "8." As you get more information and run more tests, you're more likely to get more "yes" answers... randomly. You aren't necessarily more likely to get more "5" answers if the data keeps being consistent with "8."
      It is the yes/no decision that leads to the multiple comparisons problem. If we're talking about whether there were *any* differences then we have inadvertently created an *OR* probability situation.
      (Quick primer on OR probability if you don't know it; skip this if you do.) Imagine that there are four coins in front of you: a penny, nickel, dime, and quarter. You pick one up at random. Question 1: What is the probability that you picked up a quarter? That's the question we act like we're asking when we make significance decisions using p-values. If we only did it once, what is the probability that some event (like a big enough difference between two groups) happened? Question 2: What is the probability that you picked up a quarter, assuming you get to draw four times (with replacement)? That's a much higher value. If you can pick a coin up, put it back, draw another, put it back... and so on... you're much more likely to draw a quarter at some point. If you're allowed to check for a significant difference between Var 1 and Var 2 OR between Var 1 and Var 3 OR between Var 2 and Var 3, it's much more likely that you will find a difference somewhere at some point. Again, Bayesians bypass all of this by refusing to play that game.
      I'm being a little unfair to Fisherians. Using significance thresholds isn't universal in Fisherian stats. Many good Fisherian statisticians know about this problem and take steps to address it (including refusing to use or report p-values). However, using p-values in this way is common. Fisherian statisticians *often* do this. They don't *always* do this. But they are the *only* ones who do this. Bayesian statisticians don't use these threshold decisions in their work. So there isn't an "OR" probability to inflate.
      If I didn't answer your exact question or if some of that seems mildly inaccurate, confusing or incomplete, feel free to ask more and I'll try to clarify. There could be a whole video series on this and similar topics.
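      A minimal simulation sketch of that "OR" inflation (the sample sizes and test count here are illustrative choices, not anything from the video): run six t-tests on pure-noise data, where every rejection is by construction a false alarm, and watch how often at least one comes back "significant".

      ```python
      # Six independent t-tests on data with NO real effect: both groups are
      # always drawn from the same distribution, so any "significant" result
      # is a Type I error. P(at least one) approaches 1 - 0.95**6.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      n_sims, m, n, alpha = 5000, 6, 30, 0.05

      false_alarms = 0
      for _ in range(n_sims):
          pvals = [stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
                   for _ in range(m)]
          false_alarms += min(pvals) < alpha  # did ANY test "succeed"?

      print(f"P(at least one 'significant' test) ~ {false_alarms / n_sims:.2f}")
      # prints roughly 0.26, close to the theoretical 1 - 0.95**6 = 0.265
      ```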

  • @lukehebert6207 • 4 years ago

    This was an excellent video. I'm a huge fan of publishing or submitting methods before conducting experiments. This might also force researchers to use more appropriate sampling techniques/experimental methods beforehand instead of conducting post-hoc (sometimes jokingly called "post mortem") analyses after the experiment is finished.
    I've been researching in a genetics lab for the last few years, which is where I first ran into the multiple comparisons problem. We need to compare allele frequencies for thousands of variants or genes when evaluating two populations (one with a genetic problem, one without) in order to identify which variants are responsible for a disease. It's an exciting area for the development and application of statistics!

    • @awfominaya • 2 years ago

      Genetics research has a couple of benefits over other domains, in part because you usually have enough data to do split-half analyses. Helps to cross-validate your findings!
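      A minimal sketch of that split-half idea with made-up data (the genotype matrix, the 0.05 threshold, and the per-variant correlation test are all hypothetical stand-ins, not any particular pipeline): discover "hits" in one half of the samples, then see how few replicate in the held-out half when there is no real signal.

      ```python
      # Split-half validation on pure noise: expect ~50 false "hits" out of
      # 1000 variants in half A, but only a couple replicating in half B.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      n_samples, n_variants = 200, 1000
      genotypes = rng.integers(0, 3, size=(n_samples, n_variants))  # 0/1/2 allele counts
      phenotype = rng.normal(size=n_samples)                        # no real signal here

      order = rng.permutation(n_samples)
      half_a, half_b = order[:n_samples // 2], order[n_samples // 2:]

      def pvals(idx):
          # per-variant association test (Pearson correlation as a simple stand-in)
          return np.array([stats.pearsonr(genotypes[idx, j], phenotype[idx])[1]
                           for j in range(n_variants)])

      hits_a = pvals(half_a) < 0.05
      replicated = hits_a & (pvals(half_b) < 0.05)
      print(f"half-A 'hits': {hits_a.sum()}  replicated in half B: {replicated.sum()}")
      ```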

  • @federicaluppino1211 • 2 years ago

    I am really confused. Why would you consider rolling dice, or flipping a coin, to be non-independent events? This is what they teach you in probability classes with the Bernoulli distribution. I don't get why they would be dependent events. I would really appreciate it if you elaborated on that.

    • @SprightlyPedagogue • 2 years ago +1

      When you're thinking about independent or non-independent events, you're really talking about the possibility of a specific outcome occurring in a one-off event. If I draw marbles from a bag, what result will I get? The answer depends on what marbles are in the bag.
      The multiple comparisons problem comes into play only when you keep looking for that single specific outcome but you keep drawing many times.
      It's the repeated drawing, rolling, and flipping that causes the problem.
      So each die roll is independent of every other die roll. But YOU keep looking for a four to be rolled. You are not dependent or independent. You're just trying to game the system and get a result you prefer.
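      To put numbers on the rerolling trick (a small sketch, not anything from the video itself): each individual roll stays at 1/6, but "a four at some point across many rolls" climbs quickly.

      ```python
      # Simulate the rerolling kid: roll a fair die up to k times and record
      # whether a four showed up at any point. Each roll alone is 1/6; the OR
      # across rolls is not.
      import random

      random.seed(0)
      trials = 100_000

      def four_appears(max_rolls):
          return any(random.randint(1, 6) == 4 for _ in range(max_rolls))

      for k in (1, 4, 10):
          hits = sum(four_appears(k) for _ in range(trials))
          print(f"up to {k:2d} rolls -> P(a four at some point) ~ {hits / trials:.2f}")
      # roughly 0.17, 0.52, 0.84
      ```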

  • @tawongaishejulianhawu7380 • 2 years ago +1

    Amazing, intuitive explanation. Thank you!

  • @polvotierno • 2 years ago

    Have you ever seen an ANOVA test show significance but then none of the pairwise tests show significance?

    • @awfominaya • 2 years ago

      Yes. It's annoying. But possible. Usually you're dealing with low effect size. Or low power.
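      A hedged sketch of how you might hunt down such a case (the group means, sizes, and the Bonferroni choice are arbitrary; other corrections behave similarly): with smallish effects and three groups, scanning random datasets turns one up fairly quickly.

      ```python
      # Searching for the annoying case: omnibus ANOVA significant, yet no
      # Bonferroni-corrected pairwise t-test is. Low power on each pair makes
      # this reasonably common; we just scan seeds until one appears.
      import numpy as np
      from scipy import stats

      for seed in range(1000):
          rng = np.random.default_rng(seed)
          groups = [rng.normal(loc=mu, size=20) for mu in (0.0, 0.3, 0.6)]
          p_anova = stats.f_oneway(*groups).pvalue
          p_pairs = [min(stats.ttest_ind(groups[i], groups[j]).pvalue * 3, 1.0)
                     for i in range(3) for j in range(i + 1, 3)]  # Bonferroni, 3 pairs
          if p_anova < 0.05 and min(p_pairs) > 0.05:
              print(f"seed {seed}: ANOVA p = {p_anova:.3f}, "
                    f"smallest corrected pairwise p = {min(p_pairs):.3f}")
              break
      ```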

  • @haydenjohnson5294 • 2 years ago

    Would this not lead to logical inconsistencies? For example, if a high-fat diet causes higher hepatic triglycerides compared to a low-fat diet, why should anything change just because we also measured body weight and blood glucose? Surely measuring additional variables did not change the biological relevance of hepatic triglyceride levels? I am having trouble seeing the justification for using multiple testing correction.

    • @SprightlyPedagogue • 2 years ago

      Great question! The data doesn't change, and the actual reality of the true relationships between the variables doesn't change. But we don't have an exact measurement of these variables. From classical test theory, each datum is made up of some proportion of 'true score' and some 'error'. If you look at enough error, you'll eventually see patterns in it. The more you torture your data, the more likely you are to accidentally see patterns in the error and misinterpret them as meaning something they don't.
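      A small sketch of that "patterns in error" point, using pure noise (the sizes and seed are arbitrary): scan enough meaningless correlations and some will clear the significance bar.

      ```python
      # 20 columns of pure noise give 190 pairwise correlations. At alpha =
      # 0.05, roughly ten of them will look "significant" (varying by seed),
      # even though there is nothing but error to find.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(42)
      noise = rng.normal(size=(100, 20))  # 100 observations, 20 meaningless variables

      n_sig = sum(stats.pearsonr(noise[:, i], noise[:, j])[1] < 0.05
                  for i in range(20) for j in range(i + 1, 20))
      print(f"'significant' correlations found in pure noise: {n_sig} of 190")
      ```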

    • @SprightlyPedagogue • 2 years ago

      Recommend checking out the book The Signal and the Noise.

  • @khemlalnirmalkar4515 • 4 years ago

    When I did pairwise multiple comparisons (groups A, B, C; e.g., pairwise Wilcoxon tests) with BH correction, the adjusted p-value for A vs. B was different when I used only groups A & B than when I used all three of A, B & C. Why?

    • @SprightlyPedagogue • 4 years ago

      I'll admit to not being able to solve that one based on the info provided. It's been a while since I've dug into BH adjustments. Email if you're totally stuck and I can dig in with you, potentially.
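      One likely explanation, sketched with hypothetical p-values: BH adjusts each raw p-value by its rank within the whole family of tests, so adding (or removing) the C comparisons changes the arithmetic even though the raw A-vs-B p-value is unchanged.

      ```python
      # Same raw A-vs-B p-value, two different test families: the BH-adjusted
      # value depends on the family's size and composition, not just on the
      # one comparison. All p-values here are made up for illustration.
      from statsmodels.stats.multitest import multipletests

      p_ab = 0.010  # hypothetical raw p-value for the A-vs-B comparison

      for family in ([p_ab, 0.040], [p_ab, 0.040, 0.300]):
          adjusted = multipletests(family, method="fdr_bh")[1]
          print(f"family {family} -> adjusted A-vs-B p = {adjusted[0]:.3f}")
      # family of 2 -> 0.020; family of 3 -> 0.030
      ```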

  • @thomas52011 • 3 years ago

    Which statement is correct?
    • The problem with multiple testing only arises if at least 1 test is significant
    • The problem with multiple testing only arises if all tests are significant
    • The problem in multiple testing arises when at least two results are significant
    • The problem in multiple testing arises when at least three results are significant

    • @polvotierno • 2 years ago

      At least 2 tests are significant

  • @SeandBC • 5 years ago +1

    This is a terrific explanation. Thank you!

  • @nickcalo7445 • 4 years ago +1

    This was fantastic. Thank you!

  • @kunalkohli2257 • 7 years ago +1

    Thanks... this made multiple comparisons so easy to grasp.

  • @burrito_san • 4 years ago

    Great video, really helped me understand the problem better. Thank you!

  • @Pierreadam224 • 3 years ago +1

    For me the example with the dice is wrong. You don't reroll the same die every time when you make multiple comparisons. You roll several dice in parallel and see the results. So I don't understand the problem when you make multiple comparisons between different groups... If you throw 6 dice at the same time, your probability of getting a "1" is 1/6, no???

    • @Pierreadam224 • 3 years ago +1

      edit: your example sucks

    • @SprightlyPedagogue • 3 years ago

      Those dice rolls are independent of one another, so rolling six dice at once and rerolling one die six times are functionally the same statistical statement.

    • @Hoerlimax • 2 years ago

      Rerolling the same die six times and recording the results is no different from rolling six dice at once and recording the results. In both cases each roll has a 1/6 probability of coming up 1. The problem of multiple comparisons arises when (implicitly or explicitly) a conclusion about a more general hypothesis is drawn from a number of tests. A "more general hypothesis" could be something like "there are differences between groups somewhere" in a case with more than 2 groups.
      With every additional group, the number of possible group comparisons increases. Since each comparison has a chance of falsely rejecting its null hypothesis, multiple group comparisons increase the chance of falsely rejecting the null of that "more general hypothesis" (that there are differences somewhere).
      The Type I error rate of such a "more general hypothesis" is sometimes called the family-wise error rate. The more comparisons you look at, the higher the chance that you end up with at least one significant comparison just by chance (as a result of random sampling), even though there are no differences in the population.
      That said, I don't like the analogy of the kid rerolling the dice for this either. The kid rolls the die again if (and only if) they don't like the first result, and keeps doing so until they like the result, which they then use. An equivalent to this exists in data analysis: an analyst performs additional tests until they find one that is significant, and then reports that one (it is like making up new hypotheses on the go). This behavior is problematic, as it leads to spurious results that are essentially worthless. However, this does not describe the problem of multiple comparisons. This behavior is known in the literature as fishing (for effects).
      The analogy of the kid rerolling the dice suggests that adjusting for multiple comparisons solves the problem of fishing, which it does not. The analogy also suggests that the problem with multiple comparisons only exists when one makes up new hypotheses on the go. This is not the case either.
      For me, the analogy of the kid rerolling the dice is a great analogy for fishing. But aside from the fact that both cases involve multiple comparisons, the multiple comparisons problem that one tries to resolve via a p-value adjustment is independent of fishing.
      Do you agree?
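      A small simulation sketch of that distinction, using pure-noise t-tests so every "significant" result is a false alarm: a fixed, pre-declared family of five comparisons inflates the family-wise error rate to roughly 23%, while fishing (testing until something "works") succeeds every single time.

      ```python
      # (a) A FIXED family of m null comparisons, declared in advance: the
      #     family-wise error rate inflates toward 1 - 0.95**5 = 0.23, which
      #     is exactly what a p-value adjustment is designed to pull back down.
      # (b) FISHING: keep running new null tests until one is "significant".
      #     Success is guaranteed eventually; no after-the-fact adjustment
      #     repairs that.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(7)

      def null_p():
          # one t-test comparing two samples from the SAME population
          return stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue

      n_sims, m = 2000, 5
      inflated = sum(min(null_p() for _ in range(m)) < 0.05 for _ in range(n_sims))
      print(f"fixed family of {m}: P(>=1 false positive) ~ {inflated / n_sims:.2f}")

      tests_needed = [next(k for k in range(1, 10_000) if null_p() < 0.05)
                      for _ in range(200)]
      print(f"fishing: a 'significant' result every time; "
            f"median tests needed = {int(np.median(tests_needed))}")
      ```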

  • @aileenazari3979 • 7 years ago

    Excellent job! Thanks!