p-hacking: What it is and how to avoid it!

  • Published: 29 Jun 2024
  • p-hacking is the misuse and abuse of p-values, and it results in being fooled by false positives. Some forms of p-hacking are obvious, but others are much more subtle. In this video, we talk about two forms of p-hacking and how to avoid them.
    NOTE: This StatQuest assumes that you are already familiar with p-values, if not check out:
    p-values: What they are and how to interpret them: • p-values: What they ar...
    How to calculate p-values: • How to calculate p-values
    Also, if you'd like to learn more about FDR, check out the 'Quest: • False Discovery Rates,...
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    If you'd like to support StatQuest, please consider...
    Support StatQuest by buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    RUclips Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    0:00 Awesome song and introduction
    0:28 Simple example of p-hacking
    2:21 p-hacking defined
    2:38 Multiple testing problem explained
    7:29 Using FDR to compensate for multiple testing
    8:52 A subtle form of p-hacking
    11:26 Power analysis to prevent p-hacking
    12:12 Summary of concepts
    #statquest #pvalue
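
To make the multiple-testing form of p-hacking described above concrete, here is a minimal simulation sketch (not from the video; the normal distribution, group sizes, and seed are illustrative assumptions). Every "drug" below is drawn from the same distribution as its control group, so any p-value under 0.05 is a false positive, yet testing many drugs makes at least one likely:

```python
# Minimal sketch of the multiple-testing form of p-hacking (assumptions mine):
# all "drugs" are placebos, so the null hypothesis is true for every test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_drugs, n_per_group = 26, 20          # Drug A through Drug Z, 20 people per group

p_values = []
for _ in range(n_drugs):
    control = rng.normal(loc=10, scale=2, size=n_per_group)  # recovery times, no drug
    treated = rng.normal(loc=10, scale=2, size=n_per_group)  # same distribution!
    p_values.append(stats.ttest_ind(control, treated).pvalue)

print(f"smallest p-value: {min(p_values):.3f}")
print(f"'significant' drugs: {sum(p < 0.05 for p in p_values)} of {n_drugs}")
```

Re-running this with different seeds shows how often a "winning" drug appears even though none of the drugs do anything.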

Comments • 271

  • @statquest  4 years ago +34

    NOTE: This StatQuest was brought to you, in part, by a generous donation from TRIPLE BAM!!! members: M. Scola, N. Thomson, X. Liu, J. Lombana, A. Doss, A. Takeh, J. Butt. Thank you!!!!
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @lcoandrade  4 years ago

      Why is the standard 0.05? What can you say about changing the significance level to prove a point? Why 5%?

    • @user-xm4ci6bp4b  2 years ago

      @@lcoandrade He has no answer to your question because p values are really one big hack that don't prove anything at all.
      Search for: "Cohen - Things I have learned so far."

  • @VaneyRio  3 years ago +57

    I studied Industrial engineering; however, engineering tends to have a way too "practical" approach in Mexico, so my statistics teachers were more focused on "formula -> result -> verdict" rather than a true explanation of why we do things. This series has helped me a lot in understanding the whole movement behind some of the everyday uses I give to my statistics knowledge. That's truly worthy of a triple BAM! Thank you for sharing your knowledge.

    • @statquest  3 years ago +5

      Muchas gracias!!! :)

    • @evianpullman7647  2 years ago +3

      @@statquest Just found this site and its videos today; how do you say "BAM!" in Spanish? :)

    • @statquest  2 years ago +3

      @@evianpullman7647 BAM!, Doble BAM!!, y Triple BAM!!! :)

    • @evianpullman7647  2 years ago +2

      @@statquest :) you the da-man !! later i want to be a Triple Bam member, (when i get my Moneys$$$ stuff in order-repair).

    • @statquest  2 years ago +1

      @@evianpullman7647 Thank you!

  • @dantevale0  4 years ago +119

    Him: "Imagine there's a virus"
    Me: Yea I'm there

  • @VadikUglov  4 years ago +29

    That's the most enthusiasm I've seen anyone show when explaining hypothesis testing))) Thanks for making this thing clear!

  • @MadhushreeSinha  4 years ago +13

    Just can't leave your videos without giving a Like!!!! Thank You for making our life easy with your pedagogy!!

    • @statquest  4 years ago

      Thank you very much! :)

  • @alecvan7143  4 years ago +27

    "Instead of feeling great shame" made me laugh out loud hahah

  • @zipfslaw3771  2 years ago +1

    You are amazing. I have taught statistics, but learn something from your videos every time!

  • @anapeleteirovigil3199  1 month ago +1

    So tempted to send this video to my former PhD director... 🙄
    Thank you for your immensely valuable content!!! Learning more on your channel than in class 😅

  • @pkmath12345  4 years ago +3

    Good content indeed. Sometimes students find it easy to use the critical value method but have difficulty with the p-value. Your content explains it clearly!

    • @statquest  4 years ago

      Awesome! Thank you very much. :)

  • @danielbaena4691  4 years ago +2

    Waiting anxiously for the Power Analysis video!!! As always, Josh, thank you

    • @statquest  4 years ago

      ruclips.net/video/VX_M3tIyiYk/видео.html

  • @joaobacelo3154  7 months ago +3

    Congratulations and thank you for the videos! I appreciate the clarity, simplicity, and humor in your content. While you probably don't need more compliments, I wanted to express how much I enjoy it and how I find it to be a brilliant way to convey key concepts in statistics. BAM!

    • @statquest  7 months ago +1

      Thank you very much!

  • @Ennocb  3 years ago +3

    Really appreciate the videos. They help me a great deal in understanding what's behind the formulas and what their numbers mean.

  • @manishakadri5052  3 years ago +3

    Such a lucid explanation!!!.. You make stats so much more fun!!. Thank you

  •  4 years ago +32

    I am a great fan of your work. Here is an idea to enhance learning: it would be helpful if, after a video or a set of videos, you gave us problems (or homework) so we can learn better. In later videos, you could give us the answers. Thank you

    • @HussainAlyousif  4 years ago +5

      I second that

    • @statquest  4 years ago +19

      That's a great idea, and thank you for supporting StatQuest!!! :)

  • @nakul___  4 years ago

    Awesome video as always! Would love to hear your thoughts on moving the standard alpha to 0.005 or some alternatives to p-value reporting (surprisals/s-values) in a future video too!

    • @statquest  4 years ago +1

      When we talk about Bayesian stuff, we'll talk about alternatives to the p-value. As for changing the "standard" threshold for significance: that's always been a cost/benefit/risk balance and ends up being field specific. For example, in a lot of medical science, the threshold can be as high as 0.1 because it makes sense in terms of cost/benefit/risk.

  • @statisticaldemystic6817  4 years ago +3

    This is a great non-technical explanation.

  • @LiquidBrain  4 years ago +10

    7:09 for "Ohhh Nooo"

  • @zaharsadatnajafi8169  2 years ago +1

    Thanks for your clear explanation !

  • @elvislee7808  1 year ago +1

    Thank you. You are an awesome teacher!!

  • @amiralikhatib3650  3 years ago +2

    Thanks a lot for your insightful tutorial
    That was really useful :)

    • @statquest  3 years ago

      Glad it was helpful!

  • @Thyagohills  4 years ago +2

    The p-hacking culture is so endemic that sometimes I get so demotivated that I seriously consider leaving the field entirely. It's depressing. Thanks, Josh. Sorry about the vent.

  • @remia5  4 years ago +1

    Great work Josh! Could you also do a video on how to compute the effect size? Would effect size be a better replacement for p-value?

    • @statquest  4 years ago

      Here's a video that shows one way to compute effect size: ruclips.net/video/VX_M3tIyiYk/видео.html

  • @ethanmendelson6978  4 years ago +1

    This series is fantastic. For the rest of the pandemic, stats instructors should just kick their feet up and redirect their e-campus courses to this channel.

  • @malishakapugamage7052  3 years ago +1

    Quality content. You earned a subscriber.

  • @chunyuji6162  4 years ago +6

    Thank you for the video! I have a question though. Since we are testing different drugs, why do we need to consider the p-values of other drugs for False Discovery Rate? I thought these were independent events.

    • @Thyagohills  4 years ago +2

      I believe this is an intrinsic problem with the way these experiments work. Even if the drugs are independent, multiple testing inflates the Type I error rate. The same would occur when measuring 100 uncorrelated variables across two samples from a homogeneous (null) population: you'd expect false discoveries even if the variables are themselves independent and equal between the groups. Also, there are FDR procedures for both independent and dependent scenarios, e.g., the BH and BY methods.
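
For readers who want to try the BH correction mentioned in this reply, here is a hedged sketch using statsmodels (the raw p-values are made up for illustration; `method="fdr_by"` would give the Benjamini-Yekutieli variant for dependent tests):

```python
# Sketch: adjusting a batch of p-values for multiple testing with the
# Benjamini-Hochberg FDR procedure. The raw p-values are invented.
from statsmodels.stats.multitest import multipletests

raw_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.510, 0.900]
reject, adjusted_p, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_bh")

for p, q, r in zip(raw_p, adjusted_p, reject):
    print(f"raw p = {p:.3f} -> FDR-adjusted p = {q:.3f}{'  *significant*' if r else ''}")
```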

  • @unicornsandrainbowsandchic2336  3 years ago +2

    I feel like this video needs way more views...

  • @yihuang4875  4 years ago +6

    LOVE YOUR VIDEOS!!! After discovering one of your videos, I started to review my statistics from the very basics. Here is a little question: I can't figure out how to calculate the p-value for a statistical test that compares the means of two sets of samples, based on what I learned from 'How to Calculate p-values'. In that video, I understood that calculating the p-value of one event can show whether the occurrence of that event is that special (

    • @statquest  4 years ago

      It sounds like you are asking how to calculate a p-value for a t-test. There are two ways to do it. The "normal way" (and I don't have a video for that), and the "more flexible linear model" way. I have a video for that. Here's the playlist: ruclips.net/p/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU

    • @yihuang4875  4 years ago +1

      @@statquest Wow thank you for the reply!

  • @joanahernandez9084  3 years ago +1

    Hi Statquest great video! I watched this video and your power analysis video and I have one quick question. If you already collected preliminary data, can you perform a power analysis on that data or would that be considered p-hacking as well? Thanks

    • @statquest  3 years ago +1

      You should use your preliminary data for doing a power analysis.

  • @bijayapoudel3014  3 years ago +1

    Great explanation!!:)

  • @kenny102000  1 year ago

    Thank you so much for explaining probability concepts with such good pedagogy (and positive energy). These lessons are usually soporific.
    Question: why do we need to re-test new samples of recovery WITHOUT any drug every time? Can't we just compare the results of each drug to the results of "no drug" ?
    As in, in practice, we would only have 1 placebo group against which we compare the results of a given drug, wouldn't we?

    • @statquest  1 year ago

      Sure, if you did all the tests at the same time, you could use the same placebo group for all the tests, however, that doesn't change the possibility of false positives.

  • @alsonyang230  1 year ago

    Big fan of StatQuest, really appreciate the work and humor you put into this.
    Just a question on the approach: why do we need a different control group for each drug? Could we have one control group that doesn't take any drugs and compare it with all the treatment groups that take the different drugs? Are there pros and cons of doing this compared to what's done in the video?

    • @statquest  1 year ago

      You could do it that way, but you'll still run into the same problem.

    • @alsonyang230  1 year ago

      @@statquest Thanks for the prompt response. Yeah, I agree that's by no means a solution to the p-hacking problem, or even an attempt to mitigate it.
      I was wondering what the pros and cons are of having a different control group for each drug vs having one control group for all. Under what scenario should I pick one approach over the other?

    • @statquest  1 year ago

      @@alsonyang230 It all just depends. You want to control for all things other than the drug or "treatment" or whatever you are testing. So, if you can do everything all at the same time, you could just collect one control sample. But if there are changes (like time of year, or location), then you need to get extra controls.

    • @alsonyang230  1 year ago +1

      @@statquest Ah yeah, that makes a lot of sense. Thanks for the explanation!

  • @viduradias4646  2 years ago +1

    Thank you!

  • @Y--H  4 years ago +1

    I almost understand the first example about drugs A to Z, except for one difficulty. If we assume that the distributions for each drug are independent, then from the perspective of drug Z, there's only one test. So wouldn't this avoid the multiple testing problem?

    • @statquest  4 years ago +5

      If something has a 5% chance of happening, then 5% of the time we do it, we'll get a false positive. So now imagine we did 100 tests for all 27 drugs. That means 5 of the tests for drug A are false positives, 5 of the tests for drug B are false positives, etc. So we have 100 rows of results, with mixtures of true negatives and false positives. When we test each drug 1 time, we get one of those rows, and it's very likely that it will include at least one false positive.
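
A quick back-of-the-envelope version of this reply (a sketch, assuming the tests are independent):

```python
# If each of k independent null tests has a 5% false-positive chance, the
# chance that a "row" of k tests contains at least one false positive is
# 1 - 0.95**k.
for k in (1, 5, 27, 100):
    print(f"{k:>3} tests -> P(at least one false positive) = {1 - 0.95**k:.2f}")
# With 27 drugs this is roughly 0.75, which is why a single run across all
# the drugs will very likely include at least one false positive.
```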

  • @afanasnosdaglaz  4 years ago

    Thank you A LOT for this video!!! Please, answer my question: which correction method for p-values should I use, if I make several (say, 30) comparisons, but each 2 groups of observations come from different distributions? To be specific, I compare two variants of 5 different enhancer sequences, each in 6 cell lines, using luciferase reporter technique.

    • @statquest  4 years ago +2

      I should have clarified that it doesn't matter if you mix and match distributions. When we do multiple tests, regardless of the distribution, we run the risk of getting false positives. So I would recommend adjusting your p-values with FDR (false discovery rate).

  • @deepakmehta1813  3 years ago

    Amazing video, thanks Josh. One question: how did you get the p-value for the two means? Is there any video for that?

    • @statquest  3 years ago +1

      You can do it with something called a 't-test'. There are two ways to do t-tests, the traditional way that is very limited, or you can use linear regression and it opens up all kinds of cool possibilities. If you want to learn how to do it with linear regression, see: ruclips.net/p/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU

    • @deepakmehta1813  3 years ago +1

      @@statquest Thank you so much.

  • @luqingren4997  3 years ago +2

    Congratulating myself on becoming a member :)

    • @statquest  3 years ago

      BAM! Thank you very much for your support! :)

  • @ricardoafonso7563  2 years ago +1

    A nice lecture.

  • @LittleLightCZ  2 years ago +1

    "I don't do p-value hacking, I raise my alpha level. I'm José Mourinho."

  • @Vextrove  3 years ago +1

    intro is a banger

  • @epsilonzeromusic  3 years ago

    Can you please explain why increasing the sample size after measuring the p-value would increase the likelihood of a false positive? I would think it would be the opposite. If you're adding 2 observations to each set, it's more likely that each of these observations is closer to the distribution mean than far away from it. This would imply that the sample means of both sets are more likely to come closer (the p-value would increase) than to move apart (the p-value would decrease).

    • @statquest  3 years ago +1

      Unfortunately that is not correct. Even if the new points are closer to the mean, there is a 25% chance that the new point for the observations with higher values will be on the high side of the true mean AND the new point for the observations with the lower values will be on the low side of the true mean.
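
The 25% figure in this reply comes from simple independence arithmetic (a sketch, assuming the two new points are independent):

```python
# Each new observation has a 1/2 chance of landing on the "helpful" side of
# its group's true mean; the chance that BOTH new points push the two sample
# means further apart (keeping the spurious difference alive) is therefore:
print(0.5 * 0.5)  # 0.25
```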

  • @jiaxuanchen8652  2 years ago +2

    You are my god! I love you

  • @dakeni8256  4 years ago

    Amazing! I have a question about the Benjamini-Hochberg method. Is this method only applicable to parametric tests such as the t-test and the chi-square test, or is it also applicable to non-parametric tests such as the Wilcoxon signed-rank test? Thanks a lot.

    • @statquest  4 years ago

      It works with all tests.

    • @dakeni8256  4 years ago +1

      @@statquest Thanks a lot

  • @konstantinlevin8651  10 months ago

    Hey Josh, thanks for the videos. I follow the 66daysofdata playlist and am curious if we'll learn the statistical tests you mentioned in the video

    • @statquest  10 months ago +1

      What time point, minutes and seconds, are you asking about?

    • @konstantinlevin8651  10 months ago

      @@statquest such as 05:02

    • @statquest  10 months ago +1

      @@konstantinlevin8651 Yes, I teach how to compare means in this playlist: ruclips.net/p/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU

  • @Exhora  4 years ago

    Let's say I am developing a new method of analysis and it gives me a p-value. Is it ok to keep changing my method to get a better p-value? Would this be p-hacking as well? Thank you for your videos! It helps me a lot!

    • @statquest  4 years ago +1

      I don't think so. For example, if you ran the wrong test on your data, then realized your mistake and ran the correct one, I don't think you should be penalized for that. However, in your example, you have to make sure that when you modify your test, you're not modifying it just to get a small p-value. Instead you are modifying it in ways that are statistically justifiable in a broad sense.

  • @michaeljbuckley  2 years ago +1

    Great video

    • @statquest  2 years ago

      Thanks!

    • @michaeljbuckley  2 years ago

      @@statquest Just discovered your channel. Do you cover different distributions?
      What I'm finding is explanations of them in isolation, but nothing comparing them: when they're used, how to test for them.

    • @statquest  2 years ago +1

      @@michaeljbuckley Unfortunately I don't have those videos either.

    • @michaeljbuckley  2 years ago +1

      @@statquest ah well. Still looking forward to go through your channel more.

  • @crywulf44  4 years ago +1

    If each drug trial was a separate study though, wouldn't the p-value false positive still stand, because the authors are acting independently? So you get different results depending on whether one person trialled all the drugs or a different person trialled each drug.

    • @statquest  4 years ago

      The way the p-value threshold is specified to be 0.05, we expect 5% false positives - and, overall, this is a good trade off of cost/benefit/risk/reward. So, if different people are doing tests, sure, they will get false positives from time to time. But the goal is to limit them within our own study, so we adjust the p-values with FDR.

  • @Dupamine  4 years ago

    What if I get a p-value of 0.06 using a sample of 29, do a power analysis AFTER that, and the power analysis says that I need a sample size of 30, so I add one more observation to my data. Would this be okay to do?

    • @statquest  4 years ago +1

      You should start from scratch.

  • @shashankgpt94  3 years ago +1

    "Bam?" Bam with a question mark has a separate fan base

    • @statquest  3 years ago

      Ha! You made me laugh. :)

    • @shashankgpt94  3 years ago +1

      @@statquest And you make me learn statistics in a fun way 🥺

  • @chendong2197  4 years ago +1

    Could you please explain how you calculated the p-value in these comparison examples?

    • @statquest  4 years ago +1

      Because these were just examples, I think I just made up values that seemed reasonable.

    • @chendong2197  4 years ago

      @@statquest I am not trying to justify the correctness of the number. I'm just wondering how to calculate the p-value in cases like the example. I don't recall you mentioning it in previous videos. Thanks.

    • @statquest  4 years ago

      @@chendong2197 With this type of data we would use a t-test. I explain these in my playlist on linear models (which sound much scarier than they really are): ruclips.net/p/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU

  • @yijingwang7308  1 year ago

    Great video! I have a question about drug testing. Is it better to perform an experiment that tests all of the drugs at once instead of testing each candidate one by one? Besides, if I want to test 6 candidates versus a control, each of them has three technical replicates, and I performed the same experiment three times. Should I use the mean of the three technical replicates from the three experiments to calculate the p-value? Or are there better solutions for the experimental design? Looking forward to your reply!

    • @statquest  1 year ago +1

      If you do all the tests at the same time, that's called an ANOVA, and you can do that - it will tell you if something is different or not. However, it won't tell you which test was different. In order to determine which test was different, you'll still have to do all the pairwise tests, so it might not be better. So it depends on what you want to do.
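
A minimal sketch of that workflow on invented data (scipy and statsmodels are my choices here, not necessarily what the video uses): the ANOVA flags that something differs, then corrected pairwise t-tests hunt down which group it is.

```python
# Sketch: one-way ANOVA says *whether* any group differs; FDR-corrected
# pairwise t-tests are still needed to say *which* one. Data are invented.
from itertools import combinations
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
groups = {name: rng.normal(10, 2, 12) for name in ("control", "A", "B")}
groups["C"] = rng.normal(12, 2, 12)  # one genuinely different group

print("ANOVA p-value:", stats.f_oneway(*groups.values()).pvalue)

pairs = list(combinations(groups, 2))
raw_p = [stats.ttest_ind(groups[a], groups[b]).pvalue for a, b in pairs]
reject, adj_p, _, _ = multipletests(raw_p, method="fdr_bh")
for (a, b), p, r in zip(pairs, adj_p, reject):
    print(f"{a} vs {b}: adjusted p = {p:.3f}{' *' if r else ''}")
```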

    • @yijingwang7308  1 year ago

      @@statquest Many thanks for your reply! What about the technical replicates? If I have 6 technical replicates for each biological replicate, should I use the mean of the technical replicates to represent each biological replicate?

    • @statquest  1 year ago +1

      @@yijingwang7308 If you have technical replicates, then you probably want to use a "general linear model." Don't worry, I've got you covered here: ruclips.net/p/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU

  • @mohammadhassanjafari481  2 years ago +1

    It was great
    🏅🏅

  • @amalantony8934  2 years ago

    In the experiment at 0:29 - 2:18, what if drug Z was actually a better drug? p-hacking would happen if we tested the results of drug Z itself multiple times until we got a false positive. In the example mentioned above we just tested one drug relative to the other drugs, which gave a p-value less than 0.05, so why would that be a false positive?

    • @statquest  2 years ago

      If Drug Z was actually a better drug, then the times when we failed to detect a difference would be false negatives and the time when we detected a difference would have been a true positive. So that means it is possible to make both types of mistakes - if Drug Z is not different, then we can have false positives and if Drug Z is different, then we can have false negatives. The good news, is that we can prevent false negatives by doing a power analysis. For details, see: ruclips.net/video/Rsc5znwR5FA/видео.html and ruclips.net/video/VX_M3tIyiYk/видео.html

  • @camillep3925  1 year ago

    Hi, thank you for your great work!
    However, there is something that I have a hard time processing. I understand the last example of p-hacking, but we are expected to do the same experiment 3 independent times. How do we analyze and present these results without p-hacking?

    • @statquest  1 year ago +1

      Is it really true that you are expected to do the same experiment 3 times? Or just that you are supposed to have at least 3 biological replicates within a single experiment? If you really are doing the exact same experiment 3 separate times, then you could avoid p-hacking by just requiring all 3 to result in p-values < whatever threshold for significance you are using.
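
Quick arithmetic behind that rule (a sketch, assuming the three repeats are independent):

```python
# If each independent repeat has a 5% false-positive chance under the null,
# requiring all three to come out significant is far stricter than any one test:
alpha = 0.05
print(alpha ** 3)  # 0.000125 -> chance that all 3 null experiments hit p < 0.05
```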

    • @camillep3925  1 year ago +1

      @@statquest Thank you !

  • @opencardyugioh  11 months ago

    Can you clarify what is meant by "one more measurement" versus "adding more data"? I assume they are the same in the context of the "a subtle form of p-hacking" segment.

    • @statquest  11 months ago

      What time point, minutes and seconds, are you asking about?

  • @jiaming5269  4 years ago +1

    Thank you, very clear.

  • @SebastianHay  2 years ago

    If you get a p-value of less than 0.05 and you set the sample size beforehand, are there ways of further interrogating the data to ensure you haven't just got that unlikely-but-possible '1 in 20' result, without accidentally reverse p-hacking (not sure of the right term, but taking more samples, or sets of samples, so that the p-value is now greater than 0.05)?

    • @statquest  2 years ago +1

      You can use a lower p-value threshold and you can adjust for multiple testing.

  • @tricky778  4 years ago

    If I'm doing an experiment, then a lot of other people must have already done experiments and didn't get a useful result - so eventually the spare budget falls to me, making my test dependent, just the same as if I'd done all of them. Does that mean we must increase our sample size based on the area under the curve of the economic capacity for investigating the problem?

    • @statquest  4 years ago

      No. It is inevitable that some of our results will be false positives. So we just need to focus on our own experiments and do what we can. That said, we can reduce the number of false positives further by using complementary experiments that essentially show the same thing. Like a criminal trial that uses lots of pieces of evidence to convince us that someone is guilty or innocent, our final conclusions should be based on multiple pieces of evidence.

  • @pakersmuch3705  3 years ago +1

    love it

  • @pesco7790  3 years ago

    Hey Josh, but what if my real difference is smaller than what I assumed when calculating the experiment's power? Can't I just keep the experiment running to reach a higher sample size, achieve good power for the smaller difference, and then test it? Would that be p-hacking?

    • @statquest  3 years ago

      That would be p-hacking. Instead, use your data as "preliminary data" to estimate the population parameters and use that for a power analysis: ruclips.net/video/VX_M3tIyiYk/видео.html
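
A sketch of what "use your data as preliminary data for a power analysis" can look like in practice (the pilot numbers are invented; statsmodels' TTestIndPower is one common tool for this, not necessarily the one used in the video):

```python
# Sketch: estimate an effect size from pilot data, then ask a power analysis
# how many samples a fresh, properly sized experiment would need.
import numpy as np
from statsmodels.stats.power import TTestIndPower

pilot_a = np.array([10.1, 11.3, 9.8, 10.9, 10.4])  # invented pilot measurements
pilot_b = np.array([11.0, 11.8, 10.9, 12.1, 11.5])

pooled_sd = np.sqrt((pilot_a.var(ddof=1) + pilot_b.var(ddof=1)) / 2)
effect_size = (pilot_b.mean() - pilot_a.mean()) / pooled_sd  # Cohen's d

n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=0.05, power=0.8)
print(f"Cohen's d = {effect_size:.2f}; need ~{int(np.ceil(n_per_group))} per group")
```

The key point is that the pilot data only informs the design; the fresh experiment then starts from scratch with its pre-committed sample size.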

  • @bzaruk  2 years ago

    How do you calculate a p-value based on two means (from two samples) - isn't a p-value calculated between a number and a distribution? what is the exact process of using the two means parameters to calculate a p-value on a distribution?

    • @statquest  2 years ago

      We can calculate p-values for two means using something called a t-test. For details, see: ruclips.net/video/nk2CQITm_eo/видео.html and then ruclips.net/video/NF5_btOaCig/видео.html
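
For the curious, here is roughly what a two-sample t-test does under the hood (a sketch with made-up recovery times; the equal-variance pooled formula shown is one of several variants):

```python
# Sketch: computing a two-sided p-value for the difference of two means by
# hand (pooled-variance t-test), then checking against scipy. Data invented.
import numpy as np
from scipy import stats

a = np.array([12.0, 14.1, 13.3, 15.2, 12.8])
b = np.array([10.2, 11.5, 12.9, 10.8, 11.1])

n1, n2 = len(a), len(b)
pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
t = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))
p = 2 * stats.t.sf(abs(t), df=n1 + n2 - 2)  # two-sided p-value

print(f"by hand: t = {t:.3f}, p = {p:.4f}")
print("scipy:  ", stats.ttest_ind(a, b))   # should match
```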

  • @alexlee3511  1 year ago

    I am a little bit confused that, as you mentioned previously, p

    • @statquest  1 year ago +1

      When we compute a p-value, we select a "null distribution", which is usually "no difference". If the null hypothesis is correct, and there is no difference, then a p-value threshold of significance of 0.05 means that there is a 5% chance we'll get a false positive.

    • @alexlee3511  1 year ago

      @@statquest So can I understand it this way: say the p-value is 0.05; when our observed data lies in the 5% tail of the null distribution, there is also a 5% chance we incorrectly reject the null hypothesis if we decide to reject it, but we are still 95% confident in rejecting the null hypothesis.

    • @statquest  1 year ago +1

      @@alexlee3511 The p-value = 0.05 means that there's a 5% chance that the null distribution (no difference) can generate the observed data or something more extreme. To be honest, I'm not super comfortable interpreting 1 - 0.05 (alpha) outside of the context of a confidence interval.

  • @scottwais1288  2 years ago

    If example 1 is p-hacking because the drug z result relies on only one test, how do you view all the social experiments that rely on one test (because it would be too costly to reproduce them)?

    • @statquest  2 years ago

      The drug z result relied on us testing every single drug - repeating the process until we got a significant result.

  • @woodrowhowe5536  3 years ago

    Could you link the video on the false discovery rate in the video index?

    • @statquest  3 years ago

      Thanks for the suggestion. I've added it.

  • @katielui131  3 months ago

    Hi, I still don't understand the section at 1:47. Why is it p-hacking to reject the null hypothesis for drug Z? Isn't this unlike the later example, where the samples are taken from the same population? I presumed we'd call it a different population for each drug the samples are drawn from.

    • @statquest  3 months ago

      Say we have 100 drugs and none of them are effective - they are all variations on a sugar pill. Then we test all of them like in this example. Well, due to random sampling of people taking the pill, there's a good chance that in one of those tests, a bunch of healthy people take one pill and a bunch of sick people take the other. This will make it look like there is a difference between the two pills even though it's not the pill, it's just the people that take the pill. Thus, this is p-hacking because it looks like the pills are different.

  • @AlexanderYap  4 years ago

    I had expected that adding more data would have made it less likely to get such false positives. Why does the p value decrease as we add more data in the 2nd example?

    • @statquest  4 years ago

      I did some simulations with a normal distribution and when the p-value was between 0.05 and 0.1, adding more observations resulted in a 30% probability of a false positive.
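
Here is a sketch of the kind of simulation described in that reply (assumptions mine: n = 5 per group, standard normal data; the exact rate depends on these choices but should land in the same ballpark as the figure quoted above):

```python
# Sketch: whenever a null test lands "close" (0.05 <= p < 0.1), p-hack by
# adding one observation per group and re-testing; then measure how often
# the re-test dips below 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
peeked = converted = 0

for _ in range(20000):
    a, b = rng.normal(0, 1, 5), rng.normal(0, 1, 5)
    p = stats.ttest_ind(a, b).pvalue
    if 0.05 <= p < 0.1:                       # tempting, "almost significant"
        peeked += 1
        a = np.append(a, rng.normal(0, 1))    # one more measurement per group
        b = np.append(b, rng.normal(0, 1))
        if stats.ttest_ind(a, b).pvalue < 0.05:
            converted += 1

print(f"P(p < 0.05 after adding data | 0.05 <= p < 0.1) = {converted / peeked:.2f}")
```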

    • @MrReadale  4 years ago

      @@statquest Excellent video. Thank you! So basically you are saying that observations used in the power analysis cannot be included in the final analysis? I have just worked through the book "Medical Statistics at a Glance". It says that it is OK, and calls it an "internal pilot study"?

  • @nicolasstegmann5855  24 days ago

    I did not get one part:
    I understand why I would be p-hacking if I added 1 more measurement to each group.
    But does that mean that if I get more measurements than the sample size previously stipulated by the power analysis, I would be p-hacking? So if my power analysis says I need 100 samples, but by the time I did the test I already had 500, what then?
    What happens if instead of adding 1 more measurement to each group, I'm adding 100 more? Is this p-hacking as well?

    • @statquest  23 days ago

      It's better to use the first set of data for a power analysis and start over.

  • @ilikegeorgiabutiveonlybeen6705  3 months ago

    What if I repeat all of the experiments with other samples a number of times, e.g., I do 20 experiments for each drug total, and then I look to see if anything changed? Is that really bad (since it isn't guaranteed that I don't just get 20 false positives out of nowhere), or might that be useful in some obscure scenario?

    • @statquest  3 months ago

      You'll get more power if you combine all of the data together into a single experiment, so I'm not sure why you would do it another way.

    • @ilikegeorgiabutiveonlybeen6705  3 months ago +1

      @@statquest Oh okay, thanks, I didn't think of that.

  • @rajatchopra1411  1 year ago

    8:45 "Don't cherry pick your data and only do tests that look good"
    politicians: I'm gonna ignore what you just said!

  • @taotaotan5671  4 years ago +3

    I just finished my RNA-seq homework...

  • @DhruvSharmal0W  3 years ago

    This should be the stats Bible for Gen Z.

  • @revolutionarydefeatism  4 years ago

    We are testing some drugs to see if they change the recovery period distribution, right? Why do you repeat that they are from the same distribution? How can we be sure that the drug didn't change the recovery period beforehand?

    • @revolutionarydefeatism  4 years ago

      StatQuest with Josh Starmer, thank you for your reply. So if I'm not wrong, we generated those stochastic numbers with the same distribution as a tool to teach this stuff. But in reality we can't be sure if they are from the same distribution or not. So we will test them, and a power analysis will help us to find the right amount of data we need to have a reliable statistical result. Little bam, or not a bam at all? :-D

    • @statquest  4 years ago +1

      I'm sorry - my previous response was intended for someone else - I have no idea how that got mixed up. Anyway, here's the deal... The drugs can be from any distribution. However, the worst case scenario is when they both come from the same distribution. This means that any small p-value is a false positive (whereas, if they come from other distributions, then any small p-value is a true positive). So we assume the worst to determine how bad things can really get. Does that make sense?

    • @revolutionarydefeatism  4 years ago +1

      @@statquest it is crystal clear now. Thank you very much. And I also started to listen to your music at the beginning of the videos. :-D

  • @IstEliCool  3 years ago +1

    Oh how I love you

  • @MrSreior  4 months ago +1

    I love you

  • @hafidhrendyanto2690  2 years ago

    How do you get the p-value for two different samples?

    • @statquest  2 years ago

      In this video I used t-tests.

  • @alexandersmith6140  2 years ago

    "...and compare these two means and get a p-value = 0.86."
    Wait, how do you compare two means to get a p-value? On the Statistics Fundamentals playlist I'm working through, it's only been explained so far how to determine the p-value of a certain event happening (like a Brazilian woman being a certain height). It hasn't yet explained how I can understand the statistical significance of two sample means being a given distance apart from each other if they're hypothesized to belong to the same overall population.

    • @statquest  2 years ago +1

      Sorry about that. In this case, I was just assuming that knowing the concept of a p-value would be enough to understand what was going on. However, if you'd like to know more about how to compare means, see: ruclips.net/video/nk2CQITm_eo/видео.html and then ruclips.net/video/NF5_btOaCig/видео.html

    • @alexandersmith6140  2 years ago +1

      @@statquest Thanks Josh! Wow, what a response time.

  • @chrisvaccaro229  4 years ago +4

    In the hood they say 'snitches get stitches'
    In the lab we say 'p-hackers are backward'

  • @CarlosFerreira-zg1rp  4 years ago +4

    imagine there was a virus... well, I guess that it is quite easy to imagine that ...

  • @samiulsaeef2076  3 years ago

    One thing that confuses me: if the threshold 0.05 means there will be 5% false positives (bogus tests in this example), then how do we link a p-value (say we got 0.02) to false positives? Is there something like a 2% false-positive rate involved with that p-value? I think that's not the case.
    I watched all of your p-value videos and they made things clear. But the definition of "threshold" confuses me. I hope the p-value has nothing to do with false positives. Correct me if I am wrong.

    • @statquest  3 years ago

      The p-value tells us the probability that the null hypothesis (i.e., that there is no difference in the drugs ruclips.net/video/0oc49DyA3hU/видео.html ) will give us a difference as extreme or more extreme than what we observed. When we choose a p-value threshold to make a decision - for example, all p-values < 0.05 will cause us to reject the null hypothesis that there is no difference in the drugs - then there is a 5% chance that the null hypothesis could generate a result as extreme or more extreme than what we observed, and thus a 5% chance that we will incorrectly reject the null hypothesis and conclude that there is a difference between the two drugs when there is none.

  • @jolojololo3221  1 year ago

    Hi, I wonder: if I remove outliers several times, is that p-hacking too?

    • @statquest  1 year ago

      It depends. It could be. It could also be just removing junk data.

    • @jolojololo3221  1 year ago

      @@statquest Do you have any source to help me make sure I'm not being a p-hacker with outliers? Please

    • @statquest  1 year ago +1

      @@jolojololo3221 Here's something that might help: www.reddit.com/r/AskAcademia/comments/bcop6p/removing_outliers_phacking_or_legitimate_practice/

  • @poiuytrew09876  4 years ago +13

    I think I'm in love with you

  • @phyzix_phyzix  1 year ago

    P-hacking is how you get a high volume of papers published and cited. It's incentivized in all academic fields. Just try asking a researcher if you could take a look at their data and they will ghost you.

  • @SchergeSatans  3 months ago

    I don't see how the first example with the different drugs is problematic, given that the drugs actually are different and that we draw a new sample every time.

    • @statquest  3 months ago

      What if the only difference is the size, and they are all sugar pills?

  • @chrisvaccaro229  4 years ago +2

    Sweet! I was waiting for someone to post a good p-hacking tutorial! Now all my findings will be statistically significant!
    oh, I'm just kidding!

  • @shivverma1459  2 years ago

    I am really confused about what you mean by "exact same distribution". I mean, only the people we are testing are the same; the drugs are different. Therefore, if we get a different result, why do we assume it's a false positive?

    • @statquest  2 years ago

      What time point, minutes and seconds, are you asking about?

    • @shivverma1459  2 years ago

      @@statquest 6:22

    • @shivverma1459  2 years ago

      I am really confused about what "different" and "same" distributions mean here. Yes, the people we are testing on are the same, but the drug is different in each scenario, right? So it can have different effects.

    • @statquest  2 years ago

      ​@@shivverma1459 This example starts at 2:38, when I say that we are measuring how long it took people to recover and these people did not take any drugs. So there are no drugs to compare - we just collected two groups of 3 people each and compared their recovery times. When we see a significant difference, this is because of a false positive.

    • @shivverma1459  2 years ago +1

      @@statquest ohh now I get it btw love your videos ❤️ love from India.

  • @davidcmpeterson  2 years ago +1

    This video starts out tough: "imagine a virus"
    Damn, can't you come up with something less hard for me to imagine, such as Santa Claus visiting people's houses all on the same night?

  • @dmytro1667  4 years ago

    6:21 is it really a False Positive instead of a False Negative? The 5% alpha indicates the chance of a Type I error, meaning the chance of rejecting a true hypothesis, which is actually the probability of a False Negative.

    • @statquest  4 years ago +1

      Alpha is the probability that we will incorrectly reject the null hypothesis. Thus, alpha is the probability that we will get a false positive. Thus, at 6:21, we have a false positive.

    • @dmytro1667  4 years ago

      @@statquest But why would it be a false positive if the null hypothesis is actually true whereas the criterion says the opposite? It's going to be in the bottom-left cell of the confusion matrix, which is a false negative.

    • @statquest  4 years ago +2

      I think I understand your confusion. When we do hypothesis testing, the goal is to identify a statistically significant difference between two sets of data. This is considered a "positive" result. A failure to reject the null hypothesis is considered a "negative" result. So, when we get a "positive" result when it is not true, it is called a false positive.

    • @dmytro1667  4 years ago

      ​@@statquest I agree with you, however we get a negative result when it actually should be positive at 6:21

    • @statquest  4 years ago +2

      @@dmytro1667 When we reject the null hypothesis, we call that a "positive result", regardless of whether or not the null hypothesis is actually true. This is because when we do real tests, we don't actually know if it is true, so we simply define it that way. So, rejecting the null hypothesis, regardless of whether it is true or not, is defined as "positive". At 6:21 we reject the null hypothesis, so it is defined as a "positive result". Because, in this case, we should not reject the null hypothesis, this is a false positive.

  • @gaming_ayrus  3 years ago +1

    To the 3 guys who disliked.... no bam

  • @vikramreddy3699  4 years ago

    At 4:12, how is the p-value calculated?

    • @statquest  4 years ago

      I think I used a t-test.

    • @vikramreddy3699  4 years ago

      @@statquest Thank you for the response, Josh. Do you have a StatQuest explaining how to calculate that? Thank you

  • @user-jj3we9jv9i  7 months ago +1

    😇

  • @woowooNeedsFaith  1 year ago

    Question: Why are the drug-free experiments repeated? Why not reuse one set of drug-free results, or even better, aggregate all the drug-free results into one single set?

    • @statquest  1 year ago

      Even if you re-used the drug-free results, the results would be the same - there's a chance that, due to random noise, you'll get results that look very extreme.

  • @redcat7467  3 years ago

    Well, basically the first 8 seconds say it all; the rest is complimentary.

  • @edwardgrigoryan3982  1 year ago +1

    Bam? No. No bam.

  • @likelotusflowers  2 years ago

    Is it really slowed down 1.5 x?))

    • @statquest  2 years ago

      Some people like 2x. It really depends on how fluent you are in English.

  • @christianscodecorner3176  3 years ago +2

    *shameless self promotion* 😂 josh could purposely make a whole video on nonsense and I’d shamelessly go watch it

    • @statquest  3 years ago

      Bam! Thank you for your support! :)

  • @benjamin_markus  2 years ago

    I don't get how the first example with the drugs is p-hacking. After all you're testing different drugs, not doing the same test again.

    • @statquest  2 years ago

      If I took the same exact drug, gave it 27 labels, Drug A through Drug Z, and then tested each one, would it look like p-hacking?

    • @benjamin_markus  2 years ago

      ​@@statquest In that case of course it would, but I find applying this logic to the case of truly different drugs confusing as that is surely not p-hacking if the tested drugs are not the same.

  • @shuklasrajan  1 year ago +1

    BAM 😂

  • @poojakunte6865  3 years ago +1

    Do you take PhD students??

  • @sumitbagga2504  4 years ago +1

    Hahaha.. Never seen a teacher like you.. Hey, wait a minute...... I didn't see you.. Baam!!!!!

    • @statquest  4 years ago +2

      Nice one! :)

    • @sumitbagga2504  4 years ago +2

      @@statquest I am learning from your videos to become a data scientist :)

  • @JerryWho49  4 years ago +1

    Another great StatQuest, thanks. Here's an xkcd version of your drug testing: xkcd.com/882/

    • @statquest  4 years ago +1

      A classic xkcd! :)

  • @christianbolt5761  3 years ago

    Cherry picking your study is like cherry picking your data.