False Discovery Rates, FDR, clearly explained
- Published: Jan 9, 2017
- One of the best ways to prevent p-hacking is to adjust p-values for multiple testing. This StatQuest explains how the Benjamini-Hochberg method corrects for multiple testing by controlling the FDR.
For a complete index of all the StatQuest videos, check out:
statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - statquest.gumroad.com/l/wvtmc
Paperback - www.amazon.com/dp/B09ZCKR4H6
Kindle eBook - www.amazon.com/dp/B09ZG79HXC
Patreon: / statquest
...or...
RUclips Membership: / @statquest
...a cool StatQuest t-shirt or sweatshirt:
shop.spreadshirt.com/statques...
...buying one or two of my songs (or go large and get a whole album!)
joshuastarmer.bandcamp.com/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
#statistics #pvalue #fdr
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
My PhD dissertation relies heavily on bioinformatics and biostatistics, although my background is neuroscience. Naturally, I had a lot of learning to do, and your videos have helped me immensely. Every time I want to learn about a stats concept, I always type in my Google search, "[name of concept] statquest." Seriously, this is almost too good to be true, and I just wanted to thank you for providing this absolute gold mine.
Wow! Thank you very much and good luck with your dissertation.
You make without a doubt the best videos about statistics on RUclips: funny, clear, intuitive, visual. Thank you so much.
Thank you! :)
Totally second this...
God bless you, I made screenshots of this video to explain this concept to my lab. This isn't the first time you've helped me with RNA-seq procedures. I have bumbled through a differential expression analysis. Trying to understand the statistical methods and knowing which option amongst several is the most logical is a mental hurdle. I am the only student in my lab currently undertaking bioinformatics and I am essentially trying to teach myself. There is a huge vacuum of knowledge in this realm amongst biologists and it's daunting. We all can generate data until we're blue in the face, but it doesn't do anyone any good until someone knows how to analyze it.
Awesome! Good luck learning Bioinformatics.
BAM BAM BAM, thanks a lot man... Your 20 minutes most likely saved me hours of trying to understand this from Wikipedia...
Sweet!!! Glad I could help you out. :)
Can't thank you enough!! Your methods are truly amazing. Being able to deliver them to us so cleverly is a true indication of how much effort you must have put into understanding these concepts.
Wow, thank you!
Awesome explanation!! Thanks for taking the time to make these videos and also for answering viewers' questions so well. Going through them already answered some queries that I had :)
I'm from China and I watched your channel on Bilibili, but I couldn't get enough, so I followed you all the way and ended up here, a paradise of data science! Thank you Josh, wish you the best!
Wow, thank you!!!!
Wow wow wow how intuitive and visual. Can’t thank you enough for saving me from spending hours struggling to understand this concept🙏
You're very welcome!
Fantastic video, thank you for taking the time to put this together.
From the way my university teachers (didn't) explain to me Benjamini-Hochberg, and after watching this video, I can claim I now understand Benjamini-Hochberg better than them, at a 99.7% confidence level!
BAM! :)
Thanks for the awesome explanation! Really informative and easy to follow. And the DOUBLE BAM in the end actually made me laugh out loud :D
Awesome! :)
OMG!! This is the most beautiful explanation I've ever experienced...... Thank you so much professor.
Awesome!!! Thanks so much.
Simple, informative, and to the point. Absolutely perfect.
Glad you liked it!
I love you ❤️. I was so afraid of FDR adjustment because I thought the math behind was empirical and worked like magic but you made it surprisingly intuitive.
Thank you! :)
Thank you! This was really helpful and made me smile during my intense evening revision :)
Glad it helped!
I'm currently learning to do RNA-seq data analysis; these videos are extremely helpful.
First and foremost, I extend my heartfelt gratitude for providing such a series that elucidates concepts in an easily comprehensible manner. Bam !☺
Thank you!
This is amazing. Very well explained and easy to understand!
Glad it was helpful!
Cool, thanks for posting this, very intuitive! An equivalent method for eyeballing the # of true null hypotheses is to plot the ranked 1 - p-values on the x-axis and the hypothesis test rank on the y-axis, then fit a line to the scatter plot, starting at the origin. Where the line hits the y-axis is your estimate of the # of true null hypotheses. Would like to see an intuitive explanation for the Benjamini-Yekutieli procedure, used in studies where the tests are not completely independent!
I have always hated math and you just make it clear and interesting! Can't thank you enough
Hooray!!! I'm glad the video is helpful. :)
Great tutorial for FDR. The adjusted p-value is a p-value for the results that remain after cutting off the results you know are not significant just from the distribution. It would be better if you could say something about the q-value and how the q-value reflects the quality of an experiment.
This is simply great!!! Thanks for sharing Joshua.
This is the best video that explains FDR. Thank you,
I love you StatQuest. Thank you for never letting me down. You were always present to answer my deepest and most shameful doubts. You never abandoned me during the darkest hours of my PhD.
I'm so happy to hear my videos helped you. BAM! :)
Great video, your example was clear and very well illustrated.
The clearest explanation of BH correction so far. Quadruple BAM!
Wow, I was seriously struggling with my research since I don't know the first thing about statistics, and I love this so, so, so much. So instructional I had to like.
BAM! :)
Thank you very much indeed for the perfect explanations and examples of the FDR concept. I really got my answer.
Thanks!
As always, by far the best explanation on the web!
Thanks!
Great explanation, thanks! Clear, with an amazing balance between theory and examples.
Thank you!
Thank you so much for this great movie!! Great explanation.
As always, it is a great explanation. Thank you Josh 👏
Thank you!
I have to keep saying that I love this channel so much
Hooray!!! Thank you so much!!! :)
Thanks for your effort and simplified explanation!!! Life saver :))
Glad it helped!
1 thumb down is a case of FDR :)
So true! :)
Josh is a genius. Really appreciate your work statquest.
Thank you! :)
Nice video, simple and fast.
Thanks!
Just wow!! Thank you for this.
This was SUPER helpful, thank you!
Thank you! :)
Dude thanks so much, this video is AWESOME!!!
Hey Josh, love you videos on stats, specifically centered around hypothesis testing. Can you do more videos on the different techniques of hypothesis testing, like (group) sequential testing and multi-armed bandit?
I'll keep that in mind.
Very nice explanation!
I just love your videos. Thank you so much!
Thank you! :)
This is my first time fully understanding FDR ...
bam!
BAM!!! Finally I understand it, after it confused me for half a year!!
BAM! :)
The second half is hard to understand, but I know I will come back later and watch it again, and again, and again until I finally understand it.
Let me know if you have any specific questions.
Thanks, that was precious (and spared me hours of frustration)
Thanks! :)
Thank you very much for the explanation, very, very clear!!
Thank you very much!
Great explanations!
Thanks!
Thanks, Josh. Well done. A short and useful video.
Thank you!
It's so good, I want to give it more than one thumb up!
Double BAM! :)
Thank you sir, was very useful 🙏
Glad it helped
Nicely explained.
Thank you!
Nice explanation!
Thanks!
Thanks a lot, Mr. Joshua
Hi Josh! Great stuff here. Could you please make a video on "Significance Analysis of Microarrays"? Mainly how it differs from t-stat/ANOVA. Really appreciate you for all the videos.
I'll keep it in mind, but I can't promise I'll get to it soon.
This is a great video. Could you help me understand how the intuitive understanding (the histograms of p-values coming from two distributions) connects to the mathematical steps of the B-H procedure? Thank you!
Good explanation
This video is so beautiful.. Thank you so much
I'm glad you like it!
Thank you, bro!
I love the explanation!
Thank you! :)
Thank you, nicely explained
You are welcome!
Great channel and fantastic content! I am wondering if you could make an episode about IDR, Irreproducible discovery rate. It is difficult to find a good explanation or usage guide on it.
I'll keep that in mind.
Very nice video, and I learned a lot from it. The only thing is, when you give examples, you tell us that when you calculate p-values 10,000 times, the distribution of p-values will look like this or like that. But I don't know whether that's true or not. So I'm wondering, can you explain a little bit more, or is there any further reading I can do about p-values and adjusted p-values?
This is amazing. thank youu.
Thank you! :)
Would love a video about the target decoy approach
OK. I've added it to the to-do list. :)
This is very great!!!
Thank you!
One part I don't quite understand is how the intuitive eyeball method translates into the B-H p-value adjustments you explain starting at ~15:00. To me, plotting a line along the H0 = True p-values sounds like you would be fitting a linear regression & identifying the outliers < .05.
I don't understand one thing. If samples are taken from the same population, the p-value bins would NOT be evenly distributed; rather, the distribution would be skewed toward p = 1, because the data are normally distributed and, most of the time, samples close to the average are the ones likely to be picked.
By definition, p-values are uniformly distributed. By definition, a p-value = 0.05 means that 5% of the random tests will give results equal to or more extreme, a p-value = 0.1 means 10%, etc.
Thanks a lot!
This is crystal clear about FDR and the BH method, much clearer than what my professor said.
Thank you Sir🌹
Thank you!
AWESOME! Thank you!
:)
I'd like to know why, when samples come from the same distribution, the p-values are uniformly distributed? Thank you!
This is awesome. Imma save it for later reference hah
thanks josh!
You are welcome!!! I'm glad you like the video! :)
Awesome, this may be too niche but could you do a video on local FDR please?
Thanks for these videos! They are great!!
Can you help me understand the intuition behind why the p-values are uniformly distributed in the samples from the same distribution?
Think about how p-values are defined. If there is no difference, the probability of getting a p-value between 0 and 0.05 is... 0.05. And the probability of getting a p-value between 0.05 and 0.1 is also 0.05, etc.
BAMMMM! Thank you!
Hooray! I'm glad you like the video. :)
Your explanations are very helpful. Can you please make a long video where you discuss all other approaches like SPLOSH, BUM, Pound and Cheng methods, also a comparative explanation between them? I'm eagerly waiting for it. Furthermore, you can explain them with R.
Amazing!!
Thanks!!
i love how he made that joke about wild type with monotone lol
:)
Great video! Congratulations. I've seen the paper of Benjamini and Hochberg 1995, but (guided by my very limited knowledge of math) I was not able to find the formula in the way you explained. Please, could you give some clarifications on this issue, as some kind of transformation of the mathematical procedure? Thank you very much. Best wishes.
I'll keep that in mind.
I have the same questions. Did you figure out the logic behind the mathematical procedure? Thank you!
Hey, thanks for the video. Just a question: don't you have a higher chance of getting samples that come from the middle of the distribution than from the tails, resulting in more large p-values than small ones? I don't get why p-values are uniformly distributed. Thanks :)
You know, I found this puzzling as well. However, imagine we are taking two different samples from a single normal distribution. If we did a t-test on those samples, 5% of the time the p-value would be less than 0.05. Now imagine we created 100 random sets of samples and did 100 t-tests. 5 of those p-values will be less than 0.05. 10 will be less than 0.1, 15 will be less than 0.15.... 50 will be less than 0.5.... 90 will be less than 0.90, etc. This isn't a mathematical proof, but it makes sense - the whole idea of having any p-value threshold, x, is that we are only expecting x percent of the tests with random noise to be below that threshold. Thus, we have a uniform distribution of p-values.
Also keep in mind that when computing p-values for the difference between two sample means, p-values of .05 or less cover a wider range of x values than say p-values between .50 and .55.
@@statquest Wow, I had the same question as Ken. Thanks for giving this super intuitive explanation!
@@Tbxy1 me too! been struggling to understand that part and thank god Ken asked 😅
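The uniform-null claim in this thread can be checked with a quick simulation. This sketch is not from the video: it uses a two-sided z-test with known variance (rather than a t-test) so that everything stays in the Python standard library; the idea is the same - both samples always come from the same N(0, 1) distribution, so every test is a "true null" test.

```python
import math
import random

random.seed(42)

def two_sided_p(z):
    # Two-sided p-value for a z statistic under a standard normal null.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

n, trials = 20, 10_000
pvals = []
for _ in range(trials):
    # Both "groups" are drawn from the SAME N(0, 1) distribution,
    # so the null hypothesis is true for every single test.
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    diff = sum(a) / n - sum(b) / n
    z = diff / math.sqrt(2 / n)  # each sample mean has variance 1/n
    pvals.append(two_sided_p(z))

# Under the null, roughly x% of p-values fall below any cutoff x.
for cut in (0.05, 0.25, 0.50):
    frac = sum(p < cut for p in pvals) / trials
    print(f"fraction of p-values below {cut}: {frac:.3f}")
```

Each printed fraction should land close to its cutoff, which is exactly the flat histogram of null p-values shown in the video.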
I think you previously talked about how to calculate the p-value for one sample set, which tells us how likely it is that the sample set belongs to the distribution. But here we are calculating the p-value for two sample sets and trying to tell whether they belong to the same distribution. How is it calculated? Or is it simply comparing one sample set to the distribution and then the other, and if they both likely belong to the same distribution we say we fail to reject the null hypothesis?
In this video I believe I'm using t-tests. To learn about those, first learn about linear regression (don't worry, it's not a big deal): ruclips.net/video/nk2CQITm_eo/видео.html and then learn how to use linear regression to compare two samples to each other with a t-test: ruclips.net/video/NF5_btOaCig/видео.html
great video
Thank you!
Awesome!!
Thanks!
Holy freaking nuts!! Thank you haha...
Yes! :)
Thank you for the intuitive video. I am awfully new to statistics so I have three questions: Suppose it is a classification problem 1. Are "samples" referred to as "classes" (types of genes) or is it samples of genes? 2. Will the null hypothesis be: there is no dependency between the gene and the samples? 3. Why 10,000 times? (I am bit confused what is relationship between 10,000 genes and 10,000 test as I understand for each test, the distribution plot is based on values of genes)?
1) I'm not sure I understand the question because we are trying to classify the expression as being "the same" or "different" between two groups of mice or humans.
2) The null hypothesis is that all of the measurements come from the same population.
3) When we do this sort of experiment, we test between 10,000 and 20,000 genes to see if they are expressed the same or different between two groups of mice or humans or whatever. So, for each gene in the genome, we do a test to see if it is the same or different. This allows us to identify genes that play a role in cancer or some other disease.
I truly love you...
Thank you! :)
Wonderful.
:)
Oh My Goodness! You explain very clearly! Why should I waste my time in the classes? But for the graphics part, I prefer something like 3B1B. Besides, I searched but I couldn't find any video about A/B tests? Do you have any? Thank you Josh!
I'm sorry you don't like my graphics.
@@statquest It's the best channel on RUclips about Statistics ever! ❤️❤️❤️❤️❤️
@@revolutionarydefeatism Thanks!
I am glad to see this video as I am doing some FDR tests in my project. I have a question: what if false positives remain after the adjustment? Is it still acceptable if the FDR is < 0.05?
You cannot eliminate false positives, but you can use FDR to control how many there are. So typically people call all tests with FDR < 0.05 "significant".
Hi Dr. Josh, I'm curious to get your thoughts on a simulation I'm running. It's very similar to the simulation in this video where you calculate 10,000 p-values by sampling from the same distribution.
When I run my simulation using a Welch t-test and n=3, only ~3.5% of p-values are less than 0.05. The percentage converges on 5% when I increase the sample size or use the Student's t-test.
It seems as though forgoing the equal variances assumption sacrifices some power, especially at low sample sizes. But I'm still trying to grasp why that is and what the implications are for using the Welch t-test with low sample size in real-life situations. For example, if the null hypothesis is that both samples come from the same population, then why not just assume equal variances and use Student's t-test all the time? (I know that last question is probably conflating some concepts that should be separate, but I'm having a hard time keeping track of it all, and I'm really interested to hear how you would respond to that question).
You seem to have a great way of explaining things like this intuitively. I'm curious to hear your thoughts.
Thanks so much! I've benefited greatly from your videos.
It makes sense to me that Welch's t-test has less power with low sample sizes because it makes fewer assumptions - and thus has to squeeze more out of the data by estimating more parameters.
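The simulation described in this thread can be reproduced along these lines (a sketch, assuming NumPy and SciPy are available; the exact fractions depend on the random seed and number of trials):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, trials, alpha = 3, 5_000, 0.05
welch_hits = student_hits = 0

for _ in range(trials):
    # Both samples come from the same normal distribution,
    # so every "significant" result is a false positive.
    a = rng.normal(0, 1, n)
    b = rng.normal(0, 1, n)
    if stats.ttest_ind(a, b, equal_var=False).pvalue < alpha:  # Welch
        welch_hits += 1
    if stats.ttest_ind(a, b, equal_var=True).pvalue < alpha:   # Student
        student_hits += 1

print("Welch false-positive rate:  ", welch_hits / trials)
print("Student false-positive rate:", student_hits / trials)
```

One observation that helps explain the ~3.5% result: with equal sample sizes, the Welch and Student t statistics are identical; only the degrees of freedom differ (Welch's Satterthwaite df is at most 2n - 2), so Welch's p-value is never smaller than Student's, and at n = 3 that conservatism is quite noticeable.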
omg, amazing!
Awesome!
Thanks!
Thank you so much.
Hooray! I'm glad you like the video! :)
I've been reading publications for an hour and you solved my problem in 10 minutes.
Awesome!!! This is definitely one of those things that's easier to "see" than to read about. Glad I could help. :)
Thank you very much. I think now is the time to step these videos up to applications in R or another statistical software.
I have a few videos in R and Python here: statquest.org/video-index/
Thank you for your very helpful video. I have one question: what I understood from the calculation of the FDR is that it will make only the smaller p-values still significant after the correction, am I right? (You suggested this at 12:09.) Nevertheless, I got distracted at 17:20 because there are smaller values in the red area that, based on this, would not be "false positives" if I got your explanation. Could you clarify this? Thank you :)
The numbers in the blue boxes are p-values that were created from two separate distributions. Some of those p-values are below the standard threshold of 0.05 and some are not. The ones that are not are "false negatives". The numbers in the red boxes are p-values that were created from a single distribution. Some of those p-values are below the standard threshold of 0.05 and some are not. The ones below the threshold are false positives. However, in this specific example, after we apply the BH procedure (at 18:02), all of the false positives end up with p-values > 0.05 and are no longer considered statistically significant, so the false positives are eliminated.
@statquest: Josh, Thank you. I have a follow-up though. Sure, we could adjust the p-values to reduce the False positives, but could this adjustment cause an increase in False negatives? Is there a way to quantify that? Apologies if I am missing something obvious.
There are different methods to control the number of false positives, some do a better job than others at keeping the number of false negatives small. FDR is one of the best methods for limiting both types of errors. In contrast, the Bonferroni correction is one of the worst.
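For readers who want to play with the BH adjustment discussed throughout these comments, here is a minimal sketch in plain Python (`bh_adjust` is a hypothetical helper name, not from the video; in practice you would use statsmodels' `multipletests(..., method='fdr_bh')` or R's `p.adjust(..., method='BH')`). Each p-value with rank i out of m tests is multiplied by m/i, and then a running minimum from the largest rank down keeps the adjusted values monotonic:

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values:
    adjusted[i] = min over ranks j >= i of (p[j] * m / j), in sorted order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(bh_adjust([0.01, 0.02, 0.03, 0.20, 0.90]))
# → approximately [0.05, 0.05, 0.05, 0.25, 0.90]
```

Note how the three smallest p-values all end up at 0.05: the rank-3 value 0.03 * 5/3 = 0.05 caps the running minimum, so calling everything with adjusted p < 0.05 "significant" keeps the expected false discovery rate at 5%.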