P-Value Problems: Crash Course Statistics #22
- Published: 29 Sep 2024
- Last week we introduced p-values as a way to set a predetermined cutoff for deciding when something seems unusual enough to reject our null hypothesis - the assumption that there's no real difference. But today we're going to discuss some problems with the logic of p-values: how they're commonly misinterpreted, how they don't give us exactly what we want to know, and how that cutoff is arbitrary - and arguably not stringent enough in some scenarios.
Crash Course is on Patreon! You can support us directly by signing up at / crashcourse
Thanks to the following Patrons for their generous monthly contributions that help keep Crash Course free for everyone forever:
Mark Brouwer, Erika & Alexa Saur Glenn Elliott, Justin Zingsheim, Jessica Wode, Eric Prestemon, Kathrin Benoit, Tom Trval, Nathan Taylor, Divonne Holmes à Court, Brian Thomas Gossett, Khaled El Shalakany, Indika Siriwardena, SR Foxley, Sam Ferguson, Yasenia Cruz, Eric Koslow, Caleb Weeks, Tim Curwick, D.A. Noe, Shawn Arnold, Ruth Perez, Malcolm Callis, Ken Penttinen, Advait Shinde, William McGraw, Andrei Krishkevich, Rachel Bright, Mayumi Maeda, Kathy & Tim Philip, Jirat, Eric Kitchen, Ian Dundore, Chris Peters
--
Want to find Crash Course elsewhere on the internet?
Facebook - / youtubecrashcourse
Twitter - / thecrashcourse
Tumblr - / thecrashcourse
Support Crash Course on Patreon: / crashcourse
CC Kids: / crashcoursekids
I would recommend that the courses have outlines written or shown on video; it would be better for us to catch up with the main points.
IMHO, this episode is basically the entire reason why this series needed to exist. P-Hacking continues to be one of the most detrimentally misunderstood concepts of my lifetime. It started getting talked about a few years back, but it hasn't stopped lazy science journalists from picking up the odd worthless deceptive science press-release and failing to scrutinize the validity.
p-values can be really hard to comprehend. There can be a lot of double negatives and false dichotomies. I think that's why it is important to use consistent language.
Sorry, but talking about numbers while showing fast clips of fish, deer, and chess is IN NO WAY better than the good old whiteboard. It really doesn't help visualization and abstraction.
I'm a PhD student in economics, and my honest opinion is that forcing a policy of ever lower p-value cutoffs just encourages researchers to get ever larger data samples, which is not bad. However, we run into the fact that with a large enough sample, the distributions thin out and we become ever more likely to reject null hypotheses. The problem is that statistical significance can be meaningless when we fail to have economic significance: i.e., the effect being large enough for it to matter. So, p-values are important, but by no means should one ever take too seriously the result of any given statistical test in isolation.
When looking at differences in error statistics, say RMSE, I've taken to distinguishing statistically significant from practically significant. Sure, the 0.05 K improvement in that weather model might be statistically significant, but does it really impact the forecast meaningfully?
And to go further, we always have to have spurious relationships on the back of our minds. Even when results are statistically significant, do we want to believe that we didn’t just get a bad draw?
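The point made in this thread, that a large enough sample makes even a negligible effect statistically significant, can be sketched with stdlib Python. This is a minimal illustration, not from the video: assume a one-sample z-test with known sd, a tiny true mean shift of 0.01, and look at the p-value you'd expect at growing sample sizes.

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal

def expected_p_value(n, effect=0.01, sd=1.0):
    """Two-sided p-value for the *expected* z statistic of a one-sample
    z-test when the true mean shift is `effect` and sd is known."""
    z = effect / (sd / n ** 0.5)  # the expected z grows like sqrt(n)
    return 2 * (1 - norm.cdf(abs(z)))

for n in (100, 10_000, 1_000_000):
    print(n, expected_p_value(n))
```

At n = 100 the expected p-value is near 0.9; at n = 1,000,000 it is essentially zero, even though a 0.01-sd effect may be economically meaningless.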
A great video. I'm a risk manager, so I work a lot with probability and hypothesis testing, and a lot of people in my field misinterpret p-values as probability of rejecting a true null hypothesis given data, which it absolutely isn't. It's a damn confusing thing to understand, but every student of statistics should learn that all that a p-value tells you is how extreme your sample is given that the null is true. It's important because this type of misunderstanding then creeps into academic papers.
this explanation was so helpful! it really solidified the idea of p-values in my head. i've been struggling with the concept a LOT.
To be honest, I get lost a tad with this subject. Will review it again and again until I get it ;)
Did you get it ?
That's truly the only and best thing to do.
First time I watched the series, 6 months ago, I dropped off at the 10th episode. Now that I have let it sit long enough to renew my motivation, I started over and am going all the way. Nothing wrong with going at your own pace.
I'm currently taking an Experimental Methods course in Psycholinguistics for my Master's degree, and we're currently doing a TON of significance testing in R, so these videos are amazingly timely for me!
Finally I understood what p-values are. Thank you!
Now I have to figure out how to internalize that knowledge and not commit errors in the future.
Thank you for the series. I find there are too many examples per video. Some, like the black swan example, are necessary to prove the point, but I wonder if other ones (cats' weights, bees, et al) could be rolled into one?
Can't wait for Bayesian statistics now!
Do one on permutation tests and bootstrapping. I'm loving it.
I really hate stats 😭
99% good. Hypothetically.
love the videos, thank you a lot, but what is a Chuck E. Cheese? i had to google that because i'm not american, maybe "universal" examples would be better?
You can look it up for more info, but for the purpose of this video, it's a family entertainment establishment where many kids can be found.
Math is Hard.
My null hypothesis: The data is normally distributed with a mean u, and a standard deviation of sigma.
As I collect data, the data has mean x_bar and standard deviation s. I know that the normal distribution is a conjugate prior to itself, so I update my distribution to be more consistent with the given data.
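The conjugate update this comment describes has a closed form when sigma is known (the normal is conjugate to its own mean in that case). A minimal sketch, with the function name and prior sd `tau0` being my own choices:

```python
def update_normal_mean(mu0, tau0, xbar, sigma, n):
    """Conjugate update for a normal mean with KNOWN sd `sigma`.
    Prior: mean ~ N(mu0, tau0^2); data: n points with sample mean xbar.
    Returns (posterior mean, posterior sd)."""
    post_prec = 1 / tau0**2 + n / sigma**2  # precisions add
    post_mean = (mu0 / tau0**2 + n * xbar / sigma**2) / post_prec
    return post_mean, post_prec ** -0.5

# One observation at 1, unit prior and noise: posterior mean lands halfway.
print(update_normal_mean(mu0=0, tau0=1, xbar=1, sigma=1, n=1))
```

With unknown sigma the conjugate family is normal-inverse-gamma instead, which is a slightly bigger bookkeeping exercise.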
Great video, she just talks too fast.
I'm not convinced.
I think the problem of p-values could be solved by meta-studies.
What
What building takes ten minutes to walk around!?
Both those psychology experiments actually show the opposite of what she said they showed. I know that they are just examples. But they are terrible examples. We cannot train our cognitive intellect.
Had to listen to 3:21-3:41 four times.
This is such helpful information. Just wish it had some think-time. The sentences are cognitively heavy. They need a few seconds to put into working memory and process. Did this very useful content have to be so densely packed and so very breathlessly delivered??
About problem #2 (that p-values assume the null hypothesis to be true):
Can we not say that this isn't really a problem, since the way we use a p-value is not just as a conditional probability? We use a "reductio ad absurdum" argument in relation to the p-value.
So to say that the fact that p-values assume the null hypothesis to be true means they can't help us judge whether the null is true or not, we would also have to say that "reductio ad absurdum" is a faulty argument in general. Which, I suppose, could be argued, but it would be a hard task.
Most of the errors referred to here in defining/limiting the p-value are surely because of sampling error...
I'm pretty sure that if better sampling techniques are adopted, they will have a lot more to explain for the sake of what they have explained now...
Is this series going to get to stuff like dimensionality reduction? Or is this more of an "intro" kind of thing?
There is a lack of resources for learning statistics on YouTube; I hope this series fixes that.
I didn't like this style of statistics in class. I liked priors so much more. They made much more sense.
Can't you look at the power to determine that you had a big enough sample size to pick up an effect, and use that to determine whether you have a type 2 error?
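Yes, roughly: power analysis tells you how likely your test was to detect a given effect, so a non-rejection from a high-powered test is much more informative than one from an underpowered test (though it never proves the null). A minimal sketch for a two-sided one-sample z-test, using only the stdlib; the function name and the normal approximation are my own assumptions:

```python
from statistics import NormalDist

norm = NormalDist()

def z_test_power(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test:
    probability of rejecting H0 when the true mean shift is `effect`."""
    z_crit = norm.inv_cdf(1 - alpha / 2)  # rejection threshold
    shift = effect / (sd / n ** 0.5)      # where the z statistic centers
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)

print(z_test_power(effect=0.5, sd=1.0, n=30))  # moderate power
print(z_test_power(effect=0.5, sd=1.0, n=60))  # more data, more power
```

If the power at your n was, say, 0.95 and you still failed to reject, an effect that large is unlikely; if the power was 0.2, the non-rejection tells you almost nothing, which is why you can't simply "accept" the null.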
Such a great video, why so few thumbs up? I would even pay for this...
Is this it for discussion of p-values, or is "p-hacking" going to be brought up at some point?
6:25 best slide
The swans in the pool of my university are black ^ ^
The central idea of this video is P(null | data) vs. P(data | null).
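That distinction can be made concrete with Bayes' theorem. A minimal sketch in stdlib Python, where every number is made up purely for illustration: even when the data is unlikely under the null, the null can still be the most probable explanation if nulls are common a priori.

```python
# All numbers below are hypothetical, chosen only to illustrate the gap
# between P(data | null) and P(null | data).
p_data_given_null = 0.04  # data this extreme occurs 4% of the time under H0
p_data_given_alt = 0.10   # how likely the same data is under the alternative
p_null_prior = 0.90       # suppose 90% of tested hypotheses have a true null

p_data = (p_data_given_null * p_null_prior
          + p_data_given_alt * (1 - p_null_prior))
p_null_given_data = p_data_given_null * p_null_prior / p_data
print(round(p_null_given_data, 3))  # ~0.783: the null is still probably true
```

So a "significant" P(data | null) of 0.04 coexists here with a P(null | data) near 0.78, which is exactly the misreading the video warns about.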
Can someone explain the difference between "fail to reject" and "accept" the null hypothesis? I didn't understand that part 😢.
Let's say your null hypothesis is "there are only white swans in the world". As soon as you find a black swan, you can reject the null hypothesis. But what happens if you find only white swans in your attempt to investigate your question? You have looked and looked and you only found white swans. Does that mean there are no black swans in the world (i.e., can you accept your null hypothesis)? No (this is a classic example of an incorrect inductive argument). It might be that you just were not looking in the right place at the right time, and that is why you cannot accept your null hypothesis if you only find white swans. You failed to reject your hypothesis because you did not find any black swans.
This is why we always have to provide evidence that something is wrong, and can't provide evidence that something is right. We can say that all our evidence hints at our hypothesis being right (but can't say that it is right), and we can work under that assumption until it gets disproven.
Kaminaji Thank you for the explanation. I do understand the case of the swans, but I don’t understand the analogy: how it is translated to hypothesis testing.
@@GottsStr Do you understand the concept of "reductio ad absurdum"?
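The swan point in this thread can also be put numerically: a non-significant result is consistent with the null, but it doesn't confirm it. A minimal sketch with a hypothetical biased coin and an exact binomial test, stdlib only:

```python
from math import comb

def binom_two_sided_p(k, n, p0=0.5):
    """Exact two-sided binomial p-value under H0: P(heads) = p0.
    Sums the probability of every outcome at least as far from the
    expected count n*p0 as the observed count k."""
    center = n * p0

    def pmf(i):
        return comb(n, i) * p0**i * (1 - p0)**(n - i)

    return sum(pmf(i) for i in range(n + 1)
               if abs(i - center) >= abs(k - center))

# Hypothetical: a coin with true P(heads) = 0.56 happens to give
# 28 heads in 50 flips. The fair-coin test fails to reject...
print(binom_two_sided_p(28, 50))  # well above 0.05
# ...yet the coin is NOT fair. Failing to reject != accepting the null.
```

With only 50 flips the test simply lacked the power to see a 0.06 bias, just as the swan-hunter may have lacked the reach to find a black swan.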
nice se
i really liked that the Fisher quote was included
Can someone explain to me where that super small P value came from in the example on 7:37 please?
You mean 0.00036? It comes from calculating the p-value. Crash Course tries not to bog down their lessons with formulas, but for statistics that's hard to do. If you REALLY WANT to know, I can explain it to you comment-by-comment, but otherwise the best I can tell you is to remember the idea of histograms and density curves.
For histograms you plot data into bins of frequency. For example, if you want to figure out the average number of miles a person travels going to school or work, you create bins of 0-4 miles, 5-9 miles, 10-14 miles, and so on. Certain histograms show distributions, and in the case of the normal distribution all the data is centered around the middle, so if you created a histogram of average # of miles and it was normally distributed, an outline of the histogram would resemble a bell curve. If you made the bins smaller and smaller, your bell curve would become more apparent, and the histogram would eventually become a density curve.
This is the important part: a density curve is a graph based on probability; remember that 100% of the data you collected is under the curve. A normal curve is symmetric, so 50% of your data would be on one side and 50% on the other. So again, the main point is that the normal curve gives you probabilities, and you use those probabilities to calculate p-values.
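To make the tail-area idea above concrete: we don't know the exact test used in the video, but a test statistic of roughly z = 3.57 on a standard normal (an assumption, back-solved for illustration) gives a two-sided p-value of about 0.00036. A minimal stdlib sketch:

```python
from statistics import NormalDist

norm = NormalDist()

def two_sided_p(z):
    """Area under the standard normal curve in both tails beyond |z|."""
    return 2 * (1 - norm.cdf(abs(z)))

print(round(two_sided_p(1.96), 3))  # ~0.05, the usual cutoff
print(round(two_sided_p(3.57), 5))  # ~0.00036, as discussed above
```

So the tiny p-value just says the observed statistic sits very far out in the tails of the null distribution.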
puppycat in the background ( * __ * )
So that's what it's called. I've been wondering since I saw it!
Nice 'n analog chess clock.
What great material.
p fishing is nonsense.
First
My neurons are confused.
It's important to remember that statistical tests can be overly sensitive at very large sample sizes, since the power increases. The alpha then needs to be adjusted to a lower level, possibly 1%. It's also important to study the practical significance in these cases. For example, when testing whether data is normally distributed and normality is rejected, a graphical approach can be used to study whether it's approximately normally distributed.