Exact Binomial Test Explained! + Real-World Example: Counting Trash in the Baltic Sea 📊🌊🔬(4K)

yuzaR Data Science

1.4K views

Published: Aug 21, 2024
IF YOU WOULD LIKE TO SUPPORT ME, JOIN THE CHANNEL: / @yuzar-data-science
The Exact Binomial Test is a simple yet powerful technique that every data scientist should have in their toolbox. In this video, we’ll explore why we need the Exact Binomial Test and examine a real-world application where I used it to publish a scientific paper on encounters of marine litter particles on the Baltic seafloor.
Enjoy! 🥳
Welcome to my VLOG! My name is Yury Zablotski & I love to use R for Data Science = "yuzaR Data Science" ;)
This channel is dedicated to data analytics, data science, statistics, machine learning and computational science! Join me as I dive into the world of data analysis, programming & coding. Whether you're interested in business analytics, data mining, data visualization, or pursuing an online degree in data analytics, I've got you covered. If you are curious about Google Data Studio, data centers & certified data analyst & data scientist programs, you'll find the necessary knowledge right here. You'll greatly increase your odds to get online master's in data science & data analytics degrees. Boost your knowledge & skills in data science and analytics with my engaging content. Subscribe to stay up-to-date with the latest & most useful data science programming tools. Let's embark on this data-driven journey together!

Comments • 36

@neurosciencehubbykissikont5378 2 months ago ⁺²
Your videos are intuitive. Can you start a playlist on machine learning in R?
@yuzaR-Data-Science 2 months ago ⁺¹
thanks man! yes, that's the plan, but first I would cover stat models, like logistic regression etc. after that I would go full ML and AI ;)
@WilForDataScience 2 months ago ⁺¹
Excellent content quality! You sir always find a way to keep us interestingly attached to the video. You are like our Statistics and Data Science dealer. Thanks for your labor. We'll keep growing up.
@yuzaR-Data-Science 2 months ago
Wow, thank you! Your comments, my friend, are the most supportive and motivating! So, after reading them I just wanna jump straight into creating a new video on one of the 1000 ideas I have :) For instance, I am finally starting the logistic regression series. One of my favorite topics ;) Genuinely thankful for your continuous support! Cheers!
@hikeaway1596 2 months ago ⁺¹
good quality content as always thanks! keep up a great work!
@yuzaR-Data-Science 2 months ago
thanks for your continuous support! :)
@aminebahmed7526 2 months ago
Nice video as usual, keep up the good work
@yuzaR-Data-Science 2 months ago ⁺¹
Thanks, will do! Appreciate your continuous support!!! Commenting, watching and liking is really the best support! So, thanks again!
@jalalkassout4226 2 months ago
I think you're the best at explaining statistics in such a smooth way. I'm wondering if your blog is out of service?
@yuzaR-Data-Science 2 months ago
Thanks man! Greatly appreciate your positive feedback. My blog was shut down, since they want me to pay. I refuse to pay, because I actually do something useful for the world for free. So, hope for your understanding. The good news is, youtube is still free and will stay free, so, when you just stop the video at any time and type the code it's free. However, when you want to see the whole code from any of the video, you could join the channel (ruclips.net/channel/UCcGXGFClRdnrwXsPHi7P4ZAjoin), because for the members, I do provide the whole code.
@SUNILYADAV-tv5ze 2 months ago
Nice video and excellent explanation 👍
@yuzaR-Data-Science 2 months ago
Thanks 🙏 Sunil! Glad you enjoyed it!
@statlab_stat.solution 2 months ago
❤
@yuzaR-Data-Science 2 months ago
🙏
@OnLyhereAlone 2 months ago
Thanks for this, but is this not the classic case where p-value adjustments for multiple testing need to be applied? Why wasn't it applied? Also, in a situation where we have only 2 possible outcomes, when should the binomial test be used versus Chi-square? Is this a matter of preference, or is one objectively more appropriate than the other?
@yuzaR-Data-Science 2 months ago ⁺¹
Man, you ask good questions! ;) First, yes, you are absolutely right, p-value adjustment would be the right thing to do, but at the time I wrote the paper, I used it only irregularly. For the video I also try to keep the focus, so it stays concise.
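A minimal sketch of what such an adjustment could look like in base R, using `p.adjust`. The four raw p-values are hypothetical and chosen only for illustration; they are not the values from the paper.

```r
# Hypothetical raw p-values from four separate tests (illustrative only,
# not the values from the paper discussed in the video)
p_raw <- c(0.01, 0.02, 0.03, 0.04)

# Three common corrections available in base R
p_bonferroni <- p.adjust(p_raw, method = "bonferroni")  # most conservative
p_holm       <- p.adjust(p_raw, method = "holm")        # step-down, controls family-wise error
p_bh         <- p.adjust(p_raw, method = "BH")          # controls the false discovery rate

data.frame(p_raw, p_bonferroni, p_holm, p_bh)
```

Which method fits best depends on whether the family-wise error rate or the false discovery rate is the quantity you want to control.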
@yuzaR-Data-Science 2 months ago
And since I try to keep videos short, some information does not end up there, even though I considered it while writing the script. For instance, check the parts below, I hope you find them useful:
Intuition
Ironically, there is nothing exact about the Exact Binomial test. It is called “exact” because it calculates the p-value directly from the binomial probabilities, and not from an approximating test statistic such as the Chi-Square.
The Chi-Square Goodness-of-Fit test, however, gives only an approximation of the p-value, which is why the exact binomial test is recommended.
Proportion test
If you have lots of data (N > 30) or more than two outcomes, use a proportion test, which is highly similar to the exact binomial test. In fact, the exact binomial test gives nearly the same p-value as the proportion test with Yates continuity correction, which the proportion test applies by default. Below, I wrote the correction out explicitly:
prop.test(x = 7, n = 10, p = 0.5, correct = TRUE)
One sample Chi-Square test
However, as you can see above, the proportion test calculates a chi-squared statistic, so it is effectively a chi-squared test. And interestingly, a proportion test without Yates continuity correction gives results identical to a one-sample Chi-Square Goodness-of-Fit test:
prop.test(x = 7, n = 10, p = 0.5, correct = FALSE)
chisq.test(c(7,3)) # total is 10 and p = 0.5 for both numbers by default
(Simplest) Logistic regression
If you are not overwhelmed yet, you can also go further and conduct the simplest logistic regression possible (you don’t need to understand it now! I’ll cover it in different videos). Below you’ll find a log-odds estimate from the logistic regression (0.847), which can be converted to a probability with the plogis function. You’ll see that the probability is exactly 0.7, as in the tests above, and the p-value is similar. By the way, the p-value represents the probability of observing a result as extreme or more extreme than the one you got, assuming the null hypothesis is true.
m
@paulocastro5502 2 months ago
why not use the Chi-square test goodness-of-fit also?
@yuzaR-Data-Science 2 months ago ⁺¹
you actually can, here is what I think about them:
Close relatives: connect the dots
Ironically, there is nothing exact about the Exact Binomial test. It is called “exact” because it calculates the p-value directly from the binomial probabilities, and not from an approximating test statistic such as the Chi-Square.
The Chi-Square Goodness-of-Fit test, however, gives only an approximation of the p-value, which is why the exact binomial test is recommended.
Proportion test
If you have lots of data (N > 30) or more than two outcomes, use a proportion test, which is highly similar to the exact binomial test. In fact, the exact binomial test gives nearly the same p-value as the proportion test with Yates continuity correction, which the proportion test applies by default. Below, I wrote the correction out explicitly:
prop.test(x = 7, n = 10, p = 0.5, correct = TRUE)
1-sample proportions test with continuity correction
data: 7 out of 10, null probability 0.5
X-squared = 0.9, df = 1, p-value = 0.3428
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.3536707 0.9190522
sample estimates:
p
0.7
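For comparison, here is a small sketch that runs the exact binomial test and the continuity-corrected proportion test on the same 7-out-of-10 data. The variable names are only illustrative. The two p-values come out very close (0.34375 for the exact test versus about 0.3428 for the proportion test), but they are not identical.

```r
x <- 7   # observed successes
n <- 10  # number of trials

# Exact test: p-value summed directly from binomial probabilities
p_binom <- binom.test(x, n, p = 0.5)$p.value

# Proportion test: chi-squared approximation with Yates continuity correction
p_prop <- prop.test(x, n, p = 0.5, correct = TRUE)$p.value

c(exact_binomial = p_binom, prop_test_yates = p_prop)
```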
One sample Chi-Square test
However, as you can see above, the proportion test calculates a chi-squared statistic, so it is effectively a chi-squared test. And interestingly, a proportion test without Yates continuity correction gives results identical to a one-sample Chi-Square Goodness-of-Fit test:
prop.test(x = 7, n = 10, p = 0.5, correct = FALSE)
1-sample proportions test without continuity correction
data: 7 out of 10, null probability 0.5
X-squared = 1.6, df = 1, p-value = 0.2059
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.3967781 0.8922087
sample estimates:
p
0.7
chisq.test(c(7,3)) # total is 10 and p = 0.5 for both numbers by default
Chi-squared test for given probabilities
data: c(7, 3)
X-squared = 1.6, df = 1, p-value = 0.2059
(Simplest) Logistic regression
If you are not overwhelmed yet, you can also go further and conduct the simplest logistic regression possible (you don’t need to understand it now! I’ll cover it in different videos). Below you’ll find a log-odds estimate from the logistic regression (0.847), which can be converted to a probability with the plogis function. You’ll see that the probability is exactly 0.7, as in the tests above, and the p-value is similar. By the way, the p-value represents the probability of observing a result as extreme or more extreme than the one you got, assuming the null hypothesis is true.
m
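The code in this comment is cut off after `m`. As a hedged sketch only, an intercept-only logistic regression on the same 7-out-of-10 data might look like the following. The data layout, the `glm` call, and the variable names are assumptions, not the author's original code.

```r
# 7 successes and 3 failures, coded as a 0/1 outcome (assumed layout)
y <- c(rep(1, 7), rep(0, 3))

# Intercept-only logistic regression: the intercept is the log-odds of success
m <- glm(y ~ 1, family = binomial)

log_odds <- unname(coef(m))      # log(0.7 / 0.3), about 0.847
prob     <- plogis(log_odds)     # back-transformed to a probability, 0.7
p_wald   <- summary(m)$coefficients[1, "Pr(>|z|)"]  # Wald test of log-odds = 0

c(log_odds = log_odds, probability = prob, p_value = p_wald)
```

Testing whether the log-odds equal 0 is the same as testing whether the probability equals 0.5, which is why this p-value (about 0.22) can be compared with the tests above.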
@paulocastro5502 2 months ago
@@yuzaR-Data-Science Thank you for your attention and complete answer. This was very insightful. Also thanks for your video about regressions vs statistical tests, a topic I had wondered about for months or more (since I don't have a strong background in statistics, I'm a biologist who (tries to ^^) enjoy statistics).
So, the proportion test calculates the pvalue from the Chi-square value as in the Chi-square test unlike binomial test? At least I can see a clear advantage given that proportion test gives the confidence intervals and
I would expect the exact binomial test to be similar to the binomial glm, but in fact it's similar to the Chi-square p-value. Is this due to the Yates continuity correction? I found some sources saying it can be a bit conservative.
For some reason, when I tried the Chi-square test, it gave me the same result with continuity correction (which seems to be the default, though the output doesn't specify which) and without it, equal to the proportion test without correction. When I used the p-value from Monte Carlo simulation, it gave something closer to the Yates continuity correction:
Chi-squared test for given probabilities with simulated p-value (based on 2000 replicates)
data: c(7, 3)
X-squared = 1.6, df = NA, p-value = 0.3543
What is best? Using continuity corrections or alpha adjustments for multiple outcomes, or both?
@yuzaR-Data-Science 2 months ago ⁺¹
Monte Carlo simulation is the best I think and adjustment for multiple comparisons is always a must.
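A short sketch of the Monte Carlo variant discussed in this thread, under the assumption that the counts are the same `c(7, 3)`. The seed is arbitrary and only there for reproducibility. It also illustrates why the corrected and uncorrected chi-squared results matched: according to the `chisq.test` documentation, the `correct` argument only applies to 2-by-2 tables, so it has no effect on a goodness-of-fit test of a single vector.

```r
obs <- c(7, 3)  # observed counts, tested against equal probabilities

# `correct` is ignored for a goodness-of-fit test, so both calls agree
p_corr   <- chisq.test(obs, correct = TRUE)$p.value
p_uncorr <- chisq.test(obs, correct = FALSE)$p.value

# Monte Carlo p-value: the statistic's null distribution is simulated
# instead of being approximated by the chi-squared distribution
set.seed(1)  # arbitrary seed so the simulation is reproducible
p_sim <- chisq.test(obs, simulate.p.value = TRUE, B = 2000)$p.value

c(corrected = p_corr, uncorrected = p_uncorr, simulated = p_sim)
```

With only 10 observations the simulated p-value lands near the exact binomial one (0.34375) and varies slightly with the seed and with `B`.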
@user-wr4yl7tx3w 2 months ago
Thanks but why is it called ‘exact’?
@yuzaR-Data-Science 2 months ago ⁺²
amazing question! Thanks :) I was thinking to put it into a video actually, but deleted it as "boring" and "less useful" part of the script :) Here is what I was going to say:
Ironically, there is nothing exact about the Exact Binomial test. It is called “exact” because it calculates the p-value directly from the binomial probabilities, and not from an approximating test statistic such as the Chi-Square. The Chi-Square Goodness-of-Fit test, however, gives only an approximation of the p-value, which is why the exact binomial test is recommended.
Hope it answers the question :)
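To make "directly from the probability" concrete, here is a minimal sketch that rebuilds the two-sided p-value by hand for 7 successes in 10 trials under a null probability of 0.5, and compares it with `binom.test`. The variable names are illustrative. The small tolerance factor mirrors the one `binom.test` uses internally to guard against floating-point ties.

```r
x  <- 7    # observed successes
n  <- 10   # number of trials
p0 <- 0.5  # null probability

# Probability of every possible outcome 0..n under the null hypothesis
probs <- dbinom(0:n, size = n, prob = p0)

# Two-sided p-value: add up all outcomes that are no more likely than
# the observed one (index x + 1 because the outcomes start at 0)
p_by_hand <- sum(probs[probs <= probs[x + 1] * (1 + 1e-7)])

p_builtin <- binom.test(x, n, p = p0)$p.value

c(by_hand = p_by_hand, binom_test = p_builtin)
```

Here the outcomes 0, 1, 2, 3, 7, 8, 9 and 10 are counted, giving 352/1024 = 0.34375 with no test statistic or approximating distribution involved.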
@SUNILYADAV-tv5ze 2 months ago ⁺²
Sir please make a video lecture on Simulation study.
@yuzaR-Data-Science 2 months ago ⁺²
Thanks 🙏 Sunil, I’ll do. But it’ll take some time, because I first want to cover frequentist stats. Then I'll come to simulation.
@SUNILYADAV-tv5ze 2 months ago ⁺¹
@@yuzaR-Data-Science Sure Sir, I will wait for the video. Thank you so much Sir 🙏
@yuzaR-Data-Science 2 months ago
You are very welcome! :)
@RichmondDarko-qo2me 2 months ago
Very insightful. Thank you very much. Please, can I get your email? I would like to ask you about some stuff I find confusing, since you are the expert :)
@yuzaR-Data-Science 2 months ago
Hi Richmond, thanks a lot for your nice feedback! I do not share my email, but that's no problem, because you can ask anything here in the comments section of videos and I would do my best to answer as quick and as good as I can. The channel members get quicker and more insightful responses though, and the higher their level, the more time I can invest into answering questions, thus if you wish, join my channel: ruclips.net/channel/UCcGXGFClRdnrwXsPHi7P4ZAjoin
@RichmondDarko-qo2me 2 months ago
@@yuzaR-Data-Science Thank you very much for such informative videos. I spent several years in class and didn't understand all these concepts, but watching this video has made things easier for my comprehension.
I have a few questions I would like to ask:
When performing a statistical test, we use a parametric test if the data or variable in question is normally distributed, and a non-parametric alternative if the data or variable is not normally distributed.
My question is: when does the central limit theorem come into play here?
Also, a colleague of mine told me to always use parametric tests even if the data is not normally distributed. His explanation was that parametric tests are more powerful than non-parametric tests.
So, should I straightforwardly use the non-parametric alternative when I observe that my data is not normally distributed, or should I take the CLT into consideration and use the parametric test?
@yuzaR-Data-Science 2 months ago
I am not sure the CLT helps too much, but using a parametric test for highly skewed data is absolute nonsense. The power difference is minimal and overrated. I also have colleagues who use non-parametric tests by default. Another extreme nonsensicality and laziness. Just for the sake of the learning effect, please take skewed data and calculate the mean and the median to see how much difference you'll get. And if your colleague really cared about power, he/she would use multivariable models, not univariable tests. And this is what I would recommend to you: the tests are fine in the beginning, but try to learn models and their assumptions when you want to go to the next level. Cheers and thanks again for the nice feedback!
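The suggested exercise can be sketched in a few lines. The numbers below are made up purely to illustrate a right-skewed sample with one large value.

```r
# Made-up right-skewed sample: mostly small values plus one large one
skewed <- c(1, 2, 2, 3, 3, 4, 50)

sample_mean   <- mean(skewed)    # pulled upward by the single large value
sample_median <- median(skewed)  # stays with the bulk of the data

c(mean = sample_mean, median = sample_median)
```

The mean (about 9.3) sits far above the median (3), so a test built around the mean describes a "typical" value that almost none of the observations resemble.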
@RichmondDarko-qo2me 2 months ago
@@yuzaR-Data-Science I'm really grateful that you found time in your busy schedule to reply to me. So please, when should I use the CLT, or shouldn't I use it at all? Thank you
@yuzaR-Data-Science 2 months ago
but what exactly do you mean by CLT? bayesian methods?
