This is my favorite non-entry-level statistics channel. Well done!
Agree
Non-entry-level 😂. So true. Great content but not many views yet
The sign of a great presentation is to make explicit what for an expert is obvious! This is important because the expert usually forgets that what is obvious to them is far from obvious to the novice. Thanks for explaining the concept of parametric distributions to everyone!... Parametric distributions are taught in every basic statistics class; they are so obvious to professors that sometimes they forget to stop a bit and just explain why they are important and what we are doing when we choose a parametric distribution...
On another note, one common mistake is to pick the wrong parametric distribution to model a population... This is one of the main complaints about the use of the normal distribution (the classic example is the use of the normal distribution in finance, which produced misleading estimates of risk during the subprime mortgage crisis)... But to complement your example on the binomial distribution, it could also happen that the trials are not independent
Great video! A lot of shade thrown at the Poisson distribution 😅 count data arise all the time in health research and beyond, and the Poisson is (for better or worse) the go-to standard model
I rarely comment on videos, but your channel definitely warrants one. Great work!
ERRATA (aka my brain on editing)
7:28: Poisson PMF contains the letter "k", but these should all be "x". I let my disdain for the Poisson slip. EDIT: I have no real beef with the Poisson, don’t worry Poisson stans lol
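For anyone following along with the errata, here is the corrected PMF written with "x" throughout. A quick sketch in Python (a stand-in for R's dpois, not the video's own code):

```python
from math import exp, factorial

# Corrected Poisson PMF, written with "x" as the argument, not "k":
# f(x) = lambda^x * e^(-lambda) / x!
def poisson_pmf(x, lam):
    return lam ** x * exp(-lam) / factorial(x)

lam = 3.0
# a valid PMF sums to 1 over its support (truncated here; the tail is negligible)
total = sum(poisson_pmf(x, lam) for x in range(100))
```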
Oh I can't believe what I'm hearing! Young man, your disdain is *unfounded*.
This distribution is *very* useful. Especially since it isn't limited to modelling events that occur in intervals of time; it can also model events in intervals of space (any kind of space, not just physical space).
I hereby officially call for an end to the hate for the Poisson distribution, effective immediately!
Thank you for a great presentation. I learnt a lot from it, sir.
First time seeing the DPQR acronym, very clearly summarized! Perhaps pnorm and qnorm are inverses of each other? Since p and q look like opposites of each other, that makes it easier to remember.
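That mnemonic checks out: the p and q functions really are inverses. A quick sketch in Python, using the stdlib NormalDist as an assumed stand-in for pnorm/qnorm:

```python
from statistics import NormalDist

std = NormalDist(0, 1)

x = 1.3
p = std.cdf(x)           # the pnorm analogue: value -> probability
x_back = std.inv_cdf(p)  # the qnorm analogue: probability -> value
# x_back round-trips back to the original x
```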
In computational biology the Poisson distribution and the Gamma-Poisson (negative binomial) distribution are used quite often :)
Oh that’s cool! What kinds of stuff we usually approached with these models?
@@very-normal In computational biology there are all kinds of count data, but I am most familiar with (single-cell) RNA-sequencing, where you basically count mRNA molecules in a sample (or in single cells). So if you want to model those data in a statistically robust way you have to use the Poisson / Gamma-Poisson or the zero-inflated versions (ZIP, ZINB). Otherwise we commonly log-normalize data to "account" for the heteroskedasticity...
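The reason the Gamma-Poisson comes up for counts like these is overdispersion: a plain Poisson forces variance = mean, while letting the rate itself be gamma-distributed pushes the variance above the mean. A rough pure-Python simulation sketch (with a simple Knuth sampler as an assumed stand-in for R's rpois):

```python
import math
import random

random.seed(1)

def rpois(lam):
    # Knuth's Poisson sampler (fine for small lambda)
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# plain Poisson counts: variance roughly equals the mean
pois = [rpois(5.0) for _ in range(20000)]

# Gamma-Poisson (negative binomial): each draw's rate is itself
# gamma-distributed (mean rate still 5), which inflates the variance
# above the mean -- the overdispersion typical of RNA-seq counts
gp = [rpois(random.gammavariate(2.0, 2.5)) for _ in range(20000)]
```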
The chance of a continuous distribution yielding any specific value, versus the infinitely many other possibilities, is so small as to be considered 0. Thus, we instead calculate the probability of getting something within a range of values. Meaning that the values dnorm returns are not real probabilities but rather just y-values from the density function. It's pnorm that calculates the probability of a range, but the first endpoint would need another input, so for convenience we assume the first endpoint is the distribution's minimum. That endpoint makes it identical to the cumulative function. If R let you change the first endpoint, pnorm wouldn't be the cumulative function
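A quick sketch of that density-vs-probability distinction, with Python's stdlib NormalDist standing in for dnorm/pnorm (not the video's R code):

```python
from statistics import NormalDist

std = NormalDist(0, 1)

# the dnorm analogue: the height of the density curve at 0 -- NOT a probability
density_at_0 = std.pdf(0)

# an actual probability needs a range, built from pnorm-style CDF differences
p_middle = std.cdf(0.5) - std.cdf(-0.5)  # P(-0.5 < X < 0.5)

# with a small sd the density height exceeds 1, proof it isn't a probability
narrow = NormalDist(0, 0.1)
density_tall = narrow.pdf(0)
```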
This is such a great explanation, I wonder how it went unnoticed! Thank you!
Love your humor!
This is amazing. I want to teach stats one day and I'm definitely gonna steal some ideas from this video. Hope you don't mind! With a proper shout out of course :)
Great video
Great video! Just pointing out a small typo, @7:28 you've got k instead of x for the Poisson pmf where the function states it's f(x) not f(k)
Thank you for catching that! I've added a pinned errata post
@@very-normal Great! btw in case you're looking for video ideas, I'd love to hear some thoughts on parametric vs non-parametric hypothesis tests (esp coming from someone in biostats since you guys tend to have such small sample sizes in experimental trials etc). I'm often surprised to see how often I see t-tests and the like when CLT seems absurd for that sample size and the distribution is almost certainly going to be not normal!
That’s a great idea! I think that slots nicely with other material I have planned, thank you!
@@very-normal looking forward to it!
Thank you very much!
Very nice video. Can you please make videos on the non-Gaussian distributions at 9:14 in the future?
I am mainly waiting for the more advanced stuff to be covered, like those other distributions mentioned
I’ll be real with you, it’s going to be a while for this format lol, but I’ll try to cover more advanced stuff in other videos
@@very-normal I liked the video about the bootstrap, although I don't understand it well enough in practice. I didn't know about it, nor did I know about the other resources mentioned
So, is it correct to assume that using a parametric family facilitates the estimation process because we only need to estimate the parameters that shape the function, instead of trying to estimate the probability distribution itself? In that case, we would need to estimate a lot of values.
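That's the idea: with a parametric family, a couple of estimated numbers stand in for the whole curve. A tiny sketch of the contrast, assuming a hypothetical Normal(10, 2) population:

```python
import random
from statistics import mean, stdev

random.seed(42)

# 1,000 draws from an assumed population, here Normal(10, 2)
sample = [random.gauss(10, 2) for _ in range(1000)]

# parametric route: two numbers pin down the entire distribution
mu_hat, sigma_hat = mean(sample), stdev(sample)

# "estimate the distribution itself" route: the empirical CDF
# has to carry around all 1,000 observed values
ecdf_points = sorted(sample)
```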
Yes! I think you’ve phrased it well
very good
Good video, but R feels less relevant every year
It is still relevant, good sir!
For real though, I still find R to be more accurate than most of the commonly used Python libraries for many numerical approximations of common functions. There have been times in my work where the differences in error have been as big as 10e6 between Python and R, which in many applications can be catastrophic.