Explaining Parametric Families

  • Published: Jun 29, 2024
  • An explanation of why parametric families are so commonly used in statistics
    Stay updated with the channel and some stuff I make!
    👉 verynormal.substack.com
    👉 very-normal.sellfy.store

Comments • 32

  • @TriglycerideBeware
    @TriglycerideBeware 7 months ago +20

    This is my favorite non-entry-level statistics channel. Well done!

  • @varbias
    @varbias 7 months ago +7

    Great video! A lot of shade thrown at the Poisson distribution 😅 count data arise all the time in health research and beyond, and the Poisson is (for better or worse) the go-to standard model

  • @very-normal
    @very-normal 7 months ago +5

    ERRATA (aka my brain on editing)
    7:28: Poisson PMF contains the letter "k", but these should all be "x". I let my disdain for the Poisson slip. EDIT: I have no real beef with the Poisson, don’t worry Poisson stans lol
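
For reference, with the notation corrected, the Poisson PMF would read as follows. Writing the rate parameter as λ is an assumption here, since the erratum does not name it.

```latex
f(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \dots
```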

    • @AnthonyBerlin
      @AnthonyBerlin 7 months ago +1

      Oh I can't believe what I'm hearing! Young man, your disdain is *unfounded*.
      This distribution is *very* useful, especially since it isn't limited to modelling events that occur in intervals of time; it can also model events in intervals of space (any kind of space, not just physical space).
      I hereby officially call for an end to the hate for the Poisson distribution, effective immediately!

  • @academyofuselessideas
    @academyofuselessideas 7 months ago +4

    The sign of a great presentation is that it makes explicit what is obvious to an expert! This is important because experts usually forget that what is obvious to them is far from obvious to the novice. Thanks for making the concept of parametric distributions accessible to everyone!... Parametric distributions are taught in every basic statistics class, and they are so obvious to professors that they sometimes forget to stop for a bit and explain why they are important and what we are doing when we choose a parametric distribution...
    On another note, one common mistake is to pick the wrong parametric distribution to model a population... This is one of the main complaints about the use of the normal distribution (the classic example is the use of the normal distribution in finance, which produced misleading estimates of risk during the subprime mortgage crisis)... But to complement your example on the binomial distribution, it could also happen that the trials are not independent

  • @frankmazza5359
    @frankmazza5359 7 months ago +2

    I rarely comment on videos, but your channel definitely warrants one. Great work!

  • @barutaji
    @barutaji 7 months ago +3

    I think it is P and not C because the main concept is the probability, not the cumulative. It just so happens that working with cumulative distributions is way easier than working with the PDFs themselves, especially when mixing discrete and continuous distributions.

  • @kprabhakar975
    @kprabhakar975 7 months ago +1

    Thank you for the great presentation. I learnt a lot from it, sir.

  • @plaza3825
    @plaza3825 2 months ago

    The chance of a continuous distribution yielding any specific value, versus the infinitely many other possibilities, is so unlikely that it has probability 0. Thus, we instead calculate the probability of getting something within a range of values. This means that the values dnorm returns are not real probabilities but rather just y-values from the density function. It's pnorm that calculates the probability of a range, but the first endpoint would need another input, so for convenience we assume the first endpoint is the distribution's minimum. That endpoint makes it identical to the cumulative function. If RStudio let you change the first endpoint, then pnorm wouldn't be the cumulative function
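
The distinction drawn in this comment can be sketched in a few lines. This is a minimal illustration in Python using the standard library's `statistics.NormalDist`, whose `pdf` and `cdf` methods play the roles of R's `dnorm` and `pnorm`; the specific means, standard deviations, and endpoints are arbitrary choices for the example.

```python
from statistics import NormalDist

# Standard normal; mean 0 and sd 1 are illustrative choices.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Height of the density curve at x = 0 (analogue of dnorm(0)); not a probability.
density_at_zero = std_normal.pdf(0.0)

# A tightly concentrated normal shows why: its density height exceeds 1.
narrow_density_at_zero = NormalDist(mu=0.0, sigma=0.1).pdf(0.0)

# P(X <= 1) (analogue of pnorm(1)): the lower endpoint is implicitly -infinity.
prob_below_one = std_normal.cdf(1.0)

# A range with an explicit lower endpoint is a difference of two CDF values.
prob_within_one_sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

print(f"density at 0 (sd = 1):   {density_at_zero:.4f}")
print(f"density at 0 (sd = 0.1): {narrow_density_at_zero:.4f}")
print(f"P(X <= 1):               {prob_below_one:.4f}")
print(f"P(-1 < X <= 1):          {prob_within_one_sd:.4f}")
```

So a probability over any interval can still be recovered from the cumulative function alone, by subtracting its value at the lower endpoint from its value at the upper one.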

  • @Michallote
    @Michallote 7 months ago +1

    Love your humor!

  • @pauls.6686
    @pauls.6686 1 month ago

    Thank you very much!

  • @daltakid
    @daltakid 7 months ago +1

    First time seeing the DPQR acronym, very clearly summarized! Perhaps pnorm and qnorm are inverses of each other, as p and q look like opposites of each other, so it's easier to remember?
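
The inverse relationship suggested here can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`, where `cdf` stands in for `pnorm` and `inv_cdf` for `qnorm`; the test value 1.5 is an arbitrary choice.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, sd 1 by default

x = 1.5  # arbitrary illustrative value

# Analogue of pnorm(x): the probability of landing at or below x.
p = std_normal.cdf(x)

# Analogue of qnorm(p): the value below which a fraction p of the mass lies.
x_recovered = std_normal.inv_cdf(p)

# The round trip should return the starting value up to floating-point error.
round_trip_ok = abs(x_recovered - x) < 1e-6
print(round_trip_ok)
```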

  • @psl_schaefer
    @psl_schaefer 4 months ago

    In computational biology the Poisson distribution and the Gamma-Poisson (negative binomial) distribution are used quite often :)

    • @very-normal
      @very-normal 4 months ago

      Oh that’s cool! What kinds of stuff are usually approached with these models?

    • @psl_schaefer
      @psl_schaefer 4 months ago

      @very-normal In computational biology there are all kinds of count data, but I am most familiar with (single-cell) RNA-sequencing, where you basically count mRNA molecules in a sample (or in single cells). So if you want to model those data in a statistically robust way you have to use the Poisson / Gamma-Poisson or the zero-inflated versions (ZIP, ZINB). Otherwise we commonly log-normalize the data to "account" for the heteroskedasticity...

  • @xavierlarochelle2742
    @xavierlarochelle2742 7 months ago

    This is amazing. I want to teach stats one day and I'm definitely gonna steal some ideas from this video. Hope you don't mind! With a proper shout out of course :)

  • @kezza7773
    @kezza7773 7 months ago

    Great video

  • @user-kg5ii5lq1v
    @user-kg5ii5lq1v 7 months ago

    very good

  • @karangarg4631
    @karangarg4631 7 months ago +1

    Great video! Just pointing out a small typo: at 7:28 you've got k instead of x in the Poisson PMF, where the function is written as f(x), not f(k)

    • @very-normal
      @very-normal 7 months ago +1

      Thank you for catching that! I've added a pinned errata post

    • @karangarg4631
      @karangarg4631 7 months ago

      @very-normal Great! btw in case you're looking for video ideas, I'd love to hear some thoughts on parametric vs non-parametric hypothesis tests (esp coming from someone in biostats, since you guys tend to have such small sample sizes in experimental trials etc). I'm often surprised at how frequently I see t-tests and the like when relying on the CLT seems absurd for that sample size and the distribution is almost certainly not normal!

    • @very-normal
      @very-normal 7 months ago +2

      That’s a great idea! I think that slots nicely with other material I have planned, thank you!

    • @karangarg4631
      @karangarg4631 7 months ago

      @very-normal looking forward to it!

  • @braineaterzombie3981
    @braineaterzombie3981 7 months ago

    Very nice video. Can you please make videos on the distributions at 9:14 (not Gaussian) in the future?

  • @yorailevi6747
    @yorailevi6747 7 months ago

    I am mainly waiting for the more advanced stuff to be covered, like those other distributions mentioned

    • @very-normal
      @very-normal 7 months ago +1

      I’ll be real with you, it’s going to be a while for this format lol, but I’ll try to cover more advanced stuff in other videos

    • @yorailevi6747
      @yorailevi6747 7 months ago

      @very-normal I liked the video about the bootstrap, although I don't understand it well enough in practice. I didn't know about it, nor did I know about the other resources mentioned

  • @OhInMyHouse
    @OhInMyHouse 7 months ago

    So, is it correct to assume that using a parametric family facilitates the estimation process because we only need to estimate the parameters that shape the function, instead of trying to estimate the probability distribution itself, in which case we would need to estimate a lot of values?

    • @very-normal
      @very-normal 7 months ago +1

      Yes! I think you’ve phrased it well
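
The point made in this exchange can be sketched with a small simulation. Everything below is an illustrative assumption rather than anything from the video: the "population" is a Normal with made-up mean 10 and standard deviation 2, the sample size is 500, and the fit uses Python's standard-library `NormalDist.from_samples`.

```python
import random
from statistics import NormalDist

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: Normal with mean 10 and sd 2 (made-up values).
sample = [rng.gauss(10.0, 2.0) for _ in range(500)]

# Under the Normal family, estimating the whole distribution reduces to
# estimating two numbers: the mean and the standard deviation.
fitted = NormalDist.from_samples(sample)
print(f"estimated mean: {fitted.mean:.2f}, estimated sd: {fitted.stdev:.2f}")

# Any probability of interest then follows from those two estimates.
prob_above_12 = 1.0 - fitted.cdf(12.0)
print(f"estimated P(X > 12): {prob_above_12:.3f}")
```

Without the parametric assumption, the same question would require estimating the shape of the distribution directly, for example one value per histogram bin, rather than just two parameters.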

  • @jameyhall5255
    @jameyhall5255 7 months ago

    Good video, but R feels less relevant every year

    • @AnthonyBerlin
      @AnthonyBerlin 7 months ago +2

      It is still relevant, good sir!
      For real though, I still find R to be more accurate than most of the commonly used Python libraries for many numerical approximations of common functions. There have been times in my work where the difference in errors between Python and R has been as big as 10e6, which in many applications can be catastrophic.