Understanding the glm family argument (in R)

  • Published: 22 Oct 2024

Comments • 37

  • @Insipidityy
    @Insipidityy 1 year ago +6

    THIS video really made GLMs "click" for me. I spent hours trying to figure out what link functions and families mean, and I have found no better explanation. Thank you so much!

  • @joshuavernontanner1326
    @joshuavernontanner1326 3 years ago +15

    I'm floored by how clear this was. Incredible teaching ability, thank you!

  • @Roy-xr2wq
    @Roy-xr2wq 6 months ago +1

    Best explanation; the visuals bring the whole idea to life. Thanks

  • @ibntuahirabdulhaqq5866
    @ibntuahirabdulhaqq5866 1 year ago +1

    This is not the first time I am watching this video, and any time I do I wish you made them a series of videos. Thanks for giving the best explanation and taking the time to respond to queries in the comment section. Much love

    • @kasperwelbers
      @kasperwelbers  1 year ago +1

      Thanks! That's really nice to hear. I do intend to get back to creating some videos soon. Still figuring out how to do this more consistently besides work, but I really love how these platforms bring together people that are intrinsically motivated to teach and learn, and I greatly appreciate hearing that you enjoy my current contributions.

  • @os2171
    @os2171 2 years ago +1

    Awesome work! Thanks! Cheers from a neurobiology PhD candidate from Colombia.

  • @laure189
    @laure189 3 years ago +2

    Absolutely fantastic video! Thank you SO much! And I truly appreciated the little recap on error distribution as well, extremely helpful!

  • @davidgao9046
    @davidgao9046 7 months ago

    Very clear layout and superb explanation of the intuition. Thanks!

  • @gergerger53
    @gergerger53 1 month ago

    Very well put together. I think there should be some recognition of the fact that some of the symbols are mixed up in the presentation. The systematic component should always be mu, mu goes into the link function to give eta, and eta is the value that goes into the random component distribution. Otherwise the slides don't make sense. To take a random example, in the probit regression slide, mu is not defined anywhere. But changing the systematic component to mu and then changing the binomial parameter to eta fixes everything.

    • @kasperwelbers
      @kasperwelbers  1 month ago

      Hi Murphyalex. Thanks for your comment! The notation used here is based on the book in the description. I was also initially confused about using eta as the systematic component, and then defining mu as the input of the link function rather than its output, but that's how the link function is defined, and when you read their run-through of the generalization it makes sense (just looked it up again; page 42, highly recommended). Note that mu is still defined, but as the inverse of the link function applied to eta. So for example, the mean function for the Poisson distribution is defined as mu = exp(eta), which is equivalent to eta = log(mu).
      Or am I missing something else that you're referring to?
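The relation between eta and mu described above can be checked directly in R. This is only a minimal sketch: the `eta` values are made up for illustration, and it relies on the `linkfun` and `linkinv` components that R family objects carry.

```r
# Inspect the link and inverse link (mean function) of the Poisson family.
fam <- poisson()            # the default link for poisson() is "log"

eta <- c(-1, 0, 0.5, 2)     # hypothetical values of the systematic component
mu  <- fam$linkinv(eta)     # mean function: mu = exp(eta)

all.equal(mu, exp(eta))           # the inverse link is exp()
all.equal(fam$linkfun(mu), eta)   # the link is log(), mapping mu back to eta
```

So the link function takes mu as input and returns eta, while the mean function goes the other way.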

  • @yahiarafik9965
    @yahiarafik9965 1 year ago

    Very clear, thank you so much for this explanation, you just helped a lot of people in my major :))

  • @m9017t
    @m9017t 1 year ago

    Very well explained, thank you!

  • @anandimerchant5180
    @anandimerchant5180 1 year ago

    Thank you! Well explained...

  • @JinaneJouni
    @JinaneJouni 2 years ago

    Really well explained! Thank you very much

  • @md.masumbillah8222
    @md.masumbillah8222 2 years ago

    Many thanks for uploading!

  • @dulquerpauly321
    @dulquerpauly321 3 years ago +1

    You absolute legend thank you mate.... Subscribed already :-)

  • @PP-im6lu
    @PP-im6lu 2 years ago

    Wow, what a great explanation!

  • @darmaw22
    @darmaw22 3 years ago

    I'd really appreciate it if you could let me know how you managed to have a list of arguments shown instead of lines. Many thanks!

    • @kasperwelbers
      @kasperwelbers  3 years ago

      Do you mean the drop-down list, as seen at 0:25? On Ubuntu you get that by pressing Tab when your cursor is between the parentheses of a function.

    • @darmaw22
      @darmaw22 3 years ago

      @@kasperwelbers Many thanks! That is exactly what I meant.

  • @gurns681
    @gurns681 2 years ago

    Great video. Thank you

  • @yiyuanzhang6335
    @yiyuanzhang6335 3 years ago

    Thank you very much! However, it is still unclear to me when I should use which family and which link function. Should I check how the error terms are distributed and then decide which family and link function to use?

    • @kasperwelbers
      @kasperwelbers  3 years ago +3

      You're welcome :). Regarding the choice of distribution family, a good place to start is to look at and think about your response/dependent variable (though also see my answer to Jessica Hough about the conditional distribution of the response).
      You're right that you'll eventually want to look at the errors to get an idea of whether your model makes sense. But note that you can only check the errors after choosing a model. So you don't check the errors of an ordinary regression to determine the 'right' family/link (which I thought you were implying). There's a chapter (12) on this model checking loop in the book I mention (link in description).
      So where to start this loop? You should first think about what type of distribution makes sense for your response. What are all the possible values in your response, and what type of distribution could produce this type of data? For instance, if your response is binary, then a binomial distribution makes a lot of sense. If your response is a count (0, 1, 2, etc.) with a low average (e.g., number of comments on YouTube videos), then a Poisson distribution might be good.
      As such, it really pays off to learn a bit about common distributions. As a starting point, I would recommend learning about the Binomial and Poisson distributions. At least in my field, these are heavily used (logistic regression and Poisson regression).
      Finally, for the link function I would recommend initially sticking to the canonical link functions. These are also the default functions that R uses for each family. Aside from the logit versus probit link for binomial regression, I've rarely encountered studies using other link functions.
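To make the point about default links concrete, here is a small sketch in R. The data are simulated purely for illustration (the coefficients 0.5 and 1 are arbitrary assumptions), and only four of the built-in families are listed.

```r
# Default link per family, as reported by the family objects themselves.
default_links <- c(
  gaussian = gaussian()$link,
  binomial = binomial()$link,
  poisson  = poisson()$link,
  Gamma    = Gamma()$link
)
print(default_links)

# Simulated binary response (made-up data, for illustration only).
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(0.5 + x))

# Same family, two different links.
fit_logit  <- glm(y ~ x, family = binomial)                   # default logit link
fit_probit <- glm(y ~ x, family = binomial(link = "probit"))  # probit link

# The coefficients live on different scales, so compare fitted probabilities instead.
summary(abs(fitted(fit_logit) - fitted(fit_probit)))
```

Leaving out the `link` argument, as in `fit_logit`, gives you the default link for that family.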

  • @syedahmedali8118
    @syedahmedali8118 3 years ago

    great lecture

  • @jessicahough569
    @jessicahough569 3 years ago

    Hey! I am a master's student looking to make a GLM incl. random error. I find that both my dependent and independent variables (all continuous) do not follow any of the distributions. I also don't know if I should then try to transform them before putting them into this model. Also, what part of my data needs to follow these distributions? Independent or dependent variables? What if, after I transform them, some follow Poisson (even though they are not discrete integers) and others follow a normal distribution after I add log? Shall I admit defeat?

    • @jessicahough569
      @jessicahough569 3 years ago

      Also- really helpful video! Thank you!

    • @kasperwelbers
      @kasperwelbers  3 years ago +4

      Hi Jessica,
      That's a pretty big question, and I'm afraid there's no simple answer. The general guideline is that you choose the distribution family based on the distribution of the response (i.e. dependent) variable, but more accurately it should be the family for the 'conditional distribution of the response'. Notice (around 11:30) that the distribution in the random component takes the expected value from the systematic component.
      For example, say your dependent variable is the weight of fruit, but your data contains both apples and melons. Let's assume that for both apples and melons the weight is normally distributed, but apples and melons do have different means. So if we throw them together, our distribution might suddenly be bimodal (we'll have two 'humps'). But to model this we could still use a regular normal distribution, as long as we include the type of fruit as an independent variable. 'Conditional' on whether the fruit is an apple or melon, the distribution is normal.
      So if your dependent variable has a very weird distribution, something like this might be at play. Try also looking at graphs of your dependent and independent variable(s).
      Also, try to first think about what distribution would make sense for your dependent variable. Is it a count variable? A proportion/percentage? Is it time between events? The number of events within a given time? This often largely determines what sort of distribution would make sense. Try putting into words what type of measurement you have, and googling for modeling strategies. There are some tweaks to GLMs that could work. For instance, if you have non-integer 'rates' rather than counts, you might use Poisson with an 'offset'.
      So don't admit defeat too early! :)
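The apples-and-melons example can be simulated in a few lines of R. This is only a sketch: the means (200 and 1200 grams) and the shared standard deviation (50 grams) are made-up numbers, chosen so that the two groups are clearly separated.

```r
set.seed(42)
n <- 500
fruit <- factor(rep(c("apple", "melon"), each = n))

# Weight is normal within each fruit type, with different (made-up) means.
weight <- c(rnorm(n, mean = 200, sd = 50), rnorm(n, mean = 1200, sd = 50))

# Thrown together, the weights are bimodal; hist(weight) would show two humps.
sd(weight)

# Conditional on fruit type, an ordinary gaussian model fits well.
fit <- glm(weight ~ fruit, family = gaussian)
coef(fit)            # intercept = apple mean, fruitmelon = difference in means
sd(residuals(fit))   # close to the within-group sd; hist(residuals(fit)) has one hump
```

The overall spread of `weight` is large, but once `fruit` is in the model the residuals look like a single normal distribution again.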
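As a rough sketch of the Poisson 'offset' idea mentioned above: you model the event counts, and pass the log of the exposure (here a hypothetical observation time) as an offset, so the coefficients describe a rate per unit of exposure. All variable names and numbers below are assumptions for illustration.

```r
set.seed(7)
n <- 300
exposure <- runif(n, min = 1, max = 10)   # hypothetical observation time per unit
x <- rnorm(n)

# Counts whose expected value scales with exposure; the true rate is exp(0.3 + 0.5 * x).
events <- rpois(n, lambda = exposure * exp(0.3 + 0.5 * x))

# offset(log(exposure)) fixes the coefficient of log(exposure) at 1,
# so the remaining coefficients are on the log-rate scale.
fit_rate <- glm(events ~ x + offset(log(exposure)), family = poisson)
coef(fit_rate)
```

The estimated coefficients should land near the values used in the simulation (0.3 and 0.5).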

    • @jessicahough569
      @jessicahough569 3 years ago

      @@kasperwelbers Thank you so much! I actually ended up with lmer, but have hit a new hiccup, as my dependent variables are not so dependent after all (they affect each other). So now I am looking at glmtbbr, but have no idea where to start! *Sigh*

  • @sotinupuerto
    @sotinupuerto 3 years ago

    Great video!!! So helpful :)

  • @MrSazid1
    @MrSazid1 1 year ago

    Dude. Omg. Ty

  • @mimzu89
    @mimzu89 3 years ago

    thank you!

  • @shahfahadalishah1152
    @shahfahadalishah1152 3 years ago

    very interesting

  • @paphiopedilum1202
    @paphiopedilum1202 5 months ago

    thank you french accent man

  • @brazilfootball
    @brazilfootball 3 years ago +1

    What the hell is a link function?

    • @kasperwelbers
      @kasperwelbers  3 years ago +3

      I get that sentiment :'), so I kind of hoped this video would have been a good answer to that question. The key to understanding the link function is to first get a good grasp of the systematic and random components of the GLM. The link function is really just a simple transformation (like log) to link these components. If you're comfortable with the regression formula, the punchline is around 8:30 to 11:30.
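For those who prefer code over slides, the three pieces can be spelled out by hand in R. This is a minimal sketch on simulated count data (the coefficients 0.2 and 0.6 are arbitrary assumptions), using a Poisson model with its default log link.

```r
set.seed(3)
x <- rnorm(100)
y <- rpois(100, lambda = exp(0.2 + 0.6 * x))   # made-up count data
fit <- glm(y ~ x, family = poisson)

# Systematic component: the familiar regression formula, eta = b0 + b1 * x.
eta <- coef(fit)[["(Intercept)"]] + coef(fit)[["x"]] * x

# Link: eta = log(mu), so the inverse link exp() turns eta into the expected count.
mu <- exp(eta)

# Random component: y is modelled as Poisson distributed with mean mu.
# predict() returns the same two quantities on the link and response scales.
all.equal(unname(predict(fit, type = "link")), eta)
all.equal(unname(predict(fit, type = "response")), mu)
```

So `type = "link"` gives the output of the regression formula, and `type = "response"` gives that same value after the inverse link has been applied.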

  • @djangoworldwide7925
    @djangoworldwide7925 1 year ago

    Nice video, but stating "identity" at the end without explaining that it's the I matrix is a bit lacking

    • @kasperwelbers
      @kasperwelbers  1 year ago

      Thanks! About the identity function, I think it's uncommon to use matrix notation for link functions because many are nonlinear. So I prefer to also just think of the identity link more generally as an identity function. But maybe I'm missing something?
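A quick way to see the identity link as a plain function in R is sketched below. The data are simulated for illustration only; the comparison with `lm()` assumes the default gaussian family with its identity link.

```r
# The identity link simply returns its input: eta = mu.
id <- gaussian()$linkfun     # gaussian() defaults to link = "identity"
mu <- c(-2, 0, 3.5)
identical(id(mu), mu)

# With the gaussian family and identity link, glm() reproduces ordinary regression.
set.seed(5)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)   # made-up data
all.equal(coef(glm(y ~ x, family = gaussian)), coef(lm(y ~ x)))
```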