GLM in R

Kasper Welbers

58K views

Published: Nov 24, 2024

Comments • 41

@randomdude4411 6 months ago ⁺¹
This is a brilliant tutorial on GLM in R with a very good breakdown of all the information in step by step fashion that is understandable for a beginner
@ammarparmr 3 years ago ⁺²
Very well explained! However, using the coefficients in the summary is, in my opinion, by far much easier to understand than the approach with tab_model.
@kasperwelbers 2 years ago ⁺²
Hi Ammar, sorry I missed this comment, but I would like to break a lance for odds ratios ;). Benefit of the log odds ratios is, I think, only that the sign corresponds to the effect direction. But the values are very hard to interpret. With odds ratios you can say things like "for a unit increase in x, the odds of y increase by a factor 2 (aka twice the odds)". Is there a benefit of using the log odds ratios that I'm overlooking?
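A minimal sketch of the difference between the two scales, using simulated data. The variable names, sample size, and the true odds ratio of 2 are assumptions chosen for illustration, not taken from the video.

```r
## Simulated data (assumption): the true log odds ratio for x is log(2),
## so the true odds ratio is 2
set.seed(1)
n <- 5000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + log(2) * x))

m <- glm(y ~ x, family = binomial)

## Log odds ratios: the sign gives the direction, the size is hard to read
coef(m)

## Odds ratios: the factor by which the odds of y change per unit increase in x
exp(coef(m))
```

With this setup the exponentiated coefficient for x should land close to 2, which reads directly as "twice the odds per unit increase in x".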
@djyi2174 2 years ago
Thank you so much for the tutorial.
@philip_che 3 years ago
Thank you for these videos!
@michellelaurendina 8 months ago
THANK. YOU.
@audreyq.nkamngangk.7062 1 year ago
Thank you for the tutorial. Is it possible to create a glm model where the variable to explain has 3 modalities?
@kasperwelbers 1 year ago ⁺¹
If I understand you correctly, I think it's indeed possible to model a dependent variable with a tri-modal distribution with glm. Actually, you might not even need glm for that. Whether a distribution is multimodal is a separate matter from the distribution family. A tri-modal distribution might be a mixture of three normal distributions, three binomial distributions, etc. Take the following simulation as an example. Here we create a y variable that is affected by a continuous variable x and by a factor with three groups. Since there is a strong effect of the group on y, this results in y being tri-modal.
## simulate tri-modal data
set.seed(1) ## fixed seed so the run is reproducible
n = 1000
x = rnorm(n)
group = sample(1:3, n, replace=TRUE)
group_means = c(5,10,15)
y = group_means[group] + x*0.4 + rnorm(n)
hist(y, breaks=50) ## three peaks, one per group
m1 = lm(y ~ x)
m2 = lm(y ~ as.factor(group) + x)
summary(m1) ## imprecise estimate of x (true value 0.4): large standard error
plot(m1, 2) ## Q-Q plot: residuals are non-normal
summary(m2) ## precise estimate after controlling for group
plot(m2, 2) ## Q-Q plot: residuals are normal after including group
@kariiamba7324 3 years ago
Thank you for this helpful video
@rubyanneolbinado95 7 months ago
Hi, why is RStudio producing different results even though I am using the same call and data?
@kasperwelbers 7 months ago
Hi! Do you mean vastly different results, or very small differences? I do think some of the multilevel stuff could potentially differ due to random processes in converging the model, but if so it should be really minor.
@hm.91 3 years ago
Thank you!
@Gravelbiken 1 year ago
Hi Kasper, what/how much does the intercept tell us in this case?
@kasperwelbers 1 year ago ⁺¹
Good Question! It's similar to ordinary regression, in that it just means: the expected value of y if x (or all x-es in a multiple regression) is zero. This is mainly interpretable if there is a clear interpretation of what x=0 means. For instance, say your model is: having_fun = intercept + b*beers_drank. In that case, the intercept is the expected fun you have if you haven't had any beers.
Now say we have a binomial model. Our dependent variable is binary, namely whether or not a person had a hangover the day after a party. This time, the effect is more like (but not exactly, I'm ignoring the link function): hangover = intercept * b^beers_drank. Notice the ^ in b^beers_drank. That's the multiplicative part: we expect that the odds of having a hangover increase by a 'factor of b' for every unit increase in beers. But what's most relevant for us now is that any (nonzero) number raised to the power of zero is 1! So b^0 (zero beers) is 1. So here as well, it means that when x is zero, the intercept is just our expected value.
If we've transformed our coefficients to odds ratios, then if we haven't had any beers, the intercept would represent the odds that someone had a hangover. So if the intercept is 2, it would mean that the odds that someone who didn't have any beers has a hangover are 2-to-1, so a probability of about 0.67 (odds of 2-to-1 means 2 people out of 3). That sounds weird, but they probably had whisky instead.
I don't know how much that helped. The key takeaway is that like with ordinary regression, it's mainly interpretable if you have a clear idea of what x=0 means.
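A small simulation of the hangover example may make this concrete. Everything in it (the names beers and hangover, the sample size, the assumed odds of 0.25 at zero beers and the factor 1.5 per beer) is made up for illustration.

```r
## Assumed true model: odds of a hangover are 0.25 at zero beers
## and get multiplied by 1.5 for every additional beer
set.seed(2)
n <- 5000
beers <- rpois(n, lambda = 3)
hangover <- rbinom(n, size = 1, prob = plogis(log(0.25) + log(1.5) * beers))

m <- glm(hangover ~ beers, family = binomial)

## Exponentiated intercept: the estimated odds of a hangover at zero beers
odds_at_zero <- exp(coef(m))[["(Intercept)"]]

## Odds converted to a probability: odds / (1 + odds)
prob_at_zero <- odds_at_zero / (1 + odds_at_zero)

## The number from the comment: odds of 2-to-1 correspond to a probability of 2/3
2 / (1 + 2)
```

The conversion odds / (1 + odds) gives the same result as applying plogis() to the untransformed intercept.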
@954giggles 2 years ago
Do you need to install any packages to run the glm code?
@kasperwelbers 2 years ago ⁺²
The glm function is in the stats package, which ships with the basic R installation. So you don't necessarily need other packages. But in the tutorial I do use some packages for convenience, such as the sjPlot package for making a regression table. If you run this without sjPlot the results are the same, but you'll need to do some calculations yourself. For instance, logistic regression gives log odds ratio coefficients, so you'd need to take the exponent (exp function) to get the odds ratios. TL;DR: you don't need to install packages, but it does make life easier.
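As a rough sketch of doing those calculations yourself with base R only: the built-in mtcars data is used here purely as a stand-in (an assumption, not the data from the video), and the intervals are simple Wald intervals, which may differ slightly from what a table package reports.

```r
## Logistic regression with base R only (mtcars is just a stand-in dataset)
m <- glm(am ~ wt, data = mtcars, family = binomial)

## Odds ratios with Wald 95% confidence intervals, built by hand
or_table <- cbind(
  odds_ratio = exp(coef(m)),
  exp(confint.default(m))   # exponentiate the interval limits as well
)
round(or_table, 3)
```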
@朝に弱い人 3 years ago
Hi Kasper, thank you for the wonderful video. I have a question about R2 and adjusted R2 for GLM models in R. How can we get R2 and adjusted R2 in the R console? I cannot find these values when I run summary(). Is there specific code to get them in the console?
@kasperwelbers 3 years ago ⁺¹
Hi, great question! The thing is, there actually isn't an R2 or adjusted R2 for GLM. Instead, to evaluate model fit, it is more common to compare models (in the second link in the description, see logistic regression -> interpreting model fit and pseudo R2). There ARE, however, also some 'pseudo R2' measures, such as the R2 Tjur seen in the video. These measures try to imitate the property of R2 as a measure of explained variance. You'll never get these scores in the basic glm output though, because there are many possible pseudo R2 measures. But there are packages that implement them. For instance, the 'performance' package has an r2() function which calculates a (pseudo) R2 for different types of models.
I'd also recommend reading about the model comparison approach though (if you don't know about it already), because journals often like to see this rather than or in addition to some pseudo R2.
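For a binary outcome, Tjur's R2 is simple enough to compute by hand: it is the mean predicted probability among the observed 1s minus the mean predicted probability among the observed 0s. A minimal sketch, again assuming the built-in mtcars data as a stand-in:

```r
## Logistic regression on a stand-in dataset (assumption: mtcars, outcome am)
m <- glm(am ~ wt, data = mtcars, family = binomial)

## Predicted probabilities for every observation
p <- fitted(m)

## Tjur's R2: difference in mean predicted probability between the two outcome groups
tjur_r2 <- mean(p[mtcars$am == 1]) - mean(p[mtcars$am == 0])
tjur_r2
```

A value near 0 means the model barely separates the two groups; a value near 1 means it separates them almost perfectly.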
@朝に弱い人 3 years ago
@@kasperwelbers Thank you so much for the quick reply! It was really helpful and easy to understand :)
One more question! I will be conducting GLM in my master's thesis. Which one would you recommend?
1. Report the AIC value (and I would write something like "this model had the smallest AIC value")
2. Try calculating pseudo R2 measures and report them
@kasperwelbers 3 years ago
@@朝に弱い人 I'd actually recommend reporting deviance AND some pseudo R2. The pseudo R2 is nice to help along interpretation, but deviance is more appropriate, and also provides a nice test to see if adding variables to a model provides a significant increase in fit. Say you have models of increasing complexity (i.e. adding variables): m0, m1 and m2. For glm's, you can then use: anova(m0, m1, m2, test = "Chisq"). In the output, the Deviance column for the m1 row tells you how much the deviance decreased compared to m0, and the Pr(>Chi) column tells you whether this improvement in fit was significant (and same for m2 compared to m1). Alternatively, you could use sjPlot's tab_model and just add the AIC and/or deviance directly to the table: tab_model(m0, m1, m2, show.aic = TRUE, show.dev = TRUE).
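A minimal sketch of that comparison on simulated data. The variables x1 and x2, the sample size, and the effect sizes are assumptions for illustration; x2 deliberately has no true effect.

```r
## Simulated data (assumption): x1 affects y, x2 does not
set.seed(3)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.3 + 1.0 * x1))

## Nested models of increasing complexity
m0 <- glm(y ~ 1, family = binomial)
m1 <- glm(y ~ x1, family = binomial)
m2 <- glm(y ~ x1 + x2, family = binomial)

## Each row is tested against the row above it
comparison <- anova(m0, m1, m2, test = "Chisq")
comparison
```

In the printed table, the Deviance column holds the drop in residual deviance relative to the previous model, and Pr(>Chi) holds the p-value of the corresponding chi-square test.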
@朝に弱い人 3 years ago
@@kasperwelbers Thank you so much, Kasper! I will try calculating deviance and pseudo R2 using the code you suggested :) Can I ask another question via email or something? I'm sorry to be a pain, but I think you can answer another big question I have 🙇‍♂️
@kasperwelbers 3 years ago
@@朝に弱い人 No problem! I do however prefer to keep questions based on these videos confined to youtube (and not too big). Especially at the moment with the whole corona teaching situation I'm swamped with emails, and I do need to prioritize my direct students. For bigger questions, I also do think it's best to find someone at your uni (ideally supervisor or someone in same department). Not only because they supposedly can invest more time, but also because in more specific problems there tend to be differences across disciplines / traditions in how to do statistics.
@draprincesa01 2 years ago
How can I visualize it if some variables are factors, like yes or no?
@kasperwelbers 2 years ago
I think sjPlot handles those pretty nicely! There's some great explanations on the website, under the regression plots tab: strengejacke.github.io/sjPlot/
@JT-ph3hk 1 year ago
use the function str(yourbasename). If the variable is not yet a factor you can transform it using the following yourbasename$nameof the factor
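A hedged sketch of what checking and converting a yes/no variable can look like. The data frame d and its columns smoker and sick are hypothetical names invented for this example; they are not from the video or the comment above.

```r
## Hypothetical data: 'smoker' is stored as text, 'sick' is a 0/1 outcome
d <- data.frame(
  smoker = c("yes", "no", "no", "yes", "no", "yes", "yes", "no"),
  sick   = c(1, 0, 0, 1, 1, 1, 0, 0)
)

## Inspect the column types; smoker shows up as chr, not as a Factor
str(d)

## Convert to a factor; the first level ("no") becomes the reference category
d$smoker <- factor(d$smoker, levels = c("no", "yes"))
str(d)

## glm creates the dummy variable automatically; the coefficient 'smokeryes'
## compares "yes" to the reference level "no"
m <- glm(sick ~ smoker, data = d, family = binomial)
exp(coef(m))
```

Setting the levels explicitly is a design choice: it makes the reference category visible in the code rather than leaving it to alphabetical order.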
@DavidKoleckar 10 months ago
nice audio bro. you record in bathroom?
@kasperwelbers 10 months ago
Ahaha, not sure whether that's a question or a burn 😅. This is just a Blue Yeti mic in the home office I set up during the COVID lockdowns. The room itself has pretty nice acoustic treatment, but I was still figuring out in a rush how to make recordings for lectures/workshops and it was hard to get clear audio without keystrokes coming through.
@gotnolove923 1 month ago
tab_model doesn't work 😮
@Whycantijustdeletethis 1 month ago
Surely we can make it work. What error do you get?
@kasperwelbers 1 month ago
@@gotnolove923 ah haha, that was me on another account that I was trying to delete.