Thank you, I go to a top 20 school globally and yet my professor still can't explain as well as you do. I wish you success in life, my friend.
Man, this is pure gold. No BS, just essence. Subscribed!
This is a great explanation. I love the visuals showing how they are all related. Thank you.
Great explanation, Brian! I have a small question, though. If the response variable has to be normal (in a normal linear regression), why do you think most statistics articles insist that only the residuals have to be normal and not the variable? What tests do you think should be done before a GLM, besides residual plots?
Saying the response is normal and saying the residuals are normal basically means the same thing. The response is normal around the mean for that X value, which just means the response's distance from that mean (the residual) is normal with mean 0. If we want to evaluate normality, it's easier to look at a plot of the residuals, since they all have mean 0, so we can easily see whether they look normally distributed.
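If it helps to see this concretely, here's a minimal simulation sketch (in Python with numpy — my choice of tool for illustration, not anything from the video):

```python
# Each response y_i is normal around its own mean (2 + 3*x_i), so the
# residuals are all normal around the single mean 0.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=500)
y = 2.0 + 3.0 * x + rng.normal(0, 1.5, size=500)  # normal around the line

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares fit
residuals = y - (intercept + slope * x)

# The y's have 500 different means, but the residuals share mean 0,
# so a single histogram or QQ plot of them is easy to read.
print(round(residuals.mean(), 3), round(residuals.std(), 3))
```

A histogram of y alone can look non-normal even when the model is correct, because it mixes many different means; looking at residuals removes that mixing.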
@statswithbrian Thank you.
Very helpful! Appreciated
Can you create a whole playlist for GLMs? Please do consider doing this.
Great work, quick question! Why is it OK to use a normal distribution for response variables like weight if weight can't be negative or zero? I see it a lot, but don't understand why it's so common.
There's pretty much nothing that *really* follows a normal distribution - it's all approximations. Take height, for example, and suppose height follows an approximately normal distribution with mean = 64 inches and sd = 4 inches. Even though a normal distribution puts some probability on values less than 0 (which is impossible), that value is 16 standard deviations below the mean, so the probability is basically 0 anyway (less than 1 in a billion billion billion billion billion billion). So yes, you're totally right that it's impossible, but assuming normality makes things easy, and the probability calculations are often pretty accurate!
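For anyone who wants to check that tail claim numerically, here's a one-liner sketch (using scipy — an assumption on my part; the 64 and 4 are the numbers from the example above):

```python
# P(height < 0) for a Normal(mean=64, sd=4): 16 standard deviations
# below the mean.
from scipy.stats import norm

print(norm.cdf(0, loc=64, scale=4))  # ~6.4e-58, effectively zero
```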
@statswithbrian Works for me, thank you!
In your final slide, you say that the link function maps from the original scale to "the parameter of the relevant probability distribution". You also say the parameter is personalised...
Is your final slide saying that, in general, the link function maps to the parameter of the data's distribution? e.g. "p" in Bernoulli, "sigma" in Rayleigh?
Apologies if I haven't understood this correctly.
Yes, the link function is just transforming a real number with no restrictions (negative infinity to infinity) into something in the valid range for the parameter of interest.
In logistic regression, if we were predicting the probability of having diabetes based on weight, you and I would each get a personalized parameter p based on our weight. The heavier person might have p = 0.7, reflecting the fact that their weight makes it more likely that they have diabetes. The lighter person might have p = 0.3. But both will be between 0 and 1 no matter what, because the link function transformed the scale to ensure that it's between 0 and 1, which regular linear regression did not do.
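A small sketch of the "personalized p" idea (Python with scipy; the intercept and slope below are made up to roughly reproduce the 0.3 and 0.7 in the example, not taken from any real fitted model):

```python
from scipy.special import expit  # expit(z) = 1 / (1 + exp(-z)), the inverse logit

intercept, slope = -3.39, 0.0169   # hypothetical logistic-regression coefficients
for weight in (150, 250):          # a lighter and a heavier person (lbs)
    z = intercept + slope * weight  # unrestricted real number
    p = expit(z)                    # squeezed into (0, 1)
    print(weight, round(p, 2))      # ~0.3 and ~0.7
```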
To me, it just feels like the expit function (not the link) should really be what we call the "link" function, as it is actually what "links" us to the personal parameter. What am I missing...? For interpreting coefficients, it seems like the link function is really important, but other than that ... it seems like the inverse link is really what we care about.
The link is invertible, so it goes both ways. It’s a little arbitrary and maybe they made an awkward choice of which one to call the link and which one is the inverse link. Think of a railway that connects two cities. You wouldn’t say the eastbound path is more or less the link between the cities than the westbound path. But in math, we can’t just say “the railway” and we have to write down one of the two functions and give it a name. I don’t think you’re missing anything.
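Concretely, the two directions of the railway are just logit and expit undoing each other; a tiny check (scipy assumed, purely for illustration):

```python
from scipy.special import logit, expit

p = 0.7
z = logit(p)     # parameter scale (0, 1) -> unrestricted real line
print(z)         # ~0.847
print(expit(z))  # back to 0.7: same railway, opposite direction
```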
Thanks!
But you missed the best part: how we can engineer any combination we want to fit our data. We can model different types of trends, handle heteroscedasticity, and of course sample from either a pdf or a pmf. They are incredibly flexible (see the sketch below).
By the way, what's ultimately the scope of this channel? Can we eventually expect videos on things like measure-theoretic probability, stochastic processes, and the like?
There might be one video on measure theory sometime, but no, I plan to stay more on the statistics and data science end. Any further probability videos would probably be similar to the Markov/Chebyshev inequality videos.
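Picking up the "engineer any combination" point above, here's a rough sketch of choosing a trend, a link, and a distribution and then sampling (numpy assumed; the coefficients are illustrative):

```python
# Log link + Poisson: the mean stays positive and the variance grows
# with the mean, i.e. heteroscedasticity comes built in.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 3, 200)
eta = 0.2 + 0.9 * x   # linear predictor: any real number
mu = np.exp(eta)      # inverse link keeps the Poisson mean positive
y = rng.poisson(mu)   # sampled counts; Var(y_i) = mu_i, not constant

print(mu[:3].round(2), y[:3])
```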
Finally it came!!!
Thanks for the inspiration!