10 years later and this is still relevant. THANK YOU.
I read so many articles and I didn't get the physical interpretation. You explained it in an outstanding way. A big thanks to you. Teaching like yours is needed in today's world, while others just present the mathematics like the books do. Again, a very big thanks to you.
JB, you did a great job on this topic and many other topics.
Thanks!
It's quite interesting that statistics books and econometrics books give slightly different assumptions for the linear regression model.
Never found a video on this topic better than this one! Thank you a lot!
You are very welcome. Thanks for the compliment!
me too here, literally brilliant and very clear!
You are a saviour !!!!
That can be found in regression texts that go into the mathematical details, e.g. Draper & Smith.
Thanks Horacio! I'm glad to be of help.
Thank you so much for the very helpful explanation. The visual representation helps a ton. Thank you.
This was a very good explanation. I've read several explanations of prediction intervals and I have to say, I still don't understand them. I understand how confidence intervals work, but for some reason the added complexity for the prediction interval goes beyond my level of comprehension. Some day I hope to understand how it works...
I hope I was able to help a little. All the best.
Best explanation I have ever got for any statistical topic..... Simply superb! Could you please elaborate on why the width of the CI and PI is narrowest at the mean and gets wider as we move away from the mean?
Thanks for the compliment Maluram. As the value of X we are interested in gets farther from the mean, the only thing that changes in the variance formulas is the (X* - X bar)^2 term. The variance is smallest when X* = X bar (since this implies (X* - X bar)^2 = 0), and gets larger as X* gets farther from X bar (since the value of (X* - X bar)^2 depends only on the distance of X* from the mean of X). The (X* - X bar)^2 term appears in the variance for both the interval for the mean and the prediction interval, so the effect is similar in both cases. All the best.
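If it helps to see this numerically, here's a rough Python sketch (my own illustration, not from the video, with made-up data) that simulates a sample, fits the line, and computes both standard errors at a few values of X*, so you can see that both are smallest at X bar and grow with (X* - X bar)^2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a small data set from a simple linear regression model
n = 30
x = rng.uniform(0, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 1.5, n)

# Least-squares estimates of the intercept and slope
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()

# s^2 = sum of squared residuals / (n - 2), and Sxx = sum of (X - X bar)^2
resid = y - (b0 + b1 * x)
s2 = np.sum(resid**2) / (n - 2)
sxx = np.sum((x - x.mean())**2)

# Standard errors at a few values of X*: smallest at X bar, growing with distance
for x_star in [x.mean(), x.mean() + 2, x.mean() + 4]:
    d2 = (x_star - x.mean())**2                   # the (X* - X bar)^2 term
    se_mean = np.sqrt(s2 * (1/n + d2/sxx))        # SE of the estimated mean of Y at X*
    se_pred = np.sqrt(s2 * (1 + 1/n + d2/sxx))    # SE for predicting a single new Y at X*
    print(f"X* = {x_star:5.2f}: SE(mean) = {se_mean:.3f}, SE(pred) = {se_pred:.3f}")
```

Running something like this shows both interval widths increasing as X* moves away from X bar, with the prediction SE always larger because of the extra "1" inside the square root.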
jbstatistics Can you explain to me why this is the case (that variance changes when we get farther from the mean)? I thought that the point of homoscedasticity was that the variance of Y is the same regardless of the x value.
Thank you so much, this explanation was outstanding. Oh my God, amazing!
My brain hurts 😵💫 great video, thank you!
Having learned only some basic machine learning, I got confused about how the SE of the slope is calculated, so I watched all your videos. I never knew you could do inference on the predicted value of a single point in linear regression. Statistics is indeed very hard.
You're just awesome.
Thanks!
brilliantly explained thanks
You are very welcome! Thanks for the compliment!
Hi JB!
I have been following all your explanations, and they have been incredibly clear and straightforward every time.
Do you think you would be able to do a video on Multiple Linear Regressions? Thanks a lot!!
Hi Gregoire! Thanks for the kind words! Yes, multiple regression is definitely up high on the priority list, and I'll be getting back to video production soon.
Amazing explanation! Thanks :)
You are very welcome!
Truly amazing bro!
can we just have this channel replace all the statistics courses worldwide
You rock... may God reward you 😇
Thanks!
Thanks for the video, very helpful
I’d like to shake your hand and thank you for your effort
I'm glad I could be of help!
I wish you would derive the formulas in a separate video.
Can you tell me where I can find info on the derivations of the two standard errors? I don't see a video on that in your channel.
I wish my lecturers were this good at teaching lol
Prof. Balka: May I ask you a question please? I am trying to understand the meaning and derivation of the prediction interval. In my book (DeGroot), it is said that Y and Y_hat are independent normal random variables, so we can construct a statistic from them. But then what is the point of building an interval using Y_hat to predict where Y falls, if they are independent? If they really are independent of each other, wouldn't it be just as good to simply predict Y on its own?
I think this is a really good question, and it requires a bit of a long explanation. Some concepts like this in regression can be a bit tricky.
First note that, when we are about to draw a sample, the random variables Y_1, …, Y_n are not independent of the random variables Y_1 hat, …, Y_n hat. (The random vectors Y and Y hat are not independent.) This is because, speaking loosely, the Y values are part of the calculation of the parameter estimates, which are then used to calculate the Y hat values. We can work out cov(Y_i, Y_j hat) without too much difficulty, but I won't show it here.
What your book is referring to is the notion that any *new* Y will be independent of the Y hats in the sample (we might think of the new Y as being out-of-sample). When we discuss the big picture of prediction (and prediction intervals), we’re not talking about predicting Y values for which we already know Y. We’re talking about predicting new Y values.
Now, you might ask, doesn’t Y hat give us information about the theoretical mean of the new Y, and, if not, why the heck are we doing this? (Which is something along the lines of what you’re asking.) The truth is it does, but let’s break down the model a bit. Let’s assume simple linear regression, though the ideas hold for MLR as well. The model is:
Y = beta_0 + beta_1X + epsilon.
Here, we view beta_0, beta_1, and X as fixed constants (not random variables). The values of X will be known, but the values of beta_0 and beta_1 are unknown. beta_0 and beta_1 are fixed numbers; we just don't know what they are. Epsilon is a random variable (we often assume epsilon ~ N(0, sigma^2)). Since beta_0, beta_1, and X are fixed, with those assumptions on the epsilons, Y ~ N(beta_0 + beta_1 X, sigma^2).
So each Y has a mean of beta_0 + beta_1 X, which we don't know the value of, and a variance of sigma^2 (which we also don't know). The Y hats are based on the sample estimates of beta_0 and beta_1 (and the value of X in the scenario under discussion). While the sample estimates of beta_0 and beta_1 give us information about the true mean of the new Y, they are not *correlated* with the true mean of the new Y (since the true mean of the new Y is a constant). The true distribution of the new Y is what it is, regardless of what happened in our sample. Whatever beta_0 hat and beta_1 hat end up being in a given sample, the values of beta_0 and beta_1 remain unchanged.
If we call the new value of Y we're trying to predict Y_new, then cov(beta_0 hat + beta_1 hat X_new, Y_new) = 0, since by its very nature that new observation is independent of the original sample.
In more mathematical terms:
cov(beta_0 hat + beta_1 hat X_new, Y_new)
= cov(beta_0 hat + beta_1 hat X_new, beta_0 + beta_1 X_new + epsilon_new)
= cov(beta_0 hat + beta_1 hat X_new, epsilon_new) (because beta_0 + beta_1 X_new is fixed)
= 0, since the error term associated with Y_new is independent of everything in the original sample (by the typical assumptions, which are often reasonable due to the structural nature of the sampling/experiment).
So a bit tricky conceptually, in some ways, I agree.
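If a numerical check helps, here's a quick simulation sketch (my own, just to illustrate the point above, using made-up parameter values) that repeatedly draws an original sample, computes the fitted value beta_0 hat + beta_1 hat X_new, and draws an independent new observation Y_new at the same X_new; the covariance between the two across simulations should come out essentially zero:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 2.0, 0.5, 1.5   # made-up "true" parameter values
n, x_new, reps = 25, 7.0, 20000

x = np.linspace(0, 10, n)             # fixed design points for the original sample
yhat_new = np.empty(reps)
y_new = np.empty(reps)

for r in range(reps):
    # Original sample and least-squares estimates
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b0 = y.mean() - b1 * x.mean()
    yhat_new[r] = b0 + b1 * x_new                      # fitted value at X_new
    # A brand-new observation at X_new, independent of the original sample
    y_new[r] = beta0 + beta1 * x_new + rng.normal(0, sigma)

# Both quantities are centred near beta0 + beta1 * x_new, but their covariance is ~0
print(np.cov(yhat_new, y_new)[0, 1])
```

Even though the fitted value at X_new is informative about the mean of Y_new, the two are uncorrelated, which is why the prediction error variance is just the sum of the two variances.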
really good explanation, Thank u .. :)
THANK YOU!!!
I LOVE YOU!!!!!
Can you suggest some theoretical books on this topic, please?
Very good explanation!
Nice way of explaining it.
I need the derivation for Y prediction and Y mean.
very helpful
tks!!!
Could you please link me the proof from an article or whatever of how we managed to get to the formula for the variance on 5:40? I don't mind if it's complicated I wanna see the proof
What is the SE for the prediction interval?
When we are finding the SE for a prediction interval, do we use Sx or Sy? Or how do I find s?
When I use s in the formula, I'm referring to the sample estimate of the standard deviation about the regression line (so s^2 estimates sigma^2). The sample variance s^2 is the sum of squared residuals, divided by the appropriate degrees of freedom (n - 2 in simple linear regression). It's not something we usually calculate on our own, as it can be found in output (s^2 = MS residual in the ANOVA table, for example). And these intervals are usually calculated with software as well.
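For anyone wanting to see this concretely, here's a tiny Python sketch (with made-up numbers, not from the video) that computes s^2 as the residual sum of squares divided by n - 2, which is the same value regression output reports as MS residual:

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)

# Least-squares fit
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()

# s^2 = SS(residual) / (n - 2); s is its square root
resid = y - (b0 + b1 * x)
s2 = np.sum(resid**2) / (n - 2)
s = np.sqrt(s2)
print(f"s^2 (MS residual) = {s2:.4f}, s = {s:.4f}")
```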
Are you Dekar?
Thanks!