Least Square Estimators - Unbiased Proof
- Published: 4 Jan 2021
- The simple linear regression least squares estimators, b0 and b1, are unbiased. In this video, I show the proof.
What does it mean for X to not be random? I discuss this more in this video: • Why is X non-random in...
Near minute 4, I mention that Sum(xi - xbar) = 0, here is a video showing this: • Proof that the Sum of ...
Near minute 8, I mention that Sum (xi - xbar)xi = Sum (xi - xbar) * (xi - xbar) = Sxx, here is a video showing this: • Proof that the Sum (xi...
Splendid work....
amazing video! thank you!
great video :) , keep it going
That was a life saver. Thank you! ❤
Glad it helped!
Thank you!!!
THE BEST explanation on this topic on RUclips! Many thanks! One question though: at 1:02, why can we make assumption #1, that Xi is nonrandom? What does that assumption mean? Appreciate any advice!
Thanks for this comment! Here is a video that might be helpful: ruclips.net/video/5Ezg9UXIyZs/видео.htmlsi=IkU9c9jmBlIorGw6
First, let's think about Yi, which is a random quantity. A random quantity is one that, when sampled, does not come out the same every time. For example, suppose I am interested in knowing the IQ of people in America. Not everyone's IQ is the same; IQ varies randomly from person to person. In fact, we could measure the IQ of the same person twice and likely get two different numbers. Generally, things that are "measured" are random, because exact measurement is very difficult or impossible.
A non-random quantity is a constant. Age might be a good example of a non-random X value: if we know someone's birthday, their age can be calculated exactly.
@@Stats4Everyone Thank you so much for your time and detailed reply! Yes, very helpful! So the independent variable's data are non-random for the samples collected, but the dependent data (the ones we measure, whose relation to the independent variable we study) are random, because even if we drew the same sample, the dependent variable would still vary within some distribution. Really appreciate it! Wish all the best to you and your channel!
thanks for the video!
yooooo that's really good thank you!
Amazing!
Thanks a lot
finally understood it thanks.
Why would the sum of (xi - xbar)*ybar be zero at 4:29, while the sum of (xi - xbar)*yi is not zero??
Good question. Thank you for posting this comment. ybar is a constant (it is the same number for all i), therefore, we can factor it as follows:
sum (xi-xbar)*ybar = ybar * sum(xi-xbar)
sum(xi - xbar) = 0 . For proof, please see this video: ruclips.net/video/oyjkFmNMMKA/видео.html
Therefore,
sum (xi-xbar)*ybar = ybar * sum(xi-xbar) = ybar * 0
In contrast, yi cannot be factored out, because it is not a constant; it changes with i. Therefore, sum (xi-xbar)*yi is not zero, because for each i, xi-xbar is being multiplied by a different yi.
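Not from the video, but a quick numeric sanity check of this factoring argument, using NumPy and made-up numbers:

```python
import numpy as np

# Hypothetical data; any numbers illustrate the point.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()

# ybar is a constant, so it factors out and multiplies sum(xi - xbar) = 0.
print(np.sum((x - xbar) * ybar))   # ~0 (up to floating-point error)

# yi changes with i, so it cannot be factored out; this sum is not zero.
print(np.sum((x - xbar) * y))
```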
Thank you so much for the great explanation! One question: is there a difference between E[yi] and y_bar, and similarly between E[epsilon_i] and epsilon_bar? One is an expectation, the other an average? For example, if E[epsilon_i] = 0, then epsilon_bar, the average over the sample, should also be zero, right?
Thanks for this comment! I have broken down a response below:
1. Is there a difference between E[yi] and y_bar?
Yes there is a difference.
ybar is an estimator for the expected value of y, E(y). ybar is a sample average, whereas the expected value of y, E(y) is the population average. Also, ybar is not conditional on the value of x, rather it is an average over all the different values of x. In a hypothetical example, if we were using tree height, x, to model tree age, y, then ybar is the sample average tree age of all the sampled trees, regardless of their height. E(y) is the population average tree age of all trees in the population, regardless of their height.
E(yi) is the expected value of yi. In our hypothetical example, E(yi) is the expected tree age for a particular tree height. In other words, E(yi) is the average tree age for a particular tree height (“expected” and “average” are synonymous). yi is a single observed tree age at a particular tree height. We could observe and sample several trees at the exact same height, and then average their ages to estimate E(yi).
2. Is it similar for E[epsilon_i] and epsilon_bar?
Not really…
We assume that E[epsilon_i] is zero as a model assumption. This means that the model is explaining the variability in y, and anything else making y vary is just random noise. In the hypothetical example, this means that tree height is the only predictor for tree age and everything else is either controlled or not important in explaining why all trees are not the exact same age (note that this assumption is definitely a stretch and unrealistic in this example…and yet people might run this regression without much thought about whether the model assumptions are reasonable…which is not good).
Epsilon_bar. It is important to define this idea. Usually, people use lower case e to denote the sample model error term, since epsilon_i is not observed (unlike y_i, which is observed). The sample average of the model error term, e, is zero by definition. It is a fun exercise to prove the following:
If ei = yi - yihat,
where yihat = b0 + b1*xi,
then sum ei = 0,
and therefore e-bar = 0.
To do this proof, plug in the estimators b0 and b1.
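Not a substitute for the algebraic proof, but a small NumPy check (with made-up data) that the least squares residuals do sum to zero:

```python
import numpy as np

# Hypothetical data (not from the video).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.3, 4.1, 5.8, 8.2, 9.9])

xbar, ybar = x.mean(), y.mean()

# Least squares estimators: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar.
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

e = y - (b0 + b1 * x)    # residuals ei = yi - yihat
print(np.sum(e))         # ~0, so e-bar = 0 as well
```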
I hope this helps! Thanks again for the thoughtful comment!
Great! This refreshed my memory.
Hi! I would like to ask why X bar is non-random. Isn't it (sum of Xi)/n? Since Xi can be any value, shouldn't X bar be a random variable?
Think of it like this: when she says non-random, she means constant. Unlike xi, xbar is a constant. xi can take different values of x, but xbar is their mean, or average. E(xi) is an average where probability is involved.
Thank you for this post! I made a video to discuss this more: ruclips.net/video/5Ezg9UXIyZs/видео.html. Please let me know if you have follow-up questions about this!
Goddess
How is sum (Xi-Xbar)Xi = sum (Xi-Xbar)(Xi-Xbar)?
Thank you for this comment! I made this video to discuss this question in more detail: ruclips.net/video/b0FCezzWqIg/видео.html
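The linked video has the full proof; as a quick supplement, here is a numeric check of the identity with arbitrary made-up values (the point is that sum(xi - xbar)*xbar = 0, so subtracting xbar from the second factor changes nothing):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 7.0])  # any hypothetical values work
xbar = x.mean()

lhs = np.sum((x - xbar) * x)             # sum (xi - xbar) * xi
rhs = np.sum((x - xbar) ** 2)            # sum (xi - xbar)(xi - xbar) = Sxx
print(lhs, rhs)                          # equal
```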
You lost me at the first step. Namely, is E(S_xy / S_xx) taking the expectation of both the numerator and the denominator?
S_xx is not random, so the expectation of S_xx equals S_xx itself: E(S_xx) = S_xx.
I think I understand, Ma. The values x_i are given beforehand, and then the y_i are observed, rather than both being given simultaneously. For example, say you want to regress X = hours of study vs. Y = grade in class. The x_i are chosen beforehand: 1 hr, 2 hrs, 3 hrs, etc. Then the values y_i, the grades after i hours of study, are observed; e.g., y_1 is the grade of those who (we know beforehand) studied 1 hr. Both would be random if, e.g., we tried to regress height X_i vs. weight Y_i and selected people at random, measuring their weight and respective height; then both X and Y are random, as neither is set beforehand. Another example of X being non-random is if we studied the amount of CO2 released by groups of x people: we would choose beforehand to study the emissions of groups of x_i = 1, 2, 3, ..., j people, rather than selecting the number of people in each group at random. Did I make sense?
Thank you for posting this question! Sxx is not random because X is not random. For more discussion about X not being random, please see this video: ruclips.net/video/5Ezg9UXIyZs/видео.html. Since Sxx is not random, I am able to pull it out of the expected value. Recall this property of expected value: if "a" is a constant, and "X" is random, then E(a*X) = a*E(X). Using this property of expected value, E(Sxy / Sxx) = E(1/Sxx * Sxy) = 1/Sxx * E(Sxy) = E(Sxy) / Sxx
Please let me know if you have any follow-up questions about this!
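To see this unbiasedness result empirically (not from the video; the true parameter values below are made up), one can fix the design points x, simulate many samples of y, and check that the average of the b1 estimates lands on the true slope:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed (non-random) design points, reused in every simulated sample.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
beta0, beta1, sigma = 1.0, 2.0, 1.0   # hypothetical true parameters

xbar = x.mean()
Sxx = np.sum((x - xbar) ** 2)         # the same constant in every sample

b1_draws = []
for _ in range(20000):
    # Only y is random: y = beta0 + beta1*x + epsilon, E(epsilon) = 0.
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    Sxy = np.sum((x - xbar) * (y - y.mean()))
    b1_draws.append(Sxy / Sxx)

print(np.mean(b1_draws))   # close to beta1 = 2.0, illustrating E(b1) = beta1
```

Because Sxx never changes across samples, dividing by it commutes with averaging, which is exactly the E(Sxy/Sxx) = E(Sxy)/Sxx step above.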
@@fernandojackson7207 Great explanation! Thank you for this post!!