""regression" does not relate to error minimization. The term "regression" appeared first in the article describing statistics of people's height through generations. If a father was tall, his son would be likely taller than average, but ... less so because it is a "regression to the mean". See The Art of Statistics: Learning from Data by David Spiegelhalter for more details.
Trivia: while dealing with real data, one might not want R2 to get close to 1, as that might indicate overfitting, which is really not good, especially for prediction models, which is nicely illustrated by the case of 16-degree polynomial
Nothing to do with the area of triangle. Trying to to find best line which stands at a minimum distance from observed value. So that means, you are trying to minimize the y value in the picture.
R square is the percentage of explained variance/total variance. It falls between 0 and 1 accordingly. It records the amount of variance (error) explained by the model.
By definition (R2=1-RSS/TSS), the R2 will be negative when the model is worse than a "mean model" (y_hat = y_bar). In general, a model can be arbitrarily bad (RSS >> TSS), so R2 can certainly be negative.
Thank you @@fredfeng1518. I have looking into this more to better understand. Rhetorically, why are we being taught the range is 0-1? Is it just more practical? Admittedly, I am new to the field and only have a grasp of the basic concepts, but I can find many resources that I would find credible that state R^2 it is definitively 0-1. "It's a proportion." "It's a squared term.", etc. Is this contentious? Are negative r^2 more theoretical and so rare they aren't worth discussing? Anyways, thank you for elucidating the point and setting me straight. I will try to understand this better.
@@nbgarrett88 No problem. This is indeed more on the theoretical side. In practice, any useful model would have a positive R2, because if it performs even worse than the mean model (in which case RSS > TSS, and thus a negative R2), we could simply pick the mean model instead, which is always at our disposal.
@44:44 Best ever explanation of the coefficient of determination R^2 and explained variability
*My takeaways:*
1. An example: spring model 3:43
2. Coefficient of determination 38:03
These jokes are so cool that I would hang out with them for sure
""regression" does not relate to error minimization. The term "regression" appeared first in the article describing statistics of people's height through generations. If a father was tall, his son would be likely taller than average, but ... less so because it is a "regression to the mean". See The Art of Statistics: Learning from Data by David Spiegelhalter for more details.
Fun, on-point, and in-depth lecture. Thank you, MIT.
Trivia: when dealing with real data, one might not want R^2 to get too close to 1, as that can indicate overfitting, which is especially bad for prediction models; the 16-degree polynomial in the lecture illustrates this nicely (see the sketch after the reply below).
Anything over a 5-degree polynomial is extremely rare in practice; rather, do a non-parametric / non-linear fit.
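For anyone curious, here is a minimal Python sketch (my own toy data, not the lecture's code; r_squared is a helper I define here) of how a 16-degree polynomial can score a near-perfect R^2 on the points it was fit to while doing much worse on held-out points:

```python
import numpy as np

rng = np.random.default_rng(0)

def r_squared(y, y_pred):
    """R^2 = 1 - RSS/TSS."""
    rss = np.sum((y - y_pred) ** 2)
    tss = np.sum((y - np.mean(y)) ** 2)
    return 1 - rss / tss

# The underlying truth is linear; the noise is what a degree-16 fit "memorizes".
x_train = np.linspace(0, 10, 20)
y_train = 3 * x_train + 1 + rng.normal(0, 2, x_train.size)
x_test = np.linspace(0.5, 9.5, 20)   # held-out points between the training x's
y_test = 3 * x_test + 1 + rng.normal(0, 2, x_test.size)

for degree in (1, 16):
    coeffs = np.polyfit(x_train, y_train, degree)
    print(degree,
          r_squared(y_train, np.polyval(coeffs, x_train)),  # near 1 for degree 16
          r_squared(y_test, np.polyval(coeffs, x_test)))    # typically much worse
```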
Out of curiosity, at 19:01, what would trying to minimize the area of the triangle result in, as opposed to minimizing the distance y?
Since it would also involve the difference in x, I don't think the result would be meaningful.
Nothing to do with the area of the triangle. We are trying to find the best line, the one that stands at a minimum distance from the observed values. That means you are trying to minimize the y distance in the picture.
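If it helps, here's a small sketch (toy numbers of mine) of what minimizing the squared vertical distances actually computes: the usual closed-form least-squares slope and intercept.

```python
import numpy as np

def least_squares_line(x, y):
    """Closed-form slope a and intercept b minimizing sum((y - (a*x + b))^2),
    i.e. the squared *vertical* distances, not any triangle area."""
    a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b = y.mean() - a * x.mean()
    return a, b

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
a, b = least_squares_line(x, y)
print(a, b)  # close to the underlying slope 2 and intercept 0
```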
would have been nice to see the slides
Here: ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-0002-introduction-to-computational-thinking-and-data-science-fall-2016/lecture-slides-and-files/MIT6_0002F16_lec9.pdf
@18:37 is he referring to line P?
ROFL because of the spring joke!
Great lecture!
cracked up with those jokes
Not sure if R^2 is always positive.
R squared is the ratio of explained variance to total variance, so it falls between 0 and 1 accordingly. It records the proportion of variance explained by the model.
By definition (R2=1-RSS/TSS), the R2 will be negative when the model is worse than a "mean model" (y_hat = y_bar). In general, a model can be arbitrarily bad (RSS >> TSS), so R2 can certainly be negative.
Thank you @fredfeng1518. I have been looking into this more to understand it better. Rhetorically, why are we being taught the range is 0-1? Is it just more practical? Admittedly, I am new to the field and only have a grasp of the basic concepts, but I can find many resources that I would consider credible that state R^2 is definitively 0-1. "It's a proportion." "It's a squared term.", etc. Is this contentious? Are negative R^2 values so theoretical and rare they aren't worth discussing?
Anyways, thank you for elucidating the point and setting me straight. I will try to understand this better.
@nbgarrett88 No problem. This is indeed more on the theoretical side. In practice, any useful model would have a positive R^2, because if it performs even worse than the mean model (in which case RSS > TSS, and thus a negative R^2), we could simply pick the mean model instead, which is always at our disposal.
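To make the thread above concrete, here is a tiny sketch (my own toy numbers) of R^2 = 1 - RSS/TSS: it is 0 for the mean model, 1 for a perfect model, and negative for any model whose residuals are larger than the mean model's.

```python
import numpy as np

def r_squared(y, y_pred):
    rss = np.sum((y - y_pred) ** 2)
    tss = np.sum((y - np.mean(y)) ** 2)  # TSS is the RSS of the mean model
    return 1 - rss / tss

y = np.array([1.0, 2.0, 3.0, 4.0])

print(r_squared(y, np.full_like(y, y.mean())))       # mean model: R^2 = 0
print(r_squared(y, y))                               # perfect model: R^2 = 1
print(r_squared(y, np.array([4.0, 3.0, 2.0, 1.0])))  # worse than the mean: R^2 = -3
```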
Love the jokes!
32:28
38:28