I think I've fallen in love with Khan. We've been through everything together; Statistics, Chemistry, you name it.
LMFAO
9 years ago what are you now
Khan "Let me do this in an other color" Academy
XD
This is EPIC !
Oscar Bergman another*
Lol all joking aside, the colors help visualize it a lot better since there's usually a ton of stuff up on the "board" by the end of the video.
and after all of the money that I spent on education... I just learned more without any pants on at home.
Hi, I am from India. I am also a maths teacher.
@@10thpass93 So? What about it?
You watched video after your wedding night
Thank you Sal! This was the best explanation of the R squared error I've seen ... makes a LOT more sense than just machine learning courses saying "its a percentage of how much it explains the variability of the data". This shows why that is, brilliant. Such a simple concept when broken down and explained visually step-by-step. You and 3B1B (also a former Khan Academy educator) are amongst the best instructors on YT.
Did you watch the Applied AI course video on R squared error and come here to understand it in depth?
Sal makes no presumptions about students coming to watch his lectures, and that's what has been helping me out. I have never met an educator better than him in my entire life, PERIOD
1:03 What defines the error?
1:20 What defines total square error?
1:37 How y = (mx+b) relates to the error.
2:55 What % of the variation in y is described by the variation in x?
6:30 What defines the squared error of the mean of y? (end)
6:59 What is not described by the variation in x?
12:38 What is the coefficient of determination, aka R²? (end)
10:23 What does R² say about the regression line?
Appreciate it
This video "literally" saved my butt. Thank you very much.
Was your ass on fire?
thank god for him because my own math teacher has absolutely no idea what he's teaching. he's not a statistician, it's his first year teaching stats, so i don't blame him.. but i'd be lost without sal & his vids. thanks again!
Don't think I've ever heard the term *statistician* before. Thanks :D
9:58 r squared is the Coefficient of determination
Best explanation ever, imagine how many students around the world you have enriched!
Sal the saviour of pilgrims seeking conceptual clarity, as always.
Khan you are prodigy!!! only prodigy can teach as clear as crystal
Why is SE_line / SE_ȳ "What % of variation is not described by the regression line" instead of "described by"?
Buddy, I had this problem too, but think of it like this: when we talk about the "proportion of unexplained variation," we mean the fraction of the total variability that is not captured or explained by the regression model. That uncaptured part is the squared error of the line, SE_line (SSE is the fancier term XD), so SE_line / SE_ȳ is the unexplained fraction. Hope this helps.
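A minimal sketch of that ratio in Python, assuming a small made-up dataset and hypothetical helper names (`fit_line`, `unexplained_fraction`) that are not from the video. SE(line)/SE(ȳ) is the unexplained fraction, and R² is one minus it.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def unexplained_fraction(xs, ys, m, b):
    """SE(line) / SE(mean of y): the share of variation the line misses."""
    mean_y = sum(ys) / len(ys)
    se_line = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    se_mean = sum((y - mean_y) ** 2 for y in ys)
    return se_line / se_mean


# Made-up data, purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = fit_line(xs, ys)
unexplained = unexplained_fraction(xs, ys, m, b)
r2 = 1 - unexplained

print(f"unexplained fraction = {unexplained:.2f}, R^2 = {r2:.2f}")
```

For these five points the line leaves 40% of the variation in y undescribed, so R² comes out at 0.60.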
Note potential confusion: "SE" in this video is the (sum of the) Squared Errors. However, usually "s.e." is standard error. Since, technically, we NEVER observe errors (because they are a function of the true regression line--we can estimate residuals, though), what is called "SE" here is actually the Sum of the Squared Residuals (SSR in some texts). I am torn whether to recommend this video to my stats students because of this likely confusion.
It is the sum of the squares of the residuals, not errors.
Mr. Khan, you've been with me since I was at secondary school. And now I'm trying to understand how to do my papers and statistics, surprised to still see there is so much I can learn from you. Thank you for all of your lectures.
Can you explain the meaning of the line "% not explained by the regression line" for the ratio of the regression line's squared error to the mean's squared error?
Khan you are my statistics hero!
Sal Sir = LEGEND🙌
When you say " described by the variation in X," you should just write "described by the line," rather than X. It's really confusing for the rest of the video for those who didn't catch that.
An Luu this comment helped a lot thank you
I am a maths teacher in India. I want to help students with maths.
I wish everyone else would like and comment after enjoying a video, these likes and comments motivate us when we are in panic to quickly click into the video and cram before exam
Great video. I really appreciate it.
I am glad I watched this, I am glad...glad..I am glad I watched this.
Wonderful Mr. Khan. You have made this problematic concept seem so simple and 'intuitive'. Keep up this GOOD WORK.
Thank you Khan Academy for clearing my doubts in so many topics.
Can anyone tell me, How SEline (total square error) gives the total variation not described by regression line?
Also, How does the regression line describe the variation in x?
I loved the explanation. Came here trying to understand the r2_score function in Python.
Your voice is so sweet.
Man this saved my life. Thanks so much
There is no easier way to explain regression as simple as that. thanks
The error that you are referring to [ 0:58] is really residuals, right? just clearing things up.
Brilliantly simple and straightforward. So good.
What does "variation described by the line" mean? Can someone help explain, please?
Inno Vation needing this also
He is referring to the diagonal line. Often called the regression line. Represented by mx+b. The variation of the line is equal to the sum of all the (y-(mx + b))^2 equations he wrote up there in the video.
If all of the points were exactly on the regression line, then the regression line would explain all of the variation. In other words, SE(line) would be zero. Anywhere that the points do NOT lie on the line is variation that is undescribed by the regression.
For example, a regression line might show that for each hour a student studies, he gets 10% higher on a test. 5 hours earns a 50, 6 hours earns a 60, 7 hours earns a 70. If a student studied for 8 hours and only got a 75, that variation between 75 and the expected 80 would not be described by the regression line and would add to the SE(line).
Taken from this comment: www.khanacademy.org/video/r-squared-or-coefficient-of-determination?qa_expand_key=kaencrypted_c228fca3982c98caf8466db5f21d3293_97ad4ce3e39b15af2b59bffb1589cc545a0a13b897657bf6c9694237018a45585fa77017167815c3fb5868e6cd97f5a15499b0ead13d4f8bdae6f9823d7f95f6f9c067fbf53f053b72e5221cd3341c5f842cbd96c5140b0868e8e68a46855aa5af32a4cc1fe19ffd136df5b5a492e8a83fd3a28e20965323f00c242a29d98de4c9bf954e71ca493b1e661f7d719a521c5a74b8710cac5e6b8b9f268a9170fad433006065db869f0da574bb11fdd142868bff9ffd8014d6433eebdcfc59aec0e9
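The study-hours example can be checked in a few lines of Python. This is only a sketch: the line "10 points per hour" is taken as given from the comment rather than refit, and `predicted_score` is a hypothetical name.

```python
def predicted_score(hours):
    # Line assumed from the comment: each hour of study is worth 10 points.
    return 10 * hours


# (hours studied, actual score) for the four students in the example.
students = [(5, 50), (6, 60), (7, 70), (8, 75)]

# Squared distance of each actual score from the line's prediction.
squared_residuals = [(score - predicted_score(h)) ** 2 for h, score in students]
se_line = sum(squared_residuals)

print(squared_residuals)  # [0, 0, 0, 25]
print(se_line)            # 25
```

The three students who sit exactly on the line add nothing, and the 8-hour student adds (75 - 80)² = 25 to SE(line).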
How far the points are from the line.
I think it's n-1 ... not n ... there are n-1 degrees of freedom .. (5:57)
He is not estimating a population variance from a sample here, just taking an average squared error, so n makes sense.
@@mindloop4070 yeah good point
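For the R² ratio itself the choice of n or n-1 does not matter, because the same divisor appears on top and bottom and cancels. A small sketch, assuming made-up data and a line (m = 0.6, b = 2.2) that is the least-squares fit for those points:

```python
# Made-up data, purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = 0.6, 2.2  # least-squares line for the points above

n = len(ys)
mean_y = sum(ys) / n
se_line = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
se_mean = sum((y - mean_y) ** 2 for y in ys)

# The unexplained fraction, computed three ways.
ratio_raw = se_line / se_mean
ratio_n = (se_line / n) / (se_mean / n)
ratio_n_minus_1 = (se_line / (n - 1)) / (se_mean / (n - 1))

print(ratio_raw, ratio_n, ratio_n_minus_1)
```

All three values agree up to floating-point rounding, so R² = 1 - SE(line)/SE(ȳ) is the same either way.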
thanks this was clear explanation
Always enjoy the explanations of Khan Academy. Precise, succinct, and intuitively explained.
I love you. you saved my life, keep up the good work :D
Khan Academy rocks..
Brilliant teaching!
I love you, man.
just one word, awesome!!
THANKYOU
what do "the variation of y" and "the variation of y described by the variation of x" mean again.. ?
superb explanation, understood well
Gotta get those Ms and Bs
Wonderful explanation. Thank you so much!
thanks a lot. this really helped
Good channel.
Excellent explanation! But why is the Coefficient of Determination equal to r squared vs some other quantity? Why not r cubed or (r squared - 1)?
Did u find the answer?
KHAN IS KING
I don't have the basics in statistics but need to understand this for managerial economics. What do you mean by mean-of-y? Is that the average of all the Y values?
yes, mean = average
Great, thx!
you're an HU student right ?
What do you mean by "not described by the variation in x"
Thanks!
Thankyou so much for this video! Wonderfully explained!
Minor correction: it should be 'fraction' instead of 'percentage' at several parts of the video.
Oh my god I finally got this! Thank you so much. x
Great video. Tks
Just awesome
How do we know that the amount of variation in y not caused by variation in x is SE_line -SE_ymean? What does the difference in errors between the two lines have to do with anything?
You are the best!
I don't understand the need to interchange "describe by variation in X" and "describe by the line", are they the same thing?
great video,many thanks
Great video! Thanks sal!
Great tutorial..
Sal is the GOAT
Why "how much of total variation is not described by the regression line" = SE(line)? Isn't SE(line) describes the total variation for the regression line? I'm confused by this sentence? Could anyone help to enlighten me? Thanks :)
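I read somewhere that R^2 may be negative. I don't understand how, since it is essentially a %.
With a least-squares line that includes an intercept, regular R2 on the data used for the fit can't be negative. Adjusted R2 can be, and so can R2 computed as 1 - SE(line)/SE(ȳ) for a line that fits worse than the mean, e.g. on new data.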
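A quick check, using the definition R² = 1 - SE(line)/SE(ȳ) from the video. The data and the deliberately bad line are made up, and `r_squared` is a hypothetical helper name. Any line whose squared error is larger than that of the horizontal mean line gives a negative value under this definition.

```python
def r_squared(xs, ys, m, b):
    """1 - SE(line)/SE(mean of y) for the line y = m*x + b."""
    mean_y = sum(ys) / len(ys)
    se_line = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    se_mean = sum((y - mean_y) ** 2 for y in ys)
    return 1 - se_line / se_mean


# Made-up data, purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares line for these points: y = 0.6x + 2.2.
r2_fitted = r_squared(xs, ys, 0.6, 2.2)

# A deliberately bad line sloping the wrong way: y = -x + 10.
r2_bad = r_squared(xs, ys, -1.0, 10.0)

print(f"fitted line: {r2_fitted:.2f}, bad line: {r2_bad:.2f}")
```

The fitted line gives 0.60, while the bad line gives a value well below zero because it does worse than simply predicting the mean of y.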
Thank you for video, it was really helpful. I have a question.
What if we have data, which we approximated well with a line L, but L is almost horizontal and close to y=mean(y). The ratio of SE(line) to SE(y) is supposed to be close to 1, thus r^2 is close to 0, although L is a good fit.
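This scenario is easy to reproduce. The sketch below uses made-up, nearly flat data: the fitted line is within about 0.12 of every point, yet R² is small, because R² only measures how much better the line does than the horizontal line through the mean of y, not how close the points are in absolute terms.

```python
# Made-up, nearly flat data, purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [10.0, 10.1, 9.9, 10.1, 9.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
m = sxy / sxx
b = mean_y - m * mean_x

residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
se_line = sum(r ** 2 for r in residuals)
se_mean = sum((y - mean_y) ** 2 for y in ys)
r2 = 1 - se_line / se_mean
max_abs_residual = max(abs(r) for r in residuals)

print(f"slope = {m:.3f}, largest residual = {max_abs_residual:.2f}, R^2 = {r2:.2f}")
```

Here the mean line already fits almost as well as the regression line, so there is very little variation left for x to describe.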
How do you estimate the significance of the betas?
thank you,it is extremely easy to understand what you are saying, absolutely different from my lecture notes. why can't you be my professor? :)
what does variation in x mean exactly ?
this guy must have repetitive strain injury by now repetitive strain injury by now. by now.
+Jonathan Georgiou Makes it harder and harder for me to watch Khan Academy math videos. They are good at explaining, but the lecturer's repetitive phrases drive me nuts!
Maybe there are people watching these videos who aren't as fluent in English as you are, mate, so by repeating phrases those people might have another chance to catch the meaning of the phrase.. my 2c
It's not his problem, sometimes it is that you don't understand the math.
Repetition is key to memory.
maybe i should have studied for my midterm this week
What are possible interpretations and justifications for low R squared values in management science?
What program is used to take these notes?
“If you declare with your mouth “Jesus is Lord,” and believe in your heart that God raised Him from the dead, you will be saved” (Romans 10:9). Now is the time to accept Jesus as your personal Lord and Savior. Obey His commands and repent of your sins because Jesus is coming back soon. Tomorrow isn’t promised.
Amen to this 🙏
Just checking: you can't get positive R^2 values higher than 1, right?
khaaaaaaaaaannnnnn
I'm with v2prc. This is the first statement in the playlist that has me confused. Help would be much appreciated.
How did he estimate the regression line?? Did he just guess? Or he used some technique?
Why does he refer to the "variation of X" and the "the line" as the same thing ?
If you play this a 1.5x speed he sounds like Jeff Goldblum
What is it meant by "describe"? "percentage of variation described by percentage of error"?" DESCRIBED"?! what's that mean?
that R is pearson's coefficient right? don't we denote that with 'r' instead of 'R'?
I still don’t understand why in the world you would use the mean of y instead of just the residual from the regression line?
GREAT EXPLANATION !! the concept is very clean in head now
@hapolian He is probably writing with a tool called graphic tablet.
Thank you so much, sir.... You are awesome.
dude it's been 10 years since you wrote this. are you still alive :D
Me learning after 13 years
love you
how to write on the computer screen like this?
What's the relationship btw r and r squared?
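For a least-squares line with one x variable, the coefficient of determination equals the square of Pearson's correlation coefficient r. A small numerical check, assuming made-up data and hypothetical variable names:

```python
import math

# Made-up data, purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Pearson's correlation coefficient.
r = sxy / math.sqrt(sxx * syy)

# R^2 from the squared errors of the least-squares line.
m = sxy / sxx
b = mean_y - m * mean_x
se_line = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
r2_from_errors = 1 - se_line / syy

print(f"r = {r:.4f}, r*r = {r * r:.4f}, 1 - SE(line)/SE(mean) = {r2_from_errors:.4f}")
```

The sign of r still tells you whether the line slopes up or down, which is lost once it is squared.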
I don't understand where the 30% and the 1 in the formula come from.
Please give the lecture in Urdu, I'm not able to follow it.
So, they both tell us how strong the relationship is between two variables?!
Here's the question: IS THE ANSWER E?
Which of the following tells us how strong the relationship is between two variables?
a) the slope of a line
b) the intercept of a line
c) the coefficient of determination
d) the coefficient of correlation
e) both C and D are correct
why are you so good in explaining like the how the fk...
Hy
That's not error that's residual
Who is here because of machine learning?
Thank you.. but I badly need the notes just made :p
take screen shots in full screen
730K views and 2.3K likes? 0.o wut?
he's married. so yeah.. i guess he does.