I think I've fallen in love with Khan. We've been through everything together; Statistics, Chemistry, you name it.
LMFAO
9 years ago what are you now
and after all of the money that I spent on education... I just learned more without any pants on at home.
Hi, I am from India. I am also a maths teacher.
@@10thpass93 So?? What??
You watched video after your wedding night
Khan "Let me do this in an other color" Academy
XD
This is EPIC !
Oscar Bergman another*
Lol all joking aside, the colors help visualize it a lot better since there's usually a ton of stuff up on the "board" by the end of the video.
Thank you Sal! This was the best explanation of the R squared error I've seen ... makes a LOT more sense than just machine learning courses saying "its a percentage of how much it explains the variability of the data". This shows why that is, brilliant. Such a simple concept when broken down and explained visually step-by-step. You and 3B1B (also a former Khan Academy educator) are amongst the best instructors on YT.
Did you watch the Applied AI course video on R squared error and come here to understand it in depth?
This video "literally" saved my butt. Thank you very much.
Was your ass on fire?
Sal makes no presumptions about students coming to watch his lectures, and that's what has been helping me out. I have never met an educator better than him in my entire life, PERIOD
1:03 What defines the error?
1:20 What defines total square error?
1:37 How y = (mx+b) relates to the error.
2:55 What % of the variation in y is described by the variation in x?
6:30 What defines the squared error of the mean of y? (end)
6:59 What is not described by the variation in x?
12:38 What is the coefficient of determination, aka R²? (end)
10:23 What does R² say about the regression line?
Appreciate it
thank god for him because my own math teacher has absolutely no idea what he's teaching. he's not a statistician, it's his first year teaching stats, so i don't blame him.. but i'd be lost without sal & his vids. thanks again!
Don't think I've ever heard the term *statistician* before. Thanks :D
Mr Khan, you've been with me since I was at secondary school. And now I'm trying to understand how to do my papers and statistics, surprised to still see there is so much I can learn from you. Thank you for all of your lectures.
Can you explain the meaning of the line "% not explained by regression line", i.e. the ratio of the regression line's squared error to the mean's squared error?
Khan you are prodigy!!! only prodigy can teach as clear as crystal
Best explanation ever, imagine how many students around the world you have enriched!
Thank you Khan Academy for clearing my doubts in so many topics.
Always enjoy the explanations of Khan Academy. Precise, succinct and intuitively explained.
I am glad I watched this, I am glad...glad..I am glad I watched this.
Note potential confusion: "SE" in this video is the (sum of the) Squared Errors. However, usually "s.e." is standard error. Since, technically, we NEVER observe errors (because they are a function of the true regression line--we can estimate residuals, though), what is called "SE" here is actually the Sum of the Squared Residuals (SSR in some texts). I am torn whether to recommend this video to my stats students because of this likely confusion.
It is the sum of the squares of the residuals, not errors.
Wonderful Mr. Khan. You have made this problematic concept seem so simple and 'intuitive'. Keep up this GOOD WORK.
9:58 r squared is the Coefficient of determination
Brilliantly simple and straightforward. So good.
Khan you are my statistics hero!
Wonderful explanation. Thank you so much!
Sal the saviour of pilgrims seeking conceptual clarity, as always.
thanks this was clear explanation
Great video. I really appreciate it.
Man this saved my life. Thanks so much
Thankyou so much for this video! Wonderfully explained!
I loved the explanation. Came here trying to understand the r2_score function in Python.
I wish everyone else would like and comment after enjoying a video, these likes and comments motivate us when we are in panic to quickly click into the video and cram before exam
When you say " described by the variation in X," you should just write "described by the line," rather than X. It's really confusing for the rest of the video for those who didn't catch that.
An Luu this comment helped a lot thank you
I am a maths teacher in India. I want to help students with maths.
superb explanation, understood well
THANKYOU
Brilliant teaching!
Excellent explanation! But why is the Coefficient of Determination equal to r _squared vs some other quantity? Why not r- cubed or ( r_squared -1)?
Did u find the answer?
Your voice is so sweet.
just one word, awesome!!
Why is SE_line / SE_y "What % of variation is not described by the regression line" instead of "described by"?
Buddy, I had this problem too, but think of it like this: when we talk about the "proportion of unexplained variation," we mean the fraction of the total variability that is not captured or explained by the regression model, which is the error, or SSE (more fancy term XD). Hope this helps.
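In symbols, for anyone who prefers it written out. This is only a sketch in the video's notation; the subscripts "line" and "ȳ" are shorthand chosen here, where SE_line is the total squared error against the regression line and SE_ȳ is the total squared error against the mean of y:

```latex
\[
SE_{\text{line}} = \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2,
\qquad
SE_{\bar{y}} = \sum_{i=1}^{n} (y_i - \bar{y})^2
\]
\[
\frac{SE_{\text{line}}}{SE_{\bar{y}}}
  = \text{fraction of the variation in } y \text{ NOT described by the line},
\qquad
r^2 = 1 - \frac{SE_{\text{line}}}{SE_{\bar{y}}}
\]
```

So the ratio on its own is the "not described" part, and subtracting it from 1 gives the "described" part.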
Great video. Tks
There is no easier way to explain regression as simple as that. thanks
Thanks,I really appreciate it.
Just awesome
Very nicely explained!
Thanks!
Good channel.
great video,many thanks
I love you, man.
Thank you for video, it was really helpful. I have a question.
What if we have data, which we approximated well with a line L, but L is almost horizontal and close to y=mean(y). The ratio of SE(line) to SE(y) is supposed to be close to 1, thus r^2 is close to 0, although L is a good fit.
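A rough way to check this for yourself. The sketch below is plain Python with made-up numbers (the helper name `r_squared` and both data sets are hypothetical, not from the video). It fits a least-squares line to a nearly flat data set and to a steep one that has exactly the same scatter around its line:

```python
def r_squared(xs, ys):
    """Fit y = m*x + b by least squares; return (m, b, 1 - SE_line / SE_mean)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept for a single predictor
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - m * mean_x
    # Total squared error against the line and against the mean of y
    se_line = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    se_mean = sum((y - mean_y) ** 2 for y in ys)
    return m, b, 1 - se_line / se_mean


xs = [1, 2, 3, 4, 5]
flat_ys = [10.1, 9.9, 10.0, 10.1, 9.9]   # made-up data, almost horizontal
steep_ys = [2.1, 3.9, 6.0, 8.1, 9.9]     # made-up data, same residuals, steep slope

m_flat, b_flat, r2_flat = r_squared(xs, flat_ys)
m_steep, b_steep, r2_steep = r_squared(xs, steep_ys)

print(round(m_flat, 3), round(r2_flat, 3))    # slope near -0.02, R^2 roughly 0.1
print(round(m_steep, 3), round(r2_steep, 3))  # slope near 1.98, R^2 roughly 0.999
```

Both lines sit equally close to their points, but the flat one barely beats the horizontal line y = mean(y), so its R² comes out low. On this reading, R² says how much better the line does than the mean, not how small the errors are in absolute terms.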
Sal Sir = LEGEND🙌
GREAT EXPLANATION!! The concept is very clear in my head now
How do we know that the amount of variation in y not caused by variation in x is SE_line -SE_ymean? What does the difference in errors between the two lines have to do with anything?
What does "variation described by the line" mean? Can someone help explain please..
Inno Vation needing this also
He is referring to the diagonal line. Often called the regression line. Represented by mx+b. The variation of the line is equal to the sum of all the (y-(mx + b))^2 equations he wrote up there in the video.
If all of the points were exactly on the regression line, then the regression line would explain all of the variation. In other words, SE(line) would be zero. Anywhere that the points do NOT lie on the line is variation that is undescribed by the regression.
For example, a regression line might show that for each hour a student studies, he gets 10% higher on a test. 5 hours earns a 50, 6 hours earns a 60, 7 hours earns a 70. If a student studied for 8 hours and only got a 75, that variation between 75 and the expected 80 would not be described by the regression line and would add to the SE(line).
Taken from this comment: www.khanacademy.org/video/r-squared-or-coefficient-of-determination?qa_expand_key=kaencrypted_c228fca3982c98caf8466db5f21d3293_97ad4ce3e39b15af2b59bffb1589cc545a0a13b897657bf6c9694237018a45585fa77017167815c3fb5868e6cd97f5a15499b0ead13d4f8bdae6f9823d7f95f6f9c067fbf53f053b72e5221cd3341c5f842cbd96c5140b0868e8e68a46855aa5af32a4cc1fe19ffd136df5b5a492e8a83fd3a28e20965323f00c242a29d98de4c9bf954e71ca493b1e661f7d719a521c5a74b8710cac5e6b8b9f268a9170fad433006065db869f0da574bb11fdd142868bff9ffd8014d6433eebdcfc59aec0e9
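The study-hours example above can be worked through in a few lines of Python. This is just a sketch: the line y = 10 * hours is taken from the example as given (it is not refit to these four points), and the variable names are made up here.

```python
hours = [5, 6, 7, 8]
scores = [50, 60, 70, 75]   # the 8-hour student got 75, not the expected 80


def predict(h):
    # The example's regression line: 10 points per hour studied
    return 10 * h


# Squared error against the line: only the last student contributes (75 vs 80)
se_line = sum((y - predict(x)) ** 2 for x, y in zip(hours, scores))

# Squared error against the mean score
mean_y = sum(scores) / len(scores)
se_mean = sum((y - mean_y) ** 2 for y in scores)

fraction_not_described = se_line / se_mean
fraction_described = 1 - fraction_not_described

print(se_line, se_mean, round(fraction_described, 3))  # 25 368.75 0.932
```

The 25 is the (75 - 80)^2 from the one student who is off the line. Compared with the 368.75 of total variation around the mean, only about 7% is left undescribed by this line.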
How far the points are from the line
You are the best!
Great video! Thanks sal!
Great tutorial..
What do you mean by "not described by the variation in x"?
Khan Academy rocks..
Oh my god I finally got this! Thank you so much. x
I don't have the basics in statistics but need to understand this for managerial economics. What do you mean by mean-of-y? Is that the average of all the Y values?
yes, mean = average
Great, thx!
you're an HU student right ?
I don't understand the need to interchange "described by variation in X" and "described by the line", are they the same thing?
The error that you are referring to [ 0:58] is really residuals, right? just clearing things up.
Why "how much of total variation is not described by the regression line" = SE(line)? Doesn't SE(line) describe the total variation for the regression line? I'm confused by this sentence. Could anyone help to enlighten me? Thanks :)
I read somewhere that R^2 may be negative, I don't understand how since it is essentially a %
For a least-squares line with an intercept, scored on the same data it was fit to, regular R2 can't be negative. You can see negatives with adjusted R2, or when 1 - SE(line)/SE(mean) is computed for a line that does worse than the mean line (for example a model scored on new data).
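On the negative R² question, here is a small sketch, assuming R² is computed the way the video does it, as 1 - SE(line)/SE(mean). The function name and the deliberately bad line are made up for illustration. A least-squares line with an intercept never does worse than the mean on its own data, but any other line can:

```python
def fraction_described(ys, predictions):
    """Return 1 - SE_line / SE_mean for arbitrary predictions."""
    mean_y = sum(ys) / len(ys)
    se_line = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    se_mean = sum((y - mean_y) ** 2 for y in ys)
    return 1 - se_line / se_mean


xs = [1, 2, 3]
ys = [1, 2, 3]

good_line = [x for x in xs]        # y = x, passes through every point
bad_line = [-x + 4 for x in xs]    # y = -x + 4, a hypothetical line sloping the wrong way

r2_good = fraction_described(ys, good_line)
r2_bad = fraction_described(ys, bad_line)

print(r2_good)  # 1.0
print(r2_bad)   # -3.0
```

The bad line has four times the squared error of just guessing the mean, so the "fraction described" drops below zero. That is why a score computed this way on a poorly fitting model, or on new data, can come out negative.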
Minor correction: it should be 'fraction' instead of 'percentage' at several parts of the video.
KHAN IS KING
Gotta get those Ms and Bs
Just checking you can't get positive R^2 values higher than 1 right?
I love you. you saved my life, keep up the good work :D
How did he estimate the regression line?? Did he just guess? Or he used some technique?
Can anyone tell me, How SEline (total square error) gives the total variation not described by regression line?
Also, How does the regression line describe the variation in x?
what do "the variation of y" and "the variation of y described by the variation of x" mean again.. ?
what does variation in x mean exactly ?
What are possible interpretations and justifications for low R-squared values in management science?
How do you estimate the significance of the betas?
maybe i should have studied for my midterm this week
thank you,it is extremely easy to understand what you are saying, absolutely different from my lecture notes. why can't you be my professor? :)
thank u so much sir....u r awsome
dude it's been 10 years since you wrote this. are you still alive :D
What is meant by "describe"? "percentage of variation described by percentage of error"? "DESCRIBED"?! what's that mean?
this guy must have repetitive strain injury by now repetitive strain injury by now. by now.
+Jonathan Georgiou Makes it harder and harder for me to watch Khan Academy math videos. They are good at explaining, but the lecturer's repetitive phrases drive me nuts!
maybe there's people watching these videos that aren't as fluent in english as you are mate so by repeating phrases those people might have another chance to catch the meaning of the phrase.. my 2c
It's not his problem, sometimes it is that you don't understand the math.
Repetition is key to memory.
Why does he refer to the "variation of X" and the "the line" as the same thing ?
that R is pearson's coefficient right? don't we denote that with 'r' instead of 'R'?
I still don’t understand why in the world you would use the mean of y instead of just the residual from the regression line?
What program is used to take these notes?
I'm with v2prc. This is the first statement in the playlist that has me confused. Help would be much appreciated.
I think it's n-1 ... not n ... there are n-1 degrees of freedom .. (5:57)
He is not calculating a sample variance to estimate the population variance here, just an average squared error, so n makes sense.
@@mindloop4070 yeah good point
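One more way to see why the n versus n-1 choice doesn't change the result here, sketched in the video's notation (SE_line and SE_ȳ are shorthand labels chosen here for the two total squared errors). Whatever you divide by cancels in the ratio:

```latex
\[
\frac{SE_{\text{line}} / n}{SE_{\bar{y}} / n}
  = \frac{SE_{\text{line}} / (n-1)}{SE_{\bar{y}} / (n-1)}
  = \frac{SE_{\text{line}}}{SE_{\bar{y}}}
\]
```

So as long as the same divisor is used on top and bottom, r^2 = 1 - SE_line / SE_ȳ comes out the same.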
What's the relationship btw r and r squared?
@hapolian He is probably writing with a tool called a graphics tablet.
love you
Sal is the GOAT
khaaaaaaaaaannnnnn
how to write on the computer screen like this?
So, they both tell us how strong the relationship is between two variables?!
Here's the question: IS THE ANSWER E?
Which of the following tells us how strong the relationship is between two variables?
a) the slope of a line
b) the intercept of a line
c) the coefficient of determination
d) the coefficient of correlation
e) both C and D are correct
Thank you.. but I badly need the notes just made :p
take screen shots in full screen
If you play this a 1.5x speed he sounds like Jeff Goldblum
why are you so good in explaining like the how the fk...
I don't understand where the 30% and the 1 in the formula come from.
Me learning after 13 years
That's not error that's residual
Hy
Please give the lecture in Urdu, I can't understand it.
730K views and 2.3K likes? 0.o wut?
“If you declare with your mouth “Jesus is Lord,” and believe in your heart that God raised Him from the dead, you will be saved” (Romans 10:9). Now is the time to accept Jesus as your personal Lord and Savior. Obey His commands and repent of your sins because Jesus is coming back soon. Tomorrow isn’t promised.
Amen to this 🙏
Who is here because of machine learning?