Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
👍
Hi, Josh! I just wanted to say thank you for these videos! The way you explain concepts has been honestly life changing for me (in terms of my academic career). Concepts that I've struggled with for years are finally becoming clear. I just wanted to take a moment to express my appreciation, and let you know how impactful these videos are!
@@DrOats22 Thank you very much! :)
This is such a breath of fresh air as opposed to the unnecessarily difficult 'explanations' we have to work with in statistical analysis courses. Your videos are awesome.
Wow, thank you!
Your videos are the single greatest resource for my education on machine learning and AI. If I lost access to your videos, I would be devastated.
Glad you like them!
Yes!! Thanks for this. You are saving grad students around the world!
Happy to help!
And former grad students who haven't touched linear regression in 25 years! :) What a great concise refresher. BAM!
Excellent vid & totally helped me again with my regression homework! One of the toughest challenges I have is writing and speaking Regression! One of your last slides around 10:29 helped me learn how to connect a positive / negative variable relationship with R2...love you guys, seriously!
Glad it was helpful!
one of the most well explained about R, thanks for sharing! no time wasted in this video!
Thank you!
This is just what I was expecting from an explanation of what R-squared is. Thank you very much for making it clear and simple
Glad it was helpful!
It's INSANE how clear this is, thank you!
Thank you! :)
clicked for the title, stayed for the content. thanks for this
bam!
Beautifully explained! Loved the “Correlations close to 0 are lame “😂
:)
When I saw "is the mean weight the best way to predict mouse weight", I thought, "it is stupid". And then when I saw the formula for R-squared, I found that "I was stupid". Awesome videos, and they really help.
bam!
Your videos are the most helpful and easiest to follow!
Glad you like them!
All stats courses at any level of education should be taught like this. Otherwise, for the majority of people, stats is ambiguous and difficult to understand. But I feel like lecturers say this is time consuming, that there are a lot of topics to cover, etc. Luckily we have a nice RUclips channel and online documents to supplement the courses. Thanks for the great video!
Thank you very much! I appreciate it.
people have no idea how much of a gold this video is
Thank you! :)
Josh, I'm literally teaching my students this today! Going to refer them to this video.
BAM! Avery, I'm glad this is helpful. This is actually the first StatQuest I ever made, back in the day. I had to re-upload it yesterday due to some oddness on behalf of RUclips, but it's still a classic and the video that got the whole thing started.
You keep this up and I’ll have to forward my tuition to your address.
BAM! :)
Time spent sniffing a rock 🤣🤣🤣
:)
I am currently stressing about a final for Health Stats, and this gave me an amazing laugh. I love the description of R squared (totally not lame)! Thank you!
Good luck!
This is excellent. Why can't professors explain as well and clearly as you? I had a linear regression class yesterday and I had never even heard about variation before, only standard deviation. I didn't know the reason it was squared either. Thanks a lot
Thanks!
Incredible explanations. I'm so glad I found this channel/book!
Thank you!
Excellent explanation. Consider this comment as 1million likes.❤❤
Thank you very much! :)
mind blown. amazingly well explained thank you!
Thank you!
Thank you so much!!! You explain these concepts so easily!! Saving lives one video at a time 😁💕
Thank you!!! :)
Just awesome plain explanation 🎉
Thank you!
That's so intuitive! You really save my Midterm
Thanks!
thank you so much, subscribing right now!
Thank you!
Holy mother of god THANK YOU for this video, I was looking online at a bunch of websites (some paywalled) and none of them explained them as well as this video. Thank you for providing examples and explaining the how rather than the what.
😁😁
Glad I could help!
Thank you UNC-Chapel Hill for saving my life on my AP Stats test. I hope my EA is accepted.
BAM! Congratulations and good luck!
StatQuest is the best thing to come out of UNC since MJ
TRIPLE BAM! :)
Great clear explanation! Thanks!
Glad it was helpful!
Very clearly explained. Thank you
Thank you!
This is a good video. Funny, yet informative.
Glad you enjoyed it!
Great video. Thank you
Thanks!
very clear and concise
Thanks!
Thanks for reposting this precious R-squared explanation. Yesterday I couldn't play this module because of payment bla bla bla bla. Super thanks!
Sorry you had trouble and I hope it never, ever happens again. It was very, very frustrating from my end since I've tried so hard to make my videos free for the world.
Thank you. Very useful.
Glad it was helpful!
This was wonderful. Thank you so much!
Glad you enjoyed it!
Thank you so much for explaining everything in easier way !
Thanks!
Very clear and helpful, thank you
Thanks!
Amazing Explanation.
Thank you so much and thank you UNC Chapel Hill for enabling you to make these
bam! :)
Thank you for this video! I have a much better understanding now
Glad it was helpful!
Such a beautiful explanation. Thank You! :-)
You're very welcome!
@ 03:30 How did you choose which line (which angle, starting point) to fit to the data?
Shouldn't there be a method to find a line so that the line's R squared equals plain old R's squared?
There is an analytical method, meaning an equation we can plug our data into to get a result, that will give us the line that minimizes the sum of the squared residuals. The line that minimizes the sum of the squared residuals is defined as the best fitting line. Alternatively, we can use an iterative method like Gradient Descent to find the best fitting line. For details on Gradient Descent, see: ruclips.net/video/sDv4f4s2SB8/видео.html
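A minimal sketch of the analytical method mentioned in the reply above, assuming simple linear regression with one x variable. The function names and the mouse data are made up for illustration and do not come from the video; the slope and intercept formulas are the standard closed-form least-squares solution.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit for a line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical mouse sizes and weights (not the data from the video).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [1.2, 1.9, 3.2, 3.8, 5.1]

slope, intercept = fit_line(sizes, weights)
print(slope, intercept)
print(sum_squared_residuals(sizes, weights, slope, intercept))
```

Nudging the slope or intercept away from the fitted values should only increase the sum of squared residuals, which is what "best fitting" means here.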
🤣🤣 The Intro . I'm enjoying stats thanks to you
:)
I can't believe these videos are brand new. I'm sorry for everyone who had to take Statistics without watching these first
BAM! :)
just beautiful!!
Thank you!
So... R squared is the difference in variation around the mean and around the line, or how much less variation there is around the line than around the mean. How does it translate into the variation in mouse weight explained by the relationship between mouse weight and size? I'm losing all my brain cells trying to connect the dots.
R-square tells us what percentage of the variation along the y-axis can be explained by variation along the x-axis.
@@statquest Thanks for the explanation. I'll revisit the video with a fresh perspective and hopefully everything will click into place.
@@_hhbk2128 We start out by calculating the variation around the mean of the y-axis values (weight). Then we use the values on the x-axis (size) to fit a line to the data. This line, which takes the mouse size into account, has an 81% reduction in variance compared to the variance around the mean of the mouse weights alone. Thus variation in mouse size can explain 81% of the variation in mouse weight because some mice are small and weigh less, some mice are large and weigh more.
this makes sm sense tysm
bam! :)
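The 81% figure in the reply above can be written out as a worked equation. This is a sketch under one assumption: the variation around the mean is 32 (the value quoted elsewhere in this thread), while the 6 used for the variation around the line is an assumed value chosen to be consistent with the 81% result, not a number stated in these comments.

```latex
R^2 = \frac{\mathrm{Var}(\text{mean}) - \mathrm{Var}(\text{line})}{\mathrm{Var}(\text{mean})}
    = \frac{32 - 6}{32}
    \approx 0.81
```

Read this way, R squared is the fraction of the variation around the mean that disappears once the line, and therefore mouse size, is taken into account.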
thank you so much.
Thanks!
Banger intro, man
Thanks!
yay more new videos ☺️
:)
Awesome!!!
Thanks!!
Starmer = Hero
Thank you! :)
Bring back stat quest
I hope to have some new stuff out soon.
You are the boss
Thanks!
Stat Quest ✊
bam! :)
Is variance different from variation? At 2:15 we find the sum of the squared differences but we don't divide it by the number of observations - 1. Is there a reason for this?
In this case we don't need to divide by n-1 because the denominators will cancel out, leaving us with just the numerators. So we save ourselves a step and omit it.
@@statquest Thank you! It's so obvious now that you pointed it out lol
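The cancellation described in the reply above can be spelled out. In this sketch, SS(mean) and SS(line) are labels introduced here for the sums of squared differences around the mean and around the fitted line, and n is the number of observations.

```latex
R^2 = \frac{\dfrac{SS(\text{mean})}{n-1} - \dfrac{SS(\text{line})}{n-1}}{\dfrac{SS(\text{mean})}{n-1}}
    = \frac{SS(\text{mean}) - SS(\text{line})}{SS(\text{mean})}
```

Every term carries the same factor of 1/(n-1), so dividing or not dividing gives the same R squared.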
thanks bro
Any time!
I loved the video! I would like to give this video ten likes!
Thank you!
Sometimes a single video is better than a whole pdf
:)
10:00 explains 25% of original variation means 25% less variation compared to that of the mean line, right?
coefficient of correlation is square root of coefficient of determination? 🙂
Yep, 25% less variation around the regression line than around the mean.
Hi, thanks for your videos! Is there any chance there's a StatQuest for adjusted R-squared?
I mention it in my video on linear regression: ruclips.net/video/nk2CQITm_eo/видео.html
you are very good
Thanks! 😃
This variation around the mean/regression line that you speak of, is that the mean squared error?
It's related: stats.stackexchange.com/questions/140536/whats-the-difference-between-the-variance-and-the-mean-squared-error
not all heroes wear capes
:)
I'm repeating my question from the original video here:
4:21 I do not understand how this - var(blue line) - is calculated manually.
Thank you.
You may actually want to watch the whole linear regression playlist: ruclips.net/p/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU
@@statquest You replied so quickly. I will look at this, thank you!
Ty
:)
Hi Josh, can you also explain the F test?
Sure, see: ruclips.net/video/nk2CQITm_eo/видео.html and ruclips.net/video/NF5_btOaCig/видео.html
Hi, I see a lot of your Analytics videos are repeated. Are these refreshed with new info or simply repeated?
Do I need to watch both or just the newest one?
They are the same. For some strange reason, about a year ago some of my videos got stuck behind a paywall. So I re-uploaded all of the videos behind the paywall so that they would, once again, be available to everyone for free. It now seems that whatever freak event happened back then has become undone, so now I have 2 copies of a handful of videos.
DOUBLE BAM!!!
YES!
Thanks for the nice explanation. I wonder what the difference is between the R2 formulation you explained and this one: R2 = 1 - SSE / SST, where SSE is the sum of squared errors and SST is the sum of data variance.
There is no difference. One formula can be derived directly from the other.
Hi Sir
I am madly addicted to your WAY OF EXPLAINING
I personally owe you a lot
I love math, the way you quest it
recently I was researching DEA (as you surely know, data envelopment analysis)
I now know what it means and how to calculate it. I can even code it in pyomo and use it blindly ...
but
WHAT IS THE MAIN IDEA BEHIND DEA?
Clearly Explained...
searched the web
there is no remarkable article or video etc
I was thinking if you could make such genius video
I'm glad you like my videos and I'll keep that topic in mind.
How to apply it in multivariable linear regression? Calculate R^2 for each feature vs the dependent variable? Could it then be used as a feature selection method? Is that what is called Pearson correlation?
For multivariable linear regression, we are still comparing the model (the fit line) to the mean of the values on the y-axis. For more details, see: ruclips.net/video/nk2CQITm_eo/видео.html
I love Statquest videos however, this video had me confused. I tried to study R-Squared from other sources and they told me a different formula which was,
R squared = 1-(SSR/SST). Are there different kinds of R squared used in different situations?
It's the same formula, just written differently. However, you can do the algebra and show that they are equal to each other. See: en.wikipedia.org/wiki/Coefficient_of_determination
@@statquest Thanks. That's helpful. I will try that.
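The algebra mentioned in the reply above is short. In this sketch, SS(mean) and SS(line) are labels introduced here for the sums of squared differences around the mean and around the fitted line. SST in the commenters' formula corresponds to SS(mean), and the residual sum of squares, which one commenter calls SSE and another calls SSR, corresponds to SS(line).

```latex
R^2 = \frac{SS(\text{mean}) - SS(\text{line})}{SS(\text{mean})}
    = 1 - \frac{SS(\text{line})}{SS(\text{mean})}
    = 1 - \frac{SSE}{SST}
```

Note that some textbooks use SSR for the regression (explained) sum of squares instead, so it is worth checking which convention a given source follows.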
I have a question: in some cases I get an out of sample R squared which is negative, for example with multiple linear regression or even simple one-variable linear regression. Does that tell me the model is less capable of predicting the response compared to a simple mean? While in sample, is there no difference between the R squared of a simple linear regression and the square of Pearson's correlation between two variables?
I'm not sure I understand what you mean by "out of sample" and "in sample", but if you are calculating R^2 using data the model was not originally fit to, then it is possible to get negative values.
@@statquest ah I see!
I meant that sometimes I would fit a model on a training set, and among the metrics to evaluate its performance on a dev/test set I would use the R squared, occasionally obtaining negative values. But I see now that it's a pretty different scope compared to the one proposed in your video, since I'm not trying to measure how related two variables are, but rather trying to evaluate a model! Thank you for your reply btw!!
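A small sketch of how a negative R squared can show up on data the model was not fit to, as discussed in this exchange. The training and test values are made up so that the trend reverses in the test set, and the helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predictions):
    """1 - SS(predictions) / SS(mean), with the mean taken from ys itself."""
    mean_y = sum(ys) / len(ys)
    ss_mean = sum((y - mean_y) ** 2 for y in ys)
    ss_fit = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    return 1 - ss_fit / ss_mean


# Made-up training data with an upward trend.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [1.1, 1.9, 3.2, 3.8]
# Made-up test data where the trend reverses, so the fitted line does badly.
test_x = [5.0, 6.0, 7.0, 8.0]
test_y = [3.0, 2.5, 2.0, 1.5]

slope, intercept = fit_line(train_x, train_y)
train_r2 = r_squared(train_y, [intercept + slope * x for x in train_x])
test_r2 = r_squared(test_y, [intercept + slope * x for x in test_x])
print(train_r2, test_r2)
```

On the training data the least-squares line can do no worse than the mean, so R squared stays between 0 and 1. On the test data nothing guarantees that, and a line that predicts worse than the test set's own mean gives a negative value.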
Nice video, but is var(x) supposed to be the variation or the variance?
Variation and variance are often used interchangeably and, in this case, it's OK.
is this a repost Josh?
Yes. Something weird happened to the original and now it is behind a paywall. I contacted RUclips and they said there was nothing I could do about it, so I had to re-upload. Sorry for the trouble.
@@statquest On another note... what would you think of StatQuest en Español? (pum!, the most Spanish onomatopoeia for bam!) I could help with the translation
@@rubenestebangarciagomez7040 I think it would be great and it's a dream of mine that I want to come true. I've even been trying to learn Spanish on my own (but I'm a slow learner). For StatQuest, I've been using AI to create overdubs for my new videos and I think it is OK. If it's good enough, the cool thing is that it can be used for a ton of different languages.
@@statquest I'll try to contact you later. I'll even try to sing and play ukulele intros...
Thanks ! Ques: is R squared the % of y variance explained by X or explained by the model( regression equation) ?
It depends on the model. If the model only contains a single variable, X, then R-squared tells us the % of variance explained by the model, or X. Both are true. However, we can also calculate R-squared for models with many variables. For details, see: ruclips.net/video/nk2CQITm_eo/видео.html and ruclips.net/video/zITIFTsivN8/видео.html
I have a question: this R-squared is used to test the accuracy of our model, and it is also used to select the parameters for our model. It would be very helpful if you could make a video explaining how to create a full-fledged model with proper steps
See: ruclips.net/video/u1cc1r_Y7M0/видео.html and ruclips.net/video/hokALdIst8k/видео.html and ruclips.net/video/Hrr2anyK_5s/видео.html
@@statquest wow thanks, I hadn't seen the old videos, great ❤❤❤❤
How did he get the var(mean) of 32 and the var(line) 32? are they just points?
Var(mean) and var(line) are numbers calculated as sums of squared residuals. For example, for var(mean), you find the difference between the mean and every point, square those, and then sum them up. In the video, this comes out to 32. Similarly, for var(line), you find the difference between each point and the line, square those, and sum them.
You can also see: ruclips.net/video/SzZ6GpcfoQY/видео.html
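The steps described in this thread can be sketched in a few lines. The data, the line's slope and intercept, and the function names are all made up for illustration; they are not the values from the video.

```python
def variation_around_mean(ys):
    """Sum of squared differences between each point and the mean of ys."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def variation_around_line(xs, ys, slope, intercept):
    """Sum of squared differences between each point and the given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def r_squared(xs, ys, slope, intercept):
    """(var(mean) - var(line)) / var(mean), as described in the comments."""
    var_mean = variation_around_mean(ys)
    var_line = variation_around_line(xs, ys, slope, intercept)
    return (var_mean - var_line) / var_mean


# Hypothetical mouse sizes and weights.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [1.2, 1.9, 3.2, 3.8, 5.1]
# Assumed line for this made-up data (its least-squares fit).
slope, intercept = 0.97, 0.13

print(variation_around_mean(weights))
print(variation_around_line(sizes, weights, slope, intercept))
print(r_squared(sizes, weights, slope, intercept))
```

A horizontal line at the mean gives an R squared of 0, and a line that passes through every point gives 1.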
Nice
Thanks!
If I only know the angle between the two lines, Will I be able to find the R2 value? (Like Tan theta?)
No.
Cool !!
Thanks!
Please explain adjusted r square also
I describe adjusted R-squared in my video on linear regression, here: ruclips.net/video/nk2CQITm_eo/видео.html
why does this video only have the resolution of 360p?
It's super old, but people still watch it a lot.
r^2 = R^2 holds only for simple linear regression as far as I know, please correct me if I am wrong.
Yep. That's what this video was originally intended to explain - how R^2 relates to linear regression. That's why we compare the fitted straight line to a horizontal line at the mean.
@@statquest Thanks
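A quick numerical check of the point raised in this thread, assuming simple linear regression fit by least squares: the squared Pearson correlation between x and y should match the R squared of the fitted line. The data and function names are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared_of_fitted_line(xs, ys):
    """Fit a least-squares line, then compare it to the mean of ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_mean = sum((y - mean_y) ** 2 for y in ys)
    ss_line = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return (ss_mean - ss_line) / ss_mean


# Made-up data with a negative relationship, so r itself is negative.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [9.8, 8.1, 6.3, 3.9, 2.2]

r = pearson_r(xs, ys)
big_r2 = r_squared_of_fitted_line(xs, ys)
print(r, r ** 2, big_r2)
```

The sign of r carries the direction of the relationship, which R squared on its own does not.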
mate can u update the resolution please.
Unfortunately updating old videos is a lot harder than you would expect. :(
Can you make a video explaining ETA squared?
I'll keep that in mind.
The square of correlation coefficient (i.e., predicted and true values) is equal to "R squared" only in linear regression, and not in any other regression like decision tree regressor, support vector regressor, THIS is not mentioned in the video?
That is correct. When I made this video, way back in early 2015, I only had linear regression in mind.
I hate to be a smart ass but I think you are wrong, R^2 COULD BE NEGATIVE. A simple example is if you have a very bad regressor that is way too far away from all training points; then the variance could be very, very large, so variance of the mean minus variance of the model could be negative. The video here is very misleading.
You are correct. However, when I made this video I was thinking of R-squared only in the context of linear regression, and in that context, R^2 can't be negative. In that context, the worst your model can do is the mean of the y-axis variable.
He might be meaning the correlation coefficient, r
@@user-ff5sx6pg3d Slightly so but insignificant in practice. Not very misleading as you try to put it
The coefficient of determination (R²) could never be negative; if one squares a -ve number a positive is formed. Hence the reason R² is between 0 and 1.
So there's a 6% correlation between sniffing rocks and a mouse's weight? Lol
:)
I don't know how to say thank you enough
:)
Why is 4 months ago potato quality? Thank you so much for this.
What time point in the video, minutes and seconds, are you asking about?
@@statquest apologies, it was my attempt at humour. I'm sure it's part of your earlier series that you've re-uploaded recently. The video is fantastic in content.
Time spent sniffing a rock 😂😂😂
bam! :)
BAM!
:)
Why on earth is this 360p
It's pretty old.
💚
:)
Noice 👍 Doice 👍 Ice 👍, ....wait, is this a re-upload?
Yes. Without telling me, RUclips put the original behind a paywall, so I re-uploaded it so it would still be free.
@@statquest oofty doof oof oof, Noice 👍 Thanks 👍
This is a re-upload from 8 years ago.
Yep. For some reason the original ended up behind a paywall, so I had to re-upload it.