If you like, please find our e-Book here: datatab.net/statistics-book 😎
Fantastic explanation. Presentation of the concept is excellent
Well explained. Thanks for including the diagnostics, which is by far the most important part and something not often covered in most of the videos.
Glad it was helpful! Regards Hannah
Presentation of the concept is excellent 👍.
Much appreciated 🎉
Thanks a lot 😊
Hello Ma'am, your teaching technique is really awesome. Please make a video lecture on "What if these linear regression assumptions get violated?"
you can ask chatgpt, can’t you?
Wow. Fantastic explanation.
Hi! Thank you for a great statistics program and wonderful tutorials.
One question and one statement:
- Why are two other important assumptions not addressed, namely the problem with outliers and the requirement of independence of residuals?
- Maybe it should be made clearer that the normality assumption refers to the residuals, which should be normally distributed around the predicted score of the dependent variable. It can be misunderstood as saying that the raw data should be normally distributed.
Hello Per, thank you for your feedback! Yes, that's right! Maybe we can make another video to explain that better! Regards, Hannah
@datatab The assumption of independence of errors is actually very important. Your video is misleading people by not covering it.
Thank you ma'am for such a simple explanation it really helped me
Glad to hear that! Many thanks! Regards Hannah
Good video
Very nice explanation 😄🙌
Thanks!
Hi, thanks for the video. Regarding the second assumption (residuals must be normally distributed): the histogram shows the distribution of the residuals, right? I didn't understand whether the points in the Q-Q plot are the residuals or the sample data.
Hello, thank you very much! Yes, you are right! The captions are not correct, it is the residuals in both cases! Regards, Hannah
@datatab Thanks for the reply. It's very common to see people checking the normality condition on the sample data and not on the residuals. I suppose it's a mistake. Instead, other people say: "Ok, it's not a "real assumption", but it is preferable that features are normally distributed, not only the residuals". Is there any truth behind this statement?
@retenim28 Hmm, normally your main assumption is that the residuals are normally distributed! I can't answer that off the top of my head, but maybe the residuals are always normally distributed if all variables are normally distributed, but I don't know that for sure!
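To make the question in this thread concrete, here is a minimal sketch, assuming NumPy is available. The simulated data, the seed, and the helper name `skewness` are illustrative assumptions, not anything from the video. It fits a line to data whose predictor is strongly skewed and then compares the skewness of the raw variables with the skewness of the residuals.

```python
import numpy as np

def skewness(values):
    """Sample skewness: close to 0 for a symmetric distribution such as the normal."""
    centered = values - values.mean()
    return float(np.mean(centered**3) / values.std()**3)

# Hypothetical data: a right-skewed predictor with normal errors around the line.
rng = np.random.default_rng(42)
n = 5000
x = rng.exponential(scale=1.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)

# Ordinary least squares fit of a straight line, then the residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

print(f"skewness of x:         {skewness(x):.2f}")
print(f"skewness of y:         {skewness(y):.2f}")
print(f"skewness of residuals: {skewness(residuals):.2f}")
```

In this setup neither x nor y looks normal, yet the residuals do, which suggests that checking the raw data instead of the residuals can be misleading in either direction.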
Nicely explained!!
Many Thanks : )
so wtf do i do if my data isnt linear? just show a graph saying its not linear, therefore i havent bothered to run any stats, and all this data and research is a waste of time?
Nope. Then you work out what kind of relationship your data follows and model that instead. There are a number of other relationships apart from linear. It might be quadratic or logarithmic.
It should be linear in the parameters, not in your data itself. For example, a quadratic y = A + Bx² is still linear. You just need to put x² on your x axis to show that. The parameters in this case are A and B. That's a common misconception not explained in the video. y = A + Bx + Cx² is also linear (it will be a multiple linear regression in this case). Even y = A + B cos(x) is linear, but y = A + B cos(Cx) is not linear anymore, because y is not linear in C. In general, for a function to be linear for a regression, it should have the form y = Af(x) + Bg(x) + Ch(x) + ..., where f, g and h contain no unknown parameters.
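The "linear in the parameters" idea can be sketched in a few lines, assuming NumPy is available. The function name `fit_linear_in_parameters`, the simulated data, and the true values A = 2 and B = 0.5 are illustrative assumptions. Each basis function becomes one column of the design matrix, and ordinary least squares does the rest.

```python
import numpy as np

def fit_linear_in_parameters(x, y, basis_functions):
    """Least-squares fit of y = c0*f0(x) + c1*f1(x) + ...; returns the coefficients."""
    # One column per basis function; the model is linear in the coefficients.
    design = np.column_stack([f(x) for f in basis_functions])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Hypothetical curved data generated from y = A + B*x^2 with A = 2.0 and B = 0.5.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 200)
y = 2.0 + 0.5 * x**2 + rng.normal(0.0, 0.1, x.size)

# Basis: a constant term and x squared. The data are curved in x, the model is still linear.
basis = [lambda t: np.ones_like(t), lambda t: t**2]
A_hat, B_hat = fit_linear_in_parameters(x, y, basis)

print(f"estimated A = {A_hat:.3f}, estimated B = {B_hat:.3f}")
```

Swapping the second basis function for `np.cos` would fit y = A + B cos(x) the same way. A model like y = A + B cos(Cx) cannot be written as such a design matrix, because C sits inside the function.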
Very helpful. Thank you so much.
Thanks for your feedback!!! Cheers Hannah & Mathias
Thanks so much. What about the assumption: independence of the observations?
Hmm, I have not read about it yet, but it could make sense! Maybe the result is then no longer normally distributed. In that case, the assumption of independent observations would be covered by the assumption of normally distributed errors. Regards, Hannah
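Independence is usually treated as its own assumption rather than as part of normality, and for data with a natural order (for example, measurements over time) one common check is the Durbin-Watson statistic on the residuals. The following is a minimal sketch, assuming NumPy is available. The helper names, the simulated data, and the autocorrelation of 0.8 are illustrative assumptions.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 for uncorrelated residuals, toward 0 for positive autocorrelation."""
    diffs = np.diff(residuals)
    return float(np.sum(diffs**2) / np.sum(residuals**2))

def ols_residuals(x, y):
    """Residuals from a straight-line least-squares fit."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (intercept + slope * x)

def ar1_noise(n, rho, rng):
    """Normally distributed errors where each value depends on the previous one."""
    noise = np.empty(n)
    noise[0] = rng.normal()
    for t in range(1, n):
        noise[t] = rho * noise[t - 1] + rng.normal()
    return noise

# Hypothetical ordered data: same line, once with independent and once with correlated errors.
rng = np.random.default_rng(1)
n = 500
x = np.arange(n, dtype=float)
y_independent = 3.0 + 0.2 * x + rng.normal(0.0, 1.0, n)
y_correlated = 3.0 + 0.2 * x + ar1_noise(n, rho=0.8, rng=rng)

dw_independent = durbin_watson(ols_residuals(x, y_independent))
dw_correlated = durbin_watson(ols_residuals(x, y_correlated))

print(f"Durbin-Watson, independent errors: {dw_independent:.2f}")
print(f"Durbin-Watson, correlated errors:  {dw_correlated:.2f}")
```

The correlated errors here are built from normal draws, so a normality check alone would not necessarily reveal the problem.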
Well explained!
Hey! Your videos are awesome! It would be great if you make more videos on Machine Learning concepts.
Many thanks! Yes we will try! Regards Hannah
Thanks so much.
Do you have some features open (free) for students (i.e. regression)?
Thank you ma'am, helped a lot
Thanks!!!
Thank you ma'am!
Thanks for the video, found it very helpful. Do we also have to ensure that there are no influential points in the data?
What do you mean by influential points? Personally, I haven't heard of influential points as a requirement, but I haven't looked that up in more detail either! Regards, Hannah and Mathias
@datatab By influential point I mean an outlier that greatly affects the slope of the regression line. I was just wondering what the rule of thumb regarding them is when it comes to regression. But fair play if you're not sure if they are part of regression assumptions.
@Gesuselsaviour Well, if the outliers are too large, then the error epsilon will probably no longer be normally distributed and thus the requirements are not met. But as is so often the case, there is unfortunately no cutoff value that tells you up to which point it is still fine and from which point it no longer is!
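One widely used diagnostic for influential points is Cook's distance, which combines how large a point's residual is with how much leverage it has. Below is a minimal sketch, assuming NumPy is available. The function name `cooks_distance` and the simulated data with one planted point are illustrative assumptions. Commonly quoted rules of thumb flag values above 1 or above 4/n, but these are conventions rather than strict limits.

```python
import numpy as np

def cooks_distance(x, y):
    """Cook's distance for every observation in a straight-line regression."""
    n = x.size
    design = np.column_stack([np.ones(n), x])   # intercept and slope columns
    p = design.shape[1]                         # number of fitted parameters
    hat = design @ np.linalg.inv(design.T @ design) @ design.T
    leverage = np.diag(hat)
    residuals = y - hat @ y
    mse = residuals @ residuals / (n - p)
    return residuals**2 / (p * mse) * leverage / (1.0 - leverage)**2

# Hypothetical data: 30 well-behaved points plus one far-away point off the line.
rng = np.random.default_rng(7)
x = np.linspace(0.0, 10.0, 30)
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.3, x.size)
x_all = np.append(x, 25.0)
y_all = np.append(y, 0.0)

distances = cooks_distance(x_all, y_all)
worst = int(np.argmax(distances))

print(f"most influential observation: index {worst}, Cook's distance {distances[worst]:.2f}")
```

A flagged point is a prompt to look at the data, not an automatic reason to delete it.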
Thank you very much!
You are welcome!
Fantastic. Thanks a lot
Thanks for your Feedback!
soooo good! tysm
Loved your accent
🙂
Marvelous
Why is it the case that you should square the determinant in order to check for linearity? If there would be a logistic correlation, it would still be significant if you squared the determinant, right? Or not?
Thanks u so much
Many thanks for your Feedback!
Regards, Hannah and Mathias
Some segments in the video are stamped not adjacent to each other
Hello, what do you mean by that?
Your content is very touching
I'm in love
: )
Wow! Thank you :-)
Thanks for your Feedback! Regards, Hannah