Hello Ali. Excellent question, and yes, it is cheating a little. We definitely feel better about it if we have a real-world motivation for our choice of transformation. For example, if the explanatory variable is thought to have a multiplicative effect on the response (a 1-unit increase in X corresponds to a c% increase in Y, say), then a logarithmic transformation of Y would be appropriate. But in practice, we sometimes simply play around until we find a reasonable transformation…
Thank you very much for this video! It was really fun to watch, especially seeing how the model improved as various log/sqrt transformations were being done!
Thank you, this explanation is great.
Very nice and helpful explanation. Many thanks for posting!
You are very welcome!
Very nice and helpful presentation
We are attempting to find a reasonable (preferably simple) model that fits the data well. If we are lucky, the assumptions of the simple linear regression model are met without a transformation. But often, the assumptions are not met on the original scale of measurement (e.g., there may be curvature, increasing variance, or non-normality). We may be able to find an appropriate transformation that results in the assumptions of the simple linear regression model being met to a satisfactory degree.
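A rough sketch of the "curvature on the original scale" situation described above. The data are made up and noise-free (not from the video), and `fit_line` and `residuals` are hypothetical helper names. The response is built so that sqrt(y) is exactly linear in x. A straight line fitted to y then leaves a curved residual pattern, while the same fit to sqrt(y) leaves essentially nothing.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def residuals(x, y):
    """Observed minus fitted values for the least-squares line."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Made-up data: sqrt(y) = 2 + 0.5*x exactly, so y itself is curved in x
x = list(range(11))
y = [(2 + 0.5 * xi) ** 2 for xi in x]

raw_resid = residuals(x, y)                            # original scale
sqrt_resid = residuals(x, [math.sqrt(v) for v in y])   # transformed scale

# Curvature shows up as residuals that are positive at both ends
# and negative in the middle on the original scale.
print([round(r, 2) for r in raw_resid])
print([round(r, 2) for r in sqrt_resid])
```

With real, noisy data the transformed residuals would not vanish. The usual check is whether the residual plot looks patternless after the transformation.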
Don't know why tf I'm paying 50k a year to study at a school with a dogshit prof when there are people on the internet who can explain everything 10 times better
Because that 50k is a good motivator to get shit done. I tried the whole self learning thing. A lot of self discipline is needed.
Hi, mr. JB.
First of all, thank you for this series of lectures. It's a true pleasure to hear you explain concepts of statistics. I have some questions. How come you can use a transformation to fit your data better? It feels a bit like cheating to show correlation between X and Y. How does this translate in the real world?
omg thank you!!!! you may be the person who is going to save my stats grade because my professor can't even explain properly😖😖😖😖😖😖😖😖, all i got was change the axis variable=actually SWAPPING the axes 🤣🤣🤣🤣🤣
After the sqrt(hardness) transformation you mentioned there is still some non-constant variance. How can you fix this?
Many fields use regression extensively. I know little about econometrics, and would not feel qualified to recommend a book. Cheers.
…It might be a little unsatisfying, but there are definitely far worse ways of cheating. At some point I'll add more video content on transformations. This video is a very basic introduction to the concept. Cheers.
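For the multiplicative case mentioned in the reply to Ali (a 1-unit increase in X corresponding to a c% increase in Y), here is a minimal sketch of why the log transformation fits. The data are made up and noise-free, and `fit_line` is a hypothetical helper, not anything from the video. The slope from regressing log(Y) on X converts back to the percentage change per unit of X.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

# Made-up data: each 1-unit increase in x multiplies y by 1.05 (a 5% increase)
x = list(range(1, 11))
y = [20.0 * 1.05 ** xi for xi in x]

# Fit log(y) = a + b*x; on the log scale the relationship is a straight line
a, b = fit_line(x, [math.log(yi) for yi in y])

# Convert the slope back to a percentage change in y per unit of x
pct_change = (math.exp(b) - 1) * 100
baseline = math.exp(a)  # value of y implied at x = 0

print(round(pct_change, 3), round(baseline, 3))
```

Here the fit recovers the 5% growth rate and the starting value of 20 because the data contain no noise. With real data these would be estimates.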
one of the best.
I noticed you said we can transform either the dependent or the independent variable. In the case of multiple regression, would it make sense, or be better, to transform the independent variables rather than the dependent variable?
Transform whatever is easier, and then rearrange the fitted equation to put y on one side when you're done. If you took the square root of y, then square both sides of the equation and you will have y back.
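The "square the whole equation" step above can be sketched like this, assuming a model of the form sqrt(y) = A + B*x. The data are made up and noise-free, and `fit_line` and `predict_y` are hypothetical names.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

# Made-up data where sqrt(y) = 2 + x exactly
x = [1, 2, 3, 4, 5, 6]
y = [9.0, 16.0, 25.0, 36.0, 49.0, 64.0]

# Fit on the transformed scale: sqrt(y) = A + B*x
A, B = fit_line(x, [math.sqrt(v) for v in y])

def predict_y(x_new):
    """Back-transform: square the fitted value of sqrt(y) to get y."""
    return (A + B * x_new) ** 2

pred = predict_y(7)
print(round(A, 6), round(B, 6), round(pred, 6))
```

The same idea applies to interval endpoints: compute them on the square-root scale, then square them.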
Hope this helps for your exam preparation lol
Hi, again!
Let's see if I get this straight. You can transform the data as you wish, as long as you invert the transformation once you have inferred something about your model. This leaves the question "Why do you transform?" unanswered. Is it to interpret the data more easily? And/or is it to get more certainty (e.g. a better confidence interval/prediction interval) when you want to say something about your data? Maybe it's all of the above?
Can we say this transformation is akin to the kernel trick commonly used in SVMs to make data linearly separable, as a condition otherwise needed for linear regression?
+jbstatistics If there is a model, i.e. Y = a + bX, and I know R² as well as R, and I want to transform the model to Y^(1/2) = A + BX, then how can I find A and B?
I hope my nosiness doesn't cause too much of a brain fuzz ;)
Many thanks in advance. By the way, can you recommend a good book on econometrics? As I understand it, that is the field that delves into linear regression the most.
//Ali
U da bes
Thanks!