There are so many videos explaining the assumptions of linear regression, but no one was explaining how to check them. I had been searching for this for the last 3 months. Thank you, sir.
Thank you 🙏
Glad you found it helpful, Gowtham 😊. Please share it with others who could benefit as well.
@Unfold Data Science
Hey Aman, I have been your subscriber for the past 1.5 years, and I feel honoured to tell you that after following you I finally made a job transition to senior data scientist at an MNC 6 months back. Now I understand the data science project ecosystem in my company. You are one of the contributors to my success.
Thanks a ton!
I would also like to offer my help to learners. Learners, you can tag me with any doubts. I would be more than happy to help you.
Thanks Nikhil, your comments are precious.
@@UnfoldDataScience Thank you Aman!!
Join here for free to share your experience live with me
www.lighthall.co/class/6fdae050-85a4-48c4-9ecc-8dd2fa6ff175
You're the best bro! I understood this explanation of yours more than I understood anybody else's. I've also saved a copy of the notebook (from Google Drive) and imported it into my DataSpell IDE so I can easily refer to it whenever I want to check assumptions - I've heard that memorising syntax is not important as long as one understands the logic :). Much love from Nigeria. Subscribed and liked. Cheers!
Thanks, Goriola. Your words mean a lot to me. Keep learning.
Among all YouTubers, I would rank this guy #1 for insightfulness. What a gift for teaching!
Thanks a lot. Please share the channel with friends.
Well explained. It was very useful. Please continue uploading lectures.
Thank you Mayank, I will
The title of your channel could be "Unfolding the untold data science". Aman ji, you reach and teach what no one dares to teach or explain. Amazing job!
Thanks, Shekhar, your comments mean a lot.
Clean !!!
Thanks Asit
Can you please tell me: if we have large data, meaning more than 30 columns, will linear regression be good for training on that data or not?
Very intuitive video.
Thank you 🙂
When do we need to check these assumptions? After we do the Train Test Split and prediction, or before the start of Train Test Split?
You are excellent as always, sir.
Thanks for your positive feedback. Please share with others who could benefit from such content.
How can we make a Residuals vs. Leverage plot which displays the Cook's distance?
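One possible way to build this plot, as a minimal sketch rather than the method from the video: compute leverage, standardized residuals and Cook's distance from the usual OLS formulas with NumPy, then colour each point by its Cook's distance. The function names `influence_measures` and `plot_residuals_vs_leverage` and the demo data are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


def influence_measures(X, y):
    """Return leverage, standardized residuals and Cook's distance for an OLS fit.

    X is an (n, k) array of predictors WITHOUT an intercept column; y has length n.
    (Hypothetical helper, written for this example.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    p = A.shape[1]                                # number of fitted parameters
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # OLS coefficients
    resid = y - A @ beta
    # Leverage = diagonal of the hat matrix H = A @ pinv(A)
    leverage = np.einsum("ij,ji->i", A, np.linalg.pinv(A))
    mse = resid @ resid / (n - p)
    std_resid = resid / np.sqrt(mse * (1.0 - leverage))
    cooks_d = std_resid**2 * leverage / (p * (1.0 - leverage))
    return leverage, std_resid, cooks_d


def plot_residuals_vs_leverage(leverage, std_resid, cooks_d, ax=None):
    """Scatter standardized residuals against leverage, coloured by Cook's distance."""
    if ax is None:
        _, ax = plt.subplots()
    points = ax.scatter(leverage, std_resid, c=cooks_d, cmap="viridis")
    ax.figure.colorbar(points, ax=ax, label="Cook's distance")
    ax.axhline(0.0, color="grey", linewidth=1)
    ax.set_xlabel("Leverage")
    ax.set_ylabel("Standardized residuals")
    ax.set_title("Residuals vs. leverage")
    return ax


# Synthetic demo data (an assumption, not the dataset from the video)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=50)
X_demo[0] = [3.0, 3.0]   # give one point unusual predictor values
y_demo[0] += 8.0         # and an unusual target value

leverage, std_resid, cooks_d = influence_measures(X_demo, y_demo)
ax = plot_residuals_vs_leverage(leverage, std_resid, cooks_d)
```

In a script, call `plt.show()` afterwards to display the figure. If you already fit the model with statsmodels, the fitted results object offers `get_influence()`, and `sm.graphics.influence_plot` draws a similar chart, so the manual formulas above are mainly useful for seeing what is being plotted.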
Hi Aman, if we have encoded categorical variables as numerical ones (by count or frequency, by one-hot encoding, by top categories, or by any other means), do those columns also need to be transformed to a Gaussian distribution?
Good question. No, that is not needed, because the Gaussian distribution is primarily defined for continuous variables only.
Great video! I wanted to ask, in order to find whether there is linearity in the first 4 scatterplots, shouldn't one plot the line of best fit? Also, regardless of whether that's true or not, how would one plot the line of best fit?
"a, b = np.polyfit(X,Y,1)
plt.plot(X,a*X+b)" doesn't seem to do anything. Obviously I'm doing something wrong, I'm just not sure what.
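It is hard to say from the snippet alone what went wrong, but common causes are passing a 2-D object (such as a DataFrame) to `np.polyfit`, drawing the line without the scatter points, or never calling `plt.show()` in a script. A minimal sketch that avoids all three, with a hypothetical helper `plot_best_fit` and made-up demo values:

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_best_fit(x, y, ax=None):
    """Scatter x against y and overlay the least-squares straight line."""
    # np.polyfit expects 1-D x; flatten in case a column vector is passed
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    slope, intercept = np.polyfit(x, y, 1)   # degree-1 fit: y = slope * x + intercept
    if ax is None:
        _, ax = plt.subplots()
    ax.scatter(x, y, alpha=0.6, label="data")
    # Draw the line over a sorted grid so it renders as one clean segment
    x_line = np.linspace(x.min(), x.max(), 100)
    ax.plot(x_line, slope * x_line + intercept, color="red", label="best fit")
    ax.legend()
    return slope, intercept, ax


# Demo values chosen for illustration only
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = 2.0 * x_demo + 1.0
slope, intercept, ax = plot_best_fit(x_demo, y_demo)
```

In a plain Python script, add `plt.show()` at the end; in a notebook the figure normally appears when the cell finishes. The fitted line is only a visual aid: a curved scatter can still have a well-defined best-fit line, so the shape of the points around the line is what tells you about linearity.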
Sir, if the relation between a predictor variable and the target variable isn't linear, does it mean the variable does not follow the normality condition (is not normally distributed), and vice versa?
No, it does not mean that, in either direction. It's quite possible that a variable is not normally distributed yet has a linear relationship with the target variable, and vice versa.
Actually, what should be normal in linear regression? The training data (target variable) or only the residuals?
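Both ideally, but strictly the normality assumption applies to the residuals; the target variable itself does not have to be normally distributed.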
But we only get the residuals after creating the OLS regression model, and only then can we check whether they are normally distributed, right?
But we are saying the normality of residuals should be confirmed before making the regression model.
How do both statements fit together?
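One way to reconcile the two is to treat the check as a diagnostic loop: fit a first model, inspect its residuals, and only then decide whether to trust it or to transform and refit. A minimal sketch of that order of operations, using a hand-written Jarque-Bera test (swapped in here for illustration, not necessarily the test used in the video) and synthetic data:

```python
import numpy as np


def ols_residuals(x, y):
    """Fit y = b0 + b1 * x (+ more columns) by least squares and return the residuals."""
    X = np.asarray(x, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])     # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta


def jarque_bera(resid):
    """Jarque-Bera statistic and asymptotic p-value (chi-square with 2 df)."""
    r = resid - resid.mean()
    n = r.size
    var = np.mean(r**2)
    skew = np.mean(r**3) / var**1.5
    excess_kurt = np.mean(r**4) / var**2 - 3.0
    jb = n / 6.0 * (skew**2 + excess_kurt**2 / 4.0)
    p_value = np.exp(-jb / 2.0)   # survival function of chi-square(2) is exp(-x/2)
    return jb, p_value


# Synthetic data (an assumption): a linear signal plus normal noise
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=200)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=200)

resid = ols_residuals(x, y)      # step 1: fit first, residuals exist only now
jb, p_value = jarque_bera(resid) # step 2: test the residuals
print(f"JB = {jb:.3f}, p-value = {p_value:.3f}")
```

A small p-value suggests the residuals are not normal, in which case a transformation of the target or a different model can be tried and the check repeated. A Q-Q plot of `resid` is a useful visual companion to the test.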
Can you please explain errors and residuals? I am not able to get the concept clearly from websites.
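A residual is the difference between the actual value and the value predicted by the fitted model. An error is the difference between the actual value and the true (unknown) regression line, so it can never be observed directly; residuals are our estimates of the errors.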
One question, though not related to this video: at what point do we do the train-test split? BEFORE preprocessing like normalization, imputation, etc., or AFTER preprocessing?
Before. Then do the same preprocessing on both sets, using parameters (such as means and scaling factors) learned from the training set only.
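A minimal sketch of "split first, then preprocess", written with plain NumPy so every step is visible. The data, the 80/20 split and the helper name `preprocess` are assumptions for illustration:

```python
import numpy as np

# Synthetic feature matrix with roughly 10% missing values (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
X[rng.random(X.shape) < 0.1] = np.nan

# 1) Split FIRST, so the test rows cannot influence any preprocessing step
idx = rng.permutation(len(X))
n_test = 20
X_test, X_train = X[idx[:n_test]], X[idx[n_test:]]

# 2) Learn imputation and scaling parameters from the TRAINING rows only
impute_values = np.nanmean(X_train, axis=0)
X_train_filled = np.where(np.isnan(X_train), impute_values, X_train)
train_mean = X_train_filled.mean(axis=0)
train_std = X_train_filled.std(axis=0)


def preprocess(X_part):
    """Apply the training-set imputation and scaling to any split."""
    filled = np.where(np.isnan(X_part), impute_values, X_part)
    return (filled - train_mean) / train_std


# 3) Apply the SAME transformation to both sets
X_train_p = preprocess(X_train)
X_test_p = preprocess(X_test)
```

The training set ends up with mean 0 and standard deviation 1 per column, while the test set is only approximately standardized, which is expected. In scikit-learn the same pattern is usually written with `train_test_split`, `SimpleImputer` and `StandardScaler`: call `fit` on the training set only, then `transform` on both.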
@@UnfoldDataScience ok. Thanks for reply.
Hi Aman, can you please advise us on how to keep relations with different data science managers long-lasting on LinkedIn?
Normal human relationship things, nothing much.