Multiple Linear Regression with R | 2. Data Preparation
- Published: 26 Apr 2020
- Multiple Linear Regression with R
R and data files: github.com/bkrai/R-files-from...
Previous video: Introductory Concepts
Next video: Model
Time-Series videos: goo.gl/FLztxt
Machine Learning videos: goo.gl/WHHqWP
Becoming Data Scientist: goo.gl/JWyyQc
Introductory R Videos: goo.gl/NZ55SJ
Deep Learning with TensorFlow: goo.gl/5VtSuC
Image Analysis & Classification: goo.gl/Md3fMi
Text mining: goo.gl/7FJGmd
Data Visualization: goo.gl/Q7Q2A8
Playlist: goo.gl/iwbhnE
R is a free software environment for statistical computing and graphics, and it is widely used in both academia and industry. R runs on Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a popular, user-friendly environment for R.
I got to know something more about data cleansing, thank you sir!!
Welcome!
Thanks for your work!
Welcome!
Great job! Thanks!
Welcome!
Great work. Congratulations.
Thank you! Cheers!
I have learnt a lot from your videos. Thank you
You are welcome!
Thank you, Dr. Rai
Welcome!
Hi,
Many thanks, very clear as usual.
I have one suggestion about replacing the zeros with the mean:
vehicle$lh[vehicle$lh==0]
That's even better!
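The suggested line is cut off in this copy of the thread. Below is a minimal sketch of one common way to finish the idea. It assumes that a 0 in `lh` means "not recorded" and replaces each 0 with the mean of the non-zero values. The small `vehicle` data frame is hypothetical and stands in for the video's data, and this is not necessarily the commenter's exact code.

```r
# Hypothetical data frame standing in for the vehicle data used in the video
vehicle <- data.frame(lh = c(10, 0, 30, 0, 20))

# Assumption: a value of 0 in lh means "not recorded", so replace it with the
# mean of the non-zero (observed) values
vehicle$lh[vehicle$lh == 0] <- mean(vehicle$lh[vehicle$lh != 0])

vehicle$lh  # 10 20 30 20 20
```

The mean is taken over the non-zero values only. Using `mean(vehicle$lh)` over all rows would let the zeros pull the imputed value down.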
Thanks a lot, Dr. You have made the R language very easy for me to read. I have a question, Dr. I analysed income data that showed positive skew, so I did not fit a normal distribution to it. How can I fit a distribution, given that I want an interval estimate of population income from sample data? I think an exponential distribution might fit the data, but how do I fit it in R?
Thanks Dr
You can try a log transformation. I would also suggest trying this:
ruclips.net/video/_3xMSbIde2I/видео.html
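The sketch below compares the two ideas raised above on a simulated, hypothetical right-skewed income sample. It is not the video's method.

- **Option 1** applies the log transformation, builds a t-interval on the log scale, and back-transforms it. The result estimates the geometric mean, which is the median if income is log-normal, not the arithmetic mean.
- **Option 2** assumes an exponential distribution with mean θ. In that case 2·n·x̄/θ follows a chi-squared distribution with 2n degrees of freedom, which gives an exact interval for θ.

For a maximum-likelihood fit of the exponential, `MASS::fitdistr(income, "exponential")` returns the estimated rate.

```r
# Hypothetical right-skewed income sample; replace with the real data
set.seed(1)
income <- rlnorm(200, meanlog = 10, sdlog = 0.8)
n <- length(income)

# Option 1: log transformation, t-interval on the log scale, then back-transform.
# This interval is for the geometric mean (median under a log-normal model).
ci_log <- exp(t.test(log(income))$conf.int)

# Option 2: exponential model with mean theta.
# 2 * n * mean(income) / theta ~ chi-squared with 2n degrees of freedom,
# which gives an exact 95% interval for theta.
xbar   <- mean(income)
ci_exp <- c(2 * n * xbar / qchisq(0.975, df = 2 * n),
            2 * n * xbar / qchisq(0.025, df = 2 * n))

ci_log
ci_exp
```

The exponential interval is only trustworthy if the exponential model actually fits. A Q-Q plot of the data against the fitted distribution is a quick check before relying on it.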
Thanks Sir
Welcome
Thank you, sir. Please post videos on multiclass classification.
You can refer to this:
ruclips.net/p/PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG
Thank you very much, but I really need your assistance.
Let me know your question.
Sir, your videos have been very helpful for self-learning R. Always very clear. Thank you so much!
Could you please tell me whether there is a method to analyze and interpret how well our model works on testing data? Can we compare the mean of the outcomes predicted by the model with the mean of the original outcomes in the testing data, using a t-test?
You can plot actual versus predicted values on the test data and compute R-sq.
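The reply above can be sketched as follows, using the built-in `mtcars` data as a stand-in. The 70/30 split, the seed, and the predictors are all illustrative assumptions. Test-set R-sq is computed here as the squared correlation between actual and predicted values. `1 - SSE/SST` is another common definition, and it can be negative out of sample.

```r
# Illustrative train/test split of the built-in mtcars data
set.seed(123)
idx   <- sample(nrow(mtcars), size = round(0.7 * nrow(mtcars)))
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

model <- lm(mpg ~ wt + hp + disp, data = train)
pred  <- predict(model, newdata = test)

# Actual vs predicted on the test set; points near the dashed 45-degree line
# indicate good agreement
plot(test$mpg, pred, xlab = "Actual mpg", ylab = "Predicted mpg")
abline(0, 1, lty = 2)

# Test-set R-squared as the squared correlation between actual and predicted
r2_test <- cor(test$mpg, pred)^2
r2_test
```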
@bkrai Thank you very much, Sir!
Is there a cut-off R-sq value that is required for good agreement? I have read "R-sq value
Instead of a cut-off, you can use it as a benchmark. Let's say you run a model and get an R-sq of 0.65. Then you make changes to the model and get an R-sq of 0.74. Now you know that the changes to the model are improving it.
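The benchmark idea can be sketched by scoring two model versions on the same held-out data. The `mtcars` data, the split, and both formulas are hypothetical choices for illustration. Keeping the split fixed means that any change in test R-sq comes from the model change, not from a different sample.

```r
# Fixed, illustrative train/test split so both models are judged on the same data
set.seed(123)
idx   <- sample(nrow(mtcars), size = round(0.7 * nrow(mtcars)))
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

# Fit on train, return squared correlation of actual vs predicted mpg on test
test_r2 <- function(formula) {
  fit  <- lm(formula, data = train)
  pred <- predict(fit, newdata = test)
  cor(test$mpg, pred)^2
}

r2_base    <- test_r2(mpg ~ wt)       # first model
r2_changed <- test_r2(mpg ~ wt + hp)  # model after a change
c(base = r2_base, changed = r2_changed)
```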
@bkrai Thank you very much, Sir!
Welcome!
Great work Dr. Do you mind putting together some videos on how to analyze Liberty scale data in R? Thanks in advance
I meant Likert scale data, sorry about that
Great suggestion! I've added it to my list.
Thank you for these videos; I really benefit from them. Can I ask a question? I was going through an example on Kaggle, and the author used the dummyVars function. Could you explain how it works when applied to a dataset? Again, I really appreciate these lessons.
Thanks! Do you remember what method they were using?
@bkrai I'm sorry, I should've included the link to the sample code in my initial question: www.kaggle.com/virosky/the-only-way-to-handle-missing-values/notebook
I am not sure what the function does when applied to a data frame, as in the example I am referring to. The code that uses dummyVars is near the end of the "exploratory data analysis" section of the notebook I linked. Thank you for the reply.
They used xgboost. It is one of the must-know methods in the top-10 playlist linked below:
ruclips.net/p/PL34t5iLfZddsQ0NzMFszGduj3jE8UFm4O
@bkrai Thank you very much, Sir
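On what dummyVars does: in the caret package, `dummyVars(~ ., data = df)` followed by `predict(dv, newdata = df)` returns a numeric matrix. Each factor column is expanded into one 0/1 indicator column per level, and numeric columns pass through unchanged. The sketch below shows the same idea with base R's `model.matrix()` instead of caret, on a hypothetical data frame with one factor.

Two differences are worth noting:

- Column names differ: caret uses a separator such as `color.blue`, while `model.matrix()` produces `colorblue`.
- `model.matrix()` keeps every level only for the first factor when the intercept is removed, whereas dummyVars keeps all levels of every factor by default.

```r
# Hypothetical data frame with one numeric and one factor column
df <- data.frame(
  size  = c(1.2, 3.4, 2.2),
  color = factor(c("red", "blue", "red"))
)

# "- 1" drops the intercept, so the factor gets one indicator column per level
# (blue, red) instead of dropping a reference level
X <- model.matrix(~ . - 1, data = df)
X
```

Each row of `X` has a 1 in the column for that row's color and 0 elsewhere. Models such as xgboost need this kind of all-numeric input.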
Sir, in a multiple regression model, should we keep only the significant independent variables and then run the other tests (BP, DW, ad.test, BG, VIF, etc.) to check that the linear model is good, or should we include all the variables, both significant and non-significant, in the further steps?
Please help.❤️
I would suggest checking for multicollinearity before removing non-significant variables.
@bkrai Thank you so much ❤️.
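A minimal sketch of checking multicollinearity on the full model, before any non-significant variable is dropped, follows. It uses the built-in `mtcars` data with an arbitrary set of predictors. The VIF for predictor j is 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. The `car` package's `vif()` is the usual shortcut and gives the same values for numeric predictors. It is computed by hand here so the example needs only base R.

```r
# Full model with all candidate predictors (illustrative choice of variables)
model <- lm(mpg ~ wt + hp + disp + drat, data = mtcars)
predictors <- attr(terms(model), "term.labels")

# VIF_j = 1 / (1 - R^2_j), regressing each predictor on all the others
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})
round(vif_manual, 2)
```

Large VIFs (rules of thumb often cite 5 or 10) indicate that a predictor is largely explained by the others. In that case its non-significance may reflect shared information rather than no effect.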
Can we use the same multiple linear regression procedure on time-series data to find the factors affecting a dependent variable? I have converted all the raw data into differences, i.e. present value minus past value, to remove the autocorrelation that occurs in time series. The model variables are now:
Change in production - dependent variable
Change in rainfall - independent variable
Change in temperature - independent variable
Change in area - independent variable
The data cover 17 years. Am I on the right track? I followed the same approach in R when I built a multiple regression model for cross-sectional data.
You need time-series models with regressors:
ruclips.net/p/PL34t5iLfZdduRvHafEKM6vrDmfnlUfzAy
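The differencing approach described in the question can be sketched as follows, using simulated, hypothetical 17-year series in place of the real production, rainfall, temperature, and area data. Differencing 17 yearly values leaves 16 observations. Checking the residual autocorrelation afterwards shows whether differencing was enough. If it was not, a time-series model with regressors, as suggested in the reply, is the next step. For example, base R's `arima()` accepts regressors through its `xreg` argument.

```r
# Hypothetical 17-year series; replace with the real data
set.seed(42)
rainfall    <- 800 + cumsum(rnorm(17, 0, 30))
temperature <- 25 + cumsum(rnorm(17, 0, 0.3))
area        <- 100 + cumsum(rnorm(17, 0, 2))
production  <- 50 + 0.05 * rainfall - 1.5 * temperature + 0.8 * area + rnorm(17, 0, 2)

# First differences: value in year t minus value in year t-1 (16 rows remain)
d <- data.frame(
  d_production  = diff(production),
  d_rainfall    = diff(rainfall),
  d_temperature = diff(temperature),
  d_area        = diff(area)
)

fit <- lm(d_production ~ d_rainfall + d_temperature + d_area, data = d)
summary(fit)

# Check whether autocorrelation remains in the residuals after differencing
acf(residuals(fit))
```

With only 16 differenced observations and three regressors, the coefficient estimates will be imprecise, so interpret significance with caution.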
Can we get the R files, sir?
Added a link in the description.
Whoever reads this, you'll be successful one day. Let's help grow this channel together for the future🤑❤
Thanks for your comments!