Multiple Linear Regression with R | 2. Data Preparation

  • Published: 26 Apr 2020
  • Multiple Linear Regression with R
    R and data files: github.com/bkrai/R-files-from...
    Previous video: Introductory Concepts
    Next video: Model
    Time-Series videos: goo.gl/FLztxt
    Machine Learning videos: goo.gl/WHHqWP
    Becoming Data Scientist: goo.gl/JWyyQc
    Introductory R Videos: goo.gl/NZ55SJ
    Deep Learning with TensorFlow: goo.gl/5VtSuC
    Image Analysis & Classification: goo.gl/Md3fMi
    Text mining: goo.gl/7FJGmd
    Data Visualization: goo.gl/Q7Q2A8
    Playlist: goo.gl/iwbhnE
    R is a free software environment for statistical computing and graphics, and is widely used by both academia and industry. R software works on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a user-friendly environment for R that has become popular.

Comments • 45

  • @ArpitSingh-dz7gt (4 years ago, +1)

    I got to know something more about data cleansing, thank you sir!!

    • @bkrai (4 years ago)

      Welcome!

  • @84quaker (4 years ago, +1)

    Thanks for your work!

    • @bkrai (4 years ago)

      Welcome!

  • @geetakhatri6049 (4 years ago, +1)

    Great job! Thanks!

    • @bkrai (4 years ago)

      Welcome!

  • @smsm314 (4 years ago, +1)

    Great work. Congratulations.

    • @bkrai (4 years ago)

      Thank you! Cheers!

  • @faizahmedmughal9550 (4 years ago, +1)

    I have learnt a lot from your videos. Thank you

    • @bkrai (4 years ago)

      You are welcome!

  • @stelluspereira (4 years ago, +1)

    Thank you, Dr. Rai

    • @bkrai (4 years ago)

      Welcome!

  • @nicolassimon9340 (4 years ago, +1)

    Hi,
    Many thanks, very clear as usual.
    I have one suggestion about replacing zeros with the mean:
    vehicle$lh[vehicle$lh==0]

    • @bkrai (4 years ago)

      That's even better!
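A minimal sketch of mean imputation in R, assuming (as the comment implies) that zeros in `lh` stand for missing values. The toy `vehicle` data frame is hypothetical, and the truncated one-liner in the comment may differ in detail. The mean is taken over the non-zero values only, so the zeros being replaced do not drag it down.

```r
# Hypothetical toy data; only the column name `lh` comes from the comment above
vehicle <- data.frame(lh = c(0, 10, 20, 0, 30))

# Treat zeros as missing: compute the mean of the non-zero values first,
# then overwrite every zero with that mean
lh_mean <- mean(vehicle$lh[vehicle$lh != 0])
vehicle$lh[vehicle$lh == 0] <- lh_mean

print(vehicle$lh)
```

If the missing values are stored as `NA` rather than 0, the same idea becomes `vehicle$lh[is.na(vehicle$lh)] <- mean(vehicle$lh, na.rm = TRUE)`.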

  • @mrishojohn7551 (4 years ago, +1)

    Thanks a lot, Dr. You have made the R language very easy to read for me. I have a question, Dr.: I analysed income data that showed positive skew, so I did not fit a normal distribution to it. What can I do to fit a suitable distribution? I want an interval estimate of the population income from sample data. I think an exponential distribution is suitable for the data, but how do I fit it in R?
    Thanks, Dr.

    • @bkrai (4 years ago)

      You can try a log transformation. I would also suggest trying this:
      ruclips.net/video/_3xMSbIde2I/видео.html
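A minimal sketch of both ideas in R, on simulated log-normal "income" data (hypothetical, not real): a log transformation with a back-transformed t-interval, and the maximum-likelihood exponential fit the question asks about, using `fitdistr()` from the MASS package. The 95% level and the sample size are arbitrary choices for illustration.

```r
library(MASS)  # recommended package shipped with R; provides fitdistr()

set.seed(1)
# Hypothetical positively skewed "income" sample (log-normal), not real data
income <- rlnorm(200, meanlog = 10, sdlog = 0.8)

# Option 1: log transformation
# Build a 95% t-interval on the log scale, then back-transform with exp().
# The result is an interval for the geometric mean (the median if the data
# are log-normal), not for the arithmetic mean.
log_ci <- t.test(log(income), conf.level = 0.95)$conf.int
geo_mean_ci <- exp(as.numeric(log_ci))

# Option 2: fit an exponential distribution by maximum likelihood
fit_exp  <- fitdistr(income, densfun = "exponential")
rate_hat <- unname(fit_exp$estimate)  # MLE of the rate is 1 / mean(income)
rate_se  <- unname(fit_exp$sd)

# Approximate 95% Wald interval for the rate; since the mean is 1 / rate,
# invert the bounds and reverse their order to get an interval for the mean
rate_ci <- rate_hat + c(-1, 1) * qnorm(0.975) * rate_se
mean_ci <- rev(1 / rate_ci)

print(geo_mean_ci)
print(mean_ci)
```

The exponential has a single parameter and forces the standard deviation to equal the mean, which is restrictive for income data; also fitting `fitdistr(income, "lognormal")` and comparing the two fits with `AIC()` is one way to check which describes the sample better.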

  • @irsadahamad4760 (4 years ago, +1)

    Thanks Sir

    • @bkrai (4 years ago)

      Welcome

  • @mohamedabdullah9061 (4 years ago, +1)

    Thank you, sir. Please make videos on multiclass classification.

    • @bkrai (4 years ago, +1)

      You can refer to this:
      ruclips.net/p/PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG

  • @emmanuelobeng4931 (4 years ago, +1)

    Thank you very much, but I really need your assistance.

    • @bkrai (4 years ago)

      Let me know your question.

  • @pshani3512 (4 years ago, +1)

    Sir, your videos have been very helpful for learning R on my own. Always very clear. Thank you so much!
    Could you please tell me whether there is a method to analyze and interpret how well our model works on testing data? Can we use a t-test to compare the mean of the outcome predicted by the model with the mean of the original outcome in the testing data?

    • @bkrai (4 years ago, +1)

      You can plot actual versus predicted values for the test data and compute R-sq.

    • @pshani3512 (4 years ago, +1)

      @bkrai Thank you very much, Sir!
      Is there a cut-off R-sq value required for good agreement? I have read "R-sq value

    • @bkrai (4 years ago, +1)

      Instead of a cut-off, you can use it as a benchmark. Let's say you run a model and get an R-sq of 0.65. Then you make changes to the model and get an R-sq of 0.74. Now you know that the changes to the model are yielding a positive outcome.

    • @pshani3512 (4 years ago, +1)

      @bkrai Thank you very much, Sir!

    • @bkrai (4 years ago)

      Welcome!
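A minimal sketch of the advice in this thread, in base R: fit on a training split, plot actual versus predicted values on the test split, compute a test-set R-sq, and compare it between an original and a modified model as a benchmark. The built-in `mtcars` data and both model formulas are stand-ins chosen for illustration, not the video's data or model.

```r
set.seed(123)
# mtcars (built into R) stands in for the video's data
idx   <- sample(nrow(mtcars), size = round(0.7 * nrow(mtcars)))
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

# Test-set R-squared: 1 - SSE / SST, with the test-set mean as the baseline
test_r2 <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

# Original model and a modified version to benchmark against it
m1 <- lm(mpg ~ wt, data = train)
m2 <- lm(mpg ~ wt + hp, data = train)

pred1 <- predict(m1, newdata = test)
pred2 <- predict(m2, newdata = test)

# Actual vs predicted on the test set; a good model hugs the dashed y = x line
plot(test$mpg, pred2, xlab = "Actual mpg (test)", ylab = "Predicted mpg")
abline(0, 1, lty = 2)

r2 <- c(original = test_r2(test$mpg, pred1), modified = test_r2(test$mpg, pred2))
print(round(r2, 3))
```

This form of R-sq can be negative when a model predicts worse than the test-set mean; the squared correlation `cor(test$mpg, pred2)^2` is another common definition. With only 10 test rows the comparison is noisy, so repeating it over several random splits or using cross-validation gives a steadier benchmark.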

  • @techpriest3931 (4 years ago, +1)

    Great work, Dr. Would you mind putting together some videos on how to analyze Liberty scale data in R? Thanks in advance

    • @techpriest3931 (4 years ago, +1)

      I meant Likert scale data, sorry about that

    • @bkrai (4 years ago, +1)

      Great suggestion! I've added it to my list.

  • @nishadseeraj7034 (4 years ago, +1)

    Thank you for these videos; I really benefit from them. Can I ask a question? I was going through an example on Kaggle and the author used the dummyVars function. Could you explain how it works when applied to a dataset? Again, I really appreciate these lessons.

    • @bkrai (4 years ago)

      Thanks! Do you remember what method they were using?

    • @nishadseeraj7034 (4 years ago, +1)

      @bkrai I'm sorry, I should have included the link to the sample code in my initial question: www.kaggle.com/virosky/the-only-way-to-handle-missing-values/notebook
      I am not sure what the function does when applied to a data frame, as in the example I am referring to. The code using the dummyVars function is towards the end of the "exploratory data analysis" section of the linked notebook. Thank you for the reply.

    • @bkrai (4 years ago, +1)

      They used xgboost. It is one of the must-know methods in the top-10 playlist linked below:
      ruclips.net/p/PL34t5iLfZddsQ0NzMFszGduj3jE8UFm4O

    • @nishadseeraj7034 (4 years ago)

      @bkrai Thank you very much, Sir
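For reference, `dummyVars()` comes from the caret package: `dummyVars(~ ., data = dat)` builds an encoding recipe, and calling `predict()` on that object with a data frame applies it, turning each factor into 0/1 indicator columns while leaving numeric columns alone. One likely reason for it in that notebook is that xgboost needs an all-numeric matrix. The sketch below reproduces the same one-hot encoding with base R's `model.matrix()` so it runs without extra packages; the toy `dat` data frame is hypothetical.

```r
# Hypothetical toy data frame with one numeric and two factor columns
dat <- data.frame(
  price = c(10, 12, 9, 15),
  state = factor(c("CA", "TX", "CA", "NY")),
  fuel  = factor(c("gas", "diesel", "gas", "gas"))
)

# Full one-hot encoding: one 0/1 column per factor level.
# contrasts(..., contrasts = FALSE) tells model.matrix() not to drop a
# reference level; numeric columns pass through unchanged.
factor_cols <- dat[sapply(dat, is.factor)]
one_hot <- model.matrix(
  ~ ., data = dat,
  contrasts.arg = lapply(factor_cols, contrasts, contrasts = FALSE)
)[, -1]  # drop the intercept column

print(one_hot)
```

By default caret names the columns `state.CA` and so on rather than `stateCA`. Its `fullRank = TRUE` option drops one reference level per factor, which corresponds to the plain `model.matrix(~ ., data = dat)[, -1]` call.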

  • @shuvhamdigitalacademy3228 (3 years ago, +1)

    Sir, in a multiple regression model, should we keep only the significant independent variables and then run other tests such as BP, DW, ad.test, BG, VIF, etc. to check that the linear model is good, or should we include all the variables, both significant and insignificant, in the further process?
    Please help.❤️

    • @bkrai (3 years ago, +1)

      I would suggest checking for multicollinearity before removing non-significant variables.

    • @shuvhamdigitalacademy3228 (3 years ago, +1)

      @bkrai Thank you so much ❤️.
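A minimal sketch of checking multicollinearity first, in base R: each predictor's variance inflation factor (VIF) is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. Collinear predictors inflate standard errors, so a variable can look non-significant only because it overlaps with another. The `mtcars` data and the chosen predictors are illustrative assumptions; `car::vif()` should give the same values for a model with only numeric predictors.

```r
# VIF for each predictor, computed from first principles:
# regress predictor j on all other predictors and take 1 / (1 - R_j^2)
vif_manual <- function(data, predictors) {
  sapply(predictors, function(v) {
    others <- setdiff(predictors, v)
    aux <- lm(reformulate(others, response = v), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

# mtcars stands in for your data; these predictors are an arbitrary choice
preds <- c("disp", "hp", "wt", "drat")
vifs  <- vif_manual(mtcars, preds)
print(round(vifs, 2))

# Fit the full model only after reviewing collinearity among the predictors
full <- lm(reformulate(preds, response = "mpg"), data = mtcars)
print(summary(full))
```

VIF values above roughly 5 or 10 are often cited as a warning sign; these are rules of thumb, not strict cut-offs.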

  • @shinerajukappil6295 (4 years ago, +1)

    Can we use the same multiple linear regression procedure for time-series data to find the factors affecting a dependent variable? I have converted all the raw data into differences, i.e. present value minus past value, to remove the autocorrelation that occurs in time series. The model variables are now:
    Change in production - dependent variable
    Change in rainfall - independent variable
    Change in temperature - independent variable
    Change in area - independent variable
    The data cover 17 years. Am I on the right track? I followed the same steps in R that I used to build the multiple regression model for cross-sectional data.

    • @bkrai (4 years ago)

      You need a time-series model with regressors:
      ruclips.net/p/PL34t5iLfZdduRvHafEKM6vrDmfnlUfzAy
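A minimal sketch of both approaches in base R, on simulated data (all values are hypothetical). The `lm()` on `diff()`ed series mirrors the commenter's approach, and a Ljung-Box test checks whether autocorrelation remains in its residuals. `arima()` with `xreg` fits a regression with ARIMA errors; `order = c(0, 1, 0)` applies the same first differencing to the response and the regressors, so apart from the intercept (which `lm()` includes and `arima()` omits when differencing) it matches the differenced regression. The linked playlist may use different functions; this sketch is not taken from it.

```r
set.seed(42)
n <- 17  # 17 yearly observations, matching the question

# Hypothetical random-walk regressors and a response built from them
rainfall    <- 800 + cumsum(rnorm(n, sd = 30))
temperature <- 25 + cumsum(rnorm(n, sd = 0.3))
area        <- 100 + cumsum(rnorm(n, sd = 2))
production  <- 0.05 * rainfall - 1.5 * temperature + 0.8 * area +
  cumsum(rnorm(n, sd = 1))

xreg <- cbind(rainfall, temperature, area)

# Approach from the question: OLS on first differences
diff_fit <- lm(diff(production) ~ diff(rainfall) + diff(temperature) + diff(area))
# Ljung-Box test for leftover autocorrelation in the residuals
print(Box.test(residuals(diff_fit), lag = 3, type = "Ljung-Box"))

# Regression with ARIMA(0,1,0) errors: the same differencing, fitted as a
# time-series model; raise p or q in `order` if residual autocorrelation remains
ts_fit <- arima(production, order = c(0, 1, 0), xreg = xreg)
print(ts_fit)
```

With 17 points and three regressors few degrees of freedom remain, so the coefficient estimates will be imprecise whichever approach is used.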

  • @lalithalalitha3178 (4 years ago, +1)

    Can we get the R files, Sir?

    • @bkrai (4 years ago)

      Added a link in the description.

  • @ebentee (4 years ago, +1)

    Whoever reads this: you'll be successful one day. Let's help grow this channel together for the future🤑❤

    • @bkrai (4 years ago)

      Thanks for your comments!