Statistics with R (3) - Generalized, linear, and generalized least squares models (LM, GLM, GLS)

Поделиться
HTML-код
  • Опубликовано: 19 сен 2013
  • In this video, I show how how to implement linear models, generalized linear models and generalized least squares models in R. Using the "airquality" dataset, I show how to fit and interpret the models. The core of the session is the interpretation of partial slope coefficients in Poisson generalized linear models. Finally, I give an outlook on generalized additive models which will be covered in one of the next sessions.
    The R code used in this video is:
    airquality
    plot(Ozone~Wind,airquality)
    model1=lm(Ozone~Wind,airquality)
    plot(model1)
    coef(model1)
    (Intercept) Wind
    96.872895 -5.550923
    #predictions for Wind speeds of 19 and 20 mph:
    Ozone1=coef(model1)[1]+coef(model1)[2]*19
    Ozone2=coef(model1)[1]+coef(model1)[2]*20
    Ozone1
    Ozone2
    ##
    model2=glm(Ozone~Wind,airquality,family=poisson)
    coef(model2)
    (Intercept) Wind
    5.0795877 -0.1488753
    Ozone1.glm=exp(coef(model2)[1]+coef(model2)[2]*19)
    Ozone2.glm=exp(coef(model2)[1]+coef(model2)[2]*20)
    Ozone2.glm/Ozone1.glm
    0.8616765
    exp(coef(model2)[2]) #exp(-0.1488753 )
    #0.8616765
    ##
    library(nlme)
    model3=gls(Ozone~Wind,airquality)
    summary(airquality$Ozone)
    model3=gls(Ozone~Wind,airquality,na.action=na.exclude)
    head(airquality)
    airquality$Date=
    as.Date(paste(1973,airquality$Month,airquality$Day,sep="-"))
    library(lattice)
    xyplot(Ozone~Date,airquality)
    model4=gls(Ozone~Wind*Date,airquality,na.action=na.exclude)
    air2=subset(airquality,complete.cases(Ozone))
    model5=gls(Ozone~Wind*Date,air2)
    plot(ACF(model5,form=~Date),alpha=0.05)
    model6=update(model5,correlation=corAR1())
    library(MuMIn)
    AICc(model5,model6)
    df AICc
    model5 5 1099.40
    model6 6 1095.21
    summary(model6)

Комментарии • 95

  • @farhadwaseel9981
    @farhadwaseel9981 5 лет назад +2

    Thank You Dr, You have explained all fluently and smoothly...

  • @LuisGeorge
    @LuisGeorge 10 лет назад +2

    I think that this one of the best tutorial ever. Thanks a lot Herr Professor

  • @kirangadhave
    @kirangadhave 10 лет назад +2

    Great video! Thanks for posting this.

  • @ditke71
    @ditke71 10 лет назад +3

    You are an excellent teacher! thanks!

  • @angeld5093
    @angeld5093 7 лет назад +1

    Thank you so much for your tutorials! Very helpful

  • @angelamartin3884
    @angelamartin3884 4 года назад +1

    thanks a lot for your time and clear explanation is was priceless, I hope you can share more videos :) I'm looking forward to GLM part 2

  • @getinettafesetucho7999
    @getinettafesetucho7999 5 лет назад +2

    Honestly speaking, you are a wonderful teacher! You saved me a lot of effort, to be frank. Thank you!!

  • @latinhero1818
    @latinhero1818 4 года назад +1

    I wish I'd seen this video last month. Better late than never. Subscribed!

  • @melissad3390
    @melissad3390 8 лет назад +1

    you went on my favs, very good vid

  • @werzenable
    @werzenable 8 лет назад +2

    You are excellent! Keep up the good work!
    Greetings from Sweden!

  • @X_explorer
    @X_explorer 7 лет назад +1

    Cool video! I like observing nature, discover and decompress...

  • @jeffatennis
    @jeffatennis 6 лет назад +1

    terrific video...this is exactly what i needed!!

  • @gustavotenorio4201
    @gustavotenorio4201 8 лет назад +1

    Thank you very much! Not only for this, but also for the other videos! They helped me a lot! I think that would be a nice video if you use those really simple examples in simulations with Montecarlo...... that would be epic!

  • @getinettafesetucho7999
    @getinettafesetucho7999 5 лет назад +1

    Thank you for such a great job!

  • @binbinzhou6938
    @binbinzhou6938 6 лет назад +1

    Vielen Dank Christoph, Sie haben wirklich top erklärt! Mache weiter so!!!!!! ich bin big Fan von Ihnen!!!!

  • @user-ij7bq6rg9o
    @user-ij7bq6rg9o 9 лет назад +1

    very helpful, thank you so much for your nice tutorial.

  • @kvafsu225
    @kvafsu225 2 года назад

    Very good presentation. Thanks

  • @Ztube111
    @Ztube111 10 лет назад +1

    Wasn't aware of the As.Date object...great!

  • @hectortamez64
    @hectortamez64 8 лет назад

    Dude, i love you. I really do. Thank you very much.

  • @terrencevance6799
    @terrencevance6799 10 лет назад

    Excellent video, thank you

  • @Mazoke8989
    @Mazoke8989 8 лет назад +1

    THANK YOU VERY MUCH!

  • @AdrianaHernandez-bt7ys
    @AdrianaHernandez-bt7ys 7 лет назад +1

    Super didactic, thank you very much!

  • @HG1921
    @HG1921 10 лет назад +2

    Thank you so much. A nice, clear and excellent lecture on gls and lm. I have used and applied the lesson learned on my analysis. It came out wonderful!!
    I hope you may add some example videos on linear mixed models with lmer function from the lme4 package. I am sure you would give us a wonderful understanding how the Random effect structure would remove/minimize or handle spatial/temporal variations or differences as the gls does in the case of spatially/temporally correlated residual errors.

    • @ChristophScherber
      @ChristophScherber  10 лет назад +2

      Hi John,
      Thanks very much for your suggestion! I just added a first video on lme models. More complex models will be covered later.
      Best regards,
      Christoph

  • @moth11110
    @moth11110 9 лет назад +1

    Thank you :)

  • @RyanBenkeser
    @RyanBenkeser 9 лет назад +1

    Thank you very much

  • @songbyungsub
    @songbyungsub 9 лет назад +1

    thank you!

  • @weltallaaf
    @weltallaaf 9 лет назад +2

    Thanks for your interesting and helpful viedo.
    Maybe you could use typical R conventions for your code, to keep it a little more readable?

  • @rasmusstokkeland
    @rasmusstokkeland 10 лет назад +6

    If you could add chapters (in the video) so you could jump strait to e.g: GLM that would be great. Saves time when you just want to see GLM.
    Great video by the way!!!

    • @ChristophScherber
      @ChristophScherber  10 лет назад

      Thanks! I´ll try to implement this in the next videos.CheersChristoph

  • @25lobo
    @25lobo 4 года назад

    A quite helpfull video.

  • @Freidepfer
    @Freidepfer 10 лет назад +3

    Do you provide any tutorial where you take more predictors into consideration?
    E.g. working with 6-7 predictors which are influencing one "predicted" variable?
    Besides that:
    You really make great videos. Very explanatory and understandable (Like it that you also take paper and pen!)

  • @mdabdulhalim4422
    @mdabdulhalim4422 9 лет назад +1

    Great!

  • @getinettafesetucho7999
    @getinettafesetucho7999 5 лет назад

    Thank you!!!

  • @hamzanamouchi6380
    @hamzanamouchi6380 6 лет назад +1

    Thank you very much :)

  • @Ztube111
    @Ztube111 10 лет назад +1

    Any plans to model MLE / likelihood ratio models? Nice work!

  • @eceviteyduran5628
    @eceviteyduran5628 4 года назад

    Many thanks for your invaluable information and explanation about the regression modeling. Could you please provide a bit more information for us.
    Best Regards

  • @TheAmibusy
    @TheAmibusy 10 лет назад +1

    I code following in video but I can not get ACF plot . How to get it?

  • @reynoldokudzeto5129
    @reynoldokudzeto5129 8 лет назад

    Christoph, can please add more lectures on identifying non linearity when building a model and how to improve the models? Also do you have videos on DOEs and latent variable analysis (PCA etc)

  • @mikapylvanainen8169
    @mikapylvanainen8169 9 лет назад

    Thanks Christoph for great and understandable tutorial.
    Do you have a tutorial that gives guidance how to handle multicollinearity between explanatory variables in regression analysis?
    If I have understood right, VIF (Variance Inflation Factor) is used as an indicator of multicollinearity and it is somehow utilized
    also in R when handling milticollinearity in regression analysis.
    Thanks in advance for your advice.
    br. Mika

  • @kendramillard564
    @kendramillard564 8 лет назад +1

    Many thanks for the super useful tutorial. What did not like so much, that it could be misunderstood that the simple use of glm() instead of lm() already results in the better model, because you spent plenty of time explaining the issue of how the glm uses exponential functions. However, the superiority of the glm-fit is because of the family selection, as the output of lm() is equivalent to glm(,family="gaussian"). Which is were i stumbled. Because i have mathematically no real idea of what the link functions are actually doing. I just know how and when to apply those families. A tutorial on link functions would be really great :)

    • @ChristophScherber
      @ChristophScherber  8 лет назад

      +Kendra Millard Thanks for these comments, I´ll provide more details on GLMs in new videos. Best regards, Christoph

  • @GoldenspotDOTshop
    @GoldenspotDOTshop 6 лет назад

    Nice videos. I find a better approximation formula for lnv! which is used in universities.
    The formula they use is lnv! ≈ vlnv - v(If i remeber well). My semi-empiric formula is lnv! ≈ (v+1/2)lnv - v +1,08

  • @anettehlleland2269
    @anettehlleland2269 7 лет назад

    This is very helpful :)
    I have one question about the video, I've recently started using R, and dose not understand when you put "as.Date(paste(1973,airquality$Month,airquality$Day,sep="-"))" and said "depending on your system", how do i know what system I use?
    Thank you!

  • @Cupania
    @Cupania 10 лет назад +1

    Good, video. How would you then graph the GLM? Can you follow-up with a video showing this? Thank you!!!

    • @duckwantbreads
      @duckwantbreads 8 лет назад

      One easy way to add a Poisson glm would be to make a vector
      y=1:20
      (all the integers from 1:20) and then write
      predict1=exp(coef(model2)[1]+y*coef(model2)[2]
      and then write
      lines(predict1)
      to add the glm to your plot (you should get a curve if you've done it right). There's probably a better way but this is how I do it.

  • @petraschubert8220
    @petraschubert8220 6 лет назад +1

    Which package I need to install on R-Studio for Mac to run the GLS-models? I just cannot find the right one! These tutorials are great thank you a lot. You got some didactical talent!

  • @kamrantaherkhani2066
    @kamrantaherkhani2066 5 лет назад +1

    Hi, why the errors for the glm model is not e^error? ok I saw later but it was not there when the model was written, thanks

  • @prasunbhattacharjee8415
    @prasunbhattacharjee8415 8 лет назад

    is the model given by GLM linear?
    In lm model it was shown with plot in earlier video. Can we visualize something like that?
    Also your presentation is very nice.

  • @kendramillard564
    @kendramillard564 4 года назад

    This means, that once i introduce the wind*time interaction and correct for the autocorrelation, gls does not find a significant dependance between ozone and neither predictor, judged from the p-values given in summary(model6). Is this correct?

  • @fazlihaleem6603
    @fazlihaleem6603 6 лет назад

    I am running the following gls regression g.lm

  • @kosterix123
    @kosterix123 10 лет назад

    yt viewers have the pause button available, so i don't need to see your typing errors. The video can be compressed. Good tutorial anyway, keep it going!

  • @Jjzein
    @Jjzein 9 лет назад

    is there any problem if we save R console in work space?

  • @LordFrega
    @LordFrega 4 года назад

    Just one question: why did you choose the poisson family when using the glm model?

  • @mbanefomadukaife5958
    @mbanefomadukaife5958 8 лет назад

    Thank you so much, please how can i generate several samples from a multivariate normal population and compute the sample Mahalanobis squared distance for each of them using R?

    • @ChristophScherber
      @ChristophScherber  8 лет назад

      +Mbanefo Madukaife you use the mvrnorm function for this

    • @mbanefomadukaife5958
      @mbanefomadukaife5958 8 лет назад

      +Christoph Scherber Thanks for much. Any existing code for that which can be run in the mvrnorm?

  • @honey1761990
    @honey1761990 10 лет назад +1

    Hi, Thanks for such a nice tutorial :)
    I wish to ask one question related to Model 2 (Poisson). For example if we have two explanatory variables: one numerical and other categorical, then how we can estimate the value of Ozone?
    Ozone ~ Wind+Time # Time can be 'Day' or 'Night'
    Let's assume we get coefficients of Intercept as 2.3, Wind as 0.0065 and Time as 0.56.
    We wish to calculate for: Wind=200 & Time=Day. so, equation will be:
    Ozone_val=exp(coef(mod)[1]+coef(mod)[2]*0.0065+coef(mod)[3]* ...??
    Should we multiply by 0 or 1 with coef(mod)[3]? What should we should choose?
    Thanks & Best regards,
    Chaitanya

    • @ChristophScherber
      @ChristophScherber  10 лет назад +1

      This is usually a bit more complex to understand. If you have Wind+Time, separated by a "plus" sign, then you get the following estimates in the model summary:- The intercept for Wind=0 and Time="Day"- The difference in intercepts between "Day" and "Night"- the common slope for WindIt´s easiest to find out about these things using the "effects" package:require(effects)plot(allEffects(model)))Cheers,Christoph

    • @Pata_Negra
      @Pata_Negra 4 года назад

      @@ChristophScherber Very interesting package Christoph, thanks a lot for your video AND your comments :)
      You're an awesome teacher !

  • @tawilk
    @tawilk 9 лет назад

    at 16:40 roughly, wouldn't it just be e^b where b happens to be negative in this case since we see a decreasing/negative relation b/w wind and ozone.

    • @ChristophScherber
      @ChristophScherber  9 лет назад

      +tawilk In a generalized linear model, all coefficients are on the scale of the linear predictor. Thus, you transform:
      I: ln y= a + bx
      II: y = exp (a+bx) = exp(a) * exp (bx)
      a one-unit change in x will thus lead to a (exp b)-fold change in y.

  • @NeezaRamiah
    @NeezaRamiah 7 лет назад

    Do you know how to do a GLM on statistica?

  • @jeffatennis
    @jeffatennis 6 лет назад

    should the windspeed and ozone measurements be normalized prior to evaluating models?

    • @ChristophScherber
      @ChristophScherber  5 лет назад

      indeed it can be a good idea to scale variables to remove collinearity. But of course often the original scale of variables is interesting

  • @askanalytics7097
    @askanalytics7097 8 лет назад

    I had a question regarding Poisson Model that you performed on Ozone~Wind; While you used the model to control the negative predicted values of Y variable, the assumptions of Poisson model are not considered. The ozone content is not a discrete data which follows a poisson Distribution.

    • @ChristophScherber
      @ChristophScherber  8 лет назад

      +Ask Analytics You are right that actually the variable is likely negative binomially distributed - as can be seen e.g. from
      library(fitdistrplus)
      plot(fitdist(as.numeric(na.exclude(airquality$Ozone)),"nbinom"))
      While it is true that the ozone concentration in itself is not Poisson distributed, the generating measurement process clearly is a poisson process.
      However, for the sake of simplicity, we model it as a Poisson glm here. Further details on negative binomial models (and how they may be used to account for overdispersion) will be added later.
      Thank you very much for your comment!

  • @arnabroy7801
    @arnabroy7801 8 лет назад

    Hi Christoph Scherber, Your videos are really helpful. Thanks a ton for the same. However I am facing a problem.I am unable to install the package MuMin for using AICc function. I am using R studio and R 3.1.2. it will be of great help if you can guide me to resolve the problem. Thanks in advance......

    • @ChristophScherber
      @ChristophScherber  8 лет назад

      +Arnab Roy You may wish to write to the package maintainer if the problem is reproducible. MuMIn does interfer sometimes with other attached packages. Be sure to have the latest versions of R, RStudio and MuMIn. Best wishes, Christoph

  • @jonathanbruner3248
    @jonathanbruner3248 5 лет назад

    What data set did you use, and where can I find it?

    • @ChristophScherber
      @ChristophScherber  5 лет назад

      usually the datasets are loaded by using the data() command (the datasets are part of the base installation of R)

  • @liuyang2493
    @liuyang2493 7 лет назад +1

    Hi professor Scherber, thank you! I really enjoyed your video. Can you further show us how to implement negative binomial model if poisson model shows overdispersion?

  • @robertmarra883
    @robertmarra883 8 лет назад

    Dear Professor Scherber: Your videos are excellent! However, I'm wondering if you can point me in the right direction. I'm looking for a statistical method to compare modeled data against actual measured data, with the goal of calibrating the model. This isn't the place to go into the experimental details, but in brief, I'm trying to calibrate and validate a method my colleagues and I are developing for nondestructively estimating the amount of C stored in living trees. We used this method on a large sample of trees in a research forest, and want to compare estimates from our model to actual measurements obtained by felling the trees. Regressing the model estimates on the actual measurements produced a high correlation coefficient (~96%), a non-zero intercept less than 10% of the smallest model value, and a slope of nearly 1, and while this is nice to see, it doesn't really address the question of agreement between model estimate and measurement. Do you have any suggestions on how we might do this? One idea I have is to regress y-x against x, and then compare (how?) this against a y=x line.... but how to do this in R? I'm asking a lot...hope you can help! Thanks, Bob

    • @ChristophScherber
      @ChristophScherber  8 лет назад

      Dear Rob, I think a regression of the predicted y (from linear models) vs. the observed y (from felled trees) sounds reasonable; you could then test if the slope differs significantly from 1.

    • @robertmarra883
      @robertmarra883 8 лет назад

      Thanks! Since I wrote you, this is exactly what I've tried, and it's quite interesting. I also calculated prediction intervals on the regression of the observed values on the predicted values, to see how well the latter can predict the former. I also used the Bland-Altman Limits of Agreement analysis.

  • @memoonaawan1326
    @memoonaawan1326 8 лет назад

    Hy can you please help me for L-moments in r????

    • @ChristophScherber
      @ChristophScherber  8 лет назад

      Thanks, unfortunately I don´t have experience with L-moments. What do you need it for?

  • @MrSimwawa
    @MrSimwawa 9 лет назад

    I might be wrong but it seem to me that if
    log(y) = a + bx + error
    than, y = exp(a)*exp(bx)*exp(error)
    as opposed to y = exp(a)*exp(bx) + exp(error)
    Is there a reason why the error would be additive as you say in the video, and not multiplicative?

    • @ChristophScherber
      @ChristophScherber  9 лет назад +3

      simon venne The reason is that in a GLM the errors (e) are modeled separately from the linear predictor (a+bx). Mathematically, you are right; but the trick is that a GLM allows you to specify link function (for linear predictor) and variance functions (for errors) separately. Your equations are correct for a model where the response variable is transformed. In the case of a generalized linear model, things are a bit different, though.

  • @melissad3390
    @melissad3390 8 лет назад

    and you remind me of Stewie (the baby in Family Guy) which is awesome. :)

  • @pandumanne2518
    @pandumanne2518 6 лет назад

    were GAM and GLM are same??

  • @Miodiarioditrading200EMA
    @Miodiarioditrading200EMA 10 лет назад

    hallo Professor!, Ich verstehe nicht, warum eine Variable ist besser als der andere. Haben Sie jedes Video, wo ich weiter erklären?
    Entschuldigen Sie mein schlechtes Deutsch, ich bin Argentinien.
    Grüße

  • @brennanbeal2815
    @brennanbeal2815 4 года назад

    Sorry, I believe your interpretations are a bit... misleading. At 16:33, you write that "In a glm, an individual slope gives an estimate of the multiplicative change in the response variable.... ". This is not always the case. This is the case for the Poisson distribution because the default link is a log link. But, you could easily run a glm with a Gaussian distribution and identity link, which would have an additive interpretation. Further, it is best practices to indicate your link, even when the default is intended (learned that the hard way :) ).

  • @Exit42A
    @Exit42A 10 лет назад +1

    I thought this was a very good video, but you kind of hit the afterburners at the 2/3 mark.

    • @ChristophScherber
      @ChristophScherber  10 лет назад

      Hi Exit, I´d be glad to hear how this video can be improved. Cheers!

    • @ayoadeyemo5271
      @ayoadeyemo5271 9 лет назад

      Christoph Scherber Thank you very much for this video....it was really useful for me

  • @TheAmibusy
    @TheAmibusy 10 лет назад

    I code following in video but I can not get ACF plot . How to get it?

    • @ChristophScherber
      @ChristophScherber  10 лет назад

      Where exactly do you get an error message?

    • @ChristophScherber
      @ChristophScherber  9 лет назад

      ah, you need to replace the minus sign with a tilda sign (~).
      Happy new year to you!