| gamm4 {gamm4} | R Documentation |
Fits the specified generalized additive mixed model (GAMM) to
data, by a call to lmer in the normal errors identity link case, or by
a call to glmer otherwise. Smoothness selection is by REML in the gaussian
additive case and ML otherwise.
gamm4 is based on gamm from package mgcv, but uses lme4 rather than
nlme as the underlying fitting engine via a trick due to Fabian Scheipl.
gamm4 is more robust numerically than gamm, and by avoiding PQL gives better
performance for binary and low mean count data. Its main disadvantage is that it can only handle single
penalty smooths (i.e. not tensor product or adaptive) and there is
no facilty for nlme style correlation structures.
For fitting generalized additive models without random effects, gamm4 is much slower
than gam and has slightly worse MSE performance than gam
with REML smoothness selection.
To use this function effectively it helps to be quite familiar with the use of
gam and lmer.
gamm4(formula,random=NULL,family=gaussian(),data=list(),
subset=NULL,na.action,knots=NULL,...)
formula |
A GAM formula (see also formula.gam and gam.models).
This is like the formula for a glm except that smooth terms (s but not te)
can be added to the right hand side of the formula. Note that ids for smooths and fixed smoothing
parameters are not supported. |
random |
An optional formula specifying the random effects structure in lmer style.
See example below. |
family |
A family as used in a call to glm or gam. |
data |
A data frame or list containing the model response variable and
covariates required by the formula. By default the variables are taken
from environment(formula), typically the environment from
which gamm4 is called. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function which indicates what should happen when the data contain `NA's. The default is set by the `na.action' setting of `options', and is `na.fail' if that is unset. The ``factory-fresh'' default is `na.omit'. |
knots |
this is an optional list containing user specified knot values to be used for basis construction. Different terms can use different numbers of knots, unless they share a covariate. |
... |
further arguments for passing on e.g. to lmer |
A generalized additive mixed model is a generalized linear mixed model in which the linear predictor
depends linearly on unknown smooth functions of some of the covariates (`smooths' for short). gamm4 follows the approach taken
by package mgcv and represents the smooths using penalized regression spline type smoothers, of
moderate rank. For estimation purposes the penalized component of each smooth is treated as a random effect term,
while the unpenalized component is treated as fixed. The wiggliness penalty matrix for the smooth is in effect the
precision matrix when the smooth is treated as a random effect. Estimating the degree of smoothness of the term
amounts to estimating the variance parameter for the term.
gamm4 uses the same reparameterization trick employed by gamm to allow any single quadratic
penalty smoother to be used (see Wood, 2004, or 2006 for details). Given the reparameterization then Fabian Scheipl's
trick for getting lmer to fit a GAMM can be employed (see package amer). Estimation is by
Maximum Likelihood in the generalized case, and REML in the gaussian additive model case. gamm4 allows
the random effects specifiable with lmer to be combined with any number of any of the (single penalty) smooth
terms available in gam from package mgcv. Note that the model comparison on the basis of the (Laplace
approximate) log likelihood is possible with GAMMs fitted by gamm4.
As in gamm the smooth estimates are assumed to be of interest, and a covariance matrix is returned which
enables Bayesian credile intervals for the smooths to be constructed, which treat all the terms in random as random.
For details on how to condition smooths on factors, set up varying coefficient models, do signal regression or set up terms
involving linear functionals of smooths, see gam.models, but note that tensor product and adaptice smooths are
not available with gamm4.
Note that since the only multidimensional smooths available are isotropic, then you must think carefully about how to scale
variables that appear as arguments of such a smooth, if they are not naturally on the same scale. Fot example if you use a term
s(x,v) then it is important to ensure that x and v are scaled so that it is reasonable to assume that
s will have about the same amount of wiggliness per unit change in x as it has per unit change in v. Sometimes
it can be ok to rescale x and v to lie in the unit square: but the justification for doing this is much weaker than
its widespread use implies.
Returns a list with two items:
gam |
an object of class gam. At present this contains enough information to use
predict, plot, summary and print methods and vis.gam, from package mgcv
but not to use e.g. the anova method function to compare models. |
mer |
the fitted model object returned by lmer or glmer. Extra random and fixed
effect terms will appear relating to the estiamtion of the smooth terms. Note that unlike lme objects returned
by gamm, everything in this object always relates to the fitted model itself, and never to a PQL working
approximation: hence the usual methods of model comparison are entirely legitimate. |
If you don't need random effects in addition to the smooths, then gam is substantially faster, gives fewer convergence warnings, and slightly better MSE performance (based on simulations).
Models must contain at least one random effect: either a smooth with non-zero
smoothing parameter, or a random effect specified in argument random.
Note that the gam object part of the returned object is not complete in
the sense of having all the elements defined in gamObject and
does not inherit from glm: hence e.g. multi-model anova calls will not work.
Linked smoothing parameters, adaptive smoothing and tensor product smoothing are not supported.
This routine is obviously less well tested than gamm.
Simon N. Wood simon.wood@r-project.org
Bates D. and M. Maechler (2009). lme4: Linear mixed-effects models using S4 classes. http://CRAN.R-project.org/package=lme4
Scheipl, F. (2009) amer: Additive mixed models with lme4.http://CRAN.R-project.org/package=amer
Wood, S.N. (2004) Stable and efficient multiple smoothing parameter estimation for generalized additive models. Journal of the American Statistical Association. 99:673-686
Wood S.N. (2006) Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC Press.
For more GAMM references see gamm
http://www.maths.bath.ac.uk/~sw283/
gam, gamm, gam.models, lmer, predict.gam, plot.gam, summary.gam, s, vis.gam
#################################
## A simple additive model...
#################################
library(gamm4)
set.seed(0)
dat <- gamSim(1,n=400,scale=2) ## simulate data
b <- gamm(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
br <- gamm4(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
plot(b$gam,pages=1)
plot(br$gam,pages=1)
summary(b$gam) ## gam style summary of fitted model
summary(br$gam) ## same from `gamm4'
summary(br$mer) ## underlying mixed model
anova(br$gam)
rm(dat)
#################################################################
## Add a factor to the linear predictor, to be modelled as random
## and make response Poisson. Again compare `gamm' and `gamm4'
#################################################################
dat <- gamSim(6,n=400,scale=.2,dist="poisson")
b2<-gamm(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,
data=dat,random=list(fac=~1))
b2r<-gamm4(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,
data=dat,random = ~ (1|fac))
plot(b2$gam,pages=1)
plot(b2r$gam,pages=1)
rm(dat)
vis.gam(b2r$gam)
##################################
# Multivariate varying coefficient
# With crossed and nested random
# effects.
##################################
## Start by simulating data...
f0 <- function(x, z, sx = 0.3, sz = 0.4) {
(pi^sx * sz) * (1.2 * exp(-(x - 0.2)^2/sx^2 - (z -
0.3)^2/sz^2) + 0.8 * exp(-(x - 0.7)^2/sx^2 -
(z - 0.8)^2/sz^2))
}
f1 <- function(x2) 2 * sin(pi * x2)
f2 <- function(x2) exp(2 * x2) - 3.75887
f3 <- function (x2) 0.2 * x2^11 * (10 * (1 - x2))^6 + 10 * (10 * x2)^3 *
(1 - x2)^10
n <- 1000
## first set up a continuous-within-group effect...
g <- factor(sample(1:50,n,replace=TRUE)) ## grouping factor
x <- runif(n) ## continuous covariate
X <- model.matrix(~g-1)
mu <- X%*%rnorm(50)*.5 + (x*X)%*%rnorm(50)
## now add nested factors...
a <- factor(rep(1:20,rep(50,20)))
b <- factor(rep(rep(1:25,rep(2,25)),rep(20,50)))
Xa <- model.matrix(~a-1)
Xb <- model.matrix(~a/b-a-1)
mu <- mu + Xa%*%rnorm(20) + Xb%*%rnorm(500)*.5
## finally simulate the smooth terms
v <- runif(n);w <- runif(n);z <- runif(n)
r <- runif(n)
mu <- mu + f0(v,w)*z*10 + f3(r)
y <- mu + rnorm(n)*2 ## response data
## First compare gamm and gamm4 on a reduced model
br <- gamm4(y ~ s(v,w,by=z) + s(r,k=20,bs="cr"),random = ~ (1|a/b))
ba <- gamm(y ~ s(v,w,by=z) + s(r,k=20,bs="cr"),random = list(a=~1,b=~1),method="REML")
par(mfrow=c(2,2))
plot(br$gam);plot(ba$gam)
## now fit the full model
br <- gamm4(y ~ s(v,w,by=z) + s(r,k=20,bs="cr"),random = ~ (x+0|g) + (1|g) + (1|a/b))
br$mer
br$gam
plot(br$gam)
## try a Poisson example, based on the same linear predictor...
lp <- mu/5
y <- rpois(exp(lp),exp(lp)) ## simulated response
## again compare gamm and gamm4 on reduced model
br <- gamm4(y ~ s(v,w,by=z) + s(r,k=20,bs="cr"),family=poisson,random = ~ (1|a/b))
ba <- gamm(y ~ s(v,w,by=z) + s(r,k=20,bs="cr"),family=poisson,random = list(a=~1,b=~1))
par(mfrow=c(2,2))
plot(br$gam);plot(ba$gam)
## and now fit full version...
br <- gamm4(y ~ s(v,w,by=z) + s(r,k=20,bs="cr"),family=poisson,random = ~ (x|g) + (1|a/b))
br$mer
br$gam
plot(br$gam)
####################################
# Different smooths of x2 depending
# on factor `fac'...
####################################
dat <- gamSim(4)
br <- gamm4(y ~ fac+s(x2,by=fac)+s(x0),data=dat)
plot(br$gam,pages=1)
summary(b)
####################################
# Timing comparison with `gam'... #
####################################
dat <- gamSim(1,n=600,dist="binary",scale=.33)
system.time(lr.fit0 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),
family=binomial,data=dat,method="ML"))
system.time(lr.fit <- gamm4(y~s(x0)+s(x1)+s(x2)+s(x3),
family=binomial,data=dat))
lr.fit0;lr.fit$gam
cor(fitted(lr.fit0),fitted(lr.fit$gam))
## plot model components with truth overlaid in red
op <- par(mfrow=c(2,2))
fn <- c("f0","f1","f2","f3");xn <- c("x0","x1","x2","x3")
for (k in 1:4) {
plot(lr.fit$gam,select=k)
ff <- dat[[fn[k]]];xx <- dat[[xn[k]]]
ind <- sort.int(xx,index.return=TRUE)$ix
lines(xx[ind],(ff-mean(ff))[ind]*.33,col=2)
}
par(op)
####################################
# Simple random effects comparison
# with `gam'
####################################
dat <- gamSim(1,n=400,scale=2) ## simulate 4 term additive truth
## Now add some random effects to the simulation. Response is
## grouped into one of 20 groups by `fac' and each groups has a
## random effect added....
fac <- as.factor(sample(1:20,400,replace=TRUE))
dat$X <- model.matrix(~fac-1)
b <- rnorm(20)*.5
dat$y <- dat$y + dat$X%*%b
## now fit appropriate random effect model using `gam'...
PP <- list(X=list(rank=20,diag(20)))
rm <- gam(y~ X+s(x0)+s(x1)+s(x2)+s(x3),data=dat,
paraPen=PP,method="REML")
plot(rm,pages=1)
## Get estimated random effects standard deviation...
sig.b <- sqrt(rm$sig2/rm$sp[1]);sig.b^2
## Now do same thing with gamm4, rather more simply
rmr <- gamm4(y~ s(x0)+s(x1)+s(x2)+s(x3),data=dat,
random = ~ (1|fac))
plot(rmr$gam,pages=1)
rmr$mer
######################################
## A "signal" regression example, in
## which a univariate response depends
## on functional predictors.
######################################
## simulate data first....
rf <- function(x=seq(0,1,length=100)) {
## generates random functions...
m <- ceiling(runif(1)*5) ## number of components
f <- x*0;
mu <- runif(m,min(x),max(x));sig <- (runif(m)+.5)*(max(x)-min(x))/10
for (i in 1:m) f <- f+ dnorm(x,mu[i],sig[i])
f
}
x <- seq(0,1,length=100) ## evaluation points
## example functional predictors...
par(mfrow=c(3,3));for (i in 1:9) plot(x,rf(x),type="l",xlab="x")
## simulate 200 functions and store in rows of L...
L <- matrix(NA,200,100)
for (i in 1:200) L[i,] <- rf() ## simulate the functional predictors
f2 <- function(x) { ## the coefficient function
(0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10)/10
}
f <- f2(x) ## the true coefficient function
y <- L%*%f + rnorm(200)*20 ## simulated response data
## Now fit the model E(y) = L%*%f(x) where f is a smooth function.
## The summation convention is used to evaluate smooth at each value
## in matrix X to get matrix F, say. Then rowSum(L*F) gives E(y).
## create matrix of eval points for each function. Note that
## `smoothCon' is smart and will recognize the duplication...
X <- matrix(x,200,100,byrow=TRUE)
## compare `gam' and `gamm4' this time
b <- gam(y~s(X,by=L,k=20),method="REML")
br <- gamm4(y~s(X,by=L,k=20))
par(mfrow=c(2,1))
plot(b,shade=TRUE);lines(x,f,col=2)
plot(br$gam,shade=TRUE);lines(x,f,col=2)