\name{glm.binomial.disp}
\alias{glm.binomial.disp}
\title{Overdispersed binomial logit models.}
\description{This function estimates overdispersed binomial logit models using the approach discussed by Williams (1982).}
\usage{
glm.binomial.disp(object, maxit = 30, verbose = TRUE)
}
\arguments{
\item{object}{an object of class `"glm"' providing a fitted binomial logistic regression model.}
\item{maxit}{integer giving the maximal number of iterations for the model fitting procedure.}
\item{verbose}{logical; if \code{TRUE}, information is printed at each step of the algorithm.}
}
\details{
Extra-binomial variation in logistic linear models is discussed, among others, in Collett (1991). Williams (1982) proposed a quasi-likelihood approach for handling overdispersion in logistic regression models. 

Suppose we observe the number of successes \eqn{y_i} in \eqn{m_i} trials, for \eqn{i=1,\ldots,n}, such that 
\deqn{y_i|p_i \sim \mathrm{Binomial}(m_i, p_i)}{y_i|p_i ~ Binomial(m_i, p_i)}
\deqn{p_i \sim \mathrm{Beta}(\gamma, \delta)}{p_i ~ Beta(\gamma, \delta)}
Under this model, each of the \eqn{n} binomials has a different probability of success \eqn{p_i}, where \eqn{p_i} is a random draw from a beta distribution. Thus,
\deqn{E(p_i) = \frac{\gamma}{\gamma+\delta} = \theta}{E(p_i) = \gamma/(\gamma+\delta) = \theta}
\deqn{Var(p_i) = \phi\theta(1-\theta)}{Var(p_i) = \phi*\theta*(1-\theta)}
Here \eqn{\phi = 1/(\gamma+\delta+1)}{\phi = 1/(\gamma+\delta+1)}. If we assume \eqn{\gamma > 1} and \eqn{\delta > 1}, so that the beta density is equal to zero at both zero and one, then \eqn{\gamma+\delta > 2} and hence \eqn{0 < \phi < 1/3}. The unconditional mean and variance of \eqn{y_i} are then
\deqn{E(y_i) = m_i\theta}{E(y_i) = m_i*\theta}
\deqn{Var(y_i) = m_i\theta(1-\theta)(1+(m_i-1)\phi)}{Var(y_i) = m_i*\theta*(1-\theta)*(1+(m_i-1)*\phi)}
so, unless \eqn{m_i=1} or \eqn{\phi=0}, the unconditional variance of \eqn{y_i} exceeds the binomial variance \eqn{m_i\theta(1-\theta)}{m_i*\theta*(1-\theta)}.

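The variance expression follows from the law of total variance, using \eqn{E(p_i^2) = \phi\theta(1-\theta) + \theta^2}{E(p_i^2) = \phi*\theta*(1-\theta) + \theta^2}:
\deqn{Var(y_i) = E[m_i p_i(1-p_i)] + Var(m_i p_i) = m_i\theta(1-\theta)(1-\phi) + m_i^2\phi\theta(1-\theta) = m_i\theta(1-\theta)(1+(m_i-1)\phi)}{Var(y_i) = E[m_i*p_i*(1-p_i)] + Var(m_i*p_i) = m_i*\theta*(1-\theta)*(1-\phi) + m_i^2*\phi*\theta*(1-\theta) = m_i*\theta*(1-\theta)*(1+(m_i-1)*\phi)}
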
Identical expressions for the mean and variance of \eqn{y_i} are obtained if, instead, the \eqn{m_i} binary responses on the \eqn{i}-th unit are assumed to be correlated, each with mean \eqn{\theta} and with common pairwise correlation \eqn{\phi}. In this case, \eqn{-1/(m_i-1) < \phi \le 1}{-1/(m_i-1) < \phi <= 1}.

The method proposed by Williams uses an iterative algorithm to estimate the dispersion parameter \eqn{\phi} and hence the weights \eqn{1/(1+\phi(m_i-1))}{1/(1+\phi*(m_i-1))} used to refit the model (for details see Williams, 1982).
}
\value{
The function returns an object of class `"glm"' with the usual information (see \code{help(glm)}) and the added components:   
\item{dispersion}{the estimated dispersion parameter.}
\item{disp.weights}{the final weights used to fit the model.}
}
\references{
Collett, D. (1991), \emph{Modelling Binary Data}, London: Chapman and Hall.

Williams, D. A. (1982), Extra-binomial variation in logistic linear models, 
\emph{Applied Statistics}, \bold{31}, 144--148.
}
\author{Luca Scrucca, \email{luca@stat.unipg.it}}
\note{Based on a similar procedure available in Arc (Cook and Weisberg, \url{http://www.stat.umn.edu/arc})}
\seealso{
\code{\link{lm}}, \code{\link{glm}}, \code{\link{lm.disp}}, \code{\link{glm.poisson.disp}}
}
\examples{
data(orobanche)
attach(orobanche)
h <- factor(host)
v <- factor(variety, levels=c("O.a75", "O.a73"))

mod <- glm(cbind(germinated, seeds - germinated) ~ h*v, family = binomial(logit))
summary(mod)

mod.disp <- glm.binomial.disp(mod)
summary(mod.disp)
mod.disp$dispersion
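
##-- Simulated beta-binomial data (a self-contained sketch)
## Illustrative only: the names 'sim' and 'X2.gap' and the settings below
## (m_i = 20 trials, phi = 0.1, logit(theta_i) = 0.5 + x_i) are assumptions
## made for this sketch, not part of the package. Step (1) checks the
## variance formula in Details by simulation; step (2) estimates phi by
## choosing the weights 1/(1 + phi*(m_i - 1)) so that the Pearson X^2 of
## the weighted fit equals its residual degrees of freedom. Solving this
## with uniroot() is a simplified moment-type variant, not the exact
## updating scheme of Williams (1982) used by glm.binomial.disp.
set.seed(1)
phi <- 0.1
s <- 1/phi - 1                          # gamma + delta, since phi = 1/(gamma + delta + 1)

## (1) Var(y) = m*theta*(1 - theta)*(1 + (m - 1)*phi), here theta = 0.3, m = 20
th <- 0.3
yy <- rbinom(1e5, size = 20, prob = rbeta(1e5, th*s, (1 - th)*s))
var.check <- c(empirical = var(yy), formula = 20*th*(1 - th)*(1 + 19*phi))
var.check

## (2) simulate overdispersed binomial data and estimate phi
n <- 100
sim <- data.frame(x = seq(-1, 1, length.out = n), m = 20)
theta <- plogis(0.5 + sim$x)            # beta means follow a logistic model
sim$y <- rbinom(n, size = sim$m, prob = rbeta(n, theta*s, (1 - theta)*s))
X2.gap <- function(phi) {
  w <- 1/(1 + phi*(sim$m - 1))          # Williams-type weights
  fit <- glm(cbind(y, m - y) ~ x, family = binomial(logit),
             data = sim, weights = w)
  sum(residuals(fit, type = "pearson")^2) - df.residual(fit)
}
sol <- uniroot(X2.gap, c(0, 1))
sol$root                                # estimated phi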
}
\keyword{models}
\keyword{regression}
\eof
\name{glm.poisson.disp}
\alias{glm.poisson.disp}
\title{Overdispersed Poisson log-linear models.}
\description{This function estimates overdispersed Poisson log-linear models using the approach discussed by Breslow (1984).}
\usage{
glm.poisson.disp(object, maxit = 30, verbose = TRUE)
}
\arguments{
\item{object}{an object of class `"glm"' providing a fitted Poisson log-linear regression model.}
\item{maxit}{integer giving the maximal number of iterations for the model fitting procedure.}
\item{verbose}{logical; if \code{TRUE}, information is printed at each step of the algorithm.}
}
\details{
Breslow (1984) proposed an iterative algorithm for fitting overdispersed Poisson log-linear models. The method is similar to that proposed by Williams (1982) for handling overdispersion in logistic regression models (\code{\link{glm.binomial.disp}}). 

Suppose we observe \eqn{n} independent responses such that
\deqn{y_i|\lambda_i \sim \mathrm{Poisson}(\lambda_i n_i)}{y_i|\lambda_i ~ Poisson(\lambda_i*n_i)}
for \eqn{i=1,\ldots,n}.
The response variable \eqn{y_i} may be a count of events observed over a period of time (or a region of space) of length \eqn{n_i}, and \eqn{\lambda_i} is the rate parameter. Then,
\deqn{E(y_i|\lambda_i) = \mu_i = \lambda_i n_i = \exp(\log(n_i) + \log(\lambda_i))}{E(y_i|\lambda_i) = \mu_i = \lambda_i*n_i = \exp(\log(n_i) + \log(\lambda_i))}
where \eqn{\log(n_i)} is an offset and \eqn{\log(\lambda_i)=\beta'x_i} expresses the dependence of the Poisson rate parameter on a set of \eqn{p} predictors. If all the periods have the same length, we can set \eqn{n_i=1} for all \eqn{i}, so that the offset is zero.

The Poisson distribution has \eqn{E(y_i|\lambda_i)=Var(y_i|\lambda_i)}, but it may happen that the actual variance exceeds the nominal variance under the assumed probability model. Suppose now that \eqn{\theta_i=\lambda_i n_i} is a random variable distributed according to
\deqn{\theta_i \sim \mathrm{Gamma} (\mu_i, 1/\phi)}{\theta_i ~ Gamma (\mu_i, 1/\phi)}
where \eqn{E(\theta_i)=\mu_i} and \eqn{Var(\theta_i)=\mu_i^2\phi}{Var(\theta_i)=\mu_i^2 * \phi}. Thus, it can be shown that the unconditional mean and variance of \eqn{y_i} are given by
\deqn{E(y_i) = \mu_i}
and
\deqn{Var(y_i) = \mu_i + \mu_i^2\phi = \mu_i(1+\mu_i\phi)}{Var(y_i) = \mu_i + \mu_i^2 * \phi = \mu_i(1+\mu_i*\phi)}
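These moments follow from the laws of total expectation and variance, since \eqn{E(y_i|\theta_i) = Var(y_i|\theta_i) = \theta_i}:
\deqn{E(y_i) = E(\theta_i) = \mu_i}
\deqn{Var(y_i) = E(\theta_i) + Var(\theta_i) = \mu_i + \mu_i^2\phi}{Var(y_i) = E(\theta_i) + Var(\theta_i) = \mu_i + \mu_i^2 * \phi}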
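Hence, for \eqn{\phi>0} the variance exceeds the mean, i.e. the data are overdispersed relative to the Poisson model. The same mean and variance arise if a negative binomial distribution is assumed for the response; indeed, a Poisson distribution whose mean follows a gamma distribution is a negative binomial distribution.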

The method proposed by Breslow uses an iterative algorithm for estimating the dispersion parameter \eqn{\phi} and hence the necessary weights \eqn{1/(1+\mu_i\hat{\phi})}{1/(1+\mu_i * \phi)} (for details see Breslow, 1984).
}
\value{
The function returns an object of class `"glm"' with the usual information (see \code{help(glm)}) and the added components:   
\item{dispersion}{the estimated dispersion parameter.}
\item{disp.weights}{the final weights used to fit the model.}
}
\references{
 Breslow, N.E. (1984), Extra-Poisson variation in log-linear models, 
\emph{Applied Statistics}, \bold{33}, 38--44.
}
\author{Luca Scrucca, \email{luca@stat.unipg.it}}
\note{Based on a similar procedure available in Arc (Cook and Weisberg, \url{http://www.stat.umn.edu/arc})}
\seealso{
\code{\link{lm}}, \code{\link{glm}}, \code{\link{lm.disp}}, \code{\link{glm.binomial.disp}}
}
\examples{
##-- Salmonella TA98 data

data(salmonellaTA98)
attach(salmonellaTA98)
log.x10 <- log(x+10)
mod <- glm(y ~ log.x10 + x, family=poisson(log)) 
summary(mod)

mod.disp <- glm.poisson.disp(mod)
summary(mod.disp)
mod.disp$dispersion

# compute predictions on a grid of x-values...
x0 <- seq(min(x), max(x), length.out = 50)
newd <- data.frame(log.x10 = log(x0 + 10), x = x0)
eta0 <- predict(mod, newdata = newd, se.fit = TRUE)
eta0.disp <- predict(mod.disp, newdata = newd, se.fit = TRUE)
# ... and plot the fitted mean functions with bands of +/- 2 standard
# errors on the linear-predictor scale (dispersion model in colour 2)
plot(x, y)
lines(x0, exp(eta0$fit))
lines(x0, exp(eta0$fit + 2*eta0$se.fit), lty = 2)
lines(x0, exp(eta0$fit - 2*eta0$se.fit), lty = 2)
lines(x0, exp(eta0.disp$fit), col = 2)
lines(x0, exp(eta0.disp$fit + 2*eta0.disp$se.fit), lty = 2, col = 2)
lines(x0, exp(eta0.disp$fit - 2*eta0.disp$se.fit), lty = 2, col = 2)
detach(salmonellaTA98)

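##-- Simulated gamma-Poisson data (a self-contained sketch)
## Illustrative only: the names 'sim' and 'X2.gap' and the settings below
## (phi = 0.3, log(mu_i) = 2 + x_i, ten refits per trial value of phi) are
## assumptions made for this sketch, not part of the package. Since
## rnbinom(size = 1/phi, mu = mu) is the Poisson-gamma mixture of Details,
## step (1) checks Var(y) = mu*(1 + mu*phi) by simulation. Step (2) picks
## phi so that the Pearson X^2 of the fit with weights 1/(1 + mu_i*phi)
## equals its residual degrees of freedom; because the weights depend on
## the fitted means, the model is refitted a few times for each phi. This
## is a simplified moment-type variant, not Breslow's (1984) exact scheme.
set.seed(2)
phi <- 0.3

## (1) variance check at mu = 5
yy <- rnbinom(1e5, size = 1/phi, mu = 5)
var.check <- c(empirical = var(yy), formula = 5*(1 + 5*phi))
var.check

## (2) simulate overdispersed counts and estimate phi
n <- 100
sim <- data.frame(x = runif(n, 0, 2))
sim$y <- rnbinom(n, size = 1/phi, mu = exp(2 + sim$x))
X2.gap <- function(phi, nrefit = 10) {
  fit <- glm(y ~ x, family = poisson(log), data = sim)
  for (k in seq_len(nrefit)) {
    w <- 1/(1 + phi*fitted(fit))        # weights depend on the current means
    fit <- glm(y ~ x, family = poisson(log), data = sim, weights = w)
  }
  sum(residuals(fit, type = "pearson")^2) - df.residual(fit)
}
sol <- uniroot(X2.gap, c(0, 5))
sol$root                                # estimated phi
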
##--  Holford's data

data(holford)
attach(holford)

mod <- glm(incid ~ offset(log(pop)) + Age + Cohort, family=poisson(log)) 
summary(mod)

mod.disp <- glm.poisson.disp(mod)
summary(mod.disp)
mod.disp$dispersion
detach(holford)
}
\keyword{models}
\keyword{regression}
\eof
\name{holford}
\alias{holford}
\non_function{}
\title{Holford's data on prostatic cancer deaths}
\usage{data(holford)}
\description{
Holford's data on prostatic cancer deaths and mid-period population denominators for non-whites in the US, by age and calendar period. The data cover seven 5-year age groups (50-54 to 80-84) and seven 5-year calendar periods (1935-39 to 1965-69); thirteen birth cohorts, from 1855-59 to 1915-19, are each represented in at least one age-by-period cell.

}
\format{
This data frame contains the following columns:
\describe{
  \item{incid}{number of prostatic cancer deaths.} 
  \item{pop}{mid-period population counts.}
  \item{Age}{age groups.}
  \item{Period}{calendar periods.}
  \item{Cohort}{cohorts.}
}
}
\details{}
\source{Holford, T.R. (1983) The estimation of age, period and cohort effects for vital rates. \emph{Biometrics}, \bold{39}, 311--324.}
\references{ Breslow, N.E. (1984), Extra-Poisson variation in log-linear models, \emph{Applied Statistics}, \bold{33}, 38--44.}
\examples{}
\keyword{datasets}
\eof
\name{lm.disp}
\alias{lm.disp}
\alias{summary.dispmod}
\title{Normal dispersion models.}
\description{This function estimates normal dispersion regression models.}
\usage{
lm.disp(formula, var.formula, data = list(), maxit = 30, 
        epsilon = glm.control()$epsilon, subset, na.action = na.omit, 
        contrasts = NULL, offset = NULL)
}
\arguments{
\item{formula}{a symbolic description of the mean function of the model to be fit. For the details of model formula specification see \code{help(lm)} and \code{help(formula)}.}
\item{var.formula}{a symbolic description of the variance function of the model to be fit. This must be a one-sided formula; if omitted, the terms of the mean function are used. For the details of model formula specification see \code{help(lm)} and \code{help(formula)}.}
\item{data}{an optional data frame containing the variables in the model. By default the variables are taken from `environment(formula)', typically the environment from which the function is called.}
\item{maxit}{integer giving the maximal number of iterations for the model fitting procedure.}
\item{epsilon}{positive convergence tolerance; the iterations stop when |dev - devold| < epsilon.}
\item{subset}{an optional vector specifying a subset of observations to be used in the fitting process.}
\item{na.action}{a function which indicates what should happen when the data contain `NA's. The default, `na.omit', drops the observations with missing values.}
\item{contrasts}{an optional list. See the `contrasts.arg' argument of `model.matrix.default'.}
\item{offset}{this can be used to specify an a priori known component to be included in the linear predictor during fitting.  An `offset' term can be included in the formula instead or as well, and if both are specified their sum is used.}
}
\details{
Normal dispersion models allow variance heterogeneity in normal regression analysis to be modelled through a log-linear model for the variance. 

Suppose a response \eqn{y} is modelled as a function of a set of \eqn{p} predictors \eqn{x} through the linear model
\deqn{y_i = \beta'x_i + e_i}
where 
\eqn{e_i \sim N(0,\sigma^2)}{e_i ~ N(0, \sigma^2)} 
under homogeneity. Variance heterogeneity is expressed as
\deqn{Var(e_i) = \sigma^2_i = \exp(\lambda'z_i)}{Var(e_i) = \sigma_i^2 = \exp(\lambda'z_i)}
where \eqn{z_i} may contain some or all of the variables in \eqn{x_i}, as well as variables not included in \eqn{x_i}; \eqn{z_i} is, however, assumed to contain a constant term.
Equivalently, the model can be written as
\deqn{E(y|x) = \beta'x}
\deqn{Var(y|x) = \exp(\lambda'z)}
and is fitted by maximum likelihood following the algorithm described in Aitkin (1987).
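
Writing \eqn{\sigma^2_i = \exp(\lambda'z_i)}{sigma_i^2 = exp(lambda'z_i)}, maximum likelihood amounts to minimising
\deqn{-2\log L(\beta,\lambda) = n\log(2\pi) + \sum_{i=1}^n \left[\lambda'z_i + \frac{(y_i-\beta'x_i)^2}{\exp(\lambda'z_i)}\right]}{-2 log L(\beta,\lambda) = n*log(2*pi) + sum_i [\lambda'z_i + (y_i - \beta'x_i)^2 / exp(\lambda'z_i)]}
For fixed \eqn{\lambda} this is minimised over \eqn{\beta} by weighted least squares with weights \eqn{1/\sigma^2_i}{1/sigma_i^2}; for fixed \eqn{\beta}, minimising over \eqn{\lambda} is equivalent to fitting a gamma generalized linear model with log link to the squared residuals \eqn{(y_i-\beta'x_i)^2}, so the two steps can be alternated until the deviance stabilises.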
}

\value{
`lm.disp' returns an object of class `"dispmod"'.

The function `summary' is used to obtain and print a summary of the results.  

An object of class `"dispmod"' is a list containing the following components:

\item{call}{the matched call.}
\item{mean}{an object of class `"glm"' giving the fitted model for the mean function.}
\item{var}{an object of class `"glm"' giving the fitted model for the variance function.}
\item{initial.deviance}{the value of the deviance at the beginning of the iterative procedure, i.e. assuming constant variance.}
\item{deviance}{the value of the deviance at the end of the iterative procedure.}
}
\references{
Aitkin, M. (1987), Modelling variance heterogeneity in normal regression models using GLIM, \emph{Applied Statistics}, \bold{36}, 332--339.
}
\author{Luca Scrucca, \email{luca@stat.unipg.it}}
\note{Based on a similar procedure available in Arc (Cook and Weisberg, \url{http://www.stat.umn.edu/arc})}
\seealso{
\code{\link{lm}}, \code{\link{glm}}, \code{\link{glm.binomial.disp}}, \code{\link{glm.poisson.disp}}, \code{\link{ncv.test}} (in the \code{car} library).
}
\examples{
data(minitab)
attach(minitab)

y <- V^(1/3)
summary(mod <- lm(y ~ H + D))

summary(mod.disp1 <- lm.disp(y ~ H + D))
summary(mod.disp2 <- lm.disp(y ~ H + D, ~ H))

# Likelihood ratio test
deviances <- c(mod.disp1$initial.deviance, mod.disp2$deviance, mod.disp1$deviance)
lrt <- c(NA, abs(diff(deviances)))
cbind(deviances, lrt, p.value=1-pchisq(lrt, 1))

# quadratic dispersion model on D (as discussed by Aitkin)
summary(mod.disp4 <- lm.disp(y ~ H + D, ~ D + I(D^2)))
r <- mod$residuals
plot(D, log(r^2))
phi.est <- mod.disp4$var$fitted.values
lines(D, log(phi.est))
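
##-- Sketch of the alternating fit described in Details
## Illustrative only, using base R: the built-in 'trees' data record the
## same 31 black cherry trees (Girth is the diameter D, Height is H). The
## names 'tr', 'mfit', 'vfit' and 'dev.path', the variance model ~ D, the
## tolerance and the iteration limit are assumptions for this sketch; it is
## not the code used by lm.disp. Each pass fits the mean by weighted least
## squares with weights 1/sigma_i^2, then fits a Gamma GLM with log link to
## the squared residuals to update sigma_i^2 = exp(lambda'z_i).
tr <- data.frame(y = trees$Volume^(1/3), H = trees$Height, D = trees$Girth)
w <- rep(1, nrow(tr))                   # start from constant variance
dev.path <- numeric(0)
for (it in 1:50) {
  mfit <- lm(y ~ H + D, data = tr, weights = w)                 # mean step
  tr$r2 <- residuals(mfit)^2
  vfit <- glm(r2 ~ D, family = Gamma(link = "log"), data = tr)  # variance step
  sigma2 <- fitted(vfit)
  w <- 1/sigma2
  ## -2 * log-likelihood up to the constant n*log(2*pi)
  dev.path <- c(dev.path, sum(log(sigma2) + tr$r2/sigma2))
  if (it > 1 && abs(dev.path[it] - dev.path[it - 1]) < 1e-8) break
}
coef(mfit)
coef(vfit)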
}
\keyword{models}
\keyword{regression}
\eof
\name{minitab}
\alias{minitab}
\non_function{}
\title{Minitab tree data}
\usage{data(minitab)}
\description{
Data on 31 black cherry trees sampled from the Allegheny National Forest,
Pennsylvania.  
}
\format{
This data frame contains the following columns:
\describe{
  \item{D}{diameter 4.5 feet off the ground, inches}
  \item{H}{height of the tree, feet}
  \item{V}{marketable volume of wood, cubic feet}
}
}
\details{}
\source{Ryan, T.A., Joiner, B.L. and Ryan, B.F. (1976) \emph{Minitab Student Handbook}. N. Scituate, MA: Duxbury.}
\references{Cook, R.D. and Weisberg, S. (1982) \emph{Residuals and Influence in Regression}, New York:  Chapman and Hall, p. 66.}
\examples{}
\keyword{datasets}
\eof
\name{orobanche}
\alias{orobanche}
\non_function{}
\title{Germination of Orobanche}
\usage{data(orobanche)}
\description{
Orobanche, commonly known as broomrape, is a genus of parasitic plants without
chlorophyll that grow on the roots of flowering plants.  Batches of seeds of
two varieties of the plant were brushed onto a plate of diluted extract
of bean or cucumber, and the number of seeds that germinated was recorded.
}
\format{
This data frame contains the following columns:
\describe{
  \item{germinated}{Number germinated}
  \item{seeds}{Number of seeds}
  \item{slide}{Slide number}
  \item{host}{Host type}
  \item{variety}{Variety name}
}
}
\details{}
\source{Crowder, M.J. (1978) Beta-binomial anova for proportions. \emph{Applied Statistics}, \bold{27}, 34--37.}

\references{Collett, D. (1991) \emph{Modelling Binary Data}, London: Chapman and Hall, Chapter 6.}
\examples{}
\keyword{datasets}
\eof
\name{salmonellaTA98}
\alias{salmonellaTA98}
\non_function{}
\title{Salmonella reverse mutagenicity assay}
\usage{data(salmonellaTA98)}
\description{
Data from an Ames Salmonella reverse mutagenicity assay.
}
\format{
This data frame contains the following columns:
\describe{
  \item{x}{dose levels of quinoline}
  \item{y}{number of revertant colonies of TA98 Salmonella observed on each of three replicate plates tested at each of six dose levels of quinoline}
}
}
\details{}
\source{Margolin, B.H., Kaplan, N. and Zeiger, E. (1981) Statistical analysis of the Ames Salmonella/microsome test, \emph{Proc. Natl. Acad. Sci. USA}, \bold{78}, 3779--3783.}
\references{ Breslow, N.E. (1984), Extra-Poisson variation in log-linear models, \emph{Applied Statistics}, \bold{33}, 38--44.}
\examples{}
\keyword{datasets}
\eof
