cozigam               package:COZIGAM               R Documentation

_F_i_t_t_i_n_g _C_o_n_s_t_r_a_i_n_e_d _Z_e_r_o-_I_n_f_l_a_t_e_d _G_e_n_e_r_a_l_i_z_e_d _A_d_d_i_t_i_v_e _M_o_d_e_l_s

_D_e_s_c_r_i_p_t_i_o_n:

     Fit a COnstrained Zero-Inflated Generalized Additive Model
     (COZIGAM) to data.

_U_s_a_g_e:

     cozigam (formula, constraint = "proportional", zero.delta = NULL, 
         maxiter = 20, conv.crit.in = 1e-5, conv.crit.out = 1e-3, size = NULL, 
         log.tran = FALSE, family, data=list(), ...)

_A_r_g_u_m_e_n_t_s:

 formula: A GAM formula. This is exactly like the formula for a GLM
          except that smooth terms  can be added to the right hand side
          of the formula (and a formula of the form 'y ~ .' is not
          allowed).  Smooth terms are specified by expressions of the
          form: 's(var1,var2,...,k=12,fx=FALSE,bs="tp",by=a.var)' 
          where 'var1', 'var2', etc. are the covariates which the
          smooth is a function of and  'k' is the dimension of the
          basis used to represent the smooth term. 'by' can be used to 
          specify a variable by which the smooth should be multiplied.

constraint: Type of constraint on the zero-inflation probability; either
          '"proportional"' or '"component"'.  See Details for more
          information.

zero.delta: A vector specifying which of the constraint parameters
          delta are fixed at zero when 'constraint' is '"component"'.
          For instance, 'zero.delta=c(NA, 0)' means only the first
          smooth function is included in the zero-inflation
          constraint. See Details for the model formulation.

 maxiter: The maximum number of iterations allowed in the estimation
          procedure.

conv.crit.in: Convergence criterion in the inner loop.

conv.crit.out: Convergence criterion in the outer loop.

    size: Number of trials. Must be specified when 'family' is
          'binomial'.

log.tran: Logical. 'TRUE' if log-transformation is needed for the
          response.

  family: This is a family object specifying the distribution and link
          to use in fitting etc.  See 'glm' and 'family' for more
          details. Currently supports the Gaussian/lognormal, Gamma,
          Poisson and binomial distributions.

    data: A data frame or list containing the model response variable
          and covariates required by the formula.

     ...: Additional arguments to be passed to the low level regression
          fitting functions.

_D_e_t_a_i_l_s:

     A COnstrained Zero-Inflated Generalized Additive Model (COZIGAM)
     assumes that the response variable Y_i follows a zero-inflated
     1-parameter exponential family distribution with covariate x_i
     (which may be high-dimensional).  More specifically, Y_i comes
     from a regular exponential family distribution f(x_i) with
     probability p_i and equals zero with probability 1-p_i, with the
     further assumption that the probability of non-zero-inflation p_i
     is related to the regular mean response or its smooth components,
     depending on the type of 'constraint'. If 'constraint' is
     '"proportional"', p_i is assumed to be some monotone function of
     the regular mean response function on the link scale, e.g., if a
     logit link is used, p_i is linearly constrained by:

                 logit(p_i) = alpha + delta  g(mu_i),

     where g() is the link function of the regular exponential family
     with mean mu_i, and alpha and delta are two unknown parameters to
     be estimated. If 'constraint' is '"component"', p_i is assumed to
     be linearly related to the smooth components of the regular mean
     response on the link scale, i.e.,

 logit(p_i) = alpha + delta_1 s(x_{1i},x_{2i}) + delta_2 s(x_{3i}) + ... ,

     and the regular mean response has the structure

        g(mu_i) =  b_0 + s(x_{1i},x_{2i}) + s(x_{3i}) + ... ,

     where s()'s are some centered smooth functions to be estimated
     nonparametrically by a Generalized Additive Model (GAM) and b_0 is
     the overall intercept. In the component-specific-proportional
     constraint case, a vector 'zero.delta' must be specified.  For
     instance, in the above model setting, 'zero.delta=c(NA, 0)' would
     estimate delta_1 as an unknown parameter and fix delta_2 at 0.
     That is, the model would assume that p_i does not depend on the
     covariate x_3.
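
     As a rough illustration of the two constraint types, the
     following base-R sketch computes p_i under each constraint. The
     smooth components 's1' and 's2' and all parameter values are
     hypothetical stand-ins chosen for illustration; they are not part
     of the COZIGAM package, and an identity link is assumed for g().

```r
# Hypothetical centered smooth components (illustrative only, not from COZIGAM)
set.seed(1)
n  <- 200
x1 <- runif(n)
x3 <- runif(n)
s1 <- sin(2 * pi * x1); s1 <- s1 - mean(s1)
s2 <- (x3 - 0.5)^2;     s2 <- s2 - mean(s2)

b0    <- 1
eta   <- b0 + s1 + s2        # g(mu_i), assuming an identity link
alpha <- -0.5

# Proportional constraint: logit(p_i) = alpha + delta * g(mu_i)
delta  <- 1.2
p.prop <- plogis(alpha + delta * eta)

# Component constraint with zero.delta = c(NA, 0):
# delta_1 is free and delta_2 is fixed at 0, so p_i ignores s2
zero.delta <- c(NA, 0)
delta.vec  <- c(1.2, 0.7)    # the 0.7 is overridden by the zero constraint
delta.vec[!is.na(zero.delta)] <- 0
p.comp <- plogis(alpha + delta.vec[1] * s1 + delta.vec[2] * s2)
```

     Entries of 'zero.delta' that are 'NA' mark parameters left free;
     entries equal to 0 remove the corresponding smooth term from the
     zero-inflation constraint.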

     The estimation approach is a modified penalized iteratively
     reweighted least squares (P-IRLS) algorithm, which requires the
     'magic()' function in the 'mgcv' package. An EM algorithm is used
     if the underlying regular exponential family has strictly
     positive probability mass at zero.  The covariance matrix of the
     estimated coefficients is obtained by inverting the observed
     information matrix. See Liu and Chan (2008) for more details.
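
     For a Poisson response, an observed zero may come either from the
     zero-inflation component or from the regular distribution, which
     is why an EM step is needed. The sketch below shows one plausible
     E-step under the model above: the posterior probability that an
     observation comes from the regular distribution. The function
     name 'e_step_poisson' is hypothetical, and the package's internal
     computation (and its exact definition of 'psi') may differ.

```r
# Hypothetical helper (not part of COZIGAM): posterior probability that
# observation i comes from the regular Poisson component, given y_i.
e_step_poisson <- function(y, mu, p) {
  psi  <- rep(1, length(y))                 # y > 0 can only be regular
  zero <- y == 0
  num  <- p[zero] * dpois(0, mu[zero])      # regular component produced a 0
  psi[zero] <- num / (num + (1 - p[zero]))  # Bayes' rule
  psi
}

y   <- c(0, 0, 3, 1)
mu  <- c(0.5, 4, 2, 1)
p   <- c(0.7, 0.7, 0.9, 0.5)
psi <- e_step_poisson(y, mu, p)
```

     A zero observed where the regular mean is large is more likely to
     be an inflated zero, so its posterior probability of being
     regular is smaller.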

_V_a_l_u_e:

     An object of class '"cozigam"' is a list containing the following
     components:  

coefficients: A named vector of coefficients including the linear
          constraint parameters.

 V.theta: The covariance matrix of the estimated parameters.

  V.beta: The covariance matrix of the estimated parameters associated
          with the smooth functions.

se.alpha, se.delta: Estimated standard errors of parameters alpha and
          delta.

      mu: The fitted regular mean values.

linear.predictor: The fitted values on the link scale.

dispersion: (Estimated) dispersion parameter.

 formula: Model formula.

       p: The fitted non-zero-inflation probabilities.

       G: An object returned by 'gam()' containing information used
          in the estimation procedure.

     psi: Conditional expectation of the zero-inflation indicator (only
          for the discrete case which involves EM algorithm).

  family: The family used.

loglik, ploglik: The (penalized) log-likelihood of the fitted model.

       y: The response used.

converged: Logical. 'TRUE' if the iterative procedure has converged.

est.disp: Logical. 'TRUE' if the dispersion parameter is estimated.

fit.nonzero: Fitted GAM using only the nonzero data, as in the
          presence/absence analysis.

   score: The GCV or UBRE score returned by 'magic()' in the iterative
          procedure.

     edf: Estimated degrees of freedom for each model parameter. 
          Penalization means that many of these are less than 1.

edf.smooth: Estimated degrees of freedom of each smooth term.

      sp: Estimated smoothing parameters.

    logE: Log marginal likelihood approximated by the Laplace method,
          used for model selection.

_A_u_t_h_o_r(_s):

     Hai Liu and Kung-Sik Chan

_R_e_f_e_r_e_n_c_e_s:

     Liu, H. and Chan, K.S. (2008) Constrained Generalized Additive
     Model with Zero-Inflated Data.  Technical Report 388, Department
     of Statistics and Actuarial Science, The University of Iowa. <URL:
     http://www.stat.uiowa.edu/techrep/tr388.pdf>

_S_e_e _A_l_s_o:

     'plot.cozigam', 'predict.cozigam', 'summary.cozigam', 'zigam'

_E_x_a_m_p_l_e_s:

     ## Normal/Log-Normal Response with proportional constraint
     set.seed(11)
     n <- 400
     x1 <- runif(n, 0, 1)
     x2 <- runif(n, 0, 1)
     x3 <- runif(n, 0, 1)

     f <- test(x1,x2)*4-mean(test(x1,x2)*4) + f0(x3)/2-mean(f0(x3)/2)
     sig <- 0.5
     mu0 <- f + 3
     y <- mu0 + rnorm(n, 0, sig)

     alpha0 <- -2.2
     delta0 <- 1.2
     p0 <- plogis(alpha0 + delta0 * mu0)  # inverse logit
     z <- rbinom(n, 1, p0)
     y[z==0] <- 0

     res <- cozigam(y~s(x1,x2)+s(x3), constraint = "proportional", family = gaussian)

     plot(res)

     ## Poisson Response with component-specific constraint
     set.seed(11)
     n <- 600
     x1 <- runif(n, 0, 1)
     x2 <- runif(n, 0, 1)
     x3 <- runif(n, 0, 1)

     f <- test(x1, x2)*2 + f0(x3)/5
     eta0 <- f/1.1
     mu0 <- exp(eta0)  

     eta.p10 <- (test(x1,x2) - mean(test(x1,x2)))*2/1.1
     eta.p20 <- (f0(x3) - mean(f0(x3)))/5/1.1

     alpha0 <- 0.2
     delta10 <- 1.2
     delta20 <- 0
     eta.p0 <- delta10*eta.p10 + delta20*eta.p20 
     p0 <- plogis(alpha0 + eta.p0)  # inverse logit

     z <- rbinom(n, 1, p0)
     y <- rpois(n, mu0)
     y[z==0] <- 0; rm(z)

     res <- cozigam(y~s(x1,x2)+s(x3), constraint="component", zero.delta=c(NA, 0),
       conv.crit.out = 1e-3, family=poisson)

     ## A Tensor Product Smooth Example
     test.ten <- function(x,z,sx=0.3,sz=0.4)  
     { x<-x*20
       (pi**sx*sz)*(1.2*exp(-(x-0.2)^2/sx^2-(z-0.3)^2/sz^2)+
       0.8*exp(-(x-0.7)^2/sx^2-(z-0.8)^2/sz^2))
     }

     set.seed(11)
     n <- 400
     x1 <- runif(n, 0, 1)/20
     x2 <- runif(n, 0, 1)
     x3 <- runif(n, 0, 1)

     f <- test.ten(x1, x2)*4-mean(test.ten(x1, x2)*4) + f0(x3)/2-mean(f0(x3)/2)
     sig <- 0.5
     mu0 <- f + 3
     y <- mu0 + rnorm(n, 0, sig)

     alpha0 <- -2.2
     delta0 <- 1.2
     p0 <- plogis(alpha0 + delta0 * mu0)  # inverse logit

     z <- rbinom(n, 1, p0)
     y[z==0] <- 0

     # Using a thin plate spline ...
     res.tps <- cozigam(y~s(x1,x2)+s(x3), constraint = "proportional", family=gaussian)
     par(mfrow=c(1,2))
     plot(res.tps, select=1)
     # Compare with tensor product spline
     res.ten <- cozigam(y~te(x1,x2)+s(x3), constraint = "proportional", family=gaussian)
     plot(res.ten, select=1)

