zigam                package:COZIGAM                R Documentation

_F_i_t_t_i_n_g (_U_n_c_o_n_s_t_r_a_i_n_e_d) _Z_e_r_o-_I_n_f_l_a_t_e_d _G_e_n_e_r_a_l_i_z_e_d _A_d_d_i_t_i_v_e _M_o_d_e_l_s

_D_e_s_c_r_i_p_t_i_o_n:

     Fit a Zero-Inflated Generalized Additive Model (ZIGAM) to data.

_U_s_a_g_e:

     zigam (formula, maxiter = 20, conv.crit = 1e-3,
         size = NULL, log.tran = FALSE, family, data=list(), ...)

_A_r_g_u_m_e_n_t_s:

 formula: A GAM formula. This is exactly like the formula for a GLM
          except that smooth terms can be added to the right-hand side
          of the formula (a formula of the form 'y ~ .' is not
          allowed). Smooth terms are specified by expressions of the
          form 's(var1,var2,...,k=12,fx=FALSE,bs="tp",by=a.var)',
          where 'var1', 'var2', etc. are the covariates of which the
          smooth is a function, and 'k' is the dimension of the basis
          used to represent the smooth term. 'by' can be used to
          specify a variable by which the smooth should be multiplied.

 maxiter: The maximum number of iterations allowed in the EM algorithm
          used to estimate discrete ZIGAMs.

conv.crit: Convergence criterion in the iterative estimation algorithm.

    size: Number of trials. Must be specified when 'family' is
          'binomial'.

log.tran: Logical. 'TRUE' if log-transformation is needed for the
          response.

  family: A family object specifying the distribution and link
          function to use in fitting. See 'glm' and 'family' for more
          details. Currently supports the Gaussian/lognormal, Gamma,
          Poisson and binomial distributions.

    data: A data frame or list containing the model response variable
          and covariates required by the formula.

     ...: Additional arguments to be passed to the low level regression
          fitting functions.

_D_e_t_a_i_l_s:

     A Zero-Inflated Generalized Additive Model (ZIGAM) assumes that the
     response variable Y_i follows a zero-inflated one-parameter
     exponential family distribution given the covariate x_i (which may
     be high-dimensional). More specifically, with probability p_i, Y_i
     is drawn from a non-zero-inflated exponential family distribution
     with mean mu_i (the regular component), and with probability 1-p_i
     it equals zero. The non-zero-inflation probability p_i also depends
     on the covariates through an unknown smooth function. Unlike the
     COnstrained Zero-Inflated Generalized Additive Model (COZIGAM), in
     which logit(p_i) is constrained to depend linearly on the predictor
     of the regular component (as in the Poisson example below), the
     ZIGAM assumes that the process generating the non-zero-inflated
     responses and the zero-inflation process are independent. The mean
     of the regular component is modeled as

                         g(mu_i) = s_1(x_i),

     where g is the link function of 'family', and the
     non-zero-inflation probability is linked to the covariates by

                        logit(p_i) = s_2(x_i),

     where s_1 and s_2 are two possibly distinct smooth functions
     estimated nonparametrically by Generalized Additive Models. See
     Liu and Chan (2008) for more details.
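     For a Poisson regular component with log link, the model above
     implies P(Y_i = 0) = 1 - p_i + p_i exp(-mu_i) and, for k > 0,
     P(Y_i = k) = p_i mu_i^k exp(-mu_i) / k!. The sketch below evaluates
     the resulting (unpenalized) log-likelihood for given mu_i and p_i.
     The helper 'zip_loglik' and the smooth functions in the usage lines
     are hypothetical, chosen only for illustration; the sketch does not
     show how 'zigam' itself estimates s_1 and s_2.

```r
# Hypothetical helper (not part of COZIGAM): log-likelihood of a ZIGAM whose
# regular component is Poisson. With probability p_i, y_i ~ Poisson(mu_i);
# with probability 1 - p_i, y_i is an inflated zero.
zip_loglik <- function(y, mu, p) {
  # an observed zero is either an inflated zero or a Poisson zero
  ll_zero <- log(1 - p + p * exp(-mu))
  # a positive count can only come from the regular component
  ll_pos <- log(p) + dpois(y, mu, log = TRUE)
  sum(ifelse(y == 0, ll_zero, ll_pos))
}

# Illustrative smooth functions (assumptions made only for this sketch):
# log(mu_i) = s_1(x_i) and logit(p_i) = s_2(x_i).
set.seed(1)
x_sim  <- runif(100)
mu_sim <- exp(1 + sin(2 * pi * x_sim))
p_sim  <- plogis(2 * x_sim - 1)
y_sim  <- rpois(100, mu_sim) * rbinom(100, 1, p_sim)
zip_loglik(y_sim, mu_sim, p_sim)
```

     Setting every p_i to 1 recovers the ordinary Poisson log-likelihood,
     which is a quick sanity check on the zero-inflated form.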

_V_a_l_u_e:

     A list containing the following components:  

 fit.gam: A fitted GAM of the regular component, i.e.,
          non-zero-inflated exponential family regression model.

  fit.lr: A logistic regression model on the zero-inflation process.

  V.beta: The covariance matrix of the estimated parameters associated
          with the smooth functions in the non-zero-inflated
          data-generating process.

 V.gamma: The covariance matrix of the estimated parameters associated
          with the smooth functions in the zero-inflation process.

      mu: The fitted regular mean values.

dispersion: (Estimated) dispersion parameter.

 formula: Model formula.

       p: The fitted non-zero-inflation probabilities.

     psi: Conditional expectation of the zero-inflation indicator (only
          for discrete responses, which are fitted by the EM algorithm).

  family: The family used.

loglik, ploglik: The (penalized) log-likelihood of the fitted model.

    logE: Logarithm of the marginal likelihood, approximated by the
          Laplace method; used for model selection.

      X1: Design matrix in the non-zero-inflated exponential family
          regression model.

      X2: Design matrix in the logistic regression model.
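     In the discrete case, 'psi' is the E-step quantity of the EM
     algorithm. The sketch below computes it by Bayes' rule for a Poisson
     regular component, assuming the indicator is z_i = 1 for the regular
     component and z_i = 0 for an inflated zero (the convention of 'z' in
     the Examples); whether the returned 'psi' follows this or the
     complementary convention is an assumption, and 'estep_psi' is a
     hypothetical helper. In a standard EM for zero-inflated models, the
     M-step then uses psi as weights when refitting the regular component
     and as the response when refitting the logistic model.

```r
# Hypothetical helper (not part of COZIGAM): E-step of an EM algorithm for a
# zero-inflated Poisson model. z_i = 1 if y_i comes from the regular Poisson
# component and z_i = 0 if it is an inflated zero; returns psi_i = E[z_i | y_i].
estep_psi <- function(y, mu, p) {
  # Bayes' rule: P(regular | y_i = 0) = p_i exp(-mu_i) / P(y_i = 0)
  psi_zero <- p * exp(-mu) / (1 - p + p * exp(-mu))
  # a positive count can only come from the regular component
  ifelse(y > 0, 1, psi_zero)
}

psi <- estep_psi(y = c(0, 2, 0), mu = c(1, 1, 3), p = c(0.5, 0.5, 0.9))
psi
```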

_A_u_t_h_o_r(_s):

     Hai Liu and Kung-Sik Chan

_R_e_f_e_r_e_n_c_e_s:

     Liu, H. and Chan, K.S. (2008) Constrained Generalized Additive
     Model with Zero-Inflated Data. Technical Report 388, Department
     of Statistics and Actuarial Science, The University of Iowa.
     <URL: http://www.stat.uiowa.edu/techrep/tr388.pdf>

_S_e_e _A_l_s_o:

     'cozigam'

_E_x_a_m_p_l_e_s:

     ## Gaussian Response 
     set.seed(11)
     n <- 200
     x1 <- runif(n, 0, 1)
     f <- (f0(x1)-mean(f0(x1)))/2
     sig <- 0.3
     mu0 <- f + 1.5
     y <- mu0 + rnorm(n, 0, sig)

     eta.p0 <- f1(x1)*2 - 1 # true function used in the zero-inflation process
     p0 <- binomial()$linkinv(eta.p0) # inverse logit: non-zero-inflation probabilities

     z <- rbinom(n, 1, p0) # z == 0 marks an inflated zero
     y[z==0] <- 0

     # Fit a ZIGAM
     res.un <- zigam(y~s(x1), family=gaussian)

     # Compare with a COZIGAM
     res <- cozigam(y~s(x1), family=gaussian)
     res.un$logE > res$logE

     ## Poisson Response
     set.seed(11)
     n <- 400
     x1 <- runif(n, 0, 1)
     x2 <- runif(n, 0, 1)

     eta0 <- test(x1,x2)*3
     mu0 <- exp(eta0)  

     alpha0 <- -0.2
     delta0 <- 1.0
     # constrained truth: logit(p0) is linear in the regular-component predictor
     p0 <- binomial()$linkinv(alpha0 + delta0 * eta0)

     z <- rbinom(n, 1, p0)
     y <- rpois(n, mu0)
     y[z==0] <- 0

     res.un <- zigam(y~s(x1,x2), maxiter=30, family=poisson) # fit a ZIGAM
     res <- cozigam(y~s(x1,x2), maxiter=30, family=poisson) # fit a COZIGAM
     res.un$logE < res$logE # compare the model selection criterion
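     The comparisons above rank models by 'logE', a Laplace approximation
     to the log marginal likelihood. The sketch below illustrates only the
     Laplace method itself on a toy one-parameter model (a Poisson count
     with a Gamma prior on its mean, whose exact marginal is negative
     binomial), so the approximation can be checked against an exact
     value. It is not the computation performed inside 'zigam' or
     'cozigam', and 'laplace_log_integral' is a hypothetical helper.

```r
# Hypothetical helper (not part of COZIGAM): Laplace approximation of
# log( integral of exp(h(theta)) d theta ) for a scalar theta:
#   log I ~= h(theta_hat) + 0.5 * log(2 * pi) - 0.5 * log(-h''(theta_hat))
laplace_log_integral <- function(h, interval, eps = 1e-4) {
  # mode of the log-integrand
  theta_hat <- optimize(h, interval, maximum = TRUE)$maximum
  # central finite-difference estimate of h'' at the mode
  h2 <- (h(theta_hat + eps) - 2 * h(theta_hat) + h(theta_hat - eps)) / eps^2
  h(theta_hat) + 0.5 * log(2 * pi) - 0.5 * log(-h2)
}

# Toy model: y ~ Poisson(lambda), lambda ~ Gamma(shape = a, rate = b),
# integrated over theta = log(lambda); the "+ theta" term is the Jacobian.
y_obs <- 5; a <- 2; b <- 1
h <- function(theta) {
  dpois(y_obs, exp(theta), log = TRUE) +
    dgamma(exp(theta), shape = a, rate = b, log = TRUE) + theta
}
approx_logE <- laplace_log_integral(h, interval = c(-5, 5))
exact_logE  <- dnbinom(y_obs, size = a, prob = b / (1 + b), log = TRUE)
c(laplace = approx_logE, exact = exact_logE)
```

     Working on the log scale of lambda makes the integrand closer to
     Gaussian, which is why the approximation is already close to the
     exact value for this small count.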

