gamlssNP              package:gamlss.mx              R Documentation

_A _f_u_n_c_t_i_o_n _t_o _f_i_t _f_i_n_i_t_e _m_i_x_t_u_r_e_s _u_s_i_n_g _t_h_e _g_a_m_l_s_s _f_a_m_i_l_y _o_f _d_i_s_t_r_i_b_u_t_i_o_n_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function will fit a finite (or normal) mixture distribution
     where the kernel distribution can belong to  any gamlss family of
     distributions using the EM algorithm. The function is based  on
     functions 'alldist()' and 'allvc' of the 'npmlreg' package of 
     Jochen Einbeck, John Hinde and Ross Darnell.

_U_s_a_g_e:

     gamlssNP(formula, random = ~1, family = NO(), data = NULL, K = 4, 
               mixture = c("np", "gq"), 
               tol = 0.5, weights, pluginz, control = NP.control(...), 
               g.control = gamlss.control(trace = FALSE), ...)

_A_r_g_u_m_e_n_t_s:

 formula: a formula defining the response and the fixed effects for the
          'mu' parameters

  random: a formula defining the random part of the model

  family: a gamlss family object 

    data: the data frame which for this function is mandatory even if
          it the data are attached

       K: the number of mass points/integretion points (supported
          values are 1:10,20)  

 mixture: the mixing distribution, "np" for non-parametric or "gq" for
          Gaussian Quadrature

     tol: the toletance scalar ussualy between zero and one

 weights: prior weights

 pluginz: optional   

 control: this sets the control parameters for the EM iterations
          algorithm. The default setting is the 'NP.control' function 

g.control: the gamlss control function, 'gamlss.control', passed to the
          gamlss fit

     ...: for extra arguments

_D_e_t_a_i_l_s:

     The function 'gamlssNP()' is a modification of the R functions 
     'alldist()' and 'allvc' created by Jochen Einbeck and John Hinde. 
     Both functions were originally created by Ross Darnell (2002).
     Here the two  functions are merged to one 'gamlssNP' and allows
     finite mixture from  gamlss family of distributions.

     The following are comments from the original Einbeck and  Hinde
     documentation. 

     "The nonparametric maximum likelihood (NPML) approach was 
     introduced in Aitkin (1996) as a tool to fit overdispersed 
     generalized linear models. Aitkin (1999) extended this method to 
     generalized linear models with shared random effects arising
     through  variance component or repeated measures structure.
     Applications are  two-stage sample designs, when firstly the
     primary sampling units  (the upper-level units, e.g. classes) and
     then the secondary  sampling units (lower-level units, e.g.
     students) are selected, or  longitudinal data.  Models of this
     type have also been referred to as  multi-level models (Goldstein,
     2003). This R function is restricted  to 2-level models. The idea
     of NPML is to approximate the unknown  and unspecified
     distribution of the random effect by a discrete  mixture of k
     exponential family densities, leading to a simple  expression of
     the marginal likelihood, which can then be maximized  using a
     standard EM algorithm. When option 'gq' is set, then 
     Gauss-Hermite masses and mass points are used and considered as 
     fixed, otherwise they serve as starting points for the EM
     algorithm.  The position of the starting points can be
     concentrated or extended  by setting tol smaller or larger than
     one, respectively. Variance  component models with random
     coefficients (Aitkin, Hinde & Francis,  2005, p. 491) are also
     possible, in this case the option  random.distribution is
     restricted to the setting 'np' . The weights  have to be
     understood as frequency weights, i.e. setting all weights  equal
     to 2 will duplicate each data point and hence double the 
     disparity and deviance. Warning: There might be some options and 
     circumstances which had not been tested and where the weights do
     not  work." Note that in keeping with the gamlss notation
     disparity is called global deviance.

_V_a_l_u_e:

     The function 'gamlssNP' produces an object of class "gamlssNP". 
     This object contain several components.  

  family: the name of the gamlss family

    type: the type of distribution which in this case is "Mixture" 

parameters: the parameters for the kernel gamlss family distribution

    call: the call of the gamlssNP function

       y: the response variable

      bd: the binomial demominator, only for BI and BB models

 control: the NP.control settings

 weights: the vector of weights of te expanded fit

G.deviance: the global deviance

       N: the number of observations in the fit

   rqres: a function to calculate the normalized (randomized) quantile
          residuals of the object (here is the gamlss object rather
          than gamlssNP and it should change??)

    iter: the number of external iterations in the last gamlss fitting
          (?? do we need this?)

    type: the type of the distribution or the response variable here
          set to "Mixture"

  method: which algorithm is used for the gamlss fit, RS(), CG() or
          mixed()

contrasts: the type of contrasts use in the fit

converged: whether the gamlss fit has  converged

residuals: the normalized (randomized) quantile residuals of the model

   mu.fv: the fitted values of the extended mu model, also  sigma.fv,
          nu.fv, tau.fv for the other parameters if present

   mu.lp: the linear predictor of the extended mu model, also 
          sigma.lp, nu.lp, tau.lp for the other parameters if present

   mu.wv: the working variable of the extended mu model, also 
          sigma.wv, nu.wv, tau.wv for the other parameters if present

   mu.wt: the working weights of the mu model, also  sigma.wt, nu.wt,
          tau.wt for the other parameters if present

 mu.link: the link function for the mu model, also  sigma.link,
          nu.link, tau.link for the other parameters if present

mu.terms: the terms for the mu model, also  sigma.terms, nu.terms,
          tau.terms for the other parameters if present

    mu.x: the design matrix for the mu, also  sigma.x, nu.x, tau.x for
          the other parameters if present

   mu.qr: the QR decomposition of the mu model, also sigma.qr, nu.qr,
          tau.qr for the other parameters if present

mu.coefficients: the linear coefficients of the mu model, also 
          sigma.coefficients, nu.coefficients, tau.coefficients for the
          other parameters if present

mu.formula: the formula for the mu model, also  sigma.formula,
          nu.formula, tau.formula for the other parameters if present

   mu.df: the mu degrees of freedom also  sigma.df, nu.df, tau.df for
          the other parameters if present

mu.nl.df: the non linear degrees of freedom, also sigma.nl.df,
          nu.nl.df, tau.nl.df for the other parameters if present

  df.fit: the total degrees of freedom use by the model

df.residual: the residual degrees of freedom left after the model is
          fitted

    data: the original data set

  EMiter: the number of EM iterations

EMconverged: whether the EM has converged

allresiduals: the residuas for the long fit

mass.points: the estimates mass point (if "np" mixture is used)

       K: the number of mass points used

post.prob: contains a matrix of posteriori probabilities,

    prob: the estimated mixture probalilities

     aic: the Akaike information criterion 

     sbc: the Bayesian information criterion

 formula: the formula used in the expanded fit

  random: the random effect formula

pweights: prior weights

     ebp: the Empirical Bayes Predictions (Aitkin, 1996b) on the scale
          of the linear predictor


     Note that in case of Gaussian quadrature, the coefficient given at
     'z' in coefficients corresponds to the standard deviation of the
     mixing distribution. 

     As a by-product, gamlssNP produces a plot showing the global
     deviance against the iteration number.  Further, a plot with the
     EM trajectories is given.  The x-axis corresponds to the iteration
     number, and the y-axis to the value of the mass points at a
     particular iteration.  This plot is not produced when mixture is
     set to "gq"

_A_u_t_h_o_r(_s):

     Mikis Stasinopoulos based on function created by Jochen Einbeck
     John Hinde and Ross Darnell

_R_e_f_e_r_e_n_c_e_s:

     Aitkin, M. and Francis, B. (1995). Fitting overdispersed
     generalized linear models by  nonparametric maximum likelihood.
     GLIM Newsletter 25 , 37-45.

     Aitkin, M. (1996a). A general maximum likelihood analysis of
     overdispersion in generalized linear models.  Statistics and
     Computing 6 , 251-262.

     Aitkin, M. (1996b). Empirical Bayes shrinkage using posterior
     random effect means from nonparametric maximum  likelihood
     estimation in general random effect models. Statistical Modelling:
      Proceedings of the 11th IWSM 1996 , 87-94.

     Aitkin, M., Francis, B. and Hinde, J. (2005) Statistical Modelling
     in GLIM 4. Second Edition,  Oxford Statistical Science Series,
     Oxford, UK.

     Einbeck, J. & Hinde, J. (2005). A note on NPML estimation for
     exponential family regression models  with unspecified dispersion
     parameter. Technical Report IRL-GLWY-2005-04, National University
     of Ireland, Galway.

     Einbeck, J. Darnell R. and  Hinde J. (2006) npmlreg: Nonparametric
     maximum likelihood estimation for random effect models, R package
     version 0.34

     Hinde, J. (1982). Compound Poisson regression models. Lecture
     Notes in Statistics 14 ,109-121. 

     Rigby, R. A. and  Stasinopoulos D. M. (2005). Generalized additive
     models for location, scale and shape, (with discussion), _Appl.
     Statist._, *54*, part 3, pp 507-554.

     Stasinopoulos D. M., Rigby R.A. and Akantziliotou C. (2003)
     Instructions on how to use the GAMLSS package in R. Accompanying
     documentation in the current GAMLSS  help files, (see also  <URL:
     http://www.gamlss.com/>).

_S_e_e _A_l_s_o:

     'gamlss', 'gamlss.family'

_E_x_a_m_p_l_e_s:

     data(enzyme)
     # equivalent model using gamlssNP
     mmNP1 <- gamlssNP(act~1, data=enzyme, random=~1,family=NO, K=2)
     mmNP2 <- gamlssNP(act~1, data=enzyme, random=~1, sigma.fo=~MASS, family=NO, K=2)
     AIC(mmNP1, mmNP2)

