ssden                  package:gss                  R Documentation

_E_s_t_i_m_a_t_i_n_g _P_r_o_b_a_b_i_l_i_t_y _D_e_n_s_i_t_y _U_s_i_n_g _S_m_o_o_t_h_i_n_g _S_p_l_i_n_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Estimate probability densities using smoothing spline ANOVA
     models. The symbolic model specification via 'formula' follows the
     same rules as in 'lm', but with the response missing.

_U_s_a_g_e:

     ssden(formula, type=NULL, data=list(), alpha=1.4, weights=NULL,
           subset, na.action=na.omit, id.basis=NULL, nbasis=NULL, seed=NULL,
           domain=as.list(NULL), quadrature=NULL, prec=1e-7, maxiter=30)

_A_r_g_u_m_e_n_t_s:

 formula: Symbolic description of the model to be fit.

    type: List specifying the type of spline for each variable. See
          'mkterm' for details.

    data: Optional data frame containing the variables in the model.

   alpha: Parameter defining cross-validation score for smoothing
          parameter selection.

 weights: Optional vector of bin-counts for histogram data.

  subset: Optional vector specifying a subset of observations to be
          used in the fitting process.

na.action: Function which indicates what should happen when the data
          contain NAs.

id.basis: Index of observations to be used as "knots."

  nbasis: Number of "knots" to be used.  Ignored when 'id.basis' is
          specified.

    seed: Seed to be used for the random generation of "knots." Ignored
          when 'id.basis' is specified.

  domain: Data frame specifying marginal support of density.

quadrature: Quadrature for calculating integral.  Mandatory if
          variables other than factors or numerical vectors are
          involved.

    prec: Precision requirement for internal iterations.

 maxiter: Maximum number of iterations allowed for internal iterations.

_D_e_t_a_i_l_s:

     The model specification via 'formula' is for the log density. For
     example, '~x1*x2' prescribes a model of the form

       log f(x1,x2) = g_{1}(x1) + g_{2}(x2) + g_{12}(x1,x2) + C

     with the terms denoted by '"x1"', '"x2"', and '"x1:x2"'; the
     constant is determined by the fact that a density integrates to
     one.

     The selective term elimination may characterize (conditional)
     independence structures between variables.  For example,
     '~x1*x2+x1*x3' yields the conditional independence of x2 and x3
     given x1.

     Parallel to those in a 'ssanova' object, the model terms are sums
     of unpenalized and penalized terms.  Attached to every penalized
     term there is a smoothing parameter, and the model complexity is
     largely determined by the number of smoothing parameters.

     The selection of smoothing parameters is through a
     cross-validation mechanism described in the references, with a
     parameter 'alpha'; 'alpha=1' is "unbiased" for the minimization of
     Kullback-Leibler loss but may yield severe undersmoothing, whereas
     larger 'alpha' yields smoother estimates.

     A subset of the observations are selected as "knots."  Unless
     specified via 'id.basis' or 'nbasis', the number of "knots" q is
     determined by max(30,10n^{2/9}), which is appropriate for the
     default cubic splines for numerical vectors.

_V_a_l_u_e:

     'ssden' returns a list object of 'class "ssden"'.

     'dssden' and 'cdssden' can be used to evaluate the estimated joint
     density and conditional density; 'pssden', 'qssden', 'cpssden',
     and 'cqssden' can be used to evaluate (conditional) cdf and
     quantiles.  The method 'project.ssden' can be used to calculate
     the Kullback-Leibler projection for model selection.

_N_o_t_e:

     Default quadrature will be constructed for up to 4 numerical
     vectors on a hyper cube, then outer product with factor levels
     will be taken if factors are involved.  The sides of the hyper
     cube are specified by 'domain'; for 'domain$x' missing, the
     default is 'c(min(x),max(x))+c(-1,1)*(max(x)-mimn(x))*.05'.

     On a 1-D interval, the quadrature is the 200-point Gauss-Legendre
     formula returned from 'gauss.quad'.  For 2, 3, or 4 numerical
     vectors, delayed Smolyak cubatures from 'smolyak.quad' with 449,
     2527, and 13697 points are used on cubes with the marginals
     properly transformed; see Gu and Wang (2003) for the marginal
     transformations.

     The results may vary from run to run.  For consistency, specify
     'id.basis' or set 'seed'.

_A_u_t_h_o_r(_s):

     Chong Gu, chong@stat.purdue.edu

_R_e_f_e_r_e_n_c_e_s:

     Gu, C. (2002), _Smoothing Spline ANOVA Models_.  New York:
     Springer-Verlag.

     Gu, C. and Wang, J. (2003), Penalized likelihood density
     estimation: Direct cross-validation and scalable approximation.
     _Statistica Sinica_, *13*, 811-826.

_E_x_a_m_p_l_e_s:

     ## 1-D estimate: Buffalo snowfall
     data(buffalo)
     buff.fit <- ssden(~buffalo,domain=data.frame(buffalo=c(0,150)))
     plot(xx<-seq(0,150,len=101),dssden(buff.fit,xx),type="l")
     plot(xx,pssden(buff.fit,xx),type="l")
     plot(qq<-seq(0,1,len=51),qssden(buff.fit,qq),type="l")
     ## Clean up
     ## Not run: 
     rm(buffalo,buff.fit,xx,qq)
     dev.off()
     ## End(Not run)

     ## 2-D with triangular domain: AIDS incubation
     data(aids)
     ## rectangular quadrature
     quad.pt <- expand.grid(incu=((1:40)-.5)/40*100,infe=((1:40)-.5)/40*100)
     quad.pt <- quad.pt[quad.pt$incu<=quad.pt$infe,]
     quad.wt <- rep(1,nrow(quad.pt))
     quad.wt[quad.pt$incu==quad.pt$infe] <- .5
     quad.wt <- quad.wt/sum(quad.wt)*5e3
     ## additive model (pre-truncation independence)
     aids.fit <- ssden(~incu+infe,data=aids,subset=age>=60,
                       domain=data.frame(incu=c(0,100),infe=c(0,100)),
                       quad=list(pt=quad.pt,wt=quad.wt))
     ## conditional (marginal) density of infe
     jk <- cdssden(aids.fit,xx<-seq(0,100,len=51),data.frame(incu=50))
     plot(xx,jk$pdf,type="l")
     ## conditional (marginal) quantiles of infe (TIME-CONSUMING)
     ## Not run: 
     cqssden(aids.fit,c(.05,.25,.5,.75,.95),data.frame(incu=50),jk$int)
     ## End(Not run)
     ## Clean up
     ## Not run: 
     rm(aids,quad.pt,quad.wt,aids.fit,jk,xx)
     dev.off()
     ## End(Not run)

     ## One factor plus one vector
     data(gastric)
     gastric$trt
     fit <- ssden(~futime*trt,data=gastric)
     ## conditional density
     cdssden(fit,c("1","2"),cond=data.frame(futime=150))
     ## conditional quantiles
     cqssden(fit,c(.05,.25,.5,.75,.95),data.frame(trt="1"))
     ## Clean up
     ## Not run: rm(gastric,fit)

