CSDPbinary         package:DPpackage         R Documentation(latin1)

_B_a_y_e_s_i_a_n _a_n_a_l_y_s_i_s _f_o_r _a _s_e_m_i_p_a_r_a_m_e_t_r_i_c _l_o_g_i_s_t_i_c _r_e_g_r_e_s_s_i_o_n _m_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     This function generates a posterior density sample for a
     semiparametric binary regression model using a Centrally
     Standarized Dirichlet process prior for the link function.

_U_s_a_g_e:

     CSDPbinary(formula,baseline="logistic",prior,mcmc,state,status,misc=NULL,
              data=sys.frame(sys.parent()),na.action=na.fail)

_A_r_g_u_m_e_n_t_s:

 formula: a two-sided linear formula object describing the model fit,
          with the response on the left of a '~' operator and the
          terms, separated by '+' operators, on the right.

baseline: a description of the baseline error distribution to be used
          in the model. The baseline distributions considered by 
          'CSDPbinary' so far is _logistic_.

   prior: a list giving the prior information. The list includes the
          following parameters: _a0_ and _b0_ giving the
          hyperparameters for prior distribution of the precision
          parameter of the Centrally-Standarized  Dirichlet process
          prior, _alpha_ giving the value of the precision  parameter
          (it must be specified if _a0_ and _b0_ are missing,  see the
          details below), and _beta0_ and _Sbeta0_ giving the 
          hyperparameters of the normal prior distribution for the
          regression coefficients.

    mcmc: a list giving the MCMC parameters. The list must include the
          following integers: _nburn_ giving the number of burn-in 
          scans, _nskip_ giving the thinning interval, _nsave_ giving
          the total number of scans to be saved, _ntheta_ giving the
          thinning  interval for the _theta_ parameter (if missing, the
          value 1 is considered),  _ndisplay_ giving the number of
          saved scans to be displayed on the screen  (the function
          reports on the screen when every _ndisplay_ iterations have
          been carried out), and _tune_ giving the Metropolis tuning
          parameter (the default value is 1.1).

   state: a list giving the current value of the parameters. This list
          is used if the current analysis is the continuation of a
          previous analysis.

  status: a logical variable indicating whether this run is new
          ('TRUE') or the  continuation of a previous analysis
          ('FALSE'). In the latter case the current value of the
          parameters must be specified in the  object _state_.

    misc: misclassification information. When used, this list must
          include two objects, _sens_ and _spec_, giving the
          sensitivity and specificity, respectively. Both can be a
          vector or a scalar.  This information is used to correct for
          misclassification in the conditional bernoulli model.

    data: data frame.

na.action: a function that indicates what should happen when the data
          contain 'NA's. The default action ('na.fail') causes 
          'CSDPbinary' to print an error message and terminate if there
          are any incomplete observations.

_D_e_t_a_i_l_s:

     This generic function fits a semiparametric binary regression
     model using  a Centrally-Standarized Dirichlet Process Prior
     (CSDP) (Newton, Czado and Chappell, 1996):

                   yi = I(Vi <= Xi beta), i=1,...,n


                          V1,...,Vn | G ~ G


                    G | m, p, d, h ~ CSDP(m,p,d,h)


     where, m={m1,m2,m3,m4} is the base measure,  mj(B )=alpha
     G0(B)I{Aj(theta)},

     A1 (theta ) = (-infty,theta-d ], A2 (theta ) = (theta-d,0 ]


        A3 (theta ) = (0,theta ], A4 (theta ) = (theta,infty),


     and h is a uniform distribution on (0,d). Note that in the
     construction of Newton et al. (1996),
     G=((1-2)/2)*(G1+G4)+(p/2)*(G2+G3), where Gj are conditionally
     independent Dirichlet processes with base measure mj.

     To complete the model specification, the following prior
     distributions are assumed,

                    alpha | a0, b0 ~ Gamma(a0,b0)


                beta | beta0, Sbeta0 ~ N(beta0,Sbeta0)


     The precision parameter, alpha, of the _CSDP_ prior  can be
     considered as random, having a _Gamma_ distribution, Gamma(a0,b0),
      or fixed at some particular value. When alpha is random a
     strategy similar to the method  described by Escobar and West
     (1995) is used. To let alpha to be fixed at a particular value,
     set a0 to NULL in the prior specification.

     A Metropolis-Hastings step is used to sample the fully conditional
     distribution of the regression coefficients and errors (see, Jara,
     Garcia-Zattera and  Lesaffre, 2006). In the computational
     implementation of the model, G is  considered as latent data and
     sampled partially with sufficient accuracy to be  able to generate
     V1,...,Vn+1 which are exactly iid G, as proposed by Doss (1994). 
     Both Ferguson's definition of DP and the Sethuraman-Tiwari (1982)
     representation  of the process are used, as described in Jara,
     Garcia-Zattera and Lesaffre (2006). An extra step which moves the
     clusters in such a way that the posterior distribution is still a
     stationary distribution, is performed in order to improve the rate
     of mixing.

_V_a_l_u_e:

     An object of class 'CSDPbinary' representing the semiparametric
     logistic regression model fit. Generic functions such as 'print',
     'plot', 'predict', 'summary', and 'anova' have methods to show the
     results of the fit. The results include _beta_,  the precision
     parameter (_alpha_), the number of clusters (_ncluster_), and  the
     _link_ function.

     The MCMC samples of the parameters and the errors in the model are
     stored in the object  _thetasave_ and _randsave_, respectively.
     Both objects are included in the  list _save.state_ and are
     matrices which can be analyzed directly by functions  provided by
     the coda package.

     The list _state_ in the output object contains the current value
     of the parameters  necessary to restart the analysis. If you want
     to specify different starting values  to run multiple chains set
     _status=TRUE_ and create the list state based on  this starting
     values. In this case the list _state_ must include the following
     objects:

    beta: giving the value of the regression coefficients.

   theta: giving the value of the third quartile parameter.

       v: giving the value of the errors (it must be consistent with
          _yi = I(Vi < xi beta)_.

       y: giving the value of the true response binary variable (only
          if the model considers correction for misclassification).

   alpha: giving the value of the precision parameter.

_A_u_t_h_o_r(_s):

     Alejandro Jara Vallejos <Alejandro.JaraVallejos@med.kuleuven.be>

_R_e_f_e_r_e_n_c_e_s:

     Doss, H. (1994) Bayesian nonparametric estimation for incomplete
     data using  mixtures of Dirichlet priors. The Annals of
     Statistics, 22: 1763 - 1786.

     Escobar, M.D. and West, M. (1995) Bayesian Density Estimation and
     Inference  Using Mixtures. Journal of the American Statistical
     Association, 90: 577-588.

     Jara, A., Garcia-Zattera, M.J., Lesaffre, E. (2006) Semiparametric
     Bayesian Analysis of Misclassified Binary Data. XXIII
     International Biometric Conference, July 16-21, Montral, Canada.

     Newton, M.A., Czado, C., and Chappell, R. (1996) Bayesian
     inference  for semiparametric binary regression. Journal of the
     American Statistical  Association, 91, 142-153.

     Sethuraman, J., and Tiwari, R. C. (1982) Convergence of Dirichlet
     Measures and  the Interpretation of their Parameter, in
     Statistical Decision Theory and Related  Topics III (vol. 2), eds.
     S. S. Gupta and J. O. Berger, New York: Academic Press,  pp. 305 -
     315.

_E_x_a_m_p_l_e_s:

     ## Not run: 
         # Bioassay Data Example
         # Cox, D.R. and Snell, E.J. (1989). Analysis of Binary Data. 2nd ed. 
         # Chapman and Hall. p. 7. 
         # In this example there are 150 subjects at 5 different stimulus 
         # levels, 30 at each level.

           y<-c(rep(0,30-2),rep(1,2),
                rep(0,30-8),rep(1,8),
                rep(0,30-15),rep(1,15),
                rep(0,30-23),rep(1,23),
                rep(0,30-27),rep(1,27))

           x<-c(rep(0,30),
                rep(1,30),
                rep(2,30),
                rep(3,30),
                rep(4,30))

         # Initial state
           state <- NULL

         # MCMC parameters
           nburn<-5000
           nsave<-10000
           nskip<-10
           ntheta<-1
           ndisplay<-100
           mcmc <- list(nburn=nburn,nsave=nsave,nskip=nskip,
                        ntheta=ntheta,ndisplay=ndisplay,tune=1.1)

         # Prior distribution
           prior <- list(alpha=1, d=2*log(3), p=0.5, beta0=rep(0,2), 
                         Sbeta0=diag(1000,2))

         # Fitting the model

           fit1 <- CSDPbinary(y~x,prior=prior,mcmc=mcmc,state=state,
                              status=TRUE) 
           fit1

         # Summary with HPD and Credibility intervals
           summary(fit1)
           summary(fit1,hpd=FALSE)

         # Plot model parameters (to see the plots gradually set ask=TRUE)
           plot(fit1)

         # Plot an specific model parameter (to see the plots gradually 
         # set ask=TRUE)
           plot(fit1,ask=FALSE,nfigr=1,nfigc=2,param="x")    
           plot(fit1,ask=FALSE,param="link",nfigc=1,nfigr=1)

         # Table of Pseudo Contour Probabilities
           anova(fit1)

         # Predictive Distribution
           npred<-40  
           xnew<-cbind(rep(1,npred),seq(0,4,length=npred))

           pp<-predict(fit1,xnew)       

           plot(seq(0,4,length=npred),pp$pmean,type='l',ylim=c(0,1),
                xlab="log2(concentration)",ylab="Probability")
           
         # Adding MLE estimates
           points(c(0,1,2,3,4),c(0.067,0.267,0.500,0.767,0.900),col="red")

     ## End(Not run)      

