DPbinary          package:DPpackage          R Documentation(latin1)

_B_a_y_e_s_i_a_n _a_n_a_l_y_s_i_s _f_o_r _a _s_e_m_i_p_a_r_a_m_e_t_r_i_c _B_e_r_n_o_u_l_l_i _r_e_g_r_e_s_s_i_o_n _m_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     This function generates a posterior density sample for a
     semiparametric binary regression model.

_U_s_a_g_e:

     DPbinary(formula,inter=TRUE,baseline="logistic",prior,mcmc,state,status,
              misc=NULL,data=sys.frame(sys.parent()),na.action=na.fail)

_A_r_g_u_m_e_n_t_s:

 formula: a two-sided linear formula object describing the model fit,
          with the response on the left of a '~' operator and the
          terms, separated by '+' operators, on the right.

   inter: a logical variable indicating whether the intercept should be
          explicitly included in the linear predictor ('TRUE') or
          absorbed in the unknown link function ('FALSE'). The default
          option is 'TRUE'. Note that if inter is 'FALSE', the prior
          distribution for the regression coefficients must not
          include a component for the intercept.
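
          The dimension rule above can be sketched with base R only.
          The covariate 'x' is hypothetical, and the prior values are
          the ones used in the Examples section; this only illustrates
          how the length of _beta0_ and the size of _Sbeta0_ change
          with 'inter', not how 'DPbinary' checks them.

```r
# Hypothetical covariate, used only to count regression coefficients
x <- c(0, 1, 2, 3, 4)

# inter=TRUE: intercept plus slope, so beta0 and Sbeta0 have dimension 2
p_with <- ncol(model.matrix(~ x))
prior_with <- list(a0 = 2, b0 = 1,
                   beta0 = rep(0, p_with),
                   Sbeta0 = diag(10000, p_with))

# inter=FALSE: the intercept is absorbed in the link, so dimension 1.
# diag(10000, 1) gives a 1 x 1 matrix because nrow is supplied.
p_without <- p_with - 1
prior_without <- list(a0 = 2, b0 = 1,
                      beta0 = rep(0, p_without),
                      Sbeta0 = diag(10000, p_without))
```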

baseline: a description of the baseline error distribution to be used
          in the model. The baseline distributions considered by 
          'DPbinary' so far are _logistic_ (default), _normal_, and
          _cauchy_.

   prior: a list giving the prior information. The list includes the
          following parameters: _a0_ and _b0_ giving the
          hyperparameters for the prior distribution of the precision
          parameter of the Dirichlet process prior, _alpha_ giving the
          value of the precision parameter (it must be specified if
          _a0_ and _b0_ are missing, see the details below), and
          _beta0_ and _Sbeta0_ giving the hyperparameters of the
          normal prior distribution for the regression coefficients.

    mcmc: a list giving the MCMC parameters. The list must include the
          following integers: _nburn_ giving the number of burn-in
          scans, _nskip_ giving the thinning interval, _nsave_ giving
          the total number of scans to be saved, _ndisplay_ giving the
          number of saved scans to be displayed on the screen (the
          function reports on the screen each time _ndisplay_
          iterations have been carried out), and _tune_ giving the
          Metropolis tuning parameter (the default value is 1.1).

   state: a list giving the current value of the parameters. This list
          is used if the current analysis is the continuation of a
          previous analysis.

  status: a logical variable indicating whether this run is new
          ('TRUE') or the continuation of a previous analysis
          ('FALSE'). In the latter case the current value of the
          parameters must be specified in the object _state_.

    misc: misclassification information. When used, this list must
          include two objects, _sens_ and _spec_, giving the
          sensitivity and specificity, respectively. Each can be a
          vector or a scalar. This information is used to correct for
          misclassification in the conditional Bernoulli model.
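
          As a rough illustration of what _sens_ and _spec_ mean,
          the sketch below applies the standard relation between the
          true success probability and the probability of observing a
          positive result. The numeric values and the helper
          'p_observed' are hypothetical, and the relation is stated
          as an assumption about the correction, not taken from the
          package source.

```r
# Hypothetical sensitivity and specificity of the binary classifier
misc <- list(sens = 0.90, spec = 0.95)

# Probability of observing a positive result when the true success
# probability is p: true positives plus false positives
p_observed <- function(p, sens, spec) {
  sens * p + (1 - spec) * (1 - p)
}

p_true <- c(0, 0.5, 1)
p_obs <- p_observed(p_true, misc$sens, misc$spec)
p_obs
```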

    data: data frame.

na.action: a function that indicates what should happen when the data
          contain 'NA's. The default action ('na.fail') causes 
          'DPbinary' to print an error message and terminate if there
          are any incomplete observations.

_D_e_t_a_i_l_s:

     This generic function fits a semiparametric binary regression
     model using a Dirichlet process prior (see Jara, Garcia-Zattera
     and Lesaffre, 2006):

                   yi = I(Vi <= Xi beta), i=1,...,n


                          V1,...,Vn | G ~ G


                     G | alpha, G0 ~ DP(alpha G0)

     where, G0 = Logistic(V| 0, 1) if the baseline is logistic, G0 =
     N(V| 0, 1) if the baseline is normal, and  G0 = Cauchy(V| 0, 1) if
     the baseline is cauchy.  To complete the model specification, the
     following prior distributions are assumed,

                    alpha | a0, b0 ~ Gamma(a0,b0)


                beta | beta0, Sbeta0 ~ N(beta0,Sbeta0)
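
     A minimal sketch of the data-generating mechanism, in the special
     case where G is taken equal to the logistic baseline G0, may help
     to read the first line of the model. The coefficient values are
     hypothetical, and the design mirrors the five stimulus levels of
     the Examples section.

```r
set.seed(1)
n <- 150
x <- rep(0:4, each = 30)
X <- cbind(1, x)                          # design matrix with intercept
beta <- c(-2.5, 1.2)                      # hypothetical coefficients
eta <- drop(X %*% beta)                   # linear predictor Xi beta

V <- rlogis(n, location = 0, scale = 1)   # errors drawn from G0
y <- as.integer(V <= eta)                 # yi = I(Vi <= Xi beta)

# In this special case P(yi = 1) = plogis(Xi beta), i.e. ordinary
# logistic regression; a random G relaxes that link.
table(x, y)
```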

     The precision or total mass parameter, alpha, of the _DP_ prior
     can be considered as random, having a _gamma_ distribution,
     Gamma(a0,b0), or fixed at some particular value. When alpha is
     random, the method described by Escobar and West (1995) is used.
     To fix alpha at a particular value, set a0 to NULL in the prior
     specification and supply the value in alpha.
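
     To get a feel for the role of alpha, the sketch below uses the
     standard result for the expected number of distinct values among
     n draws from a Dirichlet process. The helper 'expected_clusters'
     is hypothetical, and treating b0 as a rate parameter is an
     assumption about the Gamma(a0,b0) parameterisation.

```r
# Expected number of distinct values among n draws from a DP with
# precision alpha: sum over i of alpha / (alpha + i - 1)
expected_clusters <- function(alpha, n) {
  sum(alpha / (alpha + seq_len(n) - 1))
}

n <- 150
expected_clusters(1, n)                   # alpha fixed at 1

# Random alpha: average over draws from Gamma(a0, b0),
# assuming shape a0 and rate b0
set.seed(1)
a0 <- 2
b0 <- 1
alpha_draws <- rgamma(5000, shape = a0, rate = b0)
mean(sapply(alpha_draws, expected_clusters, n = n))

# Prior list with alpha fixed: a0 is NULL and alpha carries the value
prior_fixed <- list(a0 = NULL, alpha = 1,
                    beta0 = rep(0, 2), Sbeta0 = diag(10000, 2))
```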

     A Metropolis-Hastings step is used to sample from the full
     conditional distribution of the regression coefficients and errors
     (see Jara, Garcia-Zattera and Lesaffre, 2006). In the computational
     implementation of the model, G is considered as latent data and
     sampled partially, with sufficient accuracy to be able to generate
     V1,...,Vn+1 which are exactly iid G, as proposed by Doss (1994).
     Both Ferguson's definition of the DP and the Sethuraman-Tiwari
     (1982) representation of the process are used, as described in
     Jara, Garcia-Zattera and Lesaffre (2006). To improve the rate of
     mixing, an extra step is performed that moves the clusters in such
     a way that the posterior distribution remains a stationary
     distribution.
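
     The idea of drawing values that are iid from a random G can be
     sketched with a truncated stick-breaking construction, which is
     assumed here to be the constructive representation referred to
     above. This is not the partial-sampling scheme of Doss (1994) used
     by the function: the truncation level 'n_atoms' and the helper
     'rdp_stick' are hypothetical simplifications.

```r
# Truncated stick-breaking draw of G ~ DP(alpha G0), G0 = Logistic(0, 1)
rdp_stick <- function(alpha, n_atoms = 500) {
  b <- rbeta(n_atoms, 1, alpha)               # stick-breaking proportions
  w <- b * cumprod(c(1, 1 - b[-n_atoms]))     # weight of each atom
  atoms <- rlogis(n_atoms)                    # atom locations from G0
  list(atoms = atoms, weights = w / sum(w))   # renormalise after truncation
}

set.seed(1)
G <- rdp_stick(alpha = 1)

# Given G, the errors are iid draws from a discrete distribution,
# so ties (clusters) occur with positive probability
V <- sample(G$atoms, size = 151, replace = TRUE, prob = G$weights)
length(unique(V))
```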

_V_a_l_u_e:

     An object of class 'DPbinary' representing the semiparametric
     binary regression model fit. Generic functions such as 'print',
     'plot', 'predict', 'summary', and 'anova' have methods to show the
     results of the fit. The results include 'beta', the precision
     parameter (_alpha_), the number of clusters (_ncluster_), and the
     _link_ function.

     The MCMC samples of the parameters and the errors in the model are
     stored in the objects _thetasave_ and _randsave_, respectively.
     Both objects are included in the list _save.state_ and are
     matrices which can be analyzed directly by functions provided by
     the coda package.
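
     The sketch below shows the kind of direct analysis this allows.
     The matrix is simulated as a stand-in for
     'fit$save.state$thetasave' and its column names are hypothetical;
     the coda step is only run if that package is installed.

```r
# Stand-in for fit$save.state$thetasave: one row per saved scan,
# one column per parameter (simulated values, hypothetical names)
set.seed(1)
thetasave <- cbind("(Intercept)" = rnorm(1000, -2.5, 0.4),
                   x = rnorm(1000, 1.2, 0.2))

# Posterior means and equal-tail 95% credibility intervals
post_mean <- colMeans(thetasave)
post_ci <- apply(thetasave, 2, quantile, probs = c(0.025, 0.975))

# With coda installed, the same matrix can be wrapped as an mcmc object
if (requireNamespace("coda", quietly = TRUE)) {
  chain <- coda::mcmc(thetasave)
  print(coda::effectiveSize(chain))
}
```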

     The list _state_ in the output object contains the current value
     of the parameters necessary to restart the analysis. If you want
     to specify different starting values to run multiple chains, set
     _status=TRUE_ and create the list state based on these starting
     values. In this case the list _state_ must include the following
     objects:

    beta: giving the value of the regression coefficients.

       v: giving the value of the errors (it must be consistent with
          _yi = I(Vi <= Xi beta)_).

       y: giving the value of the true response binary variable (only
          if the model considers correction for misclassification).

   alpha: giving the value of the precision parameter.
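
     One way to build such a list by hand is sketched below, using the
     bioassay data of the Examples section. The starting coefficients
     and the use of exponential offsets are hypothetical choices; any
     errors that satisfy the consistency condition would do.

```r
set.seed(2)
x <- rep(0:4, each = 30)
y <- rep(rep(c(0, 1), 5),
         times = c(28, 2, 22, 8, 15, 15, 7, 23, 3, 27))

beta <- c(-2, 1)                          # hypothetical starting coefficients
eta <- drop(cbind(1, x) %*% beta)         # linear predictor Xi beta

# Errors consistent with yi = I(Vi <= Xi beta): below the linear
# predictor when yi = 1, strictly above it when yi = 0
v <- ifelse(y == 1, eta - rexp(length(y)), eta + rexp(length(y)))

state <- list(beta = beta, v = v, alpha = 1)
stopifnot(all((state$v <= eta) == (y == 1)))
```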

_A_u_t_h_o_r(_s):

     Alejandro Jara Vallejos <Alejandro.JaraVallejos@med.kuleuven.be>

_R_e_f_e_r_e_n_c_e_s:

     Doss, H. (1994) Bayesian nonparametric estimation for incomplete
     data using mixtures of Dirichlet priors. The Annals of
     Statistics, 22: 1763-1786.

     Escobar, M.D. and West, M. (1995) Bayesian Density Estimation and
     Inference  Using Mixtures. Journal of the American Statistical
     Association, 90: 577-588.

     Jara, A., Garcia-Zattera, M.J., Lesaffre, E. (2006) Semiparametric
     Bayesian Analysis of Misclassified Binary Data. XXIII
     International Biometric Conference, July 16-21, Montreal, Canada.

     Sethuraman, J., and Tiwari, R. C. (1982) Convergence of Dirichlet
     Measures and the Interpretation of their Parameter, in
     Statistical Decision Theory and Related Topics III (vol. 2), eds.
     S. S. Gupta and J. O. Berger, New York: Academic Press, pp.
     305-315.

_E_x_a_m_p_l_e_s:

     ## Not run: 
         # Bioassay Data Example
         # Cox, D.R. and Snell, E.J. (1989). Analysis of Binary Data. 2nd ed. 
         # Chapman and Hall. p. 7  
         # In this example there are 150 subjects at 5 different stimulus levels, 
         # 30 at each level.

           y<-c(rep(0,30-2),rep(1,2),
                rep(0,30-8),rep(1,8),
                rep(0,30-15),rep(1,15),
                rep(0,30-23),rep(1,23),
                rep(0,30-27),rep(1,27))

           x<-c(rep(0,30),
                rep(1,30),
                rep(2,30),
                rep(3,30),
                rep(4,30))

         # Initial state
           state <- NULL

         # MCMC parameters
           nburn<-5000
           nsave<-10000
           nskip<-10
           ndisplay<-100
           mcmc <- list(nburn=nburn,nsave=nsave,nskip=nskip,ndisplay=ndisplay,
                        tune=1.1)

         # Prior distribution
           prior <- list(a0=2,b0=1,beta0=rep(0,2), Sbeta0=diag(10000,2))

         # Fit the model
           fit1 <- DPbinary(y~x,prior=prior,mcmc=mcmc,state=state,status=TRUE) 
           fit1

         # Summary with HPD and credibility intervals
           summary(fit1)
           summary(fit1,hpd=FALSE)

         # Plot model parameters (to see the plots gradually set ask=TRUE)
           plot(fit1)
           plot(fit1,nfigr=2,nfigc=2)        

         # Plot a specific model parameter (to see the plots gradually
         # set ask=TRUE)
           plot(fit1,ask=FALSE,nfigr=1,nfigc=2,param="x")    
           plot(fit1,ask=FALSE,nfigr=1,nfigc=2,param="ncluster")     
           plot(fit1,ask=FALSE,param="link",nfigc=1,nfigr=1)

         # Table of Pseudo Contour Probabilities
           anova(fit1)

         # Predictive Distribution
           npred<-40  
           xnew<-cbind(rep(1,npred),seq(0,4,length=npred))

           pp<-predict(fit1,xnew)       

           plot(seq(0,4,length=npred),pp$pmean,type='l',ylim=c(0,1),
                xlab="log2(concentration)",ylab="Probability")
           
         # Adding the MLEs (observed proportions at each stimulus level)
           points(c(0,1,2,3,4),c(0.067,0.267,0.500,0.767,0.900),col="red")

     ## End(Not run)      

