DPdensity             package:DPpackage             R Documentation

_S_e_m_i_p_a_r_a_m_e_t_r_i_c _B_a_y_e_s_i_a_n _d_e_n_s_i_t_y _e_s_t_i_m_a_t_i_o_n _u_s_i_n_g _a _D_P_M _o_f _n_o_r_m_a_l_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function generates a posterior density sample for a 
     Dirichlet process mixture of normals model.

_U_s_a_g_e:

     DPdensity(y,ngrid=1000,prior,mcmc,state,status,method="neal",
               data=sys.frame(sys.parent()),na.action=na.fail)      
           

_A_r_g_u_m_e_n_t_s:

       y: a vector or matrix giving the data from which the density
          estimate  is to be computed.

   ngrid: number of grid points where the density estimate is 
          evaluated. This is only used if the dimension of 'y' is less
          than or equal to 2. The default value is 1000.

   prior: a list giving the prior information. The list includes the
          following parameters: 'a0' and 'b0' giving the hyperparameters
          for prior distribution of the precision parameter of the
          Dirichlet process prior, 'alpha' giving the value of the
          precision parameter (it  must be specified if 'a0' is
          missing, see details below), 'nu2' and 'psiinv2' giving the
          hyperparameters of the  inverted Wishart prior distribution
          for the scale matrix, 'Psi1',  of the inverted Wishart part
          of the baseline distribution, 'tau1' and 'tau2' giving the
          hyperparameters for the  gamma prior distribution of the
          scale parameter 'k0' of the normal part of the baseline
          distribution, 'm2' and 's2' giving the mean and the
          covariance of the normal prior for the mean, 'm1', of the
          normal  component of the baseline distribution, respectively,
          'nu1' and 'psiinv1' giving the hyperparameters of the
          inverted Wishart part of the baseline distribution ('psiinv1'
          must be specified if 'nu2' is missing, see details below),
          'm1' giving the mean of the normal part of the baseline
          distribution (it must be specified if 'm2' is missing, see
          details below), and 'k0' giving the scale parameter of the
          normal part of the baseline distribution (it must be
          specified if 'tau1' is missing, see details below).
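
          The pairing rules above (a fixed value is required whenever
          the corresponding hyperparameters are absent) can be checked
          before calling 'DPdensity'. The helper below is a
          hypothetical sketch and is not part of DPpackage; the name
          'check_prior' is an assumption, and it covers only the four
          pairs stated in this argument description.

```r
# Hypothetical helper (NOT part of DPpackage): for each block of the prior,
# either the hyperprior ("random") or the fixed value must be supplied.
check_prior <- function(prior) {
  rules <- list(
    c(random = "a0",   fixed = "alpha"),
    c(random = "nu2",  fixed = "psiinv1"),
    c(random = "m2",   fixed = "m1"),
    c(random = "tau1", fixed = "k0")
  )
  missing_fixed <- character(0)
  for (r in rules) {
    # flag the block when neither the hyperparameter nor the fixed value is given
    if (is.null(prior[[r[["random"]]]]) && is.null(prior[[r[["fixed"]]]])) {
      missing_fixed <- c(missing_fixed, r[["fixed"]])
    }
  }
  missing_fixed  # names of fixed values that still have to be supplied
}

# A prior with alpha, m1 and psi1 fixed and k0 random: nothing is missing
prior_ok <- list(alpha = 1, m1 = rep(0, 1), psiinv1 = diag(0.5, 1),
                 nu1 = 4, tau1 = 1, tau2 = 100)
check_prior(prior_ok)

# Only the hyperprior for alpha is given: three fixed values are reported
prior_bad <- list(a0 = 2, b0 = 1)
check_prior(prior_bad)
```

          An empty result means every block is covered; otherwise the
          returned names are the fixed values that would still be
          needed.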

    mcmc: a list giving the MCMC parameters. The list must include the
          following integers: 'nburn' giving the number of burn-in 
          scans, 'nskip' giving the thinning interval, 'nsave' giving
          the total number of scans to be saved, and 'ndisplay' giving
          the number of saved scans to be displayed on screen (the
          function reports on the screen each time 'ndisplay'
          iterations have been carried out).

   state: a list giving the current value of the parameters. This list
          is used if the current analysis is the continuation of a
          previous analysis.

  status: a logical variable indicating whether this run is new
          ('TRUE') or the  continuation of a previous analysis
          ('FALSE'). In the latter case the current value of the
          parameters must be specified in the  object 'state'.

  method: the method to be used. See 'Details'.

    data: data frame.

na.action: a function that indicates what should happen when the data
          contain 'NA's. The default action ('na.fail') causes 
          'DPdensity' to print an error message and terminate if there
          are any incomplete observations.
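
          The difference between the default and 'na.omit' can be seen
          on the 'airquality' data from base R without calling
          'DPdensity'. This is only a sketch of what the two standard
          'na.action' functions do to a data matrix; how 'DPdensity'
          applies them internally is not shown here.

```r
# 'airquality' ships with base R (package 'datasets') and contains NAs
y <- cbind(radiation = airquality$Solar.R,
           ozone     = airquality$Ozone^(1/3))

# na.fail signals an error as soon as any observation is incomplete
failed <- inherits(try(na.fail(y), silent = TRUE), "try-error")
failed

# na.omit drops every row that has at least one NA
y_complete <- na.omit(y)
dim(y_complete)
```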

_D_e_t_a_i_l_s:

     This generic function fits a Dirichlet process mixture of normals
     model for density estimation (Escobar and West, 1995):

             yi | mui, Sigmai ~ N(mui,Sigmai), i=1,...,n


                         (mui,Sigmai) | G ~ G


                     G | alpha, G0 ~ DP(alpha G0)


     where the baseline distribution is the conjugate
     normal-inverted-Wishart,

         G0 = N(mu| m1, (1/k0) Sigma) IW (Sigma | nu1, psi1)
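
     The hierarchy above can be read as a recipe for generating data.
     The following is a minimal sketch for the univariate case, using
     the Polya urn (Chinese restaurant) representation of the
     Dirichlet process with all baseline parameters held fixed. It is
     not DPpackage code; every numeric value and the name 'draw_G0'
     are illustrative assumptions.

```r
# Sketch: simulate n observations from a univariate DP mixture of normals
# with fixed alpha, m1, k0, nu1 and psi1. All values are illustrative.
set.seed(1)
n <- 200; alpha <- 1
m1 <- 0; k0 <- 0.1; nu1 <- 4; psiinv1 <- 0.5

# One draw (mu, sigma2) from G0. In one dimension, an inverted Wishart with
# mean psiinv1 / (nu1 - 2) corresponds to sigma2 = psiinv1 / chi-square(nu1).
draw_G0 <- function() {
  sigma2 <- psiinv1 / rchisq(1, df = nu1)
  mu <- rnorm(1, mean = m1, sd = sqrt(sigma2 / k0))
  c(mu = mu, sigma2 = sigma2)
}

theta <- matrix(NA_real_, nrow = 0, ncol = 2,
                dimnames = list(NULL, c("mu", "sigma2")))  # distinct values
counts <- integer(0)   # cluster sizes
ss <- integer(n)       # cluster label of each observation
for (i in seq_len(n)) {
  # join cluster j with prob. counts[j] / (alpha + i - 1),
  # open a new cluster with prob. alpha / (alpha + i - 1)
  j <- sample.int(length(counts) + 1, size = 1, prob = c(counts, alpha))
  if (j > length(counts)) {
    theta <- rbind(theta, draw_G0())
    counts <- c(counts, 1L)
  } else {
    counts[j] <- counts[j] + 1L
  }
  ss[i] <- j
}

# yi | mui, Sigmai ~ N(mui, Sigmai)
y <- rnorm(n, mean = theta[ss, "mu"], sd = sqrt(theta[ss, "sigma2"]))
length(counts)   # number of clusters generated
```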


     To complete the model specification, independent hyperpriors are
     assumed (optional),

                    alpha | a0, b0 ~ Gamma(a0,b0)


                        m1 | m2, s2 ~ N(m2,s2)


                k0 | tau1, tau2 ~ Gamma(tau1/2,tau2/2)


                   psi1 | nu2, psi2 ~ IW(nu2,psi2)


     Note that the inverted-Wishart prior is parametrized such that if
     A ~ IWq(nu, psi) then E(A)= psiinv/(nu-q-1), where psiinv denotes
     the inverse of psi.
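
     The stated mean can be checked by simulation. The sketch below
     assumes, consistently with that mean, that A ~ IWq(nu, psi) is
     equivalent to the inverse of A following a Wishart distribution
     with 'nu' degrees of freedom and scale matrix psi. The values of
     'q', 'nu' and 'psiinv' are illustrative.

```r
# Monte Carlo check of E(A) = psiinv / (nu - q - 1).
# Assumption: A ~ IW(nu, psi) here means solve(A) ~ Wishart(nu, psi).
set.seed(42)
q <- 2; nu <- 10
psiinv <- diag(c(2, 0.5))
psi <- solve(psiinv)

W <- rWishart(20000, df = nu, Sigma = psi)             # q x q x 20000 array
A_mean <- matrix(rowMeans(apply(W, 3, solve)), q, q)   # average of inverses
A_expected <- psiinv / (nu - q - 1)

round(A_mean, 3)
A_expected
```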

     To keep part of the baseline distribution fixed at a particular
     value, set the corresponding hyperparameters of the prior
     distributions to NULL in the hyperprior specification of the
     model.

     Although the baseline distribution, G0, is a conjugate prior in
     this model specification, the algorithms with auxiliary parameters
     described in MacEachern and Muller (1998) and Neal (2000) are
     adopted. Specifically, the 'DPdensity' function offers the no-gaps
     algorithm of MacEachern and Muller (1998), '"no-gaps"', and
     Algorithm 8 of Neal (2000) with m=1, '"neal"'. The default method
     is Neal's Algorithm 8.
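
     To make the role of the auxiliary parameter concrete, the sketch
     below computes the allocation probabilities used in Algorithm 8
     of Neal (2000) for a single observation in the univariate case.
     It is an illustration written for this page, not the code used by
     'DPdensity'; the function name 'alg8_probs' and all numeric
     values are assumptions.

```r
# Allocation step of Neal's (2000) Algorithm 8 for one observation y_i.
# n_minus    : sizes of the existing clusters with observation i removed
# mu, sigma2 : means and variances of those clusters
# aux        : c(mean, variance) of the single auxiliary component (m = 1);
#              in Neal (2000) it is a fresh draw from G0, or the previous
#              value of i's cluster when i was the only member
alg8_probs <- function(y_i, n_minus, mu, sigma2, aux, alpha, m = 1) {
  w_old <- n_minus * dnorm(y_i, mean = mu, sd = sqrt(sigma2))
  w_new <- (alpha / m) * dnorm(y_i, mean = aux[1], sd = sqrt(aux[2]))
  w <- c(w_old, w_new)
  w / sum(w)   # last entry: probability of moving to the auxiliary component
}

set.seed(7)
p <- alg8_probs(y_i = 9.5, n_minus = c(10, 5), mu = c(10, 20),
                sigma2 = c(1, 4), aux = c(15, 9), alpha = 1)
new_label <- sample.int(length(p), size = 1, prob = p)
```

     A full sampler would repeat this step for every observation and
     then update the parameters of each occupied cluster.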

_V_a_l_u_e:

     An object of class 'DPdensity' representing the DP mixture of
     normals model fit. Generic functions such as 'print', 'summary',
     and 'plot' have methods to  show the results of the fit. The
     results include the baseline parameters, 'alpha', and the  number
     of clusters.

     The function 'DPrandom' can be used to extract the posterior mean
     of the  subject-specific means and covariance matrices.

     The MCMC samples of the parameters and the errors in the model are
     stored in the objects 'thetasave' and 'randsave', respectively.
     Both objects are included in the list 'save.state' and are
     matrices which can be analyzed directly by functions provided by
     the coda package.

     The list 'state' in the output object contains the current value
     of the parameters necessary to restart the analysis. If you want
     to specify different starting values to run multiple chains, set
     'status=FALSE' and create the list 'state' based on these starting
     values. In this case the list 'state' must include the following
     objects:

ncluster: an integer giving the number of clusters.

  muclus: a matrix of dimension (nobservations+2)*(nvariables) giving
          the means of the clusters  (only the first 'ncluster' are
          considered to start the chain).

sigmaclus: a matrix of dimension (nobservations+2)*(
          (nvariables)*((nvariables)+1)/2) giving the lower triangular
          part of the covariance matrix of the clusters (only the
          first 'ncluster' are considered to start the chain).

      ss: an integer vector defining to which of the 'ncluster'
          clusters each observation belongs.

   alpha: giving the value of the precision parameter.

      m1: giving the mean of the normal components of the baseline
          distribution.

      k0: giving the scale parameter of the normal part of the baseline
          distribution.

    psi1: giving the scale matrix of the inverted-Wishart part of the
          baseline distribution.
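
          The dimensions listed above can be assembled as follows. This
          is a sketch for univariate data (nvariables = 1) with all
          observations starting in a single cluster; the data and the
          starting values are arbitrary assumptions. For more than one
          variable, the order in which the lower triangular elements
          are stored in each row of 'sigmaclus' is not stated here, so
          that case is not illustrated.

```r
# Sketch of a 'state' list for univariate data, one starting cluster.
y <- c(9.2, 9.5, 10.1, 19.8, 20.3, 22.9)   # illustrative data
n <- length(y)   # nobservations
p <- 1           # nvariables

muclus    <- matrix(0, nrow = n + 2, ncol = p)
sigmaclus <- matrix(0, nrow = n + 2, ncol = p * (p + 1) / 2)
muclus[1, ]    <- mean(y)   # only the first 'ncluster' rows are used
sigmaclus[1, ] <- var(y)

state <- list(ncluster  = 1L,
              muclus    = muclus,
              sigmaclus = sigmaclus,
              ss        = rep(1L, n),           # every observation in cluster 1
              alpha     = 1,
              m1        = mean(y),
              k0        = 0.1,
              psi1      = matrix(var(y), p, p))
str(state)
```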

_A_u_t_h_o_r(_s):

     Alejandro Jara <Alejandro.JaraVallejos@med.kuleuven.be>

_R_e_f_e_r_e_n_c_e_s:

     Escobar, M.D. and West, M. (1995) Bayesian Density Estimation and
     Inference  Using Mixtures. Journal of the American Statistical
     Association, 90: 577-588.

     MacEachern, S. N. and Muller, P. (1998) Estimating Mixture of
     Dirichlet Process Models. Journal of Computational and Graphical
     Statistics, 7 (2): 223-238.

     Neal, R. M. (2000). Markov Chain sampling methods for Dirichlet
     process mixture models. Journal of Computational and Graphical
     Statistics, 9: 249-265.

_S_e_e _A_l_s_o:

     'DPrandom', 'PTdensity', 'BDPdensity'

_E_x_a_m_p_l_e_s:

     ## Not run: 
         ####################################
         # Univariate example
         ####################################

         # Data
           data(galaxy)
           galaxy<-data.frame(galaxy,speeds=galaxy$speed/1000) 
           attach(galaxy)

         # Initial state
           state <- NULL

         # MCMC parameters

           nburn<-1000
           nsave<-10000
           nskip<-10
           ndisplay<-100
           mcmc <- list(nburn=nburn,nsave=nsave,nskip=nskip,ndisplay=ndisplay)

         # Example of Prior information 1
         # Fixing alpha, m1, and Psi1

           prior1<-list(alpha=1,m1=rep(0,1),psiinv1=diag(0.5,1),nu1=4,tau1=1,tau2=100)

         # Example of Prior information 2
         # Fixing alpha and m1

           prior2<-list(alpha=1,m1=rep(0,1),psiinv2=solve(diag(0.5,1)),nu1=4,nu2=4,
                        tau1=1,tau2=100)

         # Example of Prior information 3
         # Fixing only alpha

           prior3<-list(alpha=1,m2=rep(0,1),s2=diag(100000,1),
                        psiinv2=solve(diag(0.5,1)),
                        nu1=4,nu2=4,tau1=1,tau2=100)

         # Example of Prior information 4
         # Everything is random

           prior4<-list(a0=2,b0=1,m2=rep(0,1),s2=diag(100000,1),
                        psiinv2=solve(diag(0.5,1)),
                        nu1=4,nu2=4,tau1=1,tau2=100)

         # Fit the models

           fit1.1<-DPdensity(y=speeds,prior=prior1,mcmc=mcmc,state=state,status=TRUE)
           fit1.2<-DPdensity(y=speeds,prior=prior2,mcmc=mcmc,state=state,status=TRUE)
           fit1.3<-DPdensity(y=speeds,prior=prior3,mcmc=mcmc,state=state,status=TRUE)
           fit1.4<-DPdensity(y=speeds,prior=prior4,mcmc=mcmc,state=state,status=TRUE)

         # Posterior means
           fit1.1
           fit1.2
           fit1.3
           fit1.4

         # Plot the estimated density
           plot(fit1.1,ask=FALSE)
           plot(fit1.2,ask=FALSE)
           plot(fit1.3,ask=FALSE)
           plot(fit1.4,ask=FALSE)

         # Extracting the density estimate
           cbind(fit1.1$x1,fit1.1$dens)
           cbind(fit1.2$x1,fit1.2$dens)
           cbind(fit1.3$x1,fit1.3$dens)
           cbind(fit1.4$x1,fit1.4$dens)
           
         # Plot the parameters (only prior 2 for illustration)
         # (to see the plots gradually set ask=TRUE)
           plot(fit1.2,ask=FALSE,output="param")

         # Plot a specific parameter
         # (to see the plots gradually set ask=TRUE)
           plot(fit1.2,ask=FALSE,output="param",param="psi1-speeds",nfigr=1,nfigc=2)

         # Extracting the posterior mean of the specific means and covariance matrices 
         # (only prior 2 for illustration)
           DPrandom(fit1.2) 

         # Plotting predictive information about the specific means and covariance matrices 
         # with HPD and Credibility intervals
         # (only prior 2 for illustration)
         # (to see the plots gradually set ask=TRUE)
           plot(DPrandom(fit1.2,predictive=TRUE),ask=FALSE)
           plot(DPrandom(fit1.2,predictive=TRUE),ask=FALSE,hpd=FALSE)

         # Plotting information about all the specific means and covariance matrices 
         # with HPD and Credibility intervals
         # (only prior 2 for illustration)
         # (to see the plots gradually set ask=TRUE)
           plot(DPrandom(fit1.2),ask=FALSE,hpd=FALSE)



         ####################################
         # Bivariate example
         ####################################

         # Data
           data(airquality)
           attach(airquality)

           ozone<-Ozone^(1/3)
           radiation<-Solar.R

         # Prior information

           s2<-matrix(c(10000,0,0,1),ncol=2)
           m2<-c(180,3)
           psiinv2<-solve(matrix(c(10000,0,0,1),ncol=2))
          
           prior<-list(a0=1,b0=1/5,nu1=4,nu2=4,s2=s2,
                       m2=m2,psiinv2=psiinv2,tau1=0.01,tau2=0.01)

         # Initial state
           state <- NULL

         # MCMC parameters

           nburn<-5000
           nsave<-10000
           nskip<-10
           ndisplay<-1000
           mcmc <- list(nburn=nburn,nsave=nsave,nskip=nskip,ndisplay=ndisplay)

         # Fit the model
           fit1<-DPdensity(y=cbind(radiation,ozone),prior=prior,mcmc=mcmc,
                           state=state,status=TRUE,na.action=na.omit)

         # Plot the estimated density
           plot(fit1)

         # Extracting the density estimate
           fit1$x1
           fit1$x2
           fit1$dens
     ## End(Not run)

