MCMCmixfactanal           package:MCMCpack           R Documentation

_M_a_r_k_o_v _C_h_a_i_n _M_o_n_t_e _C_a_r_l_o _f_o_r _M_i_x_e_d _D_a_t_a _F_a_c_t_o_r _A_n_a_l_y_s_i_s _M_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     This function generates a sample from the posterior distribution
     of a mixed data (both continuous and ordinal) factor analysis
     model. Normal priors are assumed on the factor loadings and factor
     scores, improper uniform priors are assumed on the cutpoints, and
     inverse gamma priors are assumed for the error variances
     (uniquenesses). The user supplies data and parameters for the
     prior distributions, and a sample from the posterior distribution
     is returned as an mcmc object, which can be subsequently analyzed
     with functions provided in the coda package.

_U_s_a_g_e:

     MCMCmixfactanal(x, factors, lambda.constraints=list(),
                     data=parent.frame(), burnin = 1000, mcmc = 20000,
                     thin=1, tune=NA, verbose = 0, seed = NA,
                     lambda.start = NA, psi.start=NA,
                     l0=0, L0=0, a0=0.001, b0=0.001,
                     store.lambda=TRUE, store.scores=FALSE,
                     std.mean=TRUE, std.var=TRUE, ... )
      

_A_r_g_u_m_e_n_t_s:

       x: A one-sided formula containing the manifest variables.
          Ordinal (including dichotomous) variables must be coded as
          ordered factors. Each level of these ordered factors must be
          present in the data passed to the function.  NOTE: data input
          is different in 'MCMCmixfactanal' than in either
          'MCMCfactanal' or 'MCMCordfactanal'.

 factors: The number of factors to be fitted.

lambda.constraints: List of lists specifying possible equality or
          simple inequality constraints on the factor loadings. A
          typical entry in the list has one of three forms:
          'varname=list(d,c)' which will constrain the dth loading for
          the variable named varname to be equal to c,
          'varname=list(d,"+")' which will constrain the dth loading
          for the variable named varname to be positive, and
          'varname=list(d, "-")' which will constrain the dth loading
          for the variable named varname to be negative. If x is a
          matrix without column names, default names of "V1", "V2",
          ..., etc. will be used. Note that, unlike 'MCMCfactanal', the
          Lambda matrix used here has 'factors'+1 columns. The first
          column of Lambda corresponds to negative item difficulty
          parameters for ordinal manifest variables and mean parameters
          for continuous manifest variables and should generally not be
          constrained directly by the user.   

    data: A data frame.

  burnin: The number of burn-in iterations for the sampler.

    mcmc: The number of iterations for the sampler.

    thin: The thinning interval used in the simulation.  The number of
          iterations must be divisible by this value.

    tune: The tuning parameter for the Metropolis-Hastings sampling.
          Can be either a scalar or a k-vector (where k is the number
          of manifest variables). 'tune' must be strictly positive.

 verbose: A switch which determines whether or not the progress of the
          sampler is printed to the screen.  If 'verbose' is greater
          than 0, the iteration number and the Metropolis-Hastings
          acceptance rate are printed to the screen every 'verbose'th
          iteration.

    seed: The seed for the random number generator.  If NA, the
          Mersenne Twister generator is used with default seed 12345;
          if an integer is passed it is used to seed the Mersenne
          Twister.  The user can also pass a list of length two to use
          the L'Ecuyer random number generator, which is suitable for
          parallel computation.  The first element of the list is the
          L'Ecuyer seed, which is a vector of length six or NA (if NA
          a default seed of 'rep(12345,6)' is used).  The second
          element of the list is a positive substream number. See the
          MCMCpack specification for more details.

lambda.start: Starting values for the factor loading matrix Lambda. If
          'lambda.start' is set to a scalar the starting value for all
          unconstrained loadings will be set to that scalar. If
          'lambda.start' is a matrix of the same dimensions as Lambda
          then the 'lambda.start' matrix is used as the starting values
          (except for equality-constrained elements). If 'lambda.start'
          is set to 'NA' (the default) then starting values for
          unconstrained elements in the first column of Lambda are
          based on the observed response pattern, the remaining
          unconstrained elements of Lambda are set to 0, and starting
          values for inequality constrained elements are set to either
          1.0 or -1.0 depending on the nature of the constraints.

psi.start: Starting values for the error variance (uniqueness) matrix.
          If 'psi.start' is set to a scalar then the starting values
          for all diagonal elements of 'Psi' that represent error
          variances for continuous variables are set to this value. If
          'psi.start' is a k-vector (where k is the number of manifest
          variables) then the starting value of 'Psi' has 'psi.start'
          on the main diagonal, with the exception that entries
          corresponding to error variances for ordinal variables are
          set to 1. If 'psi.start' is set to 'NA' (the default) the
          starting values of all the continuous variable uniquenesses
          are set to 0.5. Error variances for ordinal response
          variables are always constrained (regardless of the value of
          'psi.start') to be 1 in order to achieve identification.

      l0: The means of the independent Normal prior on the factor
          loadings. Can be either a scalar or a matrix with the same
          dimensions as 'Lambda'.

      L0: The precisions (inverse variances) of the independent Normal
          prior on the factor loadings. Can be either a scalar or a
          matrix with the same dimensions as 'Lambda'.

      a0: Controls the shape of the inverse Gamma prior on the
          uniquenesses. The actual shape parameter is set to 'a0/2'. Can
          be either a scalar or a k-vector.

      b0: Controls the scale of the inverse Gamma prior on the
          uniquenesses. The actual scale parameter is set to 'b0/2'.
          Can be either a scalar or a k-vector.

store.lambda: A switch that determines whether or not to store the
          factor loadings for posterior analysis. By default, the
          factor loadings are all stored.

store.scores: A switch that determines whether or not to store the
          factor scores for posterior analysis.  _NOTE: This takes an
          enormous amount of memory, so should only be used if the
          chain is thinned heavily, or for applications with a small
          number of observations_.  By default, the factor scores are
          not stored.

std.mean: If 'TRUE' (the default) the continuous manifest variables are
          rescaled to have zero mean.

 std.var: If 'TRUE' (the default) the continuous manifest variables are
          rescaled to have unit variance.

     ...: further arguments to be passed
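
     The 'lambda.constraints' format described above can be sketched
     in base R as follows. This is only an illustration, not code from
     the package: the variable names 'y1', 'y2', 'y3' and the
     'pattern' matrix are hypothetical. Because Lambda has 'factors'+1
     columns, the first substantive factor is addressed as column 2.

```r
# Hypothetical manifest variable names, used only for illustration.
vars <- c("y1", "y2", "y3")
factors <- 2

# One entry per constrained loading: list(column of Lambda, value or sign).
constraints <- list(
  y1 = list(2, "+"),  # loading of y1 on the first factor must be positive
  y1 = list(3, 0),    # loading of y1 on the second factor is fixed at 0
  y3 = list(3, "-")   # loading of y3 on the second factor must be negative
)

# Tabulate the constraints in a k x (factors + 1) character matrix so
# that they can be checked by eye before a long run is started.
pattern <- matrix("free", nrow = length(vars), ncol = factors + 1,
                  dimnames = list(vars,
                                  c("intercept", paste0("factor", 1:factors))))
for (i in seq_along(constraints)) {
  entry <- constraints[[i]]
  pattern[names(constraints)[i], entry[[1]]] <- as.character(entry[[2]])
}
print(pattern)
```

     The first column is left unconstrained here, in line with the
     advice given under 'lambda.constraints'.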

_D_e_t_a_i_l_s:

     The model takes the following form:

     Let i=1,...,N index observations and j=1,...,k index response
     variables within an observation. An observed variable x_ij can be
     either ordinal with a total of C_j categories or continuous. The
     distribution of X is governed by an N by k matrix of latent
     variables Xstar and a series of cutpoints gamma. Xstar is assumed
     to be generated according to:


                  xstar_i = Lambda phi_i + epsilon_i


                        epsilon_i ~ N(0, Psi)


     where xstar_i is the k-vector of latent variables specific to
     observation i, Lambda is the k by d matrix of factor loadings, and
     phi_i is the d-vector of latent factor scores. It is assumed that
     the first element of phi_i is equal to 1 for all i. 
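
     As a rough illustration of the latent-variable equation above,
     the following base R sketch simulates Xstar for made-up values of
     Lambda and Psi. All numbers and object names are assumptions
     chosen for the example; none of this is the package's own code.

```r
set.seed(1)
n <- 200   # number of observations
k <- 4     # number of manifest variables
d <- 2     # columns of Lambda: one constant column plus one factor

# Made-up loadings (k x d) and diagonal error variances (k x k).
Lambda <- cbind(c(0.5, -0.2, 0.0, 1.0),
                c(1.0,  0.8, -0.6, 0.4))
Psi <- diag(c(1, 1, 0.5, 0.3))

# Factor scores: the first element of every phi_i is fixed at 1,
# the remaining element is drawn from N(0, 1).
phi <- cbind(1, rnorm(n))

# Errors with rows distributed N(0, Psi); Psi is diagonal, so scaling
# independent standard normals by sqrt(Psi) is sufficient.
eps <- matrix(rnorm(n * k), n, k) %*% sqrt(Psi)

# xstar_i = Lambda phi_i + epsilon_i, stacked row-wise into an n x k matrix.
xstar <- phi %*% t(Lambda) + eps
```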

     If the jth variable is ordinal, the probability that it takes the
     value c in observation i is:


 pi_ijc = pnorm(gamma_jc - Lambda'_j phi_i) - pnorm(gamma_j(c-1) - Lambda'_j phi_i)


     If the jth variable is continuous, it is assumed that xstar_{ij} =
     x_{ij} for all i. 
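
     The category probabilities for an ordinal variable can be
     computed directly from the formula above. The sketch below
     assumes the usual boundary conventions gamma_j0 = -Inf and
     gamma_jC_j = Inf, which are not stated explicitly in this
     document; the function name 'ordinal_probs' and all numeric
     values are hypothetical.

```r
# Probabilities of each category for one ordinal variable j and one
# observation i, given the interior cutpoints gamma_j.
ordinal_probs <- function(lambda_j, phi_i, gamma_j) {
  eta <- sum(lambda_j * phi_i)        # Lambda'_j phi_i
  cuts <- c(-Inf, gamma_j, Inf)       # assumed outer cutpoints
  diff(pnorm(cuts - eta))             # pnorm(upper) - pnorm(lower)
}

lambda_j <- c(-0.3, 1.2)   # made-up row of Lambda
phi_i    <- c(1, 0.5)      # first element of phi_i is always 1
gamma_j  <- c(0, 1.1)      # three categories; first cutpoint fixed at 0

p <- ordinal_probs(lambda_j, phi_i, gamma_j)
print(p)
```

     The probabilities sum to one by construction, since the
     differences telescope between pnorm(-Inf) = 0 and pnorm(Inf) = 1.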

     The implementation used here assumes independent conjugate priors
     for each element of Lambda and each phi_i. More specifically we
     assume:


        Lambda_ij ~ N(l0_ij,  L0_ij^-1), i=1,...,k, j=1,...,d



                   phi_i(2:d) ~ N(0, I), i=1,...,N


     'MCMCmixfactanal' simulates from the posterior distribution using
     a Metropolis-Hastings within Gibbs sampling algorithm. The
     algorithm employed is based on work by Cowles (1996).  Note that
     the first element of phi_i is a 1. As a result, the first column
     of Lambda can be interpreted as negative item difficulty
     parameters for ordinal manifest variables (and as mean parameters
     for continuous manifest variables).  Further, the first element
     gamma_1 is normalized to zero, and thus not returned in the mcmc
     object. The simulation proper is done in compiled C++ code to
     maximize efficiency.
     Please consult the coda documentation for a comprehensive list of
     functions that can be used to analyze the posterior sample. 

     As is the case with all measurement models, make sure that you
     have plenty of free memory, especially when storing the scores.
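
     A back-of-the-envelope estimate of the memory needed to store the
     factor scores can be sketched as follows. The calculation assumes
     one 8-byte double per observation, per factor, per stored draw,
     and ignores any overhead; the value of 'n_obs' is hypothetical
     and the exact storage layout used by the package may differ.

```r
mcmc    <- 1000000   # iterations after burn-in
thin    <- 50        # thinning interval
n_obs   <- 100       # hypothetical number of observations
factors <- 1

draws <- mcmc / thin                       # number of stored draws
bytes <- draws * n_obs * factors * 8       # 8 bytes per stored double
cat("stored draws:", draws, "\n")
cat("approx. MB for scores:", round(bytes / 1024^2, 1), "\n")
```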

_V_a_l_u_e:

     An mcmc object that contains the posterior sample.  This object
     can be summarized by functions provided by the coda package.

_R_e_f_e_r_e_n_c_e_s:

     Kevin M. Quinn. 2004. "Bayesian Factor Analysis for Mixed Ordinal
     and Continuous Responses." _Political Analysis_. 12: 338-353.

     M. K. Cowles. 1996. "Accelerating Monte Carlo Markov Chain
     Convergence for Cumulative-link Generalized Linear Models."
     _Statistics and Computing_. 6: 101-110.

     Valen E. Johnson and James H. Albert. 1999. _Ordinal Data
     Modeling_. Springer: New York.

     Andrew D. Martin, Kevin M. Quinn, and Daniel Pemstein.  2004.  
     _Scythe Statistical Library 1.0._ <URL: http://scythe.wustl.edu>.

     Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2002.
     _Output Analysis and Diagnostics for MCMC (CODA)_. <URL:
     http://www-fis.iarc.fr/coda/>.

_S_e_e _A_l_s_o:

     'plot.mcmc', 'summary.mcmc', 'factanal', 'MCMCfactanal',
     'MCMCordfactanal', 'MCMCirt1d', 'MCMCirtKd'

_E_x_a_m_p_l_e_s:

     ## Not run: 
     data(PErisk)

     post <- MCMCmixfactanal(~courts+barb2+prsexp2+prscorr2+gdpw2,
                             factors=1, data=PErisk,
                             lambda.constraints = list(courts=list(2,"-")),
                             burnin=5000, mcmc=1000000, thin=50,
                             verbose=500, L0=.25, store.lambda=TRUE,
                             store.scores=TRUE, tune=1.2)
     plot(post)
     summary(post)



     library(MASS)
     data(Cars93)
     # select the variables of interest without attaching the data frame
     new.cars <- Cars93[, c("Price", "MPG.city", "MPG.highway",
                            "Cylinders", "EngineSize", "Horsepower",
                            "RPM", "Length", "Wheelbase", "Width",
                            "Weight", "Origin")]
     rownames(new.cars) <- paste(Cars93$Manufacturer, Cars93$Model)

     # drop obs 57 (Mazda RX 7) b/c it has a rotary engine
     new.cars <- new.cars[-57,]
     # drop 3 cylinder cars
     new.cars <- new.cars[new.cars$Cylinders!=3,]
     # drop 5 cylinder cars
     new.cars <- new.cars[new.cars$Cylinders!=5,]

     new.cars$log.Price <- log(new.cars$Price)
     new.cars$log.MPG.city <- log(new.cars$MPG.city)
     new.cars$log.MPG.highway <- log(new.cars$MPG.highway)
     new.cars$log.EngineSize <- log(new.cars$EngineSize)
     new.cars$log.Horsepower <- log(new.cars$Horsepower)

     new.cars$Cylinders <- ordered(new.cars$Cylinders)
     new.cars$Origin    <- ordered(new.cars$Origin)


     post <- MCMCmixfactanal(~log.Price+log.MPG.city+
                      log.MPG.highway+Cylinders+log.EngineSize+
                      log.Horsepower+RPM+Length+
                      Wheelbase+Width+Weight+Origin, data=new.cars,
                      lambda.constraints=list(log.Horsepower=list(2,"+"),
                      log.Horsepower=list(3,0), Weight=list(3,"+")),
                      factors=2,
                      burnin=5000, mcmc=500000, thin=100, verbose=500,
                      L0=.25, tune=3.0)
     plot(post)
     summary(post)

     ## End(Not run)

