ecoNP                  package:eco                  R Documentation

_F_i_t_t_i_n_g _t_h_e _N_o_n_p_a_r_a_m_e_t_r_i_c _B_a_y_e_s_i_a_n _M_o_d_e_l _o_f _E_c_o_l_o_g_i_c_a_l _I_n_f_e_r_e_n_c_e
_i_n _2_x_2 _T_a_b_l_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     'ecoNP' is used to fit the nonparametric Bayesian model (based on
     a Dirichlet process prior) for ecological inference in 2 times 2
     tables via Markov chain Monte Carlo. It gives the in-sample
     predictions as well as out-of-sample predictions for population
     inference.  The model and algorithm are described in Imai and Lu
     (2004). The contextual effect can also be modeled by following the
     strategy described in Imai and Lu (2005).
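
     As a rough guide to the role of the Dirichlet process prior, the
     sketch below computes the prior expected number of distinct
     clusters among n units for a fixed concentration parameter (the
     'alpha' argument), using the standard Dirichlet process result.
     The function name 'expected_clusters' is hypothetical and is not
     part of the eco package.

```r
## Hypothetical helper (not part of eco): prior expected number of
## distinct clusters among n units under a Dirichlet process with
## concentration parameter alpha, i.e. sum_{i=1}^{n} alpha / (alpha + i - 1).
expected_clusters <- function(alpha, n) {
  sum(alpha / (alpha + seq_len(n) - 1))
}

## larger alpha implies more clusters a priori
sapply(c(0.1, 1, 10), expected_clusters, n = 100)
```

     Comparing these values with the 'nstar' element of a fitted
     object may help judge how informative a fixed 'alpha' is.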

_U_s_a_g_e:

     ecoNP(formula, data = parent.frame(), N = NULL, supplement = NULL,
           context = FALSE, mu0 = 0, tau0 = 2, nu0 = 4, S0 = 10, 
           alpha = NULL, a0 = 1, b0 = 0.1, parameter = FALSE, 
           grid = FALSE, n.draws = 5000, burnin = 0, thin = 0, 
           verbose = FALSE)

_A_r_g_u_m_e_n_t_s:

 formula: A symbolic description of the model to be fit, specifying the
          column and row margins of 2 times 2 ecological tables. 'Y ~
          X' specifies 'Y' as the column margin and 'X' as the row
          margin. Details and specific examples are given below. 

    data: An optional data frame in which to interpret the variables in
          'formula'. The default is the environment in which 'ecoNP' is
          called.  

       N: An optional variable representing the size of the unit; e.g.,
          the total number of voters.

supplement: An optional matrix of supplemental data. The matrix has two
          columns, which contain additional individual-level data such
          as survey data for W_1 and W_2, respectively.  If 'NULL', no
          additional individual-level data are included in the model.
          The default is 'NULL'. 

 context: Logical. If 'TRUE', the contextual effect is also modeled.
          See Imai and Lu (2005) for details. The default is 'FALSE'.  

     mu0: A scalar or a numeric vector that specifies the prior mean
          for the mean parameter mu. If it is a scalar, its value is
          repeated to yield a vector of the length of mu; otherwise,
          it must be a vector of the same length as mu. When
          'context=TRUE', the length of mu is 3; otherwise it is 2.
          The default is '0'.

    tau0: A positive integer representing the prior scale for the mean
          parameter mu. The default is '2'. 

     nu0: A positive integer representing the prior degrees of freedom
          of the variance matrix Sigma. The default is '4'.

      S0: A positive scalar or a positive definite matrix that
          specifies the prior scale matrix for the variance matrix
          Sigma. If it is a scalar, the prior scale matrix is a
          diagonal matrix with the same dimensions as Sigma whose
          diagonal elements all take the value of 'S0'; otherwise,
          'S0' must have the same dimensions as Sigma. When
          'context=TRUE', Sigma is a 3 times 3 matrix; otherwise, it
          is 2 times 2. The default is '10'.

   alpha: A positive scalar representing a user-specified fixed value
          of the concentration parameter, alpha. If 'NULL', alpha will
          be updated at each Gibbs draw, and its prior parameters 'a0'
          and 'b0' need to be specified. The default is 'NULL'.  

      a0: A positive scalar representing the value of the shape
          parameter of the gamma prior distribution for alpha. The
          default is '1'.

      b0: A positive scalar representing the value of the scale
          parameter of the gamma prior distribution for alpha. The
          default is '0.1'.

parameter: Logical. If 'TRUE', the Gibbs draws of the population
          parameters, mu and Sigma, are returned in addition to the
          in-sample predictions of the missing internal cells, W. The
          default is 'FALSE'. This needs to be set to 'TRUE' if one
          wishes to make population inferences through 'predict.eco'.
          See an example below. 

    grid: Logical. If 'TRUE', the grid method is used to sample W in
          the Gibbs sampler. If 'FALSE', the Metropolis algorithm is
          used where candidate draws are sampled from the uniform
          distribution on the tomography line for each unit. Note that
          the grid method is significantly slower than the Metropolis
          algorithm. The default is 'FALSE'.

 n.draws: A positive integer. The number of MCMC draws. The default is
          '5000'. 

  burnin: A non-negative integer. The burnin interval for the Markov
          chain; i.e., the number of initial draws that should not be
          stored. The default is '0'.

    thin: A non-negative integer. The thinning interval for the Markov
          chain; i.e., the number of Gibbs draws that are skipped
          between the recorded values. The default is '0'.

 verbose: Logical. If 'TRUE', the progress of the Gibbs sampler is
          printed to the screen. The default is 'FALSE'.
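
     The way scalar values of 'mu0' and 'S0' are described above can
     be sketched in base R as follows. This is only an illustration
     of the documented behavior, not the package's internal code; the
     function name 'expand_priors' is hypothetical.

```r
## Hypothetical helper (not part of eco): expand scalar prior arguments
## as the descriptions of 'mu0' and 'S0' say they are interpreted.
expand_priors <- function(mu0 = 0, S0 = 10, context = FALSE) {
  p <- if (context) 3 else 2            # length of mu, dimension of Sigma
  if (length(mu0) == 1) mu0 <- rep(mu0, p)
  if (length(mu0) != p)
    stop("mu0 must be a scalar or a vector of length ", p)
  if (!is.matrix(S0)) {
    if (length(S0) != 1) stop("S0 must be a scalar or a matrix")
    S0 <- diag(S0, p)                   # p x p diagonal matrix
  }
  if (!all(dim(S0) == p))
    stop("S0 must be a ", p, " times ", p, " matrix")
  list(mu0 = mu0, S0 = S0)
}

## with contextual effects, mu has length 3 and Sigma is 3 times 3
pr <- expand_priors(mu0 = 0, S0 = 10, context = TRUE)
pr$mu0
pr$S0
```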

_D_e_t_a_i_l_s:

     An example of a 2 times 2 ecological table for racial voting is
     given below:

                  black voters  white voters    
       Voted         W_{1i}        W_{2i}      Y_i
       Not voted    1-W_{1i}      1-W_{2i}    1-Y_i
                      X_i          1-X_i        

     where Y_i and X_i represent the observed margins, and W_{1i} and
     W_{2i} are unknown variables. All variables are proportions and
     hence bounded between 0 and 1. For each i, the following
     deterministic relationship holds:
     Y_i = X_i W_{1i} + (1-X_i) W_{2i}.
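
     Because every quantity lies between 0 and 1, the deterministic
     relationship restricts each unknown cell to an interval. The
     base R sketch below derives these bounds for units with
     0 < X_i < 1; they are presumably of the same kind as the 'Wmin'
     and 'Wmax' elements of the returned object, but the helper
     'w_bounds' is hypothetical and not part of the eco package.

```r
## Hypothetical helper (not part of eco): bounds on W1 and W2 implied by
## Y = X * W1 + (1 - X) * W2 with all quantities in [0, 1].
w_bounds <- function(X, Y) {
  stopifnot(all(X > 0 & X < 1), all(Y >= 0 & Y <= 1))
  cbind(W1min = pmax(0, (X + Y - 1) / X),
        W1max = pmin(1, Y / X),
        W2min = pmax(0, (Y - X) / (1 - X)),
        W2max = pmin(1, Y / (1 - X)))
}

X <- c(0.2, 0.5, 0.8)
Y <- c(0.3, 0.5, 0.9)
b <- w_bounds(X, Y)
b

## any W1 inside its bounds determines a W2 inside its bounds
W1 <- (b[, "W1min"] + b[, "W1max"]) / 2
W2 <- (Y - X * W1) / (1 - X)
```

     The pairs (W1, W2) that satisfy the relationship for a given
     unit form the line segment on which the Metropolis candidates
     described under 'grid' are drawn.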

_V_a_l_u_e:

     An object of class 'ecoNP' containing the following elements: 

    call: The matched call.

       X: The row margin, X.

       Y: The column margin, Y.

  burnin: The number of initial burnin draws.

    thin: The thinning interval.

     nu0: The prior degrees of freedom.

    tau0: The prior scale parameter for mu.

     mu0: The prior mean.

      S0: The prior scale matrix.

      a0: The prior shape parameter.

      b0: The prior scale parameter of the gamma prior for alpha.

       W: A three dimensional array storing the posterior in-sample
          predictions of W. The first dimension indexes the Monte Carlo
          draws, the second dimension indexes the columns of the table,
          and the third dimension represents the observations.

    Wmin: A numeric matrix storing the lower bounds of W.

    Wmax: A numeric matrix storing the upper bounds of W.

      mu: A three dimensional array storing the posterior draws of the
          population mean parameter, mu. The first dimension indexes
          the Monte Carlo draws, the second dimension indexes the
          columns of the table, and the third dimension represents the
          observations.

   Sigma: A three dimensional array storing the posterior draws of the
          population variance matrix, Sigma. The first dimension
          indexes the Monte Carlo draws, the second dimension indexes
          the parameters, and the third dimension represents the
          observations. 

   alpha: The posterior draws of alpha.

   nstar: The number of clusters at each Gibbs draw.
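
     The layout of the 'W' array can be illustrated with simulated
     values standing in for a fitted object. The count of stored
     draws below assumes that one draw is kept out of every
     'thin + 1' after the burnin period, which follows from the
     argument descriptions but is not taken from the package source.

```r
set.seed(1)
n.draws <- 50; burnin <- 10; thin <- 1; n.units <- 4

## Assumption: one draw is kept out of every (thin + 1) after burnin
n.store <- floor((n.draws - burnin) / (thin + 1))

## Simulated stand-in for the W element:
## draws x columns of the table x observations
W <- array(runif(n.store * 2 * n.units), dim = c(n.store, 2, n.units))

## posterior means of W1 and W2 for each unit (2 x n.units matrix)
W.mean <- apply(W, c(2, 3), mean)
W.mean

## 95% posterior interval of W1 for the first unit
quantile(W[, 1, 1], c(0.025, 0.975))
```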

_A_u_t_h_o_r(_s):

     Kosuke Imai, Department of Politics, Princeton University
     kimai@Princeton.Edu, <URL: http://www.princeton.edu/~kimai>; Ying
     Lu, Institute for Quantitative Social Sciences,  Harvard
     University ylu@Latte.Harvard.Edu

_R_e_f_e_r_e_n_c_e_s:

     Imai, Kosuke and Ying Lu. (2004) "Parametric and Nonparametric
     Bayesian Models for Ecological Inference in 2 times 2 Tables."
     Proceedings of the American Statistical Association. <URL:
     http://www.princeton.edu/~kimai/research/einonpar.html>

     Imai, Kosuke and Ying Lu. (2005) "An Incomplete Data Approach to
     Ecological Inference." Working Paper, Princeton University,
     available at <URL:
     http://www.princeton.edu/~kimai/research/einonpar.html>

_S_e_e _A_l_s_o:

     'eco', 'predict.eco', 'summary.ecoNP'

_E_x_a_m_p_l_e_s:

     ## load the registration data
     data(reg)

     ## NOTE: We set the number of MCMC draws to be a very small number in
     ## the following examples; i.e., convergence has not been properly
     ## assessed. See Imai and Lu (2004, 2005) for more complete examples.

     ## fit the nonparametric model to give in-sample predictions
     ## store the parameters to make population inference later
     res <- ecoNP(Y ~ X, data = reg, n.draws = 50, parameter = TRUE,
                  verbose = TRUE)
     ## summarize the results
     summary(res)

     ## obtain out-of-sample prediction
     out <- predict(res, verbose = TRUE)
     ## summarize the results
     summary(out)

     ## density plots of the out-of-sample predictions
     par(mfrow=c(2,1))
     plot(density(out[,1]), main = "W1")
     plot(density(out[,2]), main = "W2")

     ## load Robinson's census data
     data(census)

     ## fit the nonparametric model with contextual effects and N
     ## using the default prior specification
     res1 <- ecoNP(Y ~ X, N = N, context = TRUE, parameter = TRUE,
                   data = census, n.draws = 25, verbose = TRUE)
     ## summarize the results
     summary(res1)

     ## out-of-sample prediction
     pres1 <- predict(res1)
     summary(pres1)

