boolean               package:boolean               R Documentation

_P_a_r_t_i_a_l-_O_b_s_e_r_v_a_b_i_l_i_t_y _B_i_n_a_r_y _R_e_s_p_o_n_s_e _M_o_d_e_l_s _f_o_r _T_e_s_t_i_n_g 
_B_o_o_l_e_a_n _H_y_p_o_t_h_e_s_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Boolean binary response models are a family of
     partial-observability _n_-variate models designed to permit
     researchers to model causal complexity, or multiple causal "paths"
     to a given outcome.

_U_s_a_g_e:

     boolean(formula, data, link = "logit", start.values = NULL,
             method = "nlm", bootstrap = 0, control = list(fnscale = -1), 
             weights = NULL, robust = FALSE, ...)

_A_r_g_u_m_e_n_t_s:

 formula: The object produced by 'boolprep'; see 'boolprep' for
          details.

    data: An optional dataframe that contains the variables specified
          in the call to 'boolprep'.

    link: A character string or character vector of length _n_ that
          identifies the CDF(s) to be used for each causal path. If
          only one link function is given, it is used for all _n_
          causal paths. The string may be "cauchit", "cloglog", "log",
          "logit" (the default), "probit", "scobit", "scobitL",
          "scobitR", or any combination thereof in a character vector
          of length _n_.  Aside from "scobit", "scobitL", and
          "scobitR", these are the same as used in the 'glm' function.
          For details, see 'binomial'. For documentation of the scobit
          link function, see the Nagler reference below; "scobitR" is
          an alias for "scobit", and "scobitL" is a transformation of
          the scobit CDF that is left-skewed.

start.values: If not NULL (the default), a vector of starting values.
          Such a vector must have length equal to the number of
          estimated parameters and should be ordered so that the
          starting values for the coefficients come before the starting
          values for the ancillary parameters of any "scobit",
          "scobitL", or "scobitR" link functions. Apart from that, the
          parameters should be ordered left-to-right based on their
          order in the call to 'boolprep'. The 'start.values' argument
          is useful for checking whether the results are robust to
          overdispersion of starting values. If NULL, the starting
          values are created from a series of calls to 'glm' using
          the appropriate link function, assuming the ancillary
          parameter is one in the case of the various scobits.

  method: A character string or character vector of length two that
          specifies the algorithm used to estimate the boolean model.
          If 'bootstrap > 0' and 'method' is of length two, then the
          first element of 'method' is used to obtain the
          maximum-likelihood estimates (presumably with a
          computationally intensive algorithm) and the second element
          is used for the bootstraps (possibly using a faster
          algorithm). Possible choices are: "nlm" (the default),
          "Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN" (the
          preceding five represent the possibilities for 'optim'),
          "constrOptim", "genoud", "BHHH", or "MH" (short for
          Metropolis-Hastings without simulated annealing).  For
          "nlm", see 'nlm'; for the optim-related methods, see 'optim';
          for "constrOptim", see 'constrOptim'; for "genoud", see
          'genoud'; for "BHHH", see 'maxBHHH'; for "MH"
          (Metropolis-Hastings), see 'MCMCmetrop1R'. Additional
          arguments for each of these methods can (and often should)
          be passed through the '...' as indicated in the respective
          documentation.  If 'method = "constrOptim"', the constrained
          optimization uses the BFGS algorithm. Note that in previous
          versions of this package, the 'method' argument had a very
          different meaning.

bootstrap: The number of bootstraps to execute, which must be a
          non-negative integer. If zero (the default), standard errors
          are approximated using the Hessian. Note that bootstrapping
          is appropriate only if the 'method' is something other than
          "MH".

 control: A list, identical in usage with that for 'optim'.

 weights: A vector of weights with length equal to the number of valid
          observations, which is used to produce a weighted sum of the
          observation-specific log-likelihoods. If NULL (the default),
          a weight of one is used for each observation. Currently, only
          NULL is supported. 

  robust: If TRUE, a sandwich-style variance-covariance matrix would be
          estimated. However, this option is not currently supported.

     ...: Additional arguments to be passed to the procedure identified
          by the 'method'  argument. Such arguments are often necessary
          to obtain reasonable results but are too numerous  to be
          listed here. However, it is not possible to pass a function
          that calculates the gradient.
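
     As a rough illustration of the scobit link named under 'link',
     the sketch below writes out the CDF given in the Nagler
     reference, Pr(y = 1) = 1 - (1 + exp(z))^(-alpha), and checks that
     it reduces to the logit when the ancillary parameter alpha is
     one. The function name 'scobit_cdf' is hypothetical and is not
     part of this package; the package's internal parameterization,
     including its "scobitL" transformation, may differ.

```r
# Hypothetical helper (not part of the 'boolean' package): scobit CDF
# following Nagler (1994), with ancillary (skewness) parameter alpha > 0.
scobit_cdf <- function(z, alpha = 1) {
  stopifnot(alpha > 0)
  1 - (1 + exp(z))^(-alpha)
}

z <- seq(-3, 3, by = 0.5)

# With alpha = 1 the scobit CDF coincides with the logistic CDF.
all.equal(scobit_cdf(z, alpha = 1), plogis(z))

# With alpha != 1 the curve is asymmetric: the probability at z = 0
# is no longer one half.
scobit_cdf(0, alpha = 0.5)
scobit_cdf(0, alpha = 2)
```

     This is also why starting values that assume an ancillary
     parameter of one amount to starting the scobit paths at a logit.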

_D_e_t_a_i_l_s:

     'boolean' permits estimation of Boolean binary response models (see
     Braumoeller 2003 for derivation), which are a family of
     partial-observability _n_-variate models designed to permit
     researchers to model causal complexity, or multiple causal "paths"
     to a given outcome.  The various "paths" are modeled as latent
     dependent variables that are multiplied together in a manner
     determined by the logic of their (Boolean) interaction.  If, for
     example, we wanted to model a situation in which diet OR smoking
     causes heart failure, we would use one set of independent
     variables (caloric intake, fat intake, etc.) to predict the latent
     probability of diet-related coronary failure (y1*), use another
     set of variables (cigarettes smoked per day, exposure to
     second-hand smoke, etc.) to predict the latent probability of
     smoking-related coronary failure (y2*), and model the observed
     outcome (y, or coronary failure) as a function of the Boolean
     interaction of the two: Pr(y=1) = 1-([1-y1*] x [1-y2*]). 
     Independent variables that have an impact on both latent dependent
     variables can be included in both paths.  Any combination of ANDs
     and ORs can be posited, and the interaction of any number of
     latent dependent variables can be modeled, although the procedure
     becomes exponentially more data-intensive as the number of latent
     dependent variables increases.
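
     The way latent path probabilities combine can be sketched in a
     few lines of base R. The helper names 'bool_or' and 'bool_and'
     and the probabilities are made up for illustration and are not
     part of the package; the OR rule is the formula given above, and
     the AND rule is assumed to be the plain product of the path
     probabilities.

```r
# Hypothetical helpers for combining latent path probabilities.
bool_or  <- function(p1, p2) 1 - (1 - p1) * (1 - p2)  # y1* OR  y2*
bool_and <- function(p1, p2) p1 * p2                  # y1* AND y2*

# Made-up latent probabilities for three observations.
p_diet    <- c(0.10, 0.50, 0.90)
p_smoking <- c(0.20, 0.50, 0.05)

# Diet OR smoking: Pr(y = 1) = 1 - (1 - y1*) x (1 - y2*)
bool_or(p_diet, p_smoking)        # 0.280 0.750 0.905

# Nested structure a OR (b AND c)
p_a <- 0.3
p_b <- 0.8
p_c <- 0.5
bool_or(p_a, bool_and(p_b, p_c))  # 0.58
```

     Under OR, the outcome probability is at least as large as the
     larger path probability; under AND, it is at most as large as
     the smaller one.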

_V_a_l_u_e:

     Returns an object of class booltest, with the following list
     elements: 

   value: The maximized value of the log-likelihood or NA, if
          'method="MH"'.

  counts: The number of calls to the function and the gradient, or the
          number of MCMC iterations if 'method="MH"'.

convergence: The convergence code for maximum-likelihood methods  or NA
          if 'method = "MH"', which does NOT indicate the presence,
          absence, or irrelevance of convergence of the Markov chain to
          the stationary distribution.

 message: The convergence message for maximum-likelihood methods (NA
          if 'method = "nlm"'), or NA if 'method = "MH"', which does
          NOT indicate the presence, absence, or irrelevance of
          convergence of the Markov chain to the stationary
          distribution.

 hessian: The Hessian matrix, unless 'method = "MH"', in which case, 
          NULL. If 'variance' is a "caught" error, examining the
          Hessian may  indicate where the problem is.

coefficients: Either a vector of maximum-likelihood estimates, or if
          'method = "MH"' or 'bootstrap > 0', a matrix of MCMC or
          bootstrapped samples with rows equal to the number of samples
          and columns equal to the number of estimated parameters.

variance: A "try" of the variance-covariance matrix, unless  'method =
          "MH"', in which case NA.

boolean.call: The call to the 'boolean' function.
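
     The relationship between the 'hessian' and 'variance' elements
     can be sketched as follows. The sketch assumes, which the text
     above does not spell out, that the variance-covariance matrix is
     obtained by inverting the negative Hessian inside a call to
     'try'. The matrices are made-up values and 'vcov_try' is a
     hypothetical helper, not a function of this package.

```r
# Made-up Hessians of a log-likelihood at its maximum (illustration only).
H_ok  <- matrix(c(-4, 1, 1, -3), nrow = 2)    # negative definite
H_bad <- matrix(c(-1, -1, -1, -1), nrow = 2)  # singular, e.g. on a plateau

# Hypothetical helper: invert the negative Hessian, catching any error.
vcov_try <- function(H) try(solve(-H), silent = TRUE)

V_ok  <- vcov_try(H_ok)
V_bad <- vcov_try(H_bad)

# A successful "try" gives a matrix; the square roots of its diagonal
# are the approximate standard errors.
sqrt(diag(V_ok))

# A failed "try" is an object of class "try-error" rather than a matrix.
inherits(V_bad, "try-error")
```

     Checking 'inherits(x, "try-error")' before using the result is a
     simple way to detect the "caught" error mentioned under
     'hessian'.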

_N_o_t_e:

     Examining profile likelihoods with 'boolprof' is highly
     recommended.  These partial-observability models are
     generically starved for information; as a result, maximum
     likelihood estimation can encounter problems with plateaus in
     likelihood functions even with very large datasets. In principle,
     if 'method = "BHHH"', the Hessian can be approximated in
     circumstances where other maximization methods might fail to
     estimate the Hessian due to plateaus or discontinuities at the
     maximum. However, such estimates should be analyzed with caution.
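
     To make the idea of a profile likelihood concrete, the sketch
     below builds one by hand in base R for a two-path OR model. It
     does not use 'boolprof' and is not meant to reproduce its
     output; the data, the parameter values, and the names 'loglik',
     'full', and 'prof' are all assumptions made for illustration.
     One coefficient is fixed on a grid and the log-likelihood is
     re-maximized over the remaining coefficients at each grid point.

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)

# Simulate y from an "a OR b" structure with logit links (made-up values).
p <- 1 - (1 - plogis(-1 + 1.0 * x1)) * (1 - plogis(-1 + 1.5 * x2))
y <- as.numeric(runif(n) < p)

# Log-likelihood of the two-path OR model; theta = (a0, a1, b0, b1).
loglik <- function(theta) {
  pa <- plogis(theta[1] + theta[2] * x1)
  pb <- plogis(theta[3] + theta[4] * x2)
  pr <- 1 - (1 - pa) * (1 - pb)
  pr <- pmin(pmax(pr, 1e-10), 1 - 1e-10)  # guard against log(0)
  sum(y * log(pr) + (1 - y) * log(1 - pr))
}

# Full maximum-likelihood fit; fnscale = -1 makes optim() maximize.
full <- optim(rep(0, 4), loglik, method = "BFGS",
              control = list(fnscale = -1))

# Profile over a1: fix it on a grid and re-maximize over the rest.
grid <- seq(full$par[2] - 1, full$par[2] + 1, length.out = 9)
prof <- sapply(grid, function(a1) {
  optim(full$par[-2],
        function(rest) loglik(c(rest[1], a1, rest[2], rest[3])),
        method = "BFGS", control = list(fnscale = -1))$value
})

# Tabulate the profile; plotting loglik against a1 shows its shape.
data.frame(a1 = grid, loglik = prof)
```

     A clearly peaked profile suggests that the coefficient is well
     identified, whereas a long flat stretch is the kind of plateau
     described in this note.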

_A_u_t_h_o_r(_s):

     Bear F. Braumoeller, Harvard University, bfbraum@fas.harvard.edu,
      Ben Goodrich, Harvard University, goodrich@fas.harvard.edu, and
      Jacob Kline, Harvard University, jkline@fas.harvard.edu

_R_e_f_e_r_e_n_c_e_s:

     Braumoeller, Bear F. (2003) "Causal Complexity and the Study of
     Politics." _Political Analysis_ 11(3): 209-233.

     Nagler, Jonathan. (1994) "Scobit: An Alternative Estimator to
     Logit and Probit." _American Journal of Political Science_ 38(1):
     230-255.

_S_e_e _A_l_s_o:

     'boolprep' to prepare the structure of the model, and 'boolprof'
     to produce profile likelihoods after estimation.

_E_x_a_m_p_l_e_s:

     set.seed(50)
     x1 <- rnorm(1000)
     x2 <- rnorm(1000)
     x3 <- rnorm(1000)
     x4 <- rnorm(1000)
     x5 <- rnorm(1000)
     x6 <- rnorm(1000)
     ## Pr(y = 1) for the structure a | (b & c): 1 - (1 - a) * (1 - b * c)
     y <- 1 - (1 - pnorm(-2 + 0.33*x1 + 0.66*x2 + 1*x3)) *
              (1 - pnorm(1 + 1.5*x4 - 0.25*x5) * pnorm(1 + 0.2*x6))
     y <- y > runif(1000)
     bp <- boolprep("(a|(b&c))", "y", a = ~ x1 + x2 + x3, b = ~ x4 + x5, c = ~ x6)
     ## Starting values are supplied for speed
     answer <- boolean(bp, link = c("probit", "logit", "cloglog"), start.values =
                       c(-1.750,  0.354,  0.698,  1.231,  1.473,  2.628, -0.452,  0.764,  0.173))

     ## Plot profiles
     boolprof(answer)

     ## Examine coefficients
     coef(answer)

     ## Examine coefficients, standard errors, etc.
     summary(answer)
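
     When 'start.values' is NULL, starting values are described as
     coming from a series of calls to 'glm'. One way this could look,
     offered as an assumption about the general idea rather than as
     the package's actual code, is to fit one binomial 'glm' per
     causal path against the observed outcome and concatenate the
     coefficients in path order. The sketch simulates its own data in
     the spirit of the example above so that it runs without the
     package; 'path_formulas' and 'path_links' are hypothetical names.

```r
set.seed(50)
n <- 1000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                x4 = rnorm(n), x5 = rnorm(n), x6 = rnorm(n))

# Latent path probabilities and outcome for the structure a | (b & c).
pa <- pnorm(-2 + 0.33 * d$x1 + 0.66 * d$x2 + 1 * d$x3)
pb <- pnorm(1 + 1.5 * d$x4 - 0.25 * d$x5)
pc <- pnorm(1 + 0.2 * d$x6)
d$y <- as.numeric(runif(n) < 1 - (1 - pa) * (1 - pb * pc))

# Hypothetical description of the three paths and their links.
path_formulas <- list(a = y ~ x1 + x2 + x3, b = y ~ x4 + x5, c = y ~ x6)
path_links    <- c(a = "probit", b = "logit", c = "cloglog")

# One binomial glm per path, each regressing the observed y on that
# path's covariates with that path's link.
start <- unlist(lapply(names(path_formulas), function(nm) {
  lk <- path_links[[nm]]
  coef(glm(path_formulas[[nm]], data = d, family = binomial(link = lk)))
}))

length(start)  # 4 + 3 + 2 = 9 values, one per coefficient
start
```

     The resulting vector has one entry per coefficient, ordered
     left-to-right by path, which matches the ordering that
     'start.values' expects for models without scobit links.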

