bmonomvn               package:monomvn               R Documentation

_B_a_y_e_s_i_a_n _E_s_t_i_m_a_t_i_o_n _f_o_r _M_u_l_t_i_v_a_r_i_a_t_e _N_o_r_m_a_l _D_a_t_a _w_i_t_h
_M_o_n_o_t_o_n_e _M_i_s_s_i_n_g_n_e_s_s

_D_e_s_c_r_i_p_t_i_o_n:

     Bayesian estimation via sampling from the posterior distribution
     of the of the mean and covariance matrix of multivariate normal
     (MVN) distributed data with a monotone missingness pattern, via
     Gibbs Sampling. Through the use of parsimonious/shrinkage
     regressions (lasso/NG & ridge), where standard regressions fail,
     this function can handle an (almost) arbitrary amount of missing
     data

_U_s_a_g_e:

     bmonomvn(y, pre = TRUE, p = 0.9, B = 100, T = 200, thin = 1,
              economy = FALSE, method = c("lasso", "ridge", "lsr", "factor",
              "hs", "ng"), RJ = c("p", "bpsn", "none"), capm = TRUE,
              start = NULL, mprior = 0, rd = NULL, theta = 0, rao.s2 = TRUE,
              QP = NULL, verb = 1, trace = FALSE)

_A_r_g_u_m_e_n_t_s:

       y: data 'matrix'  were each row is interpreted as a random
          sample from a MVN distribution with missing values indicated
          by 'NA'

     pre: logical indicating whether pre-processing of the 'y' is to be
          performed.  This sorts the columns so that the number of
          'NA's is non-decreasing with the column index 

       p: when performing regressions, 'p' is the proportion of the
          number of columns to rows in the design matrix before an
          alternative regression (lasso, ridge, or RJ) is performed as
          if least-squares regression has failed. Least-squares
          regression is known to fail when the number of columns equals
          the number of rows, hence a default of 'p = 0.9 <= 1'.
          Alternatively, setting 'p = 0' forces a parsimonious method
          to be used for _every_ regression. Intermediate settings of
          'p' allow the user to control when least-squares regressions
          stop and the parsimonious ones start;  When 'method =
          "factor"' the 'p' argument represents an integer (positive)
          number of initial columns of 'y' to treat as known factors

       B: number of Burn-In MCMC sampling rounds, during which samples
          are discarded 

       T: total number of MCMC sampling rounds to take place after
          burn-in, during which samples are saved 

    thin: multiplicative thinning in the MCMC.  Each Bayesian (lasso)
          regression will discard 'thin*M' MCMC rounds, where 'M' is
          the number of columns in its design matrix, before a sample
          is saved as a draw from the posterior distribution; Likewise
          if 'theta != 0' a further 'thin*N', for 'N' responses will be
          discarded 

economy : indicates whether memory should be economized at the expense
          of speed.  When 'TRUE' the individual Bayesian (lasso)
          regressions are cleaned between uses so that only one of them
          has a large footprint at any time during sampling from the
          Markov chain.  When 'FALSE' (default) all regressions are
          pre-allocated and the full memory footprint is realized at
          the outset, saving dynamic allocations

  method: indicates the Bayesian parsimonious regression specification
          to be used, choosing between the lasso (default) of Park &
          Casella, the NG extension, a ridge regression special case,
          and least-squares. The '"factor"' method treats the first 'p'
          columns of 'y' as known factors

     RJ : indicates the Reversible Jump strategy to be employed. The
          default argument of '"p"' method uses RJ whenever a
          parsimonious regression is used;  '"bpsn"' only uses RJ for
          regressions with 'p >= n', and '"none"' never uses RJ

    capm: when 'TRUE' this argument indicates that the number of
          components of beta should not exceed n, the number of
          response variables in a particular regression

   start: a list depicting starting values for the parameters that are
          use to initialize the Markov chain.  Usually this will be a
          '"monomvn"'-class object depicting maximum likelihood
          estimates output from the 'monomvn' function. The relevant
          fields are the mean vector '$mu', covariance matrix '$S',
          monotone ordering '$o' (for sanity checking with input 'y'),
          component vector '$ncomp' and penalty parameter vector
          '$lambda'; see note below 

  mprior: prior on the number of non-zero regression coefficients (and
          therefore covariates) 'm' in the model. The default ('mprior
          = 0') encodes the uniform prior on '0 < m < M'. A scalar
          value '0 <= mprior <= 1' implies a Binomial prior
          'Bin(m|n=M,p=mprior)'.  A 2-vector 'mprior=c(g,h)' of
          positive values 'g' and 'h' represents gives 'Bin(m|n=M,p)'
          prior where 'p~Beta(g,h)'

      rd: '=c(r,delta)'; a 2-vector of prior parameters for lambda^2
          which depends on the regression method.  When 'method =
          "lasso"' then the components are the alpha (shape) and beta
          (rate) parameters to the a gamma distribution 'G(r,delta)';
          when 'method = "ridge"' the components are the alpha (shape)
          and beta (scale) parameters to an inverse-gamma distribution
          'IG(r/2,delta/2)' 

   theta: the rate parameter ('> 0') to the exponential prior on the
          degrees of freedom paramter 'nu' for each regression model
          implementing Student-t errors (for each column of 'Y'
          marginally) by a scale-mixture prior.  See 'blasso' for more
          details. The default setting of 'theta = 0' turns off this
          prior, defaulting to a normal errors prior.  A negative
          setting triggers a pooling of the degrees of freedom
          parameter across all columns of 'Y'.  I.e., 'Y' is modeled as
          multivariate-t.  In this case 'abs{theta}' is used as the
          prior parameterization 

 rao.s2 : indicates whether to  Rao-Blackwellized samples for s^2
          should be used (default 'TRUE'); see the details section of
          'blasso' for more information 

     QP : if non-'NULL' this argument should either be 'TRUE', a
          positive integer, or contain a 'list' specifying a Quadratic
          Program to solve as a function of the samples of 'mu = dvec'
          and 'Sigma = Dmat' in the notation of 'solve.QP'; see
          'default.QP' for a default specification that is used when
          'QP = TRUE' or a positive integer is is given; more details
          are below 

    verb: verbosity level; currently only 'verb = 0' and 'verb = 1' are
          supported 

   trace: if 'TRUE' then samples from all parameters are saved to files
          in the CWD, and then read back into the '"monomvn"'-class
          object upon return 

_D_e_t_a_i_l_s:

     If 'pre = TRUE' then 'bmonomvn' first re-arranges the columns of
     'y' into nondecreasing order with respect to the number of missing
     ('NA')  entries. Then (at least) the first column should be
     completely observed.

     Samples from the posterior distribution of the MVN mean vector and
     covariance matrix are obtained sampling from the posterior
     distribution of Bayesian regression models. The methodology for
     converting these to samples from the mean vector and covariance
     matrix is outlined in the 'monomvn' documentation, detailing a
     similarly structured maximum likelihood approach.  Also see the
     references below.

     Whenever the regression model is ill-posed (i.e., when there are
     more covariates than  responses, or a big 'p' small 'n' problem)
     then Bayesian lasso or ridge regressions - possibly augmented with
     Reversible Jump (RJ) for model selection - are used instead. See
     the Park & Casella reference below, and the 'blasso'
     documentation.  To guarantee each regression is well posed the
     combination setting of 'method="lsr"' and 'RJ="none"' is not
     allowed. As in 'monomvn' the 'p' argument can be used to turn on
     lasso or ridge regressions (possibly with RJ) at other times. The
     exception is the '"factor"' method which always involves an OLS
     regression on (a subset of) the first 'p' columns of 'y'.

     Samples from a function of samples of 'mu' and 'Sigma' can be
     obtained by specifying a Quadratic program via the argument 'QP'. 
     The idea is to allow for the calculation of the distribution of
     minimum variance and mean-variance portfolios, although the
     interface is quite general.  See 'default.QP' for more details, as
     'default.QP(ncol(y))' is used when the argument 'QP = TRUE' is
     given.  When a positive integer is given, then the first 'QP'
     columns of 'y' are treated as factors by using

     'default.QP(ncol(y) - QP)'

     instead. The result is that the corresponding components of
     (samples of) 'mu' and rows/cols of 'S' are not factored into the
     specification of the resulting Quadratic Program

_V_a_l_u_e:

     'bmonomvn' returns an object of class '"monomvn"', which is a
     'list' containing the inputs above and a subset of the components
     below.

   call : a copy of the function call as used

     mu : estimated mean vector with columns corresponding to the
          columns of 'y' 

      S : estimated covariance matrix with rows and columns
          corresponding to the columns of 'y' 

 mu.var : estimated variance of the mean vector with columns
          corresponding to the columns of 'y' 

 mu.cov : estimated covariance matrix of the mean vector with columns
          corresponding to the columns of 'y' 

  S.var : estimated variance of the individual components of the
          covariance matrix with columns and rows corresponding to the
          columns of 'y'

 mu.map : estimated maximum _a' posteriori_ (MAP) of the mean vector
          with columns corresponding to the columns of 'y' 

  S.map : estimated MAP of the individual components of the covariance
          matrix with columns and rows corresponding to the columns of
          'y'

   S.nz : posterior probability that the individual entries of the
          covariance matrix are non-zero 

  Si.nz : posterior probability that the individual entries of the
          inverse of the covariance matrix are non-zero 

     nu : when 'theta < 0' this field provides a trace of the pooled
          'nu' parameter to the multivariate-t distribution

lpost.map: log posterior probability of the MAP estimate

which.map: gives the time index of the sample corresponding to the MAP
          estimate

    llik: a trace of the log likelihood of the data 

llik.norm: a trace of the log likelihood  under the Normal errors model
          when sampling under the Student-t model; i.e., it is not
          present unless 'theta > 0'. Used for calculating Bayes
          Factors

     na : when 'pre = TRUE' this is a vector containing number of 'NA'
          entries in each column of 'y'

      o : when 'pre = TRUE' this is a vector containing the index of
          each column in the sorting of the columns of 'y' obtained by
          'o <- order(na)'

 method : method of regression used on each column, or '"bcomplete"'
          indicating that no regression was used

   thin : the (actual) number of thinning rounds used for the
          regression ('method') in each column

lambda2 : records the mean lambda^2 value found in the trace of the
          Bayesian Lasso regressions.  Zero-values result when the
          column corresponds to a complete case or an ordinary least
          squares regression (these would be 'NA' entries from
          'monomvn') 

  ncomp : records the mean number of components (columns of the design
          matrix) used in the regression model for each column of 'y'.
          If input 'RJ = FALSE' then this simply corresponds to the
          monotone ordering (these would correspond to the 'NA' entries
          from 'monomvn'). When 'RJ = TRUE' the monotone ordering is an
          upper bound (on each entry)

   trace: if input 'trace = TRUE' then this field contains traces of
          the samples of mu in the field '$mu' and of S in the field
          '$S', and of all regression parameters for each of the 'm =
          length(mu)' columns in the field '$reg'. This '$reg' field is
          a stripped-down '"blasso"'-class object so that the methods
          of that object may be used for analysis.  If data
          augmentation is required to complete the monotone missingness
          pattern, then samples from these entries of 'Y' are contained
          in '$DA' where the column names indicate the 'i'-'j' entry of
          'Y' sampled; see the 'R' output below 

       R: gives a 'matrix' version of the missingness pattern used:
          '0'-entries mean observed; '1'-entries indicate missing
          values conforming to a monotone pattern; '2'-entries indicate
          missing values that require data augmentation to complete a
          monotone missingness pattern 

       B: from inputs: number of Burn-In MCMC sampling rounds, during
          which samples are discarded 

       T: from inputs: total number of MCMC sampling rounds to take
          place after burn-in, during which samples are saved 

       r: from inputs: alpha (shape) parameter to the gamma
          distribution prior for the lasso parameter lambda 

   delta: from inputs: beta (rate) parameter to the gamma distribution
          prior for the lasso parameter lambda 

      QP: if a valid (non-'FALSE' or 'NULL') 'QP' argument is given,
          then this field contains the specification of a Quadratic
          Program in the form of a list with entries including '$dvec',
          '$Amat', '$b0', and '$meq', similar to the usage in
          'solve.QP', and some others; see 'default.QP' for more
          details

       W: when input 'QP = TRUE' is given, then this field contains a
          'T*ncol(y)' matrix of samples from the posterior distribution
          of the solution to the Quadratic Program, which can be
          visualized via 'plot.monomvn' using the argument 'which =
          "QP"' 

_N_o_t_e:

     Whenever the 'bmonomvn' algorithm requires a regression where 'p
     >= n', i.e., if any of the columns in the 'y' matrix have fewer
     non-'NA' elements than the number of columns with more non-'NA'
     elements, then it is helpful to employ both lasso/ridge and the RJ
     method.

     It is important that any starting values provided in the 'start'
     be compatible with the regression model specified by inputs 'RJ'
     and 'method'.  Any incompatibilities will result with a warning
     that (alternative) default action was taken and may result in an
     undesired (possibly inferior) model being fit

_A_u_t_h_o_r(_s):

     Robert B. Gramacy bobby@statslab.cam.ac.uk

_R_e_f_e_r_e_n_c_e_s:

     R.B. Gramacy and E. Pantaleo (2010). _Shrinkage regression for
     multivariate inference with missing data, and an application to
     portfolio balancing._  Preprint available on arXiv:0710.5837 <URL:
     http://arxiv.org/abs/0907.2135>

     Roderick J.A. Little and Donald B. Rubin (2002). _Statistical
     Analysis with Missing Data_, Second Edition. Wilely.

     <URL: http://www.statslab.cam.ac.uk/~bobby/monomvn.html>

_S_e_e _A_l_s_o:

     'blasso', 'monomvn', 'default.QP', 'em.norm' in the now defunct
     'norm' package, and 'mlest' in the 'mvnmle' package, 'returns'

_E_x_a_m_p_l_e_s:

     ## standard usage, duplicating the results in
     ## Little and Rubin, section 7.4.3
     data(cement.miss)
     out <- bmonomvn(cement.miss)
     out
     out$mu
     out$S

     ##
     ## A bigger example, comparing the various
     ## parsimonious methods
     ##

     ## generate N=100 samples from a 10-d random MVN
     xmuS <- randmvn(100, 20)

     ## randomly impose monotone missingness
     xmiss <- rmono(xmuS$x)

     ## using least squares only when necessary,
     obl <- bmonomvn(xmiss)
     obl

     ## look at the posterior variability
     par(mfrow=c(1,2))
     plot(obl)
     plot(obl, "S")

     ## compare to maximum likelihood
     Ellik.norm(obl$mu, obl$S, xmuS$mu, xmuS$S)
     oml <- monomvn(xmiss, method="lasso")
     Ellik.norm(oml$mu, oml$S, xmuS$mu, xmuS$S)

     ##
     ## a min-variance portfolio allocation example
     ##

     ## get the returns data, and use 20 random cols
     data(returns)
     train <- returns[,sample(1:ncol(returns), 20)]

     ## missingness pattern requires DA; also gather
     ## samples from the solution to a QP
     obl.da <- bmonomvn(train, p=0, QP=TRUE)

     ## plot the QP weights distribution
     plot(obl.da, "QP", xaxis="index")

     ## get ML solution: will warn about monotone violations
     suppressWarnings(oml.da <- monomvn(train, method="lasso"))

     ## add mean and MLE comparison, requires the
     ## quadprog library for the solve.QP function
     add.pe.QP(obl.da, oml.da)

     ## now consider adding in the market as a factor
     data(market)
     mtrain <- cbind(market, train)

     ## fit the model using only factor regressions
     obl.daf <- bmonomvn(mtrain, method="factor", p=1, QP=1)
     plot(obl.daf, "QP", xaxis="index", main="using only factors")
     suppressWarnings(oml.daf <- monomvn(mtrain, method="factor"))
     add.pe.QP(obl.daf, oml.daf)

     ##
     ## a Bayes/MLE comparison using least squares sparingly
     ##

     ## fit Bayesian and classical lasso
     obls <- bmonomvn(xmiss, p=0.25)
     Ellik.norm(obls$mu, obls$S, xmuS$mu, xmuS$S)
     omls <- monomvn(xmiss, p=0.25, method="lasso")
     Ellik.norm(omls$mu, omls$S, xmuS$mu, xmuS$S)

     ## compare to ridge regression
     obrs <- bmonomvn(xmiss, p=0.25, method="ridge")
     Ellik.norm(obrs$mu, obrs$S, xmuS$mu, xmuS$S)
     omrs <- monomvn(xmiss, p=0.25, method="ridge")
     Ellik.norm(omrs$mu, omrs$S, xmuS$mu, xmuS$S)

     ## using the maximum likelihood solution to initialize
     ## the Markov chain and avoid burn-in.  
     ob2s <- bmonomvn(xmiss, p=0.25, B=0, start=omls, RJ="p")
     Ellik.norm(ob2s$mu, ob2s$S, xmuS$mu, xmuS$S)

