gbayes                 package:Hmisc                 R Documentation

_G_a_u_s_s_i_a_n _B_a_y_e_s_i_a_n _P_o_s_t_e_r_i_o_r _a_n_d _P_r_e_d_i_c_t_i_v_e _D_i_s_t_r_i_b_u_t_i_o_n_s

_D_e_s_c_r_i_p_t_i_o_n:

     'gbayes' derives the (Gaussian) posterior and optionally the
     predictive distribution when both the prior and the likelihood are
     Gaussian, and when the statistic of interest comes from a 2-sample
     problem. This function is especially useful in obtaining the
     expected power of a statistical test, averaging over the
     distribution of the population effect parameter (e.g., log hazard
     ratio) that is obtained using pilot data.  'gbayes' is also useful
     for summarizing studies for which the statistic of interest is
     approximately Gaussian with known variance.  An example is given
     for comparing two proportions using the angular transformation,
     for which the variance is independent of unknown parameters except
     for very extreme probabilities. A 'plot' method is also given. 
     This plots the prior, posterior, and predictive distributions on a
     single graph using a nice default for the x-axis limits and using
     the 'labcurve' function for automatic labeling of the curves.

     'gbayes2' uses the method of Spiegelhalter and Freedman (1986) to
     compute the probability of correctly concluding that a new
     treatment is superior to a control.  By this we mean that a
     1-'alpha' normal theory-based confidence interval for the new
     minus old treatment effect lies wholly to the right of 'delta.w',
     where 'delta.w' is the minimally worthwhile treatment effect
     (which can be zero to be consistent with ordinary null hypothesis
     testing, a method not always making sense).  This kind of power
     function is averaged over a prior distribution for the unknown
     treatment effect.  This procedure is applicable to the situation
     where a prior distribution is not to be used in constructing the
     test statistic or confidence interval, but is only used for
     specifying the distribution of 'delta', the parameter of interest.

     Even though 'gbayes2' assumes that the test statistic has a normal
     distribution with known variance (which is strongly a function of
     the sample size in the two treatment groups), the prior
     distribution function can be completely general.  Instead of using
     a step-function for the prior distribution as Spiegelhalter and
     Freedman used in their appendix, 'gbayes2' uses the built-in
     'integrate' function for numerical integration. 'gbayes2' also
     allows the variance of the test statistic to be general as long as
     it is evaluated by the user.  The conditional power given the
     parameter of interest 'delta' is '1 - pnorm((delta.w - delta)/sd +
     z)', where z is the normal critical value corresponding to 1 -
     'alpha'/2.

     'gbayesMixPredNoData' derives the predictive distribution of a
     statistic that is Gaussian given 'delta' when no data have yet
     been observed and when the prior is a mixture of two Gaussians.

     'gbayesMixPost' derives the posterior density or cdf of 'delta'
     given the statistic 'x', when the prior for 'delta' is a mixture
     of two Gaussians and when 'x' is Gaussian given 'delta'.

     'gbayesMixPowerNP' computes the power for a test for 'delta' >
     'delta.w' for the case where (1) a Gaussian prior or mixture of
     two Gaussian priors is used as the prior distribution, (2) this
     prior is used in forming the statistical test or credible
     interval, (3) no prior is used for the distribution of 'delta' for
     computing power but instead a fixed single 'delta' is given (as in
     traditional frequentist hypothesis tests), and (4) the test
     statistic has a Gaussian likelihood with known variance (and mean
     equal to the specified 'delta'). 'gbayesMixPowerNP' is handy where
     you want to use an earlier study in testing for treatment effects
     in a new study, but you want to mix with this prior a
     non-informative prior.  The mixing probability 'mix' can be
     thought of as the "applicability" of the previous study.  As with
     'gbayes2', power here means the probability that the new study
     will yield a left credible interval that is to the right of
     'delta.w'. 'gbayes1PowerNP' is a special case of
     'gbayesMixPowerNP' when the prior is a single Gaussian.

_U_s_a_g_e:

     gbayes(mean.prior, var.prior, m1, m2, stat, var.stat, 
            n1, n2, cut.prior, cut.prob.prior=0.025)

     ## S3 method for class 'gbayes':
     plot(x, xlim, ylim, name.stat='z', ...)

     gbayes2(sd, prior, delta.w=0, alpha=0.05, upper=Inf, prior.aux)

     gbayesMixPredNoData(mix=NA, d0=NA, v0=NA, d1=NA, v1=NA,
                         what=c('density','cdf'))

     gbayesMixPost(x=NA, v=NA, mix=1, d0=NA, v0=NA, d1=NA, v1=NA,
                   what=c('density','cdf'))

     gbayesMixPowerNP(pcdf, delta, v, delta.w=0, mix, interval,
                      nsim=0, alpha=0.05)

     gbayes1PowerNP(d0, v0, delta, v, delta.w=0, alpha=0.05)

_A_r_g_u_m_e_n_t_s:

mean.prior: mean of the prior distribution 

cut.prior: 

cut.prob.prior: 

var.prior: variance of the prior.  Use a large number such as 10000 to
          effectively use a flat (noninformative) prior.  Sometimes it
          is useful to compute the variance so that the prior
          probability that 'stat' is greater than some impressive value
          'u' is only 'alpha'.  The correct 'var.prior' to use is then
          '((u-mean.prior)/qnorm(1-alpha))^2'. You can specify
          'cut.prior=u' and 'cut.prob.prior=alpha' (whose default is
          0.025) in place of 'var.prior' to have 'gbayes' compute the
          prior variance in this manner.  

      m1: sample size in group 1 

      m2: sample size in group 2 

    stat: statistic comparing groups 1 and 2, e.g., log hazard ratio,
          difference in means, difference in angular transformations of
          proportions 

var.stat: variance of 'stat', assumed to be known.  'var.stat' should
          either be a constant (allowed if 'n1' is not specified), or a
          function of two arguments which specify the sample sizes in
          groups 1 and 2.  Calculations will be approximate when the
          variance is estimated from the data. 

       x: an object returned by 'gbayes' or the value of the statistic
          which is an estimator of delta, the parameter of interest 

      sd: the standard deviation of the treatment effect 

   prior: a function of possibly a vector of unknown treatment effects,
          returning the prior density at those values 

    pcdf: a function computing the posterior CDF of the treatment
          effect 'delta', such as a function created by 'gbayesMixPost'
          with 'what="cdf"'. 

   delta: a true unknown single treatment effect to detect 

       v: the variance of the statistic 'x', e.g., 's^2 * (1/n1 +
          1/n2)'. Neither 'x' nor 'v' need to be defined to
          'gbayesMixPost', as they can be defined at run time to the
          function created by 'gbayesMixPost'. 

      n1: number of future observations in group 1, for obtaining a
          predictive distribution 

      n2: number of future observations in group 2 

    xlim: vector of 2 x-axis limits.  Default is the mean of the
          posterior plus or minus 6 standard deviations of the
          posterior. 

    ylim: vector of 2 y-axis limits.  Default is the range over
          combined prior and  posterior densities. 

name.stat: label for x-axis.  Default is '"z"'. 

     ...: optional arguments passed to 'labcurve' from 'plot.gbayes' 

 delta.w: the minimum worthwhile treatment difference to detech.  The
          default is zero for a plain uninteristing null hypothesis. 

   alpha: type I error, or more accurately one minus the confidence
          level for a two-sided confidence limit for the treatment
          effect 

   upper: upper limit of integration over the prior distribution
          multiplied by the normal likelihood for the treatment effect
          statistic.  Default is infinity. 

prior.aux: argument to pass to 'prior' from 'integrate' through
          'gbayes2'. Inside of 'power' the argument must be named
          'prior.aux' if it exists.  You can pass multiple parameters
          by passing 'prior.aux' as a list and pulling off elements of
          the list inside 'prior'.  This setup was used because of
          difficulties in passing '...' arguments through 'integrate'
          for some situations. 

     mix: mixing probability or weight for the Gaussian prior having
          mean 'd0' and variance 'v0'.  'mix' must be between 0 and 1,
          inclusive. 

      d0: mean of the first Gaussian distribution (only Gaussian for
          'gbayes1PowerNP' and is a required argument) 

      v0: variance of the first Gaussian (only Gaussian for
          'gbayes1PowerNP' and is a required argument) 

      d1: mean of the second Gaussian (if 'mix' < 1) 

      v1: variance of the second Gaussian (if 'mix' < 1).  Any of these
          last 5 arguments can be omitted to 'gbayesMixPredNoData' as
          they can be provided at run time to the function created by
          'gbayesMixPredNoData'. 

    what: specifies whether the predictive density or the CDF is to be
          computed.  Default is '"density"'. 

interval: a 2-vector containing the lower and upper limit for possible
          values of the test statistic 'x' that would result in a left
          credible interval exceeding 'delta.w' with probability
          1-'alpha'/2 

    nsim: defaults to zero, causing 'gbayesMixPowerNP' to solve
          numerically for the critical value of 'x', then to compute
          the power accordingly.  Specify a nonzero number such as
          20000 for 'nsim' to instead have the function estimate power
          by simulation.  In this case 0.95 confidence limits on the
          estimated power are also computed.  This approach is
          sometimes necessary if 'uniroot' can't solve the equation for
          the critical value. 

_V_a_l_u_e:

     'gbayes' returns a list of class '"gbayes"' containing the
     following names elements: 'mean.prior','var.prior','mean.post',
     'var.post', and if 'n1' is specified, 'mean.pred' and 'var.pred'. 
     Note that 'mean.pred' is  identical to 'mean.post'.  'gbayes2'
     returns a single number which is the probability of correctly
     rejecting the null hypothesis in favor of the new treatment. 
     'gbayesMixPredNoData' returns a function that can be used to
     evaluate the predictive density or cumulative distribution. 
     'gbayesMixPost' returns a function that can be used to evaluate
     the posterior density or cdf.  'gbayesMixPowerNP' returns a vector
     containing two values if 'nsim' = 0.  The first value is the
     critical value for the test statistic that will make the left
     credible interval > 'delta.w', and the second value is the power. 
     If 'nsim' > 0, it returns the power estimate and confidence limits
     for it if 'nsim' > 0.  The examples show how to use these
     functions.

_A_u_t_h_o_r(_s):

     Frank Harrell 
      Department of Biostatistics 
      Vanderbilt University School of Medicine 
      f.harrell@vanderbilt.edu

_R_e_f_e_r_e_n_c_e_s:

     Spiegelhalter DJ, Freedman LS, Parmar MKB (1994): Bayesian
     approaches to randomized trials.  JRSS A 157:357-416.  Results for
     'gbayes' are derived from Equations 1, 2, 3, and 6.

     Spiegelhalter DJ, Freedman LS (1986): A predictive approach to
     selecting the size of a clinical trial, based on subjective
     clinical opinion.  Stat in Med 5:1-13.

     Joseph, Lawrence and Belisle, Patrick (1997): Bayesian sample size
     determination for normal means and differences between normal
     means. The Statistician 46:209-226.

_E_x_a_m_p_l_e_s:

     # Compare 2 proportions using the var stabilizing transformation
     # arcsin(sqrt((x+3/8)/(n+3/4))) (Anscombe), which has variance 
     # 1/[4(n+.5)]

     m1 <- 100;     m2 <- 150
     deaths1 <- 10; deaths2 <- 30

     f <- function(events,n) asin(sqrt((events+3/8)/(n+3/4)))
     stat <- f(deaths1,m1) - f(deaths2,m2)
     var.stat <- function(m1, m2) 1/4/(m1+.5) + 1/4/(m2+.5)
     cat("Test statistic:",format(stat),"  s.d.:",
         format(sqrt(var.stat(m1,m2))), "\n")
     #Use unbiased prior with variance 1000 (almost flat)
     b <- gbayes(0, 1000, m1, m2, stat, var.stat, 2*m1, 2*m2)
     print(b)
     plot(b)
     #To get posterior Prob[parameter > w] use 
     # 1-pnorm(w, b$mean.post, sqrt(b$var.post))

     #If g(effect, n1, n2) is the power function to
     #detect an effect of 'effect' with samples size for groups 1 and 2
     #of n1,n2, estimate the expected power by getting 1000 random
     #draws from the posterior distribution, computing power for
     #each value of the population effect, and averaging the 1000 powers
     #This code assumes that g will accept vector-valued 'effect'
     #For the 2-sample proportion problem just addressed, 'effect'
     #could be taken approximately as the change in the arcsin of
     #the square root of the probability of the event

     g <- function(effect, n1, n2, alpha=.05) {
       sd <- sqrt(var.stat(n1,n2))
       z <- qnorm(1 - alpha/2)
       effect <- abs(effect)
       1 - pnorm(z - effect/sd) + pnorm(-z - effect/sd)
     }

     effects <- rnorm(1000, b$mean.post, sqrt(b$var.post))
     powers <- g(effects, 500, 500)
     hist(powers, nclass=35, xlab='Power')
     describe(powers)



     # gbayes2 examples
     # First consider a study with a binary response where the
     # sample size is n1=500 in the new treatment arm and n2=300
     # in the control arm.  The parameter of interest is the 
     # treated:control log odds ratio, which has variance
     # 1/[n1 p1 (1-p1)] + 1/[n2 p2 (1-p2)].  This is not
     # really constant so we average the variance over plausible
     # values of the probabilities of response p1 and p2.  We
     # think that these are between .4 and .6 and we take a 
     # further short cut

     v <- function(n1, n2, p1, p2) 1/(n1*p1*(1-p1)) + 1/(n2*p2*(1-p2))
     n1 <- 500; n2 <- 300
     ps <- seq(.4, .6, length=100)
     vguess <- quantile(v(n1, n2, ps, ps), .75)
     vguess
     #        75% 
     # 0.02183459

     # The minimally interesting treatment effect is an odds ratio
     # of 1.1.  The prior distribution on the log odds ratio is
     # a 50:50 mixture of a vague Gaussian (mean 0, sd 100) and
     # an informative prior from a previous study (mean 1, sd 1)

     prior <- function(delta) 
       0.5*dnorm(delta, 0, 100)+0.5*dnorm(delta, 1, 1)
     deltas <- seq(-5, 5, length=150)
     plot(deltas, prior(deltas), type='l')

     # Now compute the power, averaged over this prior
     gbayes2(sqrt(vguess), prior, log(1.1))
     # [1] 0.6133338

     # See how much power is lost by ignoring the previous
     # study completely

     gbayes2(sqrt(vguess), function(delta)dnorm(delta, 0, 100), log(1.1))
     # [1] 0.4984588

     # What happens to the power if we really don't believe the treatment
     # is very effective?  Let's use a prior distribution for the log
     # odds ratio that is uniform between log(1.2) and log(1.3).
     # Also check the power against a true null hypothesis

     prior2 <- function(delta) dunif(delta, log(1.2), log(1.3))
     gbayes2(sqrt(vguess), prior2, log(1.1))
     # [1] 0.1385113

     gbayes2(sqrt(vguess), prior2, 0)
     # [1] 0.3264065

     # Compare this with the power of a two-sample binomial test to
     # detect an odds ratio of 1.25
     bpower(.5, odds.ratio=1.25, n1=500, n2=300)
     #     Power 
     # 0.3307486

     # For the original prior, consider a new study with equal
     # sample sizes n in the two arms.  Solve for n to get a
     # power of 0.9.  For the variance of the log odds ratio
     # assume a common p in the center of a range of suspected
     # probabilities of response, 0.3.  For this example we
     # use a zero null value and the uniform prior above

     v   <- function(n) 2/(n*.3*.7)
     pow <- function(n) gbayes2(sqrt(v(n)), prior2)
     uniroot(function(n) pow(n)-0.9, c(50,10000))$root
     # [1] 2119.675
     # Check this value
     pow(2119.675)
     # [1] 0.9

     # Get the posterior density when there is a mixture of two priors,
     # with mixing probability 0.5.  The first prior is almost
     # non-informative (normal with mean 0 and variance 10000) and the
     # second has mean 2 and variance 0.3.  The test statistic has a value
     # of 3 with variance 0.4.
     f <- gbayesMixPost(3, 4, mix=0.5, d0=0, v0=10000, d1=2, v1=0.3)

     args(f)

     # Plot this density
     delta <- seq(-2, 6, length=150)
     plot(delta, f(delta), type='l')

     # Add to the plot the posterior density that used only
     # the almost non-informative prior
     lines(delta, f(delta, mix=1), lty=2)

     # The same but for an observed statistic of zero
     lines(delta, f(delta, mix=1, x=0), lty=3)

     # Derive the CDF instead of the density
     g <- gbayesMixPost(3, 4, mix=0.5, d0=0, v0=10000, d1=2, v1=0.3,
                        what='cdf')
     # Had mix=0 or 1, gbayes1PowerNP could have been used instead
     # of gbayesMixPowerNP below

     # Compute the power to detect an effect of delta=1 if the variance
     # of the test statistic is 0.2
     gbayesMixPowerNP(g, 1, 0.2, interval=c(-10,12))

     # Do the same thing by simulation
     gbayesMixPowerNP(g, 1, 0.2, interval=c(-10,12), nsim=20000)

     # Compute by what factor the sample size needs to be larger
     # (the variance needs to be smaller) so that the power is 0.9
     ratios <- seq(1, 4, length=50)
     pow <- single(50)
     for(i in 1:50) 
       pow[i] <- gbayesMixPowerNP(g, 1, 0.2/ratios[i], interval=c(-10,12))[2]

     # Solve for ratio using reverse linear interpolation
     approx(pow, ratios, xout=0.9)$y

     # Check this by computing power
     gbayesMixPowerNP(g, 1, 0.2/2.1, interval=c(-10,12))
     # So the study will have to be 2.1 times as large as earlier thought

