OO2002               package:emulator               R Documentation

_I_m_p_l_e_m_e_n_t_a_t_i_o_n _o_f _t_h_e _i_d_e_a_s _o_f _O_a_k_l_e_y _a_n_d _O'_H_a_g_a_n _2_0_0_2

_D_e_s_c_r_i_p_t_i_o_n:

     Implementation of the ideas of Oakley and O'Hagan 2002:
     'var.conditional()' calculates the conditional variance-covariance
     matrix, and 'cond.sample()' samples from the appropriate
     multivariate t distribution.

_U_s_a_g_e:

     cond.sample(n = 1, x, xold, d, A, Ainv, scales = NULL, pos.def.matrix =
     NULL, func = regressor.basis, ...)
     var.conditional(x, xold, d, A, Ainv, scales = NULL, pos.def.matrix = NULL, 
         func = regressor.basis, distance.function = corr, ...)

_A_r_g_u_m_e_n_t_s:

       n: In function 'cond.sample()', the number of observations to
          take, defaulting to 1

       x: Simulation design points

    xold: Design points

       d: Data vector

       A: Correlation matrix

    Ainv: Inverse of correlation matrix A

  scales: Roughness lengths

pos.def.matrix: Positive definite matrix of correlations

    func: Function to calculate H

distance.function: Distance function (defaulting to 'corr()')

     ...: Further arguments passed to the distance function, usually
          'corr()'

_D_e_t_a_i_l_s:

     We wish to generate the distribution for the process at uncertain
     point 'x'; uncertainty in 'x' is captured by assuming it to be
     drawn from a pdf 'X'.

     The basic idea is to estimate m* at _simulated_ design points
     using 'cond.sample()', which samples from the multivariate t
     distribution conditional on the data 'd' at the design points. The
     random datavector of estimates m* is called 'ddash'.

     We repeat this process many times, each time estimating eta(.)
     using the augmented dataset 'c(d,ddash)' as a training set.

     For each estimated eta(.), we have a complete emulator that can be
     used to build up an ensemble of estimates.

_V_a_l_u_e:

     Function 'cond.sample()' returns a n*p matrix whose rows are
     independent samples from the appropriate multivariate t
     distribution.  Here, p is the number of rows of 'x' (ie the number
     of simulated design points).  Consider a case where there are just
     two simulated design points, close to each other but far from any
     point of the original design points.  Then function
     'cond.sample(n=4, ...)' will give four numbers which are close to
     one another but have high (between-instantiation) variance.

     Function 'var.conditional()' calculates the denominator of
     equation 3 of Oakley and OHagan 2002.  This function is intended
     to be called by 'cond.sample()' but might be interesting _per se_
     when debugging or comparing different choices of simulated design
     points.

_N_o_t_e:

     Function 'cond.sample()' and 'var.conditional()' together are a
     superset of function 'interpolant()' because it accounts for
     covariance between multiple observations.  It is, however, much
     slower.

     Also note that these functions are used to good effect in the
     examples section of 'oo2002.Rd'.

_A_u_t_h_o_r(_s):

     Robin K. S. Hankin

_R_e_f_e_r_e_n_c_e_s:

        *  J. Oakley 2002. _Bayesian inference for the uncertainty
           distribution of computer model outputs_.  Biometrika,
           89(4):769-784

        *  R. K. S. Hankin 2005. _Introducing BACCO, an R bundle for
           Bayesian analysis of computer code output_, Journal of
           Statistical Software, 14(16)

_S_e_e _A_l_s_o:

     'regressor.basis', for a more visually informative example of
     'cond.sample()' et seq; and 'interpolant' for more examples

_E_x_a_m_p_l_e_s:

          # Now we use the functions.  First we set up some design points:
          # Suppose we are given the toy dataset and want to know the PDF of
          # fourth power of the response at point x, where uncertainty in x
          # may be represented as it being drawn from a normnl distribution
          # with mean c(0.5,0.5,...,0.5) and a variance of 0.001.

     data(toy)
     val <- toy
     real.relation <- function(x){sum( (0:6)*x )}
     H <- regressor.multi(val)
     d <- apply(H,1,real.relation)

          # and some scales (which are assumed to be known):
     fish <- rep(1,6)
     fish[6] <- 4

          # And determine A and Ainv:
     A <- corr.matrix(val,scales=fish, power=2)
     Ainv <- solve(A)

          # and add some suitably correlated Gaussian noise:
     d.noisy <-  as.vector(rmvnorm(n=1, mean=d, 0.1*A))
     names(d.noisy) <- names(d)

          # Now some simulation design points.  Choose n'=6:
     xdash <-  matrix(runif(36),ncol=6)

          # And just for fun, we insert a near-useless seventh simulation
          # design point (it is nearly useless because it is a near copy of
          # the sixth).  We do this in order to test the coding:
     xdash <- rbind(xdash,xdash[6,] + 1e-4)
     colnames(xdash) <- colnames(val)
     rownames(xdash) <- c("alpha","beta","gamma","delta","epsilon","zeta","zeta.copy")

          # Print the variance matrix:
     (vm <- var.conditional(x=xdash,xold=val,d=d.noisy,A=A,Ainv=Ainv,scales=fish))
          # Note that the sixth and seventh columns are almost identical
          # (and so, therefore, are the sixth and seventh rows) as
          # expected.

          # Also, the final eigenvalue of vm should be small:
     eigen(vm)$values

          # Now sample from the conditional t-distribution.  Taking n=3 samples:
     (cs <- cond.sample(n=3, x=xdash, xold=val, d=d.noisy, A=A, Ainv=Ainv,
                        scales = fish, func = regressor.basis))

          # Note the last two columns are nearly identical, as expected.

          # Just as a test, what is the variance matrix at the design points?
     (vc <- var.conditional(x=val,xold=val,d=d.noisy,A=A,Ainv=Ainv,scales=fish))
          # (This should be exactly zero);
     max(eigen(vc)$values)
          # should be small

          # Next, we apply the methods of OO2002 using Monte Carlo techniques.
          # We will generate  10 different versions of eta:
     number.of.eta <- 10

          # And, for each eta,  we will sample from the posterior t distribution 11 times:
     number.of.X <- 11

          # create an augmented design matrix, of the design points plus the
          # simulated design points:
     design.augmented <- rbind(val,xdash)
     A.augmented <- corr.matrix(design.augmented, scales=fish)
     Ainv.augmented <- solve(A.augmented)

     out <- NULL
     for(i in seq_len(number.of.eta)){
             # Create random data by sampling from the conditional
             # multivariate t at the simulated design points xdash, from
             # the t-distribution given the data d:
        ddash <- cond.sample(n=1, x=xdash, xold=val, d=d.noisy, Ainv=Ainv, scales=fish)

             # Now use the emulator to calculate m^* at points chosen from
             # the PDF of X:
        jj <-
          interpolant.quick(x=rmvnorm(n=number.of.X,rep(0.5,6),diag(6)/1000),
                            d=c(d.noisy,ddash),
                            xold=design.augmented,
                            Ainv=Ainv.augmented,
                            scales=fish)
        out <- c(out,jj)
     }

        # histogram of the fourth power:
     hist(out^4, col="gray")
        # See oo2002 for another example of cond.sample() in use

