interpolant             package:emulator             R Documentation

_I_n_t_e_r_p_o_l_a_t_e_s _b_e_t_w_e_e_n _k_n_o_w_n _p_o_i_n_t_s _u_s_i_n_g _B_a_y_e_s_i_a_n _e_s_t_i_m_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Calculates the a posteriori distribution of results at a point using
     the techniques outlined by Oakley.  This function is the primary
     function of the package.  Function 'interpolant.quick()' gives the
     expectation of the emulator at a set of points, and function
     'interpolant()' gives the expectation and other information (such
     as the variance) at a single point.  Function 'int.qq()' gives a
     quick-quick vectorized interpolant using certain timesaving
     assumptions.
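
     The core of this calculation can be sketched in base R.  This is a
     minimal illustration, not the package's code: it assumes a
     squared-exponential correlation exp(-sum(scales*(x1-x2)^2)) and the
     basis h(x) = c(1, x), and the helpers 'gauss.corr()' and
     'post.mean()' are hypothetical.  It forms the generalized least
     squares estimate betahat and adds the correction
     t(x)' Ainv (d - H betahat) to the regression prediction h(x)' betahat.

     ```r
     # Hypothetical helper: squared-exponential correlation of two points
     gauss.corr <- function(x1, x2, scales) {
       exp(-sum(scales * (x1 - x2)^2))
     }

     # Hypothetical helper: a posteriori mean at a single point x
     post.mean <- function(x, d, xold, scales) {
       xold <- as.matrix(xold)
       n <- nrow(xold)
       A <- matrix(0, n, n)                  # correlations between design points
       for (i in seq_len(n)) {
         for (j in seq_len(n)) {
           A[i, j] <- gauss.corr(xold[i, ], xold[j, ], scales)
         }
       }
       Ainv <- solve(A)
       H <- cbind(1, xold)                   # row i is h(xold[i, ]) = c(1, xold[i, ])
       # Generalized least squares estimate of the regression coefficients
       betahat <- solve(t(H) %*% Ainv %*% H, t(H) %*% Ainv %*% d)
       # t(x): correlation between x and each design point
       tx <- apply(xold, 1, gauss.corr, x2 = x, scales = scales)
       # Regression prediction plus correlation-weighted residual correction
       drop(c(1, x) %*% betahat + tx %*% Ainv %*% (d - H %*% betahat))
     }

     set.seed(1)
     xold <- matrix(runif(20), ncol = 2)         # 10 design points, 2 dimensions
     d <- apply(xold, 1, function(x) 1 + 2 * x[1] - x[2])
     post.mean(xold[3, ], d, xold, scales = c(10, 10))    # equals d[3] up to rounding
     post.mean(c(0.5, 0.5), d, xold, scales = c(10, 10))  # linear truth gives 1.5
     ```

     At a design point, t(x) is a row of A, so the correction term
     restores the residual exactly and the emulator interpolates the data.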

_U_s_a_g_e:

     interpolant(x, d, xold, Ainv=NULL, A=NULL, use.Ainv=TRUE, scales=NULL, pos.def.matrix=NULL,
     func=regressor.basis, give.full.list = FALSE, distance.function=corr, ...)
     interpolant.quick(x, d, xold, Ainv, scales=NULL,
     pos.def.matrix=NULL, func=regressor.basis, give.Z = FALSE,
     distance.function=corr, ...)
     int.qq(x, d, xold, Ainv, func=regressor.basis)

_A_r_g_u_m_e_n_t_s:

       x: Point(s) at which estimation is desired.  For
          'interpolant.quick()', argument 'x' is a data frame or matrix,
          and an expectation is given for each row.

       d: Vector of observations, one for each row of 'xold'.

    xold: Data frame with rows corresponding to points at which the
          function is known.

       A: Correlation matrix 'A'.  If not given, it is calculated.

    Ainv: Inverse of correlation matrix 'A'.  Required by
          'interpolant.quick()' and 'int.qq()'.  In 'interpolant()',
          using the default value of 'NULL' results in 'Ainv' being
          calculated explicitly (which may be slow: see next argument
          for more details).

use.Ainv: Boolean, with default 'TRUE' meaning to use the inverse
          matrix 'Ainv' (and, if necessary, calculate it using
          'solve(.)').  This requires the not inconsiderable overhead
          of inverting a matrix.   If, however, 'Ainv' is available,
          using the default option is _much_ faster than setting
          'use.Ainv=FALSE'; see below.

          If 'FALSE', function 'interpolant()' does not use 'Ainv', but
          makes extensive use of 'solve(A,x)' (mostly in the form of
          'quad.form.inv()' calls).  This option avoids the overhead of
          inverting a matrix, but has non-negligible marginal costs.

          If 'Ainv' is not available, there is little to choose, in
          terms of execution time, between calculating it explicitly
          (that is, setting 'use.Ainv=TRUE') and using 'solve(A,x)'
          (that is, setting 'use.Ainv=FALSE').

          *Note:* if 'Ainv' is given to the function, but 'use.Ainv' is
          'FALSE', the code will do as requested and use the slow
          'solve(A,x)', which is probably not what you want.  

    func: Function used to determine basis vectors, defaulting to
          'regressor.basis'

give.full.list: In 'interpolant()', Boolean variable with 'TRUE'
          meaning to return the whole list of a posteriori parameters as
          detailed on pp. 12-15 of Oakley, and default 'FALSE' meaning to
          return just the best estimate.

  scales: Vector of "roughness" lengths used to calculate 't(x)'.  Note
          that 'scales' is needed twice: once to calculate 'Ainv' and
          once to calculate 't(x)' inside 'interpolant' (which is
          determined by calling 'corr' inside an 'apply()' loop).  A
          good place to start might be 'scales=rep(1,ncol(xold))'.
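
          A sketch of how 'scales' enters 't(x)', assuming the
          squared-exponential form exp(-sum(scales*(x1-x2)^2)); the helper
          'gauss.corr()' is a hypothetical stand-in for the package's
          'corr()'.  Each element of 't(x)' is the correlation between 'x'
          and one row of 'xold'; multiplying the scales by ten makes the
          correlations decay faster with distance (a rougher function).

          ```r
          # Hypothetical stand-in for corr(): squared-exponential correlation
          gauss.corr <- function(x1, x2, scales) exp(-sum(scales * (x1 - x2)^2))

          xold <- rbind(c(0, 0), c(1, 0), c(0, 1))  # three known points
          x <- c(0.5, 0)                            # point at which t(x) is wanted
          scales <- rep(1, ncol(xold))              # suggested starting point

          # t(x): one correlation per row of xold, via an apply() loop
          tx <- apply(xold, 1, gauss.corr, x2 = x, scales = scales)
          tx    # exp(-0.25), exp(-0.25), exp(-1.25)

          # Larger scales: correlations fall off more quickly with distance
          tx.rough <- apply(xold, 1, gauss.corr, x2 = x, scales = 10 * scales)
          ```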

pos.def.matrix: A positive definite matrix that is used if 'scales' is
          not supplied.  Note that precisely one of 'scales' and
          'pos.def.matrix' must be supplied.
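
          The relation between the two arguments can be sketched as
          follows, assuming that 'pos.def.matrix' (M) enters the
          correlation as exp(-(x1-x2)' M (x1-x2)); both helper functions
          are hypothetical.  Under that assumption, supplying 'scales' is
          equivalent to supplying 'pos.def.matrix=diag(scales)', while
          off-diagonal entries couple different input dimensions.

          ```r
          # Hypothetical helpers illustrating the two parameterizations
          corr.scales <- function(x1, x2, scales) exp(-sum(scales * (x1 - x2)^2))
          corr.pdm <- function(x1, x2, pos.def.matrix) {
            delta <- x1 - x2
            exp(-drop(t(delta) %*% pos.def.matrix %*% delta))
          }

          x1 <- c(0.1, 0.4, 0.9)
          x2 <- c(0.3, 0.2, 0.5)
          scales <- c(1, 2, 4)

          c.scales <- corr.scales(x1, x2, scales)
          c.diag <- corr.pdm(x1, x2, diag(scales))  # same value as c.scales

          # Positive definite matrix with an off-diagonal term coupling inputs 1, 2
          M <- diag(scales)
          M[1, 2] <- M[2, 1] <- 0.5
          c.coupled <- corr.pdm(x1, x2, M)
          ```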

  give.Z: In function 'interpolant.quick()', Boolean variable with
          'TRUE' meaning to return the best estimate and the error, and
          default 'FALSE' meaning to return just the best estimate.

distance.function: Function to compute distances between points,
          defaulting to 'corr()'.  See 'corr' for details.  Note that
          'method=2' or 'method=3' is required if a non-standard
          distance function is used.

     ...: Further arguments passed to the distance function, usually
          'corr()'

_V_a_l_u_e:

     If 'give.full.list' is FALSE (the default), 'interpolant()' returns
     just the best estimate.  If it is TRUE, a list is returned with
     components

 betahat: Standard MLE of the (linear) fit, given the observations

   prior: Estimate for the prior

sigmahat.square: A posteriori estimate for variance

mstar.star: A posteriori expectation

   cstar: A priori correlation of a point with itself

cstar.star: A posteriori correlation of a point with itself

       Z: Standard deviation (although the distribution is actually a
          t-distribution with n-q degrees of freedom, where n is the
          number of observations and q the number of basis functions)
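
     One way to use 'mstar.star' and 'Z' together is sketched below:
     treating (f(x) - mstar.star)/Z as Student t with n-q degrees of
     freedom, a symmetric credible interval follows from 'qt()'.  The
     function 'credible.interval()' is hypothetical and not part of the
     package; in practice its first two arguments would be taken from
     the list returned with 'give.full.list=TRUE'.

     ```r
     # Hypothetical helper: credible interval from posterior mean and scale
     credible.interval <- function(mstar.star, Z, n, q, level = 0.95) {
       # n: number of observations; q: number of basis functions
       stopifnot(n > q, Z >= 0)
       half.width <- qt(1 - (1 - level) / 2, df = n - q) * Z
       c(lower = mstar.star - half.width, upper = mstar.star + half.width)
     }

     # With few residual degrees of freedom the interval is much wider
     # than the familiar +/- 1.96 standard deviations
     credible.interval(mstar.star = 0, Z = 1, n = 10, q = 7)
     credible.interval(mstar.star = 0, Z = 1, n = 1000, q = 7)
     ```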

_A_u_t_h_o_r(_s):

     Robin K. S. Hankin

_R_e_f_e_r_e_n_c_e_s:

     J. Oakley, 2004. "Estimating percentiles of uncertain computer code
     outputs". Applied Statistics, 53(1), pp. 89-93.

     J. Oakley, 1999. "Bayesian uncertainty analysis for complex
     computer codes". PhD thesis, University of Sheffield.

     J. Oakley and A. O'Hagan, 2002. "Bayesian Inference for the
     Uncertainty Distribution of Computer Model Outputs". Biometrika,
     89(4), pp. 769-784.

     R. K. S. Hankin, 2005. "Introducing BACCO, an R bundle for Bayesian
     analysis of computer code output". Journal of Statistical
     Software, 14(16).

_S_e_e _A_l_s_o:

     'makeinputfiles'

_E_x_a_m_p_l_e_s:

     # Example has 10 observations on 6 dimensions.
     # The function is just sum((1:6)*x), where x = c(x_1, ..., x_6).

     data(toy)
     val <- toy
     real.relation <- function(x){sum( (0:6)*x )}
     H <- regressor.multi(val)
     d <- apply(H,1,real.relation)

     fish <- rep(1,6)
     fish[6] <- 4

     A <- corr.matrix(val,scales=fish, power=2)
     Ainv <- solve(A)

     # now add some suitably correlated noise to d:
     d.noisy <-  as.vector(rmvnorm(n=1, mean=d, 0.1*A))
     names(d.noisy) <- names(d)

     # First try a value at which we know the answer (the first row of val):
     x.known <- as.vector(val[1,])
     bayes.known <- interpolant(x.known, d, val, Ainv=Ainv, scales=fish,
                                give.full.list=FALSE)
     print("error:")
     print(d[1]-bayes.known)

     # Now try the same value, but with noisy data:
     print("error:")
     print(d.noisy[1]-interpolant(x.known, d.noisy, val, Ainv=Ainv,
                                  scales=fish, give.full.list=FALSE))

     #And now one we don't know:
     x.unknown <- rep(0.5 , 6)
     bayes.unknown <- interpolant(x.unknown, d.noisy, val, scales=fish,
                                  Ainv=Ainv, give.full.list=TRUE)

     ## [   compare with the "true" value of sum(0.5*0:6) = 10.5   ]


     # Just a quickie for int.qq():
     int.qq(x=rbind(x.unknown,x.unknown+0.1),d.noisy,val,Ainv)

     ## (To find the best correlation lengths, use optimal.scales())

      # Now we use the SAME dataset but a different set of basis functions.
      # Here, we use the functional dependence of
      # "A+B*(x[1]>0.5)+C*(x[2]>0.5)+...+G*(x[6]>0.5)".
      # Thus the basis functions will be c(1,x>0.5).
      # The coefficients will again be 0:6.

            # Basis functions:
     f <- function(x){c(1,x>0.5)}
            # (other examples might be
            # something like "f <- function(x){c(1,x>0.5,x[1]^2)}")

            # now create the data
     real.relation2 <- function(x){sum( (0:6)*f(x) )}
     d2 <- apply(val,1,real.relation2)

            # Define a point at which the function's behaviour is not known:
     x.unknown2 <- rep(1,6)
            # Thus real.relation2(x.unknown2) is sum(1:6)=21

            # Now try the emulator:
     interpolant(x.unknown2, d2, val, Ainv=Ainv, scales=fish,
                 give.full.list=TRUE)$mstar.star
            # Heh, it got it wrong!  (we know that it should be 21)

            # Now try it with the correct basis functions:
     interpolant(x.unknown2, d2, val, Ainv=Ainv, scales=fish, func=f,
                 give.full.list=TRUE)$mstar.star
            # That's more like it.

            # We can tell that the coefficients are right by:
     betahat.fun(val,Ainv,d2,func=f)
            # Giving c(0:6), as expected.

            # It's interesting to note that using the *wrong* basis functions
            # gives the *correct* answer when evaluated at a known point:
     interpolant(val[1,], d2, val, Ainv=Ainv, scales=fish,
                 give.full.list=TRUE)$mstar.star
     real.relation2(val[1,])
            # Which should agree.

            # Now look at Z.  Define a function Z() which determines the
            # standard deviation at a point near a known point.
     Z <- function(o) {
         x <- x.known 
         x[1] <- x[1]+ o
         interpolant(x, d.noisy, val, Ainv=Ainv, scales=fish,
                     give.full.list=TRUE)$Z
       } 

     Z(0)       #should be zero because we know the answer (this is just Z at x.known)
     Z(0.1)     #nonzero error.

       ## interpolant.quick() should give the same results faster, but one
       ##   needs a matrix:
     u <- rbind(x.known,x.unknown)
     interpolant.quick(u, d.noisy, val, scales=fish, Ainv=Ainv, give.Z=TRUE)

     data(results.table)
     data(expert.estimates)

            # Decide which column we are interested in:
     output.col <- 26

            # Decide which input columns to use:
     wanted.cols <- c(2:9,12:19)

            # Decide how many to keep;
            # 30-40 is about the most we can handle:
     wanted.row <- 1:27

            # Values to use are the ones that appear in goin.test2.comments:
     val <- results.table[wanted.row , wanted.cols]

            # Now normalize val, using the experts' low and high estimates,
            # so that 0 < val[,i] < 1 for all i:

     normalize <- function(x){(x-mins)/(maxes-mins)}
     unnormalize <- function(x){mins + (maxes-mins)*x}

     mins  <- expert.estimates$low 
     maxes <- expert.estimates$high
     jj <- t(apply(val,1,normalize))

     jj <- as.data.frame(jj) 
     names(jj) <- names(val)
     val <- jj

            ## Answer is the 19th (or 20th or ... or 26th)
     d  <- results.table[wanted.row ,  output.col]

     A <- corr.matrix(val,scales=rep(1,ncol(val)), method=2, power=1.5)
     Ainv <-  solve(A)

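            # These optimized values are on a log scale; exponentiate
            # them, because roughness scales must be positive:
     scales.optim <- exp(c( -2.917, -4.954, -3.354, 2.377, -2.457, -1.934, -3.395,
     -0.444, -1.448, -3.075, -0.052, -2.890, -2.832, -2.322, -3.092, -1.786))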

     print("and plot points used in optimization:")
     d.observed <- results.table[ , output.col]

     A <- corr.matrix(val,scales=scales.optim, method=2, power=1.5)
     Ainv <- solve(A)

     print("now plot all points:")
     design.normalized <- as.matrix(t(apply(results.table[,wanted.cols],1,normalize)))
     d.predicted <- interpolant.quick(design.normalized , d , val , Ainv=Ainv,
     scales=scales.optim, power=1.5)
     jj <- range(c(d.observed,d.predicted))
     par(pty="s")
     plot(d.observed, d.predicted, pch=16, asp=1,
     xlim=jj,ylim=jj,
     xlab=expression(paste(temperature," (",{}^o,C,"), model"   )),
     ylab=expression(paste(temperature," (",{}^o,C,"), emulator"))
     )
     abline(0,1)

