interpolant {emulator}    R Documentation
Calculates the a posteriori distribution of results at a point using the
techniques outlined by Oakley. Function interpolant() is the
primary function of the package. Function interpolant.quick()
gives the expectation of the emulator at a set of points, and function
interpolant() gives the expectation and other information (such
as the variance) at a single point. Function int.qq() gives a
quick-quick vectorized interpolant using certain timesaving
assumptions.
interpolant(x, d, xold, Ainv=NULL, A=NULL, use.Ainv=TRUE, scales=NULL,
            pos.def.matrix=NULL, func=regressor.basis, give.full.list = FALSE,
            distance.function=corr, ...)
interpolant.quick(x, d, xold, Ainv, scales=NULL, pos.def.matrix=NULL,
            func=regressor.basis, give.Z = FALSE, distance.function=corr, ...)
int.qq(x, d, xold, Ainv, pos.def.matrix, func=regressor.basis)
x: Point(s) at which estimation is desired. For interpolant.quick(), argument x is a matrix and an expectation is given for each row.
d: Vector of observations, one for each row of xold.
xold: Matrix with rows corresponding to points at which the function is known.
A: Correlation matrix A. If not given, it is calculated.
Ainv: Inverse of correlation matrix A. Required by interpolant.quick() and int.qq(). In interpolant(), using the default value of NULL results in Ainv being calculated explicitly (which may be slow; see the next argument for more details).
use.Ainv: Boolean, with default TRUE meaning to use the inverse matrix Ainv (and, if necessary, calculate it using solve(.)). This requires the not inconsiderable overhead of inverting a matrix. If, however, Ainv is available, using the default option is much faster than setting use.Ainv=FALSE; see below.
If FALSE, function interpolant() does not use Ainv, but makes extensive use of solve(A,x), mostly in the form of quad.form.inv() calls. This option avoids the overhead of inverting a matrix, but has non-negligible marginal costs.
If Ainv is not available, there is little to choose, in terms of execution time, between calculating it explicitly (that is, setting use.Ainv=TRUE) and using solve(A,x) (ie use.Ainv=FALSE).
Note: if Ainv is given to the function, but use.Ainv is FALSE, the code will do as requested and use the slow solve(A,x), which is probably not what you want.
func: Function used to determine basis vectors, defaulting to regressor.basis if not given.
give.full.list: In interpolant(), Boolean variable with TRUE meaning to return the whole list of a posteriori parameters as detailed on pp12-15 of Oakley, and default FALSE meaning to return just the best estimate.
scales: Vector of “roughness” lengths used to calculate t(x), the correlations between x and the points in the design matrix xold.
Note that scales is needed twice overall: once to calculate Ainv, and once to calculate t(x) inside interpolant() (t(x) is determined by calling corr() inside an apply() loop). A good place to start might be scales=rep(1,ncol(xold)).
It is probably worth restating here that the elements of scales correspond to the diagonal elements of the B matrix (see ?corr), and so have the dimensions of 1/D^2, where D denotes the dimensions (units) of the corresponding column of xold.
pos.def.matrix: A positive definite matrix that is used if scales is not supplied. Note that precisely one of scales and pos.def.matrix must be supplied; a short sketch of the correspondence between the two appears after this argument list.
give.Z: In function interpolant.quick(), Boolean variable with TRUE meaning to return the best estimate and the error, and default FALSE meaning to return just the best estimate.
distance.function: Function to compute distances between points, defaulting to corr(); see ?corr for details. Note that method=2 or method=3 is required if a non-standard distance function is used.
...: Further arguments passed to the distance function, usually corr().
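The correspondence between scales and pos.def.matrix can be illustrated with a minimal sketch (not a canonical example; it assumes the toy dataset used in the Examples below, and that supplying scales is equivalent to supplying a diagonal pos.def.matrix, as suggested by the scales description above):

## Minimal sketch; assumes scales=fish is equivalent to pos.def.matrix=diag(fish)
data(toy)
fish <- rep(1, 6)
d <- apply(regressor.multi(toy), 1, function(x) sum((0:6)*x))
Ainv <- solve(corr.matrix(toy, scales=fish, power=2))
x <- rep(0.5, 6)
interpolant(x, d, toy, Ainv=Ainv, scales=fish)                # roughness lengths
interpolant(x, d, toy, Ainv=Ainv, pos.def.matrix=diag(fish))  # B = diag(scales)

The two calls should agree.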
In function interpolant(), if give.full.list is
TRUE, a list is returned with components
betahat: Standard MLE of the (linear) fit, given the observations
prior: Estimate for the prior
sigmahat.square: A posteriori estimate for the variance
mstar.star: A posteriori expectation
cstar: A priori correlation of a point with itself
cstar.star: A posteriori correlation of a point with itself
Z: Standard deviation (although the distribution is actually a t-distribution with n-q degrees of freedom)
Robin K. S. Hankin
# example has 10 observations on 6 dimensions.
# function is just sum( (1:6)*x ) where x=c(x_1, ..., x_6); the (0:6) below
# allows for the constant regressor column prepended by regressor.multi()
data(toy)
val <- toy
real.relation <- function(x){sum( (0:6)*x )}
H <- regressor.multi(val)
d <- apply(H,1,real.relation)
fish <- rep(1,6)
fish[6] <- 4
A <- corr.matrix(val,scales=fish, power=2)
Ainv <- solve(A)
# now add some suitably correlated noise to d:
d.noisy <- as.vector(rmvnorm(n=1, mean=d, 0.1*A))
names(d.noisy) <- names(d)
# First try a value at which we know the answer (the first row of val):
x.known <- as.vector(val[1,])
bayes.known <- interpolant(x.known, d, val, Ainv=Ainv, scales=fish, g=FALSE)
print("error:")
print(d[1]-bayes.known)
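## (Hedged aside: the same estimate should be obtainable without Ainv by
##  passing A and setting use.Ainv=FALSE, at the cost of repeated solve(A,x)
##  calls inside interpolant(); see the use.Ainv argument above)
print(d[1] - interpolant(x.known, d, val, A=A, use.Ainv=FALSE, scales=fish, g=FALSE))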
# Now try the same value, but with noisy data:
print("error:")
print(d.noisy[1]-interpolant(x.known, d.noisy, val, Ainv=Ainv, scales=fish, g=FALSE))
#And now one we don't know:
x.unknown <- rep(0.5 , 6)
bayes.unknown <- interpolant(x.unknown, d.noisy, val, scales=fish, Ainv=Ainv,g=TRUE)
## [ compare with the "true" value of sum(0.5*0:6) = 10.5 ]
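## (Hedged aside: bayes.unknown is the full list of a posteriori parameters
##  described above, so its components can be picked out by name)
bayes.unknown$mstar.star   # a posteriori expectation
bayes.unknown$Z            # standard deviation of that estimate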
# Just a quickie for int.qq():
int.qq(x=rbind(x.unknown,x.unknown+0.1),d.noisy,val,Ainv,pos.def.matrix=diag(fish))
## (To find the best correlation lengths, use optimal.scales())
# Now we use the SAME dataset but a different set of basis functions.
# Here, we use the functional dependence of
# "A+B*(x[1]>0.5)+C*(x[2]>0.5)+...+F*(x[6]>0.5)".
# Thus the basis functions will be c(1,x>0.5).
# The coefficients will again be 1:6.
# Basis functions:
f <- function(x){c(1,x>0.5)}
# (other examples might be
# something like "f <- function(x){c(1,x>0.5,x[1]^2)}")
# now create the data
real.relation2 <- function(x){sum( (0:6)*f(x) )}
d2 <- apply(val,1,real.relation2)
# Define a point at which the function's behaviour is not known:
x.unknown2 <- rep(1,6)
# Thus real.relation2(x.unknown2) is sum(1:6)=21
# Now try the emulator:
interpolant(x.unknown2, d2, val, Ainv=Ainv, scales=fish, g=TRUE)$mstar.star
# Heh, it got it wrong! (we know that it should be 21)
# Now try it with the correct basis functions:
interpolant(x.unknown2, d2, val, Ainv=Ainv,scales=fish, func=f,g=TRUE)$mstar.star
# That's more like it.
# We can tell that the coefficients are right by:
betahat.fun(val,Ainv,d2,func=f)
# Giving c(0:6), as expected.
# It's interesting to note that using the *wrong* basis functions
# gives the *correct* answer when evaluated at a known point:
interpolant(val[1,], d2, val, Ainv=Ainv,scales=fish, g=TRUE)$mstar.star
real.relation2(val[1,])
# Which should agree.
# Now look at Z. Define a function Z() which determines the
# standard deviation at a point near a known point.
Z <- function(o) {
  x <- x.known
  x[1] <- x[1] + o
  interpolant(x, d.noisy, val, Ainv=Ainv, scales=fish, g=TRUE)$Z
}
Z(0) #should be zero because we know the answer (this is just Z at x.known)
Z(0.1) #nonzero error.
## interpolant.quick() should give the same results faster, but one
## needs a matrix:
u <- rbind(x.known,x.unknown)
interpolant.quick(u, d.noisy, val, scales=fish, Ainv=Ainv,g=TRUE)
# Now an example from climate science. "results.table" is a dataframe
# of goldstein (a climate model) results. Each of its 100 rows shows a
# point in parameter space together with certain key outputs from the
# goldstein program. The following R code shows how we can set up an
# emulator based on the first 27 goldstein runs, and use the emulator to
# predict the output for the remaining 73 goldstein runs. The results
# of the emulator are then plotted on a scattergraph showing that the
# emulator is producing estimates that are close to the "real" goldstein
# runs.
data(results.table)
data(expert.estimates)
# Decide which column we are interested in:
output.col <- 26
# extract the "important" columns:
wanted.cols <- c(2:9,12:19)
# Decide how many to keep;
# 30-40 is about the most we can handle:
wanted.row <- 1:27
# Values to use are the ones that appear in goin.test2.comments:
val <- results.table[wanted.row , wanted.cols]
# Now normalize val so that 0 < val[,i] < 1 is
# approximately true for all i:
normalize <- function(x){(x-mins)/(maxes-mins)}
unnormalize <- function(x){mins + (maxes-mins)*x}
mins <- expert.estimates$low
maxes <- expert.estimates$high
jj <- t(apply(val,1,normalize))
jj <- as.data.frame(jj)
names(jj) <- names(val)
val <- jj
## The value we are interested in is the 19th (or 20th or ... or 26th) column.
d <- results.table[wanted.row , output.col]
## Now some scales, estimated earlier from the data using
## optimal.scales():
scales.optim <- exp(c( -2.917, -4.954, -3.354, 2.377, -2.457, -1.934, -3.395,
-0.444, -1.448, -3.075, -0.052, -2.890, -2.832, -2.322, -3.092, -1.786))
A <- corr.matrix(val,scales=scales.optim, method=2, power=1.5)
Ainv <- solve(A)
print("and plot points used in optimization:")
d.observed <- results.table[ , output.col]
print("now plot all points:")
design.normalized <- as.matrix(t(apply(results.table[,wanted.cols],1,normalize)))
d.predicted <- interpolant.quick(design.normalized , d , val , Ainv=Ainv,
scales=scales.optim, power=1.5)
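## (Hedged aside: a simple numerical check of agreement between model and
##  emulator, to complement the scattergraph below)
print(cor(d.observed, d.predicted))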
jj <- range(c(d.observed,d.predicted))
par(pty="s")
plot(d.observed, d.predicted, pch=16, asp=1,
xlim=jj,ylim=jj,
xlab=expression(paste(temperature," (",{}^o,C,"), model" )),
ylab=expression(paste(temperature," (",{}^o,C,"), emulator"))
)
abline(0,1)