| cov.shrink {corpcor} | R Documentation |
The functions var.shrink, cor.shrink, and cov.shrink compute
shrinkage estimates of variance, correlation, and covariance, respectively.
var.shrink(x, lambda.var, w, verbose=TRUE) cor.shrink(x, lambda, w, verbose=TRUE) cov.shrink(x, lambda, lambda.var, w, verbose=TRUE)
x |
a data matrix |
lambda |
the correlation shrinkage intensity (range 0-1).
If lambda is not specified (the default) it is estimated
using an analytic formula from Schaefer and Strimmer (2005)
- see details below.
For lambda=0 the empirical correlations are recovered. |
lambda.var |
the variance shrinkage intensity (range 0-1).
If lambda.var is not specified (the default) it is estimated
using an analytic formula from Schaefer and Strimmer (2005)
- see details below.
For lambda.var=0 the empirical variances are recovered. |
w |
optional: weights for each data point - if not specified uniform weights are assumed
(w = rep(1/n, n) with n = dim(x)[1]). |
verbose |
output some status messages while computing (default: TRUE) |
var.shrink computes the empirical variance of each considered random variable, and shrinks
them towards their common average. The shrinkage intensity is estimated using Eq. 6 from
Schaefer and Strimmer (2005), which leads to
λ_{var}^{*} = ( (1-1/p) sum_{k=1}^p Var(s_{kk}) )/ sum_{k=1}^p (s_{kk} - bar{s})^2 .
Similarly cor.shrink computes a shrinkage estimate of the correlation matrix by
shrinking the empirical correlations towards the identity matrix.
In this case the shrinkage intensity is (Schaefer and Strimmer, 2005)
λ^{*} = sum_{k neq k} Var(r_{kl}) / sum_{k neq l} r_{kl}^2 .
cov.shrink computes the corresponding full covariance matrix
on the basis of the shrunken correlation matrix and the shrunken variances.
In comparison with the standard empirical estimates
(var, cov, and cor) the shrinkage estimates exhibit
a number of favorable properties. For instance,
These properties also carry over to derived quantities, such as partial covariance and
partial correlation (pcov.shrink and pcor.shrink).
As an extra benefit, the shrinkage estimators have a form that can be very efficiently inverted
using the Woodbury matrix identity, even if the number of variables is large.
Note that this identity is employed in the functions
invcov.shrink and invcor.shrink. and is much faster than directly
inverting the matrix output by cov.shrink and cor.shrink, respectively.
var.shrink returns a vector with estimated variances.
cov.shrink returns a covariance matrix.
cor.shrink returns the corresponding correlation matrix.
Juliane Schaefer (http://www.stat.math.ethz.ch/~schaefer/) and Korbinian Strimmer (http://www.statistik.lmu.de/~strimmer/).
Schaefer, J., and Strimmer, K. (2005). A shrinkage approach to large-scale covariance estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol.4:32. (http://www.bepress.com/sagmb/vol4/iss1/art32/)
invcov.shrink, pcor.shrink, cor2pcor
# load corpcor library
library("corpcor")
# small n, large p
p <- 100
n <- 20
# generate random pxp covariance matrix
sigma <- matrix(rnorm(p*p),ncol=p)
sigma <- crossprod(sigma)+ diag(rep(0.1, p))
# simulate multinormal data of sample size n
sigsvd <- svd(sigma)
Y <- t(sigsvd$v %*% (t(sigsvd$u) * sqrt(sigsvd$d)))
X <- matrix(rnorm(n * ncol(sigma)), nrow = n) %*% Y
# estimate covariance matrix
s1 <- cov(X)
s2 <- cov.shrink(X)
# squared error
sum((s1-sigma)^2)
sum((s2-sigma)^2)
# compare positive definiteness
is.positive.definite(s1)
is.positive.definite(s2)
is.positive.definite(sigma)
# compare ranks and condition
rank.condition(s1)
rank.condition(s2)
rank.condition(sigma)
# compare eigenvalues
e1 <- eigen(s1, symmetric=TRUE)$values
e2 <- eigen(s2, symmetric=TRUE)$values
e3 <- eigen(sigma, symmetric=TRUE)$values
m <-max(e1, e2, e3)
yl <- c(0, m)
par(mfrow=c(1,3))
plot(e1, main="empirical")
plot(e2, ylim=yl, main="shrinkage")
plot(e3, ylim=yl, main="true")
par(mfrow=c(1,1))