remMap.CV {remMap}                                        R Documentation
Description:

Fit remMap models for a series of tuning parameters and return the corresponding v-fold cross-validation scores. Two types of cross-validation scores are computed: CV based on the unshrunken OLS estimator (ols.cv) and CV based on the shrunken remMap estimator (rss.cv). ols.cv is recommended; rss.cv tends to select very large models and is therefore not recommended in general, especially when the true model is very sparse. V-fold CV is computationally demanding, but it makes weaker assumptions than BIC, so ols.cv is recommended over BIC unless computation is a concern.
Usage:

remMap.CV(X, Y, lamL1.v, lamL2.v, C.m=NULL, fold=10, seed=1)
Arguments:

X: numeric matrix (n by p); columns correspond to predictor variables and rows correspond to samples. Missing values are not allowed.

Y: numeric matrix (n by q); columns correspond to response variables and rows correspond to samples. Missing values are not allowed.

lamL1.v: numeric vector; a set of l_1 norm penalty parameters.

lamL2.v: numeric vector; a set of l_2 norm penalty parameters.

C.m: numeric matrix (p by q). C.m[i,j]=0 means the corresponding coefficient beta[i,j] is set to zero in the model; C.m[i,j]=1 means beta[i,j] is included in the MAP penalty; C.m[i,j]=2 means beta[i,j] is estimated but not subject to the MAP penalty. Default (=NULL): all C.m[i,j] are set to 1. An illustrative construction sketch follows this list.

fold: positive integer; the number of cross-validation folds. Default=10.

seed: numeric scalar; the seed of the R random number generator used to generate the cross-validation subsets. Default=1.
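As an illustration of the 0/1/2 coding for C.m (a minimal hypothetical sketch; the dimensions and index choices are illustrative, not part of the package):

C.m <- matrix(1, nrow=5, ncol=4)   ## default coding: every coefficient is penalized
C.m[1, 2] <- 0                     ## beta[1,2] is forced to zero (excluded from the model)
C.m[3, ]  <- 2                     ## predictor 3 always enters, without MAP penalty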
Details:

remMap.CV performs a two-dimensional grid search over the tuning parameters (lamL1.v, lamL2.v) based on v-fold cross-validation scores (Peng et al., 2008).
Value:

A list with four components:
ols.cv: a numeric matrix recording the cross-validation scores based on the unshrunken OLS estimators for each pair of (lamL1, lamL2).

rss.cv: a numeric matrix recording the cross-validation scores based on the shrunken remMap estimators for each pair of (lamL1, lamL2).

phi.cv: a list recording the remMap coefficients fitted on the cross-validation training subsets. Each component corresponds to one CV fold and is itself a list whose components hold the estimated remMap coefficients for one pair of (lamL1, lamL2) on that training subset.

l.index: numeric matrix with two rows: each column is a pair of (lamL1, lamL2), and the kth column corresponds to the kth CV score in as.vector(ols.cv) and as.vector(rss.cv); see the sketch following this list.
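As a minimal sketch of using l.index (assuming cv.fit is an object returned by remMap.CV; the name is illustrative): as.vector() flattens a matrix in column-major order, so the index of the minimizing CV score maps directly to a column of l.index.

pick <- which.min(as.vector(cv.fit$ols.cv))  ## index into the flattened score matrix
lamL1.pick <- cv.fit$l.index[1, pick]        ## selected l_1 penalty
lamL2.pick <- cv.fit$l.index[2, pick]        ## selected l_2 penalty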
Author(s):

Jie Peng, Pei Wang, Ji Zhu
References:

Peng, J., Zhu, J., Bergamaschi, A., Han, W., Noh, D.-Y., Pollack, J. R. and Wang, P. (2008). Regularized Multivariate Regression for Identifying Master Predictors with Application to Integrative Genomics Study of Breast Cancer. http://arxiv.org/abs/0812.3671
Examples:

############################################
############# Generate an example data set
############################################
n=100
p=300
q=300
set.seed(1)
### generate X matrix
rho=0.5; Sig<-matrix(0,p,p)
for(i in 2:p){
  for(j in 1:(i-1)){
    Sig[i,j] <- rho^abs(i-j)
    Sig[j,i] <- Sig[i,j]
  }
}
diag(Sig)<-1
R<-chol(Sig)
X.m<-matrix(rnorm(n*p),n,p)
X.m<-X.m%*%R
### generate coefficients: hub.n hub predictors; each response is
### affected by 1 to 3 randomly chosen hubs
coef.m<-matrix(0,p,q)
hub.n=20
hub.index=sample(1:p, hub.n)
for(i in 1:q){
  cur=sample(1:3,1)                 ## number of hubs affecting response i
  temp=sample(hub.index, cur)
  coef.m[temp,i]<-runif(length(temp), min=2, max=3)
}
### generate responses
E.m<-matrix(rnorm(n*q),n,q)
Y.m<-X.m%*%coef.m+E.m
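## sanity check (illustrative): confirm dimensions of the generated data
print(dim(X.m))   ## should be n by p
print(dim(Y.m))   ## should be n by q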
##############################################################################################
############ perform analysis
##############################################################################################
###############################################
## 1. ## fit model for one pair of (lamL1, lamL2)
###############################################
try1=remMap(X.m, Y.m, lamL1=100, lamL2=50, phi0=NULL, C.m=NULL)
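## illustrative check: number of nonzero estimated coefficients at this (lamL1, lamL2)
print(sum(try1$phi!=0))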
#################################################################################################
## 2. ## Select tuning parameters with BIC:
### ## computationally cheap; but the BIC procedure assumes orthogonality of the design matrix when estimating the degrees of freedom,
### ## so it tends to select models that are too small when the actual design matrix (X.m) is far from orthogonal
#################################################################################################
lamL1.v=exp(seq(log(51),log(150), length=5))
lamL2.v=seq(0,100, length=5)
df.m=remMap.df(X.m, Y.m, lamL1.v, lamL2.v, C.m=NULL)
### The estimated degrees of freedom can be used to choose the ranges of the tuning parameters.
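## inspect the estimated degrees of freedom over the (lamL1, lamL2) grid (illustrative)
print(df.m)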
try2=remMap.BIC(X.m, Y.m, lamL1.v, lamL2.v, C.m=NULL)
pick=which.min(as.vector(t(try2$BIC)))
result=try2$phi[[pick]]
FP=sum(result$phi!=0 & coef.m==0) ## number of false positives
FN=sum(result$phi==0 & coef.m!=0) ## number of false negatives
print(paste("lamL1=", round(result$lam1,3), "; lamL2=", round(result$lam2,3), sep="")) ##BIC selected tuning parameters
print(paste("FP=", FP, "; FN=", FN, sep=""))
################################################################################################################
## 3. ## Select tuning parameters with v-fold cross-validation;
### ## computationally demanding;
### ## but cross-validation makes weaker assumptions than BIC and is thus recommended unless computation is a concern;
### ## also, CV based on the unshrunken estimator (ols.cv) is recommended over CV based on the shrunken estimator (rss.cv),
### ## since the latter tends to select models that are too large.
################################################################################################################
lamL1.v=exp(seq(log(51),log(150), length=5))
lamL2.v=seq(0,100, length=5)
try3=remMap.CV(X=X.m, Y=Y.m, lamL1.v=lamL1.v, lamL2.v=lamL2.v, C.m=NULL, fold=10, seed=1)
############# use CV based on the unshrunken estimator (ols.cv)
pick=which.min(as.vector(try3$ols.cv))
lamL1.pick=try3$l.index[1,pick] ##find the optimal (LamL1,LamL2) based on the cv score
lamL2.pick=try3$l.index[2,pick]
result=remMap(X.m, Y.m,lamL1=lamL1.pick, lamL2=lamL2.pick, phi0=NULL, C.m=NULL) ##fit the remMap model under the optimal (LamL1,LamL2).
FP=sum(result$phi!=0 & coef.m==0) ## number of false positives
FN=sum(result$phi==0 & coef.m!=0) ## number of false negatives
print(paste("lamL1=", round(lamL1.pick,3), "; lamL2=", round(lamL2.pick,3), sep="")) ##CV (unshrinked) selected tuning parameters
print(paste("FP=", FP, "; FN=", FN, sep=""))
############# use CV based on the shrunken estimator (rss.cv); it tends to select very large models and thus is not recommended in general
pick=which.min(as.vector(try3$rss.cv))
lamL1.pick=try3$l.index[1,pick] ##find the optimal (LamL1,LamL2) based on the cv score
lamL2.pick=try3$l.index[2,pick]
result=remMap(X.m, Y.m,lamL1=lamL1.pick, lamL2=lamL2.pick, phi0=NULL, C.m=NULL)
FP=sum(result$phi!=0 & coef.m==0) ## number of false positives
FN=sum(result$phi==0 & coef.m!=0) ## number of false negatives
print(paste("lamL1=", round(lamL1.pick,3), "; lamL2=", round(lamL2.pick,3), sep="")) ##CV (shrinked) selected tuning parameters
print(paste("FP=", FP, "; FN=", FN, sep=""))