| sda.ranking {sda} | R Documentation |
sda.ranking determines a ranking of features by computing cat scores
between the group centroids and the pooled mean.
plot.sda.ranking provides a graphical visualization of the top ranking features.
sda.ranking(Xtrain, L, diagonal=FALSE, fdr=TRUE, plot.fdr=FALSE, verbose=TRUE)
## S3 method for class 'sda.ranking':
plot(x, top=40, ...)
Xtrain |
A matrix containing the training data set. Note that the rows correspond to observations and the columns to variables. |
L |
A factor with the class labels of the training samples. |
diagonal |
Chooses between LDA (default, diagonal=FALSE) and DDA (diagonal=TRUE). |
fdr |
Compute FDR values and HC scores for each feature. |
plot.fdr |
Show plot with estimated FDR values. |
verbose |
Print out some info while computing. |
x |
An "sda.ranking" object – this is produced by the sda.ranking() function. |
top |
The number of top-ranking features shown in the plot (default: 40). |
... |
Additional arguments for generic plot. |
For each feature and centroid a shrinkage cat score of the group mean versus the pooled mean is computed. The overall ranking of a feature is determined by the sum of the squared cat scores across all centroids. For the diagonal case (DDA) the cat score reduces to the t-score. Thus in the two-class diagonal case the features are simply ranked according to the (shrinkage) t-scores.
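The diagonal two-class case can be illustrated with a small base R sketch. It uses ordinary (non-shrinkage) t-scores on simulated data, so it is not the estimator used by sda.ranking; the data, the variable names, and the standard error factor sqrt(1/n_k - 1/n) for a centroid minus the pooled mean are assumptions made for this illustration. It shows that the two per-centroid t-scores differ only in sign and that their summed squares equal twice the squared two-sample t statistic, so both give the same ranking.

```r
# Simulated two-class data (hypothetical): 40 samples, 6 features
set.seed(1)
n1 <- 25; n2 <- 15
X <- matrix(rnorm((n1 + n2) * 6), ncol = 6)
L <- factor(rep(c("a", "b"), times = c(n1, n2)))
X[L == "b", 1] <- X[L == "b", 1] + 2      # feature 1 differs between classes

n  <- nrow(X)
nk <- as.vector(table(L))                 # class sizes, in level order
mu.pool   <- colMeans(X)                  # pooled mean of each feature
centroids <- rowsum(X, L) / nk            # class centroids (classes x features)

# Pooled within-class variance of each feature
resid <- X - centroids[as.integer(L), ]
s2 <- colSums(resid^2) / (n - nlevels(L))

# Plain t-score of each centroid versus the pooled mean
se     <- sqrt(outer(1 / nk - 1 / n, s2))
tscore <- sweep(centroids, 2, mu.pool) / se

# Feature score: sum of squared t-scores across centroids
score   <- colSums(tscore^2)
ranking <- order(score, decreasing = TRUE)

# Classical two-sample t statistic (equal variances) for comparison
t2 <- apply(X, 2, function(x)
  t.test(x[L == "a"], x[L == "b"], var.equal = TRUE)$statistic)

all.equal(unname(score), unname(2 * t2^2))
```

With more than two classes the per-centroid scores no longer collapse to a single statistic, which is why the summed squares are used as the overall score.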
Calling sda.ranking should be step 1 in a classification analysis. Steps 2 and 3 are
sda and predict.sda.
See Ahdesmäki and Strimmer (2009) for details. For the case of two classes see Zuber and Strimmer (2009).
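The three-step analysis (ranking, training, prediction) might look roughly as follows. This is a sketch under stated assumptions: it needs the sda package to be installed, the cut-off of 20 features is an arbitrary choice made for illustration, and predicting on the training data gives an optimistic picture of accuracy.

```r
library("sda")
data(singh2002)
Xtrain <- singh2002$x
Ytrain <- singh2002$y

# Step 1: rank features (full covariance, i.e. LDA); FDR values are skipped here
ranking <- sda.ranking(Xtrain, Ytrain, diagonal = FALSE, fdr = FALSE,
                       verbose = FALSE)

# Keep the 20 top-ranked features (20 is an arbitrary, illustrative cut-off)
selected <- ranking[1:20, "idx"]

# Step 2: train the classifier on the selected features only
fit <- sda(Xtrain[, selected, drop = FALSE], Ytrain, diagonal = FALSE,
           verbose = FALSE)

# Step 3: predict class labels; done on the training data purely for
# illustration, so the resulting table overstates real performance
pred <- predict(fit, Xtrain[, selected, drop = FALSE], verbose = FALSE)
table(predicted = pred$class, observed = Ytrain)
```

In practice the number of retained features would more sensibly be chosen from the local FDR values or HC scores, and performance would be assessed on held-out data or by cross-validation.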
sda.ranking returns a matrix with the following columns:
idx |
original feature number |
score |
sum of the squared cat scores - this determines the overall ranking |
cat |
for each group and feature the cat score of the centroid versus the pooled mean |
If fdr=TRUE then additionally local false discovery rate (FDR) values
as well as higher criticism (HC) scores are computed for each feature
(using fdrtool).
Miika Ahdesmäki and Korbinian Strimmer (http://strimmerlab.org).
Ahdesmäki, M., and K. Strimmer. 2009. Feature selection in "omics" prediction problems using cat scores and false non-discovery rate control. See http://arxiv.org/abs/0903.2003 for publication details.
Zuber, V., and K. Strimmer. 2009. Gene ranking and biomarker discovery under correlation. See http://arxiv.org/abs/0902.0751 for publication details.
# load sda library
library("sda")
#################
# training data #
#################
# prostate cancer set
data(singh2002)
# training data
Xtrain = singh2002$x
Ytrain = singh2002$y
#########################################
# feature ranking (diagonal covariance) #
#########################################
# ranking using t-scores (DDA)
ranking.DDA = sda.ranking(Xtrain, Ytrain, diagonal=TRUE)
ranking.DDA[1:10,]
# plot t-scores for the top 40 genes
plot(ranking.DDA, top=40)
# number of features with local FDR < 0.8
# (i.e. features useful for prediction)
sum(ranking.DDA[,"lfdr"] < 0.8)
# number of features with local FDR < 0.2
# (i.e. significant non-null features)
sum(ranking.DDA[,"lfdr"] < 0.2)
# optimal feature set according to HC score
plot(ranking.DDA[,"HC"], type="l")
which.max( ranking.DDA[1:1000,"HC"] )
#####################################
# feature ranking (full covariance) #
#####################################
# ranking using cat-scores (LDA)
ranking.LDA = sda.ranking(Xtrain, Ytrain, diagonal=FALSE)
ranking.LDA[1:10,]
# plot cat scores for the top 40 genes
plot(ranking.LDA, top=40)
# number of features with local FDR < 0.8
# (i.e. features useful for prediction)
sum(ranking.LDA[,"lfdr"] < 0.8)
# number of features with local FDR < 0.2
# (i.e. significant non-null features)
sum(ranking.LDA[,"lfdr"] < 0.2)
# optimal feature set according to HC score
plot(ranking.LDA[,"HC"], type="l")
which.max( ranking.LDA[1:1000,"HC"] )