| adjOutlyingness {robustbase} | R Documentation |
For an n * p data matrix (or data frame) x,
compute the “outlyingness” of all n observations.
Outlyingness here is a generalization of the Donoho-Stahel
outlyingness measure, where skewness is taken into account via the
medcouple, mc().
adjOutlyingness(x, ndir = 250, clower = 3, cupper = 4,
alpha.cutoff = 0.75, coef = 1.5, qr.tol = 1e-12)
x |
a numeric matrix or data.frame. |
ndir |
positive integer specifying the number of directions that should be searched. |
clower, cupper |
the constants to be used for the lower and upper tails, respectively, in order to transform the data towards symmetry. |
alpha.cutoff |
number in (0,1) specifying the quantiles (α, 1-α) which determine the “outlier” cutoff. |
coef |
positive number specifying the factor with which the
interquartile range (IQR) is multiplied to determine
‘boxplot hinges’-like upper and lower bounds. |
qr.tol |
positive tolerance to be used for qr and
solve.qr when determining the ndir directions,
each of which is determined by a random sample of p (out of n)
observations. |
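As an illustration of the qr.tol argument, the following base R sketch draws a single direction from p randomly chosen observations. The helper name random_direction and its max.tries argument are hypothetical and not part of robustbase; the construction (a vector orthogonal to the hyperplane through the p sampled points, rescaled to unit length) is an assumption about how such directions can be obtained, not a copy of the package code.

```r
## Hypothetical helper, not part of robustbase: draw one projection
## direction from p randomly chosen rows of x.
random_direction <- function(x, qr.tol = 1e-12, max.tries = 100L) {
    x <- unname(data.matrix(x))
    n <- nrow(x)
    p <- ncol(x)
    stopifnot(p < n)
    for (k in seq_len(max.tries)) {
        P <- x[sample(n, p), , drop = FALSE]   # p random observations
        qrP <- qr(P, tol = qr.tol)
        if (qrP$rank == p) {                   # skip (near-)singular subsets
            ## b solves P %*% b = 1, hence b is orthogonal to the
            ## hyperplane through the p sampled points (assumed construction)
            b <- as.vector(solve(qrP, rep(1, p)))
            return(b / sqrt(sum(b^2)))         # rescale to unit length
        }
    }
    stop("no full-rank subset found in ", max.tries, " tries")
}

set.seed(1)
x <- matrix(rnorm(40), nrow = 10, ncol = 4)
a <- random_direction(x)
a
```

A smaller qr.tol accepts subsets that are closer to singular; a larger one rejects them and draws again.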
FIXME: Details in the comment of the Matlab code; also in the reference(s).
The method as described can be useful as preprocessing in FASTICA (http://www.cis.hut.fi/projects/ica/fastica/).
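For orientation, the classical (unadjusted) Donoho-Stahel outlyingness that this function generalizes can be sketched in base R as below. This is a simplified illustration and not the robustbase implementation: the function name sd_outlyingness is hypothetical, directions are drawn as random Gaussian vectors for brevity, and the symmetric median/MAD standardization is exactly the part that the adjusted version replaces with a skewness-aware one based on the medcouple.

```r
## Hypothetical illustration (not the robustbase implementation):
## classical Stahel-Donoho outlyingness over random unit directions.
sd_outlyingness <- function(x, ndir = 250) {
    x <- data.matrix(x)
    p <- ncol(x)
    ## random unit directions, one per column (Gaussian directions are
    ## an assumption made for brevity)
    A <- matrix(rnorm(p * ndir), nrow = p, ncol = ndir)
    A <- sweep(A, 2, sqrt(colSums(A^2)), "/")
    Y <- x %*% A                         # n x ndir matrix of projections
    med <- apply(Y, 2, median)
    scl <- apply(Y, 2, mad)              # symmetric scale, no skewness adjustment
    Z <- abs(sweep(Y, 2, med, "-"))      # distance to the median per direction
    Z <- sweep(Z, 2, scl, "/")           # standardize by the MAD per direction
    apply(Z, 1, max)                     # worst case over all directions
}

set.seed(1)
## 100 standard normal points plus one point far away from them
x <- rbind(matrix(rnorm(200), ncol = 2), c(10, 10))
out <- sd_outlyingness(x)
which.max(out)
```

In this toy data the appended point should receive by far the largest value, since it lies far from the bulk in almost every direction.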
a list with components
adjout |
numeric vector of length n giving the adjusted
outlyingness of each observation. |
cutoff |
cutoff for “outlier” with respect to the adjusted
outlyingnesses, and depending on alpha.cutoff. |
nonOut |
logical vector of length n, TRUE when the
corresponding observation is non-outlying with respect to the
cutoff and the adjusted outlyingnesses. |
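How cutoff and nonOut relate to adjout, alpha.cutoff and coef can be sketched with a plain boxplot-style rule in base R. The helper boxplot_cutoff is hypothetical and not part of robustbase; it assumes the cutoff is the upper quantile plus coef times the distance between the two quantiles, and that an observation is non-outlying when its value does not exceed the cutoff. The actual function may additionally correct for skewness of the outlyingness values, which this sketch ignores.

```r
## Hypothetical helper, not part of robustbase: a boxplot-style cutoff
## without any skewness correction.
boxplot_cutoff <- function(adjout, alpha.cutoff = 0.75, coef = 1.5) {
    q <- quantile(adjout, probs = c(1 - alpha.cutoff, alpha.cutoff),
                  names = FALSE)
    q[2] + coef * (q[2] - q[1])    # upper quantile plus coef times the spread
}

ao <- c(1:19, 100)                 # toy outlyingness values, one clearly large
cutoff <- boxplot_cutoff(ao)
nonOut <- ao <= cutoff             # assumed rule: at or below the cutoff
which(!nonOut)                     # indices flagged as outlying
```

Raising alpha.cutoff or coef moves the cutoff upwards, so fewer observations are flagged.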
Guy Brys; help page and improvements by Martin Maechler
Brys, G., Hubert, M., and Rousseeuw, P.J. (2005) A Robustification of Independent Component Analysis; Journal of Chemometrics, 19, 1–12.
For the up-to-date reference, please consult http://wis.kuleuven.be/stat/robust.html
the adjusted boxplot, adjbox, and the medcouple,
mc.
library(robustbase) ## provides adjOutlyingness(), covMcd() and the hbk data
## An Example with bad condition number and "border case" outliers
if(FALSE) {## Not yet ok, because of bug in adjOutl
dim(longley)
set.seed(1) ## result is random
ao1 <- adjOutlyingness(longley)
## which are not outlying ?
table(ao1$nonOut) ## all of them
stopifnot(all(ao1$nonOut))
}
## An Example with outliers :
dim(hbk)
set.seed(1)
ao.hbk <- adjOutlyingness(hbk)
str(ao.hbk)
hist(ao.hbk$adjout) ## really two groups
table(ao.hbk$nonOut) ## 14 outliers, 61 non-outliers:
## outliers are :
which(! ao.hbk$nonOut) # 1 .. 14 --- but not for all random seeds!
## here, they are the same as found by (much faster) MCD:
cc <- covMcd(hbk)
stopifnot(all(cc$mcd.wt == ao.hbk$nonOut))
## This is revealing (about 1--2 cases where outliers are *not* == 1:14),
## but needs almost 1 [sec] per call:
if(interactive()) {
for(i in 1:30) {
print(system.time(ao.hbk <- adjOutlyingness(hbk)))
if(!identical(iout <- which(!ao.hbk$nonOut), 1:14)) {
cat("Outliers:\n"); print(iout)
}
}
}