| gknn {scrime} | R Documentation |
Predicts the classes of new observations with k Nearest Neighbors based on a user-specified distance measure.
gknn(data, cl, newdata, nn = 5, distance = NULL, use.weights = FALSE, ...)
data |
a numeric matrix in which each row represents an observation and each column
a variable. If distance is "smc", "cohen" or "pcc",
the values in data must be integers between 1 and n.cat,
where n.cat is the maximum number of levels that any of the variables can
take. Missing values are allowed. |
cl |
a numeric vector of length nrow(data) giving the class labels of
the observations represented by the rows of data. cl must consist
of integers between 1 and n.cl, where n.cl is the
number of groups. |
newdata |
a numeric matrix in which each row represents a new observation for
which the class label should be predicted and each column contains the same
variable as the corresponding column of data. |
nn |
an integer specifying the number of nearest neighbors used to classify the new observations. |
distance |
character string naming the distance measure used to identify the
nn nearest neighbors. Must be one of "smc", "cohen",
"pcc", "euclidean", "maximum", "manhattan",
"canberra", or "minkowski". If NULL, the measure is chosen in
an ad hoc way: if the data seem to be categorical, distance
is set to "smc"; otherwise, it is set to "euclidean". |
use.weights |
logical. Should the votes of the nearest neighbors be weighted by the reciprocal of their distances to the new observation when its class is predicted? |
... |
further arguments for the distance measure. For example, if
distance = "minkowski", then p can also be specified (see dist).
If distance = "pcc", then version can also be specified
(see pcc). |
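As a rough illustration of the integer coding that "smc" expects, the following base R sketch computes a simple matching distance between two observations coded as integers between 1 and n.cat. The helper smc_dist_sketch is hypothetical and not part of scrime. It assumes that the distance is the proportion of jointly observed variables on which the two observations differ, which may differ in detail from what scrime computes.

```r
# Hypothetical helper, NOT scrime's own implementation: simple matching
# distance between two observations coded as integers 1, ..., n.cat.
smc_dist_sketch <- function(x, y) {
  # Assumption: variables missing in either observation are ignored.
  ok <- !is.na(x) & !is.na(y)
  # Proportion of the remaining variables on which x and y disagree.
  mean(x[ok] != y[ok])
}

a <- c(1, 2, 3, 1, NA)
b <- c(1, 3, 3, 2, 2)

# Four variables are jointly observed; two of them differ.
d.ab <- smc_dist_sketch(a, b)
d.ab
```

A distance of 0 means that the two observations agree on every jointly observed variable, and a distance of 1 means that they agree on none.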
The predicted classes of the new observations.
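To make the roles of nn, distance, use.weights and ... more concrete, here is a minimal base R sketch of a k Nearest Neighbors vote. The function knn_sketch is hypothetical: it is not the code used by gknn, it relies on stats::dist and therefore only covers the measures that dist supports, and its tie handling (the lowest class label wins) is an assumption.

```r
# Hypothetical sketch, NOT the gknn implementation: kNN prediction with a
# distance from stats::dist and optional reciprocal-distance weights.
knn_sketch <- function(data, cl, newdata, nn = 5, distance = "euclidean",
                       use.weights = FALSE, ...) {
  n.cl <- max(cl)
  n.train <- nrow(data)
  # Compute all pairwise distances, then keep only the block holding the
  # distances from the new observations (rows) to the training data (columns).
  d <- as.matrix(dist(rbind(data, newdata), method = distance, ...))
  d <- d[-seq_len(n.train), seq_len(n.train), drop = FALSE]
  pred <- apply(d, 1, function(d.row) {
    idx <- order(d.row)[seq_len(nn)]            # the nn nearest neighbors
    # A zero distance yields an infinite weight, so an exact match dominates.
    w <- if (use.weights) 1 / d.row[idx] else rep(1, nn)
    # Sum the (possibly weighted) votes for each class 1, ..., n.cl.
    votes <- vapply(seq_len(n.cl), function(k) sum(w[cl[idx] == k]), numeric(1))
    which.max(votes)                            # ties: lowest class label wins
  })
  unname(pred)
}

train <- rbind(c(0, 0), c(0, 1), c(5, 5), c(6, 5))
cl <- c(1, 1, 2, 2)
new <- rbind(c(0.2, 0.4), c(5.5, 5.1))

# p is passed through ... to dist, as described for distance = "minkowski".
pred <- knn_sketch(train, cl, new, nn = 3, distance = "minkowski",
                   p = 3, use.weights = TRUE)
pred
```

With use.weights = TRUE, a close neighbor contributes more to the vote than a distant one, which matters most when nn is large relative to the class sizes.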
Holger Schwender, holger.schwender@udo.edu
Schwender, H. (2007). Statistical Analysis of Genotype and Gene Expression Data. Dissertation, Department of Statistics, University of Dortmund.
## Not run:
# Using the example from the function knn.
library(class)
library(scrime)  # provides gknn
data(iris3)
train <- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])
test <- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])
cl <- c(rep(2, 25), rep(1, 25), rep(1, 25))

knn.out <- knn(train, test, as.factor(cl), k = 3, use.all = FALSE)
gknn.out <- gknn(train, cl, test, nn = 3)

# Both applications lead to the same predictions.
knn.out == gknn.out

# But gknn also allows distance measures other than the Euclidean
# distance, e.g., the Manhattan distance.
gknn(train, cl, test, nn = 3, distance = "manhattan")
## End(Not run)
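Since cl must consist of integers between 1 and n.cl, class labels stored as a factor or as character strings need to be recoded before they are passed to gknn. The following base R sketch shows one way to do this. The object names species, n.cl and label.map are illustrative only and do not come from scrime.

```r
# Illustrative labels; in practice these would come from the data set at hand.
species <- factor(c("setosa", "versicolor", "virginica", "setosa"))

# as.integer on a factor returns the level codes 1, ..., nlevels.
cl <- as.integer(species)
n.cl <- nlevels(species)

# Keep the mapping so that predicted integers can be translated back.
label.map <- levels(species)
label.map[cl]
```

Predicted classes returned as integers can then be mapped back to the original labels by indexing label.map.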