knn                   package:EMV                   R Documentation

_E_s_t_i_m_a_t_e _t_h_e _m_i_s_s_i_n_g _v_a_l_u_e_s _o_f _a _m_a_t_r_i_x

_D_e_s_c_r_i_p_t_i_o_n:

     'knn' estimates the missing values of a matrix using a k-nearest
     neighbors algorithm. Missing values can be any of -Inf, Inf, NA or
     NaN.
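
     The four kinds of missing value can be told apart with base R
     predicates. The helper below is a hypothetical sketch, not part of
     EMV; it only mirrors the na.rm, nan.rm and inf.rm switches to show
     which entries each one would flag.

```r
## Hypothetical helper (not EMV code): flag the entries to be estimated.
missing_mask <- function(m, na.rm = TRUE, nan.rm = TRUE, inf.rm = TRUE) {
  mask <- matrix(FALSE, nrow(m), ncol(m))
  ## is.na() is also TRUE for NaN, so exclude NaN to isolate plain NA
  if (na.rm)  mask <- mask | (is.na(m) & !is.nan(m))
  if (nan.rm) mask <- mask | is.nan(m)
  ## is.infinite() is TRUE for both Inf and -Inf
  if (inf.rm) mask <- mask | is.infinite(m)
  mask
}

x <- matrix(c(1, NA, NaN, Inf, -Inf, 2), nrow = 2)
missing_mask(x)                   # flags NA, NaN, Inf and -Inf
missing_mask(x, inf.rm = FALSE)   # flags NA and NaN only
```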

_U_s_a_g_e:

     knn(m, k = max(dim(m)[1]*0.01, 2), na.rm = TRUE, nan.rm = TRUE,
         inf.rm = TRUE, correlation = FALSE, dist.bound = FALSE)
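
     The default for k is 1% of the number of rows, with a minimum of
     2. A small worked example, where default_k is a hypothetical name
     used only for illustration:

```r
## Default k as written in the usage line: 1% of the rows, but at least 2
default_k <- function(m) max(dim(m)[1] * 0.01, 2)

default_k(matrix(0, nrow = 100, ncol = 10))    # 1% of 100 is 1, so the floor of 2 applies
default_k(matrix(0, nrow = 1000, ncol = 10))   # 1% of 1000 is 10
default_k(matrix(0, nrow = 250, ncol = 10))    # 1% of 250 is 2.5, not a whole number
```

     This page does not say how a fractional default is handled, so
     passing an explicit integer k may be the safer choice.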

_A_r_g_u_m_e_n_t_s:

       m: a numeric matrix that contains the missing values to be
          estimated

       k: the number of neighbors (rows) used to estimate the missing
          values

   na.rm: a logical value indicating whether `NA' values should be
          estimated.

  nan.rm: a logical value indicating whether `NaN' values should be
          estimated.

  inf.rm: a logical value indicating whether `Inf' and `-Inf' values
          should be estimated.

correlation: a logical value; if TRUE, the selection of the neighbors
          is based on the sample correlation. The neighbors with the
          highest correlations are selected.

dist.bound: a bound on the distance. If correlation is FALSE, the
          algorithm will only use a neighbor if its Euclidean distance
          is less than dist.bound. If correlation is TRUE, the
          algorithm will only use a neighbor if its sample correlation
          is greater than dist.bound (in this case, dist.bound should
          be between -1 and 1). If dist.bound=FALSE, no bound is
          applied and all the neighbors are used.
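
     How correlation and dist.bound interact can be sketched in base R.
     The function select_neighbors and its arguments target, candidates
     and obs are hypothetical names, not EMV code, and the package may
     resolve ties or apply the bound differently.

```r
## Hypothetical helper (not EMV code): pick up to k neighbor rows for
## 'target' among the complete rows of 'candidates', comparing only the
## columns flagged in 'obs' (where 'target' is observed).
select_neighbors <- function(target, candidates, obs, k,
                             correlation = FALSE, dist.bound = FALSE) {
  cand <- candidates[, obs, drop = FALSE]
  unbounded <- identical(dist.bound, FALSE)
  if (correlation) {
    score <- apply(cand, 1, function(r) cor(r, target[obs]))
    ok <- if (unbounded) rep(TRUE, length(score)) else score > dist.bound
    ord <- order(score, decreasing = TRUE)   # highest correlation first
  } else {
    score <- sqrt(rowSums(sweep(cand, 2, target[obs])^2))
    ok <- if (unbounded) rep(TRUE, length(score)) else score < dist.bound
    ord <- order(score)                      # smallest distance first
  }
  ord <- ord[ok[ord]]                        # drop rows outside the bound
  head(ord, k)
}

cand <- rbind(c(1, 2, 3, 4),
              c(2, 4, 7, 9),
              c(4, 3, 2, 1),
              c(1.1, 2.1, 2.9, 4))
target <- c(1, 2, 3, NA)
obs <- !is.na(target)
select_neighbors(target, cand, obs, k = 3)                      # by distance
select_neighbors(target, cand, obs, k = 3, correlation = TRUE)  # by correlation
select_neighbors(target, cand, obs, k = 3, dist.bound = 1)      # bound may leave fewer than k
```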

_D_e_t_a_i_l_s:

     For each row containing at least one missing value, the algorithm
     selects the k nearest rows that do not contain any missing
     values, using either the Euclidean distance or the sample
     correlation. The missing values are then replaced by the average
     of the neighbors. Note that if a row contains only missing values,
     the estimation is not possible.
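
     The procedure can be sketched in base R as follows. This is a
     minimal illustration of the Euclidean case only, under the
     assumption that distances are computed on the columns observed in
     the incomplete row; knn_impute_sketch is a hypothetical name and
     the EMV implementation may differ in its details.

```r
## Hypothetical re-implementation for illustration only (not EMV code).
knn_impute_sketch <- function(m, k = 2) {
  bad <- !is.finite(m)                     # NA, NaN, Inf and -Inf
  complete <- which(rowSums(bad) == 0)     # rows usable as neighbors
  out <- m
  for (i in which(rowSums(bad) > 0)) {
    obs <- !bad[i, ]
    ## a fully missing row, or no complete rows, cannot be estimated
    if (!any(obs) || length(complete) == 0) next
    diffs <- sweep(m[complete, obs, drop = FALSE], 2, m[i, obs])
    d <- sqrt(rowSums(diffs^2))            # Euclidean distance to each candidate
    nn <- complete[head(order(d), k)]      # the k nearest complete rows
    out[i, !obs] <- colMeans(m[nn, !obs, drop = FALSE])
  }
  out
}

set.seed(1)
m <- matrix(rnorm(60), 12, 5)
m[1, 2] <- NA
m[3, 4] <- Inf
m[5, ] <- NaN                 # fully missing row: left unchanged
res <- knn_impute_sketch(m, k = 3)
```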

_V_a_l_u_e:

    data: The data matrix, with the missing values replaced by their
          estimates (when possible).

distance: The average of the neighbors' distances used for the
          estimation.

_A_u_t_h_o_r(_s):

     Raphael Gottardo raph@lanl.gov

_R_e_f_e_r_e_n_c_e_s:

     O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R.
     Tibshirani, D. Botstein and R. B. Altman (2001). Missing value
     estimation methods for DNA microarrays. Bioinformatics
     17(6):520-525.

_S_e_e _A_l_s_o:

     'NA', 'NaN', 'Inf'

_E_x_a_m_p_l_e_s:

     library(EMV)
     m <- matrix(rnorm(1000), 100, 10)
     ## Place some missing values (NA)
     m[1:10, 1] <- NA
     m[50:52, 10] <- NA
     ## Place some infinite values (Inf, -Inf)
     m[1:10, 3] <- Inf
     m[70:73, 10] <- -Inf
     ## Estimate the missing and infinite values based on the Euclidean distance
     m1 <- knn(m, k = 10)
     ## Estimate the missing and infinite values based on the sample correlation
     m2 <- knn(m, k = 10, correlation = TRUE)

