NNclean               package:prabclus               R Documentation

_N_e_a_r_e_s_t _n_e_i_g_h_b_o_r _b_a_s_e_d _c_l_u_t_t_e_r/_n_o_i_s_e _d_e_t_e_c_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Detects if data points are noise or part of a cluster, based on a
     Poisson process model.

_U_s_a_g_e:

     NNclean(data, k, distances = NULL, edge.correct = FALSE, wrap = 0.1,
     convergence = 0.001, plot=FALSE)

     ## S3 method for class 'nnclean':
     print(x, ...)

_A_r_g_u_m_e_n_t_s:

    data: numerical matrix or data frame.

       k: integer. Number of nearest neighbors considered per point.

distances: distance matrix object of class 'dist'. If specified, it is
          used instead of computing distances from the data.

edge.correct: logical. If 'TRUE' and the data is two-dimensional,
          neighbors for points at the edges of the parent region of the
          noise Poisson process are determined after wrapping the
          region onto a toroid.

    wrap: numerical. If 'edge.correct=TRUE', points in a strip of width
          'wrap*range' along each edge, for each variable, are candidates
          for being neighbors of points at the opposite edge.

convergence: numerical. Convergence criterion for the EM algorithm.

    plot: logical. If 'TRUE', a histogram of the distances to the kth
          nearest neighbor is plotted together with the fitted mixture
          density.

       x: object of class 'nnclean'.

     ...: necessary for print methods.
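
     The idea behind 'edge.correct' can be illustrated with a distance
     on a torus. The following is a minimal sketch in base R, not the
     code used by 'NNclean': the helper 'torus_dist' and its 'width'
     argument are hypothetical, the region is assumed to be the
     rectangle [0, width[1]] x [0, width[2]], and the whole region is
     wrapped rather than only a strip of size 'wrap*range'.

```r
# Hypothetical helper, not part of prabclus: Euclidean distance after
# wrapping a rectangular region onto a torus.
# x:     numeric matrix, one row per point, coordinates in [0, width[j]]
# width: side length of the region for each column of x
torus_dist <- function(x, width) {
  x <- as.matrix(x)
  d2 <- matrix(0, nrow(x), nrow(x))
  for (j in seq_len(ncol(x))) {
    delta <- abs(outer(x[, j], x[, j], "-"))  # ordinary coordinate gap
    delta <- pmin(delta, width[j] - delta)    # shorter way around the torus
    d2 <- d2 + delta^2
  }
  as.dist(sqrt(d2))
}

# Two points close to opposite vertical edges of the unit square,
# plus one point in the middle.
pts <- rbind(c(0.02, 0.5), c(0.98, 0.5), c(0.50, 0.5))
d_plain <- dist(pts)
d_torus <- torus_dist(pts, width = c(1, 1))
```

     On the torus the first two points become close neighbors, whereas
     their ordinary Euclidean distance spans almost the whole region.
     An object of class 'dist' such as 'd_torus' has the form expected
     by the 'distances' argument.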

_D_e_t_a_i_l_s:

     The assumption is that the noise is distributed as a homogeneous
     Poisson process on a certain region and the clusters are
     distributed as a homogeneous Poisson process with larger intensity
     on a subregion (disconnected in case of more than one cluster).
     The distances to the kth nearest neighbor are then distributed
     according to a mixture of two transformed Gamma distributions, and
     this mixture is estimated via the EM algorithm. The points are
     assigned to the noise or cluster component by use of the estimated
     a posteriori probabilities.
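
     The mixture can be written out as follows. This is a sketch in the
     notation of the reference below, ignoring edge effects; the
     symbols D_k (distance to the kth nearest neighbor), d (dimension
     of the data), alpha_d (volume of the unit ball) and delta_i are
     introduced here for illustration, while p, lambda1 and lambda2
     correspond to the components listed under Value.

```latex
% In a homogeneous Poisson process with intensity \lambda in R^d, the number
% of points in a ball of radius x is Poisson with mean \lambda \alpha_d x^d:
\[
  P(D_k > x) = \sum_{m=0}^{k-1}
      \frac{e^{-\lambda \alpha_d x^d} (\lambda \alpha_d x^d)^m}{m!},
  \qquad
  \alpha_d = \frac{\pi^{d/2}}{\Gamma(d/2 + 1)} .
\]
% Hence \alpha_d D_k^d is Gamma(k, \lambda) distributed, and D_k has density
\[
  f(x; \lambda) =
      \frac{d \, (\lambda \alpha_d)^k \, x^{dk-1} \,
            e^{-\lambda \alpha_d x^d}}{\Gamma(k)}, \qquad x > 0 .
\]
% Mixture of cluster (\lambda_1) and noise (\lambda_2) components:
\[
  f(x) = p \, f(x; \lambda_1) + (1 - p) \, f(x; \lambda_2),
  \qquad \lambda_1 > \lambda_2 .
\]
% A posteriori probability that point i, with observed distance d_i,
% belongs to the cluster component:
\[
  \delta_i = \frac{p \, f(d_i; \lambda_1)}
                  {p \, f(d_i; \lambda_1) + (1 - p) \, f(d_i; \lambda_2)} .
\]
```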

_V_a_l_u_e:

     'NNclean' returns a list of class 'nnclean' with components 

       z: 0-1-vector whose length is the number of data points. 1 means
          cluster, 0 means noise.

   probs: vector of estimated a posteriori probabilities for each point
          to belong to the cluster component.

       k: see above.

 lambda1: intensity parameter of cluster component.

 lambda2: intensity parameter of noise component.

       p: estimated mixture proportion of the cluster component.

  kthNND: distance to the kth nearest neighbor for each point.

_N_o_t_e:

     The software can be freely used for non-commercial purposes, and
     can be freely distributed for non-commercial purposes only.

_A_u_t_h_o_r(_s):

     R-port by Christian Hennig hennig@math.uni-hamburg.de <URL:
     http://www.math.uni-hamburg.de/home/hennig/>,
     original Splus package by S. Byers and A. E. Raftery.

_R_e_f_e_r_e_n_c_e_s:

     Byers, S. and Raftery, A. E. (1998) Nearest-Neighbor Clutter
     Removal for Estimating Features in Spatial Point Processes,
     _Journal of the American Statistical Association_, 93, 577-584.

_E_x_a_m_p_l_e_s:

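     library(prabclus)
     library(mclust)   # provides the 'chevron' data set
     data(chevron)
     # Columns 2 and 3 hold the point coordinates; use k = 15 neighbors.
     nnc <- NNclean(chevron[,2:3], k = 15, plot = TRUE)
     # Noise points (z = 0) are drawn in black, cluster points (z = 1) in red.
     plot(chevron[,2:3], col = 1 + nnc$z)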
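
     The mixture fit described under Details can also be sketched in
     base R on simulated data. This is an illustration under stated
     assumptions, not the implementation in 'NNclean': the function
     'nn_mixture_em', its arguments, the median-based starting values
     and the relative-change stopping rule are all hypothetical, and no
     edge correction is applied.

```r
# Hypothetical helper, not the prabclus implementation.
# d:   distances to the kth nearest neighbor, one per point
# k:   number of nearest neighbors
# dim: dimension of the data
nn_mixture_em <- function(d, k, dim = 2, tol = 1e-3, max_iter = 500) {
  alpha <- pi^(dim / 2) / gamma(dim / 2 + 1)  # volume of the unit ball
  v <- alpha * d^dim                          # Gamma(k, lambda) under the model
  # Weighted maximum likelihood estimate of a Gamma rate with known shape k.
  est <- function(w) k * sum(w) / sum(w * v)
  # Assumed start: points with small distances are treated as cluster points.
  delta <- as.numeric(d <= median(d))
  p <- mean(delta)
  lambda1 <- est(delta)
  lambda2 <- est(1 - delta)
  for (iter in seq_len(max_iter)) {
    # E-step: probability of the cluster component for each point.
    f1 <- p * dgamma(v, shape = k, rate = lambda1)
    f2 <- (1 - p) * dgamma(v, shape = k, rate = lambda2)
    delta <- f1 / (f1 + f2)
    # M-step: update mixture proportion and both intensities.
    old <- c(p, lambda1, lambda2)
    new <- c(mean(delta), est(delta), est(1 - delta))
    p <- new[1]; lambda1 <- new[2]; lambda2 <- new[3]
    # Stop when the largest relative parameter change is below 'tol'.
    if (max(abs(new - old) / old) < tol) break
  }
  list(z = as.integer(delta >= 0.5), probs = delta, k = k,
       lambda1 = lambda1, lambda2 = lambda2, p = p, kthNND = d)
}

# Simulated data: uniform noise on the unit square plus a dense square cluster.
set.seed(1)
noise <- cbind(runif(300), runif(300))
clus  <- cbind(runif(200, 0.4, 0.6), runif(200, 0.4, 0.6))
x <- rbind(noise, clus)
k <- 10
dm  <- as.matrix(dist(x))
# Position 1 of each sorted row is the point itself (distance 0).
kth <- apply(dm, 1, function(r) sort(r)[k + 1])
fit <- nn_mixture_em(kth, k)
plot(x, col = 1 + fit$z)
```

     The returned list mirrors the component names listed under Value,
     so it can be inspected in the same way as the result of 'NNclean',
     for example with 'table(fit$z)'.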

