binning                  package:sm                  R Documentation

_C_o_n_s_t_r_u_c_t _f_r_e_q_u_e_n_c_y _t_a_b_l_e _f_r_o_m _r_a_w _d_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     Given a vector or a matrix 'x', this function constructs a
     frequency table over a set of intervals covering the range of
     'x'.

_U_s_a_g_e:

     binning(x, y, breaks, nbins)

_A_r_g_u_m_e_n_t_s:

    x, y: a vector or a matrix with either one or two columns.  If 'x'
          is a one-dimensional matrix, this is equivalent to a vector.

  breaks: either a vector or a matrix with two columns (depending on
          the dimension of 'x'), giving the division points of the
          axis, or of the axes in the matrix case. It must not include
          'Inf', '-Inf' or 'NA' values, and it must span the whole
          range of the 'x' points. If 'breaks' is not given, it is
          computed by dividing the range of 'x' into 'nbins' intervals
          for each of the axes.

   nbins: the number of intervals on the 'x' axis (in the vector case),
          or a vector of two elements with the number of intervals on
          each axis of 'x' (in the matrix case). If 'nbins' is not
          given, a value is computed as
          'round(log(length(x))/log(2)+1)' or using a similar
          expression in the matrix case.
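
     As an illustration of the defaults described for 'nbins' and
     'breaks' in the vector case, the sketch below evaluates the
     quoted formula and builds equally spaced division points using
     base R only. It is not the package's internal code: the name
     'default_nbins' is hypothetical, and the use of 'seq' is an
     assumption about how the range is divided.

```r
# Hypothetical helper: evaluates the default formula quoted above.
default_nbins <- function(x) round(log(length(x)) / log(2) + 1)

set.seed(1)
x <- rnorm(1000)

# log2(1000) is about 9.97, so the formula gives round(10.97) = 11.
nbins <- default_nbins(x)

# Assumption: equal-width division points spanning the range of x.
# nbins intervals need nbins + 1 division points.
breaks <- seq(min(x), max(x), length.out = nbins + 1)
```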

_D_e_t_a_i_l_s:

     This function is called automatically (under the default settings)
     by some of the functions of the 'sm' library when the sample size
     is large, to allow handling of datasets of essentially unlimited
     size. Specifically, it is used by 'sm.density', 'sm.regression',
     'sm.ancova', 'sm.binomial' and 'sm.poisson'.

_V_a_l_u_e:

     In the vector case, a list is returned containing the following
     elements: a vector 'x' of the midpoints of the bins, excluding
     those with zero frequency; the associated frequencies 'x.freq';
     the coordinates of the 'midpoints'; the complete vector of
     observed frequencies 'freq.table' (including the zero
     frequencies); and the vector 'breaks' of division points. In the
     matrix case, the returned value is a list with the following
     elements: a two-dimensional matrix 'x' with the coordinates of
     the midpoints of the two-dimensional bins, excluding those with
     zero frequency; its associated matrix 'x.freq' of frequencies;
     the coordinates of the 'midpoints'; the matrix 'breaks' of
     division points; and the observed frequencies 'freq.table' in
     full tabular form.
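
     The vector case described above can be sketched with base R
     ('cut' and 'table'). This is a rough imitation under stated
     assumptions, not the code used by 'binning': the element names
     follow the description above, the object name 'sketch' is
     hypothetical, and right-closed intervals (the default of 'cut')
     are an assumption.

```r
set.seed(1)
x <- rnorm(1000)

# Explicit division points; they must span the whole range of x.
# With right-closed intervals the lower limit must lie below min(x).
breaks <- seq(-5, 5, by = 0.5)
stopifnot(min(x) > min(breaks), max(x) <= max(breaks))

# Count the observations in each interval, keeping empty intervals.
f    <- cut(x, breaks = breaks)
freq <- as.vector(table(f))

# Midpoint of every interval.
midpoints <- (breaks[-1] + breaks[-length(breaks)]) / 2

# Keep only the bins that contain at least one observation.
keep   <- freq > 0
sketch <- list(x          = midpoints[keep],
               x.freq     = freq[keep],
               midpoints  = midpoints,
               breaks     = breaks,
               freq.table = freq)
```

     With the 'sm' package loaded, the corresponding call would be
     'binning(x, breaks = breaks)'.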

_R_e_f_e_r_e_n_c_e_s:

     Bowman, A.W. and Azzalini, A. (1997).  _Applied Smoothing
     Techniques for Data Analysis: the Kernel Approach with S-Plus
     Illustrations._ Oxford University Press, Oxford.

_S_e_e _A_l_s_o:

     'sm', 'sm.density', 'sm.regression', 'sm.binomial', 'sm.poisson',
     'cut', 'table'

_E_x_a_m_p_l_e_s:

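     library(sm)
     # example of 1-d use
     x  <- rnorm(1000)
     # default 'breaks' and 'nbins'
     xb <- binning(x)
     # explicit 'breaks'; they must span range(x), so widen them if a
     # draw falls outside [-4, 4]
     xb <- binning(x, breaks = seq(-4, 4, by = 0.5))
     # example of 2-d use
     x  <- rnorm(1000)
     y  <- 2*x + 0.5*rnorm(1000)
     x  <- cbind(x, y)
     xb <- binning(x, nbins = 12)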

