spe                   package:spe                   R Documentation

_I_m_p_l_e_m_e_n_t_s _t_h_e _s_t_o_c_h_a_s_t_i_c _p_r_o_x_i_m_i_t_y _e_m_b_e_d_d_i_n_g _a_l_g_o_r_i_t_h_m

_D_e_s_c_r_i_p_t_i_o_n:

     Embeds an N-dimensional dataset in M dimensions such that
     distances (or similarities) in the original N dimensions are
     preserved as closely as possible in the final M dimensions.

_U_s_a_g_e:

     spe( coord, rcutpercent = 1, maxdist = 0,
          nobs = 0, ndim = 0, edim,
          lambda0 = 2.0, lambda1 = 0.01,
          nstep = 1e6, ncycle = 100, 
          evalstress=FALSE, sampledist=TRUE, samplesize = 1e6)

_A_r_g_u_m_e_n_t_s:

   coord: This should be a matrix with number of rows equal to the
          number of observations and number of columns equal to the
          input dimension. A data.frame may also be supplied and it
          will be converted to a matrix (so all names will be lost) 

rcutpercent: This is the fraction of the maximum distance (as
          determined by probability sampling) that will be used as the
          neighborhood radius. Setting rcutpercent to a value greater
          than 1 effectively sets the radius to infinity. 

 maxdist: If you have already calculated a maximum distance, you can
          supply it here and probability sampling will not be carried
          out. The default (maxdist = 0) is to carry out sampling.
          Setting maxdist to a nonzero value disables sampling, even
          if sampledist=TRUE. 

    nobs: The number of observations. If not specified, it will be
          taken as nrow(coord) 

    ndim: The number of input dimensions. If not specified, it will be
          taken as ncol(coord) 

    edim: The number of dimensions to embed in 

 lambda0: The starting value of the learning parameter 

 lambda1: The ending value of the learning parameter 

   nstep: The number of refinement steps 

  ncycle: The number of cycles to carry out refinement for 

evalstress: If TRUE the function will evaluate the Sammon stress on the
          final embedding 

sampledist: If TRUE an approximation to the maximum distance in the
          input dimensions will be obtained via probability sampling 

samplesize: The number of iterations for probability sampling. For a
          dataset of 6070 observations there are 6070 x 6069 / 2
          (about 1.8e7) pairwise distances. The default value gives a
          close approximation and runs quickly. If you want a better
          approximation, 1e7 is a good value. YMMV 
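
     The idea behind sampledist, samplesize and maxdist can be
     illustrated with a minimal base R sketch. The helper
     approx_max_dist below is hypothetical and is not part of this
     package; it assumes Euclidean distances and simply draws random
     pairs of rows, so it may differ from what the package does
     internally.

```r
## Hypothetical helper (not part of the spe package): estimate the largest
## pairwise distance by drawing random pairs of rows.
approx_max_dist <- function(coord, samplesize = 1e4) {
  coord <- as.matrix(coord)
  n <- nrow(coord)
  i <- sample.int(n, samplesize, replace = TRUE)
  j <- sample.int(n, samplesize, replace = TRUE)
  ## Euclidean distance for every sampled pair (pairs with i == j give 0)
  d <- sqrt(rowSums((coord[i, , drop = FALSE] - coord[j, , drop = FALSE])^2))
  max(d)
}

## Number of distinct pairs for 6070 observations
npairs <- choose(6070, 2)   # 18419415

## Compare the sampled estimate with the exact maximum on toy data
set.seed(1)
x <- matrix(rnorm(200 * 5), nrow = 200)
est   <- approx_max_dist(x, samplesize = 5000)
exact <- max(dist(x))
c(estimate = est, exact = exact)  # the estimate never exceeds the exact value
```

     A sampled maximum can only underestimate the true maximum, which
     is why a larger samplesize gives a better approximation.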

_D_e_t_a_i_l_s:

     Efficient determination of rcut (using the connected component
     method) is yet to be implemented. As a result you will have to
     determine a value of rcutpercent by trial and error. The pivot
     SPE method (_J. Mol. Graph. Model._, 2003, *22*, 133-140) is not
     yet implemented.
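
     To show how lambda0, lambda1, nstep, ncycle and the neighborhood
     radius interact, here is a minimal base R sketch of a stochastic
     proximity embedding loop in the spirit of the papers listed under
     References. It is an illustration under stated assumptions, not
     the code used by this package: the function spe_sketch, its rcut
     and eps arguments, the uniform random start and the linear decay
     of the learning parameter are all assumptions made for this
     example.

```r
## Hypothetical illustration only; spe_sketch is not part of the package.
spe_sketch <- function(coord, edim, rcut = Inf,
                       lambda0 = 2.0, lambda1 = 0.01,
                       nstep = 1000, ncycle = 100, eps = 1e-10) {
  coord <- as.matrix(coord)
  n <- nrow(coord)
  x <- matrix(runif(n * edim), nrow = n)     # random starting embedding
  ## learning parameter decreases from lambda0 to lambda1 (linear, assumed)
  lambdas <- seq(lambda0, lambda1, length.out = ncycle)
  for (lambda in lambdas) {
    for (s in seq_len(nstep)) {
      ij <- sample.int(n, 2)                 # two distinct random points
      i <- ij[1]
      j <- ij[2]
      r <- sqrt(sum((coord[i, ] - coord[j, ])^2))  # input-space distance
      delta <- x[i, ] - x[j, ]
      d <- sqrt(sum(delta^2))                      # embedded distance
      ## Pairs within the radius are always adjusted; more distant pairs
      ## are adjusted only if they sit closer in the embedding than in
      ## the input space.
      if (r <= rcut || d < r) {
        step <- lambda * 0.5 * (r - d) / (d + eps) * delta
        x[i, ] <- x[i, ] + step
        x[j, ] <- x[j, ] - step
      }
    }
  }
  x
}

## Small usage example on random 3D data embedded in 2D
set.seed(42)
toy <- matrix(rnorm(30 * 3), nrow = 30)
emb <- spe_sketch(toy, edim = 2, nstep = 500, ncycle = 50)
dim(emb)   # 30 rows, 2 columns
```

     Each step touches only one pair of points, so the cost per step
     does not depend on the number of observations.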

_V_a_l_u_e:

     If evalstress is TRUE, the return value is a list with two
     components named x and stress: x is the matrix of the final
     embedding and stress is the final stress.
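
     For orientation, the textbook Sammon stress can be computed
     exactly over all pairs with a few lines of base R. The helper
     sammon_stress below is hypothetical; the value reported by this
     package may be computed differently (for example, on a sample of
     pairs), so treat this only as a sketch of the quantity.

```r
## Hypothetical helper: exact Sammon stress over all pairs of points,
##   sum((r - d)^2 / r) / sum(r)
## where r are input-space distances and d are embedded distances.
sammon_stress <- function(embedded, original) {
  r <- as.vector(dist(as.matrix(original)))   # input-space distances
  d <- as.vector(dist(as.matrix(embedded)))   # embedded distances
  keep <- r > 0                  # skip coincident points to avoid 0/0
  sum((r[keep] - d[keep])^2 / r[keep]) / sum(r[keep])
}

set.seed(1)
x <- matrix(rnorm(50 * 3), nrow = 50)
s_same <- sammon_stress(x, x)          # identical configurations give 0
s_drop <- sammon_stress(x[, 1:2], x)   # dropping a coordinate gives > 0
```

     Computing all pairwise distances needs memory proportional to
     the square of the number of observations, so this exact form is
     only practical for small datasets.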

_A_u_t_h_o_r(_s):

     Rajarshi Guha <rajarshi@presidency.com>

_R_e_f_e_r_e_n_c_e_s:

     A Self Organizing Principle for Learning Nonlinear Manifolds,
     _Proc. Nat. Acad. Sci._, 2002, *99*, 15869-15872

     Stochastic Proximity Embedding, _J. Comput. Chem._, 2003, *24*,
     1215-1221

     A Modified Rule for Stochastic Proximity Embedding, _J. Mol.
     Graph. Model._, 2003, *22*, 133-140

     A Geodesic Framework for Analyzing Molecular Similarities, _J.
     Chem. Inf. Comput. Sci._, 2003, *43*, 475-484

_S_e_e _A_l_s_o:

     'eval.stress', 'sample.max.distance'

_E_x_a_m_p_l_e_s:

     ## load the phone dataset
     data(phone)

     ## run SPE, embed$stress should be 0 or very close to it
     ## You can plot the embedding using the scatterplot3d package
     ## (This will take a few minutes to run)
     embed <- spe(phone, edim=3, evalstress=TRUE)

     ## evaluate the Sammon stress
     stress <- eval.stress(embed$x, phone)

     ## embed the Swiss Roll dataset in 2D
     data(swissroll)
     embed <- spe(swissroll, edim=2, evalstress=TRUE)

