dimcalc                 package:lsa                 R Documentation

_D_i_m_e_n_s_i_o_n_a_l_i_t_y _C_a_l_c_u_l_a_t_i_o_n _R_o_u_t_i_n_e_s (_L_S_A)

_D_e_s_c_r_i_p_t_i_o_n:

     Methods for choosing a `good' number of singular values for the
     dimensionality reduction in LSA.

_U_s_a_g_e:

        dimcalc_share(share=0.5)
        dimcalc_ndocs(ndocs)
        dimcalc_kaiser()
        dimcalc_raw()

_A_r_g_u_m_e_n_t_s:

   share: Optional: the target ratio of the sum of the selected singular
          values to the sum of all singular values (default: 0.5). Only
          used by 'dimcalc_share()'.

   ndocs: Optional: the number of documents (only needed for
          'dimcalc_ndocs()').

_D_e_t_a_i_l_s:

     In an LSA process, the diagonal matrix of the singular value
     decomposition is usually truncated to a specific number of
     dimensions (also called `factors' or `singular values').
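     As a rough illustration of what this truncation means, the base-R
     sketch below keeps the first k singular values and rebuilds the
     rank-k approximation of the original matrix. The helper
     'truncate_svd()' is hypothetical and not part of the lsa package;
     the dimcalc methods are ways of choosing the k it takes.

```r
# Hypothetical helper (not part of lsa): keep the first k singular values
# of an SVD and rebuild the rank-k approximation of the matrix.
truncate_svd <- function(m, k) {
  dec <- svd(m)                           # dec$d is sorted in decreasing order
  k <- min(k, length(dec$d))              # cannot keep more values than exist
  u <- dec$u[, seq_len(k), drop = FALSE]  # first k left singular vectors
  v <- dec$v[, seq_len(k), drop = FALSE]  # first k right singular vectors
  d <- diag(dec$d[seq_len(k)], nrow = k)  # k x k diagonal (nrow avoids the k = 1 pitfall)
  u %*% d %*% t(v)
}

m <- cbind(c(1, 1, 1, 0), c(0, 0, 1, 1), c(0, 1, 0, 1))
m2 <- truncate_svd(m, 2)                  # same shape as m, but only rank 2
```

     Passing 'nrow = k' to 'diag()' matters: with a single value,
     'diag(x)' would otherwise build an identity matrix of size x.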

     The functions 'dimcalc_share()', 'dimcalc_ndocs()',
     'dimcalc_kaiser()' and the redundant function 'dimcalc_raw()'
     offer methods to calculate a useful number of singular values,
     based on the distribution and values of the given sequence of
     singular values.

     All of them are tightly coupled to the core LSA functions: each
     generates a function to be executed by the calling (higher-level)
     function 'lsa()'. The generated function takes only one
     parameter, 's', which is expected to be the sequence of singular
     values. Inside 'lsa()', the returned function is executed and the
     mandatory singular values are passed to it as its argument.
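     This coupling is R's function-factory (closure) pattern. The
     base-R sketch below imitates it; 'share_rule()' and
     'reduce_dims()' are hypothetical stand-ins, not part of the lsa
     package, and the rule inside 'share_rule()' only mirrors the
     documented behaviour of 'dimcalc_share()'.

```r
# Hypothetical factory: fixes `share` now, receives the singular values later.
share_rule <- function(share = 0.5) {
  force(share)                         # evaluate the argument when the factory runs
  function(s) which(cumsum(s) / sum(s) >= share)[1]
}

# Hypothetical caller in the role of lsa(): it computes the singular values
# itself and hands them to whichever rule it was given.
reduce_dims <- function(m, dims = share_rule()) {
  s <- svd(m)$d
  dims(s)                              # the rule only ever sees `s`
}

m <- cbind(c(1, 1, 1, 0), c(0, 0, 1, 1), c(0, 1, 0, 1))
reduce_dims(m, dims = share_rule(0.9))
```

     The caller never needs to know which rule it received; it only
     relies on the rule accepting one argument.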

     The dimensionality calculation methods can, however, also be
     called directly by supplying the singular values in a second,
     separate argument list, e.g.

     'dimcalc_share(share=0.2)(mysingularvalues)'

     The method 'dimcalc_share()' finds the first position in the
     descending sequence of singular values 's' at which their
     cumulative sum, divided by the sum of all values, meets or
     exceeds the specified share.
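     The share rule maps directly onto 'cumsum()'. A minimal sketch of
     the rule as described above; the name 'share_dims()' is
     hypothetical, and the package's own code may differ, for example
     in edge cases.

```r
# Hypothetical re-implementation of the documented dimcalc_share() rule.
share_dims <- function(s, share = 0.5) {
  stopifnot(share > 0, share <= 1)
  ratio <- cumsum(s) / sum(s)   # running share of the total
  which(ratio >= share)[1]      # first position meeting the target
}

s <- c(4, 3, 2, 1)              # cumulative shares: 0.4, 0.7, 0.9, 1.0
share_dims(s, 0.5)              # returns 2
share_dims(s, 0.9)              # returns 3 (0.9 is met exactly)
```

     Because the cumulative share reaches 1 at the last position, any
     share in (0, 1] yields a position.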

     The method 'dimcalc_ndocs()' calculates the first position in the
     descending sequence of singular values at which their cumulative
     sum meets or exceeds the number of documents.
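     A minimal sketch of this rule, assuming a hypothetical helper
     'ndocs_dims()'. What the package returns when the cumulative sum
     never reaches the number of documents is not stated here; this
     sketch returns NA in that case, which is an assumption.

```r
# Hypothetical re-implementation of the documented dimcalc_ndocs() rule.
ndocs_dims <- function(s, ndocs) {
  hits <- which(cumsum(s) >= ndocs)  # positions where the running sum reaches ndocs
  # Assumption: report NA if the target is never reached.
  if (length(hits) == 0) NA_integer_ else hits[1]
}

s <- c(2.5, 1.2, 0.6, 0.3)           # running sums: 2.5, 3.7, 4.3, 4.6
ndocs_dims(s, 3)                     # returns 2
ndocs_dims(s, 10)                    # returns NA: the sum never reaches 10
```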

     The method 'dimcalc_kaiser()' calculates the number of singular
     values according to the Kaiser criterion, i.e. from the
     descending sequence of values, all values with 's[n] > 1' are
     kept. The number of dimensions is returned accordingly.
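     Since 's' is in descending order, counting the values greater
     than 1 gives the same number as the position of the last such
     value. A minimal sketch with a hypothetical helper
     'kaiser_dims()'; it returns 0 when no value exceeds 1, and whether
     the package behaves the same way is not stated here.

```r
# Hypothetical re-implementation of the documented dimcalc_kaiser() rule.
kaiser_dims <- function(s) {
  sum(s > 1)     # number of singular values strictly greater than 1
}

s <- c(3.1, 1.8, 1.0, 0.4)
kaiser_dims(s)   # returns 2: 1.0 is not strictly greater than 1
```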

     The method 'dimcalc_raw()' returns the maximum number of singular
     values (= the length of 's'). It is included only for
     completeness.

_V_a_l_u_e:

     Each method returns a function that takes the singular values as
     its only argument and returns the recommended number of
     dimensions. The argument of this returned function is:

       s: A sequence of singular values in descending order (as
          produced by the SVD). You only need to pass it yourself when
          calling a dimensionality calculation routine directly;
          otherwise 'lsa()' supplies it.

_A_u_t_h_o_r(_s):

     Fridolin Wild fridolin.wild@wu-wien.ac.at

_R_e_f_e_r_e_n_c_e_s:

     Wild, F., Stahl, C., Stermsek, G., Neumann, G., Penya, Y. (2005)
     _Parameters Driving Effectiveness of Automated Essay Scoring with
     LSA_. In: Proceedings of the 9th CAA, pp. 485-494, Loughborough.

_S_e_e _A_l_s_o:

     'lsa'

_E_x_a_m_p_l_e_s:

     ## create some data 
     vec1 = c( 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
     vec2 = c( 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0 )
     vec3 = c( 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0 )
     matrix = cbind(vec1, vec2, vec3)
     s = svd(matrix)$d

     # standard share of 0.5
     dimcalc_share()(s) 

     # specific share of 0.9
     dimcalc_share(share=0.9)(s) 

     # meeting the number of documents (here: 3)
     n = ncol(matrix)
     dimcalc_ndocs(n)(s)

