fold_in                 package:lsa                 R Documentation

_E_x-_p_o_s_t _f_o_l_d_i_n_g-_i_n _o_f _t_e_x_t_m_a_t_r_i_c_e_s _i_n_t_o _a_n _e_x_i_s_t_i_n_g _l_a_t_e_n_t _s_e_m_a_n_t_i_c _s_p_a_c_e

_D_e_s_c_r_i_p_t_i_o_n:

     Additional documents can be mapped into a pre-existing latent
     semantic space without influencing the factor distribution of the
     space. Use this when new documents must not alter the latent
     semantic factor structure that has already been calculated.

_U_s_a_g_e:

     fold_in( docvecs, LSAspace )

_A_r_g_u_m_e_n_t_s:

LSAspace: a latent semantic space generated by 'lsa()'.

 docvecs: a textmatrix holding the additional documents to be folded in.

_D_e_t_a_i_l_s:

     To keep additional documents from influencing the factor
     distribution calculated previously from a particular text basis,
     they can be folded in after the singular value decomposition has
     been performed by 'lsa()'.

     Background information: for folding-in, a pseudo document vector
     'mi' is calculated for each new document as shown in equations (1)
     and (2) (cf. Berry et al., 1995):

     (1) di = t(v) Tk Sk^(-1)

     (2) mi = Tk Sk t(di)

     The document vector v (transposed as t(v) in equation (1))
     corresponds to an additional column of an input textmatrix M,
     holding the term frequencies of the document (e.g. an essay) to
     be folded in. Tk and Sk are the truncated matrices from the SVD
     applied through 'lsa()' on a given text collection to construct
     the latent semantic space. The resulting vector mi from equation
     (2) corresponds to an additional column in the textmatrix
     representation of the latent semantic space (as produced by
     'as.textmatrix()').
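
     As an illustration only, equations (1) and (2) can be traced in
     base R with 'svd()'. This is a minimal sketch and not necessarily
     how 'fold_in()' is implemented internally. The toy matrix 'M', the
     number of dimensions 'k = 2' and the new document 'v' are
     assumptions chosen for the example.

```r
# Toy term-by-document matrix M (rows = terms, columns = documents).
# Values are illustrative assumptions, not output of textmatrix().
M <- matrix(c(1, 1, 1, 0, 0, 0,    # D1: dog, cat, mouse
              0, 0, 1, 1, 1, 0,    # D2: mouse, hamster, sushi
              1, 0, 0, 0, 0, 2),   # D3: dog, monster, monster
            nrow = 6,
            dimnames = list(c("dog", "cat", "mouse",
                              "hamster", "sushi", "monster"),
                            c("D1", "D2", "D3")))

k <- 2                              # assumed number of retained factors
s <- svd(M)
Tk <- s$u[, 1:k, drop = FALSE]      # truncated term matrix (6 x k)
Sk <- diag(s$d[1:k], nrow = k)      # truncated singular values (k x k)
Dk <- s$v[, 1:k, drop = FALSE]      # truncated document matrix (3 x k)

# New document over the same vocabulary: cat, mouse, mouse.
v <- c(dog = 0, cat = 1, mouse = 2, hamster = 0, sushi = 0, monster = 0)

# Equation (1): di = t(v) Tk Sk^(-1)
di <- t(v) %*% Tk %*% solve(Sk)     # 1 x k

# Equation (2): mi = Tk Sk t(di)
mi <- Tk %*% Sk %*% t(di)           # 6 x 1, one value per term

# Tk and Sk are never recomputed, so the existing space is unchanged.
# Sk cancels out, which makes mi the projection of v onto the span of Tk.
all.equal(as.vector(mi), as.vector(Tk %*% t(Tk) %*% v))
```

     Under these assumptions, folding in a column that was already
     part of 'M' reproduces the corresponding column of the rank-k
     approximation 'Tk %*% Sk %*% t(Dk)'.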

_V_a_l_u_e:

textmatrix: a textmatrix representation of the additional documents in
          the latent semantic space.

_A_u_t_h_o_r(_s):

     Fridolin Wild fridolin.wild@wu-wien.ac.at

_S_e_e _A_l_s_o:

     'textmatrix', 'lsa', 'as.textmatrix'

_E_x_a_m_p_l_e_s:

     # create a first textmatrix with some files
     td = tempfile()
     dir.create(td)
     write( c("dog", "cat", "mouse"), file=paste(td, "D1", sep="/") )
     write( c("hamster", "mouse", "sushi"), file=paste(td, "D2", sep="/") )
     write( c("dog", "monster", "monster"), file=paste(td, "D3", sep="/") )
     matrix1 = textmatrix(td, minWordLength=1)
     unlink(td, recursive=TRUE)

     # create a second textmatrix with some more files
     td = tempfile()
     dir.create(td)
     write( c("cat", "mouse", "mouse"), file=paste(td, "A1", sep="/") )
     write( c("nothing", "mouse", "monster"), file=paste(td, "A2", sep="/") )
     write( c("cat", "monster", "monster"), file=paste(td, "A3", sep="/") )
     matrix2 = textmatrix(td, vocabulary=rownames(matrix1), minWordLength=1)
     unlink(td, recursive=TRUE)

     # create an LSA space from matrix1
     space1 = lsa(matrix1, dims=dimcalc_share())
     as.textmatrix(space1)

     # fold matrix2 into the space generated by matrix1
     fold_in( matrix2, space1)

