ECIMCI_profiles           package:textcat           R Documentation

_E_C_I/_M_C_I _N-_G_r_a_m _P_r_o_f_i_l_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     N-gram profile db for 26 languages based on the European Corpus
     Initiative Multilingual Corpus I.

_U_s_a_g_e:

     ECIMCI_profiles

_D_e_t_a_i_l_s:

     This profile db was built by Johannes Rauch from the ECI/MCI
     corpus, using the default options employed by package 'textcat'
     and with all text documents encoded in UTF-8.
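
     As a rough illustration of how the db might be used, the sketch
     below passes it to textcat() through the 'p' argument and leaves
     all other options at their defaults. It assumes package 'textcat'
     is installed; the two sample strings are made up for this example
     and are not taken from the ECI/MCI corpus.

```r
# Sketch only: assumes the 'textcat' package is installed.
library(textcat)

# Category ids (language tags) available in the profile db.
ids <- names(ECIMCI_profiles)

# Illustrative input strings (invented for this example).
x <- c("This is an English sentence.",
       "Das ist ein deutscher Satz.")

# Categorize the strings against the ECI/MCI profiles, keeping
# every other textcat() option at its default.
res <- textcat(x, p = ECIMCI_profiles)
res
```

     The result holds one category id per input string, drawn from the
     ids in the db.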

     The category ids used for the db are the respective IETF language
     tags (see language in package 'tau'), using the ISO 639-2 Part B
     language subtags and, for Serbian, the script employed (i.e.,
     '"scc-Cyrl"' and '"scc-Latn"' for Serbian written in Cyrillic and
     Latin script, respectively). All other languages in the profile
     db are always written in Latin script.
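
     To make the structure of these ids concrete, the following base R
     sketch splits an id into its language subtag and optional script
     subtag. The helper split_tag() is hypothetical and is not part of
     'textcat' or 'tau'; it only assumes that ids have the form
     "language" or "language-Script" as described above.

```r
# Hypothetical helper (not part of 'textcat' or 'tau'): split ids of
# the form "language" or "language-Script" into their subtags.
split_tag <- function(ids) {
  parts <- strsplit(ids, "-", fixed = TRUE)
  data.frame(
    id       = ids,
    # First component: the language subtag.
    language = vapply(parts, function(p) p[1L], character(1)),
    # Second component, if present: the script subtag; otherwise NA.
    script   = vapply(parts,
                      function(p) if (length(p) > 1L) p[2L] else NA_character_,
                      character(1)),
    stringsAsFactors = FALSE
  )
}

# The two Serbian ids share a language subtag and differ in script.
tags <- split_tag(c("scc-Cyrl", "scc-Latn"))
tags
```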

