textcat_options           package:textcat           R Documentation

_T_e_x_t_c_a_t _O_p_t_i_o_n_s

_D_e_s_c_r_i_p_t_i_o_n:

     Get and set options used for n-gram based text categorization.

_U_s_a_g_e:

     textcat_options(option, value)

_A_r_g_u_m_e_n_t_s:

  option: character string indicating the option to get or set (see
          *Details*).  If missing, all options are returned as a list.

   value: Value to be set.  If omitted, the current value of the given
          option is returned.
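
     As a rough illustration of the get/set behaviour described by
     the two arguments, the following sketch assumes that the
     'textcat' package is installed (the guard skips the code
     otherwise).  The variable name 'old_n' is only illustrative.

```r
# Minimal sketch: query, change and restore one option.
# Assumes the 'textcat' package is available; otherwise nothing runs.
if (requireNamespace("textcat", quietly = TRUE)) {
    library(textcat)

    # No arguments: all options are returned as a list.
    str(textcat_options())

    # Option name only: the current value is returned.
    old_n <- textcat_options("n")
    print(old_n)

    # Option name and value: the option is set.
    textcat_options("n", 3L)
    print(textcat_options("n"))

    # Restore the previous value so later calls are unaffected.
    textcat_options("n", old_n)
}
```

     Restoring the saved value at the end keeps the change local to
     the example, since the options are shared by later calls in the
     same session.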

_D_e_t_a_i_l_s:

     Currently, the following options are available:

     '_n': the maximum number of character in the n-gram profiles.

          Default: '5L'.

     '_s_p_l_i_t': the regular expression pattern to be used in word
          splitting.

          Default: '"[[:space:][:punct:][:digit:]]+"'.

     '_t_o_l_o_w_e_r': A logical indicating whether to transform texts to
          lower case (after word splitting).

          Default: 'TRUE'.

     '_r_e_d_u_c_e': A logical indicating whether a representation of n-grams
          more efficient than the one used by Cavnar and Trenkle should
          be employed.

          Default: 'TRUE'.

     '_u_s_e_B_y_t_e_s': A logical indicating whether to use byte n-grams
          rather than character n-grams.

          Default: 'FALSE'.

     '_i_g_n_o_r_e': a character vector of n-grams to be ignored when
          computing n-gram profiles.

          Default: '"_"' (corresponding to a word boundary).

     '_s_i_z_e': The maximal number of n-grams used for a profile.

          Default: '1000L'.

     '_m_e_t_h_o_d': A character string or function specifying a method for
          computing distances between n-gram profiles (see 'textcat').

          Default: '"CT"', giving the Cavnar-Trenkle out-of-place
          measure.
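
     The interplay of 'n', 'split', 'tolower', 'ignore' and 'size'
     can be approximated in base R.  The helper 'sketch_profile'
     below is hypothetical and not part of 'textcat'; the actual
     counting is delegated to 'textcnt' in package 'tau' and may
     differ in detail (for example in how word boundaries are marked
     or how ties are ordered).  The 'reduce', 'useBytes' and 'method'
     options are not modelled.

```r
# Hypothetical helper, NOT part of textcat: a rough base-R imitation of
# how the documented defaults could shape an n-gram profile.
sketch_profile <- function(x, n = 5L,
                           split = "[[:space:][:punct:][:digit:]]+",
                           tolower = TRUE, ignore = "_", size = 1000L) {
    # Split the text into words, dropping empty pieces.
    words <- unlist(strsplit(x, split))
    words <- words[nzchar(words)]
    # Lower-casing happens after word splitting.
    if (tolower) words <- tolower(words)
    # Assumption: word boundaries are marked with "_" on both sides.
    words <- sprintf("_%s_", words)
    # Collect all substrings of length 1..n from each padded word.
    grams <- unlist(lapply(words, function(w) {
        len <- nchar(w)
        unlist(lapply(seq_len(min(n, len)), function(k)
            substring(w, seq_len(len - k + 1L), k:len)))
    }))
    # Drop ignored n-grams (by default the bare boundary marker).
    grams <- grams[!grams %in% ignore]
    # Count, order by decreasing frequency, keep at most 'size' entries.
    counts <- sort(table(grams), decreasing = TRUE)
    counts[seq_len(min(size, length(counts)))]
}

prof <- sketch_profile("Ab ab 12")
print(prof)
```

     In this sketch the digits act as separators, both words collapse
     to the same lower-case form, and the bare '"_"' never appears in
     the result because it is listed in 'ignore'.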


_S_e_e _A_l_s_o:

     'textcat_profile_db' for how the first six options ('n' through
     'ignore') are used when computing n-gram profiles.

     'textcnt' in package 'tau' which provides the functionality for
     term or pattern counting of text documents employed by 'textcat'.

