Weka_tokenizers            package:RWeka            R Documentation

_R/_W_e_k_a _T_o_k_e_n_i_z_e_r_s

_D_e_s_c_r_i_p_t_i_o_n:

     R interfaces to Weka tokenizers.

_U_s_a_g_e:

     AlphabeticTokenizer(x, control = NULL)
     NGramTokenizer(x, control = NULL)
     WordTokenizer(x, control = NULL)

_A_r_g_u_m_e_n_t_s:

       x: a character vector with strings to be tokenized.

 control: an object of class 'Weka_control', or a character vector of
          control options, or 'NULL' (default). The available options
          can be obtained on-line using the Weka Option Wizard 'WOW',
          or from the Weka documentation.

_D_e_t_a_i_l_s:

     'AlphabeticTokenizer' is an alphabetic string tokenizer: tokens
     are formed only from contiguous alphabetic sequences.

     'NGramTokenizer' splits strings into n-grams with given minimal
     and maximal numbers of grams.

     'WordTokenizer' is a simple word tokenizer.

_V_a_l_u_e:

     A character vector with the tokenized strings.
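
Examples:

     A brief sketch of typical usage (assumes the 'RWeka' package and
     its Java dependencies are available; the 'min' and 'max' options
     set the minimal and maximal n-gram lengths for 'NGramTokenizer'):

     ```r
     library(RWeka)

     ## Simple word tokenization: splits on whitespace and punctuation.
     WordTokenizer("The quick brown fox.")

     ## Alphabetic tokenization: tokens are contiguous alphabetic
     ## sequences, so digits and punctuation act as separators.
     AlphabeticTokenizer("R2D2 meets C3PO!")

     ## N-gram tokenization: here, all bigrams of the input string.
     NGramTokenizer("the quick brown fox",
                    Weka_control(min = 2, max = 2))
     ```

     Each call returns a character vector of tokens, so the results
     can be passed directly to functions such as 'table' for frequency
     counts.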

