VSS                 package:corpora                 R Documentation

_A _s_m_a_l_l _c_o_r_p_u_s _o_f _v_e_r_y _s_h_o_r_t _s_t_o_r_i_e_s _w_i_t_h _l_i_n_g_u_i_s_t_i_c _a_n_n_o_t_a_t_i_o_n_s

_D_e_s_c_r_i_p_t_i_o_n:

     This data set contains a small corpus (8043 tokens) of short
     stories from the collection _Very Short Stories_ (VSS, see <URL:
     http://www.schtepf.de/pages/stories.html>).  The text was
     automatically segmented (tokenised) and annotated with
     part-of-speech tags (from the Penn tagset) and lemmas (base
     forms), using the IMS TreeTagger (Schmid 1994).

_U_s_a_g_e:

     data(VSS)

_F_o_r_m_a_t:

     A data frame with 8043 rows, one per token, and the following
     columns:

     '_w_o_r_d': the word form (or surface form) of the token

     '_p_o_s': the part-of-speech tag of the token (using the Penn tagset)

     '_w_o_r_d': the lemma (or base form) of the token

_D_e_t_a_i_l_s:

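     The tags follow the TreeTagger version of the Penn tagset, which
     writes 'NP', 'NPS', 'PP' and 'PP$' where the Penn Treebank uses
     'NNP', 'NNPS', 'PRP' and 'PRP$'. It defines the following
     part-of-speech tags: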

       'CC'   Coordinating conjunction
       'CD'   Cardinal number
       'DT'   Determiner
       'EX'   Existential _there_
       'FW'   Foreign word
       'IN'   Preposition or subordinating conjunction
       'JJ'   Adjective
       'JJR'  Adjective, comparative
       'JJS'  Adjective, superlative
       'LS'   List item marker
       'MD'   Modal
       'NN'   Noun, singular or mass
       'NNS'  Noun, plural
       'NP'   Proper noun, singular
       'NPS'  Proper noun, plural
       'PDT'  Predeterminer
       'POS'  Possessive ending
       'PP'   Personal pronoun
       'PP$'  Possessive pronoun
       'RB'   Adverb
       'RBR'  Adverb, comparative
       'RBS'  Adverb, superlative
       'RP'   Particle
       'SYM'  Symbol
       'TO'   _to_
       'UH'   Interjection
       'VB'   Verb, base form
       'VBD'  Verb, past tense
       'VBG'  Verb, gerund or present participle
       'VBN'  Verb, past participle
       'VBP'  Verb, non-3rd person singular present
       'VBZ'  Verb, 3rd person singular present
       'WDT'  Wh-determiner
       'WP'   Wh-pronoun
       'WP$'  Possessive wh-pronoun
       'WRB'  Wh-adverb
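
     Because related tags share a common prefix, the fine-grained tags
     can be collapsed into coarser word classes. The sketch below shows
     one such mapping; the function name 'coarse_pos' and the grouping
     itself are illustrative assumptions, not a convention defined by
     the Penn tagset or by the corpora package.

```r
# Hypothetical mapping (not part of corpora) from Penn-style tags to
# coarse word classes, based on the shared tag prefixes listed above.
coarse_pos <- function(tags) {
  tags <- as.character(tags)            # accept factors as well
  out <- rep("other", length(tags))
  out[grepl("^N", tags)]  <- "noun"       # NN, NNS, NP, NPS
  out[grepl("^V", tags)]  <- "verb"       # VB, VBD, VBG, VBN, VBP, VBZ
  out[grepl("^JJ", tags)] <- "adjective"  # JJ, JJR, JJS
  out[grepl("^RB", tags)] <- "adverb"     # RB, RBR, RBS (not WRB)
  out
}

coarse_pos(c("NN", "NPS", "VBZ", "JJR", "RBS", "WRB", "DT"))
# "noun" "noun" "verb" "adjective" "adverb" "other" "other"
```

     With the package installed, 'table(coarse_pos(VSS$pos))' would
     give a coarse word-class profile of the corpus.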

_A_u_t_h_o_r(_s):

     Stefan Evert (<URL: http://purl.org/stefan.evert>)

_R_e_f_e_r_e_n_c_e_s:

     Schmid, Helmut (1994). Probabilistic part-of-speech tagging using
     decision trees. In: _Proceedings of the International Conference
     on New Methods in Language Processing (NeMLaP)_, pages 44-49.

