BNCdomains              package:corpora              R Documentation

_D_i_s_t_r_i_b_u_t_i_o_n _o_f _d_o_m_a_i_n_s _i_n _t_h_e _B_r_i_t_i_s_h _N_a_t_i_o_n_a_l _C_o_r_p_u_s (_B_N_C)

_D_e_s_c_r_i_p_t_i_o_n:

     This data set gives the number of documents and tokens in each of
     the 18 domains represented in the British National Corpus, World
     Edition (BNC).  See Aston & Burnard (1998) for more information
     about the BNC and the domain classification, or go to <URL:
     http://www.natcorp.ox.ac.uk/>.

_U_s_a_g_e:

     data(BNCdomains)

_F_o_r_m_a_t:

     A data set with 19 rows and the following columns:

     '_d_o_m_a_i_n': name of the respective domain in the BNC

     '_d_o_c_u_m_e_n_t_s': number of documents from this domain

     '_t_o_k_e_n_s': total number of tokens in all documents from this domain

_D_e_t_a_i_l_s:

     The domain classification is missing for one document in the BNC.
     This document is listed under the code 'Unlabeled', which accounts
     for the 19th row of the data set alongside the 18 domains.
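
     The following is a minimal sketch of how the data set might be
     inspected, assuming the 'corpora' package is installed. It sets
     the 'Unlabeled' row aside and compares average document length
     across the remaining domains. The names 'labelled' and
     'tokens.per.doc' are illustrative and not part of the data set.

```r
# Assumes the 'corpora' package is installed, e.g. install.packages("corpora")
library(corpora)
data(BNCdomains)

# Inspect the structure: columns 'domain', 'documents' and 'tokens'
str(BNCdomains)

# Set aside the row for the document without a domain classification
labelled <- subset(BNCdomains, domain != "Unlabeled")

# Average document length (tokens per document) in each labelled domain;
# 'tokens.per.doc' is an illustrative column name added here
labelled$tokens.per.doc <- labelled$tokens / labelled$documents

# Show domains ordered from longest to shortest average document
labelled[order(-labelled$tokens.per.doc), ]
```

     Dropping the 'Unlabeled' row before computing per-domain figures
     keeps a single unclassified document from being read as a domain
     in its own right.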

_A_u_t_h_o_r(_s):

     Marco Baroni (baroni@sslmit.unibo.it)

_R_e_f_e_r_e_n_c_e_s:

     Aston, Guy and Burnard, Lou (1998). The BNC Handbook. Edinburgh
     University Press, Edinburgh. See also the BNC homepage at <URL:
     http://www.natcorp.ox.ac.uk/>.

