BNCcomparison            package:corpora            R Documentation

_C_o_m_p_a_r_i_s_o_n _o_f _w_r_i_t_t_e_n _a_n_d _s_p_o_k_e_n _f_r_e_q_u_e_n_c_i_e_s (_B_N_C)

_D_e_s_c_r_i_p_t_i_o_n:

     This data set compares the frequencies of 60 selected nouns in the
     written and spoken parts of the British National Corpus, World
     Edition (BNC).  Nouns were chosen from three frequency bands,
     namely the 20 most frequent nouns in the corpus, 20 nouns with
     approximately 1000 occurrences, and 20 nouns with approximately
     100 occurrences.

     See Aston & Burnard (1998) for more information about the BNC, or
     go to <URL: http://www.natcorp.ox.ac.uk/>.

_U_s_a_g_e:

     data(BNCcomparison)

_F_o_r_m_a_t:

     A data set with 61 rows and the following columns:

     '_n_o_u_n': lemmatised noun (aka stem form)

     '_w_r_i_t_t_e_n': frequency in the written part of the BNC

     '_s_p_o_k_e_n': frequency in the spoken part of the BNC

_D_e_t_a_i_l_s:

     In addition to the 60 nouns, the data set contains a row
     labelled 'OTHER', which gives the total frequency of all other
     nouns in the BNC.  This row is needed in order to calculate the
     sample sizes (i.e. the total number of noun tokens) of the
     written and spoken parts for frequency comparison tests.
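
     The sketch below shows one way to obtain the sample sizes
     described above and to compare a single noun across the two
     parts.  It assumes that summing each column over all 61 rows
     (the 60 nouns plus 'OTHER') gives the number of noun tokens in
     that part.  The helper names 'bnc_sample_sizes' and
     'bnc_compare' are hypothetical, and base R's 'prop.test' is used
     only as one possible frequency comparison test, not necessarily
     the test the package authors have in mind.

```r
# Hypothetical helper: sample sizes of the written and spoken parts.
# Summing over all rows, including 'OTHER', gives the total number of
# noun tokens in each part (assumption stated in the lead-in).
bnc_sample_sizes <- function(df) {
  c(written = sum(df$written), spoken = sum(df$spoken))
}

# Hypothetical helper: compare one noun's relative frequency in the
# written vs. spoken part with a two-sample test of equal proportions.
bnc_compare <- function(df, noun) {
  n <- bnc_sample_sizes(df)
  row <- df[as.character(df$noun) == noun, ]
  if (nrow(row) != 1) stop("noun not found: ", noun)
  k <- c(written = row$written, spoken = row$spoken)
  test <- prop.test(k, n)  # chi-squared test with Yates' correction
  list(relative = k / n, p.value = test$p.value)
}

# Usage on the real data, only if the corpora package is installed
if (requireNamespace("corpora", quietly = TRUE)) {
  data(BNCcomparison, package = "corpora")
  print(bnc_sample_sizes(BNCcomparison))
  print(bnc_compare(BNCcomparison, as.character(BNCcomparison$noun[1])))
}
```

     Including the 'OTHER' row in the column sums is what makes the
     relative frequencies k / n meaningful; without it, the totals
     would only cover the 60 selected nouns.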

_A_u_t_h_o_r(_s):

     Stefan Evert (<URL: http://purl.org/stefan.evert>)

_R_e_f_e_r_e_n_c_e_s:

     Aston, Guy and Burnard, Lou (1998). _The BNC Handbook._ Edinburgh
     University Press, Edinburgh. See also the BNC homepage at <URL:
     http://www.natcorp.ox.ac.uk/>.

