prop.cint              package:corpora              R Documentation

_C_o_n_f_i_d_e_n_c_e _i_n_t_e_r_v_a_l _f_o_r _p_r_o_p_o_r_t_i_o_n _b_a_s_e_d _o_n _f_r_e_q_u_e_n_c_y _c_o_u_n_t_s (_c_o_r_p_o_r_a)

_D_e_s_c_r_i_p_t_i_o_n:

     This function computes a confidence interval for a population
     proportion from the corresponding frequency count in a corpus. The
     confidence interval can be based on a binomial test or on a
     z-score test (with or without continuity correction).

_U_s_a_g_e:

     prop.cint(k, n, method = c("binomial", "z.score"), correct = TRUE,
               conf.level = 0.95, alternative = c("two.sided", "less", "greater"))

_A_r_g_u_m_e_n_t_s:

       k: frequency of a type in the corpus (or an integer vector of
          frequencies)

       n: number of tokens in the corpus, i.e. sample size (or an
          integer vector specifying the sizes of different samples)

  method: a character string specifying whether the confidence interval
          is based on the binomial test ('binomial') or the z-score
          test ('z.score')

 correct: if 'TRUE' (the default), apply Yates' continuity correction
          to the z-score test

conf.level: the desired confidence level (defaults to 95%), or a vector
          of confidence levels

alternative: a character string specifying the alternative hypothesis,
          yielding a two-sided ('two.sided', default), lower one-sided
          ('less') or upper one-sided ('greater') confidence interval
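     The arguments 'conf.level' and 'alternative' together determine the
     critical z-value used by the z-score method.  The sketch below
     follows the usual convention (the one 'prop.test' uses): a two-sided
     interval splits the error probability over both tails, while a
     one-sided interval puts all of it into a single tail.  The helper
     'crit_z' is hypothetical and not part of corpora.

```r
# Hypothetical helper (not part of corpora): critical z-value for a given
# confidence level and type of alternative hypothesis.
crit_z <- function(conf.level = 0.95,
                   alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  if (alternative == "two.sided") {
    qnorm((1 + conf.level) / 2)  # error probability split over both tails
  } else {
    qnorm(conf.level)            # all error probability in one tail
  }
}

crit_z(0.95)             # approx. 1.96
crit_z(0.95, "greater")  # approx. 1.645
```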

_D_e_t_a_i_l_s:

     The confidence intervals computed by this function correspond to
     those returned by 'binom.test' (for 'method = "binomial"') and
     'prop.test' (for 'method = "z.score"').  However, 'prop.cint'
     accepts vector arguments, allowing many confidence intervals to be
     computed with a single function call.  In addition, it uses a fast
     approximation of the two-sided binomial test that can safely be
     applied to large samples.
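     To illustrate why vectorization is cheap for the binomial method,
     the following minimal sketch computes exact (Clopper-Pearson)
     intervals, as reported by 'binom.test', for whole vectors at once
     using the beta quantile function 'qbeta'.  It is not the fast
     approximation used inside 'prop.cint', and the function name
     'clopper_pearson' is hypothetical.

```r
# Hypothetical sketch: exact two-sided (Clopper-Pearson) intervals, as in
# binom.test(), computed for whole vectors at once via qbeta().
# NOT the fast approximation used by prop.cint().
clopper_pearson <- function(k, n, conf.level = 0.95) {
  alpha <- (1 - conf.level) / 2
  lower <- ifelse(k == 0, 0, qbeta(alpha, k, n - k + 1))
  upper <- ifelse(k == n, 1, qbeta(1 - alpha, k + 1, n - k))
  data.frame(lower = lower, upper = upper)
}

k <- c(0, 5, 42, 100)
n <- c(50, 50, 1000, 100)
ci <- clopper_pearson(k, n)

# Cross-check against binom.test(), which needs one call per interval
ref <- t(mapply(function(k, n) binom.test(k, n)$conf.int, k, n))
all.equal(unname(as.matrix(ci)), unname(ref))  # TRUE
```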

     The confidence interval for a z-score test is computed by solving
     the z-score equation

                  (k - np) / sqrt(n p (1-p)) = z

     for p, where z is the critical value of the standard normal
     distribution for the chosen confidence level (e.g. +/- 1.96 for a
     two-sided test with 95% confidence).  Squaring both sides and
     dividing by n leads to the quadratic equation

         p^2 (n + z^2) + p (-2k - z^2) + k^2 / n = 0

     whose two solutions correspond to the lower and upper boundary of
     the confidence interval.
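     A minimal sketch of this computation without continuity correction,
     writing z for the critical value: the two roots of the quadratic are
     taken as the lower and upper boundary and compared with the
     uncorrected interval from 'prop.test'.  The function name
     'zscore_cint' is hypothetical; this is not the corpora source code.

```r
# Hypothetical sketch: solve p^2 (n + z^2) + p (-2k - z^2) + k^2/n = 0
# for p (no continuity correction). Vectorized over k, n and z.
zscore_cint <- function(k, n, z = qnorm(0.975)) {
  qa <- n + z^2
  qb <- -2 * k - z^2
  qc <- k^2 / n
  # qb^2 - 4 qa qc = 4 k z^2 (1 - k/n) + z^4 >= 0 for 0 <= k <= n
  disc <- sqrt(qb^2 - 4 * qa * qc)
  data.frame(lower = (-qb - disc) / (2 * qa),
             upper = (-qb + disc) / (2 * qa))
}

ci <- zscore_cint(30, 100)
ref <- prop.test(30, 100, correct = FALSE)$conf.int
all.equal(c(ci$lower, ci$upper), as.numeric(ref))  # TRUE
```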

     When Yates' continuity correction is applied, the value k in the
     numerator of the z-score equation is replaced by k*, with
     k* = k - 1/2 for the _lower_ boundary of the confidence interval
     (where k > np) and k* = k + 1/2 for the _upper_ boundary of the
     confidence interval (where k < np).  In each case, the
     corresponding solution of the quadratic equation must be chosen
     (i.e., the solution with k* > np for the lower boundary and the
     solution with k* < np for the upper boundary).
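     The continuity-corrected interval can be sketched the same way: the
     quadratic is solved once with k* = k - 1/2, keeping the smaller root,
     and once with k* = k + 1/2, keeping the larger root.  As an
     assumption of this sketch, k* is clamped to [0, n] so that the roots
     stay real.  The name 'zscore_cint_yates' is hypothetical, and the
     check below keeps k well away from 0, n and n/2, where 'prop.test'
     applies extra special-case handling.

```r
# Hypothetical sketch of the Yates-corrected z-score interval.
zscore_cint_yates <- function(k, n, z = qnorm(0.975)) {
  roots <- function(kstar) {
    qa <- n + z^2
    qb <- -2 * kstar - z^2
    qc <- kstar^2 / n
    disc <- sqrt(qb^2 - 4 * qa * qc)
    list(small = (-qb - disc) / (2 * qa), large = (-qb + disc) / (2 * qa))
  }
  # Clamp k* to [0, n]; at k* = 0 (k* = n) the chosen root is 0 (1)
  k_lo <- pmin(pmax(k - 0.5, 0), n)
  k_hi <- pmin(pmax(k + 0.5, 0), n)
  lower <- roots(k_lo)$small  # root with k* > np
  upper <- roots(k_hi)$large  # root with k* < np
  data.frame(lower = pmax(lower, 0), upper = pmin(upper, 1))
}

ci <- zscore_cint_yates(30, 100)
ref <- prop.test(30, 100, correct = TRUE)$conf.int
all.equal(c(ci$lower, ci$upper), as.numeric(ref))  # TRUE
```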

_V_a_l_u_e:

     A data frame with two columns, labelled 'lower' for the lower
     boundary and 'upper' for the upper boundary of the confidence
     interval.  The number of rows is determined by the length of the
     longest input vector ('k', 'n' and 'conf.level').
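     For checking results, a data frame of the same shape can be built
     with one 'prop.test' call per row; 'mapply' recycles 'k', 'n' and
     'conf.level' to the length of the longest one.  This is a slow
     reference sketch under that assumption, not how 'prop.cint' works
     internally, and 'ref_cint' is a hypothetical name.

```r
# Hypothetical reference sketch: one prop.test() call per row, with
# k, n and conf.level recycled by mapply() to a common length.
ref_cint <- function(k, n, conf.level = 0.95, correct = TRUE) {
  ci <- mapply(function(k, n, cl) {
    prop.test(k, n, conf.level = cl, correct = correct)$conf.int
  }, k, n, conf.level)
  data.frame(lower = ci[1, ], upper = ci[2, ])
}

res <- ref_cint(k = c(10, 20, 30), n = 100, conf.level = c(0.90, 0.95, 0.99))
nrow(res)  # 3
```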

_A_u_t_h_o_r(_s):

     Stefan Evert

_S_e_e _A_l_s_o:

     'z.score.pval', 'prop.test', 'binom.pval', 'binom.test'

