rel.risk.cint            package:corpora            R Documentation

_C_o_n_s_e_r_v_a_t_i_v_e _c_o_n_f_i_d_e_n_c_e _i_n_t_e_r_v_a_l _f_o_r _t_h_e _r_e_l_a_t_i_v_e _r_i_s_k _r_a_t_i_o (_c_o_r_p_o_r_a)

_D_e_s_c_r_i_p_t_i_o_n:

     This function approximates a conservative confidence interval for
     the relative risk coefficient, i.e. the ratio r = p_1/p_2 between
     two population proportions, based on frequency counts from two
     corpora.  The approximation is computed from individual confidence
     intervals for the two proportions, whose confidence levels are
     adjusted so that the combined interval reaches at least the
     requested confidence level.

_U_s_a_g_e:

     rel.risk.cint(k1, n1, k2, n2,
                   conf.level = 0.95, alternative = c("two.sided", "less", "greater"),
                   method = c("binomial", "z.score"), correct = TRUE)

_A_r_g_u_m_e_n_t_s:

      k1: frequency of a type in the first corpus (or an integer vector
          of type frequencies)

      n1: the sample size of the first corpus (or an integer vector
          specifying the sizes of different samples)

      k2: frequency of the type in the second corpus (or an integer
          vector of type frequencies, in parallel to 'k1')

      n2: the sample size of the second corpus (or an integer vector
          specifying the sizes of different samples, in parallel to
          'n1')

conf.level: the desired confidence level (defaults to 95%)

alternative: a character string specifying the alternative hypothesis,
          yielding a two-sided ('two.sided', default), lower one-sided
          ('less') or upper one-sided ('greater') confidence interval

  method: a character string specifying whether the individual
          confidence intervals for the two proportions are based on the
          binomial test ('binomial') or the z-score test ('z.score')

 correct: if 'TRUE', apply Yates' continuity correction for the z-score
          test (default)

_D_e_t_a_i_l_s:

     This function computes individual confidence intervals for the two
     population proportions p_1 (from k_1 and n_1) and p_2 (from k_2
     and n_2).  Then, a confidence interval for the relative risk ratio
     r = p_1 / p_2 is determined in such a way that r lies within the
     interval whenever p_1 and p_2 lie in their respective confidence
     intervals.
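
     The construction can be sketched in base R as follows.  This is an
     illustration only, not the package's own code: the helper name
     'ratio.from.cints' and its argument 'indiv.level' are hypothetical,
     and the individual intervals are taken from 'binom.test' in the
     'stats' package (Clopper-Pearson).  The smallest ratio compatible
     with both intervals divides the lower bound for p_1 by the upper
     bound for p_2; the largest ratio does the reverse.

```r
# Hypothetical helper, NOT part of the corpora package: combine two
# Clopper-Pearson intervals (stats::binom.test) into an interval for p1/p2.
ratio.from.cints <- function(k1, n1, k2, n2, indiv.level) {
  ci1 <- binom.test(k1, n1, conf.level = indiv.level)$conf.int
  ci2 <- binom.test(k2, n2, conf.level = indiv.level)$conf.int
  # smallest ratio: lowest p1 over highest p2; largest ratio: the reverse
  c(lower = ci1[1] / ci2[2], upper = ci1[2] / ci2[1])
}

# made-up counts: 30 vs. 10 occurrences in two samples of 1000 tokens each
res <- ratio.from.cints(30, 1000, 10, 1000, indiv.level = 0.975)
print(res)
```

     If the lower bound for p_2 is zero (e.g. for k_2 = 0), the upper
     boundary in this sketch becomes 'Inf'.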

     Thus, when these intervals are each computed with a confidence
     level of e.g. .975, r is guaranteed to fall within its confidence
     interval in at least .975^2 = .950625, i.e. roughly .95, of all
     cases (assuming the two samples are independent).  This adjustment
     of confidence levels is made automatically.  Note that r _might_
     fall within its confidence interval even when either p_1 or p_2 is
     outside the respective interval; hence 'rel.risk.cint' computes a
     _conservative_ confidence interval that is generally wider than
     necessary.
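
     The arithmetic behind this adjustment, and the conservative
     behaviour of the resulting interval, can be checked with a small
     base R simulation.  The sketch below assumes that the individual
     level is the square root of the requested level, which is
     consistent with the reasoning above but is not taken from the
     package source; the "true" proportions and sample size are made
     up, and the individual intervals come from 'binom.test' rather
     than from the package.

```r
set.seed(42)
conf.level  <- 0.95
indiv.level <- sqrt(conf.level)      # about 0.9747 (assumed adjustment)

# made-up "true" proportions and sample size for the simulation
p1 <- 0.03; p2 <- 0.01; n <- 1000
r.true <- p1 / p2

covered <- replicate(1000, {
  k1 <- rbinom(1, n, p1)
  k2 <- rbinom(1, n, p2)
  ci1 <- binom.test(k1, n, conf.level = indiv.level)$conf.int
  ci2 <- binom.test(k2, n, conf.level = indiv.level)$conf.int
  # does the combined interval [l1/u2, u1/l2] contain the true ratio?
  (ci1[1] / ci2[2] <= r.true) && (r.true <= ci1[2] / ci2[1])
})
coverage <- mean(covered)            # share of runs that cover r.true
print(coverage)
```

     The observed coverage should be no lower than the requested level;
     how far it exceeds that level indicates how conservative the
     interval is in this particular setting.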

     Exact confidence intervals for the _odds ratio_ coefficient theta
     = (p_1 / (1-p_1)) / (p_2 / (1-p_2)) can be computed with the
     'fisher.test' function.  However, these exact intervals are
     computationally _very_ expensive and may cause R to run out of
     memory for large frequency counts.  In addition, 'fisher.test'
     only computes a single confidence interval for each function call
     (i.e., it cannot be applied to vectorised data).
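
     For small data sets, exact odds ratio intervals can still be
     collected row by row.  The following sketch uses only
     'fisher.test' and 'mapply' from base R; the counts are made up,
     and the layout of the 2x2 table (columns = corpora, rows = type
     vs. all other tokens) is chosen here so that the estimated odds
     ratio corresponds to theta as defined above.

```r
# made-up frequency counts for two types in two corpora
k1 <- c(30, 5);  n1 <- c(1000, 200)
k2 <- c(10, 8);  n2 <- c(1000, 300)

# fisher.test() handles one 2x2 table per call, so loop with mapply()
ci <- t(mapply(function(a, b, c, d) {
  tbl <- matrix(c(a, b - a, c, d - c), nrow = 2)  # columns = corpora
  fisher.test(tbl)$conf.int
}, k1, n1, k2, n2))

or.cint <- data.frame(lower = ci[, 1], upper = ci[, 2])
print(or.cint)
```

     Each call builds its own contingency table, so the running time
     grows with the number of rows as well as with the size of the
     counts.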

_V_a_l_u_e:

     A data frame with two columns, labelled 'lower' for the lower
     boundary and 'upper' for the upper boundary of the confidence
     interval.  The number of rows is determined by the length of the
     longest input vector ('k1', 'n1', 'k2', 'n2' and 'conf.level').
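
     A vectorised call might look as follows.  The counts are made up,
     and the example assumes that the scalar sample sizes are recycled
     to the length of 'k1' and 'k2', as the description of the number
     of rows suggests.  The call is guarded so that the code also runs
     when the 'corpora' package is not installed.

```r
res <- NULL
if (requireNamespace("corpora", quietly = TRUE)) {
  k1 <- c(30, 12, 0)   # made-up frequencies in the first corpus
  k2 <- c(10, 15, 4)   # made-up frequencies in the second corpus
  # sample sizes given as scalars (assumed to be recycled)
  res <- corpora::rel.risk.cint(k1, 1000, k2, 2000)
  print(res)           # one row per type, columns 'lower' and 'upper'
}
```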

_A_u_t_h_o_r(_s):

     Stefan Evert

_S_e_e _A_l_s_o:

     'prop.cint', 'chisq.pval', 'fisher.pval', 'fisher.test'

