fitNBP                package:statmod                R Documentation

_N_e_g_a_t_i_v_e _B_i_n_o_m_i_a_l _M_o_d_e_l _f_o_r _S_A_G_E _L_i_b_r_a_r_i_e_s _w_i_t_h _P_e_a_r_s_o_n _E_s_t_i_m_a_t_i_o_n _o_f _D_i_s_p_e_r_s_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Fit a multi-group negative-binomial model to SAGE data, with
     Pearson estimation of the common overdispersion parameter.

_U_s_a_g_e:

     fitNBP(y, group=NULL, lib.size=colSums(y), tol=1e-5, maxit=40, verbose=FALSE)

_A_r_g_u_m_e_n_t_s:

       y: numeric matrix giving counts. Rows correspond to tags (genes)
          and columns to SAGE libraries.

   group: vector or factor giving the group to which each library
          belongs. If 'NULL' then all libraries are assumed to belong
          to one group.

lib.size: vector giving the total number of tags in each library.
          Defaults to the column sums of 'y'.

     tol: small positive numeric tolerance used to judge convergence.

   maxit: maximum number of iterations permitted.

 verbose: logical; if 'TRUE', information on iteration progress is
          printed.

_D_e_t_a_i_l_s:

     The overdispersion parameter is estimated by equating the Pearson
     goodness-of-fit statistic to its expectation. The variance is
     assumed to be of the form Var(y) = mu*(1+phi*mu), where E(y) = mu
     and phi is the dispersion parameter. All tags are assumed to share
     the same dispersion.
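
     The Pearson estimating equation can be illustrated on its own. The
     sketch below assumes the means mu are already known. It solves
     X^2(phi) = df for phi with base R's uniroot(), using the fact that
     X^2 decreases in phi. The helper names pearsonX2 and
     pearsonDispersion are hypothetical and are not part of statmod. In
     the model fitted by fitNBP the means are estimated rather than
     known.

```r
# Hypothetical helpers for illustration; not functions exported by statmod.
# Pearson statistic under the variance function Var(y) = mu * (1 + phi * mu)
pearsonX2 <- function(phi, y, mu) sum((y - mu)^2 / (mu * (1 + phi * mu)))

# Solve X^2(phi) = df for phi with the means mu held fixed.
# X^2 is decreasing in phi, so the root is unique when X^2(0) > df.
pearsonDispersion <- function(y, mu, df) {
  f <- function(phi) pearsonX2(phi, y, mu) - df
  if (f(0) <= 0) return(0)                 # no overdispersion detected
  upper <- 1
  while (f(upper) > 0) upper <- 2 * upper  # widen until the root is bracketed
  uniroot(f, c(0, upper), tol = 1e-10)$root
}

# Usage: counts simulated with known means and dispersion 1/size = 2/3.
# The means are treated as known here, so df = number of observations.
set.seed(2)
mu <- rep(10, 20000)
y <- rnbinom(length(mu), mu = mu, size = 1.5)
phi.hat <- pearsonDispersion(y, mu, df = length(y))
phi.hat
```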

     For a given dispersion, the model for each tag is a
     negative-binomial generalized linear model with log link and
     'log(lib.size)' as offset. The coefficients are parametrized as in
     the formula '~0+group+offset(log(lib.size))'.
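
     At phi = 0 the negative-binomial model reduces to a Poisson
     log-linear model, which is where the outer iteration starts. The
     sketch below uses made-up counts for a single tag. It fits that
     model with base R's glm() and the formula above, and it checks that
     each group coefficient equals the log of the pooled group rate (the
     group's total count divided by its total library size). It only
     illustrates the parametrization and is not meant to reproduce
     fitNBP's internal computations.

```r
# One tag observed in four libraries forming two groups (illustrative values)
y <- c(3, 7, 12, 9)
lib.size <- c(40000, 60000, 50000, 55000)
group <- factor(c(1, 1, 2, 2))

# phi = 0: Poisson GLM with the parametrization ~0+group+offset(log(lib.size))
fit0 <- glm(y ~ 0 + group + offset(log(lib.size)), family = poisson())

# With group indicators only, each coefficient is log(sum(y) / sum(lib.size))
# within the group: log(1e-4) for group 1 and log(2e-4) for group 2
closed <- log(tapply(y, group, sum) / tapply(lib.size, group, sum))
cbind(glm = coef(fit0), closed = closed)
```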

     Apart from the dispersion being common rather than genewise, the
     model fitted by this function is equivalent to that proposed by Lu
     et al (2005). The numerical algorithm uses alternating iterations
     (Smyth, 1996), with Newton's method as the outer iteration for the
     dispersion parameter, starting at phi=0. This iteration converges
     monotonically for the dispersion.
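
     A minimal sketch of such alternating iterations follows. It is
     written from the description above, not from the statmod sources.
     For fixed phi, each tag's group log-rates are updated by Fisher
     scoring. Then, with the means held fixed, one Newton step is taken
     on the Pearson equation X^2(phi) = df, starting at phi = 0. The
     following are all assumptions of this sketch:
       - the function name fitNBPSketch;
       - the number of inner scoring steps;
       - df taken as ngenes*(nlib - ngroups);
       - a floor of -50 on the log-rates, which keeps tags whose counts
         in a group are all zero finite.

```r
# Hypothetical sketch written from the Details text; not statmod's source.
# Assumptions: df = ngenes * (nlib - ngroups); log-rates floored at -50.
fitNBPSketch <- function(y, group = NULL, lib.size = colSums(y),
                         tol = 1e-5, maxit = 40, inner = 25) {
  y <- as.matrix(y)
  if (is.null(group)) group <- rep(1, ncol(y))
  group <- as.factor(group)
  G <- nlevels(group)
  gi <- as.integer(group)                        # group index of each library
  offset <- matrix(log(lib.size), nrow(y), ncol(y), byrow = TRUE)
  df <- nrow(y) * (ncol(y) - G)                  # pooled residual df (assumed)

  # Start from log pooled group rates (0.5 added to avoid log(0))
  beta <- matrix(0, nrow(y), G)
  for (g in seq_len(G)) {
    j <- gi == g
    beta[, g] <- log((rowSums(y[, j, drop = FALSE]) + 0.5) / sum(lib.size[j]))
  }

  phi <- 0                                       # outer iteration starts at phi = 0
  for (iter in seq_len(maxit)) {
    # Inner step: Fisher scoring for each tag's group log-rates, phi fixed.
    # Score = sum((y - mu) / (1 + phi*mu)), information = sum(mu / (1 + phi*mu)).
    for (it in seq_len(inner)) {
      mu <- exp(beta[, gi, drop = FALSE] + offset)
      w <- 1 / (1 + phi * mu)
      for (g in seq_len(G)) {
        j <- gi == g
        score <- rowSums(((y - mu) * w)[, j, drop = FALSE])
        info <- rowSums((mu * w)[, j, drop = FALSE])
        beta[, g] <- pmax(beta[, g] + score / info, -50)
      }
    }
    mu <- exp(beta[, gi, drop = FALSE] + offset)

    # Outer step: one Newton step on X^2(phi) - df = 0 with mu held fixed
    v <- mu * (1 + phi * mu)
    x2 <- sum((y - mu)^2 / v)
    dx2 <- -sum((y - mu)^2 * mu^2 / v^2)         # derivative of X^2 in phi
    step <- -(x2 - df) / dx2
    phi <- max(phi + step, 0)                    # no overdispersion -> phi = 0
    if (abs(step) < tol || phi == 0) break
  }
  list(coefficients = beta, fitted.values = mu, dispersion = phi)
}

# Usage on simulated counts (true dispersion 1/size = 2/3)
set.seed(1)
y <- matrix(rnbinom(200 * 4, mu = 20, size = 1.5), 200, 4)
fit <- fitNBPSketch(y, group = c(1, 1, 2, 2), lib.size = rep(50000, 4))
fit$dispersion
```

     X^2 is convex and decreasing in phi for fixed means. As a result, a
     Newton step taken from below the root does not overshoot it, which
     is consistent with the monotone convergence described above.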

_V_a_l_u_e:

     A list with components:

coefficients: numeric matrix of log-rates (coefficients on the log
          scale) for each tag (gene) and each group.

fitted.values: numeric matrix of fitted values.

dispersion: estimated common dispersion parameter 'phi'.

_A_u_t_h_o_r(_s):

     Gordon Smyth

_R_e_f_e_r_e_n_c_e_s:

     Lu, J., Tomfohr, J. K., and Kepler, T. B. (2005). Identifying
     differential expression in multiple SAGE libraries: an
     overdispersed log-linear model approach. _BMC Bioinformatics_, 6,
     165.

     Smyth, G. K. (1996). Partitioned algorithms for maximum likelihood
     and other nonlinear estimation. _Statistics and Computing_, 6,
     201-216.

_S_e_e _A_l_s_o:

     'sage.test'

_E_x_a_m_p_l_e_s:

     # True value for dispersion is 1/size = 2/3
     # Note: the Pearson method tends to under-estimate the dispersion
     library(statmod)
     set.seed(1)
     y <- matrix(rnbinom(10*4, mu=4, size=1.5), 10, 4)
     lib.size <- rep(50000, 4)
     group <- c(1, 1, 2, 2)
     fit <- fitNBP(y, group=group, lib.size=lib.size)
     # Log-ratio of group 2 to group 1 for each tag
     logratio <- fit$coefficients %*% c(-1, 1)
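
     The comment that the true dispersion is 1/size follows from the
     mean/size parametrization of rnbinom(), under which
     Var(y) = mu + mu^2/size = mu*(1 + phi*mu) with phi = 1/size. The
     simulation check below uses only base R and the same mu and size as
     the example.

```r
# Empirical variance of NB draws versus mu * (1 + phi * mu) with phi = 1/size
set.seed(3)
mu <- 4
size <- 1.5
y <- rnbinom(1e5, mu = mu, size = size)
phi <- 1 / size
c(empirical = var(y), theoretical = mu * (1 + phi * mu))
```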

