dinucleotides             package:seqinr             R Documentation

_S_t_a_t_i_s_t_i_c_a_l _o_v_e_r- _a_n_d _u_n_d_e_r- _r_e_p_r_e_s_e_n_t_a_t_i_o_n _o_f _d_i_n_u_c_l_e_o_t_i_d_e_s _i_n _a
_s_e_q_u_e_n_c_e

_D_e_s_c_r_i_p_t_i_o_n:

     These two functions compute two different statistics that measure
     dinucleotide over- and under-representation: the rho statistic
     and the z-score, each computed for all 16 dinucleotides.

_U_s_a_g_e:

     rho(sequence)
     zscore(sequence, simulations = NULL, modele, ... )

_A_r_g_u_m_e_n_t_s:

sequence: A nucleic acid sequence, given as a vector of single
          characters as in the examples.

simulations: If 'NULL', the analytical solution is computed when
          available (models 'base' and 'codon'). Otherwise, it should
          be the number of permutations used for the z-score
          computation.

  modele: A string of characters naming the model chosen for the
          random generation: 'base', 'position', 'codon' or
          'syncodon' (see Details).

     ...: Optional parameters for specific model permutations, passed
          on to the 'permutation' function.

_D_e_t_a_i_l_s:

     The 'rho' statistic, as presented in Karlin S., Cardon LR. (1994),
     can be computed on each of the 16 dinucleotides. It is the
     frequency of dinucleotide _xy_ divided by the product of the
     frequencies of nucleotide _x_ and nucleotide _y_. It is equal to
     1.00 when dinucleotide _xy_ is formed by pure chance, and it is
     greater (respectively less) than 1.00 when dinucleotide _xy_ is
     over- (respectively under-) represented.
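
     The definition above can be sketched in base R for a single
     dinucleotide. This is only an illustration: 'rho_by_hand' is a
     hypothetical helper, not the code used by 'rho', and it assumes
     dinucleotides are counted over all overlapping pairs of adjacent
     bases, which may differ from what the package does.

```r
# Hypothetical helper, not seqinr's implementation.
# s is a vector of single characters; x and y are single bases.
rho_by_hand <- function(s, x, y) {
  n <- length(s)
  # Frequency of x immediately followed by y among the n - 1 adjacent pairs
  f_xy <- mean(s[-n] == x & s[-1] == y)
  # Single-nucleotide frequencies over the whole sequence
  f_x <- mean(s == x)
  f_y <- mean(s == y)
  f_xy / (f_x * f_y)
}

s <- strsplit("acgtacgtaacc", "")[[1]]
rho_by_hand(s, "a", "c")  # (3/11) / ((4/12) * (4/12)) = 27/11, about 2.45
```

     Here 'ac' occurs in 3 of the 11 adjacent pairs while 'a' and 'c'
     each make up a third of the bases, so the value is well above 1.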

     The 'zscore' statistic is presented in Palmeira, L., Guguen, L.
     and Lobry JR. (in prep.). It is the normalization of the 'rho'
     statistic by its expectation and variance under a given random
     sequence generation model, and follows the standard normal
     distribution. This statistic can be computed with several models
     (cf. 'permutation' for the description of each of the models).
     Analytical solutions are provided for two of them: the 'base'
     permutation model and the 'codon' permutation model.
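
     As a rough illustration of this normalization, the expectation
     and variance can be estimated by simulation instead of
     analytically. The sketch below is an assumption-laden example in
     base R: 'rho_xy' and 'z_by_simulation' are hypothetical helpers,
     the null model is a plain shuffle of all bases without
     replacement, and none of it is taken from the 'zscore' source.

```r
# Hypothetical helpers, not seqinr's implementation.
rho_xy <- function(s, x, y) {
  n <- length(s)
  mean(s[-n] == x & s[-1] == y) / (mean(s == x) * mean(s == y))
}

z_by_simulation <- function(s, x, y, nsim = 500) {
  observed <- rho_xy(s, x, y)
  # Null distribution: rho recomputed on shuffled copies of the sequence
  null_rho <- replicate(nsim, rho_xy(sample(s), x, y))
  # Centre by the simulated mean, scale by the simulated standard deviation
  (observed - mean(null_rho)) / sd(null_rho)
}

set.seed(1)
s <- sample(c("a", "c", "g", "t"), 3000, replace = TRUE)
z_by_simulation(s, "c", "g")
```

     For a sequence drawn at random, as here, the result should
     usually be a moderate value around zero; a sequence enriched in
     'cg' would give a large positive value.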

     The 'base' model allows for random sequence generation by
     shuffling (with/without replacement) of all bases in the sequence.
     Analytical computation is available for this model.

     The 'position' model allows for random sequence generation by
     shuffling (with/without replacement) of bases within their
     position in the codon (bases in position I, II or III stay in
     position I, II or III in the new sequence).
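
     A minimal base R sketch of this kind of shuffle, without
     replacement, might look as follows. 'shuffle_by_position' is a
     hypothetical name used for illustration only; the package's own
     implementation is in 'permutation' and may differ.

```r
# Hypothetical helper, not seqinr's implementation.
shuffle_by_position <- function(s) {
  stopifnot(length(s) %% 3 == 0)
  pos <- rep(1:3, length.out = length(s))  # codon position of each base
  out <- s
  for (p in 1:3) {
    idx <- which(pos == p)
    # Permute the bases found at codon position p among themselves
    out[idx] <- s[idx][sample.int(length(idx))]
  }
  out
}

set.seed(1)
s <- strsplit("atggcaaaagattaa", "")[[1]]
paste(shuffle_by_position(s), collapse = "")
```

     The base composition at each of the three codon positions is
     unchanged; only the order of bases within each position varies.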

     The 'codon' model allows for random sequence generation by
     shuffling (with/without replacement) of codons. Analytical
     computation is available for this model.
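
     The codon shuffle can be sketched in base R by treating the
     sequence as a matrix with one codon per column. 'shuffle_codons'
     is a hypothetical helper covering only the case without
     replacement, and is not the package's code.

```r
# Hypothetical helper, not seqinr's implementation.
shuffle_codons <- function(s) {
  stopifnot(length(s) %% 3 == 0)
  codons <- matrix(s, nrow = 3)  # one codon per column
  # Permute whole columns so each codon stays intact
  shuffled <- codons[, sample.int(ncol(codons)), drop = FALSE]
  as.vector(shuffled)
}

set.seed(1)
s <- strsplit("atggcaaaagattaa", "")[[1]]
paste(shuffle_codons(s), collapse = "")
```

     Codon usage is preserved exactly, while dinucleotides spanning
     two adjacent codons are free to change.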

     The 'syncodon' model allows for random sequence generation by
     shuffling (with/without replacement) of synonymous codons.
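
     The idea can be illustrated in base R with a toy codon table.
     Everything named here is an assumption made for the example:
     'aa_of' covers only the eight codons used below (alanine, lysine
     and aspartate in the standard genetic code), and
     'shuffle_synonymous' is a hypothetical helper, not the package's
     implementation.

```r
# Toy codon-to-amino-acid map, limited to the codons used in this example.
aa_of <- c(gct = "A", gcc = "A", gca = "A", gcg = "A",
           aaa = "K", aag = "K", gat = "D", gac = "D")

# Hypothetical helper, not seqinr's implementation.
shuffle_synonymous <- function(s, aa_of) {
  stopifnot(length(s) %% 3 == 0)
  codons <- apply(matrix(s, nrow = 3), 2, paste, collapse = "")
  aa <- unname(aa_of[codons])
  stopifnot(!anyNA(aa))  # every codon must be in the toy table
  out <- codons
  for (a in unique(aa)) {
    idx <- which(aa == a)
    # Permute codons only among positions coding for the same amino acid
    out[idx] <- codons[idx][sample.int(length(idx))]
  }
  unlist(strsplit(paste(out, collapse = ""), ""))
}

set.seed(1)
s <- strsplit("gctaaagatgccaaggacgcagcg", "")[[1]]
paste(shuffle_synonymous(s, aa_of), collapse = "")
```

     Both the encoded protein and the overall codon usage are left
     unchanged by this shuffle.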

_V_a_l_u_e:

     a table containing the computed statistic for each dinucleotide

_A_u_t_h_o_r(_s):

     Leonor Palmeira

_R_e_f_e_r_e_n_c_e_s:

     'citation("seqinr")'

     Karlin S. and Cardon LR. (1994) Computational DNA sequence
     analysis. _Annu Rev Microbiol_, *48*, 619-54.

     Palmeira, L., Guguen, L. and Lobry JR. (in prep) UV-targeted
     dinucleotides are not depleted in light-exposed Prokaryotic
     genomes.

_S_e_e _A_l_s_o:

     'permutation'

_E_x_a_m_p_l_e_s:

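     library(seqinr)
     # Random DNA sequence of 6000 bases, as a vector of single characters
     sequence <- sample(s2c("acgt"), 6000, replace = TRUE)
     rho(sequence)
     # z-scores with simulations = NULL (analytical 'base' and 'codon' models)
     zscore(sequence, modele = "base")
     zscore(sequence, modele = "codon")
     # z-scores estimated from 1000 permutations under the 'syncodon' model
     zscore(sequence, simulations = 1000, modele = "syncodon")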

