oriloc                package:seqinr                R Documentation

_P_r_e_d_i_c_t_i_o_n _o_f _o_r_i_g_i_n _a_n_d _t_e_r_m_i_n_u_s _o_f _r_e_p_l_i_c_a_t_i_o_n _i_n _b_a_c_t_e_r_i_a

_D_e_s_c_r_i_p_t_i_o_n:

     This program finds the putative origin and terminus of
     replication in prokaryotic genomes. The program works with
     unannotated sequences and therefore uses glimmer2 outputs to
     discriminate between codon positions.

_U_s_a_g_e:

     oriloc(seq.fasta = system.file("sequences/ct.fasta", package = "seqinr"),
       g2.coord = system.file("sequences/ct.coord", package = "seqinr"),
       oldoriloc = FALSE, gbk = NULL, clean.tmp.files = TRUE, rot = 0)

_A_r_g_u_m_e_n_t_s:

seq.fasta: the name of a file which contains the DNA sequence of a
          bacterial chromosome in fasta format

g2.coord: the name of a file which contains the output of the
          glimmer2 program

oldoriloc: Logical; set to TRUE to reproduce the (deprecated) outputs
          of the previous version of the oriloc program (publication
          date: 2000)

     gbk: the URL of a file in GenBank format

clean.tmp.files: Logical, if TRUE temporary files are removed 

     rot: Integer, zero by default, used to circularly permute the
          genome.
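
     As a rough illustration of what a circular permutation by 'rot'
     positions means, the base R sketch below rotates a vector of
     nucleotides. The helper name 'rotate_seq' is hypothetical and is
     not part of seqinr, and the direction of rotation shown here is an
     assumption rather than a description of what oriloc() does
     internally.

```r
# Hypothetical helper (not part of seqinr): circularly permute a sequence
# so that it starts 'rot' positions further along the original sequence.
rotate_seq <- function(x, rot = 0) {
  n <- length(x)
  if (n == 0) return(x)
  k <- rot %% n                 # wrap rot into the range 0..n-1
  if (k == 0) return(x)         # nothing to do for a full turn
  c(x[(k + 1):n], x[1:k])       # tail first, then the displaced head
}

dna <- c("a", "c", "g", "t", "a")
rotated <- rotate_seq(dna, rot = 2)
```

     Rotating by a multiple of the sequence length returns the sequence
     unchanged, which is what one expects for a circular chromosome.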

_D_e_t_a_i_l_s:

     The method builds on the fact that there are compositional
     asymmetries between the leading and the lagging strand for
     replication. The program works with unannotated sequences in fasta
     format and therefore uses glimmer2.0 outputs to discriminate
     between codon positions so as to increase the signal/noise ratio.
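
     To make the role of codon positions concrete, here is a minimal
     base R sketch that extracts the third codon positions of a single
     CDS and computes the T - A and C - G differences for that gene. The
     toy sequence is invented, and the sketch assumes a direct-strand
     CDS already in frame; it is not the code used by oriloc().

```r
# Toy CDS (invented for illustration), direct strand, in frame.
cds <- strsplit("atggcaacttcgggctaa", "")[[1]]

# Keep only the third position of each codon.
third <- cds[seq(3, length(cds), by = 3)]

# Count each nucleotide; the factor levels guarantee all four appear.
counts <- table(factor(third, levels = c("a", "c", "g", "t")))

# Per-gene steps along the two axes of the DNA walk.
ta_step <- counts[["t"]] - counts[["a"]]   # T - A
cg_step <- counts[["c"]] - counts[["g"]]   # C - G
```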

_V_a_l_u_e:

     A data.frame with seven columns: 'g2num' for the CDS number in the
     'g2.coord' file, 'start.kb' for the start position of the CDS
     expressed in Kb (this is the position of the first occurrence of a
     nucleotide in a CDS _regardless_ of its orientation), 'end.kb' for
     the last position of a CDS, 'CDS.excess' for the DNA walk for gene
     orientation (+1 for a CDS in the direct strand, -1 for a CDS in
     the reverse strand) cumulated over genes, 'skew' for the
     cumulated composite skew in third codon positions, 'x' for the
     cumulated T - A skew in third codon positions, 'y' for the
     cumulated C - G skew in third codon positions.
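
     The cumulated columns can be pictured as running sums over genes.
     The sketch below builds them from a made-up per-CDS table; the
     input column names ('strand', 't3', 'a3', 'c3', 'g3') are
     hypothetical, and using raw count differences as steps is an
     assumption made for illustration.

```r
# Toy per-CDS table (invented values, not oriloc() output).
cds <- data.frame(
  strand = c(1, 1, -1, -1, 1),    # +1 direct strand, -1 reverse strand
  t3 = c(10, 12, 8, 7, 11),       # T count in third codon positions
  a3 = c(8, 9, 10, 9, 9),         # A count in third codon positions
  c3 = c(5, 6, 9, 8, 4),          # C count in third codon positions
  g3 = c(7, 7, 6, 6, 7)           # G count in third codon positions
)

# DNA walk for gene orientation, cumulated over genes.
cds$CDS.excess <- cumsum(cds$strand)

# Cumulated T - A and C - G differences in third codon positions.
cds$x <- cumsum(cds$t3 - cds$a3)
cds$y <- cumsum(cds$c3 - cds$g3)
```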

_N_o_t_e:

     The method works only for genomes having a single origin of
     replication from which replication is bidirectional. To detect
     the composition changes, a DNA-walk is performed. In a
     2-dimensional DNA walk, a C in the sequence corresponds to a
     movement in the positive y-direction and a G to a movement in the
     negative y-direction. T and A are mapped by analogous steps along
     the x-axis. When there is a strand asymmetry, this forms a
     trajectory that turns at the origin and terminus of replication.
     Each step is the sum of nucleotides in a gene in third codon
     positions. Orthogonal regression is then used to fit a line
     through this trajectory. Each point in the trajectory has a
     corresponding point on the line, and the coordinates of each are
     calculated. Thereafter, the distances from each of these points to
     the origin (of the plane) are calculated. These distances
     represent a form of cumulative skew. This permits us to make a
     plot with the gene position (gene number, start or end position)
     on the x-axis and the cumulative skew (distance) on the y-axis.
     Depending on where the sequence starts, such a plot will display
     one or two peaks. A positive peak indicates the origin, and a
     negative peak the terminus. In the case of only one peak, the
     sequence starts at the origin or terminus site.
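
     The steps described in this note can be sketched in base R on
     simulated data. This is an illustration under stated assumptions,
     not the oriloc() implementation: the per-gene steps are invented,
     orthogonal regression is obtained here from the first principal
     axis returned by prcomp(), and a signed position along the fitted
     line (measured from the projection of the plane's origin) stands in
     for the distance described above.

```r
# Simulated per-gene steps (invented): the sign flips halfway through to
# mimic the switch between leading and lagging strand.
set.seed(1)
n <- 200
lead <- rep(c(1, -1), each = n / 2)
dx <- 2 * lead + rnorm(n)     # per-gene T - A difference
dy <- -3 * lead + rnorm(n)    # per-gene C - G difference

# Two-dimensional DNA walk: cumulate the per-gene steps.
x <- cumsum(dx)
y <- cumsum(dy)
xy <- cbind(x, y)

# Orthogonal regression: the first principal axis minimises the sum of
# squared perpendicular distances from the points to the line.
pc <- prcomp(xy, center = TRUE, scale. = FALSE)
v <- pc$rotation[, 1]         # unit direction vector of the line
ctr <- pc$center              # centroid, a point on the line

# Orthogonal projection of each walk point onto the line.
s <- as.vector(sweep(xy, 2, ctr) %*% v)
proj <- cbind(ctr[1] + s * v[1], ctr[2] + s * v[2])

# Signed position along the line relative to the projected plane origin.
s0 <- sum((c(0, 0) - ctr) * v)
skew <- s - s0

# The turning point of the walk shows up as the extremum of the skew.
# The sign of v is arbitrary, hence the absolute value.
peak <- which.max(abs(skew))
```

     Plotting 'skew' against the gene index with plot(skew, type = "l")
     should show a single peak near the simulated switch point, as
     expected for a sequence that starts close to one of the two sites.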

_A_u_t_h_o_r(_s):

     J.R. Lobry and A.C. Frank

_R_e_f_e_r_e_n_c_e_s:

     The original paper for oriloc:
      Frank, A.C., Lobry, J.R. (2000) Oriloc: prediction of replication
     boundaries in unannotated bacterial chromosomes. _Bioinformatics_,
      *16*:560-561.
      <URL: http://bioinformatics.oupjournals.org/cgi/reprint/16/6/560>

     A simple informal introduction to DNA-walks:
      Lobry, J.R. (1999) Genomic landscapes. _Microbiology Today_,
     *26*:164-165.
      <URL: http://www.socgenmicrobiol.org.uk/QUA/049906.pdf>

     An early and somewhat historical application of DNA-walks:
      Lobry, J.R. (1996) A simple vectorial representation of DNA
     sequences  for the detection of replication origins in bacteria.
     _Biochimie_, *78*:323-326.

     'citation("seqinr")'

_E_x_a_m_p_l_e_s:

     ## Not run:
     out <- oriloc()
     plot(out$start.kb, out$skew, type = "l", xlab = "Map position in Kb",
       ylab = "Cumulated composite skew",
       main = expression(italic(Chlamydia~~trachomatis)~~complete~~genome))
     ## End(Not run)

