query                 package:seqinr                 R Documentation

_T_o _g_e_t _a _l_i_s_t _o_f _s_e_q_u_e_n_c_e _n_a_m_e_s _f_r_o_m _a_n _A_C_N_U_C _d_a_t_a _b_a_s_e _l_o_c_a_t_e_d _o_n _t_h_e _w_e_b

_D_e_s_c_r_i_p_t_i_o_n:

     This is a major command of the package. It executes all sequence
     retrievals using any selection criteria the data base allows. The
     sequences come from an ACNUC data base located on the web and
     are transferred by socket. The command produces the list of
     all sequence names that fit the required criteria. The sequence
     names belong to the class 'SeqAcnucWeb'.

_U_s_a_g_e:

     query(listname, query, socket = "auto", invisible = TRUE, verbose = FALSE, virtual = FALSE)

_A_r_g_u_m_e_n_t_s:

listname: The name of the list as a quoted string of chars

   query: A quoted string of chars containing the request with the
          syntax given in the details section

  socket: a socket of class connection and sockconn returned by
          'choosebank'. The default value ("auto") means that the
          socket will be set to the socket component of the
          banknameSocket variable.

invisible: if 'FALSE', the result is returned visibly.

 verbose: if 'TRUE', verbose mode is on

 virtual: if 'TRUE', no attempt is made to retrieve the information
          about all the elements of the list. In this case, the 'req'
          component of the list is set to  'NA'.

_D_e_t_a_i_l_s:

     Each selection criterion is written using the following syntax:

_c = _c_r_i_t_e_r_i_o_n _v_a_l_u_e where c indicates which criterion is used.  Many
     selection criteria are available. They correspond mainly to the
     structured elements of the sequence documentation in the data
     banks, and are detailed below. Criteria can be combined
     using 3 logical operations:

     criterion1 ET criterion2 : logical AND (sequences that fit
     criteria 1 and 2  simultaneously).

     criterion1 OU criterion2 : logical OR (sequences that fit at least
     one of both criteria).

     NO criterion1 : logical negation (sequences that do not fit
     criterion 1).

     Parentheses can be used to delimit the range of operations. Lists
     of sequences can be re-used at will, which is very convenient for
     breaking complex requests into simple ones. For instance,
     here are two equivalent ways to get all coding sequences from
     _Escherichia coli_ that are not partial:

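     # First way: a single request
     s <- choosebank("genbank")

     query("final", "sp=escherichia coli ET t=cds ET NO k=partial",
           socket = s$socket)

     # Second way: three simpler requests that re-use intermediate lists
     s <- choosebank("genbank")

     query("eco", "sp=escherichia coli", socket = s$socket)

     query("ecocds", "eco ET t=cds", socket = s$socket)

     query("final", "ecocds ET NO k=partial", socket = s$socket)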
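     The logical operations and parentheses described above can also be
     assembled programmatically. The following is a minimal sketch that
     only builds the request string and does not contact the server.
     The helper names 'acnuc_and', 'acnuc_or' and 'acnuc_not' are
     hypothetical and are not part of seqinr.

```r
# Hypothetical helpers (NOT part of seqinr): they only build request strings.
acnuc_and <- function(...) paste(..., sep = " ET ")
acnuc_or  <- function(...) paste0("(", paste(..., sep = " OU "), ")")
acnuc_not <- function(criterion) paste("NO", criterion)

# E. coli sequences that are either CDS or tRNA, and are not partial
request <- acnuc_and(
  "sp=escherichia coli",
  acnuc_or("t=cds", "t=trna"),
  acnuc_not("k=partial")
)
print(request)

# The string could then be passed to query(); this step needs a network
# connection, so it is left commented out:
# s <- choosebank("genbank")
# query("final", request, socket = s$socket)
```

     Wrapping every OU group in parentheses makes the intended range
     of each operation explicit in the final request.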


_S_P = _s_p_e_c_i_e_s _n_a_m_e sequences from given (group of) species.             
     The special character @ can be used to match any group of
     characters in the species name, e.g. SP=RATTUS@. Use of spaces
     is allowed. Examples: ESCHERICHIA COLI, @COLI, E@COLI. Species
     names are tree-structured according to the biological
     classification of species.
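     To get a feel for what the @ wildcard matches, it can be emulated
     locally with a regular expression. This is only an illustration of
     the matching rule: the real matching is done by the ACNUC server,
     which also follows the species tree, something this sketch
     ignores. The helper 'at_match' is hypothetical, the species vector
     is a small local sample, and case-insensitive matching is an
     assumption.

```r
# Hypothetical helper (NOT part of seqinr): emulate '@' with a regex.
at_match <- function(pattern, species) {
  # Turn '@' into the glob wildcard '*', then let glob2rx build the regex
  rx <- utils::glob2rx(gsub("@", "*", pattern, fixed = TRUE))
  grepl(rx, species, ignore.case = TRUE)  # case-insensitivity is assumed
}

# Small local sample, not data retrieved from a bank
species <- c("ESCHERICHIA COLI", "RATTUS NORVEGICUS", "RATTUS RATTUS")

species[at_match("RATTUS@", species)]
species[at_match("@COLI", species)]
species[at_match("E@COLI", species)]
```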

_K = _k_e_y_w_o_r_d sequences having a given keyword. Since keywords are    
     tree-structured, as are species, you will also select all sequences
     associated with keywords further down in the tree. (@ can be used
     to match any group of characters.)

_R = _r_e_f_e_r_e_n_c_e _c_o_d_e sequences from a given reference. References are
     specified as follows depending on the type of document:

         Document                   Format                        Example
         Journal article            journal_code/volume/1st_page  jme/34/17
         Book                       book/year/1st_author          book/1980/broker
         Thesis                     thesis/year/1st_author        thesis/1984/wildgruber
         Patent                     patent/patent_coded_number    patent/ep0238993
         Unpublished, or submitted  unpubl/year/1st_author        unpubl/1993/cho
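     The formats in the table above are slash-separated fields, so a
     reference criterion can be assembled from its parts. The sketch
     below is offline and only builds strings. The helper
     'ref_criterion' is hypothetical (not part of seqinr), and the
     lowercase "r=" prefix is assumed by analogy with the "sp=" and
     "t=" requests shown earlier.

```r
# Hypothetical helper (NOT part of seqinr): join fields with "/" and
# prefix the result with the reference criterion "r=".
ref_criterion <- function(...) {
  paste0("r=", tolower(paste(..., sep = "/")))
}

# Values taken from the Example column of the table
article <- ref_criterion("jme", 34, 17)
book    <- ref_criterion("book", 1980, "broker")
patent  <- ref_criterion("patent", "ep0238993")
print(c(article, book, patent))
```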

_J = _j_o_u_r_n_a_l _n_a_m_e sequences published in a given journal.                                        

_Y = _y_e_a_r sequences published in given year (e.g. 1982).

_Y > _y_e_a_r sequences published after or during a given year.

_Y < _y_e_a_r sequences published before or during a given year.

_A_U = _a_u_t_h_o_r sequences published by given author(s). Use @ to specify
     any letters in name (e.g. @ORMOND@ for Van Ormondt). Only last
     names are indexed - initials are ignored. All authors of journal
     articles are indexed. Only the first author of books, theses,
     patents and other documents is indexed.


_T = _s_e_q_u_e_n_c_e _t_y_p_e sequences of given type. You generally obtain 
     subsequences with this criterion because types are for example
     tRNA,  rRNA or protein gene. Type should not be confused with
     molecule which denotes the chemical nature of the sequenced
     molecule (_e.g._, DNA, mRNA, tRNA). Type is defined only for the
     nucleotide sequence banks. Presently the existing types are:

       ID        Locus entry                             (EMBL, SWISS-PROT, NRSub)
       LOCUS     Locus entry                             (GenBank, Hovergen, EMGLib)
       CDS       .PE protein coding region               (all)
       RRNA      .RR mature ribosomal RNA                (all)
       TRNA      .TR mature transfer RNA                 (all)
       MISC_RNA  .RN other structural RNA coding region  (EMBL, GenBank, Hovergen, NRSub, EMGLib)
       SNRNA     .SN small nuclear RNA                   (EMBL, GenBank, Hovergen, EMGLib)
       SCRNA     .SC small cytoplasmic RNA               (EMBL, GenBank, Hovergen, NRSub, EMGLib)
       3'INT     .3I 3' intron                           (Hovergen)
       3'NCR     .3F 3' non-coding region                (Hovergen)
       5'INT     .5I 5' intron                           (Hovergen)
       5'NCR     .5F 5' non-coding region                (Hovergen)
       CPG       .CG CpGobs/CpGexp>0.5                   (Hovergen)
       INT_INT   .IN internal intron                     (Hovergen)

     Each entry of a FEATURE TABLE describing a coding region of a DNA
     fragment gives rise to a subsequence equal to the fragments
     described in the location of the feature. The type of the
     resulting subsequence equals the key of the corresponding feature
     table entry. The name of the resulting subsequence is  built by
     adding to the parent sequence's name an extension uniquely
     identifying this particular feature. 

     Sequences of a given type are generally subsequences, _i.e._,
     fragments of parent sequences, except if the coding region covers
     totally the parent sequence, in which case ACNUC does not create a
     subsequence.

_O = _o_r_g_a_n_e_l_l_e sequences from a given organelle.  Organelle (_e.g._,
     chloroplast, mitochondrion) denotes the nature of the genome that
     harbors a particular gene. By extension, ACNUC also sees the
     nucleus as an organelle. Also, a nuclear-encoded gene coding for a
     protein exported to an organelle is considered as a nuclear gene.
     The existing organelles are:

       CHLOROPLAST    Chloroplast genome    (EMBL, GenBank, NBRF, Hovergen)
       MITOCHONDRION  Mitochondrial genome  (EMBL, GenBank, NBRF, Hovergen)
       KINETOPLAST    Kinetoplast genome    (EMBL, GenBank, Hovergen)
       NUCLEAR        Nuclear genome        (all)

_M = _m_o_l_e_c_u_l_e _n_a_m_e sequences with given chemical structure.  In ACNUC,
     molecule denotes the chemical nature of the sequenced molecule
     (_e.g._, DNA, mRNA, tRNA).  Molecule should not be confused with
     type which identifies the encoded molecule (_e.g._, protein, tRNA,
     rRNA). Thus the sequence of a tRNA gene has DNA for molecule
     because DNA rather than tRNA was sequenced. The subsequence
     covering the tRNA region has tRNA for type because this is the
     nature of the encoded product. Molecule is defined only for the
     nucleotide sequence banks (GenBank, EMBL, Hovergen, NRSub, and
     CGDB). Presently the existing molecules are: 

       DNA   Sequenced molecule is DNA    (all)
       RNA   Sequenced molecule is RNA    (all)
       MRNA  Sequenced molecule is mRNA   (GenBank, Hovergen)
       RRNA  Sequenced molecule is rRNA   (GenBank, Hovergen)
       TRNA  Sequenced molecule is tRNA   (GenBank, Hovergen)
       URNA  Sequenced molecule is snRNA  (GenBank, Hovergen)

_N = _s_e_q_u_e_n_c_e _n_a_m_e sequence of given name.

_A_C = _a_c_c_e_s_s_i_o_n _n_u_m_b_e_r sequences of given accession number.

_F = _f_i_l_e _n_a_m_e sequences whose names are in a specified file.                                     

_F_A = _f_i_l_e _n_a_m_e sequences whose accesion numbers are in a specified
     file.

_V_a_l_u_e:

     A list with the following components: 

    bank: the name of the bank that has been chosen by
          'choosebank.socket'

    call: original call

    name: list name

   nelem: number of elements in the list on the server

typelist: the type of the elements of the list. Can be SQ for a list
          of sequence names, KW for a list of keywords, SP for a list
          of species names.

     req: a list of sequence names that fit the required criteria, or
          'NA' when called with 'virtual = TRUE'
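     The components listed above can be inspected like those of any R
     list. The following offline sketch uses a mock object in place of
     a real result. Every value in it is a made-up placeholder, the
     elements of 'req' are plain strings rather than 'SeqAcnucWeb'
     objects, and the helper 'is_virtual' is hypothetical.

```r
# Mock result with the documented component names.
# All values are made-up placeholders, NOT real output from a bank.
res <- list(
  bank     = "genbank",
  call     = quote(query("ecoli", "sp=escherichia coli@")),
  name     = "ecoli",
  nelem    = 3L,
  typelist = "SQ",
  req      = list("PLACEHOLDER1", "PLACEHOLDER2", "PLACEHOLDER3")
)

# Hypothetical helper: TRUE when 'req' holds the single value NA,
# as described for virtual = TRUE
is_virtual <- function(x) {
  length(x$req) == 1L && isTRUE(is.na(x$req[[1]]))
}

is_virtual(res)             # the mock list carries its elements

res_virtual <- res
res_virtual$req <- NA       # what the 'req' component is documented to hold
is_virtual(res_virtual)
```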

_N_o_t_e:

     Most of the documentation was imported from ACNUC help files
     written by Manolo Gouy

_A_u_t_h_o_r(_s):

     J.R. Lobry & D. Charif

_R_e_f_e_r_e_n_c_e_s:

     To get the release date and content of all the databases located
     at the PBIL, please see the following URL: <URL:
     http://pbil.univ-lyon1.fr/search/releases.php>
      Gouy, M., Milleret, F., Mugnier, C., Jacobzone, M., Gautier, C.
     (1984) ACNUC: a nucleic acid sequence data base and analysis
     system.  _Nucl. Acids Res._, *12*:121-127.
      Gouy, M., Gautier, C., Attimonelli, M., Lanave, C., Di Paola, G.
     (1985)  ACNUC - a portable retrieval system for nucleic acid
     sequence databases: logical and physical designs and usage.
     _Comput. Appl. Biosci._, *1*(3):167-172.
      Gouy, M., Gautier, C., Milleret, F. (1985) System analysis and
     nucleic acid sequence banks. _Biochimie_, *67*:433-436. 

     'citation("seqinr")'

_S_e_e _A_l_s_o:

     'choosebank', 'getSequence', 'plot.SeqAcnucWeb'

_E_x_a_m_p_l_e_s:

      ## Not run: s <- choosebank("genbank")
      ## Not run: query("ecoli", "sp=escherichia coli@", socket = s$socket)
      ## Not run: ecoli
      # To get the first 4 sequence names
      ## Not run: ecoli$req[1:4]
      # To get the 5th sequence name
      ## Not run: ecoli$req[[5]]
      # To see the original call
      ## Not run: ecoli$call

