FunCluster            package:FunCluster            R Documentation

_F_u_n_c_t_i_o_n_a_l _P_r_o_f_i_l_i_n_g _o_f _c_D_N_A _M_i_c_r_o_a_r_r_a_y _E_x_p_r_e_s_s_i_o_n _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     FunCluster performs a functional analysis of microarray expression
     data based on Gene Ontology & KEGG annotations. FunCluster is
     designed to build functional classes of putatively co-regulated
     biological processes through a specially designed clustering
     procedure relying on expression data and functional annotations.

_U_s_a_g_e:

               FunCluster(wd = "", org = "HS", go.direct = FALSE, clusterm = "cc",
                          compare = "common.correl.genes", corr.met = "greedy",
                          corr.th = 0.85, two.lists = TRUE, restrict = FALSE,
                          alpha = 0.05, location = FALSE, details = FALSE)
               

_A_r_g_u_m_e_n_t_s:

      wd: sets the working directory where the expression data files
          are to be found and where results are to be stored.

     org: indicates the biological species from which the transcript
          expression data originate; currently only three
          possibilities are available with FunCluster: "HS" for human
          expression data, "MM" for mouse (Mus musculus) expression
          data and "SC" for yeast (Saccharomyces cerevisiae)
          expression data. Default value is "HS".

two.lists: possible values are TRUE if a discriminatory functional
          analysis of two lists of transcripts is required (e.g.
          significantly up-regulated transcripts versus down-regulated
          transcripts) or FALSE if only one list of transcripts is to
          be analyzed. For a differential analysis of two lists of
          transcripts, FunCluster expects to find in the working
          directory two tab-separated text files containing the
          transcript expression data, named "up.txt" and "down.txt"
          respectively (these file names are mandatory). For a single
          list of transcripts, FunCluster expects to find in its
          working directory a single tab-separated text file named
          "genes.txt". Please see the example dataset for the format
          of the data files. The default value of this parameter is
          TRUE.

restrict: possible values are TRUE if a reference list of transcripts
          is provided for the statistical significance calculation of
          the transcript enrichment of the biological annotations, or
          FALSE if no such restriction is imposed and the transcript
          enrichment significance is therefore estimated with respect
          to the whole genome. The purpose of this parameter is to
          correct the enrichment significance calculations for
          situations in which expression data is not available for the
          whole genome but only for a fraction of it, either because
          of microarray processing errors which limit the number of
          transcripts available for analysis, or because a dedicated
          microarray, designed to scan only a fraction of the genome,
          was used. When such a restriction is needed, a tab-separated
          text file named "ref.txt" should be provided, containing the
          list of all the transcripts initially available for the
          analysis (after filtering for missing data). The transcripts
          should be identified only by their LocusLink ID number or by
          their EntrezGene ID number. The default value for this
          parameter is FALSE.
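
          As a rough illustration of why the reference list matters,
          the sketch below computes an enrichment p-value against two
          different backgrounds. It assumes a one-sided hypergeometric
          test, a common choice for this kind of calculation; the test
          actually used by FunCluster is not stated on this page. The
          helper 'enrichment_p' and all counts are hypothetical.

```r
# Hypothetical counts, invented for illustration only.
# Assumption: enrichment is assessed with a one-sided hypergeometric test;
# this help page does not state which test FunCluster itself applies.
enrichment_p <- function(k, n, K, N) {
  # k: annotated transcripts in the analyzed list
  # n: size of the analyzed list
  # K: annotated transcripts in the background
  # N: size of the background
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)  # P(X >= k)
}

# Background = transcripts available on a dedicated array ("ref.txt")
p_ref <- enrichment_p(k = 15, n = 100, K = 50, N = 1000)

# Background = whole genome, where the same annotation is relatively rarer
p_genome <- enrichment_p(k = 15, n = 100, K = 400, N = 20000)

print(c(restricted = p_ref, whole.genome = p_genome))
```

          With these invented counts the whole-genome background gives
          the smaller p-value, so ignoring the restriction would
          overstate the enrichment.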

go.direct: if TRUE, restricts the transcript enrichment calculations
          for the GO (Gene Ontology) annotations to directly annotated
          transcripts only, without taking into account the
          ontological lattice and the subsuming relations inside Gene
          Ontology. Default value is FALSE, which means that, when
          calculating the transcript enrichment significance of a GO
          functional annotation, directly annotated transcripts are
          considered together with transcripts annotated by the
          directly subsumed terms within the ontological lattice.
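
          The difference between the two settings can be sketched on
          a toy ontology. The term names, gene names and the helper
          'genes_for' below are hypothetical and are not part of
          FunCluster; the sketch only follows the description above
          (a term plus its directly subsumed terms).

```r
# Toy data, invented for illustration: transcripts directly annotated by
# each term, and the terms directly subsumed by each term.
direct <- list(
  term_A = c("g1", "g2"),
  term_B = c("g2", "g3"),
  term_C = c("g4")
)
children <- list(
  term_A = c("term_B"),
  term_B = c("term_C"),
  term_C = character(0)
)

# Hypothetical helper: transcripts counted for a term under each setting.
genes_for <- function(term, go.direct) {
  own <- direct[[term]]
  if (go.direct) {
    return(own)  # directly annotated transcripts only
  }
  # add transcripts annotated by the directly subsumed terms
  from_children <- unlist(direct[children[[term]]], use.names = FALSE)
  unique(c(own, from_children))
}

print(genes_for("term_A", go.direct = TRUE))
print(genes_for("term_A", go.direct = FALSE))
```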

 compare: selects the approach used to identify, compare and group
          functional annotations that share a significant number of
          highly co-expressed transcripts. The default value is
          "common.correl.genes", which implies that a detailed search
          for shared co-expressed transcripts is performed (this is
          computationally very expensive and requires enough
          microarray samples for correlation calculations on
          transcript expression profiles). If "common.genes" is
          selected, only the transcripts commonly annotated by the two
          terms are considered when comparing two functional
          annotations, without taking transcript expression into
          account. If "correl.mean.exp" is selected, the comparison of
          two functional annotations is based only on the correlation
          of their "mean expression profiles", computed for each
          annotation as a vector of the mean expression levels of its
          annotated transcripts in each available microarray sample.
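
          The two simpler comparison modes can be sketched in base R
          as follows. The data are invented, and the use of Spearman
          correlation for the mean profiles is an assumption carried
          over from the description of "corr.th"; this is not
          FunCluster's own code.

```r
# Toy expression matrix: rows are transcripts, columns are microarray samples.
expr <- rbind(
  g1 = c(1, 2, 3, 4, 5),
  g2 = c(2, 3, 5, 6, 9),
  g3 = c(9, 7, 6, 3, 1),
  g4 = c(8, 8, 5, 4, 2)
)
# Transcripts annotated by two hypothetical functional annotations
annot_a <- c("g1", "g2", "g3")
annot_b <- c("g2", "g3", "g4")

# "common.genes": only the commonly annotated transcripts, expression ignored
shared <- intersect(annot_a, annot_b)

# "correl.mean.exp": one mean expression profile per annotation, i.e. the
# mean over its annotated transcripts for each sample
profile_a <- colMeans(expr[annot_a, , drop = FALSE])
profile_b <- colMeans(expr[annot_b, , drop = FALSE])

# Assumption: Spearman correlation between the two mean profiles
rs <- cor(profile_a, profile_b, method = "spearman")

print(shared)
print(rs)
```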

 corr.th: sets the correlation threshold used to search for and build
          clusters of highly co-expressed transcripts with the greedy
          approach. The default value, based on currently available
          literature data, is 0.85, corresponding to a Spearman
          correlation coefficient Rs >= 0.85.

corr.met: indicates the procedure to be used to build transcript
          expression clusters. It is used only if "compare" is set to
          "common.correl.genes". Two values are possible:
          "hierarchical" will use a hierarchical agglomerative
          procedure combined with Silhouette computing; "greedy" will
          use an original greedy clustering procedure conditioned by a
          correlation threshold specified by "corr.th" to ensure
          homogeneity of clusters.
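
          The details of FunCluster's greedy procedure are not given
          on this page. The sketch below is therefore only a generic
          threshold-based greedy clustering, written to show how a
          correlation threshold can keep clusters homogeneous; the
          function 'greedy_clusters' and the toy data are
          hypothetical.

```r
# Generic greedy clustering sketch (NOT FunCluster's own algorithm):
# each unassigned transcript seeds a cluster, and another unassigned
# transcript joins it only if its Spearman correlation with every
# current member is at least corr.th.
greedy_clusters <- function(expr, corr.th = 0.85) {
  rs <- cor(t(expr), method = "spearman")  # transcript-by-transcript Rs
  ids <- rownames(expr)
  cluster <- setNames(rep(NA_integer_, length(ids)), ids)
  k <- 0L
  for (seed in ids) {
    if (!is.na(cluster[[seed]])) next
    k <- k + 1L
    members <- seed
    cluster[[seed]] <- k
    for (cand in ids) {
      if (!is.na(cluster[[cand]])) next
      if (all(rs[cand, members] >= corr.th)) {
        members <- c(members, cand)
        cluster[[cand]] <- k
      }
    }
  }
  cluster
}

# Toy data: t1 and t2 rise across samples, t3 and t4 fall
expr <- rbind(
  t1 = c(1, 2, 3, 4, 5, 6),
  t2 = c(2, 4, 5, 8, 9, 12),
  t3 = c(6, 5, 4, 3, 2, 1),
  t4 = c(10, 8, 7, 5, 3, 1)
)
cl <- greedy_clusters(expr, corr.th = 0.85)
print(cl)
```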

clusterm: selects the algorithm used to group terms (biological
          annotations) having significant transcript enrichment within
          the analyzed data, in order to build functional classes of
          putatively co-regulated biological functions. Default value
          is "cc" and it should not be modified.

   alpha: sets the significance threshold (alpha) applied to the
          p-values resulting from statistical calculations concerning
          transcript enrichment of biological annotations. Default
          value is 0.05.

location: enables an analysis of the transcript enrichment of genome
          locations based on available genome location data
          (chromosome and cytoband transcript locations). If TRUE is
          selected it provides two lists, one containing chromosome
          transcript enrichment data and the other cytoband transcript
          enrichment data, separately for each list of analyzed
          transcripts. Default value is FALSE.

 details: specifies whether intermediate results (detailed annotation
          data) are to be saved. Default value is FALSE.

_D_e_t_a_i_l_s:

     FunCluster can be used with the currently available R
     distributions (tested with versions later than 2.0.0), either
     under Microsoft Windows (tested with Windows XP) or, preferably,
     under Linux (tested with Fedora Core 3 and 4 and Suse Linux
     10.0). Please be aware that a FunCluster analysis involves a
     large amount of computation, so high processing power and good
     stability of the operating system are absolute requirements.

     Together with the FunCluster algorithm this package also provides:

      1. GO and KEGG annotations (as of February 2006) automatically
     extracted from their respective web resources.

      2. The routine for the automated extraction and update of the
     functional annotations from their respective web resources. The
     use of this routine is simple: 'annotations(date.annot = "")'.
     Under common circumstances this routine will provide up-to-date
     annotations, stored in environment variables and directly
     formatted for FunCluster's use. Errors may occur when using this
     routine if the GO annotations for the current month are not yet
     available. In case of extraction errors, most often explained by
     a delay in updating the GO web servers, the release date can be
     indicated explicitly (see 'annotations').

      3. The two test data sets used for the JBCB paper (see examples
     below). The first data set is related to the dichotomous
     functional analysis of the genes specifically expressed within
     adipocytes and stroma vascular fraction (SVF) cells, extracted
     from adipose tissue of morbidly obese subjects (see the submitted
     paper and cited reference for further details). Two lists of
     transcripts significantly expressed within adipocytes and SVF
     cells respectively are provided, together with the list of all
     initial transcripts available for the analysis (necessary for the
     accurate computation of transcript enrichment during the
     automated annotation of transcript expression data performed by
     FunCluster). The second data set is structured in a similar way
     and contains the hyperinsulinemic muscle clamp expression data.

     The format of the data files must be respected in order to
     perform a successful analysis. All the files are tab-separated
     text files, which can be easily obtained from Excel data. The
     only transcript identification system accepted for FunCluster
     analysis is EntrezGene GeneIDs. Please see more details on this
     choice in the JBCB paper. The transcript expression data within
     the tab-separated text files is organized in rows, one for each
     transcript, and columns: the first column contains the transcript
     identifier and each remaining column contains the expression
     level of that transcript in one of the available microarray
     samples. See test data and JBCB paper for more details.
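
     A minimal sketch of writing such an input file from R is given
     below. The GeneID values, expression levels and folder name are
     invented, and the absence of a header row is an assumption; the
     test data shipped with the package remain the reference for the
     exact layout.

```r
# Hypothetical working directory and data, for illustration only.
wd <- file.path(tempdir(), "funcluster_demo")
dir.create(wd, showWarnings = FALSE)

# One row per transcript: first column is the EntrezGene GeneID,
# remaining columns are expression levels in each microarray sample.
up <- data.frame(
  GeneID  = c(1001, 1002, 1003),   # invented identifiers
  sample1 = c(7.2, 5.1, 9.8),
  sample2 = c(7.5, 4.9, 9.1),
  sample3 = c(6.9, 5.4, 9.6)
)

# Tab-separated text file; writing no header row is an assumption here.
up_file <- file.path(wd, "up.txt")
write.table(up, up_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the file back to check its shape
back <- read.delim(up_file, header = FALSE)
print(dim(back))
```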

     The results of the FunCluster analysis of transcript expression
     data are stored as HTML files in the "Results" subfolder of the
     working folder. For each type of available biological annotation
     and for each list of transcript expression data to be analyzed
     (one or two), FunCluster provides a ranked list of the
     significant functional clusters observed, stored in a separate
     file. Detailed findings on the terminological composition and
     transcript enrichment significance of the resulting functional
     clusters are provided.

_R_e_f_e_r_e_n_c_e_s:

     1. Henegar C, Cancello R, Rome S, Vidal H, Clement K, Zucker JD.
     Clustering biological annotations and gene expression data to
     identify putatively co-regulated biological processes. J Bioinform
     Comput Biol. 2006 Aug; 4(4) (in press).

     2. Cancello R, Henegar C, Viguerie N, Taleb S, Poitou C, Rouault
     C, Coupaye M, Pelloux V, Hugol D, Bouillot JL, Bouloumie A,
     Barbatelli G, Cinti S, Svensson PA, Barsh GS, Zucker JD, Basdevant
     A, Langin D, Clement K. Reduction of macrophage infiltration and
     chemoattractant gene expression changes in white adipose tissue
     of morbidly obese subjects after surgery-induced weight loss.
     Diabetes 2005; 54(8):2277-86.

     3. FunCluster website: <URL:
     http://corneliu.henegar.info/FunCluster.htm>

_S_e_e _A_l_s_o:

     'cluster, annotations'.

_E_x_a_m_p_l_e_s:

               ## Not run: 
               ## load adipose tissue data (see Diabetes and JBCB papers for details)
               data(adipose)

               ## or load hyperinsulinemic muscle clamp data (see JBCB paper for details)
               data(insulin)

               ## most common use
               FunCluster(go.direct = FALSE, alpha = 0.05, clusterm = "cc",
                                   org = "HS", location = FALSE, compare = 
                                   "common.correl.genes", corr.th = 0.85, 
                                   corr.met = "greedy", two.lists = TRUE, 
                                   restrict = TRUE)
               
               ## when only GO direct annotations are to be used and detailed
               ## findings are needed
               FunCluster(go.direct = TRUE, alpha = 0.05, clusterm = "cc",
                                   org = "HS", location = FALSE, compare = 
                                   "common.correl.genes", corr.th = 0.85, 
                                   corr.met = "greedy", two.lists = TRUE, 
                                   restrict = TRUE, details = TRUE)
               
               ## hierarchical agglomerative clustering and Silhouette computations
               ## can be used for the preliminary step of building clusters of
               ## co-expressed transcripts
               FunCluster(go.direct = TRUE, alpha = 0.05, clusterm = "cc",
                                   org = "HS", location = FALSE, compare = 
                                   "common.correl.genes", corr.th = 0.85, 
                                   corr.met = "hierarchical", 
                                   two.lists = TRUE, restrict = TRUE)

               ## use only common annotated transcripts for the annotation clustering  
               FunCluster(go.direct = FALSE, alpha = 0.05, clusterm = "cc",
                                   org = "HS", location = FALSE, compare = 
                                   "common.genes", two.lists = TRUE, 
                                   restrict = TRUE)
               ## End(Not run)

