selectPrototypes            package:GOSim            R Documentation

_H_e_u_r_i_s_t_i_c _s_e_l_e_c_t_i_o_n _o_f _p_r_o_t_o_t_y_p_e_s _a_n_d _d_i_m_e_n_s_i_o_n_a_l_i_t_y _r_e_d_u_c_t_i_o_n _o_f _f_e_a_t_u_r_e _v_e_c_t_o_r_s.

_D_e_s_c_r_i_p_t_i_o_n:

     *  Heuristic selection of prototypes

     *  Dimensionality reduction of feature vectors

_U_s_a_g_e:

     selectPrototypes(n = 250, method = "frequency", data = NULL, verbose = TRUE)

_A_r_g_u_m_e_n_t_s:

       n: number of prototypes or maximum number of clusters 

  method: method to select prototypes or to perform subset selection 

    data: data matrix (l x d) of feature vectors (l = number of genes) 

 verbose: print out information 

_D_e_t_a_i_l_s:

     The following heuristics for automatic selection of prototypes
     are implemented:

"_f_r_e_q_u_e_n_c_y" select the n genes with the highest number of GO
     annotations in the currently selected ontology

"_r_a_n_d_o_m" select n genes uniformly at random from all genes with
     annotations in the currently selected ontology

     The following methods for dimensionality reduction are
     implemented:

     *  "PCA": dimensionality reduction via principal component
        analysis; the number of principal components is determined
        such that at least 95% of the total variance is explained

     *  "clustering": EM-clustering in feature space
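
     The two prototype selection heuristics described above can be
     sketched in base R. This is only an illustration under stated
     assumptions, not the GOSim implementation: the toy 'annotations'
     list and the helper 'select_prototypes_sketch' are hypothetical
     and stand in for the package's internal gene-to-GO mapping.

```r
# Toy gene -> GO term mapping (hypothetical data, not GOSim's internal structure).
annotations <- list(
  g1 = c("GO:0000001", "GO:0000002", "GO:0000003"),
  g2 = c("GO:0000001"),
  g3 = c("GO:0000002", "GO:0000004"),
  g4 = character(0),                       # gene without annotations
  g5 = c("GO:0000001", "GO:0000002", "GO:0000003", "GO:0000005")
)

# Hypothetical helper illustrating the "frequency" and "random" heuristics.
select_prototypes_sketch <- function(annotations, n,
                                     method = c("frequency", "random")) {
  method <- match.arg(method)
  counts <- lengths(annotations)           # number of annotations per gene
  counts <- counts[counts > 0]             # only annotated genes are candidates
  n <- min(n, length(counts))
  if (method == "frequency") {
    # genes with the highest number of annotations come first
    names(sort(counts, decreasing = TRUE))[seq_len(n)]
  } else {
    # uniform sampling without replacement among annotated genes
    sample(names(counts), n)
  }
}

proto_freq <- select_prototypes_sketch(annotations, n = 2, method = "frequency")
set.seed(1)
proto_rand <- select_prototypes_sketch(annotations, n = 2, method = "random")
```

     In this toy example the "frequency" heuristic picks the two most
     heavily annotated genes, while "random" draws two genes from those
     that have at least one annotation.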

_V_a_l_u_e:

     If the function is called to automatically select prototypes, a
     character vector of gene IDs is returned.

     If the function is called to perform dimensionality reduction via
     PCA, the result is a list with items 

"features": original data projected on the first k principal components 

   "pcs": l x k matrix of principal components. Each column is one
          principal component 

"lambda": first k eigenvalues 
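
     The shape of such a result list can be sketched with 'prcomp'
     from the 'stats' package. This is a minimal sketch, not the
     package's own code: the helper 'pca_sketch', the random 'feat'
     matrix and the 'var_explained' argument are hypothetical, and the
     0.95 threshold assumes that the "95" in the Details section refers
     to the share of total variance. In this sketch 'pcs' has one row
     per feature dimension.

```r
set.seed(42)
# Hypothetical feature matrix: l = 20 genes (rows), d = 5 features (columns).
feat <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)

# Hypothetical helper: keep the smallest k whose components explain
# at least `var_explained` of the total variance (assumed criterion).
pca_sketch <- function(data, var_explained = 0.95) {
  pc <- prcomp(data, center = TRUE, scale. = FALSE)
  lambda <- pc$sdev^2                      # eigenvalues of the covariance matrix
  k <- which(cumsum(lambda) / sum(lambda) >= var_explained)[1]
  list(
    features = pc$x[, seq_len(k), drop = FALSE],        # data projected on first k PCs
    pcs      = pc$rotation[, seq_len(k), drop = FALSE], # one principal component per column
    lambda   = lambda[seq_len(k)]                       # first k eigenvalues
  )
}

res <- pca_sketch(feat)
str(res)
```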


     If the function is called to perform clustering in feature space,
     the cluster centers are returned in an l x k matrix (each column
     is one cluster center). The "Mclust" function in the package
     "mclust" is called to perform the clustering. The BIC is used to
     select the optimal number of clusters in the range 2,...,n.

_N_o_t_e:

     The result depends on the currently set ontology ("BP","MF","CC").

_A_u_t_h_o_r(_s):

     Holger Froehlich

_R_e_f_e_r_e_n_c_e_s:

     [1] H. Froehlich, N. Speer, C. Spieth, and A. Zell, Kernel Based
     Functional Gene Grouping, Proc. Int. Joint Conf. on Neural
     Networks (IJCNN), pp. 6886 - 6891, 2006

     [2] N. Speer, H. Froehlich, and A. Zell, Functional Grouping of
     Genes Using Spectral Clustering and Gene Ontology, Proc. Int.
     Joint Conf. on Neural Networks (IJCNN), pp. 298 - 303, 2005

_S_e_e _A_l_s_o:

     'getGeneFeaturesPrototypes', 'getGeneSimPrototypes', 'setOntology'

_E_x_a_m_p_l_e_s:

     ## Not run: 
      # takes too much time in the R CMD check
      # returns a character vector of the 50 genes with the most annotations
      proto <- selectPrototypes(n = 50)
      # compute feature vectors with respect to the prototypes
      feat <- getGeneFeaturesPrototypes(c("207", "208", "7494"), prototypes = proto, pca = FALSE)
      # ... and project them onto the principal components
      selectPrototypes(data = feat$features, method = "pca")
      
     ## End(Not run)

