clusterRepro          package:clusterRepro          R Documentation

_G_e_n_e _e_x_p_r_e_s_s_i_o_n _c_l_u_s_t_e_r_s _r_e_p_r_o_d_u_c_i_b_i_l_i_t_y _a_n_d _v_a_l_i_d_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Validate gene expression clusters by determining whether or not
     they are reproducible in an independent dataset.

_U_s_a_g_e:

     clusterRepro(Centroids, New.data, Number.of.permutations)

_A_r_g_u_m_e_n_t_s:

Centroids: The matrix of centroids with annotated rows.  The labeled
          rows are either genes (for sample clusters) or samples (for
          gene clusters) and the columns are the centroids.

New.data: The matrix of gene expression data (with annotated rows)
          independent of the dataset used to form the centroids.  For
          gene clusters, the rows are samples and the columns are
          genes.  For sample clusters, the rows are genes and the
          columns are samples.

Number.of.permutations: The number of times the centroids will be
          permuted to generate the null distribution.
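
     As an illustration of the layouts described above, the sketch
     below builds a centroid matrix for sample clusters by averaging
     the columns of a labelled training matrix within each cluster.
     The names 'Training', 'Cluster.labels' and 'make.centroids' are
     hypothetical and not part of the package, and averaging is only
     one common way to form centroids.

```r
# Hypothetical helper (not part of clusterRepro): average the columns of
# a training matrix within each cluster to get one centroid per cluster.
make.centroids <- function(Training, Cluster.labels) {
  groups <- split(seq_len(ncol(Training)), Cluster.labels)
  Centroids <- sapply(groups, function(idx)
    rowMeans(Training[, idx, drop = FALSE]))
  rownames(Centroids) <- rownames(Training)
  Centroids
}

set.seed(1)
# Sample clusters: rows are genes, columns are samples.
Training <- matrix(rnorm(10 * 12), nrow = 10,
                   dimnames = list(letters[1:10], NULL))
Cluster.labels <- rep(1:3, each = 4)
Centroids <- make.centroids(Training, Cluster.labels)
dim(Centroids)   # genes by centroids
```

     For gene clusters the same helper could be applied to the
     transposed training matrix, so that the rows are samples and the
     columns being averaged are genes.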

_D_e_t_a_i_l_s:

     This function tests whether gene expression clusters found in one
     dataset are also present in a second, independent dataset.  The
     centroids from the first dataset are used to classify the
     independent data, and the in-group proportion (IGP) of each
     resulting group is computed.  Each IGP is then compared to a null
     distribution of IGPs to produce a p-value.  The null
     distributions are generated by repeatedly permuting the centroids
     within the box aligned with the principal components, classifying
     the independent data with the permuted centroids, and calculating
     the corresponding IGPs.
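
     The classification and IGP steps can be sketched in base R.  This
     is not the package's implementation: it assumes that each column
     of the new data is assigned to the centroid with the highest
     Pearson correlation, and that the IGP of a group is the fraction
     of its members whose nearest neighbour (again by correlation) was
     assigned to the same group.  The function names are hypothetical.

```r
# Hypothetical sketch; assumes Pearson correlation as the similarity.
classify.sketch <- function(Centroids, New.data) {
  common <- intersect(rownames(Centroids), rownames(New.data))
  # Rows: columns of New.data; columns: centroids.
  similarity <- cor(New.data[common, , drop = FALSE],
                    Centroids[common, , drop = FALSE])
  unname(apply(similarity, 1, which.max))
}

igp.sketch <- function(New.data, Class, Number.of.groups) {
  similarity <- cor(New.data)
  diag(similarity) <- -Inf   # a column is not its own neighbour
  nearest <- unname(apply(similarity, 1, which.max))
  agree <- Class[nearest] == Class
  sapply(seq_len(Number.of.groups), function(g) mean(agree[Class == g]))
}

set.seed(1)
Centroids <- matrix(rnorm(30, sd = 10), 10,
                    dimnames = list(letters[1:10], NULL))
# Five noisy copies of each centroid, in order.
New.data <- Centroids[, rep(1:3, each = 5)] + rnorm(150, sd = 0.1)
Class <- classify.sketch(Centroids, New.data)
table(Class)
igp.sketch(New.data, Class, ncol(Centroids))
```

     With noise this small, every column should be assigned to the
     centroid it was copied from, so each group's IGP is expected to
     be at or near 1.  A group that receives no columns yields NaN in
     this sketch.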

_V_a_l_u_e:

Actual.Size: The number of columns of New.data assigned to each
          centroid.

Actual.IGP: The in-group proportions of the groups formed when New.data
          is classified using Centroids.

 p.value: The p-values for each of the groups represented by the
          centroids.

  Number: The number of permutations used to compute the corresponding
          p-value.
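
     To make the relationship between these components concrete, the
     following sketch derives group sizes from a vector of class
     assignments and turns a set of null IGPs into a p-value.  It
     assumes the p-value is the fraction of null IGPs at least as large
     as the observed IGP, a common permutation-test convention that may
     differ in detail from what the package does.  All numbers are made
     up for illustration, and 'p.value.sketch' is a hypothetical name.

```r
# Illustrative values only; not output from clusterRepro.
Class <- rep(1:3, times = c(3, 2, 5))
Size <- tabulate(Class, nbins = 3)   # columns assigned to each centroid
Size

# Hypothetical helper: upper-tail permutation p-value (an assumption).
p.value.sketch <- function(Actual.IGP, Null.IGP) {
  mean(Null.IGP >= Actual.IGP)
}

Null.IGP <- c(0.2, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 0.3, 0.1)
p.value.sketch(0.8, Null.IGP)   # 2 of the 10 null values are >= 0.8
```

     Under this convention the smallest nonzero p-value that can be
     reported is one over the number of null values used.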

_A_u_t_h_o_r(_s):

     Amy Kapp and Robert Tibshirani

_R_e_f_e_r_e_n_c_e_s:

     Amy Kapp and Robert Tibshirani.  Are clusters in one dataset
     present in another dataset?  To be published.

_S_e_e _A_l_s_o:

     'IGP.clusterRepro'.

_E_x_a_m_p_l_e_s:

     ### Generate centroids with annotated rows
     Centroids <- matrix(rnorm(30, sd = 10), 10)
     rownames(Centroids) <- letters[1:nrow(Centroids)]

     ### Generate data with annotated rows
     Data <- cbind(matrix(rep(Centroids[,1], 10), 10),
                   matrix(rep(Centroids[,2], 15), 10),
                   matrix(rep(Centroids[,3], 20), 10))
     Data <- Data + matrix(rnorm(length(Data), sd = 10), nrow(Data))
     rownames(Data) <- letters[1:nrow(Data)]

     ### Classify the data and calculate the corresponding in-group
     ### proportions and group size
     Result <- clusterRepro(Centroids, Data, Number.of.permutations = 1)
     Result$Actual.IGP
     Result$Actual.Size

     ### Generate null distributions and compare to actual in-group
     ### proportions to obtain p-values
     Result2 <- clusterRepro(Centroids, Data, Number.of.permutations = 1000)

     ### If the number of rows in the centroid matrix does not match the
     ### number of rows in the data matrix and the row labels are unique, this
     ### function will only use the rows that the two matrices have in common. 
     Data <- matrix(rnorm(200), 20)
     rownames(Data) <- letters[(nrow(Data)+6):7]
     Result <- IGP.clusterRepro(Data, Centroids)
     Result2 <- clusterRepro(Centroids, Data, Number.of.permutations = 1000)
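
     The row matching in the last example can be checked by hand.  The
     snippet below uses only base R and labels the matrices the same
     way as the example.  It shows which rows the two matrices share,
     without asserting how the package performs the matching
     internally; 'common', 'Centroids.common' and 'Data.common' are
     names introduced here for illustration.

```r
Centroids <- matrix(rnorm(30, sd = 10), 10)
rownames(Centroids) <- letters[1:nrow(Centroids)]   # "a" to "j"
Data <- matrix(rnorm(200), 20)
rownames(Data) <- letters[(nrow(Data) + 6):7]       # "z" down to "g"

# Labels present in both matrices, in the order used by Centroids.
common <- intersect(rownames(Centroids), rownames(Data))
common

# Restricting both matrices to the shared rows puts them in the same order.
Centroids.common <- Centroids[common, , drop = FALSE]
Data.common <- Data[common, , drop = FALSE]
```

     Only four labels ("g" to "j") appear in both matrices, so any
     comparison between them rests on those four rows.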

