wilma                package:supclust                R Documentation

_S_u_p_e_r_v_i_s_e_d _C_l_u_s_t_e_r_i_n_g _o_f _P_r_e_d_i_c_t_o_r _V_a_r_i_a_b_l_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Performs supervised clustering of predictor variables for large
     (microarray gene expression) datasets. It uses a greedy forward
     search strategy and optimizes a combination of the Wilcoxon and
     margin statistics to find the clusters.
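
     The greedy forward search can be pictured with the base R sketch
     below. It is a simplified illustration under stated assumptions,
     not the supclust implementation: the helpers 'wil_score',
     'margin_stat' and 'greedy_cluster' are hypothetical, 'x' is
     assumed to be sign-flipped already (class 1 tending to have larger
     values), only one cluster is grown, and the backward steps that
     'wilma' performs between forward steps are omitted. The Wilcoxon
     score counts the (class 0, class 1) pairs in which the class-1
     value does not exceed the class-0 value, so 0 means perfect
     separation; the margin is the smallest class-1 value minus the
     largest class-0 value. As an assumption, the score is the primary
     criterion and the margin only breaks ties.

```r
## Hypothetical helpers illustrating Wilma-style statistics (not supclust code).

## Wilcoxon score: number of (class 0, class 1) pairs where the class-1
## value is <= the class-0 value. 0 = perfect separation, class 1 higher.
wil_score <- function(z, y) {
  sum(outer(z[y == 0], z[y == 1], function(a, b) b <= a))
}

## Margin: smallest class-1 value minus largest class-0 value.
## Positive values mean the two classes are perfectly separated.
margin_stat <- function(z, y) {
  min(z[y == 1]) - max(z[y == 0])
}

## Grow one cluster greedily; assumes x is already sign-flipped.
## Criterion (assumption): minimize the score, break ties by a larger margin.
greedy_cluster <- function(x, y) {
  scores  <- apply(x, 2, wil_score, y = y)
  margins <- apply(x, 2, margin_stat, y = y)
  clust <- order(scores, -margins)[1]          # best single variable
  best  <- c(scores[clust], -margins[clust])   # (score, -margin): lower is better
  repeat {
    cand <- setdiff(seq_len(ncol(x)), clust)   # each variable enters at most once
    if (length(cand) == 0) break
    ## criterion of the averaged representative for every candidate addition
    crit <- sapply(cand, function(j) {
      m <- rowMeans(x[, c(clust, j), drop = FALSE])
      c(wil_score(m, y), -margin_stat(m, y))
    })
    k <- order(crit[1, ], crit[2, ])[1]        # best candidate
    improves <- crit[1, k] < best[1] ||
      (crit[1, k] == best[1] && crit[2, k] < best[2])
    if (!improves) break                       # stop when nothing improves
    clust <- c(clust, cand[k])
    best  <- crit[, k]
  }
  clust
}
```

     Calling greedy_cluster(x, y) returns the column indices in the
     order in which they entered the cluster. Two variables that each
     misorder one pair can still be merged, because their average may
     separate the classes perfectly.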

_U_s_a_g_e:

     wilma(x, y, noc, genes = NULL, flip = TRUE, once.per.clust = FALSE, trace = 0)

_A_r_g_u_m_e_n_t_s:

       x: Numeric matrix of explanatory variables (p variables in
          columns, n cases in rows), for example microarray gene
          expression data whose genes are to be clustered.

       y: Numeric vector of length n containing the class labels of the
          individuals. The labels must be coded as 0 and 1.

     noc: Integer, the number of clusters to search for in the data.

   genes: Defaults to 'NULL'. An optional list (of length 'noc') of
          vectors containing the indices (column numbers) of the
          previously known initial clusters.

    flip: Logical, defaults to 'TRUE'. Indicates whether the
          clustering should be done with or without sign-flipping.

once.per.clust: Logical, defaults to 'FALSE'. Indicates whether each
          variable (gene) may enter each cluster at most once;
          equivalently, the cluster mean profile has only weights +/- 1
          for each variable.

   trace: Integer >= 0; when positive, output from the internal loops
          is printed; 'trace >= 2' also prints output from the internal
          C routines.
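
     Sign-flipping makes all variables point in the same direction
     before they are averaged into a cluster: a variable whose values
     tend to be lower in class 1 is multiplied by -1. The sketch below
     assumes, as an illustration rather than a description of
     supclust's source, that a column is flipped when its Wilcoxon
     score exceeds n0*n1/2, half of the largest possible score; the
     function 'flip_signs' is hypothetical.

```r
## Hypothetical sign-flipping sketch (not the supclust implementation).
## Wilcoxon score of column z: number of (class 0, class 1) pairs where
## the class-1 value is <= the class-0 value.
## Assumption: flip z when its score exceeds n0 * n1 / 2, i.e. when
## class-1 values tend to be lower than class-0 values.
flip_signs <- function(x, y) {
  n0 <- sum(y == 0)
  n1 <- sum(y == 1)
  score <- apply(x, 2, function(z) {
    sum(outer(z[y == 0], z[y == 1], function(a, b) b <= a))
  })
  signs <- ifelse(score > n0 * n1 / 2, -1, 1)
  ## multiply every column by its sign
  list(signs = signs, x = sweep(x, 2, signs, `*`))
}
```

     The returned 'signs' vector is meant to mirror the 'signs'
     component of a "wilma" object described under Value; with
     'flip = FALSE' all signs would simply stay at +1.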

_V_a_l_u_e:

     'wilma' returns an object of class "wilma". The functions 'print'
     and 'summary' give an overview of the clusters that have been
     found. The function 'plot' yields a two-dimensional projection
     into the space of the first two clusters that 'wilma' found. The
     generic function 'fitted' returns the fitted values, which are the
     cluster representatives. Finally, 'predict' classifies test data on
     the basis of Wilma's clusters, using either the nearest-neighbor
     rule, diagonal linear discriminant analysis, logistic regression
     or aggregated trees.
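
     The simplest of these classifiers, the nearest-neighbor rule, can
     be sketched as follows: each test case, described by its values
     on the cluster representatives, receives the label of the closest
     training case in Euclidean distance. This is a hypothetical
     one-nearest-neighbor sketch ('nn_classify' is not a supclust
     function) and does not reproduce the package's 'predict' method,
     whose distance and tie handling are not documented here.

```r
## Hypothetical 1-nearest-neighbor rule on cluster representatives.
## train: n x k matrix of training representatives, ytrain: 0/1 labels,
## test:  m x k matrix of test representatives.
nn_classify <- function(train, ytrain, test) {
  train <- as.matrix(train)
  test  <- as.matrix(test)
  apply(test, 1, function(row) {
    d <- colSums((t(train) - row)^2)  # squared Euclidean distance to each case
    ytrain[which.min(d)]              # label of the nearest training case
  })
}

train  <- matrix(c(0, 1, 9, 10), ncol = 1)
ytrain <- c(0, 0, 1, 1)
nn_classify(train, ytrain, matrix(c(2, 8), ncol = 1))
```

     Here the test value 2 lies closest to the training value 1 and
     gets label 0, while 8 lies closest to 9 and gets label 1.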

     An object of class "wilma" is a list containing: 

   clist: A list of length 'noc', containing integer vectors consisting
          of the indices (column numbers) of the variables (genes) that
          have been clustered.

   steps: Numerical vector of length 'noc', showing the number of
          forward/backward cycles in the fitting process of each
          cluster.

       y: Numeric vector of length n containing the class labels of the
          individuals, coded as 0 and 1.

 x.means: A list of length 'noc', containing numerical matrices
          consisting of the cluster representatives after insertion of
          each variable.

     noc: Integer, the number of clusters searched for in the data.

   signs: Numeric vector of length p indicating whether the ith
          variable (gene) is sign-flipped (-1) or not (+1).
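
     The difference between the two settings of 'once.per.clust' can be
     illustrated with a cluster stored as a vector of column indices in
     which a variable may appear more than once. Assuming, for this
     sketch only, that a cluster representative is the plain average of
     the sign-flipped columns and that a variable which entered k times
     is counted k times, such a variable carries weight +/- k instead of
     +/- 1 before dividing by the cluster size. The helper
     'cluster_mean' is hypothetical, not part of supclust.

```r
## Hypothetical: representative of a cluster given as column indices 'idx',
## where repeated indices are allowed (as with once.per.clust = FALSE).
cluster_mean <- function(x, idx, signs) {
  xs <- sweep(x[, idx, drop = FALSE], 2, signs[idx], `*`)  # sign-flip each entry
  rowMeans(xs)                                              # repeats count repeatedly
}

x     <- cbind(c(1, 2), c(3, 4))
signs <- c(1, 1)
cluster_mean(x, c(1, 1, 2), signs)          # variable 1 twice: weights 2/3, 1/3
cluster_mean(x, unique(c(1, 1, 2)), signs)  # at most once: weights 1/2, 1/2
```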

_A_u_t_h_o_r(_s):

     Marcel Dettling, dettling@stat.math.ethz.ch

_R_e_f_e_r_e_n_c_e_s:

     Marcel Dettling (2002) _Supervised Clustering of Genes_, see <URL:
     http://stat.ethz.ch/~dettling/supercluster.html>

     Marcel Dettling and Peter Bühlmann (2002). Supervised Clustering
     of Genes. _Genome Biology_, *3*(12): research0069.1-0069.15.

     Marcel Dettling and Peter Bühlmann (2004). Finding Predictive Gene
     Groups from Microarray Data. To appear in the _Journal of
     Multivariate Analysis_.

_S_e_e _A_l_s_o:

     'score', 'margin', and for a newer methodology, 'pelora'.

_E_x_a_m_p_l_e_s:

     ## Working with a "real" microarray dataset
     data(leukemia, package="supclust")

     ## Generating random test data: 3 observations and 250 variables (genes)
     set.seed(724)
     xN <- matrix(rnorm(750), nrow = 3, ncol = 250)

     ## Fitting Wilma
     fit  <- wilma(leukemia.x, leukemia.y, noc = 3, trace = 1)

     ## Working with the output
     fit
     summary(fit)
     plot(fit)
     fitted(fit)

     ## Class predictions and fitted values for the training data
     predict(fit, type = "cla")
     predict(fit, type = "fitt")

     ## Predicting fitted values and class labels for test data
     predict(fit, newdata = xN)
     predict(fit, newdata = xN, type = "cla", classifier = "nnr", noc = c(1,2,3))
     predict(fit, newdata = xN, type = "cla", classifier = "dlda", noc = c(1,3))
     predict(fit, newdata = xN, type = "cla", classifier = "logreg")
     predict(fit, newdata = xN, type = "cla", classifier = "aggtrees")

