pelora               package:supclust               R Documentation

_S_u_p_e_r_v_i_s_e_d _G_r_o_u_p_i_n_g _o_f _P_r_e_d_i_c_t_o_r _V_a_r_i_a_b_l_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Performs selection and supervised grouping of predictor variables
     in large (microarray gene expression) datasets, with an option for
     simultaneous classification. Uses a greedy forward strategy
     and optimizes the binomial log-likelihood, based on estimated
     conditional probabilities from penalized logistic regression
     analysis.

_U_s_a_g_e:

     pelora(x, y, u = NULL, noc = 10, lambda = 1/32, flip = "pm",
            standardize = TRUE, trace = 1)

_A_r_g_u_m_e_n_t_s:

       x: Numeric matrix of explanatory variables (p variables in
          columns, n cases in rows). For example, these can be
          microarray gene expression data which should be grouped.

       y: Numeric vector of length n containing the class labels of the
          individuals. These labels must be coded as 0 and 1.

       u: Numeric matrix of additional (clinical) explanatory variables
          (m variables in columns, n cases in rows) that are used in
          the (penalized logistic regression) prediction model, but
          neither grouped nor averaged. For example, these can be
          'traditional' clinical variables.

     noc: Integer, defaults to 10. The number of clusters to search for
          in the data.

  lambda: Real, defaults to 1/32. Rescaled penalty parameter that
          should be in [0,1].

    flip: Character string specifying how the 'x' (gene expression)
          matrix should be sign-flipped. The options are '"pm"' (the
          default), where the sign of each variable is determined when
          it enters a group; '"cor"', where the sign of each variable
          is determined a priori as the sign of its empirical
          correlation with the 'y'-vector; and '"none"', where no
          sign-flipping is carried out.

standardize: Logical, defaults to 'TRUE'. Indicates whether the
          predictor variables (genes) should be standardized to zero
          mean and unit variance.

   trace: Integer >= 0; when positive, output from the internal loops
          is printed; 'trace >= 2' also prints output from the
          internal C routines.

_V_a_l_u_e:

     'pelora' returns an object of class "pelora". The functions
     'print' and 'summary' are used to obtain an overview of the
     variables (genes) that have been selected and the groups that have
     been formed. The function 'plot' yields a two-dimensional
     projection into the space of the first two group centroids that
     'pelora' found. The generic function 'fitted' returns the fitted
     values, which are the cluster representatives. 'coef' returns the
     penalized logistic regression coefficients theta_j for each of the
     predictors. Finally, 'predict' is used for classifying test data
     with Pelora's internal penalized logistic regression classifier on
     the basis of the (gene) groups that have been found. 

     An object of class "pelora" is a list containing: 

   genes: A list of length 'noc', containing integer vectors consisting
          of the indices (column numbers) of the variables (genes) that
          have been clustered.

  values: A numerical matrix with dimension n times 'noc', containing
          the fitted values, i.e. the group centroids tilde{x}_j.

       y: Numeric vector of length n containing the class labels of the
          individuals. These labels are coded by 0 and 1.

   steps: Numerical vector of length 'noc', showing the number of
          forward/backward cycles in the fitting process of each
          cluster.

  lambda: The rescaled penalty parameter.

     noc: The number of clusters that were searched for in the data.

      px: The number of columns (genes) in the 'x'-matrix.

    flip: The method that has been chosen for sign-flipping the
          'x'-matrix.

var.type: A factor with 'noc' entries, describing whether the jth
          predictor is a group of predictors (genes) or a single
          (clinical) predictor variable.

    crit: A list of length 'noc', containing numerical vectors that
          provide information about the development of the grouping
          criterion during the clustering.

   signs: Numerical vector of length p, indicating whether the ith
          variable (gene) should be sign-flipped (-1) or not (+1).

samp.names: The names of the samples (rows) in the 'x'-matrix.

gene.names: The names of the variables (columns) in the 'x'-matrix.

    call: The function call.

_A_u_t_h_o_r(_s):

     Marcel Dettling, dettling@stat.math.ethz.ch

_R_e_f_e_r_e_n_c_e_s:

     Marcel Dettling (2003) _Finding Predictive Gene Groups from
     Microarray Data_, see <URL:
     http://stat.ethz.ch/~dettling/supervised.html>

     Marcel Dettling and Peter Bühlmann (2002). Supervised Clustering
     of Genes. _Genome Biology_, *3*(12): research0069.1-0069.15.

     Marcel Dettling and Peter Bühlmann (2004). Finding Predictive Gene
     Groups from Microarray Data. To appear in the _Journal of
     Multivariate Analysis_

_S_e_e _A_l_s_o:

     'wilma' for another supervised clustering technique.

_E_x_a_m_p_l_e_s:

     ## Working with a "real" microarray dataset
     data(leukemia, package="supclust")

     ## Generating random test data: 3 observations and 250 variables (genes)
     set.seed(724)
     xN <- matrix(rnorm(750), nrow = 3, ncol = 250)

     ## Fitting Pelora
     fit <- pelora(leukemia.x, leukemia.y, noc = 3)

     ## Working with the output
     fit
     summary(fit)
     plot(fit)
     fitted(fit)
     coef(fit)

     ## Class labels and class probabilities for the training data
     predict(fit, type = "cla")
     predict(fit, type = "prob")

     ## Predicting fitted values, class labels and class probabilities
     ## for the random test data
     predict(fit, newdata = xN)
     predict(fit, newdata = xN, type = "cla", noc = c(1,2,3))
     predict(fit, newdata = xN, type = "pro", noc = c(1,3))

     ## Fitting Pelora such that the first 70 variables (genes) are not grouped
     fit <- pelora(leukemia.x[, -(1:70)], leukemia.y, u = leukemia.x[, 1:70])

     ## Working with the output
     fit
     summary(fit)
     plot(fit)
     fitted(fit)
     coef(fit)

     ## Class labels and class probabilities for the training data
     predict(fit, type = "cla")
     predict(fit, type = "prob")

     ## Predicting fitted values, class labels and class probabilities
     ## for the random test data
     predict(fit, newdata = xN[, -(1:70)], newclin = xN[, 1:70])
     predict(fit, newdata = xN[, -(1:70)], newclin = xN[, 1:70], type = "cla", noc = 1:10)
     predict(fit, newdata = xN[, -(1:70)], newclin = xN[, 1:70], type = "pro")
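
     ## Illustration only, in base R: a minimal sketch of what the
     ## arguments 'standardize = TRUE' and 'flip = "cor"' describe.
     ## This is not the package's internal code; the objects 'xs', 'ys',
     ## 'xs.std', 'sgn' and 'xs.flip' are hypothetical and used only here.
     set.seed(1)
     xs <- matrix(rnorm(40), nrow = 8, ncol = 5)   # 8 cases, 5 variables
     ys <- rep(c(0, 1), each = 4)                  # class labels coded 0/1

     ## Standardize each column to zero mean and unit variance
     xs.std <- scale(xs)

     ## Sign of the empirical correlation of each variable with the labels
     sgn <- sign(drop(cor(xs.std, ys)))

     ## Multiply each column by its sign, so that every variable is
     ## positively correlated with 'ys' afterwards
     xs.flip <- sweep(xs.std, 2, sgn, "*")
     drop(cor(xs.flip, ys))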

