leukemia             package:plsgenomics             R Documentation

_G_e_n_e _e_x_p_r_e_s_s_i_o_n _d_a_t_a _f_r_o_m _G_o_l_u_b _e_t _a_l. (_1_9_9_9)

_D_e_s_c_r_i_p_t_i_o_n:

     Gene expression data (3051 genes and 38 tumor mRNA samples) from
     the leukemia microarray study of Golub et al. (1999).

_U_s_a_g_e:

     data(leukemia)

_D_e_t_a_i_l_s:

_V_a_l_u_e:

     A list with the following elements: 

       X: a (38 x 3051) matrix giving the expression levels of 3051 
          genes for 38 leukemia patients. Each row corresponds to a
          patient, each column to a gene.

       Y: a numeric vector of length 38 giving the cancer class of each
          patient, coded as 1 or 2.

gene.names: a matrix with one row per gene, containing the names of the
          3051 genes in the gene expression matrix 'X'. The three
          columns correspond to the gene 'index', 'ID', and 'Name',
          respectively.

_S_o_u_r_c_e:

     The dataset was taken from the R package multtest. The data are
     described in Golub et al. (1999) and can be freely downloaded from
     <URL: http://www-genome.wi.mit.edu/MPR/>.

_R_e_f_e_r_e_n_c_e_s:

     S. Dudoit, J. Fridlyand and T. P. Speed (2002). Comparison of
     discrimination methods for the classification of tumors using gene
     expression data, Journal of the American Statistical Association
     *97*, 77-87.  

     Golub et al. (1999). Molecular classification of cancer: class
     discovery and class prediction by gene expression monitoring,
     Science *286*, 531-537.

_E_x_a_m_p_l_e_s:

     # load plsgenomics library
     library(plsgenomics)

     # load data set
     data(leukemia)

     # how many samples and how many genes?
     dim(leukemia$X)

     # how many samples of class 1 and class 2, respectively?
     sum(leukemia$Y == 1)
     sum(leukemia$Y == 2)
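
     The three list elements are described as sharing dimensions: rows
     of 'X' line up with 'Y', and columns of 'X' line up with rows of
     'gene.names'. The sketch below checks that and tabulates the
     classes in one call. It is an illustration only: it assumes the
     plsgenomics package is installed and that the loaded object matches
     the description on this page.

```r
# Illustrative sketch; assumes plsgenomics is installed.
library(plsgenomics)
data(leukemia)

# One row of X per patient, so the row count should equal length(Y).
stopifnot(nrow(leukemia$X) == length(leukemia$Y))

# One row of gene.names per gene, i.e. one per column of X.
stopifnot(nrow(leukemia$gene.names) == ncol(leukemia$X))

# Class counts in a single call instead of one sum() per class.
table(leukemia$Y)

# Annotation (index, ID, Name) for the first three genes.
head(leukemia$gene.names, 3)
```

     Using table() avoids hard-coding the class labels, which helps if
     the coding of 'Y' is not known in advance.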

