preprocess            package:plsgenomics            R Documentation

_p_r_e_p_r_o_c_e_s_s _f_o_r _m_i_c_r_o_a_r_r_a_y _d_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     The function 'preprocess' performs a preprocessing of microarray
     data.

_U_s_a_g_e:

     preprocess(Xtrain, Xtest=NULL,Threshold=c(100,16000),Filtering=c(5,500),log10.scale=TRUE,row.stand=TRUE)

_A_r_g_u_m_e_n_t_s:

  Xtrain: a (ntrain x p) data matrix of predictors. 'Xtrain' must be a
          matrix.  Each row corresponds to an observation and each
          column to a predictor variable.

   Xtest: a (ntest x p) matrix containing the predictors for the test
          data set. 'Xtest' may also be a vector of length p
          (corresponding to a single test observation).

Threshold: a vector of length 2 containing the values
          (threshmin,threshmax) used to threshold the data. Values
          below threshmin are set to threshmin and values above
          threshmax are set to threshmax. If 'Threshold' is NULL, or
          if the value given for 'Threshold' is not valid, no
          thresholding is done.

Filtering: a vector of length 2 containing the values (FiltMin,FiltMax)
          used to filter genes. Genes with max/min <= FiltMin and
          (max-min) <= FiltMax are excluded. If 'Filtering' is NULL, or
          if the value given for 'Filtering' is not valid, no filtering
          is done.

log10.scale: a logical value. If TRUE, a log10 transformation is
          applied to the data.

row.stand: a logical value. If TRUE, the data are standardised by row.
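
     The thresholding and filtering rules described above can be
     sketched in base R on a toy matrix. This is only an illustration
     under stated assumptions, not the package's internal code: the
     names 'threshold', 'filtering', 'Xt', 'Xf' and 'excluded' are
     hypothetical, and the exclusion rule is read literally from the
     'Filtering' description (a gene is dropped when both conditions
     hold).

```r
# Toy intensity matrix: 4 observations (rows) x 3 genes (columns).
# matrix() fills column by column, so each line below is one gene.
X <- matrix(c( 50, 120, 20000,   300,    # gene g1
              200, 210,   220,   230,    # gene g2
              100, 900,  4000, 16000),   # gene g3
            nrow = 4, ncol = 3,
            dimnames = list(NULL, c("g1", "g2", "g3")))

threshold <- c(100, 16000)   # (threshmin, threshmax), hypothetical name
filtering <- c(5, 500)       # (FiltMin, FiltMax), hypothetical name

# Thresholding: values below threshmin are raised to threshmin,
# values above threshmax are lowered to threshmax.
Xt <- pmin(pmax(X, threshold[1]), threshold[2])

# Filtering: per-gene maximum and minimum over the observations.
gene_max <- apply(Xt, 2, max)
gene_min <- apply(Xt, 2, min)

# Assumption: literal reading of the documented rule (both conditions).
excluded <- (gene_max / gene_min <= filtering[1]) &
            (gene_max - gene_min <= filtering[2])

Xf <- Xt[, !excluded, drop = FALSE]
colnames(Xf)   # genes that survive the filter
```

     In this toy example the nearly constant gene g2 fails both
     conditions, so it is removed whichever way the two conditions are
     combined.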

_D_e_t_a_i_l_s:

     The pre-processing steps recommended by Dudoit et al. (2002) are
     performed. The default values are those adapted for Colon data.
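
     As a rough illustration of the last two steps, the log10
     transformation and the row standardisation, here is a minimal
     base R sketch on a toy matrix. It assumes that "standardisation
     in row" means centring each observation to mean 0 and scaling it
     to standard deviation 1, and it uses 'sd' (denominator n-1); the
     package's own computation may differ. The names 'Xl', 'Xs',
     'row_mean' and 'row_sd' are hypothetical.

```r
# Toy matrix: 2 observations (rows) x 3 genes (columns).
X <- matrix(c(100, 1000, 10000,
              100,  400, 16000),
            nrow = 2, byrow = TRUE)

# log10 transformation of every entry.
Xl <- log10(X)

# Row standardisation (assumed meaning): subtract each row's mean and
# divide by each row's standard deviation. A vector of length nrow(Xl)
# is recycled down the columns, so element i is applied to row i.
row_mean <- rowMeans(Xl)
row_sd   <- apply(Xl, 1, sd)
Xs <- (Xl - row_mean) / row_sd

rowMeans(Xs)       # numerically zero for every row
apply(Xs, 1, sd)   # numerically one for every row
```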

_V_a_l_u_e:

     A list with the following components: 

 pXtrain: the (ntrain x p') matrix containing the preprocessed train
          data.

  pXtest: the (ntest x p') matrix containing the preprocessed test
          data.

_A_u_t_h_o_r(_s):

     Sophie Lambert-Lacroix (<URL:
     http://www-lmc.imag.fr/lmc-sms/Sophie.Lambert>)  and  Julie Peyre
     (<URL: http://www-lmc.imag.fr/lmc-sms/Julie.Peyre/>).

_R_e_f_e_r_e_n_c_e_s:

     Dudoit, S. and Fridlyand, J. and Speed, T. (2002). Comparison of
     discrimination methods for the classification of tumors using gene
     expression data, Journal of the American Statistical Association,
     97, 77-87.

_E_x_a_m_p_l_e_s:

     # load plsgenomics library
     library(plsgenomics)

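     # load Colon data
     data(Colon)

     # draw a random learning set: 27 observations from class 2
     # and 14 observations from class 1
     IndexLearn <- c(sample(which(Colon$Y == 2), 27),
                     sample(which(Colon$Y == 1), 14))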

     Xtrain <- Colon$X[IndexLearn,]
     Ytrain <- Colon$Y[IndexLearn]
     Xtest <- Colon$X[-IndexLearn,]

     # preprocess data
     resP <- preprocess(Xtrain = Xtrain, Xtest = Xtest,
                        Threshold = c(100, 16000), Filtering = c(5, 500),
                        log10.scale = TRUE, row.stand = TRUE)

     # how many genes remain after preprocessing?
     ncol(resP$pXtrain)

