rpls               package:plsgenomics               R Documentation

_R_i_d_g_e _P_a_r_t_i_a_l _L_e_a_s_t _S_q_u_a_r_e _f_o_r _b_i_n_a_r_y _d_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     The function 'rpls' performs prediction using the RPLS algorithm
     of Fort and Lambert-Lacroix (2005).

_U_s_a_g_e:

     rpls(Ytrain,Xtrain,Lambda,ncomp,Xtest=NULL,NbIterMax=50)

_A_r_g_u_m_e_n_t_s:

  Xtrain: a (ntrain x p) data matrix of predictors. 'Xtrain' must be a
          matrix.  Each row corresponds to an observation and each
          column to a predictor variable.

  Ytrain: a vector of responses of length ntrain. 'Ytrain' must be a
          {1,2}-valued vector containing the response variable for
          each observation.

   Xtest: a (ntest x p) matrix containing the predictors for the test
          data set. 'Xtest' may also be a vector of length p
          (corresponding to a single test observation). If 'Xtest' is
          not NULL, the prediction step is carried out for these new
          predictor variables.

  Lambda: a positive real value. 'Lambda' is the ridge regularization
          parameter.

   ncomp: a non-negative integer. 'ncomp' is the number of PLS
          components. If 'ncomp'=0, ridge regression is performed
          without dimension reduction.

NbIterMax: a positive integer. 'NbIterMax' is the maximal number of
          iterations in the Newton-Raphson parts.
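
     To make the roles of 'Lambda' and 'NbIterMax' concrete, the
     following base-R sketch fits a ridge-penalized logistic
     regression by Newton-Raphson. It is an illustration only, not the
     code used by 'rpls': the helper 'ridge_logit_nr', its arguments
     'lambda', 'max_iter' and 'tol', the unpenalized intercept, the
     scaling of the penalty and the recoding of {1,2} labels to {0,1}
     are all assumptions made for this example.

```r
# Hypothetical helper, not part of plsgenomics: ridge-penalized logistic
# regression fitted by Newton-Raphson. The intercept is left unpenalized
# (an assumption of this sketch).
ridge_logit_nr <- function(X, y01, lambda, max_iter = 50, tol = 1e-8) {
  Z <- cbind(1, X)                        # design matrix with intercept column
  P <- diag(c(0, rep(lambda, ncol(X))))   # ridge penalty, zero for the intercept
  beta <- rep(0, ncol(Z))
  for (iter in seq_len(max_iter)) {
    eta  <- drop(Z %*% beta)              # linear predictor
    prob <- 1 / (1 + exp(-eta))           # fitted probabilities
    w    <- prob * (1 - prob)             # logistic weights
    grad <- crossprod(Z, y01 - prob) - P %*% beta   # penalized score
    hess <- crossprod(Z, Z * w) + P                 # penalized information
    step <- solve(hess, grad)
    beta <- beta + drop(step)
    if (max(abs(step)) < tol) break       # stop early once the update is tiny
  }
  list(coefficients = beta, iterations = iter)
}

# Toy data (hypothetical): 20 observations, 2 predictors, {1,2}-valued labels.
set.seed(42)
X <- matrix(rnorm(40), nrow = 20, ncol = 2)
y <- ifelse(X[, 1] + rnorm(20) > 0, 2, 1)

# Labels are recoded to {0,1} before fitting.
fit_small <- ridge_logit_nr(X, y - 1, lambda = 0.1)
fit_large <- ridge_logit_nr(X, y - 1, lambda = 10)

# Squared norm of the slope coefficients under each penalty.
sum(fit_small$coefficients[-1]^2)
sum(fit_large$coefficients[-1]^2)
```

     In this sketch a larger penalty shrinks the slope coefficients
     toward zero, and 'max_iter' only caps the number of
     Newton-Raphson updates; the loop stops earlier once the update
     becomes negligible.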

_D_e_t_a_i_l_s:

     The columns of the data matrices 'Xtrain' and 'Xtest' need not be
     standardized, since standardizing is performed by the function
     'rpls' as a preliminary step before the algorithm is run.
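
     For readers who want to see what column standardization amounts
     to, here is a minimal base-R sketch. It assumes that the means
     and standard deviations are estimated on the training set and
     reused for the test set; this page does not say which convention
     'rpls' follows, so treat that choice, and the toy data, as
     assumptions.

```r
# Toy data (hypothetical): 6 training and 2 test observations, 3 predictors.
set.seed(1)
Xtrain <- matrix(rnorm(18, mean = 5, sd = 2), nrow = 6, ncol = 3)
Xtest  <- matrix(rnorm(6,  mean = 5, sd = 2), nrow = 2, ncol = 3)

# Column means and standard deviations estimated on the training set only.
mu    <- colMeans(Xtrain)
sigma <- apply(Xtrain, 2, sd)

# Center and scale both sets with the training statistics.
sXtrain <- sweep(sweep(Xtrain, 2, mu, "-"), 2, sigma, "/")
sXtest  <- sweep(sweep(Xtest,  2, mu, "-"), 2, sigma, "/")

# Check: each training column now has mean 0 and standard deviation 1.
round(colMeans(sXtrain), 10)
apply(sXtrain, 2, sd)
```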

     The procedure described in Fort and Lambert-Lacroix (2005) is used
     to determine the latent components to be used for classification.
     When 'Xtest' is not NULL, the procedure also predicts the labels
     for these new predictor variables.

_V_a_l_u_e:

     A list with the following components: 

   Ytest: the vector of length ntest containing the predicted labels
          for the observations from 'Xtest'.

Coefficients: the vector of length p+1 containing the coefficients
          weighting the design matrix.

DeletedCol: the vector containing the numbers of the columns of
          'Xtrain' whose variance is zero. If there is no such
          column, 'DeletedCol'=NULL.

    hatY: If 'ncomp' is greater than 1, 'hatY' is a matrix of size
          ntest x ncomp whose kth column contains the predicted
          labels obtained with k PLS components.
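
     The 'DeletedCol' component can be pictured with the base-R sketch
     below, which finds the zero-variance columns of a toy matrix. The
     variable names ('colVar', 'deleted', 'Xkept') and the data are
     hypothetical, and the way 'rpls' detects such columns internally
     may differ.

```r
# Toy training matrix (hypothetical) in which column 2 is constant.
Xtrain <- cbind(c(1.2, 0.7, 3.1, 2.4),
                c(5, 5, 5, 5),
                c(0.3, 0.9, 0.1, 0.5))

# Indices of the columns whose variance is zero, or NULL if there are none.
colVar  <- apply(Xtrain, 2, var)
deleted <- which(colVar == 0)
if (length(deleted) == 0) deleted <- NULL

# Keep only the columns that actually vary.
Xkept <- if (is.null(deleted)) Xtrain else Xtrain[, -deleted, drop = FALSE]
deleted
dim(Xkept)
```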

_A_u_t_h_o_r(_s):

     Sophie Lambert-Lacroix (<URL:
     http://www-lmc.imag.fr/lmc-sms/Sophie.Lambert>).

_R_e_f_e_r_e_n_c_e_s:

     G. Fort and S. Lambert-Lacroix (2005). Classification using
     Partial Least Squares with Penalized Logistic Regression,
     Bioinformatics, vol. 21, no. 8, 1104-1111.

_S_e_e _A_l_s_o:

     'rpls.cv', 'mrpls', 'mrpls.cv'.

_E_x_a_m_p_l_e_s:

     # load plsgenomics library
     library(plsgenomics)

     # load Colon data
     data(Colon)
     IndexLearn <- c(sample(which(Colon$Y==2),12),sample(which(Colon$Y==1),8))

     # preprocess data
     res <- preprocess(Xtrain=Colon$X[IndexLearn,], Xtest=Colon$X[-IndexLearn,],
                       Threshold=c(100,16000), Filtering=c(5,500),
                       log10.scale=TRUE, row.stand=TRUE)
     # the results are given in res$pXtrain and res$pXtest

     # perform prediction by RPLS
     resrpls <- rpls(Ytrain=Colon$Y[IndexLearn],Xtrain=res$pXtrain,Lambda=0.6,ncomp=1,Xtest=res$pXtest)
     resrpls$hatY
     sum(resrpls$Ytest!=Colon$Y[-IndexLearn])

     # prediction for another sample
     Xnew <- res$pXtest[1,]
     # compute the linear predictor of the second class; the first class
     # is the reference, with linear predictor 0
     eta <- c(1,Xnew) %*% resrpls$Coefficients
     Ypred <- which.max(c(0,eta))
     Ypred
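
     The decision rule 'which.max(c(0,eta))' used above can be
     checked on made-up numbers. The sketch below uses a hypothetical
     coefficient vector (intercept first) and two hypothetical
     observations, and compares the rule with thresholding the
     logistic probability of class 2 at 0.5; none of these values come
     from the Colon data.

```r
# Hypothetical coefficient vector (intercept first) and two new observations.
coefs <- c(-0.5, 1.2, -0.8)
Xnew  <- rbind(c(1.0, 0.2),
               c(-0.3, 0.9))

# Linear predictor for class 2; class 1 is the reference with predictor 0.
eta <- drop(cbind(1, Xnew) %*% coefs)

# Rule from the example above, applied row by row.
label_argmax <- apply(cbind(0, eta), 1, which.max)

# Equivalent rule: logistic probability of class 2 thresholded at 0.5.
prob2      <- 1 / (1 + exp(-eta))
label_prob <- ifelse(prob2 > 0.5, 2L, 1L)

label_argmax
label_prob
```

     Both rules assign label 2 exactly when the linear predictor is
     strictly positive, so they agree on every observation.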

