gsim               package:plsgenomics               R Documentation

_G_S_I_M _f_o_r _b_i_n_a_r_y _d_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     The function 'gsim' performs prediction using Lambert-Lacroix and
     Peyre's GSIM algorithm.

_U_s_a_g_e:

     gsim(Xtrain, Ytrain, Xtest=NULL, Lambda, hA, hB=NULL, NbIterMax=50)

_A_r_g_u_m_e_n_t_s:

  Xtrain: a (ntrain x p) data matrix of predictors. 'Xtrain' must be a
          matrix.  Each row corresponds to an observation and each
          column to a predictor variable.

  Ytrain: an ntrain vector of responses. 'Ytrain' must be a vector.
          'Ytrain' is a {1,2}-valued vector and contains the response
          variable for each observation.

   Xtest: a (ntest x p) matrix containing the predictors for the test
          data set. 'Xtest' may also be a vector of length p
          (corresponding to only one test observation). If 'Xtest' is
          not NULL, labels are predicted for these new observations.

  Lambda: a positive real value. 'Lambda' is the ridge regularization
          parameter.

      hA: a strictly positive real value. 'hA' is the bandwidth for
          GSIM step A.

      hB: a strictly positive real value. 'hB' is the bandwidth for
          GSIM step B. If 'hB' is NULL, its value is chosen by a
          plug-in method.

NbIterMax: a positive integer. 'NbIterMax' is the maximal number of
          iterations in the Newton-Raphson parts.

_D_e_t_a_i_l_s:

     The columns of the data matrices 'Xtrain' and 'Xtest' need not be
     standardized, since 'gsim' standardizes them as a preliminary
     step before the algorithm is run.

     The procedure described in Lambert-Lacroix and Peyre (2005) is
     used to estimate the projection direction beta. When 'Xtest' is
     not NULL, the procedure also predicts the labels for these new
     observations.

_V_a_l_u_e:

     A list with the following components: 

   Ytest: the ntest vector containing the predicted labels for the
          observations from 'Xtest'.

    beta: the p vector giving the estimated projection direction.

      hB: the value of hB used in step B of GSIM (the value given by
          the user, or the plug-in estimate if the argument was
          NULL).

DeletedCol: the vector of column indices of 'Xtrain' whose predictor
          variables have zero variance. If there is no such column,
          'DeletedCol' is NULL.

     Cvg: the 0-1 value indicating convergence of the algorithm (1 for
          convergence, 0 otherwise).

_A_u_t_h_o_r(_s):

     Sophie Lambert-Lacroix (<URL:
     http://www-lmc.imag.fr/lmc-sms/Sophie.Lambert>)  and  Julie Peyre
     (<URL: http://www-lmc.imag.fr/lmc-sms/Julie.Peyre/>).

_R_e_f_e_r_e_n_c_e_s:

     S. Lambert-Lacroix and J. Peyre (2005). Local likelihood
     regression in generalized linear single-index models with
     applications to microarray data, TR0563, IAP Statistics Network
     (<URL: http://www.stat.ucl.ac.be/IAP/>).

_S_e_e _A_l_s_o:

     'gsim.cv', 'mgsim', 'mgsim.cv'.

_E_x_a_m_p_l_e_s:

     # load plsgenomics library
     library(plsgenomics)

     # load Colon data
     data(Colon)
     # draw a random learning set: 12 observations from class 2, 8 from class 1
     IndexLearn <- c(sample(which(Colon$Y==2),12), sample(which(Colon$Y==1),8))

     Xtrain <- Colon$X[IndexLearn,]
     Ytrain <- Colon$Y[IndexLearn]
     Xtest <- Colon$X[-IndexLearn,]

     # preprocess data
     resP <- preprocess(Xtrain=Xtrain, Xtest=Xtest, Threshold=c(100,16000),
                        Filtering=c(5,500), log10.scale=TRUE, row.stand=TRUE)

     # perform prediction by GSIM
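     res <- gsim(Xtrain=resP$pXtrain, Ytrain=Ytrain, Xtest=resP$pXtest,
                 Lambda=10, hA=50, hB=NULL)

     # convergence indicator (1 = converged, 0 = not converged)
     res$Cvg
     # number of misclassified test observations
     sum(res$Ytest!=Colon$Y[-IndexLearn])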
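
     # A minimal sketch of how the returned list might be inspected,
     # using only the arguments and components documented above. It
     # reuses 'res', 'resP', 'Ytrain' and 'IndexLearn' from the preceding
     # lines; 'res2' is a name introduced here for illustration.

     # cross-tabulate predicted against observed test labels
     table(predicted=as.vector(res$Ytest), observed=Colon$Y[-IndexLearn])

     # inspect the other returned components
     length(res$beta)   # length of the estimated projection direction
     res$hB             # bandwidth used in step B (plug-in here, since hB=NULL)
     res$DeletedCol     # zero-variance columns of 'Xtrain', or NULL

     # refit with an explicit step B bandwidth; reusing the plug-in value
     # from the first fit is an arbitrary choice made for illustration
     res2 <- gsim(Xtrain=resP$pXtrain, Ytrain=Ytrain, Xtest=resP$pXtest,
                  Lambda=10, hA=50, hB=res$hB)
     res2$Cvg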

