mgsim              package:plsgenomics              R Documentation

_G_S_I_M _f_o_r _c_a_t_e_g_o_r_i_c_a_l _d_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     The function 'mgsim' performs prediction using Lambert-Lacroix and
     Peyre's MGSIM algorithm.

_U_s_a_g_e:

     mgsim(Ytrain,Xtrain,Lambda,h,Xtest=NULL,NbIterMax=50)

_A_r_g_u_m_e_n_t_s:

  Xtrain: a (ntrain x p) data matrix of predictors. 'Xtrain' must be a
          matrix.  Each row corresponds to an observation and each
          column to a predictor variable.

  Ytrain: a vector of length ntrain containing the response variable
          (class label) for each observation. 'Ytrain' must be a
          vector with values in {1,...,c+1}, where c+1 is the number
          of classes.

   Xtest: a (ntest x p) matrix containing the predictors for the test
          data set. 'Xtest' may also be a vector of length p
          (corresponding to only one test observation). If 'Xtest' is
          not NULL, then the prediction step is carried out for
          these new observations.

  Lambda: a positive real value. 'Lambda' is the ridge regularization
          parameter.

       h: a strictly positive real value. 'h' is the bandwidth for GSIM
          step A.

NbIterMax: a positive integer. 'NbIterMax' is the maximal number of
          iterations in the Newton-Raphson parts.

_D_e_t_a_i_l_s:

     The columns of the data matrices 'Xtrain' and 'Xtest' need not be
     standardized, since standardizing is performed by the function
     'mgsim' as a preliminary step before the algorithm is run.

     The procedure described in Lambert-Lacroix and Peyre (2005) is
     used to estimate the c projection directions and the coefficients
     of the parametric fit obtained after projecting the predictor
     variables onto the estimated directions. When 'Xtest' is not
     NULL, the procedure also predicts the labels for these new
     observations.

_V_a_l_u_e:

     A list with the following components: 

   Ytest: the ntest vector containing the predicted labels for the
          observations from  'Xtest'.

    beta: the (p x c) matrix containing the c estimated projection
          directions.

Coefficients: the (2 x c) matrix containing the coefficients of the
          parametric fit obtained  after projecting predictor variables
          onto these estimated directions. 

DeletedCol: the vector containing the column numbers of 'Xtrain' for
          which the variance of the corresponding predictor variable
          is zero. 'DeletedCol' is NULL if there is no such column.

     Cvg: the 0-1 value indicating convergence of the algorithm (1 for
          convergence, 0 otherwise).

_A_u_t_h_o_r(_s):

     Sophie Lambert-Lacroix (<URL:
     http://www-lmc.imag.fr/lmc-sms/Sophie.Lambert>)  and  Julie Peyre
     (<URL: http://www-lmc.imag.fr/lmc-sms/Julie.Peyre/>).

_R_e_f_e_r_e_n_c_e_s:

     S. Lambert-Lacroix and J. Peyre (2005). Local likelihood
     regression in generalized linear single-index models with
     applications to microarray data, TR0563 IAP Statistics network
     (<URL: http://www.stat.ucl.ac.be/IAP/>).

_S_e_e _A_l_s_o:

     'mgsim.cv', 'gsim', 'gsim.cv'.

_E_x_a_m_p_l_e_s:

     # load plsgenomics library
     library(plsgenomics)

     # load SRBCT data
     data(SRBCT)
     # draw a stratified learning set: 10, 4, 7 and 9 samples from classes 1 to 4
     IndexLearn <- c(sample(which(SRBCT$Y==1),10),
                     sample(which(SRBCT$Y==2),4),
                     sample(which(SRBCT$Y==3),7),
                     sample(which(SRBCT$Y==4),9))
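
     # Optional sketch (not part of plsgenomics): the same kind of stratified
     # draw, made reproducible with a seed. 'stratified_index' is a
     # hypothetical helper, shown on made-up toy labels so that it runs
     # without the SRBCT data.
     stratified_index <- function(y, n_per_class) {
       unlist(lapply(seq_along(n_per_class), function(k) {
         ids <- which(y == k)
         # index into ids so that a class with a single member is handled safely
         ids[sample.int(length(ids), n_per_class[k])]
       }))
     }
     set.seed(1)
     y_toy <- rep(1:4, times = c(12, 6, 9, 11))
     idx_toy <- stratified_index(y_toy, c(10, 4, 7, 9))
     table(y_toy[idx_toy])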

     # perform prediction by MGSIM
     res <- mgsim(Ytrain=SRBCT$Y[IndexLearn],Xtrain=SRBCT$X[IndexLearn,],
                  Lambda=0.001,h=19,Xtest=SRBCT$X[-IndexLearn,])
     res$Cvg
     sum(res$Ytest!=SRBCT$Y[-IndexLearn])

     # prediction for another sample
     Xnew <- SRBCT$X[83,]
     # projection of Xnew onto the c estimated directions
     Xproj <- Xnew %*% res$beta
     # compute the linear predictor for each class except class 1
     eta <- diag(cbind(rep(1,3),t(Xproj)) %*% res$Coefficients)
     Ypred <- which.max(c(0,eta))
     Ypred
     SRBCT$Y[83]
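
     # Optional sketch: the manual rule above, applied to several observations
     # at once. 'predict_manual' is a hypothetical helper, not part of
     # plsgenomics; it assumes 'beta' is (p x c) and 'Coefficients' is (2 x c),
     # as described in the Value section.
     predict_manual <- function(Xnew, beta, Coefficients) {
       # a single observation may be given as a vector of length p
       if (is.null(dim(Xnew))) Xnew <- matrix(Xnew, nrow = 1)
       Xproj <- Xnew %*% beta                         # (n x c) projections
       # eta[i,k] = Coefficients[1,k] + Coefficients[2,k] * Xproj[i,k]
       eta <- sweep(Xproj, 2, Coefficients[2, ], "*")
       eta <- sweep(eta, 2, Coefficients[1, ], "+")
       # class 1 is the reference class, with linear predictor 0
       max.col(cbind(0, eta), ties.method = "first")
     }
     # toy check with made-up values (p = 2, c = 2)
     beta_toy <- diag(2)
     coef_toy <- rbind(c(0, 0), c(1, 1))
     pred_toy <- predict_manual(rbind(c(1, 2), c(-1, -2)), beta_toy, coef_toy)
     pred_toy
     # if the fit 'res' from above is available, this is expected to agree
     # with Ypred
     if (exists("res")) predict_manual(SRBCT$X[83, ], res$beta, res$Coefficients)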

