logilasso             package:logilasso             R Documentation

_F_i_t_s _a _l_o_g-_l_i_n_e_a_r _m_o_d_e_l _a_n_d/_o_r _p_e_r_f_o_r_m_s _c_r_o_s_s-_v_a_l_i_d_a_t_i_o_n.

_D_e_s_c_r_i_p_t_i_o_n:

     Fits a log-linear interaction model (log(p)=X*beta) by penalizing
     the log-likelihood function under a multinomial sampling scheme.
     The penalty can be an l1, an l2 or a group l1 penalty. In
     addition, cross-validation is performed if 'cvfold' is larger
     than 1. The returned object is of class 'logilasso' if no
     cross-validation has been performed, or of class 'cvlogilasso',
     a subclass of 'logilasso', if cross-validation has been
     performed.

_U_s_a_g_e:

     logilasso(Y, combX = NULL, to.which.int = NULL, epsilon = 0.1,
               lambdainit = 1, lambdamin = 0.1, trace = 1,
               method = "groupl1", cvfold = 1, Newton = TRUE,
               stopkrit = 1e-8, sse = NULL, X = NULL)

_A_r_g_u_m_e_n_t_s:

     There are two ways of calling the function: 1. 'Y' as a
     contingency table.  2. 'Y' as a vector of counts together with a
     combination matrix 'combX' (see Examples).
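
     The two input forms carry the same information. As a minimal
     sketch in base R (it does not call 'logilasso'), a contingency
     table can be unfolded into a count vector and a 0-based level
     matrix. The table 'tab', its factor names and its counts are made
     up for illustration:

```r
## Hypothetical 2 x 2 x 2 table; factor names and counts are made up.
tab <- as.table(array(c(4, 1, 3, 2, 9, 5, 6, 7),
                      dim = c(2, 2, 2),
                      dimnames = list(A = c("no", "yes"),
                                      B = c("no", "yes"),
                                      C = c("no", "yes"))))

## One row per cell: the factor columns first, the count column last.
cells <- as.data.frame(tab, responseName = "count")

## Counts per cell.
Y <- cells$count

## Recode every factor as 0/1/2/... (as.integer() on a factor is 1-based).
combX <- sapply(cells[names(dimnames(tab))],
                function(f) as.integer(f) - 1L)
```

     Row i of 'combX' then describes the cell whose count is 'Y[i]'.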

       Y: The counts for each cell of the contingency table, given
          either as a table or as a vector together with a matrix
          'combX'. In the latter case, each component of the vector
          corresponds to the combination of factor levels in the
          corresponding row of 'combX'.

   combX: Matrix of dimension length(Y) x number of factors. Row i
          gives the combination of factor levels that belongs to
          component i of 'Y'. Levels must be numeric, coded as
          0/1/2/... . See Examples.

to.which.int: The highest interaction order up to which the solution
          is calculated. 'to.which.int=n' means that interactions
          involving up to n factors are considered.

 epsilon: The step length of lambda. In each step the penalization
          parameter lambda is decreased by epsilon.

lambdainit: The upper bound for lambda, where the solution path for
          beta starts.

lambdamin: The lower bound for lambda, where the solution path ends.

   trace: Defines what is printed during the calculation:
          0 = nothing;
          1 = the current cross-validation fold;
          2 = additionally, points for each lambda;
          3 = the active set is written at every 10th step;
          4 = additionally, the new active set is written whenever a
              component enters the active set.

  method: One of "groupl1", "l1" or "l2", selecting the penalty
          applied to the coefficients.

  cvfold: If cvfold is larger than 1, cross-validation is performed.

  Newton: Logical. If 'Newton=TRUE', Newton steps are performed,
          otherwise the function 'optim' is used.

stopkrit: Convergence tolerance; smaller values give a more precise
          solution. See Details.

     sse: 

       X: The design matrix 'X' which is used for fitting a log-linear
          model. Should not be specified by the user.
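
     Taken literally, 'lambdainit', 'lambdamin' and 'epsilon' describe
     an equally spaced, decreasing sequence of penalty values. The
     following base R sketch builds that sequence from the default
     argument values. It is an illustration of the description above;
     how the package itself treats the end point is not stated here.

```r
## Default values from the Usage section.
lambdainit <- 1
lambdamin  <- 0.1
epsilon    <- 0.1

## Start at the upper bound and decrease by epsilon down to the lower bound.
lambdas <- seq(from = lambdainit, to = lambdamin, by = -epsilon)

length(lambdas)   # number of lambda values on the path
```

     A smaller 'epsilon' gives a finer path at the cost of more steps.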

_D_e_t_a_i_l_s:

     For the convergence criteria see section 8.2.3.2 of Gill et al.
     (1981), _Practical Optimization_, Academic Press.

     See also Dimitri P. Bertsekas (2003), _Nonlinear Programming_,
     Athena Scientific.

     For objects of class 'logilasso' and 'cvlogilasso' the methods
     'plot', 'predict' and 'traceplot' are available. If
     cross-validation was performed ('cvfold>1'), the method
     'graphmod', which plots a graphical model, is applicable as well.
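
     That methods written for 'logilasso' also serve 'cvlogilasso'
     objects follows from S3 inheritance. The sketch below uses mock
     objects and a hypothetical generic 'describe'; neither is part of
     the package, and the order of the class vector is an assumption
     about how the subclass relation is encoded.

```r
## Mock objects: empty lists that carry only a class attribute.
fit_mock   <- structure(list(), class = "logilasso")
fitcv_mock <- structure(list(), class = c("cvlogilasso", "logilasso"))

## Hypothetical generic with a method for the parent class only.
describe <- function(x, ...) UseMethod("describe")
describe.logilasso <- function(x, ...) "handled by the logilasso method"

describe(fit_mock)
describe(fitcv_mock)                # falls back to the parent-class method

inherits(fitcv_mock, "logilasso")   # TRUE
inherits(fit_mock, "cvlogilasso")   # FALSE
```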

_V_a_l_u_e:

     A 'cvlogilasso' object is returned in case cvfold is larger than
     1. A 'logilasso' object is returned in case cvfold is equal to 1.
     The class 'cvlogilasso' is a subclass of 'logilasso'.

    loss: A loss matrix of dimension cvfold x number of assessed
          lambdas in the solution path. For each part of the data left
          out, the loss for all lambdas of the solution path is
          calculated. 'loss' is 'NULL' if 'cvfold=1'.

    path: A matrix. The first row holds the lambdas; the second row
          lists the components of beta that become active or inactive
          at the corresponding lambda. Is 'NULL' for the class
          'cvlogilasso'.

betapath: A matrix of dimension length(beta) x number of assessed
          lambdas in the solution path. Each column holds the betas
          for the corresponding lambda in 'lambdapath'. Is 'NULL' for
          the class 'cvlogilasso'.

lambdapath: A vector of all lambdas for which the solution was
          calculated.

       X: The design matrix used to fit the log-linear model.

   nrfac: Number of factors.
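
     One way to read the 'loss' matrix together with 'lambdapath' is
     to average the loss over the folds and pick the lambda with the
     smallest average. The sketch below does this on made-up numbers
     laid out as documented above. Whether the package selects its
     lambda by exactly this rule is an assumption.

```r
## Made-up values: 3 folds (rows) and 4 lambdas (columns).
lambdapath <- c(1.0, 0.7, 0.4, 0.1)
loss <- rbind(c(2.9, 2.4, 2.1, 2.6),
              c(3.1, 2.5, 2.2, 2.3),
              c(3.0, 2.6, 2.0, 2.8))

## Average loss per lambda across the folds (assumed selection rule).
mean_loss <- colMeans(loss)

## Lambda with the smallest average loss.
lambda_best <- lambdapath[which.min(mean_loss)]
```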

_A_u_t_h_o_r(_s):

     Corinne Dahinden, dahinden@stat.math.ethz.ch

_R_e_f_e_r_e_n_c_e_s:

     Corinne Dahinden, Giovanni Parmigiani, Mark Emerik and Peter
     Buehlmann. Available at
     <URL: http://stat.ethz.ch/~dahinden/Paper/BMC.pdf>

_E_x_a_m_p_l_e_s:

     ## Use logilasso on the reinis dataset provided in the
     ## package gRbase
     library(gRbase)
     data(reinis)

     ## Fit a log-linear model for lambdas between 1 and 0.1
     ## No cross-validation is performed
     fit <- logilasso(reinis,lambdainit=1,lambdamin=0.1)

     ### Different initialization: Y and combX
     ### 5 factors: All have 2 levels
     Y     <- c(4,1,3,2,9)
     combX <- rbind(c(1,0,1,1,0),
                    c(1,0,0,1,1),
                    c(0,1,0,0,1),
                    c(0,0,1,0,0),
                    c(1,1,0,0,1))
     ### Each row of combX gives the levels of the five factors for the
     ### corresponding entry of Y: 4 observations with level 1 of
     ### factor 1, level 0 of factor 2, level 1 of factor 3 and so on.
     ### Levels must be numeric, coded 0/1/2/... and so on.
     fit2 <- logilasso(Y,combX)

     ## Trajectories from lambdainit to lambda optimal
     plot(fit2)
     traceplot(fit2)

     ## Predict functions
     pred <- predict(fit,lambda=0.3)

     ## Perform 3-fold cross-validation
     fitcv <- logilasso(reinis,lambdainit=1,lambdamin=0.1,cvfold=3)

     ## Plots a graphical model with the lambda calculated by cross-validation.
     plot(fitcv)
     graphmod(fitcv)
       
     predcv <- predict(fitcv)

