mcsimex                package:simex                R Documentation

_T_h_e _M_i_s_c_l_a_s_s_i_f_i_c_a_t_i_o_n _S_I_M_E_X

_D_e_s_c_r_i_p_t_i_o_n:

     Implementation of the Misclassification SIMEX Algorithm as
     described by Küchenhoff, Mwalili and Lesaffre.

_U_s_a_g_e:

     mcsimex(model
             , SIMEXvariable
             , mc.matrix
             , lambda = c(0.5,1,1.5,2)
             , B = 100
             , jackknife.estimation = "quad"
             , asymptotic = TRUE
             , fitting.method = "quad")

_A_r_g_u_m_e_n_t_s:

   model: the naive model; the misclassified variable must be a factor

mc.matrix: If one variable is misclassified, this can be a matrix. If more
          than one variable is misclassified, it must be a list of the
          misclassification matrices; the list names must match the
          SIMEXvariable names, and the column and row names must match
          the factor levels. If a special misclassification is desired,
          the name of a function can be specified (see Details).

  lambda: vector of exponents for the misclassification matrix (without
          0)

SIMEXvariable: vector of names of the variables for which the
          MCSIMEX-method should be applied

       B: number of iterations for each lambda

fitting.method: the extrapolation method; linear, quadratic and
          loglinear are implemented (the first 4 letters are enough,
          e.g. "quad")

jackknife.estimation: specifying the extrapolation method for jackknife
          variance estimation. Can be set to FALSE if it should not be
          performed

asymptotic: logical, indicating whether asymptotic variance estimation
          should be done; the option 'x = TRUE' must be set in the
          naive model.
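
     The naming rules for 'mc.matrix' can be illustrated with a minimal
     base-R sketch. The variables 'xm' and 'zm' and the helper
     'check_mc_matrix' are hypothetical and are not part of the package.
     The check that columns sum to one follows the matrices shown in the
     Examples section and is an assumption about the expected layout.

```r
# Two hypothetical misclassified factors
xm <- factor(c("0", "1", "1", "0", "1"))
zm <- factor(c("a", "b", "c", "a", "b"))

# Single-variable form: a matrix whose dimnames equal the factor levels
Pi_x <- matrix(c(0.9, 0.1, 0.3, 0.7), nrow = 2, byrow = FALSE)
dimnames(Pi_x) <- list(levels(xm), levels(xm))

# Three-level matrix: 0.9 on the diagonal, 0.05 elsewhere
Pi_z <- diag(3) * 0.85 + 0.05
dimnames(Pi_z) <- list(levels(zm), levels(zm))

# Multi-variable form: a list whose names equal the SIMEXvariable names
SIMEXvariable <- c("xm", "zm")
mc_list <- list(xm = Pi_x, zm = Pi_z)

# Hypothetical helper: checks names and (assumed) column sums of one matrix
check_mc_matrix <- function(P, f) {
  identical(rownames(P), levels(f)) &&
    identical(colnames(P), levels(f)) &&
    isTRUE(all.equal(unname(colSums(P)), rep(1, ncol(P))))
}

ok_x <- check_mc_matrix(Pi_x, xm)
ok_z <- check_mc_matrix(Pi_z, zm)
ok_names <- setequal(names(mc_list), SIMEXvariable)
```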

_D_e_t_a_i_l_s:

     If 'mc.matrix' is a function, its first argument must be the whole
     dataset used in the naive model and its second argument must be
     the exponent (lambda) for the misclassification. The function must
     return a 'data.frame' containing the misclassified
     'SIMEXvariable'. An example can be found below.
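
     To make the role of the exponent concrete, the following base-R
     sketch raises a misclassification matrix to a real power through
     its eigendecomposition. The helper 'mat_power' is hypothetical and
     is not claimed to be how the package computes the power; it assumes
     a diagonalizable matrix with positive real eigenvalues.

```r
# Hypothetical helper (not part of simex): P^lambda via eigendecomposition.
# Assumes P is diagonalizable and all eigenvalues are positive and real.
mat_power <- function(P, lambda) {
  e <- eigen(P)
  V <- e$vectors
  res <- V %*% diag(e$values^lambda, nrow = length(e$values)) %*% solve(V)
  res <- Re(res)                # drop any zero imaginary parts
  dimnames(res) <- dimnames(P)  # keep the factor-level names
  res
}

Pi <- matrix(c(0.9, 0.1, 0.3, 0.7), nrow = 2,
             dimnames = list(c("0", "1"), c("0", "1")))

Pi_half <- mat_power(Pi, 0.5)   # exponent 0.5
Pi_two  <- mat_power(Pi, 2)     # exponent 2, same as Pi %*% Pi
```

     Applying 'Pi_half' twice reproduces 'Pi', and the columns of each
     power still sum to one for this matrix.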

     Asymptotic variance estimation is only implemented for 'lm' and
     'glm'.

     The loglinear fit has the form g(lambda, GAMMA) =
     exp(gamma0 + gamma1 * lambda). It is realized via the log() function.
     To avoid negative values, the minimum + 1 of the dataset is added
     before fitting and subtracted again after the prediction:
     exp(predict(...)) - min(data) - 1.
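
     The loglinear extrapolant can be sketched in base R as follows. The
     helper 'loglinear_extrapolate' is hypothetical and not the
     package's code. It assumes the shift is the absolute value of the
     minimum plus one, applied only when nonpositive values occur; this
     is one reading of the description above. It evaluates the fit at
     lambda = -1, the usual SIMEX extrapolation point.

```r
# Hypothetical helper (not part of simex): loglinear extrapolation of one
# coefficient path 'est' observed at the exponents 'lambda'.
loglinear_extrapolate <- function(lambda, est, at = -1) {
  # Assumed shift so that log() is defined for every value
  shift <- if (min(est) <= 0) abs(min(est)) + 1 else 0
  # Linear model on the log scale: log(est + shift) = gamma0 + gamma1 * lambda
  fit <- lm(log(est + shift) ~ lambda)
  # Predict at the extrapolation point, back-transform, undo the shift
  pred <- predict(fit, newdata = data.frame(lambda = at))
  unname(exp(pred) - shift)
}

lambda <- c(0.5, 1, 1.5, 2)

# Path that is exactly loglinear: the extrapolant recovers exp(0.2 - 0.3)
est_pos <- exp(0.2 + 0.3 * lambda)
extrap_pos <- loglinear_extrapolate(lambda, est_pos)

# Path with negative values: the shift keeps log() well defined
est_neg <- c(-1.2, -1.0, -0.85, -0.75)
extrap_neg <- loglinear_extrapolate(lambda, est_neg)
```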

     The 'log2' fit is fitted via the 'nls()' function. The fit of the
     loglinear extrapolant is used as starting values. For details
     see 'fit.logl'.

_V_a_l_u_e:

     object of class 'MCSIMEX' 

coefficients: corrected coefficients of the MCSIMEX-model 

SIMEX.estimates: the MCSIMEX-estimates of the coefficients for each
          lambda

  lambda: the values of lambda

   model: naive model

mc.matrix: the misclassification matrix

       B: the number of iterations

extrapolation: model-object of the extrapolation step 

fitting.method: the fitting method used in the extrapolation step

SIMEXvariable: names of the SIMEXvariables

    call: the function call

variance.asymptotic: the asymptotic variance estimates

variance.jackknife: the jackknife variance estimates

extrapolation.variance: the model-object of the variance extrapolation

variance.jackknife.lambda: data set for the extrapolation

   theta: all estimated coefficients for each lambda and B

     ...

_A_u_t_h_o_r(_s):

     Wolfgang Lederer, wolfgang.lederer@googlemail.com

_R_e_f_e_r_e_n_c_e_s:

     Küchenhoff, H., Mwalili, S. M. and Lesaffre, E. (2005) A general
     method for dealing with misclassification in regression: the
     Misclassification SIMEX. _Biometrics_, in press.

_S_e_e _A_l_s_o:

     'misclass', 'simex', 'refit'

_E_x_a_m_p_l_e_s:

     ## simulate a binary response and misclassify it with the matrix Pi
     x <- rnorm(200, 0, 1.142)
     z <- rnorm(200, 0, 2)
     y <- factor(rbinom(200, 1, (1 / (1 + exp(-1 * (-2 + 1.5 * x - 0.5 * z))))))
     Pi <- matrix(data = c(0.9, 0.1, 0.3, 0.7), nrow = 2, byrow = FALSE)
     dimnames(Pi) <- list(levels(y), levels(y))
     ystar <- misclass(data.frame(y), list(y = Pi), k = 1)[, 1]

     ## naive model on the misclassified response, true model for comparison
     naive.model <- glm(ystar ~ x + z, family = binomial, x = TRUE, y = TRUE)
     true.model  <- glm(y ~ x + z, family = binomial)
     simex.model <- mcsimex(naive.model, mc.matrix = Pi, SIMEXvariable = "ystar")

     ## boxplots of the simulated estimates per lambda, then plot the fitted object
     op <- par(mfrow = c(2, 3))
     invisible(lapply(simex.model$theta, boxplot, notch = TRUE, outline = FALSE,
                      names = c(0.5, 1, 1.5, 2)))
     plot(simex.model)
     par(op)

     ## example for a function which can be supplied to the function mcsimex()
     ## "xm" is the variable which is to be misclassified

     my.mc <- function(datas, k) {
       xm <- datas$"xm"
       ## misclassification matrix for observations with y == "1"
       p1 <- matrix(data = c(0.75, 0.25, 0.25, 0.75), nrow = 2, byrow = FALSE)
       colnames(p1) <- levels(xm)
       rownames(p1) <- levels(xm)
       ## misclassification matrix for observations with y == "0"
       p0 <- matrix(data = c(0.8, 0.2, 0.2, 0.8), nrow = 2, byrow = FALSE)
       colnames(p0) <- levels(xm)
       rownames(p0) <- levels(xm)
       ## misclassify xm separately within each level of y, with exponent k
       xm[datas$y == "1"] <- misclass(data.frame(xm = xm[datas$y == "1"]),
                                      list(xm = p1), k = k)[, 1]
       xm[datas$y == "0"] <- misclass(data.frame(xm = xm[datas$y == "0"]),
                                      list(xm = p0), k = k)[, 1]
       xm <- factor(xm)
       return(data.frame(xm))
     }

