ms.nprev              package:meanscore              R Documentation

_L_o_g_i_s_t_i_c _r_e_g_r_e_s_s_i_o_n _o_f _t_w_o-_s_t_a_g_e _d_a_t_a _u_s_i_n_g _s_e_c_o_n_d _s_t_a_g_e _s_a_m_p_l_e 
_a_n_d _f_i_r_s_t _s_t_a_g_e _s_a_m_p_l_e _s_i_z_e_s _o_r _p_r_o_p_o_r_t_i_o_n_s (_p_r_e_v_a_l_e_n_c_e_s) _a_s _i_n_p_u_t

_D_e_s_c_r_i_p_t_i_o_n:

     Weighted logistic regression using the Mean Score method 

     *BACKGROUND*

     This algorithm analyses the second-stage data from a two-stage
     design, using as weights the first-stage sample sizes in each of
     the strata defined by the first-stage variables. If the
     first-stage sample sizes are unknown, you can still get estimates
     (but not standard errors) using estimated relative frequencies
     (prevalences) of the strata. To ensure that the sample sizes or
     prevalences are provided in the correct order, it is advisable to
     first run the 'coding' function.
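
     The weighting described above can be sketched with 'glm'. For a
     categorical z, the mean score estimating equation amounts to
     weighting each complete case by n1/n2 for its (y,z) stratum. The
     data below are simulated and every variable name is an assumption
     made for illustration; this is not the package's own code, and
     the standard errors printed by 'glm' are not the Mean Score
     standard errors.

```r
# Simulated two-stage data (all names are illustrative, not from the package)
set.seed(1)
n <- 2000
z <- rbinom(n, 1, 0.5)               # surrogate, known for everyone
x <- rnorm(n, mean = z)              # covariate, measured at stage two only
y <- rbinom(n, 1, plogis(0.5 * x))   # binary response, known for everyone
stage2 <- rbinom(n, 1, 0.5) == 1     # TRUE for second-stage subjects

# First- and second-stage sample sizes in each (y,z) stratum
stratum <- interaction(y, z)
n1 <- table(stratum)
n2 <- table(stratum[stage2])

# Each complete case is weighted by n1/n2 for its stratum
w <- as.numeric(n1 / n2)[as.integer(stratum)]

# quasibinomial avoids the non-integer-weights warning; only the point
# estimates are of interest here, not glm's standard errors
fit <- glm(y ~ x, family = quasibinomial, weights = w, subset = stage2)
coef(fit)
```

     Supplying prevalences instead of n1 only rescales the weights by
     a constant, which leaves the point estimates unchanged.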

_U_s_a_g_e:

     ms.nprev(x=x,y=y,z=z,n1="option",prev="option",factor=NULL,print.all=FALSE)

_A_r_g_u_m_e_n_t_s:

     REQUIRED ARGUMENTS 

       x: matrix of predictor variables for regression model

       y: response variable (should be binary 0-1)

       z: matrix of any surrogate or auxiliary variables, which must be
          categorical

          and one of the following:

      n1: vector of the first-stage sample sizes for each (y,z)
          stratum: must be provided in the correct order (see 'coding'
          function)

          OR

    prev: vector of the first-stage or population proportions
          (prevalences) for each (y,z) stratum: must be provided in the
          correct order (see 'coding' function)


     OPTIONAL ARGUMENTS

print.all: logical value; if TRUE, all output is printed. The default
          is FALSE.

  factor: factor variables; if the columns of the matrix of predictor
          variables have names, supply these names, otherwise supply
          the column numbers. 'ms.nprev' will fit separate coefficients
          for each level of the factor variables.
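
     As a rough sketch of how 'n1' or 'prev' might be tabulated from
     first-stage data: the 'coding' output shown in the Examples lists
     strata with y varying slowest and z fastest, and a column-major
     flattening of 'table(z, y)' gives that order. The vectors below
     are made up for illustration, and the 'coding' function remains
     the authoritative check of the order.

```r
# Hypothetical first-stage data: y and z are known for every subject
y <- c(0, 0, 0, 0, 1, 1, 1, 0, 1, 1)
z <- c(0, 0, 0, 1, 0, 1, 1, 1, 1, 1)

# Rows index z, columns index y; as.vector() reads column by column,
# giving the order (y=0,z=0), (y=0,z=1), (y=1,z=0), (y=1,z=1)
counts <- table(z = z, y = y)
n1 <- as.vector(counts)

# Prevalences are the same counts expressed as proportions
prev <- n1 / sum(n1)

n1
prev
```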

_D_e_t_a_i_l_s:

     The response, predictor and surrogate variables have to be
     numeric. If you have multiple columns of z, say (z1,z2,...,zn),
     these will be recoded into a single vector 'new.z'. For example,
     with three binary columns:

       z1  z2  z3  new.z
        0   0   0      1
        1   0   0      2
        0   1   0      3
        1   1   0      4
        0   0   1      5
        1   0   1      6
        0   1   1      7
        1   1   1      8

     If some of the value combinations do not exist in your data, the
     remaining combinations are numbered consecutively. For example, if
     the combination (0,1,1) is absent, then (1,1,1) will be coded as 7.
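
     The recoding above can be imitated as follows. This is a minimal
     sketch for the multi-column case only: 'recode_z' is a
     hypothetical helper written for illustration, not a function of
     the package, and the package's own implementation may differ.

```r
# Hypothetical helper: map each row of z to a single stratum code,
# with the first column varying fastest and absent combinations skipped
recode_z <- function(z) {
  z <- as.matrix(z)
  idx <- rep(0, nrow(z))
  mult <- 1
  for (j in seq_len(ncol(z))) {
    lev <- sort(unique(z[, j]))                      # observed levels of column j
    idx <- idx + (match(z[, j], lev) - 1) * mult     # mixed-radix position
    mult <- mult * length(lev)
  }
  match(idx, sort(unique(idx)))                      # consecutive codes 1, 2, ...
}

# All eight combinations of three binary columns, z1 varying fastest
full <- expand.grid(z1 = 0:1, z2 = 0:1, z3 = 0:1)
recode_z(full)

# Drop the combination (0,1,1): (1,1,1) is now coded as 7
recode_z(full[-7, ])
```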

_V_a_l_u_e:

     If called with 'prev', the function returns only:

     a list called "table" containing:

  ylevel: the distinct values (or levels) of y

  zlevel: the distinct values (or levels) of z

    prev: the prevalences for each '(ylevel,zlevel)' stratum

      n2: the sample sizes at the second stage in each stratum defined
          by '(ylevel,zlevel)'

     and a list called "parameters" containing:

     est: the Mean Score estimates of the coefficients in the logistic
          regression model


     If called with 'n1', the function returns:

     a list called "table" containing:

  ylevel: the distinct values (or levels) of y

  zlevel: the distinct values (or levels) of z

      n1: the sample size at the first stage in each '(ylevel,zlevel)'
          stratum

      n2: the sample sizes at the second stage in each stratum defined
          by '(ylevel,zlevel)'

     and a list called "parameters" containing:

     est: the Mean Score estimates of the coefficients in the logistic
          regression model

      se: the standard errors of the Mean Score estimates

       z: Wald statistic for each coefficient

  pvalue: 2-sided p-value (H0: coeff=0) 


     If print.all=TRUE, the following lists will also be returned:

     Wzy: the weight matrix used by the mean score algorithm, for each
          '(ylevel,zlevel)' stratum: this will be in the same order as
          n1 and prev

   varsi: the variance of the score in each '(ylevel,zlevel)' stratum

    Ihat: the Fisher information matrix
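
     The 'z' and 'pvalue' entries of "parameters" appear to follow the
     usual Wald construction (an assumption, not confirmed by the
     package source): the estimate divided by its standard error, and
     a two-sided normal tail probability. The sketch below recomputes
     them from the 'est' and 'se' values printed in the Examples.

```r
# Estimates and standard errors copied from the example output
est <- c("(Intercept)" = 0.0493998, x = 1.0188437)
se  <- c("(Intercept)" = 0.07155138, x = 0.10187094)

# Wald statistic and two-sided p-value for H0: coefficient = 0
z <- est / se
pvalue <- 2 * pnorm(-abs(z))

round(cbind(est, se, z, pvalue), 4)
```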

_R_e_f_e_r_e_n_c_e_s:

     Reilly,M and M.S. Pepe. 1995. A mean score method for  missing and
     auxiliary covariate data in  regression models. _Biometrika_
     *82:*299-314

_S_e_e _A_l_s_o:

     'meanscore','coding', 'ectopic','simNA','glm'.

_E_x_a_m_p_l_e_s:

     ## Not run: 
     As an illustrative example, we use a simulated data set, simNA.
     Use
     ## End(Not run) 

     data(simNA)        #to load the data
     ## Not run: and
     help(simNA)        #for details

     # The "complete cases" (i.e. second-stage data) can be extracted by:

     complete=simNA[!is.na(simNA[,3]),]

     # Running a logistic regression analysis on the complete cases:

     summary(glm(complete[,1]~complete[,3], family="binomial"))

     ## Not run: gives the following result

     Call:
     glm(formula = complete[, 1] ~ complete[, 3], family = "binomial")

     Coefficients:
                   Estimate Std. Error z value Pr(>|z|)    
     (Intercept)    0.05258    0.09879   0.532    0.595    
     complete[, 3]  1.01942    0.12050   8.460   <2e-16 ***
     ## End(Not run)

     # The first- and second-stage sample sizes can be viewed by running
     # the "coding" function (see help(coding) for details)

     coding(x=simNA[,3], y=simNA[,1], z=simNA[,2])
     ## Not run: which gives the following:

      [1] "For calls to ms.nprev, input n1 or prev in the following order!!"
          ylevel z new.z  n1  n2
     [1,]      0 0     0 310 150
     [2,]      0 1     1 166  85
     [3,]      1 0     0 177  86
     [4,]      1 1     1 347 179
     ## End(Not run)

     # An analysis of all first- and second-stage data using Mean Score:

     # supply the first stage sample sizes in the correct order
     n1=c(310,166,177,347)
     ms.nprev(x=complete[,3],z=complete[,2],y=complete[,1],n1=n1)

     ## Not run: gives the results:
     [1] "please run coding function to see the order in which you"
     [1] "must supply the first-stage sample sizes or prevalences"
     [1] " Type ?coding for details!"
     [1] "For calls to ms.nprev,input n1 or prev in the following order!!"
          ylevel z new.z  n2
     [1,]      0 0     0 150
     [2,]      0 1     1  85
     [3,]      1 0     0  86
     [4,]      1 1     1 179
     [1] "Check sample sizes/prevalences"
     $table
          ylevel zlevel  n1  n2
     [1,]      0      0 310 150
     [2,]      0      1 166  85
     [3,]      1      0 177  86
     [4,]      1      1 347 179

     $parameters
                       est         se          z    pvalue
     (Intercept) 0.0493998 0.07155138  0.6904103 0.4899362
     x           1.0188437 0.10187094 10.0013188 0.0000000
     ## End(Not run)

     # If we supply the prevalences instead of first-stage sample sizes
     p1=c(310,166,177,347)/1000
     ms.nprev(x=complete[,3],z=complete[,2],y=complete[,1],prev=p1)

     ## Not run: we get the output:

          ylevel zlevel  prev  n2
     [1,]      0      0 0.310 150
     [2,]      0      1 0.166  85
     [3,]      1      0 0.177  86
     [4,]      1      1 0.347 179

     $parameters
                        est
     (Intercept) 0.04939797
     x           1.01885599
     ## End(Not run)

     # Note that the Mean Score algorithm produces smaller standard
     # errors than the complete-case analysis (e.g. 0.102 versus 0.121
     # for the slope), due to the additional information in the
     # incomplete cases.

