ms.nprev              package:twostage              R Documentation

_L_o_g_i_s_t_i_c _r_e_g_r_e_s_s_i_o_n _o_f _t_w_o-_s_t_a_g_e _d_a_t_a _u_s_i_n_g _s_e_c_o_n_d _s_t_a_g_e _s_a_m_p_l_e 
_a_n_d _f_i_r_s_t _s_t_a_g_e _s_a_m_p_l_e _s_i_z_e_s _o_r _p_r_o_p_o_r_t_i_o_n_s (_p_r_e_v_a_l_e_n_c_e_s) _a_s _i_n_p_u_t

_D_e_s_c_r_i_p_t_i_o_n:

     Weighted logistic regression using the Mean Score method 

     *BACKGROUND*

     This algorithm analyses the second-stage data from a two-stage
     design, using as weights the first-stage sample sizes in each of
     the strata defined by the first-stage variables. If the
     first-stage sample sizes are unknown, you can still get estimates
     (but not standard errors) using estimated relative frequencies
     (prevalences) of the strata. To ensure that the sample sizes or
     prevalences are provided in the correct order, it is advisable to
     first run the 'coding' function.
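
     The role of the first-stage sample sizes as weights, and the
     reason prevalences still give point estimates, can be sketched in
     base R. This is a minimal illustration on simulated data, not the
     package's implementation. It assumes that, for categorical strata,
     the point estimates coincide with a logistic fit in which each
     second-stage subject is weighted by n1/n2 for its (y,z) stratum.
     All object names below are hypothetical.

```r
# Simulated two-stage data (hypothetical; not from the twostage package)
set.seed(1)
N <- 5000
z <- rbinom(N, 1, 0.3)                      # surrogate, observed for everyone
x <- rnorm(N, mean = z)                     # covariate, observed at stage two only
y <- rbinom(N, 1, plogis(-2 + 0.8 * x))     # binary response

stratum <- interaction(y, z, drop = TRUE)   # the (y,z) strata
n1 <- table(stratum)                        # first-stage sample sizes

# Second stage: up to 100 subjects drawn at random from each stratum
idx <- unlist(lapply(split(seq_len(N), stratum),
                     function(i) i[sample.int(length(i), min(100, length(i)))]))
n2 <- table(stratum[idx])                   # second-stage sample sizes

# Second-stage data with the two candidate weight vectors
d    <- data.frame(y = y[idx], x = x[idx])
code <- as.integer(stratum[idx])            # stratum index of each subject
prev <- as.numeric(n1) / sum(n1)            # prevalences of the strata

# Assumed weighting: n1/n2 of the subject's stratum
d$w_n1   <- (as.numeric(n1) / as.numeric(n2))[code]
# Prevalence-based weights differ from w_n1 only by the constant sum(n1)
d$w_prev <- (prev / as.numeric(n2))[code]

# quasibinomial avoids the non-integer-weights warning of binomial
fit_n1   <- glm(y ~ x, family = quasibinomial, data = d, weights = w_n1)
fit_prev <- glm(y ~ x, family = quasibinomial, data = d, weights = w_prev)

# Compare the two sets of point estimates
all.equal(coef(fit_n1), coef(fit_prev), tolerance = 1e-6)
```

     Rescaling all weights by a constant leaves the fitted coefficients
     unchanged, which is why prevalences are enough for estimation. The
     standard errors printed by 'glm' for such a fit are not the Mean
     Score standard errors; those require the actual first-stage sample
     sizes.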

_U_s_a_g_e:

     ms.nprev(x=x,y=y,z=z,n1="option",prev="option",factor=NULL,print.all=FALSE)

_A_r_g_u_m_e_n_t_s:

     REQUIRED ARGUMENTS 

       x: matrix of predictor variables for regression model

       y: response variable (should be binary 0-1)

       z: matrix of any surrogate or auxiliary variables, which must be
          categorical


          and one of the following:

      n1: vector of the first-stage sample sizes for each (y,z)
          stratum: must be provided in the correct order (see 'coding'
          function)

          OR

    prev: vector of the first-stage or population proportions
          (prevalences) for each (y,z) stratum: must be provided in the
          correct order (see 'coding' function)


     OPTIONAL ARGUMENTS

print.all: logical value determining whether all output is printed. The
          default is FALSE.

  factor: factor variables; if the columns of the matrix of predictor
          variables have names, supply these names, otherwise supply
          the column numbers. 'ms.nprev' will fit separate coefficients
          for each level of the factor variables.
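
     The ordering expected for 'n1' and 'prev' can be sketched in base
     R using the strata and first-stage sample sizes from the example on
     this page, where y changes slowest and z changes fastest. The
     object name 'strata' is hypothetical, and the 'coding' function
     remains the way to confirm the order for your own data.

```r
# Strata in the order used by the example on this page:
# y changes slowest, z changes fastest
strata <- expand.grid(z = 0:1, y = 0:1)[, c("y", "z")]

# First-stage sample sizes from the CASS example, in that order
strata$n1 <- c(6666, 1228, 144, 58)

# Corresponding prevalences, usable as 'prev' when only proportions are known
strata$prev <- strata$n1 / sum(strata$n1)
strata
```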

_D_e_t_a_i_l_s:

     The response, predictor and surrogate variables have to be
     numeric. If you have multiple columns of z, say (z1,z2,...,zn),
     these will be recoded into a single vector 'new.z', for example:

       z1  z2  z3  new.z
        0   0   0      1
        1   0   0      2
        0   1   0      3
        1   1   0      4
        0   0   1      5
        1   0   1      6
        0   1   1      7
        1   1   1      8

     If some of the value combinations do not exist in your data, the
     function will adjust accordingly. For example, if the combination
     (0,1,1) is absent, then (1,1,1) will be coded as 7.
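
     The recoding described above can be mimicked in base R. This is
     only a sketch that reproduces the table and the handling of absent
     combinations; it is not taken from the package source, and the
     object names are hypothetical.

```r
# Three binary surrogate columns; the combination (0,1,1) does not occur
z <- cbind(z1 = c(0, 1, 0, 1, 0, 1, 1),
           z2 = c(0, 0, 1, 1, 0, 0, 1),
           z3 = c(0, 0, 0, 0, 1, 1, 1))

# interaction() orders cells with the first column changing fastest,
# and drop = TRUE removes combinations that are absent from the data
cells <- interaction(as.data.frame(z), drop = TRUE)

# Integer code of each observed combination
new.z <- as.integer(cells)
cbind(z, new.z)
```

     With (0,1,1) missing, the last row (1,1,1) receives code 7 rather
     than 8, matching the behaviour described above.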

_V_a_l_u_e:

     If called with 'prev', the function returns only:

     *a list called 'table' containing:*

  ylevel: the distinct values (or levels) of y

  zlevel: the distinct values (or levels) of z

    prev: the prevalences for each (y,z) stratum

      n2: the sample sizes at the second stage in each stratum defined
          by (y,z)

     *and a list called 'parameters' containing:*

     est: the Mean Score estimates of the coefficients in the logistic
          regression model


     If called with 'n1', the function returns:

     *a list called 'table' containing:*

  ylevel: the distinct values (or levels) of y

  zlevel: the distinct values (or levels) of z

      n1: the sample size at the first stage in each (y,z) stratum

      n2: the sample sizes at the second stage in each stratum defined
          by (y,z)

     *and a list called 'parameters' containing:*

     est: the Mean Score estimates of the coefficients in the logistic
          regression model

      se: the standard errors of the Mean Score estimates

       z: Wald statistic for each coefficient

  pvalue: 2-sided p-value (H0: coeff=0) 
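
     The relationship between 'est', 'se', 'z' and 'pvalue' can be
     checked by hand. The sketch below assumes the usual definitions (z
     is the estimate divided by its standard error, and the p-value is
     taken from the standard normal distribution) and uses the
     estimates and standard errors shown in the example output on this
     page.

```r
# Estimates and standard errors copied from the example output on this page
est <- c("(Intercept)" = -5.06286163, age = 0.02166536, sex = 0.67381300)
se  <- c("(Intercept)" =  1.46495235, age = 0.02584049, sex = 0.21807878)

z      <- est / se                 # Wald statistic
pvalue <- 2 * pnorm(-abs(z))       # two-sided p-value under H0: coefficient = 0

round(cbind(est, se, z, pvalue), 6)
```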


     If print.all=TRUE, the following lists will also be returned:

     Wzy: the weight matrix used by the Mean Score algorithm, for each
          (y,z) stratum: this will be in the same order as n1 and prev

   varsi: the variance of the score in each Y,Z stratum

    Ihat: the Fisher information matrix

_R_e_f_e_r_e_n_c_e_s:

     Reilly, M. and M.S. Pepe. 1995. A mean score method for missing
     and auxiliary covariate data in regression models. _Biometrika_
     *82:*299-314

     Reilly, M. 1996. Optimal sampling strategies for two-stage
     studies. _Amer. J. Epidemiol._ *143:*92-100

_S_e_e _A_l_s_o:

     'fixed.n', 'budget', 'precision', 'coding', 'cass1', 'cass2'

_E_x_a_m_p_l_e_s:

     ## As an illustrative example, we use the CASS pilot data, "cass1",
     ## from Reilly (1996).

     data(cass1)        # to load the data
     help(cass1)        # for details

     ## The first-stage sample sizes are:
     ##
     ##         Y       Z       n
     ##         0       0       6666
     ##         0       1       1228
     ##         1       0       144
     ##         1       1       58
     ##
     ## An analysis of the pilot data using Mean Score:

     # supply the first stage sample sizes in the correct order
     n1=c(6666, 1228, 144, 58)
     ms.nprev(y=cass1[,1], x=cass1[,2:3],z=cass1[,3],n1=n1)

     ## Not run: gives the results:

     [1] "please run coding function to see the order in which you"
     [1] "must supply the first-stage sample sizes or prevalences"
     [1] " Type ?coding for details!"
     [1] "For calls requiring n1 or prev as input, use the following order"
          ylevel z new.z n2
     [1,]      0 0     0 25
     [2,]      0 1     1 25
     [3,]      1 0     0 25
     [4,]      1 1     1 25
     [1] "Check sample sizes/prevalences"
     $table
          ylevel zlevel   n1 n2
     [1,]      0      0 6666 25
     [2,]      0      1 1228 25
     [3,]      1      0  144 25
     [4,]      1      1   58 25

     $parameters
                         est         se         z       pvalue
     (Intercept) -5.06286163 1.46495235 -3.455991 0.0005482743
     age          0.02166536 0.02584049  0.838427 0.4017909402
     sex          0.67381300 0.21807878  3.089769 0.0020031236
     ## End(Not run)

     ## Not run: 
     Note that the Mean Score algorithm produces smaller 
     standard errors of estimates than the complete-case
     analysis, due to the additional information in the
     incomplete cases.
     ## End(Not run)

