coding               package:twostage               R Documentation

_c_o_m_b_i_n_e_s _t_w_o _o_r _m_o_r_e _s_u_r_r_o_g_a_t_e/_a_u_x_i_l_i_a_r_y _v_a_r_i_a_b_l_e_s _i_n_t_o _a _v_e_c_t_o_r

_D_e_s_c_r_i_p_t_i_o_n:

     Recodes a matrix of categorical variables into a vector which
     takes a unique value for each combination.

     *BACKGROUND*

     From the matrix Z of first-stage covariates, this function creates
     a vector which takes a unique value for each combination as
     follows:

       z1  z2  z3  new.z
        0   0   0      1
        1   0   0      2
        0   1   0      3
        1   1   0      4
        0   0   1      5
        1   0   1      6
        0   1   1      7
        1   1   1      8

     If some of the combinations do not occur in the data, the codes
     remain consecutive: for example, if the combination (0,1,1) is
     absent above, then (1,1,1) will be coded as 7.
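
     The coding scheme in the table can be sketched in base R as
     follows. This is only an illustration of the rule described
     above, not the package's own code: the helper name
     'recode_combinations' is hypothetical, and it assumes that the
     first column of Z varies fastest, as in the table.

```r
# Hypothetical helper (not part of twostage): give each observed
# combination of the columns of z a consecutive integer code, with the
# first column varying fastest and the last column varying slowest.
recode_combinations <- function(z) {
  cols <- unname(as.list(as.data.frame(z)))
  # Sort by the last column first, then the second-to-last, and so on.
  ord <- do.call(order, rev(cols))
  # One string key per row; assumes no value contains the "|" separator.
  key <- do.call(paste, c(cols, sep = "|"))
  # Rank each row's key among the sorted distinct keys.
  match(key, unique(key[ord]))
}

# All eight combinations of three binary variables, as in the table.
full <- expand.grid(z1 = 0:1, z2 = 0:1, z3 = 0:1)
cbind(full, new.z = recode_combinations(full))

# With (0,1,1) removed, the remaining codes stay consecutive.
partial <- full[-7, ]
cbind(partial, new.z = recode_combinations(partial))
```

     Ranking each row among the combinations actually present, rather
     than computing a fixed positional index, is what keeps the codes
     free of gaps when a combination is missing.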

     The values of this recoded variable are reported as 'new.z' in the
     printed output (see 'Value' below).

     This function should be run on second-stage data prior to using
     the ms.nprev function, as its output shows the order in which the
     call to ms.nprev expects the first-stage sample sizes to be
     provided.

_U_s_a_g_e:

     coding(x=x,y=y,z=z,return=FALSE)

_A_r_g_u_m_e_n_t_s:

     REQUIRED ARGUMENTS 

       x: matrix of predictor variables for regression model

       y: response variable (should be binary 0-1)

       z: matrix of any surrogate or auxiliary variables which must be
          categorical 

     OPTIONAL ARGUMENTS

  return: logical value; if TRUE, the original surrogate or auxiliary
          variables and the recoded auxiliary variables will be
          returned. The default is FALSE.

_V_a_l_u_e:

     This function does not return any values *except* if 'return=TRUE'.

     If used with only second-stage (i.e. complete) data, it will print
     the following:

  ylevel: the distinct values (or levels) of response variable

*z*1 ... *z*i: the distinct values of first-stage variables *z*1 ...
          *z*i

   new.z: recoded first-stage variables. Each value represents a unique
          combination of first-stage variable values.

      n2: second stage sample sizes in each ('ylevel','new.z') stratum. 

     If used with combined first- and second-stage data (i.e. with NA
     for missing values), in addition to the above items, the function
     will also print the following:

      n1: first-stage sample sizes in each ('ylevel','new.z') stratum.
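
     The stratum counts described above can be illustrated with base R
     alone. The sketch below is not the package's implementation: the
     toy data are invented for illustration, and it assumes that
     second-stage observations are the rows with no NA in the
     predictor matrix.

```r
# Toy combined data (invented for illustration): 8 first-stage
# subjects, 5 of whom also have second-stage predictor values.
y     <- c(0, 0, 0, 1, 1, 1, 0, 1)
new.z <- c(1, 1, 2, 1, 2, 2, 2, 1)
x     <- matrix(c(0.5, NA, 1.2, NA, 0.3, 0.8, NA, 1.1), ncol = 1)

# Assumption: a row belongs to the second stage when x has no NA.
second_stage <- complete.cases(x)

# n1: counts over all rows in each (ylevel, new.z) stratum.
n1 <- table(ylevel = y, new.z = new.z)

# n2: counts over the second-stage rows only.
n2 <- table(ylevel = y[second_stage], new.z = new.z[second_stage])

n1
n2
```

     Each cell of 'n2' can be at most the matching cell of 'n1', since
     the second-stage sample is a subset of the first-stage sample.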

_R_e_f_e_r_e_n_c_e_s:

     Reilly, M. 1996. Optimal sampling strategies for two-stage
     studies. _Amer. J. Epidemiol._ *143:*92-100.

_S_e_e _A_l_s_o:

     'ms.nprev', 'fixed.n', 'budget', 'precision', 'cass1', 'cass2'

_E_x_a_m_p_l_e_s:

     ## Not run: The CASS2 data set in Reilly (1996) has 2 categorical first-stage
     variables in columns 2 (sex) and 3 (categorical weight). The predictor
     variables are column 2 (sex) and columns 4-9, and the response variable
     is in column 1 (mort). See help(cass2) for further details.

     The commands
     ## End(Not run)
     data(cass2)
     coding(x=cass2[,c(2,4:9)],y=cass2[,1], z=cass2[,2:3])

     ## Not run: give the following coding scheme and second-stage
     sample sizes (n2)

     [1] "For calls requiring n1 or prev as input, use the following order"
           ylevel sex wtcat new.z n2
      [1,]      0   0     1     1 10
      [2,]      0   1     1     2 10
      [3,]      0   0     2     3 10
      [4,]      0   1     2     4 10
      [5,]      0   0     3     5 10
      [6,]      0   1     3     6 10
      [7,]      1   0     1     1  8
      [8,]      1   1     1     2 10
      [9,]      1   0     2     3 10
     [10,]      1   1     2     4 10
     [11,]      1   0     3     5 10
     [12,]      1   1     3     6 10
     ## End(Not run)

