prepare.cc              package:SimHap              R Documentation

_P_r_e_p_a_r_e _c_a_s_e-_c_o_n_t_r_o_l _d_a_t_a _f_o_r _i_n_f_e_r_r_i_n_g _h_a_p_l_o_t_y_p_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     'prepare.cc' prepares case-control data that may contain missing
     values in the case status variable by removing individuals whose
     status is unknown. This avoids the error that 'infer.haplos.cc'
     returns when case status is missing.

_U_s_a_g_e:

     prepare.cc(geno, pheno, cc.var)

_A_r_g_u_m_e_n_t_s:

    geno: a genotype data frame where each SNP is represented by two
          columns, one for each allele, in the form of 'haplo.dat'.

   pheno: a data frame containing phenotype data with at least two
          columns - a subject identifier and an indicator of disease
          status.

  cc.var: the name of the column in 'pheno' that indicates disease
          status. Must be given as a quoted string, e.g. "DISEASE".

_D_e_t_a_i_l_s:

     'prepare.cc' searches for missing values in 'cc.var' and reduces
     'geno' and 'pheno' to include only those individuals with known
     disease status. These 'geno' and 'pheno' objects can then be
     passed into 'infer.haplos.cc'.
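
     The filtering described above can be sketched in a few lines of
     base R. This is only an illustration, not the SimHap source:
     'prepare_cc_sketch' and the toy data frames are hypothetical, and
     the sketch assumes that row i of 'geno' and row i of 'pheno' refer
     to the same subject. The package itself may align the two data
     frames differently, for example by subject identifier.

```r
# Hypothetical stand-in for prepare.cc; NOT the SimHap implementation.
prepare_cc_sketch <- function(geno, pheno, cc.var) {
  # Assumption: row i of geno and row i of pheno describe the same subject.
  stopifnot(cc.var %in% names(pheno), nrow(geno) == nrow(pheno))

  # TRUE for subjects whose disease status is recorded
  known <- !is.na(pheno[[cc.var]])

  # Keep only those subjects in both data frames
  list(geno  = geno[known, , drop = FALSE],
       pheno = pheno[known, , drop = FALSE])
}

# Toy data, made up for illustration: one SNP stored as two allele columns
toy.geno <- data.frame(ID = 1:4,
                       SNP1.a = c("A", "A", "G", "G"),
                       SNP1.b = c("G", "A", "G", "A"))
toy.pheno <- data.frame(ID = 1:4, DISEASE = c(1, NA, 0, 1))

toy.new <- prepare_cc_sketch(toy.geno, toy.pheno, cc.var = "DISEASE")
nrow(toy.new$pheno)   # 3: the subject with DISEASE = NA is dropped
```

     Dropping the same rows from both objects keeps genotype and
     phenotype records aligned, which is what allows them to be passed
     on together.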

_V_a_l_u_e:

     A list with two components:

    geno: a genotype data frame where each SNP is represented by two
          columns, one for each allele, in the form of 'haplo.dat'.
          Individuals with unknown disease status are removed.

   pheno: a data frame containing phenotype data with at least two
          columns - a subject identifier and an indicator of disease
          status. Individuals with unknown disease status are removed.

_A_u_t_h_o_r(_s):

     Pamela A. McCaskie

_R_e_f_e_r_e_n_c_e_s:

     McCaskie, P.A., Carter, K.W., Hazelton, M., Palmer, L.J. (2007)
     SimHap: A comprehensive modeling framework for epidemiological
     outcomes and a multiple-imputation approach to haplotypic analysis
     of population-based data, [online] www.genepi.org.au/simhap.

_S_e_e _A_l_s_o:

     'infer.haplos.cc'

_E_x_a_m_p_l_e_s:

     data(SNP.dat)

     # convert SNP.dat to format required by infer.haplos.cc
     haplo.dat <- SNP2Haplo(SNP.dat)
     data(pheno.dat)

     # not run: will return an error due to missing data in variable 'DISEASE'
     # myinfer<-infer.haplos.cc(geno=haplo.dat, pheno=pheno.dat, 
     #       cc.var="DISEASE") 

     newdata <- prepare.cc(geno=haplo.dat, pheno=pheno.dat, cc.var="DISEASE")
     newhaplo.dat <- newdata$geno
     newpheno.dat <- newdata$pheno
     myinfer <- infer.haplos.cc(geno=newhaplo.dat, pheno=newpheno.dat,
             cc.var="DISEASE")

     # prints haplotype frequencies among cases
     myinfer$hap.freq.cases

     # prints haplotype frequencies among controls
     myinfer$hap.freq.controls 

     # generate a haplo object in which haplotypes with a frequency
     # below min.freq are grouped into a category called "rare"
     myhaplo <- make.haplo.rare(myinfer, min.freq=0.05)
     mymodel <- haplo.bin(formula1=DISEASE~AGE+SBP+h.N1AA, 
             formula2=DISEASE~AGE+SBP, pheno=newpheno.dat, haplo=myhaplo, 
             sim=10)

