EM                 package:hapassoc                 R Documentation

_E_M _a_l_g_o_r_i_t_h_m _t_o _f_i_t _m_a_x_i_m_u_m _l_i_k_e_l_i_h_o_o_d _e_s_t_i_m_a_t_e_s _o_f _t_r_a_i_t _a_s_s_o_c_i_a_t_i_o_n_s _w_i_t_h _S_N_P _h_a_p_l_o_t_y_p_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function takes a dataset of haplotypes in which rows for
     individuals of uncertain phase have been augmented by
     "pseudo-individuals" who carry the possible multilocus genotypes
     consistent with the single-locus phenotypes.  The EM algorithm is
     used to find MLEs for trait associations with covariates in
     generalized linear models.
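
     The augmentation described above can be pictured with a small
     base-R sketch. It does not call hapassoc: the data frame, the
     haplotype labels, and the frequencies are invented, and
     Hardy-Weinberg proportions are assumed for the illustration. The
     trait is ignored here, so these are only the haplotype-frequency
     part of the weights an EM fit would use.

```r
# Toy data, invented for illustration: two SNPs, alleles coded 0/1.
# Individual 1 is homozygous at the first SNP, so its phase is known.
# Individual 2 is heterozygous at both SNPs, so two haplotype pairs are
# consistent with its genotypes; it gets one row per pair.
augmented <- data.frame(
  id   = c(1, 2, 2),
  hap1 = c("h00", "h00", "h01"),
  hap2 = c("h01", "h11", "h10"),
  stringsAsFactors = FALSE
)

# Assumed haplotype frequencies (made up; they sum to 1).
freq <- c(h00 = 0.4, h01 = 0.3, h10 = 0.2, h11 = 0.1)

# Under the assumed Hardy-Weinberg proportions, an unordered pair has
# probability freq[hap1] * freq[hap2], doubled when the haplotypes differ.
pair_prob <- freq[augmented$hap1] * freq[augmented$hap2] *
  ifelse(augmented$hap1 == augmented$hap2, 1, 2)

# Weight of each row: its pair probability divided by the total over all
# rows that belong to the same individual.
augmented$wt <- as.numeric(pair_prob / ave(pair_prob, augmented$id, FUN = sum))

print(augmented)
```

     The row for individual 1 gets weight 1, and the two rows for
     individual 2 share a total weight of 1.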

_U_s_a_g_e:

     EM(form, haplos.list, baseline = "missing", family = binomial(),
        gamma = FALSE, maxit = 50, tol = 0.001, ...)

_A_r_g_u_m_e_n_t_s:

    form: model formula in the usual R format

haplos.list: list of haplotype data from 'PreEM'

baseline: optional, haplotype to be used for baseline coding. Default
          is the most frequent haplotype.

  family: binomial, poisson, gaussian or gamma are supported,
          default=binomial

   gamma: initial estimates of haplotype frequencies. By default, the
          values calculated in 'PreEM' using standard haplotype
          counting (i.e. the EM algorithm without adjustment for
          non-haplotype covariates) are used.

   maxit: maximum iterations of the EM loop, default=50

     tol: convergence tolerance in terms of the maximum difference in
          parameter estimates between iterations; default=0.001

     ...: additional arguments to be passed to the glm function such 
          as starting values for parameter estimates in the risk model
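
     As a rough illustration of how 'gamma', 'maxit' and 'tol'
     interact, the sketch below runs a plain haplotype-counting EM
     (no trait, no covariates) in base R. The function 'count_em', the
     toy data, and the assumption of Hardy-Weinberg proportions are
     all hypothetical and are not part of hapassoc; only the stopping
     rule (maximum change in the estimates below 'tol', at most
     'maxit' iterations) follows the argument descriptions above.

```r
# Hypothetical helper, not a hapassoc function: haplotype-counting EM on
# an augmented data set with one row per compatible haplotype pair.
count_em <- function(aug, haps, maxit = 50, tol = 0.001) {
  freq <- setNames(rep(1 / length(haps), length(haps)), haps)  # uniform start
  n <- length(unique(aug$id))                                  # individuals
  for (it in seq_len(maxit)) {
    # E-step: weight each row by its pair probability within the individual
    p  <- freq[aug$hap1] * freq[aug$hap2] * ifelse(aug$hap1 == aug$hap2, 1, 2)
    wt <- as.numeric(p / ave(p, aug$id, FUN = sum))
    # M-step: weighted haplotype counts divided by the 2n chromosomes
    counts <- sapply(haps, function(h) {
      sum(wt * ((aug$hap1 == h) + (aug$hap2 == h)))
    })
    new_freq <- counts / (2 * n)
    # Stopping rule: maximum change in the estimates between iterations
    converged <- max(abs(new_freq - freq)) < tol
    freq <- new_freq
    if (converged) return(list(it = it, gamma = freq, converged = TRUE))
  }
  list(it = maxit, gamma = freq, converged = FALSE)
}

# Toy data (invented): individual 4 is phase-ambiguous and has two rows.
aug <- data.frame(
  id   = c(1, 2, 3, 4, 4),
  hap1 = c("h00", "h00", "h11", "h00", "h01"),
  hap2 = c("h00", "h01", "h11", "h11", "h10"),
  stringsAsFactors = FALSE
)
haps <- c("h00", "h01", "h10", "h11")

res <- count_em(aug, haps, maxit = 50, tol = 0.001)
print(res)
```

     Calling the same function with 'maxit = 1' stops before the
     estimates have settled and reports 'converged = FALSE'.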

_V_a_l_u_e:

      it: number of iterations of the EM algorithm

    beta: estimated regression coefficients

   gamma: estimated haplotype frequencies

    fits: fitted values of the trait

     wts: final weights calculated in last iteration of the EM loop.
          These are estimates of the conditional probabilities of each
          multilocus genotype given the observed single-locus
          genotypes.

     var: joint variance-covariance matrix of the estimated regression
          coefficients and the estimated haplotype frequencies

dispersionML: maximum likelihood estimate of dispersion parameter (to
          get the moment estimate, use 'summary.EM')

  family: family of the generalized linear model (e.g. binomial, 
          gaussian, etc.)

response: trait value

converged: TRUE/FALSE indicator of convergence. If the algorithm fails
          to converge, only the converged indicator is returned.
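
     Because a non-converged fit carries only the convergence
     indicator, code that uses the result may want to check it first.
     The sketch below shows one way to do that. It assumes the result
     is a list whose components are accessed with '$'; the helper
     'coef_if_converged' and the two mock objects are hypothetical
     stand-ins and are not produced by hapassoc.

```r
# Hypothetical helper, not part of hapassoc: return the regression
# coefficients only when the fit reports convergence.
coef_if_converged <- function(fit) {
  if (!isTRUE(fit$converged)) {
    warning("EM did not converge; no coefficients available")
    return(NULL)
  }
  fit$beta
}

# Mock objects standing in for EM() results (all values invented).
ok  <- list(it = 7, beta = c("(Intercept)" = -1.2, h000 = 0.5),
            converged = TRUE)
bad <- list(converged = FALSE)

print(coef_if_converged(ok))                      # the mock coefficients
print(suppressWarnings(coef_if_converged(bad)))   # NULL
```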

_S_e_e _A_l_s_o:

     'PreEM', 'summary.EM', 'glm', 'family'.

_E_x_a_m_p_l_e_s:

     data(hypoDat)
     example.preEM <- PreEM(hypoDat, 3)

     names(example.preEM$haploDM)
     # "h000"   "h001"   "h010"   "h011"   "h100"   "pooled"

     # Logistic regression, baseline group: '001/001'

     example.regr <- EM(affected ~ attr + h000 + h010 + h011 + h100 + pooled,
                        example.preEM, family = binomial())

