haplo.em             package:haplo.score             R Documentation

_E_M _C_o_m_p_u_t_a_t_i_o_n _o_f _H_a_p_l_o_t_y_p_e _P_r_o_b_a_b_i_l_i_t_i_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     For genotypes measured on unrelated subjects, with linkage phase
     unknown, compute maximum likelihood estimates of haplotype
     probabilities. Because linkage phase is unknown, there may be
     more than one pair of haplotypes that is consistent with the
     observed marker phenotypes, so posterior probabilities of pairs of
     haplotypes for each subject are also computed.

_U_s_a_g_e:

     haplo.em(geno, locus.label=NA, converge.eps=1e-06, maxiter=500)

_A_r_g_u_m_e_n_t_s:

    geno: Matrix of alleles, such that each locus has a pair of
          adjacent columns of alleles, and the order of columns
          corresponds to the order of loci on a chromosome. If
          there are K loci, then ncol(geno) = 2*K. Rows represent
          alleles for each subject.

locus.label: Vector of labels for loci, of length K (see
          definition of geno matrix).

converge.eps: Convergence criterion, based on absolute change in log
          likelihood (lnlike).

 maxiter: Maximum number of iterations of EM. 
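
     As a rough illustration of the layout expected for geno, the
     sketch below builds a small matrix by hand. The allele codes,
     the locus names, and the helper locus_cols are all invented for
     this example and are not part of the package.

```r
# Hypothetical toy data: 4 subjects (rows), K = 3 loci (2 columns per locus).
# Allele codes are arbitrary integers chosen for illustration.
geno <- matrix(c(1, 2,  1, 1,  3, 2,
                 1, 1,  2, 1,  2, 2,
                 2, 2,  1, 2,  3, 3,
                 1, 2,  2, 2,  1, 3),
               nrow = 4, byrow = TRUE)

# One label per locus (names are made up)
locus.label <- c("locA", "locB", "locC")

# The number of columns must be twice the number of loci
K <- ncol(geno) / 2
stopifnot(ncol(geno) == 2 * length(locus.label))

# Hypothetical helper: the two adjacent columns that hold locus k
locus_cols <- function(k) c(2 * k - 1, 2 * k)

# Alleles of every subject at the second locus
geno[, locus_cols(2)]
```

     With a matrix of this shape, the call would presumably be
     haplo.em(geno, locus.label = locus.label).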

_D_e_t_a_i_l_s:

     The input data are arranged as a matrix, with N rows
     representing N subjects, and 2K columns representing pairs of
     alleles for K loci whose phase is unknown. The input data
     matrix is reduced to the distinguishable un-phased multilocus
     marker phenotypes, along with their counts. For each
     distinguishable phenotype, all possible pairs of haplotypes
     are enumerated. Maximum likelihood estimation, implemented by
     the expectation-maximization (EM) algorithm, proceeds by
     assuming Hardy-Weinberg proportions of underlying genotypes, so
     that the probability of a pair of haplotypes is the product of
     their probabilities (times 2 if the haplotypes differ). Relative
     probabilities are then assigned to the list of possible
     underlying pairs of haplotypes for each genotype. The haplotypes
     are "counted" from the enumerated list of all possibilities,
     with the relative probabilities used as weights. These new
     counts are used to determine new haplotype frequencies, which
     in turn are used to update the relative probabilities. This
     cyclic iteration continues until the absolute change in lnlike
     between successive iterations is smaller than converge.eps, or
     until maxiter iterations have been completed.
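
     The iteration described above can be sketched in a few lines of
     R. This is a minimal toy version written for illustration, not
     the code used by haplo.em: it assumes biallelic loci coded 1/2
     with no missing alleles, enumerates haplotype pairs by brute
     force per subject rather than per distinguishable phenotype, and
     the function name em_haplo_sketch and its return components are
     hypothetical.

```r
# Toy EM for haplotype frequencies; illustrative only, not the haplo.em source.
# Assumes biallelic loci coded 1/2 and no missing alleles.
em_haplo_sketch <- function(geno, converge.eps = 1e-6, maxiter = 500) {
  K <- ncol(geno) / 2
  haps <- as.matrix(expand.grid(rep(list(1:2), K)))  # all 2^K haplotypes
  H <- nrow(haps)

  # Unphased phenotype label: the two alleles are sorted within each locus
  pheno <- function(a, b) paste(pmin(a, b), pmax(a, b), collapse = "|")
  first <- seq(1, 2 * K, by = 2)
  subj.pheno <- apply(geno, 1, function(g) pheno(g[first], g[first + 1]))

  # Enumerate unordered haplotype pairs (h1 <= h2) consistent with each subject
  pairs <- which(upper.tri(diag(H), diag = TRUE), arr.ind = TRUE)
  pair.pheno <- apply(pairs, 1, function(p) pheno(haps[p[1], ], haps[p[2], ]))
  tab <- do.call(rbind, lapply(seq_along(subj.pheno), function(i) {
    hit <- which(pair.pheno == subj.pheno[i])
    data.frame(subj = rep(i, length(hit)), h1 = pairs[hit, 1],
               h2 = pairs[hit, 2], row.names = NULL)
  }))

  prob <- rep(1 / H, H)  # start from equal haplotype frequencies
  lnlike.old <- -Inf
  converge <- 0
  for (niter in seq_len(maxiter)) {
    # E-step: Hardy-Weinberg pair probability, times 2 if haplotypes differ,
    # rescaled to relative (posterior) probabilities within each subject
    prior <- prob[tab$h1] * prob[tab$h2] * ifelse(tab$h1 == tab$h2, 1, 2)
    total <- tapply(prior, tab$subj, sum)
    post <- as.numeric(prior / total[as.character(tab$subj)])
    lnlike <- sum(log(total))
    if (abs(lnlike - lnlike.old) < converge.eps) {
      converge <- 1
      break
    }
    lnlike.old <- lnlike

    # M-step: count haplotypes using the posterior probabilities as weights
    code <- factor(c(tab$h1, tab$h2), levels = seq_len(H))
    counts <- tapply(c(post, post), code, sum)
    counts[is.na(counts)] <- 0
    prob <- as.numeric(counts) / sum(counts)
  }

  list(converge = converge, niter = niter, haplotype = haps,
       hap.prob = prob, lnlike = lnlike, pairs = tab, post = post)
}

# Invented data: three 1-1/1-1 subjects, one 2-2/2-2 subject, and one
# double heterozygote whose phase is ambiguous
geno <- rbind(c(1, 1, 1, 1), c(1, 1, 1, 1), c(1, 1, 1, 1),
              c(2, 2, 2, 2), c(1, 2, 1, 2))
fit <- em_haplo_sketch(geno)
round(fit$hap.prob, 4)
```

     In this toy data set the double heterozygote is resolved almost
     entirely in favour of the two haplotypes already seen in the
     homozygous subjects, so the estimated frequencies approach 0.7
     and 0.3 for those haplotypes and 0 for the other two.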

_V_a_l_u_e:

     List with components:

converge: Indicator of convergence of the EM algorithm (1 = converged,
          0 = failed).

   niter: Number of iterations completed in the EM algorithm.

locus.info: A list with a component for each locus. Each component
          is also a list, and the items of a locus-specific list are
          the locus name and a vector for the unique alleles for the
          locus.

locus.label: Vector of labels for loci, of length K (see
          definition of input values).

haplotype: Matrix of unique haplotypes. Each row represents a unique 
          haplotype, and the number of columns is the number of loci. 

hap.prob: Vector of maximum likelihood estimates of haplotype
          probabilities. The ith element of hap.prob corresponds to
          the ith row of haplotype.

hap.prob.noLD: Similar to hap.prob, but assuming no linkage
          disequilibrium. 

  lnlike: Value of lnlike at last EM iteration (maximum lnlike if
          converged). 

      lr: Likelihood ratio statistic to test no linkage disequilibrium
          among all loci. 

indx.subj: Vector for index of subjects, after expanding to all
          possible pairs of haplotypes for each person. A value of i
          in indx.subj refers to the ith row of input matrix geno. If
          the ith subject has n possible pairs of haplotypes that
          correspond to their marker phenotype, then i is repeated n
          times.

   nreps: Vector for the count of haplotype pairs that map to each
          subject's marker genotypes. 

hap1code: Vector of codes for each subject's first haplotype. The
          values in hap1code are the row numbers of the unique
          haplotypes in the returned matrix haplotype. 

hap2code: Similar to hap1code, but for each subject's second
          haplotype.

    post: Vector of posterior probabilities of pairs of haplotypes for
          a person, given their marker phenotypes.
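
     The expanded vectors indx.subj, hap1code, hap2code and post are
     aligned element by element, so they can be combined into one
     table of candidate haplotype pairs. The sketch below uses a
     hand-made list in place of a real haplo.em result; all values
     are invented, and the object names fit, pairs.tab and best are
     only for illustration.

```r
# Mock of the relevant components of a result (values invented)
fit <- list(
  haplotype = rbind(c(1, 1), c(1, 2), c(2, 1), c(2, 2)),
  indx.subj = c(1, 2, 2, 3),      # subject 2 has two candidate pairs
  hap1code  = c(1, 1, 2, 4),
  hap2code  = c(1, 4, 3, 4),
  post      = c(1, 0.9, 0.1, 1)
)

# Turn row numbers of the haplotype matrix into readable allele strings
hap_string <- function(code) {
  apply(fit$haplotype[code, , drop = FALSE], 1, paste, collapse = "-")
}

# One row per (subject, candidate haplotype pair)
pairs.tab <- data.frame(subj = fit$indx.subj,
                        hap1 = hap_string(fit$hap1code),
                        hap2 = hap_string(fit$hap2code),
                        post = fit$post)

# Posterior probabilities are expected to sum to 1 within each subject
post.sum <- tapply(pairs.tab$post, pairs.tab$subj, sum)

# Number of candidate pairs per subject
n.pairs <- table(pairs.tab$subj)

# Most probable pair for each subject
best <- do.call(rbind, lapply(split(pairs.tab, pairs.tab$subj),
                              function(d) d[which.max(d$post), ]))
best
```

     Picking only the most probable pair discards the uncertainty
     about phase; analyses that need to account for it would keep all
     rows and use post as weights.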

_S_i_d_e _E_f_f_e_c_t_s:

_R_e_f_e_r_e_n_c_e_s:

     Excoffier, L., and Slatkin, M., 1995, Maximum-likelihood
     estimation of molecular haplotype frequencies in a diploid
     population, Mol. Biol. Evol. 12(5):921-927.

     Hawley, M. E., and Kidd, K. K., 1995, HAPLO: a program using
     the EM algorithm to estimate the frequencies of multi-site
     haplotypes, J. Heredity 86:409-411.

     Long, J. C., Williams, R. C., and Urbanek, M., 1995, An E-M
     algorithm and testing strategy for multiple-locus haplotypes,
     Am. J. Hum. Genet. 56:799-810.

     Terwilliger, J. D., and Ott, J., 1994, Handbook of human genetic
     linkage, Johns Hopkins University Press, Baltimore.

_S_e_e _A_l_s_o:

     haplo.enum, haplo.hash, haplo.score

_E_x_a_m_p_l_e_s:

     ## Don't run: 
     haplo <- haplo.em(geno)
     ## End Don't run

