getXlist             package:MasterBayes             R Documentation

_D_e_s_i_g_n _M_a_t_r_i_c_e_s _f_o_r _t_h_e _M_u_l_t_i_n_o_m_i_a_l _L_o_g-_L_i_n_e_a_r _M_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     Forms design matrices for each offspring, and stores other
     relevant information.

_U_s_a_g_e:

     getXlist(PdP, GdP=NULL, A=NULL, E1=0.005, E2=0.005, mm.tol=999, ...)

_A_r_g_u_m_e_n_t_s:

     PdP: 'PdataPed' object

     GdP: optional 'GdataPed' object

       A: optional list of allele frequencies.  If not specified and
          'GdP' exists, allele frequencies are taken from 'GdP$G' using
          'extractA'

      E1: the probability of a dominant allele being scored as a
          recessive allele for dominant markers

      E2: per-allele genotyping error rate. 'E2'(2-'E2') is the
          per-genotype rate defined in Kalinowski (2006) for codominant
          markers, and 'E2' is the probability of a recessive allele
          being scored as a dominant allele for dominant markers

  mm.tol: maximum number of genotype mismatches tolerated for potential
          parents

     ...: further arguments to be passed
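
     The relationship between the per-allele rate 'E2' and the
     per-genotype rate 'E2'(2-'E2') can be checked numerically.  The
     following is a minimal sketch in base R, not code from the
     package.  It assumes, for illustration only, that the two alleles
     of a genotype are mis-scored independently, in which case the
     chance that at least one is wrong is 1-(1-E2)^2.

```r
E2 <- 0.005                        # default per-allele error rate

# per-genotype rate as written in the argument description
per_genotype <- E2 * (2 - E2)

# illustrative assumption: two independent allele errors,
# probability that at least one of them occurs
at_least_one <- 1 - (1 - E2)^2

# the two expressions are algebraically identical
stopifnot(isTRUE(all.equal(per_genotype, at_least_one)))
per_genotype                       # 0.009975
```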

_D_e_t_a_i_l_s:

     This is the main R routine for setting up design matrices for the
     various models that may be defined in the 'formula' argument of
     'PdataPed'.  If a 'GdataPed' object is passed to 'getXlist',
     design matrices of genetic likelihoods are calculated (see
     'fillX.G'), and the number of mismatches between offspring and
     parental genotypes is stored (see 'mismatches').  'mm.tol'
     specifies the maximum number of mismatches that are tolerated
     between an offspring and a parent.  Parents that exceed this
     number of mismatches are excluded, and the design matrices for
     non-excluded parents are reordered by the number of mismatches.
     This increases the efficiency of sampling from the multinomial
     distribution of parents, because high-probability parents appear
     first.
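
     The exclusion and reordering step described above can be sketched
     in base R.  This is an illustration under stated assumptions, not
     the package's internal code: the candidate names, mismatch counts
     and the toy design matrix 'X' are all hypothetical, and 'mm_tol'
     stands in for the 'mm.tol' argument.

```r
# hypothetical mismatch counts for five candidate parents
candidates <- c("a", "b", "c", "d", "e")
n_mismatch <- c(a = 2, b = 0, c = 5, d = 1, e = 0)
mm_tol <- 2                                  # plays the role of 'mm.tol'

# toy design matrix with one row per candidate
X <- cbind(x1 = c(0.1, 0.2, 0.3, 0.4, 0.5))
rownames(X) <- candidates

# exclude candidates whose mismatches exceed the tolerance
keep <- n_mismatch <= mm_tol
kept <- candidates[keep]

# reorder the remaining candidates, fewest mismatches first
kept_ordered <- kept[order(n_mismatch[keep])]
X_ordered <- X[kept_ordered, , drop = FALSE]

kept_ordered                                 # "b" "e" "d" "a"
```

     Candidate "c" is dropped because it has more than 'mm_tol'
     mismatches, and the rows of the toy matrix follow the new order.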

_V_a_l_u_e:

      id: vector of unique identifiers taken from 'PdP'

par_order: index relating the order of parameters in 'PdP$formula' to
          the columns of the design matrices

beta_map: index relating the vector of unique parameters to the columns
          of the design matrices

       X: list of design matrices and other information.

_N_o_t_e:

     Each element of 'X' refers to an offspring ('names(X)') and
     contains vectors for the set of potential parents ('restdam.id'
     and 'restsire.id') of each offspring.  Also included are the set
     of individuals that may have been parents but have been excluded
     for certain reasons ('dam.id' and 'sire.id').  Exclusion may have
     been based on the number of genotype mismatches, or it may have
     been on biological grounds (see the 'keep' argument of 'varPed').
     Parental ids are stored as integers which correspond to the
     actual ids stored in 'id'.  Parental ids greater than the length
     of 'id' refer to unsampled parents.
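
     The integer coding of parental ids can be illustrated with a
     short base R sketch.  The vectors 'id' and 'parent_index' below
     are hypothetical values made up for the example, and treating the
     integers as positions in 'id' is an assumption; they are not
     output from 'getXlist'.

```r
id <- c("ind_A", "ind_B", "ind_C")   # hypothetical 'id' vector
parent_index <- c(2, 3, 4)           # hypothetical integer codes

# codes beyond the end of 'id' denote unsampled parents
unsampled <- parent_index > length(id)

# look up the actual ids; out-of-range codes give NA
parent_label <- id[parent_index]
parent_label[unsampled] <- "unsampled"

parent_label                         # "ind_B" "ind_C" "unsampled"
```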

     Six types of design matrix are used ('XDus', 'XDs', 'XSus', 'XSs',
     'XDSus', 'XDSs').  'XD..' are the design matrices for dams, and
     'XS..' are the design matrices for sires.  The rows of each design
     matrix are associated with individuals in 'dam.id' and 'sire.id',
     respectively.  When interactions between dam and sire variables
     are modelled, or a 'varPed' variable is created using the argument
     'relational="MATE"', the design matrices vary over parental
     combinations.  'XDS..' are the design matrices for parental
     combinations, with sires varying the fastest.  Each of these three
     types of design matrix has two subclasses: 's' and 'us'.  's' are
     design matrices which are fully observed, either because unsampled
     parents do not exist or because unsampled parents have known
     phenotypes (see argument 'USvar' in 'varPed').  'us' are design
     matrices where the phenotypes of unsampled parents are unknown.
     The matrices 'XDus' and 'XSus' have a row of 'NA's which
     corresponds to the unsampled parent category.  The design matrix
     'XDSus' will typically have many rows of 'NA's because each
     sampled parent may be paired with an unsampled individual.
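
     The phrase "sires varying the fastest" describes the ordering of
     parental combinations.  A minimal base R sketch of that ordering
     follows; the dam and sire labels and the vector 'x' are
     hypothetical and only illustrate the indexing, not the contents
     of a real 'XDS..' matrix.

```r
dams  <- c("d1", "d2")               # hypothetical dam labels
sires <- c("s1", "s2", "s3")         # hypothetical sire labels

# expand.grid varies its first argument fastest, so sires go first
combos <- expand.grid(sire = sires, dam = dams,
                      stringsAsFactors = FALSE)

# toy value for each parental combination, in the same order
x <- seq_len(nrow(combos))

# column-major filling puts sires in rows and dams in columns
m <- matrix(x, nrow = length(sires), ncol = length(dams),
            dimnames = list(sires, dams))

combos$sire                          # "s1" "s2" "s3" "s1" "s2" "s3"
m["s2", "d2"]                        # 5
```

     Under this ordering a single column of values can be reshaped
     with 'matrix()' using the number of sires as 'nrow' and the number
     of dams as 'ncol', as is done with 'XDSs' in the Examples section.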

     When the argument 'gender=NULL' is passed to 'varPed' the
     respective columns in the dam and sire design matrices are
     associated with a single parameter.  Because of this, the number
     of parameters to be estimated may be less than the total number of
     columns in the six design matrices.  'beta_map' relates a
     parameter vector to the columns of the design matrices.  The
     columns of the design matrices are numbered in the order they are
     introduced in the preceding paragraph (i.e. 'XDus' through to
     'XDSs').  The parameter vector is ordered identically, except that
     parameters associated with genderless variables are omitted for
     males.  'par_order' is similar
     to 'beta_map' but relates the order of the parameters specified in
     the 'formula' argument to 'PdataPed' to the respective columns of
     the design matrices.
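
     The idea of mapping a shorter parameter vector onto a longer set
     of design-matrix columns can be sketched in base R.  This is an
     illustration only: the variable names, the values, and the
     assumption that the map is an integer vector with one entry per
     column are all hypothetical, and the structure of the 'beta_map'
     actually returned by 'getXlist' may differ.

```r
# hypothetical columns across the dam and sire design matrices:
#   1. dam  "age"  (genderless variable)
#   2. sire "age"  (shares its parameter with column 1)
#   3. sire "size" (male-specific variable)
col_names <- c("dam.age", "sire.age", "sire.size")

# unique parameters: the genderless one is not repeated for males
beta <- c(age = 0.5, size = -1.2)

# assumed form of the map: parameter index for each column
col_map <- c(1, 1, 2)

# coefficient applied to each design-matrix column
beta_per_column <- unname(beta[col_map])
names(beta_per_column) <- col_names

beta_per_column                      # 0.5 0.5 -1.2
```

     Three columns are served by two parameters, which is the
     situation described above when 'gender=NULL' is used.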

     If the argument 'relational="OFFSPRING"' is specified in 'varPed',
     or the set of potential parents varies over offspring, the design
     matrices will vary across offspring.  For this reason I create a
     design matrix for each offspring irrespective of whether the
     matrices vary or not.  The design matrices for the genetic
     likelihoods will always vary over offspring.

_A_u_t_h_o_r(_s):

     Jarrod Hadfield j.hadfield@sheffield.ac.uk

_R_e_f_e_r_e_n_c_e_s:

     Hadfield J.D. et al. (2006) Molecular Ecology 15 3715-31
     Kalinowski S.T. et al. (2006) Molecular Ecology, in press
     Hadfield J.D. et al. (2007) in prep

_S_e_e _A_l_s_o:

     'varPed', 'MCMCped'

_E_x_a_m_p_l_e_s:

     id<-1:20
     sex<-sample(c("Male", "Female"),20, replace=TRUE)
     offspring<-c(rep(0,18),1,1)
     lat<-rnorm(20)
     long<-rnorm(20)
     mating_type<-gl(2,10, labels=c("+", "-"))

     test.data<-data.frame(id, offspring, lat, long, mating_type, sex)

     res1<-expression(varPed("offspring", restrict=0))
     var1<-expression(varPed(c("lat", "long"), gender="Male", 
       relational="OFFSPRING"))
     var2<-expression(varPed(c("mating_type"), gender="Female", 
       relational="MATE"))
     var3<-expression(varPed("mating_type", gender="Male"))

     PdP<-PdataPed(formula=list(res1, var1, var2, var3), data=test.data)

     X.list<-getXlist(PdP)
     X.list$X$"19"$XSs

     # For the first offspring (id 19) this is the design matrix for sires.
     # The first column represents the distance between each male 
     # and each offspring. The second column indicates the male's 
     # mating type. Note that contrasts are set up with the first 
     # male, so the indicator variables may be negative.

     matrix(X.list$X$"19"$XDSs, ncol=length(X.list$X$"19"$dam.id), 
        nrow=length(X.list$X$"19"$sire.id))

     # Incidence matrix indicating whether females (columns) and males (rows)
     # are the same mating type. Again this is a contrast with the first 
     # parental combination (which is +/+), so 0 actually represents parents
     # with the same mating type.

