\name{aldh2}
\alias{aldh2}
\title{ALDH2 markers and Alcoholism}
\description{This data set contains genotypes at eight ALDH2 markers 
  for Japanese patients and controls.
}
\usage{data(aldh2)}
\format{A data frame:
\describe{
\item{id}{subject id}
\item{y}{a variable taking value 0 for controls and 1 for alcoholics}
\item{D12S2070.a1}{D12S2070 allele a1}
\item{D12S2070.a2}{D12S2070 allele a2}
\item{D12S839.a1}{D12S839 allele a1}
\item{D12S839.a2}{D12S839 allele a2}
\item{D12S821.a1}{D12S821 allele a1}
\item{D12S821.a2}{D12S821 allele a2}
\item{D12S1344.a1}{D12S1344 allele a1}
\item{D12S1344.a2}{D12S1344 allele a2}
\item{EXON12.a1}{EXON12 allele a1}
\item{EXON12.a2}{EXON12 allele a2}
\item{EXON1.a1}{EXON1 allele a1}
\item{EXON1.a2}{EXON1 allele a2}
\item{D12S2263.a1}{D12S2263 allele a1}
\item{D12S2263.a2}{D12S2263 allele a2}
\item{D12S1341.a1}{D12S1341 allele a1}
\item{D12S1341.a2}{D12S1341 allele a2}
}
  
There are genotypes for 8 loci, with a prefix name
(e.g., "EXON12") and a suffix for each of two alleles (".a1" and ".a2").

The eight marker loci are located according to the following map (in base pairs):

\tabular{lr}{
D12S2070   \tab (> 450 000),\cr
D12S839    \tab (> 450 000),\cr
D12S821    \tab (\eqn{\sim}{~} 400 000),\cr
D12S1344   \tab (   83 853),\cr
EXON12     \tab (    0),\cr
EXON1      \tab (   37 335),\cr
D12S2263   \tab (   38 927),\cr
D12S1341   \tab (> 450 000)
}
}
\source{
  Prof Ian Craig of Oxford and SGDP Centre, KCL
}
\keyword{datasets}

\eof
\name{apoeapoc}
\alias{apoeapoc}
\title{APOE/APOC1 markers and Schizophrenia}
\description{This data set contains APOE/APOC1 markers 
  and Chinese Schizophrenic patients and controls.
}
\usage{data(apoeapoc)}
\format{A data frame:
\describe{
\item{id}{subject id}
\item{y}{a variable taking value 0 for controls and 2 for Schizophrenia}
\item{sex}{sex}
\item{age}{age}
\item{apoe.a1}{APOE allele a1}
\item{apoe.a2}{APOE allele a2}
\item{apoc.a1}{APOC allele a1}
\item{apoc.a2}{APOC allele a2}
}
  
The last six variables are sex, age and genotypes for APOE and APOC, 
with suffixes for each of two alleles (".a1" and ".a2").
}
\source{
  Dr JJ Shi of Western China Medical University
}
\keyword{datasets}

\eof
\name{bt}
\alias{bt}
\title{Bradley-Terry model for contingency table}
\usage{bt(x)}
\description{
This function calculates statistics under the Bradley-Terry model. 

It contains an internal function, toETDT, which generates the data required by ETDT.
}
\arguments{
\item{x}{the data table}
}

\value{
The returned value is a list containing:
\item{y}{A column of 1}
\item{count}{the frequency count/weight}
\item{allele}{the design matrix}
\item{bt.glm}{a glm.fit object}
\item{etdt.dat}{a data table that can be used by ETDT}
}

\section{References}{
Bradley RA, Terry ME (1952) Rank analysis of incomplete block designs I. 
The method of paired comparisons. Biometrika 39:324--345

Sham PC, Curtis D (1995) An extended transmission/disequilibrium 
test ({TDT}) for multi-allelic marker loci. Ann. Hum. Genet. 59:323-336
}
\seealso{
\code{\link[gap]{mtdt}}
}

\examples{
\dontrun{
# Copeman JB, Cucca F, Hearne CM, Cornall RJ, Reed PW, Ronningen KS, Undlien DE, Nistico L, Buzzetti R, Tosi R, et al.
# (1995) Linkage disequilibrium mapping of a type 1 diabetes susceptibility gene (IDDM7) to chromosome 2q31-q33. 
# Nat Genet 9: 80-5

x <- matrix(c(0,0, 0, 2, 0,0, 0, 0, 0, 0, 0, 0,
              0,0, 1, 3, 0,0, 0, 2, 3, 0, 0, 0,
              2,3,26,35, 7,0, 2,10,11, 3, 4, 1,
              2,3,22,26, 6,2, 4, 4,10, 2, 2, 0,
              0,1, 7,10, 2,0, 0, 2, 2, 1, 1, 0,
              0,0, 1, 4, 0,1, 0, 1, 0, 0, 0, 0,
              0,2, 5, 4, 1,1, 0, 0, 0, 2, 0, 0,
              0,0, 2, 6, 1,0, 2, 0, 2, 0, 0, 0,
              0,3, 6,19, 6,0, 0, 2, 5, 3, 0, 0,
              0,0, 3, 1, 1,0, 0, 0, 1, 0, 0, 0,
              0,0, 0, 2, 0,0, 0, 0, 0, 0, 0, 0,
              0,0, 1, 0, 0,0, 0, 0, 0, 0, 0, 0),nrow=12)

# Bradley-Terry model, only deviance is available in R
bt.ex<-bt(x)
anova(bt.ex$bt.glm)
summary(bt.ex$bt.glm)
bt.ex$etdt.dat
}
}

\author{Jing hua Zhao}
\keyword{}




\eof
\name{chow.test}
\alias{chow.test}
\title{Chow's test for heterogeneity in two regressions}
\usage{chow.test(y1,x1,y2,x2,x=NULL)}
\description{
Chow's test is for differences between two or more regressions.  Assuming that
errors in regressions 1 and 2 are normally distributed with zero mean and
homoscedastic variance, and that they are independent of each other, the test of
regressions from sample sizes \eqn{n_1} and \eqn{n_2} is carried out using
the following steps.
\enumerate{
\item Run a regression on the combined sample with size \eqn{n=n_1+n_2} and
obtain the within-group sum of squares, called \eqn{S_1}.  The number of degrees
of freedom is \eqn{n_1+n_2-k}, with \eqn{k} being the number of parameters
estimated, including the intercept.
\item Run two regressions on the two individual samples with sizes \eqn{n_1}
and \eqn{n_2}, and obtain their within-group sums of squares \eqn{S_2+S_3},
with \eqn{n_1+n_2-2k} degrees of freedom.
\item Conduct an \eqn{F_{(k,n_1+n_2-2k)}} test defined by
\deqn{F = \frac{[S_1-(S_2+S_3)]/k}{[(S_2+S_3)/(n_1+n_2-2k)]}}
If the \eqn{F} statistic exceeds the critical \eqn{F}, we reject the null
hypothesis that the two regressions are equal.
}

In the case of haplotype trend regression, haplotype frequencies from the combined
data are known, so they can be used directly.
}
\arguments{
\item{y1}{a vector of dependent variable}
\item{x1}{a matrix of independent variables}
\item{y2}{a vector of dependent variable}
\item{x2}{a matrix of independent variables}
\item{x}{a known matrix of independent variables}
}

\source{
\url{http://aoki2.si.gunma-u.ac.jp/R/}
}

\value{
The returned value is a vector containing (please use subscript to access them):

\item{F}{the F statistic}
\item{df1}{the numerator degree(s) of freedom}
\item{df2}{the denominator degree(s) of freedom}
\item{p}{the p value for the F test}
}

\section{References}{
Chow GC (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica 28:591-605
}
\seealso{
\code{\link[gap]{htr}}
}

\examples{
\dontrun{
dat1 <- matrix(c(
	1.2, 1.9, 0.9,
	1.6, 2.7, 1.3,
	3.5, 3.7, 2.0,
	4.0, 3.1, 1.8,
	5.6, 3.5, 2.2,
	5.7, 7.5, 3.5,
	6.7, 1.2, 1.9,
	7.5, 3.7, 2.7,
	8.5, 0.6, 2.1,
	9.7, 5.1, 3.6), byrow=TRUE, ncol=3)

dat2 <- matrix(c(
	1.4, 1.3, 0.5,
	1.5, 2.3, 1.3,
	3.1, 3.2, 2.5,
	4.4, 3.6, 1.1,
	5.1, 3.1, 2.8,
	5.2, 7.3, 3.3,
	6.5, 1.5, 1.3,
	7.8, 3.2, 2.2,
	8.1, 0.1, 2.8,
	9.5, 5.6, 3.9), byrow=TRUE, ncol=3)

y1<-dat1[,3]
y2<-dat2[,3]
x1<-dat1[,1:2]
x2<-dat2[,1:2]
chow.test.r<-chow.test(y1,x1,y2,x2)
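
# A hand computation of the three steps in the description, as a cross-check.
# This is only a sketch using lm() on hypothetical toy data (ya, xa, yb, xb);
# it is not necessarily how chow.test() is implemented internally.
ya <- c(0.9, 1.3, 2.0, 1.8, 2.2, 3.5)
xa <- c(1.2, 1.6, 3.5, 4.0, 5.6, 5.7)
yb <- c(0.5, 1.3, 2.5, 1.1, 2.8, 3.3)
xb <- c(1.4, 1.5, 3.1, 4.4, 5.1, 5.2)
k <- 2                                           # intercept and one slope
# step 1: residual sum of squares from the combined sample
s1 <- sum(resid(lm(c(ya, yb) ~ c(xa, xb)))^2)
# step 2: residual sums of squares from the two separate samples
s23 <- sum(resid(lm(ya ~ xa))^2) + sum(resid(lm(yb ~ xb))^2)
# step 3: F statistic with (k, n1+n2-2k) degrees of freedom
df2 <- length(ya) + length(yb) - 2*k
F.stat <- ((s1 - s23)/k) / (s23/df2)
p.val <- pf(F.stat, k, df2, lower.tail=FALSE)
c(F=F.stat, df1=k, df2=df2, p=p.val)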
}
}
\author{Shigenobu Aoki, Jing hua Zhao}
\note{adapted from chow.R}
\keyword{}

\eof
\name{fbsize}
\alias{fbsize}
\title{Sample size for family-based linkage and association design}
\usage{fbsize(gamma,p,debug=0,error=0)}
\description{
This function implements the Risch and Merikangas (1996) statistics 
for evaluating the power of family-based linkage and association designs.
They are potentially useful in the context of genome-wide 
association studies.

The function calls auxiliary functions sn() and strlen(); sn() 
contains the necessary thresholds for power calculation, while
strlen() evaluates the length of a string (generic).
}

\arguments{
  \item{gamma}{genotype relative risk assuming multiplicative model}
  \item{p}{frequency of disease allele}
  \item{debug}{verbose output}
  \item{error}{0=use the correct formula, 1=use the formula from the original paper}
}

\value{
The returned value is a list containing:

  \item{gamma}{input gamma}
  \item{p}{input p}
  \item{n1}{sample size for ASP}
  \item{n2}{sample size for TDT}
  \item{n3}{sample size for ASP-TDT}
  \item{lambdao}{lambda o}
  \item{lambdas}{lambda s}
}

\section{References}{

Risch, N. and K. Merikangas (1996). The future of genetic studies of
complex human diseases. Science 273(September): 1516-1517.

Risch, N. and K. Merikangas (1997). Reply to Scott et al. Science
275(February): 1329-1330.

Scott, W. K., M. A. Pericak-Vance, et al. (1997). Genetic analysis of 
complex diseases. Science 275: 1327.

}

\seealso{
\code{\link[gap]{pbsize}}
}
\examples{
\dontrun{
models <- matrix(c(
    4.0, 0.01,
    4.0, 0.10,
    4.0, 0.50, 
    4.0, 0.80,
    2.0, 0.01,
    2.0, 0.10,
    2.0, 0.50,
    2.0, 0.80,
    1.5, 0.01,    
    1.5, 0.10,
    1.5, 0.50,
    1.5, 0.80), ncol=2, byrow=TRUE)
    
cat("\nThe family-based result: \n")
cat("\ngamma   p     Y     N_asp   P_A    Het    N_tdt  Het N_asp/tdt  L_o  L_s\n\n")
for(i in 1:12) \{
  g <- models[i,1]
  p <- models[i,2]
  fbsize(g,p)
  if(i\%\%4==0) cat("\n")
\}

# APOE-4, Scott WK, Pericak-Vance, MA & Haines JL
# Genetic analysis of complex diseases 1327
g <- 4.5
p <- 0.15
cat("\nAlzheimer's:\n\n")
fbsize(g,p)
}
}
\author{Jing hua Zhao}
\note{extracted from rm.c}
\keyword{}

\eof
\name{fsnps}
\alias{fsnps}
\title{A case-control data set involving four SNPs with missing genotypes}
\description{
This is a simulated data set.
}
\usage{data(fsnps)}
\format{A data frame
\describe{
\item{id}{subject id}
\item{y}{a column of 0/1 indicating case/control}
\item{site1.a1}{SNP 1 allele a1}
\item{site1.a2}{SNP 1 allele a2}
\item{site2.a1}{SNP 2 allele a1}
\item{site2.a2}{SNP 2 allele a2}
\item{site3.a1}{SNP 3 allele a1}
\item{site3.a2}{SNP 3 allele a2}
\item{site4.a1}{SNP 4 allele a1}
\item{site4.a2}{SNP 4 allele a2}
}
  
The last eight variables are genotypes for 4 SNPs, coded in characters
}
\source{
Dr Sebastien Lissarrague of Genset
}
\keyword{datasets}

\eof
\name{genecounting}
\alias{genecounting}
\title{Gene counting for haplotype analysis}
\usage{genecounting(data,weight=NULL,convll=1,handle.miss=0,eps=0.00001,maxit=50,pl=0.001)}
\description{
Gene counting for haplotype analysis with missing data
}
\arguments{
  \item{data}{genotype table}
  \item{weight}{a column of frequencies}
  \item{convll}{if set to 1, base the convergence criterion on the log-likelihood}
  \item{handle.miss}{if set to 1, handle missing data}
  \item{eps}{the actual convergence criterion, with default value 1e-5}
  \item{maxit}{maximum number of iterations, with default value 50}
  \item{pl}{criteria for trimming haplotypes according to posterior probabilities}
}

\value{
The returned value is a list containing:

\item{h}{haplotype frequency estimates under linkage disequilibrium (LD)}
\item{h0}{haplotype frequency estimates under linkage equilibrium (no LD)}
\item{prob}{genotype probability estimates}
\item{l0}{log-likelihood under linkage equilibrium}
\item{l1}{log-likelihood under linkage disequilibrium}
\item{hapid}{unique haplotype identifier (defunct, see gc.em)}
\item{npusr}{number of parameters according to user-given alleles}
\item{npdat}{number of parameters according to observed alleles}
\item{htrtable}{design matrix for haplotype trend regression (defunct, see gc.em)}
\item{iter}{number of iterations used in gene counting}
\item{converge}{a flag indicating convergence status of gene counting}
\item{di0}{haplotype diversity under no LD, defined as \eqn{1-\sum (h_0^2)}{1-sum (h0^2)}}
\item{di1}{haplotype diversity under LD, defined as \eqn{1-\sum (h^2)}{1-sum (h^2)}}
}

\section{References}{

Zhao, J. H., Lissarrague, S., Essioux, L. and P. C. Sham (2002).
Gene-counting for haplotype analysis with missing genotypes.
Bioinformatics 18(12):1694-1695

Zhao, J. H. and P. C. Sham (2003). Generic number systems and haplotype
analysis. Comp Meth Prog Biomed 70: 1-9

}
\seealso{
\code{\link[gap]{gc.em}}, \code{\link[gap]{kbyl}}
}

\examples{
\dontrun{
# Now we use the HLA data for testing
data(hla)
hla.gc<-genecounting(hla[,3:8])
summary(hla.gc)
hla.gc$l0
hla.gc$l1

# Now we use ALDH2 data
data(aldh2)
aldh2.gc<-genecounting(aldh2[,3:6],handle.miss=1)
summary(aldh2.gc)
aldh2.gc$l0
aldh2.gc$l1
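
# The diversity measures di0 and di1 follow directly from their definitions.
# A sketch on hypothetical haplotype frequencies for two biallelic loci
# (the vector h below is made up and is not output of genecounting):
h <- c(0.4, 0.1, 0.2, 0.3)            # haplotypes 11, 12, 21, 22
p1 <- c(h[1]+h[2], h[3]+h[4])         # allele frequencies at locus 1
p2 <- c(h[1]+h[3], h[2]+h[4])         # allele frequencies at locus 2
# under no LD, haplotype frequencies are products of allele frequencies
h0 <- as.vector(t(outer(p1, p2)))     # same order: 11, 12, 21, 22
di1 <- 1 - sum(h^2)                   # diversity under LD
di0 <- 1 - sum(h0^2)                  # diversity under no LD
c(di0=di0, di1=di1)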
}
}
\author{Jing hua Zhao}
\note{adapted from GENECOUNTING}
\keyword{}

\eof
\name{gc.em}
\alias{gc.em}
\title{Gene counting for haplotype analysis}
\usage{
gc.em(data, locus.label=NA, converge.eps=1e-06, maxiter=500, handle.miss=0)
}

\description{
Gene counting for haplotype analysis with missing data, adapted for hap.score
}
\arguments{
  \item{data}{Matrix of alleles, such that each locus has a  pair  of
   adjacent  columns  of  alleles,  and  the order of columns
   corresponds to the order of  loci  on  a  chromosome.   If
   there  are  K  loci, then ncol(data) = 2*K. Rows represent
   alleles for each subject.}
  \item{locus.label }{Vector of  labels  for  loci,  of  length  K  (see definition of data matrix).}
  \item{converge.eps }{Convergence criterion, based on absolute  change in log likelihood (lnlike).}
  \item{maxiter}{Maximum number of iterations of EM.}
  \item{handle.miss}{a flag for handling missing genotype data, 0=no, 1=yes}
}

\value{
List with components:
  \item{converge}{Indicator of convergence of the EM algorithm
  (1=converged, 0 = failed).}
  \item{niter}{Number of iterations completed in the EM algorithm.}
  \item{locus.info}{A list with  a  component for each locus.  Each
   component is also a list, and  the  items of a locus-
   specific list are the locus name and a vector for the
   unique alleles for the locus.}
  \item{locus.label}{Vector of  labels  for  loci,  of  length  K  (see
    definition of input values).}
  \item{haplotype}{Matrix of unique haplotypes. Each row represents a
   unique  haplotype, and the number of columns is the number of loci.}
  \item{hap.prob}{Vector of mle's of haplotype probabilities.  The ith
   element of hap.prob corresponds to the ith row of haplotype.}
  \item{hap.prob.noLD}{Similar to hap.prob, but assuming no linkage
   disequilibrium.}
  \item{lnlike}{Value of lnlike at last EM iteration (maximum lnlike if converged).}
  \item{lr}{Likelihood ratio statistic to test no linkage disequilibrium among all loci.}
  \item{indx.subj}{Vector for index of subjects, after  expanding  to
   all possible  pairs  of  haplotypes  for  each person. If
   indx=i, then i is the ith row of input matrix data. If the
   ith subject has  n possible  pairs  of haplotypes that
   correspond to their marker phenotype, then i is repeated n times.}
  \item{nreps}{Vector for the count of haplotype pairs that map to
   each subject's marker genotypes.}
  \item{hap1code}{Vector of codes for each subject's first haplotype.
   The values in hap1code are the row numbers of the unique
   haplotypes in the returned matrix haplotype.}
  \item{hap2code}{Similar to hap1code, but for  each  subject's  second haplotype.}
  \item{post}{Vector of posterior probabilities of pairs of
   haplotypes for a person, given their marker phenotypes.}
  \item{htrtable}{A table which can be used in haplotype trend regression}
}

\section{References}{

Zhao, J. H., Lissarrague, S., Essioux, L. and P. C. Sham (2002).
Gene-counting for haplotype analysis with missing genotypes.
Bioinformatics 18(12):1694-1695

Zhao, J. H. and P. C. Sham (2003). Generic number systems and haplotype
analysis. Comp Meth Prog Biomed 70: 1-9

}
\seealso{
\code{\link[gap]{genecounting}}, \code{\link[gap]{kbyl}}
}

\examples{
\dontrun{
data(hla)
gc.em(hla[,3:8],locus.label=c("DQR","DQA","DQB"))
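
# How indx.subj and nreps relate to the data can be sketched by hand.
# Assumptions (not taken from gc.em itself): genotypes are complete and
# haplotype pairs are unordered, so a subject with m heterozygous loci has
# 2^(m-1) compatible pairs (one pair if m is 0). The table g is hypothetical.
g <- matrix(c(1,2, 1,1, 2,3,
              1,1, 2,2, 3,3), nrow=2, byrow=TRUE)   # two subjects, three loci
het <- g[, c(1,3,5)] != g[, c(2,4,6)]   # TRUE where a locus is heterozygous
m <- rowSums(het)                       # heterozygous loci per subject
nreps <- 2^pmax(m - 1, 0)               # haplotype pairs per subject
indx.subj <- rep(seq_len(nrow(g)), nreps)
nreps
indx.subj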
}
}

\author{Jing hua Zhao}
\note{Adapted from GENECOUNTING}
\keyword{}

\eof
\name{gcontrol}
\alias{gcontrol}
\title{genomic control}
\usage{gcontrol(data,zeta,kappa,tau2,epsilon,ngib,burn,idum)}

\description{
The genomic control statistics of Devlin and Roeder. All arguments
after data are optional and have default values.
}

\arguments{
\item{data}{the data matrix}
\item{zeta}{with default value 1000}
\item{kappa}{with default value 4}
\item{tau2}{with default value 1}
\item{epsilon}{with default value 0.01}
\item{ngib}{number of Gibbs steps, with default value 500}
\item{burn}{number of burn-ins with default value 50}
\item{idum}{seed for pseudorandom number sequence}
}

\source{
\url{http://www.stat.cmu.edu}
}

\value{
The returned value is a list containing:

\item{deltot}{the probability}
\item{x2}{the statistic}
\item{A}{the A vector}
}

\section{References}{
Devlin B, Roeder K (1999) Genomic control for association studies. 
Biometrics 55:997-1004
}

\examples{
\dontrun{
test<-c(1,2,3,4,5,6,  1,2,1,23,1,2, 100,1,2,12,1,1, 
        1,2,3,4,5,61, 1,2,11,23,1,2, 10,11,2,12,1,11)
test<-t(matrix(test,nrow=6))
gcontrol(test)
}
}

\author{Bobby Jones, Jing hua Zhao}

\note{Adapted from gcontrol.c Bobby Jones and Kathryn Roeder, 
use -Dexecutable for standalone program, function getnum in the original 
code needs \%*s to skip id string}

\keyword{}


\eof
\name{gif}
\alias{gif}
\title{Kinship coefficient and genetic index of familiality}
\usage{gif(data,gifset)}
\description{
The genetic index of familiality is defined as the mean kinship between
all pairs of individuals in a set multiplied by 100,000. Formally, it 
is defined as 
\deqn{100,000 \times \frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^n k_{ij}}{100,000 x 2/[n(n-1)] sum_(i=1)^(n-1) sum_(j=i+1)^n k_(ij)}
where \eqn{n} is the number of individuals in the set and \eqn{k_{ij}} is the
kinship coefficient between individuals \eqn{i} and \eqn{j}.

The scaling is purely for convenience of presentation.
}
\arguments{
  \item{data}{the trio data of a pedigree}
  \item{gifset}{a subgroup of pedigree members}
}

\value{
The returned value is a list containing:

\item{gifval}{the genetic index of familiality}
}

\section{References}{
Gholami K, Thomas A (1994) A linear time algorithm for calculation of
multiple pairwise kinship coefficients and genetic index of familiality.
Comp Biomed Res 27:342-350

}
\seealso{
\code{\link[gap]{pfc}}
}

\examples{
\dontrun{
test<-c(
 5,      0,      0,
 1,      0,      0,
 9,      5,      1,
 6,      0,      0,
10,      9,      6,
15,      9,      6,
21,     10,     15,
 3,      0,      0,
18,      3,     15,
23,     21,     18,
 2,      0,      0,
 4,      0,      0,
 7,      0,      0,
 8,      4,      7,
11,      5,      8,
12,      9,      6,
13,      9,      6,
14,      5,      8,
16,     14,      6,
17,     10,      2,
19,      9,     11,
20,     10,     13,
22,     21,     20)
test<-t(matrix(test,nrow=3))
gif(test,gifset=c(20,21,22))

# all individuals
gif(test,gifset=1:23)
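
# The formula in the description can be applied directly once pairwise
# kinship coefficients are available. A sketch with a hypothetical kinship
# matrix (two non-inbred full sibs and one unrelated individual); the object
# names kin, n and gifval are illustrative and not part of gif().
kin <- matrix(c(0.50, 0.25, 0,
                0.25, 0.50, 0,
                0,    0,    0.50), nrow=3, byrow=TRUE)
n <- nrow(kin)
# mean kinship over the n(n-1)/2 distinct pairs, scaled by 100,000
gifval <- 100000 * 2/(n*(n-1)) * sum(kin[upper.tri(kin)])
gifval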
}
}
\author{Alun Thomas, Jing hua Zhao}
\note{Adapted from gif.c, testable with -Dexecutable as standalone program, 
which can be used for any pair of individuals}
\keyword{}

\eof
\name{hap}
\alias{hap}
\title{Haplotype reconstruction}
\usage{hap(id,data,nloci,loci=rep(2,nloci),names=paste("loci",1:nloci,sep=""),
              mb=0,pr=0,po=0.001,to=0.001,th=1,maxit=100,n=0,
              ss=0,rs=0,rp=0,ro=0,rv=0,sd=0,mm=0,mi=0,mc=50,ds=0.1,de=0,q=0)}
\description{
Haplotype reconstruction using sorting and trimming algorithms
}
\arguments{
\item{id}{a column of subject id}
\item{data}{genotype table}
\item{nloci}{number of loci}
\item{loci}{number of alleles at all loci}
\item{names}{locus names}
\item{mb}{Maximum dynamic storage to be allocated, in Mb}
\item{pr}{Prior (ie population) probability threshold}
\item{po}{Posterior probability threshold}
\item{to}{Log-likelihood convergence tolerance}
\item{th}{Posterior probability threshold for output}
\item{maxit}{Maximum EM iteration}
\item{n}{Force numeric allele coding (1/2) on output (off)}
\item{ss}{Tab-delimited spreadsheet file output (off)}
\item{rs}{Random starting points for each EM iteration (off)}
\item{rp}{Restart from random prior probabilities}
\item{ro}{Loci added in random order (off)}
\item{rv}{Loci added in reverse order (off)}
\item{sd}{Set seed for random number generator (use date+time)}
\item{mm}{Repeat final maximization multiple times}
\item{mi}{Create multiple imputed datasets, if set >0}
\item{mc}{ Number of MCMC steps between samples}
\item{ds}{ Starting value of Dirichlet prior parameter}
\item{de}{ Finishing value of Dirichlet prior parameter}
\item{q}{Quiet operation (off)}
}
\details{
The package can handle a much larger number of multiallelic loci. 
For large sample size with relatively small number of multiallelic
loci, genecounting should be used.

}

\value{
The returned value is a list containing:

\item{l1}{log-likelihood assuming linkage disequilibrium}
\item{converge}{convergence status, 0=failed, 1=succeeded}
\item{niter}{number of iterations}
}

\section{References}{

Clayton DG (2001) SNPHAP. http://www-gene.cimr.cam.ac.uk/clayton/software

Zhao JH and W Qian (2003) Association analysis of unrelated individuals
using polymorphic genetic markers. RSS 2003, Hasselt, Belgium

}
\seealso{
\code{\link[gap]{genecounting}}
}

\examples{
\dontrun{
# 4 SNP example, to generate hap.out and assign.out alone
data(fsnps)
hap(id=fsnps[,1],data=fsnps[,3:10],nloci=4)

# to generate results of imputations
hap(id=fsnps[,1],data=fsnps[,3:10],nloci=4,ss=1,mi=5)
}
}
\note{adapted from hap}
\keyword{}

\eof
\name{hap.em}
\alias{hap.em}
\title{Gene counting for haplotype analysis}
\usage{
hap.em(id, data, locus.label=NA, converge.eps=1e-06, maxiter=500)
}

\description{
Gene counting for haplotype analysis with missing data, adapted for hap.score
}
\arguments{
  \item{id}{a vector of individual IDs}
  \item{data}{Matrix of alleles, such that each locus has a  pair  of
   adjacent  columns  of  alleles,  and  the order of columns
   corresponds to the order of  loci  on  a  chromosome.   If
   there  are  K  loci, then ncol(data) = 2*K. Rows represent
   alleles for each subject.}
  \item{locus.label }{Vector of  labels  for  loci,  of  length  K  (see definition of data matrix).}
  \item{converge.eps }{Convergence criterion, based on absolute  change in log likelihood (lnlike).}
  \item{maxiter}{Maximum number of iterations of EM.}
}

\value{
List with components:
  \item{converge}{Indicator of convergence of the EM algorithm
  (1=converged, 0 = failed).}
  \item{niter}{Number of iterations completed in the EM algorithm.}
  \item{locus.info}{A list with  a  component for each locus.  Each
   component is also a list, and  the  items of a locus-
   specific list are the locus name and a vector for the
   unique alleles for the locus.}
  \item{locus.label}{Vector of  labels  for  loci,  of  length  K  (see
    definition of input values).}
  \item{haplotype}{Matrix of unique haplotypes. Each row represents a
   unique  haplotype, and the number of columns is the number of loci.}
  \item{hap.prob}{Vector of mle's of haplotype probabilities.  The ith
   element of hap.prob corresponds to the ith row of haplotype.}
  \item{lnlike}{Value of lnlike at last EM iteration (maximum lnlike if converged).}
  \item{indx.subj}{Vector for index of subjects, after  expanding  to
   all possible  pairs  of  haplotypes  for  each person. If
   indx=i, then i is the ith row of input matrix data. If the
   ith subject has  n possible  pairs  of haplotypes that
   correspond to their marker phenotype, then i is repeated n times.}
  \item{nreps}{Vector for the count of haplotype pairs that map to
   each subject's marker genotypes.}
  \item{hap1code}{Vector of codes for each subject's first haplotype.
   The values in hap1code are the row numbers of the unique
   haplotypes in the returned matrix haplotype.}
  \item{hap2code}{Similar to hap1code, but for  each  subject's  second haplotype.}
  \item{post}{Vector of posterior probabilities of pairs of
   haplotypes for a person, given their marker phenotypes.}
}

\section{References}{

Clayton DG (2001) SNPHAP. http://www-gene.cimr.cam.ac.uk/clayton/software

Zhao JH and W Qian (2003) Association analysis of unrelated individuals
using polymorphic genetic markers. RSS 2003, Hasselt, Belgium

}
\seealso{
\code{\link[gap]{hap}}, \code{\link[gap]{kbyl}}
}

\examples{
\dontrun{
data(hla)
hap.em(id=1:length(hla[,1]),data=hla[,3:8],locus.label=c("DQR","DQA","DQB"))
}
}
\author{Jing hua Zhao}
\note{Adapted from HAP}
\keyword{}

\eof
\name{hap.score}
\alias{hap.score}
\title{Score Statistics for Association of Traits with Haplotypes}
\description{
Compute score statistics to evaluate the association of a trait with haplotypes, when linkage phase is unknown and diploid marker 
phenotypes  are  observed  among  unrelated subjects. For now, only autosomal loci are considered.
}
\usage{
hap.score(y, geno, trait.type="gaussian", offset=NA, x.adj=NA, skip.haplo=0.005, 
          locus.label=NA, miss.val=0, n.sim=0, 
          method="gc", id=NA, handle.miss=0, n.miss.loci=NA, sexid=NA)
}

\arguments{
 \item{y}{Vector of trait values. For  trait.type  =  "binomial",  y  must have values of 1 for event, 0 for no event.}
 \item{geno}{Matrix of alleles, such that each locus has a pair of adjacent columns of alleles, and the order of columns corresponds to the order of loci on a chromosome. If there are K loci, then ncol(geno) = 2*K. Rows represent alleles for each subject.}
 \item{trait.type}{Character string  defining  type  of  trait, with values of "gaussian", "binomial", "poisson", "ordinal".}
 \item{offset}{Vector of offset when trait.type = "poisson"}
 \item{x.adj}{Matrix of non-genetic covariates used to adjust the score statistics. Note that intercept should not be included, as it will be added in this function.}
 \item{skip.haplo}{Skip score statistics for haplotypes with frequencies < skip.haplo}
 \item{locus.label}{Vector of labels for loci, of length K (see definition of geno matrix).}
 \item{miss.val}{Vector of codes for missing values of alleles.}
 \item{n.sim}{Number of simulations for empirical p-values.  If n.sim=0, no empirical p-values are computed.}
 \item{method}{method of haplotype frequency estimation, "gc" or "hap"}
 \item{id}{an added option which contains the individual IDs}
 \item{handle.miss}{flag to handle missing genotype data, 0=no, 1=yes}
 \item{n.miss.loci}{maximum number of loci/sites with missing data to be allowed in the analysis}
 \item{sexid}{flag to indicate sex for data from the X chromosome, 1=male, 2=female}
}

\value{
List with the following components:
 \item{score.global}{Global statistic to test association of trait with haplotypes that have frequencies >= skip.haplo.}
 \item{df}{Degrees of freedom for score.global.}
 \item{score.global.p}{P-value of score.global based on chi-square distribution, with degrees of freedom equal to df.}
 \item{score.global.p.sim}{P-value of score.global based on simulations (set equal to NA when n.sim=0).}
 \item{score.haplo}{Vector of score statistics for individual haplotypes that have frequencies >= skip.haplo.}
 \item{score.haplo.p}{Vector of p-values for score.haplo, based on a chi-square distribution with 1 df.}
 \item{score.haplo.p.sim}{Vector of p-values for score.haplo, based on  simulations (set equal to NA when n.sim=0).}
 \item{score.max.p.sim}{P-value  of  maximum  score.haplo, based on simulations (set equal to NA when n.sim=0).}
 \item{haplotype}{Matrix of haplotypes  analyzed.  The ith row of haplotype corresponds to the ith item of score.haplo, score.haplo.p, and score.haplo.p.sim.}
 \item{hap.prob}{Vector of haplotype probabilities, corresponding to the haplotypes in the matrix haplotype.}
 \item{locus.label}{Vector of labels for loci, of length K (same as input argument).}
 \item{n.sim}{Number of simulations.}
 \item{n.val.global}{Number of valid simulated global statistics.}
 \item{n.val.haplo}{Number of valid simulated score statistics (score.haplo) for individual haplotypes.}
}

\details{This is a version which substitutes haplo.em}

\section{References}{
Schaid DJ, Rowland CM, Tines DE, Jacobson RM,  Poland  GA.
Score tests for association of traits with haplotypes when
linkage phase is ambiguous. Submitted to Amer J Hum Genet.
}

\examples{
\dontrun{
data(hla)
y<-hla[,2]
geno<-hla[,3:8]
hap.score(y,geno,locus.label=c("DRB","DQA","DQB"))
unlink("assign.dat")

### note the differences in p values in the following runs
data(aldh2)
# to subset the data since hap doesn't handle one allele missing
deleted<-c(40,239,256)
aldh2[deleted,]
aldh2<-aldh2[-deleted,]
y<-aldh2[,2]
geno<-aldh2[,3:18]
# only one missing locus
hap.score(y,geno,handle.miss=1,n.miss.loci=1,method="hap")
# up to seven missing loci and with 10,000 permutations
hap.score(y,geno,handle.miss=1,n.miss.loci=7,method="hap",n.sim=10000)

# haplo.score takes considerably longer time and does not handle missing data
haplo.score(y,geno,n.sim=10000)
}
}
 
\keyword{}

\eof
\name{hla}
\alias{hla}
\title{HLA markers and Schizophrenia}
\description{This data set contains genotypes at HLA markers DRB, DQA and DQB
  for 271 Schizophrenia patients and controls.
}
\usage{data(hla)}
\format{A data frame containing 271 rows and 8 columns:
\describe{
\item{id}{subject id}
\item{y}{a variable taking value 0 for controls and 1 for Schizophrenia}
\item{DQR.a1}{DQR allele a1}
\item{DQR.a2}{DQR allele a2}
\item{DQA.a1}{DQA allele a1}
\item{DQA.a2}{DQA allele a2}
\item{DQB.a1}{DQB allele a1}
\item{DQB.a2}{DQB allele a2}
}
  
The last six variables are genotypes for 3 HLA loci, with a prefix
name (e.g., "DQB") and a suffix for each of two alleles (".a1" and
".a2").
}
\source{
Dr Padraig Wright of Pfizer
}
\keyword{datasets}

\eof
\name{htr}
\alias{htr}
\title{Haplotype trend regression}
\usage{htr(y,x,n.sim=0)}
\description{
Haplotype trend regression (with permutation)
}
\arguments{
  \item{y}{a vector of phenotype}
  \item{x}{a haplotype table}
  \item{n.sim}{the number of permutations}
}

\value{
The returned value is a list containing:

\item{f}{the F statistic for overall association}
\item{p}{the p value for overall association}
\item{fv}{the F statistics for individual haplotypes}
\item{pi}{the p values for individual haplotypes}
}

\section{References}{
Zaykin DV, Westfall PH, Young SS, Karnoub MA, Wagner MJ, Ehm MG (2002) Testing association of statistically inferred haplotypes with 
discrete and continuous traits in samples of unrelated individuals. Hum. Hered. 53:79-91
}
\seealso{
\code{\link[gap]{gc.em}}
}

\examples{
\dontrun{
# 26-10-03
test2<-read.table("test2.dat")
y<-test2[,1]
x<-test2[,-1]
y<-as.matrix(y)
x<-as.matrix(x)
htr.test2<-htr(y,x)
htr.test2
htr.test2<-htr(y,x,n.sim=10)
htr.test2

# 13-11-2003
data(apoeapoc)
apoeapoc.gc<-gc.em(apoeapoc[,5:8])
y<-apoeapoc$y
# recode cases from 2 to 1
y[y==2] <- 1
htr(y,apoeapoc.gc$htrtable)
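
# The overall test can be approximated with an ordinary regression of the
# trait on the haplotype table. This is only a sketch of one common
# formulation using lm(); it is not necessarily what htr() does internally,
# and the trait yq and table xh below are hypothetical.
yq <- c(1.2, 0.7, 1.9, 2.4, 0.3, 1.5, 2.2, 0.9)
# each row holds one subject's expected proportions of three haplotypes
xh <- matrix(c(1.0, 0.0, 0.0,
               0.5, 0.5, 0.0,
               0.0, 0.5, 0.5,
               0.0, 0.0, 1.0,
               1.0, 0.0, 0.0,
               0.5, 0.0, 0.5,
               0.0, 1.0, 0.0,
               0.5, 0.5, 0.0), ncol=3, byrow=TRUE)
# rows sum to 1, so one column is dropped to avoid collinearity with the intercept
fit <- lm(yq ~ xh[,-1])
fs <- summary(fit)$fstatistic         # value, numerator df, denominator df
p.overall <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail=FALSE)
c(f=fs[["value"]], p=p.overall)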
}
}

\author{Dimitri Zaykin, Jing hua Zhao}
\note{adapted from emgi.cpp, a pseudorandom number seed will be added on}
\keyword{}

\eof
\name{hwe}
\alias{hwe}
\title{Hardy-Weinberg equilibrium test}
\synopsis{hwe(data, is.count=FALSE, is.genotype=FALSE, yates.correct=FALSE, miss.val=0)}
\usage{
hwe(data,yates.correct=FALSE, miss.val=0)
hwe(data,is.genotype=FALSE, yates.correct=FALSE, miss.val=0)
hwe(data,is.count=FALSE, yates.correct=FALSE, miss.val=0)
}

\description{
Hardy-Weinberg equilibrium test
}
\arguments{
  \item{data}{a rectangular data set containing the genotypes, or an array of genotype counts}
  \item{is.genotype}{A flag indicating if the data is an array of genotypes}
  \item{is.count}{A flag indicating if the data is an array of genotype counts}
  \item{yates.correct}{A flag indicating if Yates' correction is used for Pearson \eqn{\chi^2}{chi-squared} statistic}
  \item{miss.val}{A list of missing values}
}

\details{
This function obtains Hardy-Weinberg equilibrium test statistics. It can
handle data coded as allele numbers (default), genotype identifiers (by
setting is.genotype=TRUE) and counts corresponding to individual genotypes
(by setting is.count=TRUE); the latter does not need is.genotype to be
specified but requires genotype counts for all possible genotypes,
i.e. n(n+1)/2 of them, where n is the number of alleles.
}

\value{
The returned value is a list containing:

\item{x2}{Pearson \eqn{\chi^2}{chi-square}}
\item{p.x2}{p value for \eqn{\chi^2}{chi-square}}
\item{lrt}{Log-likelihood ratio test statistic}
\item{p.lrt}{p value for lrt}
\item{df}{Degree(s) of freedom}
\item{rho}{\eqn{\chi^2/N}{chi-square/N} the effect size}
}

\seealso{
\code{\link[gap]{hwe.hardy}} 
}

\examples{
\dontrun{
a <- c(3,2,2)
a.out <- hwe(a,is.genotype=TRUE)
a.out
a.out <- hwe(a,is.count=TRUE)
a.out
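
# The Pearson statistic can be checked by hand for a biallelic locus.
# A base-R sketch on hypothetical genotype counts; it does not call hwe()
# and the object names (obs, expd, x2, p.x2) are illustrative only.
obs <- c(AA=30, AB=50, BB=20)
N <- sum(obs)
p <- (2*obs[["AA"]] + obs[["AB"]])/(2*N)          # frequency of allele A
expd <- N * c(p^2, 2*p*(1-p), (1-p)^2)            # expected counts under HWE
x2 <- sum((obs - expd)^2/expd)                    # Pearson chi-square
# 3 genotype classes - 1 - 1 estimated allele frequency = 1 degree of freedom
p.x2 <- pchisq(x2, df=1, lower.tail=FALSE)
c(x2=x2, p.x2=p.x2)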
}
}
\author{Jing hua Zhao}
\keyword{}


\eof
\name{hwe.hardy}
\alias{hwe.hardy}
\title{Hardy-Weinberg equilibrium test using MCMC}
\usage{hwe.hardy(a,alleles=3,seed=3000,sample=c(1000,1000,5000))}

\description{
Hardy-Weinberg equilibrium test by MCMC
}
\arguments{
  \item{a}{a triangular array containing the genotype counts}
  \item{alleles}{number of alleles at the locus, greater than or equal to 3}
  \item{seed}{pseudo-random number seed}
  \item{sample}{optional, parameters for MCMC containing the number of chunks,
                the chunk size and the number of burn-in steps}
}

\source{
\url{http://www.stat.washington.edu/thompson/Genepi/pangaea.shtml}
}

\value{
The returned value is a list containing:

\item{p}{Monte Carlo p value}
\item{se}{standard error of Monte Carlo p value}
\item{swp}{percentage of switches (partial, full and altogether)}
}

\section{References}{

Guo, S.-W. and E. A. Thompson (1992) Performing the exact test of
Hardy-Weinberg proportion for multiple alleles. Biometrics. 48:361--372.
}
\seealso{
\code{\link[gap]{hwe}} 
}

\examples{
\dontrun{
# example 2 from hwe.doc:
a<-c(
3,
4, 2,
2, 2, 2,
3, 3, 2, 1,
0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 1,
0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 2, 1, 0, 0, 0)
ex2<-hwe.hardy(a,alleles=8)
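
# Sketch (base R only): check the input length and view the counts as a
# lower-triangular matrix. ASSUMPTION: a lists the lower triangle row by
# row, as the layout of the vector above suggests.
k <- 8
stopifnot(length(a) == k*(k+1)/2)
g <- matrix(0, k, k)
g[upper.tri(g, diag=TRUE)] <- a          # fills column by column
g <- t(g)                                # row i, column j (j <= i) holds genotype i/j
g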
}
}
\author{Sun-Wei Guo, Jing hua Zhao}
\note{Adapted from HARDY; it can be tested as a standalone program when compiled with -Dexecutable}
\keyword{}

\eof
\name{kbyl}
\alias{kbyl}
\title{LD statistics for two multiallelic loci}
\usage{kbyl(n1,n2,h,n,optrho=2)}
\description{
LD statistics for two multiallelic loci
}
\arguments{
  \item{n1}{number of alleles at marker 1}
  \item{n2}{number of alleles at marker 2}
  \item{h}{a vector of haplotype frequencies}
  \item{n}{number of haplotypes}
  \item{optrho}{type of contingency table association,
0=Pearson, 1=Tschuprow, 2=Cramer (default)}
}

\value{
The returned value is a list containing:

\item{n1}{the number of alleles at marker 1}
\item{n2}{the number of alleles at marker 2}
\item{h}{the haplotype frequency vector}
\item{n}{the number of haplotypes}
\item{VarDp}{variance of D'}
\item{Dijtable}{table of Dij}
\item{Dmaxtable}{table of Dmax}
\item{Dijptable}{table of Dij'}
\item{VarDijtable}{table of variances for Dij}
\item{VarDijptable}{table of variances for Dij'}
\item{x2}{the Chi-squared statistic}
\item{seX2}{the variance of x2}
\item{rho}{the measure of association}
\item{seR}{the standard error of rho}
\item{optrho}{the method for calculating rho}
\item{klinfo}{the Kullback-Leibler information}
}

\section{References}{
Bishop YMM, Fienberg SE, Holland PW (1975) Discrete Multivariate Analysis
-- Theory and Practice, The MIT press

Cramer H (1946) Mathematical Methods of Statistics. Princeton Univ. Press

Zapata C, Carollo C, Rodriguez S (2001) Sampling variance and distribution
of the D' measure of overall gametic disequilibrium between multiallelic loci.
Ann. Hum. Genet. 65:395-406

}
\seealso{
\code{\link[gap]{tbyt}}
}

\examples{
\dontrun{
# example of two SNPs
h <- c(0.442356,0.291532,0.245794,0.020319)
n <- 481*2
kbyl(2,2,h,n)
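
# Sketch of the Cramer V measure (optrho=2), computed by hand in base R.
# ASSUMPTIONS: h is ordered as haplotypes 11, 12, 21, 22, and kbyl may
# differ in detail from this textbook formula.
tab <- matrix(h*n, nrow=2, byrow=TRUE)           # haplotype counts implied by h and n
E <- outer(rowSums(tab), colSums(tab))/sum(tab)  # counts under no association
x2 <- sum((tab-E)^2/E)                           # Pearson chi-squared
V <- sqrt(x2/(sum(tab)*min(nrow(tab)-1, ncol(tab)-1)))
c(x2=x2, V=V)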
}
}
\author{Jing hua Zhao}
\note{adapted from 2ld.c}
\keyword{}

\eof
\name{kin.morgan}
\alias{kin.morgan}
\title{Kinship matrix for simple pedigree}
\usage{kin.morgan(ped)}
\description{
kinship matrix according to Morgan v2.1
}

\arguments{
\item{ped}{pedigree id and family trio (id, father id, mother id)}
}

\source{
CRAN \url{http://cran.r-project.org}
}

\value{
The returned value is a list containing:

\item{kin}{the kinship matrix}
}

\section{References}{
Morgan V2.1 \url{http://www.stat.washington.edu/thompson/Genepi/MORGAN/morgan.shtml}
}

\seealso{
\code{\link[gap]{gif}}
}
\examples{
\dontrun{
# Werner syndrome pedigree
werner<-c(
 1, 0,  0,  1,
 2, 0,  0,  2,
 3, 0,  0,  2,
 4, 1,  2,  1,
 5, 0,  0,  1,
 6, 1,  2,  2,
 7, 1,  2,  2,
 8, 0,  0,  1,
 9, 4,  3,  2,
10, 5,  6,  1,
11, 5,  6,  2,
12, 8,  7,  1,
13,10,  9,  2,
14,12, 11,  1,
15,14, 13,  1)
werner<-t(matrix(werner,nrow=4))
kin.morgan(werner[,1:3])
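
# Sketch (base R only): check the ordering requirement stated in the note,
# i.e. every parent appears in an earlier row than its child. Founders have
# parent id 0, which matches no id and so yields NA. The names pos and
# ordered.ok are hypothetical.
pos <- matrix(match(werner[,2:3], werner[,1]), ncol=2)  # row index of each parent
ordered.ok <- all(is.na(pos) | pos < row(pos))
stopifnot(ordered.ok)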
}
}
\author{Morgan development team, Jing hua Zhao}
\note{The input data is required to be sorted so that parents precede their children}
\keyword{}

\eof
\name{makeped}
\alias{makeped}
\title{A function to prepare pedigrees in post-MAKEPED format}
\usage{makeped(pifile="pedfile.pre", pofile="pedfile.ped", auto.select=1,
       with.loop=0, loop.file=NA, auto.proband=1, proband.file=NA)}

\description{
Many computer programs for genetic data analysis require pedigree data to be in the so-called 
``post-MAKEPED'' format. This function performs this translation and allows for some 
inconsistencies to be detected.

The first five columns of the input file contain the following information:

pedigree ID, individual ID, father's ID, mother's ID, sex

Both father's and mother's IDs are set to 0 for founders, i.e. individuals with no parents
in the pedigree. The numeric coding for sex is 0=unknown, 1=male, 2=female. These columns can
be followed by additional information such as disease phenotype and marker data.

The output file has extra information extracted from data above.
}

\arguments{
\item{pifile}{input filename}
\item{pofile}{output filename}
\item{auto.select}{no loops in pedigrees and probands are selected automatically? 0=no, 1=yes}
\item{with.loop}{input data with loops? 0=no, 1=yes}
\item{loop.file}{filename containing pedigree id and an individual id for each loop, set if with.loop=1}
\item{auto.proband}{probands are selected automatically? 0=no, 1=yes}
\item{proband.file}{filename containing pedigree id and proband id, set if auto.proband=0 (not implemented)}
}

\details{
Before invoking makeped, input file, loop file and proband file have to be prepared.

By default, auto.select=1, so translation proceeds without considering loops and proband statuses.
If there are loops in the pedigrees, then set auto.select=0, with.loop=1, loop.file="filespec".

There may be several versions of makeped available, but their differences with this port should 
be minor.
}

\source{
\url{http://linkage.rockefeller.edu}
}

\value{
All output is written to pofile.

}

\examples{
\dontrun{
library(gap)
makeped("ped7.pre","ped7.ped",0,1,"ped7.lop")
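
# Sketch: write a minimal input file for one parent-offspring trio and
# translate it with the defaults (no loops, automatic proband selection).
# ASSUMPTIONS: the file names trio.pre and trio.ped are hypothetical, and
# the sixth column is an affection status coded 1=unaffected, 2=affected.
pre <- data.frame(pid=1, id=1:3, fa=c(0,0,1), mo=c(0,0,2),
                  sex=c(1,2,1), aff=c(1,1,2))
write.table(pre, "trio.pre", quote=FALSE, row.names=FALSE, col.names=FALSE)
makeped("trio.pre", "trio.ped")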
}
}
\note{adapted from makeped.c by W Li and others}
\keyword{}

\eof
\name{mia}
\alias{mia}
\title{Multiple imputation analysis for hap}
\usage{mia(hapfile,assfile,miafile,so,ns,mi,allsnps,sas)}
\description{
This command reads the output of a hap session that used multiple imputation, i.e. the -mi\# option. To
simplify matters, it assumes that the -ss option was specified together with -mi.

This is a very naive version of MIANALYZE, but it can produce results for PROC MIANALYZE of SAS.
}
\arguments{
\item{hapfile}{hap haplotype output file name}
\item{assfile}{hap assignment output file name}
\item{miafile}{mia output file name}
\item{so}{to generate results according to subject order}
\item{ns}{do not sort in subject order}
\item{mi}{number of multiple imputations used in hap}
\item{allsnps}{all loci are SNPs}
\item{sas}{produce SAS data step program}
}
\details{
It simply extracts outputs from hap

}

\value{
The returned value is a list containing:

}

\section{References}{

Zhao JH and W Qian (2003) Association analysis of unrelated individuals
using polymorphic genetic markers. RSS 2003, Hasselt, Belgium

Clayton DG (2001) SNPHAP. http://www-gene.cimr.cam.ac.uk/clayton/software


}
\seealso{
\code{\link[gap]{hap}}
}

\examples{
\dontrun{
# 4 SNP example, to generate hap.out and assign.out alone
data(fsnps)
hap(id=fsnps[,1],gdata=fsnps[,3:10],nloci=4)

# to generate results of imputations
hap(id=fsnps[,1],gdata=fsnps[,3:10],nloci=4,ss=1,mi=5)

# to extract information from the second run above
mia(so=1,ns=1,mi=5)

## commands to check out where the output files are as follows:
## Windows
# system("command.com")
## Unix
# system("csh")
}
}
\note{adapted from hap, in fact cline.c and cline.h are not used}
\keyword{}

\eof
\name{mtdt}
\alias{mtdt}
\title{Transmission/disequilibrium test of a multiallelic marker}
\usage{mtdt(x,n.sim=0)}
\description{
This function calculates transmission/disequilibrium statistics for a
multiallelic marker.

The helper functions tril and triu are used inside the function to obtain
lower and upper triangular matrices.
}
\arguments{
\item{x}{the data table}
\item{n.sim}{the number of simulations}
}

\value{
The returned list contains the following components:
\item{SE}{Spielman-Ewens Chi-square from the observed data}
\item{ST}{Stuart or score Statistic from the observed data}
\item{pSE}{the simulated p value for SE}
\item{sSE}{standard error of the simulated p value for SE}
\item{pST}{the simulated p value for ST}
\item{sST}{standard error of the simulated p value for ST}
}

\section{References}{
Sham PC (1997) Transmission/disequilibrium tests for multiallelic loci. 
Am. J. Hum. Genet. 61:774-778

Spielman RS, Ewens WJ (1996) The TDT and other family-based tests for
linkage disequilibrium and association. Am. J. Hum. Genet. 59:983-989

Miller MB (1997) Genomic scanning and the transmission/disequilibrium test: 
analysis of error rates. Genet. Epidemiol. 14:851-856

Zhao JH, Sham PC, Curtis D (1999) A program for the Monte Carlo evaluation 
of significance of the extended transmission/disequilibrium test. 
Am. J. Hum. Genet. 64:1484-1485

}
\seealso{
\code{\link[gap]{bt}}
}

\examples{
\dontrun{
# Copeman et al (1995) Nat Genet 9: 80-5

x <- matrix(c(0,0, 0, 2, 0,0, 0, 0, 0, 0, 0, 0,
              0,0, 1, 3, 0,0, 0, 2, 3, 0, 0, 0,
              2,3,26,35, 7,0, 2,10,11, 3, 4, 1,
              2,3,22,26, 6,2, 4, 4,10, 2, 2, 0,
              0,1, 7,10, 2,0, 0, 2, 2, 1, 1, 0,
              0,0, 1, 4, 0,1, 0, 1, 0, 0, 0, 0,
              0,2, 5, 4, 1,1, 0, 0, 0, 2, 0, 0,
              0,0, 2, 6, 1,0, 2, 0, 2, 0, 0, 0,
              0,3, 6,19, 6,0, 0, 2, 5, 3, 0, 0,
              0,0, 3, 1, 1,0, 0, 0, 1, 0, 0, 0,
              0,0, 0, 2, 0,0, 0, 0, 0, 0, 0, 0,
              0,0, 1, 0, 0,0, 0, 0, 0, 0, 0, 0),nrow=12)

# Stuart test is the score test obtained by the following SAS statements:
# proc logistic;
#  freq count;
#  model y=y1-y&n / noint;
#  output out=out p=p;
#

mtdt(x)
}
}
\author{Mike Miller, Jing hua Zhao}
\keyword{}

\eof
\name{muvar}
\alias{muvar}
\title{Means and variances under 1- and 2- locus (biallelic) QTL model}
\synopsis{
  muvar(n.loci,y1,y12,p1,p2)
}
\usage{
muvar(n.loci=1,y1=c(0,1,1),p1=0.5)
muvar(n.loci=2,y12=c(1,1,1,1,1,0,0,0,0),p1=0.99,p2=0.9)
}
\description{
Function muvar() gives means and variances under simple 1-locus and 2-locus QTL models;
in the latter case it gives results from different avenues. This function is included for
experimental purposes and has yet to be generalized.

}
\arguments{
  \item{n.loci}{number of loci, 1=single locus, 2=two loci}
  \item{y1}{the genotypic means of aa, Aa and AA}
  \item{p1}{the frequency of the lower allele, or that for the first locus under a 2-locus model}
  \item{y12}{the genotypic means of aa, Aa and AA at the first locus and bb, Bb and BB at the second locus}
  \item{p2}{the frequency of the lower allele at the second locus}
}

\value{Currently it returns no value and only prints to the screen; the results can be kept via R's sink()
command or by modifying the C/R code.}

\section{References}{
Sham P (1998). Statistics in Human Genetics. Arnold
}

\examples{
\dontrun{
# the default 1-locus model
muvar(n.loci=1,y1=c(0,1,1),p1=0.5)

# the default 2-locus model
muvar(n.loci=2,y12=c(1,1,1,1,1,0,0,0,0),p1=0.99,p2=0.9)
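
# Sketch (base R only) of the 1-locus mean and variance of the genotypic
# means, for comparison with the screen output of muvar().
# ASSUMPTIONS: Hardy-Weinberg proportions, and p1 is the frequency of
# allele a; muvar itself may report further quantities.
y1 <- c(0,1,1)                           # genotypic means of aa, Aa, AA
p1 <- 0.5
f <- c(p1^2, 2*p1*(1-p1), (1-p1)^2)      # frequencies of aa, Aa, AA
mu <- sum(f*y1)                          # population mean
v <- sum(f*(y1-mu)^2)                    # variance of the genotypic means
c(mean=mu, var=v)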
}
}
\author{Jing hua Zhao}
\note{Adapted from an earlier C program written for the above book}
\keyword{}

\eof
\name{pbsize}
\alias{pbsize}
\title{Power for population-based association design}
\usage{pbsize(gamma=4.5, p=0.15, kp, x2alpha=29.72, zalpha=5.45, z1beta=-0.84)}
\description{
This function implements Long et al. (1997) statistics for population-based association
design
}

\arguments{
  \item{gamma}{genotype relative risk assuming multiplicative model}
  \item{p}{frequency of disease allele}
  \item{kp}{population disease prevalence}
  \item{x2alpha}{chi-squared deviate (the default is approximately the square of zalpha)}
  \item{zalpha}{normal z-deviate}
  \item{z1beta}{normal z-deviate}
}

\value{
The returned value is a scalar containing the required sample size
}

\section{References}{

Scott, W. K., M. A. Pericak-Vance, et al. (1997). Genetic analysis of complex 
diseases. Science 275: 1327.
		
Long, A. D. and C. H. Langley (1997). Genetic analysis of complex traits.
Science 275: 1328.
	
}

\seealso{
\code{\link[gap]{fbsize}}
}
\examples{
\dontrun{
models <- matrix(c(
    4.0, 0.01,
    4.0, 0.10,
    4.0, 0.50, 
    4.0, 0.80,
    2.0, 0.01,
    2.0, 0.10,
    2.0, 0.50,
    2.0, 0.80,
    1.5, 0.01,    
    1.5, 0.10,
    1.5, 0.50,
    1.5, 0.80), ncol=2, byrow=TRUE)
    
g <- 4.5
p <- 0.15
cat("\nAlzheimer's:\n\n")
 
zalpha <- 5.45   # 5.4513104
z1beta <- -0.84

q <- 1-p
pi <- 0.065      # 0.07 generates 163, equivalent to ASP
k <- pi*(g*p+q)^2
s <- (1-pi*g^2)*p^2+(1-pi*g)*2*p*q+(1-pi)*q^2
# LGL formula
lambda <- pi*(g^2*p+q-(g*p+q)^2)/(1-pi*(g*p+q)^2)
# my own
lambda <- pi*p*q*(g-1)^2/(1-pi*(g*p+q)^2)
# not sure about +/-!
n <- (z1beta+zalpha)^2/lambda

# may be used to correct for population prevalence
cat("\nThe population-based result: Kp=",k, "Kq=",s, "n=",ceiling(n),"\n")

# population-based sample size
strlen <- function(x) length(unlist(strsplit(as.character(x),split="")))
kp <- c(0.01,0.05,0.10)
cat("\nRandom ascertainment with disease prevalence\n")
cat("\n          1\%          5\%         10\%\n\n")
for(i in 1:12) \{
  g <- models[i,1]
  p <- models[i,2]
  q <- 1-p
  for(j in 1:3) \{
    n <- pbsize(g,p,kp[j])
    cat(rep("",12-strlen(ceiling(n))),format(ceiling(n)))
  \}
  cat("\n")
  if(i\%\%4==0) cat("\n")
\} 
cat("This is only an approximation, a more accurate result\n")
cat("can be obtained by Fisher's exact test\n")
}
}
\author{Jing hua Zhao}
\note{extracted from rm.c}
\keyword{}

\eof
\name{pfc}
\alias{pfc}
\title{Probability of familial clustering of disease}
\usage{pfc(famdata,enum)}
\description{
To calculate exact probability of familial clustering of disease
}
\arguments{
\item{famdata}{collective information of sib size, number of affected sibs and their frequencies}
\item{enum}{a switch taking value 1 if all possible tables are to be enumerated}
}

\value{
The returned value is a list containing (tailp,sump,nenum are only available if enum=1):

\item{p}{the probability of familial clustering}
\item{stat}{the deviances, chi-squares based on binomial and hypergeometric distributions, 
the degrees of freedom should take into account the number of marginals used}
\item{tailp}{the exact statistical significance}
\item{sump}{sum of the probabilities used for error checking}
\item{nenum}{the total number of tables enumerated}
}

\section{References}{
Yu C and D Zelterman (2001) Exact inference for family disease clusters. Commun Stat -- Theory
Meth 30:2293-2305

Yu C and Zelterman D (2002) Statistical inference for familial disease clusters. Biometrics
58:481-491
}
\seealso{
\code{\link[gap]{kin.morgan}}
}

\examples{
\dontrun{
# IPF among 203 siblings of 100 COPD patients from Liang KY, SL Zeger, B Qaqish (1992)
# Multivariate regression analyses for categorical data (with discussion). J Roy Stat Soc
# B 54:3-40

# the degrees of freedom is 15
famtest<-c(
1, 0, 36,
1, 1, 12,
2, 0, 15,
2, 1,  7,
2, 2,  1,
3, 0,  5,
3, 1,  7,
3, 2,  3,
3, 3,  2,
4, 0,  3,
4, 1,  3,
4, 2,  1,
6, 0,  1,
6, 2,  1,
6, 3,  1,
6, 4,  1,
6, 6,  1)
test<-t(matrix(famtest,nrow=3))
famp<-pfc(test)
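
# Sketch (base R only): cross-check the table against the description
# above. ASSUMPTION: the columns are sibship size, number of affected
# sibs and frequency, in that order.
sum(test[,3])               # number of sibships, 100 here
sum(test[,1]*test[,3])      # number of siblings, 203 here
sum(test[,2]*test[,3])      # number of affected siblings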
}
}
\author{Dani Zelterman, Jing hua Zhao}
\note{Adapted from family.for by Dani Zelterman, 25/7/03}
\keyword{}

\eof
\name{pgc}
\alias{pgc}
\title{Preparing weight for GENECOUNTING}
\usage{pgc(data,handle.miss=1,is.genotype=0,with.id=0)}
\description{
This function is an R port of the GENECOUNTING/PREPARE program, which takes
an array of genotype data and collapses individuals with the same multilocus
genotype
}

\arguments{
  \item{data}{the multilocus genotype data for a set of individuals}
  \item{handle.miss}{a flag to indicate if missing data is kept, 0 = no, 1 = yes}
  \item{is.genotype}{a flag to indicate if the data is already in the form of genotype identifiers}
  \item{with.id}{a flag to indicate if the unique multilocus genotype identifier is generated}
}

\value{
The returned value is a list containing:

\item{cdata}{the collapsed genotype data}
\item{wt}{the frequency weight}
\item{obscom}{the observed number of combinations or genotypes}
\item{idsave}{optional, available only if with.id = 1}
}

\section{References}{
Zhao JH, Sham PC (2003). Generic number system and haplotype analysis. Comp Prog Meth Biomed 70:1-9
}

\seealso{
\code{\link[gap]{genecounting}}
}
\examples{
\dontrun{

data(hla)
x <- hla[,3:8]

# do not handle missing data
y<-pgc(x,handle.miss=0,with.id=1)
hla.gc<-genecounting(y$cdata,y$wt,handle.miss=0)

# handle missing but with multilocus genotype identifier
pgc(x,handle.miss=1,with.id=1)

# handle missing data with no identifier
pgc(x,handle.miss=1,with.id=0)
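
# Sketch in base R of what collapsing means, on toy data (two loci, two
# allele columns each); this is not the pgc algorithm itself, and it does
# not treat 1/2 and 2/1 as the same genotype. The names toy, key and wt0
# are hypothetical.
toy <- matrix(c(1,2,1,1,
                1,2,1,1,
                2,2,1,2), ncol=4, byrow=TRUE)
key <- apply(toy, 1, paste, collapse=" ")   # one string per individual
wt0 <- table(key)                           # frequency weight per distinct genotype
wt0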
}
}
\author{Jing hua Zhao}
\note{Built on pgc.c}
\keyword{}

\eof
\name{s2k}
\alias{s2k}
\title{Statistics for 2 by K table}
\usage{s2k(y1,y2)}
\description{
This function calculates one-to-others and maximum accumulated chi-squared
statistics for a 2 by K contingency table.
}
\arguments{
\item{y1}{a vector containing the first row of a 2 by K contingency table}
\item{y2}{a vector containing the second row of a 2 by K contingency table}
}

\value{
The returned value is a list containing:

\item{x2a}{the one-to-others chi-squared statistic}
\item{x2b}{the maximum accumulated chi-squared statistic}
\item{col1}{the column index for x2a}
\item{col2}{the column index for x2b}
\item{p}{the corresponding p value}
}

\section{References}{
Hirotsu C, Aoki S, Inada T, Kitao Y (2001) An exact test for the association 
between the disease and alleles at highly polymorphic loci with particular interest 
in the haplotype analysis. Biometrics 57:769-778
}

\examples{
\dontrun{
# an example from Mike Neale
# termed 'ugly' contingency table by Patrick Sullivan
y1 <- c(2,15,16,35,132,30,25,7,12,24,10,10,0)
y2 <- c(0, 6,31,49,120,27,15,8,14,25, 3, 9,3)

result <- s2k(y1,y2)
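
# Sketch (base R only) of a one-to-others comparison: each column is
# tested against all the others pooled, giving K Pearson chi-squared
# values from 2 by 2 tables. ASSUMPTION: this textbook construction is
# for illustration and need not reproduce result$x2a exactly.
tab <- rbind(y1,y2)
one2others <- sapply(seq_along(y1), function(k)
  suppressWarnings(chisq.test(cbind(tab[,k], rowSums(tab[,-k])),
                              correct=FALSE)$statistic))
max(one2others)             # largest one-to-others chi-squared
which.max(one2others)       # column attaining it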
}
}
\author{Chihiro Hirotsu, Jing hua Zhao}
\note{The lengths of y1 and y2 should be the same}
\keyword{}

\eof
\name{tbyt}
\alias{tbyt}
\title{LD statistics for two SNPs}
\usage{tbyt(h,n)}
\description{
LD statistics for two SNPs
}

\arguments{
  \item{h}{a vector of haplotype frequencies}
  \item{n}{number of haplotypes}
}

\value{
The returned value is a list containing:

\item{h}{the original haplotype frequency vector}
\item{n}{the number of haplotypes}
\item{D}{the linkage disequilibrium parameter}
\item{VarD}{the variance of D}
\item{Dmax}{the maximum of D}
\item{VarDmax}{the variance of Dmax}
\item{Dprime}{the scaled disequilibrium parameter}
\item{VarDprime}{the variance of Dprime}
\item{x2}{the Chi-squared statistic}
}

\section{References}{
Zapata C, Alvarez G, Carollo C (1997) Approximate variance of the standardized
measure of gametic disequilibrium D'. Am. J. Hum. Genet. 61:771-774
}

\seealso{
\code{\link[gap]{kbyl}}
}
\examples{
\dontrun{
h <- c(0.442356,0.291532,0.245794,0.020319)
n <- 481*2
tbyt(h,n)
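
# Sketch (base R only) of the standard two-SNP LD quantities, for
# comparison with the output of tbyt().
# ASSUMPTION: h is ordered as haplotypes 11, 12, 21, 22.
p1 <- h[1]+h[2]                          # allele 1 frequency at the first SNP
q1 <- h[1]+h[3]                          # allele 1 frequency at the second SNP
D <- h[1]*h[4]-h[2]*h[3]                 # linkage disequilibrium parameter
Dmax <- if (D < 0) min(p1*q1, (1-p1)*(1-q1)) else min(p1*(1-q1), (1-p1)*q1)
Dprime <- D/Dmax                         # scaled disequilibrium parameter
x2 <- n*D^2/(p1*(1-p1)*q1*(1-q1))        # chi-squared statistic, 1 df
c(D=D, Dmax=Dmax, Dprime=Dprime, x2=x2)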
}
}
\author{Jing hua Zhao}
\note{extracted from 2ld.c}
\keyword{}

\eof
\name{whscore}
\alias{whscore}
\title{Whittemore-Halpern scores for allele-sharing}
\usage{whscore(allele,type)}
\description{
Allele sharing score statistics
}

\arguments{
  \item{allele}{a matrix of alleles of affected pedigree members}
  \item{type}{0 = pairs, 1 = all}
}

\value{
The returned value is the value of the score statistic

}

\section{References}{
Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996) Parametric and Nonparametric 
linkage analysis: a unified multipoint approach. Am. J. Hum. Genet. 58:1347-1363

Whittemore AS, Halpern J (1994) A class of tests for linkage using affected 
pedigree members. Biometrics 50:118-127

Whittemore AS, Halpern J (1994) Probability of gene identity by descent: 
computation and applications. Biometrics 50:109-117
}

\examples{
\dontrun{
c<-matrix(c(1,1,1,2,2,2),ncol=2)
whscore(c,type=1)
whscore(c,type=2)
}
}
\author{Leonid Kruglyak, Jing hua Zhao}
\note{adapted from GENEHUNTER}
\keyword{}

\eof
