| makeGenotypes {genetics} | R Documentation |
Convert columns in a dataframe to genotypes or haplotypes.
makeGenotypes(data, convert, sep = "/", tol = 0.5, ..., method = as.genotype)
makeHaplotypes(data, convert, sep = "/", tol = 0.9, ...)
data |
Dataframe containing columns to be converted |
convert |
Vector or list of pairs specifying which columns contain genotype/haplotype data. See below for details. |
sep |
Genotype separator |
tol |
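Minimum proportion of records in a column that must contain sep for the column to be converted automatically when convert is missing. See below. |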
... |
Optional arguments to as.genotype function |
method |
For internal use only. |
The functions makeGenotypes and makeHaplotypes allow the conversion of all of the genetic variables in a dataset to genotypes or haplotypes in a single step.
The parameter convert may be missing, a vector of
column names, indexes or true/false indicators, or a list of column
name or index pairs.
When the argument convert is not provided, the function will
look for columns where at least tol*100% of the records
contain the separator character sep ('/' by default). These
columns will then be assumed to contain both alleles of each
genotype/haplotype and will be converted in-place to genotype variables.
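The screening rule described above can be sketched in base R. This is only an illustration of the rule as documented, not the package's own code: the helper name detect_genotype_columns is hypothetical, and treating missing values as records that do not contain sep is an assumption.

```r
# Hypothetical helper (not part of the genetics package): list the columns
# in which at least tol*100% of the records contain the separator.
detect_genotype_columns <- function(data, sep = "/", tol = 0.5) {
  frac <- vapply(
    data,
    # Assumption: NA records count as "separator absent".
    function(col) mean(grepl(sep, as.character(col), fixed = TRUE)),
    numeric(1)
  )
  names(frac)[frac >= tol]
}

demo <- data.frame(
  G1      = c("A/T", "T/T", NA, "T/A"),      # 3 of 4 records contain "/"
  N1      = c(0.1, 0.2, 0.3, 0.4),           # no record contains "/"
  comment = c("Bad Data/Lab Error", "", "", ""),  # 1 of 4 records contains "/"
  stringsAsFactors = FALSE
)

detected <- detect_genotype_columns(demo)
print(detected)
```

With the default tol of 0.5 only G1 passes the screen. The free-text comment column, where a "/" appears in a minority of records, is left alone, which is the situation the tol threshold appears designed to handle.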
When the argument convert is a vector of column names, indexes
or true/false indicators, the corresponding columns will be assumed
to contain both alleles of each genotype/haplotype and will be converted
in-place to genotype variables.
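As a hedged usage sketch of the vector form, the following selects the same two columns by name and by index. The data frame demo and its column names are made up for illustration, and the block only runs when the genetics package is installed.

```r
# Illustrative only: 'demo' and its columns are invented for this sketch.
if (requireNamespace("genetics", quietly = TRUE)) {
  library(genetics)

  demo <- data.frame(
    G1   = c("A/T", "T/T", "T/A", "A/A"),
    G2   = c("134/138", "140/142", "138/138", "146/134"),
    note = c("ok", "ok", "recheck", "ok"),
    stringsAsFactors = FALSE
  )

  # Select the genotype columns by name ...
  by_name <- makeGenotypes(demo, convert = c("G1", "G2"))

  # ... or by position.
  by_index <- makeGenotypes(demo, convert = 1:2)

  # Check which columns are now genotype objects.
  print(sapply(by_name, is.genotype))
}
```

Naming the columns explicitly avoids relying on the tol screen, which may be preferable when a free-text column happens to contain the separator in many records.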
When the argument convert is a list containing column name or
index pairs, the two elements of each pair will be assumed to contain the
individual alleles of a genotype/haplotype. The first column
specified in each pair will be replaced with the new
genotype/haplotype variable and the column will be renamed to
name1 + sep + name2.
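The renaming rule for paired columns can be illustrated with plain strings in base R. This is a sketch under stated assumptions, not the package's implementation: combine_pair is a hypothetical helper, the result is a character column rather than a genotype/haplotype object, and the second column is left untouched because the text above does not say what happens to it.

```r
# Hypothetical helper (not part of the genetics package): join two allele
# columns into the first one and rename it to name1 + sep + name2.
combine_pair <- function(data, pair, sep = "/") {
  # Accept either column names or column indexes.
  if (is.numeric(pair)) pair <- names(data)[pair]
  a1 <- as.character(data[[pair[1]]])
  a2 <- as.character(data[[pair[2]]])

  # Assumption: a record with either allele missing becomes NA.
  joined <- ifelse(is.na(a1) | is.na(a2), NA_character_,
                   paste(a1, a2, sep = sep))

  data[[pair[1]]] <- joined
  names(data)[names(data) == pair[1]] <- paste(pair[1], pair[2], sep = sep)
  data
}

demo <- data.frame(
  G1.1 = c("A", "T", NA),
  G1.2 = c("T", "T", "A"),
  N1   = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

out <- combine_pair(demo, c("G1.1", "G1.2"))
print(names(out))
```

Because the new name contains the separator, it is not a syntactic R name; access the column with out[["G1.1/G1.2"]] or backticks rather than with a bare $ name.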
Dataframe containing converted genotype/haplotype variables. All other variables will be unchanged.
Gregory R. Warnes Gregory_R_Warnes@groton.pfizer.com
## Not run:
# common case
data <- read.csv(file="genotype_data.csv")
data <- makeGenotypes(data)
## End(Not run)
# Create a test data set where there are several genotypes in columns
# of the form "A/T".
test1 <- data.frame(Tmt=sample(c("Control","Trt1","Trt2"),20, replace=TRUE),
G1=sample(c("A/T","T/T","T/A",NA),20, replace=TRUE),
N1=rnorm(20),
I1=sample(1:100,20,replace=TRUE),
G2=paste(sample(c("134","138","140","142","146"),20,
replace=TRUE),
sample(c("134","138","140","142","146"),20,
replace=TRUE),
sep=" / "),
G3=sample(c("A /T","T /T","T /A"),20, replace=TRUE),
comment=sample(c("Possible Bad Data/Lab Error",""),20,
replace=TRUE)
)
test1
# now automatically convert genotype columns
geno1 <- makeGenotypes(test1)
geno1
# Create a test data set where there are several haplotypes with alleles
# in adjacent columns.
test2 <- data.frame(Tmt=sample(c("Control","Trt1","Trt2"),20, replace=TRUE),
G1.1=sample(c("A","T",NA),20, replace=TRUE),
G1.2=sample(c("A","T",NA),20, replace=TRUE),
N1=rnorm(20),
I1=sample(1:100,20,replace=TRUE),
G2.1=sample(c("134","138","140","142","146"),20,
replace=TRUE),
G2.2=sample(c("134","138","140","142","146"),20,
replace=TRUE),
G3.1=sample(c("A ","T ","T "),20, replace=TRUE),
G3.2=sample(c("A ","T ","T "),20, replace=TRUE),
comment=sample(c("Possible Bad Data/Lab Error",""),20,
replace=TRUE)
)
test2
# specify the locations of the columns to be paired for haplotypes
makeHaplotypes(test2, convert=list(c("G1.1","G1.2"),6:7,8:9))