| minForest {gRapHD} | R Documentation |
Returns the forest that minimises the -2*log-likelihood, AIC, or BIC.
minForest(dataset,homog=TRUE,forbEdges=NULL,stat="BIC")
dataset |
matrix or data frame (nrow(dataset) observations and
ncol(dataset) variables). |
homog |
TRUE for homogeneous covariance structure, FALSE
for heterogeneous. This is only meaningful with mixed models.
Default is homogeneous (TRUE). |
forbEdges |
edges that should not be considered, given as a matrix with 2
columns, each row representing one edge, and each column one
of the vertices in the edge. Default is NULL. |
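As an illustration, a matrix of this shape can be built with base R. The vertex numbers below are arbitrary, and sorting each row so that the lower index comes first is a convention borrowed from the edges component of the result, not a documented requirement for forbEdges.

```r
# Hypothetical forbidden edges: 1-2, 5-3 and 4-7 (column indexes in dataset)
forbEdges <- rbind(c(1, 2),
                   c(5, 3),
                   c(4, 7))
# Put the lower vertex index in column 1 (assumed convention, see above)
forbEdges <- t(apply(forbEdges, 1, sort))
forbEdges
# m <- minForest(dataset, forbEdges = forbEdges)   # requires gRapHD
```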
stat |
measure to be minimized: LR (-2*log-likelihood), AIC, or BIC.
Default is BIC. It can also be a user
defined function with format: FUN(newEdge,varType,numCat,
dataset); where the parameters varType and numCat
are as defined in the Value section; newEdge is a vector
with length two; and dataset is a matrix (n by p). |
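A minimal sketch of a function with this signature, for a pair of continuous variables, is given below. It is modelled on the userFun example in the Examples section; the name pairLR is illustrative, and reading the second returned element as the number of parameters for the edge is an assumption based on the numP component of the result. For two variables with sample correlation r, the value n*log(s11*s22/det(S)) is the same as -n*log(1-r^2).

```r
# Illustrative user-defined statistic for a pair of continuous variables.
# Signature follows FUN(newEdge, varType, numCat, dataset).
pairLR <- function(newEdge, varType, numCat, dataset) {
  sigma <- var(dataset[, newEdge])            # 2 x 2 sample covariance matrix
  v <- nrow(dataset) * log(prod(diag(sigma)) / det(sigma))
  c(v, 1)                                     # value, assumed parameter count
}

set.seed(1)
x <- matrix(rnorm(200), nrow = 100, ncol = 2)
x[, 2] <- x[, 2] + 0.5 * x[, 1]               # induce some correlation
res <- pairLR(c(1, 2), varType = c(0, 0), numCat = c(0, 0), dataset = x)
r <- cor(x[, 1], x[, 2])
all.equal(res[1], -nrow(x) * log(1 - r^2))    # same quantity, two forms
```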
Returns the tree or forest that minimizes the -2*log-likelihood, AIC, or
BIC. If the log-likelihood is used, the result is a tree; if AIC or BIC is used,
the result is a tree or a forest. The dataset contains variables
(vertices) in the columns, and observations in the rows. The result has vertices
numbered according to the column indexes in dataset.
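For continuous variables and the log-likelihood criterion, the search for a tree can be pictured as a maximum-weight spanning tree in the spirit of Chow and Liu (1968). The following base-R sketch uses Kruskal's algorithm with a simple component label per vertex. It is not the package's implementation: the name sketchTree and the pairwise weight -n*log(1-r^2) are assumptions taken from the user-defined function in the Examples section.

```r
# Rough sketch: maximum-weight spanning tree over pairwise LR weights.
sketchTree <- function(dataset) {
  n <- nrow(dataset); p <- ncol(dataset)
  pairs <- t(combn(p, 2))                     # all candidate edges, i < j
  r <- cor(dataset)
  w <- -n * log(1 - r[pairs]^2)               # weight of each candidate edge
  pairs <- pairs[order(w, decreasing = TRUE), , drop = FALSE]
  comp <- seq_len(p)                          # component label per vertex
  edges <- matrix(0L, nrow = 0, ncol = 2)
  for (k in seq_len(nrow(pairs))) {
    i <- pairs[k, 1]; j <- pairs[k, 2]
    if (comp[i] != comp[j]) {                 # edge joins two components
      edges <- rbind(edges, c(i, j))
      comp[comp == comp[j]] <- comp[i]        # merge the two components
    }
  }
  edges
}

set.seed(7)
x <- matrix(rnorm(1000), nrow = 100, ncol = 10)
e <- sketchTree(x)
nrow(e)   # a spanning tree on 10 vertices has 9 edges
```

As in the edges component of the result, each row of e holds one edge with the lower vertex index in column 1.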
All discrete variables must be factors. All factor levels must be represented in
the data. Missing values are not allowed.
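These requirements can be checked before calling minForest. The helper below is a sketch in base R; checkDataset is a hypothetical name, not part of gRapHD. It cannot detect a discrete variable that is stored as a number, since such a column looks continuous.

```r
# Hypothetical pre-flight check for the requirements stated above.
checkDataset <- function(dataset) {
  dataset <- as.data.frame(dataset)
  isNum <- vapply(dataset, is.numeric, logical(1))
  isFac <- vapply(dataset, is.factor, logical(1))
  # every level of a factor must occur at least once in the data
  allLevelsUsed <- vapply(dataset, function(x)
    !is.factor(x) || all(table(x) > 0), logical(1))
  c(typesOk  = all(isNum | isFac),            # only numeric or factor columns
    levelsOk = all(allLevelsUsed),
    noNA     = !any(is.na(dataset)))
}

d <- data.frame(x = c(0.3, -1.2, 0.8, 1.5, -0.4, 0.1),
                g = factor(c("a", "b", "a", "b", "a", "b"),
                           levels = c("a", "b", "c")))
chk1 <- checkDataset(d)      # level "c" never occurs, so levelsOk is FALSE
d$g <- droplevels(d$g)       # drop the unused level
chk2 <- checkDataset(d)      # all three checks now pass
```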
A list containing:
edges |
matrix with 2 columns, each row representing one edge, and each column one of the vertices in the edge. Column 1 contains the vertex with lower index. |
p |
number of variables (vertices) in the model. |
stat.minForest |
measure used (LR, AIC, or BIC). |
statSeq |
vector with value of stat.minForest for each edge. |
varType |
vector indicating the type of each variable: 0 if continuous, or 1 if discrete. |
numCat |
vector with number of levels for each variable (0 if continuous). |
homog |
TRUE if the covariance is homogeneous. |
numP |
vector with number of estimated parameters for each edge. |
minForest |
first and last edges found with minForest. |
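The edges and p components are enough to rebuild the graph. The sketch below uses a list made up by hand to mimic the documented structure (it is not output of minForest) and turns it into an adjacency matrix with base R.

```r
# Hypothetical result with the documented components edges and p
m <- list(edges = rbind(c(1, 2), c(2, 3), c(4, 5)), p = 6)

adj <- matrix(0L, m$p, m$p)
adj[m$edges] <- 1L                 # one entry per edge, lower index in column 1
adj[m$edges[, 2:1]] <- 1L          # mirror the entries to make adj symmetric
degree <- rowSums(adj)             # number of neighbours of each vertex
nComp <- m$p - nrow(m$edges)       # a forest has p minus (number of edges) components
```

Here the components are {1,2,3}, {4,5} and the isolated vertex 6, so nComp is 3; a tree would give exactly 1.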
Gabriel Coelho Goncalves de Abreu (Gabriel.Abreu@agrsci.dk)
Rodrigo Labouriau (Rodrigo.Labouriau@agrsci.dk)
David Edwards (David.Edwards@agrsci.dk)
Chow, C.K. and Liu, C.N. (1968) Approximating discrete probability distributions
with dependence trees. IEEE Transactions on Information Theory,
Vol. IT-14, 3:462-7.
Edwards, D., de Abreu, G.C.G. and Labouriau, R. (2009). High-dimensional Mixed
Graphical Models Using Minimal AIC and BIC forests. BMC Bioinformatics.
(submitted).
set.seed(7,kind="Mersenne-Twister")
dataset <- matrix(rnorm(1000),nrow=100,ncol=10)
m <- minForest(dataset,stat="BIC")
##############################################################################
# Example with continuous variables
data(dsCont)
# m1 <- minForest(dataset,homog=TRUE,forbEdges=NULL,stat="LR")
# 1. in this case, there is no use for homog
# 2. no forbidden edges
# 3. the measure used is the LR (the result is a tree)
m1 <- minForest(dsCont,homog=TRUE,forbEdges=NULL,stat="LR")
plotG(model=m1,numIter=1000)
##############################################################################
# Example with discrete variables
data(dsDiscr)
# m1 <- minForest(dataset,homog=TRUE,forbEdges=NULL,stat="LR")
# 1. in this case, there is no use for homog
# 2. no forbidden edges
# 3. the measure used is the LR (the result is a tree)
m1 <- minForest(dsDiscr,homog=TRUE,forbEdges=NULL,stat="LR")
plotG(model=m1,numIter=1000)
##############################################################################
# Example with mixed variables
data(dsMixed)
# m1 <- minForest(dataset,homog=TRUE,forbEdges=NULL,stat="LR")
# 1. the covariance structure is taken to be homogeneous
# 2. no forbidden edges
# 3. the measure used is the LR (the result is a tree)
m1 <- minForest(dsMixed,homog=TRUE,forbEdges=NULL,stat="LR")
plotG(model=m1,numIter=1000)
##############################################################################
# Example using a user defined function
# The function userFun calculates the same edge weights as the option
# stat="LR", so the final result is the same with either option.
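userFun <- function(newEdge,varType,numCat,dataset)
{
  # 2 x 2 sample covariance of the two variables in the edge
  sigma <- var(dataset[,newEdge])
  # n*log(s11*s22/det(sigma)), equivalently -n*log(1-r^2)
  v <- nrow(dataset)*log(prod(diag(sigma))/det(sigma))
  return(c(v,1))
}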
data(dsCont)
m <- minForest(dsCont,stat="LR")
m1 <- minForest(dsCont,stat=userFun)
identical(m$edges,m1$edges)