| sga {pga} | R Documentation |
The SGA algorithm as described in Technometrics 48, page 493, Table 5.
Before running parallel evolution, it is sometimes useful to get a rough
idea of how many generations to evolve in each universe. To do so, call
sga with check.convg=TRUE; see the example below. Otherwise,
this function is usually not called directly.
sga(y, X, m, N, start = NULL, mutation = 1/ncol(X), prior = 0.35,
    check.convg = FALSE, thresh = 0.05)
sga(y, X, N = 50)
sga(y, X, N = 200, check.convg = TRUE)
y |
an n-by-1 response vector. |
X |
an n-by-p matrix; each column is a
candidate predictor variable. |
m |
population size in each universe, default = ncol(X) or
ncol(X)+1, depending on whether ncol(X) is even or odd. |
N |
number of generations to evolve in each universe; for parallel evolution this should be fairly small, so that each evolutionary path stops before it converges. |
start |
the starting population; rarely needed, default =
NULL. |
mutation |
mutation rate; this can be a vector of length N if
a different mutation rate is needed for each generation
t=1,2,...,N; default = 1/p for all t=1,2,...,N. |
prior |
prior probability which controls the density of 1's in the
initial population, default = 0.35, but if there is some prior
information that the number of relevant variables is large, then it can be
more efficient to use a higher prior, e.g., prior=0.7. |
check.convg |
TRUE if running sga initially
to find out the number of generations needed for a single path to
converge; FALSE otherwise. |
thresh |
a prespecified threshold; if the entropy of the population
falls below thresh, the evolutionary algorithm is deemed to have
converged; see Technometrics 48, page 495, Section 3.2. |
ans |
returned only if check.convg=TRUE; it is the number of
iterations needed to achieve convergence. |
popn |
last-generation population after N generations of
evolution, returned as an m-by-p binary matrix. |
combo.gene |
a p-by-1 vector, whose j-th
element is the frequency that variable j “shows up” in the
last-generation population. |
best |
a p-by-1 binary vector, representing the best
solution after N generations of evolution. If the evolutionary
algorithm has converged in the entropy sense, then combo.gene is
expected to be the same as best after rounding; see example below. |
optval |
used during code development; ignore. |
convg |
used during code development; ignore. |
perf |
used during code development; ignore. |
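The quantities described above (default m, prior, combo.gene, and the entropy threshold) can be illustrated with a small base-R sketch. It is not the package's implementation: the Bernoulli initialization and the helper pop_entropy are assumptions made for illustration, and the entropy measure shown here (mean binary entropy of the per-variable frequencies) may differ from the definition in Technometrics 48, Section 3.2.

```r
## Illustrative sketch only; pop_entropy is a hypothetical helper,
## not a function exported by pga.
set.seed(1)
p <- 9                                  # number of candidate predictors
m <- if (p %% 2 == 0) p else p + 1      # documented default: p if even, p + 1 if odd
prior <- 0.35                           # expected density of 1's at the start

## assumed initialization: each gene is 1 with probability `prior`
popn <- matrix(rbinom(m * p, size = 1, prob = prior), nrow = m, ncol = p)

## frequency with which each variable appears in the population
combo.gene <- colMeans(popn)

## assumed entropy measure: mean binary entropy of the frequencies
pop_entropy <- function(freq) {
  h <- -(freq * log2(freq) + (1 - freq) * log2(1 - freq))
  h[!is.finite(h)] <- 0                 # treat 0 * log(0) as 0
  mean(h)
}

## a population whose rows are all identical has zero entropy,
## so it would fall below thresh = 0.05
converged <- matrix(rep(c(0, 1, 0), each = 4), nrow = 4)
pop_entropy(colMeans(converged)) < 0.05
```

Under this assumed measure, the entropy is largest when every variable appears in half of the population and shrinks toward zero as the rows of popn become identical, which is also when round(combo.gene) and best would be expected to agree.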
Dandi Qiao and Mu Zhu, University of Waterloo, Canada.
Zhu M, Chipman HA (2006). Darwinian evolution in parallel universes: A parallel genetic algorithm for variable selection. Technometrics, 48(4), 491–502.
## simulate some data
sigma <- 1
N <- 50
d <- 10
truth <- c(2,5,8)
beta <- rep(0,d)
beta[truth] <- c(1,1,1)
X <- matrix(rnorm(N*d), N, d)
y <- X %*% beta + sigma*rnorm(N)
## get a rough idea of how many generations are needed for
## the evolutionary algorithm to converge (in the entropy sense)
check <- numeric(5)
for (i in seq_along(check)) {
  check[i] <- sga(y, X, N = 200, check.convg = TRUE)$ans
}
round(mean(check))
## if round(mean(check)) above is equal to 20, then one often runs
## just 10 generations in each parallel universe to prevent each path
## from converging ...
## run a long evolutionary path and identify the best solution,
## but this is often not too useful ... however, from the example
## below, you will see that, if evolution has converged in the
## entropy sense, then $best and $combo.gene are not
## going to be very different ...
stuff <- sga(y, X, N = 200)
stuff$best
round(stuff$combo.gene)
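
The mutation argument accepts a vector of length N, one rate per generation. The following is a minimal sketch of building such a schedule; the names n.gen and mut and the start and end rates are illustrative assumptions, not recommendations from the package or the paper.

```r
## Hypothetical decreasing mutation schedule, one rate per generation.
n.gen <- 200                            # number of generations (the N argument)
p <- 10                                 # number of candidate predictors, ncol(X)
mut <- seq(from = 3 / p, to = 1 / p, length.out = n.gen)
stopifnot(length(mut) == n.gen)         # must match N

## With pga installed and y, X defined as in the example above,
## the schedule could be passed as:
## stuff <- sga(y, X, N = n.gen, mutation = mut)
```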