| asia {bnlearn} | R Documentation |
Small synthetic data set from Lauritzen and Spiegelhalter (1988) about lung diseases (tuberculosis, lung cancer or bronchitis) and visits to Asia.
data(asia)
The asia data set contains the following variables:
D (dyspnoea), a two-level factor with levels yes and no.
T (tuberculosis), a two-level factor with levels yes and no.
L (lung cancer), a two-level factor with levels yes and no.
B (bronchitis), a two-level factor with levels yes and no.
A (visit to Asia), a two-level factor with levels yes and no.
S (smoking), a two-level factor with levels yes and no.
X (chest X-ray), a two-level factor with levels yes and no.
E (either tuberculosis or lung cancer), a two-level factor with levels yes and no.
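The format described above can be checked by loading the data and inspecting it. This is a minimal sketch that assumes the bnlearn package is installed; apart from data(), it only uses base R functions.

```r
library(bnlearn)
data(asia)

# Each of the eight columns should be a factor with two levels.
str(asia)
n.levels = sapply(asia, nlevels)
n.levels

# Relative frequency of each level, one table per variable.
lapply(asia, function(v) prop.table(table(v)))
```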
Standard learning algorithms are not able to recover the true
structure of the network because node E is a deterministic function
of its parents (T and L): all of its conditional probabilities are
equal to either 0 or 1.
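The note above can be explored with a short sketch, assuming the bnlearn package is installed. It first checks whether E is "yes" exactly when T or L is "yes", then compares a network learned by hill-climbing with the generating structure. The choice of hc() is only an illustration; any other structure learning function could be substituted, and the result depends on the algorithm and score used.

```r
library(bnlearn)
data(asia)

# TRUE if E behaves as a logical OR of T and L in every observation.
deterministic = all((asia$E == "yes") ==
                    (asia$T == "yes" | asia$L == "yes"))
deterministic

# The generating structure, written as a model string.
true.dag = model2network("[A][S][T|A][L|S][B|S][D|B:E][E|T:L][X|E]")

# Learn a structure from the data with hill-climbing (default score).
learned = hc(asia)

# Arcs in common (tp), spurious arcs (fp) and missing arcs (fn).
diffs = compare(target = true.dag, current = learned)
diffs

# Structural Hamming distance between the two networks.
shd(learned, true.dag)
```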
S. Lauritzen and D. Spiegelhalter (1988). Local computation with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B, 50(2), pages 157–192.
## The modelstring() of this data set is:
# [A][S][T|A][L|S][B|S][D|B:E][E|T:L][X|E]
# these are the R commands used to generate this data set.
## Not run:
# A (visit to Asia) and S (smoking) have no parents.
a = sample(c("yes", "no"), 5000, prob = c(0.01, 0.99), replace = TRUE)
s = sample(c("yes", "no"), 5000, prob = c(0.50, 0.50), replace = TRUE)
# T (tuberculosis) depends on A.
t = a
t[t == "yes"] = sample(c("yes", "no"), length(which(t == "yes")),
                  prob = c(0.05, 0.95), replace = TRUE)
t[t == "no"] = sample(c("yes", "no"), length(which(t == "no")),
                 prob = c(0.01, 0.99), replace = TRUE)
# L (lung cancer) depends on S.
l = s
l[l == "yes"] = sample(c("yes", "no"), length(which(l == "yes")),
                  prob = c(0.10, 0.90), replace = TRUE)
l[l == "no"] = sample(c("yes", "no"), length(which(l == "no")),
                 prob = c(0.01, 0.99), replace = TRUE)
# B (bronchitis) depends on S.
b = s
b[b == "yes"] = sample(c("yes", "no"), length(which(b == "yes")),
                  prob = c(0.60, 0.40), replace = TRUE)
b[b == "no"] = sample(c("yes", "no"), length(which(b == "no")),
                 prob = c(0.30, 0.70), replace = TRUE)
# E is "yes" when L or T is "yes" (deterministic, no sampling).
e = apply(cbind(l, t), 1, paste, collapse = ":")
e[e == "yes:yes"] = "yes"
e[e == "yes:no"] = "yes"
e[e == "no:yes"] = "yes"
e[e == "no:no"] = "no"
# X (chest X-ray) depends on E.
x = e
x[x == "yes"] = sample(c("yes", "no"), length(which(x == "yes")),
                  prob = c(0.98, 0.02), replace = TRUE)
x[x == "no"] = sample(c("yes", "no"), length(which(x == "no")),
                 prob = c(0.05, 0.95), replace = TRUE)
# D (dyspnoea) depends on E and B.
d = apply(cbind(e, b), 1, paste, collapse = ":")
d[d == "yes:yes"] = sample(c("yes", "no"), length(which(d == "yes:yes")),
                      prob = c(0.90, 0.10), replace = TRUE)
d[d == "yes:no"] = sample(c("yes", "no"), length(which(d == "yes:no")),
                     prob = c(0.70, 0.30), replace = TRUE)
d[d == "no:yes"] = sample(c("yes", "no"), length(which(d == "no:yes")),
                     prob = c(0.80, 0.20), replace = TRUE)
d[d == "no:no"] = sample(c("yes", "no"), length(which(d == "no:no")),
                    prob = c(0.10, 0.90), replace = TRUE)
# Collect all nodes in a single data frame.
data.frame(A = a, S = s, T = t, L = l, B = b, E = e, X = x, D = d)
## End(Not run)
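In the commands above, T, L, B and X each start as a copy of their parent and are overwritten in place, so the second subsetting step (== "no") also matches entries that the first step has just set to "no". The following is a sketch of an alternative that computes every mask from the parent before drawing the child. It is not the code used to build the asia data set: the helper rcond(), the object sim and the seed are hypothetical names and values introduced here for illustration, and the result will not reproduce the shipped data.

```r
# Hypothetical helper (not part of bnlearn): draw a yes/no child for each
# parent configuration, where p.yes gives P(child = "yes" | parent = level).
rcond = function(parent, p.yes) {
  child = character(length(parent))
  for (lv in names(p.yes)) {
    # The mask is computed from the parent, never from the child.
    idx = which(parent == lv)
    child[idx] = sample(c("yes", "no"), length(idx),
                        prob = c(p.yes[[lv]], 1 - p.yes[[lv]]),
                        replace = TRUE)
  }
  child
}

set.seed(42)  # arbitrary seed, only to make the sketch reproducible
n = 5000

a = sample(c("yes", "no"), n, prob = c(0.01, 0.99), replace = TRUE)
s = sample(c("yes", "no"), n, prob = c(0.50, 0.50), replace = TRUE)
t = rcond(a, c(yes = 0.05, no = 0.01))
l = rcond(s, c(yes = 0.10, no = 0.01))
b = rcond(s, c(yes = 0.60, no = 0.30))
# E is the deterministic OR of L and T.
e = ifelse(l == "yes" | t == "yes", "yes", "no")
x = rcond(e, c(yes = 0.98, no = 0.05))
# D has two parents, so the configuration is encoded as "E:B".
d = rcond(paste(e, b, sep = ":"),
          c("yes:yes" = 0.90, "yes:no" = 0.70,
            "no:yes" = 0.80, "no:no" = 0.10))

sim = data.frame(A = a, S = s, T = t, L = l, B = b, E = e, X = x, D = d,
                 stringsAsFactors = TRUE)
```

The probabilities are the same as in the original commands; only the way the masks are computed differs.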