| cspade {arulesSequences} | R Documentation |
Mines frequent sequential patterns with the cSPADE algorithm. The algorithm uses temporal joins together with efficient lattice search techniques and supports timing constraints.
cspade(data, parameter = NULL, control = NULL, tmpdir = tempdir())
| data | an object of class transactions with temporal information. |
| parameter | an object of class SPparameter or a named list with corresponding components. |
| control | an object of class SPcontrol or a named list with corresponding components. |
| tmpdir | a non-empty character vector giving the directory name where temporary files are written. |
Interfaces the command-line tools for preprocessing and mining frequent sequences with the cSPADE algorithm by M. Zaki via a proper chain of system calls.
The temporal information is taken from components sequenceID
(sequence or customer identifier) and eventID (event identifier)
of slot transactionInfo. Both identifiers must be in (blockwise)
ascending order.
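The ordering requirement can be checked before building the transactions object. The following is a minimal base-R sketch, assuming the raw events sit in a data frame; the data frame `events` and its values are hypothetical, and the check reflects one reading of "(blockwise) ascending", namely that rows are grouped by sequenceID in ascending order and that eventID increases within each group.

```r
# Hypothetical event log; the column names mirror the identifiers above.
events <- data.frame(
  sequenceID = c(2, 1, 1, 2, 1),
  eventID    = c(15, 20, 10, 5, 15),
  items      = c("A", "B,C", "A", "D", "C"),
  stringsAsFactors = FALSE
)

# Sort by sequence first, then by event within each sequence.
ordered <- events[order(events$sequenceID, events$eventID), ]
rownames(ordered) <- NULL

# Sequence identifiers must be non-decreasing ...
seq_ok <- !is.unsorted(ordered$sequenceID)

# ... and event identifiers must strictly increase within each sequence.
event_ok <- all(tapply(ordered$eventID, ordered$sequenceID,
                       function(x) all(diff(x) > 0)))

ordered
```

If either `seq_ok` or `event_ok` is FALSE, the data should be reordered (or duplicated event identifiers resolved) before it is handed to cspade.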
The amount of disk space used by temporary files is reported in
verbose mode (see class SPcontrol).
The utility function read_baskets reads text files containing
temporal transaction data.
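As an illustration of the expected input, the sketch below writes a small text file and reads it back with read_baskets. The file contents are hypothetical; the layout (sequence identifier, event identifier, basket size, then the items, separated by whitespace) is an assumption that mirrors the `info` argument used in the examples on this page.

```r
library(arulesSequences)

# Hypothetical basket file: sequenceID, eventID, SIZE, then the items.
lines <- c("1 10 2 C D",
           "1 15 3 A B C",
           "2 15 2 A E")
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# The leading columns are named through 'info'; the rest are read as items.
baskets <- read_baskets(con = path,
                        info = c("sequenceID", "eventID", "SIZE"))
info <- transactionInfo(baskets)
unlink(path)

info
```

The resulting object carries sequenceID and eventID in its transactionInfo slot, so it can be passed to cspade directly.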
Returns an object of class sequences.
Temporary files may not be deleted until the end of the R session if the call is interrupted.
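One way to limit leftover files is to point tmpdir at a dedicated directory and remove it afterwards. This is a sketch under assumptions, not a documented feature of the package: the directory name `cspade-work` is hypothetical, and the cleanup relies on the `finally` clause of base R's tryCatch, which is also evaluated when the call is interrupted or fails.

```r
library(arulesSequences)
data(zaki)

# Hypothetical scratch directory used only for this call.
work <- file.path(tempdir(), "cspade-work")
dir.create(work, showWarnings = FALSE)

# Mine with the scratch directory, then remove it whatever the outcome.
s <- tryCatch(
  cspade(zaki, parameter = list(support = 0.4), tmpdir = work),
  finally = unlink(work, recursive = TRUE)
)
```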
sequenceID and eventID are coerced to factor if necessary.
Christian Buchta, Michael Hahsler
M. J. Zaki. (2001). SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning Journal, 42, 31–60.
Class transactions, sequences, SPparameter, SPcontrol, method ruleInduction, function read_baskets.
## use example data from paper
data(zaki)
## mine frequent sequences
s1 <- cspade(zaki, parameter = list(support = 0.4),
             control = list(verbose = TRUE))
summary(s1)
as(s1, "data.frame")
## use timing constraint
s2 <- cspade(zaki, parameter = list(support = 0.4, maxwin = 5))
as(s2, "data.frame")
## replace timestamps
t <- zaki
transactionInfo(t)$eventID <-
  unlist(tapply(seq(t), transactionInfo(t)$sequenceID,
                function(x) x - min(x) + 1), use.names = FALSE)
as(t, "data.frame")
s0 <- cspade(t, parameter = list(support = 0.4))
s0
identical(as(s1, "data.frame"), as(s0, "data.frame"))
## Not run:
## use generated data
t <- read_baskets(con = system.file("misc", "test.txt",
                                    package = "arulesSequences"),
                  info = c("sequenceID", "eventID", "SIZE"))
summary(t)
## use low support
s3 <- cspade(t, parameter = list(support = 0.03),
             control = list(verbose = TRUE))
summary(s3)
## End(Not run)