| measures {arules} | R Documentation |
Provides generic functions and the corresponding S4 methods to calculate additional interest measures for an existing set of associations.
allConfidence(x, ...)
## S4 method for signature 'itemsets':
allConfidence(x, transactions = NULL, itemSupport = NULL)
crossSupportRatio(x, ...)
## S4 method for signature 'itemsets':
crossSupportRatio(x, transactions = NULL, itemSupport = NULL)
hyperLift(x, ...)
## S4 method for signature 'rules':
hyperLift(x, transactions, d = 0.99)
hyperConfidence(x, ...)
## S4 method for signature 'rules':
hyperConfidence(x, transactions = NULL,
complements = TRUE, significance = FALSE)
| x | the set of associations. |
| ... | further arguments. |
| transactions | the transaction data set used to mine the associations. |
| itemSupport | as an alternative to transactions, the item supports in the transaction data set; these are sufficient for some measures (allConfidence and crossSupportRatio). |
| d | the quantile used to calculate hyper-lift. |
| complements | if TRUE, calculate confidence/significance levels for complements; if FALSE, for substitutes. |
| significance | report significance levels instead of confidence levels. |
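All-confidence and the cross-support ratio depend only on the support of the itemset and the supports of its individual items, which is why itemSupport can replace transactions for these two measures. The following base-R sketch assumes the usual definitions, all-confidence(X) = supp(X) / max supp({x}) (Omiecinski 2003) and cross-support ratio(X) = min supp({x}) / max supp({x}) over the items x in X. The helper names `allConfidenceSketch` and `crossSupportRatioSketch` are hypothetical and not part of arules, and the supports are illustrative values rather than data.

```r
# Hypothetical helpers (not arules functions); base R only.
# itemsetSupport: support of the whole itemset X
# itemSupports:   supports of the single items contained in X
allConfidenceSketch <- function(itemsetSupport, itemSupports) {
  # all-confidence: supp(X) / max over x in X of supp({x})
  itemsetSupport / max(itemSupports)
}

crossSupportRatioSketch <- function(itemSupports) {
  # cross-support ratio: smallest item support / largest item support
  min(itemSupports) / max(itemSupports)
}

# Illustrative supports for a 2-itemset {a, b}
supp_ab    <- 0.2
supp_items <- c(a = 0.5, b = 0.25)

allConfidenceSketch(supp_ab, supp_items)  # 0.2 / 0.5 = 0.4
crossSupportRatioSketch(supp_items)       # 0.25 / 0.5 = 0.5
```

A low cross-support ratio flags itemsets that mix very frequent and very rare items, which tend to produce spurious associations.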
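The following interest measures are currently implemented: all-confidence (Omiecinski 2003), the cross-support ratio, hyper-lift, and hyper-confidence (Hahsler, Hornik, and Reutterer 2005). Hyper-lift and hyper-confidence build on the count model behind lift: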
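Lift is defined for the rule X -> Y as
$\mathrm{lift}(X \rightarrow Y) = \frac{P(X \cup Y)}{P(X)\,P(Y)} = \frac{c_{XY}}{E[C_{XY}]}$,
where $P(X \cup Y)$ is the probability that a transaction contains all items of X and Y, $c_X$, $c_Y$, and $c_{XY}$ are the numbers of transactions containing X, Y, and both X and Y, and $E[C_{XY}] = c_X c_Y / m$ is the expected count under independence, with m being the number of transactions in the database.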
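A small worked example in base R shows that the probability form and the count form of lift agree. The counts below are illustrative values, not taken from a real data set.

```r
# Illustrative counts (not from a real data set)
m    <- 1000  # number of transactions
c_X  <- 200   # transactions containing X
c_Y  <- 250   # transactions containing Y
c_XY <- 80    # transactions containing both X and Y

# Count form: observed count divided by the count expected under independence
expected_XY <- c_X * c_Y / m        # E[C_XY] = 200 * 250 / 1000 = 50
lift_counts <- c_XY / expected_XY   # 80 / 50 = 1.6

# Probability form: P(X + Y) / (P(X) * P(Y))
lift_probs <- (c_XY / m) / ((c_X / m) * (c_Y / m))

all.equal(lift_counts, lift_probs)  # TRUE
```

Multiplying numerator and denominator of the probability form by $m^2$ gives $c_{XY}\,m / (c_X c_Y)$, which is exactly $c_{XY} / E[C_{XY}]$.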
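Hyper-lift is defined as
$\mathrm{hyperlift}(X \rightarrow Y) = \frac{c_{XY}}{Q_d[C_{XY}]}$,
where $Q_d[C_{XY}]$ is the d-quantile of the hypergeometric distribution that the count $C_{XY}$ follows under independence, with parameters m, $c_X$, and $c_Y$.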
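Hyper-lift can be sketched in base R with `qhyper`, assuming the independence model in which the $c_X$ transactions containing X are a draw without replacement from m transactions, $c_Y$ of which contain Y. The function name `hyperLiftSketch` is hypothetical and is not the arules implementation; the counts are illustrative.

```r
# Hypothetical helper (not the arules method); base R only.
# n_trans is the number of transactions (m in the text); it is renamed
# here because qhyper() itself uses an argument called m.
hyperLiftSketch <- function(c_XY, c_X, c_Y, n_trans, d = 0.99) {
  # d-quantile of C_XY: c_X draws from n_trans transactions,
  # c_Y of which contain Y ("white balls" in qhyper's terms)
  q <- qhyper(d, m = c_Y, n = n_trans - c_Y, k = c_X)
  c_XY / q
}

# Same illustrative counts as for lift (lift = 1.6)
hyperLiftSketch(c_XY = 80, c_X = 200, c_Y = 250, n_trans = 1000, d = 0.99)
hyperLiftSketch(c_XY = 80, c_X = 200, c_Y = 250, n_trans = 1000, d = 0.5)
```

For a high quantile such as d = 0.99, $Q_d[C_{XY}]$ typically lies above $E[C_{XY}]$, so hyper-lift is a more conservative value than lift and is less inflated by counts that are large only by chance.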
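Hyper-confidence is defined as $\mathrm{hyperconfidence}(X \rightarrow Y) = P[C_{XY} < c_{XY}]$, the probability under the same hypergeometric model of observing a count smaller than the actual count. A hyper-confidence level above 0.95, for example, indicates that if X and Y were independent, a count at least as large as $c_{XY}$ would occur with probability below 5%.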
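Hyper-confidence can be sketched with `phyper` under the same independence model. The mapping of the `complements` flag onto the two tails below (lower tail for complements, upper tail for substitutes) is an assumption for illustration; `hyperConfidenceSketch` is a hypothetical name, not the arules method, and the counts are illustrative.

```r
# Hypothetical helper (not the arules method); base R only.
hyperConfidenceSketch <- function(c_XY, c_X, c_Y, n_trans, complements = TRUE) {
  if (complements) {
    # P[C_XY < c_XY]: high when the observed count is unusually large
    phyper(c_XY - 1, m = c_Y, n = n_trans - c_Y, k = c_X)
  } else {
    # P[C_XY > c_XY]: high when the observed count is unusually small
    phyper(c_XY, m = c_Y, n = n_trans - c_Y, k = c_X, lower.tail = FALSE)
  }
}

conf <- hyperConfidenceSketch(c_XY = 80, c_X = 200, c_Y = 250, n_trans = 1000)
conf > 0.95   # the count 80 lies far above E[C_XY] = 50
1 - conf      # corresponding significance level
```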
A numeric vector containing the values of the interest measure for each association in the set of associations x.
Edward R. Omiecinski. Alternative interest measures for mining associations in databases. IEEE Transactions on Knowledge and Data Engineering, 15(1):57-69, Jan/Feb 2003.
Michael Hahsler, Kurt Hornik, and Thomas Reutterer. Implications of probabilistic data modeling for rule mining. Report 14, Research Report Series, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria, March 2005.
library("arules")
data("Income")
### calculate all-confidence and the cross-support ratio
itemsets <- apriori(Income, parameter = list(target = "freq"))
quality(itemsets) <- cbind(quality(itemsets),
allConfidence = allConfidence(itemsets),
crossSupportRatio = crossSupportRatio(itemsets))
summary(itemsets)
### calculate hyperlift for the 0.9 quantile
rules <- apriori(Income)
quality(rules) <- cbind(quality(rules),
hyperLift = hyperLift(rules, Income, d = 0.9))
inspect(SORT(rules, by = "hyperLift")[1:5])
### calculate hyper-confidence and keep only rules with
### a hyper-confidence level of at least 0.99
quality(rules) <- cbind(quality(rules),
hyperConfidence = hyperConfidence(rules, Income))
rulesHConf <- rules[quality(rules)$hyperConfidence >= 0.99]
inspect(rulesHConf[1:10])