Basic Concepts
Frequent pattern is a pattern that occurs frequently in a data set.
Basic concept of Frequent Pattern
- (absolute) support, or, support count: frequency of an itemset
- (relative) support: fraction of transactions that contains an itemset (i.e., the probability that a transaction contains an itenset
- An itemset X is frequent if X's support is no less than a Minsup threshold.
Basic concept of Association Rules
Find all the rules X → Y with minimum support and confidence
- Association rules: X → Y (support, confidence)
- support(s): probability that a transaction contains both (or more) of two itemsets (X ∪ Y)
- confidence(c): conditional probability that a transaction having an itemset X also contains another itemset Y
Closed Patterns and Max-Patterns
A long pattern contains a combinatorial number of sub-patterns, e.g., {a1, a2, …, a100} contains 2100 - 1 sub-patterns!
- An itemset X is closed if X is frequent and there exists no super-pattern Y ⊃ X, with the same support as X.
- An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern Y ⊃ X.
- Example:
- DB = {<a1, a2, …, a100>, <a1, a2, …, a50>}, Minsup = 1
- set of closed itemset: <a1, a2, …, a100> = 1, <a1, a2, …, a50> = 2
- set of max-pattern: <a1, a2, …, a100> = 1
Computational Complexity of Frequent Itemset Mining
The worst case complexity vs. the expected probability
For example, suppose Walmart store has 104 kinds of products.
- The chance to pick up one product: 10-4
- The chance to pick up a particular set of 10 products: about (10-4)10 = 10-40
- What is the chance this particular set of 10 products to be frequent 103 times in 109 transactions?
Solution:
The Downward Closure Property and Scalable Mining Methods
Any subset of a frequent itemset must be frequent.
If {beer, diaper, nuts} is frequent, so is {beer, diaper}.
Scalable mining methods: Three major approaches
- Apriori (Agrawal and Srikant @VLDB'94)
- FPgrowth (Han, Pei and Yin @SIGMOD'00)
- Vertical Data Format (Charm-Zaki and Hsiao @SDM'02)
References
- "Data Mining: Concepts and Techniques" - Jiawei Han and Micheline Kamber