Basic Concepts

Frequent pattern is a pattern that occurs frequently in a data set.


Basic concept of Frequent Pattern
  • (absolute) support, or, support count: frequency of an itemset
  • (relative) support: fraction of transactions that contains an itemset (i.e., the probability that a transaction contains an itenset
  • An itemset X is frequent if X's support is no less than a Minsup threshold.


Basic concept of Association Rules

Find all the rules XY with minimum support and confidence

  • Association rules: XY (support, confidence)



  • support(s): probability that a transaction contains both (or more) of two itemsets (X ∪ Y)
  • confidence(c): conditional probability that a transaction having an itemset X also contains another itemset Y


Closed Patterns and Max-Patterns

A long pattern contains a combinatorial number of sub-patterns, e.g., {a1, a2, …, a100} contains 2100 - 1 sub-patterns!

  • An itemset X is closed if X is frequent and there exists no super-pattern YX, with the same support as X.
  • An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern YX.
  • Example:
     - DB = {<a1, a2, …, a100>, <a1, a2, …, a50>}, Minsup = 1
     - set of closed itemset: <a1, a2, …, a100> = 1, <a1, a2, …, a50> = 2
     - set of max-pattern: <a1, a2, …, a100> = 1


Computational Complexity of Frequent Itemset Mining

The worst case complexity vs. the expected probability

For example, suppose Walmart store has 104 kinds of products.

  • The chance to pick up one product: 10-4
  • The chance to pick up a particular set of 10 products: about (10-4)10 = 10-40
  • What is the chance this particular set of 10 products to be frequent 103 times in 109 transactions?
    Solution: 


The Downward Closure Property and Scalable Mining Methods

Any subset of a frequent itemset must be frequent.

If {beer, diaper, nuts} is frequent, so is {beer, diaper}.

Scalable mining methods: Three major approaches

  • Apriori (Agrawal and Srikant @VLDB'94)
  • FPgrowth (Han, Pei and Yin @SIGMOD'00)
  • Vertical Data Format (Charm-Zaki and Hsiao @SDM'02)



References