Apriori is a classic algorithm for frequent itemset mining and association rule learning over transactional databases. It first identifies the frequent individual items in the database, then extends them to larger and larger itemsets for as long as those itemsets appear sufficiently often in the database. The frequent itemsets found by Apriori can be used to derive association rules that highlight general trends in the database, with applications in domains such as market basket analysis.
Algorithm
Apriori pruning principle: if an itemset is infrequent, none of its supersets can be frequent, so they should not be generated or tested.
- Initially, scan the DB once to get the frequent 1-itemsets
- Generate length (k+1) candidate itemsets from the length k frequent itemsets (self-joining)
- Test the candidates against the DB, discarding those that are not frequent (pruning)
- Terminate when no frequent or candidate set can be generated
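The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `apriori`, the parameter `min_support` (assumed here to be an absolute transaction count rather than a fraction), and the sample transactions are all assumptions made for the example. The self-join is written as a simple union of pairs of frequent k-itemsets, which is easy to read but slower than the prefix-based join usually described in textbooks.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every frequent itemset.

    min_support is assumed to be an absolute count of transactions.
    """
    transactions = [frozenset(t) for t in transactions]

    # Step 1: scan the DB once to count the individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:  # Step 4: stop when no frequent k-itemsets remain.
        prev = set(frequent)
        # Step 2 (self-join): unite pairs of frequent k-itemsets
        # whose union has exactly k+1 items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Pruning principle: skip candidates with an infrequent k-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k))
        }
        # Step 3: test the surviving candidates against the DB.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy database of four transactions.
db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent_itemsets = apriori(db, min_support=2)
print(frequent_itemsets[frozenset({2, 3, 5})])  # prints 2
```

In this toy database, item 4 occurs only once, so no itemset containing it is ever generated. Likewise {1, 2, 3} is dropped without a database scan because its subset {1, 2} is not frequent.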