Data mining is the advanced analysis step of the "Knowledge Discovery in Databases" (KDD) process. An interdisciplinary subfield of computer science, it is the computational process of discovering patterns in large data sets using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal is to extract information from a data set and transform it into an understandable structure for further use. Beyond the raw analysis step, data mining also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
Data Mining Algorithms
The Most Used Data Mining Algorithms
Selected at ICDM '06 (IEEE International Conference on Data Mining, 2006)
#1: C4.5 (61 votes)
#2: K-Means (60 votes)
#3: SVM (58 votes)
#4: Apriori (52 votes)
#5: EM (48 votes)
#6: PageRank (46 votes)
#7: AdaBoost (45 votes)
#7: kNN (45 votes)
#7: Naive Bayes (45 votes)
#10: CART (34 votes)
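To make one entry from the list above concrete, here is a minimal sketch of K-Means (#2) using Lloyd's iteration in plain Python. The function name `kmeans`, its parameters, and the random initialization from the input points are assumptions made for illustration; production libraries use more careful initialization and stopping criteria.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster equal-length numeric tuples into k groups (Lloyd's algorithm)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the iteration has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment
```

Calling `kmeans(points, 2)` on two well-separated groups of 2-D points returns two centroids and a per-point cluster index. The result depends on the initial centroids, which is why the seed is exposed as a parameter.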
Other Data Mining Techniques
- Clustering
- tf-idf weight
- MCMC (Markov Chain Monte Carlo)
- MinHash
- LSH (Locality Sensitive Hashing)
- LDA (Latent Dirichlet Allocation)
- PLSI (Probabilistic Latent Semantic Indexing)
- LSI (Latent Semantic Indexing)
- Viterbi
- HMM (Hidden Markov Model)
- Language Model
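As a small illustration of one technique from the list above, the tf-idf weight can be sketched as follows. This assumes one common variant (term frequency normalized by document length, inverse document frequency as the natural log of N / df); other weighting schemes exist, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            # tf = count / document length, idf = ln(N / df)
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

Under this variant, a term that occurs in every document gets weight 0, while a term confined to a few documents is weighted up.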
Recommender Systems
- Recommender Systems
- Recommender Systems Handbook [2011]
- Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions
- Collaborative Filtering (CF)
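To give a flavor of collaborative filtering, here is a minimal user-based sketch: a missing rating is predicted as the similarity-weighted average of other users' ratings for the same item. The choice of cosine similarity, the dict-of-dicts rating layout, and the names `cosine` and `predict` are assumptions for illustration and are not taken from the works listed above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two {item: rating} dicts."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def predict(ratings, user, item):
    """Predict user's rating for item, or None if no other user rated it."""
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        numerator += sim * other_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None
```

For example, with `ratings = {"alice": {"a": 5, "b": 3}, "bob": {"a": 5, "b": 3, "c": 4}, "carol": {"a": 1, "b": 5, "c": 2}}`, `predict(ratings, "alice", "c")` falls between carol's and bob's ratings for "c" and sits closer to bob's, since bob's ratings are more similar to alice's.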
Lecture
References
- "Data Mining: Concepts and Techniques" - Jiawei Han and Micheline Kamber