https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms lists the algorithms that Mahout has implemented or is in the process of implementing.
Classification
Logistic Regression (SGD)
Bayesian
Support Vector Machines (SVM) (open: MAHOUT-14, MAHOUT-232 and MAHOUT-334)
Perceptron and Winnow (open: MAHOUT-85)
Neural Network (open, but MAHOUT-228 might help)
Random Forests (integrated - MAHOUT-122, MAHOUT-140, MAHOUT-145)
Restricted Boltzmann Machines (open, MAHOUT-375, GSOC2010)
Online Passive Aggressive (integrated, MAHOUT-702)
Boosting (awaiting patch commit, MAHOUT-716)
Hidden Markov Models (HMM) (MAHOUT-627, MAHOUT-396, MAHOUT-734); training is done in MapReduce
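The classifiers above are only named. To illustrate the first entry, logistic regression trained by stochastic gradient descent (SGD), here is a minimal single-machine sketch in Python. It is not Mahout's Java API; the function names, learning rate, epoch count and toy data are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical helpers for illustration; not Mahout's API.

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_sgd(data, epochs=200, lr=0.1, seed=0):
    """Fit weights and bias on (features, label) pairs, labels in {0, 1}."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    b = 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit examples in a random order each epoch
        for x, y in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss w.r.t. the linear score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Toy usage: two well-separated groups of 2-D points.
data = [([0.0, 0.0], 0), ([1.0, 0.0], 0), ([3.0, 3.0], 1), ([4.0, 3.0], 1)]
w, b = train_logistic_sgd(data)
print([predict(w, b, x) for x, _ in data])
```

Each update uses a single example, so the model can be trained by streaming over the data once per epoch instead of holding a full gradient computation in memory.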
Clustering
Reference Reading
Canopy Clustering (MAHOUT-3 - integrated)
K-Means Clustering (MAHOUT-5 - integrated)
Fuzzy K-Means (MAHOUT-74 - integrated)
Expectation Maximization (EM) (MAHOUT-28)
Mean Shift Clustering (MAHOUT-15 - integrated)
Hierarchical Clustering (MAHOUT-19)
Dirichlet Process Clustering (MAHOUT-30 - integrated)
Latent Dirichlet Allocation (MAHOUT-123 - integrated)
Spectral Clustering (MAHOUT-363 - integrated)
Minhash Clustering (MAHOUT-344 - integrated)
Top Down Clustering (MAHOUT-843 - integrated)
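The clustering entries are only named here. As a generic illustration of one of them, K-Means, the following single-machine Python sketch runs Lloyd's iteration: assign each point to its nearest centroid, recompute each centroid as the mean of its points, and stop when the centroids no longer change. Mahout runs clustering as distributed jobs; this is not its API. Euclidean distance, the function name and the caller-supplied initial centroids are assumptions.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's K-Means with Euclidean distance (hypothetical helper, not Mahout's API)."""
    centroids = [list(c) for c in initial_centroids]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of the points assigned to it.
        new_centroids = []
        for k, c in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid unchanged
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, assignment

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignment = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(centroids, assignment)
```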
Pattern Mining
Parallel FP Growth Algorithm (used for frequent itemset mining)
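Parallel FP-Growth is too involved for a short example, but the problem it solves can be shown with a deliberately naive stand-in: count every itemset up to a fixed size in each transaction and keep those whose support meets a threshold. This brute-force enumeration is not FP-Growth, which reaches the frequent itemsets without generating every candidate by compressing the transactions into a prefix tree (FP-tree). The function name and the `min_support` and `max_size` parameters are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force support counting (hypothetical helper; NOT the FP-Growth algorithm)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # deduplicate and fix an order so itemsets are canonical tuples
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    # Keep only itemsets that appear in at least min_support transactions.
    return {itemset: c for itemset, c in counts.items() if c >= min_support}

transactions = [["milk", "bread"], ["milk", "bread", "eggs"], ["bread", "eggs"], ["milk", "eggs"]]
print(frequent_itemsets(transactions, min_support=2))
```

The number of candidate itemsets grows combinatorially with transaction length, which is exactly the cost FP-Growth is designed to avoid.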
Regression
Locally Weighted Linear Regression (open)
Dimension reduction
Singular Value Decomposition and other Dimension Reduction Techniques (available since 0.3)
Stochastic Singular Value Decomposition with PCA workflow (PCA and dimensionality reduction workflow is now integrated with SSVD)
Principal Components Analysis (PCA) (open)
Independent Component Analysis (open)
Gaussian Discriminant Analysis (GDA) (open)
Evolutionary Algorithms
NOTE: Watchmaker support has been removed as of 0.7.
see also: MAHOUT-56 (integrated)
Here you will find information, examples, and use cases related to Evolutionary Algorithms.
Introductions and Tutorials:
Evolutionary Algorithms Introduction
How to distribute the fitness evaluation using Mahout.GA
Examples:
Traveling Salesman
Class Discovery
Recommenders / Collaborative Filtering
Mahout contains both simple non-distributed recommender implementations and distributed Hadoop-based recommenders.
Non-distributed recommenders ("Taste") (integrated)
Distributed Item-Based Collaborative Filtering (integrated)
Collaborative Filtering using a parallel matrix factorization (integrated)
First-timer FAQ
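To show what item-based collaborative filtering computes, here is a minimal in-memory Python sketch: items are treated as vectors of user ratings, compared with cosine similarity, and each item the user has not rated is scored by a similarity-weighted average of the user's existing ratings. This is not Mahout's Taste API or its distributed job; cosine similarity (one of several possible measures), the function names and the toy ratings are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {key: value} dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(ratings, user, top_n=2):
    """Item-based CF sketch (hypothetical helper). ratings: {user: {item: rating}}."""
    # Invert to item -> {user: rating} so each item becomes a vector over users.
    item_vectors = {}
    for u, items in ratings.items():
        for item, r in items.items():
            item_vectors.setdefault(item, {})[u] = r
    seen = ratings[user]
    scores = {}
    for candidate in item_vectors:
        if candidate in seen:
            continue  # only score items the user has not rated yet
        num = den = 0.0
        for item, r in seen.items():
            s = cosine(item_vectors[candidate], item_vectors[item])
            num += s * r
            den += abs(s)
        if den > 0:
            scores[candidate] = num / den  # similarity-weighted average rating
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

ratings = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 4, "b": 3, "c": 5},
    "carol": {"b": 2, "c": 4, "d": 1},
}
print(recommend(ratings, "alice"))
```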
Vector Similarity
Mahout contains implementations for comparing one or more vectors against another set of vectors. This is useful, for instance, for calculating the pairwise similarity between all documents (or a subset of documents) in a corpus.
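As a baseline for the pairwise-document task, this Python sketch compares every document with every other one using cosine similarity over sparse term-weight vectors. It runs on one machine and is quadratic in the number of documents, which is the cost the distributed jobs address. The function names, the dict-based vector format and cosine as the measure are assumptions for illustration.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def all_pairs_similarity(docs):
    """Brute force (hypothetical helper): O(n^2) cosine computations over all document pairs."""
    return {(i, j): cosine(docs[i], docs[j]) for i, j in combinations(sorted(docs), 2)}

docs = {
    "d1": {"apache": 1, "mahout": 2},
    "d2": {"mahout": 1, "hadoop": 1},
    "d3": {"cooking": 3},
}
print(all_pairs_similarity(docs))
```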
RowSimilarityJob – Builds an inverted index and then computes distances between items that have co-occurrences. This is a fully distributed calculation.
VectorDistanceJob – Does a map-side join between a set of "seed" vectors and all of the input vectors.
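The idea behind RowSimilarityJob can be sketched on one machine: build an inverted index from each column to the rows that have a non-zero value in it, let each column contribute partial dot products for the rows that co-occur there, then normalize. Rows that share no column are never compared. The second function reproduces the result of comparing every seed with every input, which VectorDistanceJob obtains via its map-side join, using a plain nested loop. Both are hypothetical helpers, not Mahout's job classes or options; cosine similarity and Euclidean distance are assumed measures.

```python
import math
from collections import defaultdict

def row_similarity(rows):
    """Cosine similarity only for row pairs sharing at least one column (hypothetical helper)."""
    # Step 1: inverted index, column -> [(row, weight), ...]
    index = defaultdict(list)
    for row, vec in rows.items():
        for col, w in vec.items():
            index[col].append((row, w))
    # Step 2: each column adds partial dot products for the rows that co-occur in it.
    dots = defaultdict(float)
    for postings in index.values():
        for i, (ra, wa) in enumerate(postings):
            for rb, wb in postings[i + 1:]:
                key = (ra, rb) if ra < rb else (rb, ra)
                dots[key] += wa * wb
    # Step 3: turn the accumulated dot products into cosine similarities.
    norms = {row: math.sqrt(sum(w * w for w in vec.values())) for row, vec in rows.items()}
    return {
        (a, b): d / (norms[a] * norms[b])
        for (a, b), d in dots.items()
        if norms[a] and norms[b]
    }

def seed_distances(seeds, inputs):
    """Euclidean distance from every seed vector to every input vector (dense lists)."""
    return {(s, i): math.dist(sv, iv) for s, sv in seeds.items() for i, iv in inputs.items()}

rows = {
    "d1": {"apache": 1, "mahout": 2},
    "d2": {"mahout": 1, "hadoop": 1},
    "d3": {"cooking": 3},
}
print(row_similarity(rows))  # d3 shares no column, so it appears in no pair
print(seed_distances({"s": [0.0, 0.0]}, {"x": [3.0, 4.0], "y": [1.0, 0.0]}))
```

In a distributed setting each column's postings list can be processed independently, which is what makes the inverted-index formulation a good fit for MapReduce.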
Other
Collocations