Classification
Logistic Regression (SGD)
Bayesian
Support Vector Machines (SVM)
Perceptron and Winnow
Neural Network
Random Forests
Restricted Boltzmann Machines
Online Passive Aggressive
Boosting
Hidden Markov Models
Clustering
Canopy Clustering
K-Means Clustering
Fuzzy K-Means
Expectation Maximization (EM)
Mean Shift Clustering
Hierarchical Clustering
Dirichlet Process Clustering
Latent Dirichlet Allocation
Spectral Clustering
Minhash Clustering
Top Down Clustering
Pattern Mining
Parallel FP Growth Algorithm
Dimension Reduction
Singular Value Decomposition and other Dimension Reduction Techniques
Stochastic Singular Value Decomposition with PCA workflow
Principal Components Analysis
Independent Component Analysis
Gaussian Discriminative Analysis
Recommenders / Collaborative Filtering
Non-distributed recommenders ("Taste")
Distributed Item-Based Collaborative Filtering
Collaborative Filtering using a parallel matrix factorization
Download Mahout (http://www.apache.org/dyn/closer.cgi/mahout/) and unpack it.
In the MAHOUT_HOME/bin directory, add the following lines to the mahout script:
export JAVA_HOME=XXXX
export HADOOP_HOME=XXXX
export HADOOP_CONF_DIR=XXXX
If HADOOP_CONF_DIR is not set, it defaults to HADOOP_HOME/conf.
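The HADOOP_CONF_DIR fallback can be illustrated with a small shell sketch. This is not the code from the mahout script itself; the function name resolve_hadoop_conf_dir and the paths /opt/hadoop and /etc/hadoop are hypothetical and only show the "use HADOOP_HOME/conf when nothing else is given" rule.

```shell
# resolve_hadoop_conf_dir HADOOP_HOME [HADOOP_CONF_DIR]
# Hypothetical helper: prints the explicit conf dir if one is given,
# otherwise falls back to HADOOP_HOME/conf.
resolve_hadoop_conf_dir() {
    hadoop_home=$1
    conf_dir=${2:-}
    printf '%s\n' "${conf_dir:-$hadoop_home/conf}"
}

# Example paths are placeholders; substitute your own installation.
resolve_hadoop_conf_dir /opt/hadoop              # no conf dir given
resolve_hadoop_conf_dir /opt/hadoop /etc/hadoop  # explicit conf dir wins
```

The `${var:-default}` expansion treats an empty value the same as an unset one, which is usually what you want for a directory setting.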
Build: in the MAHOUT_HOME directory, run mvn clean && mvn compile && mvn -DskipTests install
When the build reports SUCCESS, the installation is complete.
(Note: Maven must be installed beforehand.)
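Since the build depends on Maven being installed, a quick check before starting can save a failed run. The following is a minimal sketch; the helper name have_cmd is my own and not part of Mahout or Maven.

```shell
# have_cmd NAME: succeed if NAME can be found as a command on PATH.
# (Hypothetical helper, not part of Mahout or Maven.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd mvn; then
    echo "Maven found: $(command -v mvn)"
else
    echo "Maven not found on PATH; install it before building Mahout" >&2
fi
```

Once the check passes, run the build command from the MAHOUT_HOME directory as described above.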
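How to use
In the MAHOUT_HOME/bin directory, run ./mahout --help to see the algorithms Mahout currently provides.
You can also look in MAHOUT_HOME/src/conf/driver.classes.props to find the entry point of each algorithm. If you want to add a new algorithm, register it in this file as well.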
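To get a quick overview of the registered entry points, the props file can be listed with a few lines of shell. This sketch assumes each entry has the form "class = shortname : description", which is how I recall the file being laid out; check it against your own copy. The helper name list_drivers, the sample file, and the org.example.MyDriver entry are made up for illustration.

```shell
# list_drivers FILE: print "shortname class" for every registered entry.
# Assumes lines of the form "class = shortname : description".
list_drivers() {
    awk -F'=' '
        /^[ \t]*(#|$)/ { next }           # skip comments and blank lines
        NF >= 2 {
            class = $1; name = $2
            sub(/:.*/, "", name)           # drop the ": description" part
            gsub(/[ \t]/, "", class)
            gsub(/[ \t]/, "", name)
            if (name != "") print name, class
        }
    ' "$1"
}

# Illustrative sample; point list_drivers at the real props file instead.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# sample entries (format assumed)
org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob = trainnb : Train the Vector-based Bayes classifier
org.example.MyDriver = mydriver : Hypothetical custom algorithm
EOF
list_drivers "$sample"
rm -f "$sample"
```

Registering a new algorithm would then amount to adding one more line in the same format, with the short name being what you pass to ./mahout.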
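Running an algorithm:
For example, to run Bayesian classification (the training step):
In the MAHOUT_HOME/bin directory, run ./mahout trainnb -h to see the available parameters.
(Note: older versions may use ./mahout trainclassifier -h instead. Here I am using the latest version at the time of writing, mahout-distribution-0.9.)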
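As a rough idea of what a training call looks like, here is a sketch that only prints the command line rather than running it. The flags (-i input, -o model output, -li label index, -el extract labels, -ow overwrite) are the ones I recall from the 0.9 examples, so confirm them with ./mahout trainnb -h. The helper name build_trainnb_cmd and all paths are hypothetical.

```shell
# build_trainnb_cmd INPUT OUTPUT LABEL_INDEX
# Hypothetical helper: prints a trainnb command line without executing it.
# Flags are assumed from the 0.9 examples; verify with ./mahout trainnb -h.
build_trainnb_cmd() {
    printf './mahout trainnb -i %s -o %s -li %s -el -ow\n' "$1" "$2" "$3"
}

# Placeholder paths; the input is expected to be already vectorized data.
build_trainnb_cmd /tmp/train-vectors /tmp/nb-model /tmp/nb-labelindex
```

Printing the command first makes it easy to review the paths before running it from MAHOUT_HOME/bin.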
Reference: https://cwiki.apache.org/MAHOUT/quickstart.html