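I have been studying Hadoop lately but had not yet organized what I came across, so here are the common algorithms, grouped by category: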

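Recommender systems (recommendation engines):

  1. User-based collaborative filtering (UserCF): a nearest-neighbor method, easy to implement

  2. Item-based collaborative filtering (ItemCF): fast, and easy to run as a distributed computation

  3. SlopeOne   @Deprecated in Mahout 0.8

  4. KNN linear interpolation item-based recommender: a nearest-neighbor method   @Deprecated in Mahout 0.8

  5. SVD recommender: singular value decomposition; involves dimensionality reduction and heavy preprocessing

  6. Tree cluster-based recommender: tree-structured clustering, heavy preprocessing   @Deprecated in Mahout 0.8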
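To make the item-based idea concrete, here is a minimal sketch in plain Python. It is not Mahout's implementation: the toy ratings, the function names, and the choice of cosine similarity are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy data (user -> {item: rating}); not taken from any real dataset.
RATINGS = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 4.0, "B": 2.0, "D": 5.0},
    "carol": {"B": 4.0, "C": 5.0, "D": 3.0},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, prefs in ratings.items():
        for item, score in prefs.items():
            items.setdefault(item, {})[user] = score
    return items

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend_item_cf(ratings, user, top_n=2):
    """Score each unseen item as a similarity-weighted average of the user's ratings."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vec in items.items():
        if candidate in seen:
            continue
        num = den = 0.0
        for item, score in seen.items():
            sim = cosine(vec, items[item])
            num += sim * score
            den += abs(sim)
        if den > 0:
            scores[candidate] = num / den
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

if __name__ == "__main__":
    print(recommend_item_cf(RATINGS, "alice"))
```

The item-to-item similarities depend only on the ratings matrix, not on the user being served, so they can be computed once in a batch job, which is one reason the item-based variant lends itself to distributed computation.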


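Classification algorithms:


    1. Support vector machine (SVM)

    2. Logistic regression (LR)

    3. Stochastic gradient descent (SGD)

    4. Neural networks

    5. Random forest (RF): often used in the Tmall recommendation algorithm competition (RF + GBDT); can be parallelized with MapReduce

    6. Naive Bayes: there is also a complement variant, Complementary Naive Bayes (cbayes), which generally performs better than plain Naive Bayes; can be parallelized with MapReduce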
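The complement idea can be sketched in a few lines of plain Python. This is a simplified illustration, not Mahout's cbayes code: it omits weight normalization, TF-IDF transforms, and class priors, and the toy documents and function names are assumptions. Each class gets token weights estimated from the documents of all the other classes, and a document is assigned to the class whose complement fits it worst.

```python
from collections import Counter, defaultdict
from math import log

# Hypothetical toy corpus of (label, tokens) pairs.
DOCS = [
    ("spam", ["buy", "cheap", "pills"]),
    ("spam", ["cheap", "offer", "buy"]),
    ("ham", ["meeting", "tomorrow", "lunch"]),
    ("ham", ["lunch", "offer", "tomorrow"]),
]

def train_counts(docs):
    """Return per-class token counts and the overall vocabulary."""
    counts = defaultdict(Counter)
    for label, tokens in docs:
        counts[label].update(tokens)
    vocab = {t for c in counts.values() for t in c}
    return counts, vocab

def complement_weights(counts, vocab, alpha=1.0):
    """For each class, smoothed log-probabilities estimated from all OTHER classes."""
    weights = {}
    for label in counts:
        comp = Counter()
        for other, c in counts.items():
            if other != label:
                comp.update(c)
        total = sum(comp.values()) + alpha * len(vocab)
        weights[label] = {t: log((comp[t] + alpha) / total) for t in vocab}
    return weights

def classify(weights, tokens):
    """Pick the class whose complement explains the tokens worst (lowest score)."""
    def score(label):
        w = weights[label]
        return sum(w[t] for t in tokens if t in w)  # unknown tokens are ignored
    return min(weights, key=score)

if __name__ == "__main__":
    counts, vocab = train_counts(DOCS)
    weights = complement_weights(counts, vocab)
    print(classify(weights, ["cheap", "buy"]))
    print(classify(weights, ["lunch", "tomorrow"]))
```

Both training steps are just sums of token counts, which is why this family of classifiers maps naturally onto MapReduce: mappers emit per-document counts and reducers add them up.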



Clustering algorithms:


    1. Canopy clustering

    2. K-means clustering

    3. Hierarchical clustering
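As a rough illustration of k-means, the sketch below runs Lloyd's algorithm on a handful of 2-D points. It assumes the first k points are good enough as initial centroids, which is a simplification made to keep the run deterministic; the sample points and the function name are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on equal-length tuples; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_assignment == assignment:
            break  # assignments stopped changing, so the algorithm has converged
        assignment = new_assignment
    return centroids, assignment

if __name__ == "__main__":
    # Hypothetical sample: two loose groups near (1, 1) and (9, 9).
    pts = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
    centroids, assignment = kmeans(pts, k=2)
    print(centroids, assignment)
```

The result depends heavily on the starting centroids, so practical pipelines usually choose them more carefully than this sketch does.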


Frequent pattern mining


Common algorithms supported by the latest Mahout release (0.9)

Latest release version 0.9 has

  • User and Item based recommenders

  • Matrix factorization based recommenders

  • K-Means, Fuzzy K-Means clustering

  • Latent Dirichlet Allocation

  • Singular Value Decomposition

  • Logistic regression classifier

  • (Complementary) Naive Bayes classifier

  • Random forest classifier

  • High performance java collections

  • A vibrant community



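Also, note the announcement on the Mahout website: Mahout no longer accepts new MapReduce algorithm implementations, so keep an eye on Spark going forward.

Original text: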

Mahout News

25 April 2014 - Goodbye MapReduce

The Mahout community decided to move its codebase onto modern data processing systems that offer a richer programming model and more efficient execution than Hadoop MapReduce. Mahout will therefore reject new MapReduce algorithm implementations from now on. We will however keep our widely used MapReduce algorithms in the codebase and maintain them.

We are building our future implementations on top of a DSL for linear algebraic operations which has been developed over the last months. Programs written in this DSL are automatically optimized and executed in parallel on Apache Spark.

Furthermore, there is an experimental contribution undergoing which aims to integrate the H2O platform into Mahout.