Spark Machine Learning Overview

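Spark's ML (Machine Learning) library provides implementations of the mainstream statistical and data-mining algorithms. In this article William gives an overview of what is available; detailed walkthroughs of individual algorithms will follow in later articles. The classes listed below all belong to the RDD-based spark.mllib API.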

Classification and regression algorithms

| Algorithm | Spark algorithm class | Spark model class |
|---|---|---|
| Support vector machine (SVM) | SVMWithSGD | SVMModel |
| Logistic regression | LogisticRegressionWithLBFGS; LogisticRegressionWithSGD | LogisticRegressionModel |
| Linear regression | LinearRegressionWithSGD | LinearRegressionModel |
| Streaming linear regression | StreamingLinearRegressionWithSGD | LinearRegressionModel |
| Ridge regression | RidgeRegressionWithSGD | RidgeRegressionModel |
| Lasso regression | LassoWithSGD | LassoModel |
| Naive Bayes | NaiveBayes | NaiveBayesModel |
| Decision tree | DecisionTree | DecisionTreeModel |
| Random forest | RandomForest | RandomForestModel |
| Gradient-boosted trees | GradientBoostedTrees | GradientBoostedTreesModel |
| Isotonic regression | IsotonicRegression | IsotonicRegressionModel |
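
Several of the classes above carry a WithSGD suffix, which means the model is fitted by gradient descent. To make that concrete, here is a minimal plain-Python sketch for least-squares regression without an intercept. It is not Spark code and not MLlib's implementation: the helper name `fit_linear_gd`, the toy data, and the step / sqrt(t) step-size schedule are all assumptions chosen for illustration.

```python
import math


def fit_linear_gd(xs, ys, step=1.0, iterations=100):
    """Fit y ~ w * x by full-batch gradient descent on squared error.

    Hypothetical helper for illustration; no intercept term is fitted.
    """
    w = 0.0
    n = len(xs)
    for t in range(1, iterations + 1):
        # Average gradient of 0.5 * (w*x - y)^2 with respect to w.
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / n
        # Decaying step size step / sqrt(t) (an assumption of this sketch).
        w -= (step / math.sqrt(t)) * grad
    return w


# Toy data generated from y = 2x.
xs = [0.0, 0.5, 1.0, 1.5]
ys = [0.0, 1.0, 2.0, 3.0]
w = fit_linear_gd(xs, ys)
prediction = w * 2.0  # predicted value for x = 2.0
```

The fitted weight plays the role of the "model" column in the table: training produces it once, and prediction only needs the weight.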

Collaborative filtering algorithms

| Algorithm | Spark algorithm class | Spark model class |
|---|---|---|
| Alternating least squares (ALS) | ALS | MatrixFactorizationModel |
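
As a rough illustration of what "alternating least squares" means, the sketch below factorizes a tiny rating matrix with a single latent factor per user and per item. It is plain Python rather than Spark, and it makes no claim to match MLlib's ALS: the helper `als_rank1`, the rank of 1, the plain regularizer `lam`, and the toy ratings are assumptions made for this example.

```python
def als_rank1(ratings, n_users, n_items, lam=1e-6, iterations=50):
    """Rank-1 ALS on observed (user, item, rating) triples.

    Hypothetical helper: every user and every item gets one latent factor.
    """
    user_f = [1.0] * n_users
    item_f = [1.0] * n_items
    for _ in range(iterations):
        # Fix the item factors and solve a 1-D ridge problem per user.
        for u in range(n_users):
            seen = [(i, r) for (uu, i, r) in ratings if uu == u]
            user_f[u] = (sum(r * item_f[i] for i, r in seen)
                         / (lam + sum(item_f[i] ** 2 for i, _ in seen)))
        # Fix the user factors and solve a 1-D ridge problem per item.
        for i in range(n_items):
            seen = [(u, r) for (u, ii, r) in ratings if ii == i]
            item_f[i] = (sum(r * user_f[u] for u, r in seen)
                         / (lam + sum(user_f[u] ** 2 for u, _ in seen)))
    return user_f, item_f


# Ratings taken from the rank-1 matrix [[1, 2], [2, 4], [3, 6]];
# the entry for user 2, item 1 is withheld and then predicted.
ratings = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 0, 3.0)]
user_f, item_f = als_rank1(ratings, n_users=3, n_items=2)
predicted = user_f[2] * item_f[1]
```

The pair of factor lists corresponds loosely to what a matrix factorization model stores: a prediction is just the product of a user factor and an item factor.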

Clustering algorithms

| Algorithm | Spark algorithm class | Spark model class |
|---|---|---|
| k-means | KMeans | KMeansModel |
| Gaussian mixture | GaussianMixture | GaussianMixtureModel |
| Power iteration clustering (PIC) | PowerIterationClustering | PowerIterationClusteringModel |
| Latent Dirichlet allocation (LDA) | LDA | DistributedLDAModel |
| Streaming k-means | StreamingKMeans | StreamingKMeansModel (a subclass of KMeansModel) |
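
For readers who have not met k-means before, the following plain-Python sketch shows the two alternating steps (assign points, then move centers) and the two things a fitted model is used for: reading the cluster centers and predicting the nearest cluster for a new point. It is not Spark code. The helper names `kmeans` and `predict`, the hand-picked initial centers, and the toy points are assumptions of this example.

```python
def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def predict(centers, p):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))


def kmeans(points, initial_centers, iterations=10):
    """Lloyd's algorithm on 2-D points with caller-supplied initial centers."""
    centers = list(initial_centers)
    for _ in range(iterations):
        # Assignment step: group every point under its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[predict(centers, p)].append(p)
        # Update step: move each center to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers = kmeans(points, initial_centers=[(0.0, 0.0), (9.0, 9.0)])
cluster_of_new_point = predict(centers, (8.0, 8.0))
```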

Dimensionality reduction algorithms

| Algorithm | Spark algorithm class |
|---|---|
| Singular value decomposition (SVD) | RowMatrix.computeSVD |
| Principal component analysis (PCA) | RowMatrix.computePrincipalComponents |
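
To give a feel for what a principal component is, the sketch below finds the top component of a small 2-D data set by power iteration on its covariance matrix. This is plain Python under stated assumptions, not a description of how Spark computes PCA: the helper `top_principal_component`, the use of power iteration, and the four sample rows are all chosen only for illustration.

```python
import math


def top_principal_component(rows, iterations=50):
    """Top principal component of 2-D rows via power iteration (hypothetical helper)."""
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / n
    mean_y = sum(r[1] for r in rows) / n
    centered = [(r[0] - mean_x, r[1] - mean_y) for r in rows]
    # Entries of the 2x2 sample covariance matrix (divide by n - 1).
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    vx, vy = 1.0, 0.0  # arbitrary starting direction
    for _ in range(iterations):
        # Multiply by the covariance matrix, then rescale to unit length.
        nx = cxx * vx + cxy * vy
        ny = cxy * vx + cyy * vy
        norm = math.hypot(nx, ny)
        vx, vy = nx / norm, ny / norm
    return vx, vy


# Points lying close to the line y = x, so the top component is near (0.71, 0.71).
rows = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.0)]
pc = top_principal_component(rows)
```

Projecting each row onto this direction would reduce the data from two columns to one while keeping most of its variance.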

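Feature extraction and transformation

| Algorithm | Spark algorithm class | Spark model class |
|---|---|---|
| TF-IDF | HashingTF; IDF | IDFModel (produced by IDF) |
| Word2Vec | Word2Vec | Word2VecModel |
| Standard scaler | StandardScaler | StandardScalerModel |
| Normalizer | Normalizer | none (stateless transformer) |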
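
The TF-IDF row pairs two classes because the computation has two stages: term frequencies are built with the hashing trick, and inverse document frequencies are then fitted over the whole corpus. The plain-Python sketch below walks through both stages. It is not Spark code: `hashing_tf`, `fit_idf`, and `tf_idf` are hypothetical helpers, `zlib.crc32` is only a deterministic stand-in for Spark's own hash function, and the IDF uses the smoothed form log((m + 1) / (df + 1)) described in the MLlib documentation.

```python
import math
import zlib

NUM_FEATURES = 1 << 20  # size of the hashed feature space (assumed for this sketch)


def hashing_tf(terms, num_features=NUM_FEATURES):
    """Map a list of terms to a sparse {index: term_count} vector."""
    vec = {}
    for term in terms:
        # crc32 is a stand-in hash; Spark uses a different hash function.
        idx = zlib.crc32(term.encode("utf-8")) % num_features
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


def fit_idf(tf_vectors):
    """Return {index: log((m + 1) / (df + 1))} over m documents."""
    m = len(tf_vectors)
    df = {}
    for vec in tf_vectors:
        for idx in vec:
            df[idx] = df.get(idx, 0) + 1
    return {idx: math.log((m + 1) / (count + 1)) for idx, count in df.items()}


def tf_idf(vec, idf):
    """Scale each term count by the fitted IDF weight of its index."""
    return {idx: tf * idf[idx] for idx, tf in vec.items()}


docs = [["spark", "mllib", "spark"], ["spark", "hadoop"], ["spark", "mllib"]]
tf_vectors = [hashing_tf(d) for d in docs]
idf = fit_idf(tf_vectors)            # plays the role of the fitted model
weights = [tf_idf(v, idf) for v in tf_vectors]

spark_idx = zlib.crc32(b"spark") % NUM_FEATURES
hadoop_idx = zlib.crc32(b"hadoop") % NUM_FEATURES
```

A term that appears in every document, such as "spark" here, ends up with weight zero, while a term that appears in only one document keeps a positive weight.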

Frequent pattern mining

| Algorithm | Spark algorithm class |
|---|---|
| FP-growth | FPGrowth |
| Association rules | AssociationRules |
| PrefixSpan | PrefixSpan |
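
To show what frequent itemsets and association rules look like as output, here is a small plain-Python sketch. It deliberately uses a brute-force search instead of the FP-growth algorithm, so it only suits toy data and says nothing about how Spark does the mining. The helpers `frequent_itemsets` and `confidence`, the minimum-support threshold, and the four transactions are assumptions made for the example.

```python
import math
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force search; returns {frozenset(items): count} (hypothetical helper)."""
    min_count = math.ceil(min_support * len(transactions))
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= set(t))
            if count >= min_count:
                result[frozenset(candidate)] = count
                found = True
        if not found:
            break  # no frequent set of this size, so no larger one exists either
    return result


def confidence(freq, antecedent, consequent):
    """confidence(X => Y) = count(X and Y together) / count(X)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return freq[both] / freq[frozenset(antecedent)]


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
freq = frequent_itemsets(transactions, min_support=0.5)
conf_c_to_a = confidence(freq, ["c"], ["a"])
conf_a_to_b = confidence(freq, ["a"], ["b"])
```

With a minimum support of 0.5, an itemset must occur in at least two of the four transactions, which leaves {a}, {b}, {c}, {a, b} and {a, c}.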
