scikit-learn

[Screenshot of the scikit-learn official website]

A basic introduction to the sklearn library
Machine Learning in Python

  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license

So what can sklearn do?
As the official website shows:
Classification: Identifying which category an object belongs to.
SVM, nearest neighbors, random forest, ...
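
As a minimal sketch of classification, the snippet below fits an SVM on the iris dataset that ships with sklearn. The dataset, the 70/30 split, and the hyperparameters are assumptions chosen for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples to measure accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fit a support vector classifier with an RBF kernel (illustrative settings).
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)

# score() returns the mean accuracy on the held-out samples.
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Swapping SVC for KNeighborsClassifier or RandomForestClassifier should need no other changes, since sklearn estimators share the same fit/predict/score interface.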

Regression: Predicting a continuous-valued attribute associated with an object.
SVR, ridge regression (L2), Lasso (L1), ...
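
To make the L2 versus L1 distinction concrete, here is a small sketch that fits Ridge and Lasso on synthetic data. The data comes from make_regression, and the alpha values are arbitrary assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

# Synthetic data: 10 features, only 3 of which actually influence the target.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ridge adds an L2 penalty, Lasso an L1 penalty (alpha sets the strength).
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
lasso = Lasso(alpha=1.0).fit(X_train, y_train)

# For regressors, score() returns the R^2 coefficient of determination.
ridge_r2 = ridge.score(X_test, y_test)
lasso_r2 = lasso.score(X_test, y_test)
print(f"ridge R^2: {ridge_r2:.3f}, lasso R^2: {lasso_r2:.3f}")

# The L1 penalty can push some coefficients exactly to zero.
print("lasso coefficients equal to zero:", int((lasso.coef_ == 0).sum()))
```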

Clustering (unsupervised learning): Automatic grouping of similar objects into sets.
K-Means, spectral clustering, mean-shift, ...
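
A short sketch of clustering with K-Means follows. The points are generated with make_blobs, and the number of clusters is assumed to be known in advance, which is rarely the case with real data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 300 two-dimensional points around 3 centers; the labels are discarded
# because clustering is unsupervised.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# n_clusters must be chosen up front; n_init restarts guard against bad initializations.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("cluster centers:")
print(kmeans.cluster_centers_)
```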

Dimensionality reduction: Reducing the number of random variables to consider.
PCA (principal component analysis), feature selection, non-negative matrix factorization.
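
As an illustration of dimensionality reduction, the sketch below projects the four iris features onto two principal components. The choice of two components is an assumption made so the result could be plotted.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Keep the 2 directions of largest variance out of the original 4 features.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Fraction of the total variance retained by the 2 components.
explained = pca.explained_variance_ratio_.sum()
print(f"shape after PCA: {X_2d.shape}, variance retained: {explained:.3f}")
```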

Model selection: Comparing, validating and choosing parameters and models.
grid search, cross validation, metrics.
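
Grid search and cross validation are usually combined. The sketch below uses GridSearchCV to try every combination in a small parameter grid with 5-fold cross validation; the grid values are assumptions picked for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate hyperparameters: 3 values of C x 2 kernels = 6 combinations.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Each combination is scored by 5-fold cross-validated accuracy.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

After fitting, search.best_estimator_ holds a model refit on the full data with the best parameters.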

Preprocessing: Feature extraction and normalization.
preprocessing, feature extraction.
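
The two ideas named here can be sketched on toy inputs: StandardScaler normalizes numeric columns, and CountVectorizer extracts word-count features from text. The tiny array and the two sentences are made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

# Normalization: rescale each column to zero mean and unit variance.
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled)

# Feature extraction: turn raw text into a sparse matrix of word counts.
docs = ["machine learning in python", "python tools for data analysis"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# vocabulary_ maps each word to its column index in the count matrix.
print(sorted(vectorizer.vocabulary_))
print(counts.toarray())
```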

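When the dataset is not very large, sklearn is an excellent choice. It does have limitations, though: sklearn generally runs on a single machine and is bound by that machine's performance, so with very large datasets or some more complex problems it may no longer be such a comfortable fit.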

I will gradually add concrete sklearn usage examples and analyses in later posts.
