sklearn official documentation translation -- SVM

                   Support Vector Machines (Part 1)

Note: I am a beginner and these are simple study notes. If there are mistakes, corrections are welcome.

Original link: Support Vector Machines (SVM)

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outliers detection.

SVMs are a set of supervised learning methods for classification, regression, and outlier detection.

The advantages of support vector machines are:

Effective in high dimensional spaces.
Still effective in cases where number of dimensions is greater than the number of samples.
Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
Versatile: different Kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.
The disadvantages of support vector machines include:

If the number of features is much greater than the number of samples, avoiding over-fitting in the choice of kernel function and regularization term is crucial.
SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation (see Scores and probabilities, below).

The advantages:

  • Effective in high-dimensional spaces.
  • Still effective when the number of features exceeds the number of samples.
  • The decision function uses only a subset of the training points (the support vectors), so it is also memory efficient.
  • Versatile: different kernel functions can be specified for the decision function; common kernels are provided, and custom kernels can also be specified.

The disadvantages:

  • If the number of features is much greater than the number of samples, avoiding over-fitting through the choice of kernel function and regularization term is crucial.
  • SVMs do not directly provide probability estimates; these are computed with an expensive five-fold cross-validation, which costs extra time (see the example below).
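As a rough illustration of that extra cost, the sketch below (the toy data is my own, not from the original docs) enables probability estimates with `probability=True`, which is what triggers the internal cross-validation at fit time:

```python
from sklearn import svm

# toy data: two well-separated classes (illustrative only)
X = [[0, 0], [0, 1], [1, 0], [1, 1],
     [3, 3], [3, 4], [4, 3], [4, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# probability=True makes fit() run the extra internal
# cross-validation (Platt scaling), so fitting is slower
clf = svm.SVC(probability=True, random_state=0)
clf.fit(X, y)

proba = clf.predict_proba([[3.5, 3.5]])
print(proba)  # one row per sample; each row sums to 1
```

Without `probability=True`, calling `predict_proba` raises an error, but `fit` avoids the cross-validation overhead.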

The support vector machines in scikit-learn support both dense (numpy.ndarray and convertible to that by numpy.asarray) and sparse (any scipy.sparse) sample vectors as input.
However, to use an SVM to make predictions for sparse data, it must have been fit on such data.
For optimal performance, use C-ordered numpy.ndarray (dense) or scipy.sparse.csr_matrix (sparse) with dtype=float64.


The SVMs in scikit-learn accept both dense (numpy.ndarray, or anything convertible via numpy.asarray) and sparse (any scipy.sparse) sample vectors as input.
However, to use an SVM to predict on sparse data, the model must have been fit on sparse data.
For best performance, use a C-ordered numpy.ndarray (dense) or a scipy.sparse.csr_matrix (sparse) with dtype=float64.
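A minimal sketch of both input types (toy data invented for illustration):

```python
import numpy as np
from scipy import sparse
from sklearn import svm

# dense input: C-ordered float64 ndarray, as recommended above
X_dense = np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]],
                   dtype=np.float64, order="C")
y = [0, 0, 1, 1]

# the same samples as a CSR sparse matrix
X_sparse = sparse.csr_matrix(X_dense)

# to predict on sparse data, fit on sparse data
clf = svm.SVC()
clf.fit(X_sparse, y)
print(clf.predict(sparse.csr_matrix([[2.5, 2.5]])))
```

Here the query point [2.5, 2.5] is passed as a CSR matrix as well, matching the format the model was fit on.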

Classification

SVC, NuSVC and LinearSVC are classes capable of performing multi-class classification on a dataset.

SVC, NuSVC and LinearSVC are the classes capable of performing multi-class classification on a dataset.

SVC and NuSVC are similar methods, but accept slightly different sets of parameters and 
have different mathematical formulations (see section Mathematical formulation).
On the other hand, LinearSVC is another implementation of Support Vector Classification for the case of a linear kernel.
Note that LinearSVC does not accept keyword kernel,
as this is assumed to be linear. It also lacks some of the members of SVC and NuSVC, like support_.


SVC and NuSVC are similar, but they accept slightly different sets of parameters and have different mathematical formulations (see the Mathematical formulation section).
LinearSVC, on the other hand, is another implementation of support vector classification for the case of a linear kernel.
Note that LinearSVC does not accept the keyword kernel, since its kernel is assumed to be linear.
It also lacks some of the member attributes of SVC and NuSVC, such as support_.
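A small sketch (toy data my own) of these two differences, i.e. that LinearSVC takes no `kernel` keyword and exposes no `support_` attribute:

```python
from sklearn import svm

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]

# no kernel= keyword here: the kernel is always linear
lin_clf = svm.LinearSVC()
lin_clf.fit(X, y)

print(lin_clf.predict([[2, 0]]))
# unlike SVC/NuSVC, there is no support_ member
print(hasattr(lin_clf, "support_"))  # False
```

Passing kernel='linear' to SVC gives a comparable model, but LinearSVC uses a different underlying solver (liblinear rather than libsvm).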

As other classifiers, SVC, NuSVC and LinearSVC take as input two arrays:
an array X of size [n_samples, n_features] holding the training samples, 
and an array y of class labels (strings or integers), size [n_samples]:

Like other classifiers, SVC, NuSVC and LinearSVC take two arrays as input: a 2-D array X of shape [n_samples, n_features] holding the training samples, and a 1-D array y of class labels (strings or integers) of shape [n_samples].

X: number of samples × number of features per sample
y: number of samples (one class label per sample)

Example:

	from sklearn import svm
	X = [[0, 0], [1, 1]]
	y = [0, 1]
	clf = svm.SVC()
	clf.fit(X, y)
Output:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
Prediction:

	predictions = clf.predict([[2., 2.]])
	print(predictions)

Output:
[1]
SVMs' decision function depends on some subset of the training data, called the support vectors.
Some properties of these support vectors can be found in the members support_vectors_, support_ and n_support_:

The SVM decision function depends on a subset of the training set, the support vectors;
their properties are exposed through the attributes support_vectors_, support_ and n_support_:

Example:

# get the support vectors
clf.support_vectors_
Output:
array([[ 0.,  0.],
       [ 1.,  1.]])

# get the indices of the support vectors
clf.support_
Output:
array([0, 1])

# get the number of support vectors for each class
clf.n_support_
Output:
array([1, 1])

