An introduction to the three kinds of support vector machines: the linearly separable support vector machine, the linear support vector machine, and the nonlinear support vector machine, covering kernel functions and a fast learning algorithm, sequential minimal optimization (SMO).
Linearly Separable Support Vector Machine and Hard-Margin Maximization
Linearly Separable Support Vector Machine
Assume a training data set that is linearly separable in the feature space, $T = \{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\}$, where $x_i \in \mathcal{X} = R^n$, $y_i \in \mathcal{Y} = \{+1,-1\}$, $i = 1,2,\cdots,N$.
The linearly separable support vector machine finds the optimal separating hyperplane by maximizing the margin; the solution is unique.
Separating hyperplane:
$$\omega^* \cdot x + b^* = 0$$
The corresponding classification decision function:
$$f(x) = \mathrm{sign}(\omega^* \cdot x + b^*)$$
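As a quick numerical illustration (a minimal sketch; the optimal $\omega^*, b^*$ values below are made-up placeholders, not derived in the text), the decision function simply takes the sign of the affine score:

```python
import numpy as np

w_star = np.array([0.5, 0.5])   # hypothetical optimal normal vector
b_star = -2.0                   # hypothetical optimal intercept

def f(x):
    """Classification decision function f(x) = sign(w* . x + b*)."""
    return np.sign(w_star @ x + b_star)

print(f(np.array([4.0, 3.0])))  # 1.0  (positive side of the hyperplane)
print(f(np.array([1.0, 1.0])))  # -1.0 (negative side of the hyperplane)
```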
Functional Margin and Geometric Margin
- The functional margin of a hyperplane $(\omega,b)$ with respect to a sample point $(x_i,y_i)$ in the training set $T$:
$$\hat{\gamma}_i = y_i(\omega \cdot x_i + b)$$
- The functional margin of the hyperplane with respect to the training set $T$: the minimum of the functional margins over all sample points:
$$\hat{\gamma} = \min_{i = 1,\cdots,N}\hat{\gamma}_i$$
- Normalizing the hyperplane's normal vector $\omega$ so that $||\omega|| = 1$, where $||\omega||$ is the $L_2$ norm of $\omega$, turns the functional margin into the geometric margin. The geometric margin with respect to a sample point is:
$$\gamma_i = y_i\left(\frac{\omega}{||\omega||} \cdot x_i + \frac{b}{||\omega||}\right)$$
- The geometric margin of the hyperplane with respect to the training set:
$$\gamma = \min_{i = 1,\cdots,N}\gamma_i$$
- If $||\omega|| = 1$, the functional margin and the geometric margin are equal (a numerical sketch of both margins follows below).
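Both margins are direct to compute. Below is a minimal numpy sketch (the data set and the hyperplane $(\omega, b)$ are illustrative assumptions, not from the text) that evaluates the functional and geometric margins of a hyperplane over a training set:

```python
import numpy as np

# Illustrative training set and hyperplane (w, b).
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
w, b = np.array([0.5, 0.5]), -2.0

func_margins = y * (X @ w + b)                   # gamma_hat_i = y_i (w . x_i + b)
gamma_hat = func_margins.min()                   # functional margin of the set
geo_margins = func_margins / np.linalg.norm(w)   # gamma_i = gamma_hat_i / ||w||
gamma = geo_margins.min()                        # geometric margin of the set

print("functional margins:", func_margins)  # [1.  1.5 1. ]
print("gamma_hat =", gamma_hat)             # 1.0
print("gamma =", gamma)                     # 1.0 / ||w|| = sqrt(2) ≈ 1.414
```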
Margin Maximization
To find the separating hyperplane with the largest geometric margin, pose the following constrained optimization problem, which requires the geometric margin of every sample point to be at least $\gamma$:
$$\begin{aligned} \max_{\omega,b}\ &\gamma \\ \text{s.t.}\ &y_i\left(\frac{\omega}{||\omega||} \cdot x_i + \frac{b}{||\omega||}\right) \geq \gamma,\quad i = 1,2,\cdots,N \end{aligned}$$
Rewriting the geometric margin in terms of the functional margin:
$$\begin{aligned} \max_{\omega,b}\ &\frac{\hat{\gamma}}{||\omega||} \\ \text{s.t.}\ &y_i(\omega \cdot x_i + b) \geq \hat{\gamma},\quad i = 1,2,\cdots,N \end{aligned}$$
Since the value of $\hat{\gamma}$ does not affect the solution of the optimization problem (scaling $(\omega,b)$ by any $\lambda > 0$ scales $\hat{\gamma}$ by $\lambda$ without changing the hyperplane), we may set $\hat{\gamma} = 1$; and since maximizing $\frac{1}{||\omega||}$ is equivalent to minimizing $\frac{1}{2}||\omega||^2$, we obtain the optimization problem for learning the linearly separable support vector machine:
$$\begin{aligned} \min_{\omega,b}\ &\frac{1}{2}||\omega||^2 \\ \text{s.t.}\ &y_i(\omega \cdot x_i + b) - 1 \geq 0,\quad i = 1,2,\cdots,N \end{aligned}$$
A convex optimization problem is a constrained optimization problem of the form:
$$\begin{aligned} \min_{\omega}\ &f(\omega) \\ \text{s.t.}\ &g_i(\omega) \leq 0,\quad i = 1,2,\cdots,k \\ &h_i(\omega) = 0,\quad i = 1,2,\cdots,l \end{aligned}$$
where the objective $f(\omega)$ and the constraint functions $g_i(\omega)$ are continuously differentiable convex functions on $R^n$, and the constraint functions $h_i(\omega)$ are affine functions on $R^n$ (a function $f(x)$ is called affine if $f(x) = a \cdot x + b$, $a \in R^n$, $b \in R$, $x \in R^n$).
When the objective $f(\omega)$ is a quadratic function and the constraint functions $g_i(\omega)$ are affine, the convex optimization problem above becomes a convex quadratic programming problem.
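The SVM problem above fits this template directly: taking the optimization variable to be the pair $(\omega, b)$,
$$f(\omega,b) = \frac{1}{2}||\omega||^2, \qquad g_i(\omega,b) = 1 - y_i(\omega \cdot x_i + b) \leq 0,\quad i = 1,2,\cdots,N,$$
with no equality constraints $h_i$; the objective is quadratic and each $g_i$ is affine in $(\omega,b)$, so it is a convex quadratic program.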
Linearly Separable SVM Learning Algorithm: the Maximum-Margin Method
Input: a linearly separable training data set $T = \{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\}$, where $x_i \in \mathcal{X} = R^n$, $y_i \in \mathcal{Y} = \{+1,-1\}$, $i = 1,2,\cdots,N$
Output: the maximum-margin separating hyperplane and the classification decision function
- Construct and solve the constrained optimization problem to obtain the optimal solution $\omega^*, b^*$:
$$\begin{aligned} \min_{\omega,b}\ &\frac{1}{2}||\omega||^2 \\ \text{s.t.}\ &y_i(\omega \cdot x_i + b) - 1 \geq 0,\quad i = 1,2,\cdots,N \end{aligned}$$
- Obtain the separating hyperplane
$$\omega^* \cdot x + b^* = 0$$
and the classification decision function:
$$f(x) = \mathrm{sign}(\omega^* \cdot x + b^*)$$
If the training set $T$ is linearly separable, then the maximum-margin separating hyperplane that separates all sample points of the training set correctly exists and is unique (proved on p. 117; review it there if the argument is hard to recall).
Support vectors: the sample points at which the inequality constraint holds with equality, i.e. $y_i(\omega \cdot x_i + b) - 1 = 0$.
Margin: determined by the normal vector of the separating hyperplane; it equals $\frac{2}{||\omega||}$ (see the solver sketch below).
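Since the maximum-margin method is just a convex QP, a generic constrained solver can carry it out. Below is a minimal sketch using `scipy.optimize.minimize` with SLSQP; the toy data set, tolerances, and variable names are my own illustrative assumptions, not from the text:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data set (illustrative).
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
n = X.shape[1]

# Decision variable z = (w_1, ..., w_n, b).
def objective(z):
    w = z[:n]
    return 0.5 * w @ w  # (1/2)||w||^2

# One inequality constraint per sample: y_i (w . x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq", "fun": (lambda z, i=i: y[i] * (X[i] @ z[:n] + z[n]) - 1.0)}
    for i in range(len(X))
]

res = minimize(objective, x0=np.zeros(n + 1), method="SLSQP", constraints=constraints)
w, b = res.x[:n], res.x[n]
print("w* =", w, " b* =", b)                # approx. w* = (0.5, 0.5), b* = -2
print("margin =", 2.0 / np.linalg.norm(w))  # margin = 2 / ||w*||

# Support vectors: points where the constraint is active (equality holds).
active = np.abs(y * (X @ w + b) - 1.0) < 1e-4
print("support vectors:\n", X[active])      # (3,3) and (1,1)
```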
The Dual Algorithm
First introduce Lagrange multipliers $\alpha_i \geq 0$, $i = 1,2,\cdots,N$, and define the Lagrangian, where $\alpha = (\alpha_1,\alpha_2,\cdots,\alpha_N)^T$ is the vector of Lagrange multipliers:
$$L(\omega,b,\alpha) = \frac{1}{2}||\omega||^2 - \sum_{i = 1}^N\alpha_iy_i(\omega \cdot x_i + b) + \sum_{i = 1}^N\alpha_i$$
The dual problem is a max-min problem: first minimize, then maximize:
$$\max_{\alpha}\,\min_{\omega,b}\,L(\omega,b,\alpha)$$
- Find $\min_{\omega,b}L(\omega,b,\alpha)$ by setting the partial derivatives with respect to $\omega$ and $b$ to zero:
$$\nabla_\omega L(\omega,b,\alpha) = \omega - \sum_{i = 1}^N\alpha_iy_ix_i = 0 \\ \nabla_b L(\omega,b,\alpha) = -\sum_{i = 1}^N\alpha_iy_i = 0$$
which gives:
$$\omega = \sum_{i = 1}^N\alpha_iy_ix_i \\ \sum_{i = 1}^N\alpha_iy_i = 0$$
Substituting these back into the Lagrangian:
$$\begin{aligned} L(\omega,b,\alpha) &= \frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) - \sum_{i = 1}^N\alpha_iy_i\left(\left(\sum_{j = 1}^N\alpha_jy_jx_j\right) \cdot x_i + b\right) + \sum_{i = 1}^N\alpha_i \\ &= -\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) + \sum_{i = 1}^N\alpha_i \end{aligned}$$
so:
$$\min_{\omega,b}L(\omega,b,\alpha) = -\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) + \sum_{i = 1}^N\alpha_i$$
- Maximize $\min_{\omega,b}L(\omega,b,\alpha)$ with respect to $\alpha$:
$$\begin{aligned} \max_{\alpha}\ &-\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) + \sum_{i = 1}^N\alpha_i \\ \text{s.t.}\ &\sum_{i = 1}^N\alpha_iy_i = 0 \\ &\alpha_i \geq 0,\quad i = 1,2,\cdots,N \end{aligned}$$
Negating the objective to turn the maximization into a minimization yields the equivalent dual optimization problem:
$$\begin{aligned} \min_{\alpha}\ &\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) - \sum_{i = 1}^N\alpha_i \\ \text{s.t.}\ &\sum_{i = 1}^N\alpha_iy_i = 0 \\ &\alpha_i \geq 0,\quad i = 1,2,\cdots,N \end{aligned}$$
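In matrix form the dual is itself a convex quadratic program in $\alpha$ (here $Q$ just names the label-weighted Gram matrix; this is a restatement of the problem above, not new structure):
$$\min_{\alpha}\ \frac{1}{2}\alpha^TQ\alpha - \mathbf{1}^T\alpha, \qquad Q_{ij} = y_iy_j(x_i \cdot x_j), \qquad \text{s.t.}\ y^T\alpha = 0,\ \alpha \geq 0$$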
Because the primal problem satisfies the conditions under which the primal and dual optimal values coincide (the objective and inequality constraints are convex, and strict feasibility holds for linearly separable data), there exist $\omega^*, \alpha^*, \beta^*$ that are solutions of the primal problem and the dual problem respectively, so solving the primal problem can be reduced to solving the dual problem.
- Suppose the dual solution is $\alpha^* = (\alpha^*_1,\alpha^*_2,\cdots,\alpha_N^*)^T$ and there exists an index $j$ with $\alpha_j^* > 0$. Then the primal solution $\omega^*, b^*$ can be recovered from $\alpha^*$:
$$\omega^* = \sum_{i = 1}^N\alpha_i^*y_ix_i \\ b^* = y_j - \sum_{i = 1}^N\alpha_i^*y_i(x_i \cdot x_j)$$
Proof:
By the KKT conditions:
$$\nabla_\omega L(\omega^*,b^*,\alpha^*) = \omega^* - \sum_{i = 1}^N\alpha^*_iy_ix_i = 0$$
we obtain:
$$\omega^* = \sum_{i = 1}^N\alpha_i^*y_ix_i$$
At least one $\alpha^*_j > 0$ exists: if all of them were zero, then $\omega^* = 0$, which is not a solution of the primal optimization problem, so at least one must be positive. For this $j$, the KKT complementary-slackness condition $\alpha^*_i(y_i(\omega^* \cdot x_i + b^*) - 1) = 0$ gives $y_j(\omega^* \cdot x_j + b^*) - 1 = 0$, and since $y^2_j = 1$:
$$y_j = \omega^* \cdot x_j + b^* \\ b^* = y_j - \omega^* \cdot x_j \\ b^* = y_j - \sum_{i = 1}^N\alpha_i^*y_i(x_i \cdot x_j)$$
This determines $\omega^*$ and $b^*$.
The separating hyperplane is:
$$\sum_{i = 1}^N\alpha_i^*y_i(x \cdot x_i) + b^* = 0$$
and the classification decision function is:
$$f(x) = \mathrm{sign}\left(\sum_{i = 1}^N\alpha_i^*y_i(x \cdot x_i) + b^*\right)$$
Linearly Separable SVM Learning Algorithm: the Dual Form
Input: a linearly separable training data set $T = \{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\}$, where $x_i \in \mathcal{X} = R^n$, $y_i \in \mathcal{Y} = \{+1,-1\}$, $i = 1,2,\cdots,N$
Output: the maximum-margin separating hyperplane and the classification decision function
- Construct and solve the constrained optimization problem to obtain the optimal solution $\alpha^* = (\alpha^*_1,\alpha^*_2,\cdots,\alpha_N^*)^T$:
$$\begin{aligned} \min_{\alpha}\ &\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) - \sum_{i = 1}^N\alpha_i \\ \text{s.t.}\ &\sum_{i = 1}^N\alpha_iy_i = 0 \\ &\alpha_i \geq 0,\quad i = 1,2,\cdots,N \end{aligned}$$
- Compute
$$\omega^* = \sum_{i = 1}^N\alpha_i^*y_ix_i$$
then choose a component $\alpha_j^* > 0$ of $\alpha^*$ and compute:
$$b^* = y_j - \sum_{i = 1}^N\alpha_i^*y_i(x_i \cdot x_j)$$
- Obtain the separating hyperplane and the decision function:
$$\sum_{i = 1}^N\alpha_i^*y_i(x \cdot x_i) + b^* = 0 \\ f(x) = \mathrm{sign}\left(\sum_{i = 1}^N\alpha_i^*y_i(x \cdot x_i) + b^*\right)$$
A training sample $(x_i,y_i)$ with $\alpha^*_i > 0$ has its $x_i$ called a support vector; by the KKT conditions, $y_i(\omega^* \cdot x_i + b^*) - 1 = 0$, so a support vector $x_i$ necessarily lies on the margin boundary. A runnable sketch of this dual algorithm follows.
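Putting the three steps of the dual algorithm together, here is a minimal sketch (same illustrative toy data and solver assumptions as the primal sketch above): solve the dual QP for $\alpha^*$, recover $\omega^*$ and $b^*$, and read off the support vectors as the points with $\alpha_i^* > 0$:

```python
import numpy as np
from scipy.optimize import minimize

# Same illustrative toy data as the primal sketch.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
N = len(X)

# Q_ij = y_i y_j (x_i . x_j): the label-weighted Gram matrix.
Q = (y[:, None] * X) @ (y[:, None] * X).T

def dual_objective(a):
    return 0.5 * a @ Q @ a - a.sum()  # (1/2) a'Qa - sum_i a_i

res = minimize(
    dual_objective,
    x0=np.zeros(N),
    method="SLSQP",
    bounds=[(0.0, None)] * N,                              # alpha_i >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],  # sum_i alpha_i y_i = 0
)
alpha = res.x

# Step 2: recover the primal solution from alpha*.
w = (alpha * y) @ X                  # w* = sum_i alpha_i* y_i x_i
j = int(np.argmax(alpha))            # any index j with alpha_j* > 0
b = y[j] - (alpha * y) @ (X @ X[j])  # b* = y_j - sum_i alpha_i* y_i (x_i . x_j)

# Step 3: separating hyperplane and decision function.
print("alpha* =", alpha)                   # approx. (0.25, 0, 0.25)
print("w* =", w, " b* =", b)               # matches the primal solution
print("support vectors:\n", X[alpha > 1e-6])
print("predictions:", np.sign(X @ w + b))  # f(x) = sign(w* . x + b*)
```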