Training dataset
\begin{align*} \\& T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\} \end{align*}
where x_{i} \in \mathcal{X} = R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N; x_{i} is the feature vector (instance) of the i-th sample, and y_{i} is the class label of x_{i}. When y_{i} = +1, x_{i} is called a positive example; when y_{i} = -1, x_{i} is called a negative example. \left( x_{i}, y_{i} \right) is called a sample point.
Linearly separable support vector machine (hard-margin SVM): given a linearly separable training dataset, maximizing the margin, or equivalently solving the corresponding convex quadratic programming problem, yields the separating hyperplane
\begin{align*} \\& w^{*} \cdot x + b^{*} = 0 \end{align*}
together with the corresponding classification decision function
\begin{align*} \\& f \left( x \right) = sign \left( w^{*} \cdot x + b^{*} \right) \end{align*}
This model is called the linearly separable support vector machine.
The functional margin of a hyperplane \left( w, b \right) with respect to a sample point \left( x_{i}, y_{i} \right) is
\begin{align*} \\& \hat \gamma_{i} = y_{i} \left( w \cdot x_{i} + b \right) \end{align*}
The functional margin of the hyperplane \left( w, b \right) with respect to the training set T is
\begin{align*} \\& \hat \gamma = \min_{i = 1, 2, \cdots, N} \hat \gamma_{i} \end{align*}
i.e., the minimum of the functional margins of the hyperplane \left( w, b \right) over all sample points \left( x_{i}, y_{i} \right) in T.
The geometric margin of a hyperplane \left( w, b \right) with respect to a sample point \left( x_{i}, y_{i} \right) is
\begin{align*} \\& \gamma_{i} = y_{i} \left( \dfrac{w}{\| w \|} \cdot x_{i} + \dfrac{b}{\| w \|} \right) \end{align*}
The geometric margin of the hyperplane \left( w, b \right) with respect to the training set T is
\begin{align*} \\& \gamma = \min_{i = 1, 2, \cdots, N} \gamma_{i} \end{align*}
The functional and geometric margins are related by
\begin{align*} \\& \gamma_{i} = \dfrac{\hat \gamma_{i}}{\| w \|} \\& \gamma = \dfrac{\hat \gamma}{\| w \|} \end{align*}
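As a quick illustration of these two definitions, here is a minimal numpy sketch (the function names are my own) that computes \hat \gamma and \gamma for a given hyperplane \left( w, b \right) and dataset:

```python
import numpy as np

def functional_margin(w, b, X, y):
    """gamma_hat = min_i y_i (w . x_i + b) over the training set."""
    return np.min(y * (X @ w + b))

def geometric_margin(w, b, X, y):
    """gamma = gamma_hat / ||w||, per the relation above."""
    return functional_margin(w, b, X, y) / np.linalg.norm(w)
```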
Finding the maximum-margin separating hyperplane is equivalent to solving
\begin{align*} \\& \max_{w,b} \quad \gamma \\ & s.t. \quad y_{i} \left( \dfrac{w}{\| w \|} \cdot x_{i} + \dfrac{b}{\| w \|} \right) \geq \gamma, \quad i=1,2, \cdots, N \end{align*}
Equivalently,
\begin{align*} \\ & \max_{w,b} \quad \dfrac{\hat \gamma}{\| w \|} \\ & s.t. \quad y_{i} \left( w \cdot x_{i} + b \right) \geq \hat \gamma, \quad i=1,2, \cdots, N \end{align*}
Equivalently,
\begin{align*} \\ & \min_{w,b} \quad \dfrac{1}{2} \| w \|^{2} \\ & s.t. \quad y_{i} \left( w \cdot x_{i} + b \right) -1 \geq 0, \quad i=1,2, \cdots, N \end{align*}
Linearly separable SVM learning algorithm (maximum margin method):
Input: a linearly separable training dataset T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}, where x_{i} \in \mathcal{X} = R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N
Output: the maximum-margin separating hyperplane and the classification decision function
1. Construct and solve the constrained optimization problem
\begin{align*} \\ & \min_{w,b} \quad \dfrac{1}{2} \| w \|^{2} \\ & s.t. \quad y_{i} \left( w \cdot x_{i} + b \right) -1 \geq 0, \quad i=1,2, \cdots, N \end{align*}
obtaining the optimal solution w^{*}, b^{*}.
2. Obtain the separating hyperplane
\begin{align*} \\ & w^{*} \cdot x + b^{*} = 0 \end{align*}
and the classification decision function
\begin{align*} \\& f \left( x \right) = sign \left( w^{*} \cdot x + b^{*} \right) \end{align*}
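As a sketch of the maximum margin method, the primal problem above can be handed to a generic constrained solver; the snippet below uses scipy.optimize.minimize with SLSQP on a tiny toy dataset (the solver choice and the data are illustrative assumptions, not part of the algorithm):

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy data (illustrative).
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
n = X.shape[1]

# Pack the variables as theta = [w_1, ..., w_n, b].
objective = lambda theta: 0.5 * theta[:n] @ theta[:n]   # (1/2)||w||^2

# One inequality constraint y_i (w . x_i + b) - 1 >= 0 per sample.
constraints = [
    {"type": "ineq", "fun": lambda theta, i=i: y[i] * (X[i] @ theta[:n] + theta[n]) - 1.0}
    for i in range(len(y))
]

res = minimize(objective, x0=np.zeros(n + 1), method="SLSQP", constraints=constraints)
w_star, b_star = res.x[:n], res.x[n]
print("w* =", w_star, "b* =", b_star)   # expect roughly w* = (0.5, 0.5), b* = -2
```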
(Hard-margin) support vectors: the instances of the sample points in the training dataset that are closest to the separating hyperplane, i.e., the sample points at which the constraint holds with equality:
\begin{align*} \\ & y_{i} \left( w \cdot x_{i} + b \right) -1 = 0 \end{align*}
For positive examples with y_{i} = +1, the support vectors lie on the hyperplane
\begin{align*} \\ & H_{1}:w \cdot x + b = 1 \end{align*}
For negative examples with y_{i} = -1, the support vectors lie on the hyperplane
\begin{align*} \\ & H_{2}:w \cdot x + b = -1 \end{align*}
H_{1} and H_{2} are called the margin boundaries.
The distance between H_{1} and H_{2} is called the margin, and |H_{1}H_{2}| = \dfrac{1}{\| w \|} + \dfrac{1}{\| w \|} = \dfrac{2}{\| w \|}.
Solving the optimization problem:
1. Introduce Lagrange multipliers \alpha_{i} \geq 0, i = 1, 2, \cdots, N and construct the Lagrangian
\begin{align*} \\ & L \left( w, b, \alpha \right) = \dfrac{1}{2} \| w \|^{2} + \sum_{i=1}^{N} \alpha_{i} \left[- y_{i} \left( w \cdot x_{i} + b \right) + 1 \right] \\ & = \dfrac{1}{2} \| w \|^{2} - \sum_{i=1}^{N} \alpha_{i} y_{i} \left( w \cdot x_{i} + b \right) + \sum_{i=1}^{N} \alpha_{i} \end{align*}
where \alpha = \left( \alpha_{1}, \alpha_{2}, \cdots, \alpha_{N} \right)^{T} is the vector of Lagrange multipliers.
2. Compute \min_{w,b}L \left( w, b, \alpha \right):
\begin{align*} \\ & \nabla _{w} L \left( w, b, \alpha \right) = w - \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i} = 0 \\ & \nabla _{b} L \left( w, b, \alpha \right) = -\sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \end{align*}
which gives
\begin{align*} \\ & w = \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i} \\ & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \end{align*}
Substituting back into the Lagrangian gives
\begin{align*} \\ & L \left( w, b, \alpha \right) = \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} y_{i} \left[ \left( \sum_{j=1}^{N} \alpha_{j} y_{j} x_{j} \right) \cdot x_{i} + b \right] + \sum_{i=1}^{N} \alpha_{i} \\ & = - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} y_{i} b + \sum_{i=1}^{N} \alpha_{i} \\ & = - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \end{align*}
that is,
\begin{align*} \\ & \min_{w,b}L \left( w, b, \alpha \right) = - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \end{align*}
3. Compute \max_{\alpha} \min_{w,b}L \left( w, b, \alpha \right):
\begin{align*} \\ & \max_{\alpha} - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & \alpha_{i} \geq 0, \quad i=1,2, \cdots, N \end{align*}
Equivalently,
\begin{align*} \\ & \min_{\alpha} \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & \alpha_{i} \geq 0, \quad i=1,2, \cdots, N \end{align*}
Linearly separable SVM (hard-margin SVM) learning algorithm:
Input: a linearly separable training dataset T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}, where x_{i} \in \mathcal{X} = R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N
Output: the maximum-margin separating hyperplane and the classification decision function
1. Construct and solve the constrained optimization problem
\begin{align*} \\ & \min_{\alpha} \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & \alpha_{i} \geq 0, \quad i=1,2, \cdots, N \end{align*}
obtaining the optimal solution \alpha^{*} = \left( \alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*} \right)^{T}.
2. Compute
\begin{align*} \\ & w^{*} = \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} x_{i} \end{align*}
and choose a positive component \alpha_{j}^{*} > 0 of \alpha^{*}, then compute
\begin{align*} \\ & b^{*} = y_{j} - \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} \left( x_{i} \cdot x_{j} \right) \end{align*}
3. Obtain the separating hyperplane
\begin{align*} \\ & w^{*} \cdot x + b^{*} = 0 \end{align*}
and the classification decision function
\begin{align*} \\& f \left( x \right) = sign \left( w^{*} \cdot x + b^{*} \right) \end{align*}
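The dual route of steps 1-3 can be sketched the same way; below, SLSQP again stands in for a real QP solver (an assumed choice), and b^{*} is recovered from a positive component of \alpha^{*}:

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
N = len(y)
G = (y[:, None] * X) @ (y[:, None] * X).T      # G_ij = y_i y_j (x_i . x_j)

# Step 1: minimize the dual objective subject to alpha_i >= 0 and sum_i alpha_i y_i = 0.
res = minimize(lambda a: 0.5 * a @ G @ a - a.sum(), np.zeros(N), method="SLSQP",
               bounds=[(0.0, None)] * N,
               constraints={"type": "eq", "fun": lambda a: a @ y})
alpha = res.x

# Step 2: w* = sum_i alpha_i* y_i x_i, and b* from some alpha_j* > 0.
w = (alpha * y) @ X
j = int(np.argmax(alpha))
b = y[j] - (alpha * y) @ (X @ X[j])

# Step 3: decision function f(x) = sign(w* . x + b*).
f = lambda x: np.sign(w @ x + b)
```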
Linear support vector machine (soft-margin SVM): given a training dataset that is not linearly separable, solving the convex quadratic programming problem
\begin{align*} \\ & \min_{w,b,\xi} \quad \dfrac{1}{2} \| w \|^{2} + C \sum_{i=1}^{N} \xi_{i} \\ & s.t. \quad y_{i} \left( w \cdot x_{i} + b \right) \geq 1 - \xi_{i} \\ & \xi_{i} \geq 0, \quad i=1,2, \cdots, N \end{align*}
yields the separating hyperplane
\begin{align*} \\& w^{*} \cdot x + b^{*} = 0 \end{align*}
together with the corresponding classification decision function
\begin{align*} \\& f \left( x \right) = sign \left( w^{*} \cdot x + b^{*} \right) \end{align*}
This model is called the linear support vector machine.
Solving the optimization problem:
1. Introduce Lagrange multipliers \alpha_{i} \geq 0, \mu_{i} \geq 0, i = 1, 2, \cdots, N and construct the Lagrangian
\begin{align*} \\ & L \left( w, b, \xi, \alpha, \mu \right) = \dfrac{1}{2} \| w \|^{2} + C \sum_{i=1}^{N} \xi_{i} + \sum_{i=1}^{N} \alpha_{i} \left[- y_{i} \left( w \cdot x_{i} + b \right) + 1 - \xi_{i} \right] + \sum_{i=1}^{N} \mu_{i} \left( -\xi_{i} \right) \\ & = \dfrac{1}{2} \| w \|^{2} + C \sum_{i=1}^{N} \xi_{i} - \sum_{i=1}^{N} \alpha_{i} \left[ y_{i} \left( w \cdot x_{i} + b \right) -1 + \xi_{i} \right] - \sum_{i=1}^{N} \mu_{i} \xi_{i} \end{align*}
where \alpha = \left( \alpha_{1}, \alpha_{2}, \cdots, \alpha_{N} \right)^{T} and \mu = \left( \mu_{1}, \mu_{2}, \cdots, \mu_{N} \right)^{T} are the vectors of Lagrange multipliers.
2. Compute \min_{w,b,\xi}L \left( w, b, \xi, \alpha, \mu \right):
\begin{align*} \\ & \nabla_{w} L \left( w, b, \xi, \alpha, \mu \right) = w - \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i} = 0 \\ & \nabla_{b} L \left( w, b, \xi, \alpha, \mu \right) = -\sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & \nabla_{\xi_{i}} L \left( w, b, \xi, \alpha, \mu \right) = C - \alpha_{i} - \mu_{i} = 0 \end{align*}
which gives
\begin{align*} \\ & w = \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i} \\ & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & C - \alpha_{i} - \mu_{i} = 0\end{align*}
Substituting back into the Lagrangian gives
\begin{align*} \\ & L \left( w, b, \xi, \alpha, \mu \right) = \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + C \sum_{i=1}^{N} \xi_{i} - \sum_{i=1}^{N} \alpha_{i} y_{i} \left[ \left( \sum_{j=1}^{N} \alpha_{j} y_{j} x_{j} \right) \cdot x_{i} + b \right] \\ & \quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad + \sum_{i=1}^{N} \alpha_{i} - \sum_{i=1}^{N} \alpha_{i} \xi_{i} - \sum_{i=1}^{N} \mu_{i} \xi_{i} \\ & = - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} y_{i} b + \sum_{i=1}^{N} \alpha_{i} + \sum_{i=1}^{N} \xi_{i} \left( C - \alpha_{i} - \mu_{i} \right) \\ & = - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \end{align*}
that is,
\begin{align*} \\ & \min_{w,b,\xi}L \left( w, b, \xi, \alpha, \mu \right) = - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \end{align*}
3. Compute \max_{\alpha} \min_{w,b, \xi}L \left( w, b, \xi, \alpha, \mu \right):
\begin{align*} \\ & \max_{\alpha} - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & C - \alpha_{i} - \mu_{i} = 0 \\ & \alpha_{i} \geq 0 \\ & \mu_{i} \geq 0, \quad i=1,2, \cdots, N \end{align*}
Equivalently,
\begin{align*} \\ & \min_{\alpha} \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C , \quad i=1,2, \cdots, N \end{align*}
Linear SVM (soft-margin SVM) learning algorithm:
Input: a training dataset T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}, where x_{i} \in \mathcal{X} = R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N
Output: the separating hyperplane and the classification decision function
1. Choose a penalty parameter C > 0, and construct and solve the constrained optimization problem
\begin{align*} \\ & \min_{\alpha} \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C , \quad i=1,2, \cdots, N \end{align*}
obtaining the optimal solution \alpha^{*} = \left( \alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*} \right)^{T}.
2. Compute
\begin{align*} \\ & w^{*} = \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} x_{i} \end{align*}
and choose a component \alpha_{j}^{*} of \alpha^{*} with 0 < \alpha_{j}^{*} < C, then compute
\begin{align*} \\ & b^{*} = y_{j} - \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} \left( x_{i} \cdot x_{j} \right) \end{align*}
3. Obtain the separating hyperplane
\begin{align*} \\ & w^{*} \cdot x + b^{*} = 0 \end{align*}
and the classification decision function
\begin{align*} \\& f \left( x \right) = sign \left( w^{*} \cdot x + b^{*} \right) \end{align*}
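A sketch of the soft-margin algorithm differs from the hard-margin dual only in the box constraint 0 \leq \alpha_{i} \leq C and in choosing a component strictly inside \left( 0, C \right) for b^{*} (the helper name and the SLSQP solver are assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def soft_margin_svm(X, y, C=1.0):
    """Steps 1-3 of the soft-margin algorithm via the dual (sketch)."""
    N = len(y)
    G = (y[:, None] * X) @ (y[:, None] * X).T   # G_ij = y_i y_j (x_i . x_j)
    res = minimize(lambda a: 0.5 * a @ G @ a - a.sum(), np.zeros(N), method="SLSQP",
                   bounds=[(0.0, C)] * N,        # 0 <= alpha_i <= C
                   constraints={"type": "eq", "fun": lambda a: a @ y})
    alpha = res.x
    w = (alpha * y) @ X
    eps = 1e-6                                   # tolerance for "strictly inside (0, C)"
    j = int(np.where((alpha > eps) & (alpha < C - eps))[0][0])
    b = y[j] - (alpha * y) @ (X @ X[j])
    return w, b
```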
(Soft-margin) support vectors: in the linearly non-separable case, the instances x_{i} of those sample points \left( x_{i}, y_{i} \right) whose corresponding \alpha_{i}^{*} > 0 in the solution \alpha^{*} = \left( \alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*} \right)^{T} of the optimization problem.
For a support vector x_{i} the constraint is active, y_{i} \left( w \cdot x_{i} + b \right) = 1 - \xi_{i}, so its geometric margin is
\begin{align*} \\& \gamma_{i} = \dfrac{y_{i} \left( w \cdot x_{i} + b \right)}{ \| w \|} = \dfrac{1 - \xi_{i}}{\| w \|} \end{align*}
Since half the margin is \dfrac{1}{2} | H_{1}H_{2} | = \dfrac{1}{\| w \|},
the distance from the instance x_{i} to the margin boundary is
\begin{align*} \\& \left| \gamma_{i} - \dfrac{1}{\| w \|} \right| = \left| \dfrac{1 - \xi_{i}}{\| w \|} - \dfrac{1}{\| w \|} \right| = \dfrac{\xi_{i}}{\| w \|}\end{align*}
\begin{align*} \xi_{i} \geq 0 \Leftrightarrow \left\{ \begin{aligned} \ & \xi_{i}=0, \ x_{i} \text{ lies on the margin boundary;} \\ & 0 < \xi_{i} < 1, \ x_{i} \text{ lies between the margin boundary and the separating hyperplane;} \\ & \xi_{i}=1, \ x_{i} \text{ lies on the separating hyperplane;} \\ & \xi_{i}>1, \ x_{i} \text{ lies on the misclassified side of the separating hyperplane.} \end{aligned} \right.\end{align*}
The hinge loss function of the linear SVM (soft margin) is
\begin{align*} \\& L \left( y \left( w \cdot x + b \right) \right) = \left[ 1 - y \left(w \cdot x + b \right) \right]_{+} \end{align*}
where the subscript "+" denotes the positive-part function
\begin{align*} \left[ z \right]_{+} = \left\{ \begin{aligned} \ & z, z > 0 \\ & 0, z \leq 0 \end{aligned} \right.\end{align*}
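In code, the hinge loss of a linear classifier over a dataset is essentially one numpy line (the function name is assumed):

```python
import numpy as np

def hinge_loss(w, b, X, y):
    """Mean of [1 - y_i (w . x_i + b)]_+ over the dataset."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b)).mean()
```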
Kernel functions
Let \mathcal{X} be the input space (a subset of the Euclidean space R^{n} or a discrete set) and \mathcal{H} the feature space (a Hilbert space). If there exists a mapping from \mathcal{X} to \mathcal{H}
\begin{align*} \\& \phi \left( x \right) : \mathcal{X} \to \mathcal{H} \end{align*}
such that for all x,z \in \mathcal{X} the function K \left(x, z \right) satisfies
\begin{align*} \\ & K \left(x, z \right) = \phi \left( x \right) \cdot \phi \left( z \right) \end{align*}
then K \left(x, z \right) is called a kernel function and \phi \left( x \right) a mapping function, where \phi \left( x \right) \cdot \phi \left( z \right) denotes the inner product of \phi \left( x \right) and \phi \left( z \right).
Commonly used kernel functions:
1. Polynomial kernel
\begin{align*} \\& K \left( x, z \right) = \left( x \cdot z + 1 \right)^{p} \end{align*}
2. Gaussian kernel
\begin{align*} \\& K \left( x, z \right) = \exp \left( - \dfrac{\| x - z \|^{2}}{2 \sigma^{2}} \right) \end{align*}
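Both kernels translate directly into numpy (the parameter defaults are arbitrary choices for illustration):

```python
import numpy as np

def poly_kernel(x, z, p=2):
    """Polynomial kernel K(x, z) = (x . z + 1)^p."""
    return (np.dot(x, z) + 1.0) ** p

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d = np.asarray(x) - np.asarray(z)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))
```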
Nonlinear support vector machine: from a nonlinear classification training set, learning via a kernel function and soft-margin maximization yields the classification decision function
\begin{align*} \\& f \left( x \right) = sign \left( \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K \left(x, x_{i} \right) + b^{*} \right) \end{align*}
This model is called the nonlinear support vector machine, where K \left( x, z \right) is a positive definite kernel.
Nonlinear SVM learning algorithm:
Input: a training dataset T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}, where x_{i} \in \mathcal{X} = R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N
Output: the classification decision function
1. Choose an appropriate kernel function K \left( x, z \right) and a penalty parameter C > 0, and construct and solve the constrained optimization problem
\begin{align*} \\ & \min_{\alpha} \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} K \left( x_{i}, x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C , \quad i=1,2, \cdots, N \end{align*}
obtaining the optimal solution \alpha^{*} = \left( \alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*} \right)^{T}.
2. Choose a component \alpha_{j}^{*} of \alpha^{*} with 0 < \alpha_{j}^{*} < C and compute
\begin{align*} \\ & b^{*} = y_{j} - \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K \left( x_{i}, x_{j} \right) \end{align*}
(unlike the linear case, w^{*} is not computed explicitly, since the separating hyperplane lives in the feature space \mathcal{H}).
3. Construct the classification decision function
\begin{align*} \\& f \left( x \right) = sign \left( \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K \left( x, x_{i} \right) + b^{*} \right) \end{align*}
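Given a solved dual \alpha^{*}, steps 2 and 3 are a direct transcription (the function names are assumptions):

```python
import numpy as np

def intercept(alpha, X, y, kernel, j):
    """b* = y_j - sum_i alpha_i* y_i K(x_i, x_j), for some j with 0 < alpha_j* < C."""
    return y[j] - sum(a * t * kernel(x, X[j]) for a, t, x in zip(alpha, y, X))

def decision_function(alpha, X, y, b, kernel):
    """f(x) = sign(sum_i alpha_i* y_i K(x, x_i) + b*)."""
    return lambda x: np.sign(sum(a * t * kernel(x, xi) for a, t, xi in zip(alpha, y, X)) + b)
```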
The sequential minimal optimization (SMO) algorithm solves the following dual convex quadratic programming problem:
\begin{align*} \\ & \min_{\alpha} \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} K \left( x_{i}, x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C , \quad i=1,2, \cdots, N \end{align*}
Selecting \alpha_{1}, \alpha_{2} as the two working variables and fixing the remaining variables \alpha_{i} \left( i = 3, 4, \cdots, N \right), the SMO subproblem is
\begin{align*} \\ & \min_{\alpha_{1}, \alpha_{2}} W \left( \alpha_{1}, \alpha_{2} \right) = \dfrac{1}{2} K_{11} \alpha_{1}^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} \\ & \quad\quad\quad\quad\quad\quad - \left( \alpha_{1} + \alpha_{2} \right) + y_{1} \alpha_{1} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i1} + y_{2} \alpha_{2} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i2} \\ & s.t. \quad \alpha_{1} y_{1} + \alpha_{2} y_{2} = -\sum_{i=3}^{N} \alpha_{i} y_{i} = \varsigma \\ & 0 \leq \alpha_{i} \leq C , \quad i=1,2 \end{align*}
where K_{ij} = K \left( x_{i}, x_{j} \right), i,j = 1,2, \cdots, N, \varsigma is a constant, and constant terms not involving \alpha_{1}, \alpha_{2} have been omitted.
Let the initial feasible solution of this subproblem be \alpha_{1}^{old}, \alpha_{2}^{old}, its optimal solution be \alpha_{1}^{new}, \alpha_{2}^{new}, and the optimal solution for \alpha_{2} along the constraint direction before clipping be \alpha_{2}^{new,unc}.
Since \alpha_{2}^{new} must satisfy 0 \leq \alpha_{2}^{new} \leq C, its value must lie in the range
\begin{align*} \\ & L \leq \alpha_{2}^{new} \leq H \end{align*}
where L and H are the bounds given by the endpoints of the diagonal segment on which \alpha_{2}^{new} lies.
If y_{1} \neq y_{2}, then
\begin{align*} \\ & L = \max \left( 0, \alpha_{2}^{old} - \alpha_{1}^{old} \right), H = \min \left( C, C + \alpha_{2}^{old} - \alpha_{1}^{old} \right) \end{align*}
If y_{1} = y_{2}, then
\begin{align*} \\ & L = \max \left( 0, \alpha_{2}^{old} + \alpha_{1}^{old} - C \right), H = \min \left( C, \alpha_{2}^{old} + \alpha_{1}^{old} \right) \end{align*}
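These two cases reduce to a small helper in code (the function name is mine):

```python
def clip_bounds(a1_old, a2_old, y1, y2, C):
    """Endpoints [L, H] of the feasible segment for alpha_2^new."""
    if y1 != y2:
        return max(0.0, a2_old - a1_old), min(C, C + a2_old - a1_old)
    return max(0.0, a2_old + a1_old - C), min(C, a2_old + a1_old)
```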
Define
\begin{align*} \\ & g \left( x \right) = \sum_{i=1}^{N} \alpha_{i} y_{i} K \left( x_{i}, x \right) + b \end{align*}
and let
\begin{align*} \\ & E_{i} = g \left( x_{i} \right) - y_{i} = \left( \sum_{j=1}^{N} \alpha_{j} y_{j} K \left( x_{j}, x_{i} \right) + b \right) - y_{i}, \quad i=1,2 \\ & v_{i} = \sum_{j=3}^{N} \alpha_{j} y_{j} K \left( x_{i}, x_{j} \right) = g \left( x_{i} \right) - \sum_{j=1}^{2}\alpha_{j} y_{j} K \left( x_{i}, x_{j} \right) - b, \quad i=1,2\end{align*}
Then
\begin{align*} \\ & W \left( \alpha_{1}, \alpha_{2} \right) = \dfrac{1}{2} K_{11} \alpha_{1}^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} \\ & \quad\quad\quad\quad\quad\quad - \left( \alpha_{1} + \alpha_{2} \right) + y_{1} v_{1} \alpha_{1}+ y_{2} v_{2} \alpha_{2} \end{align*}
Since \alpha_{1} y_{1} + \alpha_{2} y_{2} = \varsigma and y_{i}^{2} = 1, \alpha_{1} can be expressed as
\begin{align*} \\ & \alpha_{1} = \left( \varsigma - y_{2} \alpha_{2} \right) y_{1}\end{align*}
Substituting this in gives
\begin{align*} \\ & W \left( \alpha_{2} \right) = \dfrac{1}{2} K_{11} \left[ \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} \right]^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{1} y_{2} K_{12} \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} \alpha_{2} \\ & \quad\quad\quad\quad\quad\quad - \left[ \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} + \alpha_{2} \right] + y_{1} v_{1} \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} + y_{2} v_{2} \alpha_{2} \\ & = \dfrac{1}{2} K_{11} \left( \varsigma - y_{2} \alpha_{2} \right)^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{2} K_{12} \left( \varsigma - y_{2} \alpha_{2} \right) \alpha_{2} \\ & \quad\quad\quad\quad\quad\quad - \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} - \alpha_{2} + v_{1} \left( \varsigma - y_{2} \alpha_{2} \right) + y_{2} v_{2} \alpha_{2} \end{align*}
Differentiating with respect to \alpha_{2}:
\begin{align*} \\ & \dfrac {\partial W}{\partial \alpha_{2}} = K_{11} \alpha_{2} + K_{22} \alpha_{2} -2 K_{12} \alpha_{2} \\ & \quad\quad\quad - K_{11} \varsigma y_{2} + K_{12} \varsigma y_{2} + y_{1} y_{2} -1 - v_{1} y_{2} + y_{2} v_{2} \end{align*}
Setting the derivative to zero gives
\begin{align*} \\ & \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2} = y_{2} \left( y_{2} - y_{1} + \varsigma K_{11} - \varsigma K_{12} + v_{1} - v_{2} \right) \\ & = y_{2} \left[ y_{2} - y_{1} + \varsigma K_{11} - \varsigma K_{12} + \left( g \left( x_{1} \right) - \sum_{j=1}^{2}\alpha_{j} y_{j} K_{1j} - b \right) - \left( g \left( x_{2} \right) - \sum_{j=1}^{2}\alpha_{j} y_{j} K_{2j} - b \right) \right] \end{align*}
Substituting \varsigma = \alpha_{1}^{old} y_{1} + \alpha_{2}^{old} y_{2} gives
\begin{align*} \\ & \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2}^{new,unc} = y_{2} \left( \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2}^{old} y_{2} + y_{2} - y_{1} + g \left( x_{1} \right) - g \left( x_{2} \right) \right) \\ & \quad\quad\quad\quad\quad\quad\quad\quad\quad\quad = \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2}^{old} + y_{2} \left( E_{1} - E_{2} \right) \end{align*}
With \eta = K_{11} + K_{22} - 2 K_{12}, this becomes
\begin{align*} \\ & \alpha_{2}^{new,unc} = \alpha_{2}^{old} + \dfrac{y_{2} \left( E_{1} - E_{2} \right)}{\eta}\end{align*}
After clipping:
\begin{align*} \alpha_{2}^{new} = \left\{ \begin{aligned} \ & H, \alpha_{2}^{new,unc} > H \\ & \alpha_{2}^{new,unc}, L \leq \alpha_{2}^{new,unc} \leq H \\ & L, \alpha_{2}^{new,unc} < L \end{aligned} \right.\end{align*}
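The unclipped update and the clipping combine into one small step (a sketch; eta = K11 + K22 - 2*K12 as above, and the argument names are assumptions):

```python
def update_alpha2(a2_old, y2, E1, E2, eta, L, H):
    """alpha_2^{new,unc} = alpha_2^{old} + y_2 (E_1 - E_2) / eta, clipped to [L, H]."""
    a2_unc = a2_old + y2 * (E1 - E2) / eta
    return min(H, max(L, a2_unc))
```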
Since \varsigma = \alpha_{1}^{old} y_{1} + \alpha_{2}^{old} y_{2} and \varsigma = \alpha_{1}^{new} y_{1} + \alpha_{2}^{new} y_{2},
we have
\begin{align*} \\ & \alpha_{1}^{old} y_{1} + \alpha_{2}^{old} y_{2} = \alpha_{1}^{new} y_{1} + \alpha_{2}^{new} y_{2} \\ & \quad\quad\quad\quad \alpha_{1}^{new} = \alpha_{1}^{old} + y_{1} y_{2} \left( \alpha_{2}^{old} - \alpha_{2}^{new} \right) \end{align*}
If 0 < \alpha_{1}^{new} < C, the KKT conditions give y_{1} g \left( x_{1} \right) = 1, i.e. \sum_{i=1}^{N} \alpha_{i} y_{i} K_{i1} + b = y_{1}, so
\begin{align*} \\ & b_1^{new} = y_{1} - \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} - \alpha_{1}^{new} y_{1} K_{11} - \alpha_{2}^{new} y_{2} K_{21} \end{align*}
From
\begin{align*} \\ & E_{1} = g \left( x_{1} \right) - y_{1} = \left( \sum_{j=1}^{N} \alpha_{j} y_{j} K_{j1} + b \right) - y_{1} \\ & = \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} + \alpha_{1}^{old} y_{1} K_{11} + \alpha_{2}^{old} y_{2} K_{21} + b^{old} - y_{1} \end{align*}
it follows that
\begin{align*} \\ & y_{1} - \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} = -E_{1} + \alpha_{1}^{old} y_{1} K_{11} + \alpha_{2}^{old} y_{2} K_{21} + b^{old} \end{align*}
Substituting this in gives
\begin{align*} \\ & b_1^{new} = -E_{1} - y_{1} K_{11} \left( \alpha_{1}^{new} - \alpha_{1}^{old} \right) - y_{2} K_{21} \left( \alpha_{2}^{new} - \alpha_{2}^{old} \right) + b^{old} \end{align*}
Similarly,
\begin{align*} \\ & b_2^{new} = -E_{2} - y_{1} K_{12} \left( \alpha_{1}^{new} - \alpha_{1}^{old} \right) - y_{2} K_{22} \left( \alpha_{2}^{new} - \alpha_{2}^{old} \right) + b^{old} \end{align*}
If \alpha_{1}^{new} and \alpha_{2}^{new} both satisfy 0 < \alpha_{i}^{new} < C, i = 1, 2,
then
\begin{align*} \\ & b^{new} = b_{1}^{new} = b_{2}^{new}\end{align*}
otherwise
\begin{align*} \\ & b^{new} = \dfrac{b_{1}^{new} + b_{2}^{new}}{2} \end{align*}
Update E_{i}:
\begin{align*} \\ & E_{i}^{new} = \sum_{S} y_{j} \alpha_{j} K \left( x_{i}, x_{j} \right) + b^{new} - y_{i} \end{align*}
where S is the set of all support vectors x_{j}.
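Written out, the b and E updates above look like this (a sketch; argument names are assumptions):

```python
def update_b(b_old, E1, E2, y1, y2, a1_new, a1_old, a2_new, a2_old,
             K11, K12, K21, K22, C):
    """b1/b2 per the formulas above; equal when both alphas are interior,
    otherwise take the midpoint."""
    b1 = -E1 - y1 * K11 * (a1_new - a1_old) - y2 * K21 * (a2_new - a2_old) + b_old
    b2 = -E2 - y1 * K12 * (a1_new - a1_old) - y2 * K22 * (a2_new - a2_old) + b_old
    if 0 < a1_new < C and 0 < a2_new < C:
        return b1                  # b1 == b2 in this case
    return 0.5 * (b1 + b2)

def update_E(i, alpha, X, y, b_new, kernel, support):
    """E_i^new = sum over support vectors of y_j alpha_j K(x_i, x_j) + b^new - y_i."""
    return sum(y[j] * alpha[j] * kernel(X[i], X[j]) for j in support) + b_new - y[i]
```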
SMO algorithm:
Input: a training dataset T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}, where x_{i} \in \mathcal{X} = R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N, and a precision \varepsilon;
Output: an approximate solution \hat \alpha.
1. Set the initial value \alpha^{\left( 0 \right)} = 0 and k = 0;
2. Select the optimization variables \alpha_{1}^{\left( k \right)},\alpha_{2}^{\left( k \right)} and solve
\begin{align*} \\ & \min_{\alpha_{1}, \alpha_{2}} W \left( \alpha_{1}, \alpha_{2} \right) = \dfrac{1}{2} K_{11} \alpha_{1}^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} \\ & \quad\quad\quad\quad\quad\quad - \left( \alpha_{1} + \alpha_{2} \right) + y_{1} \alpha_{1} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i1} + y_{2} \alpha_{2} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i2} \\ & s.t. \quad \alpha_{1} y_{1} + \alpha_{2} y_{2} = -\sum_{i=3}^{N} \alpha_{i} y_{i} = \varsigma \\ & 0 \leq \alpha_{i} \leq C , \quad i=1,2 \end{align*}
obtaining the optimal solutions \alpha_{1}^{\left( k+1 \right)},\alpha_{2}^{\left( k+1 \right)}, and update \alpha to \alpha^{\left( k+1 \right)};
3. If the stopping conditions are satisfied within precision \varepsilon:
\begin{align*} \\ & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C, \quad i = 1, 2, \cdots, N \end{align*}
\begin{align*} y_{i} \cdot g \left( x_{i} \right) = \left\{ \begin{aligned} \ & \geq 1, \left\{ x_{i} | \alpha_{i} = 0 \right\} \\ & = 1, \left\{ x_{i} | 0 < \alpha_{i} < C \right\} \\ & \leq 1, \left\{ x_{i} | \alpha_{i} = C \right\} \end{aligned} \right.\end{align*}
then go to step 4; otherwise set k = k + 1 and go to step 2;
4. Set \hat \alpha = \alpha^{\left( k + 1 \right)}.
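Putting the pieces together, here is a compact runnable sketch of SMO. It does not implement the book's full variable-selection heuristic: for brevity the second variable is picked at random (as in the well-known "simplified SMO" exercise), while the L/H bounds, the \eta-based update, the clipping, and the b update follow the formulas derived above; all names are illustrative:

```python
import numpy as np

def linear_kernel(x, z):
    return float(np.dot(x, z))

def smo(X, y, C=1.0, kernel=linear_kernel, tol=1e-3, max_passes=10):
    """Simplified SMO sketch: random second-variable choice, updates as derived above."""
    N = len(y)
    alpha, b = np.zeros(N), 0.0
    K = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])

    def g(i):
        # g(x_i) = sum_k alpha_k y_k K(x_k, x_i) + b, with the current alpha and b
        return (alpha * y) @ K[:, i] + b

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(N):
            E_i = g(i) - y[i]
            # KKT violation check within tolerance tol
            if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                j = np.random.choice([k for k in range(N) if k != i])
                E_j = g(j) - y[j]
                a_i_old, a_j_old = alpha[i], alpha[j]
                if y[i] != y[j]:
                    L, H = max(0.0, a_j_old - a_i_old), min(C, C + a_j_old - a_i_old)
                else:
                    L, H = max(0.0, a_i_old + a_j_old - C), min(C, a_i_old + a_j_old)
                eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
                if L == H or eta <= 0:
                    continue
                # Unclipped update for the second variable, then clip to [L, H]
                alpha[j] = np.clip(a_j_old + y[j] * (E_i - E_j) / eta, L, H)
                if abs(alpha[j] - a_j_old) < 1e-8:
                    continue
                # alpha_1^new = alpha_1^old + y_1 y_2 (alpha_2^old - alpha_2^new)
                alpha[i] = a_i_old + y[i] * y[j] * (a_j_old - alpha[j])
                b1 = -E_i - y[i] * K[i, i] * (alpha[i] - a_i_old) \
                     - y[j] * K[j, i] * (alpha[j] - a_j_old) + b
                b2 = -E_j - y[i] * K[i, j] * (alpha[i] - a_i_old) \
                     - y[j] * K[j, j] * (alpha[j] - a_j_old) + b
                if 0 < alpha[i] < C and 0 < alpha[j] < C:
                    b = b1
                else:
                    b = 0.5 * (b1 + b2)
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b
```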