A binary classification problem looks like the figure below:
There are two classes of points (drawn in red and blue), and we want to find the optimal straight line that separates them. The samples may also be points in a higher-dimensional space, in which case a hyperplane is needed to separate the two classes.
Here we represent the samples as points $(x_i^1, x_i^2)$ in the plane; each point carries a class label, splitting the set into two classes. The classification function can be written as:
$$f(x_i^1,x_i^2)=\begin{cases}\text{red} & (x_i^1,x_i^2)\ \text{satisfies condition 1}\\ \text{blue} & (x_i^1,x_i^2)\ \text{satisfies condition 2}\end{cases}$$
In the figure the red points clearly sit higher and the blue points lower, so we can distinguish the two classes with the labels $1$ and $0$:
$$f(x_i^1,x_i^2)=\begin{cases}1 & i\in D\\ 0 & i\notin D\end{cases}$$
where $D$ is the set of indices of the red points. This encodes the two classes as $1$ and $0$: $f(x_i^1,x_i^2)=1$ marks the points lying higher, and $f(x_i^1,x_i^2)=0$ the points lying lower.
Let the line we seek be $w_0+w_1x^1+w_2x^2=0$; we want both classes of points to lie as far from this line as possible. From their positions in the plane we get
$$\begin{cases}w_0+w_1x_i^1+w_2x_i^2>0 & \text{when } f(x_i^1,x_i^2)=1\\ w_0+w_1x_i^1+w_2x_i^2<0 & \text{when } f(x_i^1,x_i^2)=0\end{cases}$$
Requiring $w_2>0$ is enough to ensure that $w_0+w_1x_i^1+w_2x_i^2>0$ describes the upper half-plane, since solving for $x^2$ then gives $x^2>-(w_0+w_1x^1)/w_2$.
We now introduce the Einstein summation convention: when an index appears in a product once as a subscript and once as a superscript, the summation sign may be omitted, e.g.
$$\sum_{\alpha} u_{\alpha} v^{\alpha} = u_{\alpha} v^{\alpha}$$
The same applies when several indices are summed, e.g.
$$\sum_{\alpha,\beta} u_{\alpha\beta} v_{\gamma}^{\alpha\beta} = u_{\alpha\beta} v_{\gamma}^{\alpha\beta}$$
Note that the sum must contain at least two factors, and the summed index must appear as a superscript in one factor and as a subscript in the other. Writing $x_i^0=1$, the system above abbreviates to:
$$\begin{cases}w_j x_i^j>0 & \text{when } f(x_i^1,x_i^2)=1\\ w_j x_i^j<0 & \text{when } f(x_i^1,x_i^2)=0\end{cases}\qquad (w_2>0,\ j=0,1,2)$$
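In code, this contraction is just a dot product between the weight vector and the feature vector augmented with $x^0=1$. A minimal sketch (the array names here are my own, not from the listing later in this post):

import numpy as np

w = np.array([-1.0, 2.0, 3.0])      # (w0, w1, w2)
x = np.array([0.4, 0.7])            # one sample (x^1, x^2)
x_aug = np.concatenate(([1.0], x))  # prepend x^0 = 1
score = np.dot(w, x_aug)            # w_j x^j = w0 + w1*x^1 + w2*x^2
print(score > 0)                    # True -> the point lies in the upper half-plane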
By the earlier analysis, both classes should lie as far as possible from the target line $w_0+w_1x^1+w_2x^2=0$: points with $f(x_i^1,x_i^2)=1$ satisfy $w_0+w_1x_i^1+w_2x_i^2>0$, and points with $f(x_i^1,x_i^2)=0$ satisfy $w_0+w_1x_i^1+w_2x_i^2<0$. The quantity $w_0+w_1x_i^1+w_2x_i^2=w_jx_i^j$ can therefore serve as a score: for points with $f(x_i^1,x_i^2)=1$ we want $w_jx_i^j$ as large as possible, and for points with $f(x_i^1,x_i^2)=0$ as small as possible.
At this point the reader may think of describing this with a step function:
But if we want the line to predict the probability that a new sample belongs to one class, the unit step function fails: $u(x)$ takes only the values $0$ and $1$, so it cannot express how confident the classification is. For instance, every first-class sample mapped to the right half-axis gets the value $1$ regardless of where it lies, so the output says nothing about how decisively the line separates the samples.
Instead we use a function with much better behavior: the logistic function, $g(x)=\frac{1}{1+e^{-x}}$.
The logistic function has several very convenient properties: its range is $(0,1)$, it is monotonically increasing with $g(0)=\tfrac12$, it satisfies $g(-x)=1-g(x)$, and its derivative is $g'(x)=g(x)\left(1-g(x)\right)$.
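The symmetry and derivative identities are easy to verify numerically; here is a quick, throwaway check (not part of the article's listing):

import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 11)
print(np.allclose(g(-x), 1.0 - g(x)))              # symmetry: g(-x) = 1 - g(x)
h = 1e-6                                           # finite-difference step
num_grad = (g(x + h) - g(x - h)) / (2.0 * h)
print(np.allclose(num_grad, g(x) * (1.0 - g(x))))  # g'(x) = g(x)(1 - g(x))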
Through the logistic function, points with $f(x_i^1,x_i^2)=1$ are mapped to the positive half of the input axis and points with $f(x_i^1,x_i^2)=0$ to the negative half; the center $x=0$ is the boundary between the two classes, and both classes should stay as far from it as possible. By the earlier discussion, this distance score is exactly $w_jx_i^j$.
For each sample, let $p_i$ denote the probability the model assigns to that sample's true label. For the two classes this can be written as a single expression:
$$p_i=g(w_jx_i^j)^{f(x_i^1,x_i^2)}\cdot\left(1-g(w_jx_i^j)\right)^{1-f(x_i^1,x_i^2)}$$
This quantity is the basis of the classification: we want every sample $(x_i^1,x_i^2)$ to lie far from the line, i.e. we want each $p_i$ to be as large as possible.
From this discussion we construct the likelihood function
$$L(w_0,w_1,w_2)=\prod_i p_i=\prod_i g(w_jx_i^j)^{f(x_i^1,x_i^2)}\cdot\left(1-g(w_jx_i^j)\right)^{1-f(x_i^1,x_i^2)}$$
Taking logarithms gives
$$l(w_0,w_1,w_2)=\ln L(w_0,w_1,w_2)=\sum_i\left[f(x_i^1,x_i^2)\ln g(w_jx_i^j)+\left(1-f(x_i^1,x_i^2)\right)\ln\left(1-g(w_jx_i^j)\right)\right]$$
The problem is now to maximize $l(w_0,w_1,w_2)$, which we do by gradient ascent. (For gradient ascent/descent, see my earlier post on simulating linear regression with gradient descent.)
Taking partial derivatives with respect to $w_0,w_1,w_2$ in turn gives:
$$\frac{\partial l}{\partial w_k}=\sum_i\left(f(x_i^1,x_i^2)-g(w_jx_i^j)\right)x_i^k$$
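This compact form follows from the chain rule together with $g'(x)=g(x)(1-g(x))$. Spelling out the step for a single sample:
$$\frac{\partial}{\partial w_k}\ln g(w_jx_i^j)=\frac{g'(w_jx_i^j)}{g(w_jx_i^j)}\,x_i^k=\left(1-g(w_jx_i^j)\right)x_i^k,\qquad \frac{\partial}{\partial w_k}\ln\left(1-g(w_jx_i^j)\right)=-g(w_jx_i^j)\,x_i^k$$
Weighting these two terms by $f(x_i^1,x_i^2)$ and $1-f(x_i^1,x_i^2)$ respectively and summing over $i$ gives the gradient above.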
Letting $n$ denote the iteration count, the iteration scheme is:
$$\begin{cases}w_k^{(n+1)}=w_k^{(n)}+\alpha\sum_i\left(f(x_i^1,x_i^2)-g(w_j^{(n)}x_i^j)\right)x_i^k\\ l^{(n)}=\sum_i\left[f(x_i^1,x_i^2)\ln g(w_j^{(n)}x_i^j)+\left(1-f(x_i^1,x_i^2)\right)\ln\left(1-g(w_j^{(n)}x_i^j)\right)\right]\end{cases}$$
where $k=0,1,2$ and $x_i^0=1$.
In expanded form:
$$\begin{cases}w_0^{(n+1)}=w_0^{(n)}+\alpha\sum_i\left(f(x_i^1,x_i^2)-\frac{1}{1+e^{-(w_0^{(n)}+w_1^{(n)}x_i^1+w_2^{(n)}x_i^2)}}\right)\\ w_1^{(n+1)}=w_1^{(n)}+\alpha\sum_i x_i^1\left(f(x_i^1,x_i^2)-\frac{1}{1+e^{-(w_0^{(n)}+w_1^{(n)}x_i^1+w_2^{(n)}x_i^2)}}\right)\\ w_2^{(n+1)}=w_2^{(n)}+\alpha\sum_i x_i^2\left(f(x_i^1,x_i^2)-\frac{1}{1+e^{-(w_0^{(n)}+w_1^{(n)}x_i^1+w_2^{(n)}x_i^2)}}\right)\\ l^{(n)}=\sum_i\left[f(x_i^1,x_i^2)\ln\frac{1}{1+e^{-(w_0^{(n)}+w_1^{(n)}x_i^1+w_2^{(n)}x_i^2)}}+\left(1-f(x_i^1,x_i^2)\right)\ln\left(1-\frac{1}{1+e^{-(w_0^{(n)}+w_1^{(n)}x_i^1+w_2^{(n)}x_i^2)}}\right)\right]\end{cases}$$
Here $\alpha$ is the learning rate. With a suitable choice of $\alpha$, after enough iterations $w_i$ converges to $\hat{w}_i$ and $l(w_0,w_1,w_2)$ converges to $l_M$, the maximum of $l$. The line determined by the converged weights, $\hat{w}_jx^j=\hat{w}_0+\hat{w}_1x^1+\hat{w}_2x^2=0$, keeps both classes as far away from it as possible, and is therefore the separating line we are after.
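The full listing below implements this update with explicit loops; for reference, a single ascent step can also be vectorized with NumPy (a sketch with my own variable names, not the original code):

import numpy as np

def ascent_step(w, X, f, alpha=0.1):
    # w: weights (w0, w1, w2); X: (N, 3) samples with a leading column
    # of ones (x^0 = 1); f: (N,) labels in {0, 1}
    g = 1.0 / (1.0 + np.exp(-(X @ w)))  # g(w_j x_i^j) for every sample
    grad = X.T @ (f - g)                # dl/dw_k = sum_i (f_i - g_i) x_i^k
    return w + alpha * grad

# toy usage with made-up data
X = np.array([[1.0, 0.2, 0.9], [1.0, 0.5, 0.1]])
f = np.array([1.0, 0.0])
w = ascent_step(np.zeros(3), X, f)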
For the initial values we take $w_i^{(0)}=0$; under gradient ascent this keeps $w_2^{(k)}>0$, satisfying the earlier requirement.
For the existing samples $(x_i^1,x_i^2)$, we can check whether classification succeeded by comparing $\hat{w}_0+\hat{w}_1x^1+\hat{w}_2x^2$ with $0$, i.e. by checking whether $p_i=g(\hat{w}_jx_i^j)^{f(x_i^1,x_i^2)}\cdot\left(1-g(\hat{w}_jx_i^j)\right)^{1-f(x_i^1,x_i^2)}$ exceeds $0.5$.
Given a new sample $(x_{ia}^1,x_{ia}^2)$, $p_i$ expresses the confidence of a classification: if we assign the new point to the red set, i.e. set $f(x_{ia}^1,x_{ia}^2)=1$, the resulting $p_i$ is the confidence of that assignment.
The logistic form of this confidence function is:
$$g(x^1,x^2)=\frac{1}{1+e^{-(\hat{w}_0+\hat{w}_1x^1+\hat{w}_2x^2)}}$$
The training-set and test-set data used in this example:
Baidu Netdisk share: Test_Train_Set.xls, extraction code: mr6d
Training samples $(x_i,y_i)$, used to compute the separating line:
Test samples $(x'_i,y'_i)$, used to verify the accuracy of the line computed from the training set:
The code is as follows:
import xlrd as xd
import numpy as np
from matplotlib import cm
import matplotlib.pyplot as plt
import math
from mpl_toolkits.mplot3d import Axes3D
# Font settings so matplotlib renders CJK text and minus signs correctly
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
# -----------------------------------------------------------------------------------
# Read the data
# Open the Excel workbook
data = xd.open_workbook('Test_Train_Set.xls')
sheet = data.sheet_by_name('Sheet1')
Data_All = []
test_Desired = []
test_Input1 = []
test_Input2 = []
train_Desired = []
train_Input1 = []
train_Input2 = []
# Walk the sheet row by row, collecting each column into its own list
for r in range(sheet.nrows):
    data1 = []
    for c in range(sheet.ncols):
        data1.append(sheet.cell_value(r, c))
    Data_All.append(list(data1))
    test_Desired.append(data1[0])
    test_Input1.append(data1[1])
    test_Input2.append(data1[2])
    train_Desired.append(data1[3])
    train_Input1.append(data1[4])
    train_Input2.append(data1[5])
# Drop the header row
test_Desired = test_Desired[1:]
test_Input1 = test_Input1[1:]
test_Input2 = test_Input2[1:]
train_Desired = train_Desired[1:]
train_Input1 = train_Input1[1:]
train_Input2 = train_Input2[1:]
# Convert the lists to column vectors (the header is already gone,
# so no further slicing here)
test_DesiredM = np.c_[test_Desired]
test_Input1M = np.c_[test_Input1]
test_Input2M = np.c_[test_Input2]
train_DesiredM = np.c_[train_Desired]
train_Input1M = np.c_[train_Input1]
train_Input2M = np.c_[train_Input2]
# -----------------------------------------------------------------------------------
# Function definitions
# Logistic function; note the identity lgstc'(x) = lgstc(x) * (1 - lgstc(x))
def lgstc(x):
    return 1 / (1 + math.exp(-x))
# Target line: w0 + w1*x1 + w2*x2 = 0
# Gradient component for w0
def Del_w0(Input1, Input2, Desired, w0, w1, w2):
    sumDel = 0
    for i in range(len(Input1)):
        lc_p = lgstc(w0 + w1 * Input1[i] + w2 * Input2[i])
        sumDel += Desired[i] - lc_p
    return sumDel
# Gradient component for w1
def Del_w1(Input1, Input2, Desired, w0, w1, w2):
    sumDel = 0
    for i in range(len(Input1)):
        lc_p = lgstc(w0 + w1 * Input1[i] + w2 * Input2[i])
        sumDel += Input1[i] * (Desired[i] - lc_p)
    return sumDel
# Gradient component for w2
def Del_w2(Input1, Input2, Desired, w0, w1, w2):
    sumDel = 0
    for i in range(len(Input1)):
        lc_p = lgstc(w0 + w1 * Input1[i] + w2 * Input2[i])
        sumDel += Input2[i] * (Desired[i] - lc_p)
    return sumDel
# Log-likelihood recorded after each iteration
def Loss(Input1, Input2, Desired, w0, w1, w2):
    loss = 0
    for i in range(len(Input1)):
        lc_p = lgstc(w0 + w1 * Input1[i] + w2 * Input2[i])
        loss += Desired[i] * math.log(lc_p) + (1 - Desired[i]) * math.log(1 - lc_p)
    return loss
# Gradient ascent
def GDM_LC(Input1, Input2, Desired, w0=0, w1=0, w2=0, Alpha=0.1, err=1e-9):
    w0_list = []
    w1_list = []
    w2_list = []
    loss_list = []
    IterTime = 0
    while 1:
        # simultaneous update: all right-hand sides use the old weights
        w0, w1, w2 = \
            w0 + Alpha * Del_w0(Input1, Input2, Desired, w0, w1, w2), \
            w1 + Alpha * Del_w1(Input1, Input2, Desired, w0, w1, w2), \
            w2 + Alpha * Del_w2(Input1, Input2, Desired, w0, w1, w2)
        w0_list.append(w0)
        w1_list.append(w1)
        w2_list.append(w2)
        loss_list.append(Loss(Input1, Input2, Desired, w0, w1, w2))
        IterTime = IterTime + 1
        # stop once consecutive w1 and w2 values change by less than err
        if IterTime > 3:
            if abs(w1_list[IterTime - 1] - w1_list[IterTime - 2]) < err:
                if abs(w2_list[IterTime - 1] - w2_list[IterTime - 2]) < err:
                    break
    return w0, w1, w2, w0_list, w1_list, w2_list, loss_list, IterTime
# Accuracy check
def bool_rate(Input1, Input2, Desired, w0, w1, w2):
    boolnum = len(Input1)
    boolT = 0
    for i in range(boolnum):
        if Desired[i] == 0:
            if w0 + w1 * Input1[i] + w2 * Input2[i] < 0:
                boolT += 1
        elif Desired[i] == 1:
            if w0 + w1 * Input1[i] + w2 * Input2[i] >= 0:
                boolT += 1
    return boolT / boolnum
# -----------------------------------------------------------------------------------
# Fit on the training set, then validate on the test set
Alpha = 0.1  # learning rate
err = 1e-9   # convergence tolerance
w0, w1, w2, w0_list, w1_list, w2_list, loss_list, IterTime = \
    GDM_LC(train_Input1, train_Input2, train_Desired, 0, 0, 0, Alpha, err)
print('Test set size: ' + str(len(test_Desired)))
print('Training set size: ' + str(len(train_Desired)))
print('Iterations: ' + str(IterTime))
print('Learning rate: ' + str(Alpha))
print('Tolerance: ' + str(err))
str_line = 'Separating line: ' + str(round(w0, 3)) + '+' + str(round(w1, 3)) + 'x+' + str(round(w2, 3)) + 'y=0'
str_train = 'Training-set accuracy: ' + str(round(100 * bool_rate(train_Input1, train_Input2, train_Desired, w0, w1, w2), 3)) + '%'
str_test = 'Test-set accuracy: ' + str(round(100 * bool_rate(test_Input1, test_Input2, test_Desired, w0, w1, w2), 3)) + '%'
print(str_line)
print(str_train)
print(str_test)
X = train_Input1M
Y = - w0 / w2 - w1 * X / w2
# -----------------------------------------------------------------------------------
# Plots
plt.figure('Linear classification fit (training set)')
plt.plot(X, Y, label=str_line)
for i in range(len(train_Desired)):
    if train_Desired[i] == 1:
        plt.scatter(train_Input1[i], train_Input2[i], c='r', marker='+')
    else:
        plt.scatter(train_Input1[i], train_Input2[i], c='b', marker='.')
plt.title('Training samples')
plt.xlabel('x (training set)')
plt.ylabel('y (training set)')
plt.legend()
plt.figure('Linear classification validation (test set)')
plt.plot(X, Y, label=str_line)
for i in range(len(test_Desired)):
    if test_Desired[i] == 1:
        plt.scatter(test_Input1[i], test_Input2[i], c='r', marker='+')
    else:
        plt.scatter(test_Input1[i], test_Input2[i], c='b', marker='.')
plt.title('Test samples')
plt.xlabel('x (test set)')
plt.ylabel('y (test set)')
plt.legend()
plt.figure('Convergence curves')
plt.subplot(1, 2, 1)
plt.title('Log-likelihood convergence')
plt.semilogx(loss_list, label='log-likelihood converges to ' + str(round(loss_list[-1], 3)))
plt.legend()
plt.subplot(1, 2, 2)
plt.title('Convergence of the three line coefficients')
plt.semilogx(w0_list, label='w0 converges to ' + str(round(w0_list[-1], 3)))
plt.semilogx(w1_list, label='w1 converges to ' + str(round(w1_list[-1], 3)))
plt.semilogx(w2_list, label='w2 converges to ' + str(round(w2_list[-1], 3)))
plt.legend()
plt.show()
# Predicted-probability surface (3D)
fig = plt.figure('Predicted-probability surface')
ax = fig.add_subplot(111, projection='3d')
x_ax = np.arange(-1, 1, 0.005)
y_ax = np.arange(0, 1, 0.005)
x_ax, y_ax = np.meshgrid(x_ax, y_ax)
z_ax = 1 / (1 + np.exp(-(w0 + w1 * x_ax + w2 * y_ax)))
ax.plot_surface(x_ax, y_ax, z_ax, rstride=1, cstride=1, cmap=cm.coolwarm)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.set_title(r'Probability surface $z=(1+e^{-(\hat{w}_0+\hat{w}_1x+\hat{w}_2y)})^{-1}$')
x_line = np.linspace(-1, 1, 10)
y_line = - w0 / w2 - w1 * x_line / w2
z_line = 1 / (1 + np.exp(-(w0 + w1 * x_line + w2 * y_line)))  # = 0.5 on the line
ax.plot(x_line, y_line, z_line, c='k', label='Separating line on the surface: ' + str(round(w0, 3)) + '+' + str(round(w1, 3)) + 'x+' + str(round(w2, 3)) + 'y=0')
ax.legend()
plt.show()
The output is:
The convergence curves of $w_0,w_1,w_2$ and $l$ are as follows:
The fitted probability surface for predicting new samples is then
$$z=\frac{1}{1+e^{8.937-2.590x-20.537y}}$$
For example, add an arbitrary new sample point $(0.6,0.6)$ and suppose we want to treat it as drawn from the point set $D$, i.e. assign it to the case $f(0.6,0.6)=1$. The confidence of this assignment is
$$p=\frac{1}{1+e^{8.937-2.590\times 0.6-20.537\times 0.6}}=99.29\%$$
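A one-off check of this number (again a throwaway snippet, not part of the listing above):

import math

w0_hat, w1_hat, w2_hat = -8.937, 2.590, 20.537  # converged weights from the run above
p = 1 / (1 + math.exp(-(w0_hat + w1_hat * 0.6 + w2_hat * 0.6)))
print(round(100 * p, 2))  # 99.29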
The general equation of a line, $w_0+w_1x+w_2y=0$, has three unknowns, so in principle infinitely many triples $(w_0,w_1,w_2)$ describe the same line. But if, in this example, we solve for the separating line with only two unknowns, $1+w_1x+w_2y=0$, the method above fails to converge to correct values. The reason is that the logistic function would then need an additional scale parameter $a$, as in $g(a;x)=\frac{1}{1+e^{-ax}}$. Fitting with the general form $w_0+w_1x+w_2y=0$ simply absorbs $a$ into the coefficients.
The parameter $a$ also characterizes the sample itself. Compare the following two samples:
Clearly the two classes in sample 1 are separated better than in sample 2, so when seeking the optimal separating line there is more room for confidence in the gap between sample 1's classes; once the optimal line is found, a given point's assignment to one side is highly certain. By contrast, the two classes in sample 2 overlap, so the confidence on either side of its optimal line is somewhat blurred, and the classification threshold is harder to draw.
We can therefore conclude that the $a$ parameters obtained from the logistic fits of sample 1 and sample 2, say $a_1$ and $a_2$, satisfy
$$a_1>a_2$$
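This relation can be seen in a toy experiment of my own construction (not from the data above): fit a one-dimensional logistic model $g(w_0+w_1x)$ by gradient ascent to a cleanly separated sample and to an overlapping one; the fitted slope $w_1$, which plays the role of $a$, comes out larger for the cleaner sample.

import numpy as np

def fit_slope(x, f, steps=5000, alpha=0.1):
    # 1-D logistic regression g(w0 + w1*x) trained by gradient ascent;
    # the slope w1 plays the role of the steepness parameter a
    w0, w1 = 0.0, 0.0
    for _ in range(steps):
        g = 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))
        w0 += alpha * np.sum(f - g)
        w1 += alpha * np.sum(x * (f - g))
    return w1

f = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
x1 = np.array([-1.0, -0.8, -0.6, 0.6, 0.8, 1.0])  # sample 1: well separated
x2 = np.array([-1.0, -0.3, 0.2, -0.2, 0.3, 1.0])  # sample 2: overlapping
print(fit_slope(x1, f) > fit_slope(x2, f))        # True, i.e. a1 > a2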