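【Problem 1】 Four samples in a two-dimensional space come from two classes. The two samples of the first class are $(1,4)^T$ and $(2,3)^T$, and the two samples of the second class are $(4,1)^T$ and $(3,2)^T$, where the superscript $T$ denotes vector transpose. Use the normalized augmented sample representation and let the initial weight vector be $\mathbf{a}=(0,1,0)^T$, where the third component of $\mathbf{a}$ corresponds to the homogeneous coordinate of the samples. The gradient step size $\eta_k$ is fixed at 1. Use the batch perceptron criterion method to find the weight vector $\mathbf{a}$ of the linear discriminant function $g(\mathbf{y})=\mathbf{a}^T\mathbf{y}$. (Note: "normalized augmented sample representation" means that the samples, written in homogeneous coordinates, are normalized.)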
【Solution】 In the normalized augmented representation every sample is extended with a homogeneous coordinate of 1, and the samples of the second class are additionally multiplied by $-1$. The first-class samples become $x_1=[1,4,1]$ and $x_2=[2,3,1]$; the second-class samples become $x_3=[-4,-1,-1]$ and $x_4=[-3,-2,-1]$. The initial weight vector is $[0,1,0]$. Evaluating the discriminant on all samples gives:

$$
\begin{gathered}
x_1:[0,1,0][1,4,1]^T=4>0 \\
x_2:[0,1,0][2,3,1]^T=3>0 \\
x_3:[0,1,0][-4,-1,-1]^T=-1<0 \\
x_4:[0,1,0][-3,-2,-1]^T=-2<0
\end{gathered}
$$

A correctly classified sample must give a positive value, so $x_3$ and $x_4$ are misclassified. With step size 1, the weight vector is updated to:

$$
[0,1,0]+[-4,-1,-1]+[-3,-2,-1]=[-7,-2,-2]
$$

Check whether all samples are now classified correctly:

$$
\begin{gathered}
x_1:[-7,-2,-2][1,4,1]^T=-17<0 \\
x_2:[-7,-2,-2][2,3,1]^T=-22<0 \\
x_3:[-7,-2,-2][-4,-1,-1]^T=32>0 \\
x_4:[-7,-2,-2][-3,-2,-1]^T=27>0
\end{gathered}
$$

Now $x_1$ and $x_2$ are misclassified. With step size 1, the weight vector is updated to:

$$
[-7,-2,-2]+[1,4,1]+[2,3,1]=[-4,5,0]
$$

Check again:

$$
\begin{gathered}
x_1:[-4,5,0][1,4,1]^T=16>0 \\
x_2:[-4,5,0][2,3,1]^T=7>0 \\
x_3:[-4,5,0][-4,-1,-1]^T=11>0 \\
x_4:[-4,5,0][-3,-2,-1]^T=2>0
\end{gathered}
$$

All four values are positive, so the final weight vector is $\mathbf{a}=[-4,5,0]$.
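The two updates above can be reproduced with a few lines of MATLAB. This is a minimal sketch, not part of the original solution: the variable names `Y`, `eta` and `history` and the cap of 100 iterations are illustrative assumptions.

```matlab
% Normalized augmented samples of Problem 1 (second class negated), one per row
Y = [ 1  4  1;
      2  3  1;
     -4 -1 -1;
     -3 -2 -1];
a = [0 1 0];        % initial weight vector, written as a row
eta = 1;            % fixed step size
history = a;        % assumed helper: every weight vector visited so far
for k = 1:100       % iteration cap is an assumption, not part of the problem
    wrong = (a * Y') <= 0;                 % samples with a'*y <= 0 are misclassified
    if ~any(wrong)
        break                              % all samples give a positive value
    end
    a = a + eta * sum(Y(wrong, :), 1);     % batch update: add all misclassified samples
    history = [history; a];
end
disp(history)       % rows: [0 1 0], [-7 -2 -2], [-4 5 0]
```

Summing along dimension 1 keeps the update a 1x3 row even when only one sample is misclassified.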
【Problem 2】 For multi-class classification, consider the one-vs-all technique, which builds $c$ linear discriminant functions:

$$
g_i(\mathbf{x})=\mathbf{w}_i^T \mathbf{x}+w_{i0}, \quad i=1,2,\ldots,c
$$

The decision rule is: assign $\mathbf{x}$ to class $\omega_i$ if $g_i(\mathbf{x})>g_j(\mathbf{x})$ for all $j \neq i$. Three pattern classifiers in a two-dimensional space have the discriminant functions:

$$
\begin{aligned}
& g_1(\mathbf{x})=-x_1+x_2 \\
& g_2(\mathbf{x})=x_1+x_2-1 \\
& g_3(\mathbf{x})=-x_2
\end{aligned}
$$

Draw the decision boundaries and explain why there is no region of classification ambiguity in this case.
【Solution】 By the decision rule, the region belonging to $\omega_1$ must satisfy $g_1(\mathbf{x})>g_2(\mathbf{x})$ and $g_1(\mathbf{x})>g_3(\mathbf{x})$, so the decision boundaries of $\omega_1$ are:

$$
\begin{gathered}
g_1(\mathbf{x})-g_2(\mathbf{x})=-2x_1+1=0 \\
g_1(\mathbf{x})-g_3(\mathbf{x})=-x_1+2x_2=0
\end{gathered}
$$

The remaining boundary, between $\omega_2$ and $\omega_3$, is:

$$
g_2(\mathbf{x})-g_3(\mathbf{x})=x_1+2x_2-1=0
$$

The three boundaries meet at the single point $(0.5, 0.25)$. Every other point off the boundaries has exactly one largest $g_i$ and is assigned to that class, so there is no ambiguous region.
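That the three boundaries share one point is not a coincidence. As a short worked derivation (added for illustration, not part of the original solution), the pairwise differences are linearly dependent:

$$
\begin{aligned}
\bigl(g_1-g_2\bigr)+\bigl(g_2-g_3\bigr) &= g_1-g_3, \\
(-2x_1+1)+(x_1+2x_2-1) &= -x_1+2x_2 .
\end{aligned}
$$

Any point lying on two of the boundaries therefore lies on the third. Solving $-2x_1+1=0$ gives $x_1=\tfrac12$, then $-x_1+2x_2=0$ gives $x_2=\tfrac14$, and indeed $x_1+2x_2-1=\tfrac12+\tfrac12-1=0$. At this point $g_1=g_2=g_3=-\tfrac14$.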
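% Draw the three decision boundaries of Problem 2
clc;
close all;
clear;
% Boundary between omega_1 and omega_2: x1 = 0.5, valid for x2 >= 0.25
plot(0.5*ones(1,100),linspace(0.25,5,100));
hold on;
% Boundary between omega_1 and omega_3: x2 = x1/2, valid for x1 <= 0.5
x1 = linspace(-5,0.5,100);
plot(x1,1/2*x1);
% Boundary between omega_2 and omega_3: x2 = (1 - x1)/2, valid for x1 >= 0.5
x2 = linspace(0.5,5,100);
plot(x2,(-x2+1)/2);
% Label the three decision regions
t = text(-3,3,'{\omega_1}');
t.FontSize = 24;
t1 = text(3,3,'{\omega_2}');
t1.FontSize = 24;
t2 = text(0.5,-2,'{\omega_3}');
t2.FontSize = 24;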
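As a sanity check on the region labels, the sketch below evaluates the three discriminant functions at the label positions used in the plot and at the intersection point. It is only an illustration; the names `W`, `pts`, `labels` and `g0` are assumptions and do not appear in the original script.

```matlab
% Rows of W hold the coefficients of [x1; x2; 1] for g1, g2, g3
W = [-1  1  0;
      1  1 -1;
      0 -1  0];
% Positions of the three text labels in the plot, one point per row
pts = [-3 3; 3 3; 0.5 -2];
G = W * [pts'; ones(1, size(pts, 1))];   % G(i,j) = g_i evaluated at point j
[~, labels] = max(G, [], 1);             % argmax rule: class with the largest g_i
disp(labels)                             % 1 2 3
% At the common intersection point all three discriminants coincide
g0 = W * [0.5; 0.25; 1];
disp(g0')                                % -0.25 -0.25 -0.25
```

Each label position falls in the region whose class wins the argmax, which agrees with the placement of the labels in the plot.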
clc;
close all;
clear;
% trainset1: augmented omega_1 samples (positive) and negated omega_2 samples
trainset1 = [0.1, 1.1, 1; 6.8, 7.1, 1; -3.5, -4.1, 1;
2.0, 2.7, 1; 4.1, 2.8, 1; 3.1, 5.0, 1;-0.8, -1.3, 1;
0.9, 1.2, 1; 5.0, 6.4, 1; 3.9, 4.0, 1;-7.1, -4.2, -1;
1.4, 4.3, -1; -4.5, -0.0, -1;-6.3, -1.6, -1;-4.2, -1.9, -1;-1.4, 3.2, -1;
-2.4, 4.0, -1;-2.5, 6.1, -1;-8.4, -3.7, -1;-4.1, -2.2, -1];
% trainset2: negated omega_2 samples and augmented omega_3 samples (positive)
trainset2 = [-7.1, -4.2, -1;
1.4, 4.3, -1; -4.5, -0.0, -1;-6.3, -1.6, -1;-4.2, -1.9, -1;-1.4, 3.2, -1;
-2.4, 4.0, -1;-2.5, 6.1, -1;-8.4, -3.7, -1;-4.1, -2.2, -1 ;-3.0, -2.9, 1;0.5, 8.7, 1;2.9, 2.1, 1;
-0.1, 5.2, 1;-4.0, 2.2, 1;-1.3, 3.7, 1;-3.4, 6.2, 1;-4.1, 3.4, 1;
-5.1, 1.6, 1;1.9, 5.1, 1];
omega_1 = [0,0,0];
omega_2 = [0,0,0];
learning_rate = 0.01;
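iteration1 = 0;
while iteration1 < 1000
    iteration1 = iteration1 + 1;
    misclassified = omega_1*trainset1' <= 0;   % samples with a'*y <= 0
    if ~any(misclassified)
        % Converged: show the iteration count and the weight vector
        iteration1
        omega_1
        break
    end
    % Batch update; sum over rows (dim 1) so that a single misclassified
    % sample still yields a 1x3 update instead of a scalar
    omega_1 = omega_1 + learning_rate*sum(trainset1(misclassified,:), 1);
    if iteration1 == 1000
        disp('Reached the maximum of 1000 iterations')
        omega_1
    end
end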
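iteration2 = 0;
while iteration2 < 1000
    iteration2 = iteration2 + 1;
    misclassified = omega_2*trainset2' <= 0;   % samples with a'*y <= 0
    if ~any(misclassified)
        % Converged: show the iteration count and the weight vector
        iteration2
        omega_2
        break
    end
    % Batch update; sum over rows (dim 1) so that a single misclassified
    % sample still yields a 1x3 update instead of a scalar
    omega_2 = omega_2 + learning_rate*sum(trainset2(misclassified,:), 1);
    if iteration2 == 1000
        disp('Reached the maximum of 1000 iterations')
        omega_2
    end
end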
clc;
close all;
clear;
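% Ho-Kashyap procedure: iterate on the margin vector b and the weight vector a
a = [0,0,0]'; % initial weight vector
b = ones(20,1)*0.01; % initial margin vector
bmin = ones(20,1)*0.001; % error threshold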
% Y1: augmented omega_1 samples (positive) stacked on negated omega_3 samples
Y1 = [0.1 1.1 1;
6.8 7.1 1;
-3.5 -4.1 1;
2.0 2.7 1;
4.1 2.8 1;
3.1 5.0 1;
-0.8 -1.3 1;
0.9 1.2 1;
5.0 6.4 1;
3.9 4.0 1;
3.0 2.9 -1;
-0.5 -8.7 -1;
-2.9 -2.1 -1;
0.1 -5.2 -1;
4.0 -2.2 -1;
1.3 -3.7 -1;
3.4 -6.2 -1;
4.1 -3.4 -1;
5.1 -1.6 -1;
-1.9 -5.1 -1];
% Y2: augmented omega_2 samples (positive) stacked on negated omega_4 samples
Y2 = [7.1 4.2 1;
-1.4 -4.3 1;
4.5 0.0 1;
6.3 1.6 1;
4.2 1.9 1;
1.4 -3.2 1;
2.4 -4.0 1;
2.5 -6.1 1;
8.4 3.7 1;
4.1 -2.2 1;
2.0 8.4 -1;
8.9 -0.2 -1;
4.2 7.7 -1;
8.5 3.2 -1;
6.7 4.0 -1;
0.5 9.2 -1;
5.3 6.7 -1;
8.7 6.4 -1;
7.1 9.7 -1;
8.0 6.3 -1];
kmax = 100000; % maximum number of iterations
learning_rate = 0.01;
iterations = 0; % iteration counter
e = [1 1 1]'; % error vector (placeholder, overwritten in the loop)
%======================%
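while 1
    e = Y1*a-b;                      % error vector for the current a and b
    e_plus = 1/2*(e + abs(e));       % positive part of the error
    b = b + 2*learning_rate*e_plus;  % margins can only increase
    a = (Y1'*Y1)\Y1'*b;              % least-squares solution for the new b
    iterations = iterations+1;
    if all(abs(e) <= bmin)           % every error component is within the threshold
        a
        b
        iterations
        break
    end
    if iterations == kmax
        disp('No solution found!')
        sprintf('Reached the maximum of %d iterations',kmax)
        disp('========================')
        break
    end
end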
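%======================%
% Note: a, b and iterations are not reset here, so this run continues from the
% state left by the first loop. If the first loop stopped at kmax, the counter
% is already past kmax and the check below can no longer trigger.
while 1
    e = Y2*a-b;                      % error vector for the current a and b
    e_plus = 1/2*(e + abs(e));       % positive part of the error
    b = b + 2*learning_rate*e_plus;  % margins can only increase
    a = (Y2'*Y2)\Y2'*b;              % least-squares solution for the new b
    iterations = iterations+1;
    if all(abs(e) <= bmin)           % every error component is within the threshold
        a
        b
        iterations
        break
    end
    if iterations == kmax
        disp('No solution found!')
        sprintf('Reached the maximum of %d iterations',kmax)
        break
    end
end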
Output:
No solution found!
ans =
'Reached the maximum of 100000 iterations'
========================
a =
0.0063
0.0050
0.0398
b =
0.1056
0.0105
0.0682
0.0875
0.0758
0.0326
0.0349
0.0250
0.1113
0.0546
0.0149
0.0156
0.0252
0.0298
0.0224
0.0100
0.0272
0.0471
0.0535
0.0422
iterations =
122298
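【Analysis】
Classes 1 and 3 are not linearly separable, so the algorithm cannot converge on that pair and stops at the iteration limit. The second run (classes 2 and 4) does converge. Because `iterations` is not reset between the two loops, the reported count of 122298 includes the 100000 iterations of the first run.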
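The Ho-Kashyap procedure can also signal non-separability directly: if the error vector $e = Ya - b$ is non-zero but has no positive component, the samples are not linearly separable. The sketch below illustrates that test on two toy problems. It is an assumption-laden illustration, not the author's script: the toy matrices, the names `problems` and `status`, the step size 0.5, and the use of `pinv` (chosen because the second toy matrix is rank deficient) are all hypothetical. On real data, rounding errors can leave tiny positive components in `e`, so the exact test may need a tolerance.

```matlab
% Two toy problems; rows are normalized augmented samples (hypothetical data)
problems = {[1 1; 1 -1], ...    % separable
            [1 1; -1 -1]};      % the same point appears in both classes
eta = 0.5; bmin = 1e-3; kmax = 1000;
status = cell(1, numel(problems));
for p = 1:numel(problems)
    Y = problems{p};
    b = 0.01 * ones(size(Y, 1), 1);       % initial margin vector
    status{p} = 'kmax reached';
    for k = 1:kmax
        a = pinv(Y) * b;                  % least-squares weights for the current b
        e = Y * a - b;                    % error vector
        if all(abs(e) <= bmin)
            status{p} = 'separable';      % Y*a is within bmin of b, and b > bmin
            break
        end
        if all(e <= 0)
            status{p} = 'not separable';  % non-zero error with no positive part
            break
        end
        b = b + 2 * eta * (e + abs(e)) / 2;   % raise b by the positive part of e
    end
    fprintf('problem %d: %s\n', p, status{p});
end
```

Resetting `b` at the start of each problem keeps the two runs independent of each other.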
Code used to plot the distribution of the four classes:
clc;
close all;
clear;
omega1_x = [0.1,6.8,-3.5,2.0,4.1,3.1,-0.8,0.9,5.0,3.9];
omega1_y = [1.1,7.1,-4.1,2.7,2.8,5.0,-1.3,1.2,6.4,4.0];
omega2_x = [7.1,-1.4,4.5,6.3,4.2,1.4,2.4,2.5,8.4,4.1];
omega2_y = [4.2,-4.3,0.0,1.6,1.9,-3.2,-4.0,-6.1,3.7,-2.2];
omega3_x = [-3.0,0.5,2.9,-0.1,-4.0,-1.3,-3.4,-4.1,-5.1,1.9];
omega3_y = [-2.9,8.7,2.1,5.2,2.2,3.7,6.2,3.4,1.6,5.1];
omega4_x = [-2.0,-8.9,-4.2,-8.5,-6.7,-0.5,-5.3,-8.7,-7.1,-8.0];
omega4_y = [-8.4,0.2,-7.7,-3.2,-4.0,-9.2,-6.7,-6.4,-9.7,-6.3];
figure();
scatter(omega1_x,omega1_y,'filled');
hold on
scatter(omega2_x,omega2_y,'filled');
hold on
scatter(omega3_x,omega3_y,'filled');
hold on
scatter(omega4_x,omega4_y,'filled');
hold off
legend('\omega_1','\omega_2','\omega_3','\omega_4');
clc;
close all;
clear;
% Multi-class MSE classifier. Y: one-hot targets, 8 training samples for each of the 4 classes
Y = [1 0 0 0;1 0 0 0;1 0 0 0;1 0 0 0;1 0 0 0;1 0 0 0;1 0 0 0;1 0 0 0;
0 1 0 0;0 1 0 0;0 1 0 0;0 1 0 0;0 1 0 0;0 1 0 0;0 1 0 0;0 1 0 0;
0 0 1 0;0 0 1 0;0 0 1 0;0 0 1 0;0 0 1 0;0 0 1 0;0 0 1 0;0 0 1 0;
0 0 0 1;0 0 0 1;0 0 0 1;0 0 0 1;0 0 0 1;0 0 0 1;0 0 0 1;0 0 0 1;]';
% X_hat: augmented training samples (first 8 of each class), one per column after the transpose
X_hat = [0.1 1.1 1;6.8 7.1 1;-3.5 -4.1 1;2.0 2.7 1;4.1 2.8 1;3.1 5.0 1;-0.8 -1.3 1;0.9 1.2 1;
7.1 4.2 1;-1.4 -4.3 1;4.5 0.0 1;6.3 1.6 1;4.2 1.9 1;1.4 -3.2 1;2.4 -4.0 1;2.5 -6.1 1;-3.0 -2.9 1;
0.5 8.7 1;2.9 2.1 1;-0.1 5.2 1;-4.0 2.2 1;-1.3 3.7 1;-3.4 6.2 1;-4.1 3.4 1;-2.0 -8.4 1;-8.9 0.2 1;
-4.2 -7.7 1;-8.5 -3.2 1;-6.7 -4.0 1;-0.5 -9.2 1;-5.3 -6.7 1;-8.7 -6.4 1]';
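% Least-squares weight matrix: solves (X_hat*X_hat') * W_hat = X_hat*Y'
W_hat = (X_hat*X_hat')\X_hat*Y';
% Test samples: the last two samples of each class, in class order
X_test = [5.0 6.4 1;3.9 4.0 1;8.4 3.7 1;4.1 -2.2 1;
-5.1 1.6 1;1.9 5.1 1;-7.1 -9.7 1; -8.0 -6.3 1]';
[a,b] = max(W_hat'*X_test); % b holds the predicted class index of each test sample
b
The predicted labels match the true classes of all eight test samples, so the accuracy is 100%.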