Solutions to the Chapter 7 Exercises of 《统计学习方法》

7.2

$$
\begin{aligned}
\min_{w,b}\quad & \frac{1}{2}\|w\|^2 \\
\text{s.t.}\quad & y_i(w \cdot x_i + b) \ge 1,\quad i=1,2,\ldots,5
\end{aligned}
$$

Substituting the five training points $(x_1,y_1),\ldots,(x_5,y_5)$ gives

$$
\begin{aligned}
\min\quad & \frac{1}{2}(w_1^2 + w_2^2) \\
\text{s.t.}\quad & w_1 + 2w_2 + b \ge 1 \quad \text{①} \\
& 2w_1 + 3w_2 + b \ge 1 \quad \text{②} \\
& 3w_1 + 3w_2 + b \ge 1 \quad \text{③} \\
& -2w_1 - w_2 - b \ge 1 \quad \text{④} \\
& -3w_1 - 2w_2 - b \ge 1 \quad \text{⑤}
\end{aligned}
$$

Adding the constraints pairwise eliminates $b$:

$$
\begin{aligned}
\text{①} + \text{④}: &\quad -w_1 + w_2 \ge 2 \\
\text{①} + \text{⑤}: &\quad -2w_1 \ge 2 \\
\text{②} + \text{④}: &\quad 2w_2 \ge 2 \\
\text{②} + \text{⑤}: &\quad -w_1 + w_2 \ge 2 \\
\text{③} + \text{④}: &\quad w_1 + 2w_2 \ge 2 \\
\text{③} + \text{⑤}: &\quad w_2 \ge 2
\end{aligned}
$$

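From these inequalities,

$$
\begin{aligned}
-w_1 + w_2 &\ge 2 \\
w_1 &\le -1 \\
w_2 &\ge 2
\end{aligned}
$$

Since $w_1 \le -1$ implies $w_1^2 \ge 1$ and $w_2 \ge 2$ implies $w_2^2 \ge 4$, the objective $w_1^2 + w_2^2$ is minimized by taking $w_1 = -1,\ w_2 = 2$, which also satisfies $-w_1 + w_2 \ge 2$.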

Substituting these values into ① to ⑤, constraints ① and ③ require $b \ge -2$ while constraint ⑤ requires $b \le -2$, so $b = -2$.

$$
y_1(w \cdot x_1 + b) = y_3(w \cdot x_3 + b) = y_5(w \cdot x_5 + b) = 1
$$

The support vectors are $x_1=(1,2)^T,\ x_3=(3,3)^T,\ x_5=(3,2)^T$.

The maximum-margin separating hyperplane is $-x_1 + 2x_2 - 2 = 0$ (here $x_1, x_2$ denote the two coordinates of an input $x$).

The classification decision function is $f(x) = \operatorname{sign}(-x_1 + 2x_2 - 2)$.
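The result can be sanity-checked numerically. The following is a minimal sketch, assuming the five training points are the ones read off from the coefficients of constraints ① to ⑤ (three positive, two negative); the helper name `functional_margin` is illustrative only. It confirms that the candidate $(w, b)$ is feasible and finds the points whose constraint holds with equality.

```python
# Training points read off from constraints (1)-(5); labels +1 for the first three, -1 for the last two.
X = [(1, 2), (2, 3), (3, 3), (2, 1), (3, 2)]
y = [1, 1, 1, -1, -1]

# Candidate solution from the derivation above.
w = (-1, 2)
b = -2


def functional_margin(x, label, w, b):
    """Return y * (w . x + b) for a single training point."""
    return label * (w[0] * x[0] + w[1] * x[1] + b)


margins = [functional_margin(x, label, w, b) for x, label in zip(X, y)]

# Feasibility: every functional margin must be at least 1.
feasible = all(m >= 1 for m in margins)

# Support vectors are the points whose constraint is active (margin exactly 1).
support_vectors = [x for x, m in zip(X, margins) if m == 1]

print(margins)          # [1, 2, 1, 2, 1]
print(feasible)         # True
print(support_vectors)  # [(1, 2), (3, 3), (3, 2)]
```

This only verifies feasibility and the active constraints; optimality follows from the bounds $w_1 \le -1$ and $w_2 \ge 2$ derived above.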

7.3

$$
\begin{aligned}
\min_{w,b,\xi}\quad & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i^2 \\
\text{s.t.}\quad & y_i(w \cdot x_i + b) \ge 1 - \xi_i,\quad i=1,2,\ldots,N \\
& \xi_i \ge 0,\quad i=1,2,\ldots,N
\end{aligned}
$$

The corresponding Lagrangian is

$$
L(w,b,\xi,\alpha,\gamma) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i^2 + \sum_{i=1}^N \alpha_i\bigl(1 - \xi_i - y_i(w \cdot x_i + b)\bigr) - \sum_{i=1}^N \gamma_i \xi_i
$$

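Setting the partial derivatives with respect to the primal variables to zero (the stationarity part of the KKT conditions) gives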

$$
\begin{aligned}
\frac{\partial L}{\partial w} &= w - \sum_{i=1}^N \alpha_i y_i x_i = 0 \\
\frac{\partial L}{\partial b} &= -\sum_{i=1}^N \alpha_i y_i = 0 \\
\frac{\partial L}{\partial \xi_i} &= 2C\xi_i - \alpha_i - \gamma_i = 0
\end{aligned}
$$

Therefore

$$
\begin{aligned}
w &= \sum_{i=1}^N \alpha_i y_i x_i \\
\sum_{i=1}^N \alpha_i y_i &= 0 \\
2C\xi_i &= \alpha_i + \gamma_i
\end{aligned}
$$

Substituting these into the Lagrangian gives

$$
\begin{aligned}
\min_{w,b,\xi} L(w,b,\xi,\alpha,\gamma)
&= \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N \xi_i^2 + \sum_{i=1}^N \alpha_i - \sum_{i=1}^N (\alpha_i+\gamma_i)\xi_i - \sum_{i=1}^N \alpha_i y_i\, w \cdot x_i - \sum_{i=1}^N \alpha_i y_i b \\
&= -\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \alpha_i - \frac{1}{2}\sum_{i=1}^N (\alpha_i+\gamma_i)\xi_i \\
&= -\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \alpha_i - \frac{1}{2}\sum_{i=1}^N (\alpha_i+\gamma_i)\frac{\alpha_i+\gamma_i}{2C} \\
&= -\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \alpha_i - \frac{1}{4C}\sum_{i=1}^N (\alpha_i+\gamma_i)^2
\end{aligned}
$$

The dual problem is

$$
\begin{aligned}
\max_{\alpha,\gamma}\quad & W(\alpha,\gamma) = -\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \alpha_i - \frac{1}{4C}\sum_{i=1}^N (\alpha_i+\gamma_i)^2 \\
\text{s.t.}\quad & \sum_{i=1}^N \alpha_i y_i = 0 \\
& \alpha_i \ge 0,\ \gamma_i \ge 0,\quad i=1,2,\ldots,N
\end{aligned}
$$

Since $\alpha_i \ge 0$, the objective is nonincreasing in each $\gamma_i \ge 0$, so the maximum is attained at $\gamma_i = 0$ and the last term reduces to $-\frac{1}{4C}\sum_{i=1}^N \alpha_i^2$.
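The substitution step in the derivation can be checked numerically. The sketch below uses hypothetical toy data and multipliers (none of these numbers come from the exercise); the only requirement is that $\sum_i \alpha_i y_i = 0$ and $\alpha_i, \gamma_i \ge 0$. It computes $w$ and $\xi$ from the stationarity conditions, evaluates the Lagrangian directly, and compares it with the closed-form expression in $\alpha$ and $\gamma$.

```python
# Hypothetical toy data and multipliers, chosen only for illustration.
X = [(1.0, 2.0), (2.0, 3.0), (2.0, 1.0), (3.0, 2.0)]
y = [1, 1, -1, -1]
alpha = [0.5, 1.0, 0.7, 0.8]  # satisfies sum(alpha_i * y_i) = 0
gamma = [0.2, 0.0, 0.1, 0.3]  # arbitrary nonnegative multipliers
C = 2.0
b = 0.37                      # arbitrary: its term vanishes when sum(alpha_i * y_i) = 0
n = len(X)


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Stationarity conditions: w = sum_i alpha_i y_i x_i and 2 C xi_i = alpha_i + gamma_i.
w = tuple(sum(alpha[i] * y[i] * X[i][d] for i in range(n)) for d in range(2))
xi = [(alpha[i] + gamma[i]) / (2 * C) for i in range(n)]

# Lagrangian evaluated directly from its definition.
lagrangian = (
    0.5 * dot(w, w)
    + C * sum(s * s for s in xi)
    + sum(alpha[i] * (1 - xi[i] - y[i] * (dot(w, X[i]) + b)) for i in range(n))
    - sum(gamma[i] * xi[i] for i in range(n))
)

# Closed form obtained after substitution.
quad = sum(
    alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
    for i in range(n)
    for j in range(n)
)
closed_form = (
    -0.5 * quad
    + sum(alpha)
    - sum((alpha[i] + gamma[i]) ** 2 for i in range(n)) / (4 * C)
)

print(abs(lagrangian - closed_form) < 1e-9)
```

The check compares with a small tolerance because the values are floating-point; it illustrates the algebra and is not a solver for the dual problem.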

7.4

We proceed by induction on $p$.

For $p=1$, $K(x,z) = x \cdot z$, so we can take $\phi_1(x) = x$.

Assume that for $p=k$ there is a feature map $\phi_k$ such that $K(x,z) = (x \cdot z)^k = \phi_k(x) \cdot \phi_k(z)$.

For $p=k+1$, $K(x,z) = (x \cdot z)^{k+1} = (x \cdot z)^{k}(x \cdot z) = \bigl(\phi_k(x) \cdot \phi_k(z)\bigr)(x \cdot z)$.

Write $\phi_k(x) = (f_1(x), f_2(x), \ldots, f_m(x))^T$ and $x = (x_1, x_2, \ldots, x_n)^T$. Then

$$
\begin{aligned}
K(x,z) &= \bigl(f_1(x)f_1(z) + f_2(x)f_2(z) + \ldots + f_m(x)f_m(z)\bigr)(x_1 z_1 + x_2 z_2 + \ldots + x_n z_n) \\
&= f_1(x)f_1(z)(x_1 z_1 + x_2 z_2 + \ldots + x_n z_n) + f_2(x)f_2(z)(x_1 z_1 + x_2 z_2 + \ldots + x_n z_n) \\
&\quad + \ldots + f_m(x)f_m(z)(x_1 z_1 + x_2 z_2 + \ldots + x_n z_n) \\
&= (f_1(x)x_1)(f_1(z)z_1) + (f_1(x)x_2)(f_1(z)z_2) + \ldots + (f_1(x)x_n)(f_1(z)z_n) \\
&\quad + (f_2(x)x_1)(f_2(z)z_1) + \ldots + (f_2(x)x_n)(f_2(z)z_n) \\
&\quad + \ldots + (f_m(x)x_1)(f_m(z)z_1) + \ldots + (f_m(x)x_n)(f_m(z)z_n) \\
&:= \phi_{k+1}(x) \cdot \phi_{k+1}(z)
\end{aligned}
$$

where $\phi_{k+1}(x) = (f_1(x)x_1, f_1(x)x_2, \ldots, f_1(x)x_n, f_2(x)x_1, \ldots, f_2(x)x_n, \ldots, f_m(x)x_1, \ldots, f_m(x)x_n)^T$.

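Hence, for every positive integer $p$, $K(x,z) = (x \cdot z)^p$ can be written as an inner product $\phi_p(x) \cdot \phi_p(z)$ of feature vectors, so it is a positive definite kernel.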
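The inductive construction of the feature map can be illustrated with a short sketch. The function name `feature_map` and the test vectors are assumptions chosen for illustration; the loop mirrors the step from $\phi_k$ to $\phi_{k+1}$, multiplying every component $f_i(x)$ by every coordinate $x_j$.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def feature_map(x, p):
    """Build phi_p(x) inductively: phi_1(x) = x, and phi_{k+1}(x) lists
    f_i(x) * x_j for every component f_i of phi_k(x) and every coordinate x_j."""
    phi = list(x)
    for _ in range(p - 1):
        phi = [f * xj for f in phi for xj in x]
    return phi


# Hypothetical integer vectors, so the comparison below is exact.
x = (1, 2, 3)
z = (4, -1, 2)

for p in range(1, 5):
    kernel_value = dot(x, z) ** p
    feature_value = dot(feature_map(x, p), feature_map(z, p))
    print(p, kernel_value, feature_value)
```

The feature vector has $n^p$ components, so this explicit map is only practical for small $n$ and $p$; evaluating $(x \cdot z)^p$ directly avoids building it.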
