Pattern Recognition Homework


Name: 蔡少斐
Student ID: 2019E8013261007
Affiliation: Institute of Computing Technology

1. Show the relationship between a discriminative classifier (such as logistic regression) and the Gaussian naive Bayes classifier of the specific form above: its posterior takes exactly the form assumed by logistic regression.

Applying the more general derivation from Question 2:

the corresponding parameters are:

Quadratic term:

$$v=\left[\frac{\sigma_{11}^2-\sigma_{10}^2}{2\sigma_{11}^2\sigma_{10}^2},\dots,\frac{\sigma_{D1}^2-\sigma_{D0}^2}{2\sigma_{D1}^2\sigma_{D0}^2}\right]$$

Linear term:

$$w=\left[\frac{\sigma_{10}^2\mu_{11}-\sigma_{11}^2\mu_{10}}{\sigma_{11}^2\sigma_{10}^2},\dots,\frac{\sigma_{D0}^2\mu_{D1}-\sigma_{D1}^2\mu_{D0}}{\sigma_{D1}^2\sigma_{D0}^2}\right]$$

Constant term:

$$b=\ln\frac{\pi}{1-\pi}+\sum_i \ln\frac{\sigma_{i0}}{\sigma_{i1}}+\sum_i \frac{\sigma_{i1}^2\mu_{i0}^2-\sigma_{i0}^2\mu_{i1}^2}{2\sigma_{i1}^2\sigma_{i0}^2}$$

where

$$f(x)=P(y=1\mid X)=\frac{1}{1+\exp\!\left(-\left(\sum_i v_ix_i^2+\sum_i w_ix_i+b\right)\right)}$$

Since $\sigma_{i0}=\sigma_{i1}=\sigma_{i}$,

we find $v=0$.

The quadratic term vanishes, and the linear and constant terms become:

Linear term:

$$w=\left[\frac{\mu_{11}-\mu_{10}}{\sigma_{1}^2},\dots,\frac{\mu_{D1}-\mu_{D0}}{\sigma_{D}^2}\right]$$

Constant term:

$$b=\ln\frac{\pi}{1-\pi}+\sum_i \frac{\mu_{i0}^2-\mu_{i1}^2}{2\sigma_{i}^2}$$

$$f(x)=\frac{1}{1+\exp\!\left(-\left(\sum_i w_ix_i+b\right)\right)}$$

which is exactly the form of logistic regression.
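The identity above can be checked numerically. The sketch below uses hypothetical means, shared variances, and prior (all randomly drawn, not from the homework data) to compare the generative Gaussian naive Bayes posterior against the sigmoid with the derived $w$ and $b$:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 3
pi = 0.3                                             # hypothetical prior P(y=1)
mu0, mu1 = rng.normal(size=D), rng.normal(size=D)    # per-feature class means
sigma = rng.uniform(0.5, 2.0, size=D)                # shared per-feature std devs

def gnb_posterior(x):
    """P(y=1|x) computed directly from the generative model."""
    logp1 = np.log(pi) + np.sum(-0.5 * ((x - mu1) / sigma) ** 2 - np.log(sigma))
    logp0 = np.log(1 - pi) + np.sum(-0.5 * ((x - mu0) / sigma) ** 2 - np.log(sigma))
    return 1.0 / (1.0 + np.exp(logp0 - logp1))

# parameters derived above
w = (mu1 - mu0) / sigma ** 2
b = np.log(pi / (1 - pi)) + np.sum((mu0 ** 2 - mu1 ** 2) / (2 * sigma ** 2))

x = rng.normal(size=D)
lr = 1.0 / (1.0 + np.exp(-(w @ x + b)))              # logistic-regression form
diff = abs(gnb_posterior(x) - lr)
print(diff)                                          # agrees up to float error
```

The two quantities match to floating-point precision for any test point, confirming the reduction.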


2. If the class-conditional densities are replaced by more general Gaussians (per-class variances), does the posterior still have the logistic regression form?

The generative Gaussian naive Bayes classifier gives:

$$P(y=1\mid X)=\frac{P(X\mid y=1)P(y=1)}{P(X)}=\frac{P(X\mid y=1)P(y=1)}{P(X\mid y=1)P(y=1)+P(X\mid y=0)P(y=0)}$$

$$=\frac{1}{1+\frac{P(X\mid y=0)P(y=0)}{P(X\mid y=1)P(y=1)}}=\frac{1}{1+\exp\!\left(-\ln\frac{P(X\mid y=1)P(y=1)}{P(X\mid y=0)P(y=0)}\right)}$$

where

$$\ln\frac{P(X\mid y=1)P(y=1)}{P(X\mid y=0)P(y=0)}=\ln\frac{\pi}{1-\pi}+\ln\frac{P(X\mid y=1)}{P(X\mid y=0)}$$

By the naive Bayes (conditional independence) assumption, the likelihood ratio factorizes over features:

$$=\ln\frac{\pi}{1-\pi}+\sum_i \ln\frac{(\sqrt{2\pi}\,\sigma_{i1})^{-1}\exp\!\left(-(x_i-\mu_{i1})^2/2\sigma_{i1}^2\right)}{(\sqrt{2\pi}\,\sigma_{i0})^{-1}\exp\!\left(-(x_i-\mu_{i0})^2/2\sigma_{i0}^2\right)}$$

$$=\ln\frac{\pi}{1-\pi}+\sum_i\left(\ln\frac{\sigma_{i0}}{\sigma_{i1}}+x_i^2\,\frac{\sigma_{i1}^2-\sigma_{i0}^2}{2\sigma_{i1}^2\sigma_{i0}^2}+x_i\,\frac{\sigma_{i0}^2\mu_{i1}-\sigma_{i1}^2\mu_{i0}}{\sigma_{i1}^2\sigma_{i0}^2}+\frac{\sigma_{i1}^2\mu_{i0}^2-\sigma_{i0}^2\mu_{i1}^2}{2\sigma_{i1}^2\sigma_{i0}^2}\right)$$

Therefore, in general the exponent contains $x_i^2$ terms, so the posterior is not of the plain logistic regression form; only when $\sigma_{i1}^2=\sigma_{i0}^2$ do the $x_i^2$ terms vanish, and the form reduces exactly to logistic regression.

In the general case, the corresponding parameters are:

Quadratic term:

$$v=\left[\frac{\sigma_{11}^2-\sigma_{10}^2}{2\sigma_{11}^2\sigma_{10}^2},\dots,\frac{\sigma_{D1}^2-\sigma_{D0}^2}{2\sigma_{D1}^2\sigma_{D0}^2}\right]$$

Linear term:

$$w=\left[\frac{\sigma_{10}^2\mu_{11}-\sigma_{11}^2\mu_{10}}{\sigma_{11}^2\sigma_{10}^2},\dots,\frac{\sigma_{D0}^2\mu_{D1}-\sigma_{D1}^2\mu_{D0}}{\sigma_{D1}^2\sigma_{D0}^2}\right]$$

Constant term:

$$b=\ln\frac{\pi}{1-\pi}+\sum_i \ln\frac{\sigma_{i0}}{\sigma_{i1}}+\sum_i \frac{\sigma_{i1}^2\mu_{i0}^2-\sigma_{i0}^2\mu_{i1}^2}{2\sigma_{i1}^2\sigma_{i0}^2}$$

where

$$f(x)=P(y=1\mid X)=\frac{1}{1+\exp\!\left(-\left(\sum_i v_ix_i^2+\sum_i w_ix_i+b\right)\right)}$$
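As a sanity check on the general case, the sketch below (hypothetical randomly drawn parameters, not from the homework data) verifies that the generative posterior with per-class variances equals the sigmoid of the quadratic discriminant built from the derived $v$, $w$, $b$:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 3
pi = 0.4                                             # hypothetical prior P(y=1)
mu0, mu1 = rng.normal(size=D), rng.normal(size=D)    # per-feature class means
s0 = rng.uniform(0.5, 2.0, size=D)                   # class-0 per-feature std devs
s1 = rng.uniform(0.5, 2.0, size=D)                   # class-1 per-feature std devs

def gnb_posterior(x):
    """P(y=1|x) from the generative model with per-class variances."""
    logp1 = np.log(pi) + np.sum(-0.5 * ((x - mu1) / s1) ** 2 - np.log(s1))
    logp0 = np.log(1 - pi) + np.sum(-0.5 * ((x - mu0) / s0) ** 2 - np.log(s0))
    return 1.0 / (1.0 + np.exp(logp0 - logp1))

# parameters derived above
v = (s1 ** 2 - s0 ** 2) / (2 * s1 ** 2 * s0 ** 2)
w = (s0 ** 2 * mu1 - s1 ** 2 * mu0) / (s1 ** 2 * s0 ** 2)
b = (np.log(pi / (1 - pi)) + np.sum(np.log(s0 / s1))
     + np.sum((s1 ** 2 * mu0 ** 2 - s0 ** 2 * mu1 ** 2) / (2 * s1 ** 2 * s0 ** 2)))

x = rng.normal(size=D)
a = v @ (x ** 2) + w @ x + b                         # quadratic discriminant
diff = abs(gnb_posterior(x) - 1.0 / (1.0 + np.exp(-a)))
print(diff)
```

Note that because $v\neq 0$ here, the discriminant is quadratic in $x$; it collapses to the logistic-regression form only when `s0` and `s1` coincide.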


3. Does a non-naive Gaussian Bayes classifier (correlated features) still have the logistic regression form? Consider $D=2$, with both classes sharing the same $\sigma_1$, $\sigma_2$, and correlation coefficient $p$.

$$P(y=1\mid X)=\frac{P(x_1,x_2\mid y=1)P(y=1)}{P(X)}=\frac{P(x_1,x_2\mid y=1)P(y=1)}{P(x_1,x_2\mid y=1)P(y=1)+P(x_1,x_2\mid y=0)P(y=0)}$$

$$=\frac{1}{1+\frac{P(x_1,x_2\mid y=0)P(y=0)}{P(x_1,x_2\mid y=1)P(y=1)}}=\frac{1}{1+\exp(-e)}$$

where

$$e=\ln\frac{\pi}{1-\pi}+\ln\frac{P(x_1,x_2\mid y=1)}{P(x_1,x_2\mid y=0)}$$

Since the bivariate Gaussian density for class $k$ is

$$P(x_1,x_2\mid y=k)=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-p^2}}\exp\!\left(-\frac{1}{2(1-p^2)}\left[\frac{(x_1-\mu_{1k})^2}{\sigma_1^2}-\frac{2p(x_1-\mu_{1k})(x_2-\mu_{2k})}{\sigma_1\sigma_2}+\frac{(x_2-\mu_{2k})^2}{\sigma_2^2}\right]\right)$$

substituting it into the expression for $e$ gives:

$$2(1-p^2)\sigma_1^2\sigma_2^2\left(e-\ln\frac{\pi}{1-\pi}\right)=x_1^2(\sigma_2^2-\sigma_2^2)+x_2^2(\sigma_1^2-\sigma_1^2)+x_1x_2(-2p\sigma_1\sigma_2+2p\sigma_1\sigma_2)+x_1(-2\mu_{10}\sigma_2^2+2p\sigma_1\sigma_2\mu_{20}+2\mu_{11}\sigma_2^2-2p\sigma_1\sigma_2\mu_{21})+x_2(-2\mu_{20}\sigma_1^2+2p\sigma_1\sigma_2\mu_{10}+2\mu_{21}\sigma_1^2-2p\sigma_1\sigma_2\mu_{11})+\sigma_2^2(\mu_{10}^2-\mu_{11}^2)+\sigma_1^2(\mu_{20}^2-\mu_{21}^2)+2p\sigma_1\sigma_2(\mu_{11}\mu_{21}-\mu_{10}\mu_{20})$$

Because both classes share the same covariance, the $x_1^2$, $x_2^2$, and $x_1x_2$ coefficients all vanish, leaving:

$$=x_1(-2\mu_{10}\sigma_2^2+2p\sigma_1\sigma_2\mu_{20}+2\mu_{11}\sigma_2^2-2p\sigma_1\sigma_2\mu_{21})+x_2(-2\mu_{20}\sigma_1^2+2p\sigma_1\sigma_2\mu_{10}+2\mu_{21}\sigma_1^2-2p\sigma_1\sigma_2\mu_{11})+\sigma_2^2(\mu_{10}^2-\mu_{11}^2)+\sigma_1^2(\mu_{20}^2-\mu_{21}^2)+2p\sigma_1\sigma_2(\mu_{11}\mu_{21}-\mu_{10}\mu_{20})$$

Dividing by $2(1-p^2)\sigma_1^2\sigma_2^2$ and restoring the prior term gives:

$$b=\ln\frac{\pi}{1-\pi}+\frac{\sigma_2^2(\mu_{10}^2-\mu_{11}^2)+\sigma_1^2(\mu_{20}^2-\mu_{21}^2)+2p\sigma_1\sigma_2(\mu_{11}\mu_{21}-\mu_{10}\mu_{20})}{2(1-p^2)\sigma_1^2\sigma_2^2}$$

$$w_1=\frac{-2\mu_{10}\sigma_2^2+2p\sigma_1\sigma_2\mu_{20}+2\mu_{11}\sigma_2^2-2p\sigma_1\sigma_2\mu_{21}}{2(1-p^2)\sigma_1^2\sigma_2^2}$$

$$w_2=\frac{-2\mu_{20}\sigma_1^2+2p\sigma_1\sigma_2\mu_{10}+2\mu_{21}\sigma_1^2-2p\sigma_1\sigma_2\mu_{11}}{2(1-p^2)\sigma_1^2\sigma_2^2}$$

Then the posterior can be written as:

$$P(y=1\mid X)=\frac{1}{1+\exp(-(b+w_1x_1+w_2x_2))}$$

Therefore, the logistic regression form still holds, provided the two classes share the same covariance matrix.
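The two-dimensional result can likewise be verified numerically. The sketch below (hypothetical shared covariance, class means, and prior, all chosen for illustration) compares the exact bivariate-Gaussian posterior with the sigmoid of $b+w_1x_1+w_2x_2$ using the parameters above:

```python
import numpy as np

rng = np.random.default_rng(2)
pi = 0.35                                            # hypothetical prior P(y=1)
p = 0.6                                              # shared correlation coefficient
s1, s2 = 1.3, 0.8                                    # shared standard deviations
mu = {0: rng.normal(size=2), 1: rng.normal(size=2)}  # class means (mu_1k, mu_2k)
Sigma = np.array([[s1**2, p*s1*s2],
                  [p*s1*s2, s2**2]])                 # shared covariance matrix

def log_density(x, m):
    """Log of the bivariate Gaussian density N(x; m, Sigma)."""
    d = x - m
    return (-np.log(2 * np.pi) - 0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * d @ np.linalg.solve(Sigma, d))

def posterior(x):
    """P(y=1|x) computed directly from the generative model."""
    a = np.log(pi / (1 - pi)) + log_density(x, mu[1]) - log_density(x, mu[0])
    return 1.0 / (1.0 + np.exp(-a))

# parameters derived above
c = 2 * (1 - p ** 2) * s1 ** 2 * s2 ** 2
(m10, m20), (m11, m21) = mu[0], mu[1]
w1 = (-2*m10*s2**2 + 2*p*s1*s2*m20 + 2*m11*s2**2 - 2*p*s1*s2*m21) / c
w2 = (-2*m20*s1**2 + 2*p*s1*s2*m10 + 2*m21*s1**2 - 2*p*s1*s2*m11) / c
b = np.log(pi / (1 - pi)) + (s2**2*(m10**2 - m11**2) + s1**2*(m20**2 - m21**2)
                             + 2*p*s1*s2*(m11*m21 - m10*m20)) / c

x = rng.normal(size=2)
diff = abs(posterior(x) - 1.0 / (1.0 + np.exp(-(b + w1*x[0] + w2*x[1]))))
print(diff)
```

If the two classes were given different covariance matrices, the quadratic and cross terms would no longer cancel and this check would fail, which is exactly the caveat stated above.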

