For the two-class problem, the probability of error on a sample $x$ is:
$$p(e \mid x)=\begin{cases}P(w_2 \mid x), & x \in w_1\\P(w_1 \mid x), & x \in w_2\end{cases}$$
Average error rate:
$$P(e)=\int P(e \mid x)\,p(x)\,dx$$
Bayes formula:
$$P(w_i \mid x)=\frac{p(x, w_i)}{p(x)}=\frac{p(x \mid w_i)\,P(w_i)}{\sum_{j=1}^{2} p(x \mid w_j)\,P(w_j)},\quad i=1,2$$
Two-class minimum-error-rate Bayes decision, in four equivalent forms:
$$\text{if } P(w_1 \mid x) \gtrless P(w_2 \mid x), \text{ then } x \in \begin{cases}w_1\\w_2\end{cases}$$
$$\text{if } p(x \mid w_1)P(w_1) \gtrless p(x \mid w_2)P(w_2), \text{ then } x \in \begin{cases}w_1\\w_2\end{cases}$$
$$\text{if } l(x)=\frac{p(x \mid w_1)}{p(x \mid w_2)} \gtrless \lambda=\frac{P(w_2)}{P(w_1)}, \text{ then } x \in \begin{cases}w_1\\w_2\end{cases}$$
$$\text{if } h(x)=\ln\frac{p(x \mid w_2)}{p(x \mid w_1)} \lessgtr \ln\frac{P(w_1)}{P(w_2)}, \text{ then } x \in \begin{cases}w_1\\w_2\end{cases}$$
where $h(x)=-\ln l(x)$.
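As a minimal numeric sketch of the likelihood-ratio form: the 1-D Gaussian class-conditionals $p(x\mid w_1)=N(0,1)$, $p(x\mid w_2)=N(2,1)$ and the priors here are hypothetical examples, not from the text.

```python
import math

# Hypothetical setup: p(x|w1)=N(0,1), p(x|w2)=N(2,1), priors P(w1)=0.6, P(w2)=0.4.
def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

P_W1, P_W2 = 0.6, 0.4

def decide(x):
    """Decide w1 iff l(x) = p(x|w1)/p(x|w2) exceeds the threshold λ = P(w2)/P(w1)."""
    l = gauss(x, 0.0, 1.0) / gauss(x, 2.0, 1.0)
    return 1 if l > P_W2 / P_W1 else 2
```

Samples far to the left of both means still go to $w_1$ because its likelihood dominates there; the prior ratio only shifts the boundary.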
The error rate written out over the decision regions (note the pairing of each prior with its own class error rate):
$$P(e)=P(w_2)\int_{R_1}p(x \mid w_2)\,dx+P(w_1)\int_{R_2}p(x \mid w_1)\,dx=P(w_2)P_2(e)+P(w_1)P_1(e)$$
Minimum-error-rate Bayes decision rule for multiple classes:
$$\text{if } p(x \mid w_i)\,P(w_i)=\max_{j=1,2,\cdots,c} p(x \mid w_j)\,P(w_j), \text{ then } x \in w_i$$
Error rate for multi-class decisions:
$$P(e)=\sum_{i=1}^{c}\ \sum_{\substack{j=1\\ j \neq i}}^{c} P(w_j)\int_{R_i} p(x \mid w_j)\,dx$$
$$P(e)=1-P(c)=1-\sum_{j=1}^{c} P(w_j)\int_{R_j} p(x \mid w_j)\,dx$$
Derivation of the multi-class average error rate: $d$ dimensions, $c$ classes, $k$ possible decisions.
△△△△ Minimum-Risk Bayesian Decision △△△△
The expected loss incurred by a decision rule $\alpha$ over all possible samples $x$ in feature space:
$$R(\alpha)=\int R(\alpha \mid x)\,p(x)\,dx$$
Posterior probabilities via the Bayes formula:
$$P(w_j \mid x)=\frac{p(x \mid w_j)\,P(w_j)}{\sum_{i=1}^{c} p(x \mid w_i)\,P(w_i)},\quad j=1,2,\cdots,c$$
For a given sample $x$, the expected loss (conditional risk) of taking decision $\alpha_i$, $i=1,2,\cdots,k$:
$$R(\alpha_i \mid x)=E\left(\lambda(\alpha_i, w_j) \mid x\right)=\sum_{j=1}^{c} \lambda(\alpha_i, w_j)\,P(w_j \mid x),\quad i=1,2,\cdots,k$$
Minimum-risk Bayes decision for multiple classes:
$$\text{if } R(\alpha_i \mid x)=\min_{j=1,2,\cdots,k} R(\alpha_j \mid x), \text{ then } \alpha=\alpha_i$$
For two classes and two decisions, the minimum-risk Bayes decision in equivalent forms:
$$\text{if } \lambda_{11}P(w_1 \mid x)+\lambda_{12}P(w_2 \mid x) \lessgtr \lambda_{21}P(w_1 \mid x)+\lambda_{22}P(w_2 \mid x), \text{ then } x \in \begin{cases}w_1\\w_2\end{cases}$$
$$\text{if } (\lambda_{11}-\lambda_{21})P(w_1 \mid x) \lessgtr (\lambda_{22}-\lambda_{12})P(w_2 \mid x), \text{ then } x \in \begin{cases}w_1\\w_2\end{cases}$$
$$\text{if } \frac{P(w_1 \mid x)}{P(w_2 \mid x)}=\frac{p(x \mid w_1)P(w_1)}{p(x \mid w_2)P(w_2)} \gtrless \frac{\lambda_{12}-\lambda_{22}}{\lambda_{21}-\lambda_{11}}, \text{ then } x \in \begin{cases}w_1\\w_2\end{cases}$$
$$\text{if } l(x)=\frac{p(x \mid w_1)}{p(x \mid w_2)} \gtrless \frac{P(w_2)}{P(w_1)}\cdot\frac{\lambda_{12}-\lambda_{22}}{\lambda_{21}-\lambda_{11}}, \text{ then } x \in \begin{cases}w_1\\w_2\end{cases}$$
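A small sketch of the conditional-risk computation; the loss-table values are illustrative assumptions.

```python
# Hypothetical loss table: LAMBDA[i][j] = λ(α_i, w_j), the loss of deciding
# class i+1 when the true class is j+1.
LAMBDA = [[0.0, 10.0],   # deciding w1 is very costly if the truth is w2
          [1.0,  0.0]]   # deciding w2 costs little if the truth is w1

def min_risk_decision(posteriors):
    """Return the index of the decision α_i minimizing R(α_i|x) = Σ_j λ(α_i, w_j) P(w_j|x)."""
    risks = [sum(l * p for l, p in zip(row, posteriors)) for row in LAMBDA]
    return min(range(len(risks)), key=risks.__getitem__)
```

With posteriors $(0.9, 0.1)$ this rule still decides $w_2$ (risk $0.9$ vs $1.0$): an asymmetric loss can overturn the minimum-error-rate choice.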
Possible relations between the true state and the decision:

| Decision \ State | Positive | Negative |
|---|---|---|
| Positive | True positive (TP) | False positive (FP) |
| Negative | False negative (FN) | True negative (TN) |
Sensitivity: $Sn=\frac{TP}{TP+FN}$
Specificity: $Sp=\frac{TN}{TN+FP}$
Accuracy: $ACC=\frac{TP+TN}{TP+TN+FP+FN}$
Recall: $Rec=\frac{TP}{TP+FN}$
Precision: $Pre=\frac{TP}{TP+FP}$
F-measure: $F=\frac{2\,Rec\,Pre}{Rec+Pre}$
Type-I error rate (false positive rate): $\alpha=1-Sp=\frac{FP}{TN+FP}$
Type-II error rate (false negative rate): $\beta=1-Sn=\frac{FN}{TP+FN}$
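The measures above can be packed into one helper; the counts passed in are arbitrary example values.

```python
def metrics(tp, fp, fn, tn):
    """Compute the confusion-matrix measures defined above."""
    sn = tp / (tp + fn)                      # sensitivity = recall
    sp = tn / (tn + fp)                      # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)    # accuracy
    pre = tp / (tp + fp)                     # precision
    f = 2 * sn * pre / (sn + pre)            # F-measure
    return {"Sn": sn, "Sp": sp, "ACC": acc, "Pre": pre, "F": f,
            "alpha": 1 - sp, "beta": 1 - sn} # type-I / type-II error rates
```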
Neyman-Pearson decision rule:
$$\text{if } l(x)=\frac{p(x \mid w_1)}{p(x \mid w_2)} \gtrless \lambda, \text{ then } x \in \begin{cases}w_1\\w_2\end{cases}$$
For Gaussian or other simple distributions, $\lambda$ can be found analytically: it is the threshold that makes the decision regions satisfy the constraint below (fixing the rate at which $w_2$ samples are misclassified into $w_1$):
$$\int_{R_1} p(x \mid w_2)\,dx=\epsilon_0$$
In most cases $\lambda$ must be found numerically:
$$P_2(e)=1-\int_{0}^{\lambda} p(l \mid w_2)\,dl=\varepsilon_0$$
Since $\varepsilon_0$ is fixed and $P_2(e)$ is monotone in $\lambda$, a trial-and-error search suffices.
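The trial-and-error search can be sketched with bisection. The 1-D Gaussian setup ($p(x\mid w_1)=N(0,1)$, $p(x\mid w_2)=N(2,1)$) is an assumed example; there $l(x)$ is decreasing in $x$, so $R_1=\{x<c\}$ and fixing $\int_{R_1}p(x\mid w_2)\,dx=\epsilon_0$ reduces to solving a monotone 1-D equation for the boundary $c$ (from which $\lambda=l(c)$ follows).

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def find_boundary(eps0, lo=-10.0, hi=10.0):
    """Bisect for c such that the w2-into-w1 error, norm_cdf(c - 2), equals eps0."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid - 2.0) < eps0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```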
The multivariate normal density:
$$p(x)=\frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right\}$$
For class-conditional densities $p(x \mid w_i)\sim N(\mu_i,\Sigma_i)$, $i=1,2,\cdots,c$:
$$p(x \mid w_i)=\frac{1}{(2\pi)^{d/2}|\Sigma_i|^{1/2}}\exp\left\{-\frac{1}{2}(x-\mu_i)^{\top}\Sigma_i^{-1}(x-\mu_i)\right\}$$
For a finite set of samples, the parameters are estimated by:
$$\mu_i=E(x)=\frac{1}{N_i}\sum_{x_j \in H_i} x_j,\quad i=1,2,\cdots,c$$
$$\Sigma_i=E\left[(x-\mu_i)(x-\mu_i)^{\top}\right]=\frac{1}{N_i}\sum_{x_j \in H_i}(x_j-\mu_i)(x_j-\mu_i)^{\top},\quad i=1,2,\cdots,c$$
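The two estimators above, written out without any numerics library (the four sample points are made up):

```python
def estimate_gaussian(samples):
    """ML estimates: sample mean and (biased, 1/N) sample covariance,
    matching the formulas above. `samples` is a list of d-dim points."""
    n, d = len(samples), len(samples[0])
    mu = [sum(x[k] for x in samples) / n for k in range(d)]
    sigma = [[sum((x[a] - mu[a]) * (x[b] - mu[b]) for x in samples) / n
              for b in range(d)] for a in range(d)]
    return mu, sigma
```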
The equal-density contours of a multivariate normal form hyperellipsoids, whose points satisfy:
$$(x-\mu)^{\top}\Sigma^{-1}(x-\mu)=\text{const}$$
This motivates the squared Mahalanobis distance:
$$\gamma^2=(x-\mu)^{\top}\Sigma^{-1}(x-\mu)$$
The volume of the hyperellipsoid at Mahalanobis distance $\gamma$ is:
$$V=V_d\,|\Sigma|^{1/2}\,\gamma^d$$
where $V_d$ is the volume of the $d$-dimensional unit hypersphere:
$$V_d=\begin{cases}\dfrac{\pi^{d/2}}{(d/2)!}, & d \text{ even}\\[2ex]\dfrac{2^d\,\pi^{(d-1)/2}\left(\frac{d-1}{2}\right)!}{d!}, & d \text{ odd}\end{cases}$$
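The even/odd formula for $V_d$ is easy to check against the familiar low-dimensional values ($V_1=2$, $V_2=\pi$, $V_3=\frac{4}{3}\pi$):

```python
import math

def unit_ball_volume(d):
    """V_d from the even/odd cases above."""
    if d % 2 == 0:
        return math.pi ** (d // 2) / math.factorial(d // 2)
    k = (d - 1) // 2
    return 2 ** d * math.pi ** k * math.factorial(k) / math.factorial(d)
```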
A linear transform of a multivariate normal random vector is again multivariate normal: if $x \sim N(\mu,\Sigma)$ and $y=Ax$, then
$$y \sim N(A\mu,\ A\Sigma A^{\top})$$
The discriminant function is:
$$g_i(x)=\ln\left[p(x \mid w_i)P(w_i)\right]=-\frac{1}{2}(x-\mu_i)^{\top}\Sigma_i^{-1}(x-\mu_i)-\frac{d}{2}\ln 2\pi-\frac{1}{2}\ln|\Sigma_i|+\ln P(w_i)$$
Decision rule:
$$\text{if } g_k(x)=\max_{i=1,2,\cdots,c} g_i(x), \text{ then } x \in w_k$$
Decision surface:
$$g_i(x)=g_j(x)$$
In the 3×2 special cases below, we give the discriminant function, the decision rule (always: assign to the class with the largest discriminant), and the decision surface (always: set two discriminants equal):
i) $\Sigma_i=\sigma^2 I$
a) $P(w_i)=P(w_j)$
Discriminant:
$$g_i(x)=-\left\|x-\mu_i\right\|^2$$
Decision rule:
$$\text{if } g_k(x)=\max_{i=1,2,\cdots,c} g_i(x), \text{ then } x \in w_k$$
(i.e. $\min_{i=1,2,\cdots,c}\|x-\mu_i\|^2$: the minimum-distance classifier)
Decision surface:
$$(\mu_i-\mu_j)^{\top}\left(x-\frac{1}{2}(\mu_i+\mu_j)\right)=0$$
(i.e. $w^{\top}(x-x_0)=0$)
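The minimum-distance classifier is a few lines of code (the class means in the test are illustrative):

```python
def min_distance_classify(x, means):
    """Case Σ_i = σ²I with equal priors: maximizing g_i(x) = -||x - μ_i||²
    means assigning x to the class whose mean is nearest."""
    def sq_dist(mu):
        return sum((a - b) ** 2 for a, b in zip(x, mu))
    return min(range(len(means)), key=lambda i: sq_dist(means[i]))
```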
b) $P(w_i)\neq P(w_j)$
Discriminant:
$$g_i(x)=\left(\frac{\mu_i}{\sigma^2}\right)^{\top}x-\frac{1}{2\sigma^2}\mu_i^{\top}\mu_i+\ln P(w_i)=w_i^{\top}x+w_{i0}$$
Decision rule:
$$\text{if } g_k(x)=\max_{i=1,2,\cdots,c} g_i(x), \text{ then } x \in w_k$$
Decision surface:
$$(\mu_i-\mu_j)^{\top}\left(x-\left(\frac{1}{2}(\mu_i+\mu_j)-\frac{\sigma^2}{\|\mu_i-\mu_j\|^2}\ln\frac{P(w_i)}{P(w_j)}\,(\mu_i-\mu_j)\right)\right)=0$$
(i.e. $w^{\top}(x-x_0)=0$)
ii) $\Sigma_i=\Sigma$
a) $P(w_i)=P(w_j)$
Discriminant (minimum Mahalanobis distance; expanding and dropping terms common to all classes gives the linear form):
$$g_i(x)=-\frac{1}{2}(x-\mu_i)^{\top}\Sigma^{-1}(x-\mu_i)\ \Longrightarrow\ g_i(x)=\left(\Sigma^{-1}\mu_i\right)^{\top}x-\frac{1}{2}\mu_i^{\top}\Sigma^{-1}\mu_i$$
Decision rule:
$$\text{if } g_k(x)=\max_{i=1,2,\cdots,c} g_i(x), \text{ then } x \in w_k$$
Decision surface:
$$\left(\Sigma^{-1}(\mu_i-\mu_j)\right)^{\top}\left(x-\frac{1}{2}(\mu_i+\mu_j)\right)=0$$
(i.e. $w^{\top}(x-x_0)=0$)
b) $P(w_i)\neq P(w_j)$
Discriminant:
$$g_i(x)=\left(\Sigma^{-1}\mu_i\right)^{\top}x-\frac{1}{2}\mu_i^{\top}\Sigma^{-1}\mu_i+\ln P(w_i)=w_i^{\top}x+w_{i0}$$
Decision rule:
$$\text{if } g_k(x)=\max_{i=1,2,\cdots,c} g_i(x), \text{ then } x \in w_k$$
Decision surface:
$$\left[\Sigma^{-1}(\mu_i-\mu_j)\right]^{\top}\left(x-\left(\frac{1}{2}(\mu_i+\mu_j)-\frac{\ln\frac{P(w_i)}{P(w_j)}}{(\mu_i-\mu_j)^{\top}\Sigma^{-1}(\mu_i-\mu_j)}\,(\mu_i-\mu_j)\right)\right)=0$$
(i.e. $w^{\top}(x-x_0)=0$)
iii) The class covariance matrices differ
Discriminant:
$$g_i(x)=x^{\top}\left(-\frac{1}{2}\Sigma_i^{-1}\right)x+\left(\Sigma_i^{-1}\mu_i\right)^{\top}x-\frac{1}{2}\mu_i^{\top}\Sigma_i^{-1}\mu_i-\frac{1}{2}\ln|\Sigma_i|+\ln P(w_i)$$
(i.e. $g_i(x)=x^{\top}W_i x+w_i^{\top}x+w_{i0}$)
Decision rule:
$$\text{if } g_k(x)=\max_{i=1,2,\cdots,c} g_i(x), \text{ then } x \in w_k$$
Decision surface:
$$x^{\top}(W_i-W_j)x+(w_i-w_j)^{\top}x+w_{i0}-w_{j0}=0$$
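A sketch of the quadratic discriminant for the unequal-covariance case, restricted to diagonal $\Sigma_i$ so the inverse and determinant reduce to per-dimension terms (the parameters in the test are illustrative assumptions):

```python
import math

def quadratic_discriminant(x, mu, var, prior):
    """g_i(x) = -½(x-μ)ᵀΣ⁻¹(x-μ) - ½ ln|Σ| + ln P(w_i) for Σ = diag(var).
    The constant -(d/2) ln 2π is dropped, as it is common to all classes."""
    g = math.log(prior)
    for xk, mk, vk in zip(x, mu, var):
        g += -0.5 * (xk - mk) ** 2 / vk - 0.5 * math.log(vk)
    return g
```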
Error rate when the classes are normal with equal covariance matrices (derivation and computation):
Recall the negative log-likelihood-ratio form of the minimum-error-rate Bayes rule:
$$h(x)=-\ln l(x)=\ln\frac{p(x \mid w_2)}{p(x \mid w_1)}$$
It can be shown that $h(x)$ is normally distributed under each class, so the densities $p(h \mid w_1)$ and $p(h \mid w_2)$ can be written down directly.
Let
$$\eta=\frac{1}{2}(\mu_1-\mu_2)^{\top}\Sigma^{-1}(\mu_1-\mu_2)$$
Then for $p(h \mid w_1)$: $\eta_1=-\eta$, $\sigma_1^2=2\eta$; and for $p(h \mid w_2)$: $\eta_2=\eta$, $\sigma_2^2=2\eta$. Hence:
$$P_1(e)=\int_{t}^{+\infty} p(h \mid w_1)\,dh=\int_{\frac{t+\eta}{\sigma}}^{+\infty}\frac{1}{\sqrt{2\pi}}e^{-\frac{\zeta^2}{2}}\,d\zeta$$
$$P_2(e)=\int_{-\infty}^{t} p(h \mid w_2)\,dh=\int_{-\infty}^{\frac{t-\eta}{\sigma}}\frac{1}{\sqrt{2\pi}}e^{-\frac{\zeta^2}{2}}\,d\zeta$$
where $t=\ln\frac{P(w_1)}{P(w_2)}$ and $\sigma=\sqrt{2\eta}$.
Finally:
$$P(e)=P(w_1)\,P_1(e)+P(w_2)\,P_2(e)$$
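The formulas above can be evaluated numerically for the 1-D equal-variance case (the means, variance, and priors below are example values):

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bayes_error_equal_var(mu1, mu2, var, p1):
    """P(e) via η, σ = √(2η), t = ln(P(w1)/P(w2)) for 1-D equal-variance Gaussians."""
    p2 = 1.0 - p1
    eta = 0.5 * (mu1 - mu2) ** 2 / var
    sigma = math.sqrt(2.0 * eta)
    t = math.log(p1 / p2)
    p1e = 1.0 - norm_cdf((t + eta) / sigma)  # P1(e)
    p2e = norm_cdf((t - eta) / sigma)        # P2(e)
    return p1 * p1e + p2 * p2e
```

With equal priors and means two standard deviations apart this reproduces the textbook value $\Phi(-1)\approx 0.1587$.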
Error-rate estimation for high-dimensional independent random variables:
$$(h(x) \mid w_i)\sim N(\eta_i,\sigma_i^2)$$
where the per-dimension contributions add up:
$$\eta_i=\sum_{l=1}^{d}\eta_{il},\qquad \sigma_i^2=\sum_{l=1}^{d}\sigma_{il}^2$$
△△△△ Probability-Model Estimation for Discrete Variables △△△△
First-order Markov chain:
$$P(x_i \mid x_{i-1},x_{i-2},\cdots,x_1)=P(x_i \mid x_{i-1})$$
Transition probabilities:
$$a_{st}=P(x_i=t \mid x_{i-1}=s)$$
The probability of observing a given sequence (treating $x_0$ as a fixed begin state):
$$P(x)=P(x_0,x_1,\cdots,x_L)=\prod_{i=1}^{L} a_{x_{i-1}x_i}$$
Log-likelihood-ratio discrimination with first-order Markov chains:
$$S(x)=\log\frac{P(x \mid +)}{P(x \mid -)}=\sum_{i=1}^{L}\log\frac{a^{+}_{x_{i-1}x_i}}{a^{-}_{x_{i-1}x_i}}=\sum_{i=1}^{L}\beta_{x_{i-1}x_i}$$
Estimating the transition matrices from counts:
$$a^{+}_{st}=\frac{c^{+}_{st}}{\sum_{t'} c^{+}_{st'}},\qquad a^{-}_{st}=\frac{c^{-}_{st}}{\sum_{t'} c^{-}_{st'}}$$
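A minimal sketch of the count-based estimator and the log-odds score; the training sequences and the ± transition tables below are made-up examples:

```python
import math
from collections import Counter

def fit_transitions(seqs):
    """Estimate a_st = c_st / Σ_t' c_st' from adjacent-pair counts."""
    counts, totals = Counter(), Counter()
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[(a, b)] += 1
            totals[a] += 1
    return {pair: c / totals[pair[0]] for pair, c in counts.items()}

def log_odds(seq, a_pos, a_neg):
    """S(x) = Σ log(a⁺ / a⁻) over adjacent pairs of seq."""
    return sum(math.log(a_pos[p] / a_neg[p]) for p in zip(seq, seq[1:]))
```

A positive $S(x)$ favors the “+” model; in practice unseen transitions need smoothing (e.g. pseudocounts) before taking logarithms.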