Bayes Classifiers and a Python Application

Bayes

Bayes decision rule

To minimize the overall risk, it suffices to choose, for each sample, the class label that minimizes the conditional risk $R(c|x)$:
$$h^*(x)=\operatorname*{arg\,min}_{c\in\mathcal{Y}}R(c|x)\tag{1}$$
The resulting $h^*(x)$ is called the Bayes optimal classifier.
The conditional risk $R(c|x)$ is computed as:
$$R(c_i|x)=\sum_{j=1}^{N}\lambda_{ij}P(c_j|x)\tag{2}$$
where $N$ is the number of classes and $\lambda_{ij}$ is the loss incurred by labeling a sample whose true class is $c_j$ as $c_i$.
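To make the conditional risk concrete, here is a minimal sketch that evaluates it for a two-class problem. The class meanings, the posterior values, and the asymmetric loss matrix are all made up for illustration; the function name `conditional_risk` is hypothetical as well.

```python
import numpy as np

def conditional_risk(loss, posterior):
    """Return R(c_i | x) for every class i.

    loss[i, j] is lambda_ij: the cost of predicting class i when the true class is j.
    posterior[j] is P(c_j | x).
    """
    loss = np.asarray(loss, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    return loss @ posterior

# Hypothetical example: class 0 = "healthy", class 1 = "ill".
posterior = np.array([0.7, 0.3])

# Assumed costs: missing an illness (predict 0, truth 1) costs 5,
# a false alarm (predict 1, truth 0) costs 1.
loss = np.array([[0.0, 5.0],
                 [1.0, 0.0]])

risk = conditional_risk(loss, posterior)   # [5 * 0.3, 1 * 0.7]
best = int(np.argmin(risk))                # class with the smallest conditional risk
print(risk, best)
```

With these assumed costs the rule picks class 1 even though class 0 has the larger posterior, which shows that the risk-minimizing label depends on the loss matrix and not only on the posterior.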
If the goal is to minimize the classification error rate, the misclassification loss $\lambda_{ij}$ is the 0/1 loss:
$$\lambda_{ij}=\begin{cases}0, & \text{if } i=j\\ 1, & \text{otherwise}\end{cases}\tag{3}$$
The conditional risk $R(c|x)$ then expands to:
$$\begin{aligned}
R(c_i|x)&=1\cdot P(c_1|x)+\cdots+1\cdot P(c_{i-1}|x)+0\cdot P(c_i|x)\\
&\quad+1\cdot P(c_{i+1}|x)+\cdots+1\cdot P(c_N|x)\\
&=P(c_1|x)+\cdots+P(c_{i-1}|x)+P(c_{i+1}|x)+\cdots+P(c_N|x)
\end{aligned}\tag{4}$$
Since $\sum_{j=1}^{N}P(c_j|x)=1$, it follows that:
$$R(c_i|x)=1-P(c_i|x)\tag{5}$$
The Bayes optimal classifier that minimizes the error rate is therefore:
$$h^*(x)=\operatorname*{arg\,min}_{c\in\mathcal{Y}}R(c|x)=\operatorname*{arg\,min}_{c\in\mathcal{Y}}\bigl(1-P(c|x)\bigr)=\operatorname*{arg\,max}_{c\in\mathcal{Y}}P(c|x)\tag{6}$$
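The equivalence between minimizing the 0/1-loss risk and maximizing the posterior can be checked numerically. The sketch below draws an arbitrary posterior over four classes (the number of classes and the random seed are arbitrary choices) and compares both decision rules.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 4

# An arbitrary posterior P(c_j | x); Dirichlet samples are positive and sum to 1.
posterior = rng.dirichlet(np.ones(n_classes))

# 0/1 loss: lambda_ij = 0 if i == j, else 1.
zero_one_loss = 1.0 - np.eye(n_classes)

# Conditional risk R(c_i | x) = sum_j lambda_ij * P(c_j | x).
risk = zero_one_loss @ posterior

same_risk = np.allclose(risk, 1.0 - posterior)                    # R(c_i|x) = 1 - P(c_i|x)
same_choice = int(np.argmin(risk)) == int(np.argmax(posterior))   # argmin risk = argmax posterior
print(same_risk, same_choice)
```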

Maximum likelihood estimation of the parameters of a multivariate normal distribution

The log-likelihood function is:
$$LL(\theta_c)=\sum_{x\in D_c}\log P(x|\theta_c)\tag{7}$$
For convenience, take the base of the logarithm to be $e$, so that the log-likelihood becomes:
$$LL(\theta_c)=\sum_{x\in D_c}\ln P(x|\theta_c)\tag{8}$$
Assume the class-conditional density is normal, $P(x|\theta_c)=P(x|c)\sim\mathcal{N}(\mu_c,\sigma_c^2)$. Then:
$$P(x|\theta_c)=\cfrac{1}{\sqrt{(2\pi)^d|\Sigma_c|}}\exp\left(-\cfrac{1}{2}(x-\mu_c)^T\Sigma_c^{-1}(x-\mu_c)\right)\tag{9}$$
where $d$ is the dimension of $x$, $\Sigma_c=\sigma_c^2$ is the symmetric positive-definite covariance matrix, and $|\Sigma_c|$ is its determinant. Substituting this density into the log-likelihood gives:
$$LL(\theta_c)=\sum_{x\in D_c}\ln\left[\cfrac{1}{\sqrt{(2\pi)^d|\Sigma_c|}}\exp\left(-\cfrac{1}{2}(x-\mu_c)^T\Sigma_c^{-1}(x-\mu_c)\right)\right]\tag{10}$$
Let $|D_c|=N$ and write the samples in $D_c$ as $x_1,\dots,x_N$ (from here on $N$ is the number of samples in $D_c$, not the number of classes). The log-likelihood becomes:
$$\begin{aligned}
LL(\theta_c)&=\sum_{i=1}^{N}\ln\left[\cfrac{1}{\sqrt{(2\pi)^d|\Sigma_c|}}\exp\left(-\cfrac{1}{2}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right)\right]\\
&=\sum_{i=1}^{N}\ln\left[\cfrac{1}{\sqrt{(2\pi)^d}}\cdot\cfrac{1}{\sqrt{|\Sigma_c|}}\exp\left(-\cfrac{1}{2}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right)\right]\\
&=\sum_{i=1}^{N}\left\{\ln\cfrac{1}{\sqrt{(2\pi)^d}}+\ln\cfrac{1}{\sqrt{|\Sigma_c|}}+\ln\left[\exp\left(-\cfrac{1}{2}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right)\right]\right\}\\
&=\sum_{i=1}^{N}\left\{-\cfrac{d}{2}\ln(2\pi)-\cfrac{1}{2}\ln|\Sigma_c|-\cfrac{1}{2}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right\}\\
&=-\cfrac{Nd}{2}\ln(2\pi)-\cfrac{N}{2}\ln|\Sigma_c|-\cfrac{1}{2}\sum_{i=1}^{N}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)
\end{aligned}\tag{11}$$
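As a sanity check on the simplification, the sketch below compares the sum of per-sample log densities with the final closed form on synthetic data. The data, the parameter values, and the helper names `log_density` and `log_likelihood` are assumptions made for this illustration only.

```python
import numpy as np

def log_density(x, mu, sigma):
    """ln of the multivariate normal density at a single point x."""
    d = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)      # (x - mu)^T Sigma^{-1} (x - mu)
    _, logdet = np.linalg.slogdet(sigma)            # ln|Sigma|
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

def log_likelihood(X, mu, sigma):
    """Closed form: -Nd/2 ln(2 pi) - N/2 ln|Sigma| - 1/2 * sum of quadratic forms."""
    N, d = X.shape
    diff = X - mu
    quad = np.einsum("ij,ij->", diff @ np.linalg.inv(sigma), diff)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * N * d * np.log(2 * np.pi) - 0.5 * N * logdet - 0.5 * quad

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                        # synthetic samples, N = 50, d = 3
mu = np.array([0.1, -0.2, 0.3])
A = rng.normal(size=(3, 3))
sigma = A @ A.T + 3.0 * np.eye(3)                   # symmetric positive definite by construction

per_sample_total = sum(log_density(x, mu, sigma) for x in X)
closed_form = log_likelihood(X, mu, sigma)
print(per_sample_total, closed_form)
```

Both numbers should agree up to floating-point rounding, since the closed form is only an algebraic rearrangement of the per-sample sum.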
The maximum likelihood estimate $\hat{\theta}_c$ of the parameter $\theta_c$ is:
$$\hat{\theta}_c=\operatorname*{arg\,max}_{\theta_c}LL(\theta_c)\tag{12}$$
So it remains to find the $\hat{\mu}_c$ and $\hat{\Sigma}_c$ that maximize the log-likelihood $LL(\theta_c)$; together they give $\hat{\theta}_c$.
Taking the partial derivative of $LL(\theta_c)$ with respect to $\mu_c$:
$$\begin{aligned}
\cfrac{\partial LL(\theta_c)}{\partial\mu_c}&=\cfrac{\partial}{\partial\mu_c}\left[-\cfrac{Nd}{2}\ln(2\pi)-\cfrac{N}{2}\ln|\Sigma_c|-\cfrac{1}{2}\sum_{i=1}^{N}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right]\\
&=\cfrac{\partial}{\partial\mu_c}\left[-\cfrac{1}{2}\sum_{i=1}^{N}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right]\\
&=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right]\\
&=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[(x_i^T-\mu_c^T)\Sigma_c^{-1}(x_i-\mu_c)\right]\\
&=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[(x_i^T-\mu_c^T)(\Sigma_c^{-1}x_i-\Sigma_c^{-1}\mu_c)\right]\\
&=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[x_i^T\Sigma_c^{-1}x_i-x_i^T\Sigma_c^{-1}\mu_c-\mu_c^T\Sigma_c^{-1}x_i+\mu_c^T\Sigma_c^{-1}\mu_c\right]
\end{aligned}\tag{13}$$
Since $x_i^T\Sigma_c^{-1}\mu_c$ is a scalar, it equals its own transpose, and since $\Sigma_c$ is symmetric:
$$x_i^T\Sigma_c^{-1}\mu_c=(x_i^T\Sigma_c^{-1}\mu_c)^T=\mu_c^T(\Sigma_c^{-1})^Tx_i=\mu_c^T(\Sigma_c^T)^{-1}x_i=\mu_c^T\Sigma_c^{-1}x_i\tag{14}$$
Eq. (13) therefore simplifies to:
$$\cfrac{\partial LL(\theta_c)}{\partial\mu_c}=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[x_i^T\Sigma_c^{-1}x_i-2x_i^T\Sigma_c^{-1}\mu_c+\mu_c^T\Sigma_c^{-1}\mu_c\right]\tag{15}$$
By the matrix differentiation identities (for a vector $a$ and a square matrix $\beta$):
$$\cfrac{\partial a^Tx}{\partial x}=a,\quad\cfrac{\partial x^T\beta x}{\partial x}=(\beta+\beta^T)x\tag{16}$$
we obtain:
$$\begin{aligned}
\cfrac{\partial LL(\theta_c)}{\partial\mu_c}&=-\cfrac{1}{2}\sum_{i=1}^{N}\left[0-(2x_i^T\Sigma_c^{-1})^T+\left(\Sigma_c^{-1}+(\Sigma_c^{-1})^T\right)\mu_c\right]\\
&=-\cfrac{1}{2}\sum_{i=1}^{N}\left[-2(\Sigma_c^{-1})^Tx_i+\left(\Sigma_c^{-1}+(\Sigma_c^{-1})^T\right)\mu_c\right]\\
&=-\cfrac{1}{2}\sum_{i=1}^{N}\left[-2\Sigma_c^{-1}x_i+2\Sigma_c^{-1}\mu_c\right]\\
&=\sum_{i=1}^{N}\Sigma_c^{-1}x_i-N\Sigma_c^{-1}\mu_c
\end{aligned}\tag{17}$$
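The analytic gradient can be compared against a finite-difference approximation. This is only a numerical sketch on synthetic data; the helper names, the random parameters, and the step size `eps` are assumptions chosen for the illustration.

```python
import numpy as np

def log_likelihood(X, mu, sigma_inv, logdet):
    """Gaussian log-likelihood in closed form, given Sigma^{-1} and ln|Sigma|."""
    N, d = X.shape
    diff = X - mu
    quad = np.einsum("ij,jk,ik->", diff, sigma_inv, diff)   # sum of quadratic forms
    return -0.5 * N * d * np.log(2 * np.pi) - 0.5 * N * logdet - 0.5 * quad

def numerical_gradient(f, x, eps=1e-5):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        grad[k] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))                 # synthetic samples
A = rng.normal(size=(3, 3))
sigma = A @ A.T + 3.0 * np.eye(3)            # symmetric positive definite by construction
sigma_inv = np.linalg.inv(sigma)
_, logdet = np.linalg.slogdet(sigma)
mu = rng.normal(size=3)                      # an arbitrary point at which to evaluate the gradient

numeric = numerical_gradient(lambda m: log_likelihood(X, m, sigma_inv, logdet), mu)
# Analytic gradient: sum_i Sigma^{-1} x_i - N Sigma^{-1} mu
analytic = sigma_inv @ X.sum(axis=0) - X.shape[0] * (sigma_inv @ mu)
print(numeric, analytic)
```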
Setting the partial derivative to zero gives:
$$\begin{aligned}
\cfrac{\partial LL(\theta_c)}{\partial\mu_c}&=\sum_{i=1}^{N}\Sigma_c^{-1}x_i-N\Sigma_c^{-1}\mu_c=0\\
&\Longrightarrow\sum_{i=1}^{N}\Sigma_c^{-1}x_i=N\Sigma_c^{-1}\mu_c\\
&\Longrightarrow\Sigma_c^{-1}\sum_{i=1}^{N}x_i=N\Sigma_c^{-1}\mu_c\\
&\Longrightarrow N\mu_c=\sum_{i=1}^{N}x_i\\
&\Longrightarrow\hat{\mu}_c=\cfrac{1}{N}\sum_{i=1}^{N}x_i
\end{aligned}\tag{18}$$
Similarly, taking the partial derivative of $LL(\theta_c)$ with respect to $\Sigma_c$ and setting it to zero gives:
$$\hat{\Sigma}_c=\cfrac{1}{N}\sum_{i=1}^{N}(x_i-\hat{\mu}_c)(x_i-\hat{\mu}_c)^T\tag{19}$$
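The two estimators are the sample mean and the covariance normalized by $N$ (not $N-1$). A minimal NumPy sketch, using synthetic data and a hypothetical helper `gaussian_mle`:

```python
import numpy as np

def gaussian_mle(X):
    """Maximum likelihood estimates of the mean vector and covariance matrix.

    X has shape (N, d): one sample per row.
    """
    N = X.shape[0]
    mu_hat = X.sum(axis=0) / N                 # (1/N) * sum_i x_i
    centered = X - mu_hat
    sigma_hat = centered.T @ centered / N      # (1/N) * sum_i (x_i - mu)(x_i - mu)^T
    return mu_hat, sigma_hat

# Synthetic data drawn from an assumed "true" distribution, for illustration only.
rng = np.random.default_rng(2)
true_mu = np.array([1.0, -2.0])
true_sigma = np.array([[2.0, 0.6],
                       [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_sigma, size=5000)

mu_hat, sigma_hat = gaussian_mle(X)
print(mu_hat)
print(sigma_hat)
```

Note that `np.cov` divides by $N-1$ by default; passing `bias=True` switches it to the $1/N$ normalization used by the maximum likelihood estimate.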
Turning to the naive Bayes classifier: as derived in Eq. (6), the Bayes optimal classifier that minimizes the classification error rate is:
$$h^*(x)=\operatorname*{arg\,max}_{c\in\mathcal{Y}}P(c|x)\tag{20}$$
By Bayes' theorem:
$$P(c|x)=\cfrac{P(x,c)}{P(x)}=\cfrac{P(c)P(x|c)}{P(x)}\tag{21}$$
Since $P(x)$ does not depend on $c$, it can be dropped from the maximization:
$$h^*(x)=\operatorname*{arg\,max}_{c\in\mathcal{Y}}\cfrac{P(c)P(x|c)}{P(x)}=\operatorname*{arg\,max}_{c\in\mathcal{Y}}P(c)P(x|c)\tag{22}$$
Under the attribute conditional independence assumption (here $x_i$ denotes the value of $x$ on the $i$-th attribute and $d$ is the number of attributes):
$$P(x|c)=P(x_1,x_2,\cdots,x_d|c)=\prod_{i=1}^{d}P(x_i|c)\tag{23}$$
Therefore:
$$h^*(x)=\operatorname*{arg\,max}_{c\in\mathcal{Y}}P(c)\prod_{i=1}^{d}P(x_i|c)\tag{24}$$
This is the expression for the naive Bayes classifier.
$P(c)$ is the proportion of each class in the sample space. By the law of large numbers, when the training set contains sufficiently many independent and identically distributed samples, $P(c)$ can be estimated by the class frequencies:
$$P(c)=\cfrac{|D_c|}{|D|}\tag{25}$$
where $D$ is the training set, $|D|$ is its number of samples, $D_c$ is the set of training samples belonging to class $c$, and $|D_c|$ is the number of samples in $D_c$.
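Putting the pieces together, the following is a rough NumPy-only sketch of a naive Bayes classifier. It assumes that each attribute follows a one-dimensional normal distribution within each class, which the derivation above does not require; that choice, the small variance floor, the toy data, and the function names are all assumptions for illustration. Scores are computed in log space so that the product over attributes does not underflow.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate P(c) by class frequency and per-class, per-attribute mean and variance."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])            # |D_c| / |D|
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # 1e-9 is an assumed floor that avoids division by zero for constant attributes.
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def predict_gaussian_nb(model, X):
    """Return argmax_c [ ln P(c) + sum_i ln P(x_i | c) ] for every row of X."""
    classes, priors, means, variances = model
    diff = X[:, None, :] - means[None, :, :]                          # shape (n, k, d)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :]).sum(axis=2)
    scores = np.log(priors)[None, :] + log_lik                        # shape (n, k)
    return classes[np.argmax(scores, axis=1)]

# Toy data: two well-separated clusters with 60 and 40 samples.
rng = np.random.default_rng(3)
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(60, 2))
X1 = rng.normal(loc=[6.0, 6.0], scale=1.0, size=(40, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 60 + [1] * 40)

model = fit_gaussian_nb(X, y)
pred = predict_gaussian_nb(model, X)
print("priors:", model[1])
print("training accuracy:", (pred == y).mean())
```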

Bayes classifier: Python application

# Load the breast cancer dataset
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
# Print the keys of the dataset
print(cancer.keys())
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
# Print the tumor class labels and the feature names in the dataset
print("Tumor classes:", cancer['target_names'])
print("Tumor features:", cancer['feature_names'])
Tumor classes: ['malignant' 'benign']
Tumor features: ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']

As shown, the tumors fall into two classes, malignant and benign, and each sample is described by 30 features.

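# Assign the feature values and the class labels to X and y
X, y = cancer.data, cancer.target
# Import the train/test splitting utility and split the data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=38)
# Check the shapes of the two subsets
print("Training set shape:", X_train.shape)
print("Test set shape:", X_test.shape)
Training set shape: (426, 30)
Test set shape: (143, 30)
# Import Gaussian naive Bayes
from sklearn.naive_bayes import GaussianNB
# Fit the model on the training data
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# Print the model score (mean accuracy on the test set)
print("Model score: {:.3f}".format(gnb.score(X_test, y_test)))
Model score: 0.944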
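
To connect the fitted model back to the formulas, the sketch below refits the same classifier and compares two of its learned attributes with direct computations: `class_prior_` against the class frequencies $|D_c|/|D|$, and `theta_` against the per-class feature means. It assumes a scikit-learn installation that bundles the breast cancer dataset; the variable names `class_frequencies` and `per_class_means` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data and split as above.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=38)
gnb = GaussianNB().fit(X_train, y_train)

# Class priors estimated by frequency: |D_c| / |D|.
class_frequencies = np.bincount(y_train) / len(y_train)

# Per-class mean of every feature, computed directly from the training data.
per_class_means = np.array([X_train[y_train == c].mean(axis=0) for c in gnb.classes_])

print("priors match frequencies:", np.allclose(gnb.class_prior_, class_frequencies))
print("theta_ matches class means:", np.allclose(gnb.theta_, per_class_means))
```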
