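Let $P(A)$ denote the probability that event A occurs and $P(B)$ the probability that event B occurs.
The probability that B occurs given that A has occurred is written $P(B|A)$, and the probability that A occurs given that B has occurred is written $P(A|B)$.
The probability that A and B both occur is the joint probability, written $P(A, B)$ (that is, P(A and B)). Then

$$P(A, B) = P(A)P(B|A) = P(B)P(A|B)$$

Dividing both sides by $P(B)$ (assuming $P(B) > 0$) gives

$$P(A|B) = \frac{P(A)P(B|A)}{P(B)}$$

This is Bayes' formula. The naive Bayes algorithm is built on it, together with the assumption that the features are conditionally independent given the class.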
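As a quick numeric check of the formula, here is a minimal sketch. The function name `bayes_posterior` and the probabilities passed to it are made up for illustration; $P(B)$ is obtained from the law of total probability.

```python
def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) given P(A), P(B|A) and P(B|not A)."""
    # Law of total probability: P(B) = P(A)P(B|A) + P(not A)P(B|not A)
    p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a
    # Bayes' formula: P(A|B) = P(A)P(B|A) / P(B)
    return p_a * p_b_given_a / p_b


# Hypothetical numbers: A is rare (1%), B is much more likely when A holds
posterior = bayes_posterior(p_a=0.01, p_b_given_a=0.9, p_b_given_not_a=0.05)
print(round(posterior, 4))  # 0.1538
```

Even though $P(B|A)$ is large here, the small prior $P(A)$ keeps the posterior $P(A|B)$ modest.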
Input: a training dataset $T=\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})^T$, $x_i^{(j)}$ is the $j$-th feature of the $i$-th sample, $x_i^{(j)} \in \{a_{j1}, a_{j2}, \ldots, a_{jS_j}\}$, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, and the set of class labels for $y$ is $\{c_1, c_2, \ldots, c_K\}$;
an instance $x$.
Output: the class of instance $x$.
1) Compute the prior probabilities and the conditional probabilities

$$P(Y=c_k) = \frac{\sum_{i=1}^N I(y_i = c_k)}{N}, \quad k=1, 2, \ldots, K$$

$$P(X^{(j)} = a_{jl}|Y = c_k) = \frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl}, y_i=c_k)}{\sum_{i=1}^N I(y_i = c_k)}$$

where $I(\cdot)$ is the indicator function and $j=1, 2, \ldots, n; \quad l=1, 2, \ldots, S_j; \quad k=1, 2, \ldots, K$
2) For the given instance $x = (x^{(1)}, x^{(2)}, \ldots, x^{(n)})^T$, compute

$$P(Y=c_k)\prod_{j=1}^n P(X^{(j)} = x^{(j)}|Y=c_k), \quad k=1, 2, \ldots, K$$

3) Determine the class of instance $x$

$$y = \arg\max_{c_k} P(Y=c_k)\prod_{j=1}^n P(X^{(j)} = x^{(j)}|Y=c_k)$$
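The three steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names `fit_naive_bayes` and `predict` and the small toy dataset are chosen for illustration only, and exact fractions are used so the estimates can be read off directly.

```python
from collections import Counter
from fractions import Fraction


def fit_naive_bayes(X, y):
    """Step 1: estimate priors and conditionals by maximum likelihood."""
    n_samples = len(y)
    class_counts = Counter(y)
    priors = {c: Fraction(cnt, n_samples) for c, cnt in class_counts.items()}
    # joint_counts[(j, value, c)] = number of samples with x^(j) == value and y == c
    joint_counts = Counter()
    for xi, yi in zip(X, y):
        for j, value in enumerate(xi):
            joint_counts[(j, value, yi)] += 1
    cond = {key: Fraction(cnt, class_counts[key[2]])
            for key, cnt in joint_counts.items()}
    return priors, cond


def predict(priors, cond, x):
    """Steps 2 and 3: score every class, then take the argmax."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, value in enumerate(x):
            # A (feature value, class) pair never seen in training gets probability 0
            score *= cond.get((j, value, c), Fraction(0))
        scores[c] = score
    return max(scores, key=scores.get), scores


# Toy dataset (illustrative): two categorical features, labels in {1, -1}
X = [(1, "S"), (1, "M"), (1, "M"), (1, "S"), (1, "S"),
     (2, "S"), (2, "M"), (2, "M"), (2, "L"), (2, "L"),
     (3, "L"), (3, "M"), (3, "M"), (3, "L"), (3, "L")]
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

priors, cond = fit_naive_bayes(X, y)
label, scores = predict(priors, cond, (2, "S"))
print(label)  # -1
```

For the instance `(2, "S")` the class scores are 1/45 for class 1 and 1/15 for class -1, so the argmax picks -1.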
Maximum likelihood estimation can produce an estimated probability of 0. A single zero factor makes the whole product for that class zero, which distorts the classification. The remedy is Bayesian estimation. The Bayesian estimate of the conditional probability is

$$P_{\lambda}(X^{(j)} = a_{jl}|Y = c_k) = \frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl}, y_i=c_k) + \lambda}{\sum_{i=1}^N I(y_i = c_k) + S_j\lambda}$$

where $\lambda \ge 0$. Setting $\lambda = 0$ recovers the maximum likelihood estimate. The usual choice is $\lambda = 1$, which is called Laplace smoothing. With $\lambda > 0$, for any $l=1, 2, \ldots, S_j$ and $k=1, 2, \ldots, K$ we have

$$P_{\lambda}(X^{(j)} = a_{jl}|Y = c_k) > 0$$

$$\sum_{l=1}^{S_j} P_{\lambda}(X^{(j)} = a_{jl}|Y = c_k) = 1$$

The Bayesian estimate of the prior probability is

$$P_{\lambda}(Y=c_k) = \frac{\sum_{i=1}^N I(y_i = c_k) + \lambda}{N+K\lambda}$$
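The two smoothed estimates can be written as small helper functions. This is a minimal sketch: the names `smoothed_conditional` and `smoothed_prior` and the counts used below are hypothetical, chosen only to show that a zero count no longer yields a zero probability and that the smoothed conditionals still sum to 1.

```python
from fractions import Fraction


def smoothed_conditional(joint_count, class_count, n_values, lam=1):
    """P_lambda(X^(j) = a_jl | Y = c_k) = (joint count + lam) / (class count + S_j * lam)."""
    return Fraction(joint_count + lam, class_count + n_values * lam)


def smoothed_prior(class_count, n_samples, n_classes, lam=1):
    """P_lambda(Y = c_k) = (class count + lam) / (N + K * lam)."""
    return Fraction(class_count + lam, n_samples + n_classes * lam)


# Hypothetical counts: a feature with S_j = 3 possible values, seen 0, 4 and 5
# times among the 9 samples of one class
joint_counts = [0, 4, 5]
probs = [smoothed_conditional(c, class_count=9, n_values=3) for c in joint_counts]
print(probs)       # [Fraction(1, 12), Fraction(5, 12), Fraction(1, 2)]
print(sum(probs))  # 1

# Hypothetical prior: 9 of N = 15 samples in this class, K = 2 classes
prior = smoothed_prior(class_count=9, n_samples=15, n_classes=2)
print(prior)       # 10/17
```

Passing `lam=0` reproduces the maximum likelihood estimates, including the zero probability for the unseen value.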
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
# Build the dataset: two classes of 300 points each; class 1 is shifted by +4 along the first feature
x_cls1 = np.concatenate((np.random.randn(300).reshape(-1, 1), np.random.randn(300).reshape(-1, 1)), axis=1)
y_cls1 = np.zeros((300))
x_cls2 = np.concatenate(((np.random.randn(300)+4).reshape(-1, 1), np.random.randn(300).reshape(-1, 1)), axis=1)
y_cls2 = np.ones((300))
x = np.round(np.concatenate((x_cls1, x_cls2), axis=0), 1)
y = np.concatenate((y_cls1, y_cls2), axis=0)
Inspect the data with a scatter plot:
sns.scatterplot(x=x[:, 0], y=x[:, 1], hue=y)
plt.show()
# Split the dataset into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
# Train the model
clf = GaussianNB()
clf.fit(x_train, y_train)
print("test score: %.2f" % clf.score(x_test, y_test))
test score: 0.98
[Figures: scatter plot of the test set and scatter plot of the predictions]
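The formulas earlier in this post cover discrete features, while the example uses `GaussianNB` on continuous features. The sketch below shows roughly what a Gaussian naive Bayes classifier does, assuming each feature follows a normal distribution within each class. It uses only the standard library; the function names are hypothetical, and it is not scikit-learn's actual implementation (which, among other things, adds a small variance smoothing term).

```python
import math
import random
from statistics import fmean, pvariance


def fit_gaussian_nb(X, y):
    """Estimate a prior plus a per-feature mean and variance for every class."""
    params = {}
    for c in set(y):
        rows = [xi for xi, yi in zip(X, y) if yi == c]
        columns = list(zip(*rows))
        params[c] = {
            "prior": len(rows) / len(y),
            "mean": [fmean(col) for col in columns],
            "var": [pvariance(col) for col in columns],
        }
    return params


def log_gaussian(value, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)


def predict_gaussian_nb(params, x):
    """Return the class with the largest log prior plus summed log likelihoods."""
    def log_posterior(c):
        p = params[c]
        return math.log(p["prior"]) + sum(
            log_gaussian(v, m, s) for v, m, s in zip(x, p["mean"], p["var"]))
    return max(params, key=log_posterior)


# Same shape of data as the example above: class 1 is shifted by +4 on feature 0
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(300)] + \
    [[random.gauss(4, 1), random.gauss(0, 1)] for _ in range(300)]
y = [0] * 300 + [1] * 300

params = fit_gaussian_nb(X, y)
predictions = [predict_gaussian_nb(params, xi) for xi in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print("training accuracy: %.2f" % accuracy)
```

Working in log space avoids numerical underflow when many small likelihoods are multiplied together.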
Reference: Li Hang, Statistical Learning Methods (《统计学习方法》), 2nd edition