Minimal Notes on 《统计学习方法》 (Statistical Learning Methods), P4: Deriving the Naive Bayes Formulas

The Basic Naive Bayes Method

From the training set $T=\{(x_1,y_1),(x_2,y_2),\ldots,(x_N,y_N)\}$, naive Bayes learns the joint probability distribution $P(X,Y)$. Concretely, it learns the prior distribution
$$P(Y=c_k),\quad k=1,2,\ldots,K$$
and the conditional distribution
$$P(X=x\mid Y=c_k),\quad k=1,2,\ldots,K$$
It assumes that the features are conditionally independent given the class:
$$P(X=x\mid Y=c_k)=\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}\mid Y=c_k)$$
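To get a feel for what the conditional independence assumption buys, the sketch below compares how many conditional probability entries have to be stored with and without it. The function name `table_sizes` and the feature sizes are made up for illustration; the counts are table entries, not free parameters.

```python
from math import prod

def table_sizes(num_classes, values_per_feature):
    """Count stored entries of P(X=x | Y=c_k), with and without the assumption.

    values_per_feature[j] plays the role of S_j, the number of values
    that feature j can take (hypothetical numbers, chosen for illustration).
    """
    # Without the assumption: one entry per (class, full feature vector).
    full = num_classes * prod(values_per_feature)
    # With the assumption: one entry per (class, feature, feature value).
    factored = num_classes * sum(values_per_feature)
    return full, factored

full, factored = table_sizes(num_classes=2, values_per_feature=[3, 3, 4, 5])
print(full, factored)
```

The full table grows with the product of the $S_j$, while the factored one grows only with their sum, which is why the assumption makes the estimation tractable.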
The learned model is then used to compute the posterior distribution. By Bayes' theorem,
$$P(Y=c_k\mid X=x)=\frac{P(X=x\mid Y=c_k)P(Y=c_k)}{\sum_{k}P(X=x\mid Y=c_k)P(Y=c_k)}$$
Substituting the conditional independence assumption gives
$$P(Y=c_k\mid X=x)=\frac{P(Y=c_k)\prod_{j}P(X^{(j)}=x^{(j)}\mid Y=c_k)}{\sum_{k}P(Y=c_k)\prod_{j}P(X^{(j)}=x^{(j)}\mid Y=c_k)}$$
so the naive Bayes classifier can be written as
$$y=\arg\max_{c_k}\frac{P(Y=c_k)\prod_{j}P(X^{(j)}=x^{(j)}\mid Y=c_k)}{\sum_{k}P(Y=c_k)\prod_{j}P(X^{(j)}=x^{(j)}\mid Y=c_k)}$$
The denominator is the same for every $c_k$, so it can be dropped:
$$y=\arg\max_{c_k}P(Y=c_k)\prod_{j}P(X^{(j)}=x^{(j)}\mid Y=c_k)$$
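As a quick numerical check that dropping the denominator does not change the decision, here is a minimal sketch. The helper `posterior` and all probability values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from math import prod

def posterior(prior, likelihoods):
    """Return unnormalized scores and normalized posteriors per class.

    prior[c]       : P(Y=c)
    likelihoods[c] : list of P(X^(j)=x^(j) | Y=c), one entry per feature j
    """
    # Numerator of Bayes' theorem under conditional independence.
    joint = {c: prior[c] * prod(likelihoods[c]) for c in prior}
    # Denominator: the same number for every class.
    evidence = sum(joint.values())
    post = {c: score / evidence for c, score in joint.items()}
    return joint, post

# Hypothetical two-class, two-feature example.
prior = {"a": 0.6, "b": 0.4}
likelihoods = {"a": [0.5, 0.2], "b": [0.3, 0.7]}

joint, post = posterior(prior, likelihoods)
print(max(joint, key=joint.get), max(post, key=post.get))
```

Both the unnormalized score and the normalized posterior pick the same class, since dividing every score by the same positive constant preserves their order.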
Suppose we use the 0-1 loss function, $L(Y,f(X))=1$ if $Y\neq f(X)$ and $0$ otherwise. The expected risk is
$$R_{exp}(f)=E[L(Y,f(X))]$$
Writing it as a conditional expectation given $X$,
$$R_{exp}(f)=E_X\sum_{k=1}^{K}L(c_k,f(X))P(c_k\mid X)$$
To minimize the expected risk, it suffices to minimize the inner sum separately for each $X=x$:
$$\begin{aligned}
f(x)&=\arg\min_{y}\sum_{k=1}^{K}L(c_k,y)P(c_k\mid X=x)\\
&=\arg\min_{y}P(Y\neq y\mid X=x)\\
&=\arg\min_{y}\left(1-P(Y=y\mid X=x)\right)\\
&=\arg\max_{y}P(Y=y\mid X=x)
\end{aligned}$$
So minimizing the expected risk under 0-1 loss is equivalent to choosing the class $c_k$ with the largest posterior $P(Y=c_k\mid X=x)$. This is the principle that naive Bayes relies on.
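The equivalence can be checked numerically for a single $x$. In this sketch the function name `expected_zero_one_loss` and the posterior values are hypothetical; the code just evaluates the conditional risk of each candidate prediction and compares the minimizer with the posterior maximizer.

```python
def expected_zero_one_loss(y, posterior):
    """Conditional risk of predicting y: sum_k L(c_k, y) * P(c_k | X=x).

    With 0-1 loss, only the classes c_k != y contribute to the sum.
    """
    return sum(p for c, p in posterior.items() if c != y)

# Hypothetical posterior P(Y=c_k | X=x) for one fixed x.
posterior = {"c1": 0.2, "c2": 0.5, "c3": 0.3}

risks = {y: expected_zero_one_loss(y, posterior) for y in posterior}
best_by_risk = min(risks, key=risks.get)
best_by_posterior = max(posterior, key=posterior.get)
print(best_by_risk, best_by_posterior)
```

Each risk equals one minus the posterior of the predicted class, so the class with the smallest risk is the class with the largest posterior.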

The Naive Bayes Algorithm

Input:
Training data $T=\{(x_1,y_1),(x_2,y_2),\ldots,(x_N,y_N)\}$, where $x_i=(x_i^{(1)},x_i^{(2)},\ldots,x_i^{(n)})^T$, $x_i^{(j)}$ is the $j$-th feature of the $i$-th sample, $x_i^{(j)}\in\{a_{j1},a_{j2},\ldots,a_{jS_j}\}$, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, $j=1,2,\ldots,n$, $l=1,2,\ldots,S_j$, and $y_i\in\{c_1,c_2,\ldots,c_K\}$
Output: the class of instance $x$
(1) Compute the prior and conditional probabilities, here using maximum likelihood estimation:
$$P(Y=c_k)=\frac{\sum_{i=1}^{N}I(y_i=c_k)}{N},\quad k=1,2,\ldots,K$$

$$P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^{N}I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^{N}I(y_i=c_k)}$$
(2) For the given instance $x=(x^{(1)},x^{(2)},\ldots,x^{(n)})^T$, compute
$$P(Y=c_k)\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}\mid Y=c_k),\quad k=1,2,\ldots,K$$
(3) Determine the class of instance $x$:
$$y=\arg\max_{c_k}P(Y=c_k)\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}\mid Y=c_k)$$
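Steps (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration and not a production implementation: the names `fit_mle` and `predict`, and the small 15-sample dataset with two categorical features, are assumptions introduced here for the example.

```python
from collections import Counter
from math import prod

def fit_mle(X, y):
    """Step (1): maximum likelihood estimates obtained by counting."""
    n_samples = len(y)
    class_counts = Counter(y)
    prior = {c: count / n_samples for c, count in class_counts.items()}
    # joint_counts[(j, value, c)] = number of samples with x^(j)=value and y=c
    joint_counts = Counter()
    for x_i, y_i in zip(X, y):
        for j, value in enumerate(x_i):
            joint_counts[(j, value, y_i)] += 1
    cond = {(j, value, c): count / class_counts[c]
            for (j, value, c), count in joint_counts.items()}
    return prior, cond

def predict(x, prior, cond):
    """Steps (2) and (3): score every class, then take the arg max."""
    scores = {
        # Unseen (feature value, class) pairs get probability 0 under MLE.
        c: prior[c] * prod(cond.get((j, value, c), 0.0)
                           for j, value in enumerate(x))
        for c in prior
    }
    return max(scores, key=scores.get), scores

# Illustrative toy data: feature 1 in {1, 2, 3}, feature 2 in {S, M, L}.
X = list(zip([1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
             "SMMSSSMMLLLMMLL"))
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

prior, cond = fit_mle(X, y)
label, scores = predict((2, "S"), prior, cond)
print(label, scores)
```

For the instance $x=(2,S)^T$ the scores work out to $\frac{9}{15}\cdot\frac{3}{9}\cdot\frac{1}{9}=\frac{1}{45}$ for class $1$ and $\frac{6}{15}\cdot\frac{2}{6}\cdot\frac{3}{6}=\frac{1}{15}$ for class $-1$, so the predicted class is $-1$.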

Bayesian Estimation

Maximum likelihood estimation can produce probability estimates of 0: if a feature value never occurs together with some class in the training data, the corresponding conditional probability is 0, and the whole product for that class vanishes. Bayesian estimation avoids this by adding a constant $\lambda\geq 0$ to every count:

$$P(Y=c_k)=\frac{\sum_{i=1}^{N}I(y_i=c_k)+\lambda}{N+K\lambda}$$

$$P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^{N}I(x_i^{(j)}=a_{jl},\,y_i=c_k)+\lambda}{\sum_{i=1}^{N}I(y_i=c_k)+S_j\lambda}$$

Setting $\lambda=0$ recovers maximum likelihood estimation, and $\lambda=1$ is known as Laplace smoothing.
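The effect of $\lambda$ is easiest to see on a case where maximum likelihood breaks down. The sketch below is an illustration under stated assumptions: `fit_smoothed`, `predict`, and the four-sample weather-style dataset are hypothetical names and data invented for this example. Unlike plain counting, the smoothed estimator needs the full list of possible values per feature (to get $S_j$) and the full list of classes (to get $K$).

```python
from collections import Counter
from math import prod

def fit_smoothed(X, y, feature_values, classes, lam=1.0):
    """Bayesian estimates; lam=0 gives MLE, lam=1 gives Laplace smoothing."""
    n_samples = len(y)
    class_counts = Counter(y)
    prior = {c: (class_counts[c] + lam) / (n_samples + len(classes) * lam)
             for c in classes}
    joint_counts = Counter()
    for x_i, y_i in zip(X, y):
        for j, value in enumerate(x_i):
            joint_counts[(j, value, y_i)] += 1
    cond = {}
    for j, values in enumerate(feature_values):
        for value in values:
            for c in classes:
                # len(values) is S_j, the number of values of feature j.
                cond[(j, value, c)] = ((joint_counts[(j, value, c)] + lam)
                                       / (class_counts[c] + len(values) * lam))
    return prior, cond

def predict(x, prior, cond):
    scores = {c: prior[c] * prod(cond[(j, value, c)]
                                 for j, value in enumerate(x))
              for c in prior}
    return max(scores, key=scores.get), scores

# Hypothetical data: "sunny" never occurs with "yes", "cool" never with "no".
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
y = ["no", "no", "yes", "yes"]
feature_values = [["sunny", "rainy"], ["hot", "mild", "cool"]]
classes = ["no", "yes"]
x_new = ("sunny", "cool")

_, mle_scores = predict(x_new, *fit_smoothed(X, y, feature_values, classes, lam=0.0))
label, smoothed_scores = predict(x_new, *fit_smoothed(X, y, feature_values, classes, lam=1.0))
print(mle_scores, label, smoothed_scores)
```

With $\lambda=0$ both class scores collapse to 0 and the arg max is uninformative. With $\lambda=1$ the scores become $0.5\cdot\frac{3}{4}\cdot\frac{1}{5}=0.075$ for "no" and $0.5\cdot\frac{1}{4}\cdot\frac{2}{5}=0.05$ for "yes", so a decision can be made.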
