Naive Bayes (Python Implementation)

Naive Bayes is a classification method based on Bayes' theorem and the assumption of conditional independence among features.

I. The Basic Method

Let the input space $\chi \subseteq R^n$ be a set of $n$-dimensional vectors, and let the output space $y=\{c_1,c_2,\cdots,c_K\}$ be the set of class labels. $X$ is a random vector defined on the input space $\chi$, $Y$ is a random variable defined on the output space $y$, and $P(X,Y)$ is the joint probability distribution of $X$ and $Y$. The training data set

$$T=\{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\}$$

is generated i.i.d. from $P(X,Y)$.

The prior probability distribution is

$$P(Y=c_k),\quad k=1,2,\cdots,K$$

and the conditional probability distribution is

$$P(X=x|Y=c_k)=P(X^{(1)}=x^{(1)},X^{(2)}=x^{(2)},\cdots,X^{(n)}=x^{(n)}|Y=c_k),\quad k=1,2,\cdots,K$$

Naive Bayes imposes the conditional independence assumption on this conditional distribution:

$$P(X=x|Y=c_k)=P(X^{(1)}=x^{(1)},X^{(2)}=x^{(2)},\cdots,X^{(n)}=x^{(n)}|Y=c_k)=\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)$$

The assumption says that, once the class is fixed, the features used for classification are conditionally independent of one another.
The posterior probability distribution is

$$P(Y=c_k|X=x)=\frac{P(X=x|Y=c_k)P(Y=c_k)}{\sum\limits_{k}P(X=x|Y=c_k)P(Y=c_k)}$$

Substituting the conditional independence assumption gives

$$P(Y=c_k|X=x)=\frac{P(Y=c_k)\prod\limits_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)}{\sum\limits_{k}P(Y=c_k)\prod\limits_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)},\quad k=1,2,\cdots,K$$
The Naive Bayes classifier can therefore be written as

$$y=f(x)=\arg\max_{c_k}\frac{P(Y=c_k)\prod\limits_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)}{\sum\limits_{k}P(Y=c_k)\prod\limits_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)}$$

Since the denominator is the same for every $c_k$, it can be dropped:

$$y=f(x)=\arg\max_{c_k}P(Y=c_k)\prod\limits_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)$$
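Because the denominator (the evidence) is shared by every class, normalizing cannot change which class attains the maximum. A minimal sketch with made-up numbers:

```python
# Made-up unnormalized scores P(Y=c) * prod_j P(x_j | c) for two classes.
scores = {'c1': 0.02, 'c2': 0.06}
z = sum(scores.values())                      # the shared denominator
posteriors = {c: s / z for c, s in scores.items()}
# The winning class is the same before and after normalization.
assert max(scores, key=scores.get) == max(posteriors, key=posteriors.get)
```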

II. Parameter Estimation: The Maximum Likelihood Method

In Naive Bayes, learning means estimating $P(Y=c_k)$ and $P(X^{(j)}=x^{(j)}|Y=c_k)$. Applying maximum likelihood estimation, the prior is

$$P(Y=c_k)=\frac{\sum\limits_{i=1}^{N}I(y_i=c_k)}{N},\quad k=1,2,\cdots,K$$

Let the set of possible values of the $j$-th feature $x^{(j)}$ be $\{a_{j1},a_{j2},\cdots,a_{jS_j}\}$. The maximum likelihood estimate of the conditional probability $P(X^{(j)}=a_{jl}|Y=c_k)$ is

$$P(X^{(j)}=a_{jl}|Y=c_k)=\frac{\sum\limits_{i=1}^{N}I(x_i^{(j)}=a_{jl},y_i=c_k)}{\sum\limits_{i=1}^{N}I(y_i=c_k)}$$

where $I(\cdot)$ is the indicator function.
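Both estimates are just relative frequencies, so they reduce to counting. A minimal sketch on made-up toy data (the lists `ys` and `xs` are hypothetical):

```python
# Counting-based MLE on a toy label list and one toy feature column.
from collections import Counter

ys = [1, 1, -1, 1, -1]          # toy class labels
xs = ['S', 'M', 'S', 'S', 'L']  # toy values of one feature

N = len(ys)
class_counts = Counter(ys)
prior = {c: n / N for c, n in class_counts.items()}  # P(Y=c): {1: 0.6, -1: 0.4}
joint = Counter(zip(xs, ys))                         # counts of (value, class) pairs
cond = {(a, c): joint[(a, c)] / class_counts[c]      # P(X=a | Y=c)
        for (a, c) in joint}
```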

III. The Naive Bayes Algorithm

Input: training data $T=\{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\}$, where $x_i=(x_i^{(1)},x_i^{(2)},\cdots,x_i^{(n)})^T$, $x_i^{(j)}$ is the $j$-th feature of the $i$-th sample, $x_i^{(j)}\in\{a_{j1},a_{j2},\cdots,a_{jS_j}\}$, $a_{jl}$ is the $l$-th value the $j$-th feature can take, $j=1,2,\cdots,n$, $l=1,2,\cdots,S_j$, and $y_i\in\{c_1,c_2,\cdots,c_K\}$; an instance $x$.

Output: the class of instance $x$.

(1) Compute the prior and conditional probabilities:

$$P(Y=c_k)=\frac{\sum\limits_{i=1}^{N}I(y_i=c_k)}{N},\quad k=1,2,\cdots,K$$

$$P(X^{(j)}=a_{jl}|Y=c_k)=\frac{\sum\limits_{i=1}^{N}I(x_i^{(j)}=a_{jl},y_i=c_k)}{\sum\limits_{i=1}^{N}I(y_i=c_k)},\quad j=1,2,\cdots,n;\ l=1,2,\cdots,S_j;\ k=1,2,\cdots,K$$

(2) For the given instance $x=(x^{(1)},x^{(2)},\cdots,x^{(n)})^T$, compute

$$P(Y=c_k)\prod\limits_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k),\quad k=1,2,\cdots,K$$

(3) Determine the class of $x$:

$$y=\arg\max_{c_k}P(Y=c_k)\prod\limits_{j=1}^{n}P(X^{(j)}=x^{(j)}|Y=c_k)$$

IV. Parameter Estimation: Bayesian Estimation

Maximum likelihood estimation can yield probability estimates equal to zero; a single zero factor then wipes out the entire posterior product and biases the classification. The remedy is Bayesian estimation:

$$P_{\lambda}(X^{(j)}=a_{jl}|Y=c_k)=\frac{\sum\limits_{i=1}^{N}I(x_i^{(j)}=a_{jl},y_i=c_k)+\lambda}{\sum\limits_{i=1}^{N}I(y_i=c_k)+S_j\lambda}$$

where $\lambda\geq 0$. With $\lambda=0$ this reduces to maximum likelihood estimation; with $\lambda=1$ it is called Laplace smoothing.

The Bayesian estimate of the prior probability is

$$P_{\lambda}(Y=c_k)=\frac{\sum\limits_{i=1}^{N}I(y_i=c_k)+\lambda}{N+K\lambda}$$
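A short numeric sketch with hypothetical counts shows the effect: a feature value never seen with class $c_k$ gets probability $0$ under MLE but a small positive probability after smoothing.

```python
# Hypothetical counts: the value a_jl never co-occurs with class c_k.
n_joint, n_class, S_j, lam = 0, 6, 3, 1
mle = n_joint / n_class                             # 0.0 - zeroes out the whole product
smoothed = (n_joint + lam) / (n_class + S_j * lam)  # 1/9, approximately 0.111
```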
In the worked example below, these Laplace-smoothed estimates are applied as well (Section V.5).

V. Code Implementation: A Worked Example

1. The Data Set

```python
# Naive Bayes: training data
import pandas as pd

x1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
x2 = ['S', 'M', 'M', 'S', 'S', 'S', 'M', 'M', 'L', 'L', 'L', 'M', 'M', 'L', 'L']
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
data = {'X1': x1, 'X2': x2, 'Y': y}
df = pd.DataFrame(data)
```

*Figure 1: the training data frame `df`.*

The value sets of the two features and of the class label are:

```python
A1 = {1, 2, 3}        # possible values of X1
A2 = {'S', 'M', 'L'}  # possible values of X2
C = {1, -1}           # class labels
```

2. Prior Probabilities

```python
def priorPro(y):
    '''Prior probability P(Y = c_k) for every class c_k.'''
    C = y.unique()
    pro_y = {}
    for c_k in C:
        pro = sum(y == c_k) / len(y)  # relative frequency of class c_k
        pro_y[c_k] = pro
    return pro_y
```
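As a quick sanity check against the data above (9 of the 15 labels are $+1$ and 6 are $-1$):

```python
priorPro(df['Y'])  # expected: {-1: 0.4, 1: 0.6}
```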

3. Conditional Probabilities

```python
def conditionalPro(x, y):
    '''Conditional probabilities P(X = a | Y = c_k) for one feature column x.'''
    a = list(x.unique())
    c = list(y.unique())
    inter = pd.concat([x, y], axis=1)  # column 0: feature, column 1: label
    conditionalpro = {}
    for c_k in c:
        subpro = {}
        for a_j in a:
            num = len(inter[inter.iloc[:, 1] == c_k])                                 # count of class c_k
            num1 = len(inter[(inter.iloc[:, 0] == a_j) & (inter.iloc[:, 1] == c_k)])  # joint count of (a_j, c_k)
            pro = num1 / num
            subpro[a_j] = pro
        conditionalpro[c_k] = subpro
    return pd.DataFrame(conditionalpro)  # rows: feature values, columns: classes
```
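A quick check on `X1`: among the 9 samples with $Y=1$, the value 1 appears twice, 2 three times, and 3 four times, so the class-1 column should read $2/9$, $3/9$, $4/9$.

```python
conditionalPro(df['X1'], df['Y'])
```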

Collect the results:

```python
priorpro = priorPro(df['Y'])

a1 = conditionalPro(df['X1'], df['Y'])
a2 = conditionalPro(df['X2'], df['Y'])
a1['变量'] = 1  # '变量' (feature index): 1 = X1
a2['变量'] = 2  # 2 = X2

conPro = pd.concat([a1, a2])
conpro = conPro.reset_index()
conpro.rename(columns={'index': 'X_value'}, inplace=True)
```

*Figure 2: the combined conditional-probability table `conpro`.*

4. Prediction

```python
def pred(x):
    '''Predict the class of an instance x = [x1_value, x2_value].'''
    postpros = {}
    for c_k in list(C):
        postpro = priorpro[c_k]  # start from the prior
        for i in range(len(x)):
            # multiply by P(X^(i+1) = x[i] | Y = c_k), looked up in conpro
            postpro *= conpro.loc[(conpro['X_value'] == x[i]) & (conpro['变量'] == i + 1), c_k].values[0]
        postpros[c_k] = postpro

    max_key = max(postpros, key=postpros.get)  # class with the largest score
    return max_key, postpros

x_sample = [2, 'S']
pred(x_sample)
```
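On this data the call returns class $-1$, since

$$P(Y=-1)\,P(X^{(1)}=2|Y=-1)\,P(X^{(2)}=S|Y=-1)=\frac{6}{15}\cdot\frac{2}{6}\cdot\frac{3}{6}=\frac{1}{15}\approx 0.067$$

beats

$$P(Y=1)\,P(X^{(1)}=2|Y=1)\,P(X^{(2)}=S|Y=1)=\frac{9}{15}\cdot\frac{3}{9}\cdot\frac{1}{9}=\frac{1}{45}\approx 0.022$$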


5. Bayesian Parameter Estimation

Introduce the smoothing parameter $\lambda$:

```python
# Laplace smoothing
def priorPro_lap(y, lam=1):
    '''Smoothed prior probability P_lambda(Y = c_k).'''
    C = y.unique()
    pro_y = {}
    for c_k in C:
        pro = (sum(y == c_k) + lam) / (len(y) + len(C) * lam)  # (count + lambda) / (N + K * lambda)
        pro_y[c_k] = pro
    return pro_y

def conditionalPro_lap(x, y, lam=1):
    '''Smoothed conditional probabilities P_lambda(X = a | Y = c_k).'''
    a = list(x.unique())
    c = list(y.unique())
    inter = pd.concat([x, y], axis=1)
    conditionalpro = {}
    for c_k in c:
        subpro = {}
        for a_j in a:
            num = len(inter[inter.iloc[:, 1] == c_k])
            num1 = len(inter[(inter.iloc[:, 0] == a_j) & (inter.iloc[:, 1] == c_k)])
            pro = (num1 + lam) / (num + len(a) * lam)  # (count + lambda) / (class count + S_j * lambda)
            subpro[a_j] = pro
        conditionalpro[c_k] = subpro
    return pd.DataFrame(conditionalpro)
```
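As a spot check of the smoothing: `X2 = 'S'` occurs once among the 9 samples with $Y=1$, so the Laplace estimate is $(1+1)/(9+3\cdot 1)=1/6\approx 0.167$ instead of the MLE value $1/9\approx 0.111$.

```python
conditionalPro_lap(df['X2'], df['Y'])
```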

Apply the smoothed estimators and rebuild the table:

```python
priorpro_lap = priorPro_lap(df['Y'], lam=1)
a1 = conditionalPro_lap(df['X1'], df['Y'])
a2 = conditionalPro_lap(df['X2'], df['Y'])
a1['变量'] = 1
a2['变量'] = 2

conPro = pd.concat([a1, a2])
conpro = conPro.reset_index()
conpro.rename(columns={'index': 'value'}, inplace=True)  # note: the column is now named 'value'
```

```python
def pred(x):
    '''Predict with the Laplace-smoothed estimates.'''
    postpros = {}
    for c_k in list(C):
        postpro = priorpro_lap[c_k]
        for i in range(len(x)):
            postpro *= conpro.loc[(conpro['value'] == x[i]) & (conpro['变量'] == i + 1), c_k].values[0]
        postpros[c_k] = postpro

    max_key = max(postpros, key=postpros.get)
    return max_key, postpros
```
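With Laplace smoothing, `pred([2, 'S'])` still returns class $-1$: the smoothed scores are $\frac{7}{17}\cdot\frac{3}{9}\cdot\frac{4}{9}=\frac{28}{459}\approx 0.061$ for class $-1$ and $\frac{10}{17}\cdot\frac{4}{12}\cdot\frac{2}{12}=\frac{5}{153}\approx 0.033$ for class $1$.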

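As an optional cross-check, a similar smoothed model can be built with scikit-learn's `CategoricalNB` (this assumes scikit-learn is installed; the 0-based integer encoding of the features below is our own choice, and as far as we know `CategoricalNB` smooths only the conditional distributions while leaving the class prior as the unsmoothed empirical frequency, so its scores differ slightly from ours although the predicted class agrees):

```python
# Optional sanity check with scikit-learn; alpha=1.0 is Laplace smoothing.
from sklearn.naive_bayes import CategoricalNB

encode_x2 = {'S': 0, 'M': 1, 'L': 2}  # our own arbitrary encoding
X = [[v1 - 1, encode_x2[v2]] for v1, v2 in zip(x1, x2)]

clf = CategoricalNB(alpha=1.0)
clf.fit(X, y)
print(clf.predict([[2 - 1, encode_x2['S']]]))  # expected: [-1]
```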

VI. Summary

The class $c$ that maximizes the posterior probability $P(Y=c|X=x)$ is the predicted label $Y$ for the input $X$. By the definition of conditional probability,

$$P(Y=c|X=x)=\frac{P(X=x,Y=c)}{P(X=x)}$$

which, by Bayes' theorem, equals

$$\frac{P(X=x|Y=c)P(Y=c)}{\sum\limits_{c}P(X=x|Y=c)P(Y=c)}$$

and, under the feature conditional independence assumption, equals

$$\frac{P(Y=c)\prod\limits_{j}P(X^{(j)}=x^{(j)}|Y=c)}{\sum\limits_{c}P(Y=c)\prod\limits_{j}P(X^{(j)}=x^{(j)}|Y=c)}$$
