Statistical Learning Methods, Chapter 4 Exercise Solutions (repost + reformatted + my own commentary)

4.1 Use maximum likelihood estimation to derive the prior probability estimate (4.8) and the conditional probability estimate (4.9) of the naive Bayes method.

First, (4.8):

$$P(Y=c_k)=\frac{\sum_{i=1}^N I(y_i=c_k)}{N}$$
################### Proof ###############################

Notation: $a_{jl}$ denotes the $l$-th possible value of the $j$-th feature, and $x_i^{(j)}$ denotes the $j$-th feature of the $i$-th sample. The conditional probability estimate (4.9), proved further below, is

$$P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}$$

Let $p=P(Y=c_k)$. Estimating $p$ amounts to drawing $N$ i.i.d. samples, where the $i$-th draw has outcome $y_i$.

The likelihood is

$$P(y_1,y_2,\dots,y_N)=p^{\sum_{i=1}^N I(y_i=c_k)}\cdot(1-p)^{\sum_{i=1}^N I(y_i\neq c_k)}$$
To maximize the likelihood, set its derivative with respect to $p$ to zero:

$$\frac{dP(y_1,y_2,\dots,y_N)}{dp}
=\sum_{i=1}^N I(y_i=c_k)\,p^{\sum_{i=1}^N I(y_i=c_k)-1}\cdot(1-p)^{\sum_{i=1}^N I(y_i\neq c_k)}
-\sum_{i=1}^N I(y_i\neq c_k)\,(1-p)^{\sum_{i=1}^N I(y_i\neq c_k)-1}\cdot p^{\sum_{i=1}^N I(y_i=c_k)}$$

$$=p^{\left[\sum_{i=1}^N I(y_i=c_k)\right]-1}\cdot(1-p)^{\left[\sum_{i=1}^N I(y_i\neq c_k)\right]-1}\cdot\left[(1-p)\sum_{i=1}^N I(y_i=c_k)-p\sum_{i=1}^N I(y_i\neq c_k)\right]=0$$

Since the leading power factors are nonzero for $0<p<1$,

$$\therefore\ (1-p)\sum_{i=1}^N I(y_i=c_k)-p\sum_{i=1}^N I(y_i\neq c_k)=0$$

Rearranging, and using the fact that the two indicator sums total $N$:

$$\sum_{i=1}^N I(y_i=c_k)=p\left(\sum_{i=1}^N I(y_i=c_k)+\sum_{i=1}^N I(y_i\neq c_k)\right)=pN$$

$$\therefore\ p=P(Y=c_k)=\frac{\sum_{i=1}^N I(y_i=c_k)}{N}\qquad ①$$

This completes the proof of (4.8).
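As a sanity check, formula ① is simply a class-frequency count. A minimal Python sketch (the labels below are hypothetical toy data, not from the book):

```python
from collections import Counter

def mle_prior(labels):
    """MLE of the prior P(Y=c_k): class count divided by sample size N."""
    n = len(labels)
    counts = Counter(labels)
    return {c: counts[c] / n for c in counts}

# Hypothetical toy labels, just to illustrate the frequency estimate.
labels = [1, 1, 1, -1, -1]
print(mle_prior(labels))  # {1: 0.6, -1: 0.4}
```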
###############################################
Next, prove (4.9):

$$P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}$$

where $j=1,2,\dots,n$; $l=1,2,\dots,S_j$; $k=1,2,\dots,K$.
############## Proof #########
By the same maximum likelihood argument used for ①, the joint probability is estimated by the joint frequency:

$$P(Y=c_k,\,x^{(j)}=a_{jl})=\frac{\sum_{i=1}^N I(y_i=c_k,\,x_i^{(j)}=a_{jl})}{N}\qquad ②$$
The left-hand side of the formula to be proved is, by the definition of conditional probability,

$$P(x^{(j)}=a_{jl}\mid Y=c_k)=\frac{P(Y=c_k,\,x^{(j)}=a_{jl})}{P(Y=c_k)}\qquad ③$$
Next, substitute ① into the denominator of ③ and ② into its numerator, obtaining

$$P(x^{(j)}=a_{jl}\mid Y=c_k)=\frac{\left[\dfrac{\sum_{i=1}^N I(y_i=c_k,\,x_i^{(j)}=a_{jl})}{N}\right]}{\left[\dfrac{\sum_{i=1}^N I(y_i=c_k)}{N}\right]}$$

$$=\frac{\sum_{i=1}^N I(y_i=c_k,\,x_i^{(j)}=a_{jl})}{\sum_{i=1}^N I(y_i=c_k)}$$

This completes the proof of (4.9).
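Formula (4.9) is likewise just counting: the joint count over the class count. A minimal sketch for a single feature, using hypothetical toy data:

```python
def mle_conditional(xs, ys, a, c):
    """MLE of P(X^(j)=a | Y=c): joint count over class count, as in (4.9)."""
    joint = sum(1 for x, y in zip(xs, ys) if x == a and y == c)
    in_class = sum(1 for y in ys if y == c)
    return joint / in_class

# Hypothetical one-feature toy data.
xs = ['S', 'M', 'M', 'S', 'L']
ys = [-1, -1, 1, 1, 1]
print(mle_conditional(xs, ys, 'S', 1))  # 1/3: one 'S' among the three y=1 samples
```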

####### Proof of (4.11) ########################
Assume the prior is uniform over the $K$ classes; then

$$p=\frac{1}{K}\ \Rightarrow\ pK-1=0\qquad (1)$$

In addition, from ①, i.e. formula (4.8), we have

$$pN-\sum_{i=1}^N I(y_i=c_k)=0\qquad (2)$$
Note: strictly speaking, the $p$ in (1) and the $p$ in (2) are not the same quantity. The $p$ in (1) assumes a perfectly uniform class distribution, while the $p$ in (2) is the value obtained from the actual sample distribution.
Combining them as $(1)\cdot\lambda+(2)=0$:

$$\lambda(pK-1)+pN-\sum_{i=1}^N I(y_i=c_k)=0$$

Solving for $p$:

$$P(Y=c_k)=\frac{\lambda+\sum_{i=1}^N I(y_i=c_k)}{\lambda K+N}$$

This completes the proof of (4.11).
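The smoothed prior (4.11) can be sketched the same way; with $\lambda=1$ this is Laplace smoothing. The labels below are hypothetical toy data:

```python
from collections import Counter

def smoothed_prior(labels, classes, lam=1.0):
    """Bayesian prior estimate (4.11): (count + lambda) / (N + lambda * K)."""
    n, k = len(labels), len(classes)
    counts = Counter(labels)
    # Classes unseen in the sample still get probability lam / (n + lam * k).
    return {c: (counts[c] + lam) / (n + lam * k) for c in classes}

labels = [1, 1, 1, -1, -1]
print(smoothed_prior(labels, classes=[1, -1]))  # {1: 4/7, -1: 3/7}
```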
###############################

####### Proof of (4.10) ########
From (4.9), the maximum likelihood estimate is

$$p=P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}$$

$$\Rightarrow\ p\sum_{i=1}^N I(y_i=c_k)-\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)=0\qquad (3)$$

Observe that (4.10) closely resembles (4.9), except that its numerator and denominator each gain a smoothing term. We therefore introduce a smoothing condition: given $Y=c_k$, since the $j$-th feature has $S_j$ possible values, assume each value accounts for an equal share of the samples. Then

$$p=P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{1}{S_j}\ \Rightarrow\ p\cdot S_j-1=0\qquad (4)$$
Combining them as $(3)+\lambda(4)=0$:

$$p\left[\sum_{i=1}^N I(y_i=c_k)+S_j\lambda\right]-\lambda-\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)=0$$

$$\Rightarrow\ p=\frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)+\lambda}{\sum_{i=1}^N I(y_i=c_k)+S_j\lambda}$$

This completes the proof of (4.10).
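The smoothed conditional (4.10) in code, again on hypothetical toy data; note that adding $\lambda$ to the joint count and $S_j\lambda$ to the class count keeps the $S_j$ values summing to 1 given $Y=c_k$:

```python
def smoothed_conditional(xs, ys, a, c, s_j, lam=1.0):
    """Conditional estimate (4.10): (joint count + lambda) over
    (class count + lambda * S_j)."""
    joint = sum(1 for x, y in zip(xs, ys) if x == a and y == c)
    in_class = sum(1 for y in ys if y == c)
    return (joint + lam) / (in_class + lam * s_j)

# Hypothetical one-feature toy data with S_j = 3 possible values.
xs = ['S', 'M', 'M', 'S', 'L']
ys = [-1, -1, 1, 1, 1]
print(smoothed_conditional(xs, ys, 'S', -1, s_j=3))  # (1+1)/(2+3) = 0.4
```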
