Given the maximum entropy model
$$P_{w}(y|x)=\frac{1}{Z_{w}(x)}\exp\Big(\sum_{i=1}^n w_{i}f_{i}(x,y)\Big)$$
where
$$Z_{w}(x)=\sum_{y}\exp\Big(\sum_{i=1}^n w_{i}f_{i}(x,y)\Big)$$
the log-likelihood function is
$$L(w)=\sum_{x,y}\tilde{P}(x,y)\sum_{i=1}^n w_{i}f_{i}(x,y)-\sum_{x}\tilde{P}(x)\log Z_{w}(x)$$
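To make the three definitions concrete, here is a minimal Python sketch that evaluates $Z_w(x)$, $P_w(y|x)$ and $L(w)$ on a toy problem. The label sets, the two feature functions and the empirical distribution `P_TILDE` are illustrative assumptions, not taken from the book.

```python
import math

# Toy problem: every value below is an illustrative assumption.
XS = [0, 1]
YS = [0, 1, 2]

# Empirical joint distribution P~(x, y); the entries sum to 1.
P_TILDE = {
    (0, 0): 0.3, (0, 1): 0.1, (0, 2): 0.1,
    (1, 0): 0.1, (1, 1): 0.2, (1, 2): 0.2,
}


def features(x, y):
    """Two hypothetical binary feature functions f_1(x, y), f_2(x, y)."""
    return [1.0 if (x == 0 and y == 0) else 0.0,
            1.0 if y == 2 else 0.0]


def score(w, x, y):
    """Inner sum: sum_i w_i * f_i(x, y)."""
    return sum(wi * fi for wi, fi in zip(w, features(x, y)))


def Z(w, x):
    """Normalizer Z_w(x) = sum_y exp(sum_i w_i f_i(x, y))."""
    return sum(math.exp(score(w, x, y)) for y in YS)


def P(w, x, y):
    """Model probability P_w(y | x)."""
    return math.exp(score(w, x, y)) / Z(w, x)


def p_tilde_x(x):
    """Empirical marginal P~(x) = sum_y P~(x, y)."""
    return sum(P_TILDE[(x, y)] for y in YS)


def log_likelihood(w):
    """L(w) = sum_{x,y} P~(x,y) sum_i w_i f_i(x,y) - sum_x P~(x) log Z_w(x)."""
    first = sum(p * score(w, x, y) for (x, y), p in P_TILDE.items())
    second = sum(p_tilde_x(x) * math.log(Z(w, x)) for x in XS)
    return first - second
```

With all weights at zero the model is uniform over the three labels, so `log_likelihood([0.0, 0.0])` should reduce to $-\log 3$, which is a quick sanity check on the implementation.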
Derivation:
For a given empirical distribution $\tilde{P}(x,y)$, when the model parameters change from $w$ to $w+\delta$, the change in the log-likelihood is
$$\begin{aligned}
L(w+\delta)-L(w)&=\sum_{x,y}\tilde{P}(x,y)\log P_{w+\delta}(y|x)-\sum_{x,y}\tilde{P}(x,y)\log P_{w}(y|x)\\
&=\sum_{x,y}\tilde{P}(x,y)\log\bigg(\frac{1}{Z_{w+\delta}(x)}\exp\Big(\sum_{i=1}^n(w_{i}+\delta_{i})f_{i}(x,y)\Big)\bigg)-\sum_{x,y}\tilde{P}(x,y)\log\bigg(\frac{1}{Z_{w}(x)}\exp\Big(\sum_{i=1}^n w_{i}f_{i}(x,y)\Big)\bigg)\\
&=\sum_{x,y}\tilde{P}(x,y)\Big(\log\frac{1}{Z_{w+\delta}(x)}+\sum_{i=1}^n(w_{i}+\delta_{i})f_{i}(x,y)\Big)-\sum_{x,y}\tilde{P}(x,y)\Big(\log\frac{1}{Z_{w}(x)}+\sum_{i=1}^n w_{i}f_{i}(x,y)\Big)\\
&=\sum_{x,y}\tilde{P}(x,y)\sum_{i=1}^n\delta_{i}f_{i}(x,y)-\sum_{x}\tilde{P}(x)\log\frac{Z_{w+\delta}(x)}{Z_{w}(x)}
\end{aligned}$$
In the last step the $w_{i}f_{i}(x,y)$ terms cancel, leaving only the $\delta_{i}f_{i}(x,y)$ terms. The two normalizer terms combine into $-\log\frac{Z_{w+\delta}(x)}{Z_{w}(x)}$, which does not depend on $y$, so summing over $y$ with $\sum_{y}\tilde{P}(x,y)=\tilde{P}(x)$ turns the sum over $(x,y)$ into a sum over $x$.
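The final expression can be checked numerically. The sketch below is a rough sanity check rather than a proof: it computes $L(w+\delta)-L(w)$ directly from $\sum_{x,y}\tilde{P}(x,y)\log P_{w}(y|x)$ and compares it with the closed form from the last line of the derivation. The feature functions, the distribution `P_TILDE` and the values of `w` and `delta` are all assumptions chosen for illustration.

```python
import math

# Toy data: every value below is an illustrative assumption.
YS = [0, 1, 2]
P_TILDE = {
    (0, 0): 0.3, (0, 1): 0.1, (0, 2): 0.1,
    (1, 0): 0.1, (1, 1): 0.2, (1, 2): 0.2,
}


def features(x, y):
    """Two hypothetical binary feature functions."""
    return [1.0 if (x == 0 and y == 0) else 0.0,
            1.0 if y == 2 else 0.0]


def dot(v, x, y):
    """sum_i v_i * f_i(x, y) for a weight-like vector v."""
    return sum(vi * fi for vi, fi in zip(v, features(x, y)))


def log_Z(w, x):
    """log Z_w(x)."""
    return math.log(sum(math.exp(dot(w, x, y)) for y in YS))


def L(w):
    """Log-likelihood written as sum_{x,y} P~(x,y) log P_w(y|x)."""
    return sum(p * (dot(w, x, y) - log_Z(w, x))
               for (x, y), p in P_TILDE.items())


def delta_L_closed_form(w, delta):
    """Last line of the derivation:
    sum_{x,y} P~(x,y) sum_i delta_i f_i(x,y)
      - sum_x P~(x) log(Z_{w+delta}(x) / Z_w(x))."""
    w_new = [a + b for a, b in zip(w, delta)]
    linear = sum(p * dot(delta, x, y) for (x, y), p in P_TILDE.items())
    # Empirical marginal P~(x), obtained by summing the joint over y.
    p_x = {}
    for (x, _y), p in P_TILDE.items():
        p_x[x] = p_x.get(x, 0.0) + p
    log_ratio = sum(px * (log_Z(w_new, x) - log_Z(w, x))
                    for x, px in p_x.items())
    return linear - log_ratio


w = [0.5, -1.0]
delta = [0.2, 0.3]
lhs = L([a + b for a, b in zip(w, delta)]) - L(w)
rhs = delta_L_closed_form(w, delta)
print(lhs, rhs)
```

The two printed values should agree up to floating-point rounding, for any choice of `w` and `delta`.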
Reference:
Li Hang, 《统计学习方法》 (Statistical Learning Methods), p. 89