My Note of Maximum Entropy

Notations

  • $P$: model distr.
  • $\tilde{P}$: empirical/sample distr. (Dirac-type: mass $1/n$ on each sample point)
  • $\{x_i\}$: the sample, of size $n$
  • $f_j$: feature functions
  • $E_P f$: expectation of $f$ under the distr. $P$

Maximum Entropy

Def. Maximum Entropy (ME)
$$\max_{P\in \mathcal{P}} H(X) \quad \text{s.t.} \quad E_P(f)=E_{\tilde{P}}(f) \qquad (\star)$$
where $f(x)$ are the feature functions.

Fact. $P_w(x) \propto e^{\sum_j w_j f_j(x)}$ is the solution to $\inf_P L(P, w)$, where $L$ is the Lagrangian of $(\star)$ and $w$ is the Lagrange multiplier.
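
A quick sketch of why (my convention, not spelled out in the note: minimize $-H(P)$ and add a multiplier $w_0$ for the normalization constraint):
$$L(P,w) = \sum_x P(x)\ln P(x) + w_0\Big(1-\sum_x P(x)\Big) + \sum_j w_j\big(E_{\tilde{P}}(f_j) - E_P(f_j)\big).$$
Setting $\partial L/\partial P(x) = \ln P(x) + 1 - w_0 - \sum_j w_j f_j(x) = 0$ gives $P_w(x) = e^{\sum_j w_j f_j(x)}/Z(w)$, with $Z(w) = \sum_x e^{\sum_j w_j f_j(x)}$ fixed by normalization.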

Dual function (the Lagrangian evaluated at $P_w$): the average log-likelihood of the sample,
$$\Psi(w) := L(P_w, w) = -\ln Z(w) + E_{\tilde{P}}(f)\cdot w = \frac{1}{n}\sum_i \ln P_w(x_i),$$
where the last equality uses that $E_{\tilde{P}}$ is the empirical average over the $n$ samples.
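
The middle equality is a one-line check (not written out in the note): since $\ln P_w(x) = \sum_j w_j f_j(x) - \ln Z(w)$, we have $-H(P_w) = E_{P_w}(\ln P_w) = w\cdot E_{P_w}(f) - \ln Z(w)$, and therefore (the $w_0$ term vanishes because $P_w$ is normalized)
$$L(P_w,w) = -H(P_w) + w\cdot\big(E_{\tilde{P}}(f) - E_{P_w}(f)\big) = -\ln Z(w) + w\cdot E_{\tilde{P}}(f).$$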

Dual problem: maximum likelihood estimation (MLE),
$$\max_w \Psi(w) \qquad (\star\star)$$
which is equivalent to maximizing the sample log-likelihood $\sum_i \ln P_w(x_i)$.

Fact. The dual of ME $(\star)$ is MLE $(\star\star)$.
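
This duality can be checked numerically. The following is a minimal sketch (mine, not from the note): it fits $P_w$ on a small discrete alphabet by gradient ascent on the dual $\Psi(w)$, whose gradient is $E_{\tilde{P}}(f) - E_{P_w}(f)$, so the constraint $(\star)$ holds at the optimum. The alphabet, features, and sample are made up for illustration.

```python
import numpy as np

# Toy alphabet {0, 1, 2, 3} and two hypothetical features f_1, f_2.
X = np.arange(4)
F = np.stack([X % 2, (X >= 2)], axis=1).astype(float)   # F[x, j] = f_j(x)

# A made-up sample and its empirical feature expectations E_{P~}(f).
sample = np.array([0, 1, 1, 2, 3, 3, 3, 1])
emp_f = F[sample].mean(axis=0)

def model(w):
    """P_w(x) = exp(sum_j w_j f_j(x)) / Z(w)."""
    logits = F @ w
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Gradient ascent on Psi(w); the gradient is E_{P~}(f) - E_{P_w}(f).
w = np.zeros(F.shape[1])
for _ in range(5000):
    w += 0.1 * (emp_f - F.T @ model(w))

print("E_{P~}(f)  =", emp_f)
print("E_{P_w}(f) =", F.T @ model(w))   # matches emp_f: (star) is satisfied
```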

ME for Machine learning/conditional likelihood

Assume that $P(Y\mid X)$ is a discriminative model.

Def. Maximum Entropy for $P(Y\mid X)$
$$\max_{P\in \mathcal{P}} H(Y\mid X) \quad \text{s.t.} \quad E_P(f)=E_{\tilde{P}}(f)$$
where $f(x,y)$ are feature functions and the expectations are taken under the joint $P(x,y) = \tilde{P}(x)\,P(y\mid x)$; here $H(Y\mid X) = -\sum_{x,y}\tilde{P}(x)\,P(y\mid x)\ln P(y\mid x)$.

Fact. $P_w(y\mid x) \propto e^{\sum_j w_j f_j(x,y)}$ is the solution to $\inf_P L(P, w)$; the normalizer $Z_w(x) = \sum_y e^{\sum_j w_j f_j(x,y)}$ now depends on $x$.

Dual function: $\Psi(w) := L(P_w, w) = \frac{1}{n}\sum_i \ln P_w(y_i\mid x_i)$, the (average) conditional log-likelihood.

Dual problem (conditional MLE):
$$\max_w \Psi(w)$$
which is equivalent to maximizing the conditional log-likelihood $\sum_i \ln P_w(y_i\mid x_i)$.
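
With the usual per-class feature parameterization, this conditional maximum-entropy model is exactly multinomial logistic (softmax) regression. Below is a minimal sketch (mine, not from the note) that fits it by gradient ascent on the conditional log-likelihood; the data, feature layout, and step size are made up for illustration.

```python
import numpy as np

# Conditional maxent P_w(y|x) ∝ exp(sum_j w_j f_j(x, y)) on toy data,
# with one weight per (input coordinate, class) pair, i.e. softmax regression.
rng = np.random.default_rng(0)
n, d, k = 200, 3, 2                              # samples, input dim, classes
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # made-up labels
Y = np.eye(k)[y]                                 # one-hot labels

def cond_probs(W):
    """P_w(y|x) for every sample; W has shape (d, k)."""
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Gradient ascent on (1/n) * sum_i ln P_w(y_i|x_i);
# the gradient is again E_{P~}(f) - E_{P_w}(f).
W = np.zeros((d, k))
for _ in range(500):
    W += 0.1 / n * X.T @ (Y - cond_probs(W))

print("train accuracy:", (cond_probs(W).argmax(axis=1) == y).mean())
```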


Exercise
Please consider ME for the generative model $P(X,Y)$.
