Notations
Def. Max Entropy(ME)
$$
\max_{P\in \mathcal{P}} H(X) \quad \text{s.t.} \quad E_P(f)=E_{\tilde{P}}(f) \qquad (\star)
$$
where $f(x)$ are features and $\tilde{P}$ is the empirical distribution of the sample.
Fact. $P_w(x)\propto e^{\sum_j w_j f_j(x)}$ is the solution to $\inf_P L(P, w)$, where $L$ is the Lagrangian of ($\star$) and $w$ is the vector of Lagrange multipliers.
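A short derivation sketch of this fact. It assumes the Lagrangian is written for the equivalent problem $\min_P -H(X)$, with an extra multiplier $w_0$ for the normalization constraint; the notes do not spell out $L$, so this particular form is an assumption.

$$
L(P,w) = \sum_x P(x)\ln P(x) + w_0\Big(1-\sum_x P(x)\Big) + \sum_j w_j\big(E_{\tilde{P}}(f_j)-E_P(f_j)\big)
$$

$$
\frac{\partial L}{\partial P(x)} = \ln P(x) + 1 - w_0 - \sum_j w_j f_j(x) = 0
\;\Longrightarrow\;
P_w(x) = \frac{e^{\sum_j w_j f_j(x)}}{Z(w)}, \qquad Z(w)=\sum_x e^{\sum_j w_j f_j(x)}
$$

The factor $e^{w_0-1}$ is absorbed into $1/Z(w)$ by the normalization $\sum_x P_w(x)=1$.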
Lagrange dual function: the log-likelihood of the sample (up to the factor $1/N$),
$$
\Psi(w) := L(P_w,w) = -\ln Z(w)+E_{\tilde{P}}(f)\cdot w = \frac{1}{N}\sum_i \ln P_w(x_i)
$$
where $N$ is the sample size and $Z(w)=\sum_x e^{\sum_j w_j f_j(x)}$ is the normalizer.
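The identity between $-\ln Z(w)+E_{\tilde{P}}(f)\cdot w$ and the average sample log-likelihood $\frac{1}{N}\sum_i \ln P_w(x_i)$ can be checked numerically. The following is a minimal sketch on a toy finite domain; the domain, the two features, the sample, and the value of `w` are all hypothetical choices made for illustration.

```python
import math

# Hypothetical toy setup (not from the notes): X = {0, 1, 2, 3}, two features.
domain = [0, 1, 2, 3]
features = [lambda x: float(x), lambda x: float(x % 2)]  # f_1(x), f_2(x)
sample = [0, 1, 1, 2, 3, 3, 3]   # observed x_i, defines the empirical P~
w = [0.3, -0.7]                  # arbitrary multipliers

def score(x, w):
    """Linear score sum_j w_j f_j(x)."""
    return sum(wj * fj(x) for wj, fj in zip(w, features))

def log_Z(w):
    """ln Z(w), with Z(w) = sum_x exp(score(x, w))."""
    return math.log(sum(math.exp(score(x, w)) for x in domain))

def log_p(x, w):
    """ln P_w(x) for the exponential-family model."""
    return score(x, w) - log_Z(w)

def empirical_mean(fj):
    """E_{P~}(f_j): average of the feature over the sample."""
    return sum(fj(x) for x in sample) / len(sample)

# Dual-function form: -ln Z(w) + E_{P~}(f) . w
lhs = -log_Z(w) + sum(wj * empirical_mean(fj) for wj, fj in zip(w, features))
# Likelihood form: (1/N) sum_i ln P_w(x_i)
rhs = sum(log_p(x, w) for x in sample) / len(sample)

print(lhs, rhs)
```

The two printed numbers should agree up to floating-point rounding, for any choice of `w`.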
Dual problem: maximum likelihood estimation (MLE)
$$
\max_w \Psi(w) \qquad (\star\star)
$$
where $\Psi(w):= \sum_{i} \ln P_w(x_i)$ is the log-likelihood of the sample; a positive constant factor such as $1/N$ does not change the maximizer.
Fact. The dual of ME ($\star$) is MLE ($\star\star$).
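One way to see the duality in practice: the gradient of the average log-likelihood is $E_{\tilde{P}}(f)-E_{P_w}(f)$, so a maximizer of the likelihood satisfies the ME moment constraint. Below is a minimal sketch using plain gradient ascent on a toy problem; the domain, features, sample, step size, and iteration count are hypothetical and chosen only so that the example converges.

```python
import math

# Hypothetical toy setup (not from the notes).
domain = [0, 1, 2, 3]
features = [lambda x: float(x), lambda x: float(x % 2)]
sample = [0, 1, 1, 2, 3, 3, 3]

def model_probs(w):
    """P_w(x) = exp(sum_j w_j f_j(x)) / Z(w) for every x in the domain."""
    scores = [sum(wj * fj(x) for wj, fj in zip(w, features)) for x in domain]
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

def model_mean(w):
    """E_{P_w}(f_j) for each feature j."""
    p = model_probs(w)
    return [sum(px * fj(x) for px, x in zip(p, domain)) for fj in features]

# E_{P~}(f_j): feature averages over the sample.
empirical = [sum(fj(x) for x in sample) / len(sample) for fj in features]

def fit(steps=5000, lr=0.2):
    """Gradient ascent on the average log-likelihood."""
    w = [0.0] * len(features)
    for _ in range(steps):
        # Gradient of (1/N) sum_i ln P_w(x_i) is E_{P~}(f) - E_{P_w}(f).
        grad = [e - m for e, m in zip(empirical, model_mean(w))]
        w = [wj + lr * g for wj, g in zip(w, grad)]
    return w

w_hat = fit()
# Largest violation of the ME constraint E_P(f) = E_{P~}(f) at the MLE.
gap = max(abs(e - m) for e, m in zip(empirical, model_mean(w_hat)))
print(w_hat, gap)
```

At the fitted `w_hat` the gap should be close to zero: the maximum likelihood solution within the exponential family reproduces the empirical feature expectations.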
Assume that $P(Y|X)$ is a discriminative model.
Def. Max Entropy for $P(Y|X)$
$$
\max_{P\in \mathcal{P}} H(Y|X) \quad \text{s.t.} \quad E_P(f)=E_{\tilde{P}}(f)
$$
where $f(x,y)$ are features, and the joint distribution used in $E_P$ is $P(x,y)=\tilde{P}(x)P(y|x)$.
Fact. $P_w(y|x)\propto e^{\sum_j w_j f_j(x,y)}$ is the solution to $\inf_P L(P, w)$, where $L$ is the Lagrangian of the conditional problem.
Lagrange dual function: $\Psi(w):=L(P_w,w)=\frac{1}{N}\sum_i \ln P_w(y_i|x_i)$, the conditional log-likelihood (up to the factor $1/N$, with $N$ the sample size).
Dual problem (conditional MLE):
$$
\max_w \Psi(w)
$$
where $\Psi(w):= \sum_{i} \ln P_w(y_i|x_i)$ is the conditional log-likelihood; a positive constant factor such as $1/N$ does not change the maximizer.
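The conditional case can be sketched the same way. The gradient of the average conditional log-likelihood is $E_{\tilde{P}}(f)-E_P(f)$ with $P(x,y)=\tilde{P}(x)P_w(y|x)$, which is why the conditional MLE enforces the ME constraint. The sketch below compares this analytic gradient against a finite-difference estimate; the label set, features, data, and `w` are hypothetical.

```python
import math

# Hypothetical toy setup (not from the notes).
labels = [0, 1, 2]
features = [
    lambda x, y: 1.0 if y == x else 0.0,
    lambda x, y: 1.0 if y == 2 else 0.0,
    lambda x, y: float(x * y),
]
data = [(0, 0), (0, 2), (1, 1), (1, 1), (1, 2), (0, 0)]  # pairs (x_i, y_i)

def cond_probs(x, w):
    """P_w(y|x) for every label y, normalized by Z_w(x)."""
    scores = [sum(wj * fj(x, y) for wj, fj in zip(w, features)) for y in labels]
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

def avg_cond_loglik(w):
    """(1/N) sum_i ln P_w(y_i | x_i)."""
    total = sum(math.log(cond_probs(x, w)[labels.index(y)]) for x, y in data)
    return total / len(data)

def gradient(w):
    """E_{P~}(f) - E_P(f), where P(x, y) = P~(x) P_w(y|x)."""
    n = len(data)
    grad = []
    for fj in features:
        emp = sum(fj(x, y) for x, y in data) / n
        # Averaging over observed x_i is the expectation under P~(x).
        mod = sum(p * fj(x, y2)
                  for x, _ in data
                  for p, y2 in zip(cond_probs(x, w), labels)) / n
        grad.append(emp - mod)
    return grad

w = [0.2, -0.4, 0.1]
eps = 1e-6
numeric = []
for j in range(len(w)):
    up = list(w); up[j] += eps
    dn = list(w); dn[j] -= eps
    # Central finite difference of the average conditional log-likelihood.
    numeric.append((avg_cond_loglik(up) - avg_cond_loglik(dn)) / (2 * eps))
analytic = gradient(w)
print(analytic, numeric)
```

With indicator features of this kind the model is a form of multinomial logistic regression, so the same check applies to that setting.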
Exercise
Consider ME for the generative model $P(X,Y)$.