KL Divergence

The entropy of a distribution P is $H(P) = -\sum_x p(x)\log p(x)$, which reflects the amount of uncertainty in P. Over a fixed finite support, the uniform distribution has the largest entropy.
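As a quick numerical check, here is a minimal NumPy sketch (the helper name `entropy` is our own; the natural log is assumed, so values are in nats) showing that the uniform distribution attains the largest entropy over a fixed four-outcome support:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(P) = -sum_x p(x) log p(x), in nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                    # convention: 0 * log 0 = 0
    return -np.sum(p * np.log(p))

print(entropy([0.25, 0.25, 0.25, 0.25]))  # log(4) ≈ 1.386, the maximum
print(entropy([0.70, 0.10, 0.10, 0.10]))  # ≈ 0.940, less uncertain
```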
If we do not have prior knowledge about P and instead guess it to be Q, we add extra uncertainty, giving the cross entropy $H(P, Q) = -\sum_x p(x)\log q(x)$. From another angle, cross entropy is also a good alternative to the MSE loss when paired with a sigmoid output, as is often demonstrated.
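The sketch below (same conventions as above; `cross_entropy` is our own helper) computes $H(P, Q)$ for a uniform guess Q and shows that it exceeds $H(P)$:

```python
import numpy as np

def cross_entropy(p, q):
    """Cross entropy H(P, Q) = -sum_x p(x) log q(x), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                      # terms with p(x) = 0 contribute nothing
    return -np.sum(p[mask] * np.log(q[mask]))

p = [0.7, 0.2, 0.1]
q = [1/3, 1/3, 1/3]          # uniform guess made without knowledge of P
print(cross_entropy(p, p))   # H(P)    ≈ 0.802 (cross entropy with itself is just the entropy)
print(cross_entropy(p, q))   # H(P, Q) ≈ 1.099, the extra uncertainty from guessing Q
```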
The discrepancy between $H(P, Q)$ and $H(P)$ is the relative entropy, also known as the KL divergence, formulated as $D_{\mathrm{KL}}(P\,\|\,Q) = H(P, Q) - H(P) = \sum_x p(x)\log\frac{p(x)}{q(x)}$.
KL divergence is non-negative, which can be proved using Jensen's inequality: since $-\log$ is convex, $D_{\mathrm{KL}}(P\,\|\,Q) = \mathbb{E}_P\!\left[-\log\tfrac{q(x)}{p(x)}\right] \ge -\log \mathbb{E}_P\!\left[\tfrac{q(x)}{p(x)}\right] = -\log 1 = 0$. Besides, KL divergence is asymmetric: in general $D_{\mathrm{KL}}(P\,\|\,Q) \neq D_{\mathrm{KL}}(Q\,\|\,P)$. However, we can define a symmetric variant, e.g. $\tfrac{1}{2}\big(D_{\mathrm{KL}}(P\,\|\,Q) + D_{\mathrm{KL}}(Q\,\|\,P)\big)$. More properties can be found here.
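To illustrate these properties numerically, here is a minimal sketch (the helper `kl_divergence` is ours) that checks non-negativity, asymmetry, and the symmetrized value on the same example distributions as above:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x p(x) log(p(x) / q(x)), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                      # terms with p(x) = 0 contribute nothing
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = [0.7, 0.2, 0.1]
q = [1/3, 1/3, 1/3]
print(kl_divergence(p, q))   # ≈ 0.297, non-negative; equals H(P, Q) - H(P)
print(kl_divergence(q, p))   # ≈ 0.324, differs from the above: KL is asymmetric
print(0.5 * (kl_divergence(p, q) + kl_divergence(q, p)))  # ≈ 0.310, symmetric variant
```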
