http://stats.stackexchange.com/questions/60680/kl-divergence-between-two-multivariate-gaussians
I give a detailed derivation process for the KL Divergence between two multivariate normal distributions.
Given p(x)∼N(μ1,Σ1),q(x)∼N(μ2,Σ2) , we need to solve the KL KL=∫[log(p(x))−log(q(x))] p(x) dx
Firstly, we have the multivariate normal distribution pdf,
p(x)=(2π)−k2|Σ1|−12exp{−12(x−μ1)TΣ−11(x−μ1)}
logp(x)=−k2log2π−12log|Σ1|−12(x−μ1)TΣ−11(x−μ1)
logq(x)=−k2log2π−12log|Σ2|−12(x−μ2)TΣ−12(x−μ2)
From the reference link above, we know a general derivation,
KL=∫[12log|Σ2||Σ1|−12(x−μ1)TΣ−11(x−μ1)+12(x−μ2)TΣ−12(x−μ2)]×p(x)dx=12log|Σ2||Σ1|−12tr {E[(x−μ1)(x−μ1)T] Σ−11}+12E[(x−μ2)TΣ−12(x−μ2)]=12log|Σ2||Σ1|−12tr {Id}+12(μ1−μ2)′Σ−12(μ1−μ2)+12tr{Σ−12Σ1}=12[log|Σ2||Σ1|−d+tr(Σ−12Σ1)+(μ2−μ1)TΣ−12(μ2−μ1)].
Here, I give the detailed process for ∫12(x−μ1)TΣ−11(x−μ1)p(x)dx and ∫12(x−μ2)TΣ−12(x−μ2)p(x)dx .
Trace and Expectation Tricks
Given x is a scalar value, E(x)=E(tr(x)) since the trace of a scalar value is the scalar itself. Specially, for the expectation of quadratic form, E(xTAx)=E(tr(xTAx))=E(tr(AxxT))=tr(E(AxxT)) based on the properties below. After adding the trace symbol, we can exchange the position within the quadratic form.
tr(AB)=tr(BA)
tr(ABC)=tr(BCA)
tr(ABC)=tr(CAB)
tr(ABC)≠tr(ACB)
E(tr(x))=(tr(E(x))
, Expectation symbol can be exchanged by trace.
Part 1
∫12(x−μ1)TΣ−11(x−μ1)p(x)dx=Ep(12(x−μ1)TΣ−11(x−μ1))the expectation here is related to p(x) rather than q(x)=Ep(tr(12(x−μ1)TΣ−11(x−μ1)))=Ep(tr(12(x−μ1)(x−μ1)TΣ−11))=tr(Ep(12(x−μ1)(x−μ1)TΣ−11))=tr(Ep[(x−μ1)(x−μ1)T]12Σ−11)the definition of covariance matrix=tr(Σ112Σ−11)=tr(Id)=d
Part2
∫12(x−μ2)TΣ−12(x−μ2)×p(x)dx=∫12[(x−μ1)+(μ1−μ2)]TΣ−12[(x−μ1)+(μ1−μ2)]×p(x)dx=∫12{(x−μ1)TΣ−12(x−μ1)+2(x−μ1)TΣ−12(μ1−μ2)+(μ1−μ2)TΣ−12(μ1−μ2)}×p(x)dx=tr(Σ−12Σ1)+0+ (μ1−μ2)TΣ−12(μ1−μ2)