Let $\mathbf{A} \in \mathbb{S}_{++}^n$ and $\mathbf{B} \in \mathbb{S}_{++}^n$ be symmetric positive definite matrices whose eigenvalues, sorted in nonincreasing order, are
$$
\lambda_1(\mathbf{A}) \geq \lambda_2(\mathbf{A}) \geq \cdots \geq \lambda_n(\mathbf{A}) > 0,\qquad \lambda_1(\mathbf{B}) \geq \lambda_2(\mathbf{B}) \geq \cdots \geq \lambda_n(\mathbf{B}) > 0.
$$
Then the following inequality holds:
$$
\sum_{i=1}^n \lambda_i(\mathbf{A})\lambda_{n-i+1}(\mathbf{B}) \leq \mathrm{tr}(\mathbf{AB}) = \sum_{i=1}^n \lambda_i(\mathbf{AB}) \leq \sum_{i=1}^n \lambda_i(\mathbf{A})\lambda_{i}(\mathbf{B})
$$
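This is known as the von Neumann trace inequality.
This post mainly proves the left-hand (lower) bound of the von Neumann trace inequality; the right-hand (upper) bound is comparatively easy to see.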
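A quick numerical sanity check of both bounds can be sketched in plain Python with no third-party packages. Assumptions: all helper names below are our own, not from any library, and $\mathbf{A}$, $\mathbf{B}$ are generated as $\mathbf{U}\operatorname{diag}(\lambda)\mathbf{U}^{\mathrm{T}}$ with a Gram–Schmidt random orthogonal $\mathbf{U}$, so their eigenvalues are known exactly. Random trials only illustrate the inequality; they do not prove it.

```python
import math
import random


def random_orthogonal(n, rng):
    """Random n x n orthogonal matrix: Gram-Schmidt applied to Gaussian vectors."""
    cols = []
    while len(cols) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in cols:  # remove the components along previously accepted columns
            d = sum(vi * ui for vi, ui in zip(v, u))
            v = [vi - d * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-8:  # reject (practically impossible) near-dependent draws
            cols.append([vi / norm for vi in v])
    return [[cols[j][i] for j in range(n)] for i in range(n)]  # column j is cols[j]


def from_spectrum(eigs, U):
    """Return U diag(eigs) U^T: a symmetric matrix whose eigenvalues are `eigs`."""
    n = len(eigs)
    return [[sum(U[i][k] * eigs[k] * U[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace_of_product(A, B):
    """tr(AB) = sum over i, k of A[i][k] * B[k][i]."""
    n = len(A)
    return sum(A[i][k] * B[k][i] for i in range(n) for k in range(n))


def trace_bounds(a, b):
    """(lower, upper) bounds; a and b are eigenvalue lists sorted nonincreasingly."""
    n = len(a)
    lower = sum(a[i] * b[n - 1 - i] for i in range(n))  # lambda_i(A) lambda_{n-i+1}(B)
    upper = sum(a[i] * b[i] for i in range(n))          # lambda_i(A) lambda_i(B)
    return lower, upper


def check_random(n=5, trials=200, seed=0, tol=1e-9):
    """True if lower <= tr(AB) <= upper holds in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = sorted((rng.uniform(0.1, 10.0) for _ in range(n)), reverse=True)
        b = sorted((rng.uniform(0.1, 10.0) for _ in range(n)), reverse=True)
        A = from_spectrum(a, random_orthogonal(n, rng))
        B = from_spectrum(b, random_orthogonal(n, rng))
        lower, upper = trace_bounds(a, b)
        if not lower - tol <= trace_of_product(A, B) <= upper + tol:
            return False
    return True


if __name__ == "__main__":
    # A = diag(3, 1), B = [[2, 1], [1, 2]] with eigenvalues 3 and 1
    print(trace_bounds([3, 1], [3, 1]))                          # (6, 10)
    print(trace_of_product([[3, 0], [0, 1]], [[2, 1], [1, 2]]))  # 8
    print(check_random())                                        # True
```

The worked $2\times 2$ case, $\mathbf{A}=\operatorname{diag}(3,1)$ and $\mathbf{B}=\begin{pmatrix}2&1\\1&2\end{pmatrix}$ (eigenvalues $3$ and $1$), gives $6 \le \operatorname{tr}(\mathbf{AB}) = 8 \le 10$.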
Before the proof we need an important tool, the Cauchy interlacing theorem:
$$
\lambda_{i+1}(\mathbf{C}) \leq \lambda_i(\widetilde{\mathbf{C}}) \leq \lambda_i(\mathbf{C})
$$
where $\widetilde{\mathbf{C}} \in \mathbb{S}_{++}^{n-1}$ is an $(n-1)\times(n-1)$ principal submatrix of $\mathbf{C} \in \mathbb{S}_{++}^n$ (obtained by deleting one row and the column with the same index) and $1 \leq i \leq n-1$.
Proof:
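Let $\mathbf{P} \in \mathbb{R}^{n \times (n-1)}$ be the identity matrix $\mathbf{I}_n$ with the column of the deleted index removed. Its columns are orthonormal, $\mathbf{P}^{\mathrm{T}}\mathbf{P} = \mathbf{I}_{n-1}$, and $\mathbf{P}^{\mathrm{T}} \mathbf{C} \mathbf{P} = \widetilde{\mathbf{C}}$. For $i \leq n-1$, the max–min form of the Courant–Fischer theorem, in which $\mathcal{S}_i$ and $\mathcal{P}_i$ denote $i$-dimensional subspaces, gives: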
$$
\lambda_i(\widetilde{\mathbf{C}})=\max_{\mathcal{S}_i \subseteq \mathbb{R}^{n-1}} \min_{\mathbf{x} \in \mathcal{S}_i,\|\mathbf{x}\|_2=1} \mathbf{x}^{\mathrm{T}} \widetilde{\mathbf{C}} \mathbf{x}=\max_{\mathcal{S}_i \subseteq \mathbb{R}^{n-1}} \min_{\mathbf{x} \in \mathcal{S}_i,\|\mathbf{x}\|_2=1}(\mathbf{P x})^{\mathrm{T}} \mathbf{C}(\mathbf{P x}) \leq \max_{\mathcal{P}_i \subseteq \mathbb{R}^n} \min_{\mathbf{y} \in \mathcal{P}_i,\|\mathbf{y}\|_2=1} \mathbf{y}^{\mathrm{T}} \mathbf{C} \mathbf{y}=\lambda_i(\mathbf{C})
$$
The inequality holds because $\|\mathbf{P}\mathbf{x}\|_2=\|\mathbf{x}\|_2$ and $\mathbf{P}\mathcal{S}_i$ is an $i$-dimensional subspace of $\mathbb{R}^n$, so the middle expression is a maximum over only some of the $i$-dimensional subspaces of $\mathbb{R}^n$.
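Similarly, by the min–max form of the Courant–Fischer theorem, in which $\mathcal{S}_{n-i}$ and $\mathcal{P}_{n-i}$ denote $(n-i)$-dimensional subspaces: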
$$
\lambda_i(\widetilde{\mathbf{C}})=\min_{\mathcal{S}_{n-i} \subseteq \mathbb{R}^{n-1}} \max_{\mathbf{x} \in \mathcal{S}_{n-i},\|\mathbf{x}\|_2=1} \mathbf{x}^{\mathrm{T}} \widetilde{\mathbf{C}} \mathbf{x}=\min_{\mathcal{S}_{n-i} \subseteq \mathbb{R}^{n-1}} \max_{\mathbf{x} \in \mathcal{S}_{n-i},\|\mathbf{x}\|_2=1}(\mathbf{P x})^{\mathrm{T}} \mathbf{C}(\mathbf{P x}) \geq \min_{\mathcal{P}_{n-i} \subseteq \mathbb{R}^n} \max_{\mathbf{y} \in \mathcal{P}_{n-i},\|\mathbf{y}\|_2=1} \mathbf{y}^{\mathrm{T}} \mathbf{C} \mathbf{y}=\lambda_{i+1}(\mathbf{C})
$$
Here the minimum on the right runs over all $(n-i)$-dimensional subspaces of $\mathbb{R}^n$, a superset of the subspaces $\mathbf{P}\mathcal{S}_{n-i}$, so it can only be smaller; the last equality is the min–max form for $\lambda_{i+1}(\mathbf{C})$, since $n-(i+1)+1 = n-i$.
Combining the two:
$$
\lambda_{i+1}(\mathbf{C}) \leq \lambda_i(\widetilde{\mathbf{C}}) \leq \lambda_i(\mathbf{C})
$$
This completes the proof.
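The construction of $\mathbf{P}$ and the interlacing inequalities can be checked numerically with the sketch below. It assumes a small textbook cyclic Jacobi eigenvalue routine (`sym_eigvals`, our own helper, not from the source) so that only the standard library is needed; `selection_matrix(n, r)` is $\mathbf{I}_n$ with column `r` removed and `congruence` forms $\mathbf{P}^{\mathrm{T}}\mathbf{C}\mathbf{P}$. Indices are 0-based in the code.

```python
import math
import random


def sym_eigvals(M, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix (cyclic Jacobi rotations), nonincreasing."""
    n = len(M)
    a = [list(map(float, row)) for row in M]
    for _ in range(max_sweeps):
        if sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # a <- a R (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- R^T a (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def random_spd(n, rng):
    """G G^T + I with Gaussian G: a random symmetric positive definite matrix."""
    g = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    return [[sum(g[i][k] * g[j][k] for k in range(n)) + (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def selection_matrix(n, r):
    """P in R^{n x (n-1)}: I_n with column r deleted, so P^T P = I_{n-1}."""
    return [[1.0 if i == j else 0.0 for j in range(n) if j != r] for i in range(n)]


def congruence(P, C):
    """Return P^T C P."""
    n, m = len(P), len(P[0])
    CP = [[sum(C[i][k] * P[k][j] for k in range(n)) for j in range(m)] for i in range(n)]
    return [[sum(P[k][i] * CP[k][j] for k in range(n)) for j in range(m)] for i in range(m)]


def interlaces(C, r, tol=1e-9):
    """Check lambda_{i+1}(C) <= lambda_i(C~) <= lambda_i(C), C~ = C without row/col r."""
    lam = sym_eigvals(C)
    mu = sym_eigvals(congruence(selection_matrix(len(C), r), C))
    return all(lam[i + 1] - tol <= mu[i] <= lam[i] + tol for i in range(len(mu)))


def check_interlacing(n=6, trials=100, seed=0):
    """Interlacing on random SPD matrices with a randomly deleted index."""
    rng = random.Random(seed)
    return all(interlaces(random_spd(n, rng), rng.randrange(n)) for _ in range(trials))


if __name__ == "__main__":
    print(sym_eigvals([[2, 1], [1, 2]]))  # approximately [3.0, 1.0]
    print(check_interlacing())            # True
```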
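Building on the interlacing theorem above, we prove the following lemma by induction on $n$. Here $\boldsymbol{\Lambda}_{\mathbf{A}} \in \mathbb{S}_{++}^n$ is diagonal with its diagonal entries in nonincreasing order, so $(\boldsymbol{\Lambda}_{\mathbf{A}})_{i,i} = \lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})$; $\mathbf{C} \in \mathbb{S}_{++}^n$; and $\widetilde{\mathbf{C}}$ is the leading $(n-1)\times(n-1)$ principal submatrix of $\mathbf{C}$: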
$$
\sum_{i=1}^{n-1}\bigl(\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})-\lambda_n(\boldsymbol{\Lambda}_{\mathbf{A}})\bigr)\mathbf{C}_{i, i} \geq \sum_{i=1}^{n-1}\bigl(\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})-\lambda_n(\boldsymbol{\Lambda}_{\mathbf{A}})\bigr) \lambda_{n-i}(\widetilde{\mathbf{C}})
$$
Proof:
Let $\boldsymbol{\Lambda}'_{\mathbf{A}} \in \mathbb{S}_{++}^{k+1}$ be a diagonal matrix whose diagonal entries are in nonincreasing order, let $\mathbf{C}' \in \mathbb{S}_{++}^{k+1}$, and let the diagonal matrix $\boldsymbol{\Lambda}_{\mathbf{A}} \in \mathbb{S}_{++}^{k}$ and $\mathbf{C} \in \mathbb{S}_{++}^{k}$ be the leading $k \times k$ principal submatrices of $\boldsymbol{\Lambda}'_{\mathbf{A}}$ and $\mathbf{C}'$, respectively. Then for every $1 \leq i \leq k$ we have $\lambda_i(\boldsymbol{\Lambda}'_{\mathbf{A}})=\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})$ and $\mathbf{C}'_{i,i}=\mathbf{C}_{i,i}$. As in the lemma, $\widetilde{\mathbf{C}}$ denotes the leading $(k-1)\times(k-1)$ principal submatrix of $\mathbf{C}$.
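For $n=2$, $\widetilde{\mathbf{C}}$ is the $1\times 1$ matrix $\mathbf{C}_{1,1}$, so $\lambda_1(\widetilde{\mathbf{C}})=\mathbf{C}_{1,1}$ and both sides equal $\bigl(\lambda_1(\boldsymbol{\Lambda}_{\mathbf{A}})-\lambda_2(\boldsymbol{\Lambda}_{\mathbf{A}})\bigr)\mathbf{C}_{1,1}$; the claim holds with equality.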
Assume the claim holds for $n=k$, that is:
$$
\sum_{i=1}^{k-1}\bigl(\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})-\lambda_k(\boldsymbol{\Lambda}_{\mathbf{A}})\bigr) \mathbf{C}_{i, i} \geq \sum_{i=1}^{k-1}\bigl(\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})-\lambda_k(\boldsymbol{\Lambda}_{\mathbf{A}})\bigr) \lambda_{k-i}(\widetilde{\mathbf{C}})
$$
Then for $n=k+1$:
$$
\begin{aligned}
&\sum_{i=1}^{k}\bigl(\lambda_i(\boldsymbol{\Lambda}'_{\mathbf{A}})-\lambda_{k+1}(\boldsymbol{\Lambda}'_{\mathbf{A}})\bigr)\mathbf{C}'_{i,i}\\
={}&\sum_{i=1}^{k}\bigl(\lambda_i(\boldsymbol{\Lambda}'_{\mathbf{A}})-\lambda_{k}(\boldsymbol{\Lambda}'_{\mathbf{A}})+\lambda_{k}(\boldsymbol{\Lambda}'_{\mathbf{A}})-\lambda_{k+1}(\boldsymbol{\Lambda}'_{\mathbf{A}})\bigr)\mathbf{C}'_{i,i}\\
={}&\sum_{i=1}^{k-1}\bigl(\lambda_i(\boldsymbol{\Lambda}'_{\mathbf{A}})-\lambda_{k}(\boldsymbol{\Lambda}'_{\mathbf{A}})\bigr)\mathbf{C}'_{i,i}+\bigl(\lambda_k(\boldsymbol{\Lambda}'_{\mathbf{A}})-\lambda_{k+1}(\boldsymbol{\Lambda}'_{\mathbf{A}})\bigr)\sum_{i=1}^{k}\mathbf{C}'_{i,i}\\
={}&\sum_{i=1}^{k-1}\bigl(\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})-\lambda_{k}(\boldsymbol{\Lambda}_{\mathbf{A}})\bigr)\mathbf{C}_{i,i}+\bigl(\lambda_k(\boldsymbol{\Lambda}_{\mathbf{A}})-\lambda_{k+1}(\boldsymbol{\Lambda}'_{\mathbf{A}})\bigr)\sum_{i=1}^{k}\mathbf{C}'_{i,i}\\
\geq{}&\sum_{i=1}^{k-1}\bigl(\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})-\lambda_{k}(\boldsymbol{\Lambda}_{\mathbf{A}})\bigr)\lambda_{k-i}(\widetilde{\mathbf{C}})+\bigl(\lambda_k(\boldsymbol{\Lambda}_{\mathbf{A}})-\lambda_{k+1}(\boldsymbol{\Lambda}'_{\mathbf{A}})\bigr)\sum_{i=1}^{k}\mathbf{C}'_{i,i}\\
\geq{}&\sum_{i=1}^{k-1}\bigl(\lambda_i(\boldsymbol{\Lambda}'_{\mathbf{A}})-\lambda_{k}(\boldsymbol{\Lambda}'_{\mathbf{A}})\bigr)\lambda_{k-i+1}(\mathbf{C})+\bigl(\lambda_k(\boldsymbol{\Lambda}'_{\mathbf{A}})-\lambda_{k+1}(\boldsymbol{\Lambda}'_{\mathbf{A}})\bigr)\sum_{i=1}^{k}\lambda_{k-i+1}(\mathbf{C})\\
={}&\sum_{i=1}^{k}\bigl(\lambda_i(\boldsymbol{\Lambda}'_{\mathbf{A}})-\lambda_{k+1}(\boldsymbol{\Lambda}'_{\mathbf{A}})\bigr)\lambda_{k-i+1}(\mathbf{C})
\end{aligned}
$$
The first inequality is the induction hypothesis. The second uses the interlacing theorem for $\widetilde{\mathbf{C}}$ inside $\mathbf{C}$, $\lambda_{k-i}(\widetilde{\mathbf{C}}) \geq \lambda_{k-i+1}(\mathbf{C})$, together with the nonnegative weights $\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})-\lambda_k(\boldsymbol{\Lambda}_{\mathbf{A}}) \geq 0$; in its last term, $\sum_{i=1}^{k}\mathbf{C}'_{i,i}=\operatorname{tr}(\mathbf{C})=\sum_{i=1}^{k}\lambda_{k-i+1}(\mathbf{C})$ holds with equality. Since the leading $k\times k$ principal submatrix of $\mathbf{C}'$ is $\mathbf{C}$, the final expression is exactly the right-hand side of the lemma for $n=k+1$.
This completes the proof.
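The lemma can be spot-checked the same way. In this sketch the helper names are hypothetical and the Jacobi eigenvalue routine is again our own assumption; `d` holds the diagonal of $\boldsymbol{\Lambda}_{\mathbf{A}}$ in nonincreasing order, $\widetilde{\mathbf{C}}$ is the leading $(n-1)\times(n-1)$ block of $\mathbf{C}$, and the 1-based $\lambda_{n-i}(\widetilde{\mathbf{C}})$ becomes `mu[n - 2 - i]` for 0-based `i`. The right-hand side pairs the largest weight with the smallest eigenvalue of $\widetilde{\mathbf{C}}$.

```python
import math
import random


def sym_eigvals(M, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix (cyclic Jacobi rotations), nonincreasing."""
    n = len(M)
    a = [list(map(float, row)) for row in M]
    for _ in range(max_sweeps):
        if sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def random_spd(n, rng):
    """G G^T + I with Gaussian G: a random symmetric positive definite matrix."""
    g = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    return [[sum(g[i][k] * g[j][k] for k in range(n)) + (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def lemma_sides(d, C):
    """(lhs, rhs) of the lemma for Lambda_A = diag(d), d sorted nonincreasingly."""
    n = len(d)
    mu = sym_eigvals([row[:n - 1] for row in C[:n - 1]])  # spectrum of leading block C~
    w = [d[i] - d[n - 1] for i in range(n - 1)]           # weights lambda_i - lambda_n >= 0
    lhs = sum(w[i] * C[i][i] for i in range(n - 1))
    rhs = sum(w[i] * mu[n - 2 - i] for i in range(n - 1))  # lambda_{n-i}(C~)
    return lhs, rhs


def check_lemma(n=6, trials=100, seed=0, tol=1e-9):
    """True if lhs >= rhs for every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        d = sorted((rng.uniform(0.1, 10.0) for _ in range(n)), reverse=True)
        lhs, rhs = lemma_sides(d, random_spd(n, rng))
        if lhs < rhs - tol:
            return False
    return True


if __name__ == "__main__":
    print(lemma_sides([5, 2], [[4, 1], [1, 3]]))  # base case n = 2: both sides equal 12
    print(check_lemma())                          # True
```

For $n=2$ the two sides coincide, matching the base case of the induction.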
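Take the eigendecompositions $\mathbf{A}=\mathbf{U} \boldsymbol{\Lambda}_{\mathbf{A}} \mathbf{U}^{\mathrm{T}}$ and $\mathbf{B}=\mathbf{V} \boldsymbol{\Lambda}_{\mathbf{B}} \mathbf{V}^{\mathrm{T}}$, where $\mathbf{U}$ and $\mathbf{V}$ are orthogonal and the diagonal entries of $\boldsymbol{\Lambda}_{\mathbf{A}}$ and $\boldsymbol{\Lambda}_{\mathbf{B}}$ are sorted in nonincreasing order. Then: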
$$
\operatorname{tr}(\mathbf{A B})=\operatorname{tr}\bigl(\mathbf{U} \boldsymbol{\Lambda}_{\mathbf{A}} \mathbf{U}^{\mathrm{T}} \mathbf{V} \boldsymbol{\Lambda}_{\mathbf{B}} \mathbf{V}^{\mathrm{T}}\bigr)=\operatorname{tr}\bigl(\boldsymbol{\Lambda}_{\mathbf{A}} \underbrace{\mathbf{U}^{\mathrm{T}} \mathbf{V}}_{\mathbf{Q}} \boldsymbol{\Lambda}_{\mathbf{B}} \underbrace{\mathbf{V}^{\mathrm{T}} \mathbf{U}}_{\mathbf{Q}^{\mathrm{T}}}\bigr)=\operatorname{tr}\bigl(\boldsymbol{\Lambda}_{\mathbf{A}} \underbrace{\mathbf{Q} \boldsymbol{\Lambda}_{\mathbf{B}} \mathbf{Q}^{\mathrm{T}}}_{\mathbf{C}}\bigr)=\operatorname{tr}(\boldsymbol{\Lambda}_{\mathbf{A}} \mathbf{C})=\sum_{i=1}^n \lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}}) \mathbf{C}_{i, i}=\sum_{i=1}^n \lambda_i(\mathbf{A}) \mathbf{C}_{i, i}
$$
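The second equality uses the cyclic property of the trace, and $\operatorname{tr}(\boldsymbol{\Lambda}_{\mathbf{A}}\mathbf{C})=\sum_{i}\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})\mathbf{C}_{i,i}$ holds because $\boldsymbol{\Lambda}_{\mathbf{A}}$ is diagonal with $(\boldsymbol{\Lambda}_{\mathbf{A}})_{i,i}=\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})$. Clearly $\boldsymbol{\Lambda}_{\mathbf{A}}$ has the same eigenvalues as $\mathbf{A}$. Moreover, $\mathbf{Q}=\mathbf{U}^{\mathrm{T}}\mathbf{V}$ is orthogonal, so $\mathbf{C}=\mathbf{Q}\boldsymbol{\Lambda}_{\mathbf{B}}\mathbf{Q}^{\mathrm{T}}$ is orthogonally similar to $\boldsymbol{\Lambda}_{\mathbf{B}}$ and has the same eigenvalues as $\mathbf{B}$; in particular $\mathbf{C} \in \mathbb{S}_{++}^n$, so the lemma applies to it.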
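The rewriting $\operatorname{tr}(\mathbf{AB})=\sum_i \lambda_i(\mathbf{A})\mathbf{C}_{i,i}$ and the claim that $\mathbf{C}$ shares the spectrum of $\mathbf{B}$ can be illustrated as follows. This is a sketch under our own assumptions: $\mathbf{U}$ and $\mathbf{V}$ are Gram–Schmidt random orthogonal matrices, the helper names are hypothetical, and instead of computing the eigenvalues of $\mathbf{C}$ the code checks $\mathbf{C}\mathbf{Q}=\mathbf{Q}\boldsymbol{\Lambda}_{\mathbf{B}}$, which says the columns of $\mathbf{Q}$ are eigenvectors of $\mathbf{C}$ with eigenvalues $\lambda_i(\mathbf{B})$.

```python
import math
import random


def random_orthogonal(n, rng):
    """Random n x n orthogonal matrix: Gram-Schmidt applied to Gaussian vectors."""
    cols = []
    while len(cols) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in cols:
            d = sum(vi * ui for vi, ui in zip(v, u))
            v = [vi - d * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-8:
            cols.append([vi / norm for vi in v])
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matmul(X, Y):
    """Plain matrix product X Y for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def diag(v):
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]


def check_rewrite(n=4, seed=0):
    """Return (tr(AB), sum_i lambda_i(A) C_ii, max |C Q - Q Lambda_B|)."""
    rng = random.Random(seed)
    a = sorted((rng.uniform(0.5, 5.0) for _ in range(n)), reverse=True)
    b = sorted((rng.uniform(0.5, 5.0) for _ in range(n)), reverse=True)
    U, V = random_orthogonal(n, rng), random_orthogonal(n, rng)
    A = matmul(matmul(U, diag(a)), transpose(U))  # A = U Lambda_A U^T
    B = matmul(matmul(V, diag(b)), transpose(V))  # B = V Lambda_B V^T
    Q = matmul(transpose(U), V)                    # Q = U^T V, orthogonal
    C = matmul(matmul(Q, diag(b)), transpose(Q))  # C = Q Lambda_B Q^T (= U^T B U)
    AB = matmul(A, B)
    lhs = sum(AB[i][i] for i in range(n))
    rhs = sum(a[i] * C[i][i] for i in range(n))
    # C Q = Q Lambda_B: the columns of Q are eigenvectors of C with eigenvalues b
    CQ, QL = matmul(C, Q), matmul(Q, diag(b))
    err = max(abs(CQ[i][j] - QL[i][j]) for i in range(n) for j in range(n))
    return lhs, rhs, err


if __name__ == "__main__":
    lhs, rhs, err = check_rewrite()
    print(abs(lhs - rhs) < 1e-9, err < 1e-9)  # True True
```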
Separating the $i=n$ term of this sum and then applying the lemma and the interlacing theorem:
$$
\begin{aligned}
\sum_{i=1}^n \lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}}) \mathbf{C}_{i, i}&=\sum_{i=1}^{n-1}\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}}) \mathbf{C}_{i, i}+\lambda_n(\boldsymbol{\Lambda}_{\mathbf{A}}) \Bigl(\sum_{i=1}^n\mathbf{C}_{i, i}-\sum_{i=1}^{n-1}\mathbf{C}_{i, i}\Bigr)\\
&=\sum_{i=1}^{n-1}\bigl( \lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})-\lambda_n(\boldsymbol{\Lambda}_{\mathbf{A}})\bigr)\mathbf{C}_{i, i}+\lambda_n(\boldsymbol{\Lambda}_{\mathbf{A}})\sum_{i=1}^n\mathbf{C}_{i, i}\\
&\geq \sum_{i=1}^{n-1}\bigl(\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})-\lambda_n(\boldsymbol{\Lambda}_{\mathbf{A}})\bigr) \lambda_{n-i}(\widetilde{\mathbf{C}})+\lambda_n(\boldsymbol{\Lambda}_{\mathbf{A}})\sum_{i=1}^n\mathbf{C}_{i, i}\\
&\geq\sum_{i=1}^{n-1}\bigl(\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})-\lambda_n(\boldsymbol{\Lambda}_{\mathbf{A}})\bigr) \lambda_{n-i+1}(\mathbf{C})+\lambda_n(\boldsymbol{\Lambda}_{\mathbf{A}})\sum_{i=1}^n\lambda_{n-i+1}(\mathbf{C})\\
&=\sum_{i=1}^{n}\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}}) \lambda_{n-i+1}(\mathbf{C})=\sum_{i=1}^n \lambda_i(\mathbf{A})\lambda_{n-i+1}(\mathbf{B})
\end{aligned}
$$
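Here the first inequality is the lemma, with $\widetilde{\mathbf{C}}$ the leading $(n-1)\times(n-1)$ principal submatrix of $\mathbf{C}$. The second follows from the interlacing theorem, $\lambda_{n-i}(\widetilde{\mathbf{C}}) \geq \lambda_{n-i+1}(\mathbf{C})$, with the nonnegative weights $\lambda_i(\boldsymbol{\Lambda}_{\mathbf{A}})-\lambda_n(\boldsymbol{\Lambda}_{\mathbf{A}})$, and from $\sum_{i=1}^n\mathbf{C}_{i,i}=\operatorname{tr}(\mathbf{C})=\sum_{i=1}^n\lambda_{n-i+1}(\mathbf{C})$. The last step uses the fact that $\mathbf{C}$ and $\mathbf{B}$ have the same eigenvalues. Therefore $\operatorname{tr}(\mathbf{A B})\geq\sum_{i=1}^n \lambda_i(\mathbf{A})\lambda_{n-i+1}(\mathbf{B})$, which is the left-hand side of the von Neumann trace inequality.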
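To see which steps of the final chain are equalities and which give up slack, the sketch below evaluates its five expressions $s_0,\dots,s_4$ (labels are ours) for $\boldsymbol{\Lambda}_{\mathbf{A}}=\operatorname{diag}(a)$ and $\mathbf{C}=\mathbf{Q}\operatorname{diag}(b)\mathbf{Q}^{\mathrm{T}}$, and checks $s_0 = s_1 \geq s_2 \geq s_3 = s_4$. As before it is plain Python with an assumed Jacobi eigenvalue helper and a Gram–Schmidt random orthogonal $\mathbf{Q}$; it illustrates rather than proves the result.

```python
import math
import random


def sym_eigvals(M, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix (cyclic Jacobi rotations), nonincreasing."""
    n = len(M)
    a = [list(map(float, row)) for row in M]
    for _ in range(max_sweeps):
        if sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def random_orthogonal(n, rng):
    """Random n x n orthogonal matrix: Gram-Schmidt applied to Gaussian vectors."""
    cols = []
    while len(cols) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in cols:
            d = sum(vi * ui for vi, ui in zip(v, u))
            v = [vi - d * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-8:
            cols.append([vi / norm for vi in v])
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def chain_values(a, b, Q):
    """[s0, ..., s4] for Lambda_A = diag(a), C = Q diag(b) Q^T (a, b nonincreasing)."""
    n = len(a)
    C = [[sum(Q[i][k] * b[k] * Q[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
    mu = sym_eigvals([row[:n - 1] for row in C[:n - 1]])  # spectrum of C~ (leading block)
    w = [a[i] - a[n - 1] for i in range(n - 1)]           # lambda_i(A) - lambda_n(A) >= 0
    tr_c = sum(C[i][i] for i in range(n))
    s0 = sum(a[i] * C[i][i] for i in range(n))
    s1 = sum(w[i] * C[i][i] for i in range(n - 1)) + a[n - 1] * tr_c
    s2 = sum(w[i] * mu[n - 2 - i] for i in range(n - 1)) + a[n - 1] * tr_c
    s3 = sum(w[i] * b[n - 1 - i] for i in range(n - 1)) + a[n - 1] * sum(b)
    s4 = sum(a[i] * b[n - 1 - i] for i in range(n))
    return [s0, s1, s2, s3, s4]


def chain_holds(values, tol=1e-8):
    """s0 == s1 >= s2 >= s3 == s4, up to floating-point tolerance."""
    s0, s1, s2, s3, s4 = values
    return abs(s0 - s1) < tol and s1 >= s2 - tol and s2 >= s3 - tol and abs(s3 - s4) < tol


def check_chain(n=5, trials=100, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        a = sorted((rng.uniform(0.1, 10.0) for _ in range(n)), reverse=True)
        b = sorted((rng.uniform(0.1, 10.0) for _ in range(n)), reverse=True)
        if not chain_holds(chain_values(a, b, random_orthogonal(n, rng))):
            return False
    return True


if __name__ == "__main__":
    r = 0.5 ** 0.5  # Q is a 45-degree rotation, so C = [[2, 1], [1, 2]]
    print(chain_values([3, 1], [3, 1], [[r, -r], [r, r]]))  # approximately [8, 8, 8, 6, 6]
    print(check_chain())                                     # True
```

For $a=b=(3,1)$ the chain evaluates to approximately $8, 8, 8, 6, 6$: for $n=2$ the lemma step is tight, and all of the slack comes from the interlacing step.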