Given a coefficient matrix $\mathbf C$, consider
$$\min_{\mathbf A} \frac{1}{2}\||\mathbf C|-\eta \mathbf A\|_F^2 \quad \text{s.t.}\ \mathbf A\in\Omega_n,\ \mathrm{diag}(\mathbf C)=0 \quad (1)$$
where $\Omega_n$ is the set of $n\times n$ doubly stochastic matrices. Expanding the squared Frobenius norm gives:
$$\frac{1}{2}\||\mathbf C|-\eta \mathbf A\|_F^2= \frac{1}{2}\|\mathbf C\|_F^2+ \frac{\eta^2}{2}\|\mathbf A\|_F^2 +\langle -|\mathbf C|, \eta \mathbf A \rangle \quad (2)$$
Since $\mathbf C$ is fixed in A-DSSC, the first term is a constant. Dropping it and dividing by $\eta>0$, we are actually optimizing:
$$\min_{\mathbf A} \langle -|\mathbf C|, \mathbf A \rangle + \frac{\eta}{2}\|\mathbf A\|_F^2 \quad \text{s.t.}\ \mathbf A\in\Omega_n \quad (3)$$
Introducing Lagrange multipliers $\alpha, \beta \in \mathbb R^n$ for the row-sum and column-sum constraints, and keeping $\mathbf A\geq 0$ as an explicit constraint, we obtain a min-max problem:
$$\min_{\mathbf A\geq 0} \max_{\alpha, \beta} \langle -|\mathbf C|, \mathbf A \rangle + \frac{\eta}{2}\|\mathbf A\|_F^2 + \langle \alpha, \mathbf A\mathbf 1-\mathbf 1 \rangle + \langle \beta, \mathbf A^\top \mathbf 1-\mathbf 1 \rangle \quad (4)$$
The inner product is the standard way to express matrix constraints in the Lagrangian method: optimizing over $\alpha$ enforces the row-sum constraint $\mathbf A\mathbf 1=\mathbf 1$, and optimizing over $\beta$ enforces the column-sum constraint $\mathbf A^\top\mathbf 1=\mathbf 1$.
Note that
$$\langle \alpha, \mathbf A\mathbf 1-\mathbf 1 \rangle + \langle \beta, \mathbf A^\top \mathbf 1-\mathbf 1 \rangle=\langle \alpha \mathbf 1^\top + \mathbf 1\beta^\top, \mathbf A \rangle-\mathbf 1^\top(\alpha+\beta)$$
P.S. $\langle \alpha, \mathbf A\mathbf 1\rangle=\mathrm{tr}(\alpha^\top \cdot \mathbf A\mathbf 1)=\mathrm{tr}(\mathbf A\mathbf 1 \cdot \alpha^\top)=\mathrm{tr}(\mathbf A \cdot \mathbf 1\alpha^\top)=\mathrm{tr}(\mathbf 1\alpha^\top \cdot \mathbf A)=\langle \alpha \mathbf 1^\top, \mathbf A \rangle$
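The identity above can be checked numerically. The following is a minimal sketch using NumPy; the matrix size, the random seed, and the variable names (`lhs`, `rhs`, `M`) are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

# Illustrative random data (assumed sizes and seed).
rng = np.random.default_rng(0)
n = 4
A = rng.random((n, n))
alpha = rng.standard_normal(n)
beta = rng.standard_normal(n)
ones = np.ones(n)

# Left-hand side: <alpha, A1 - 1> + <beta, A^T 1 - 1>
lhs = alpha @ (A @ ones - ones) + beta @ (A.T @ ones - ones)

# Right-hand side: <alpha 1^T + 1 beta^T, A> - 1^T (alpha + beta)
M = np.outer(alpha, ones) + np.outer(ones, beta)
rhs = np.sum(M * A) - ones @ (alpha + beta)

print(np.isclose(lhs, rhs))
```

Here `np.sum(M * A)` is the Frobenius inner product $\langle \mathbf M, \mathbf A\rangle=\mathrm{tr}(\mathbf M^\top \mathbf A)$ written entrywise.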
Strong duality holds by Slater's condition, so the order of $\min$ and $\max$ can be exchanged and the problem is equivalent to:
$$\max_{\alpha, \beta} -\mathbf 1^\top(\alpha+\beta) + \min_{\mathbf A\geq 0} \langle -|\mathbf C|, \mathbf A \rangle + \frac{\eta}{2}\|\mathbf A\|_F^2 + \langle \alpha \mathbf 1^\top + \mathbf 1\beta^\top, \mathbf A \rangle \quad (5)$$
Let $\mathbf K=|\mathbf C|-\alpha \mathbf 1^\top-\mathbf 1\beta^\top$; then:
$$\langle -|\mathbf C|, \mathbf A\rangle+\langle \alpha \mathbf 1^\top + \mathbf 1\beta^\top, \mathbf A \rangle=\langle -|\mathbf C|+\alpha \mathbf 1^\top + \mathbf 1\beta^\top, \mathbf A \rangle=\langle -\mathbf K, \mathbf A\rangle$$
Therefore, the inner $\min$ term becomes:
$$\min_{\mathbf A\geq 0}\ \langle -\mathbf K, \mathbf A\rangle + \frac{\eta}{2}\|\mathbf A\|_F^2 = \eta\cdot\min_{\mathbf A\geq 0}\left[\left\langle -\frac{\mathbf K}{\eta}, \mathbf A\right\rangle + \frac{1}{2}\|\mathbf A\|_F^2\right] \quad (6)$$
Completing the square turns Eqn. (6) into a Frobenius-norm form:
$$(6)=-\frac{1}{2\eta}\|\mathbf K\|_F^2+\eta\min_{\mathbf A\geq 0}\frac{1}{2}\left\|\frac{\mathbf K}{\eta}-\mathbf A\right\|_F^2 \quad (7)$$
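For reference, the completing-the-square step can be written out explicitly. This is a worked expansion using only the symbols already defined ($\mathbf K$, $\mathbf A$, $\eta>0$):

$$\frac{\eta}{2}\left\|\frac{\mathbf K}{\eta}-\mathbf A\right\|_F^2=\frac{1}{2\eta}\|\mathbf K\|_F^2-\langle \mathbf K, \mathbf A\rangle+\frac{\eta}{2}\|\mathbf A\|_F^2$$

Rearranging gives

$$\langle -\mathbf K, \mathbf A\rangle+\frac{\eta}{2}\|\mathbf A\|_F^2=\frac{\eta}{2}\left\|\frac{\mathbf K}{\eta}-\mathbf A\right\|_F^2-\frac{1}{2\eta}\|\mathbf K\|_F^2,$$

and the last term does not depend on $\mathbf A$, so it can be pulled out of the minimization.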
Without the constraint, the optimal $\mathbf A$ would be $\mathbf A=\mathbf K/\eta$. Since $\mathbf A\geq 0$ is required, the minimizer is the projection of $\mathbf K/\eta$ onto the nonnegative matrices, i.e. its entrywise positive part:
$$\mathbf A=\frac{1}{\eta}\left[|\mathbf C|-\alpha \mathbf 1^\top-\mathbf 1\beta^\top\right]_+$$
Therefore, writing $\mathbf K_+=\max(\mathbf K, 0)$ and $\mathbf K_-=\min(\mathbf K, 0)$ entrywise, so that $\mathbf K/\eta-\mathbf A=\mathbf K_-/\eta$ and $\|\mathbf K\|_F^2=\|\mathbf K_+\|_F^2+\|\mathbf K_-\|_F^2$, we have
$$(7)=-\frac{1}{2\eta}\|\mathbf K\|_F^2+\frac{1}{2\eta}\|\mathbf K_-\|_F^2=-\frac{1}{2\eta}\|\mathbf K_+\|_F^2 \quad (8)$$
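The closed form in Eqn. (8) can be sanity-checked numerically. The sketch below assumes a random $\mathbf K$ and an arbitrary $\eta$; the helper name `inner_objective` and all sizes are hypothetical choices for illustration only.

```python
import numpy as np

def inner_objective(A, K, eta):
    """Value of <-K, A> + (eta / 2) * ||A||_F^2."""
    return -np.sum(K * A) + 0.5 * eta * np.sum(A ** 2)

# Illustrative random data (assumed sizes, seed, and eta).
rng = np.random.default_rng(1)
n, eta = 5, 0.7
K = rng.standard_normal((n, n))

K_plus = np.maximum(K, 0.0)          # entrywise positive part [K]_+
A_star = K_plus / eta                # claimed minimizer over A >= 0
value_at_A_star = inner_objective(A_star, K, eta)
closed_form = -np.sum(K_plus ** 2) / (2 * eta)   # right-hand side of Eqn. (8)

# Nonnegative perturbations of A_star should never give a smaller value.
trial_values = [
    inner_objective(np.maximum(A_star + 0.1 * rng.standard_normal((n, n)), 0.0), K, eta)
    for _ in range(100)
]

print(np.isclose(value_at_A_star, closed_form))
print(min(trial_values) >= value_at_A_star)
```

The random trials are only a spot check, not a proof; the argument in the text (projection onto the nonnegative matrices) is what establishes optimality.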
Finally, the dual becomes (see Eqns. (5)-(8)):
$$\max_{\alpha, \beta} -\mathbf 1^\top(\alpha+\beta)-\frac{1}{2\eta}\|\mathbf K_+\|_F^2$$
i.e.,
$$\max_{\alpha, \beta} -\mathbf 1^\top(\alpha+\beta)-\frac{1}{2\eta}\left\|\left[|\mathbf C|-\alpha \mathbf 1^\top-\mathbf 1\beta^\top\right]_+\right\|_F^2$$
or, equivalently, as a minimization:
$$\min_{\alpha, \beta} \mathbf 1^\top(\alpha+\beta) + \frac{1}{2\eta}\left\|\left[|\mathbf C|-\alpha \mathbf 1^\top-\mathbf 1\beta^\top\right]_+\right\|_F^2$$
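To see the dual in action, here is a minimal sketch that minimizes the final objective by plain gradient descent and recovers $\mathbf A=\frac{1}{\eta}[\mathbf K]_+$. It is not necessarily the solver used in A-DSSC: the function name `solve_dual`, the step size $\eta/(2n)$, the iteration count, and the random test matrix are all assumptions made for illustration. The gradients used are $\nabla_\alpha = \mathbf 1-\mathbf A\mathbf 1$ and $\nabla_\beta = \mathbf 1-\mathbf A^\top\mathbf 1$, which follow from differentiating the objective above.

```python
import numpy as np

def solve_dual(C, eta, n_iter=20000):
    """Gradient descent on the dual (hypothetical helper, not the A-DSSC solver).

    Minimizes 1^T(alpha + beta) + ||[|C| - alpha 1^T - 1 beta^T]_+||_F^2 / (2 * eta)
    and returns the recovered primal matrix A together with alpha and beta.
    """
    absC = np.abs(C)
    n = absC.shape[0]
    alpha = np.zeros(n)
    beta = np.zeros(n)
    step = eta / (2 * n)  # assumed step size: inverse of a gradient Lipschitz bound
    for _ in range(n_iter):
        K = absC - alpha[:, None] - beta[None, :]
        A = np.maximum(K, 0.0) / eta
        alpha -= step * (1.0 - A.sum(axis=1))  # gradient w.r.t. alpha is 1 - A 1
        beta -= step * (1.0 - A.sum(axis=0))   # gradient w.r.t. beta is 1 - A^T 1
    K = absC - alpha[:, None] - beta[None, :]
    A = np.maximum(K, 0.0) / eta
    return A, alpha, beta

# Illustrative input: a random coefficient matrix with zero diagonal.
rng = np.random.default_rng(2)
n, eta = 5, 1.0
C = rng.standard_normal((n, n))
np.fill_diagonal(C, 0.0)

A, alpha, beta = solve_dual(C, eta)
K_plus = np.maximum(np.abs(C) - alpha[:, None] - beta[None, :], 0.0)

# Primal objective of Eqn. (3) and dual objective in its max form.
primal_value = -np.sum(np.abs(C) * A) + 0.5 * eta * np.sum(A ** 2)
dual_value = -np.sum(alpha + beta) - np.sum(K_plus ** 2) / (2 * eta)

print("row sums:", A.sum(axis=1))
print("column sums:", A.sum(axis=0))
print("primal - dual:", primal_value - dual_value)
```

At convergence the row and column sums of `A` should be close to 1 and the primal and dual values should nearly coincide, which is what strong duality predicts. A quasi-Newton method could replace the fixed-step loop if faster convergence is needed; that choice is left open here.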