Derivation of Doubly Stochastic clustering

1. Original objective

given coefficient matrix C,
min ⁡ A 1 2 ∥ ∣ C ∣ − η A ∥ F 2 s . t .   A ∈ Ω n , d i a g ( C ) = 0 ( 1 ) \min_\textbf A \frac{1}{2}\||\textbf C|-\eta \textbf A\|_F^2\quad s.t. \ {\bf A\in\Omega}_n, \mathrm {diag}(\textbf C)=0 \quad \quad (1) Amin21∥∣CηAF2s.t. AΩn,diag(C)=0(1)
where Ω n {\bf \Omega}_n Ωn is doubly stochastic space. We have:
1 2 ∥ ∣ C ∣ − η A ∥ F 2 = 1 2 ∥ C ∥ F 2 + η 2 2 ∥ A ∥ F 2 + ⟨ − ∣ C ∣ , η A ⟩ ( 2 ) \frac{1}{2}\||\textbf C|-\eta \textbf A\|_F^2= \frac{1}{2}\|\textbf C\|_F^2+ \frac{\eta^2}{2}\|\textbf A\|_F^2 +\langle {\bf -|C|, \eta A} \rangle \quad (2) 21∥∣CηAF2=21CF2+2η2AF2+∣C∣,ηA(2)

Since C \bf C C is fixed in A-DSSC, we are acturally optimizing:
min ⁡ A ⟨ − ∣ C ∣ , A ⟩ + η 2 ∥ A ∥ F 2 s . t .   A ∈ Ω n ( 3 ) \min_\textbf A \langle {\bf -|C|, A} \rangle + \frac{\eta}{2}\|\textbf A\|_F^2 \quad s.t. \ {\bf A\in\Omega}_n \quad \quad \quad \quad \quad (3) Amin∣C∣,A+2ηAF2s.t. AΩn(3)

2. Dual objective

Introducing Lagrange multipliers α , β ∈ R n \alpha, \beta \in \mathbb R^n α,βRn and A ≥ 0 A\geq 0 A0 for satisfying the doubly stochastic constraint, then we have a minmax problem:
min ⁡ A ≥ 0 max ⁡ α , β ⟨ − ∣ C ∣ , A ⟩ + η 2 ∥ A ∥ F 2 + ⟨ α , A 1 − 1 ⟩ + ⟨ β , A ⊤ 1 − 1 ⟩ ( 4 ) \min_{\textbf A\geq 0} \max_{\alpha, \beta} \langle {\bf -|C|, A} \rangle + \frac{\eta}{2}\|\textbf A\|_F^2 + \langle \alpha,{\bf A1-1} \rangle + \langle \beta,{\bf A^\top 1-1} \rangle \quad \quad(4) A0minα,βmax∣C∣,A+2ηAF2+α,A11+β,A11(4)

内积是拉格朗日法实现矩阵约束的标准表示形式,优化 α \alpha α用于满足行和为1约束,优化 β \beta β用于满足列和为1约束

Note that
⟨ α , A 1 − 1 ⟩ + ⟨ β , A ⊤ 1 − 1 ⟩ = ⟨ α 1 ⊤ + 1 β ⊤ , A ⟩ − 1 ⊤ ( α + β ) \langle \alpha,{\bf A1-1} \rangle + \langle \beta,{\bf A^\top 1-1} \rangle=\langle \alpha \textbf 1^\top + \textbf 1\beta^\top,\textbf A \rangle-\textbf1^\top(\alpha+\beta) α,A11+β,A11=α1+1β,A1(α+β)

P.S. ⟨ α , A 1 ⟩ = t r ( α ⊤ ⋅ A 1 ) = t r ( A 1 ⋅ α ⊤ ) = t r ( A ⋅ 1 α ⊤ ) = t r ( 1 α ⊤ ⋅ A ) = ⟨ α 1 ⊤ , A ⟩ \langle \alpha,{\bf A1}\rangle=tr(\alpha^\top \cdot {\bf A1})=tr({\bf A1} \cdot \alpha^\top)=tr({\bf A \cdot 1}\alpha^\top)=tr({\bf 1}\alpha^\top \cdot \bf A)=\langle \alpha \textbf 1^\top ,\textbf A \rangle α,A1=tr(αA1)=tr(A1α)=tr(A1α)=tr(1αA)=α1,A

Therefore, strong duality holds by Slater’s condition, so this is equivalent to:
max ⁡ α , β − 1 ⊤ ( α + β ) + min ⁡ A ≥ 0 ⟨ − ∣ C ∣ , A ⟩ + η 2 ∥ A ∥ F 2 + ⟨ α 1 ⊤ + 1 β ⊤ , A ⟩ ( 5 ) \max_{\alpha, \beta} -\textbf1^\top(\alpha+\beta) + \min_{\textbf A\geq 0} \langle {\bf -|C|, A} \rangle + \frac{\eta}{2}\|\textbf A\|_F^2 + \langle \alpha \textbf 1^\top + \textbf 1\beta^\top,\textbf A \rangle \quad (5) α,βmax1(α+β)+A0min∣C∣,A+2ηAF2+α1+1β,A(5)

3. Search Optimal A and Dual Solution α , β \alpha,\beta α,β

Let K = ∣ C ∣ − α 1 ⊤ − 1 β ⊤ \bf K=|C|-\alpha 1^\top-1\beta^\top K=∣C∣α11β, we have:

⟨ − ∣ C ∣ , A ⟩ + ⟨ α 1 ⊤ + 1 β ⊤ , A ⟩ = ⟨ − ∣ C ∣ + α 1 ⊤ + 1 β ⊤ , A ⟩ = ⟨ − K , A ⟩ \bf \langle -|C|,A\rangle+\langle \alpha \textbf 1^\top + \textbf 1\beta^\top,\textbf A \rangle=\bf \langle -|C|+\alpha \textbf 1^\top + \textbf 1\beta^\top,A \rangle=\langle -K,A\rangle ∣C∣,A+α1+1β,A=∣C∣+α1+1β,A=K,A

Therefore, the inner min ⁡ \min min term becomes:

η ⋅ min ⁡ A ≥ n ⟨ − K η , A ⟩ + η 2 ∥ A ∥ F 2 ( 6 ) \eta\cdot{\bf \min_{A\geq n}\langle -\frac{K}{\eta},A\rangle} + \frac{\eta}{2}\|\textbf A\|_F^2 \quad \quad \quad \quad \quad (6) ηAnminηK,A+2ηAF2(6)

we can complement Eqn.(6) to a F-norm form:
( 6 ) = − 1 2 η ∥ K ∥ F 2 + η min ⁡ A ≥ n 1 2 ∥ K η − A ∥ F 2 ( 7 ) (6)=-\frac{1}{2\eta}\|\textbf K\|_F^2+\eta\min_{A\geq n}\frac{1}{2}\|{\bf \frac{K}{\eta}-A}\|_F^2 \quad (7) (6)=2η1KF2+ηAnmin21ηKAF2(7)

Apparently,the optimal A \bf A A satisfies A = K / η \bf A=K/\eta A=K/η,but requires A ≥ 0 \bf A\geq 0 A0,therefore, A \bf A A is given as:
A = 1 η [ ∣ C ∣ − α 1 ⊤ − 1 β ⊤ ] + {\bf A}=\frac{1}{\eta}{\bf[|C|-\alpha 1^\top-1\beta^\top]_+} A=η1[∣C∣α11β]+

Therefore, we have
( 7 ) = − 1 2 η ∥ K ∥ F 2 + 1 2 η ∥ K − ∥ F 2 = − 1 2 η ∥ K + ∥ F 2 ( 8 ) (7)=-\frac{1}{2\eta}\|\textbf K\|_F^2+\frac{1}{2\eta}\|\textbf K_-\|_F^2=-\frac{1}{2\eta}\|\textbf K_+\|_F^2 \quad (8) (7)=2η1KF2+2η1KF2=2η1K+F2(8)

Finally, the version of the dual becomes (See Eqn.5-8):
max ⁡ α , β − 1 ⊤ ( α + β ) − 1 2 η ∥ K + ∥ F 2 \max_{\alpha, \beta} -\textbf1^\top(\alpha+\beta)-\frac{1}{2\eta}\|K_+\|_F^2 α,βmax1(α+β)2η1K+F2
i.e.,
max ⁡ α , β − 1 ⊤ ( α + β ) − 1 2 η ∥ [ ∣ C ∣ − α 1 ⊤ − 1 β ⊤ ] + ∥ F 2 \max_{\alpha, \beta} -\textbf1^\top(\alpha+\beta)-\frac{1}{2\eta}\|{\bf[|C|-\alpha 1^\top-1\beta^\top]_+}\|_F^2 α,βmax1(α+β)2η1[∣C∣α11β]+F2
i.e.,
min ⁡ α , β 1 ⊤ ( α + β ) + 1 2 η ∥ [ ∣ C ∣ − α 1 ⊤ − 1 β ⊤ ] + ∥ F 2 \min_{\alpha, \beta} \textbf1^\top(\alpha+\beta) + \frac{1}{2\eta}\|{\bf[|C|-\alpha 1^\top-1\beta^\top]_+}\|_F^2 α,βmin1(α+β)+2η1[∣C∣α11β]+F2

你可能感兴趣的:(机器学习,算法)