张量是多维数组的泛概念。一维数组我们通常称之为向量,二维数组我们通常称之为矩阵,但其实这些都是张量的一种。以此类推,我们也会有三维张量、四维张量以及五维张量。那么零维张量是什么呢?其实零维张量就是一个数。
< χ , y > = ∑ i 1 = 1 I 1 ∑ i 2 = 1 I 2 . . . ∑ i N = 1 I N x i 1 i 2 . . . i N y i 1 i 2 . . . i N <\chi,y>=\sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} ... \sum_{i_N=1}^{I_N} x_{i_1 i_2 ... i_N} y_{i_1 i_2 ... i_N} <χ,y>=i1=1∑I1i2=1∑I2...iN=1∑INxi1i2...iNyi1i2...iN
o b s e r v a t i o n [ I ] = i m a g e [ I c l e a n ] + n o i s e [ η ] observation[I] = image[I_{clean}]+noise[\eta] observation[I]=image[Iclean]+noise[η]
奇异值分解最早是Beltrami与1873年对实正方矩阵提出来的。Beltrami从双线性函数:
f ( x , y ) = x T A y , A ∈ R n × n f(x,y)=x^TAy,A\in \R^{n \times n} f(x,y)=xTAy,A∈Rn×n
出发,通过引入线性变换 x = U ξ x=U \xi x=Uξ, y = V η y=V \eta y=Vη,将双线性函数变为 f ( x , y ) = ξ T S η f(x,y)=\xi^TS\eta f(x,y)=ξTSη,其中:
S = U T A V S=U^T A V S=UTAV
矩阵的奇异值分解
令 A ∈ R m × n A \in \R^{m \times n} A∈Rm×n,则存在正交矩阵 U ∈ R m × m U \in \R^{m \times m} U∈Rm×m和 V ∈ R n × n V \in \R^{n \times n} V∈Rn×n使得:
A = U Σ V T A=U \Sigma V^T A=UΣVT
式中 Σ = [ Σ 1 o o o ] \Sigma=\left[ \begin{array}{cc} \Sigma_1&o\\ o&o \end{array} \right] Σ=[Σ1ooo],且 Σ 1 = d i a g ( σ 1 , σ 2 , ⋅ ⋅ ⋅ , σ r ) \Sigma_1=diag(\sigma_1,\sigma_2,\cdot \cdot \cdot,\sigma_r) Σ1=diag(σ1,σ2,⋅⋅⋅,σr),其对角元素按照顺序
σ 1 ⩾ σ 2 ⩾ ⋅ ⋅ ⋅ σ r > 0 , r = r a n k ( A ) \sigma_1 \geqslant \sigma_2 \geqslant \cdot \cdot \cdot \sigma_r >0,~~~r=rank(A) σ1⩾σ2⩾⋅⋅⋅σr>0, r=rank(A)
排序。
酉矩阵 设 A ∈ C n × n A \in C^{n \times n} A∈Cn×n,若A满足 A H A = I A^HA=I AHA=I,则称A为酉矩阵。
奇异值分解的理论证明
设 A ∈ C r m × n ( r > 0 ) A \in C_r^{m \times n}(r>0) A∈Crm×n(r>0),则存在m阶酉矩阵U和n阶酉矩阵V使得:
U H A V = ( Σ o o o ) U^HAV=\begin{pmatrix}\Sigma&o\\o&o\end{pmatrix} UHAV=(Σooo)
式中: Σ = d i a g ( σ 1 , σ 2 , ⋅ ⋅ ⋅ , σ r ) , σ i \Sigma=diag(\sigma_1,\sigma_2,\cdot \cdot \cdot,\sigma_r),\sigma_i Σ=diag(σ1,σ2,⋅⋅⋅,σr),σi为A的非零奇异值。而:
A = U ( Σ o o o ) V H A=U\begin{pmatrix}\Sigma&o\\o&o\end{pmatrix}V^H A=U(Σooo)VH
称为A的奇异值分解。
证明:由于 A H A A^HA AHA为Hermite阵,则存在n阶酉矩阵V使得:
V H A H A V = d i a g ( λ 1 , λ 2 , ⋅ ⋅ ⋅ , λ n ) = ( Σ 2 o o o ) V^HA^HAV=diag(\lambda_1,\lambda_2,\cdot \cdot \cdot,\lambda_n)=\begin{pmatrix}\Sigma^2&o\\o&o\end{pmatrix} VHAHAV=diag(λ1,λ2,⋅⋅⋅,λn)=(Σ2ooo)
将V分块为:
V = ( V 1 , V 2 ) ( V 1 ∈ C n × r , V 2 ∈ C n × ( n − r ) ) V=(V_1,V_2)~~~~(V_1 \in C^{n \times r},V_2 \in C^{n \times (n-r)}) V=(V1,V2) (V1∈Cn×r,V2∈Cn×(n−r))
得:
V 1 H A H A V 1 = Σ 2 , V 2 H A H A V 2 = 0 V_1^HA^HAV_1=\Sigma^2,V_2^HA^HAV_2=0 V1HAHAV1=Σ2,V2HAHAV2=0
于是:
Σ − 1 V 1 H A H A V 1 Σ − 1 = I r , ( A V 2 ) H A V 2 = 0 \Sigma^{-1} V_1^HA^HAV_1 \Sigma^{-1}=I_r,(AV_2)^HAV_2=0 Σ−1V1HAHAV1Σ−1=Ir,(AV2)HAV2=0
从而 A V 2 = 0 AV_2=0 AV2=0。又记 U 1 = A V 1 Σ − 1 U_1=AV_1 \Sigma^{-1} U1=AV1Σ−1,则 U 1 H U 1 = I U_1^HU_1=I U1HU1=I,即 U 1 U_1 U1的r个列是两两正交的单位向量。取 U 2 ∈ C m × ( m − r ) U_2 \in C^{m \times (m-r)} U2∈Cm×(m−r)使 U = ( U 1 , U 2 ) U=(U_1,U_2) U=(U1,U2)为m阶酉矩阵,即 U 2 H U 1 = 0 , U 2 H U 2 = I m − r U_2^HU_1=0,U_2^HU_2=I_{m-r} U2HU1=0,U2HU2=Im−r。则有:
U H A V = ( U 1 H U 2 H ) A ( V 1 , V 2 ) = ( U 1 H A V 1 U 1 H A V 2 U 2 H A V 1 U 2 H A V 2 ) = ( U 1 H ( U 1 Σ ) 0 U 2 H ( U 1 Σ ) 0 ) = ( Σ 0 0 0 ) U^HAV=\begin{pmatrix}U_1^H\\ \\U_2^H\end{pmatrix}A\begin{pmatrix}V_1,V_2\end{pmatrix}= \begin{pmatrix}U_1^HAV_1&U_1^HAV_2\\ \\U_2^HAV_1&U_2^HAV_2\end{pmatrix}= \begin{pmatrix}U_1^H(U_1 \Sigma)&0\\ \\U_2^H(U_1 \Sigma)&0\end{pmatrix}= \begin{pmatrix}\Sigma&0\\ \\0&0\end{pmatrix} UHAV=⎝⎛U1HU2H⎠⎞A(V1,V2)=⎝⎛U1HAV1U2HAV1U1HAV2U2HAV2⎠⎞=⎝⎛U1H(U1Σ)U2H(U1Σ)00⎠⎞=⎝⎛Σ000⎠⎞
奇异值分解的应用计算
求矩阵 A = ( 1 0 1 1 1 0 ) A=\begin{pmatrix}1&0&1\\1&1&0\end{pmatrix} A=(110110)的奇异值分解。
解:因为:
A T A = ( 2 1 1 1 1 0 1 0 1 ) A^TA=\begin{pmatrix}2&1&1\\1&1&0\\1&0&1\end{pmatrix} ATA=⎝⎛211110101⎠⎞
所以 A T A A^TA ATA的特征值为 λ 1 = 3 , λ 2 = 1 , λ 3 = 0 , \lambda_1=3,\lambda_2=1,\lambda_3=0, λ1=3,λ2=1,λ3=0,对应的特征向量为:
p 1 = ( 2 1 1 ) , p 2 = ( 0 − 1 1 ) , p 3 = ( − 1 1 1 ) p_1=\begin{pmatrix}2\\1\\1\end{pmatrix}, p_2=\begin{pmatrix}0\\-1\\1\end{pmatrix}, p_3=\begin{pmatrix}-1\\1\\1\end{pmatrix} p1=⎝⎛211⎠⎞,p2=⎝⎛0−11⎠⎞,p3=⎝⎛−111⎠⎞
标准化得:
V = ( 2 6 0 − 1 3 1 6 − 1 2 1 3 1 6 1 2 1 3 ) V=\begin{pmatrix}\frac{2}{\sqrt{6}}&0&-\frac{1}{\sqrt{3}}\\ \\ \frac{1}{\sqrt{6}}&-\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{3}}\\ \\ \frac{1}{\sqrt{6}}&\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{3}}\end{pmatrix} V=⎝⎜⎜⎜⎜⎜⎛6261610−2121−313131⎠⎟⎟⎟⎟⎟⎞
使得:
V H A H A V = ( 3 1 0 ) = ( Σ 2 0 ) V^HA^HAV=\begin{pmatrix}3&&\\&1&\\&&0\end{pmatrix}=\begin{pmatrix}\Sigma^2 &\\&0\end{pmatrix} VHAHAV=⎝⎛310⎠⎞=(Σ20)
计算:
U 1 = A V 1 Σ − 1 = ( 1 0 1 1 1 0 ) ( 2 6 0 1 6 − 1 2 1 6 1 2 ) ( 1 3 0 0 1 ) = ( 1 2 1 2 1 2 − 1 2 ) U_1=AV_1 \Sigma^{-1}=\begin{pmatrix}1&0&1\\ \\1&1&0\end{pmatrix} \begin{pmatrix}\frac{2}{\sqrt{6}}&0\\ \\ \frac{1}{\sqrt{6}}&-\frac{1}{\sqrt{2}}\\ \\ \frac{1}{\sqrt{6}}&\frac{1}{\sqrt{2}}\end{pmatrix} \begin{pmatrix}\frac{1}{\sqrt{3}}&0\\ \\0&1\end{pmatrix}= \begin{pmatrix}\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}\\ \\ \frac{1}{\sqrt{2}}&-\frac{1}{\sqrt{2}}\end{pmatrix} U1=AV1Σ−1=⎝⎛110110⎠⎞⎝⎜⎜⎜⎜⎜⎛6261610−2121⎠⎟⎟⎟⎟⎟⎞⎝⎛31001⎠⎞=⎝⎛212121−21⎠⎞
则 U = U 1 U=U_1 U=U1是酉矩阵。故 A A A的奇异值分解为:
A = U ( Σ 0 ) V H = ( 1 2 1 2 1 2 − 1 2 ) ( 3 0 0 0 1 0 ) ( 2 6 1 6 1 6 0 − 1 2 1 2 − 1 3 1 3 1 3 ) A=U(\Sigma~~~~0)V^H= \begin{pmatrix}\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}\\ \\ \frac{1}{\sqrt{2}}&-\frac{1}{\sqrt{2}}\end{pmatrix} \begin{pmatrix}\sqrt{3}&0&0\\ \\0&1&0\end{pmatrix} \begin{pmatrix} \frac{2}{\sqrt{6}}&\frac{1}{\sqrt{6}}&\frac{1}{\sqrt{6}}\\ \\ 0&-\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}\\ \\ -\frac{1}{\sqrt{3}}&\frac{1}{\sqrt{3}}&\frac{1}{\sqrt{3}} \end{pmatrix} A=U(Σ 0)VH=⎝⎛212121−21⎠⎞⎝⎛300100⎠⎞⎝⎜⎜⎜⎜⎜⎛620−3161−2131612131⎠⎟⎟⎟⎟⎟⎞
Tucker分解,又称高阶奇异值分解(higher-order SVD)。
Tucker分解与Tucker算子密切相关,而Tucker算子是张量与矩阵的多模态乘法的一种有效表示。
定义 令 g ∈ K J 1 × J 2 × ⋅ ⋅ ⋅ × J N g \in \Bbb K^{J_1 \times J_2 \times \cdot \cdot \cdot \times J_N} g∈KJ1×J2×⋅⋅⋅×JN,矩阵 U ( n ) ∈ K I n × J n U^{(n)} \in \Bbb K^{I_n \times J_n} U(n)∈KIn×Jn,其中 n ∈ { 1 , ⋅ ⋅ ⋅ N } n \in \{ 1, \cdot \cdot \cdot N \} n∈{1,⋅⋅⋅N},则Tucker算子定义为:
⟦ g ; U ( 1 ) , U ( 2 ) , ⋅ ⋅ ⋅ , U ( N ) ⟧ = g × 1 U ( 1 ) × 2 U ( 2 ) ⋅ ⋅ ⋅ × N U ( N ) \llbracket g;U^{(1)},U^{(2)}, \cdot \cdot \cdot ,U^{(N)} \rrbracket =g \times_1 U^{(1)} \times_2 U^{(2)} \cdot \cdot \cdot \times_N U^{(N)} [[g;U(1),U(2),⋅⋅⋅,U(N)]]=g×1U(1)×2U(2)⋅⋅⋅×NU(N)
其结果是一个 N N N阶 I 1 × I 2 × ⋅ ⋅ ⋅ × I N I_1 \times I_2 \times \cdot \cdot \cdot \times I_N I1×I2×⋅⋅⋅×IN张量。
N阶奇异值分解 每一个 I 1 × I 2 × ⋅ ⋅ ⋅ × I N I_1 \times I_2 \times \cdot \cdot \cdot \times I_N I1×I2×⋅⋅⋅×IN实张量 χ \chi χ均可以分解为n-模式积:
χ = g × 1 U ( 1 ) × 2 U ( 2 ) ⋅ ⋅ ⋅ × N U ( N ) = ⟦ g ; U ( 1 ) , U ( 2 ) , ⋅ ⋅ ⋅ , U ( N ) ⟧ \chi =g \times_1 U^{(1)} \times_2 U^{(2)} \cdot \cdot \cdot \times_N U^{(N)}=\llbracket g;U^{(1)},U^{(2)}, \cdot \cdot \cdot ,U^{(N)} \rrbracket χ=g×1U(1)×2U(2)⋅⋅⋅×NU(N)=[[g;U(1),U(2),⋅⋅⋅,U(N)]]
或
x i 1 i 2 ⋅ ⋅ ⋅ i N = ∑ j 1 = 1 J 1 ∑ j 2 = 1 J 2 ⋅ ⋅ ⋅ ∑ j N = 1 J N g i 1 i 2 ⋅ ⋅ ⋅ i N u i 1 j 1 ( 1 ) u i 2 j 2 ( 2 ) ⋅ ⋅ ⋅ u i N j N ( N ) x_{i_1 i_2 \cdot \cdot \cdot i_N}=\sum_{j_1=1}^{J_1} \sum_{j_2=1}^{J_2} \cdot \cdot \cdot \sum_{j_N=1}^{J_N} g_{i_1 i_2 \cdot \cdot \cdot i_N} u_{i_1 j_1}^{(1)} u_{i_2 j_2}^{(2)} \cdot \cdot \cdot u_{i_N j_N}^{(N)} xi1i2⋅⋅⋅iN=j1=1∑J1j2=1∑J2⋅⋅⋅jN=1∑JNgi1i2⋅⋅⋅iNui1j1(1)ui2j2(2)⋅⋅⋅uiNjN(N)
其中
(1) U ( n ) = [ u 1 ( n ) , ⋅ ⋅ ⋅ , u J n ( n ) ] U^{(n)}=[u_1^{(n)}, \cdot \cdot \cdot ,u_{J_n}^{(n)}] U(n)=[u1(n),⋅⋅⋅,uJn(n)]是一个 I n × J n I_n \times J_n In×Jn半正交矩阵,即 U ( n ) T U ( n ) = I J n U^{(n)T} U^{(n)}=I_{J_n} U(n)TU(n)=IJn,且 J n ⩽ I n J_n \leqslant I_n Jn⩽In。
(知识补充:实矩阵 Q m × n Q_{m \times n} Qm×n,它只满足 Q Q T = I m QQ^T=I_m QQT=Im或者 Q T Q = I m Q^TQ=I_m QTQ=Im,Q被称为半正交矩阵)
(2)核心张量 g g g是一个 J 1 × J 2 × ⋅ ⋅ ⋅ × J N J_1 \times J_2 \times \cdot \cdot \cdot \times J_N J1×J2×⋅⋅⋅×JN张量,其子张量 g j n = α g_{j_n= \alpha} gjn=α是固定指标 j n = α j_n= \alpha jn=α不变所得到的张量 χ \chi χ。子张量具有以下两个性质:
全正交性 α ≠ β \alpha \neq \beta α=β的两个子核心张量 g j n = α g_{j_n= \alpha} gjn=α和 g j n = β g_{j_n= \beta} gjn=β正交
⟨ g j n = α g j n = β ⟩ = 0 , ∀ α ≠ β , n = 1 , ⋅ ⋅ ⋅ , N \langle g_{j_n= \alpha} g_{j_n= \beta}\rangle =0,\forall \alpha \neq \beta,n=1,\cdot \cdot \cdot,N ⟨gjn=αgjn=β⟩=0,∀α=β,n=1,⋅⋅⋅,N
排序
∥ g i n = 1 ∥ F ≥ ∥ g i n = 2 ∥ F ≥ ⋅ ⋅ ⋅ ≥ ∥ g i n = N ∥ F \parallel g_{i_n=1} \parallel_F \geq \parallel g_{i_n=2} \parallel_F \geq \cdot \cdot \cdot \geq \parallel g_{i_n=N} \parallel_F ∥gin=1∥F≥∥gin=2∥F≥⋅⋅⋅≥∥gin=N∥F
N阶张量的Tucker分解可以写成一个统一的数学模型:
χ = f ( U ( 1 ) , U ( 2 ) , ⋅ ⋅ ⋅ , U ( N ) ) + ε \chi=f(U^{(1)},U^{(2)},\cdot \cdot \cdot,U^{(N)})+ \varepsilon χ=f(U(1),U(2),⋅⋅⋅,U(N))+ε
式中 U ( n ) , n = 1 , ⋅ ⋅ ⋅ , N U^{(n)},n=1,\cdot \cdot \cdot,N U(n),n=1,⋅⋅⋅,N为分解的因子或分量矩阵, ε \varepsilon ε为N阶噪声或误差张量。因此,因子矩阵可以通过下列优化问题求得:
( U ^ ( 1 ) , ⋅ ⋅ ⋅ , U ^ ( N ) ) = arg min U ( 1 ) , ⋅ ⋅ ⋅ , U ( N ) ∥ χ − f ( U ( 1 ) , U ( 2 ) , ⋅ ⋅ ⋅ , U ( N ) ) ∥ 2 2 (\hat{U}^{(1)},\cdot \cdot \cdot,\hat{U}^{(N)}) = \argmin_{U^{(1)},\cdot \cdot \cdot,U^{(N)}} \parallel \chi-f(U^{(1)},U^{(2)},\cdot \cdot \cdot,U^{(N)}) \parallel_2^2 (U^(1),⋅⋅⋅,U^(N))=U(1),⋅⋅⋅,U(N)argmin∥χ−f(U(1),U(2),⋅⋅⋅,U(N))∥22
这是一个N个变元耦合在一起的优化问题。求解这类耦合优化问题的有效方法是交替最小二乘(ALS)算法。
在第k+1次迭代中,利用在k+1次迭代中已更新的因子矩阵 U k + 1 ( 1 ) , ⋅ ⋅ ⋅ , U k + 1 ( i − 1 ) U_{k+1}^{(1)},\cdot \cdot \cdot,U_{k+1}^{(i-1)} Uk+1(1),⋅⋅⋅,Uk+1(i−1)和在k此更新过的因子矩阵 U k + 1 ( i + 1 ) , ⋅ ⋅ ⋅ , U k + 1 ( N ) U_{k+1}^{(i+1)},\cdot \cdot \cdot,U_{k+1}^{(N)} Uk+1(i+1),⋅⋅⋅,Uk+1(N),求因子矩阵 U ( 1 ) U^{(1)} U(1)的最小二乘解:
U ^ k + 1 ( i ) = arg min U ( i ) ∥ χ − f ( U k + 1 ( 1 ) , ⋅ ⋅ ⋅ , U k + 1 ( i − 1 ) , U ( i ) , U k + 1 ( i + 1 ) , ⋅ ⋅ ⋅ , U k + 1 ( N ) ) ∥ 2 2 \hat{U}_{k+1}^{(i)}=\argmin_{U^{(i)}} \parallel \chi-f(U_{k+1}^{(1)},\cdot \cdot \cdot,U_{k+1}^{(i-1)},U^{(i)},U_{k+1}^{(i+1)},\cdot \cdot \cdot,U_{k+1}^{(N)}) \parallel_2^2 U^k+1(i)=U(i)argmin∥χ−f(Uk+1(1),⋅⋅⋅,Uk+1(i−1),U(i),Uk+1(i+1),⋅⋅⋅,Uk+1(N))∥22
其中 i = 1 , ⋅ ⋅ ⋅ , N i=1,\cdot \cdot \cdot,N i=1,⋅⋅⋅,N。对 k = 1 , 2 , ⋅ ⋅ ⋅ k=1,2,\cdot \cdot \cdot k=1,2,⋅⋅⋅,交替使用最小二乘法,直至所有因子矩阵收敛。
下面以张量的矩阵化的水平展开为对象,讨论Tucker3分解的优化问题的求解
min A , B , C , G ( P × Q R ) ∥ X ( I × J K ) − A G ( P × Q R ) ( C ⊗ B ) T ∥ 2 2 \min_{A,B,C,G^{(P \times QR)}} \parallel X^{(I \times JK)}-AG^{(P \times QR)} (C \otimes B)^T \parallel_2^2 A,B,C,G(P×QR)min∥X(I×JK)−AG(P×QR)(C⊗B)T∥22
根据交替最小二乘的原理,假定模式-2矩阵B、模式-3矩阵C和核心张量g的水平展开均固定,则上述优化问题就解耦为仅含模式-1矩阵A的优化问题:
min A ∥ X ( I × J K ) − A G ( P × Q R ) ( C ⊗ B ) T ∥ 2 2 \min_{A} \parallel X^{(I \times JK)}-AG^{(P \times QR)} (C \otimes B)^T \parallel_2^2 Amin∥X(I×JK)−AG(P×QR)(C⊗B)T∥22
相当于求解矩阵 X ( I × J K ) = A G ( P × Q R ) ( C ⊗ B ) T X^{(I \times JK)}=AG^{(P \times QR)}(C \otimes B)^T X(I×JK)=AG(P×QR)(C⊗B)T的最小二乘解。在矩阵方程的两边右乘矩阵 ( C ⊗ B ) (C \otimes B) (C⊗B),得:
X ( I × J K ) ( C ⊗ B ) = A G ( P × Q R ) ( C ⊗ B ) T ( C ⊗ B ) X^{(I \times JK)}(C \otimes B)=AG^{(P \times QR)} (C \otimes B)^T (C \otimes B) X(I×JK)(C⊗B)=AG(P×QR)(C⊗B)T(C⊗B)
若对上式左边得矩阵进行奇异值分解 X ( I × J K ) ( C ⊗ B ) = U 1 S 1 V 1 T X^{(I \times JK)}(C \otimes B)=U_1S_1V_1^T X(I×JK)(C⊗B)=U1S1V1T,则可取前P个左奇异向量作为矩阵A得估计结果 A ^ = U 1 ( : , 1 : P ) \hat{A}=U_1(:,1:P) A^=U1(:,1:P)。这一运算可以简洁表示为 [ A , S , T ] = S V D [ X ( I × J K ) ( C ⊗ B ) , P ] [A,S,T]=SVD[X^{(I \times JK)}(C \otimes B),P] [A,S,T]=SVD[X(I×JK)(C⊗B),P]。
交替最小二乘方法最早由 Paatero与 Tapper用于非负矩阵分解。由于这种方法约束矩阵是非负的,所以现在习惯称为交替非负最小二乘算法。
非负矩阵分解 X I × J = A I × K S K × J X_{I \times J}=A_{I \times K}S_{K \times J} XI×J=AI×KSK×J的优化问题:
min A , S 1 2 ∥ X − A S ∥ F 2 s u b j e c t t o A , S ≥ 0 \min_{A,S} \frac{1}{2} \parallel X-AS \parallel_F^2 ~~subject~~to~~ A,S \geq0 A,Smin21∥X−AS∥F2 subject to A,S≥0
可以分解为两个交替非负最小二乘子问题:
A N L S 1 min S ≥ 0 f 1 ( S ) = 1 2 ∥ A S − X ∥ F 2 ( A 固 定 ) A N L S 2 min A ≥ 0 f 1 ( A T ) = 1 2 ∥ S T A T − X T ∥ F 2 ( S 固 定 ) ANLS1 ~~~~ \min_{S \geq 0}f_1(S)=\frac{1}{2} \parallel AS-X \parallel_F^2 ~~(A固定) \\ ANLS2 ~~~~ \min_{A \geq 0}f_1(A^T)=\frac{1}{2} \parallel S^TA^T-X^T \parallel_F^2 ~~(S固定) ANLS1 S≥0minf1(S)=21∥AS−X∥F2 (A固定)ANLS2 A≥0minf1(AT)=21∥STAT−XT∥F2 (S固定)
这两个交替非负最小二乘子问题相当于使用最小二乘方法交替求解矩阵方程 A S = X AS=X AS=X和 S T A T = X T S^TA^T=X^T STAT=XT,其最小二乘解分别为:
S = P + ( ( A T A ) † A T X ) A T = P + ( ( S S T ) † S X T ) S=P_+((A^TA)^{\dagger} A^TX) \\ A^T=P_+((SS^T)^{\dagger} SX^T) S=P+((ATA)†ATX)AT=P+((SST)†SXT)
当A或S在迭代过程中奇异时,算法将无法收敛。
C N M F min A , S 1 2 ( ∥ X − A S ∥ F 2 + α ∥ A ∥ F 2 + β ∥ S ∥ F 2 ) s u b j e c t t o A , S ≥ 0 CNMF ~~ \min_{A,S} \frac{1}{2}(\parallel X-AS \parallel_F^2+\alpha \parallel A \parallel_F^2+\beta \parallel S \parallel_F^2) ~~subject~~to~~A,S \geq0 CNMF A,Smin21(∥X−AS∥F2+α∥A∥F2+β∥S∥F2) subject to A,S≥0
式中, α ≥ 0 \alpha \geq 0 α≥0和 β ≥ 0 \beta \geq 0 β≥0是两个正则化参数,分别起到压制 ∥ A ∥ F 2 \parallel A \parallel_F^2 ∥A∥F2和 ∥ S ∥ F 2 \parallel S \parallel_F^2 ∥S∥F2的作用。
正则化非负矩阵分解问题可以分解为两个交替正则化非负最小二乘(ARNLS)问题:
A R N L S 1 min S ∈ R + J × K J 1 ( S ) = 1 2 ∥ A S − X ∥ F 2 + 1 2 β ∥ S ∥ F 2 ( A 固 定 ) A R N L S 2 min A ∈ R + I × J J 2 ( A T ) = 1 2 ∥ S T A T − X T ∥ F 2 + 1 2 α ∥ A ∥ F 2 ( S 固 定 ) ARNLS1 ~~ \min_{S \in \R_{+}^{J \times K}} J_1(S)=\frac{1}{2} \parallel AS-X \parallel_F^2+\frac{1}{2} \beta\parallel S \parallel_F^2 ~~(A固定) \\ ARNLS2 ~~ \min_{A \in \R_{+}^{I \times J}} J_2(A^T)=\frac{1}{2} \parallel S^TA^T-X ^T\parallel_F^2+\frac{1}{2}\alpha \parallel A \parallel_F^2 ~~(S固定) ARNLS1 S∈R+J×KminJ1(S)=21∥AS−X∥F2+21β∥S∥F2 (A固定)ARNLS2 A∈R+I×JminJ2(AT)=21∥STAT−XT∥F2+21α∥A∥F2 (S固定)
由矩阵微分
d J 1 ( S ) = 1 2 d ( t r [ ( A S − X ) T ( A S − X ) ] + β t r ( S T S ) ) = t r ( ( S T A T A − X T A + β S T ) d S ) d J 2 ( A T ) = 1 2 d ( t r [ ( A S − X ) ( A S − X ) T ] + α t r ( A T A ) ) = t r ( ( A S S T − X S T + α A ) d A T ) dJ_1(S)=\frac{1}{2}d(tr[(AS-X)^T(AS-X)]+\beta tr(S^TS)) \\=tr((S^TA^TA-X^TA+\beta S^T)dS) \\ dJ_2(A^T)=\frac{1}{2}d(tr[(AS-X)(AS-X)^T]+\alpha tr(A^TA)) \\=tr((ASS^T-XS^T+\alpha A)dA^T) dJ1(S)=21d(tr[(AS−X)T(AS−X)]+βtr(STS))=tr((STATA−XTA+βST)dS)dJ2(AT)=21d(tr[(AS−X)(AS−X)T]+αtr(ATA))=tr((ASST−XST+αA)dAT)
由此得梯度矩阵:
∂ J 1 ( S ) ∂ S = − A T X + A T A S + β S ∂ J 2 ( A T ) ∂ A = − S X T + S S T A T + α A T \frac{\partial J_1(S)}{\partial S}=-A^TX+A^TAS+\beta S \\ \frac{\partial J_2(A^T)}{\partial A}=-SX^T+SS^TA^T+\alpha A^T ∂S∂J1(S)=−ATX+ATAS+βS∂A∂J2(AT)=−SXT+SSTAT+αAT
由 ∂ J 1 ( S ) ∂ S = 0 \frac{\partial J_1(S)}{\partial S}=0 ∂S∂J1(S)=0和 ∂ J 2 ( A T ) ∂ A = 0 \frac{\partial J_2(A^T)}{\partial A}=0 ∂A∂J2(AT)=0分别得到两个正则化最小二乘子问题得解为:
( A T A + β I J ) S = A T X 或 S = ( A T A + β I J ) − 1 A T X ( S S T + α I J ) A T = S X T 或 A T = ( S S T + α I J ) − 1 S X T (A^TA+\beta I_J)S=A^TX ~~ 或 ~~S=(A^TA+\beta I_J)^{-1}A^TX \\ (SS^T+\alpha I_J)A^T=SX^T ~~或~~ A^T=(SS^T+\alpha I_J)^{-1}SX^T (ATA+βIJ)S=ATX 或 S=(ATA+βIJ)−1ATX(SST+αIJ)AT=SXT 或 AT=(SST+αIJ)−1SXT
高斯白噪声
https://blog.csdn.net/qq_26309711/article/details/103157812
c ^ n = c n c ^ c ^ + σ 2 \hat{c}_n = c_n \frac{\hat{c}}{\hat{c}+\sigma^2} c^n=cnc^+σ2c^
c ^ n \hat{c}_n c^n:是过滤核
c ^ \hat{c} c^:是第一个通道的图像核心
c n c_n cn:是在第一个通道的输出上使用块匹配来生成噪声图像的核心
k H a r d kHard kHard:patch size
c h n l s chnls chnls:the number of channels
n S x r nSx_r nSxr:the number of similar patches to the patch at (i,j)