Paper: https://arxiv.org/pdf/1404.0736.pdf
Code: https://cs.nyu.edu/~denton/compress_conv.zip
Contribution.
Monochromatic Convolution Approximation.
Let $W\in \mathbb{R}^{C\times X \times Y \times F}$ be the first-layer weights, e.g. $(C, X, Y, F) = (3, 7, 7, 96)$.
For every output feature $f$, consider the matrix $W_f \in \mathbb{R}^{C\times (XY)}$.
Find the SVD, $W_f = U_fS_fV_f^{T}$, where $U_f \in \mathbb{R}^{C\times C}$ ($3\times 3$), $S_f \in \mathbb{R}^{C\times XY}$ ($3\times 49$, since $7\times 7 = 49$), and $V_f \in \mathbb{R}^{XY\times XY}$ ($49\times 49$).
We can take the rank-1 approximation of $W_f$, $\tilde{W}_f = \tilde{U}_f\tilde{S}_f\tilde{V}_f^{T}$, where $\tilde{U}_f\in \mathbb{R}^{C\times 1}$, $\tilde{S}_f\in \mathbb{R}$, and $\tilde{V}_f^{T}\in \mathbb{R}^{1\times XY}$.
Next, cluster the $F$ left singular vectors $\tilde{U}_f$ into $C'$ clusters, with $C'<F$ (using k-means).
Each filter is then approximated as $\tilde{W}_f = U_{c_f}\tilde{S}_f\tilde{V}_f^{T}$, where $U_{c_f}\in\mathbb{R}^{C\times 1}$ is the center of the cluster $c_f$ to which $\tilde{U}_f$ is assigned.
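The steps above can be sketched in NumPy. This is only an illustrative sketch, not the authors' released code: the function names `kmeans` and `monochromatic_approx` are hypothetical, the k-means is a plain Lloyd iteration written here for self-containment, and the sign normalization of the singular vectors is an added assumption (the SVD only determines $\tilde{U}_f$ and $\tilde{V}_f$ up to a joint sign flip, which would otherwise confuse the clustering).

```python
import numpy as np

def nearest(points, centers):
    # Index of the closest center (squared Euclidean distance) for every point.
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(points, k, n_iter=20, seed=0):
    # Plain Lloyd's k-means (illustrative stand-in); returns (centers, labels).
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers, nearest(points, centers)

def monochromatic_approx(W, n_clusters, seed=0):
    # W has shape (C, X, Y, F); n_clusters plays the role of C' in the text.
    C, X, Y, F = W.shape
    U = np.empty((F, C))       # rank-1 left singular vectors, one row per f
    S = np.empty(F)            # leading singular values
    V = np.empty((F, X * Y))   # rank-1 right singular vectors
    for f in range(F):
        W_f = W[:, :, :, f].reshape(C, X * Y)
        u, s, vt = np.linalg.svd(W_f, full_matrices=False)
        u1, v1 = u[:, 0], vt[0]
        if u1.sum() < 0:       # assumption: fix the SVD sign ambiguity
            u1, v1 = -u1, -v1
        U[f], S[f], V[f] = u1, s[0], v1
    centers, labels = kmeans(U, n_clusters, seed=seed)
    # W~_f = U_{c_f} * S~_f * V~_f^T, stacked back into a (C, X, Y, F) tensor
    W_approx = np.einsum('fc,f,fp->cpf', centers[labels], S, V)
    return W_approx.reshape(C, X, Y, F), labels

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 7, 7, 96))
W_approx, labels = monochromatic_approx(W, n_clusters=16)
print(W_approx.shape, np.linalg.norm(W - W_approx) / np.linalg.norm(W))
```

After the approximation, each output feature depends on a single "color" direction $U_{c_f}$ shared by its cluster, so the input only needs to be projected onto $C'$ directions before the spatial filters $\tilde{V}_f$ are applied.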
Biclustering Approximation.
Let $W\in \mathbb{R}^{C\times X \times Y \times F}$, and let $W_C\in\mathbb{R}^{C\times (XYF)}$ and $W_F\in\mathbb{R}^{(CXY)\times F}$ be its two matrix unfoldings.
Cluster the rows of $W_C$ into $G$ clusters.
Cluster the columns of $W_F$ into $H$ clusters.
This yields $H\times G$ sub-tensors $W(C_i, :, :, F_j)$, each of the form $W_S\in\mathbb{R}^{\frac{C}{G}\times(XY)\times\frac{F}{H}}$.
Each sub-tensor contains similar elements, and thus is easier to fit with a low-rank approximation.
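The partitioning step can be sketched as follows. This is a rough sketch under stated assumptions, not the paper's implementation: `kmeans_labels` and `bicluster` are hypothetical names, and the plain k-means used here does not force clusters to be the same size, so the blocks will generally not all have the $\frac{C}{G}\times(XY)\times\frac{F}{H}$ shape given above. The low-rank fit of each block is left out.

```python
import numpy as np

def kmeans_labels(points, k, n_iter=20, seed=0):
    # Plain Lloyd's k-means (illustrative stand-in); returns one label per row.
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()

    def nearest(c):
        d2 = ((points[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = nearest(centers)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return nearest(centers)

def bicluster(W, G, H, seed=0):
    # W has shape (C, X, Y, F).
    C, X, Y, F = W.shape
    W_C = W.reshape(C, X * Y * F)   # one row per input channel
    W_F = W.reshape(C * X * Y, F)   # one column per output feature
    row_labels = kmeans_labels(W_C, G, seed=seed)     # cluster rows of W_C
    col_labels = kmeans_labels(W_F.T, H, seed=seed)   # cluster columns of W_F
    blocks = {}
    for i in range(G):
        for j in range(H):
            ci = np.flatnonzero(row_labels == i)
            fj = np.flatnonzero(col_labels == j)
            blocks[(i, j)] = W[ci][..., fj]           # W(C_i, :, :, F_j)
    return row_labels, col_labels, blocks

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 5, 5, 12))
row_labels, col_labels, blocks = bicluster(W, G=2, H=3)
for key, block in blocks.items():
    print(key, block.shape)
```

Each entry of `blocks` could then be approximated independently, for example with a truncated SVD of a matrix unfolding of the block.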