(Note: The following content is mainly excerpted from the reference [1] listed at the bottom.)
Suppose that we have obtained a partial decomposition of a matrix $A$ by some means:
$$\|A - CB\| \leq \epsilon, \tag{1}$$
where $B$ and $C$ have rank $k$.
It is possible to compute the full QR decomposition or the full SVD of an $m \times n$ matrix to double-precision accuracy with $O(mn\min\{m,n\})$ flops. (That is, a full decomposition is an accurate decomposition.) The full decomposition has the form
$$A = CB. \tag{2}$$
Suppose that an $m \times n$ matrix has numerical rank $k$, where $k$ is substantially smaller than $m$ and $n$. In this case, it is possible to produce a structured low-rank decomposition that approximates the matrix well. (That is, a partial decomposition is an approximate decomposition.) The partial decomposition has the form
$$A = CB + E, \tag{3}$$
where $\|E\| \leq \epsilon$.
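As a concrete illustration (our own, not from [1]), the truncated SVD gives one way to produce such a partial decomposition $A = CB + E$: keeping the leading $k$ singular triples yields an error equal to the $(k+1)$st singular value.

```python
# A minimal sketch contrasting full and partial decompositions.
# The truncated SVD is used here as one concrete way to produce
# A = C @ B + E with small spectral-norm error; the variable names
# are ours, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 100, 60, 5

# Build a matrix with rapidly decaying singular values (numerical rank ~ k).
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 10.0 ** -np.arange(n)            # sigma_j = 10^{-j}
A = U0 @ np.diag(s) @ V0.T

# Full SVD: O(mn*min(m,n)) flops, accurate to machine precision.
U, sig, Vt = np.linalg.svd(A, full_matrices=False)

# Partial (rank-k) decomposition A ≈ C @ B built from the leading factors.
C = U[:, :k] * sig[:k]               # m-by-k
B = Vt[:k, :]                        # k-by-n
err = np.linalg.norm(A - C @ B, 2)   # equals sigma_{k+1}, the best possible

assert err <= sig[k] * 1.01
```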
The final factorization is the interpolative decomposition (ID): it identifies a collection of $k$ columns from a rank-$k$ matrix $A$ that span the range of $A$. To be precise, we can compute an index set $J = [j_1, \cdots, j_k]$ such that
$$A = A(:,J)\,X, \tag{4}$$
where $X$ is a $k \times n$ matrix that satisfies $X(:,J) = I_k$. Furthermore, no entry of $X$ has magnitude larger than two. In other words, this decomposition expresses each column of $A$ as a linear combination of $k$ fixed columns with bounded coefficients. Stable and efficient algorithms for computing the ID appear in the papers [2, 3].
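A minimal sketch of computing such an ID in Python. Ordinary column-pivoted QR (SciPy's `qr` with `pivoting=True`) stands in for the strong rank-revealing QR of [3]; it usually produces modest coefficients but does not guarantee the factor-of-two bound. The helper name `column_id` is ours.

```python
# Sketch of a one-sided (column) ID: A ≈ A[:, J] @ X with X[:, J] = I_k.
import numpy as np
from scipy.linalg import qr, solve_triangular

def column_id(A, k):
    """Return (J, X) with A ≈ A[:, J] @ X and X[:, J] = I_k."""
    Q, R, piv = qr(A, mode='economic', pivoting=True)
    J = piv[:k]                                # k selected column indices
    # Coefficients expressing the trailing columns via the leading k.
    T = solve_triangular(R[:k, :k], R[:k, k:])
    X = np.empty((k, A.shape[1]))
    X[:, piv[:k]] = np.eye(k)
    X[:, piv[k:]] = T
    return J, X

rng = np.random.default_rng(1)
k = 4
A = rng.standard_normal((50, k)) @ rng.standard_normal((k, 30))  # exact rank k
J, X = column_id(A, k)
assert np.allclose(A, A[:, J] @ X)     # exact for a rank-k matrix
assert np.allclose(X[:, J], np.eye(k))
```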
It is also possible to compute a two-sided ID,
$$A = W\,A(J',J)\,X, \tag{5}$$
where $J'$ is an index set identifying $k$ of the rows of $A$, and $W$ is an $m \times k$ matrix that satisfies $W(J',:) = I_k$ and whose entries are all bounded by two.
Given a partial decomposition of the matrix $A$ as in (1), the following steps construct a partial QR factorization: (i) compute a QR factorization of $C$, say $C = Q_1 R_1$; (ii) form the small product $D = R_1 B$ and compute its QR factorization, $D = Q_2 R$; (iii) form $Q = Q_1 Q_2$.
The result is an orthonormal matrix $Q$ and a weakly upper-triangular matrix $R$ such that $\|A - QR\| \leq \epsilon$.
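This construction can be sketched in Python as follows (a minimal sketch; the helper name `partial_qr` is ours).

```python
# Sketch of converting a partial decomposition A ≈ C @ B (with C, B of
# rank k) into a partial QR factorization A ≈ Q @ R.
import numpy as np

def partial_qr(C, B):
    """Given A ≈ C @ B, return (Q, R) with A ≈ Q @ R, Q orthonormal."""
    Q1, R1 = np.linalg.qr(C)       # (i)  orthonormalize the columns of C
    D = R1 @ B                     # (ii) small k-by-n product
    Q2, R = np.linalg.qr(D)        #      QR of the small matrix
    return Q1 @ Q2, R              # (iii) Q = Q1 @ Q2 is orthonormal

rng = np.random.default_rng(2)
m, n, k = 80, 40, 6
C = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))
Q, R = partial_qr(C, B)
assert np.allclose(Q @ R, C @ B)            # same product as C @ B
assert np.allclose(Q.T @ Q, np.eye(k))      # orthonormal columns
```

Only $k$-column (or $k$-row) matrices are ever factorized, so the cost is dominated by the two skinny QR factorizations rather than a full decomposition of $A$.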
Given a partial decomposition of the matrix $A$ as in (1), the following steps construct a partial SVD: (i) compute a QR factorization of $C$, say $C = Q_1 R_1$; (ii) form the small product $D = R_1 B$ and compute its SVD, $D = U_2 \Sigma V^*$; (iii) form $U = Q_1 U_2$.
The result is a diagonal matrix $\Sigma$ and orthonormal matrices $U$ and $V$ such that $\|A - U\Sigma V^*\| \leq \epsilon$.
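The analogous sketch for the SVD case (helper name `partial_svd` is ours):

```python
# Sketch of converting a partial decomposition A ≈ C @ B into a partial SVD.
import numpy as np

def partial_svd(C, B):
    """Given A ≈ C @ B (C m-by-k, B k-by-n), return (U, s, Vt)."""
    Q1, R1 = np.linalg.qr(C)                           # (i)  orthonormalize C
    D = R1 @ B                                         # (ii) small k-by-n matrix
    U2, s, Vt = np.linalg.svd(D, full_matrices=False)  #      SVD of D
    return Q1 @ U2, s, Vt                              # (iii) U = Q1 @ U2

rng = np.random.default_rng(7)
m, n, k = 80, 40, 6
C = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))
U, s, Vt = partial_svd(C, B)
assert np.allclose(U @ np.diag(s) @ Vt, C @ B)   # same product as C @ B
```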
Given a partial decomposition of the matrix $A$ as in (1), the following one-step process yields the ID factorization: compute an ID of the small factor $B$, so that $B = B(:,J)\,X$.
Then $A \approx A(:,J)\,X$, but the approximation error may deteriorate from the initial estimate. That is, the upper bound of $\|A - A(:,J)X\|$ is larger than $\epsilon$: we have $\|A - A(:,J)X\| \leq (1+\alpha)\epsilon$, where one can take $\alpha = \|X\|$. (Write $A = CB + E$; since the ID of the rank-$k$ factor $B$ is exact, $A - A(:,J)X = E - E(:,J)X$, whose norm is at most $(1 + \|X\|)\epsilon$.)
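A sketch of this one-step process, again with column-pivoted QR standing in for the strong RRQR of [3] (helper name `id_from_partial` is ours):

```python
# Sketch of one-step ID postprocessing: given A ≈ C @ B, compute a column
# ID of the small k-by-n factor B alone; the same index set J and
# coefficient matrix X then give A ≈ A[:, J] @ X.
import numpy as np
from scipy.linalg import qr, solve_triangular

def id_from_partial(B, k):
    """Column ID of B: return (J, X) with B ≈ B[:, J] @ X, X[:, J] = I_k."""
    Q, R, piv = qr(B, mode='economic', pivoting=True)
    T = solve_triangular(R[:k, :k], R[:k, k:])
    X = np.empty((k, B.shape[1]))
    X[:, piv[:k]] = np.eye(k)
    X[:, piv[k:]] = T
    return piv[:k], X

rng = np.random.default_rng(3)
m, n, k = 60, 40, 5
C = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))
A = C @ B                        # exactly rank k, so epsilon = 0 here
J, X = id_from_partial(B, k)
assert np.allclose(A, A[:, J] @ X)
```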
Given a matrix $Q$ such that the following inequality holds,
$$\|A - QQ^*A\| \leq \epsilon, \tag{6}$$
we can obtain a rank-$k$ factorization
$$A \approx XB, \tag{7}$$
where $B$ is a $k \times n$ matrix consisting of $k$ rows extracted from $A$.
The approximation (7) can be produced without computing any matrix-matrix products, which makes this approach to postprocessing very fast.
The drawback is that the error $\|A - XB\|$ is usually larger than the initial error $\|A - QQ^*A\|$, especially when the dimensions of $A$ are large.
To obtain the factorization (7), we simply construct the ID of the matrix $Q$:
$$Q = X\,Q(J,:). \tag{8}$$
The index set $J$ marks $k$ rows of $Q$ that span the row space of $Q$, and $X$ is an $m \times k$ matrix whose entries are bounded in magnitude by two and which contains the $k \times k$ identity as a submatrix: $X(J,:) = I_k$. Combining (8) and (6), we reach
$$A \approx QQ^*A = X\,Q(J,:)\,Q^*A. \tag{9}$$
Since $X(J,:) = I_k$, equation (9) implies that $A(J,:) \approx Q(J,:)\,Q^*A$. Therefore, (7) follows when we put $B = A(J,:)$.
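The whole row-extraction procedure can be sketched as follows (helper name `row_extraction` is ours; a column-pivoted QR of $Q^*$ stands in for the strong RRQR of [3]):

```python
# Sketch of row-extraction postprocessing: a row ID of the slim matrix Q
# yields J and X with Q ≈ X @ Q[J, :]; setting B = A[J, :] then gives
# A ≈ X @ B without forming the product Q* A.
import numpy as np
from scipy.linalg import qr, solve_triangular

def row_extraction(A, Q):
    """Given Q with ||A - Q Q* A|| <= eps, return (X, J) with A ≈ X @ A[J, :]."""
    m, k = Q.shape
    # Row ID of Q, computed as a column ID of Q.T.
    _, R, piv = qr(Q.T, mode='economic', pivoting=True)
    T = solve_triangular(R[:k, :k], R[:k, k:])
    X = np.empty((m, k))
    X[piv[:k], :] = np.eye(k)
    X[piv[k:], :] = T.T
    return X, piv[:k]

rng = np.random.default_rng(4)
m, n, k = 70, 50, 6
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # exact rank k
Q, _ = np.linalg.qr(A[:, :k])    # here range(Q) = range(A), so eps ≈ 0
X, J = row_extraction(A, Q)
B = A[J, :]
assert np.allclose(A, X @ B)
assert np.allclose(X[J, :], np.eye(k))
```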
Provided with the factorization (7), we can obtain any standard factorization (QR, SVD, etc.). The following procedure illustrates an SVD calculation: (1) compute a QR factorization of $A(J,:)^*$, say $A(J,:)^* = WR$; (2) form $Z = XR^*$ and compute its SVD, $Z = U\Sigma\widetilde{V}^*$; (3) form $V = W\widetilde{V}$. This procedure requires $O(k^2(m+n))$ flops.
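A sketch of this SVD postprocessing step, matching the chain of identities used in the proof of Lemma 5.1 (helper name `svd_from_row_extraction` is ours):

```python
# Sketch of the SVD step applied to the factorization A ≈ X @ A[J, :]:
# only k-column matrices are factorized, for O(k^2 (m + n)) flops.
import numpy as np

def svd_from_row_extraction(X, B):
    """Given A ≈ X @ B (X m-by-k, B = A[J, :] k-by-n), return (U, s, Vt)."""
    W, Rt = np.linalg.qr(B.T)                # (1) B = Rt.T @ W.T
    Z = X @ Rt.T                             # (2) m-by-k matrix
    U, s, Vt_small = np.linalg.svd(Z, full_matrices=False)
    Vt = Vt_small @ W.T                      # (3) V = W @ Vtilde
    return U, s, Vt

rng = np.random.default_rng(5)
m, n, k = 80, 50, 5
X = rng.standard_normal((m, k))
B = rng.standard_normal((k, n))
U, s, Vt = svd_from_row_extraction(X, B)
assert np.allclose(U @ np.diag(s) @ Vt, X @ B)  # same product as X @ B
assert np.allclose(U.T @ U, np.eye(k))          # orthonormal columns
assert np.allclose(Vt @ Vt.T, np.eye(k))        # orthonormal rows
```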
The following lemma guarantees the accuracy of the computed factors.
Lemma 5.1 [1]
Let $A$ be an $m \times n$ matrix and let $Q$ be an $m \times k$ matrix that satisfy (6). Suppose that $U$, $\Sigma$, and $V$ are the matrices constructed by the above algorithm. Then
$$\|A - U\Sigma V^*\| \leq \left[1 + \sqrt{1 + 4k(n-k)}\right]\epsilon. \tag{10}$$
Proof: The factors $U$, $\Sigma$, $V$ constructed by the algorithm satisfy
$$U\Sigma V^* = U\Sigma\widetilde{V}^* W^* = ZW^* = XR^*W^* = X\,A(J,:).$$
Define the approximation
$$\widehat{A} = QQ^*A. \tag{11}$$
Since $\widehat{A} = X\,Q(J,:)\,Q^*A$ and since $X(J,:) = I_k$, it must be that $\widehat{A}(J,:) = Q(J,:)\,Q^*A$. Consequently,
$$\widehat{A} = X\,\widehat{A}(J,:).$$
We have the chain of relations
$$\begin{aligned}
\|A - U\Sigma V^*\| &= \|A - X\,A(J,:)\| \\
&= \|(A - X\widehat{A}(J,:)) + (X\widehat{A}(J,:) - X\,A(J,:))\| \\
&\leq \|A - \widehat{A}\| + \|X\widehat{A}(J,:) - X\,A(J,:)\| \\
&\leq \|A - \widehat{A}\| + \|X\|\,\|A(J,:) - \widehat{A}(J,:)\|.
\end{aligned} \tag{12}$$
Inequality (6) ensures that $\|A - \widehat{A}\| \leq \epsilon$. Since $A(J,:) - \widehat{A}(J,:)$ is a submatrix of $A - \widehat{A}$, we must also have $\|A(J,:) - \widehat{A}(J,:)\| \leq \epsilon$. Thus, (12) reduces to
$$\|A - U\Sigma V^*\| \leq (1 + \|X\|)\epsilon. \tag{13}$$
The bound (10) follows from (13) after we observe that $X$ contains a $k \times k$ identity matrix and that the entries of the remaining $(n-k) \times k$ submatrix are bounded in magnitude by two.
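A numerical check of the bound (13) on a matrix with decaying spectrum (our own experiment, not from [1]): `eps` is the range-finder error $\|A - QQ^*A\|$, and the computed SVD error should not exceed $(1 + \|X\|)\epsilon$. Column-pivoted QR again stands in for the strong RRQR of [3].

```python
# Verify: with eps = ||A - Q Q* A|| and X from the row ID of Q, the error
# of the SVD built by row extraction satisfies err <= (1 + ||X||) * eps.
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(6)
m, n, k = 120, 90, 8

# Matrix with decaying singular values, sampled as in the randomized scheme.
U0, _ = np.linalg.qr(rng.standard_normal((m, m)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
s0 = 2.0 ** -np.arange(n)
A = U0[:, :n] @ np.diag(s0) @ V0.T
Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k)))   # Y = A @ Omega
eps = np.linalg.norm(A - Q @ (Q.T @ A), 2)

# Row ID of Q (via pivoted QR of Q.T), then B = A[J, :] and the small SVD.
_, R, piv = qr(Q.T, mode='economic', pivoting=True)
T = solve_triangular(R[:k, :k], R[:k, k:])
X = np.empty((m, k))
X[piv[:k], :] = np.eye(k)
X[piv[k:], :] = T.T
B = A[piv[:k], :]
W, Rt = np.linalg.qr(B.T)                 # B = Rt.T @ W.T
U, sig, Vt_small = np.linalg.svd(X @ Rt.T, full_matrices=False)
Vt = Vt_small @ W.T

err = np.linalg.norm(A - U @ np.diag(sig) @ Vt, 2)
assert err <= (1 + np.linalg.norm(X, 2)) * eps + 1e-12
```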
Note: Recall that the orthonormal matrix $Q$ is constructed by orthonormalizing the columns of the sample matrix $Y$ ($Y = A\Omega$). With finite-precision arithmetic, it is preferable to adapt the above algorithm to start directly from the matrix $Y$. To be precise, we modify step 1 to compute $X$ and $J$ so that $Y = X\,Y(J,:)$.
References:
[1] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.
[2] H. Cheng, Z. Gimbutas, P. G. Martinsson, and V. Rokhlin, On the compression of low rank matrices, SIAM J. Sci. Comput., 26 (2005), pp. 1389–1404.
[3] M. Gu and S. C. Eisenstat, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869.