Converting one matrix factorization to another

Converting one matrix factorization to another

  • 1 Preliminaries (Some definitions)
    • 1.1 Full Decomposition and Partial Decomposition
    • 1.2 The Interpolative Decomposition (ID)
  • 2 From given factorization to QR
  • 3 From given factorization to SVD
  • 4 From given factorization to ID
  • 5 Postprocessing via Row Extraction
  • 6 SVD via Row Extraction

(Note: The following content is mainly excerpted from the reference [1] listed at the bottom.)


Suppose that we have obtained a partial decomposition of a matrix A A A by some means:
∥ A − C B ∥ ≤ ϵ , (1) \|A-CB\|\leq \epsilon, \tag{1} ACBϵ,(1)
where B B B and C C C have rank k k k.


1 Preliminaries (Some definitions)

1.1 Full Decomposition and Partial Decomposition

It is possible to compute the full QR decomposition for the full SVD of an m × n m \times n m×n matrix to double-precision accuracy with O ( m n min ⁡ { m , n } ) O(mn\min\{m,n\}) O(mnmin{m,n}) flops. (That is, full decomposition is an accurate decomposition. ) The full decompostion has the following formula,
A = C B (2) A = CB \tag{2} A=CB(2)

Suppose that an m × n m \times n m×n matrix has numerical rank k k k, where k k k is substantially smaller than m m m and n n n. In this case, it is possible to produce a structured low-rank decomposition that approximates the matrix well. (That is, partial decomposition is an approximate decomposition.) The partial decomposition has the following formula,
A = C B + E (3) A=CB+E \tag{3} A=CB+E(3)
where ∥ E ∥ ≤ ϵ \|E\|\leq \epsilon Eϵ.

1.2 The Interpolative Decomposition (ID)

The final factorization identifies a collection of k k k columns from a rank- k k k matrix A A A that span the range of A A A. To be precise, we can compute an index set J = [ j 1 , ⋯   , j k ] J=[j_{1},\cdots, j_{k}] J=[j1,,jk] such that
A = A ( : , j ) X , (4) A = A(:,j)X, \tag{4} A=A(:,j)X,(4)
where X X X is a k × n k\times n k×n matrix that satisfies X ( : , J ) = I k X(:,J)=I_{k} X(:,J)=Ik. Furthermore, no entry of X X X has magnitude larger than two. In other words, this decomposition express each column of A A A using a linear combination of k k k fixed columns with bounded coefficients. Stable and efficient algorithms for computing the ID appear in the papers [2,3].

It is also possible to compute a two-sided ID,
A = W A ( J ′ , J ) X , (5) A=WA(J',J)X, \tag{5} A=WA(J,J)X,(5)
where J ′ J' J is an index set identifying k k k of the rows of A A A, and W W W is an m × k m\times k m×k matrix that satisfies W ( J ′ , : ) = I k W(J',:)=I_{k} W(J,:)=Ik and whose entries are all bounded by two.


2 From given factorization to QR

Given a partial decompostion of matrix A A A in (1), the following steps construct a partial QR factorization:

  • Compute a QR factorization of C C C so that C = Q 1 R 1 C=Q_{1}R_{1} C=Q1R1;
  • From the product D = R 1 B D=R_{1}B D=R1B, and compute a QR factorization: D = Q 2 R D=Q_{2}R D=Q2R;
  • From the product Q = Q 1 Q 2 Q=Q_{1}Q_{2} Q=Q1Q2.

The result is an orthonormal matrix Q Q Q and a weekly upper-triangular matrix R R R such that ∥ A − Q R ∥ ≤ ϵ \|A-QR\|\leq \epsilon AQRϵ.


3 From given factorization to SVD

Given a partial decompostion of matrix A A A in (1), the following steps construct a partial SVD factorization:

  • Compute a QR factorization of C C C so that C = Q 1 R 1 C=Q_{1}R_{1} C=Q1R1;
  • Form the product D = R 1 B D=R_{1}B D=R1B, and compute an SVD: D = U 2 Σ V ∗ D=U_{2}\Sigma V^{*} D=U2ΣV;
  • Form the product U = Q 1 U 2 U=Q_{1}U_{2} U=Q1U2.

The result is a diagonal matrix Σ \Sigma Σ and orthonormal matrices U U U and V V V such that ∥ A − U Σ V ∗ ∥ ≤ ϵ . \|A-U\Sigma V^{*}\|\leq \epsilon. AUΣVϵ.


4 From given factorization to ID

Given a partial decompostion of matrix A A A in (1), the following one-step process can get the ID factorization:

  • Compute J J J and X X X such that B = B ( : , J ) X B=B(:,J)X B=B(:,J)X.

Then A ≈ A ( : , J ) X A\approx A(:,J)X AA(:,J)X, but the approximation error may deteriorate from the initial estimate. That is, the upper bound of ∥ A − A ( : , J ) X ∥ \|A-A(:,J)X\| AA(:,J)X is larger than ϵ \epsilon ϵ. In other words, ∥ A − A ( : , J ) X ∥ ≤ ( 1 + α ) ϵ \|A-A(:,J)X\|\leq (1+\alpha)\epsilon AA(:,J)X(1+α)ϵ.


5 Postprocessing via Row Extraction

Given a matrix Q Q Q such that the follwoing inequlity holds,
∥ A − Q Q ∗ A ∥ ≤ ϵ (6) \|A-QQ^{*}A\|\leq \epsilon \tag{6} AQQAϵ(6)
we can obtain a rank- k k k factorization
A ≈ X B , (7) A\approx XB, \tag{7} AXB,(7)
where B B B is a k × n k\times n k×n matrix consisting of k k k rows extracted from A A A.

The approximation (7) can be produced without computing any matrix-matrix products, which makes this approach to postprocessing very fast.

The drawback comes because the error ∥ A − X B ∥ \|A-XB\| AXB is usually larger than the initial error |A-QQ^{*}A|, especially when the dimensions of A A A are large.

To obtain the factorization (7), we simply construct the ID of the matrix Q Q Q:
Q = X Q ( J , : ) . (8) Q=XQ(J,:). \tag{8} Q=XQ(J,:).(8)
The index set J J J marks k k k rows of Q Q Q that span the row space of Q Q Q, and X X X is an m × k m\times k m×k matrix whose entries are bounded in magnitude by two and contains the k × k k\times k k×k identity as a submatrix: X ( J , : ) = I k X(J,:)=I_{k} X(J,:)=Ik. Combining (8) and (6), we reach
A ≈ Q Q ∗ A = X Q ( J , : ) Q ∗ A , (9) A\approx QQ^{*}A=XQ(J,:)Q^{*}A, \tag{9} AQQA=XQ(J,:)QA,(9)
Since X ( J , : ) = I k X(J,:)=I_{k} X(J,:)=Ik, equation (9) implies that A ( J , : ) ≈ Q ( J , : ) Q ∗ A A(J,:)\approx Q(J,:)Q^{*}A A(J,:)Q(J,:)QA. Therefore, (7) follows when we put B = A ( J , : ) B=A(J,:) B=A(J,:).


6 SVD via Row Extraction

Provided with the factorization (7), we can obtain any standard factorization (QR, SVD et.al). The following algorithm illustrates an SVD calculation. This procedure requires O ( k 2 ( m + n ) ) O(k^{2}(m+n)) O(k2(m+n)) flops.

Converting one matrix factorization to another_第1张图片
The following lemma guarantees the accuracy of the computed factors.

Lemma 5.1 [1]
Let A A A be an m × n m\times n m×n matrix and let Q Q Q be an m × k m\times k m×k matrix that satisfy (6). Suppose that U U U, Σ \Sigma Σ, and V V V are the matrices constructed by the above algorithm. Then
∥ A − U Σ V ∗ ∥ ≤ [ 1 + 1 + 4 k ( n − k ) ] ε . (10) \left\|\boldsymbol{A}-\boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^{*}\right\| \leq[1+\sqrt{1+4 k(n-k)}] \varepsilon. \tag{10} AUΣV[1+1+4k(nk) ]ε.(10)

Proof: The factors U , Σ , V U, \Sigma, V U,Σ,V constructed by the algorithm satisfy
U Σ V ∗ = U Σ V ~ ∗ W ∗ = Z W ∗ = X R ∗ W ∗ = X A ( J , : ) \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^{*}=\boldsymbol{U} \boldsymbol{\Sigma} \tilde{\boldsymbol{V}}^{*} \boldsymbol{W}^{*}=\boldsymbol{Z} \boldsymbol{W}^{*}=\boldsymbol{X} \boldsymbol{R}^{*} \boldsymbol{W}^{*}=\boldsymbol{X} \boldsymbol{A}_{(J,:)} UΣV=UΣV~W=ZW=XRW=XA(J,:)
define the approximation
A ^ = Q Q ∗ A . (11) \widehat{A}=Q Q^{*} A. \tag{11} A =QQA.(11)
Since A ^ = X Q ( J , : ) Q ∗ A \widehat{A}=XQ(J,:)Q^{*}A A =XQ(J,:)QA and since X ( J , : ) = I k X(J,:)=I_{k} X(J,:)=Ik, it must be that A ^ ( J , : ) = Q ( J , : ) Q ∗ A \widehat{A}(J,:)=Q(J,:)Q^{*}A A (J,:)=Q(J,:)QA. Consequently,
A ^ = X A ^ ( J , : ) . \widehat{A}=X\widehat{A}(J,:). A =XA (J,:).
We have the chain of relations
∥ A − U Σ V ∗ ∥ = ∥ A − X A ( J , : ) ∥ = ∥ ( A − X A ^ ( J , : ) ) + ( X A ^ ( J , : ) − X A ( J , : ) ) ∥ ≤ ∥ A − A ^ ∥ + ∥ X A ^ ( J , : ) − X A ( J , : ) ∥ ≤ ∥ A − A ^ ∥ + ∥ X ∥ ∥ A ( J , : ) − A ^ ( J , : ) ∥ . (12) \begin{array}{ll} \|A-U\Sigma V^{*}\| & = \|A-XA(J,:)\| \\ &=\|(A-X\widehat{A}(J,:))+(X\widehat{A}(J,:)-XA(J,:))\|\\ &\leq \|A-\widehat{A}\|+\|X\widehat{A}(J,:)-XA(J,:)\| \\ &\leq \|A-\widehat{A}\|+\|X\|\|A(J,:)-\widehat{A}(J,:)\|. \end{array} \tag{12} AUΣV=AXA(J,:)=(AXA (J,:))+(XA (J,:)XA(J,:))AA +XA (J,:)XA(J,:)AA +X∥∥A(J,:)A (J,:)∥.(12)
Inequality (6) ensures that ∥ A − A ^ ∥ ≤ ϵ \|A-\widehat{A}\|\leq \epsilon AA ϵ. Since A ( J , : ) − A ^ ( J , : ) A(J,:)-\widehat{A}(J,:) A(J,:)A (J,:) is a submatrix of A − A ^ A-\widehat{A} AA , we must also have ∥ A ( J , : ) − A ^ ( J , : ) ∥ ≤ ϵ \|A(J,:)-\widehat{A}(J,:)\|\leq \epsilon A(J,:)A (J,:)ϵ. Thus, (12) reduces to
∥ A − U Σ V ∗ ∥ ≤ ( 1 + ∥ X ∥ ) ϵ . (13) \|A-U\Sigma V^{*}\|\leq (1+\|X\|)\epsilon. \tag{13} AUΣV(1+X)ϵ.(13)
The bound (10) follows from (13) after we observe that X X X contains a k × k k\times k k×k identity matrix and that the entries of the remaining ( n − k ) × k (n-k)\times k (nk)×k submatrix are bounded in magnitude by two.

Note: Recall that the orthonormal matrix Q Q Q is contruct by orthonormalizing the columns of the sample matrix Y Y Y( Y = A Ω Y=A\Omega Y=AΩ). With finite-precision arithmetric, it is preferable to adapt the above algorithm to strat directly from the matrix Y Y Y. To be precise, we modify step 1 to compute X X X and J J J so that Y = X Y ( J , : ) Y=XY(J,:) Y=XY(J,:).


References:
[1] N.Halko, P.G.Martinsson, J.A.Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 2(53):217-288, 2011.
[2] H. Cheng, Z. Gimbutas, P. G. Martinsson, and V. Rokhlin, On the compression of low rank matrices, SIAM J. Sci. Comput., 26 (2005), pp. 1389–1404.
[3] M. Gu and S. C. Eisenstat, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869.

你可能感兴趣的:(数学学海,算法)