SVD may be the most important matrix decomposition of all, for both theoretical and computational purposes.
Theorem 4.1.1 (SVD Theorem) Let $A \in \mathbb{R}^{n\times m}$ be a nonzero matrix with rank $r$. Then $A$ can be expressed as a product
$$A = U\Sigma V^T, \tag{4.1.2}$$
where $U \in \mathbb{R}^{n\times n}$ and $V \in \mathbb{R}^{m\times m}$ are orthogonal, and $\Sigma \in \mathbb{R}^{n\times m}$ is a nonsquare “diagonal” matrix whose main-diagonal entries are $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ (all remaining entries are zero).
The SVD has a simple geometric interpretation.
Theorem 4.1.3 (Geometric SVD Theorem) Let $A \in \mathbb{R}^{n\times m}$ be a nonzero matrix with rank $r$. Then $\mathbb{R}^m$ has an orthonormal basis $v_1, \cdots, v_m$, $\mathbb{R}^n$ has an orthonormal basis $u_1, \cdots, u_n$, and there exist $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ such that:
$$Av_i = \sigma_i u_i$$
$$A^T u_i = \sigma_i v_i$$
Theorem 4.1.12 Let $A \in \mathbb{R}^{n\times m}$ be a nonzero matrix with rank $r$. Let $\sigma_1, \cdots, \sigma_r$ be the singular values of $A$, with associated right and left singular vectors $v_1, \cdots, v_r$ and $u_1, \cdots, u_r$, respectively. Then
$$A = \sum_{j=1}^{r} \sigma_j u_j v_j^T$$
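As a quick sanity check (a NumPy sketch of my own, not from the book), the following verifies both the geometric relations $Av_i = \sigma_i u_i$, $A^Tu_i = \sigma_i v_i$ and the rank-$r$ expansion on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))            # random matrix; full rank with probability 1

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U is 5x5, Vt is 3x3, s holds sigma_1..sigma_3
V = Vt.T
r = int(np.sum(s > 1e-12))                 # numerical rank (here r = 3)

# Geometric SVD relations: A v_i = sigma_i u_i and A^T u_i = sigma_i v_i for i = 1..r
for i in range(r):
    assert np.allclose(A @ V[:, i], s[i] * U[:, i])
    assert np.allclose(A.T @ U[:, i], s[i] * V[:, i])

# Rank-r expansion: A = sum_j sigma_j u_j v_j^T
A_sum = sum(s[j] * np.outer(U[:, j], V[:, j]) for j in range(r))
assert np.allclose(A, A_sum)
print("SVD identities verified")
```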
Geometrically, $\|A\|_2$ represents the maximum magnification that can be undergone by any vector $x \in \mathbb{R}^m$ when acted on by $A$.
Theorem 4.2.1 Let $A \in \mathbb{R}^{n\times m}$ have singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$. Then $\|A\|_2 = \sigma_1$.
Since $A$ and $A^T$ have the same singular values, we have the following corollary.
Corollary 4.2.2 $\|A\|_2 = \|A^T\|_2$
Not sure how useful it is, but a fun result: the Frobenius matrix norm equals the square root of the sum of the squares of the singular values:
$$\|A\|_F = \left(\sum_{i=1}^{n}\sum_{j=1}^{m} |a_{ij}|^2\right)^{1/2} = \left(\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2\right)^{1/2}$$
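Both norm identities are easy to confirm numerically; a minimal NumPy sketch (mine, for illustration):

```python
import numpy as np

A = np.random.default_rng(1).standard_normal((6, 4))
s = np.linalg.svd(A, compute_uv=False)     # singular values in descending order

assert np.isclose(np.linalg.norm(A, 2), s[0])                        # ||A||_2 = sigma_1
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))   # ||A||_F = sqrt(sum sigma_i^2)
```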
The 2-condition number of a square matrix equals the ratio of its largest and smallest singular values:

Theorem 4.2.4 Let $A \in \mathbb{R}^{n\times n}$ be a nonsingular matrix with singular values $\sigma_1 \ge \cdots \ge \sigma_n > 0$. Then
$$\kappa_2(A) = \frac{\sigma_1}{\sigma_n}$$

Another expression for the condition number, given in Chapter 2, is
$$\kappa_2(A) = \frac{\mathrm{maxmag}(A)}{\mathrm{minmag}(A)}$$

Theorem 4.2.9 Let $A \in \mathbb{R}^{n\times m}$ with $n \ge m$. Then $\|A^TA\|_2 = \|A\|_2^2$ and $\kappa_2(A^TA) = \kappa_2(A)^2$.
The proof is simple: just substitute the SVD of $A$.
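A quick numerical check of Theorem 4.2.9 (a NumPy sketch, not from the book):

```python
import numpy as np

A = np.random.default_rng(2).standard_normal((8, 3))   # n >= m, full column rank
s = np.linalg.svd(A, compute_uv=False)
k2_A = s[0] / s[-1]                                     # kappa_2(A) = sigma_1 / sigma_m

assert np.isclose(np.linalg.norm(A.T @ A, 2), s[0]**2)            # ||A^T A||_2 = ||A||_2^2
assert np.isclose(np.linalg.cond(A.T @ A), k2_A**2, rtol=1e-8)    # kappa_2(A^T A) = kappa_2(A)^2
```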
The pseudoinverse: $(A^TA)^{-1}A^T$ is called the pseudoinverse of $A$.
4.2.2 Numerical Rank Determination
Data may contain roundoff error and other uncertainty, so some singular values that should be exactly zero may instead come out as very small positive numbers. In such cases one should work with the numerical rank.
If only roundoff error is considered, the threshold can be set to $\epsilon = 10u\|A\|$, where $u$ is the unit roundoff error.
MATLAB's rank command computes exactly this numerical rank, and the user can supply the threshold.
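A sketch of the same idea in NumPy (the threshold follows the $\epsilon = 10u\|A\|$ rule from the text; the test matrix is just an illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))   # exactly rank 2
A = B + 1e-16 * rng.standard_normal((6, 5))                     # roundoff-level perturbation

s = np.linalg.svd(A, compute_uv=False)
u = np.finfo(float).eps / 2                # unit roundoff for IEEE double precision
tol = 10 * u * np.linalg.norm(A, 2)        # threshold eps = 10 u ||A||
numerical_rank = int(np.sum(s > tol))

print(s)                  # two O(1) singular values, three at roundoff level
print(numerical_rank)     # 2, even though the exact rank of the stored A is 5
```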
Every rank-deficient matrix has full-rank matrices arbitrarily close to it.
Proof: in the SVD of a rank-deficient matrix $A$, some of the diagonal entries of $\Sigma$ are zero. Replacing those zero singular values by a very small positive number $\epsilon$ gives a new matrix $A_\epsilon$ whose distance from $A$ is $\|A - A_\epsilon\|_2 = \epsilon$.
Theorem 4.2.15
Let $A \in \mathbb{R}^{n\times m}$ with $\mathrm{rank}(A) = r > 0$. Let $A = U\Sigma V^T$ be the SVD of $A$, with singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$.
For $k = 1, \cdots, r-1$, define $A_k = U\Sigma_k V^T$, where $\Sigma_k \in \mathbb{R}^{n\times m}$ is the diagonal matrix $\mathrm{diag}(\sigma_1, \cdots, \sigma_k, 0, \cdots, 0)$. Then $\mathrm{rank}(A_k) = k$, and
$$\sigma_{k+1} = \|A - A_k\|_2 = \min\{\|A - B\|_2 : \mathrm{rank}(B) \le k\};$$
that is, of all matrices of rank $k$ or less, $A_k$ is closest to $A$. In one sentence: two matrices of different rank live in different worlds; there is a gap between them, and the bigger the difference in rank, the wider the gap.
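A NumPy sketch of Theorem 4.2.15 in action: truncate the SVD after $k$ terms and check that the result has rank $k$ and sits at 2-norm distance $\sigma_{k+1}$ from $A$:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]           # best rank-k approximation

assert np.linalg.matrix_rank(A_k) == k
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])   # ||A - A_k||_2 = sigma_{k+1}
```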
Corollary 4.2.16 Suppose $A \in \mathbb{R}^{n\times m}$ has full rank. Thus $\mathrm{rank}(A) = r$, where $r = \min(n, m)$. Let $\sigma_1 \ge \cdots \ge \sigma_r$ be the singular values of $A$. Let $B \in \mathbb{R}^{n\times m}$ satisfy $\|A - B\|_2 < \sigma_r$. Then $B$ also has full rank.
Clearly, the gap between $A$ and $B$ here is too small for them to be matrices of different worlds (i.e., of different rank).
Conclusions:
- If $A$ has full rank, then every matrix sufficiently close to $A$ (within distance $\sigma_r$) also has full rank.
- If $A$ is rank deficient, then there exist full-rank matrices arbitrarily close to it (they can come arbitrarily close, but never equal it).
- In topological language, the set of matrices of full rank is an open, dense subset of $\mathbb{R}^{n\times m}$. Thus, in a certain sense, almost all matrices have full rank.
- If a matrix is rank deficient, almost any tiny perturbation will turn it into a full-rank matrix (see the sketch below). Therefore, in the presence of floating-point errors, it is impossible to calculate the (exact, theoretical) rank of a matrix or even to detect that it is rank deficient.
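A minimal NumPy sketch of that last point (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 5))   # rank 3, hence rank deficient
print(np.linalg.matrix_rank(A))                                 # 3

A_pert = A + 1e-12 * rng.standard_normal(A.shape)               # tiny random perturbation
print(np.linalg.matrix_rank(A_pert))                            # 5: full rank
```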
4.2.3 Orthogonal Decompositions
The QR decomposition with column pivoting gives $AE = QR$, or equivalently $A = QRE^T$, where $E$ is a permutation matrix, a special type of orthogonal matrix.
The SVD gives $A = U\Sigma V^T$. Both are examples of orthogonal decompositions $A = YTZ^T$, where $Y$ and $Z$ are orthogonal and $T$ has a simple form.
The QR decomposition is much cheaper to compute than the SVD. However, the SVD always reveals the numerical rank of the matrix, whereas the QR decomposition may occasionally fail to do so.
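A sketch comparing the two on a rank-deficient matrix, using SciPy's `qr` with `pivoting=True` (my own illustration; the matrix is arbitrary):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(6)
A = rng.standard_normal((8, 4)) @ rng.standard_normal((4, 6))   # numerical rank 4

# QR with column pivoting: AE = QR, with non-increasing |R[i, i]|
Q, R, piv = qr(A, pivoting=True)
print(np.abs(np.diag(R)))        # four sizable entries, then entries at roundoff level

# SVD for comparison
s = np.linalg.svd(A, compute_uv=False)
print(s)                         # four O(1) singular values, the rest near machine precision
```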
4.2.4 Distance to Nearest Singular Matrix
Corollary 4.2.22 Let $A_s$ be the singular matrix that is closest to $A$, in the sense that $\|A - A_s\|_2$ is as small as possible. Then $\|A - A_s\|_2 = \sigma_n$, and
$$\frac{\|A - A_s\|_2}{\|A\|_2} = \frac{1}{\kappa_2(A)}$$

This follows easily from the results above; the only change is that the $n \times m$ matrix is now an $n \times n$ square matrix. Note: only a square matrix can be called singular; a non-square matrix can only be called rank deficient.
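A quick NumPy check of Corollary 4.2.22 (a sketch; the nearest singular matrix is obtained by zeroing the smallest singular value):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 5))
U, s, Vt = np.linalg.svd(A)

A_s = U @ np.diag(np.r_[s[:-1], 0.0]) @ Vt           # nearest singular matrix

assert np.isclose(np.linalg.norm(A - A_s, 2), s[-1])                     # ||A - A_s||_2 = sigma_n
assert np.isclose(np.linalg.norm(A - A_s, 2) / np.linalg.norm(A, 2),
                  1.0 / np.linalg.cond(A))                               # relative distance = 1 / kappa_2(A)
```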
4.3 THE SVD AND THE LEAST SQUARES PROBLEM
When the coefficient matrix $A$ is rank deficient, the least squares problem does not have a unique solution. However, if we add the condition that $\|x\|_2$ be minimized, the solution becomes unique.
The derivation of the SVD-based solution of the least squares problem goes as follows:
$$\|b - Ax\|_2 = \|U^T(b - Ax)\|_2 = \|U^Tb - \Sigma(V^Tx)\|_2.$$

Letting $c = U^Tb$ and $y = V^Tx$, we have
$$\|b - Ax\|_2^2 = \|c - \Sigma y\|_2^2 = \sum_{i=1}^{r} |c_i - \sigma_i y_i|^2 + \sum_{i=r+1}^{n} |c_i|^2$$

SVD is an expensive way to solve the least squares problem. Its principal advantage is that it gives a completely reliable means of determining the numerical rank for rank-deficient least squares problems.
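A sketch of this recipe in NumPy for a rank-deficient problem; setting $y_i = c_i/\sigma_i$ for $i \le r$ and $y_i = 0$ otherwise gives the minimum-norm solution, which should match what `np.linalg.lstsq` (also SVD-based) returns:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((10, 3)) @ rng.standard_normal((3, 5))   # rank 3 < 5 columns
b = rng.standard_normal(10)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))          # numerical rank

c = U.T @ b                                # c = U^T b
y = np.zeros_like(s)
y[:r] = c[:r] / s[:r]                      # y_i = c_i / sigma_i for i <= r, zero otherwise
x = Vt.T @ y                               # x = V y, the minimum-norm least squares solution

x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)
print(np.linalg.norm(b - A @ x))           # residual norm
```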
4.3.2 The Pseudoinverse
Every $A \in \mathbb{R}^{n\times m}$ has a pseudoinverse. The minimum-norm solution to a least squares problem can be expressed in terms of the pseudoinverse $A^\dagger$ as $x = A^\dagger b$, where
$$A^\dagger = V\Sigma^\dagger U^T$$

where the diagonal entries of $\Sigma^\dagger$ are the reciprocals of the nonzero diagonal entries of $\Sigma$ (zero entries stay zero).
Another form of the pseudoinverse, valid when $A$ has full column rank: $(A^TA)^{-1}A^T$.
However, there is seldom any reason to compute the pseudoinverse; it is mainly a theoretical tool.
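Still, it is easy to build for illustration; a NumPy sketch constructing $A^\dagger = V\Sigma^\dagger U^T$ and comparing it with `np.linalg.pinv` and, for a full-column-rank $A$, with $(A^TA)^{-1}A^T$:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((7, 4))                      # full column rank

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_dagger = Vt.T @ np.diag(1.0 / s) @ U.T             # A^+ = V Sigma^+ U^T (all sigma_i nonzero here)

assert np.allclose(A_dagger, np.linalg.pinv(A))
assert np.allclose(A_dagger, np.linalg.inv(A.T @ A) @ A.T)   # (A^T A)^{-1} A^T
```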
4.4 SENSITIVITY OF THE LEAST SQUARES PROBLEM
In this section we discuss the sensitivity of the solution of the least squares problem under perturbations of $A$ and $b$.
Following Section 3.5, solving the least squares problem can be split into two steps:
- First we find a $y \in \mathcal{R}(A)$ whose distance from $b$ is minimal:
$$\|b - y\|_2 = \min_{s \in \mathcal{R}(A)} \|b - s\|_2.$$
- Then the least squares solution $x \in \mathbb{R}^m$ is found by solving the equation $Ax = y$ exactly. In other words, if we view $\mathcal{R}(A)$ as a hyperplane, we first project $b$ onto that hyperplane; the norm of the least squares residual then equals the distance from $b$ to the hyperplane.
See page 281 of the book.
4.4.1 The Effect of Perturbation of b
From the two-step solution procedure above we can see:
- If $b$ is nearly orthogonal to $\mathcal{R}(A)$, then even a tiny perturbation of $b$ can produce a large relative change $\|\delta y\|_2 / \|y\|_2$ in the projection, and hence a large error in the final result.
- If the linear system $Ax = y$ is ill conditioned, the error in the final result will also be large. Working through the details yields:
$$\frac{\|\delta x\|_2}{\|x\|_2} \le \frac{\kappa_2(A)}{\cos\theta}\,\frac{\|\delta b\|_2}{\|b\|_2}$$

where $\theta$ is the angle between $b$ and $\mathcal{R}(A)$.

4.4.2 The Effect of Perturbation of A
Unfortunately, perturbations of $A$ have a more severe effect than perturbations of $b$.
Here is the combined error bound for the least squares problem under perturbations of both $A$ and $b$:
$$\frac{\|\delta x\|_2}{\|x\|_2} \le \frac{2\kappa_2(A)}{\cos\theta}\,\epsilon_b + 2\left(\kappa_2(A)^2\tan\theta + \kappa_2(A)\right)\epsilon_A$$

Analysis:
- The first term is the error contributed by the perturbation of $b$.
- The second term is the error contributed by the perturbation of $A$; note that it scales with $\kappa_2(A)^2$.
The presence of $\kappa_2(A)^2$ means that even if $A$ is only mildly ill conditioned, a small perturbation in $A$ can cause a large change in $x$. So we should keep the condition number under control.
One way to do this is feature scaling, as used in machine learning.
4.4.3 Keeping the Condition Number Under Control
For example, when fitting a polynomial to a set of data, the choice of basis functions affects the condition number of the coefficient matrix.
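For instance (a NumPy sketch of my own, with made-up abscissae): a cubic fit through a Vandermonde matrix built from raw years is badly conditioned, while shifting and scaling the abscissae first keeps the condition number small:

```python
import numpy as np

t = np.linspace(1900, 2000, 50)                    # raw abscissae, e.g. years
V_raw = np.vander(t, 4)                            # columns t^3, t^2, t, 1

t_scaled = (t - t.mean()) / (t.max() - t.min())    # shift and scale to [-0.5, 0.5]
V_scaled = np.vander(t_scaled, 4)

print(np.linalg.cond(V_raw))      # enormous
print(np.linalg.cond(V_scaled))   # many orders of magnitude smaller
```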
4.4.4 Accuracy of Techniques for Solving the Least Squares Problem
- The QR decomposition via the modified Gram-Schmidt method is backward stable.
- The normal equations method is less accurate, because $\kappa_2(A^TA) = \kappa_2(A)^2$.
To emphasize once more: the QR method is superior to the normal equations method when the condition number is bad.
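A small experiment (NumPy/SciPy sketch, problem data made up) comparing the two on a moderately ill-conditioned least squares problem with a known solution:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(10)
m, n = 100, 10
t = np.linspace(0, 1, m)
A = np.vander(t, n)                          # moderately ill-conditioned Vandermonde matrix
x_true = rng.standard_normal(n)
b = A @ x_true                               # consistent system, so x_true is the LS solution

# Normal equations: forming A^T A squares the condition number.
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# QR approach: A = QR (economic), then solve the triangular system R x = Q^T b.
Q, R = qr(A, mode='economic')
x_qr = solve_triangular(R, Q.T @ b)

print(np.linalg.cond(A))
print(np.linalg.norm(x_ne - x_true) / np.linalg.norm(x_true))   # relative error of normal equations (larger)
print(np.linalg.norm(x_qr - x_true) / np.linalg.norm(x_true))   # relative error of QR (typically much smaller)
```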