>pca <- read.csv("D:/pca.csv")
>pca
x1 x2 x3 x4
1 40 2.0 5 20
2 10 1.5 5 30
3 120 3.0 13 50
4 250 4.5 18 0
5 120 3.5 9 50
6 10 1.5 12 50
7 40 1.0 19 40
8 270 4.0 13 60
9 280 3.5 11 60
10 170 3.0 9 60
11 180 3.5 14 40
12 130 2.0 30 50
13 220 1.5 17 20
14 160 1.5 35 60
15 220 2.5 14 30
16 140 2.0 20 20
17 220 2.0 14 10
18 40 1.0 10 0
19 20 1.0 12 60
20 120 2.0 20 0
> P=scale(pca) # standardize the raw data and store the result as matrix P
> P
[,1] [,2] [,3] [,4]
[1,] -1.10251269 -0.3081296 -1.3477550 -0.7084466
[2,] -1.44001658 -0.7821750 -1.3477550 -0.2513843
[3,] -0.20250233 0.6399614 -0.2695510 0.6627404
[4,] 1.26001451 2.0620978 0.4043265 -1.6225713
[5,] -0.20250233 1.1140068 -0.8086530 0.6627404
[6,] -1.44001658 -0.7821750 -0.4043265 0.6627404
[7,] -1.10251269 -1.2562205 0.5391020 0.2056781
[8,] 1.48501710 1.5880523 -0.2695510 1.1198028
[9,] 1.59751839 1.1140068 -0.5391020 1.1198028
[10,] 0.36000414 0.6399614 -0.8086530 1.1198028
[11,] 0.47250544 1.1140068 -0.1347755 0.2056781
[12,] -0.09000104 -0.3081296 2.0216325 0.6627404
[13,] 0.92251062 -0.7821750 0.2695510 -0.7084466
[14,] 0.24750285 -0.7821750 2.6955100 1.1198028
[15,] 0.92251062 0.1659159 -0.1347755 -0.2513843
[16,] 0.02250026 -0.3081296 0.6738775 -0.7084466
[17,] 0.92251062 -0.3081296 -0.1347755 -1.1655090
[18,] -1.10251269 -1.2562205 -0.6738775 -1.6225713
[19,] -1.32751528 -1.2562205 -0.4043265 1.1198028
[20,] -0.20250233 -0.3081296 0.6738775 -1.6225713
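As a quick check on what scale() does, the following sketch rebuilds the data frame from the listing above and standardizes it by hand. The names X, P_manual, max_diff and cov_vs_cor are introduced here for illustration only; everything else is base R.

```r
# Data copied from the pca listing above.
pca <- data.frame(
  x1 = c(40, 10, 120, 250, 120, 10, 40, 270, 280, 170,
         180, 130, 220, 160, 220, 140, 220, 40, 20, 120),
  x2 = c(2.0, 1.5, 3.0, 4.5, 3.5, 1.5, 1.0, 4.0, 3.5, 3.0,
         3.5, 2.0, 1.5, 1.5, 2.5, 2.0, 2.0, 1.0, 1.0, 2.0),
  x3 = c(5, 5, 13, 18, 9, 12, 19, 13, 11, 9,
         14, 30, 17, 35, 14, 20, 14, 10, 12, 20),
  x4 = c(20, 30, 50, 0, 50, 50, 40, 60, 60, 60,
         40, 50, 20, 60, 30, 20, 10, 0, 60, 0)
)

# scale() centers each column to mean 0 and divides by its sample sd.
P <- scale(pca)

# The same computation by hand: subtract column means, divide by column sds.
X <- as.matrix(pca)
P_manual <- sweep(sweep(X, 2, colMeans(X), "-"), 2, apply(X, 2, sd), "/")

# Both differences should be zero up to floating-point error.
max_diff <- max(abs(P_manual - P))
cov_vs_cor <- max(abs(cov(P) - cor(pca)))

print(max_diff)
print(cov_vs_cor)
```

Because cov(P) coincides with cor(pca), the eigen-analysis of cov(P) amounts to a PCA on the correlation matrix of the raw data.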
> eigen(cov(P)) # eigenvalues and eigenvectors of the covariance matrix of P; the first column of $vectors (0.69996363, ...) is the eigenvector for the first eigenvalue (1.7182516), and so on
$values
[1] 1.7182516 1.0935358 0.9813470 0.2068656
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.69996363 0.09501037 -0.24004879 0.6658833
[2,] 0.68979810 -0.28364662 0.05846333 -0.6635550
[3,] 0.08793923 0.90415870 -0.27031356 -0.3188955
[4,] 0.16277651 0.30498307 0.93053167 0.1208302
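The pairing between columns of $vectors and entries of $values can be checked directly. The sketch below recomputes the decomposition from the data listed earlier and tests the defining relation C v = lambda v for the first pair; the variable names (C, e, v1, lambda1, residual, orth_err) are chosen here for illustration.

```r
# Data copied from the pca listing above.
pca <- data.frame(
  x1 = c(40, 10, 120, 250, 120, 10, 40, 270, 280, 170,
         180, 130, 220, 160, 220, 140, 220, 40, 20, 120),
  x2 = c(2.0, 1.5, 3.0, 4.5, 3.5, 1.5, 1.0, 4.0, 3.5, 3.0,
         3.5, 2.0, 1.5, 1.5, 2.5, 2.0, 2.0, 1.0, 1.0, 2.0),
  x3 = c(5, 5, 13, 18, 9, 12, 19, 13, 11, 9,
         14, 30, 17, 35, 14, 20, 14, 10, 12, 20),
  x4 = c(20, 30, 50, 0, 50, 50, 40, 60, 60, 60,
         40, 50, 20, 60, 30, 20, 10, 0, 60, 0)
)

C <- cov(scale(pca))
e <- eigen(C)

v1 <- e$vectors[, 1]   # first column of $vectors
lambda1 <- e$values[1] # first eigenvalue

# If v1 belongs to lambda1, then C %*% v1 equals lambda1 * v1.
residual <- max(abs(C %*% v1 - lambda1 * v1))

# The eigenvectors of a symmetric matrix are orthonormal: t(V) %*% V = I.
orth_err <- max(abs(crossprod(e$vectors) - diag(4)))

print(residual)
print(orth_err)
```

The overall sign of each eigenvector is arbitrary, so another machine or LAPACK build may print a column with all signs flipped.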
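Eigendecomposition yields eigenvalues and eigenvectors. An eigenvalue tells you how important a feature is, and its eigenvector tells you what that feature is. The singular values σ play a similar role: they appear in descending order in the matrix Σ, and they usually decay very quickly. In many cases the sum of the top 10%, or even the top 1%, of the singular values accounts for more than 99% of the sum of all singular values.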
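To make "importance" concrete, a common convention is to read each eigenvalue as a share of the total variance. The sketch below does this for the four eigenvalues printed above; the names values, share and cum_share are illustrative.

```r
# Eigenvalues copied from the eigen(cov(P)) output above.
values <- c(1.7182516, 1.0935358, 0.9813470, 0.2068656)

# Share of total variance carried by each component, and the running total.
share <- values / sum(values)
cum_share <- cumsum(share)

print(round(share, 3))
print(round(cum_share, 3))
```

In this small four-variable example the first two components carry only about 70% of the variance, so the very fast decay described above is a property of large, highly redundant matrices and not something every data set shows.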
> svd(cov(P)) # singular value decomposition, applied to the same covariance matrix of the standardized data (a square matrix)
$d
[1] 1.7182516 1.0935358 0.9813470 0.2068656
$u
[,1] [,2] [,3] [,4]
[1,] -0.69996363 0.09501037 -0.24004879 -0.6658833
[2,] -0.68979810 -0.28364662 0.05846333 0.6635550
[3,] -0.08793923 0.90415870 -0.27031356 0.3188955
[4,] -0.16277651 0.30498307 0.93053167 -0.1208302
$v
[,1] [,2] [,3] [,4]
[1,] -0.69996363 0.09501037 -0.24004879 -0.6658833
[2,] -0.68979810 -0.28364662 0.05846333 0.6635550
[3,] -0.08793923 0.90415870 -0.27031356 0.3188955
[4,] -0.16277651 0.30498307 0.93053167 -0.1208302
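The result matches the eigendecomposition: the singular values equal the eigenvalues, and the singular vectors equal the eigenvectors up to sign (columns 1 and 4 are negated here). The left and right singular vectors are also equal to each other, as theory predicts for a symmetric positive semidefinite matrix such as a covariance matrix: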
http://blog.csdn.net/wangzhiqing3/article/details/7446444
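The agreement between svd() and eigen() depends on the matrix being symmetric positive semidefinite. The sketch below illustrates this with two small hypothetical matrices, A (positive definite) and B (symmetric but indefinite); neither comes from the data above.

```r
# A: a hypothetical symmetric positive definite matrix.
A <- matrix(c(6,  4,  3,
              4, 11,  5,
              3,  5, 18), nrow = 3, byrow = TRUE)
eA <- eigen(A)
sA <- svd(A)

d_diff   <- max(abs(sA$d - eA$values))            # singular values vs eigenvalues
vec_diff <- max(abs(abs(sA$u) - abs(eA$vectors))) # same vectors up to sign
uv_diff  <- max(abs(sA$u - sA$v))                 # left and right vectors coincide

print(c(d_diff, vec_diff, uv_diff))

# B: a hypothetical symmetric matrix with a negative eigenvalue.
B <- matrix(c(1, 2,
              2, 1), nrow = 2, byrow = TRUE)
eB <- eigen(B)$values # 3 and -1
sB <- svd(B)$d        # 3 and 1: singular values are absolute values of the eigenvalues

print(eB)
print(sB)
```

For B the singular values are the absolute values of the eigenvalues, so "singular value = eigenvalue" should be read as a statement about covariance-like matrices, not about square matrices in general.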
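2. Singular value decomposition
The discussion above covered the decomposition of square matrices, but in LSA we need to decompose the Term-Document matrix, which is clearly not square. This is where singular value decomposition comes in. Its derivation builds on the square-matrix decomposition described above.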
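Suppose C is an M x N matrix, U is an M x M matrix whose columns are the orthogonal eigenvectors of CC^T, and V is an N x N matrix whose columns are the orthogonal eigenvectors of C^TC. If r is the rank of C, then there exists a singular value decomposition:
C = U Σ V^T
where Σ is an M x N matrix with the singular values of C on its diagonal and zeros elsewhere; exactly r of the singular values are nonzero.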
Singular value decomposition (SVD) is a factorization that applies to any matrix, so it can handle a general m x n matrix. To be continued......
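The definition can be tried out on a small non-square matrix. In the sketch below, C is a hypothetical 5 x 3 term-document-style matrix of 0/1 entries made up for illustration; the checks use only base R.

```r
# Hypothetical 5 x 3 matrix (rows could be terms, columns documents).
C <- matrix(c(1, 0, 1,
              0, 1, 0,
              1, 1, 0,
              0, 0, 1,
              1, 0, 0), nrow = 5, byrow = TRUE)

# By default svd() returns the "thin" form: u is 5 x 3, d has 3 entries, v is 3 x 3.
s <- svd(C)

# Reconstruction: U diag(d) V^T should reproduce C.
recon_err <- max(abs(s$u %*% diag(s$d) %*% t(s$v) - C))

# The number of nonzero singular values equals the rank r of C.
r <- qr(C)$rank
n_nonzero <- sum(s$d > 1e-10)

# Squared singular values are the eigenvalues of C^T C.
eig_diff <- max(abs(s$d^2 - eigen(crossprod(C))$values))

# Asking for nu = nrow(C) gives the full M x M matrix U of the definition;
# its first column is an eigenvector of C C^T with eigenvalue d[1]^2.
full <- svd(C, nu = nrow(C))
u1 <- full$u[, 1]
cct_err <- max(abs(tcrossprod(C) %*% u1 - s$d[1]^2 * u1))

print(c(recon_err, eig_diff, cct_err))
print(c(r, n_nonzero))
```

Note that R returns the diagonal of Σ as the vector d, not as an M x N matrix, which is why diag(s$d) appears in the reconstruction.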
> svd(P) # singular value decomposition, applied directly to the standardized data matrix (20 x 4)
$d
[1] 5.713736 4.558199 4.318054 1.982535
$u
[,1] [,2] [,3] [,4]
[1,] 0.213188874 -0.31854661 0.01117934 -0.09356334
[2,] 0.298743593 -0.26550125 -0.09966088 -0.02040249
[3,] -0.067184471 -0.05316902 -0.17961537 -0.19846043
[4,] -0.363306085 -0.13041863 0.41709916 -0.43090590
[5,] -0.116116981 -0.18960342 -0.21978181 -0.27040771
[6,] 0.258181271 -0.01720108 -0.23759348 -0.11644174
[7,] 0.272565880 0.17588846 -0.05485743 -0.02402906
[8,] -0.401395312 -0.04641079 -0.19713543 0.07886498
[9,] -0.353798967 -0.06803484 -0.20133714 0.31867233
[10,] -0.140818406 -0.11779839 -0.28058876 0.10504376
[11,] -0.196159553 -0.07244560 -0.04157562 -0.17994125
[12,] -0.001770250 0.46264984 -0.01709488 -0.21189013
[13,] -0.002549413 0.07396825 0.23141706 0.48510618
[14,] -0.009279184 0.66343445 -0.04822489 -0.02040610
[15,] -0.123807056 -0.03464961 0.09477346 0.26067355
[16,] 0.044254060 0.10591148 0.20027671 -0.04088441
[17,] -0.040535151 -0.06631368 0.29818366 0.36362322
[18,] 0.343318979 -0.18704239 0.26319302 0.05965465
[19,] 0.288607918 0.04522412 -0.32341707 0.10786442
[20,] 0.097860256 0.04005869 0.38476035 -0.17217051
$v
[,1] [,2] [,3] [,4]
[1,] -0.69996363 0.09501037 0.24004879 0.6658833
[2,] -0.68979810 -0.28364662 -0.05846333 -0.6635550
[3,] -0.08793923 0.90415870 0.27031356 -0.3188955
[4,] -0.16277651 0.30498307 -0.93053167 0.1208302
Singular values and latent semantic indexing (LSI)
Book <- read.csv("D:/Book.csv")
Book
K=as.matrix(Book[,names(Book)!='X']) # drop the label column X so that K is numeric
s=svd(K)
kk=s$u # assumption: kk (never defined in the original) is the left singular matrix, one row per row of Book
v=s$v # assumption: v (never defined in the original) is the right singular matrix, one row per column T1..T9
rownames(kk)=Book$X
kk
rownames(v)=paste('T',1:9,sep='')
plot(0,0,type='n',xlim=c(-0.8,0),ylim=c(-0.8,0.6),xlab='dimension 3',ylab='dimension 2') # empty canvas
points(v[,3],v[,2],col='red') # columns of K (T1..T9)
points(kk[,3],kk[,2],col='blue') # rows of K
text(kk[,3],kk[,2],Book$X)
text(v[,3],v[,2],paste('T',1:9,sep=''))
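The svd(P) result shows that its right singular matrix ($v) is the eigenvector matrix of the covariance matrix of the standardized data, up to the sign of individual columns.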
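The link between the two decompositions follows from cov(P) = P^T P / (n - 1) for centered columns: if P = U D V^T, then P^T P = V D^2 V^T, so V holds the eigenvectors and d^2 / (n - 1) the eigenvalues. The sketch below checks this on the data listed earlier and also compares against prcomp(); the variable names are illustrative.

```r
# Data copied from the pca listing above.
pca <- data.frame(
  x1 = c(40, 10, 120, 250, 120, 10, 40, 270, 280, 170,
         180, 130, 220, 160, 220, 140, 220, 40, 20, 120),
  x2 = c(2.0, 1.5, 3.0, 4.5, 3.5, 1.5, 1.0, 4.0, 3.5, 3.0,
         3.5, 2.0, 1.5, 1.5, 2.5, 2.0, 2.0, 1.0, 1.0, 2.0),
  x3 = c(5, 5, 13, 18, 9, 12, 19, 13, 11, 9,
         14, 30, 17, 35, 14, 20, 14, 10, 12, 20),
  x4 = c(20, 30, 50, 0, 50, 50, 40, 60, 60, 60,
         40, 50, 20, 60, 30, 20, 10, 0, 60, 0)
)

P <- scale(pca)
n <- nrow(P)
s <- svd(P)
e <- eigen(cov(P))

# Squared singular values divided by (n - 1) are the eigenvalues of cov(P).
val_diff <- max(abs(s$d^2 / (n - 1) - e$values))

# Right singular vectors match the eigenvectors up to sign.
vec_diff <- max(abs(abs(s$v) - abs(e$vectors)))

# prcomp() with scale. = TRUE reports the same quantities as standard deviations.
pc <- prcomp(pca, scale. = TRUE)
sdev_diff <- max(abs(pc$sdev - s$d / sqrt(n - 1)))

print(c(val_diff, vec_diff, sdev_diff))
```

For instance, 5.713736^2 / 19 is approximately 1.7182516, the first eigenvalue printed earlier.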
SVD can compress the columns (the variables) as well as the rows (the cases).
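One way to read this statement is through a truncated SVD that keeps only the first k singular triplets. The sketch below uses k = 2 on the data listed earlier; the choice of k and the names col_compressed, row_compressed and P_k are assumptions made for illustration.

```r
# Data copied from the pca listing above.
pca <- data.frame(
  x1 = c(40, 10, 120, 250, 120, 10, 40, 270, 280, 170,
         180, 130, 220, 160, 220, 140, 220, 40, 20, 120),
  x2 = c(2.0, 1.5, 3.0, 4.5, 3.5, 1.5, 1.0, 4.0, 3.5, 3.0,
         3.5, 2.0, 1.5, 1.5, 2.5, 2.0, 2.0, 1.0, 1.0, 2.0),
  x3 = c(5, 5, 13, 18, 9, 12, 19, 13, 11, 9,
         14, 30, 17, 35, 14, 20, 14, 10, 12, 20),
  x4 = c(20, 30, 50, 0, 50, 50, 40, 60, 60, 60,
         40, 50, 20, 60, 30, 20, 10, 0, 60, 0)
)

P <- scale(pca)
s <- svd(P)
k <- 2 # number of singular triplets to keep (illustrative choice)

# Column compression: 20 cases described by k derived variables instead of 4.
col_compressed <- P %*% s$v[, 1:k]    # 20 x k

# Row compression: 4 variables described by k derived cases instead of 20.
row_compressed <- t(s$u[, 1:k]) %*% P # k x 4

# Rank-k approximation of P and its Frobenius-norm error, which equals
# the square root of the sum of the discarded squared singular values.
P_k <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
err <- sqrt(sum((P - P_k)^2))
err_expected <- sqrt(sum(s$d[(k + 1):length(s$d)]^2))

print(dim(col_compressed))
print(dim(row_compressed))
print(c(err, err_expected))
```

The column-compressed matrix is the familiar matrix of principal component scores, since P V = U D.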