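Using SVD (Singular Value Decomposition), we can represent the original dataset with a much smaller one. In doing so, we are effectively removing noise and redundant information.
Singular value decomposition
Pros: simplifies data, removes noise, improves algorithm results
Cons: the transformed data may be hard to interpret
Works with: numeric data
One of the earliest applications of SVD was information retrieval. Methods that use SVD this way are called latent semantic indexing (LSI) or latent semantic analysis (LSA).
Another application of SVD is recommender systems. With SVD we can build a topic space from the data and then compute similarities in that space.
SVD is one type of matrix factorization, which is the process of breaking a data matrix into several separate parts.
NumPy has a linear algebra toolbox called linalg.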
In [94]: from numpy import *
In [95]: U, Sigma,VT = linalg.svd([[1,1],[7,7]])
In [96]: U
Out[96]:
array([[-0.14142136, -0.98994949],
[-0.98994949, 0.14142136]])
In [97]: Sigma
Out[97]: array([ 1.00000000e+01, 2.82797782e-16])
In [98]: VT
Out[98]:
array([[-0.70710678, -0.70710678],
[ 0.70710678, -0.70710678]])
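Sigma is returned as the row vector array([10., 0.]) rather than as the matrix [[10, 0], [0, 0]] (the second value, 2.83e-16, is zero up to floating-point rounding). Returning only the diagonal saves space.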
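Because only the diagonal is returned, it has to be expanded back into a matrix before the three factors can be multiplied together. The following is a minimal sketch using plain ndarrays rather than the `from numpy import *` style of the session above; the variable names are illustrative only.

```python
import numpy as np

A = np.array([[1.0, 1.0], [7.0, 7.0]])
U, sigma, VT = np.linalg.svd(A)

# sigma is a 1-D array of singular values; np.diag expands it into
# the diagonal matrix used in the factorization A = U * Sigma * VT.
Sigma_full = np.diag(sigma)
A_rebuilt = U @ Sigma_full @ VT

rebuild_ok = np.allclose(A, A_rebuilt)
print(sigma.shape, Sigma_full.shape, rebuild_ok)
```

Note that np.diag(sigma) only has the right shape here because A is square; for a rectangular matrix the diagonal block has to be placed inside a zero matrix of the appropriate size, or the factors truncated as in the reconstruction later in this post.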
Create a new file, svdRec.py:
def loadExData():
    return [[1, 1, 1, 0, 0],
            [2, 2, 2, 0, 0],
            [1, 1, 1, 0, 0],
            [5, 5, 5, 0, 0],
            [1, 1, 0, 0, 0],
            [0, 0, 0, 3, 3],
            [0, 0, 0, 1, 1]]
Next we run the SVD on this matrix:
In [17]: import svdRec
...: Data = svdRec.loadExData()
...: U, Sigma,VT = linalg.svd(Data)
...: Sigma
...:
Out[17]:
array([ 9.71302333e+00, 4.47213595e+00, 8.10664981e-01,
1.62982155e-15, 8.33719667e-17])
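The last two values are so small that we can drop them.
We now try to reconstruct the original matrix. First we build a 3x3 matrix Sig3; accordingly, we only need the first three columns of U and the first three rows of VT: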
In [18]: Sig3 = mat([[Sigma[0],0,0],[0,Sigma[1],0],[0,0,Sigma[2]]])
...: U[:,:3]*Sig3*VT[:3,:]
...:
Out[18]:
matrix([[ 1.00000000e+00, 1.00000000e+00, 1.00000000e+00,
-7.70210327e-33, -7.70210327e-33],
[ 2.00000000e+00, 2.00000000e+00, 2.00000000e+00,
-4.60081159e-17, -4.60081159e-17],
[ 1.00000000e+00, 1.00000000e+00, 1.00000000e+00,
-1.23532915e-17, -1.23532915e-17],
...,
[ 1.00000000e+00, 1.00000000e+00, 4.53492652e-16,
-5.59432048e-34, -5.59432048e-34],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
3.00000000e+00, 3.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
1.00000000e+00, 1.00000000e+00]])
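There are many heuristics for deciding how many singular values to keep. A typical one is to keep 90% of the matrix's energy. To compute the total energy, we sum the squares of all the singular values; we then accumulate the squared singular values, largest first, until the running sum reaches 90% of the total. Another heuristic is that when a matrix has tens of thousands of singular values, we keep the first 2,000 to 3,000.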
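The 90% energy heuristic can be sketched as a small helper. The function name and the sample singular values below are hypothetical and chosen only for illustration; the input is assumed to be sorted in descending order, which is how numpy.linalg.svd returns singular values.

```python
import numpy as np

def num_singular_values_for_energy(sigma, fraction=0.9):
    """Smallest k whose first k squared singular values reach `fraction` of the total."""
    energy = np.asarray(sigma, dtype=float) ** 2
    cumulative = np.cumsum(energy)
    target = fraction * cumulative[-1]
    # searchsorted finds the first index whose cumulative energy >= target
    return int(np.searchsorted(cumulative, target) + 1)

# Illustrative values: squared energies are 16, 9, 1 and 0.25
k = num_singular_values_for_energy([4.0, 3.0, 1.0, 0.5])
print(k)
```

Here the total energy is 26.25, so the target is 23.625; the first value alone gives 16 and the first two give 25, so two singular values are enough.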
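Next we look at how to compute similarity (the matrix above contained an error, which is corrected later in this post):
from numpy import *
from numpy import linalg as la

def ecludSim(inA, inB):  # Euclidean distance, mapped into (0, 1]
    return 1.0/(1.0 + la.norm(inA - inB))

def pearsSim(inA, inB):  # Pearson correlation, rescaled from [-1, 1] to [0, 1]
    if len(inA) < 3: return 1.0  # too few points to correlate; treat the vectors as perfectly correlated
    return 0.5+0.5*corrcoef(inA, inB, rowvar = 0)[0][1]

def cosSim(inA, inB):  # cosine similarity, rescaled from [-1, 1] to [0, 1]
    num = float(inA.T*inB)
    denom = la.norm(inA)*la.norm(inB)
    return 0.5+0.5*(num/denom)
Let's try these functions out: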
In [9]: import svdRec
...: myMat = mat(svdRec.loadExData())
...: svdRec.ecludSim(myMat[:,0],myMat[:,4])
...:
Out[9]: 0.13367660240019172
In [10]: svdRec.ecludSim(myMat[:,0],myMat[:,0])
Out[10]: 1.0
In [11]: svdRec.cosSim(myMat[:,0],myMat[:,4])
Out[11]: 0.54724555912615336
In [12]: svdRec.cosSim(myMat[:,0],myMat[:,0])
Out[12]: 0.99999999999999989
In [13]: svdRec.pearsSim(myMat[:,0],myMat[:,4])
Out[13]: 0.23768619407595815
In [14]: svdRec.pearsSim(myMat[:,0],myMat[:,0])
Out[14]: 1.0
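As a sanity check on how each measure is scaled into the range 0 to 1, the same three formulas can be applied to vectors whose relationship is known in advance. This sketch uses plain 1-D ndarrays instead of the column matrices used in svdRec.py, and the function names are hypothetical stand-ins for ecludSim, pearsSim and cosSim.

```python
import numpy as np

def euclid_sim(a, b):
    # 1 / (1 + distance): identical vectors give exactly 1.0
    return 1.0 / (1.0 + np.linalg.norm(a - b))

def pearson_sim(a, b):
    if len(a) < 3:
        return 1.0
    # correlation in [-1, 1] rescaled to [0, 1]
    return 0.5 + 0.5 * np.corrcoef(a, b)[0, 1]

def cosine_sim(a, b):
    # cosine in [-1, 1] rescaled to [0, 1]
    return 0.5 + 0.5 * float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
print(euclid_sim(a, a))          # identical vectors
print(cosine_sim(a, 2.0 * a))    # same direction, different length
print(cosine_sim(a, -a))         # opposite direction
print(pearson_sim(a, -a))        # perfectly anti-correlated
```

Vectors pointing the same way score close to 1 under the cosine and Pearson measures regardless of length, while opposite vectors score close to 0; the Euclidean measure depends on the actual distance, so it reaches 1 only for identical vectors.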
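Using column vectors here implies that we will compute item-based similarity. Whether to use item-based or user-based similarity depends on the number of users and items.
How do we evaluate a recommendation engine? The usual approach is to hold out some known ratings, predict them, and then measure the difference between the predicted and the actual values.
The metric commonly used to evaluate recommendation engines is the root mean squared error (RMSE): it first computes the mean of the squared errors and then takes the square root.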
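The RMSE computation itself is short. The sketch below assumes the held-out true ratings and the corresponding predictions are already available as two equal-length sequences; the function name and the sample numbers are made up for illustration.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # square the errors, average them, then take the square root
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Illustrative held-out ratings and predictions
error = rmse([3.0, 4.0, 5.0], [2.0, 4.0, 3.0])
print(error)
```

With errors of 1, 0 and 2, the squared errors are 1, 0 and 4, their mean is 5/3, and the RMSE is the square root of that, about 1.29 on the same scale as the ratings.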
Next we try an item-based similarity recommendation engine:
def standEst(dataMat, user, simMeas, item):  # estimate the user's rating for item, given a similarity measure
    n = shape(dataMat)[1]  # number of items
    simTotal = 0.0; ratSimTotal = 0.0
    for j in range(n):
        userRating = dataMat[user,j]
        if userRating == 0: continue  # the user has not rated item j; skip it
        # rows (users) that have rated both item and item j
        overLap = nonzero(logical_and(dataMat[:,item].A>0, dataMat[:,j].A>0))[0]
        if len(overLap) == 0: similarity = 0
        else: similarity = simMeas(dataMat[overLap,item], dataMat[overLap,j])
        print('the %d and %d similarity is: %f' % (item, j, similarity))
        simTotal += similarity
        ratSimTotal += similarity * userRating
    if simTotal == 0: return 0
    else: return ratSimTotal/simTotal

def recommend(dataMat, user, N=3, simMeas=cosSim, estMethod=standEst):
    unratedItems = nonzero(dataMat[user,:].A==0)[1]  # items the user has not rated
    if len(unratedItems) == 0: return 'you rated everything'
    itemScores = []
    for item in unratedItems:
        estimatedScore = estMethod(dataMat, user, simMeas, item)
        itemScores.append((item, estimatedScore))
    # sort the (item, score) tuples by score (jj[1]) in descending order
    # (reverse=True) and keep the top N
    return sorted(itemScores, key=lambda jj: jj[1], reverse=True)[:N]
Let's see how it works in practice. First we modify the matrix given earlier slightly:
def loadExData():
    return [[0, 0, 0, 2, 2],
            [0, 0, 0, 3, 3],
            [0, 0, 0, 1, 1],
            [1, 1, 1, 0, 0],
            [2, 2, 2, 0, 0],
            [5, 5, 5, 0, 0],
            [1, 1, 1, 0, 0]]
In [20]: import svdRec
...: myMat = mat(svdRec.loadExData())
...: myMat[0,1]=myMat[0,0]=myMat[1,0]=myMat[2,0]=4
...: myMat[3,3]=2
...:
In [21]: myMat
Out[21]:
matrix([[4, 4, 0, 2, 2],
[4, 0, 0, 3, 3],
[4, 0, 0, 1, 1],
...,
[2, 2, 2, 0, 0],
[5, 5, 5, 0, 0],
[1, 1, 1, 0, 0]])
Let's try making some recommendations first:
In [22]: svdRec.recommend(myMat,2)
the 1 and 0 similarity is: 1.000000
the 1 and 3 similarity is: 0.928746
the 1 and 4 similarity is: 1.000000
the 2 and 0 similarity is: 1.000000
the 2 and 3 similarity is: 1.000000
the 2 and 4 similarity is: 0.000000
Out[22]: [(2, 2.5), (1, 2.0243290220056256)]
Now with the other similarity measures:
In [24]: svdRec.recommend(myMat,2,simMeas=svdRec.ecludSim)
the 1 and 0 similarity is: 1.000000
the 1 and 3 similarity is: 0.309017
the 1 and 4 similarity is: 0.333333
the 2 and 0 similarity is: 1.000000
the 2 and 3 similarity is: 0.500000
the 2 and 4 similarity is: 0.000000
Out[24]: [(2, 3.0), (1, 2.8266504712098603)]
In [25]: svdRec.recommend(myMat,2,simMeas=svdRec.pearsSim)
the 1 and 0 similarity is: 1.000000
the 1 and 3 similarity is: 1.000000
the 1 and 4 similarity is: 1.000000
the 2 and 0 similarity is: 1.000000
the 2 and 3 similarity is: 1.000000
the 2 and 4 similarity is: 0.000000
Out[25]: [(2, 2.5), (1, 2.0)]
Real datasets are much sparser than the myMat matrix we used to demonstrate recommend(). We load a new matrix:
def loadExData2():
    return [[2, 0, 0, 4, 4, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
            [0, 0, 0, 0, 0, 0, 0, 1, 0, 4, 0],
            [3, 3, 4, 0, 3, 0, 0, 2, 2, 0, 0],
            [5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 5, 0, 0, 5, 0],
            [4, 0, 4, 0, 0, 0, 0, 0, 0, 0, 5],
            [0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 4],
            [0, 0, 0, 0, 0, 0, 5, 0, 0, 5, 0],
            [0, 0, 0, 3, 0, 0, 0, 0, 4, 5, 0],
            [1, 1, 2, 1, 1, 2, 1, 0, 4, 5, 0]]
Next we compute the SVD of this matrix to find out how many dimensions it really needs:
In [39]: import svdRec
...: from numpy import linalg as la
...: U, Sigma, VT = la.svd(mat(svdRec.loadExData2()))
...: Sigma
...:
Out[39]:
array([ 1.34342819e+01, 1.18190832e+01, 8.20176076e+00, ...,
2.08702082e+00, 7.08715931e-01, 1.90990329e-16])
Next we find out how many singular values are needed to reach 90% of the total energy. First we square the values in Sigma:
In [40]: Sig2=Sigma**2
In [41]: Sig2
Out[41]:
array([ 1.80479931e+02, 1.39690727e+02, 6.72688795e+01, ...,
4.35565591e+00, 5.02278271e-01, 3.64773057e-32])
In [42]: sum(Sig2)
Out[42]: 497.0
In [43]: sum(Sig2)*0.9
Out[43]: 447.30000000000001
In [44]: sum(Sig2[:2])
Out[44]: 320.17065834028847
In [45]: sum(Sig2[:3])
Out[45]: 387.43953785565782
In [46]: sum(Sig2[:4])
Out[46]: 434.62441339532074
In [47]: sum(Sig2[:5])
Out[47]: 462.61518152879415
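Five singular values are therefore needed to reach 90% of the energy (the first four give 434.62, just below the 447.3 target), so the 11-dimensional matrix can be reduced to a 5-dimensional one.
Next we build a rating estimator that computes similarity in the transformed space. We use the SVD to map all the dishes (the items) into a lower-dimensional space:
def svdEst(dataMat, user, simMeas, item):
    n = shape(dataMat)[1]  # number of items
    simTotal = 0.0; ratSimTotal = 0.0
    U,Sigma,VT = la.svd(dataMat)
    # this code keeps the first 4 singular values (about 87% of the energy
    # by the sums above; 5 are needed to pass 90%)
    Sig4 = mat(eye(4)*Sigma[:4])
    xformedItems = dataMat.T * U[:,:4] * Sig4.I  # items mapped into the low-dimensional space
    for j in range(n):
        userRating = dataMat[user,j]
        if userRating == 0 or j==item: continue
        similarity = simMeas(xformedItems[item,:].T, xformedItems[j,:].T)
        print('the %d and %d similarity is: %f' % (item, j, similarity))
        simTotal += similarity
        ratSimTotal += similarity * userRating
    if simTotal == 0: return 0
    else: return ratSimTotal/simTotal
Then we look at the results: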
In [61]: myMat = mat(svdRec.loadExData2())
In [62]: svdRec.recommend(myMat, 1, estMethod=svdRec.svdEst)
the 0 and 10 similarity is: 0.584526
the 1 and 10 similarity is: 0.342595
the 2 and 10 similarity is: 0.553617
the 3 and 10 similarity is: 0.509334
the 4 and 10 similarity is: 0.478823
the 5 and 10 similarity is: 0.842470
the 6 and 10 similarity is: 0.512666
the 7 and 10 similarity is: 0.320211
the 8 and 10 similarity is: 0.456105
the 9 and 10 similarity is: 0.489873
Out[62]: [(8, 5.0000000000000009), (0, 5.0), (1, 5.0)]
Now we try another similarity measure:
In [63]: svdRec.recommend(myMat, 1, estMethod=svdRec.svdEst, simMeas=svdRec.pearsSim)
the 0 and 10 similarity is: 0.602364
the 1 and 10 similarity is: 0.303884
the 2 and 10 similarity is: 0.513270
the 3 and 10 similarity is: 0.787267
the 4 and 10 similarity is: 0.667888
the 5 and 10 similarity is: 0.833890
the 6 and 10 similarity is: 0.560256
the 7 and 10 similarity is: 0.371606
the 8 and 10 similarity is: 0.520289
the 9 and 10 similarity is: 0.604393
Out[63]: [(0, 5.0), (1, 5.0), (2, 5.0)]
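In large applications, the SVD is run once a day or even less often, and it is run offline. The cold-start problem (how to make good recommendations when data is lacking) is also very hard to deal with.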
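One way to picture the offline step: svdEst above recomputes the SVD on every call, whereas the decomposition could be done once and the low-dimensional item vectors reused for every estimate. The sketch below is only an illustration under that assumption; the function names are hypothetical, it uses plain ndarrays instead of matrices, and it assumes the first k singular values are nonzero and that no item column is all zeros.

```python
import numpy as np

def precompute_item_factors(data, k):
    """Run the SVD once (e.g. in an offline job); return each item as a row in k-dim space."""
    data = np.asarray(data, dtype=float)
    U, sigma, VT = np.linalg.svd(data, full_matrices=False)
    # ndarray form of dataMat.T * U[:, :k] * Sig_k.I from svdEst
    return data.T @ U[:, :k] @ np.diag(1.0 / sigma[:k])

def cos_sim(a, b):
    # cosine similarity rescaled to [0, 1]
    return 0.5 + 0.5 * float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def estimate_rating(data, item_factors, user, item, sim=cos_sim):
    """Similarity-weighted average of the user's ratings, using cached item vectors."""
    sim_total = 0.0
    weighted_total = 0.0
    for j in range(data.shape[1]):
        rating = data[user, j]
        if rating == 0 or j == item:
            continue  # skip unrated items and the target item itself
        s = sim(item_factors[item], item_factors[j])
        sim_total += s
        weighted_total += s * rating
    return 0.0 if sim_total == 0 else weighted_total / sim_total

# The modified 7x5 example matrix used earlier in this post
ratings = np.array([[4, 4, 0, 2, 2],
                    [4, 0, 0, 3, 3],
                    [4, 0, 0, 1, 1],
                    [1, 1, 1, 2, 0],
                    [2, 2, 2, 0, 0],
                    [5, 5, 5, 0, 0],
                    [1, 1, 1, 0, 0]], dtype=float)

item_factors = precompute_item_factors(ratings, k=2)   # offline, done once
estimate = estimate_rating(ratings, item_factors, user=2, item=1)  # online, cheap
print(item_factors.shape, estimate)
```

The cached array has one row per item, so it only needs to be refreshed when the rating matrix changes enough to justify rerunning the decomposition.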
Next we use SVD to compress an image. Add the following code to svdRec.py:
def printMat(inMat, thresh=0.8):  # thresh: values above it print as 1, the rest as 0
    for i in range(32):
        row = []
        for k in range(32):
            if float(inMat[i,k]) > thresh:
                row.append('1')
            else: row.append('0')
        print(' '.join(row))

def imgCompress(numSV=3, thresh=0.8):
    myl = []
    for line in open('0_5.txt').readlines():
        newRow = []
        for i in range(32):
            newRow.append(int(line[i]))
        myl.append(newRow)
    myMat = mat(myl)
    print("****original matrix******")
    printMat(myMat, thresh)
    U,Sigma,VT = la.svd(myMat)
    # build an all-zero numSV x numSV matrix and fill its diagonal for the reconstruction
    SigRecon = mat(zeros((numSV, numSV)))
    for k in range(numSV):
        SigRecon[k,k] = Sigma[k]
    reconMat = U[:,:numSV]*SigRecon*VT[:numSV,:]
    print("****reconstructed matrix using %d singular values******" % numSV)
    printMat(reconMat, thresh)
Let's see how it works in practice:
In [81]: svdRec.imgCompress(2)
****original matrix******
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
****reconstructed matrix using 2 singular values******
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
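As you can see, two singular values are enough to reconstruct the image quite accurately. The total number of values stored is 64+64+2=130, a high compression ratio compared with the original 1024.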
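The count of 130 comes from the shapes of the truncated factors: U[:, :2] is 32x2, VT[:2, :] is 2x32, and two singular values are kept. A small sketch of that arithmetic follows; the helper name is hypothetical.

```python
def svd_storage(rows, cols, k):
    """Numbers stored by a rank-k SVD: U[:, :k], k singular values, and VT[:k, :]."""
    return rows * k + k + k * cols

original = 32 * 32                   # every pixel of the 32x32 image
compressed = svd_storage(32, 32, 2)  # 64 + 2 + 64
ratio = original / float(compressed)
print(original, compressed, round(ratio, 2))
```

For this image the ratio works out to a little under 8, and the saving grows with image size as long as the number of singular values kept stays small.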
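On large datasets, computing the SVD and making recommendations can be a hard engineering problem. Running the SVD and the similarity computations offline is one way to cut redundant work and the time needed at recommendation time. The next chapter introduces some tools for machine learning on large datasets.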