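Mahout version: 0.7, Hadoop version: 1.0.4, JDK: 1.7.0_25 64-bit.
Continuing from the previous post: once the data has been prepared, we can analyze its data flow.
The first thing to analyze is new QRDecomposition(Ai). Just constructing the QRDecomposition already does a lot of work, so the analysis below goes through it step by step.
Here is the source code first, followed by the analysis: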
    public QRDecomposition(Matrix a) {
      // Initialize.
      qr = a.clone();
      originalRows = a.numRows();
      originalColumns = a.numCols();
      rDiag = new DenseVector(originalColumns);

      // precompute and cache some views to avoid regenerating them time and again
      Vector[] QRcolumnsPart = new Vector[originalColumns];
      for (int k = 0; k < originalColumns; k++) {
        QRcolumnsPart[k] = qr.viewColumn(k).viewPart(k, originalRows - k);
      }

      // Main loop.
      for (int k = 0; k < originalColumns; k++) {
        //DoubleMatrix1D QRcolk = QR.viewColumn(k).viewPart(k,m-k);
        // Compute 2-norm of k-th column without under/overflow.
        double nrm = 0;
        //if (k<m) nrm = QRcolumnsPart[k].aggregate(hypot,F.identity);

        for (int i = k; i < originalRows; i++) { // fixes bug reported by [email protected]
          nrm = Algebra.hypot(nrm, qr.getQuick(i, k));
        }

        if (nrm != 0.0) {
          // Form k-th Householder vector.
          if (qr.getQuick(k, k) < 0) {
            nrm = -nrm;
          }
          QRcolumnsPart[k].assign(Functions.div(nrm));
          /*
          for (int i = k; i < m; i++) {
             QR[i][k] /= nrm;
          }
          */

          qr.setQuick(k, k, qr.getQuick(k, k) + 1);

          // Apply transformation to remaining columns.
          for (int j = k + 1; j < originalColumns; j++) {
            Vector QRcolj = qr.viewColumn(j).viewPart(k, originalRows - k);
            double s = QRcolumnsPart[k].dot(QRcolj);
            /*
            // fixes bug reported by John Chambers
            DoubleMatrix1D QRcolj = QR.viewColumn(j).viewPart(k,m-k);
            double s = QRcolumnsPart[k].zDotProduct(QRcolumns[j]);
            double s = 0.0;
            for (int i = k; i < m; i++) {
              s += QR[i][k]*QR[i][j];
            }
            */
            s = -s / qr.getQuick(k, k);
            //QRcolumnsPart[j].assign(QRcolumns[k], F.plusMult(s));

            for (int i = k; i < originalRows; i++) {
              qr.setQuick(i, j, qr.getQuick(i, j) + s * qr.getQuick(i, k));
            }
          }
        }
        rDiag.setQuick(k, -nrm);
      }
    }

Initializing the qr matrix: it is simply a clone of Ai:
[[31.678402777777777, 4.08661209859189, 4.573918596524476], [4.08661209859189, 1.0203966547288652, 0.3987296589988406], [4.573918596524476, 0.3987296589988406, 1.059026647737198]]
QRcolumnsPart: for each column k, the part of qr from the diagonal entry down (rows k through the last row), i.e. the lower-triangular half of qr taken column by column:
[{0:31.678402777777777,1:4.08661209859189,2:4.573918596524476}, {0:1.0203966547288652,1:0.3987296589988406}, {0:1.059026647737198}]
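To make the idea of a cached column view concrete, here is a minimal plain-Java sketch. It is not Mahout code: ColumnPartView is a hypothetical stand-in for what qr.viewColumn(k).viewPart(k, originalRows - k) returns, assuming the view keeps a reference to the matrix rather than a copy.

```java
import java.util.Arrays;

/** Hypothetical stand-in for qr.viewColumn(k).viewPart(k, rows - k); not Mahout code. */
class ColumnPartView {
    private final double[][] matrix; // shared with the caller, never copied
    private final int column;
    private final int offset;        // first row covered by the view

    ColumnPartView(double[][] matrix, int column, int offset) {
        this.matrix = matrix;
        this.column = column;
        this.offset = offset;
    }

    int size() {
        return matrix.length - offset;
    }

    double get(int i) {
        return matrix[offset + i][column];
    }

    // Writes go straight through to the backing matrix.
    void set(int i, double value) {
        matrix[offset + i][column] = value;
    }

    double[] toArray() {
        double[] out = new double[size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] qr = {
            {31.678402777777777, 4.08661209859189, 4.573918596524476},
            {4.08661209859189, 1.0203966547288652, 0.3987296589988406},
            {4.573918596524476, 0.3987296589988406, 1.059026647737198}
        };
        // One view per column, each starting at the diagonal entry.
        for (int k = 0; k < qr.length; k++) {
            System.out.println(Arrays.toString(new ColumnPartView(qr, k, k).toArray()));
        }
        // Changing an element through the view changes the matrix itself.
        new ColumnPartView(qr, 0, 0).set(1, 0.0);
        System.out.println(qr[1][0]);
    }
}
```

Because the view holds no data of its own, anything written through it shows up in qr right away.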
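nrm (first pass of the main loop, k=0):
32.26673724322168
The assign function: QRcolumnsPart[k].assign(Functions.div(nrm)) updates not only QRcolumnsPart but also qr, because QRcolumnsPart[k] and the corresponding part of qr refer to the same underlying data:
[Figure: QRcolumnsPart[k] and qr referencing the same storage]
For QRcolumnsPart, the update is as follows (div_num is the variable called nrm in the program):
Let div_num = sqrt(QRcolumnsPart[k][0]^2 + QRcolumnsPart[k][1]^2 + ... + QRcolumnsPart[k][size-1]^2)
QRcolumnsPart[k] = QRcolumnsPart[k] / div_num
where QRcolumnsPart[k][i] is the i-th element of QRcolumnsPart[k] and size is the number of elements in QRcolumnsPart[k]. (In the code, nrm is negated first when qr(k,k) is negative; that never happens in this example.)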
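The norm and the division can be sketched on a plain double[][] as follows. This is only an illustration under two assumptions: Math.hypot is used in place of Mahout's Algebra.hypot, and the helper names columnNorm and divideColumn are made up for this example.

```java
class ColumnNormSketch {
    /** 2-norm of column k over rows k..end, accumulated with hypot to avoid under/overflow. */
    static double columnNorm(double[][] qr, int k) {
        double nrm = 0;
        for (int i = k; i < qr.length; i++) {
            // Assumption: Math.hypot stands in for Algebra.hypot.
            nrm = Math.hypot(nrm, qr[i][k]);
        }
        return nrm;
    }

    /** Divides column k, rows k..end, by nrm in place (the effect of assign(Functions.div(nrm))). */
    static void divideColumn(double[][] qr, int k, double nrm) {
        for (int i = k; i < qr.length; i++) {
            qr[i][k] /= nrm;
        }
    }

    public static void main(String[] args) {
        double[][] qr = {
            {31.678402777777777, 4.08661209859189, 4.573918596524476},
            {4.08661209859189, 1.0203966547288652, 0.3987296589988406},
            {4.573918596524476, 0.3987296589988406, 1.059026647737198}
        };
        double nrm = columnNorm(qr, 0);
        divideColumn(qr, 0, nrm);
        System.out.println("nrm = " + nrm);
        System.out.println(qr[0][0] + ", " + qr[1][0] + ", " + qr[2][0]);
    }
}
```

Up to rounding in the last digits, the printed values should match nrm and the first column shown in this trace.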
QRcolumnsPart becomes:
[{0:0.9817665337214256,1:0.12665092438034994,2:0.14175336545641365}, {0:1.0203966547288652,1:0.3987296589988406}, {0:1.059026647737198}]
qr is modified as follows. The storage behind qr is the same memory that backs the values of the vectors in QRcolumnsPart, so modifying QRcolumnsPart modifies qr as well:
[[0.9817665337214256, 4.08661209859189, 4.573918596524476], [0.12665092438034994, 1.0203966547288652, 0.3987296589988406], [0.14175336545641365, 0.3987296589988406, 1.059026647737198]]
Then the diagonal entry of qr for this pass (row = col = k) is incremented by 1:
[[1.9817665337214256, 4.08661209859189, 4.573918596524476], [0.12665092438034994, 1.0203966547288652, 0.3987296589988406], [0.14175336545641365, 0.3987296589988406, 1.059026647737198]]
QRcolumnsPart becomes:
[{0:1.9817665337214256,1:0.12665092438034994,2:0.14175336545641365}, -- value changed by the main loop: QRcolumnsPart[k]
{0:1.0203966547288652,1:0.3987296589988406},
{0:1.059026647737198}]
Inner for loop: j starts at k+1 and runs up to the number of columns of Ai.
--------------------------------sub-first time j=k+1=1,k=0
QRcolj:
{0:4.08661209859189,1:1.0203966547288652,2:0.3987296589988406}
s: the dot product of QRcolumnsPart[k] and QRcolj, i.e. the sum of the products of corresponding elements:
8.28446654391689
s is reassigned: s = -s / qr.getQuick(k, k);
-4.180344355881341
qr: in column j, from row k down, each entry becomes its original value + s * the corresponding entry of column k:
[[1.9817665337214256, -4.197854445325, 4.573918596524476], [0.12665092438034994, 0.4909521778283148, 0.3987296589988406], [0.14175336545641365, -0.19384822221406328, 1.059026647737198]]
QRcolumnsPart becomes (QRcolumnsPart covers the lower-left part of qr and always stays consistent with qr):
[{0:1.9817665337214256,1:0.12665092438034994,2:0.14175336545641365},
{0:0.4909521778283148,1:-0.19384822221406328}, -- value changed by the inner loop: QRcolumnsPart[k+1]
{0:1.059026647737198}]
--------------------------------sub-second time j=j+1=2,k=0
QRcolj:
{0:4.573918596524476,1:0.3987296589988406,2:1.059026647737198}
s: the dot product of QRcolumnsPart[k] and QRcolj, i.e. the sum of the products of corresponding elements:
9.265058873873116
s is reassigned: s = -s / qr.getQuick(k, k);
-4.675151545966863
qr: in column j, from row k down, each entry becomes its original value + s * the corresponding entry of column k:
[[1.9817665337214256, -4.197854445325, -4.69114027734864], [0.12665092438034994, 0.4909521778283148, -0.1933826059160847], [0.14175336545641365, -0.19384822221406328, 0.39630818207763985]]
QRcolumnsPart becomes (QRcolumnsPart covers the lower-left part of qr and always stays consistent with qr):
[{0:1.9817665337214256,1:0.12665092438034994,2:0.14175336545641365},
{0:0.4909521778283148,1:-0.19384822221406328},
{0:0.39630818207763985}] -- value changed by the inner loop: QRcolumnsPart[k+2]
---------------------------------------- The inner loop ends. Note that for k=0 the outer loop itself only sets QRcolumnsPart[0]; the other entries change as a side effect of the inner loop updating qr.
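The inner-loop step for a single column j can be written out on a plain double[][] like this. It is a sketch rather than the Mahout implementation: applyToColumn is a hypothetical helper, and the starting matrix in main is the state of qr from this trace right after the diagonal entry was incremented for k=0.

```java
class ReflectorStepSketch {
    /**
     * Applies the k-th Householder vector (column k, rows k..end) to column j in place.
     * Returns the rescaled s so it can be compared with the trace.
     */
    static double applyToColumn(double[][] qr, int k, int j) {
        double s = 0.0;
        for (int i = k; i < qr.length; i++) {
            s += qr[i][k] * qr[i][j]; // dot product of column k and column j
        }
        s = -s / qr[k][k];
        for (int i = k; i < qr.length; i++) {
            qr[i][j] += s * qr[i][k]; // original value + s * entry of column k
        }
        return s;
    }

    public static void main(String[] args) {
        // qr after column 0 was divided by nrm and qr[0][0] was incremented by 1.
        double[][] qr = {
            {1.9817665337214256, 4.08661209859189, 4.573918596524476},
            {0.12665092438034994, 1.0203966547288652, 0.3987296589988406},
            {0.14175336545641365, 0.3987296589988406, 1.059026647737198}
        };
        for (int j = 1; j < qr[0].length; j++) {
            double s = applyToColumn(qr, 0, j);
            System.out.println("j=" + j + " s=" + s);
        }
        for (double[] row : qr) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

The summation order may differ from Mahout's dot(), so the last one or two digits can deviate from the numbers in the trace.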
---------------------------------------------------second time k=1
nrm:
0.527836313803738
QRcolumnsPart becomes (after the assign call, both QRcolumnsPart[k] and the part of qr that corresponds to QRcolumnsPart[k] change):
[{0:1.9817665337214256,1:0.12665092438034994,2:0.14175336545641365}, {0:0.9301220188705365,1:-0.36725063650346085}, {0:0.39630818207763985}]
qr:
[[1.9817665337214256, -4.197854445325, -4.69114027734864], [0.12665092438034994, 0.9301220188705365, -0.1933826059160847], [0.14175336545641365, -0.36725063650346085, 0.39630818207763985]]
Update qr: the diagonal entry for this pass (row = col = k) is incremented by 1:
[[1.9817665337214256, -4.197854445325, -4.69114027734864], [0.12665092438034994, 1.9301220188705366, -0.1933826059160847], [0.14175336545641365, -0.36725063650346085, 0.39630818207763985]]
QRcolumnsPart:
[{0:1.9817665337214256,1:0.12665092438034994,2:0.14175336545641365}, {0:1.9301220188705366,1:-0.36725063650346085}, {0:0.39630818207763985}]
Inner for loop (only j=2 this time). QRcolj:
{0:-0.1933826059160847,1:0.39630818207763985}
s: the dot product of QRcolumnsPart[k] and QRcolj, i.e. the sum of the products of corresponding elements:
-0.5187964578647415
s is reassigned: s = -s / qr.getQuick(k, k);
0.2687894613876947
qr: in column j, from row k down, each entry becomes its original value + s * the corresponding entry of column k:
[[1.9817665337214256, -4.197854445325, -4.69114027734864], [0.12665092438034994, 1.9301220188705366, 0.3254138519486568], [0.14175336545641365, -0.36725063650346085, 0.29759508129758655]]
QRcolumnsPart becomes (QRcolumnsPart covers the lower-left part of qr and always stays consistent with qr):
[{0:1.9817665337214256,1:0.12665092438034994,2:0.14175336545641365}, {0:1.9301220188705366,1:-0.36725063650346085}, {0:0.29759508129758655}]
-------------------------------------------- The inner loop ends.
rDiag after the first two passes (each entry is -nrm of that pass):
{0:-32.26673724322168,1:-0.527836313803738}
++++++++++++++++++++++++++++++++---------------------------------------------------third time k=2
QRcolumnsPart[2] has a single element, so nrm is just that element, 0.29759508129758655, and dividing by it gives 1 up to rounding. After the assign call, both QRcolumnsPart[k] and the part of qr that corresponds to QRcolumnsPart[k] change.
qr:
[[1.9817665337214256, -4.197854445325, -4.69114027734864], [0.12665092438034994, 1.9301220188705366, 0.3254138519486568], [0.14175336545641365, -0.36725063650346085, 0.9999999999999999]]
QRcolumnsPart:
[{0:1.9817665337214256,1:0.12665092438034994,2:0.14175336545641365}, {0:1.9301220188705366,1:-0.36725063650346085}, {0:0.9999999999999999}]
Update qr: the diagonal entry for this pass (row = col = k) is incremented by 1:
[[1.9817665337214256, -4.197854445325, -4.69114027734864], [0.12665092438034994, 1.9301220188705366, 0.3254138519486568], [0.14175336545641365, -0.36725063650346085, 2.0]]
QRcolumnsPart:
[{0:1.9817665337214256,1:0.12665092438034994,2:0.14175336545641365}, {0:1.9301220188705366,1:-0.36725063650346085}, {0:2.0}]
Inner for loop: j starts at k+1 and runs up to the number of columns of Ai. With k=2 that means j=3, which is already past the last column, so the loop body does not run in this pass.
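To check the whole trace in one go, the constructor's loop can be replayed on a plain double[][]. The sketch below mirrors the control flow of the source quoted above but is not the Mahout class: the name QrTraceSketch is made up, Math.hypot is assumed as a substitute for Algebra.hypot, and in-place array writes replace the Vector views.

```java
import java.util.Arrays;

class QrTraceSketch {
    /** Factors qr in place (Householder vectors on and below the diagonal) and returns rDiag. */
    static double[] decompose(double[][] qr) {
        int rows = qr.length;
        int cols = qr[0].length;
        double[] rDiag = new double[cols];
        for (int k = 0; k < cols; k++) {
            // 2-norm of column k from the diagonal down.
            double nrm = 0;
            for (int i = k; i < rows; i++) {
                nrm = Math.hypot(nrm, qr[i][k]); // assumption: replaces Algebra.hypot
            }
            if (nrm != 0.0) {
                // Form the k-th Householder vector.
                if (qr[k][k] < 0) {
                    nrm = -nrm;
                }
                for (int i = k; i < rows; i++) {
                    qr[i][k] /= nrm;
                }
                qr[k][k] += 1;
                // Apply the transformation to the remaining columns.
                for (int j = k + 1; j < cols; j++) {
                    double s = 0.0;
                    for (int i = k; i < rows; i++) {
                        s += qr[i][k] * qr[i][j];
                    }
                    s = -s / qr[k][k];
                    for (int i = k; i < rows; i++) {
                        qr[i][j] += s * qr[i][k];
                    }
                }
            }
            rDiag[k] = -nrm;
        }
        return rDiag;
    }

    public static void main(String[] args) {
        double[][] qr = {
            {31.678402777777777, 4.08661209859189, 4.573918596524476},
            {4.08661209859189, 1.0203966547288652, 0.3987296589988406},
            {4.573918596524476, 0.3987296589988406, 1.059026647737198}
        };
        double[] rDiag = decompose(qr);
        System.out.println("rDiag = " + Arrays.toString(rDiag));
        for (double[] row : qr) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running it on Ai should reproduce the final qr and the rDiag entries from the trace, apart from possible differences in the last digits.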
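That wraps up the analysis of new QRDecomposition(Ai). The next post will analyze new QRDecomposition(Ai).solve(Vi).viewColumn(0). OK, I am about ready to throw up too...
Share, grow, enjoy.
If you repost this, please credit the blog: http://blog.csdn.net/fansy1990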