Mahout Source Code Analysis: DistributedLanczosSolver (1) -- Hands-on

Mahout version: 0.7, Hadoop version: 1.0.4, JDK: 1.7.0_25 64-bit.

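This post starts a series on SVD, that is, dimensionality reduction. In Mahout you can run $MAHOUT_HOME/bin/mahout svd -h to list the algorithm's parameters, or look them up on the corresponding page of the official site. The SVD invocation used in this walkthrough is: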

package mahout.fansy.svd;

import org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver;

public class RunSvd {

	/**
	 * Runs the SVD algorithm (DistributedLanczosSolver) with a fixed set of arguments
	 * @throws Exception if the job fails
	 */
	public static void main(String[] args) throws Exception {
		String[] arg=new String[]{"-jt","ubuntu:9001","-fs","ubuntu:9000",
				"-i","hdfs://ubuntu:9000/svd/input/wind",
				"-o","hdfs://ubuntu:9000/svd/output",
				"-nr","178","-nc","14",
				"-r","3",
				"-sym","square",
				"--cleansvd","true",
				"--tempDir","hdfs://ubuntu:9000/svd/temp"
		};
		DistributedLanczosSolver.main(arg);
	}

}
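The argument array above mixes short option names with values that are easy to mistype. As a minimal sketch, the same arguments could be assembled by a small helper. The class `SvdArgs` and its `build` method are hypothetical names introduced here, not part of Mahout. The long option names are the ones echoed in the "Command line arguments" log line further down in this post, and the sketch assumes they are accepted in place of the short forms. The Hadoop options `-jt` and `-fs` are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Mahout): assembles the solver's argument array.
final class SvdArgs {

    private SvdArgs() {}

    // Option names are copied from the AbstractJob log line in this post;
    // that they can replace the short forms is an assumption.
    static String[] build(String input, String output, String tempDir,
                          int numRows, int numCols, int rank,
                          String symmetric, boolean cleanSvd) {
        List<String> args = new ArrayList<String>();
        add(args, "--input", input);
        add(args, "--output", output);
        add(args, "--numRows", String.valueOf(numRows));
        add(args, "--numCols", String.valueOf(numCols));
        add(args, "--rank", String.valueOf(rank));
        add(args, "--symmetric", symmetric);
        add(args, "--cleansvd", String.valueOf(cleanSvd));
        add(args, "--tempDir", tempDir);
        return args.toArray(new String[0]);
    }

    // Appends one option followed by its value.
    private static void add(List<String> args, String option, String value) {
        args.add(option);
        args.add(value);
    }

    public static void main(String[] unused) {
        // Same values as in RunSvd above.
        String[] args = build("hdfs://ubuntu:9000/svd/input/wind",
                "hdfs://ubuntu:9000/svd/output",
                "hdfs://ubuntu:9000/svd/temp",
                178, 14, 3, "square", true);
        System.out.println(Arrays.toString(args));
    }
}
```

The resulting array could then be handed to DistributedLanczosSolver.main(...) in place of the hand-written one.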
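Before running the algorithm, the input data has to be converted. Take the following input data as an example:

[Figure 1: the input data]
The code below converts that file into a sequence file whose values are of type VectorWritable: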

package mahout.fansy.utils.svd;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.common.HadoopUtil;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class ReadAndWriteSeq {

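	/*
	 * Private constructor; the class is used only through its static method
	 */
	private ReadAndWriteSeq(){}
	
	/**
	 * Reads text data and writes it to HDFS as a sequence file of <IntWritable,VectorWritable> pairs
	 * @param input the input text file
	 * @param output the output HDFS file (sequence file format)
	 * @param jobtracker address of the jobtracker to use
	 * @return whether the task succeeded
	 * @throws IOException
	 */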
	public static boolean readAndWriteSeq(String input,String output,String jobtracker) throws IOException{
		boolean flag=true;
		
		Configuration conf=new Configuration();
		conf.set("mapred.job.tracker", jobtracker);
		
		FileSystem fsIn = FileSystem.get(URI.create(input), conf);
		FileSystem fsOut = FileSystem.get(URI.create(output), conf);
		HadoopUtil.delete(conf, new Path(output));
	    Path pathIn = new Path(input);
	    Path pathOut = new Path(output);
	    
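	    SequenceFile.Writer writer = null;
	    FSDataInputStream in = fsIn.open(pathIn);
	    // DataInputStream.readLine() is deprecated, so read lines through a BufferedReader instead
	    java.io.BufferedReader reader = new java.io.BufferedReader(new java.io.InputStreamReader(in, "UTF-8"));
	    IntWritable key=new IntWritable(0);
	    VectorWritable value=new VectorWritable();
	    try {
	      writer = SequenceFile.createWriter(fsOut, conf, pathOut,
	              key.getClass(), value.getClass());
	      String line=null;
	      int length=0;
	      int row=1; // row keys start at 1
	      while ((line=reader.readLine())!=null) {
	    	  // one comma-separated text line becomes one vector
	    	  String[] vs=line.split(",");
	    	  length=vs.length;
	    	  Vector vector=new RandomAccessSparseVector(length);
	    	  for(int i=0;i<length;i++){
	    		  vector.set(i, Double.parseDouble(vs[i]));
	    	  }
	    	  value.set(vector);
	    	  key.set(row++);
	    	  writer.append(key, value);
	      }
	    } catch(IOException e){
	    	// any I/O failure is reported through the return value
	    	flag=false;
	    }finally {
	       IOUtils.closeStream(writer);
	       reader.close(); // also closes the underlying HDFS stream
	    }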
		return flag;
	}
}
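The parsing step of the converter can be tried out without a Hadoop cluster. The following is a minimal sketch using only the JDK. The class `CsvRows`, its methods and the sample values are hypothetical and introduced only for illustration. It assumes, as the converter does, that every line holds comma-separated numbers; unlike the converter it also skips blank lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone version of the line parsing done in ReadAndWriteSeq.
final class CsvRows {

    private CsvRows() {}

    // Splits one comma-separated line into its numeric fields.
    static double[] parseLine(String line) {
        String[] parts = line.split(",");
        double[] row = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            row[i] = Double.parseDouble(parts[i].trim());
        }
        return row;
    }

    // Reads all non-blank lines from the source and parses each into a row.
    static List<double[]> readAll(Reader source) throws IOException {
        List<double[]> rows = new ArrayList<double[]>();
        BufferedReader reader = new BufferedReader(source);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines instead of failing on them
                }
                rows.add(parseLine(line));
            }
        } finally {
            reader.close();
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample input, not the data set used in this post.
        List<double[]> rows = readAll(new StringReader("1,14.23,1.71\n\n2,13.2,1.78\n"));
        System.out.println(rows.size() + " rows, " + rows.get(0).length + " columns");
    }
}
```

Each `double[]` corresponds to what the converter stores in one RandomAccessSparseVector before wrapping it in a VectorWritable.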
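After the run there are three outputs in HDFS. One is temp, the temporary directory. Of the other two, one is rawEigenvectors, shown below:

[Figure 2: the rawEigenvectors output in HDFS]

The other is cleanEigenvectors (the final output).

The terminal output is as follows: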

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/workspase/mahout/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/workspase/mahout/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/10/28 00:14:13 WARN fs.FileSystem: "ubuntu:9000" is a deprecated filesystem name. Use "hdfs://ubuntu:9000/" instead.
13/10/28 00:14:13 INFO common.AbstractJob: Command line arguments: {--cleansvd=[true], --endPhase=[2147483647], --inMemory=[false], --input=[hdfs://ubuntu:9000/svd/input/wind], --maxError=[0.05], --minEigenvalue=[0.0], --numCols=[14], --numRows=[178], --output=[hdfs://ubuntu:9000/svd/output], --rank=[3], --startPhase=[0], --symmetric=[square], --tempDir=[hdfs://ubuntu:9000/svd/temp]}
13/10/28 00:15:43 INFO lanczos.LanczosSolver: Finding 3 singular vectors of matrix with 178 rows, via Lanczos
13/10/28 00:15:45 INFO mapred.FileInputFormat: Total input paths to process : 1
13/10/28 00:15:49 INFO mapred.JobClient: Running job: job_201310220012_0023
13/10/28 00:15:50 INFO mapred.JobClient:  map 0% reduce 0%
13/10/28 00:18:40 INFO mapred.JobClient:  map 100% reduce 0%
13/10/28 00:19:11 INFO mapred.JobClient:  map 100% reduce 100%
13/10/28 00:19:16 INFO mapred.JobClient: Job complete: job_201310220012_0023
13/10/28 00:19:16 INFO mapred.JobClient: Counters: 30
13/10/28 00:19:16 INFO mapred.JobClient:   Job Counters 
13/10/28 00:19:16 INFO mapred.JobClient:     Launched reduce tasks=1
13/10/28 00:19:16 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=241868
13/10/28 00:19:16 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/10/28 00:19:16 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/10/28 00:19:16 INFO mapred.JobClient:     Launched map tasks=2
13/10/28 00:19:16 INFO mapred.JobClient:     Data-local map tasks=2
13/10/28 00:19:16 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=28198
13/10/28 00:19:16 INFO mapred.JobClient:   File Input Format Counters 
13/10/28 00:19:16 INFO mapred.JobClient:     Bytes Read=23216
13/10/28 00:19:16 INFO mapred.JobClient:   File Output Format Counters 
13/10/28 00:19:16 INFO mapred.JobClient:     Bytes Written=220
13/10/28 00:19:16 INFO mapred.JobClient:   FileSystemCounters
13/10/28 00:19:16 INFO mapred.JobClient:     FILE_BYTES_READ=238
13/10/28 00:19:16 INFO mapred.JobClient:     HDFS_BYTES_READ=23967
13/10/28 00:19:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=70111
13/10/28 00:19:16 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=220
13/10/28 00:19:16 INFO mapred.JobClient:   Map-Reduce Framework
13/10/28 00:19:16 INFO mapred.JobClient:     Map output materialized bytes=244
13/10/28 00:19:16 INFO mapred.JobClient:     Map input records=178
13/10/28 00:19:16 INFO mapred.JobClient:     Reduce shuffle bytes=244
13/10/28 00:19:16 INFO mapred.JobClient:     Spilled Records=4
13/10/28 00:19:16 INFO mapred.JobClient:     Map output bytes=228
13/10/28 00:19:16 INFO mapred.JobClient:     Total committed heap usage (bytes)=246685696
13/10/28 00:19:16 INFO mapred.JobClient:     CPU time spent (ms)=53940
13/10/28 00:19:16 INFO mapred.JobClient:     Map input bytes=21094
13/10/28 00:19:16 INFO mapred.JobClient:     SPLIT_RAW_BYTES=172
13/10/28 00:19:16 INFO mapred.JobClient:     Combine input records=2
13/10/28 00:19:16 INFO mapred.JobClient:     Reduce input records=2
13/10/28 00:19:16 INFO mapred.JobClient:     Reduce input groups=1
13/10/28 00:19:16 INFO mapred.JobClient:     Combine output records=2
13/10/28 00:19:16 INFO mapred.JobClient:     Physical memory (bytes) snapshot=414433280
13/10/28 00:19:16 INFO mapred.JobClient:     Reduce output records=1
13/10/28 00:19:16 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2934288384
13/10/28 00:19:16 INFO mapred.JobClient:     Map output records=2
13/10/28 00:19:16 INFO lanczos.LanczosSolver: 1 passes through the corpus so far...
13/10/28 00:19:17 INFO mapred.FileInputFormat: Total input paths to process : 1
13/10/28 00:19:18 INFO mapred.JobClient: Running job: job_201310220012_0024
13/10/28 00:19:19 INFO mapred.JobClient:  map 0% reduce 0%
13/10/28 00:20:06 INFO mapred.JobClient:  map 100% reduce 0%
13/10/28 00:20:25 INFO mapred.JobClient:  map 100% reduce 100%
13/10/28 00:20:30 INFO mapred.JobClient: Job complete: job_201310220012_0024
13/10/28 00:20:30 INFO mapred.JobClient: Counters: 30
13/10/28 00:20:30 INFO mapred.JobClient:   Job Counters 
13/10/28 00:20:30 INFO mapred.JobClient:     Launched reduce tasks=1
13/10/28 00:20:30 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=79120
13/10/28 00:20:30 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/10/28 00:20:30 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/10/28 00:20:30 INFO mapred.JobClient:     Launched map tasks=2
13/10/28 00:20:30 INFO mapred.JobClient:     Data-local map tasks=2
13/10/28 00:20:30 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=15191
13/10/28 00:20:30 INFO mapred.JobClient:   File Input Format Counters 
13/10/28 00:20:30 INFO mapred.JobClient:     Bytes Read=23216
13/10/28 00:20:30 INFO mapred.JobClient:   File Output Format Counters 
13/10/28 00:20:30 INFO mapred.JobClient:     Bytes Written=220
13/10/28 00:20:30 INFO mapred.JobClient:   FileSystemCounters
13/10/28 00:20:30 INFO mapred.JobClient:     FILE_BYTES_READ=238
13/10/28 00:20:30 INFO mapred.JobClient:     HDFS_BYTES_READ=23967
13/10/28 00:20:30 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=70105
13/10/28 00:20:30 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=220
13/10/28 00:20:30 INFO mapred.JobClient:   Map-Reduce Framework
13/10/28 00:20:30 INFO mapred.JobClient:     Map output materialized bytes=244
13/10/28 00:20:30 INFO mapred.JobClient:     Map input records=178
13/10/28 00:20:30 INFO mapred.JobClient:     Reduce shuffle bytes=244
13/10/28 00:20:30 INFO mapred.JobClient:     Spilled Records=4
13/10/28 00:20:30 INFO mapred.JobClient:     Map output bytes=228
13/10/28 00:20:30 INFO mapred.JobClient:     Total committed heap usage (bytes)=246685696
13/10/28 00:20:30 INFO mapred.JobClient:     CPU time spent (ms)=18830
13/10/28 00:20:30 INFO mapred.JobClient:     Map input bytes=21094
13/10/28 00:20:30 INFO mapred.JobClient:     SPLIT_RAW_BYTES=172
13/10/28 00:20:30 INFO mapred.JobClient:     Combine input records=2
13/10/28 00:20:30 INFO mapred.JobClient:     Reduce input records=2
13/10/28 00:20:30 INFO mapred.JobClient:     Reduce input groups=1
13/10/28 00:20:30 INFO mapred.JobClient:     Combine output records=2
13/10/28 00:20:30 INFO mapred.JobClient:     Physical memory (bytes) snapshot=406294528
13/10/28 00:20:30 INFO mapred.JobClient:     Reduce output records=1
13/10/28 00:20:30 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2934288384
13/10/28 00:20:30 INFO mapred.JobClient:     Map output records=2
13/10/28 00:20:30 INFO lanczos.LanczosSolver: 2 passes through the corpus so far...
13/10/28 00:20:30 INFO lanczos.LanczosSolver: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
13/10/28 00:20:30 INFO lanczos.LanczosSolver: Eigenvector 0 found with eigenvalue 0.0
13/10/28 00:20:30 INFO lanczos.LanczosSolver: Eigenvector 1 found with eigenvalue 190.66366814935847
13/10/28 00:20:30 INFO lanczos.LanczosSolver: Eigenvector 2 found with eigenvalue 10886.664434372891
13/10/28 00:20:30 INFO lanczos.LanczosSolver: LanczosSolver finished.
13/10/28 00:20:30 INFO decomposer.DistributedLanczosSolver: Persisting 3 eigenVectors and eigenValues to: hdfs://ubuntu:9000/svd/output/rawEigenvectors
13/10/28 00:20:31 INFO mapred.FileInputFormat: Total input paths to process : 1
13/10/28 00:20:31 INFO mapred.JobClient: Running job: job_201310220012_0025
13/10/28 00:20:32 INFO mapred.JobClient:  map 0% reduce 0%
13/10/28 00:21:06 INFO mapred.JobClient:  map 50% reduce 0%
13/10/28 00:21:12 INFO mapred.JobClient:  map 100% reduce 0%
13/10/28 00:21:30 INFO mapred.JobClient:  map 100% reduce 100%
13/10/28 00:21:35 INFO mapred.JobClient: Job complete: job_201310220012_0025
13/10/28 00:21:35 INFO mapred.JobClient: Counters: 30
13/10/28 00:21:35 INFO mapred.JobClient:   Job Counters 
13/10/28 00:21:35 INFO mapred.JobClient:     Launched reduce tasks=1
13/10/28 00:21:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=61438
13/10/28 00:21:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/10/28 00:21:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/10/28 00:21:35 INFO mapred.JobClient:     Launched map tasks=2
13/10/28 00:21:35 INFO mapred.JobClient:     Data-local map tasks=2
13/10/28 00:21:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=17342
13/10/28 00:21:35 INFO mapred.JobClient:   File Input Format Counters 
13/10/28 00:21:35 INFO mapred.JobClient:     Bytes Read=23216
13/10/28 00:21:35 INFO mapred.JobClient:   File Output Format Counters 
13/10/28 00:21:35 INFO mapred.JobClient:     Bytes Written=220
13/10/28 00:21:35 INFO mapred.JobClient:   FileSystemCounters
13/10/28 00:21:35 INFO mapred.JobClient:     FILE_BYTES_READ=238
13/10/28 00:21:35 INFO mapred.JobClient:     HDFS_BYTES_READ=24061
13/10/28 00:21:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=70108
13/10/28 00:21:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=220
13/10/28 00:21:35 INFO mapred.JobClient:   Map-Reduce Framework
13/10/28 00:21:35 INFO mapred.JobClient:     Map output materialized bytes=244
13/10/28 00:21:35 INFO mapred.JobClient:     Map input records=178
13/10/28 00:21:35 INFO mapred.JobClient:     Reduce shuffle bytes=244
13/10/28 00:21:35 INFO mapred.JobClient:     Spilled Records=4
13/10/28 00:21:35 INFO mapred.JobClient:     Map output bytes=228
13/10/28 00:21:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=291512320
13/10/28 00:21:35 INFO mapred.JobClient:     CPU time spent (ms)=13100
13/10/28 00:21:35 INFO mapred.JobClient:     Map input bytes=21094
13/10/28 00:21:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=172
13/10/28 00:21:35 INFO mapred.JobClient:     Combine input records=2
13/10/28 00:21:35 INFO mapred.JobClient:     Reduce input records=2
13/10/28 00:21:35 INFO mapred.JobClient:     Reduce input groups=1
13/10/28 00:21:35 INFO mapred.JobClient:     Combine output records=2
13/10/28 00:21:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=417210368
13/10/28 00:21:35 INFO mapred.JobClient:     Reduce output records=1
13/10/28 00:21:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2934288384
13/10/28 00:21:35 INFO mapred.JobClient:     Map output records=2
13/10/28 00:21:36 INFO mapred.FileInputFormat: Total input paths to process : 1
13/10/28 00:21:36 INFO mapred.JobClient: Running job: job_201310220012_0026
13/10/28 00:21:37 INFO mapred.JobClient:  map 0% reduce 0%
13/10/28 00:22:11 INFO mapred.JobClient:  map 100% reduce 0%
13/10/28 00:22:29 INFO mapred.JobClient:  map 100% reduce 100%
13/10/28 00:22:34 INFO mapred.JobClient: Job complete: job_201310220012_0026
13/10/28 00:22:34 INFO mapred.JobClient: Counters: 30
13/10/28 00:22:34 INFO mapred.JobClient:   Job Counters 
13/10/28 00:22:34 INFO mapred.JobClient:     Launched reduce tasks=1
13/10/28 00:22:34 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=46990
13/10/28 00:22:34 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/10/28 00:22:34 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/10/28 00:22:34 INFO mapred.JobClient:     Launched map tasks=2
13/10/28 00:22:34 INFO mapred.JobClient:     Data-local map tasks=2
13/10/28 00:22:34 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=15142
13/10/28 00:22:34 INFO mapred.JobClient:   File Input Format Counters 
13/10/28 00:22:34 INFO mapred.JobClient:     Bytes Read=23216
13/10/28 00:22:34 INFO mapred.JobClient:   File Output Format Counters 
13/10/28 00:22:34 INFO mapred.JobClient:     Bytes Written=220
13/10/28 00:22:34 INFO mapred.JobClient:   FileSystemCounters
13/10/28 00:22:34 INFO mapred.JobClient:     FILE_BYTES_READ=238
13/10/28 00:22:34 INFO mapred.JobClient:     HDFS_BYTES_READ=24061
13/10/28 00:22:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=70105
13/10/28 00:22:34 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=220
13/10/28 00:22:34 INFO mapred.JobClient:   Map-Reduce Framework
13/10/28 00:22:34 INFO mapred.JobClient:     Map output materialized bytes=244
13/10/28 00:22:34 INFO mapred.JobClient:     Map input records=178
13/10/28 00:22:34 INFO mapred.JobClient:     Reduce shuffle bytes=244
13/10/28 00:22:34 INFO mapred.JobClient:     Spilled Records=4
13/10/28 00:22:34 INFO mapred.JobClient:     Map output bytes=228
13/10/28 00:22:34 INFO mapred.JobClient:     Total committed heap usage (bytes)=336338944
13/10/28 00:22:34 INFO mapred.JobClient:     CPU time spent (ms)=4670
13/10/28 00:22:34 INFO mapred.JobClient:     Map input bytes=21094
13/10/28 00:22:34 INFO mapred.JobClient:     SPLIT_RAW_BYTES=172
13/10/28 00:22:34 INFO mapred.JobClient:     Combine input records=2
13/10/28 00:22:34 INFO mapred.JobClient:     Reduce input records=2
13/10/28 00:22:34 INFO mapred.JobClient:     Reduce input groups=1
13/10/28 00:22:34 INFO mapred.JobClient:     Combine output records=2
13/10/28 00:22:34 INFO mapred.JobClient:     Physical memory (bytes) snapshot=426409984
13/10/28 00:22:34 INFO mapred.JobClient:     Reduce output records=1
13/10/28 00:22:34 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2934288384
13/10/28 00:22:34 INFO mapred.JobClient:     Map output records=2
13/10/28 00:22:34 INFO mapred.FileInputFormat: Total input paths to process : 1
13/10/28 00:22:35 INFO mapred.JobClient: Running job: job_201310220012_0027
13/10/28 00:22:36 INFO mapred.JobClient:  map 0% reduce 0%
13/10/28 00:22:54 INFO mapred.JobClient:  map 100% reduce 0%
13/10/28 00:23:03 INFO mapred.JobClient:  map 100% reduce 33%
13/10/28 00:23:09 INFO mapred.JobClient:  map 100% reduce 100%
13/10/28 00:23:14 INFO mapred.JobClient: Job complete: job_201310220012_0027
13/10/28 00:23:14 INFO mapred.JobClient: Counters: 30
13/10/28 00:23:14 INFO mapred.JobClient:   Job Counters 
13/10/28 00:23:14 INFO mapred.JobClient:     Launched reduce tasks=1
13/10/28 00:23:14 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=25276
13/10/28 00:23:14 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/10/28 00:23:14 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/10/28 00:23:14 INFO mapred.JobClient:     Launched map tasks=2
13/10/28 00:23:14 INFO mapred.JobClient:     Data-local map tasks=2
13/10/28 00:23:14 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14956
13/10/28 00:23:14 INFO mapred.JobClient:   File Input Format Counters 
13/10/28 00:23:14 INFO mapred.JobClient:     Bytes Read=23216
13/10/28 00:23:14 INFO mapred.JobClient:   File Output Format Counters 
13/10/28 00:23:14 INFO mapred.JobClient:     Bytes Written=220
13/10/28 00:23:14 INFO mapred.JobClient:   FileSystemCounters
13/10/28 00:23:14 INFO mapred.JobClient:     FILE_BYTES_READ=238
13/10/28 00:23:14 INFO mapred.JobClient:     HDFS_BYTES_READ=24031
13/10/28 00:23:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=70105
13/10/28 00:23:14 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=220
13/10/28 00:23:14 INFO mapred.JobClient:   Map-Reduce Framework
13/10/28 00:23:14 INFO mapred.JobClient:     Map output materialized bytes=244
13/10/28 00:23:14 INFO mapred.JobClient:     Map input records=178
13/10/28 00:23:14 INFO mapred.JobClient:     Reduce shuffle bytes=244
13/10/28 00:23:14 INFO mapred.JobClient:     Spilled Records=4
13/10/28 00:23:14 INFO mapred.JobClient:     Map output bytes=228
13/10/28 00:23:14 INFO mapred.JobClient:     Total committed heap usage (bytes)=336338944
13/10/28 00:23:14 INFO mapred.JobClient:     CPU time spent (ms)=2680
13/10/28 00:23:14 INFO mapred.JobClient:     Map input bytes=21094
13/10/28 00:23:14 INFO mapred.JobClient:     SPLIT_RAW_BYTES=172
13/10/28 00:23:14 INFO mapred.JobClient:     Combine input records=2
13/10/28 00:23:14 INFO mapred.JobClient:     Reduce input records=2
13/10/28 00:23:14 INFO mapred.JobClient:     Reduce input groups=1
13/10/28 00:23:14 INFO mapred.JobClient:     Combine output records=2
13/10/28 00:23:14 INFO mapred.JobClient:     Physical memory (bytes) snapshot=423587840
13/10/28 00:23:14 INFO mapred.JobClient:     Reduce output records=1
13/10/28 00:23:14 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2934288384
13/10/28 00:23:14 INFO mapred.JobClient:     Map output records=2
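
The log line about diagonalizing the tri-diagonal auxiliary matrix refers to the last step of the Lanczos method. As general textbook background, the iteration can be summarized as below. This is a sketch in exact arithmetic, not a reading of Mahout's source, and the symbols $M$, $Q_k$, $T_k$, $\alpha_j$, $\beta_j$ are notation introduced here rather than names from the code.

```latex
% Three-term recurrence for a symmetric matrix M, with \|q_1\| = 1 and \beta_0 q_0 = 0
M q_j = \beta_{j-1} q_{j-1} + \alpha_j q_j + \beta_j q_{j+1},
\qquad \alpha_j = q_j^{\top} M q_j, \qquad j = 1, \dots, k

% After k steps, Q_k = [q_1, \dots, q_k] has orthonormal columns and
T_k = Q_k^{\top} M Q_k =
\begin{pmatrix}
\alpha_1 & \beta_1  &             &             \\
\beta_1  & \alpha_2 & \ddots      &             \\
         & \ddots   & \ddots      & \beta_{k-1} \\
         &          & \beta_{k-1} & \alpha_k
\end{pmatrix}

% Eigenpairs of the small matrix T_k give approximate eigenpairs of M
T_k y = \theta y \;\Longrightarrow\; M (Q_k y) \approx \theta \, (Q_k y)
```

For a rectangular input $A$, a standard choice is $M = A^{\top} A$, in which case each $\theta$ approximates a squared singular value of $A$ and $Q_k y$ approximates a right singular vector. Each recurrence step needs one product with $M$, which would be consistent with the log reporting one MapReduce job per "pass through the corpus".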

The detailed analysis will follow in the next post.


Share, grow, enjoy.

When reposting, please credit the blog: http://blog.csdn.net/fansy1990

