Hadoop 2.5.1 Study Notes 3: On the Combiner

If we add a Combiner class to the previous example:

public static class Combiner extends Reducer<Text, Text, Text, Text> {
  @Override
  public void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // Sum the per-key counts emitted by the mapper.
    long count = 0;
    for (Text val : values) {
      count += Long.parseLong(val.toString());
    }
    context.write(key, new Text(String.valueOf(count)));
  }
}

 

Then register it on the job with job.setCombinerClass(Combiner.class); — note that Hadoop may run the combiner zero, one, or several times per map output, so the combine function must be safe to apply repeatedly (here, summation is associative and commutative, so it is).

Here are the job counters for both runs. Without the combiner:

14/11/07 14:49:25 INFO mapreduce.Job: Counters: 38
 File System Counters
  FILE: Number of bytes read=52642504
  FILE: Number of bytes written=95200714
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=608036374
  HDFS: Number of bytes written=423
  HDFS: Number of read operations=22
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=5
 Map-Reduce Framework
  Map input records=2923923
  Map output records=2923923
  Map output bytes=20467464
  Map output materialized bytes=26315322
  Input split bytes=212
  Combine input records=0
  Combine output records=0
  Reduce input groups=38
  Reduce shuffle bytes=26315322
  Reduce input records=2923923
  Reduce output records=38
  Spilled Records=5847846
  Shuffled Maps =2
  Failed Shuffles=0
  Merged Map outputs=2
  GC time elapsed (ms)=252
  CPU time spent (ms)=0
  Physical memory (bytes) snapshot=0
  Virtual memory (bytes) snapshot=0
  Total committed heap usage (bytes)=1150484480
 Shuffle Errors
  BAD_ID=0
  CONNECTION=0
  IO_ERROR=0
  WRONG_LENGTH=0
  WRONG_MAP=0
  WRONG_REDUCE=0
 File Input Format Counters
  Bytes Read=236907275
 File Output Format Counters
  Bytes Written=423


With the combiner:

14/11/07 16:04:49 INFO mapreduce.Job: Counters: 38
 File System Counters
  FILE: Number of bytes read=16224
  FILE: Number of bytes written=704061
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=608036374
  HDFS: Number of bytes written=423
  HDFS: Number of read operations=22
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=5
 Map-Reduce Framework
  Map input records=2923923
  Map output records=2923923
  Map output bytes=20467464
  Map output materialized bytes=523
  Input split bytes=212
  Combine input records=2923923
  Combine output records=39
  Reduce input groups=38
  Reduce shuffle bytes=523
  Reduce input records=39
  Reduce output records=38
  Spilled Records=78
  Shuffled Maps =2
  Failed Shuffles=0
  Merged Map outputs=2
  GC time elapsed (ms)=281
  CPU time spent (ms)=0
  Physical memory (bytes) snapshot=0
  Virtual memory (bytes) snapshot=0
  Total committed heap usage (bytes)=1154875392
 Shuffle Errors
  BAD_ID=0
  CONNECTION=0
  IO_ERROR=0
  WRONG_LENGTH=0
  WRONG_MAP=0
  WRONG_REDUCE=0
 File Input Format Counters
  Bytes Read=236907275
 File Output Format Counters
  Bytes Written=423


The first run took 28 seconds; the second took 21 seconds.

The counters show where the time went: with the combiner, the 2,923,923 map output records are collapsed into 39 combine output records before the shuffle, so Reduce shuffle bytes drops from 26,315,322 to 523 and Spilled Records drops from 5,847,846 to 78.
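The effect behind those counters can be illustrated with a small, self-contained sketch (plain Java, no Hadoop dependency; the two toy input splits and their words are made-up values standing in for the two map tasks in the log): local combining emits one partial sum per distinct key per map task, instead of one record per input word.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {

  // For each input split, count the (word, 1) records before and after
  // local combining; returns {recordsWithoutCombiner, recordsWithCombiner}.
  static int[] shuffleCounts(List<List<String>> splits) {
    int without = 0;
    int with = 0;
    for (List<String> split : splits) {
      // Map phase: one (word, 1) record per input word.
      without += split.size();
      // Combine phase: sum values per key locally, before the shuffle.
      Map<String, Long> partial = new TreeMap<>();
      for (String w : split) {
        partial.merge(w, 1L, Long::sum);
      }
      // Only one record per distinct key per map task crosses the network.
      with += partial.size();
    }
    return new int[] { without, with };
  }

  public static void main(String[] args) {
    // Two toy splits standing in for the two map tasks in the job log.
    List<List<String>> splits = List.of(
        List.of("a", "b", "a", "c", "a"),
        List.of("b", "b", "c"));
    int[] r = shuffleCounts(splits);
    System.out.println("records shuffled without combiner: " + r[0]);
    System.out.println("records shuffled with combiner: " + r[1]);
  }
}
```

This is the same shape as the real job above: "Combine input records" corresponds to the per-word records, and "Combine output records" to the per-key partial sums.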
