1. Counters
1) Built-in counters
2) User-defined Java counters
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> hadoop jar ch08.jar MaxTemperatureWithCounters input/ncdc/all max-temp
12/07/03 19:53:21 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 19:53:21 INFO mapred.JobClient: Running job: job_201207030133_0002
12/07/03 19:53:22 INFO mapred.JobClient: map 0% reduce 0%
12/07/03 19:53:37 INFO mapred.JobClient: map 100% reduce 0%
12/07/03 19:53:49 INFO mapred.JobClient: map 100% reduce 100%
12/07/03 19:53:54 INFO mapred.JobClient: Job complete: job_201207030133_0002
12/07/03 19:53:54 INFO mapred.JobClient: Counters: 29
12/07/03 19:53:54 INFO mapred.JobClient: Job Counters
12/07/03 19:53:54 INFO mapred.JobClient: Launched reduce tasks=1
12/07/03 19:53:54 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16305
12/07/03 19:53:54 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 19:53:54 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 19:53:54 INFO mapred.JobClient: Launched map tasks=2
12/07/03 19:53:54 INFO mapred.JobClient: Data-local map tasks=2
12/07/03 19:53:54 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10068
12/07/03 19:53:54 INFO mapred.JobClient: File Input Format Counters
12/07/03 19:53:54 INFO mapred.JobClient: Bytes Read=147972
12/07/03 19:53:54 INFO mapred.JobClient: File Output Format Counters
12/07/03 19:53:54 INFO mapred.JobClient: Bytes Written=18
12/07/03 19:53:54 INFO mapred.JobClient: FileSystemCounters
12/07/03 19:53:54 INFO mapred.JobClient: FILE_BYTES_READ=28
12/07/03 19:53:54 INFO mapred.JobClient: HDFS_BYTES_READ=148184
12/07/03 19:53:54 INFO mapred.JobClient: FILE_BYTES_WRITTEN=62992
12/07/03 19:53:54 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=18
12/07/03 19:53:54 INFO mapred.JobClient: TemperatureQuality
12/07/03 19:53:54 INFO mapred.JobClient: 1=13129
12/07/03 19:53:54 INFO mapred.JobClient: 9=1
12/07/03 19:53:54 INFO mapred.JobClient: Air Temperature Records
12/07/03 19:53:54 INFO mapred.JobClient: Missing=1
12/07/03 19:53:54 INFO mapred.JobClient: Map-Reduce Framework
12/07/03 19:53:54 INFO mapred.JobClient: Map output materialized bytes=34
12/07/03 19:53:54 INFO mapred.JobClient: Map input records=13130
12/07/03 19:53:54 INFO mapred.JobClient: Reduce shuffle bytes=34
12/07/03 19:53:54 INFO mapred.JobClient: Spilled Records=4
12/07/03 19:53:54 INFO mapred.JobClient: Map output bytes=118161
12/07/03 19:53:54 INFO mapred.JobClient: Map input bytes=1777168
12/07/03 19:53:54 INFO mapred.JobClient: Combine input records=13129
12/07/03 19:53:54 INFO mapred.JobClient: SPLIT_RAW_BYTES=212
12/07/03 19:53:54 INFO mapred.JobClient: Reduce input records=2
12/07/03 19:53:54 INFO mapred.JobClient: Reduce input groups=2
12/07/03 19:53:54 INFO mapred.JobClient: Combine output records=2
12/07/03 19:53:54 INFO mapred.JobClient: Reduce output records=2
12/07/03 19:53:54 INFO mapred.JobClient: Map output records=13129
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> hadoop jar ch08.jar MissingTemperatureFields job_201207030133_0002
Records with missing temperature fields: 0.01%
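The 0.01% figure can be checked against the counters of job_201207030133_0002: Missing=1 and Map input records=13130. The sketch below is plain Java with no Hadoop dependency. The enum `Temperature`, the `counters` map, and the helper names are illustrative assumptions, not the code in ch08.jar. In a real job using the old `mapred` API, the mapper would bump such an enum counter through `Reporter.incrCounter(Enum, long)`; here an `EnumMap` stands in for the framework's counter store.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

class MissingTemperatureSketch {
    // Hypothetical counter names, one enum constant per counter.
    enum Temperature { MISSING, MALFORMED }

    // Stand-in for the framework's counter store (not a Hadoop class).
    static final Map<Temperature, Long> counters = new EnumMap<>(Temperature.class);

    // Adds 'amount' to the named counter, creating it on first use.
    static void increment(Temperature counter, long amount) {
        counters.merge(counter, amount, Long::sum);
    }

    // Share of input records with a missing temperature, formatted to two decimals.
    static String missingPercent(long missing, long totalRecords) {
        return String.format(Locale.ROOT, "%.2f%%", 100.0 * missing / totalRecords);
    }

    public static void main(String[] args) {
        increment(Temperature.MISSING, 1);   // Missing=1 in the job output
        long mapInputRecords = 13130;        // Map input records=13130 in the job output
        System.out.println("Records with missing temperature fields: "
            + missingPercent(counters.get(Temperature.MISSING), mapInputRecords));
    }
}
```

The raw ratio is 100 × 1 / 13130 ≈ 0.0076, which rounds to the 0.01% printed by the job.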
Sorting data is at the heart of MapReduce.
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> hadoop jar ch08.jar SortDataPreprocessor input/ncdc/all input/ncdc/all-seq
12/07/03 20:55:15 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 20:55:16 INFO mapred.JobClient: Running job: job_201207030133_0003
12/07/03 20:55:17 INFO mapred.JobClient: map 0% reduce 0%
12/07/03 20:55:30 INFO mapred.JobClient: map 100% reduce 0%
12/07/03 20:55:35 INFO mapred.JobClient: Job complete: job_201207030133_0003
12/07/03 20:55:35 INFO mapred.JobClient: Counters: 16
12/07/03 20:55:35 INFO mapred.JobClient: Job Counters
12/07/03 20:55:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16560
12/07/03 20:55:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 20:55:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 20:55:35 INFO mapred.JobClient: Launched map tasks=2
12/07/03 20:55:35 INFO mapred.JobClient: Data-local map tasks=2
12/07/03 20:55:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/07/03 20:55:35 INFO mapred.JobClient: File Input Format Counters
12/07/03 20:55:35 INFO mapred.JobClient: Bytes Read=147972
12/07/03 20:55:35 INFO mapred.JobClient: File Output Format Counters
12/07/03 20:55:35 INFO mapred.JobClient: Bytes Written=163409
12/07/03 20:55:35 INFO mapred.JobClient: FileSystemCounters
12/07/03 20:55:35 INFO mapred.JobClient: HDFS_BYTES_READ=148184
12/07/03 20:55:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=41754
12/07/03 20:55:35 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=163409
12/07/03 20:55:35 INFO mapred.JobClient: Map-Reduce Framework
12/07/03 20:55:35 INFO mapred.JobClient: Map input records=13130
12/07/03 20:55:35 INFO mapred.JobClient: Spilled Records=0
12/07/03 20:55:35 INFO mapred.JobClient: Map input bytes=1777168
12/07/03 20:55:35 INFO mapred.JobClient: Map output records=13129
12/07/03 20:55:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=212
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> hadoop jar ch08.jar SortByTemperatureUsingHashPartitioner -D mapred.reduce.tasks=30 input/ncdc/all-seq output-hashsort <
12/07/03 22:28:32 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 22:28:33 INFO mapred.JobClient: Running job: job_201207030133_0004
12/07/03 22:28:34 INFO mapred.JobClient: map 0% reduce 0%
12/07/03 22:28:47 INFO mapred.JobClient: map 100% reduce 0%
12/07/03 22:28:59 INFO mapred.JobClient: map 100% reduce 3%
12/07/03 22:29:02 INFO mapred.JobClient: map 100% reduce 6%
12/07/03 22:29:08 INFO mapred.JobClient: map 100% reduce 10%
12/07/03 22:29:11 INFO mapred.JobClient: map 100% reduce 13%
12/07/03 22:29:23 INFO mapred.JobClient: map 100% reduce 20%
12/07/03 22:29:32 INFO mapred.JobClient: map 100% reduce 23%
12/07/03 22:29:38 INFO mapred.JobClient: map 100% reduce 26%
12/07/03 22:29:41 INFO mapred.JobClient: map 100% reduce 30%
12/07/03 22:29:47 INFO mapred.JobClient: map 100% reduce 33%
12/07/03 22:29:56 INFO mapred.JobClient: map 100% reduce 36%
12/07/03 22:30:02 INFO mapred.JobClient: map 100% reduce 40%
12/07/03 22:30:05 INFO mapred.JobClient: map 100% reduce 43%
12/07/03 22:30:11 INFO mapred.JobClient: map 100% reduce 46%
12/07/03 22:30:14 INFO mapred.JobClient: map 100% reduce 50%
12/07/03 22:30:23 INFO mapred.JobClient: map 100% reduce 53%
12/07/03 22:30:29 INFO mapred.JobClient: map 100% reduce 56%
12/07/03 22:30:35 INFO mapred.JobClient: map 100% reduce 60%
12/07/03 22:30:38 INFO mapred.JobClient: map 100% reduce 63%
12/07/03 22:30:44 INFO mapred.JobClient: map 100% reduce 66%
12/07/03 22:30:47 INFO mapred.JobClient: map 100% reduce 70%
12/07/03 22:30:59 INFO mapred.JobClient: map 100% reduce 73%
12/07/03 22:31:02 INFO mapred.JobClient: map 100% reduce 76%
12/07/03 22:31:08 INFO mapred.JobClient: map 100% reduce 80%
12/07/03 22:31:11 INFO mapred.JobClient: map 100% reduce 83%
12/07/03 22:31:17 INFO mapred.JobClient: map 100% reduce 87%
12/07/03 22:31:23 INFO mapred.JobClient: map 100% reduce 90%
12/07/03 22:31:32 INFO mapred.JobClient: map 100% reduce 93%
12/07/03 22:31:35 INFO mapred.JobClient: map 100% reduce 96%
12/07/03 22:31:41 INFO mapred.JobClient: map 100% reduce 100%
12/07/03 22:31:46 INFO mapred.JobClient: Job complete: job_201207030133_0004
12/07/03 22:31:46 INFO mapred.JobClient: Counters: 26
12/07/03 22:31:46 INFO mapred.JobClient: Job Counters
12/07/03 22:31:46 INFO mapred.JobClient: Launched reduce tasks=30
12/07/03 22:31:46 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16282
12/07/03 22:31:46 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 22:31:46 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 22:31:46 INFO mapred.JobClient: Launched map tasks=2
12/07/03 22:31:46 INFO mapred.JobClient: Data-local map tasks=2
12/07/03 22:31:46 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=335658
12/07/03 22:31:46 INFO mapred.JobClient: File Input Format Counters
12/07/03 22:31:46 INFO mapred.JobClient: Bytes Read=163409
12/07/03 22:31:46 INFO mapred.JobClient: File Output Format Counters
12/07/03 22:31:46 INFO mapred.JobClient: Bytes Written=180399
12/07/03 22:31:46 INFO mapred.JobClient: FileSystemCounters
12/07/03 22:31:46 INFO mapred.JobClient: FILE_BYTES_READ=1882171
12/07/03 22:31:46 INFO mapred.JobClient: HDFS_BYTES_READ=163635
12/07/03 22:31:46 INFO mapred.JobClient: FILE_BYTES_WRITTEN=4431596
12/07/03 22:31:46 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=180399
12/07/03 22:31:46 INFO mapred.JobClient: Map-Reduce Framework
12/07/03 22:31:46 INFO mapred.JobClient: Map output materialized bytes=1882351
12/07/03 22:31:46 INFO mapred.JobClient: Map input records=13129
12/07/03 22:31:46 INFO mapred.JobClient: Reduce shuffle bytes=1278651
12/07/03 22:31:46 INFO mapred.JobClient: Spilled Records=26258
12/07/03 22:31:46 INFO mapred.JobClient: Map output bytes=1842641
12/07/03 22:31:46 INFO mapred.JobClient: Map input bytes=163159
12/07/03 22:31:46 INFO mapred.JobClient: Combine input records=0
12/07/03 22:31:46 INFO mapred.JobClient: SPLIT_RAW_BYTES=226
12/07/03 22:31:46 INFO mapred.JobClient: Reduce input records=13129
12/07/03 22:31:46 INFO mapred.JobClient: Reduce input groups=116
12/07/03 22:31:46 INFO mapred.JobClient: Combine output records=0
12/07/03 22:31:46 INFO mapred.JobClient: Reduce output records=13129
12/07/03 22:31:46 INFO mapred.JobClient: Map output records=13129
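With the default hash partitioner, each of the 30 reducers writes a file that is sorted internally, but concatenating the files does not give a globally sorted result. The plain-Java sketch below illustrates why. `partitionFor` uses the same formula as Hadoop's `HashPartitioner` (hash code with the sign bit cleared, modulo the number of partitions). The class name, the other helpers, and the sample keys are illustrative assumptions, and 3 partitions are used instead of 30 to keep the example small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class HashPartitionSketch {
    // Hash partitioning: clear the sign bit, then take the modulus.
    static int partitionFor(int key, int numPartitions) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numPartitions;
    }

    // Assigns every key to a partition and sorts each partition, as a reducer would.
    static List<List<Integer>> partitionAndSort(List<Integer> keys, int numPartitions) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (int key : keys) {
            parts.get(partitionFor(key, numPartitions)).add(key);
        }
        for (List<Integer> part : parts) {
            Collections.sort(part);
        }
        return parts;
    }

    // Concatenates the partitions in partition order (part-00000, part-00001, ...).
    static List<Integer> concat(List<List<Integer>> parts) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> part : parts) {
            all.addAll(part);
        }
        return all;
    }

    static boolean isSorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up keys standing in for temperatures.
        List<List<Integer>> parts = partitionAndSort(Arrays.asList(10, 20, 31, 42, 53), 3);
        System.out.println("partitions:      " + parts);
        System.out.println("globally sorted: " + isSorted(concat(parts)));
    }
}
```

Each partition comes out ordered, yet neighbouring keys land in unrelated partitions, so a merge step (or a different partitioner) is needed for a total order.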
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> hadoop jar ch08.jar SortByTemperatureToMapFile -D mapred.reduce.tasks=30 input/ncdc/all-seq output-hashmapso>
12/07/03 22:35:53 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 22:35:53 INFO mapred.JobClient: Running job: job_201207030133_0005
12/07/03 22:35:54 INFO mapred.JobClient: map 0% reduce 0%
12/07/03 22:36:08 INFO mapred.JobClient: map 100% reduce 0%
12/07/03 22:36:20 INFO mapred.JobClient: map 100% reduce 3%
12/07/03 22:36:23 INFO mapred.JobClient: map 100% reduce 6%
12/07/03 22:36:29 INFO mapred.JobClient: map 100% reduce 10%
12/07/03 22:36:32 INFO mapred.JobClient: map 100% reduce 13%
12/07/03 22:36:44 INFO mapred.JobClient: map 100% reduce 20%
12/07/03 22:36:53 INFO mapred.JobClient: map 100% reduce 23%
12/07/03 22:36:56 INFO mapred.JobClient: map 100% reduce 26%
12/07/03 22:37:02 INFO mapred.JobClient: map 100% reduce 30%
12/07/03 22:37:05 INFO mapred.JobClient: map 100% reduce 33%
12/07/03 22:37:17 INFO mapred.JobClient: map 100% reduce 40%
12/07/03 22:37:26 INFO mapred.JobClient: map 100% reduce 43%
12/07/03 22:37:29 INFO mapred.JobClient: map 100% reduce 46%
12/07/03 22:37:35 INFO mapred.JobClient: map 100% reduce 50%
12/07/03 22:37:38 INFO mapred.JobClient: map 100% reduce 53%
12/07/03 22:37:51 INFO mapred.JobClient: map 100% reduce 60%
12/07/03 22:38:00 INFO mapred.JobClient: map 100% reduce 63%
12/07/03 22:38:03 INFO mapred.JobClient: map 100% reduce 66%
12/07/03 22:38:09 INFO mapred.JobClient: map 100% reduce 70%
12/07/03 22:38:12 INFO mapred.JobClient: map 100% reduce 73%
12/07/03 22:38:18 INFO mapred.JobClient: map 100% reduce 74%
12/07/03 22:38:21 INFO mapred.JobClient: map 100% reduce 77%
12/07/03 22:38:24 INFO mapred.JobClient: map 100% reduce 80%
12/07/03 22:38:33 INFO mapred.JobClient: map 100% reduce 83%
12/07/03 22:38:36 INFO mapred.JobClient: map 100% reduce 86%
12/07/03 22:38:42 INFO mapred.JobClient: map 100% reduce 90%
12/07/03 22:38:45 INFO mapred.JobClient: map 100% reduce 93%
12/07/03 22:38:57 INFO mapred.JobClient: map 100% reduce 100%
12/07/03 22:39:02 INFO mapred.JobClient: Job complete: job_201207030133_0005
12/07/03 22:39:02 INFO mapred.JobClient: Counters: 26
12/07/03 22:39:02 INFO mapred.JobClient: Job Counters
12/07/03 22:39:02 INFO mapred.JobClient: Launched reduce tasks=30
12/07/03 22:39:02 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16299
12/07/03 22:39:02 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 22:39:02 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 22:39:02 INFO mapred.JobClient: Launched map tasks=2
12/07/03 22:39:02 INFO mapred.JobClient: Data-local map tasks=2
12/07/03 22:39:02 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=330354
12/07/03 22:39:02 INFO mapred.JobClient: File Input Format Counters
12/07/03 22:39:02 INFO mapred.JobClient: Bytes Read=163409
12/07/03 22:39:02 INFO mapred.JobClient: File Output Format Counters
12/07/03 22:39:02 INFO mapred.JobClient: Bytes Written=186935
12/07/03 22:39:02 INFO mapred.JobClient: FileSystemCounters
12/07/03 22:39:02 INFO mapred.JobClient: FILE_BYTES_READ=1882171
12/07/03 22:39:02 INFO mapred.JobClient: HDFS_BYTES_READ=163635
12/07/03 22:39:02 INFO mapred.JobClient: FILE_BYTES_WRITTEN=4431532
12/07/03 22:39:02 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=186935
12/07/03 22:39:02 INFO mapred.JobClient: Map-Reduce Framework
12/07/03 22:39:02 INFO mapred.JobClient: Map output materialized bytes=1882351
12/07/03 22:39:02 INFO mapred.JobClient: Map input records=13129
12/07/03 22:39:02 INFO mapred.JobClient: Reduce shuffle bytes=1054827
12/07/03 22:39:02 INFO mapred.JobClient: Spilled Records=26258
12/07/03 22:39:02 INFO mapred.JobClient: Map output bytes=1842641
12/07/03 22:39:02 INFO mapred.JobClient: Map input bytes=163159
12/07/03 22:39:02 INFO mapred.JobClient: Combine input records=0
12/07/03 22:39:02 INFO mapred.JobClient: SPLIT_RAW_BYTES=226
12/07/03 22:39:02 INFO mapred.JobClient: Reduce input records=13129
12/07/03 22:39:02 INFO mapred.JobClient: Reduce input groups=116
12/07/03 22:39:02 INFO mapred.JobClient: Combine output records=0
12/07/03 22:39:02 INFO mapred.JobClient: Reduce output records=13129
12/07/03 22:39:02 INFO mapred.JobClient: Map output records=13129
>> hadoop jar ch08.jar SortByTemperatureUsingTotalOrderPartitioner -D mapred.reduce.tasks=30 input/ncdc/all-seq output-totalsort <
12/07/03 23:35:45 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 23:35:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/03 23:35:45 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new decompressor
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new decompressor
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new decompressor
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new decompressor
12/07/03 23:35:45 INFO lib.InputSampler: Using 1339 samples
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new compressor
12/07/03 23:35:45 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 23:35:45 INFO mapred.JobClient: Running job: job_201207030133_0006
12/07/03 23:35:46 INFO mapred.JobClient: map 0% reduce 0%
12/07/03 23:36:01 INFO mapred.JobClient: map 100% reduce 0%
12/07/03 23:36:13 INFO mapred.JobClient: map 100% reduce 3%
12/07/03 23:36:16 INFO mapred.JobClient: map 100% reduce 6%
12/07/03 23:36:25 INFO mapred.JobClient: map 100% reduce 10%
12/07/03 23:36:28 INFO mapred.JobClient: map 100% reduce 13%
12/07/03 23:36:37 INFO mapred.JobClient: map 100% reduce 20%
12/07/03 23:36:49 INFO mapred.JobClient: map 100% reduce 26%
12/07/03 23:36:58 INFO mapred.JobClient: map 100% reduce 30%
12/07/03 23:37:01 INFO mapred.JobClient: map 100% reduce 33%
12/07/03 23:37:10 INFO mapred.JobClient: map 100% reduce 36%
12/07/03 23:37:16 INFO mapred.JobClient: map 100% reduce 40%
12/07/03 23:37:19 INFO mapred.JobClient: map 100% reduce 43%
12/07/03 23:37:25 INFO mapred.JobClient: map 100% reduce 46%
12/07/03 23:37:31 INFO mapred.JobClient: map 100% reduce 50%
12/07/03 23:37:40 INFO mapred.JobClient: map 100% reduce 56%
12/07/03 23:37:49 INFO mapred.JobClient: map 100% reduce 60%
12/07/03 23:37:52 INFO mapred.JobClient: map 100% reduce 63%
12/07/03 23:38:01 INFO mapred.JobClient: map 100% reduce 66%
12/07/03 23:38:04 INFO mapred.JobClient: map 100% reduce 70%
12/07/03 23:38:13 INFO mapred.JobClient: map 100% reduce 76%
12/07/03 23:38:22 INFO mapred.JobClient: map 100% reduce 80%
12/07/03 23:38:25 INFO mapred.JobClient: map 100% reduce 83%
12/07/03 23:38:34 INFO mapred.JobClient: map 100% reduce 87%
12/07/03 23:38:37 INFO mapred.JobClient: map 100% reduce 90%
12/07/03 23:38:40 INFO mapred.JobClient: map 100% reduce 91%
12/07/03 23:38:46 INFO mapred.JobClient: map 100% reduce 93%
12/07/03 23:38:49 INFO mapred.JobClient: map 100% reduce 96%
12/07/03 23:38:58 INFO mapred.JobClient: map 100% reduce 100%
12/07/03 23:39:03 INFO mapred.JobClient: Job complete: job_201207030133_0006
12/07/03 23:39:03 INFO mapred.JobClient: Counters: 26
12/07/03 23:39:03 INFO mapred.JobClient: Job Counters
12/07/03 23:39:03 INFO mapred.JobClient: Launched reduce tasks=30
12/07/03 23:39:03 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18040
12/07/03 23:39:03 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 23:39:03 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 23:39:03 INFO mapred.JobClient: Launched map tasks=2
12/07/03 23:39:03 INFO mapred.JobClient: Data-local map tasks=2
12/07/03 23:39:03 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=336193
12/07/03 23:39:03 INFO mapred.JobClient: File Input Format Counters
12/07/03 23:39:03 INFO mapred.JobClient: Bytes Read=163409
12/07/03 23:39:03 INFO mapred.JobClient: File Output Format Counters
12/07/03 23:39:03 INFO mapred.JobClient: Bytes Written=177339
12/07/03 23:39:03 INFO mapred.JobClient: FileSystemCounters
12/07/03 23:39:03 INFO mapred.JobClient: FILE_BYTES_READ=1882171
12/07/03 23:39:03 INFO mapred.JobClient: HDFS_BYTES_READ=165067
12/07/03 23:39:03 INFO mapred.JobClient: FILE_BYTES_WRITTEN=4462828
12/07/03 23:39:03 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=177339
12/07/03 23:39:03 INFO mapred.JobClient: Map-Reduce Framework
12/07/03 23:39:03 INFO mapred.JobClient: Map output materialized bytes=1882351
12/07/03 23:39:03 INFO mapred.JobClient: Map input records=13129
12/07/03 23:39:03 INFO mapred.JobClient: Reduce shuffle bytes=1138806
12/07/03 23:39:03 INFO mapred.JobClient: Spilled Records=26258
12/07/03 23:39:03 INFO mapred.JobClient: Map output bytes=1842641
12/07/03 23:39:03 INFO mapred.JobClient: Map input bytes=163159
12/07/03 23:39:03 INFO mapred.JobClient: Combine input records=0
12/07/03 23:39:03 INFO mapred.JobClient: SPLIT_RAW_BYTES=226
12/07/03 23:39:03 INFO mapred.JobClient: Reduce input records=13129
12/07/03 23:39:03 INFO mapred.JobClient: Reduce input groups=116
12/07/03 23:39:03 INFO mapred.JobClient: Combine output records=0
12/07/03 23:39:03 INFO mapred.JobClient: Reduce output records=13129
12/07/03 23:39:03 INFO mapred.JobClient: Map output records=13129
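The "lib.InputSampler: Using 1339 samples" line is the step that makes a total sort possible: the job samples the keys, derives split points from the sample, and then routes each key to the partition whose range contains it. The plain-Java sketch below shows that idea in a simplified form. It is not Hadoop's `InputSampler` or `TotalOrderPartitioner` code; the class name, method names, evenly spaced split selection, and sample keys are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class TotalOrderPartitionSketch {
    // Picks numPartitions - 1 split points at evenly spaced positions of the sorted sample.
    static int[] splitPoints(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted[i * sorted.length / numPartitions];
        }
        return splits;
    }

    // Partition i holds keys k with splits[i-1] <= k < splits[i].
    static int partitionFor(int key, int[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        // Exact hit: the key equals a split point and goes to the partition on its right.
        // Miss: binarySearch returns -(insertionPoint) - 1, so recover the insertion point.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        int[] sample = {30, 10, 50, 20, 40, 60};   // made-up sampled keys
        int numPartitions = 3;
        int[] splits = splitPoints(sample, numPartitions);

        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (int key : new int[] {55, 12, 30, 47, 8, 61}) {
            parts.get(partitionFor(key, splits)).add(key);
        }

        // Sorting each partition and concatenating them in order yields a global sort.
        List<Integer> all = new ArrayList<>();
        for (List<Integer> part : parts) {
            Collections.sort(part);
            all.addAll(part);
        }
        System.out.println("splits=" + Arrays.toString(splits) + " output=" + all);
    }
}
```

Because every key in partition i is no larger than any key in partition i+1, the output files can simply be read in order. The quality of the sample only affects how evenly the work is spread across reducers, not correctness.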
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> hadoop jar ch08.jar MaxTemperatureUsingSecondarySort input/ncdc/all output-secondarysort
12/07/03 23:59:15 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 23:59:15 INFO mapred.JobClient: Running job: job_201207030133_0007
12/07/03 23:59:16 INFO mapred.JobClient: map 0% reduce 0%
12/07/03 23:59:31 INFO mapred.JobClient: map 100% reduce 0%
12/07/03 23:59:43 INFO mapred.JobClient: map 100% reduce 100%
12/07/03 23:59:48 INFO mapred.JobClient: Job complete: job_201207030133_0007
12/07/03 23:59:48 INFO mapred.JobClient: Counters: 26
12/07/03 23:59:48 INFO mapred.JobClient: Job Counters
12/07/03 23:59:48 INFO mapred.JobClient: Launched reduce tasks=1
12/07/03 23:59:48 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16330
12/07/03 23:59:48 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 23:59:48 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 23:59:48 INFO mapred.JobClient: Launched map tasks=2
12/07/03 23:59:48 INFO mapred.JobClient: Data-local map tasks=2
12/07/03 23:59:48 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9967
12/07/03 23:59:48 INFO mapred.JobClient: File Input Format Counters
12/07/03 23:59:48 INFO mapred.JobClient: Bytes Read=147972
12/07/03 23:59:48 INFO mapred.JobClient: File Output Format Counters
12/07/03 23:59:48 INFO mapred.JobClient: Bytes Written=18
12/07/03 23:59:48 INFO mapred.JobClient: FileSystemCounters
12/07/03 23:59:48 INFO mapred.JobClient: FILE_BYTES_READ=131296
12/07/03 23:59:48 INFO mapred.JobClient: HDFS_BYTES_READ=148184
12/07/03 23:59:48 INFO mapred.JobClient: FILE_BYTES_WRITTEN=326482
12/07/03 23:59:48 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=18
12/07/03 23:59:48 INFO mapred.JobClient: Map-Reduce Framework
12/07/03 23:59:48 INFO mapred.JobClient: Map output materialized bytes=131302
12/07/03 23:59:48 INFO mapred.JobClient: Map input records=13130
12/07/03 23:59:48 INFO mapred.JobClient: Reduce shuffle bytes=131302
12/07/03 23:59:48 INFO mapred.JobClient: Spilled Records=26258
12/07/03 23:59:48 INFO mapred.JobClient: Map output bytes=105032
12/07/03 23:59:48 INFO mapred.JobClient: Map input bytes=1777168
12/07/03 23:59:48 INFO mapred.JobClient: Combine input records=0
12/07/03 23:59:48 INFO mapred.JobClient: SPLIT_RAW_BYTES=212
12/07/03 23:59:48 INFO mapred.JobClient: Reduce input records=0
12/07/03 23:59:48 INFO mapred.JobClient: Reduce input groups=2
12/07/03 23:59:48 INFO mapred.JobClient: Combine output records=0
12/07/03 23:59:48 INFO mapred.JobClient: Reduce output records=2
12/07/03 23:59:48 INFO mapred.JobClient: Map output records=13129
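Secondary sort finds the maximum temperature per year by letting the framework do the ordering: the key is a (year, temperature) pair sorted by year ascending and temperature descending, while grouping looks at the year only, so the first key of each group already carries the maximum. The plain-Java sketch below imitates that with a comparator and a single pass. It does not use Hadoop classes; `YearTemp`, `SORT_ORDER`, `maxPerYear`, and the sample values are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SecondarySortSketch {
    /** Composite key: the natural key (year) plus the field to order by (temperature). */
    static final class YearTemp {
        final int year;
        final int temperature;

        YearTemp(int year, int temperature) {
            this.year = year;
            this.temperature = temperature;
        }
    }

    // Sort comparator: year ascending, then temperature descending.
    static final Comparator<YearTemp> SORT_ORDER =
        Comparator.comparingInt((YearTemp k) -> k.year)
            .thenComparing(Comparator.comparingInt((YearTemp k) -> k.temperature).reversed());

    // Groups by year only; the first key seen for a year holds that year's maximum.
    static Map<Integer, Integer> maxPerYear(List<YearTemp> keys) {
        List<YearTemp> sorted = new ArrayList<>(keys);
        sorted.sort(SORT_ORDER);
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (YearTemp key : sorted) {
            result.putIfAbsent(key.year, key.temperature);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up records, not taken from the NCDC input.
        List<YearTemp> keys = Arrays.asList(
            new YearTemp(1950, 0), new YearTemp(1949, 78),
            new YearTemp(1950, 22), new YearTemp(1949, 111));
        System.out.println(maxPerYear(keys));
    }
}
```

In the job above this shows up as 13129 map output records collapsing into 2 reduce input groups and 2 reduce output records, one per year.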
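4. Side Data Distribution
Distributed Cache
Rather than serializing side data in the job configuration, it is preferable to distribute datasets using Hadoop's distributed cache mechanism. It provides a service for copying files and archives to the task nodes in time for the tasks to use them when they run. To save network bandwidth, files are normally copied to any particular node once per job.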
>> hadoop jar ch08.jar MaxTemperatureByStationNameUsingDistributedCacheFile -files input/ncdc/metadata/stations-fixed-width.txt input/ncdc/all output <
12/07/04 00:18:14 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/04 00:18:14 INFO mapred.JobClient: Running job: job_201207030133_0008
12/07/04 00:18:15 INFO mapred.JobClient: map 0% reduce 0%
12/07/04 00:18:29 INFO mapred.JobClient: map 100% reduce 0%
12/07/04 00:18:41 INFO mapred.JobClient: map 100% reduce 100%
12/07/04 00:18:46 INFO mapred.JobClient: Job complete: job_201207030133_0008
12/07/04 00:18:46 INFO mapred.JobClient: Counters: 26
12/07/04 00:18:46 INFO mapred.JobClient: Job Counters
12/07/04 00:18:46 INFO mapred.JobClient: Launched reduce tasks=1
12/07/04 00:18:46 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=17800
12/07/04 00:18:46 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/04 00:18:46 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/04 00:18:46 INFO mapred.JobClient: Launched map tasks=2
12/07/04 00:18:46 INFO mapred.JobClient: Data-local map tasks=2
12/07/04 00:18:46 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10372
12/07/04 00:18:46 INFO mapred.JobClient: File Input Format Counters
12/07/04 00:18:46 INFO mapred.JobClient: Bytes Read=147972
12/07/04 00:18:46 INFO mapred.JobClient: File Output Format Counters
12/07/04 00:18:46 INFO mapred.JobClient: Bytes Written=170
12/07/04 00:18:46 INFO mapred.JobClient: FileSystemCounters
12/07/04 00:18:46 INFO mapred.JobClient: FILE_BYTES_READ=234
12/07/04 00:18:46 INFO mapred.JobClient: HDFS_BYTES_READ=148184
12/07/04 00:18:46 INFO mapred.JobClient: FILE_BYTES_WRITTEN=66722
12/07/04 00:18:46 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=170
12/07/04 00:18:46 INFO mapred.JobClient: Map-Reduce Framework
12/07/04 00:18:46 INFO mapred.JobClient: Map output materialized bytes=240
12/07/04 00:18:46 INFO mapred.JobClient: Map input records=13130
12/07/04 00:18:46 INFO mapred.JobClient: Reduce shuffle bytes=120
12/07/04 00:18:46 INFO mapred.JobClient: Spilled Records=24
12/07/04 00:18:46 INFO mapred.JobClient: Map output bytes=223193
12/07/04 00:18:46 INFO mapred.JobClient: Map input bytes=1777168
12/07/04 00:18:46 INFO mapred.JobClient: Combine input records=13129
12/07/04 00:18:46 INFO mapred.JobClient: SPLIT_RAW_BYTES=212
12/07/04 00:18:46 INFO mapred.JobClient: Reduce input records=12
12/07/04 00:18:46 INFO mapred.JobClient: Reduce input groups=6
12/07/04 00:18:46 INFO mapred.JobClient: Combine output records=12
12/07/04 00:18:46 INFO mapred.JobClient: Reduce output records=6
12/07/04 00:18:46 INFO mapred.JobClient: Map output records=13129
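The `-files` option ships the station metadata file to every task node, where a task can open it as a local file and load it into memory before processing records. The plain-Java sketch below imitates that pattern without Hadoop: it reads a small side file once into a map and uses it to translate station IDs into names. Everything here is an illustrative assumption, including the class and method names, the station IDs and names, and above all the tab-separated layout, which was chosen to avoid guessing the column offsets of the real fixed-width file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SideDataLookupSketch {
    // Parses "stationId<TAB>stationName" lines (hypothetical layout) into a lookup table.
    static Map<String, String> parseStations(List<String> lines) {
        Map<String, String> names = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t", 2);
            if (fields.length == 2) {
                names.put(fields[0], fields[1]);
            }
        }
        return names;
    }

    // Loads the side file once, the way a task would during its setup phase.
    static Map<String, String> loadStations(Path sideFile) throws IOException {
        return parseStations(Files.readAllLines(sideFile, StandardCharsets.UTF_8));
    }

    // Falls back to the raw ID when the station is not in the side data.
    static String stationName(Map<String, String> names, String stationId) {
        return names.getOrDefault(stationId, stationId);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the copy that the cache places on the node.
        Path sideFile = Files.createTempFile("stations", ".txt");
        try {
            Files.write(sideFile,
                Arrays.asList("ST001\tEXAMPLE STATION A", "ST002\tEXAMPLE STATION B"),
                StandardCharsets.UTF_8);
            Map<String, String> names = loadStations(sideFile);
            System.out.println(stationName(names, "ST001") + "\t" + 42);
            System.out.println(stationName(names, "ST999") + "\t" + 7);
        } finally {
            Files.delete(sideFile);
        }
    }
}
```

Loading the table once per task rather than once per record keeps the lookup cheap, which is the main reason to ship side data as a file instead of re-reading it from shared storage inside the map or reduce function.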