serialization frameworks and on-disk data structures.
2. Compression
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> echo "text" | hadoop StreamCompressor org.apache.hadoop.io.compress.GzipCodec | gunzip -
12/07/02 00:21:12 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/02 00:21:12 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
text
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> echo "text" | hadoop PooledStreamCompressor org.apache.hadoop.io.compress.GzipCodec | gunzip -
12/07/02 00:24:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/02 00:24:45 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/07/02 00:24:45 INFO compress.CodecPool: Got brand-new compressor
text
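The two commands above compress standard input with GzipCodec and pipe the result through gunzip to confirm the round trip. The same round trip can be sketched with the JDK's own gzip streams. This is only an illustration using java.util.zip, not Hadoop's CompressionCodec API, and the class name GzipRoundTrip and its methods are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: JDK gzip streams standing in for a Hadoop codec.
public class GzipRoundTrip {

    // Compress the input into the gzip container format (what gunzip reads).
    public static byte[] compress(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return buffer.toByteArray();
    }

    // Decompress gzip data back into the original bytes.
    public static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "text\n".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decompress(compress(original));
        System.out.print(new String(restored, StandardCharsets.UTF_8));
    }
}
```

The "Got brand-new compressor" line in the second transcript comes from CodecPool, which hands out reusable compressor instances. The sketch creates a fresh stream on every call and does not model pooling.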
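3. Serialization
Serialization is the process of turning structured objects into a byte stream so that they can be transmitted over a network or written to persistent storage. Deserialization is the reverse process of turning a byte stream back into a series of structured objects.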
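As a minimal sketch of that definition, the following turns an (int, String) record into bytes and back using only the JDK's DataOutputStream and DataInputStream. Hadoop's Writable interface follows the same pattern with write(DataOutput) and readFields(DataInput), but the RecordSerde class below is hypothetical and its byte layout is not Hadoop's IntWritable or Text encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical example class; not part of Hadoop.
public class RecordSerde {

    // A simple structured object: an integer key and a text value.
    public static final class Record {
        public final int key;
        public final String value;

        public Record(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Serialize: structured object -> byte stream.
    public static byte[] serialize(int key, String value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(key);    // 4 bytes, big-endian
            out.writeUTF(value);  // 2-byte length prefix + modified UTF-8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialize: byte stream -> structured object, reading fields in the same order.
    public static Record deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int key = in.readInt();
            String value = in.readUTF();
            return new Record(key, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(100, "One, two, buckle my shoe");
        Record r = deserialize(data);
        System.out.println(data.length + " bytes -> " + r.key + " " + r.value);
    }
}
```

With this layout the sample record takes 30 bytes: 4 for the int, 2 for the length prefix, and 24 for the ASCII text.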
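Serialization appears in two quite distinct areas of distributed data processing: interprocess communication and persistent storage.
In Hadoop, interprocess communication between nodes is implemented with RPC (remote procedure calls).
Some serialization frameworks: Apache Thrift, Google's Protocol Buffers, and Avro.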
4. File-based data structures
4.1 SequenceFileDemo
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> hadoop SequenceFileWriteDemo numbers.seq
12/07/02 01:11:00 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/02 01:11:00 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/07/02 01:11:00 INFO compress.CodecPool: Got brand-new compressor
[128] 100 One, two, buckle my shoe
[173] 99 Three, four, shut the door
[220] 98 Five, six, pick up sticks
[264] 97 Seven, eight, lay them straight
[314] 96 Nine, ten, a big fat hen
[359] 95 One, two, buckle my shoe
[404] 94 Three, four, shut the door
[451] 93 Five, six, pick up sticks
[495] 92 Seven, eight, lay them straight
[545] 91 Nine, ten, a big fat hen
[590] 90 One, two, buckle my shoe
[635] 89 Three, four, shut the door
[682] 88 Five, six, pick up sticks
[726] 87 Seven, eight, lay them straight
[776] 86 Nine, ten, a big fat hen
[821] 85 One, two, buckle my shoe
[866] 84 Three, four, shut the door
[913] 83 Five, six, pick up sticks
[957] 82 Seven, eight, lay them straight
[1007] 81 Nine, ten, a big fat hen
[1052] 80 One, two, buckle my shoe
[1097] 79 Three, four, shut the door
[1144] 78 Five, six, pick up sticks
[1188] 77 Seven, eight, lay them straight
[1238] 76 Nine, ten, a big fat hen
[1283] 75 One, two, buckle my shoe
[1328] 74 Three, four, shut the door
[1375] 73 Five, six, pick up sticks
[1419] 72 Seven, eight, lay them straight
[1469] 71 Nine, ten, a big fat hen
[1514] 70 One, two, buckle my shoe
[1559] 69 Three, four, shut the door
[1606] 68 Five, six, pick up sticks
[1650] 67 Seven, eight, lay them straight
[1700] 66 Nine, ten, a big fat hen
[1745] 65 One, two, buckle my shoe
[1790] 64 Three, four, shut the door
[1837] 63 Five, six, pick up sticks
[1881] 62 Seven, eight, lay them straight
[1931] 61 Nine, ten, a big fat hen
[1976] 60 One, two, buckle my shoe
[2021] 59 Three, four, shut the door
[2088] 58 Five, six, pick up sticks
[2132] 57 Seven, eight, lay them straight
[2182] 56 Nine, ten, a big fat hen
[2227] 55 One, two, buckle my shoe
[2272] 54 Three, four, shut the door
[2319] 53 Five, six, pick up sticks
[2363] 52 Seven, eight, lay them straight
[2413] 51 Nine, ten, a big fat hen
[2458] 50 One, two, buckle my shoe
[2503] 49 Three, four, shut the door
[2550] 48 Five, six, pick up sticks
[2594] 47 Seven, eight, lay them straight
[2644] 46 Nine, ten, a big fat hen
[2689] 45 One, two, buckle my shoe
[2734] 44 Three, four, shut the door
[2781] 43 Five, six, pick up sticks
[2825] 42 Seven, eight, lay them straight
[2875] 41 Nine, ten, a big fat hen
[2920] 40 One, two, buckle my shoe
[2965] 39 Three, four, shut the door
[3012] 38 Five, six, pick up sticks
[3056] 37 Seven, eight, lay them straight
[3106] 36 Nine, ten, a big fat hen
[3151] 35 One, two, buckle my shoe
[3196] 34 Three, four, shut the door
[3243] 33 Five, six, pick up sticks
[3287] 32 Seven, eight, lay them straight
[3337] 31 Nine, ten, a big fat hen
[3382] 30 One, two, buckle my shoe
[3427] 29 Three, four, shut the door
[3474] 28 Five, six, pick up sticks
[3518] 27 Seven, eight, lay them straight
[3568] 26 Nine, ten, a big fat hen
[3613] 25 One, two, buckle my shoe
[3658] 24 Three, four, shut the door
[3705] 23 Five, six, pick up sticks
[3749] 22 Seven, eight, lay them straight
[3799] 21 Nine, ten, a big fat hen
[3844] 20 One, two, buckle my shoe
[3889] 19 Three, four, shut the door
[3936] 18 Five, six, pick up sticks
[3980] 17 Seven, eight, lay them straight
[4030] 16 Nine, ten, a big fat hen
[4075] 15 One, two, buckle my shoe
[4140] 14 Three, four, shut the door
[4187] 13 Five, six, pick up sticks
[4231] 12 Seven, eight, lay them straight
[4281] 11 Nine, ten, a big fat hen
[4326] 10 One, two, buckle my shoe
[4371] 9 Three, four, shut the door
[4418] 8 Five, six, pick up sticks
[4462] 7 Seven, eight, lay them straight
[4512] 6 Nine, ten, a big fat hen
[4557] 5 One, two, buckle my shoe
[4602] 4 Three, four, shut the door
[4649] 3 Five, six, pick up sticks
[4693] 2 Seven, eight, lay them straight
[4743] 1 Nine, ten, a big fat hen
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> hadoop SequenceFileReadDemo numbers.seq
12/07/02 01:15:49 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/02 01:15:49 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/07/02 01:15:49 INFO compress.CodecPool: Got brand-new decompressor
[128] 100 One, two, buckle my shoe
[173] 99 Three, four, shut the door
[220] 98 Five, six, pick up sticks
[264] 97 Seven, eight, lay them straight
[314] 96 Nine, ten, a big fat hen
[359] 95 One, two, buckle my shoe
[404] 94 Three, four, shut the door
[451] 93 Five, six, pick up sticks
[495] 92 Seven, eight, lay them straight
[545] 91 Nine, ten, a big fat hen
[590] 90 One, two, buckle my shoe
[635] 89 Three, four, shut the door
[682] 88 Five, six, pick up sticks
[726] 87 Seven, eight, lay them straight
[776] 86 Nine, ten, a big fat hen
[821] 85 One, two, buckle my shoe
[866] 84 Three, four, shut the door
[913] 83 Five, six, pick up sticks
[957] 82 Seven, eight, lay them straight
[1007] 81 Nine, ten, a big fat hen
[1052] 80 One, two, buckle my shoe
[1097] 79 Three, four, shut the door
[1144] 78 Five, six, pick up sticks
[1188] 77 Seven, eight, lay them straight
[1238] 76 Nine, ten, a big fat hen
[1283] 75 One, two, buckle my shoe
[1328] 74 Three, four, shut the door
[1375] 73 Five, six, pick up sticks
[1419] 72 Seven, eight, lay them straight
[1469] 71 Nine, ten, a big fat hen
[1514] 70 One, two, buckle my shoe
[1559] 69 Three, four, shut the door
[1606] 68 Five, six, pick up sticks
[1650] 67 Seven, eight, lay them straight
[1700] 66 Nine, ten, a big fat hen
[1745] 65 One, two, buckle my shoe
[1790] 64 Three, four, shut the door
[1837] 63 Five, six, pick up sticks
[1881] 62 Seven, eight, lay them straight
[1931] 61 Nine, ten, a big fat hen
[1976] 60 One, two, buckle my shoe
[2021*] 59 Three, four, shut the door
[2088] 58 Five, six, pick up sticks
[2132] 57 Seven, eight, lay them straight
[2182] 56 Nine, ten, a big fat hen
[2227] 55 One, two, buckle my shoe
[2272] 54 Three, four, shut the door
[2319] 53 Five, six, pick up sticks
[2363] 52 Seven, eight, lay them straight
[2413] 51 Nine, ten, a big fat hen
[2458] 50 One, two, buckle my shoe
[2503] 49 Three, four, shut the door
[2550] 48 Five, six, pick up sticks
[2594] 47 Seven, eight, lay them straight
[2644] 46 Nine, ten, a big fat hen
[2689] 45 One, two, buckle my shoe
[2734] 44 Three, four, shut the door
[2781] 43 Five, six, pick up sticks
[2825] 42 Seven, eight, lay them straight
[2875] 41 Nine, ten, a big fat hen
[2920] 40 One, two, buckle my shoe
[2965] 39 Three, four, shut the door
[3012] 38 Five, six, pick up sticks
[3056] 37 Seven, eight, lay them straight
[3106] 36 Nine, ten, a big fat hen
[3151] 35 One, two, buckle my shoe
[3196] 34 Three, four, shut the door
[3243] 33 Five, six, pick up sticks
[3287] 32 Seven, eight, lay them straight
[3337] 31 Nine, ten, a big fat hen
[3382] 30 One, two, buckle my shoe
[3427] 29 Three, four, shut the door
[3474] 28 Five, six, pick up sticks
[3518] 27 Seven, eight, lay them straight
[3568] 26 Nine, ten, a big fat hen
[3613] 25 One, two, buckle my shoe
[3658] 24 Three, four, shut the door
[3705] 23 Five, six, pick up sticks
[3749] 22 Seven, eight, lay them straight
[3799] 21 Nine, ten, a big fat hen
[3844] 20 One, two, buckle my shoe
[3889] 19 Three, four, shut the door
[3936] 18 Five, six, pick up sticks
[3980] 17 Seven, eight, lay them straight
[4030] 16 Nine, ten, a big fat hen
[4075*] 15 One, two, buckle my shoe
[4140] 14 Three, four, shut the door
[4187] 13 Five, six, pick up sticks
[4231] 12 Seven, eight, lay them straight
[4281] 11 Nine, ten, a big fat hen
[4326] 10 One, two, buckle my shoe
[4371] 9 Three, four, shut the door
[4418] 8 Five, six, pick up sticks
[4462] 7 Seven, eight, lay them straight
[4512] 6 Nine, ten, a big fat hen
[4557] 5 One, two, buckle my shoe
[4602] 4 Three, four, shut the door
[4649] 3 Five, six, pick up sticks
[4693] 2 Seven, eight, lay them straight
[4743] 1 Nine, ten, a big fat hen
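In both transcripts the bracketed number is the byte position in the file at which each record starts. The toy sketch below shows the general idea of recording a position before each append and later reading a record back from that position. It is not the SequenceFile format: the OffsetLog class is hypothetical, it keeps everything in memory, and it has no header, sync markers, or compression, so its offsets will not match the ones printed above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical in-memory record log; a stand-in for a file of key/value records.
public class OffsetLog {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);

    // Append one record and return the byte offset at which it starts.
    public int append(int key, String value) {
        int start = out.size();  // bytes written so far
        try {
            out.writeInt(key);
            out.writeUTF(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return start;
    }

    // Read back the record that starts at the given offset, as "key value".
    public String readAt(int offset) {
        byte[] data = bytes.toByteArray();
        try (DataInputStream in = new DataInputStream(
                 new ByteArrayInputStream(data, offset, data.length - offset))) {
            int key = in.readInt();
            String value = in.readUTF();
            return key + " " + value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] lines = {
            "One, two, buckle my shoe",
            "Three, four, shut the door",
            "Five, six, pick up sticks",
            "Seven, eight, lay them straight",
            "Nine, ten, a big fat hen"
        };
        OffsetLog log = new OffsetLog();
        for (int i = 0; i < 10; i++) {
            int key = 100 - i;
            String value = lines[i % lines.length];
            int offset = log.append(key, value);
            System.out.printf("[%d] %d %s%n", offset, key, value);
        }
    }
}
```

Reading from an offset that is not a record boundary would produce garbage in this sketch. The asterisks in the SequenceFileReadDemo output mark the positions where the reader saw a sync point.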
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> hadoop fs -text numbers.seq |less
>> hadoop jar /local/nomad2/hadoop/hadoop-0.20.203.0/hadoop-examples-0.20.203.0.jar sort -r 1 \
more?> -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat \
more?> -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat \
more?> -outKey org.apache.hadoop.io.IntWritable \
more?> -outValue org.apache.hadoop.io.Text \
more?> numbers.seq sorted
Running on 1 nodes to sort from hdfs://localhost/user/nomad2/numbers.seq into hdfs://localhost/user/nomad2/sorted with 1 reduces.
Job started: Mon Jul 02 01:22:26 CST 2012
12/07/02 01:22:26 INFO mapred.FileInputFormat: Total input paths to process : 1
12/07/02 01:22:26 INFO mapred.JobClient: Running job: job_201207012246_0008
12/07/02 01:22:27 INFO mapred.JobClient: map 0% reduce 0%
12/07/02 01:22:40 INFO mapred.JobClient: map 100% reduce 0%
12/07/02 01:22:52 INFO mapred.JobClient: map 100% reduce 100%
12/07/02 01:22:57 INFO mapred.JobClient: Job complete: job_201207012246_0008
12/07/02 01:22:57 INFO mapred.JobClient: Counters: 26
12/07/02 01:22:57 INFO mapred.JobClient: Job Counters
12/07/02 01:22:57 INFO mapred.JobClient: Launched reduce tasks=1
12/07/02 01:22:57 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16289
12/07/02 01:22:57 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/02 01:22:57 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/02 01:22:57 INFO mapred.JobClient: Launched map tasks=2
12/07/02 01:22:57 INFO mapred.JobClient: Data-local map tasks=2
12/07/02 01:22:57 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10069
12/07/02 01:22:57 INFO mapred.JobClient: File Input Format Counters
12/07/02 01:22:57 INFO mapred.JobClient: Bytes Read=6613
12/07/02 01:22:57 INFO mapred.JobClient: File Output Format Counters
12/07/02 01:22:57 INFO mapred.JobClient: Bytes Written=4005
12/07/02 01:22:57 INFO mapred.JobClient: FileSystemCounters
12/07/02 01:22:57 INFO mapred.JobClient: FILE_BYTES_READ=3306
12/07/02 01:22:57 INFO mapred.JobClient: HDFS_BYTES_READ=6868
12/07/02 01:22:57 INFO mapred.JobClient: FILE_BYTES_WRITTEN=70016
12/07/02 01:22:57 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=4005
12/07/02 01:22:57 INFO mapred.JobClient: Map-Reduce Framework
12/07/02 01:22:57 INFO mapred.JobClient: Map output materialized bytes=3312
12/07/02 01:22:57 INFO mapred.JobClient: Map input records=100
12/07/02 01:22:57 INFO mapred.JobClient: Reduce shuffle bytes=2811
12/07/02 01:22:57 INFO mapred.JobClient: Spilled Records=200
12/07/02 01:22:57 INFO mapred.JobClient: Map output bytes=3100
12/07/02 01:22:57 INFO mapred.JobClient: Map input bytes=4660
12/07/02 01:22:57 INFO mapred.JobClient: Combine input records=0
12/07/02 01:22:57 INFO mapred.JobClient: SPLIT_RAW_BYTES=190
12/07/02 01:22:57 INFO mapred.JobClient: Reduce input records=100
12/07/02 01:22:57 INFO mapred.JobClient: Reduce input groups=100
12/07/02 01:22:57 INFO mapred.JobClient: Combine output records=0
12/07/02 01:22:57 INFO mapred.JobClient: Reduce output records=100
12/07/02 01:22:57 INFO mapred.JobClient: Map output records=100
Job ended: Mon Jul 02 01:22:57 CST 2012
The job took 31 seconds.
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> hadoop MapFileWriteDemo numbers.map
12/07/02 01:27:49 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/02 01:27:49 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/07/02 01:27:49 INFO compress.CodecPool: Got brand-new compressor
12/07/02 01:27:49 INFO compress.CodecPool: Got brand-new compressor
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> hadoop jar /local/nomad2/hadoop/hadoop-0.20.203.0/hadoop-examples-0.20.203.0.jar sort -r 1 \^J-inFormat>
Running on 1 nodes to sort from hdfs://localhost/user/nomad2/numbers.seq into hdfs://localhost/user/nomad2/numbers.map with 1 reduces.
Job started: Mon Jul 02 01:31:58 CST 2012
12/07/02 01:31:58 INFO mapred.FileInputFormat: Total input paths to process : 1
12/07/02 01:31:58 INFO mapred.JobClient: Running job: job_201207012246_0010
12/07/02 01:31:59 INFO mapred.JobClient: map 0% reduce 0%
12/07/02 01:32:13 INFO mapred.JobClient: map 100% reduce 0%
12/07/02 01:32:25 INFO mapred.JobClient: map 100% reduce 100%
12/07/02 01:32:30 INFO mapred.JobClient: Job complete: job_201207012246_0010
12/07/02 01:32:30 INFO mapred.JobClient: Counters: 26
12/07/02 01:32:30 INFO mapred.JobClient: Job Counters
12/07/02 01:32:30 INFO mapred.JobClient: Launched reduce tasks=1
12/07/02 01:32:30 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16355
12/07/02 01:32:30 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/02 01:32:30 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/02 01:32:30 INFO mapred.JobClient: Launched map tasks=2
12/07/02 01:32:30 INFO mapred.JobClient: Data-local map tasks=2
12/07/02 01:32:30 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10036
12/07/02 01:32:30 INFO mapred.JobClient: File Input Format Counters
12/07/02 01:32:30 INFO mapred.JobClient: Bytes Read=6613
12/07/02 01:32:30 INFO mapred.JobClient: File Output Format Counters
12/07/02 01:32:30 INFO mapred.JobClient: Bytes Written=4005
12/07/02 01:32:30 INFO mapred.JobClient: FileSystemCounters
12/07/02 01:32:30 INFO mapred.JobClient: FILE_BYTES_READ=3306
12/07/02 01:32:30 INFO mapred.JobClient: HDFS_BYTES_READ=6868
12/07/02 01:32:30 INFO mapred.JobClient: FILE_BYTES_WRITTEN=70031
12/07/02 01:32:30 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=4005
12/07/02 01:32:30 INFO mapred.JobClient: Map-Reduce Framework
12/07/02 01:32:30 INFO mapred.JobClient: Map output materialized bytes=3312
12/07/02 01:32:30 INFO mapred.JobClient: Map input records=100
12/07/02 01:32:30 INFO mapred.JobClient: Reduce shuffle bytes=3312
12/07/02 01:32:30 INFO mapred.JobClient: Spilled Records=200
12/07/02 01:32:30 INFO mapred.JobClient: Map output bytes=3100
12/07/02 01:32:30 INFO mapred.JobClient: Map input bytes=4660
12/07/02 01:32:30 INFO mapred.JobClient: Combine input records=0
12/07/02 01:32:30 INFO mapred.JobClient: SPLIT_RAW_BYTES=190
12/07/02 01:32:30 INFO mapred.JobClient: Reduce input records=100
12/07/02 01:32:30 INFO mapred.JobClient: Reduce input groups=100
12/07/02 01:32:30 INFO mapred.JobClient: Combine output records=0
12/07/02 01:32:30 INFO mapred.JobClient: Reduce output records=100
12/07/02 01:32:30 INFO mapred.JobClient: Map output records=100
Job ended: Mon Jul 02 01:32:30 CST 2012
The job took 32 seconds.
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> hadoop fs -mv numbers.map/part-00000 numbers.map/data
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ] >> hadoop MapFileFixer numbers.map
12/07/02 01:33:31 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/02 01:33:31 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/07/02 01:33:31 INFO compress.CodecPool: Got brand-new compressor
Created MapFile numbers.map with 100 entries
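A MapFile is a sorted SequenceFile plus an index that supports lookup by key, which is why numbers.seq is sorted first, the sorted output is renamed to numbers.map/data, and MapFileFixer then rebuilds the index. The sketch below illustrates the general idea of a sparse index over sorted data: remember the position of every Nth key, jump to the nearest indexed key at or below the target, and scan forward from there. It is an in-memory illustration only; the SparseIndexLookup class, its arrays, and the interval of 16 are assumptions for this example and do not reflect MapFile's on-disk layout or API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a sparse index over sorted key/value data.
public class SparseIndexLookup {
    private final int[] keys;        // sorted ascending; stands in for the data file
    private final String[] values;   // values[i] belongs to keys[i]
    // Sparse index: every interval-th key mapped to its position in the arrays.
    private final TreeMap<Integer, Integer> index = new TreeMap<>();

    public SparseIndexLookup(int[] sortedKeys, String[] values, int interval) {
        this.keys = sortedKeys;
        this.values = values;
        for (int i = 0; i < sortedKeys.length; i += interval) {
            index.put(sortedKeys[i], i);
        }
    }

    // Find the value for a key, or null if the key is absent.
    public String get(int key) {
        // Largest indexed key that is <= the target key.
        Map.Entry<Integer, Integer> entry = index.floorEntry(key);
        if (entry == null) {
            return null;  // target is smaller than every stored key
        }
        // Scan forward from the indexed position until we reach or pass the target.
        for (int i = entry.getValue(); i < keys.length && keys[i] <= key; i++) {
            if (keys[i] == key) {
                return values[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int[] keys = new int[100];
        String[] values = new String[100];
        for (int i = 0; i < 100; i++) {
            keys[i] = i + 1;
            values[i] = "value-" + (i + 1);
        }
        SparseIndexLookup map = new SparseIndexLookup(keys, values, 16);
        System.out.println(map.get(57));
    }
}
```

The lookup only works because the keys are in ascending order, which mirrors the need to sort numbers.seq (written with keys descending from 100 to 1) before treating it as MapFile data.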