There are several ways to load MapReduce output into HBase; TableOutputFormat is one of them.

1. Create the HBase table

    hbase(main):132:0* create 't1','f1'
    0 row(s) in 1.4890 seconds

    hbase(main):133:0> scan 't1'
    ROW                              COLUMN+CELL
    0 row(s) in 1.2330 seconds

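2. Write the MR job
HBaseMapper.java

    package com.test;

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class HBaseMapper extends MapReduceBase implements Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        public void map(LongWritable key, Text values,
                OutputCollector<LongWritable, Text> output, Reporter reporter) throws IOException {
            output.collect(key, values); // pass each input line through unchanged
        }
    }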

HBaseReducer.java

    package com.test;

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class HBaseReducer extends MapReduceBase implements Reducer<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        public void reduce(LongWritable key, Iterator<Text> values,
                OutputCollector<ImmutableBytesWritable, Put> output, Reporter reporter) throws IOException {
            // The output key is left empty here; each Put already carries its row key.
            ImmutableBytesWritable immutableBytesWritable = new ImmutableBytesWritable();
            while (values.hasNext()) {
                String value = values.next().toString();
                if (!"".equals(value)) {
                    Put put = createPut(value);
                    if (put != null) {
                        output.collect(immutableBytesWritable, put);
                    }
                }
            }
        }

        // str has the format row:family:qualifier:value; this is only a simple simulation.
        private Put createPut(String str) {
            String[] strs = str.split(":");
            if (strs.length < 4) {
                return null; // skip malformed lines
            }
            String row = strs[0];
            String family = strs[1];
            String qualifier = strs[2];
            String value = strs[3];
            Put put = new Put(Bytes.toBytes(row));
            // Every cell is written with the fixed timestamp 1.
            put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier), 1L, Bytes.toBytes(value));
            return put;
        }
    }
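The row:family:qualifier:value parsing used by the reducer can be tried out without a cluster. The following is a minimal sketch using only the JDK; the class name CellLineParser and its parse method are hypothetical and not part of the original job. It also differs from the reducer in one respect: it passes a limit of 4 to split, on the assumption that a value containing colons should be kept whole rather than truncated.

```java
import java.util.Arrays;

// Hypothetical helper (not in the original job): parses one input line
// of the form row:family:qualifier:value using only the JDK.
class CellLineParser {
    // Returns {row, family, qualifier, value}, or null if fields are missing.
    // The limit of 4 stops splitting after the third colon, so any further
    // colons stay inside the value field.
    static String[] parse(String line) {
        if (line == null) {
            return null;
        }
        String[] parts = line.split(":", 4);
        return parts.length < 4 ? null : parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("r1:f1:c1:value1")));
        System.out.println(Arrays.toString(parse("r2:f1:c2:a:b")));
        System.out.println(Arrays.toString(parse("r3:f1")));
    }
}
```

With the plain split(":") used in the reducer, the second line would be split into five fields and only "a" would be stored as the value. Note that with a positive limit a trailing empty value (as in "r1:f1:c1:") is accepted, whereas the reducer's version would reject that line.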

HbaseDriver.java

    package com.test;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapred.TableOutputFormat;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class HbaseDriver {
        public static void main(String[] args) {
            JobConf conf = new JobConf(com.test.HbaseDriver.class);
            conf.setMapperClass(com.test.HBaseMapper.class);
            conf.setReducerClass(com.test.HBaseReducer.class);

            conf.setMapOutputKeyClass(LongWritable.class);
            conf.setMapOutputValueClass(Text.class);

            conf.setOutputKeyClass(ImmutableBytesWritable.class);
            conf.setOutputValueClass(Put.class);

            // Send the reducer output to HBase instead of to files.
            conf.setOutputFormat(TableOutputFormat.class);

            FileInputFormat.setInputPaths(conf, "/home/yinjie/input");
            FileOutputFormat.setOutputPath(conf, new Path("/home/yinjie/output"));

            // Target table and the ZooKeeper quorum used to locate HBase.
            conf.set(TableOutputFormat.OUTPUT_TABLE, "t1");
            conf.set("hbase.zookeeper.quorum", "localhost");
            conf.set("hbase.zookeeper.property.clientPort", "2181");
            try {
                JobClient.runJob(conf);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

The /home/yinjie/input directory contains a single file, hbasedata.txt, with the following contents:

    [root@localhost input]# cat hbasedata.txt
    r1:f1:c1:value1
    r2:f1:c2:value2
    r3:f1:c3:value3

Run the job in Eclipse with the MR plugin.
Once the job succeeds, scan the HBase table again to verify that the data was written:

    hbase(main):135:0> scan 't1'
    ROW                              COLUMN+CELL
     r1                              column=f1:c1, timestamp=1, value=value1
     r2                              column=f1:c2, timestamp=1, value=value2
     r3                              column=f1:c3, timestamp=1, value=value3
    3 row(s) in 0.0580 seconds

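The data has been inserted ^_^. TableOutputFormat is not very efficient, though. When loading a large volume of data into HBase, it is better to generate HFiles first and then import them into HBase. HFile is HBase's internal storage format, so loading this way is very fast.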