There are several ways to load MapReduce output into HBase; TableOutputFormat is one of them.
1. Create the HBase table
- hbase(main):132:0* create 't1','f1'
- 0 row(s) in 1.4890 seconds
- hbase(main):133:0> scan 't1'
- ROW COLUMN+CELL
- 0 row(s) in 1.2330 seconds
2. Write the MR job
HBaseMapper.java
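- package com.test;
- import java.io.IOException;
- import org.apache.hadoop.io.LongWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapred.MapReduceBase;
- import org.apache.hadoop.mapred.Mapper;
- import org.apache.hadoop.mapred.OutputCollector;
- import org.apache.hadoop.mapred.Reporter;
- public class HBaseMapper extends MapReduceBase implements Mapper<LongWritable, Text, LongWritable, Text> {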
- @Override
- public void map(LongWritable key, Text values,
- OutputCollector<LongWritable, Text> output, Reporter reporter)
- throws IOException {
- output.collect(key, values);
- }
- }
HBaseReducer.java
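- package com.test;
- import java.io.IOException;
- import java.util.Iterator;
- import org.apache.hadoop.hbase.client.Put;
- import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
- import org.apache.hadoop.hbase.util.Bytes;
- import org.apache.hadoop.io.LongWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapred.MapReduceBase;
- import org.apache.hadoop.mapred.OutputCollector;
- import org.apache.hadoop.mapred.Reducer;
- import org.apache.hadoop.mapred.Reporter;
- public class HBaseReducer extends MapReduceBase implements Reducer<LongWritable, Text, ImmutableBytesWritable, Put> {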
- @Override
- public void reduce(LongWritable key, Iterator<Text> values,
- OutputCollector<ImmutableBytesWritable, Put> output, Reporter reporter)
- throws IOException {
- String value="";
- ImmutableBytesWritable immutableBytesWritable = new ImmutableBytesWritable();
- while(values.hasNext())
- {
- value = values.next().toString();
- if(value != null && !"".equals(value))
- {
- Put put = createPut(value);
- if(put!=null)
- output.collect(immutableBytesWritable, put);
- }
- }
- }
- // str has the form row:family:qualifier:value; this is only a simple simulation
- private Put createPut(String str)
- {
- String[] strs = str.split(":");
- if(strs.length<4)
- return null;
- String row=strs[0];
- String family=strs[1];
- String qualifier=strs[2];
- String value=strs[3];
- Put put = new Put(Bytes.toBytes(row));
- put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier), 1L,Bytes.toBytes(value));
- return put;
- }
- }
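The `row:family:qualifier:value` convention that `createPut` relies on can be tried out without a cluster. The following is a minimal sketch using only the JDK; the `RecordParser` class and its `parse` method are hypothetical names introduced for illustration and are not part of the job. It also makes one assumption that differs from the reducer: a value may itself contain colons, so the line is split with a limit of 4 rather than a plain `split(":")`.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only; not part of the MR job.
public class RecordParser {
    // Splits "row:family:qualifier:value" into its four fields.
    // Returns null for malformed lines, mirroring createPut's null return.
    public static String[] parse(String line) {
        if (line == null || line.isEmpty()) {
            return null;
        }
        // Assumption: a limit of 4 keeps any extra ':' inside the value field.
        String[] fields = line.split(":", 4);
        return fields.length < 4 ? null : fields;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("r1:f1:c1:value1"))); // well-formed line
        System.out.println(Arrays.toString(parse("r1:f1:c1")));        // too few fields
        System.out.println(Arrays.toString(parse("r2:f1:c2:a:b")));    // value contains ':'
    }
}
```

With the plain `split(":")` used in the reducer, a line such as `r2:f1:c2:a:b` would still pass the length check, but only `a` would be stored as the value.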
HbaseDriver.java
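- package com.test;
- import org.apache.hadoop.fs.Path;
- import org.apache.hadoop.hbase.client.Put;
- import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
- import org.apache.hadoop.hbase.mapred.TableOutputFormat;
- import org.apache.hadoop.io.LongWritable;
- import org.apache.hadoop.io.Text;
- import org.apache.hadoop.mapred.FileInputFormat;
- import org.apache.hadoop.mapred.FileOutputFormat;
- import org.apache.hadoop.mapred.JobClient;
- import org.apache.hadoop.mapred.JobConf;
- public class HbaseDriver {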
- public static void main(String[] args) {
- JobConf conf = new JobConf(com.test.HbaseDriver.class);
- conf.setMapperClass(com.test.HBaseMapper.class);
- conf.setReducerClass(com.test.HBaseReducer.class);
- conf.setMapOutputKeyClass(LongWritable.class);
- conf.setMapOutputValueClass(Text.class);
- conf.setOutputKeyClass(ImmutableBytesWritable.class);
- conf.setOutputValueClass(Put.class);
- conf.setOutputFormat(TableOutputFormat.class);
- FileInputFormat.setInputPaths(conf, "/home/yinjie/input");
- FileOutputFormat.setOutputPath(conf, new Path("/home/yinjie/output"));
- conf.set(TableOutputFormat.OUTPUT_TABLE, "t1");
- conf.set("hbase.zookeeper.quorum", "localhost");
- conf.set("hbase.zookeeper.property.clientPort", "2181");
- try {
- JobClient.runJob(conf);
- } catch (Exception e) {
- e.printStackTrace();
- }
- }
- }
The /home/yinjie/input directory contains a single file, hbasedata.txt, with the following contents:
- [root@localhost input]# cat hbasedata.txt
- r1:f1:c1:value1
- r2:f1:c2:value2
- r3:f1:c3:value3
Run the job from Eclipse using the MR plugin.
Once the job succeeds, scan the HBase table again to verify that the data was written:
- hbase(main):135:0> scan 't1'
- ROW COLUMN+CELL
- r1 column=f1:c1, timestamp=1, value=value1
- r2 column=f1:c2, timestamp=1, value=value2
- r3 column=f1:c3, timestamp=1, value=value3
- 3 row(s) in 0.0580 seconds
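The data has been inserted ^_^. TableOutputFormat is not very efficient, though. When loading large volumes of data into HBase, it is better to generate HFiles first and then import them into HBase. HFile is HBase's internal storage format, so loading this way is fast.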