Many thanks to the original poster for sharing!
Reposted from: http://blog.csdn.net/xiao_jun_0820/article/details/25413919
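HBase ships with an aggregation coprocessor class: org.apache.hadoop.hbase.coprocessor.AggregateImplementation. You can use it to count the total number of rows in a table.
Of course, you can also get the count in the HBase shell with count <table_name>. I compared the execution time of the two on a table of mine with more than 7 million rows: count in the HBase shell took a full 12 minutes, while the coprocessor took only 78 seconds, roughly nine times faster. That shows how powerful coprocessors are.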
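To give some intuition for the gap, here is a toy model of the idea behind the aggregation coprocessor: each region counts its own rows and only one number per region travels back to the client, which sums them. This is a minimal sketch in plain Java, not HBase code; the class RegionCountSketch, the method parallelRowCount, and the in-memory lists standing in for regions are all hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: plain Java lists stand in for HBase regions.
public class RegionCountSketch {

    // Counts each "region" in its own task and sums the partial counts,
    // so only one number per region reaches the caller.
    public static long parallelRowCount(List<List<String>> regions) throws Exception {
        if (regions.isEmpty()) {
            return 0L; // a fixed thread pool cannot be created with zero threads
        }
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (List<String> region : regions) {
                tasks.add(() -> (long) region.size());
            }
            long total = 0;
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                total += partial.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Three hypothetical regions holding 3, 2 and 1 row keys.
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("row-a", "row-b", "row-c"),
                Arrays.asList("row-m", "row-n"),
                Arrays.asList("row-x"));
        System.out.println("rowcount:" + parallelRowCount(regions));
    }
}
```

In the real cluster the per-region work happens on the region servers, so the rows themselves never have to be shipped to the client just to be counted.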
Adding the coprocessor through the HBase API:
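// Requires: org.apache.hadoop.conf.Configuration, org.apache.hadoop.hbase.HBaseConfiguration,
// org.apache.hadoop.hbase.HTableDescriptor, org.apache.hadoop.hbase.client.HBaseAdmin,
// org.apache.hadoop.hbase.coprocessor.AggregateImplementation
Configuration hbaseconfig = HBaseConfiguration.create();

HBaseAdmin hbaseAdmin = new HBaseAdmin(hbaseconfig);
// Disable the table before changing its descriptor
hbaseAdmin.disableTable(TABLE_NAME);

HTableDescriptor htd = hbaseAdmin.getTableDescriptor(TABLE_NAME);
// Register the built-in aggregation coprocessor on this table
htd.addCoprocessor(AggregateImplementation.class.getName());
hbaseAdmin.modifyTable(TABLE_NAME, htd);
// Bring the table back online and release the admin connection
hbaseAdmin.enableTable(TABLE_NAME);
hbaseAdmin.close();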
Using the aggregation coprocessor provided by HBase:
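// Requires: org.apache.hadoop.hbase.client.Scan, org.apache.hadoop.hbase.util.Bytes, java.util.Date,
// org.apache.hadoop.hbase.client.coprocessor.AggregationClient,
// org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter
AggregationClient aggregationClient = new AggregationClient(hbaseconfig);
Scan scan = new Scan();
// Restrict the scan to the "fr" column family
scan.addFamily(Bytes.toBytes("fr"));
Date start = new Date();
// The count is evaluated by the coprocessor on the region servers
long rowcount = aggregationClient.rowCount(TABLE_NAME, new LongColumnInterpreter(), scan);
Date end = new Date();
System.out.println("rowcount:" + rowcount);
// Elapsed time in milliseconds
System.out.println("timecost:" + (end.getTime() - start.getTime()));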
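As a side note on the timing itself: subtracting two Date values works, but wall-clock time can jump if the system clock is adjusted, whereas System.nanoTime() is meant for measuring elapsed time. Below is a small sketch of a reusable timer built on it. The class TimedCall, its nested Timed holder, and the stand-in workload returning 42 are hypothetical; in practice the body of the lambda would be whatever call is being measured.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

public class TimedCall {

    // Holds a result together with the elapsed time in milliseconds.
    public static final class Timed<T> {
        public final T value;
        public final long millis;

        Timed(T value, long millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    // Runs the given work once and measures it with the monotonic clock.
    public static <T> Timed<T> time(Callable<T> work) throws Exception {
        long start = System.nanoTime();
        T value = work.call();
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Timed<>(value, elapsed);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload; replace the lambda body with the call to measure.
        Timed<Long> result = time(() -> 42L);
        System.out.println("rowcount:" + result.value);
        System.out.println("timecost:" + result.millis + " ms");
    }
}
```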
Adding a coprocessor from the HBase shell:
disable 'member'
alter 'member',METHOD => 'table_att','coprocessor' => 'hdfs://master24:9000/user/hadoop/jars/test.jar|mycoprocessor.SampleCoprocessor|1001|'
enable 'member'
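The value passed to 'coprocessor' above packs several fields into one string separated by |. As far as I understand the format, they are the jar path, the coprocessor class name, the priority, and an optional list of key=value arguments. The sketch below splits such a string into its parts so the layout is easier to see. CoprocessorSpec and parse are hypothetical helpers written for this post, not part of HBase, and the sketch assumes the priority field is always present.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits a "path|class|priority|args" attribute value.
public class CoprocessorSpec {
    public final String jarPath;
    public final String className;
    public final int priority;
    public final Map<String, String> args;

    CoprocessorSpec(String jarPath, String className, int priority, Map<String, String> args) {
        this.jarPath = jarPath;
        this.className = className;
        this.priority = priority;
        this.args = args;
    }

    public static CoprocessorSpec parse(String spec) {
        // A limit of -1 keeps the trailing empty field after the last '|'.
        String[] parts = spec.split("\\|", -1);
        if (parts.length < 3) {
            throw new IllegalArgumentException("expected path|class|priority[|args]: " + spec);
        }
        Map<String, String> args = new LinkedHashMap<>();
        if (parts.length > 3 && !parts[3].trim().isEmpty()) {
            // Arguments are assumed to look like k1=v1,k2=v2.
            for (String pair : parts[3].split(",")) {
                String[] kv = pair.split("=", 2);
                args.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
            }
        }
        return new CoprocessorSpec(parts[0].trim(), parts[1].trim(),
                Integer.parseInt(parts[2].trim()), args);
    }

    public static void main(String[] unused) {
        CoprocessorSpec spec = parse(
                "hdfs://master24:9000/user/hadoop/jars/test.jar|mycoprocessor.SampleCoprocessor|1001|");
        System.out.println("jar: " + spec.jarPath);
        System.out.println("class: " + spec.className);
        System.out.println("priority: " + spec.priority);
        System.out.println("args: " + spec.args);
    }
}
```

For the string used in this post the argument field is empty, which is why the value ends with a trailing |.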
Removing a coprocessor from the HBase shell:
disable 'member'
alter 'member',METHOD => 'table_att_unset',NAME =>'coprocessor$1'
enable 'member'