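This article was tested on HBase 0.96.0. The protobuf-based endpoint API used here was introduced in HBase 0.96, so the code applies to 0.96 and later; HBase 0.94 uses an older, Writable-based CoprocessorProtocol endpoint API instead.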
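HBase has two kinds of coprocessors:
1. RegionObserver: similar to a trigger in a relational database.
2. Endpoint: similar to a stored procedure in a relational database. This article covers this kind of coprocessor.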
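An Endpoint lets you define your own dynamic RPC protocol for communication between the client and the region servers. Because a coprocessor runs in the same process as the region server, you can define your own methods (endpoints) on the region side and push the computation there, so that only the results, not the raw rows, travel over the network. Endpoints are commonly used to extend HBase with aggregate operations such as count and sum.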
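To make the idea concrete, here is a plain-Java model (no HBase dependency) of why an endpoint saves network traffic: each region counts its own rows locally and returns a single number, and the client only adds those numbers up. FakeRegion and countAllRows are hypothetical illustration names, not HBase classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, HBase-free model of the endpoint pattern.
 * FakeRegion and countAllRows are illustration names, not HBase APIs.
 */
public class EndpointModel {

    /** Stands in for one region, which owns a slice of the table's row keys. */
    static final class FakeRegion {
        private final List<String> rowKeys;

        FakeRegion(List<String> rowKeys) {
            this.rowKeys = rowKeys;
        }

        /** Runs "on the server": only this single long would cross the network. */
        long countRows() {
            return rowKeys.size();
        }
    }

    /** Runs "on the client": one call per region, then a local sum. */
    static long countAllRows(List<FakeRegion> regions) {
        long total = 0;
        for (FakeRegion region : regions) {
            total += region.countRows();
        }
        return total;
    }

    public static void main(String[] args) {
        List<FakeRegion> regions = new ArrayList<FakeRegion>();
        regions.add(new FakeRegion(Arrays.asList("r1", "r2", "r3")));
        regions.add(new FakeRegion(Arrays.asList("r4", "r5")));
        System.out.println("total rows = " + countAllRows(regions)); // prints "total rows = 5"
    }
}
```

The real endpoint below follows the same split: the region-side method does the scan, and the client merely combines the per-region answers.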
This article uses a row count as the example and implements a custom endpoint in the following steps:
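I. Define a protocol buffer Service
1. Install protobuf
Download protoc-2.5.0-win32.zip (choose the package that matches your operating system) and unzip it.
Copy protoc.exe from protoc-2.5.0-win32 to c:\windows\system32.
Also copy protoc.exe into the src directory of the unpacked protobuf source, XXX\protobuf-2.5.0\src.
HBase 0.96 is built against protobuf 2.5.0, so use protoc 2.5.0 to generate code that matches the protobuf-java library on the cluster.
Reference: http://shuofenglxy.iteye.com/blog/1512980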
2. Write the .proto file, which defines the request and response messages and the service
CXKTest.proto:
```protobuf
option java_package = "com.cxk.coprocessor.test.generated";
option java_outer_classname = "CXKTestProtos";
option java_generic_services = true;
option java_generate_equals_and_hash = true;
option optimize_for = SPEED;

message CountRequest {
}

message CountResponse {
    required int64 count = 1 [default = 0];
}

service RowCountService {
    rpc getRowCount(CountRequest)
        returns (CountResponse);
}
```
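Reference: https://developers.google.com/protocol-buffers/docs/proto#services
3. Generate the Java code with protoc
Run: protoc.exe --java_out=. CXKTest.proto
This generates the class CXKTestProtos in the package com.cxk.coprocessor.test.generated (as set by java_package and java_outer_classname).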
II. Define your own Endpoint class (implementing the service method)
RowCountEndpoint.java:
```java
package com.cxk.coprocessor.test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.CoprocessorException;
import org.apache.hadoop.hbase.coprocessor.CoprocessorService;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.protobuf.ResponseConverter;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.util.Bytes;

// Generated by protoc from CXKTest.proto (java_package = com.cxk.coprocessor.test.generated)
import com.cxk.coprocessor.test.generated.CXKTestProtos;
import com.google.protobuf.RpcCallback;
import com.google.protobuf.RpcController;
import com.google.protobuf.Service;

public class RowCountEndpoint extends CXKTestProtos.RowCountService
        implements Coprocessor, CoprocessorService {
    private RegionCoprocessorEnvironment env;

    public RowCountEndpoint() {
    }

    @Override
    public Service getService() {
        return this;
    }

    /**
     * Counts the rows in the region this endpoint instance is loaded on.
     * The client receives one such count per region and must add them up
     * to get the table's total.
     */
    @Override
    public void getRowCount(RpcController controller, CXKTestProtos.CountRequest request,
            RpcCallback<CXKTestProtos.CountResponse> done) {
        Scan scan = new Scan();
        // Return only the first cell of each row: enough to count rows, minimal I/O
        scan.setFilter(new FirstKeyOnlyFilter());
        CXKTestProtos.CountResponse response = null;
        InternalScanner scanner = null;
        try {
            scanner = env.getRegion().getScanner(scan);
            List<Cell> results = new ArrayList<Cell>();
            boolean hasMore = false;
            byte[] lastRow = null;
            long count = 0;
            do {
                hasMore = scanner.next(results);
                for (Cell kv : results) {
                    byte[] currentRow = CellUtil.cloneRow(kv);
                    // Count each distinct row key once
                    if (lastRow == null || !Bytes.equals(lastRow, currentRow)) {
                        lastRow = currentRow;
                        count++;
                    }
                }
                results.clear();
            } while (hasMore);

            response = CXKTestProtos.CountResponse.newBuilder()
                .setCount(count).build();
        } catch (IOException ioe) {
            // Report the failure to the client through the RPC controller
            ResponseConverter.setControllerException(controller, ioe);
        } finally {
            if (scanner != null) {
                try {
                    scanner.close();
                } catch (IOException ignored) {}
            }
        }
        done.run(response);
    }

    @Override
    public void start(CoprocessorEnvironment env) throws IOException {
        if (env instanceof RegionCoprocessorEnvironment) {
            this.env = (RegionCoprocessorEnvironment) env;
        } else {
            throw new CoprocessorException("Must be loaded on a table region!");
        }
    }

    @Override
    public void stop(CoprocessorEnvironment env) throws IOException {
        // nothing to do
    }
}
```
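The counting loop in getRowCount() can be checked without a cluster. In this sketch each byte[] stands for the row key that CellUtil.cloneRow(kv) returns for one cell, and each inner list stands for the cells returned by one scanner.next(results) call; RowCounter is a hypothetical name. With FirstKeyOnlyFilter every row yields a single cell, so the lastRow comparison is mainly a safeguard for scans that return several cells per row or split a row across next() calls.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, HBase-free copy of the endpoint's row-counting loop. */
public class RowCounter {

    static long countRows(List<List<byte[]>> batches) {
        byte[] lastRow = null;          // kept across batches, as in the endpoint
        long count = 0;
        for (List<byte[]> batch : batches) {        // one iteration per next() call
            for (byte[] currentRow : batch) {
                // A new row starts whenever the key differs from the previous cell's key
                if (lastRow == null || !Arrays.equals(lastRow, currentRow)) {
                    lastRow = currentRow;
                    count++;
                }
            }
        }
        return count;
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<List<byte[]>> batches = new ArrayList<List<byte[]>>();
        // Row "r1" contributes two cells, as it could without FirstKeyOnlyFilter
        batches.add(Arrays.asList(key("r1"), key("r1"), key("r2")));
        // Row "r2" continues into the next batch and must not be counted twice
        batches.add(Arrays.asList(key("r2"), key("r3")));
        System.out.println(countRows(batches)); // prints 3
    }
}
```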
III. Implement the client
TestEndPoint.java:
```java
package com.test;

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.ipc.BlockingRpcCallback;
import org.apache.hadoop.hbase.ipc.ServerRpcController;

import com.cxk.coprocessor.test.generated.CXKTestProtos;
import com.cxk.coprocessor.test.generated.CXKTestProtos.RowCountService;
import com.google.protobuf.ServiceException;

public class TestEndPoint {
    /**
     * @param args args[0] = HBase master IP, args[1] = ZooKeeper quorum, args[2] = table name
     * @throws ServiceException
     * @throws Throwable
     */
    public static void main(String[] args) throws ServiceException, Throwable {
        System.out.println("begin.....");
        long begin_time = System.currentTimeMillis();
        Configuration config = HBaseConfiguration.create();
        // String master_ip = "192.168.150.128";
        String master_ip = args[0];
        String zk_ip = args[1];
        String table_name = args[2];
        config.set("hbase.zookeeper.property.clientPort", "2181");
        config.set("hbase.zookeeper.quorum", zk_ip);
        // 60000 is the default HBase master port in 0.96
        config.set("hbase.master", master_ip + ":60000");
        final CXKTestProtos.CountRequest request = CXKTestProtos.CountRequest.getDefaultInstance();
        HTable table = new HTable(config, table_name);

        Map<byte[], Long> results;
        try {
            // null start/end keys: invoke the endpoint on every region of the table
            results = table.coprocessorService(RowCountService.class,
                null, null,
                new Batch.Call<CXKTestProtos.RowCountService, Long>() {
                    public Long call(CXKTestProtos.RowCountService counter) throws IOException {
                        ServerRpcController controller = new ServerRpcController();
                        BlockingRpcCallback<CXKTestProtos.CountResponse> rpcCallback =
                            new BlockingRpcCallback<CXKTestProtos.CountResponse>();
                        counter.getRowCount(controller, request, rpcCallback);
                        CXKTestProtos.CountResponse response = rpcCallback.get();
                        // Rethrow an exception that the endpoint reported via the controller
                        if (controller.failedOnException()) {
                            throw controller.getFailedOn();
                        }
                        return (response != null && response.hasCount()) ? response.getCount() : 0;
                    }
                });
        } finally {
            table.close();
        }

        if (results.size() > 0) {
            // One count per region
            System.out.println(results.values());
        } else {
            System.out.println("No results returned");
        }
        long end_time = System.currentTimeMillis();
        System.out.println("end:" + (end_time - begin_time));
    }
}
```
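The client above prints one count per region, because coprocessorService returns a map keyed by region name with one value for each region the endpoint ran on. To get the table's total row count, add the values. The sketch below is plain Java with made-up region names and counts (RegionCountSum and compareBytes are hypothetical names); in TestEndPoint you could call totalRows(results) alongside printing results.values().

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: turns per-region counts into a table-wide total. */
public class RegionCountSum {

    static long totalRows(Map<byte[], Long> perRegion) {
        long total = 0;
        for (Long regionCount : perRegion.values()) {
            if (regionCount != null) {   // defensive: skip a region without a value
                total += regionCount;
            }
        }
        return total;
    }

    /** Unsigned lexicographic byte[] comparison, since byte[] has no value-based equals. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        Map<byte[], Long> results = new TreeMap<byte[], Long>(RegionCountSum::compareBytes);
        results.put("region-a".getBytes(StandardCharsets.UTF_8), 120L);
        results.put("region-b".getBytes(StandardCharsets.UTF_8), 80L);
        System.out.println(results.size() + " regions, total rows = " + totalRows(results));
        // prints "2 regions, total rows = 200"
    }
}
```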
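IV. Deploy the endpoint
There are two ways to deploy an endpoint: the first edits hbase-site.xml so that the endpoint is loaded for every table; the second uses alter to load the endpoint for a single table.
1. Edit hbase-site.xml
Add the following to hbase-site.xml. As the description says, the jar containing the endpoint must also be on HBase's classpath, and the region servers must be restarted to pick up the change: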
```xml
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>com.cxk.coprocessor.test.RowCountEndpoint</value>
  <description>A comma-separated list of Coprocessors that are loaded by
    default. For any override coprocessor method from RegionObserver or
    Coprocessor, these classes' implementation will be called
    in order. After implementing your own
    Coprocessor, just put it in HBase's classpath and add the fully
    qualified class name here.
  </description>
</property>
```
2. Alter the table from the hbase shell
A. Package the compiled CXKTestProtos and RowCountEndpoint classes into a jar and upload it to HDFS;
B. Disable the table:
```
disable 'test'
```
C. Set the coprocessor attribute. Its value has the form jar path|class name|priority|arguments; arg1=1,arg2=2 are example arguments that RowCountEndpoint does not read:
```
alter 'test','coprocessor'=>'hdfs:///user/hadoop/test/coprocessor/cxkcoprocessor.1.01.jar|com.cxk.coprocessor.test.RowCountEndpoint|1001|arg1=1,arg2=2'
```
D. Re-enable the table:
```
enable 'test'
```
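Step C is easy to get wrong because the attribute value is a single pipe-separated string. As an illustration only, this hypothetical helper (CoprocessorSpec.build is not an HBase API) assembles the value from its four parts and prints the resulting shell command; it does not talk to HBase.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper that builds the 'coprocessor' table attribute value:
 * jar path | fully qualified class name | priority | key=value,key=value
 */
public class CoprocessorSpec {

    static String build(String jarPath, String className, int priority, Map<String, String> args) {
        StringJoiner kv = new StringJoiner(",");
        for (Map.Entry<String, String> e : args.entrySet()) {
            kv.add(e.getKey() + "=" + e.getValue());
        }
        return jarPath + "|" + className + "|" + priority + "|" + kv;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>(); // keeps insertion order
        params.put("arg1", "1");
        params.put("arg2", "2");
        String value = build(
            "hdfs:///user/hadoop/test/coprocessor/cxkcoprocessor.1.01.jar",
            "com.cxk.coprocessor.test.RowCountEndpoint", 1001, params);
        // Prints the same alter command as step C
        System.out.println("alter 'test','coprocessor'=>'" + value + "'");
    }
}
```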
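V. Run the client
Package TestEndPoint.java into a runnable jar (with the HBase client jars and the generated CXKTestProtos class on its classpath) and run it as follows, where the three arguments are the master IP, the ZooKeeper quorum and the table name:
```shell
java -jar test.cxk.endpiont.1.03.jar ip1 ip2 test
```
PS: If your Eclipse setup can connect to the Hadoop/HBase cluster directly, you can also run the test class from Eclipse.
References: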
http://hbase.apache.org/devapidocs/index.html