Scratching the Surface of HBase Coprocessors

Introduction to Coprocessors

This post is just a set of simple annotations with some of my own impressions; I don't enjoy translating English, so the quoted content is taken entirely from http://ofps.oreilly.com/titles/9781449396107/clientapisadv.html#clientapiscoprocessors

Using the client API, combined with specific selector mechanisms, such as filters, or column family scoping, it is possible to limit what data is transferred to the client. It would be good, though, to take this further and, for example, perform certain operations directly on the server side while only returning a small result set. Think of this as a small MapReduce framework that distributes work across the entire cluster.

Note: a coprocessor is like a small MapReduce framework. It lets us run our own code on the region servers, putting the whole HBase cluster at our disposal.


Coprocessors enable you to run arbitrary code directly on each region server. More precisely, it executes the code on a per-region basis, giving you trigger-like functionality - similar to stored procedures in the RDBMS world. From the client side you do not have to take specific actions, as the framework handles the distributed nature transparently.

Note: RDBMSs have two very powerful features, stored procedures and triggers, and coprocessors provide similar functionality. An observer is like a trigger in an RDBMS: it lets us hook our code into various HBase events (put operations, get operations, region splits or moves, and so on), and run that code either before (pre*) or after (post*) the corresponding operation. Keep in mind that the hooked code operates on a single region; to operate on all regions you need an endpoint. An endpoint is like a stored procedure in an RDBMS: our code runs directly on the region server instead of waiting for an HBase event to trigger it. It mainly provides a proxy for operating on a single region and a proxy for operating on multiple regions, so the coprocessor framework has already thought through a lot for you and leaves plenty of room to work in.


There is a set of implicit events that you can use to hook into, performing auxiliary tasks. If this is not enough, then you can also extend the RPC protocol to introduce your own set of calls, which are invoked from your client and executed on the server on your behalf.

Note: if the hooks do not meet your needs, consider an endpoint!


Just as with the custom filters (see the section called “Custom Filters”) you need to create special Java classes that implement specific interfaces. Once compiled, you make these classes available to the servers in the form of a Java Archive file (jar). The region server process can instantiate these classes and execute them in the correct environment. In contrast to the filters, though, coprocessors can be loaded dynamically as well. This allows you to extend the functionality of a running HBase cluster.

Note: we write the coprocessor code and then place it somewhere on the HBase classpath so that HBase can find it. The loading approach I use is to configure it in hbase-site.xml, telling HBase which of my own coprocessors to execute. Besides the approach I use, there is another static load method and a dynamic method (the dynamic one does not provide an API yet).
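As a sketch of that hbase-site.xml approach: region observers can be registered under the `hbase.coprocessor.region.classes` property (the class name below is hypothetical; the jar containing it must be on the region server classpath):

```xml
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>com.example.MyRegionObserver</value>
</property>
```

Classes listed this way are loaded for every region of every table, so this is a cluster-wide setting.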

Use-cases for coprocessors are, for instance, using hooks into row mutation operations to maintain secondary indexes, or implementing some kind of referential integrity. Filters could be enhanced to become stateful and therefore make decisions across row boundaries. Aggregate functions, such as sum() or avg(), known from RDBMSs and SQL, could be moved to the servers to scan the data locally and only return the single numeric result across the network.

Note: several use cases, then. Various aggregation operations, such as sum and avg (code for aggregation functions can already be seen in the coprocessor sources); access control, which apparently is already being worked on; and secondary indexes, whose design is still under discussion.
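The aggregation idea can be illustrated with a toy scatter-gather sketch. This is plain Java with mock "regions", not the real HBase endpoint API: each region computes a partial sum locally, and only the small partial results cross the network to the client.

```java
import java.util.Arrays;
import java.util.List;

public class SumEndpointSketch {
    // Stand-in for one region's data; a real endpoint would scan the region's rows.
    static class Region {
        final long[] values;
        Region(long[] values) { this.values = values; }
        long partialSum() {                       // server-local computation
            return Arrays.stream(values).sum();
        }
    }

    // Client side: only one small number per region crosses the "network".
    static long clusterSum(List<Region> regions) {
        return regions.stream().mapToLong(Region::partialSum).sum();
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
            new Region(new long[]{1, 2, 3}),
            new Region(new long[]{10, 20}));
        System.out.println(clusterSum(regions)); // prints 36
    }
}
```

The point of the sketch is only the data-flow shape: without an endpoint, all rows would be shipped to the client for summing; with one, each server returns a single long.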

The framework already provides classes, based on the coprocessor framework, which you can use to extend from when implementing your own functionality. They fall into two main groups: observer and endpoint. Here is a brief overview of their purpose:

Note: both the Observer and Endpoint interfaces already exist; check out the latest HBase code from git and you can run a few coprocessor examples.

Observer

This type of coprocessor is comparable to triggers: callback functions (also referred to here as hooks) are executed when certain events occur. This includes user-generated events, but also server-internal, automated events.

The interfaces provided by the coprocessor framework are:
RegionObserver
You can handle data manipulation events with this kind of observer. They are closely bound to the regions of a table.
MasterObserver
Can be used to react to administrative or DDL-type operations. These are cluster-wide events.
WALObserver
Provides hooks into the write-ahead log processing.

Observers provide you with well-defined event callbacks for every operation a cluster server may handle.

Note: Observer mainly provides three interfaces because the events in an HBase cluster essentially happen in three places: the region server (operations on regions, e.g. put, get, scan, split region), the master server (cluster-wide administration, e.g. move region, assign region), and the WAL (which keeps data safe, e.g. the log roller). The callbacks in the three interfaces therefore all target events in these three places.
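To make the pre*/post* naming concrete, here is a self-contained toy sketch of the hook pattern (mock interfaces, not the real org.apache.hadoop.hbase.coprocessor API): an observer gets a callback before and after a "put" on a region.

```java
import java.util.ArrayList;
import java.util.List;

public class HookSketch {
    // Mock of the RegionObserver idea: callbacks around a data operation.
    interface PutObserver {
        void prePut(String row);   // runs before the mutation
        void postPut(String row);  // runs after the mutation
    }

    static class Region {
        final List<String> rows = new ArrayList<>();
        final List<PutObserver> observers = new ArrayList<>();

        void put(String row) {
            observers.forEach(o -> o.prePut(row));
            rows.add(row);                          // the actual mutation
            observers.forEach(o -> o.postPut(row));
        }
    }

    public static void main(String[] args) {
        Region region = new Region();
        region.observers.add(new PutObserver() {
            public void prePut(String row)  { System.out.println("pre:" + row); }
            public void postPut(String row) { System.out.println("post:" + row); }
        });
        region.put("row1"); // prints pre:row1 then post:row1
    }
}
```

A real RegionObserver works the same way in spirit: the region server invokes your pre*/post* methods around its own operation, and a pre* hook could, for example, update a secondary index before the put lands.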

Endpoint

Next to event handling there is also a need to add custom operations to a cluster. User code can be deployed to the servers hosting the data to, for example, perform server-local computations.

Endpoints are dynamic extensions to the RPC protocol, adding callable remote procedures. Think of them as stored procedures, as known from RDBMSs. They may be combined with observer implementations to directly interact with the server-side state.

Note: an endpoint frees you from caring about HBase events; your code runs directly. It mainly provides two proxies: coprocessorProxy and coprocessorExec.

All of these interfaces are based on the Coprocessor interface to gain common features, but then implement their own specific functionality.

Finally, coprocessors can be chained, very similar to what the Java Servlet API does with request filters. The following discusses the various types available in the coprocessor framework.
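The chaining remark can be sketched as follows (toy code, not the real CoprocessorHost): several loaded coprocessors are kept in priority order, and each one sees the event in turn, much like servlet request filters.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ChainSketch {
    // Each coprocessor carries a priority; lower values run first.
    static class Hook {
        final int priority;
        final String name;
        Hook(int priority, String name) { this.priority = priority; this.name = name; }
    }

    // Sort by priority and "invoke" each hook in turn, recording the order.
    static List<String> runChain(List<Hook> hooks) {
        List<String> order = new ArrayList<>();
        hooks.stream()
             .sorted(Comparator.comparingInt(h -> h.priority))
             .forEach(h -> order.add(h.name));
        return order;
    }

    public static void main(String[] args) {
        System.out.println(runChain(List.of(
            new Hook(20, "user-observer"),
            new Hook(0, "system-observer"))));
        // prints [system-observer, user-observer]
    }
}
```

The numeric priorities here are made up; the takeaway is only that framework-supplied coprocessors and user coprocessors coexist in one ordered chain for the same event.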

Note: HBase: The Definitive Guide goes on with more detail and walks you through writing some examples. For observers, it is enough to sort out the relationship between Coprocessor, CoprocessorEnvironment, and CoprocessorHost, then know which events to hook your code into and how to obtain resources; for endpoints, you define your own endpoint and use coprocessorProxy and coprocessorExec to hand the work over to the HBase cluster.
