HBaseAdmin

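This post is mainly an introduction to HBaseAdmin. It turned out to be far more powerful than I expected, and it provides operations in the following areas: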

Just as with the client API, you also have an API for administrative tasks at your disposal. Compare this to the Data Definition Language (DDL) found in RDBMSs, while the client API is more analogous to the Data Manipulation Language (DML).

It provides operations to create tables with specific column families, check for table existence, alter table and column family definitions, drop tables, and much more. The provided functions can be grouped into related operations, discussed separately below.


1)Basic Operations 

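The basic operations are HBaseAdmin, getMaster, isMasterRunning, getConnection, getConfiguration, and close. HBaseAdmin is the constructor; getMaster returns HMasterInterface, the interface a client uses to talk to the master; isMasterRunning checks whether the master is alive; getConnection obtains a connection; getConfiguration returns the configuration; and close releases the resources held by the HBaseAdmin instance.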

2)Table Operations

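These are the table-related operations: the createTable overloads with their various parameters, tableExists to check whether a table exists, listTables to list all tables, getTableDescriptor to look up the descriptor for a given table name, deleteTable to delete a table, and disableTable/enableTable/isTableEnabled/isTableDisabled to disable or enable a table or query its state. With these you can manipulate a table however you like: if you no longer want a table, disable it and then delete it.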
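The disable-then-delete order matters, because HBase refuses to delete a table that is still enabled. A minimal sketch of that lifecycle follows, written as a toy in-memory model rather than against the real client API; the class ToyTableCatalog and its exception choices are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of the table lifecycle. This is NOT the HBase client
// API; it only mirrors the method names to show the required call order.
class ToyTableCatalog {
    // table name -> enabled flag
    private final Map<String, Boolean> enabledByTable = new HashMap<>();

    // A newly created table starts out enabled.
    void createTable(String name) {
        if (enabledByTable.containsKey(name)) {
            throw new IllegalStateException("table already exists: " + name);
        }
        enabledByTable.put(name, true);
    }

    boolean tableExists(String name) {
        return enabledByTable.containsKey(name);
    }

    boolean isTableEnabled(String name) {
        return lookup(name);
    }

    void disableTable(String name) {
        lookup(name);
        enabledByTable.put(name, false);
    }

    void enableTable(String name) {
        lookup(name);
        enabledByTable.put(name, true);
    }

    // Deleting is only allowed once the table has been disabled.
    void deleteTable(String name) {
        if (lookup(name)) {
            throw new IllegalStateException("table must be disabled first: " + name);
        }
        enabledByTable.remove(name);
    }

    // Returns the enabled flag, or fails if the table is unknown.
    private boolean lookup(String name) {
        Boolean enabled = enabledByTable.get(name);
        if (enabled == null) {
            throw new IllegalArgumentException("no such table: " + name);
        }
        return enabled;
    }
}
```

In this model, calling deleteTable on an enabled table fails, while disableTable followed by deleteTable succeeds.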

3)Schema Operations

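Schema operations work on the HBase schema. Because a table needs its table name and column families defined before it can be used, addColumn, deleteColumn, and modifyColumn are available to define or change column families and to make other schema-related changes.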

4)Cluster Operations

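This is the part I consider the most powerful. With these calls the whole cluster is under your control, and the corresponding interfaces let you perform many advanced operations. The original English description follows: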

The last group of operations the HBaseAdmin class exposes is related to cluster operations. They allow you to check the status of the cluster, and perform tasks on tables and/or regions.

You can use checkHBaseAvailable() to verify that your client application can communicate with the remote HBase cluster, as specified in the given configuration file. If it fails to do so, an exception is thrown. In other words, this method does not return a boolean flag: it either silently succeeds or throws the said error.
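Because the check reports failure only by throwing, callers that want a yes/no answer have to wrap it. The following is a small sketch of such a wrapper; the ThrowingCheck interface and the Availability class are hypothetical names introduced here, not part of HBase.

```java
// Hypothetical stand-in for any check that signals failure by throwing
// instead of returning a flag.
interface ThrowingCheck {
    void run() throws Exception;
}

class Availability {
    // Returns true if the check completes normally, false if it throws.
    static boolean isAvailable(ThrowingCheck check) {
        try {
            check.run();
            return true;
        } catch (Exception e) {
            // The exception carries the reason; a real caller would likely log it.
            return false;
        }
    }
}
```

In an application you would pass a lambda that invokes checkHBaseAvailable() with your configuration. Swallowing the exception loses the failure reason, so this only makes sense where a plain boolean is really all that is needed.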

The getClusterStatus() call, on the other hand, allows you to retrieve an instance of the ClusterStatus class, containing detailed information about the cluster status. See the section called “Cluster Status Information” for what it provides.

Use the closeRegion() calls to close regions that have previously been deployed to region servers. Any enabled table has all regions enabled, so you could actively close and undeploy one.

You need to supply the exact regionname as stored in the .META. table. Further, you may optionally supply the hostAndPort parameter, which overrides the server assignment found in the .META. table.

Using this close call bypasses any master notification, i.e., the region is directly closed by the region server, unseen by the master node.

As updates to a region, and the table in general, accumulate, the MemStore instances of the region servers fill with unflushed modifications. A client application can use the synchronous flush() methods to flush such pending records to disk, before they are implicitly written by hitting the Memstore Flush Size (see the section called “Table Properties”) at a later time.

The method takes either a region name or a table name. The value provided by your code is tested to see whether it matches an existing table: if it does, it is assumed to be a table, otherwise it is treated as a region name. If you specify neither a proper table nor region name, an UnknownRegionException is thrown.
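The table-first, region-second lookup described above can be sketched as a toy resolver. This is an illustrative model, not HBase's implementation: the class ToyFlushResolver, its map-based catalog, and the UnknownNameException type (standing in for UnknownRegionException) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Toy model of resolving a "table name or region name" argument.
class ToyFlushResolver {
    // Hypothetical stand-in for HBase's UnknownRegionException.
    static class UnknownNameException extends RuntimeException {
        UnknownNameException(String name) {
            super("neither a table nor a region: " + name);
        }
    }

    // table name -> names of the regions that make up the table
    private final Map<String, List<String>> regionsByTable;

    ToyFlushResolver(Map<String, List<String>> regionsByTable) {
        this.regionsByTable = regionsByTable;
    }

    // Returns the regions a flush of the given name would touch.
    List<String> resolve(String tableNameOrRegionName) {
        // Step 1: a match against an existing table wins.
        List<String> tableRegions = regionsByTable.get(tableNameOrRegionName);
        if (tableRegions != null) {
            return new ArrayList<>(tableRegions);
        }
        // Step 2: otherwise treat the value as a region name.
        for (List<String> regions : regionsByTable.values()) {
            if (regions.contains(tableNameOrRegionName)) {
                return Collections.singletonList(tableNameOrRegionName);
            }
        }
        // Step 3: neither a table nor a region.
        throw new UnknownNameException(tableNameOrRegionName);
    }
}
```

One consequence of checking tables first is that a value naming both a table and a region would always resolve to the table.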

Similar to the flush calls above, for compact() you must give either a table or region name. The call itself is asynchronous, as compactions can potentially take a long time to complete. Invoking this method queues the table, or region, for compaction, which is executed in the background by the server hosting the named region, or by all servers hosting any region of the given table (see the section called “Auto Sharding” for details on compactions).

The majorCompact() calls work the same as the compact() calls, but queue the region, or table, for a major compaction instead. In case a table name is given, the administrative API iterates over all regions of the table and invokes the compaction call implicitly for each of them.

The split() calls allow you to split a specific region, or table. In case a table name is given, the call iterates over all regions of that table and implicitly invokes the split command on each of them.

A noted exception to this rule is when the splitPoint parameter is given. In that case the split() command will try to split the given region at the provided row key. If a table name is specified, all regions are checked and the one containing the splitPoint is split at the given key.

The splitPoint must be a valid row key and, in case you specify a region name, be part of the region to be split. It also must be greater than the region's start key, since splitting a region at its start key would make no sense. If you fail to give the correct row key, the split request is ignored without reporting back to the client. The region server currently hosting the region will log this locally with the following message:
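Since an invalid split point is silently ignored, a client may want to check the key itself before issuing the request. Here is a minimal sketch of such a pre-check. It assumes row keys compare lexicographically as unsigned bytes and that an empty end key means the region is unbounded on the right; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Client-side pre-check for a split point; a sketch, not HBase's own logic.
class ToySplitCheck {
    // Lexicographic comparison that treats each byte as unsigned.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // Equal on the shared prefix: the shorter key sorts first.
        return a.length - b.length;
    }

    // True if splitPoint is strictly after startKey and strictly before
    // endKey. Assumption: an empty endKey means "no upper bound".
    static boolean isValidSplitPoint(byte[] startKey, byte[] endKey, byte[] splitPoint) {
        boolean afterStart = compare(splitPoint, startKey) > 0;
        boolean beforeEnd = endKey.length == 0 || compare(splitPoint, endKey) < 0;
        return afterStart && beforeEnd;
    }

    // Convenience helper for building keys from strings in examples.
    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

For a region spanning keys "b" to "m", this accepts "g" but rejects "b" (the start key itself) and "z" (outside the region).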

When a client requires a region to be deployed to or undeployed from the region servers, it can invoke the assign() and unassign() calls. The first assigns a region, based on the overall assignment plan, while the second unassigns the given region.

The force parameter set to true has different meanings for each of the calls. First, for assign() it forces the region to be marked as unassigned in ZooKeeper before continuing in its attempt to assign the region to a new region server. Be careful when using this on already assigned regions.

Second, for unassign() it means that a region already marked to be unassigned, for example from a previous call to unassign(), is forced to be unassigned again. If force were set to false, the call would have no effect in that situation.

Using the move() call enables a client to actively control which server is hosting what regions. You can move a region from its current region server to a new one. The destServerName parameter can be set to null to pick a new server at random; otherwise it must be a valid server name, running a region server process. If the server name is wrong, or the server is currently not responding, the region is deployed to a different server instead. In a worst-case scenario the move could fail and leave the region unassigned.

The first method, balanceSwitch(), allows you to switch the region balancer on or off. When the balancer is enabled, a call to balancer() starts the process of moving regions from the servers with more deployed regions to those with fewer. The section called “Load Balancing” explains how this works in detail.
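To make the idea of moving regions from heavily loaded servers to lightly loaded ones concrete, here is a toy model of a switchable balancer. It is a deliberately naive greedy loop and is not the algorithm HBase uses; the class ToyBalancer, the per-server region counts, and the choice to return the previous switch state are assumptions made for illustration.

```java
import java.util.Map;

// Toy balancer: repeatedly moves one region from the most loaded server to
// the least loaded one. Purely illustrative; HBase's balancer is different.
class ToyBalancer {
    private boolean enabled = true;

    // Turns balancing on or off and returns the previous setting
    // (an assumption of this model).
    boolean balanceSwitch(boolean on) {
        boolean previous = enabled;
        enabled = on;
        return previous;
    }

    // Evens out regionCounts in place (server name -> number of regions)
    // until the busiest and idlest servers differ by at most one region.
    // Returns the number of region moves; does nothing when switched off.
    int balance(Map<String, Integer> regionCounts) {
        if (!enabled || regionCounts.size() < 2) {
            return 0;
        }
        int moves = 0;
        while (true) {
            String busiest = null;
            String idlest = null;
            for (Map.Entry<String, Integer> e : regionCounts.entrySet()) {
                if (busiest == null || e.getValue() > regionCounts.get(busiest)) {
                    busiest = e.getKey();
                }
                if (idlest == null || e.getValue() < regionCounts.get(idlest)) {
                    idlest = e.getKey();
                }
            }
            if (regionCounts.get(busiest) - regionCounts.get(idlest) <= 1) {
                return moves;
            }
            // Move a single region and re-evaluate.
            regionCounts.put(busiest, regionCounts.get(busiest) - 1);
            regionCounts.put(idlest, regionCounts.get(idlest) + 1);
            moves++;
        }
    }
}
```

With three servers holding 6, 1, and 2 regions, this model ends with 3 regions on each server; with the switch off, the counts are left untouched.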

The shutdown(), stopMaster(), and stopRegionServer() calls shut down the entire cluster, stop the master server, or stop a particular region server only, respectively. Once invoked, the affected servers are stopped, i.e., there is no delay and no way to revert the process.
