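RPC is the link through which the Master, the RegionServers, and clients communicate in HBase. Understanding HBase's RPC mechanism lays a solid foundation for studying HBase through its source code: once you know how RPC works, you can step through requests in a debugger and quickly follow the flow of mechanisms such as flush, compaction, and scan. It also makes it easier to trace a problem back to its cause in the source; after all, in front of the source code there are no secrets.
RPC (remote procedure call) means calling a procedure that lives on another machine. With a local call, once a function is defined, other parts of the program call it and get the result back. The only difference with RPC is that the function definition and the call site are usually on different machines, so RPC adds a communication step on top of a local function call. Two roles are involved: the caller (the client side) and the side that defines and implements the function (the server side). The figure below shows the flow of an RPC call (image taken from the link).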
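To make the caller and implementer roles concrete, here is a minimal sketch in plain Java. It is not HBase code: the `Calculator` interface, the `"add:2,3"` text encoding, and the in-process `transport` function are all assumptions chosen for illustration. A real RPC stack such as HBase's uses protobuf messages over a network connection instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class MiniRpc {
    /** The contract shared by the caller and the implementer (hypothetical). */
    interface Calculator {
        int add(int a, int b);
    }

    /** Server side: owns the real implementation and dispatches by method name. */
    static class Server {
        private final Map<String, Function<String[], String>> handlers = new HashMap<>();

        Server(Calculator impl) {
            handlers.put("add", args -> Integer.toString(
                impl.add(Integer.parseInt(args[0]), Integer.parseInt(args[1]))));
        }

        /** Decodes a request such as "add:2,3", runs it, and encodes the result. */
        public byte[] handle(byte[] request) {
            String[] parts = new String(request, StandardCharsets.UTF_8).split(":", 2);
            Function<String[], String> handler = handlers.get(parts[0]);
            if (handler == null) {
                throw new IllegalArgumentException("unknown method: " + parts[0]);
            }
            return handler.apply(parts[1].split(",")).getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Client side: looks like a local object, but ships bytes through the transport. */
    static class Stub implements Calculator {
        private final Function<byte[], byte[]> transport;

        Stub(Function<byte[], byte[]> transport) {
            this.transport = transport;
        }

        @Override
        public int add(int a, int b) {
            byte[] request = ("add:" + a + "," + b).getBytes(StandardCharsets.UTF_8);
            byte[] response = transport.apply(request);
            return Integer.parseInt(new String(response, StandardCharsets.UTF_8));
        }
    }

    /** Wires a stub to a server; the "network" here is just a method reference. */
    public static int remoteAdd(int a, int b) {
        Server server = new Server((x, y) -> x + y);
        Calculator client = new Stub(server::handle);
        return client.add(a, b);
    }

    public static void main(String[] args) {
        System.out.println(remoteAdd(2, 3));
    }
}
```

The caller only ever sees the `Calculator` interface. Everything that makes the call "remote" (encoding, transport, dispatch) is hidden behind the stub, which is the same split that HBase's generated service interfaces provide.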
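In HBase, RPC connects the RegionServers, the Master, and clients (such as the HBase shell and the Java client API). RegionServer and Master are the two core components on the HBase server side, and they serve client requests mainly by exposing RPC services. Communication between RegionServer and Master is also implemented through RPC.
The code snippet below shows that the RPC services a RegionServer provides come mainly from two interfaces, ClientService and AdminService.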
    protected List<BlockingServiceAndInterface> getServices() {
      // A RegionServer registers two protobuf services.
      List<BlockingServiceAndInterface> bssi = new ArrayList<BlockingServiceAndInterface>(2);
      bssi.add(new BlockingServiceAndInterface(
        ClientService.newReflectiveBlockingService(this),
        ClientService.BlockingInterface.class));
      bssi.add(new BlockingServiceAndInterface(
        AdminService.newReflectiveBlockingService(this),
        AdminService.BlockingInterface.class));
      return bssi;
    }
The ClientService interface is defined as follows. It mainly provides the data operation interfaces (Get, Mutate, Scan, and so on).
    service ClientService {
      rpc Get(GetRequest)
        returns(GetResponse);

      rpc Mutate(MutateRequest)
        returns(MutateResponse);

      rpc Scan(ScanRequest)
        returns(ScanResponse);

      rpc BulkLoadHFile(BulkLoadHFileRequest)
        returns(BulkLoadHFileResponse);

      rpc ExecService(CoprocessorServiceRequest)
        returns(CoprocessorServiceResponse);

      rpc ExecRegionServerService(CoprocessorServiceRequest)
        returns(CoprocessorServiceResponse);

      rpc Multi(MultiRequest)
        returns(MultiResponse);
    }
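AdminService is defined as follows. It mainly provides management operations on the regions of HBase tables, such as opening, closing, flushing, merging, and splitting regions.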
    service AdminService {
      rpc GetRegionInfo(GetRegionInfoRequest)
        returns(GetRegionInfoResponse);

      rpc GetStoreFile(GetStoreFileRequest)
        returns(GetStoreFileResponse);

      rpc GetOnlineRegion(GetOnlineRegionRequest)
        returns(GetOnlineRegionResponse);

      rpc OpenRegion(OpenRegionRequest)
        returns(OpenRegionResponse);

      rpc WarmupRegion(WarmupRegionRequest)
        returns(WarmupRegionResponse);

      rpc CloseRegion(CloseRegionRequest)
        returns(CloseRegionResponse);

      rpc FlushRegion(FlushRegionRequest)
        returns(FlushRegionResponse);
      ...
    }
The code snippet below shows that the Master exposes four services: MasterService and RegionServerStatusService, plus the ClientService and AdminService returned by super.getServices().
    protected List<BlockingServiceAndInterface> getServices() {
      // The Master registers its own two services, then adds the inherited ones.
      List<BlockingServiceAndInterface> bssi = new ArrayList<BlockingServiceAndInterface>(4);
      bssi.add(new BlockingServiceAndInterface(
        MasterService.newReflectiveBlockingService(this),
        MasterService.BlockingInterface.class));
      bssi.add(new BlockingServiceAndInterface(
        RegionServerStatusService.newReflectiveBlockingService(this),
        RegionServerStatusService.BlockingInterface.class));
      bssi.addAll(super.getServices());
      return bssi;
    }
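The call to super.getServices() implies that the Master's RPC class inherits from the RegionServer's and extends its service list. The following sketch shows only that pattern. The class names `RegionServerRpc` and `MasterRpc` are hypothetical, and plain strings stand in for the `BlockingServiceAndInterface` objects of the real code.

```java
import java.util.ArrayList;
import java.util.List;

public class ServiceListSketch {
    /** Stand-in for the RegionServer-side RPC class (hypothetical name). */
    static class RegionServerRpc {
        protected List<String> getServices() {
            List<String> services = new ArrayList<>(2);
            services.add("ClientService");
            services.add("AdminService");
            return services;
        }
    }

    /** Stand-in for the Master-side RPC class: adds two services, keeps the inherited two. */
    static class MasterRpc extends RegionServerRpc {
        @Override
        protected List<String> getServices() {
            List<String> services = new ArrayList<>(4);
            services.add("MasterService");
            services.add("RegionServerStatusService");
            services.addAll(super.getServices());
            return services;
        }
    }

    public static List<String> regionServerServices() {
        return new RegionServerRpc().getServices();
    }

    public static List<String> masterServices() {
        return new MasterRpc().getServices();
    }

    public static void main(String[] args) {
        System.out.println(regionServerServices());
        System.out.println(masterServices());
    }
}
```

Because the subclass appends the parent's list instead of replacing it, any service later added on the RegionServer side would be exposed by the Master too, with no change to the Master's code.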
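Part of the MasterService definition is shown below. It mainly provides table DDL services (schema changes such as adding, deleting, and modifying columns) along with cluster-level operations such as moving regions.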
    service MasterService {
      /** Used by the client to get the number of regions that have received the updated schema */
      rpc GetSchemaAlterStatus(GetSchemaAlterStatusRequest)
        returns(GetSchemaAlterStatusResponse);

      /** Get list of TableDescriptors for requested tables. */
      rpc GetTableDescriptors(GetTableDescriptorsRequest)
        returns(GetTableDescriptorsResponse);

      /** Get the list of table names. */
      rpc GetTableNames(GetTableNamesRequest)
        returns(GetTableNamesResponse);

      /** Return cluster status. */
      rpc GetClusterStatus(GetClusterStatusRequest)
        returns(GetClusterStatusResponse);

      /** return true if master is available */
      rpc IsMasterRunning(IsMasterRunningRequest) returns(IsMasterRunningResponse);

      /** Adds a column to the specified table. */
      rpc AddColumn(AddColumnRequest)
        returns(AddColumnResponse);

      /** Deletes a column from the specified table. Table must be disabled. */
      rpc DeleteColumn(DeleteColumnRequest)
        returns(DeleteColumnResponse);

      /** Modifies an existing column on the specified table. */
      rpc ModifyColumn(ModifyColumnRequest)
        returns(ModifyColumnResponse);

      /** Move the region region to the destination server. */
      rpc MoveRegion(MoveRegionRequest)
        returns(MoveRegionResponse);
      ...
    }
RegionServerStatusService, in turn, mainly contains the interfaces related to RegionServer status.
    service RegionServerStatusService {
      /** Called when a region server first starts. */
      rpc RegionServerStartup(RegionServerStartupRequest)
        returns(RegionServerStartupResponse);

      /** Called to report the load the RegionServer is under. */
      rpc RegionServerReport(RegionServerReportRequest)
        returns(RegionServerReportResponse);

      /**
       * Called by a region server to report a fatal error that is causing it to
       * abort.
       */
      rpc ReportRSFatalError(ReportRSFatalErrorRequest)
        returns(ReportRSFatalErrorResponse);

      /** Called to get the sequence id of the last MemStore entry flushed to an
       * HFile for a specified region. Used by the region server to speed up
       * log splitting. */
      rpc GetLastFlushedSequenceId(GetLastFlushedSequenceIdRequest)
        returns(GetLastFlushedSequenceIdResponse);

      /**
       * Called by a region server to report the progress of a region
       * transition. If the request fails, the transition should
       * be aborted.
       */
      rpc ReportRegionStateTransition(ReportRegionStateTransitionRequest)
        returns(ReportRegionStateTransitionResponse);
    }
2.2 RPC diagram in HBase:
The point to understand here is how the three parties communicate with each other.
Reference: https://www.cnblogs.com/superhedantou/p/5840635.html
2.3 Client code analysis:
Reference blog post:
https://www.cnblogs.com/duanxz/p/4512929.html
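Finally, to summarize, the role of HRegionServer is as follows:
HRegion location process:
client -> zookeeper -> -ROOT- -> .META. -> HRegion address -> HRegionServer -> HRegion
In this process the client first uses ZooKeeper to find the RegionServer hosting the -ROOT- table (read from the /hbase/root-region-server znode), then finds the address of the HRegion holding the .META. table, and finally looks up in .META. the address of the HRegion that holds the target table's data. The client does not interact with HMaster at any point. Note that this three-level lookup is the layout used before HBase 0.96; from 0.96 on, the -ROOT- table was removed and the location of the meta table is stored directly in ZooKeeper.
The client does not go through this whole routing process for every data operation, because HRegion information is cached locally; when something changes, a ZooKeeper watcher lets the client notice it in time.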
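The caching behavior described above can be sketched as a sorted map keyed by each region's start row. This is a simplified illustration rather than the HBase client's actual implementation: the class `RegionLocatorSketch`, the in-memory `meta` map standing in for the meta table, the `invalidate` method, and the server addresses are all assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

public class RegionLocatorSketch {
    /** A region covers [startKey, endKey); an empty endKey means "to the end of the table". */
    static final class Region {
        final String startKey;
        final String endKey;
        final String server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    private final TreeMap<String, Region> meta = new TreeMap<>();  // stand-in for the meta table
    private final TreeMap<String, Region> cache = new TreeMap<>(); // client-side location cache
    private int metaLookups = 0;

    public void addRegion(String startKey, String endKey, String server) {
        meta.put(startKey, new Region(startKey, endKey, server));
    }

    /** Returns the server hosting the row, consulting "meta" only on a cache miss. */
    public String locate(String row) {
        Map.Entry<String, Region> cached = cache.floorEntry(row);
        if (cached != null && cached.getValue().contains(row)) {
            return cached.getValue().server;
        }
        metaLookups++;
        Map.Entry<String, Region> found = meta.floorEntry(row);
        if (found == null || !found.getValue().contains(row)) {
            throw new IllegalStateException("no region for row " + row);
        }
        cache.put(found.getKey(), found.getValue());
        return found.getValue().server;
    }

    /** Drops the cached entry for a row, e.g. once the client learns the region moved. */
    public void invalidate(String row) {
        Map.Entry<String, Region> cached = cache.floorEntry(row);
        if (cached != null && cached.getValue().contains(row)) {
            cache.remove(cached.getKey());
        }
    }

    public int metaLookups() {
        return metaLookups;
    }

    /** Builds a two-region table: ["", "m") on rs1 and ["m", end) on rs2. */
    public static RegionLocatorSketch demo() {
        RegionLocatorSketch locator = new RegionLocatorSketch();
        locator.addRegion("", "m", "rs1:16020");
        locator.addRegion("m", "", "rs2:16020");
        return locator;
    }
}
```

With the two-region table from `demo()`, locating "apple" and then "banana" costs a single meta lookup because both rows fall in the first region. Locating "zebra" misses the cache and triggers a second lookup. After `invalidate("apple")`, the next lookup for a row in that region goes back to meta.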
Data write process: