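Overall structure of the RegionServer
A region server consists of five groups of functionality:
1. Threads related to ZooKeeper
MasterAddressTracker tracks the state of the master node
ClusterStatusTracker tracks the state of the HBase cluster
CatalogTracker tracks the ROOT table, the META table and the state of their regions
SplitLogWorker competes for the split-log tasks published as znodes, splits the HLog by region, and writes the result into the recovered.edits directory of the corresponding region
2. Region-related threads
A region server holds a collection of regions; each operation is dispatched to one specific region
CompactionChecker periodically checks whether a compaction is needed and, if so, hands it to CompactSplitThread
CompactSplitThread runs compactions and splits
MemStoreFlusher flushes a memstore to HDFS when it is full
3. WAL
HLog: in the HBase architecture a region server has a single HLog, shared by all of its regions
LogRoller rolls the log (closes the current HLog file and starts a new one)
4. Communication with clients
RPC server module, which contains many threads: listener, select and handler threads
Leases checks for expired leases
5. Master communication and monitoring
HMasterRegionInterface is the RPC interface through which the region server talks to the master
HServerLoad measures the server's load and reports it to the master
HealthCheckChore periodically checks the health of the service
RegionServerMetrics collects metrics data
Web server: starts a Jetty server that exposes region-related monitoring information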
RegionServer configuration
Parameter | Default | Meaning |
hbase.client.retries.number | 10 | Number of client retries |
hbase.regionserver.msginterval | 3000 | Interval, in milliseconds, between the region server's reports to the master |
hbase.regionserver.checksum.verify | false | Whether to enable HBase-level checksums |
hbase.server.thread.wakefrequency | 10 seconds | How often the checker threads wake up |
hbase.regionserver.numregionstoreport | 10 | Unknown |
hbase.regionserver.handler.count | 10 | Number of handler threads serving user tables |
hbase.regionserver.metahandler.count | 10 | Number of handler threads serving the META and ROOT tables |
hbase.rpc.verbose | false | Unknown |
hbase.regionserver.nbreservationblocks | false | Unknown |
hbase.regionserver.compactionChecker.majorCompactPriority | max int | Unknown |
hbase.regionserver.executor.openregion.threads | 3 | Number of threads that open user-table regions |
hbase.regionserver.executor.openroot.threads | 1 | Number of threads that open the ROOT table region |
hbase.regionserver.executor.openmeta.threads | 1 | Number of threads that open META table regions |
hbase.regionserver.executor.closeregion.threads | 3 | Number of threads that close user-table regions |
hbase.regionserver.executor.closeroot.threads | 1 | Number of threads that close the ROOT table region |
hbase.regionserver.executor.closemeta.threads | 1 | Number of threads that close META table regions |
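The table pairs every key with a default, which is the usual "read the override, else fall back" lookup. The following is only a minimal sketch of that pattern: `RegionServerSettings` and its methods are hypothetical helpers, not HBase classes, and a plain map stands in for the values loaded from hbase-site.xml.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not an HBase class: a map of overrides stands in for
// the values loaded from hbase-site.xml.
final class RegionServerSettings {
    private final Map<String, String> overrides;

    RegionServerSettings(Map<String, String> overrides) {
        this.overrides = new HashMap<>(overrides);
    }

    // Returns the configured value, or the default when the key is absent.
    int getInt(String key, int defaultValue) {
        String value = overrides.get(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    // Keys and defaults are taken from the table above.
    int handlerCount() {
        return getInt("hbase.regionserver.handler.count", 10);
    }

    int openRegionThreads() {
        return getInt("hbase.regionserver.executor.openregion.threads", 3);
    }

    public static void main(String[] args) {
        Map<String, String> site = new HashMap<>();
        site.put("hbase.regionserver.handler.count", "30");
        RegionServerSettings settings = new RegionServerSettings(site);
        System.out.println("handlers=" + settings.handlerCount());
        System.out.println("openRegionThreads=" + settings.openRegionThreads());
    }
}
```

Only the overridden key changes; every other parameter keeps the default listed in the table.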
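Entry class for starting HRegionServer
org.apache.hadoop.hbase.regionserver.HRegionServer
Setting hbase.regionserver.impl in hbase-site.xml selects a custom implementation, which must extend HRegionServer
The entry point then calls HRegionServerCommandLine (this class extends ServerCommandLine, so the master has a corresponding implementation as well)
HRegionServerCommandLine is run through the ToolRunner provided by Hadoop
ToolRunner#run(Configuration,Tool,String[])
ToolRunner calls GenericOptionsParser to parse a fixed set of generic options such as -conf, -D, -fs and -files
After parsing, it applies them to the Configuration object and passes the startup arguments to the Tool implementation
In short, ToolRunner is a utility that parses startup arguments, populates the Configuration object, and hands both to the Tool implementation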
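To make the division of labour concrete, here is a minimal sketch of the idea behind ToolRunner. It is not Hadoop's code: `MiniTool` and `MiniToolRunner` are hypothetical stand-ins, a map replaces the Configuration object, and only the `-D key=value` option is handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Tool interface: it receives the populated
// configuration and the arguments left over after generic-option parsing.
interface MiniTool {
    int run(Map<String, String> conf, String[] remainingArgs);
}

// Hypothetical, simplified stand-in for ToolRunner plus GenericOptionsParser.
final class MiniToolRunner {
    static int run(Map<String, String> conf, MiniTool tool, String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                // "-D key=value" sets a single configuration property.
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                // Everything else is passed through to the tool untouched.
                remaining.add(args[i]);
            }
        }
        return tool.run(conf, remaining.toArray(new String[0]));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        int exit = run(conf, (c, rest) -> {
            System.out.println("handlers=" + c.get("hbase.regionserver.handler.count"));
            System.out.println("command=" + String.join(" ", rest));
            return 0;
        }, new String[] {"-D", "hbase.regionserver.handler.count=30", "start"});
        System.out.println("exit=" + exit);
    }
}
```

The tool never sees the generic options; it only sees a configuration that already reflects them, plus its own arguments.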
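Initialization: calling the HRegionServer constructor
HRegionServerCommandLine creates the HRegionServer (or its custom subclass) through reflection
1. Initializes the client connection settings
2. Configures the host name and DNS settings
3. HRegionServer calls HBaseRPC to create an RpcEngine implementation, here WritableRpcEngine
Setting hbase.rpc.engine in hbase-site.xml selects a custom implementation, which must implement the RpcEngine interface
HBaseRPC calls getServer() to obtain a concrete RpcServer implementation, that is, RpcEngine --> RpcServer
What is returned here is WritableRpcEngine$Server, an inner class of WritableRpcEngine that extends HBaseServer
4. Creates the metrics thread (for the JVM) and the LRU check thread
5. Connects to ZooKeeper and performs some authentication work (Kerberos)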
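The reflective construction of the server class can be sketched as follows. This is an illustration under assumptions, not HBase's code: `BaseServer`, `CustomServer` and `ServerFactory` are hypothetical names, and a map stands in for the Configuration object. Only the key hbase.regionserver.impl comes from the text.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class standing in for HRegionServer.
class BaseServer {
    protected final Map<String, String> conf;
    public BaseServer(Map<String, String> conf) { this.conf = conf; }
    public String describe() { return "base"; }
}

// Hypothetical custom implementation; it must extend the base class.
class CustomServer extends BaseServer {
    public CustomServer(Map<String, String> conf) { super(conf); }
    @Override public String describe() { return "custom"; }
}

final class ServerFactory {
    // Instantiates the class named by hbase.regionserver.impl, or the base
    // class when the key is not set.
    static BaseServer create(Map<String, String> conf) {
        String name = conf.getOrDefault("hbase.regionserver.impl", BaseServer.class.getName());
        try {
            // asSubclass throws ClassCastException if the class does not extend BaseServer.
            Class<? extends BaseServer> cls = Class.forName(name).asSubclass(BaseServer.class);
            Constructor<? extends BaseServer> ctor = cls.getDeclaredConstructor(Map.class);
            return ctor.newInstance(conf);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot construct " + name, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(create(conf).describe());
        conf.put("hbase.regionserver.impl", CustomServer.class.getName());
        System.out.println(create(conf).describe());
    }
}
```

The `asSubclass` check is what enforces the "must extend" rule in this sketch: a class outside the hierarchy fails before any constructor runs.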
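Startup: HRegionServer#run (in a new thread)
Once constructed, the server starts up in HRegionServer#run(), which executes in a newly started thread
1. Creates the ZooKeeper listener threads
2. Creates the thread that communicates with the master
3. Creates the WAL-related threads
4. Creates the metrics thread (for HBase)
5. Creates the log roller, cache flusher, compaction, heartbeat check and lease checker threads
6. Creates the Jetty threads
7. Creates the responder thread, listener thread, handler threads, high-priority handler threads (for the META table) and replication handler threads
8. Creates the log-splitting thread
Step 5 also defines the following thread pools, whose threads may be started later while requests are being handled:
hbase.regionserver.executor.openregion.threads 3 (default)
hbase.regionserver.executor.openroot.threads 1
hbase.regionserver.executor.openmeta.threads 1
hbase.regionserver.executor.closeregion.threads 3
hbase.regionserver.executor.closeroot.threads 1
hbase.regionserver.executor.closemeta.threads 1
The log-splitting thread may have a lot of work to do at startup
After that, the region server has finished starting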
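The per-event thread pools can be pictured as one fixed-size executor per event type, each with its own thread-name prefix and an unbounded queue. The sketch below only illustrates that arrangement: `EventExecutors` is a hypothetical class, not HBase's ExecutorService, and the threads are made daemon threads purely so the demo always exits.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical registry of named pools, one per event type.
final class EventExecutors {
    private final Map<String, ThreadPoolExecutor> pools = new ConcurrentHashMap<>();

    // Defines a pool; worker threads are only created when work arrives.
    void startPool(String name, int threads) {
        AtomicInteger seq = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            threads, threads, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(),
            r -> {
                Thread t = new Thread(r, name + "-" + seq.getAndIncrement());
                t.setDaemon(true); // assumption for the demo, so the JVM can always exit
                return t;
            });
        pools.put(name, pool);
    }

    // Submits work to the named pool and waits for its result.
    String call(String poolName, Callable<String> work) {
        try {
            return pools.get(poolName).submit(work).get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    void shutdown() {
        pools.values().forEach(ThreadPoolExecutor::shutdown);
    }

    public static void main(String[] args) {
        EventExecutors executors = new EventExecutors();
        executors.startPool("RS_OPEN_REGION", 3);
        executors.startPool("RS_OPEN_META", 1);
        System.out.println(executors.call("RS_OPEN_REGION", () -> Thread.currentThread().getName()));
        System.out.println(executors.call("RS_OPEN_META", () -> Thread.currentThread().getName()));
        executors.shutdown();
    }
}
```

An idle worker in such a pool blocks in `LinkedBlockingQueue.take()` inside `ThreadPoolExecutor.getTask()`, which is the parked state a debugger shows for idle pool threads.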
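What HRegionServer contains
Collection of HRegion objects
Leases (lease expiry checks)
HMasterRegionInterface (RPC interface to the master)
HServerLoad (load of the server)
CompactSplitThread (runs compactions and splits)
CompactionChecker (periodically checks whether a compaction is needed and, if so, hands it to CompactSplitThread)
MemStoreFlusher (flushes memstores)
HLog (WAL)
LogRoller (log rolling)
ZooKeeperWatcher (ZooKeeper listener)
SplitLogWorker (splits logs)
ExecutorService (thread pools used to open and close HRegions)
ReplicationSourceService and ReplicationSinkService (replication)
HealthCheckChore (health checks)
RegionServerMetrics (monitoring)
Several tracker classes
MasterAddressTracker
CatalogTracker
ClusterStatusTracker
postOpenDeployTasks, which updates the ROOT table or the META table
Various CRUD, scanner and increment operations
multi operations (for delete and put)
flush, close and open of an HRegion (submitted to a thread pool)
split and compact operations, which are ultimately carried out by a specific HRegion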
RegionServer threads
Thread for small compactions
Daemon Thread [regionserver60020-smallCompactions-1392958977368] (Suspended)
Unsafe.park(boolean, long) line: not available [native method]
LockSupport.park(Object) line: 156
AbstractQueuedSynchronizer$ConditionObject.await() line: 1987
PriorityBlockingQueue
ThreadPoolExecutor.getTask() line: 957
ThreadPoolExecutor$Worker.run() line: 917
Thread.run() line: 662
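The stack above shows the worker parked on a PriorityBlockingQueue, which suggests that queued compaction requests are served by priority rather than in arrival order. The sketch below illustrates that pattern only; `PrioritizedTask` and `SmallCompactionPoolDemo` are hypothetical names, and "lower value runs first" is an assumption of the sketch. Note that tasks are handed over with `execute()`, because `submit()` would wrap them in a FutureTask that is not Comparable.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical compaction request: a runnable with a priority.
final class PrioritizedTask implements Runnable, Comparable<PrioritizedTask> {
    private final int priority;
    private final Runnable body;

    PrioritizedTask(int priority, Runnable body) {
        this.priority = priority;
        this.body = body;
    }

    @Override public void run() { body.run(); }

    // Assumption of this sketch: the lower value is served first.
    @Override public int compareTo(PrioritizedTask other) {
        return Integer.compare(priority, other.priority);
    }
}

final class SmallCompactionPoolDemo {
    static List<String> runDemo() {
        List<String> order = new CopyOnWriteArrayList<>();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 60, TimeUnit.SECONDS, new PriorityBlockingQueue<Runnable>());
        CountDownLatch gate = new CountDownLatch(1);
        try {
            // Occupy the single worker so that the next requests pile up in the queue.
            pool.execute(new PrioritizedTask(0, () -> awaitQuietly(gate)));
            pool.execute(new PrioritizedTask(10, () -> order.add("store-a")));
            pool.execute(new PrioritizedTask(1, () -> order.add("store-b")));
            pool.execute(new PrioritizedTask(5, () -> order.add("store-c")));
            gate.countDown();
            pool.shutdown(); // already queued requests still run
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return order;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The three queued requests arrive as a, b, c but run as b, c, a, because the queue orders them by priority.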
Thread that opens user-table regions
Thread [RS_OPEN_REGION-myhost,60020,1392868973177-0] (Suspended)
Unsafe.park(boolean, long) line: not available [native method]
LockSupport.park(Object) line: 156
AbstractQueuedSynchronizer$ConditionObject.await() line: 1987
LinkedBlockingQueue
ExecutorService$TrackingThreadPoolExecutor(ThreadPoolExecutor).getTask() line: 957
ThreadPoolExecutor$Worker.run() line: 917
Thread.run() line: 662
Thread that communicates with ZooKeeper (client event thread)
Daemon Thread [PostOpenDeployTasks:1028785192-EventThread] (Suspended)
Unsafe.park(boolean, long) line: not available [native method]
LockSupport.park(Object) line: 156
AbstractQueuedSynchronizer$ConditionObject.await() line: 1987
LinkedBlockingQueue
ClientCnxn$EventThread.run() line: 491
Thread that communicates with ZooKeeper (client send thread)
Daemon Thread [PostOpenDeployTasks:1028785192-SendThread(myhost:2181)] (Suspended)
EPollArrayWrapper.epollWait(long, int, long, int) line: not available [native method]
EPollArrayWrapper.poll(long) line: 210
EPollSelectorImpl.doSelect(long) line: 65
EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 69
EPollSelectorImpl(SelectorImpl).select(long) line: 80
ClientCnxnSocketNIO.doTransport(int, List
ClientCnxn$SendThread.run() line: 1068
Thread dedicated to the META table
Thread [RS_OPEN_META-myhost,60020,1392868973177-0] (Suspended)
Unsafe.park(boolean, long) line: not available [native method]
LockSupport.park(Object) line: 156
AbstractQueuedSynchronizer$ConditionObject.await() line: 1987
LinkedBlockingQueue
ExecutorService$TrackingThreadPoolExecutor(ThreadPoolExecutor).getTask() line: 957
ThreadPoolExecutor$Worker.run() line: 917
Thread.run() line: 662
Thread dedicated to the ROOT table
Thread [RS_OPEN_ROOT-myhost,60020,1392868973177-0] (Suspended)
Unsafe.park(boolean, long) line: not available [native method]
LockSupport.park(Object) line: 156
AbstractQueuedSynchronizer$ConditionObject.await() line: 1987
LinkedBlockingQueue
ExecutorService$TrackingThreadPoolExecutor(ThreadPoolExecutor).getTask() line: 957
ThreadPoolExecutor$Worker.run() line: 917
Thread.run() line: 662
Log-splitting thread
Thread [SplitLogWorker-myhost,60020,1392868973177] (Suspended)
Object.wait(long) line: not available [native method]
Object.wait() line: 485
SplitLogWorker.taskLoop() line: 219
SplitLogWorker.run() line: 179
Thread.run() line: 662
Jetty scanner thread
Daemon Thread [Timer-0] (Suspended)
Object.wait(long) line: not available [native method]
TimerThread.mainLoop() line: 509
TimerThread.run() line: 462
Jetty acceptor thread
Thread [1142818380@qtp-1463348369-1 - Acceptor0 [email protected]:60030] (Suspended)
EPollArrayWrapper.epollWait(long, int, long, int) line: not available [native method]
EPollArrayWrapper.poll(long) line: 210
EPollSelectorImpl.doSelect(long) line: 65
EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 69
EPollSelectorImpl(SelectorImpl).select(long) line: 80
SelectorManager$SelectSet.doSelect() line: 498
SelectChannelConnector$1(SelectorManager).doSelect(int) line: 192
SelectChannelConnector.accept(int) line: 124
AbstractConnector$Acceptor.run() line: 708
QueuedThreadPool$PoolThread.run() line: 582
Jetty worker thread
Thread [869247333@qtp-1463348369-0] (Suspended)
Object.wait(long) line: not available [native method]
QueuedThreadPool$PoolThread.run() line: 626
Lease checker thread
Daemon Thread [regionserver60020.leaseChecker] (Suspended)
Unsafe.park(boolean, long) line: not available [native method]
LockSupport.parkNanos(Object, long) line: 196
AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) line: 2025
DelayQueue
Leases.run() line: 83
Thread.run() line: 662
Thread that checks whether a compaction is needed
Daemon Thread [regionserver60020.compactionChecker] (Suspended)
Object.wait(long) line: not available [native method]
Sleeper.sleep(long) line: 91
HRegionServer$CompactionChecker(Chore).run() line: 75
Thread.run() line: 662
Thread that checks for memstore flushes
Daemon Thread [regionserver60020.cacheFlusher] (Suspended)
Unsafe.park(boolean, long) line: not available [native method]
LockSupport.parkNanos(Object, long) line: 196
AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) line: 2025
DelayQueue
DelayQueue
MemStoreFlusher.run() line: 220
Thread.run() line: 662
Log roller thread
Daemon Thread [regionserver60020.logRoller] (Suspended)
Object.wait(long) line: not available [native method]
LogRoller.run() line: 77
Thread.run() line: 662
Metrics thread (JVM monitoring)
Daemon Thread [Timer thread for monitoring jvm] (Suspended)
Object.wait(long) line: not available [native method]
TimerThread.mainLoop() line: 509
TimerThread.run() line: 462
Log sync thread
Daemon Thread [regionserver60020.logSyncer] (Suspended)
Object.wait(long) line: not available [native method]
HLog$LogSyncer.run() line: 1265
Thread.run() line: 662
HDFS client lease checker thread
Daemon Thread [LeaseChecker] (Suspended)
Thread.sleep(long) line: not available [native method]
DFSClient$LeaseChecker.run() line: 1379
Daemon(Thread).run() line: 662
Main region server thread (the HRegionServer.run() loop, sleeping between iterations)
Thread [regionserver60020] (Suspended)
Object.wait(long) line: not available [native method]
Sleeper.sleep(long) line: 91
Sleeper.sleep() line: 55
HRegionServer.run() line: 787
Thread.run() line: 662
LRU block cache statistics thread
Daemon Thread [LRU Statistics #0] (Suspended)
Unsafe.park(boolean, long) line: not available [native method]
LockSupport.parkNanos(Object, long) line: 196
AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) line: 2025
DelayQueue
ScheduledThreadPoolExecutor$DelayedWorkQueue.take() line: 609
ScheduledThreadPoolExecutor$DelayedWorkQueue.take() line: 602
ScheduledThreadPoolExecutor(ThreadPoolExecutor).getTask() line: 957
ThreadPoolExecutor$Worker.run() line: 917
Thread.run() line: 662
LRU block cache eviction thread
Daemon Thread [main.LruBlockCache.EvictionThread] (Suspended)
Object.wait(long) line: not available [native method]
LruBlockCache$EvictionThread(Object).wait() line: 485
LruBlockCache$EvictionThread.run() line: 612
Thread.run() line: 662
RPC monitoring thread
Daemon Thread [Timer thread for monitoring rpc] (Suspended)
Object.wait(long) line: not available [native method]
TimerThread.mainLoop() line: 509
TimerThread.run() line: 462
Reader thread
Daemon Thread [IPC Reader 0 on port 60020] (Suspended)
EPollArrayWrapper.epollWait(long, int, long, int) line: not available [native method]
EPollArrayWrapper.poll(long) line: 210
EPollSelectorImpl.doSelect(long) line: 65
EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 69
EPollSelectorImpl(SelectorImpl).select(long) line: 80
EPollSelectorImpl(SelectorImpl).select() line: 84
HBaseServer$Listener$Reader.doRunLoop() line: 528
HBaseServer$Listener$Reader.run() line: 514
ThreadPoolExecutor$Worker.runTask(Runnable) line: 895
ThreadPoolExecutor$Worker.run() line: 918
Thread.run() line: 662
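Handler (worker) threads, whose number is configurable. REPL handlers serve replication, PRI handlers serve the META table, and plain IPC handlers serve ordinary requests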
Daemon Thread [REPL IPC Server handler 0 on 60020] (Suspended)
Unsafe.park(boolean, long) line: not available [native method]
LockSupport.park(Object) line: 156
AbstractQueuedSynchronizer$ConditionObject.await() line: 1987
LinkedBlockingQueue
HBaseServer$Handler.run() line: 1398
Daemon Thread [PRI IPC Server handler 0 on 60020] (Suspended)
Unsafe.park(boolean, long) line: not available [native method]
LockSupport.park(Object) line: 156
AbstractQueuedSynchronizer$ConditionObject.await() line: 1987
LinkedBlockingQueue
HBaseServer$Handler.run() line: 1398
Daemon Thread [IPC Server handler 1 on 60020] (Suspended)
Unsafe.park(boolean, long) line: not available [native method]
LockSupport.park(Object) line: 156
AbstractQueuedSynchronizer$ConditionObject.await() line: 1987
LinkedBlockingQueue
HBaseServer$Handler.run() line: 1398
Listener (select) thread
Daemon Thread [IPC Server listener on 60020] (Suspended)
EPollArrayWrapper.epollWait(long, int, long, int) line: not available [native method]
EPollArrayWrapper.poll(long) line: 210
EPollSelectorImpl.doSelect(long) line: 65
EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 69
EPollSelectorImpl(SelectorImpl).select(long) line: 80
EPollSelectorImpl(SelectorImpl).select() line: 84
HBaseServer$Listener.run() line: 636
Responder thread
Daemon Thread [IPC Server Responder] (Suspended)
EPollArrayWrapper.epollWait(long, int, long, int) line: not available [native method]
EPollArrayWrapper.poll(long) line: 210
EPollSelectorImpl.doSelect(long) line: 65
EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 69
EPollSelectorImpl(SelectorImpl).select(long) line: 80
HBaseServer$Responder.doRunLoop() line: 825
HBaseServer$Responder.run() line: 808
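The reader, handler and responder dumps above all show threads waiting on a selector or a LinkedBlockingQueue, which fits a pipeline where decoded calls are queued and handler threads take them off the queue. The following is a rough sketch of that hand-off only, with no networking: `MiniRpcServer` and `Call` are hypothetical, and routing catalog-table calls to a separate queue is an assumption made to illustrate the PRI handlers.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory model of the call queues; no sockets involved.
final class MiniRpcServer {
    static final class Call {
        final String id;
        final boolean onCatalogTable;

        Call(String id, boolean onCatalogTable) {
            this.id = id;
            this.onCatalogTable = onCatalogTable;
        }
    }

    final BlockingQueue<Call> callQueue = new LinkedBlockingQueue<>();
    final BlockingQueue<Call> priorityQueue = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> responses = new LinkedBlockingQueue<>();

    // What a reader would do after decoding a request: pick a queue.
    void dispatch(Call call) {
        (call.onCatalogTable ? priorityQueue : callQueue).add(call);
    }

    // Starts handler threads that block on the queue while idle.
    void startHandlers(String prefix, int count, BlockingQueue<Call> queue) {
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        Call call = queue.take(); // parks here when there is no work
                        responses.add(call.id + "@" + Thread.currentThread().getName());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, prefix + " " + i);
            t.setDaemon(true);
            t.start();
        }
    }

    // Stand-in for the responder: returns the next finished call, or null on timeout.
    String awaitResponse() {
        try {
            return responses.poll(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        MiniRpcServer server = new MiniRpcServer();
        server.startHandlers("IPC Server handler", 2, server.callQueue);
        server.startHandlers("PRI IPC Server handler", 1, server.priorityQueue);
        server.dispatch(new Call("get-user-row", false));
        server.dispatch(new Call("scan-meta", true));
        System.out.println(server.awaitResponse());
        System.out.println(server.awaitResponse());
    }
}
```

Keeping catalog calls on their own queue means a backlog of user requests cannot starve META lookups, which is the point of dedicated high-priority handlers.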
HBase client thread-pool thread (hbase-tablepool), here waiting on a multi RPC call
Daemon Thread [hbase-tablepool-1-thread-1] (Suspended)
Object.wait(long) line: not available [native method]
HBaseClient$Call(Object).wait() line: 485
HBaseClient.call(Writable, InetSocketAddress, Class
WritableRpcEngine$Invoker.invoke(Object, Method, Object[]) line: 86
$Proxy14.multi(MultiAction) line: not available
HConnectionManager$HConnectionImplementation$3$1.call() line: 1427
HConnectionManager$HConnectionImplementation$3$1.call() line: 1425
HConnectionManager$HConnectionImplementation$3$1(ServerCallable
HConnectionManager$HConnectionImplementation$3.call() line: 1434
HConnectionManager$HConnectionImplementation$3.call() line: 1422
FutureTask$Sync.innerRun() line: 303
FutureTask
ThreadPoolExecutor$Worker.runTask(Runnable) line: 895
ThreadPoolExecutor$Worker.run() line: 918
Thread.run() line: 662
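In summary, the threads fall into these groups
1. Log rolling, log sync and log splitting
2. Large-compaction and small-compaction threads
3. LRU cache checks and memstore checks
4. HDFS client, HDFS client lease checks and ZooKeeper communication
5. Threads dedicated to the ROOT table and to the META table
6. Jetty threads
7. Listener thread, reader threads, handler threads, responder thread, handler threads for META and handler threads for replication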
The postOpenDeployTask thread (updates the META table)
The logic is as follows:
// PostOpenDeployTasksThread is the thread that updates the META table
OpenRegionHandler$PostOpenDeployTasksThread#run() {
  HRegionServer#postOpenDeployTasks();
}
// First request a compaction for every Store that has reference files or needs compacting
// Then update the location, depending on whether this is the ROOT table, the META table or a user table
HRegionServer#postOpenDeployTasks() {
  for (Store s : r.getStores().values()) {
    if (s.hasReferences() || s.needsCompaction()) {
      getCompactionRequester().requestCompaction(r, s, "Opening Region", null);
    }
  }
  // Update ZK, ROOT or META
  if (r.getRegionInfo().isRootRegion()) {
    RootLocationEditor.setRootLocation(getZooKeeper(), this.serverNameFromMasterPOV);
  } else if (r.getRegionInfo().isMetaRegion()) {
    MetaEditor.updateMetaLocation(ct, r.getRegionInfo(), this.serverNameFromMasterPOV);
  } else {
    if (daughter) {
      // If daughter of a split, update whole row, not just location.
      MetaEditor.addDaughter(ct, r.getRegionInfo(),
          this.serverNameFromMasterPOV);
    } else {
      MetaEditor.updateRegionLocation(ct, r.getRegionInfo(),
          this.serverNameFromMasterPOV);
    }
  }
}
// Update the ROOT table's location in ZooKeeper
RootLocationEditor#setRootLocation() {
  ZKUtil.createAndWatch(zookeeper, zookeeper.rootServerZNode,
      Bytes.toBytes(location.toString()));
}
// Update the catalog row: build a Put and write it to the ROOT or META table
MetaEditor#updateLocation() {
  Put put = new Put(regionInfo.getRegionName());
  put.add("info", "server", Bytes.toBytes(sn.getHostAndPort()));
  put.add("info", "serverstartcode", Bytes.toBytes(sn.getStartcode()));
  HTable table = isRootTableRow(row) ? getRootHTable(catalogTracker) :
      getMetaHTable(catalogTracker);
  table.put(put);
}
// For a daughter of a split, write every KeyValue of the row (regioninfo included)
// Otherwise only the server and serverstartcode KeyValues are updated (see updateLocation)
MetaEditor#addDaughter() {
  Put put = new Put(regionInfo.getRegionName());
  put.add("info", "regioninfo", Writables.getBytes(regionInfo));
  if (sn != null) {
    put.add("info", "server", Bytes.toBytes(sn.getHostAndPort()));
    put.add("info", "serverstartcode", Bytes.toBytes(sn.getStartcode()));
  }
  putToMetaTable(catalogTracker, put);
}
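The branching in postOpenDeployTasks boils down to a small decision table: which store receives the new location, and how much of the row is written. The sketch below restates just that decision; `PostOpenDeploy`, `RegionKind` and `Target` are hypothetical names introduced for illustration and do not exist in HBase.

```java
// Hypothetical restatement of the branching; none of these types exist in HBase.
final class PostOpenDeploy {
    enum RegionKind { ROOT, META, USER }

    enum Target {
        ZOOKEEPER_ROOT_ZNODE,   // ROOT region: location is stored in ZooKeeper
        ROOT_TABLE_LOCATION,    // META region: location is stored in the ROOT table
        META_TABLE_WHOLE_ROW,   // daughter of a split: whole row is written to META
        META_TABLE_LOCATION     // ordinary user region: only the location is written
    }

    static Target targetFor(RegionKind kind, boolean daughterOfSplit) {
        switch (kind) {
            case ROOT:
                return Target.ZOOKEEPER_ROOT_ZNODE;
            case META:
                return Target.ROOT_TABLE_LOCATION;
            default:
                return daughterOfSplit ? Target.META_TABLE_WHOLE_ROW : Target.META_TABLE_LOCATION;
        }
    }

    public static void main(String[] args) {
        System.out.println(targetFor(RegionKind.ROOT, false));
        System.out.println(targetFor(RegionKind.META, false));
        System.out.println(targetFor(RegionKind.USER, true));
        System.out.println(targetFor(RegionKind.USER, false));
    }
}
```

The daughter flag only matters for user regions, which mirrors the nesting of the `if (daughter)` check inside the final `else` branch.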
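The leaseChecker thread (cleans up operations whose lease has expired)
When an operation such as a get or a scan times out, this class releases the associated scanner or row lock
This runs asynchronously in its own thread
The logic is as follows:
// Lease expiry check: when an operation's lease times out,
// the resources held by that operation must be released
Leases#run() {
  Lease lease = leaseQueue.poll(leaseCheckFrequency, TimeUnit.MILLISECONDS);
  // poll() returns null when no lease expired within the interval
  if (lease != null) {
    lease.getListener().leaseExpired();
  }
}
// When a row operation times out, release the row lock
RowLockListener#leaseExpired() {
  Integer r = rowlocks.remove(this.lockName);
  if (r != null) {
    region.releaseRowLock(r);
  }
}
// When a scan times out, close the scanner
ScannerListener#leaseExpired() {
  RegionScanner s = scanners.remove(this.scannerName);
  if (s != null) {
    HRegion region = getRegion(s.getRegionInfo().getRegionName());
    s.close();
  }
}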
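The lease mechanism maps naturally onto java.util.concurrent.DelayQueue, which only hands out elements whose delay has elapsed. The sketch below is a simplified model under that assumption, not the HBase Leases class: `Lease` and `LeaseChecker` here are hypothetical, and the expiry callback is a plain Runnable rather than a listener object.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical lease: becomes available from the queue once its time is up.
final class Lease implements Delayed {
    final String name;
    final Runnable onExpired;
    private final long expiresAtNanos;

    Lease(String name, long ttlMillis, Runnable onExpired) {
        this.name = name;
        this.onExpired = onExpired;
        this.expiresAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(ttlMillis);
    }

    @Override public long getDelay(TimeUnit unit) {
        return unit.convert(expiresAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}

final class LeaseChecker {
    private final DelayQueue<Lease> queue = new DelayQueue<>();

    void createLease(String name, long ttlMillis, Runnable onExpired) {
        queue.add(new Lease(name, ttlMillis, onExpired));
    }

    // Called when the operation finishes in time; returns true if a lease was removed.
    boolean cancel(String name) {
        return queue.removeIf(lease -> lease.name.equals(name));
    }

    // One iteration of the checker loop; returns true if a lease expired and was handled.
    boolean checkOnce(long waitMillis) {
        try {
            Lease lease = queue.poll(waitMillis, TimeUnit.MILLISECONDS);
            if (lease == null) {
                return false; // nothing expired within the wait interval
            }
            lease.onExpired.run();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        LeaseChecker checker = new LeaseChecker();
        checker.createLease("scanner-1", 20, () -> System.out.println("closing scanner-1"));
        checker.createLease("rowlock-7", 60_000, () -> System.out.println("releasing rowlock-7"));
        System.out.println("expired=" + checker.checkOnce(2_000));
        System.out.println("cancelled=" + checker.cancel("rowlock-7"));
    }
}
```

In a real checker `checkOnce` would sit inside a loop on a dedicated thread; it is kept as a single step here so the behaviour is easy to follow.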