Overall structure of HMaster
A master consists of the following parts:
1. External interfaces
RPC service
Jetty web service
Master MBean
The RPC service comprises a number of listener, reader and handler threads (regular IPC handlers plus the REPL IPC handlers used for replication).
2. Executor services
These are thread pools; when an event arrives, the corresponding task is handed to one of these executors.
The executor types are:
MASTER_SERVER_OPERATIONS
MASTER_META_SERVER_OPERATIONS
MASTER_CLOSE_REGION
MASTER_OPEN_REGION
MASTER_TABLE_OPERATIONS
The related handlers are:
OpenRegionHandler
ClosedRegionHandler
ServerShutdownHandler
MetaServerShutdownHandler
DeleteTableHandler
DisableTableHandler
EnableTableHandler
ModifyTableHandler
CreateTableHandler
| Executor Service | Event | Event Handler | Threads (Default) |
| --- | --- | --- | --- |
| Master Open Region | RS_ZK_REGION_OPENED | OpenRegionHandler | 5 |
| Master Close Region | RS_ZK_REGION_CLOSED | ClosedRegionHandler | 5 |
| Master Server Operations | RS_ZK_REGION_SPLIT, M_SERVER_SHUTDOWN | SplitRegionHandler, ServerShutdownHandler | 3 |
| Master Meta Server Operations | M_META_SERVER_SHUTDOWN | MetaServerShutdownHandler | 5 |
| Master Table Operations | C_M_DELETE_TABLE, C_M_DISABLE_TABLE, C_M_ENABLE_TABLE, C_M_MODIFY_TABLE, C_M_CREATE_TABLE | DeleteTableHandler, DisableTableHandler, EnableTableHandler, ModifyTableHandler, CreateTableHandler | 1 |
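The pattern behind this table is one bounded thread pool per event family: the master picks the pool from the event type and submits the matching handler to it asynchronously. A minimal sketch of that dispatch pattern (illustrative names only, not HBase's actual ExecutorService/EventHandler classes):

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MasterEventDispatcher {
  // Stand-ins for HBase's event types; only a few are shown.
  enum EventType { RS_ZK_REGION_OPENED, RS_ZK_REGION_CLOSED, M_SERVER_SHUTDOWN }

  private final Map<EventType, ExecutorService> pools = new EnumMap<>(EventType.class);

  public MasterEventDispatcher() {
    // One fixed-size pool per event family, mirroring the defaults in the table above.
    pools.put(EventType.RS_ZK_REGION_OPENED, Executors.newFixedThreadPool(5));
    pools.put(EventType.RS_ZK_REGION_CLOSED, Executors.newFixedThreadPool(5));
    pools.put(EventType.M_SERVER_SHUTDOWN, Executors.newFixedThreadPool(3));
  }

  // The caller (e.g. a ZK event callback) only enqueues; the handler runs in the pool.
  public void submit(EventType type, Runnable handler) {
    pools.get(type).submit(handler);
  }
}
```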
3. ZooKeeper-related threads
- 1. ActiveMasterManager
  - Creates the ephemeral znode /hbase/master and records the master's address in it.
  - A backup master blocks here until that node disappears.
- 2. RegionServerTracker
  - Tracks region servers by watching the /hbase/rs znode to learn each region server's state.
  - ZooKeeper fires a notification whenever a region server comes online or goes offline.
- 3. DrainingServerTracker
  - Watches the draining znode; region servers listed there are being drained (decommissioned), so the master stops assigning new regions to them.
- 4. CatalogTracker
  - Tracks the META and ROOT tables.
- 5. ClusterStatusTracker
  - Watches ZK's /shutdown znode, which records whether the cluster is up; it is used to signal cluster startup and shutdown.
- 6. AssignmentManager
  - Manages and assigns regions.
- 7. RootRegionTracker
  - Manages and watches the /root-region-server znode.
- 8. LoadBalancer
  - Balances the regions across the region servers.
- 9. MetaNodeTracker
  - Watches the /unassigned znode; used for assigning regions that are not yet present in the META table.

In addition, org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher manages a set of ZK znodes:
- baseZNode /hbase
- assignmentZNode /unassigned
- rsZNode /rs
- drainingZNode /draining
- masterTableZNode /table
- masterTableZNode92 /table92 (used by HBase 0.92)
- splitLogZNode /splitlog
- backupMasterAddressesZNode /backup-masters
- clusterStateZNode /shutdown
- masterAddressZNode /master
- clusterIdZNode /hbaseid
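All of these trackers follow the same pattern: register a watcher on a znode (or on its children), cache the current value, and re-read plus re-arm the watch when ZooKeeper fires an event. A minimal sketch of that pattern with the plain ZooKeeper client API (the connection string and path are placeholders, not the actual HBase tracker code):

```java
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Watches the children of a znode (e.g. /hbase/rs), in the spirit of RegionServerTracker:
// every NodeChildrenChanged event triggers a re-read, which also re-arms the watch.
public class ChildrenTracker implements Watcher {
  private final ZooKeeper zk;
  private final String znode;
  private volatile List<String> current;

  public ChildrenTracker(String quorum, String znode) throws Exception {
    this.znode = znode;
    this.zk = new ZooKeeper(quorum, 30000, this);   // session timeout is illustrative
    refresh();
  }

  private void refresh() throws KeeperException, InterruptedException {
    // "true" re-registers this object as the watcher for the next change.
    current = zk.getChildren(znode, true);
    System.out.println("children of " + znode + ": " + current);
  }

  @Override
  public void process(WatchedEvent event) {
    if (event.getType() == Event.EventType.NodeChildrenChanged
        && znode.equals(event.getPath())) {
      try {
        refresh();
      } catch (Exception e) {
        e.printStackTrace();
      }
    }
  }
}
```

For example, `new ChildrenTracker("zk1:2181", "/hbase/rs")` would print the current region server list whenever it changes.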
Class diagram of the ZK-listener-related classes:
4. File-system interfaces and others
MasterFileSystem
Creates the META and ROOT tables, the .oldlogs directory, the hbase.version file, and so on.
LogCleaner
Periodically cleans up the contents of the .oldlogs directory.
HFileCleaner
Periodically cleans up the contents of the archive directory.
Besides these background chores there are also:
ServerManager — maintains the lists of online and dead region servers.
Balancer — a background thread that performs region balancing.
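LogCleaner, HFileCleaner, CatalogJanitor and the balancer all follow the same "chore" shape: a background thread that sleeps for a fixed period and then runs one round of work. A minimal sketch of that pattern in plain Java (not HBase's actual Chore base class):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// A periodic background task in the spirit of LogCleaner / HFileCleaner:
// wake up every periodMs milliseconds and run one round of cleanup.
public abstract class PeriodicChore {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public void start(long periodMs) {
    scheduler.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        try {
          chore();                 // one round of work
        } catch (Exception e) {
          e.printStackTrace();     // a failing round must not kill the scheduler
        }
      }
    }, periodMs, periodMs, TimeUnit.MILLISECONDS);
  }

  public void stop() {
    scheduler.shutdownNow();
  }

  /** One iteration of the periodic work, e.g. deleting expired files. */
  protected abstract void chore();
}
```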
HMaster的相关配置
参数名称 |
默认值 |
含义 |
hbase.master.handler.count |
25 |
工作线程大小 |
hbase.master.buffer.for.rs.fatals |
1M |
|
mapred.task.id |
|
|
hbase.master.wait.for.log.splitting |
false |
|
zookeeper.session.timeout |
180秒 |
|
hbase.master.backup |
|
|
hbase.master.impl |
|
|
hbase.master.event.waiting.time |
1000 |
|
HMaster startup entry point
org.apache.hadoop.hbase.master.HMaster
The parameter hbase.master.impl can be set in hbase-site.xml to supply a custom implementation, which must extend HMaster.
HMaster#main delegates to HMasterCommandLine (which extends ServerCommandLine).
HMasterCommandLine is executed through Hadoop's ToolRunner:
ToolRunner#run(Configuration, Tool, String[])
ToolRunner uses GenericOptionsParser to parse the standard generic options such as -conf, -D, -fs and -files.
After parsing, it populates the Configuration object and passes the remaining arguments to the Tool implementation.
So ToolRunner is simply a utility that parses startup arguments, sets up the Configuration object, and hands both to the Tool implementation.
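The ToolRunner/Tool combination is plain Hadoop API. A minimal, self-contained example of the same startup skeleton (the class name and the printouts are illustrative; this is not HMasterCommandLine itself):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// A Tool receives a Configuration already populated with any -D/-conf/-fs/-files
// generic options; run() only sees the remaining, tool-specific arguments.
public class DemoCommandLine extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    System.out.println("command = " + (args.length > 0 ? args[0] : "start"));
    System.out.println("hbase.master.handler.count = "
        + conf.getInt("hbase.master.handler.count", 25));
    return 0;   // exit code, returned to the caller of ToolRunner.run()
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new Configuration(), new DemoCommandLine(), args);
    System.exit(exitCode);
  }
}
```

Running it as `java DemoCommandLine -D hbase.master.handler.count=30 start` shows how the generic `-D` option lands in the Configuration while `start` arrives in run().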
The call sequence is:
1. HMaster#main()
2. HMasterCommandLine#doMain()
3. ToolRunner#run()
4. HMasterCommandLine#run()
5. HMasterCommandLine#startMaster()
6. HMaster#constructMaster()
7. The HMaster constructor is invoked via reflection
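This last step (together with the hbase.master.impl setting) works by instantiating the configured master class reflectively through its (Configuration) constructor. A sketch of that mechanism, using a hypothetical helper name (constructServer) rather than HMaster's exact code:

```java
import java.lang.reflect.Constructor;
import org.apache.hadoop.conf.Configuration;

public final class ServerFactory {

  // Looks up the class named by implKey (defaulting to defaultClass) and invokes
  // its single-argument (Configuration) constructor, which is how the master
  // honours hbase.master.impl.
  public static <T> T constructServer(Configuration conf, String implKey,
                                      Class<T> defaultClass) throws Exception {
    Class<? extends T> impl =
        conf.getClass(implKey, defaultClass, defaultClass);   // must extend defaultClass
    Constructor<? extends T> ctor = impl.getConstructor(Configuration.class);
    return ctor.newInstance(conf);
  }
}
```

Called with implKey = "hbase.master.impl" and defaultClass = HMaster.class, this reproduces the behaviour described above.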
Initialization — the HMaster constructor
1. Set up the host name / DNS.
2. Configure and create the RPC connection.
3. Initialize ZooKeeper authentication.
4. Create the ZooKeeperWatcher (the ZK-related threads), the RPC server, and metrics.
5. Create the HealthCheckChore.
6. Configure the split-log settings.
Startup: HMaster#run (executed in a new thread)
```java
// Simplified excerpts of the master startup path (HBase 0.94-era source).

HMaster#run() {
    becomeActiveMaster(startupStatus);
    finishInitialization(startupStatus, false);
}

HMaster#becomeActiveMaster() {
    this.activeMasterManager = new ActiveMasterManager(zooKeeper, this.serverName, this);
    this.zooKeeper.registerListener(activeMasterManager);
    // A backup master spins here until the active master's znode goes away.
    while (!amm.isActiveMaster()) {
        Thread.sleep(c.getInt("zookeeper.session.timeout", 180 * 1000));
    }
    this.clusterStatusTracker = new ClusterStatusTracker(getZooKeeper(), this);
    this.clusterStatusTracker.start();
    return this.activeMasterManager.blockUntilBecomingActiveMaster(startupStatus,
        this.clusterStatusTracker);
}

HMaster#finishInitialization() {
    // File-system layout: root dir, hbase.version, META/ROOT, .oldlogs, ...
    fileSystemManager = new MasterFileSystem();
    tableDescriptors = new FSTableDescriptors(fileSystemManager.getFileSystem(),
        fileSystemManager.getRootDir());

    // ZK-based trackers: CatalogTracker, AssignmentManager, RegionServerTracker, ...
    initializeZKBasedSystemTrackers();

    // Executor services, cleaner chores, info server, ...
    startServiceThreads();

    // Register the region servers already present in ZK.
    for (ServerName sn : regionServerTracker.getOnlineServers()) {
        serverManager.recordNewServer(sn, HServerLoad.EMPTY_HSERVERLOAD);
    }

    // Optionally wait until the logs of dead servers have been split.
    if (waitingOnLogSplitting) {
        fileSystemManager.splitAllLogs(servers);
    }

    // Make sure -ROOT- and .META. are assigned, then enable shutdown handling.
    assignRoot();
    assignMeta();
    enableServerShutdownHandler();

    // Expire servers that carried regions but are no longer online.
    for (ServerName curServer : failedServers) {
        serverManager.expireServer(curServer);
    }
    DefaultLoadBalancer.setMasterServices();
    startCatalogJanitorChore();
    registerMBean();
}

HMaster#assignRoot() {
    processRegionInTransitionAndBlockUntilAssigned();
    verifyRootRegionLocation();
    getRootLocation();
    expireIfOnline();
}
```
Sequence diagram of HMaster#run:
Fields held by HMaster
InfoServer
ZooKeeperWatcher
ActiveMasterManager
RegionServerTracker
DrainingServerTracker
RPCServer
MasterMetrics
MasterFileSystem
ServerManager
AssignmentManager
CatalogTracker
ClusterStatusTracker
CatalogJanitor
LogCleaner
HFileCleaner
TableDescriptors
SnapshotManager
HealthCheckChore
Threads in HMaster
RPC-related listener, reader and handler threads:
Daemon Thread [IPC Server listener on 60000] (Suspended)
Daemon Thread [IPC Reader 3 on port 60000] (Suspended)
Daemon Thread [IPC Server handler 0 on 60000] (Suspended)
Daemon Thread [REPL IPC Server handler 2 on 60000] (Running)
Daemon Thread [IPC Server Responder] (Running)
ZK-related threads:
Daemon Thread [main-EventThread] (Suspended)
Daemon Thread [main-SendThread(myhost:2181)] (Suspended)
Background threads:
Daemon Thread [myhost,60000,1427458363875-BalancerChore] (Running)
Daemon Thread [myhost,60000,1427458363875-CatalogJanitor] (Running)
Daemon Thread [master-myhost,60000,1427458363875.archivedHFileCleaner] (Running)
Daemon Thread [master-myhost,60000,1427458363875.oldLogCleaner] (Running)
Daemon Thread [myhost,60000,1427458363875.splitLogManagerTimeoutMonitor] (Running)
Daemon Thread [myhost,60000,1427458363875.timerUpdater] (Running)
Monitoring threads:
Daemon Thread [Timer thread for monitoring hbase] (Running)
Daemon Thread [Timer thread for monitoring jvm] (Running)
Daemon Thread [Timer thread for monitoring rpc] (Running)
Daemon Thread [myhost,60000,1427458363875.timeoutMonitor] (Running)
Jetty-related threads:
Thread [1008881877@qtp-314160763-0] (Running)
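The listing above is simply a thread dump of a running master (for example via `jstack <pid>`). The same kind of snapshot can also be taken in code; a small sketch:

```java
import java.util.Map;

public class ThreadDumpExample {
  public static void main(String[] args) {
    // Enumerate all live threads in the current JVM, similar to the
    // thread listing shown above for a running HMaster process.
    for (Map.Entry<Thread, StackTraceElement[]> e
        : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      System.out.printf("%s Thread [%s] (%s)%n",
          t.isDaemon() ? "Daemon" : "Non-daemon", t.getName(), t.getState());
    }
  }
}
```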
How the timeoutMonitor thread (used for driving region assignment) works (AssignmentManager$TimeoutMonitor)
Its logic is as follows:
```java
// Simplified excerpts from AssignmentManager (HBase 0.94-era source).

AssignmentManager$TimeoutMonitor#chore() {
    for (RegionState regionState : regionsInTransition.values()) {
        if (regionState.getStamp() + timeout <= now) {
            // Region has been in transition longer than the timeout.
            actOnTimeOut(regionState);
        } else if (this.allRegionServersOffline && !allRSsOffline) {
            // All region servers had been offline but some are back now:
            // re-drive regions whose plan points at a server that is not online.
            RegionPlan existingPlan = regionPlans.get(regionState.getRegion().getEncodedName());
            if (existingPlan == null
                || !this.serverManager.isServerOnline(existingPlan.getDestination())) {
                actOnTimeOut(regionState);
            }
        }
    }
}

AssignmentManager$TimeoutMonitor#actOnTimeOut() {
    HRegionInfo regionInfo = regionState.getRegion();
    switch (regionState.getState()) {
    case CLOSED:
        regionState.updateTimestampToNow();
        break;
    case OFFLINE:
        invokeAssign(regionInfo);
        break;
    case PENDING_OPEN:
        invokeAssign(regionInfo);
        break;
    case OPENING:
        processOpeningState(regionInfo);
        break;
    case OPEN:
        regionState.updateTimestampToNow();
        break;
    case PENDING_CLOSE:
        invokeUnassign(regionInfo);
        break;
    case CLOSING:
        invokeUnassign(regionInfo);
        break;
    }
}

AssignmentManager#assign() {
    for (int i = 0; i < this.maximumAssignmentAttempts; i++) {
        String tableName = region.getTableNameAsString();
        if (!zkTable.isEnablingTable(tableName) && !zkTable.isEnabledTable(tableName)) {
            setEnabledTable(region);
        }
        // Ask the chosen region server to open the region.
        RegionOpeningState regionOpenState = serverManager.sendRegionOpen();
        if (regionOpenState == RegionOpeningState.OPENED) {
            return;
        } else if (regionOpenState == RegionOpeningState.ALREADY_OPENED) {
            ZKAssign.deleteOfflineNode(master.getZooKeeper(), encodedRegionName);
        }
    }
}

AssignmentManager#unassign() {
    state = regionsInTransition.get(encodedName);
    if (state == null) {
        // Not yet in transition: mark the region as closing in ZK.
        ZKAssign.createNodeClosing(master.getZooKeeper(), region, master.getServerName());
    } else if (force && (state.isPendingClose() || state.isClosing())) {
        state.update(state.getState());
    } else {
        return;
    }
    ServerName server = regions.get(region);
    if (server == null) {
        deleteClosingOrClosedNode(region);
    }
    // Ask the hosting region server to close the region.
    serverManager.sendRegionClose();
}
```
The CatalogJanitor thread (CatalogJanitor)
This thread scans for the leftovers of region splits: once a split has completed, the parent region's META entry can be deleted,
and likewise the info:splitA and info:splitB columns in META that pointed at the daughter regions.
The main logic is as follows:
```java
// Simplified excerpts from CatalogJanitor (HBase 0.94-era source).

CatalogJanitor#scan() {
    // First element: number of rows scanned; second: the split parents found in META.
    Pair<Integer, Map<HRegionInfo, Result>> pair = getSplitParents();
    Map<HRegionInfo, Result> splitParents = pair.getSecond();
    int cleaned = 0;
    for (Map.Entry<HRegionInfo, Result> e : splitParents.entrySet()) {
        if (!parentNotCleaned.contains(e.getKey().getEncodedName())) {
            cleanParent(e.getKey(), e.getValue());
            cleaned++;
        } else {
            // Parent could not be cleaned, so its daughters must not be cleaned either.
            parentNotCleaned.add(getDaughterRegionInfo(e.getValue(), "splitA").getEncodedName());
            parentNotCleaned.add(getDaughterRegionInfo(e.getValue(), "splitB").getEncodedName());
        }
    }
}

CatalogJanitor#cleanParent() {
    HRegionInfo a_region = getDaughterRegionInfo(rowContent, "splitA");
    HRegionInfo b_region = getDaughterRegionInfo(rowContent, "splitB");
    // Do the daughters still hold references to files under the parent's directory?
    Pair<Boolean, Boolean> a = checkDaughterInFs(parent, a_region, "splitA");
    Pair<Boolean, Boolean> b = checkDaughterInFs(parent, b_region, "splitB");
    removeDaughtersFromParent(parent);
    FileSystem fs = this.services.getMasterFileSystem().getFileSystem();
    // Move the parent region's files to the archive, then delete its META row.
    HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, parent);
    Delete delete = new Delete(regionInfo.getRegionName());
    deleteFromMetaTable(catalogTracker, delete);
}

CatalogJanitor#checkDaughterInFs() {
    FileSystem fs = this.services.getMasterFileSystem().getFileSystem();
    Path rootdir = this.services.getMasterFileSystem().getRootDir();
    Path tabledir = new Path(rootdir, split.getTableNameAsString());
    Path regiondir = new Path(tabledir, split.getEncodedName());
    exists = fs.exists(regiondir);
    HTableDescriptor parentDescriptor = getTableDescriptor(parent.getTableName());
    for (HColumnDescriptor family : parentDescriptor.getFamilies()) {
        Path p = Store.getStoreHomedir(tabledir, split.getEncodedName(), family.getName());
        if (!fs.exists(p)) {
            continue;
        }
        // Look for reference files the daughter still holds against the parent.
        FileStatus[] ps = FSUtils.listStatus(fs, p,
            new PathFilter() {
                public boolean accept(Path path) {
                    return StoreFile.isReference(path);
                }
            });
    }
}

CatalogJanitor#removeDaughtersFromParent() {
    // Drop the split daughter pointers from the parent's META row.
    Delete delete = new Delete(parent.getRegionName());
    delete.deleteColumns("info", "splitA");
    delete.deleteColumns("info", "splitB");
    deleteFromMetaTable(catalogTracker, delete);
}
```
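For illustration, removing the daughter pointers from a parent's .META. row with the plain 0.94-era client API would look roughly like the sketch below (the row key is a made-up example; inside CatalogJanitor the write actually goes through catalog-tracker helpers rather than an HTable):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RemoveDaughterPointers {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // 0.94-era client API; ".META." is the catalog table name in that version.
    HTable meta = new HTable(conf, ".META.");
    try {
      // Example parent row key: table,startkey,timestamp.encodedname.
      Delete delete = new Delete(Bytes.toBytes("mytable,,1427458363875.abcdef0123456789."));
      delete.deleteColumns(Bytes.toBytes("info"), Bytes.toBytes("splitA"));
      delete.deleteColumns(Bytes.toBytes("info"), Bytes.toBytes("splitB"));
      meta.delete(delete);
    } finally {
      meta.close();
    }
  }
}
```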
The BalancerChore thread (HMaster#balance)
This code drives the balancing pass; the logic is as follows:
```java
// Simplified excerpts of the balancer path (HBase 0.94-era source).

HMaster#balance() {
    // Current region assignments, grouped per table.
    Map<String, Map<ServerName, List<HRegionInfo>>> assignmentsByTable =
        this.assignmentManager.getAssignmentsByTable();
    List<RegionPlan> plans = new ArrayList<RegionPlan>();
    for (Map<ServerName, List<HRegionInfo>> assignments : assignmentsByTable.values()) {
        // The pluggable LoadBalancer computes the moves for one table.
        List<RegionPlan> partialPlans = this.balancer.balanceCluster(assignments);
        if (partialPlans != null) {
            plans.addAll(partialPlans);
        }
    }
    for (RegionPlan plan : plans) {
        assignmentManager.balance(plan);
    }
}

AssignmentManager#balance() {
    synchronized (this.regionPlans) {
        // Remember where the region should go, then close it; the normal
        // assignment path reopens it on the plan's destination server.
        this.regionPlans.put(plan.getRegionName(), plan);
    }
    unassign(plan.getRegionInfo());
}
```
The archivedHFileCleaner thread (HFileCleaner#chore)
This class deletes archived files under the archive directory; the logic is as follows:
```java
// Simplified excerpts from HFileCleaner / CleanerChore (HBase 0.94-era source).

HFileCleaner#chore() {
    FileStatus[] files = FSUtils.listStatus(this.fs, this.oldFileDir, null);
    for (FileStatus file : files) {
        if (file.isDir()) {
            checkAndDeleteDirectory(file.getPath());
        } else {
            checkAndDelete(file.getPath());
        }
    }
}

CleanerChore#checkAndDeleteDirectory() {
    // Process the directory's children, then remove the directory itself.
    FileStatus[] children = FSUtils.listStatus(fs, toCheck, null);
    HBaseFileSystem.deleteFileFromFileSystem(fs, toCheck);
}

CleanerChore#checkAndDelete() {
    HBaseFileSystem.deleteDirFromFileSystem(fs, filePath);
}
```
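The underlying operations are ordinary Hadoop FileSystem calls. A self-contained sketch of the same idea, deleting files older than a TTL under a directory (the path and the TTL are illustrative, and the real cleaners additionally consult pluggable delegates before deleting anything):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OldFileCleaner {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Illustrative directory and TTL; HFileCleaner works on the HBase archive dir.
    cleanDir(fs, new Path("/hbase/.archive"), 5 * 60 * 1000L);
  }

  static void cleanDir(FileSystem fs, Path dir, long ttlMs) throws Exception {
    FileStatus[] entries = fs.listStatus(dir);
    if (entries == null) return;
    long now = System.currentTimeMillis();
    for (FileStatus entry : entries) {
      if (entry.isDir()) {
        cleanDir(fs, entry.getPath(), ttlMs);          // recurse, like checkAndDeleteDirectory
      } else if (now - entry.getModificationTime() > ttlMs) {
        fs.delete(entry.getPath(), false);             // expired file, like checkAndDelete
      }
    }
  }
}
```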
The oldLogCleaner thread (LogCleaner)
This class cleans up the files under the .oldlogs directory.
Its execution logic is the same as the archivedHFileCleaner thread's:
both go through the parent class's CleanerChore#chore() method.
The timerUpdater thread (AssignmentManager$TimerUpdater#chore)
This class refreshes the timestamps of regions that are in transition.
The main logic is as follows:
```java
// Simplified excerpts from AssignmentManager (HBase 0.94-era source).

AssignmentManager$TimerUpdater#chore() {
    while (!serversInUpdatingTimer.isEmpty() && !stopper.isStopped()) {
        if (serverToUpdateTimer == null) {
            serverToUpdateTimer = serversInUpdatingTimer.first();
        } else {
            serverToUpdateTimer = serversInUpdatingTimer.higher(serverToUpdateTimer);
        }
        updateTimers(serverToUpdateTimer);
    }
}

AssignmentManager#updateTimers() {
    // Touch every region in transition on this server so the TimeoutMonitor
    // does not act on it prematurely.
    for (Map.Entry e : copy.entrySet()) {
        rs = this.regionsInTransition.get(e.getKey());
        rs.updateTimestampToNow();
    }
}
```
The splitLogManagerTimeoutMonitor thread (SplitLogManager$TimeoutMonitor#chore)
This class periodically checks for split-log tasks that have timed out (tasks taken from the ZK splitlog node whose log-splitting work has not completed) and resubmits them;
tasks are also resubmitted when their worker region server has died, and at the end it retries deleting the task znodes whose earlier deletion failed.
The logic is as follows:
```java
// Simplified excerpts from SplitLogManager (HBase 0.94-era source).

SplitLogManager$TimeoutMonitor#chore() {
    for (Map.Entry<String, Task> e : tasks.entrySet()) {
        if (localDeadWorkers != null && localDeadWorkers.contains(cur_worker)) {
            // The worker that owned this task is dead: force a resubmit.
            if (resubmit(path, task, FORCE)) {
                resubmitted++;
            } else {
                handleDeadWorker(cur_worker);
            }
        } else if (resubmit(path, task, CHECK)) {
            // Task has timed out: resubmit it.
            resubmitted++;
        }
    }
    for (Map.Entry<String, Task> e : tasks.entrySet()) {
        String path = e.getKey();
        Task task = e.getValue();
        if (task.isUnassigned() && (task.status != FAILURE)) {
            // Make sure a watch is still set on every unassigned task node.
            tryGetDataSetWatch(path);
        }
    }
    createRescanNode(Long.MAX_VALUE);

    // Retry deleting the task znodes whose earlier deletion failed.
    if (failedDeletions.size() > 0) {
        for (String tmpPath : tmpPaths) {
            deleteNode(tmpPath, zkretries);
        }
    }
}

SplitLogManager#deleteNode() {
    // Asynchronous ZooKeeper delete; the callback retries on failure.
    ZooKeeper.delete(path, -1, new DeleteAsyncCallback(), retries);
}
```
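The final deleteNode step uses ZooKeeper's asynchronous delete, where a callback receives the result instead of blocking the chore thread. A minimal sketch of that API (the connection string and znode path are illustrative):

```java
import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class AsyncDeleteExample {
  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });

    // version -1 means "delete regardless of the node's current version".
    zk.delete("/hbase/splitlog/some-task", -1, new AsyncCallback.VoidCallback() {
      @Override
      public void processResult(int rc, String path, Object ctx) {
        int retriesLeft = (Integer) ctx;
        if (rc == KeeperException.Code.OK.intValue()) {
          System.out.println("deleted " + path);
        } else if (retriesLeft > 0) {
          // In SplitLogManager this is where the delete would be re-issued.
          System.out.println("delete failed for " + path + ", retries left: " + retriesLeft);
        }
      }
    }, 3 /* ctx: remaining retries */);

    Thread.sleep(1000);   // give the async callback a moment before exiting
    zk.close();
  }
}
```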