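Overall structure of HMaster
A master consists of the following parts:
1. External interfaces
RPC service
Jetty web service
Master MBean
The RPC service consists of several listener, reader, and handler threads (IPC Handlers, plus IPC Handlers dedicated to replication).
2. Executor services
These are thread pools; when a task (event) arrives, it is handed to the matching pool for processing.
The executor types are: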
MASTER_SERVER_OPERATIONS
MASTER_META_SERVER_OPERATIONS
MASTER_CLOSE_REGION
MASTER_OPEN_REGION
MASTER_TABLE_OPERATIONS
The related handlers are:
OpenRegionHandler
ClosedRegionHandler
ServerShutdownHandler
MetaServerShutdownHandler
DeleteTableHandler
DisableTableHandler
EnableTableHandler
ModifyTableHandler
CreateTableHandler
Executor Service | Event | Event Handler | Threads (Default) |
Master Open Region | RS_ZK_REGION_OPENED | OpenRegionHandler | 5 |
Master Close Region | RS_ZK_REGION_CLOSED | ClosedRegionHandler | 5 |
Master Server Operations | RS_ZK_REGION_SPLIT, M_SERVER_SHUTDOWN | SplitRegionHandler, ServerShutdownHandler | 3 |
Master Meta Server Operations | M_META_SERVER_SHUTDOWN | MetaServerShutdownHandler | 5 |
Master Table Operations | C_M_DELETE_TABLE, C_M_DISABLE_TABLE, C_M_ENABLE_TABLE, C_M_MODIFY_TABLE, C_M_CREATE_TABLE | DeleteTableHandler, DisableTableHandler, EnableTableHandler, ModifyTableHandler, CreateTableHandler | 1 |
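The routing from event type to executor pool can be pictured with a small stand-alone sketch. It uses only java.util.concurrent; the class ExecutorRoutingSketch and its methods are hypothetical names chosen for illustration and are not the HBase ExecutorService API. Only a subset of the events is listed.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: names mirror the executor table, but this is not HBase code.
final class ExecutorRoutingSketch {
    enum Pool {
        MASTER_OPEN_REGION, MASTER_CLOSE_REGION, MASTER_SERVER_OPERATIONS,
        MASTER_META_SERVER_OPERATIONS, MASTER_TABLE_OPERATIONS
    }

    // Each event type knows which pool has to process it.
    enum Event {
        RS_ZK_REGION_OPENED(Pool.MASTER_OPEN_REGION),
        RS_ZK_REGION_CLOSED(Pool.MASTER_CLOSE_REGION),
        M_SERVER_SHUTDOWN(Pool.MASTER_SERVER_OPERATIONS),
        M_META_SERVER_SHUTDOWN(Pool.MASTER_META_SERVER_OPERATIONS),
        C_M_CREATE_TABLE(Pool.MASTER_TABLE_OPERATIONS);

        final Pool pool;

        Event(Pool pool) {
            this.pool = pool;
        }
    }

    private final Map<Pool, ExecutorService> pools = new EnumMap<>(Pool.class);

    // Start a fixed-size pool, e.g. 5 threads for MASTER_OPEN_REGION.
    void startPool(Pool pool, int threads) {
        pools.put(pool, Executors.newFixedThreadPool(threads));
    }

    // Hand a handler to the pool that owns the event type.
    void submit(Event event, Runnable handler) {
        ExecutorService pool = pools.get(event.pool);
        if (pool == null) {
            throw new IllegalStateException("pool not started: " + event.pool);
        }
        pool.execute(handler);
    }

    // Stop all pools and wait briefly for queued handlers to finish.
    void shutdown() {
        for (ExecutorService p : pools.values()) {
            p.shutdown();
            try {
                p.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        ExecutorRoutingSketch s = new ExecutorRoutingSketch();
        s.startPool(Pool.MASTER_OPEN_REGION, 5);
        s.startPool(Pool.MASTER_TABLE_OPERATIONS, 1);
        s.submit(Event.RS_ZK_REGION_OPENED, () -> System.out.println("handling region opened"));
        s.submit(Event.C_M_CREATE_TABLE, () -> System.out.println("handling create table"));
        s.shutdown();
    }
}
```

A single-threaded pool for table operations means those operations run one at a time, which matches the default of 1 in the table.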
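3. ZooKeeper-related threads
1. ActiveMasterManager
Creates the ephemeral node /hbase/master in ZK; the active master records its information in this node.
A backup master blocks here until the node disappears.
2. RegionServerTracker
Monitors the region servers by watching the /hbase/rs node in ZK to learn their state.
ZK fires a notification whenever a region server comes online or goes offline.
3. DrainingServerTracker
Watches the /hbase/draining node, which lists the region servers that are being drained (decommissioned), and handles the events fired when a server is added to or removed from that list.
4. CatalogTracker
Tracks the META and ROOT tables.
5. ClusterStatusTracker
Watches the /shutdown node in ZK, which records whether the cluster is up or is being shut down.
6. AssignmentManager
Manages and assigns regions.
7. RootRegionTracker
Manages and watches the /root-region-server node.
8. LoadBalancer
Balances the regions across the region servers.
9. MetaNodeTracker
Watches the .META. region's node under /unassigned, so that the master is notified when .META. has been assigned.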
In addition, the class org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher manages the following ZK nodes (every node except baseZNode is a child of the base node):
baseZNode /hbase
assignmentZNode /unassigned
rsZNode /rs
drainingZNode /draining
masterTableZNode /table
masterTableZNode92 /table92 (used for HBase 0.92)
splitLogZNode /splitlog
backupMasterAddressesZNode /backup-masters
clusterStateZNode /shutdown
masterAddressZNode /master
clusterIdZNode /hbaseid
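Since the node names in this list are relative to the base node, the full paths can be derived as in the following sketch. The class ZNodePathsSketch and its join helper are hypothetical stand-ins written for this example; they only illustrate how a path such as /hbase/master comes about.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: builds absolute znode paths from the base znode and the child names listed here.
final class ZNodePathsSketch {
    // Join a parent znode and a child name with the ZK path separator.
    static String join(String parent, String child) {
        return parent + "/" + child;
    }

    // Field names follow the list; the values are absolute paths.
    static Map<String, String> paths(String baseZNode) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("baseZNode", baseZNode);
        m.put("assignmentZNode", join(baseZNode, "unassigned"));
        m.put("rsZNode", join(baseZNode, "rs"));
        m.put("drainingZNode", join(baseZNode, "draining"));
        m.put("masterTableZNode", join(baseZNode, "table"));
        m.put("splitLogZNode", join(baseZNode, "splitlog"));
        m.put("backupMasterAddressesZNode", join(baseZNode, "backup-masters"));
        m.put("clusterStateZNode", join(baseZNode, "shutdown"));
        m.put("masterAddressZNode", join(baseZNode, "master"));
        m.put("clusterIdZNode", join(baseZNode, "hbaseid"));
        return m;
    }

    public static void main(String[] args) {
        paths("/hbase").forEach((name, path) -> System.out.println(name + " = " + path));
    }
}
```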
[Figure: class diagram of the ZK listener classes]
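4. File system interface and others
MasterFileSystem
Creates the META and ROOT tables, the .oldlogs directory, the hbase.version file, and so on.
LogCleaner
Periodically cleans up the contents of the .oldlogs directory.
HFileCleaner
Periodically cleans up the contents of the archive directory.
Besides background threads such as LogCleaner and HFileCleaner, the remaining components are:
ServerManager, which maintains the lists of online and dead region servers
Balancer, the background thread that performs region balancing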
HMaster configuration
Parameter | Default | Meaning |
hbase.master.handler.count | 25 | Number of RPC handler threads |
hbase.master.buffer.for.rs.fatals | 1M | |
mapred.task.id | | |
hbase.master.wait.for.log.splitting | false | |
zookeeper.session.timeout | 180 seconds | |
hbase.master.backup | | Whether this master starts as a backup master |
hbase.master.impl | | Master implementation class (must extend HMaster) |
hbase.master.event.waiting.time | 1000 | |
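HMaster startup entry class
org.apache.hadoop.hbase.master.HMaster
The parameter hbase.master.impl in hbase-site.xml can be set to plug in a custom implementation, which must extend HMaster.
HMaster#main() then hands over to HMasterCommandLine (a subclass of ServerCommandLine).
HMasterCommandLine is run through the ToolRunner provided by Hadoop:
ToolRunner#run(Configuration,Tool,String[])
ToolRunner calls GenericOptionsParser to parse a fixed set of generic options such as -conf, -D, -fs, and -files.
After parsing, it applies them to the Configuration object and passes the remaining startup arguments to the Tool implementation.
In short, ToolRunner is a utility that parses the startup arguments, populates the Configuration object, and hands both to the Tool implementation.
The call order is:
1. HMaster#main()
2. HMasterCommandLine#doMain()
3. ToolRunner#run()
4. HMasterCommandLine#run()
5. HMasterCommandLine#startMaster()
6. HMaster#constructMaster()
7. The HMaster constructor is invoked via reflection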
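The last two steps of the call order, together with the hbase.master.impl rule, can be sketched as follows. This is a minimal stand-alone illustration: BaseMaster stands in for HMaster, a plain Map stands in for the Hadoop Configuration, and ConstructMasterSketch, BaseMaster, and CustomMaster are hypothetical names that do not exist in HBase.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: pick the master class from configuration and construct it reflectively.
final class ConstructMasterSketch {
    // Stand-in for HMaster; the Map stands in for the Configuration object.
    static class BaseMaster {
        final Map<String, String> conf;

        BaseMaster(Map<String, String> conf) {
            this.conf = conf;
        }

        String name() {
            return "BaseMaster";
        }
    }

    // Stand-in for a user-supplied implementation; it must extend the base class.
    static class CustomMaster extends BaseMaster {
        CustomMaster(Map<String, String> conf) {
            super(conf);
        }

        @Override
        String name() {
            return "CustomMaster";
        }
    }

    static BaseMaster constructMaster(Map<String, String> conf) {
        String implName = conf.getOrDefault("hbase.master.impl", BaseMaster.class.getName());
        try {
            // asSubclass rejects classes that do not extend the base master class.
            Class<? extends BaseMaster> clazz = Class.forName(implName).asSubclass(BaseMaster.class);
            Constructor<? extends BaseMaster> ctor = clazz.getDeclaredConstructor(Map.class);
            return ctor.newInstance(conf);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot construct master " + implName, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(constructMaster(conf).name());
        conf.put("hbase.master.impl", CustomMaster.class.getName());
        System.out.println(constructMaster(conf).name());
    }
}
```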
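Initialization: the HMaster constructor
1. Resolve the host name and DNS settings
2. Configure and create the RPC connection
3. Initialize ZK authentication
4. Create the ZooKeeperWatcher (the ZK-related threads), the RPC service, and the metrics
5. Create the HealthCheckChore
6. Configure the split-log related settings
Startup: HMaster#run (runs in a new thread)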
//Make the current master the active one (a backup master keeps waiting here),
//then finish initialization
HMaster#run() {
becomeActiveMaster(startupStatus);
finishInitialization(startupStatus, false);
}
//Blocks until the current master becomes the active master
HMaster#becomeActiveMaster() {
this.activeMasterManager = new ActiveMasterManager(zooKeeper, this.serverName,this);
this.zooKeeper.registerListener(activeMasterManager);
//A master configured as a backup first stalls until an active master exists
while (!this.activeMasterManager.isActiveMaster()) {
Thread.sleep(conf.getInt("zookeeper.session.timeout", 180 * 1000));
}
this.clusterStatusTracker = new ClusterStatusTracker(getZooKeeper(), this);
this.clusterStatusTracker.start();
return this.activeMasterManager.blockUntilBecomingActiveMaster(startupStatus,this.clusterStatusTracker);
}
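The idea behind becoming the active master is that whoever manages to create the ephemeral /hbase/master node wins, and everyone else waits for the node to disappear. The sketch below imitates that with an in-memory AtomicReference instead of ZooKeeper; ActiveMasterSketch and its methods are hypothetical names, and a real implementation would rely on ZK watches rather than polling a variable.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch: an AtomicReference plays the role of the ephemeral /hbase/master znode.
final class ActiveMasterSketch {
    // Holds the name of the active master, or null when the node does not exist.
    private final AtomicReference<String> masterZNode = new AtomicReference<>();

    // Succeeds only for the first caller, like creating an ephemeral node that must not exist yet.
    boolean tryBecomeActive(String serverName) {
        return masterZNode.compareAndSet(null, serverName);
    }

    // When the active master's session ends, its ephemeral node disappears.
    void sessionExpired(String serverName) {
        masterZNode.compareAndSet(serverName, null);
    }

    String activeMaster() {
        return masterZNode.get();
    }

    public static void main(String[] args) {
        ActiveMasterSketch zk = new ActiveMasterSketch();
        System.out.println("master-a active: " + zk.tryBecomeActive("master-a"));
        System.out.println("master-b active: " + zk.tryBecomeActive("master-b"));
        zk.sessionExpired("master-a");
        System.out.println("master-b active after failover: " + zk.tryBecomeActive("master-b"));
    }
}
```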
//Initializes the master components: file system, ServerManager,
//AssignmentManager, RegionServerTracker, CatalogTracker, etc.
//Sets the cluster status in ZooKeeper
//Waits for the region servers to check in
//If there are files under the .logs directory, runs the split-log task
//Assigns the ROOT and META regions
//Handles the live region servers and the dead ones
HMaster#finishInitialization() {
//Check whether the ROOT and META tables exist and create them if not; also creates the tmp and .oldlogs directories
fileSystemManager = new MasterFileSystem();
tableDescriptors = new FSTableDescriptors(fileSystemManager.getFileSystem(),fileSystemManager.getRootDir());
//Create CatalogTracker, LoadBalancer, AssignmentManager,
//RegionServerTracker, DrainingServerTracker,
//ClusterStatusTracker, SnapshotManager
initializeZKBasedSystemTrackers();
//Start the service threads (open region, close region, server operations, etc.),
//then start the Jetty service and the RPC service
startServiceThreads();
//Register every region server with ServerManager, which keeps track of
//all online and dead servers and is responsible for starting and stopping them
for (ServerName sn: regionServerTracker.getOnlineServers()) {
ServerManager.recordNewServer(sn, HServerLoad.EMPTY_HSERVERLOAD);
}
//If there are logs, pre-process them and publish the tasks in ZK, where all region servers pick them up
if (waitingOnLogSplitting) {
fileSystemManager.splitAllLogs(servers);
}
//If the ROOT and META regions are not assigned yet, assign them first
assignRoot();
assignMeta();
enableServerShutdownHandler();
//Handle all dead servers
for (ServerName curServer : failedServers) {
serverManager.expireServer(curServer);
}
DefaultLoadBalancer.setMasterServices();
startCatalogJanitorChore();
registerMBean();
}
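HMaster#assignRoot() {
//First check whether the region is in transition;
//if so, process that state and wait until it has been handled before continuing
processRegionInTransitionAndBlockUntilAssigned();
verifyRootRegionLocation();
getRootLocation();
expireIfOnline();
//The assignment itself then proceeds as follows:
//delete "/hbase/root-region-server" first, whether or not it exists
//(KeeperException.NoNodeException is ignored);
//write EventType.M_ZK_REGION_OFFLINE, the current timestamp, the region name (-ROOT-,,0)
//and the master's versioned ServerName
//to /hbase/unassigned/70236052; the payload is null, so it is not written
}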
[Figure: sequence diagram of HMaster#run]
Member fields of HMaster
InfoServer
ZooKeeperWatcher
ActiveMasterManager
RegionServerTracker
DrainingServerTracker
RPCServer
MasterMetrics
MasterFileSystem
ServerManager
AssignmentManager
CatalogTracker
ClusterStatusTracker
CatalogJanitor
LogCleaner
HFileCleaner
TableDescriptors
SnapshotManager
HealthCheckChore
HMaster threads
RPC-related listener, reader, and handler threads
Daemon Thread [IPC Server listener on 60000] (Suspended)
Daemon Thread [IPC Reader 3 on port 60000] (Suspended)
Daemon Thread [IPC Server handler 0 on 60000] (Suspended)
Daemon Thread [REPL IPC Server handler 2 on 60000] (Running)
Daemon Thread [IPC Server Responder] (Running)
ZK-related threads
Daemon Thread [main-EventThread] (Suspended)
Daemon Thread [main-SendThread(myhost:2181)] (Suspended)
Background threads
Daemon Thread [myhost,60000,1427458363875-BalancerChore] (Running)
Daemon Thread [myhost,60000,1427458363875-CatalogJanitor] (Running)
Daemon Thread [master-myhost,60000,1427458363875.archivedHFileCleaner] (Running)
Daemon Thread [master-myhost,60000,1427458363875.oldLogCleaner] (Running)
Daemon Thread [myhost,60000,1427458363875.splitLogManagerTimeoutMonitor] (Running)
Daemon Thread [myhost,60000,1427458363875.timerUpdater] (Running)
Monitoring threads
Daemon Thread [Timer thread for monitoring hbase] (Running)
Daemon Thread [Timer thread for monitoring jvm] (Running)
Daemon Thread [Timer thread for monitoring rpc] (Running)
Daemon Thread [myhost,60000,1427458363875.timeoutMonitor] (Running)
Jetty-related threads
Thread [1008881877@qtp-314160763-0] (Running)
How the timeoutMonitor thread (used for region assignment) works (AssignmentManager$TimeoutMonitor)
The logic is as follows:
//Runs in its own thread;
//invoked from Chore#run()
AssignmentManager$TimeoutMonitor#chore() {
for (RegionState regionState : regionsInTransition.values()) {
if (regionState.getStamp() + timeout <= now) {
//decide on action upon timeout
actOnTimeOut(regionState);
} else if (this.allRegionServersOffline && !allRSsOffline) {
RegionPlan existingPlan = regionPlans.get(regionState.getRegion().getEncodedName());
if (existingPlan == null || !this.serverManager.isServerOnline(existingPlan.getDestination())) {
actOnTimeOut(regionState);
}
}
}
}
//Acts according to the region's current state, e.g. assigns it again if it is OFFLINE
AssignmentManager$TimeoutMonitor#actOnTimeOut() {
HRegionInfo regionInfo = regionState.getRegion();
switch (regionState.getState()) {
case CLOSED:
regionState.updateTimestampToNow();
break;
case OFFLINE:
invokeAssign(regionInfo);
break;
case PENDING_OPEN:
invokeAssign(regionInfo);
break;
case OPENING:
processOpeningState(regionInfo);
break;
case OPEN:
regionState.updateTimestampToNow();
break;
case PENDING_CLOSE:
invokeUnassign(regionInfo);
break;
case CLOSING:
invokeUnassign(regionInfo);
break;
}
}
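//Invoked via AssignCallable#call()
//Assigns a region: first updates the znode in ZK,
//then calls sendRegionOpen(), which triggers HRegionServer#openRegion();
//the region server then creates an OpenRegionHandler and runs it in a thread pool,
//which in turn calls HRegion#openHRegion()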
AssignmentManager#assign() {
for (int i = 0; i < this.maximumAssignmentAttempts; i++) {
String tableName = region.getTableNameAsString();
if (!zkTable.isEnablingTable(tableName) && !zkTable.isEnabledTable(tableName)) {
setEnabledTable(region);
}
RegionOpeningState regionOpenState = ServerManager.sendRegionOpen();
if (regionOpenState == RegionOpeningState.OPENED) {
return;
} else if (regionOpenState == RegionOpeningState.ALREADY_OPENED) {
ZKAssign.deleteOfflineNode(master.getZooKeeper(), encodedRegionName);
}
}
}
//Unassigns a region, i.e. closes it on the server that currently hosts it
AssignmentManager#unassign() {
state = regionsInTransition.get(encodedName);
if (state == null) {
ZKAssign.createNodeClosing(master.getZooKeeper(), region, master.getServerName());
} else if (force && (state.isPendingClose() || state.isClosing())) {
state.update(state.getState());
} else {
return;
}
ServerName server = regions.get(region);
if (server == null) {
deleteClosingOrClosedNode(region);
}
ServerManager.sendRegionClose();
}
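The decision logic of the timeout monitor can be condensed into a pure function, which makes the state-to-action mapping easier to see. This is a simplified sketch: TimeoutMonitorSketch, Action, and the method signatures are hypothetical, and the real handlers do considerably more work than returning an enum value.

```java
// Illustrative sketch: the timeout check and the per-state reaction as pure functions.
final class TimeoutMonitorSketch {
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, PENDING_CLOSE, CLOSING, CLOSED }

    enum Action { ASSIGN, UNASSIGN, PROCESS_OPENING, REFRESH_TIMESTAMP }

    // A region in transition has timed out once its last update is older than the timeout.
    static boolean timedOut(long stamp, long timeout, long now) {
        return stamp + timeout <= now;
    }

    // Mirrors the switch in actOnTimeOut(): retry the pending operation or just refresh the stamp.
    static Action actOnTimeOut(State state) {
        switch (state) {
            case OFFLINE:
            case PENDING_OPEN:
                return Action.ASSIGN;
            case OPENING:
                return Action.PROCESS_OPENING;
            case PENDING_CLOSE:
            case CLOSING:
                return Action.UNASSIGN;
            case OPEN:
            case CLOSED:
            default:
                return Action.REFRESH_TIMESTAMP;
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long stamp = now - 60_000; // last state change one minute ago
        if (timedOut(stamp, 30_000, now)) {
            System.out.println("OFFLINE region -> " + actOnTimeOut(State.OFFLINE));
        }
    }
}
```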
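CatalogJanitor thread (CatalogJanitor)
This thread scans for the leftovers of region splits: once a split is complete, the parent region's information in META can be deleted.
Likewise, the info:splitA and info:splitB entries that the split wrote into the META table can be deleted.
The main logic is:
//Runs in its own thread;
//invoked from Chore#run()
CatalogJanitor#scan() {
Pair<Integer, Map<HRegionInfo, Result>> pair = getSplitParents();
Map<HRegionInfo, Result> splitParents = pair.getSecond();
int cleaned = 0;
for (Map.Entry<HRegionInfo, Result> e : splitParents.entrySet()) {
if (!parentNotCleaned.contains(e.getKey().getEncodedName())) {
cleanParent(e.getKey(), e.getValue());
cleaned++;
} else {
//This parent was not cleaned, so remember its daughters (columns info:splitA and info:splitB)
parentNotCleaned.add(getDaughterRegionInfo("splitA"));
parentNotCleaned.add(getDaughterRegionInfo("splitB"));
}
}
}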
//If the two daughter regions created by the split (splitA and splitB) no longer reference
//the parent region, the parent is removed:
//its files are archived and finally a Delete removes its row from the META table
CatalogJanitor#cleanParent() {
HRegionInfo a_region = getDaughterRegionInfo(rowContent, "splitA");
HRegionInfo b_region = getDaughterRegionInfo(rowContent, "splitB");
Pair<Boolean, Boolean> a = checkDaughterInFs(parent, a_region, "splitA");
Pair<Boolean, Boolean> b = checkDaughterInFs(parent, b_region, "splitB");
//The steps below apply only when neither a nor b reports remaining references
removeDaughtersFromParent(parent);
FileSystem fs = this.services.getMasterFileSystem().getFileSystem();
HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, parent);
Delete delete = new Delete(parent.getRegionName());
deleteFromMetaTable(catalogTracker, delete);
}
//Checks whether a daughter region (splitA or splitB) still references the parent region
CatalogJanitor#checkDaughterInFs() {
FileSystem fs = this.services.getMasterFileSystem().getFileSystem();
Path rootdir = this.services.getMasterFileSystem().getRootDir();
Path tabledir = new Path(rootdir, split.getTableNameAsString());
Path regiondir = new Path(tabledir, split.getEncodedName());
exists = fs.exists(regiondir);
HTableDescriptor parentDescriptor = getTableDescriptor(parent.getTableName());
for (HColumnDescriptor family: parentDescriptor.getFamilies()) {
Path p = Store.getStoreHomedir(tabledir, split.getEncodedName(),family.getName());
if (!fs.exists(p)) {
continue;
}
// Look for reference files. Call listStatus with anonymous instance of PathFilter.
FileStatus [] ps = FSUtils.listStatus(fs, p,
new PathFilter () {
public boolean accept(Path path) {
return StoreFile.isReference(path);
}
});
}
}
//Builds a Delete that removes the splitA and splitB columns from the META table;
//they were written during the split and are no longer needed
CatalogJanitor#removeDaughtersFromParent() {
Delete delete = new Delete(parent.getRegionName());
delete.deleteColumns("info","splitA");
delete.deleteColumns("info","splitB");
deleteFromMetaTable(catalogTracker, delete);
}
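The condition under which a split parent may be cleaned is easy to lose in the pseudocode, so here it is as a tiny stand-alone sketch. It assumes that the pair returned by checkDaughterInFs means (daughter directory exists, daughter still holds reference files); SplitParentCleanupSketch and DaughterCheck are hypothetical names used only for this illustration.

```java
// Illustrative sketch: when may the parent of a split be removed?
final class SplitParentCleanupSketch {
    // Assumed meaning of the result of checking one daughter in the file system.
    static final class DaughterCheck {
        final boolean exists;
        final boolean hasReferences;

        DaughterCheck(boolean exists, boolean hasReferences) {
            this.exists = exists;
            this.hasReferences = hasReferences;
        }
    }

    // A daughter that is gone, or that holds no reference files, no longer needs the parent.
    static boolean hasNoReferences(DaughterCheck d) {
        return !d.exists || !d.hasReferences;
    }

    // The parent can be archived and removed from META only when both daughters are independent.
    static boolean canCleanParent(DaughterCheck a, DaughterCheck b) {
        return hasNoReferences(a) && hasNoReferences(b);
    }

    public static void main(String[] args) {
        DaughterCheck compacted = new DaughterCheck(true, false);
        DaughterCheck stillReferencing = new DaughterCheck(true, true);
        System.out.println(canCleanParent(compacted, compacted));
        System.out.println(canCleanParent(compacted, stillReferencing));
    }
}
```

In practice a daughter stops referencing the parent after it has been compacted, which is why the janitor has to retry periodically.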
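BalancerChore thread (HMaster#balance)
This chore runs the balance procedure. The logic is:
//Runs in its own thread; invoked from HMaster$2#run()
//Collects the region assignments of every table and runs the balancer on each of them
//(I did not follow all of the details)
HMaster#balance() {
Map<String, Map<ServerName, List<HRegionInfo>>> assignmentsByTable =
this.assignmentManager.getAssignmentsByTable();
List<RegionPlan> plans = new ArrayList<RegionPlan>();
for (Map<ServerName, List<HRegionInfo>> assignments : assignmentsByTable.values()) {
List<RegionPlan> partialPlans = this.balancer.balanceCluster(assignments);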
if (partialPlans != null) {
plans.addAll(partialPlans);
}
}
for (RegionPlan plan: plans) {
AssignmentManager.balance(plan);
}
}
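//Carries out one plan: records the plan for the region in the regionPlans map,
//then unassigns the region. Closing the region on its source server is what starts the move:
//once it is closed it gets assigned again, and the recorded plan supplies the destination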
AssignmentManager#balance() {
synchronized (this.regionPlans) {
this.regionPlans.put(plan.getRegionName(), plan);
}
unassign(plan.getRegionInfo());
}
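Why balance() ends with unassign() becomes clearer in a toy model of the move: the plan is stored first, the region is closed, and the close event triggers a new assignment that picks up the stored destination. The sketch below is a synchronous, in-memory simplification; MoveByPlanSketch and its methods are hypothetical, and in HBase the close notification arrives asynchronously through ZK.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: moving a region is "remember the plan, close, then reopen per plan".
final class MoveByPlanSketch {
    private final Map<String, String> regionToServer = new HashMap<>();
    private final Map<String, String> regionPlans = new HashMap<>();

    void open(String region, String server) {
        regionToServer.put(region, server);
    }

    // Store the plan first, then close the region on its current server.
    void balance(String region, String destination) {
        regionPlans.put(region, destination);
        unassign(region);
    }

    private void unassign(String region) {
        regionToServer.remove(region);
        onRegionClosed(region); // simplified: delivered as an asynchronous event in reality
    }

    // The close event triggers a new assignment; the stored plan decides where it goes.
    private void onRegionClosed(String region) {
        String destination = regionPlans.remove(region);
        if (destination != null) {
            regionToServer.put(region, destination);
        }
    }

    String serverOf(String region) {
        return regionToServer.get(region);
    }

    public static void main(String[] args) {
        MoveByPlanSketch am = new MoveByPlanSketch();
        am.open("region-1", "rs-a");
        am.balance("region-1", "rs-b");
        System.out.println("region-1 is now on " + am.serverOf("region-1"));
    }
}
```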
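archivedHFileCleaner thread (HFileCleaner#chore)
This class deletes the archived files under the archive directory. The logic is:
//This actually calls the parent class method CleanerChore#chore(),
//which cleans up the archived files under the .archive directory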
HFileCleaner#chore() {
FileStatus[] files = FSUtils.listStatus(this.fs, this.oldFileDir, null);
for (FileStatus file : files) {
if (file.isDir()) {
checkAndDeleteDirectory(file.getPath());
} else {
checkAndDelete(file.getPath());
}
}
}
//Checks and deletes a directory
CleanerChore#checkAndDeleteDirectory() {
FileStatus[] children = FSUtils.listStatus(fs, toCheck, null);
HBaseFileSystem.deleteFileFromFileSystem(fs, toCheck);
}
//Checks and deletes a file
CleanerChore#checkAndDelete() {
HBaseFileSystem.deleteDirFromFileSystem(fs, filePath);
}
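The cleaner pattern (walk the directory, delete what is allowed, remove directories that became empty) can be tried out on a local file system with java.nio.file. This is a sketch under assumptions: CleanerSketch, its deletable predicate, and the .old suffix are hypothetical, and it does not use the Hadoop FileSystem API or the cleaner delegates that HBase consults.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch of a cleaner chore on the local file system.
final class CleanerSketch {
    private static List<Path> list(Path dir) throws IOException {
        List<Path> children = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path child : stream) {
                children.add(child);
            }
        }
        return children;
    }

    // Deletes a file if the predicate allows it; deletes a directory only if all children were deleted.
    static boolean checkAndDelete(Path path, Predicate<Path> deletable) {
        try {
            if (Files.isDirectory(path)) {
                boolean allDeleted = true;
                for (Path child : list(path)) {
                    allDeleted &= checkAndDelete(child, deletable);
                }
                if (!allDeleted) {
                    return false;
                }
            } else if (!deletable.test(path)) {
                return false;
            }
            Files.delete(path);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One pass of the chore: examine every entry below the root, but keep the root itself.
    static void chore(Path root, Predicate<Path> deletable) {
        try {
            for (Path child : list(root)) {
                checkAndDelete(child, deletable);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small temporary tree, runs one pass, and reports whether the outcome is as expected.
    static boolean demo() {
        try {
            Path root = Files.createTempDirectory("archive-sketch");
            Path expired = Files.createDirectory(root.resolve("expired"));
            Files.createFile(expired.resolve("a.old"));
            Path mixed = Files.createDirectory(root.resolve("mixed"));
            Path old = Files.createFile(mixed.resolve("b.old"));
            Path keep = Files.createFile(mixed.resolve("c.keep"));
            chore(root, p -> p.getFileName().toString().endsWith(".old"));
            boolean ok = !Files.exists(expired) && !Files.exists(old)
                    && Files.exists(keep) && Files.exists(root);
            Files.delete(keep);
            Files.delete(mixed);
            Files.delete(root);
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("cleanup behaved as expected: " + demo());
    }
}
```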
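oldLogCleaner thread (LogCleaner)
This class cleans up the files under the .oldlogs directory.
Its logic is the same as that of the archivedHFileCleaner thread:
both run through the parent class method CleanerChore#chore().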
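timerUpdater thread (AssignmentManager$TimerUpdater#chore)
This class refreshes the timestamps of regions that are in transition.
The main logic is:
//Runs in its own thread; invoked from Chore#run()
AssignmentManager$TimerUpdater#chore() {
ServerName serverToUpdateTimer = null;
while (!serversInUpdatingTimer.isEmpty() && !stopper.isStopped()) {
if (serverToUpdateTimer == null) {
serverToUpdateTimer = serversInUpdatingTimer.first();
} else {
serverToUpdateTimer = serversInUpdatingTimer.higher(serverToUpdateTimer);
}
//No further server in the sorted set: done
if (serverToUpdateTimer == null) {
break;
}
updateTimers(serverToUpdateTimer);
serversInUpdatingTimer.remove(serverToUpdateTimer);
}
}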
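//Refreshes the timestamps of the regions in transition;
//the chore walks over the servers, and this method updates the regions planned for each one
AssignmentManager#updateTimers() {
for (Map.Entry<String, RegionPlan> e: copy.entrySet()) {
rs = this.regionsInTransition.get(e.getKey());
//Skip regions that are no longer in transition
if (rs == null) continue;
rs.updateTimestampToNow();
}
}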
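splitLogManagerTimeoutMonitor thread (SplitLogManager$TimeoutMonitor#chore)
This class periodically checks for split-log tasks that have timed out (workers pick the tasks up from the splitlog node in ZK and then split the logs). A timed-out task
is resubmitted; a task is also resubmitted when its worker has gone offline or its server has crashed. Finally, deletions that failed earlier are retried.
The logic is:
//Runs in its own thread; invoked from Chore#run()
//Periodically checks for split-log tasks that have timed out, or whose
//worker has gone offline, and resubmits them;
//finally retries the deletions that failed earlier
SplitLogManager$TimeoutMonitor#chore() {
for (Map.Entry<String, Task> e : tasks.entrySet()) {
String path = e.getKey();
Task task = e.getValue();
String cur_worker = task.cur_worker_name;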
if (localDeadWorkers != null && localDeadWorkers.contains(cur_worker)) {
if (resubmit(path, task, FORCE)) {
resubmitted++;
} else {
//Add the dead worker region server to the list
handleDeadWorker(cur_worker);
}
} else if (resubmit(path, task, CHECK)) {
resubmitted++;
}
}
for (Map.Entry e : tasks.entrySet()) {
String path = e.getKey();
Task task = e.getValue();
if (task.isUnassigned() && (task.status != FAILURE)) {
// We just touch the znode to make sure its still there
tryGetDataSetWatch(path);
}
}
createRescanNode(Long.MAX_VALUE);
// Retry previously failed deletes
if (failedDeletions.size() > 0) {
for (String tmpPath : tmpPaths) {
// deleteNode is an async call
deleteNode(tmpPath, zkretries);
}
}
}
//Deletes the node asynchronously
SplitLogManager#deleteNode() {
ZooKeeper.delete(path, -1, new DeleteAsyncCallback(),retries);
}