HBase HMaster Architecture

 

Overall structure of HMaster


An HMaster consists of the following parts:

1. External interfaces

   RPC service

   Jetty web service

   Master MBean

  The RPC service comprises a number of listener, reader, and handler threads (regular IPC handlers plus IPC handlers dedicated to replication).

2. Executor services

These are thread pools; when a task arrives, it is handed to one of them for processing.

The executor types are:

MASTER_SERVER_OPERATIONS

MASTER_META_SERVER_OPERATIONS

MASTER_CLOSE_REGION

MASTER_OPEN_REGION

MASTER_TABLE_OPERATIONS

The related handlers are:

OpenRegionHandler

ClosedRegionHandler

ServerShutdownHandler

MetaServerShutdownHandler

DeleteTableHandler

DisableTableHandler

EnableTableHandler

ModifyTableHandler 

CreateTableHandler 

 

Executor Service                Event(s)                  Event Handler(s)            Threads (default)
Master Open Region              RS_ZK_REGION_OPENED       OpenRegionHandler           5
Master Close Region             RS_ZK_REGION_CLOSED       ClosedRegionHandler         5
Master Server Operations        RS_ZK_REGION_SPLIT        SplitRegionHandler          3
                                M_SERVER_SHUTDOWN         ServerShutdownHandler
Master Meta Server Operations   M_META_SERVER_SHUTDOWN    MetaServerShutdownHandler   5
Master Table Operations         C_M_DELETE_TABLE          DeleteTableHandler          1
                                C_M_DISABLE_TABLE         DisableTableHandler
                                C_M_ENABLE_TABLE          EnableTableHandler
                                C_M_MODIFY_TABLE          ModifyTableHandler
                                C_M_CREATE_TABLE          CreateTableHandler
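The table above boils down to a dispatch map from event type to a dedicated, fixed-size thread pool. A minimal self-contained sketch of that pattern in plain Java (this is not HBase's actual ExecutorService class; EventExecutor and its members are illustrative names):

```java
import java.util.*;
import java.util.concurrent.*;

/** Sketch of event-type -> thread-pool dispatch, as in HBase's master executor services. */
class EventExecutor {
    enum EventType { RS_ZK_REGION_OPENED, RS_ZK_REGION_CLOSED, M_SERVER_SHUTDOWN }

    private final Map<EventType, ExecutorService> pools = new EnumMap<>(EventType.class);

    EventExecutor() {
        // one fixed-size pool per event family (sizes follow the table above)
        pools.put(EventType.RS_ZK_REGION_OPENED, Executors.newFixedThreadPool(5));
        pools.put(EventType.RS_ZK_REGION_CLOSED, Executors.newFixedThreadPool(5));
        pools.put(EventType.M_SERVER_SHUTDOWN, Executors.newFixedThreadPool(3));
    }

    /** Submit a handler (an EventHandler in HBase) to the pool for its event type. */
    Future<?> submit(EventType type, Runnable handler) {
        return pools.get(type).submit(handler);
    }

    void shutdown() {
        for (ExecutorService p : pools.values()) p.shutdown();
    }
}
```

The point of the per-type pools is isolation: a burst of table operations (pool size 1) cannot starve region open/close processing.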

 

3. ZooKeeper-related threads

1. ActiveMasterManager
Creates the ephemeral znode /hbase/master in ZK; the master records its information under this node.
A backup master blocks here until the node becomes empty.

2. RegionServerTracker
Tracks region servers: it watches ZK's /hbase/rs node to obtain each region server's status.
ZK fires a notification event whenever a region server comes online or goes offline.

3. DrainingServerTracker
Watches the /draining znode to track region servers that are being drained (decommissioned); no new regions are assigned to servers listed there.

4. CatalogTracker
Tracks the META and ROOT tables.

5. ClusterStatusTracker
Watches ZK's /shutdown node to track whether the cluster is up or shutting down.

6. AssignmentManager
Manages and assigns regions.

7. RootRegionTracker
Manages and watches the /root-region-server znode.

8. LoadBalancer
Balances regions across the region servers.

9. MetaNodeTracker
Watches the /unassigned znode, in particular the node for the META region.

In addition, the org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher class manages a number of ZK nodes:
baseZNode			/hbase
assignmentZNode			/unassigned
rsZNode				/rs
drainingZNode			/draining
masterTableZNode		/table
masterTableZNode92		/table92	(used by HBase 0.92)
splitLogZNode			/splitlog
backupMasterAddressesZNode	/backup-masters
clusterStateZNode		/shutdown
masterAddressZNode		/master
clusterIdZNode         		/hbaseid

 

Class diagram of the ZK listener classes


4. File-system interface and miscellaneous

MasterFileSystem

Creates the META and ROOT tables, the .oldlog directory, the hbase.version file, and so on.

 

LogCleaner

Periodically cleans up the contents of the .oldlog directory.

 

HFileCleaner

Periodically cleans up the contents of the archive directory.

 

Besides the background threads above (LogCleaner, HFileCleaner, etc.), there are also:

ServerManager — maintains the lists of online and dead region servers

Balancer — a background thread that rebalances regions across the region servers
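Most of these background threads (LogCleaner, HFileCleaner, BalancerChore, CatalogJanitor, the timeout monitors below) follow the same Chore pattern: a daemon thread loops, sleeps for a fixed period, and calls chore(). A simplified sketch of that pattern (MiniChore is an illustrative stand-in, not HBase's actual Chore class):

```java
/** Simplified version of HBase's Chore: run chore() every `period` ms until stopped. */
abstract class MiniChore extends Thread {
    private final long period;            // sleep interval in milliseconds
    private volatile boolean stopped = false;

    protected MiniChore(String name, long period) {
        super(name);
        setDaemon(true);                  // HBase chores are daemon threads
        this.period = period;
    }

    /** One unit of periodic work (e.g. clean old logs, scan META). */
    protected abstract void chore();

    @Override public void run() {
        while (!stopped) {
            chore();
            try { Thread.sleep(period); } catch (InterruptedException e) { break; }
        }
    }

    void stopChore() { stopped = true; interrupt(); }
}
```

Each concrete chore (cleaner, janitor, monitor) only has to override chore(); the looping, sleeping, and stop handling live in the base class.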


HMaster configuration parameters

Parameter                               Default      Meaning
hbase.master.handler.count              25           number of RPC handler (worker) threads
hbase.master.buffer.for.rs.fatals       1 MB
mapred.task.id
hbase.master.wait.for.log.splitting     false
zookeeper.session.timeout               180 s
hbase.master.backup
hbase.master.impl
hbase.master.event.waiting.time         1000


HMaster startup entry point

org.apache.hadoop.hbase.master.HMaster

You can set the parameter hbase.master.impl in hbase-site.xml to specify your own implementation, which must extend HMaster.

Startup then goes through HMasterCommandLine (which extends ServerCommandLine).

HMasterCommandLine uses Hadoop's ToolRunner to run:

ToolRunner#run(Configuration,Tool,String[])

ToolRunner invokes GenericOptionsParser to parse the standard options such as -conf, -D, -fs, and -files.

After parsing, it populates the Configuration object and then passes the remaining arguments to the Tool implementation.

So ToolRunner is simply a utility class that parses the startup arguments, configures the Configuration object, and hands both over to the Tool implementation.

 

The call sequence is:

1.HMaster#main()

2.HMasterCommandLine#doMain()

3.ToolRunner#run()

4.HMasterCommandLine#run()

5.HMasterCommandLine#startMaster()

6.HMaster#constructMaster()

7. The HMaster constructor is invoked via reflection
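The ToolRunner mechanism above can be sketched in a few lines. This is not Hadoop's actual implementation — MiniToolRunner, its Tool interface, and the -D parsing below are simplified stand-ins — but it shows the division of labor: parse generic options into a configuration, then hand the leftover arguments to the Tool:

```java
import java.util.*;

/** Simplified stand-in for Hadoop's ToolRunner/GenericOptionsParser pattern. */
class MiniToolRunner {
    /** Analogue of org.apache.hadoop.util.Tool. */
    interface Tool {
        int run(String[] args);
    }

    /** Parse "-D key=value" pairs into conf, pass the remaining args to the tool. */
    static int run(Map<String, String> conf, Tool tool, String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);   // key=value
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                remaining.add(args[i]);
            }
        }
        return tool.run(remaining.toArray(new String[0]));
    }
}
```

HMasterCommandLine plays the role of the Tool here: by the time its run() is called, the Configuration has already been populated with the generic options.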



Initialization — the HMaster constructor

1. Set up host- and DNS-related configuration

2. Configure and create the RPC connection

3. Initialize ZK authentication

4. Create the ZooKeeperWatcher (the ZK-related threads), the RPC server, and metrics

5. Create the HealthCheckChore

6. Configure split-log settings


Startup — HMaster#run (runs in a new thread)

//Make the current master active (a backup master waits here indefinitely),
//then finish initialization
HMaster#run() {
	becomeActiveMaster(startupStatus);
	finishInitialization(startupStatus, false);	
}


//Wait as long as the current master is not the active one
HMaster#becomeActiveMaster() {
    this.activeMasterManager = new ActiveMasterManager(zooKeeper, this.serverName,this);
    this.zooKeeper.registerListener(activeMasterManager);
    while (!this.activeMasterManager.isActiveMaster()) {
    	Thread.sleep(conf.getInt("zookeeper.session.timeout", 180 * 1000));	
    }
    this.clusterStatusTracker = new ClusterStatusTracker(getZooKeeper(), this);
    this.clusterStatusTracker.start();
    return this.activeMasterManager.blockUntilBecomingActiveMaster(startupStatus,this.clusterStatusTracker);
}

//Initialize the master components: file system, ServerManager,
//AssignmentManager, RegionServerTracker, CatalogTracker, etc.
//Set the cluster state in ZooKeeper
//Wait for the region server check to finish
//If there are files under the .log directory, run the split-log task
//Assign the ROOT and META regions
//Handle the live region servers and the dead ones
HMaster#finishInitialization() {
	//Check whether the ROOT and META tables exist; create them if not. Also creates the tmp and oldlog directories
	fileSystemManager = new MasterFileSystem();
	tableDescriptors = new FSTableDescriptors(fileSystemManager.getFileSystem(),fileSystemManager.getRootDir());
	
	//Create CatalogTracker, LoadBalancer, AssignmentManager,
	//RegionServerTracker, DrainingServerTracker,
	//ClusterStatusTracker, SnapshotManager
	initializeZKBasedSystemTrackers();
	
	//Start the service threads: open-region, close-region, server-operations, etc.,
	//then start the Jetty service and the RPC service
	startServiceThreads();
	
	//Register every region server with the ServerManager, which manages
	//all online and dead servers and handles their startup and shutdown
	for (ServerName sn: regionServerTracker.getOnlineServers()) {
		ServerManager.recordNewServer(sn, HServerLoad.EMPTY_HSERVERLOAD);
	}
	
	//If there are logs to split, pre-process them, publish them on ZK, and let all region servers do the splitting
	if (waitingOnLogSplitting) {
		fileSystemManager.splitAllLogs(servers);	
	}
	
	//Assign the ROOT and META tables first if they are unassigned
	assignRoot();
	assignMeta();
	enableServerShutdownHandler();
	
	//Handle all dead servers
	for (ServerName curServer : failedServers) {
		serverManager.expireServer(curServer);
    }	
    DefaultLoadBalancer.setMasterServices();
	startCatalogJanitorChore();
	registerMBean();
}



HMaster#assignRoot() {
	//First check whether the region is in transition;
	//if so, handle that state and wait for it to finish before continuing
	processRegionInTransitionAndBlockUntilAssigned();
	verifyRootRegionLocation();
	getRootLocation();
	expireIfOnline();
	//First delete "/hbase/root-region-server", whether or not it exists
	//(KeeperException.NoNodeException is ignored)
	//Write EventType.M_ZK_REGION_OFFLINE, the current timestamp, the root region name (-ROOT-,,0),
	//and the master's versioned ServerName
	//to /hbase/unassigned/70236052; the payload is null, so it is not written
}

The sequence diagram of HMaster#run is as follows:



Member fields of HMaster

InfoServer

ZooKeeperWatcher

ActiveMasterManager

RegionServerTracker

DrainingServerTracker

RPCServer

MasterMetrics

MasterFileSystem

ServerManager

AssignmentManager

CatalogTracker

ClusterStatusTracker

CatalogJanitor

LogCleaner

HFileCleaner

TableDescriptors

SnapshotManager

HealthCheckChore


HMaster threads

RPC-related listener, reader, and handler threads:

Daemon Thread [IPC Server listener on 60000] (Suspended)

Daemon Thread [IPC Reader 3 on port 60000] (Suspended)

Daemon Thread [IPC Server handler 0 on 60000] (Suspended)

Daemon Thread [REPL IPC Server handler 2 on 60000] (Running)

Daemon Thread [IPC Server Responder] (Running)

 

ZK-related threads:

Daemon Thread [main-EventThread] (Suspended)

Daemon Thread [main-SendThread(myhost:2181)] (Suspended) 

 

Background threads:

Daemon Thread [myhost,60000,1427458363875-BalancerChore] (Running)

Daemon Thread [myhost,60000,1427458363875-CatalogJanitor] (Running)

Daemon Thread [master-myhost,60000,1427458363875.archivedHFileCleaner] (Running)

Daemon Thread [master-myhost,60000,1427458363875.oldLogCleaner] (Running)

Daemon Thread [myhost,60000,1427458363875.splitLogManagerTimeoutMonitor] (Running)

Daemon Thread [myhost,60000,1427458363875.timerUpdater] (Running)

 

Monitoring threads:

Daemon Thread [Timer thread for monitoring hbase] (Running)

Daemon Thread [Timer thread for monitoring jvm] (Running)

Daemon Thread [Timer thread for monitoring rpc] (Running)

Daemon Thread [myhost,60000,1427458363875.timeoutMonitor] (Running)

 

 

Jetty-related threads:

Thread [1008881877@qtp-314160763-0] (Running)


How the timeoutMonitor thread (which assigns regions) works (AssignmentManager$TimeoutMonitor)


The logic is as follows:

//Runs in its own thread
//Invoked from Chore#run()
AssignmentManager$TimeoutMonitor#chore() {
	for (RegionState regionState : regionsInTransition.values()) {
		if (regionState.getStamp() + timeout <= now) {
			//decide on action upon timeout
            actOnTimeOut(regionState);
		} else if (this.allRegionServersOffline && !allRSsOffline) {
			RegionPlan existingPlan = regionPlans.get(regionState.getRegion().getEncodedName());
			if (existingPlan == null || !this.serverManager.isServerOnline(existingPlan.getDestination())) {
				actOnTimeOut(regionState);
			}
		}
	}
}

//Check the region's current state; assign it if it has gone offline
AssignmentManager$TimeoutMonitor#actOnTimeOut() {
	HRegionInfo regionInfo = regionState.getRegion();
	switch (regionState.getState()) {
	case CLOSED:
		regionState.updateTimestampToNow();
		break;
	case OFFLINE:
		invokeAssign(regionInfo);
        break;			
	case PENDING_OPEN:
        invokeAssign(regionInfo);
        break;
	case OPENING:
        processOpeningState(regionInfo);
        break;        	                
	case OPEN:
		regionState.updateTimestampToNow();
		break;
	case PENDING_CLOSE:
		invokeUnassign(regionInfo);
		break;
	case CLOSING:
		invokeUnassign(regionInfo);
		break;		
	}
}
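The switch above is essentially a per-state decision table. A self-contained sketch of that table (RegionTimeoutPolicy and Action are illustrative names, not HBase code; note that in the real code OPENING goes through processOpeningState() rather than a direct re-assign):

```java
/** Decision table for a timed-out region in transition, mirroring actOnTimeOut(). */
class RegionTimeoutPolicy {
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, PENDING_CLOSE, CLOSING, CLOSED }
    enum Action { ASSIGN, UNASSIGN, REFRESH_TIMESTAMP }

    static Action onTimeout(State s) {
        switch (s) {
            case OFFLINE:
            case PENDING_OPEN:
            case OPENING:                 // processOpeningState() eventually re-assigns
                return Action.ASSIGN;
            case PENDING_CLOSE:
            case CLOSING:
                return Action.UNASSIGN;
            default:                      // OPEN and CLOSED just get a fresh timestamp
                return Action.REFRESH_TIMESTAMP;
        }
    }
}
```

Reading it this way makes the invariant clear: states on the "opening" side of the lifecycle are pushed forward by assigning, states on the "closing" side by unassigning, and terminal states merely have their timeout clock reset.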

//Invoked via AssignCallable#call()
//Assigns a region: first update the znode in ZK,
//then call sendRegionOpen(), which triggers HRegionServer#openRegion();
//that creates an OpenRegionHandler and submits it to a thread pool,
//which finally calls HRegion#openRegion()
AssignmentManager#assign() {
	for (int i = 0; i < this.maximumAssignmentAttempts; i++) {
		String tableName = region.getTableNameAsString();
		if (!zkTable.isEnablingTable(tableName) && !zkTable.isEnabledTable(tableName)) {
			setEnabledTable(region);				
		}	
		RegionOpeningState regionOpenState = ServerManager.sendRegionOpen();	
		if (regionOpenState == RegionOpeningState.OPENED) {
			return;	
		} else if (regionOpenState == RegionOpeningState.ALREADY_OPENED) {
			ZKAssign.deleteOfflineNode(master.getZooKeeper(), encodedRegionName);	
		}
	}
}

//Unassign a region by closing it
AssignmentManager#unassign() {
	state = regionsInTransition.get(encodedName);
	if (state == null) {
		ZKAssign.createNodeClosing(master.getZooKeeper(), region, master.getServerName());	
	} else if (force && (state.isPendingClose() || state.isClosing())) {
		state.update(state.getState());	
	} else {
		return;	
	}
	ServerName server = regions.get(region);
	if (server == null) {
		deleteClosingOrClosedNode(region);	
	}
	ServerManager.sendRegionClose();
}


The CatalogJanitor thread (CatalogJanitor)

This thread cleans up leftovers from region splits: after a split, the parent region's META entry can be deleted,

and likewise the info:splitA and info:splitB columns in the META table are no longer needed.

The main logic is as follows:

//Runs in its own thread
//Invoked from Chore#run()
CatalogJanitor#scan() {
	Pair<Integer, Map<HRegionInfo, Result>> pair = getSplitParents();
    Map<HRegionInfo, Result> splitParents = pair.getSecond();
    int cleaned = 0;
    for (Map.Entry<HRegionInfo, Result> e : splitParents.entrySet()) {
    	if (!parentNotCleaned.contains(e.getKey().getEncodedName())) {
    		cleanParent(e.getKey(), e.getValue());
    		cleaned++;    			
    	} else {
    		//remember the daughters (the info:splitA and info:splitB columns)
    		parentNotCleaned.add(getDaughterRegionInfo("splitA"));
    		parentNotCleaned.add(getDaughterRegionInfo("splitB"));	
    	}
    }
}

//If the two daughter regions splitA and splitB no longer reference
//the parent region, delete the parent:
//archive the parent's files, then create a Delete object and remove the parent from the META table
CatalogJanitor#cleanParent() {
	HRegionInfo a_region = getDaughterRegionInfo(rowContent, "splitA");
    HRegionInfo b_region = getDaughterRegionInfo(rowContent, "splitB");
    Pair<Boolean, Boolean> a = checkDaughterInFs(parent, a_region, "splitA");
    Pair<Boolean, Boolean> b = checkDaughterInFs(parent, b_region, "splitB");
    removeDaughtersFromParent(parent);
    FileSystem fs = this.services.getMasterFileSystem().getFileSystem();
	HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, parent);
	Delete delete = new Delete(regionInfo.getRegionName());
    deleteFromMetaTable(catalogTracker, delete);
}

//Check whether the daughter regions splitA and splitB still reference the parent region
CatalogJanitor#checkDaughterInFs() {
	FileSystem fs = this.services.getMasterFileSystem().getFileSystem();
    Path rootdir = this.services.getMasterFileSystem().getRootDir();
    Path tabledir = new Path(rootdir, split.getTableNameAsString());
    Path regiondir = new Path(tabledir, split.getEncodedName());
    exists = fs.exists(regiondir);	
    HTableDescriptor parentDescriptor = getTableDescriptor(parent.getTableName());
	for (HColumnDescriptor family: parentDescriptor.getFamilies()) {
		Path p = Store.getStoreHomedir(tabledir, split.getEncodedName(),family.getName());
      	if (!fs.exists(p)) {
			continue;	
		}
		// Look for reference files.  Call listStatus with anonymous instance of PathFilter.
      	FileStatus [] ps = FSUtils.listStatus(fs, p,
		new PathFilter () {
			public boolean accept(Path path) {
				return StoreFile.isReference(path);
            }
		});		
	}
}

//Create a Delete object to remove from the META table the splitA and splitB
//columns that were created during the split and are now useless
CatalogJanitor#removeDaughtersFromParent() {
	Delete delete = new Delete(parent.getRegionName());
    delete.deleteColumns("info","splitA");
    delete.deleteColumns("info","splitB");
    deleteFromMetaTable(catalogTracker, delete);
}
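The heart of checkDaughterInFs() is a simple decision: the parent region is only cleanable once neither daughter's store directories contain reference files pointing back to it. A minimal sketch of that decision on plain lists of file names (SplitParentCleaner and the simplified name check are illustrative; the real StoreFile.isReference() inspects a name of the form "hfile.parentEncodedName"):

```java
import java.util.List;

/** Sketch: a split parent can be cleaned only when no daughter store file
 *  is still a reference back to the parent. */
class SplitParentCleaner {
    /** Simplified reference test: HBase reference files look like "hfile.parentEncodedName". */
    static boolean isReference(String fileName) {
        int dot = fileName.indexOf('.');
        return dot > 0 && dot < fileName.length() - 1;
    }

    static boolean daughterStillReferencesParent(List<String> daughterStoreFiles) {
        for (String f : daughterStoreFiles) {
            if (isReference(f)) return true;    // a major compaction has not yet rewritten this file
        }
        return false;
    }

    static boolean parentCleanable(List<String> daughterA, List<String> daughterB) {
        return !daughterStillReferencesParent(daughterA)
            && !daughterStillReferencesParent(daughterB);
    }
}
```

Reference files disappear when the daughters are major-compacted, which is why a parent region can linger in META for a while after the split completes.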


The BalancerChore thread (HMaster#balance)

This method runs the balance process; its logic is as follows:

//Runs in its own thread; invoked via HMaster$2#run()
//Collects all region assignments, runs balanceCluster() on each table's
//assignments, and executes every resulting RegionPlan
HMaster#balance() {
	Map<String, Map<ServerName, List<HRegionInfo>>> assignmentsByTable =
        this.assignmentManager.getAssignmentsByTable();	
 	List<RegionPlan> plans = new ArrayList<RegionPlan>();
	for (Map<ServerName, List<HRegionInfo>> assignments : assignmentsByTable.values()) {
		List<RegionPlan> partialPlans = this.balancer.balanceCluster(assignments);
        if (partialPlans != null) {
        	plans.addAll(partialPlans);
        }
	}
	for (RegionPlan plan: plans) {
		AssignmentManager.balance(plan);
	}        
}

//Record the plan in the regionPlans map, then unassign() the region;
//once it is closed, it will be re-assigned to the plan's destination server
AssignmentManager#balance() {
	synchronized (this.regionPlans) {
		this.regionPlans.put(plan.getRegionName(), plan);
    }
    unassign(plan.getRegionInfo());	
}
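To make balanceCluster() concrete: given a server-to-regions assignment, a balancer computes a target load per server and emits move plans from overloaded to underloaded servers. A naive self-contained sketch (this is not HBase's DefaultLoadBalancer; NaiveBalancer, Plan, and the simple ceiling heuristic are illustrative):

```java
import java.util.*;

/** Naive region balancer: move regions so no server holds more than ceil(total/servers). */
class NaiveBalancer {
    /** A move plan: region name, source server, destination server (like RegionPlan). */
    static final class Plan {
        final String region, from, to;
        Plan(String region, String from, String to) {
            this.region = region; this.from = from; this.to = to;
        }
    }

    static List<Plan> balance(Map<String, List<String>> assignments) {
        int total = 0;
        for (List<String> rs : assignments.values()) total += rs.size();
        int ceil = (total + assignments.size() - 1) / assignments.size();  // max load per server

        List<Plan> plans = new ArrayList<>();
        Deque<String> spare = new ArrayDeque<>();      // regions shed by overloaded servers
        Deque<String> spareFrom = new ArrayDeque<>();  // where each spare region came from
        for (Map.Entry<String, List<String>> e : assignments.entrySet()) {
            List<String> regions = e.getValue();
            while (regions.size() > ceil) {
                spare.add(regions.remove(regions.size() - 1));
                spareFrom.add(e.getKey());
            }
        }
        for (Map.Entry<String, List<String>> e : assignments.entrySet()) {
            while (!spare.isEmpty() && e.getValue().size() < ceil) {
                String region = spare.poll();
                plans.add(new Plan(region, spareFrom.poll(), e.getKey()));
                e.getValue().add(region);
            }
        }
        return plans;
    }
}
```

Each Plan then drives the unassign/assign cycle described above: the region is closed on the source server and reopened on the destination.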


The archivedHFileCleaner thread (HFileCleaner#chore)

This class deletes archived files under the archive directory; the logic is as follows:

//This actually runs the parent class's CleanerChore#chore(),
//which cleans the archived files under the archive directory
HFileCleaner#chore() {
	FileStatus[] files = FSUtils.listStatus(this.fs, this.oldFileDir, null);
	for (FileStatus file : files) {
		if (file.isDir()) {
			checkAndDeleteDirectory(file.getPath());
		} else {
			checkAndDelete(file.getPath());
		}
	}
}

//Check a directory: recurse into its children, then delete the directory itself
CleanerChore#checkAndDeleteDirectory() {
	FileStatus[] children = FSUtils.listStatus(fs, toCheck, null);
	HBaseFileSystem.deleteDirFromFileSystem(fs, toCheck);
}

//Check a file and delete it if no cleaner delegate still needs it
CleanerChore#checkAndDelete() {
	HBaseFileSystem.deleteFileFromFileSystem(fs, filePath);
}
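The chore/checkAndDelete pair amounts to a depth-first delete with a per-file "is this deletable?" check. A self-contained sketch using java.io.File (HBase walks HDFS through its FileSystem API instead; MiniCleaner is illustrative, and the predicate stands in for the cleaner delegates that decide whether a file is still needed):

```java
import java.io.File;
import java.util.function.Predicate;

/** Sketch of CleanerChore-style cleanup: recursively delete files that the
 *  predicate marks as deletable, removing directories that become empty. */
class MiniCleaner {
    static void clean(File dir, Predicate<File> deletable) {
        File[] children = dir.listFiles();
        if (children == null) return;               // not a directory, or I/O error
        for (File f : children) {
            if (f.isDirectory()) {
                clean(f, deletable);                // like checkAndDeleteDirectory()
                f.delete();                         // only succeeds if now empty
            } else if (deletable.test(f)) {
                f.delete();                         // like checkAndDelete()
            }
        }
    }
}
```

In HBase the predicate is a chain of delegates (e.g. a time-to-live check for old WALs, or a snapshot check for archived HFiles), which is why the same CleanerChore base class serves both LogCleaner and HFileCleaner.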


The oldLogCleaner thread (LogCleaner)

This class deletes files under the oldlog directory.

Its execution logic is the same as the archivedHFileCleaner thread's:

both run the parent class's CleanerChore#chore().


The timerUpdater thread (AssignmentManager$TimerUpdater#chore)

This class refreshes the timestamps of regions that are in transition.

The main logic is as follows:

//Runs in its own thread; invoked from Chore#run()
AssignmentManager$TimerUpdater#chore() {
	while (!serversInUpdatingTimer.isEmpty() && !stopper.isStopped()) {
		if (serverToUpdateTimer == null) {
			serverToUpdateTimer = serversInUpdatingTimer.first();
		} else {
			serverToUpdateTimer = serversInUpdatingTimer.higher(serverToUpdateTimer);
		}
		updateTimers(serverToUpdateTimer);
	}		        
}

//Refresh the timestamps of regions in transition:
//iterate over the servers and update the regions on each one
AssignmentManager#updateTimers() {
	for (Map.Entry e: copy.entrySet()) {
		rs = this.regionsInTransition.get(e.getKey());
		rs.updateTimestampToNow();
	}
}


The splitLogManagerTimeoutMonitor thread (SplitLogManager$TimeoutMonitor#chore)

This class periodically checks for log-splitting tasks that have timed out (tasks published under ZK's splitlog node, which

region servers grab and execute). Timed-out tasks are resubmitted, as are tasks whose region went offline or whose server died; finally, previously failed deletions are retried.

The logic is as follows:

//Runs in its own thread; invoked from Chore#run()
//Periodically checks whether any split-log task has timed out, or whether
//a region has gone offline, in which case the split-log task is resubmitted;
//finally, failed task-node deletions are retried
SplitLogManager$TimeoutMonitor#chore() {
	for (Map.Entry<String, Task> e : tasks.entrySet()) {
		if (localDeadWorkers != null && localDeadWorkers.contains(cur_worker)) {
			if (resubmit(path, task, FORCE)) {
				resubmitted++;
			} else {
				//record the dead worker region server in the dead-workers list
				handleDeadWorker(cur_worker);
	        }	
		} else if (resubmit(path, task, CHECK)) {
          resubmitted++;
        }	
	}
	for (Map.Entry<String, Task> e : tasks.entrySet()) {
		String path = e.getKey();
		Task task = e.getValue();
		if (task.isUnassigned() && (task.status != FAILURE)) {
			// We just touch the znode to make sure its still there
            tryGetDataSetWatch(path);
		}			
	}	
	createRescanNode(Long.MAX_VALUE);
	
	// Retry previously failed deletes
	if (failedDeletions.size() > 0) {
		for (String tmpPath : tmpPaths) {
			// deleteNode is an async call
			deleteNode(tmpPath, zkretries);
		}      	
    }
}

//Delete the node asynchronously
SplitLogManager#deleteNode() {
	ZooKeeper.delete(path, -1, new DeleteAsyncCallback(),retries);	
}

 

 

 

 

 

References

HMaster architecture

Master and RegionServer startup process
