写之前先吐槽一下自己的sb公司环境,电脑上不了网,优盘又不能插。所以做点笔记基本上都是晚上回家再写一遍。哎,废话不说了
先贴个hbase在构造函数中起来的RPC服务的UML图 :http://blackproof.iteye.com/blog/2029170
HMaster启动会调用Run方法,概述为
HBase启动
hmaster处理备份master
进入HMaster的启动过程:方法:
1.Zk监控类
2.创建线程池,并启动60010jetty服务
3.Regionserver启动
4.HLog处理流程
5.HMaster注册root和meta表
6.准备步骤,收集offline server including region和zk assigned region
7.分配region
8.检查子region
9.启动balance线程
详细分析:
调用方法becomeActiveMaster,先检查配置项"hbase.master.backup",自己是否backup机器,如果是则直接block直至检查到系统中的active master挂掉(默认每3分钟检查一次)
private boolean becomeActiveMaster(MonitoredTask startupStatus)
throws InterruptedException {
// TODO: This is wrong!!!! Should have new servername if we restart ourselves,
// if we come back to life.
this.activeMasterManager = new ActiveMasterManager(zooKeeper, this.serverName,
this);
this.zooKeeper.registerListener(activeMasterManager);
stallIfBackupMaster(this.conf, this.activeMasterManager);
// The ClusterStatusTracker is setup before the other
// ZKBasedSystemTrackers because it's needed by the activeMasterManager
// to check if the cluster should be shutdown.
this.clusterStatusTracker = new ClusterStatusTracker(getZooKeeper(), this);
this.clusterStatusTracker.start();
return this.activeMasterManager.blockUntilBecomingActiveMaster(startupStatus);
}
1) Zk监控类
首先初始化所有zk监控类,调用方法initializeZKBasedSystemTrackers
初始化zk listener
1.createCatalogTracker(rootRegionTracker,metaRegionTracker)是root和meta的监控类
2. AssignmentManager 管理分配region
3. RegionServerManager 监控regionserver,服务已死的server
4.DrainingServerTracker 服务维护draining RS list
void initializeZKBasedSystemTrackers() throws IOException,
InterruptedException, KeeperException {
this.catalogTracker = createCatalogTracker(this.zooKeeper, this.conf, this);
this.catalogTracker.start();
this.balancer = LoadBalancerFactory.getLoadBalancer(conf);
this.loadBalancerTracker = new LoadBalancerTracker(zooKeeper, this);
this.loadBalancerTracker.start();
this.assignmentManager = new AssignmentManager(this, serverManager,
this.catalogTracker, this.balancer, this.executorService, this.metricsMaster,
this.tableLockManager);
zooKeeper.registerListenerFirst(assignmentManager);
this.regionServerTracker = new RegionServerTracker(zooKeeper, this,
this.serverManager);
this.regionServerTracker.start();
this.drainingServerTracker = new DrainingServerTracker(zooKeeper, this,
this.serverManager);
this.drainingServerTracker.start();
// Set the cluster as up. If new RSs, they'll be waiting on this before
// going ahead with their startup.
boolean wasUp = this.clusterStatusTracker.isClusterUp();
if (!wasUp) this.clusterStatusTracker.setClusterUp();
LOG.info("Server active/primary master=" + this.serverName +
", sessionid=0x" +
Long.toHexString(this.zooKeeper.getRecoverableZooKeeper().getSessionId()) +
", setting cluster-up flag (Was=" + wasUp + ")");
// create the snapshot manager
this.snapshotManager = new SnapshotManager(this, this.metricsMaster);
}
2)创建线程池,并启动60010jetty服务
1.启动线程池,和以下线程
MASTER_META_SERVER_OPERATIONS
MASTER_SERVER_OPERATIONS
MASTER_CLOSE_REGION
MASTER_OPEN_REGION
MASTER_TABLE_OPERATIONS
2.LogCleaner线程 清除.oldlog目录
3.60010 jetty 服务
void startServiceThreads() throws IOException{
// Start the executor service pools
this.executorService.startExecutorService(ExecutorType.MASTER_OPEN_REGION,
conf.getInt("hbase.master.executor.openregion.threads", 5));
this.executorService.startExecutorService(ExecutorType.MASTER_CLOSE_REGION,
conf.getInt("hbase.master.executor.closeregion.threads", 5));
this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS,
conf.getInt("hbase.master.executor.serverops.threads", 5));
this.executorService.startExecutorService(ExecutorType.MASTER_META_SERVER_OPERATIONS,
conf.getInt("hbase.master.executor.serverops.threads", 5));
this.executorService.startExecutorService(ExecutorType.M_LOG_REPLAY_OPS,
conf.getInt("hbase.master.executor.logreplayops.threads", 10));
// We depend on there being only one instance of this executor running
// at a time. To do concurrency, would need fencing of enable/disable of
// tables.
this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS, 1);
// Start log cleaner thread
String n = Thread.currentThread().getName();
int cleanerInterval = conf.getInt("hbase.master.cleaner.interval", 60 * 1000);
this.logCleaner =
new LogCleaner(cleanerInterval,
this, conf, getMasterFileSystem().getFileSystem(),
getMasterFileSystem().getOldLogDir());
Threads.setDaemonThreadRunning(logCleaner.getThread(), n + ".oldLogCleaner");
//start the hfile archive cleaner thread
Path archiveDir = HFileArchiveUtil.getArchivePath(conf);
this.hfileCleaner = new HFileCleaner(cleanerInterval, this, conf, getMasterFileSystem()
.getFileSystem(), archiveDir);
Threads.setDaemonThreadRunning(hfileCleaner.getThread(), n + ".archivedHFileCleaner");
// Start the health checker
if (this.healthCheckChore != null) {
Threads.setDaemonThreadRunning(this.healthCheckChore.getThread(), n + ".healthChecker");
}
// Start allowing requests to happen.
this.rpcServer.openServer();
this.rpcServerOpen = true;
if (LOG.isTraceEnabled()) {
LOG.trace("Started service threads");
}
}
3)Regionserver启动
serverManager调用方法:waitForRegionServers
等待regionserver启动,在满足一下条件后继续启动hmaster:
a 至少等待4.5s("hbase.master.wait.on.regionservers.timeout")
b 成功启动regionserver节点数>=1("hbase.master.wait.on.regionservers.mintostart")
c 1.5s内没有regionsever死掉或新启动("hbase.master.wait.on.regionservers.interval")
public void waitForRegionServers(MonitoredTask status)
throws InterruptedException {
final long interval = this.master.getConfiguration().
getLong(WAIT_ON_REGIONSERVERS_INTERVAL, 1500);
final long timeout = this.master.getConfiguration().
getLong(WAIT_ON_REGIONSERVERS_TIMEOUT, 4500);
int minToStart = this.master.getConfiguration().
getInt(WAIT_ON_REGIONSERVERS_MINTOSTART, 1);
if (minToStart < 1) {
LOG.warn(String.format(
"The value of '%s' (%d) can not be less than 1, ignoring.",
WAIT_ON_REGIONSERVERS_MINTOSTART, minToStart));
minToStart = 1;
}
int maxToStart = this.master.getConfiguration().
getInt(WAIT_ON_REGIONSERVERS_MAXTOSTART, Integer.MAX_VALUE);
if (maxToStart < minToStart) {
LOG.warn(String.format(
"The value of '%s' (%d) is set less than '%s' (%d), ignoring.",
WAIT_ON_REGIONSERVERS_MAXTOSTART, maxToStart,
WAIT_ON_REGIONSERVERS_MINTOSTART, minToStart));
maxToStart = Integer.MAX_VALUE;
}
long now = System.currentTimeMillis();
final long startTime = now;
long slept = 0;
long lastLogTime = 0;
long lastCountChange = startTime;
int count = countOfRegionServers();
int oldCount = 0;
while (
!this.master.isStopped() &&
count < maxToStart &&
(lastCountChange+interval > now || timeout > slept || count < minToStart)
){
// Log some info at every interval time or if there is a change
if (oldCount != count || lastLogTime+interval < now){
lastLogTime = now;
String msg =
"Waiting for region servers count to settle; currently"+
" checked in " + count + ", slept for " + slept + " ms," +
" expecting minimum of " + minToStart + ", maximum of "+ maxToStart+
", timeout of "+timeout+" ms, interval of "+interval+" ms.";
LOG.info(msg);
status.setStatus(msg);
}
// We sleep for some time
final long sleepTime = 50;
Thread.sleep(sleepTime);
now = System.currentTimeMillis();
slept = now - startTime;
oldCount = count;
count = countOfRegionServers();
if (count != oldCount) {
lastCountChange = now;
}
}
LOG.info("Finished waiting for region servers count to settle;" +
" checked in " + count + ", slept for " + slept + " ms," +
" expecting minimum of " + minToStart + ", maximum of "+ maxToStart+","+
" master is "+ (this.master.isStopped() ? "stopped.": "running.")
);
}
4)
HLog处理流程(贴出代码是处理流程部分代码)
处理目的是为了找到没有server归属的HLog,并将其付给其他的region server
masterfilesystem 调用 splitLogAfterStartup方法
1.split加锁
2.获得所有HLog与online region server做碰撞,没有碰上的Hlog进入splitLog方法
3.创建HLogSplitter,等待saftmode,调用HlogSplitter.splitLog,
4.HLogSplitter的reader读取hlog到内存中EntryBuffers,读取完毕移动log,有问题的到.corrupt目录中,处理完的放在.oldlogs中
4.HLogSplitter的 OutputSink 创建多个writer线程,读取entryBuffers写到region下的recovered.edits下的文件夹
5.解锁
this.splitLogLock.lock();
try {
HLogSplitter splitter = HLogSplitter.createLogSplitter(
conf, rootdir, logDir, oldLogDir, this.fs);
try {
// If FS is in safe mode, just wait till out of it.
FSUtils.waitOnSafeMode(conf, conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
splitter.splitLog();
} catch (OrphanHLogAfterSplitException e) {
LOG.warn("Retrying splitting because of:", e);
//An HLogSplitter instance can only be used once. Get new instance.
splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
oldLogDir, this.fs);
splitter.splitLog();
}
splitTime = splitter.getTime();
splitLogSize = splitter.getSize();
} finally {
this.splitLogLock.unlock();
}
private List<Path> splitLog(final FileStatus[] logfiles) throws IOException {
List<Path> processedLogs = new ArrayList<Path>();
List<Path> corruptedLogs = new ArrayList<Path>();
List<Path> splits = null;
boolean skipErrors = conf.getBoolean("hbase.hlog.split.skip.errors", true);
countTotalBytes(logfiles);
splitSize = 0;
outputSink.startWriterThreads(entryBuffers);
try {
int i = 0;
for (FileStatus log : logfiles) {
Path logPath = log.getPath();
long logLength = log.getLen();
splitSize += logLength;
logAndReport("Splitting hlog " + (i++ + 1) + " of " + logfiles.length
+ ": " + logPath + ", length=" + logLength);
Reader in;
try {
in = getReader(fs, log, conf, skipErrors);
if (in != null) {
parseHLog(in, logPath, entryBuffers, fs, conf, skipErrors);
try {
in.close();
} catch (IOException e) {
LOG.warn("Close log reader threw exception -- continuing",
e);
}
}
processedLogs.add(logPath);
} catch (CorruptedLogFileException e) {
LOG.info("Got while parsing hlog " + logPath +
". Marking as corrupted", e);
corruptedLogs.add(logPath);
continue;
}
}
status.setStatus("Log splits complete. Checking for orphaned logs.");
if (fs.listStatus(srcDir).length > processedLogs.size()
+ corruptedLogs.size()) {
throw new OrphanHLogAfterSplitException(
"Discovered orphan hlog after split. Maybe the "
+ "HRegionServer was not dead when we started");
}
} finally {
status.setStatus("Finishing writing output logs and closing down.");
splits = outputSink.finishWritingAndClose();
}
status.setStatus("Archiving logs after completed split");
archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
return splits;
}
5)HMaster注册root和meta表
int assignRootAndMeta(MonitoredTask status)
throws InterruptedException, IOException, KeeperException {
int assigned = 0;
long timeout = this.conf.getLong("hbase.catalog.verification.timeout", 1000);
// Work on ROOT region. Is it in zk in transition?
status.setStatus("Assigning ROOT region");
boolean rit = this.assignmentManager.
processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
ServerName currentRootServer = null;
boolean rootRegionLocation = catalogTracker.verifyRootRegionLocation(timeout);
if (!rit && !rootRegionLocation) {
currentRootServer = this.catalogTracker.getRootLocation();
splitLogAndExpireIfOnline(currentRootServer);
this.assignmentManager.assignRoot();
waitForRootAssignment();
assigned++;
} else if (rit && !rootRegionLocation) {
waitForRootAssignment();
assigned++;
} else {
// Region already assigned. We didn't assign it. Add to in-memory state.
this.assignmentManager.regionOnline(HRegionInfo.ROOT_REGIONINFO,
this.catalogTracker.getRootLocation());
}
// Enable the ROOT table if on process fail over the RS containing ROOT
// was active.
enableCatalogTables(Bytes.toString(HConstants.ROOT_TABLE_NAME));
LOG.info("-ROOT- assigned=" + assigned + ", rit=" + rit +
", location=" + catalogTracker.getRootLocation());
// Work on meta region
status.setStatus("Assigning META region");
rit = this.assignmentManager.
processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.FIRST_META_REGIONINFO);
boolean metaRegionLocation = this.catalogTracker.verifyMetaRegionLocation(timeout);
if (!rit && !metaRegionLocation) {
ServerName currentMetaServer =
this.catalogTracker.getMetaLocationOrReadLocationFromRoot();
if (currentMetaServer != null
&& !currentMetaServer.equals(currentRootServer)) {
splitLogAndExpireIfOnline(currentMetaServer);
}
assignmentManager.assignMeta();
enableSSHandWaitForMeta();
assigned++;
} else if (rit && !metaRegionLocation) {
enableSSHandWaitForMeta();
assigned++;
} else {
// Region already assigned. We didnt' assign it. Add to in-memory state.
this.assignmentManager.regionOnline(HRegionInfo.FIRST_META_REGIONINFO,
this.catalogTracker.getMetaLocation());
}
enableCatalogTables(Bytes.toString(HConstants.META_TABLE_NAME));
LOG.info(".META. assigned=" + assigned + ", rit=" + rit +
", location=" + catalogTracker.getMetaLocation());
status.setStatus("META and ROOT assigned.");
return assigned;
}
6)准备步骤,收集offline server including region和zk assigned region
hmaster调用assignManager的joinCluster方法 - rebuildUserRegions方法
获取不在线的region server包括他的region,以及zk上已经注册的region(hmaster宕掉的情况)
Map<ServerName, List<Pair<HRegionInfo, Result>>> rebuildUserRegions() throws IOException,
KeeperException {
// Region assignment from META
List<Result> results = MetaReader.fullScan(this.catalogTracker);
// Get any new but slow to checkin region server that joined the cluster
Set<ServerName> onlineServers = serverManager.getOnlineServers().keySet();
// Map of offline servers and their regions to be returned
Map<ServerName, List<Pair<HRegionInfo,Result>>> offlineServers =
new TreeMap<ServerName, List<Pair<HRegionInfo, Result>>>();
// Iterate regions in META
for (Result result : results) {
boolean disabled = false;
boolean disablingOrEnabling = false;
Pair<HRegionInfo, ServerName> region = MetaReader.parseCatalogResult(result);
if (region == null) continue;
HRegionInfo regionInfo = region.getFirst();
ServerName regionLocation = region.getSecond();
if (regionInfo == null) continue;
String tableName = regionInfo.getTableNameAsString();
if (regionLocation == null) {
// regionLocation could be null if createTable didn't finish properly.
// When createTable is in progress, HMaster restarts.
// Some regions have been added to .META., but have not been assigned.
// When this happens, the region's table must be in ENABLING state.
// It can't be in ENABLED state as that is set when all regions are
// assigned.
// It can't be in DISABLING state, because DISABLING state transitions
// from ENABLED state when application calls disableTable.
// It can't be in DISABLED state, because DISABLED states transitions
// from DISABLING state.
if (false == checkIfRegionsBelongsToEnabling(regionInfo)) {
LOG.warn("Region " + regionInfo.getEncodedName() +
" has null regionLocation." + " But its table " + tableName +
" isn't in ENABLING state.");
}
addTheTablesInPartialState(this.disablingTables, this.enablingTables, regionInfo,
tableName);
} else if (!onlineServers.contains(regionLocation)) {
// Region is located on a server that isn't online
List<Pair<HRegionInfo, Result>> offlineRegions =
offlineServers.get(regionLocation);
if (offlineRegions == null) {
offlineRegions = new ArrayList<Pair<HRegionInfo,Result>>(1);
offlineServers.put(regionLocation, offlineRegions);
}
offlineRegions.add(new Pair<HRegionInfo,Result>(regionInfo, result));
disabled = checkIfRegionBelongsToDisabled(regionInfo);
disablingOrEnabling = addTheTablesInPartialState(this.disablingTables,
this.enablingTables, regionInfo, tableName);
// need to enable the table if not disabled or disabling or enabling
// this will be used in rolling restarts
enableTableIfNotDisabledOrDisablingOrEnabling(disabled,
disablingOrEnabling, tableName);
} else {
// If region is in offline and split state check the ZKNode
if (regionInfo.isOffline() && regionInfo.isSplit()) {
String node = ZKAssign.getNodeName(this.watcher, regionInfo
.getEncodedName());
Stat stat = new Stat();
byte[] data = ZKUtil.getDataNoWatch(this.watcher, node, stat);
// If znode does not exist dont consider this region
if (data == null) {
LOG.debug("Region "+ regionInfo.getRegionNameAsString() + " split is completed. "
+ "Hence need not add to regions list");
continue;
}
}
// Region is being served and on an active server
// add only if region not in disabled and enabling table
if (false == checkIfRegionBelongsToDisabled(regionInfo)
&& false == checkIfRegionsBelongsToEnabling(regionInfo)) {
synchronized (this.regions) {
regions.put(regionInfo, regionLocation);
addToServers(regionLocation, regionInfo);
}
}
disablingOrEnabling = addTheTablesInPartialState(this.disablingTables,
this.enablingTables, regionInfo, tableName);
disabled = checkIfRegionBelongsToDisabled(regionInfo);
// need to enable the table if not disabled or disabling or enabling
// this will be used in rolling restarts
enableTableIfNotDisabledOrDisablingOrEnabling(disabled,
disablingOrEnabling, tableName);
}
}
return offlineServers;
}
7)分配region
assignmanager调用processDeadServersAndRegionsInTransition方法
分配region分为两个情况,一种是只需要分配的region不为0,则为mater宕机情况A,反之为正常情况B
A:processDeadServersAndRecoverLostRegions();
B:cleanoutUnassigned();
assignAllUserRegions();
分支A:方法processDeadServersAndRecoverLostRegions
遍历所有所有不在线上的region server,以及他的region
1.1判断region是否在zk上,若不在则表示region已经被一起在线的region server所处理
1.2若存在,则继续判断当前region是否需要分配
1.2.1region是disabled table的,则不需要处理
1.2.2region是splited region,则须要处理子region,若他的两个子region丢失,则在meta表上注册子region
1.3若需要分配region,调用方法createOrForceNodeOffline,给region设置watch
private void processDeadServersAndRecoverLostRegions(
Map<ServerName, List<Pair<HRegionInfo, Result>>> deadServers,
List<String> nodes) throws IOException, KeeperException {
if (null != deadServers) {
Set<ServerName> actualDeadServers = this.serverManager.getDeadServers();
for (Map.Entry<ServerName, List<Pair<HRegionInfo, Result>>> deadServer :
deadServers.entrySet()) {
// skip regions of dead servers because SSH will process regions during rs expiration.
// see HBASE-5916
if (actualDeadServers.contains(deadServer.getKey())) {
for (Pair<HRegionInfo, Result> deadRegion : deadServer.getValue()) {
nodes.remove(deadRegion.getFirst().getEncodedName());
}
continue;
}
List<Pair<HRegionInfo, Result>> regions = deadServer.getValue();
for (Pair<HRegionInfo, Result> region : regions) {
HRegionInfo regionInfo = region.getFirst();
Result result = region.getSecond();
// If region was in transition (was in zk) force it offline for
// reassign
try {
RegionTransitionData data = ZKAssign.getData(watcher,
regionInfo.getEncodedName());
// If zk node of this region has been updated by a live server,
// we consider that this region is being handled.
// So we should skip it and process it in
// processRegionsInTransition.
if (data != null && data.getOrigin() != null &&
serverManager.isServerOnline(data.getOrigin())) {
LOG.info("The region " + regionInfo.getEncodedName()
+ "is being handled on " + data.getOrigin());
continue;
}
// Process with existing RS shutdown code
boolean assign = ServerShutdownHandler.processDeadRegion(
regionInfo, result, this, this.catalogTracker);
if (assign) {
ZKAssign.createOrForceNodeOffline(watcher, regionInfo,
master.getServerName());
if (!nodes.contains(regionInfo.getEncodedName())) {
nodes.add(regionInfo.getEncodedName());
}
}
} catch (KeeperException.NoNodeException nne) {
// This is fine
}
}
}
}
之后对需要分配的region,进入RIT工作流,调用方法processRegionsInTransition
RIT流程,贴一个RIT流程分析的帖子http://blog.csdn.net/shenxiaoming77/article/details/18360199
void processRegionsInTransition(final RegionTransitionData data,
final HRegionInfo regionInfo,
final Map<ServerName, List<Pair<HRegionInfo, Result>>> deadServers,
int expectedVersion)
throws KeeperException {
String encodedRegionName = regionInfo.getEncodedName();
LOG.info("Processing region " + regionInfo.getRegionNameAsString() +
" in state " + data.getEventType());
synchronized (regionsInTransition) {
RegionState regionState = regionsInTransition.get(encodedRegionName);
if (regionState != null ||
failoverProcessedRegions.containsKey(encodedRegionName)) {
// Just return
return;
}
switch (data.getEventType()) {
case M_ZK_REGION_CLOSING:
// If zk node of the region was updated by a live server skip this
// region and just add it into RIT.
if (isOnDeadServer(regionInfo, deadServers) &&
(data.getOrigin() == null || !serverManager.isServerOnline(data.getOrigin()))) {
// If was on dead server, its closed now. Force to OFFLINE and this
// will get it reassigned if appropriate
forceOffline(regionInfo, data);
} else {
// Just insert region into RIT.
// If this never updates the timeout will trigger new assignment
regionsInTransition.put(encodedRegionName, new RegionState(
regionInfo, RegionState.State.CLOSING,
data.getStamp(), data.getOrigin()));
}
failoverProcessedRegions.put(encodedRegionName, regionInfo);
break;
case RS_ZK_REGION_CLOSED:
case RS_ZK_REGION_FAILED_OPEN:
// Region is closed, insert into RIT and handle it
addToRITandCallClose(regionInfo, RegionState.State.CLOSED, data);
failoverProcessedRegions.put(encodedRegionName, regionInfo);
break;
case M_ZK_REGION_OFFLINE:
// If zk node of the region was updated by a live server skip this
// region and just add it into RIT.
if (isOnDeadServer(regionInfo, deadServers) &&
(data.getOrigin() == null ||
!serverManager.isServerOnline(data.getOrigin()))) {
// Region is offline, insert into RIT and handle it like a closed
addToRITandCallClose(regionInfo, RegionState.State.OFFLINE, data);
} else if (data.getOrigin() != null &&
!serverManager.isServerOnline(data.getOrigin())) {
// to handle cases where offline node is created but sendRegionOpen
// RPC is not yet sent
addToRITandCallClose(regionInfo, RegionState.State.OFFLINE, data);
} else {
regionsInTransition.put(encodedRegionName, new RegionState(
regionInfo, RegionState.State.PENDING_OPEN, data.getStamp(), data
.getOrigin()));
}
failoverProcessedRegions.put(encodedRegionName, regionInfo);
break;
case RS_ZK_REGION_OPENING:
// TODO: Could check if it was on deadServers. If it was, then we could
// do what happens in TimeoutMonitor when it sees this condition.
// Just insert region into RIT
// If this never updates the timeout will trigger new assignment
if (regionInfo.isMetaTable()) {
regionsInTransition.put(encodedRegionName, new RegionState(
regionInfo, RegionState.State.OPENING, data.getStamp(), data
.getOrigin()));
// If ROOT or .META. table is waiting for timeout monitor to assign
// it may take lot of time when the assignment.timeout.period is
// the default value which may be very long. We will not be able
// to serve any request during this time.
// So we will assign the ROOT and .META. region immediately.
processOpeningState(regionInfo);
break;
}
regionsInTransition.put(encodedRegionName, new RegionState(regionInfo,
RegionState.State.OPENING, data.getStamp(), data.getOrigin()));
failoverProcessedRegions.put(encodedRegionName, regionInfo);
break;
case RS_ZK_REGION_OPENED:
// Region is opened, insert into RIT and handle it
regionsInTransition.put(encodedRegionName, new RegionState(
regionInfo, RegionState.State.OPEN,
data.getStamp(), data.getOrigin()));
ServerName sn = data.getOrigin() == null? null: data.getOrigin();
// sn could be null if this server is no longer online. If
// that is the case, just let this RIT timeout; it'll be assigned
// to new server then.
if (sn == null) {
LOG.warn("Region in transition " + regionInfo.getEncodedName() +
" references a null server; letting RIT timeout so will be " +
"assigned elsewhere");
} else if (!serverManager.isServerOnline(sn)
&& (isOnDeadServer(regionInfo, deadServers)
|| regionInfo.isMetaRegion() || regionInfo.isRootRegion())) {
forceOffline(regionInfo, data);
} else {
new OpenedRegionHandler(master, this, regionInfo, sn, expectedVersion)
.process();
}
failoverProcessedRegions.put(encodedRegionName, regionInfo);
break;
}
}
}
B分支:1.调用方法cleanoutUnassigned,清除zk所有node,重新watch
2.调用assignAllUserRegions
1.获取所有region
2.分配region,当hbase.master.startup.retainassign为true,按照meta表信息分配region,反之 则online region server中随机选择
8)检查子region
void fixupDaughters(final MonitoredTask status) throws IOException {
final Map<HRegionInfo, Result> offlineSplitParents =
new HashMap<HRegionInfo, Result>();
// This visitor collects offline split parents in the .META. table
MetaReader.Visitor visitor = new MetaReader.Visitor() {
@Override
public boolean visit(Result r) throws IOException {
if (r == null || r.isEmpty()) return true;
HRegionInfo info =
MetaReader.parseHRegionInfoFromCatalogResult(
r, HConstants.REGIONINFO_QUALIFIER);
if (info == null) return true; // Keep scanning
if (info.isOffline() && info.isSplit()) {
offlineSplitParents.put(info, r);
}
// Returning true means "keep scanning"
return true;
}
};
// Run full scan of .META. catalog table passing in our custom visitor
MetaReader.fullScan(this.catalogTracker, visitor);
// Now work on our list of found parents. See if any we can clean up.
int fixups = 0;
for (Map.Entry<HRegionInfo, Result> e : offlineSplitParents.entrySet()) {
fixups += ServerShutdownHandler.fixupDaughters(
e.getValue(), assignmentManager, catalogTracker);
}
if (fixups != 0) {
LOG.info("Scanned the catalog and fixed up " + fixups +
" missing daughter region(s)");
}
}
9)启动balance线程
HMaster balance策略为region server拥有的最小region为 regions/servers,最大为regions/servers+1
先找到超过max的region server,获得需要分配的region;以及获得小于min的region server
再将需要分配的region分给小于min的region server直到max或分配结束
@Override
public boolean balance() {
// if master not initialized, don't run balancer.
if (!this.initialized) {
LOG.debug("Master has not been initialized, don't run balancer.");
return false;
}
// If balance not true, don't run balancer.
if (!this.balanceSwitch) return false;
// Do this call outside of synchronized block.
int maximumBalanceTime = getBalancerCutoffTime();
long cutoffTime = System.currentTimeMillis() + maximumBalanceTime;
boolean balancerRan;
synchronized (this.balancer) {
// Only allow one balance run at at time.
if (this.assignmentManager.isRegionsInTransition()) {
LOG.debug("Not running balancer because " +
this.assignmentManager.getRegionsInTransition().size() +
" region(s) in transition: " +
org.apache.commons.lang.StringUtils.
abbreviate(this.assignmentManager.getRegionsInTransition().toString(), 256));
return false;
}
if (this.serverManager.areDeadServersInProgress()) {
LOG.debug("Not running balancer because processing dead regionserver(s): " +
this.serverManager.getDeadServers());
return false;
}
if (this.cpHost != null) {
try {
if (this.cpHost.preBalance()) {
LOG.debug("Coprocessor bypassing balancer request");
return false;
}
} catch (IOException ioe) {
LOG.error("Error invoking master coprocessor preBalance()", ioe);
return false;
}
}
Map<String, Map<ServerName, List<HRegionInfo>>> assignmentsByTable =
this.assignmentManager.getAssignmentsByTable();
List<RegionPlan> plans = new ArrayList<RegionPlan>();
for (Map<ServerName, List<HRegionInfo>> assignments : assignmentsByTable.values()) {
List<RegionPlan> partialPlans = this.balancer.balanceCluster(assignments);
if (partialPlans != null) plans.addAll(partialPlans);
}
int rpCount = 0; // number of RegionPlans balanced so far
long totalRegPlanExecTime = 0;
balancerRan = plans != null;
if (plans != null && !plans.isEmpty()) {
for (RegionPlan plan: plans) {
LOG.info("balance " + plan);
long balStartTime = System.currentTimeMillis();
this.assignmentManager.balance(plan);
totalRegPlanExecTime += System.currentTimeMillis()-balStartTime;
rpCount++;
if (rpCount < plans.size() &&
// if performing next balance exceeds cutoff time, exit the loop
(System.currentTimeMillis() + (totalRegPlanExecTime / rpCount)) > cutoffTime) {
LOG.debug("No more balancing till next balance run; maximumBalanceTime=" +
maximumBalanceTime);
break;
}
}
}
if (this.cpHost != null) {
try {
this.cpHost.postBalance();
} catch (IOException ioe) {
// balancing already succeeded so don't change the result
LOG.error("Error invoking master coprocessor postBalance()", ioe);
}
}
}
return balancerRan;
}
终于写完了,也感谢能看到这里的朋友,因为是回家重写的,有点糙,对不起了