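The HBase database in our project showed a very strange assignment: the source and destination of a region move were the same regionserver, differing only in timestamp. Only one regionserver had been started, so it is unclear how two timestamps came to exist.
Let's go through the source code to work out what happened.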
// Wait for region servers to report in.
this.serverManager.waitForRegionServers(status);
// Check zk for regionservers that are up but didn't register
for (ServerName sn: this.regionServerTracker.getOnlineServers()) {
  if (!this.serverManager.isServerOnline(sn)) {
    // Not registered; add it.
    LOG.info("Registering server found up in zk but who has not yet " +
      "reported in: " + sn);
    this.serverManager.recordNewServer(sn, HServerLoad.EMPTY_HSERVERLOAD);
  }
}
serverManager sleeps for a period of time, waiting for each HRegionServer to register.
HRegionServer.java:
// Try and register with the Master; tell it we are here. Break if
// server is stopped or the clusterup flag is down or hdfs went wacky.
while (keepLooping()) {
  MapWritable w = reportForDuty();
  if (w == null) {
    LOG.warn("reportForDuty failed; sleeping and then retrying.");
    this.sleeper.sleep();
  } else {
    handleReportForDutyResponse(w);
    break;
  }
}
After registering, HRegionServer enters its main loop.
// The main run loop.
while (!this.stopped && isHealthy()) {
  long now = System.currentTimeMillis();
  if ((now - lastMsg) >= msgInterval) {
    doMetrics();
    tryRegionServerReport();
    lastMsg = System.currentTimeMillis();
  }
}
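Every hbase.regionserver.msginterval (3 seconds by default), the regionserver sends a report to the master. If its ServerName (hostname, port, and startcode) is not yet in the map of online servers, the ServerName is added to the map.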
ServerManager.java
void regionServerReport(ServerName sn, HServerLoad hsl)
throws YouAreDeadException, PleaseHoldException {
  checkIsDead(sn, "REPORT");
  if (!this.onlineServers.containsKey(sn)) {
    // Already have this host+port combo and its just different start code?
    checkAlreadySameHostPort(sn);
    // Just let the server in. Presume master joining a running cluster.
    // recordNewServer is what happens at the end of reportServerStartup.
    // The only thing we are skipping is passing back to the regionserver
    // the ServerName to use. Here we presume a master has already done
    // that so we'll press on with whatever it gave us for ServerName.
    recordNewServer(sn, hsl);
  } else {
    this.onlineServers.put(sn, hsl);
  }
}
recordNewServer logs the IP, port, and timestamp of the ServerName object.
ServerName objects registered by the same region server carry the same timestamp:
this.startcode = System.currentTimeMillis();
...
result = this.hbaseMaster.regionServerStartup(port, this.startcode, now);
...
this.serverNameFromMasterPOV = new ServerName(hostnameFromMasterPOV,
  this.isa.getPort(), this.startcode);
...
this.hbaseMaster.regionServerReport(this.serverNameFromMasterPOV.getVersionedBytes(), hsl);
The startcode is fixed once when the region server starts, so by this flow there should never be region servers online with the same IP and port but different timestamps.
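To make the role of the startcode concrete, here is a minimal sketch of why two reports with the same host and port but different timestamps count as two servers. The `Name` class is a hypothetical stand-in for HBase's ServerName, not the real class, and `distinctServers` is an illustrative helper. The only assumption carried over from the source above is that the online-servers map is keyed by the full (hostname, port, startcode) triple.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class ServerNameIdentitySketch {
    // Hypothetical stand-in for ServerName: host, port and startcode
    // together form the identity used as the map key.
    static final class Name {
        final String host;
        final int port;
        final long startcode;

        Name(String host, int port, long startcode) {
            this.host = host;
            this.port = port;
            this.startcode = startcode;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Name)) {
                return false;
            }
            Name n = (Name) o;
            return host.equals(n.host) && port == n.port && startcode == n.startcode;
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port, startcode);
        }
    }

    // Counts how many distinct map entries a series of reports produces.
    static int distinctServers(Name... reports) {
        Map<Name, Integer> online = new HashMap<>();
        for (Name n : reports) {
            online.merge(n, 1, Integer::sum);
        }
        return online.size();
    }

    public static void main(String[] args) {
        Name first = new Name("rs1.example.com", 60020, 1000L);
        Name again = new Name("rs1.example.com", 60020, 1000L);
        Name other = new Name("rs1.example.com", 60020, 2000L);
        System.out.println(distinctServers(first, again)); // prints 1
        System.out.println(distinctServers(first, other)); // prints 2
    }
}
```

Under this model, a process that keeps reporting the same startcode can only ever occupy one entry. A second entry needs a second startcode, which is exactly the anomaly being investigated.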
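If two region servers are started on the same machine, the one with the smaller timestamp is removed, and the one with the larger timestamp is added the next time it reports.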
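The replacement behaviour described here can be sketched as a small simulation. This is a simplified model and not the actual ServerManager code. The class, the `report` method, and the "host:port" string key are all hypothetical, and a `false` return value stands in for the rejection that the real code signals with an exception (PleaseHoldException appears in the regionServerReport signature quoted earlier).

```java
import java.util.HashMap;
import java.util.Map;

public class SameHostPortSketch {
    // Hypothetical registry: "host:port" -> startcode of the instance currently online.
    private final Map<String, Long> online = new HashMap<>();

    // Returns true if the reporting server is (now) registered,
    // false if it was told to hold and retry on its next report.
    boolean report(String host, int port, long startcode) {
        String key = host + ":" + port;
        Long existing = online.get(key);
        if (existing == null) {
            online.put(key, startcode); // nothing on this host+port yet: let it in
            return true;
        }
        if (existing.longValue() == startcode) {
            return true; // same instance reporting again
        }
        if (existing.longValue() < startcode) {
            online.remove(key); // existing entry looks stale: remove it
        }
        return false; // the newcomer is rejected for this round either way
    }

    Long onlineStartcode(String host, int port) {
        return online.get(host + ":" + port);
    }

    public static void main(String[] args) {
        SameHostPortSketch master = new SameHostPortSketch();
        System.out.println(master.report("rs1.example.com", 60020, 1000L)); // prints true
        System.out.println(master.report("rs1.example.com", 60020, 2000L)); // prints false
        System.out.println(master.report("rs1.example.com", 60020, 2000L)); // prints true
        System.out.println(master.onlineStartcode("rs1.example.com", 60020)); // prints 2000
    }
}
```

In this model the two timestamps are never online at the same moment: the older entry is dropped before the newer one is accepted.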
The problem we ran into is that regionservers with different timestamps were both registered on the master, and regions were being moved between them.