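When I gave a talk on replication, a colleague asked what replication latency looks like. (This analysis is based on 0.94.3.)
This post walks through the code that creates HLog files and how that affects replication. For replication itself, see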
http://brianf.iteye.com/blog/1776936
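First, when is an hlog file created?
When an HLog object is constructed, the constructor calls HLog's rollWriter(). Because this.writer is still null at that point, rollWriter creates the first hlog file. It then invokes the replication-related hooks; see http://brianf.iteye.com/blog/1776936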
-
- rollWriter();
-
-
- this.getNumCurrentReplicas = getGetNumCurrentReplicas(this.hdfs_out);
-
- logSyncerThread = new LogSyncer(this.optionalFlushInterval);
-----------------
- public byte [][] rollWriter(boolean force)
- throws FailedLogCloseException, IOException {
-
- if (!force && this.writer != null && this.numEntries.get() <= 0) {
- return null;
- }
-------------
In LogRoller.run:
- public void run() {
- while (!server.isStopped()) {
- long now = System.currentTimeMillis();
- boolean periodic = false;
- if (!rollLog.get()) {
- periodic = (now - this.lastrolltime) > this.rollperiod;
- if (!periodic) {
- synchronized (rollLog) {
- try {
- rollLog.wait(this.threadWakeFrequency);
- } catch (InterruptedException e) {
-
- }
- }
- continue;
- }
-
- if (LOG.isDebugEnabled()) {
- LOG.debug("Hlog roll period " + this.rollperiod + "ms elapsed");
- }
- } else if (LOG.isDebugEnabled()) {
- LOG.debug("HLog roll requested");
- }
- rollLock.lock();
- try {
- this.lastrolltime = now;
-
- byte [][] regionsToFlush = this.services.getWAL().rollWriter(rollLog.get());
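By default the LogRoller thread waits one hour between rolls, so by default a new log is produced every hour (as discussed later, hlog size is another trigger).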
- this.rollperiod = this.server.getConfiguration().
- getLong("hbase.regionserver.logroll.period", 3600000);
rollLog is an AtomicBoolean. When it is true, the rollWriter call is forced and creates a new log. So when does rollLog become true?
- public void logRollRequested() {
- synchronized (rollLog) {
- rollLog.set(true);
- rollLog.notifyAll();
- }
- }
logRollRequested() is called from the HLog.syncer method.
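To make the two roll triggers easier to see, here is a minimal sketch of the decision that LogRoller.run makes on each loop iteration. RollDecisionSketch and its Decision enum are hypothetical names for illustration and are not part of HBase. The sketch leaves out the wait/notify and locking, and it assumes the flag is cleared once a requested roll has been handled, which the excerpt above does not show.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not HBase code: mirrors the branch structure of LogRoller.run().
class RollDecisionSketch {
    enum Decision { WAIT, PERIODIC_ROLL, REQUESTED_ROLL }

    private final AtomicBoolean rollLog = new AtomicBoolean(false);
    private final long rollPeriodMs;
    private long lastRollTime;

    RollDecisionSketch(long rollPeriodMs, long startTimeMs) {
        this.rollPeriodMs = rollPeriodMs;
        this.lastRollTime = startTimeMs;
    }

    // Same flag flip as logRollRequested(), without the notifyAll().
    void logRollRequested() {
        rollLog.set(true);
    }

    // One pass of the loop body: decide whether to roll and why.
    Decision decide(long nowMs) {
        if (!rollLog.get()) {
            boolean periodic = (nowMs - lastRollTime) > rollPeriodMs;
            if (!periodic) {
                return Decision.WAIT; // the real thread waits on rollLog here
            }
            lastRollTime = nowMs;
            return Decision.PERIODIC_ROLL;
        }
        lastRollTime = nowMs;
        rollLog.set(false); // assumption: the flag is reset after the roll
        return Decision.REQUESTED_ROLL;
    }

    public static void main(String[] args) {
        RollDecisionSketch roller = new RollDecisionSketch(3600000L, 0L);
        System.out.println(roller.decide(1000L)); // nothing requested, period not elapsed
        roller.logRollRequested();
        System.out.println(roller.decide(2000L)); // explicit request wins
    }
}
```

A requested roll also resets lastRollTime, so the one-hour timer restarts from the most recent roll, whatever triggered it.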
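When data is written to HBase, for example by a put, HLog's append method is called. It writes the edit into HLog's in-memory buffer (a List) and then syncs the data to HDFS. In addition, the LogSyncer thread runs the HLog.syncer method every 1000 ms.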
- private long append(HRegionInfo info, byte [] tableName, WALEdit edits, UUID clusterId,
- final long now, HTableDescriptor htd, boolean doSync)
- throws IOException {
- if (edits.isEmpty()) return this.unflushedEntries.get();
- if (this.closed) {
- throw new IOException("Cannot append; log is closed");
- }
- long txid = 0;
- synchronized (this.updateLock) {
- long seqNum = obtainSeqNum();
-
-
-
-
-
-
-
- byte [] encodedRegionName = info.getEncodedNameAsBytes();
- this.lastSeqWritten.putIfAbsent(encodedRegionName, seqNum);
- HLogKey logKey = makeKey(encodedRegionName, tableName, seqNum, now, clusterId);
- doWrite(info, logKey, edits, htd);
- this.numEntries.incrementAndGet();
- txid = this.unflushedEntries.incrementAndGet();
- if (htd.isDeferredLogFlush()) {
- lastDeferredTxid = txid;
- }
- }
-
-
- if (doSync &&
- (info.isMetaRegion() ||
- !htd.isDeferredLogFlush())) {
-
- this.sync(txid);
- }
- return txid;
- }
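HLog.syncer calls checkLowReplication to check whether the number of HDFS replicas of the current hlog has dropped below the configured value. If it has, it calls requestLogRoll, which ends up in logRollRequested. The number of rolls triggered this way is capped, at 5 by default:
- this.lowReplicationRollLimit = conf.getInt(
- "hbase.regionserver.hlog.lowreplication.rolllimit", 5);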
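The cap matters because a cluster that stays under-replicated would otherwise roll a new hlog on every sync. The following is a simplified sketch of such a cap, not the actual checkLowReplication code: LowReplicationSketch, its fields, and the rule that rolling is switched off once the limit is hit are assumptions made for illustration.

```java
// Hypothetical sketch, not HBase code: caps how many rolls low replication may trigger.
class LowReplicationSketch {
    private final int minTolerableReplication; // assumed name for the configured minimum
    private final int rollLimit;               // e.g. 5, as in lowreplication.rolllimit
    private int consecutiveLogRolls = 0;
    private boolean rollEnabled = true;

    LowReplicationSketch(int minTolerableReplication, int rollLimit) {
        this.minTolerableReplication = minTolerableReplication;
        this.rollLimit = rollLimit;
    }

    // Returns true when a log roll should be requested for this replica count.
    boolean shouldRequestRoll(int currentReplicas) {
        boolean low = currentReplicas != 0 && currentReplicas < minTolerableReplication;
        if (!low || !rollEnabled) {
            return false;
        }
        if (consecutiveLogRolls < rollLimit) {
            consecutiveLogRolls++;
            return true;
        }
        rollEnabled = false; // limit reached: stop rolling for this reason
        return false;
    }

    public static void main(String[] args) {
        LowReplicationSketch check = new LowReplicationSketch(3, 5);
        for (int i = 1; i <= 6; i++) {
            System.out.println("sync " + i + " roll=" + check.shouldRequestRoll(2));
        }
    }
}
```

This sketch never re-enables rolling once replication recovers, so treat it only as an illustration of the limit, not of the full state handling.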
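syncer then checks whether the hlog currently being written has grown past a size threshold (64MB * 0.95, assuming a 64 MB block size). If it has, a new HLog must be created as well.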
- this.blocksize = conf.getLong("hbase.regionserver.hlog.blocksize",
- getDefaultBlockSize());
-
- float multi = conf.getFloat("hbase.regionserver.logroll.multiplier", 0.95f);
- this.logrollsize = (long)(this.blocksize * multi);
- ----------------
- if (tempWriter.getLength() > this.logrollsize) {
- requestLogRoll();
- }
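As a worked example of the threshold arithmetic, the sketch below repeats the same cast and comparison outside HBase. RollSizeSketch and its method names are hypothetical, and the 64 MB block size is an assumed value; the real default comes from getDefaultBlockSize().

```java
// Hypothetical sketch, not HBase code: the size-based roll threshold.
class RollSizeSketch {
    // Same arithmetic as this.logrollsize = (long)(this.blocksize * multi).
    static long logRollSize(long blockSize, float multiplier) {
        return (long) (blockSize * multiplier);
    }

    // Same comparison as tempWriter.getLength() > this.logrollsize.
    static boolean needsRoll(long writerLength, long logRollSize) {
        return writerLength > logRollSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB block size
        long threshold = logRollSize(blockSize, 0.95f);
        System.out.println(threshold); // prints 63753420, about 60.8 MB
        System.out.println(needsRoll(threshold + 1, threshold)); // prints true
    }
}
```

Because the comparison is strict, a writer whose length equals the threshold exactly does not trigger a roll yet.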
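For replication itself, the latency comes mainly from communication with ZK and from the RPC call to the slave RS.
The related configuration options are: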
hbase.regionserver.optionallogflushinterval
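Interval at which the HLog is synced to HDFS. Even if not enough edits have accumulated, a sync is triggered once the interval elapses. The unit is milliseconds; the default is 1 second.
Default: 1000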
hbase.regionserver.logroll.period
Interval at which the commit log is rolled, regardless of how much has been written to it.
Default: 3600000
hbase.master.logcleaner.ttl
Maximum time an HLog may stay in the .oldlogdir directory. After that it is cleaned up by a Master thread.
Default: 600000
hbase.master.logcleaner.plugins
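A comma-separated list of class names. These WAL/HLog cleaners are called in order, so put the ones that should run first at the front. You can implement your own LogCleanerDelegate, add it to the classpath, and list its fully qualified class name here. Custom cleaners are usually added in front of the default value.
The chain is initialized in CleanerChore's initCleanerChain method, which also initializes the HFile cleaners.
Default: org.apache.hadoop.hbase.master.TimeToLiveLogCleaner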
hbase.regionserver.hlog.blocksize
hbase.regionserver.maxlogs
The maximum total WAL size is hbase.regionserver.maxlogs * hbase.regionserver.hlog.blocksize (2GB by default). Once it is reached, a Memstore flush is triggered. Relying on the WAL limit to trigger Memstore flushes is not ideal, because it can flush many Regions at once and cause a flush storm.
It is best to set hbase.regionserver.hlog.blocksize * hbase.regionserver.maxlogs slightly larger than hbase.regionserver.global.memstore.lowerLimit * HBASE_HEAPSIZE.
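The sizing rule can be checked with a few lines of arithmetic. The sketch below is only an illustration: WalSizingSketch and its methods are hypothetical names, and the 8 GB heap, the 0.35 lowerLimit, the 64 MB block size, and maxlogs = 32 are assumed values, not values read from a real cluster.

```java
// Hypothetical sketch, not HBase code: compares WAL capacity with the memstore lower limit.
class WalSizingSketch {
    // maxlogs * blocksize
    static long walCapacityBytes(int maxLogs, long blockSize) {
        return maxLogs * blockSize;
    }

    // global.memstore.lowerLimit * heap size
    static long memstoreLowerLimitBytes(double lowerLimit, long heapBytes) {
        return (long) (lowerLimit * heapBytes);
    }

    // Smallest maxlogs whose WAL capacity is strictly larger than the memstore lower limit.
    static int suggestedMaxLogs(double lowerLimit, long heapBytes, long blockSize) {
        long target = memstoreLowerLimitBytes(lowerLimit, heapBytes);
        return (int) (target / blockSize) + 1;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;     // assumed 64 MB
        long heap = 8L * 1024 * 1024 * 1024;    // assumed 8 GB heap
        System.out.println(walCapacityBytes(32, blockSize));        // prints 2147483648 (2 GB)
        System.out.println(suggestedMaxLogs(0.35, heap, blockSize)); // prints 45
    }
}
```

With these assumed numbers the memstore lower limit is about 2.8 GB, which is larger than the 2 GB WAL capacity, so the WAL limit would force flushes first. Raising maxlogs to around 45 would put the WAL capacity just above the memstore lower limit.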