I. Where HLog lives on HDFS and its mapping to RegionServers
HLog is persisted on HDFS. Its storage location can be listed with:
hadoop fs -ls /hbase/.logs
As the HBase architecture diagram shows, HLog and HRegionServer map one-to-one; the listing shows one directory per RegionServer:
Found 5 items
drwxr-xr-x - hadoop cug-admin 0 2013-04-11 14:23 /hbase/.logs/HADOOPCLUS02,61020,1365661380729
drwxr-xr-x - hadoop cug-admin 0 2013-04-11 14:23 /hbase/.logs/HADOOPCLUS03,61020,1365661378638
drwxr-xr-x - hadoop cug-admin 0 2013-04-11 14:23 /hbase/.logs/HADOOPCLUS04,61020,1365661379200
drwxr-xr-x - hadoop cug-admin 0 2013-04-11 14:22 /hbase/.logs/HADOOPCLUS05,61020,1365661378053
drwxr-xr-x - hadoop cug-admin 0 2013-04-11 14:23 /hbase/.logs/HADOOPCLUS06,61020,1365661378832
HADOOPCLUS02 through HADOOPCLUS06 are the RegionServers.
The directories shown above are where HLog is stored. Once an HLog becomes stale (everything previously written to the MemStore has been persisted to HDFS), its files are moved from /hbase/.logs to /hbase/.oldlogs on HDFS; the oldlogs are later deleted, and the HLog's life cycle ends.
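The same listing can also be produced programmatically through the Hadoop FileSystem API. A minimal sketch, assuming the cluster's Hadoop/HBase configuration files are on the classpath (the class name is just for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHLogs {
  public static void main(String[] args) throws Exception {
    // Picks up fs.default.name etc. from the configuration on the classpath.
    FileSystem fs = FileSystem.get(new Configuration());
    for (FileStatus status : fs.listStatus(new Path("/hbase/.logs"))) {
      // One directory per live RegionServer, e.g. HADOOPCLUS02,61020,1365661380729
      System.out.println(status.getPath());
    }
  }
}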
II. The HBase write path and where the HLog is written
When data is Put into HBase, the request flows through HBaseClient --> ZooKeeper --> -ROOT- --> .META. --> RegionServer --> Region:
Before writing, the Region first checks the MemStore:
1. If this Region's MemStore already caches the data being written, it returns directly;
2. If not, the edit is written to the HLog (WAL) first, then to the MemStore, and only then does the call return.
When the MemStore reaches a configured size, flush is invoked and its contents become a StoreFile persisted to HDFS.
Because inserts only have to reach the in-memory MemStore, HBase writes are fast; for applications with loose durability requirements, the HLog can be disabled to gain even higher write performance, as sketched below.
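A hedged example of such a Put with the WAL skipped, using the 0.90-era client API (the table name, column family, and class name below are made-up placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutWithoutWal {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test_table");   // hypothetical table name
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
    // Skip the HLog (WAL) for this edit: faster, but the edit is lost if the
    // RegionServer dies before the MemStore is flushed to a StoreFile.
    put.setWriteToWAL(false);
    table.put(put);
    table.close();
  }
}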
III. HLog-related source code
1. Overview
Writing to the HLog is driven mainly by two HLog methods: doWrite(HRegionInfo info, HLogKey logKey, WALEdit logEdit)
and completeCacheFlush(final byte [] encodedRegionName, final byte [] tableName, final long logSeqId, final boolean isMetaRegion).
Both methods call this.writer.append(new HLog.Entry(logKey, logEdit)) to perform the write:
an HLog.Entry is constructed inside the method and appended with the already-constructed writer (the object referenced in the figure above).
The concrete writer class is org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter,
and the writer object is created by HLog's createWriterInstance(fs, newPath, conf).
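The following is a deliberately simplified, hypothetical sketch (all type and method names are stand-ins, not the real HBase classes) just to illustrate that delegation: doWrite() wraps the key/edit pair into an entry and hands it to whatever Writer implementation was plugged in.

import java.io.IOException;

public class WalDelegationSketch {

  interface WalWriter {                       // stands in for HLog.Writer
    void append(WalEntry entry) throws IOException;
  }

  static class WalEntry {                     // stands in for HLog.Entry
    final String key;                         // HLogKey in HBase
    final String edit;                        // WALEdit in HBase
    WalEntry(String key, String edit) { this.key = key; this.edit = edit; }
  }

  static class SimplifiedHLog {
    private final WalWriter writer;           // created by createWriterInstance() in HBase

    SimplifiedHLog(WalWriter writer) { this.writer = writer; }

    // Mirrors the shape of HLog.doWrite(HRegionInfo, HLogKey, WALEdit):
    // wrap the key/edit pair into an entry and delegate to the writer.
    void doWrite(String logKey, String logEdit) throws IOException {
      writer.append(new WalEntry(logKey, logEdit));
    }
  }

  public static void main(String[] args) throws IOException {
    // A toy writer that just prints; HBase plugs in SequenceFileLogWriter here.
    SimplifiedHLog hlog = new SimplifiedHLog(new WalWriter() {
      public void append(WalEntry entry) {
        System.out.println(entry.key + " -> " + entry.edit);
      }
    });
    hlog.doWrite("region/table/seq=1", "put row1");
  }
}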
2. SequenceFileLogWriter and SequenceFileLogReader
As the SequenceFileLogWriter class shows, a Hadoop SequenceFile.Writer is used to write to the file system; SequenceFile is the file format in which HLog is stored on Hadoop.
HLog.Entry is the smallest unit the HLog stores.
public class SequenceFileLogWriter implements HLog.Writer {
  private final Log LOG = LogFactory.getLog(this.getClass());
  // The Hadoop sequence file we delegate to.
  private SequenceFile.Writer writer;
  // The DFSClient output stream, made accessible via reflection, or null if not available.
  private OutputStream dfsClient_out;
  // The syncFs method from HDFS-200, or null if not available.
  private Method syncFs;
  // The key class needed to initialize the writer.
  private Class<? extends HLogKey> keyClass;

  @Override
  public void init(FileSystem fs, Path path, Configuration conf)
      throws IOException {
    // 1. Create the Hadoop SequenceFile.Writer to initialize this.writer.
    // 2. Get at the private FSDataOutputStream inside the SequenceFile so we can
    //    call sync on it, to initialize dfsClient_out.
  }

  @Override
  public void append(HLog.Entry entry) throws IOException {
    this.writer.append(entry.getKey(), entry.getEdit());
  }

  @Override
  public void sync() throws IOException {
    if (this.syncFs != null) {
      try {
        this.syncFs.invoke(this.writer, HLog.NO_ARGS);
      } catch (Exception e) {
        throw new IOException("Reflection", e);
      }
    }
  }
}
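Since the writer above delegates to Hadoop's SequenceFile.Writer, the underlying format can be demonstrated with a small standalone sketch; the path and the Text key/value types below are arbitrary demo choices, not what HLog actually stores:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/tmp/seqfile-demo");   // arbitrary demo path

    // Write a few key/value records, just as HLog appends (HLogKey, WALEdit) pairs.
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, path, Text.class, Text.class);
    try {
      writer.append(new Text("key1"), new Text("value1"));
      writer.append(new Text("key2"), new Text("value2"));
    } finally {
      writer.close();
    }

    // Read the records back in write order.
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      Text key = new Text();
      Text value = new Text();
      while (reader.next(key, value)) {
        System.out.println(key + " = " + value);
      }
    } finally {
      reader.close();
    }
  }
}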
SequenceFileLogReader is the counterpart used to read HLog.Entry objects back.
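As a hedged sketch of reading one log file back (assuming the 0.90-era HLog.Reader interface, whose init signature mirrors the Writer above; the file path is purely illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.regionserver.wal.HLog;
import org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader;

public class DumpHLog {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    // Illustrative path; real log file names carry the RegionServer name and a timestamp.
    Path logFile = new Path("/hbase/.logs/HADOOPCLUS02,61020,1365661380729/some-log-file");
    HLog.Reader reader = new SequenceFileLogReader();
    reader.init(fs, logFile, conf);
    try {
      HLog.Entry entry;
      while ((entry = reader.next()) != null) {
        // Each entry is an (HLogKey, WALEdit) pair.
        System.out.println(entry.getKey() + " : " + entry.getEdit());
      }
    } finally {
      reader.close();
    }
  }
}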
3. HLog.Entry and the logSeqNum field
Each Entry consists of an HLogKey and a WALEdit.
HLogKey carries the basic information:
private byte [] encodedRegionName;  // encoded name of the region the edit belongs to
private byte [] tablename;          // table the edit belongs to
private long logSeqNum;             // sequence number of this edit
// Time at which this edit was written.
private long writeTime;
private byte clusterId;             // id of the originating cluster
logSeqNum is an important field: the sequence number is also stored as a metadata entry in each StoreFile, so the maximum logSeqNum can be obtained directly from a StoreFile:
public class StoreFile {
  static final String HFILE_BLOCK_CACHE_SIZE_KEY = "hfile.block.cache.size";

  private static BlockCache hfileBlockCache = null;

  // Is this from an in-memory store
  private boolean inMemory;

  // Keys for metadata stored in backing HFile.
  // Set when we obtain a Reader (StoreFile, around line 140).
  private long sequenceid = -1;

  /**
   * @return This file's maximum edit sequence id.
   */
  public long getMaxSequenceId() {
    return this.sequenceid;
  }

  /**
   * Return the highest sequence ID found across all storefiles in
   * the given list. Store files that were created by a mapreduce
   * bulk load are ignored, as they do not correspond to any edit
   * log items.
   * @return 0 if no non-bulk-load files are provided or this Store
   * does not yet have any store files.
   */
  public static long getMaxSequenceIdInList(List<StoreFile> sfs) {
    long max = 0;
    for (StoreFile sf : sfs) {
      if (!sf.isBulkLoadResult()) {
        max = Math.max(max, sf.getMaxSequenceId());
      }
    }
    return max;
  }

  /**
   * Writes meta data -- this is where the maximum sequence id gets WRITTEN.
   * Call before {@link #close()} since it is written as meta data to this file.
   * @param maxSequenceId Maximum sequence id.
   * @param majorCompaction True if this file is product of a major compaction
   * @throws IOException problem writing to FS
   */
  public void appendMetadata(final long maxSequenceId, final boolean majorCompaction)
      throws IOException {
    writer.appendFileInfo(MAX_SEQ_ID_KEY, Bytes.toBytes(maxSequenceId));
    writer.appendFileInfo(MAJOR_COMPACTION_KEY, Bytes.toBytes(majorCompaction));
    appendTimeRangeMetadata();
  }

  /**
   * Opens reader on this store file. Called by Constructor.
   * @return Reader for the store file.
   * @throws IOException
   * @see #closeReader()
   */
  private Reader open() throws IOException {
    // ........
    this.sequenceid = Bytes.toLong(b);
    if (isReference()) {
      if (Reference.isTopFileRegion(this.reference.getFileRegion())) {
        this.sequenceid += 1;
      }
    }
    this.reader.setSequenceID(this.sequenceid);
    return this.reader;
  }
}
The Store class manages StoreFiles, for example during compaction: when many StoreFiles are merged, the largest logSeqNum among them is taken:
public class Store implements HeapSize {
  /**
   * Compact the StoreFiles. This method may take some time, so the calling
   * thread must be able to block for long periods.
   *
   * <p>During this time, the Store can work as usual, getting values from
   * StoreFiles and writing new StoreFiles from the memstore.
   *
   * <p>Existing StoreFiles are not destroyed until the new compacted StoreFile is
   * completely written-out to disk.
   *
   * <p>The compactLock prevents multiple simultaneous compactions.
   * The structureLock prevents us from interfering with other write operations.
   *
   * <p>We don't want to hold the structureLock for the whole time, as a compact()
   * can be lengthy and we want to allow cache-flushes during this period.
   *
   * @param forceMajor True to force a major compaction regardless of thresholds
   * @return row to split around if a split is needed, null otherwise
   * @throws IOException
   */
  StoreSize compact(final boolean forceMajor) throws IOException {
    boolean forceSplit = this.region.shouldForceSplit();
    boolean majorcompaction = forceMajor;
    synchronized (compactLock) {
      /* get store file sizes for incremental compacting selection.
       * normal skew:
       *
       *        older ----> newer
       *     _
       *    | |   _
       *    | |  | |   _
       *  --|-|- |-|- |-|---_-------_-------  minCompactSize
       *    | |  | |  | |  | |  _  | |
       *    | |  | |  | |  | | | | | |
       *    | |  | |  | |  | | | | | |
       */
      // .............
      this.lastCompactSize = totalSize;
      // Max-sequenceID is the last key in the files we're compacting
      long maxId = StoreFile.getMaxSequenceIdInList(filesToCompact);
      // Ready to go. Have list of files to compact.
      LOG.info("Started compaction of " + filesToCompact.size() + " file(s) in cf=" +
          this.storeNameStr +
          (references ? ", hasReferences=true," : " ") + " into " +
          region.getTmpDir() + ", seqid=" + maxId +
          ", totalSize=" + StringUtils.humanReadableInt(totalSize));
      StoreFile.Writer writer = compact(filesToCompact, majorcompaction, maxId);
      // Move the compaction into place.
      StoreFile sf = completeCompaction(filesToCompact, writer);
    }
    return checkSplit(forceSplit);
  }

  /**
   * Do a minor/major compaction. Uses the scan infrastructure to make it easy.
   *
   * @param filesToCompact which files to compact
   * @param majorCompaction true to major compact (prune all deletes, max versions, etc)
   * @param maxId Readers maximum sequence id.
   * @return Product of compaction or null if all cells expired or deleted and
   * nothing made it through the compaction.
   * @throws IOException
   */
  private StoreFile.Writer compact(final List<StoreFile> filesToCompact,
      final boolean majorCompaction, final long maxId)
      throws IOException {
    // Make the instantiation lazy in case compaction produces no product; i.e.
    // where all source cells are expired or deleted.
    StoreFile.Writer writer = null;
    try {
      // ......
    } finally {
      if (writer != null) {
        // !!!! This is where StoreFile.Writer writes the metadata for maxId.
        writer.appendMetadata(maxId, majorCompaction);
        writer.close();
      }
    }
    return writer;
  }
}
During compaction, the first method, compact(final boolean forceMajor), calls the second,
compact(final List<StoreFile> filesToCompact, final boolean majorCompaction, final long maxId).
At the end, that method calls writer.appendMetadata(maxId, majorCompaction), i.e. StoreFile's appendMetadata shown above.
So the maximum logSeqNum is written out in the finally block, and each StoreFile can later recover its logSeqNum when open() reads the metadata back.
The clusterId field stores the Hadoop cluster ID.
4. The HLog life cycle
This brings us to the HLog life cycle. If the HFiles corresponding to an HLog's logSeqNum are already stored on HDFS (the check is essentially whether the HLog's logSeqNum is smaller than the max logSeqNum of the corresponding table's StoreFiles on HDFS), then the HLog is no longer needed: it is moved to the .oldlogs directory and eventually deleted.
Conversely, if the system goes down before that, the data can be read back from the HLog on HDFS and the original Puts can be replayed into HBase.
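A minimal, self-contained sketch of that staleness rule; the class and method names are illustrative only, not the actual HBase implementation:

public class HLogStalenessCheck {
  /**
   * An HLog file can be archived once every edit it contains has been persisted,
   * i.e. its highest logSeqNum is <= the max sequence id already recorded
   * in the corresponding StoreFiles.
   */
  static boolean isStale(long hlogMaxSeqNum, long storeFileMaxSeqNum) {
    return hlogMaxSeqNum <= storeFileMaxSeqNum;
  }

  public static void main(String[] args) {
    System.out.println(isStale(120, 150)); // true  -> move to /hbase/.oldlogs
    System.out.println(isStale(200, 150)); // false -> still needed for replay
  }
}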
Further reading:
HBase Architecture 101 – Write-Ahead Log (WAL)
http://cloudera.iteye.com/blog/911700
The structure and life cycle of HLog
http://www.spnguru.com/2011/03/hlog%e7%9a%84%e7%bb%93%e6%9e%84%e5%92%8c%e7%94%9f%e5%91%bd%e5%91%a8%e6%9c%9f/