Major compaction时的scan操作
发起major compaction时,通过CompactSplitThread.CompactionRunner.run开始执行
-->region.compact(compaction, store)-->store.compact(compaction)-->
CompactionContext.compact,发起compact操作
CompactionContext的实例通过HStore中的storeEngine.createCompaction()生成,
默认值为DefaultStoreEngine,通过hbase.hstore.engine.class配置。
默认的CompactionContext实例为DefaultCompactionContext。
而DefaultCompactionContext.compact方法最终调用DefaultStoreEngine.compactor来执行
compactor的实现通过hbase.hstore.defaultengine.compactor.class配置,默认实现为DefaultCompactor
调用DefaultCompactor.compact(request);
1.根据要进行compact的storefile文件,生成对应的StoreFileScanner集合列表。
在生成StoreFileScanner实例时,每一个scanner中的ScanQueryMatcher为null
2.创建StoreScanner实例,设置ScanType为ScanType.COMPACT_DROP_DELETES。
privateStoreScanner(Storestore, ScanInfo scanInfo, Scan scan,
List<? extendsKeyValueScanner> scanners, ScanTypescanType, longsmallestReadPoint,
longearliestPutTs, byte[] dropDeletesFromRow, byte[] dropDeletesToRow) throws IOException {
this(store, false, scan, null, scanInfo.getTtl(),
scanInfo.getMinVersions());
if (dropDeletesFromRow == null) {
执行这里,传入的columns为null
matcher = newScanQueryMatcher(scan, scanInfo, null, scanType,
smallestReadPoint, earliestPutTs, oldestUnexpiredTS);
} else {
matcher = newScanQueryMatcher(scan, scanInfo, null, smallestReadPoint,
earliestPutTs, oldestUnexpiredTS, dropDeletesFromRow, dropDeletesToRow);
}
ScanqueryMatcher的构造方法:
传入的columns为null
publicScanQueryMatcher(Scan scan, ScanInfo scanInfo,
NavigableSet<byte[]> columns, ScanTypescanType,
longreadPointToUse, longearliestPutTs, longoldestUnexpiredTS) {
tr中mintime=0,maxtime=long.maxvalue
this.tr = scan.getTimeRange();
this.rowComparator = scanInfo.getComparator();
此deletes属性中的kv delete信息为到一个新的row时,都会重新进行清空。
this.deletes = newScanDeleteTracker();
this.stopRow = scan.getStopRow();
this.startKey = KeyValue.createFirstDeleteFamilyOnRow(scan.getStartRow(),
scanInfo.getFamily());
得到filter实例
this.filter = scan.getFilter();
this.earliestPutTs = earliestPutTs;
this.maxReadPointToTrackVersions = readPointToUse;
this.timeToPurgeDeletes = scanInfo.getTimeToPurgeDeletes();
此处为的值为false
/* how to deal with deletes */
this.isUserScan = scanType == ScanType.USER_SCAN;
此处的值为false,scanInfo.getKeepDeletedCells()的值默认false,
可通过table的columnfmaily中配置KEEP_DELETED_CELLS属性
scan.isRaw()可通过在scan中setAttribute的_raw_属性,默认为false
// keep deleted cells: if compaction or raw scan
this.keepDeletedCells = (scanInfo.getKeepDeletedCells() && !isUserScan) || scan.isRaw();
此处的值为false,此时是major的compact,不保留delete的数据
scan.isRaw()可通过在scan中setAttribute的_raw_属性,默认为false
// retain deletes: if minor compaction or raw scan
this.retainDeletesInOutput = scanType == ScanType.COMPACT_RETAIN_DELETES || scan.isRaw();
此时的值为false
// seePastDeleteMarker: user initiated scans
this.seePastDeleteMarkers = scanInfo.getKeepDeletedCells() && isUserScan;
得到查询的最大版本数,此时的scan.maxversion与scanInfo.maxversion的值是相同的值
intmaxVersions =
scan.isRaw() ? scan.getMaxVersions() : Math.min(scan.getMaxVersions(),
scanInfo.getMaxVersions());
生成columns属性的值为ScanWildcardColumnTracker实例,设置hasNullColumn的值为true
// Single branch to deal with two types of reads (columns vs all in family)
if (columns == null || columns.size() == 0) {
// there is always a null column in the wildcard column query.
hasNullColumn = true;
columns属性中的index表示当前比对到的column的下标值,每比较一行时,此值会重新清空
// use a specialized scan for wildcard column tracker.
this.columns = newScanWildcardColumnTracker(
scanInfo.getMinVersions(), maxVersions, oldestUnexpiredTS);
} else {
这个部分在compact时是不会执行的
// whether there is null column in the explicit column query
hasNullColumn = (columns.first().length == 0);
// We can share the ExplicitColumnTracker, diff is we reset
// between rows, not between storefiles.
this.columns = newExplicitColumnTracker(columns,
scanInfo.getMinVersions(), maxVersions, oldestUnexpiredTS);
}
}
ScanQueryMatcher.match过滤kv是否包含的方法分析
在通过StoreScanner.next(kvlist,limit)得到下一行的kv集合时
调用ScanQueryMatcher.match来过滤数据的方法分析
其中match方法返回的值具体作用可参见StoreScanner中的如下方法:
public boolean next(List<Cell> outResult, int limit).....
public MatchCode match(KeyValue kv) throws IOException {
调用filter的filterAllRemaining方法,如果此方法返回true表示此次scan结束
if (filter != null && filter.filterAllRemaining()) {
returnMatchCode.DONE_SCAN;
}
得到kv的值
byte [] bytes = kv.getBuffer();
KV在bytes中的开始位置
intoffset = kv.getOffset();
得到key的长度
keyvalue的组成:
4 |
4 |
2 |
~ |
1 |
~ |
~ |
8 |
1 |
~ |
kenlen |
vlen |
rowlen |
row |
cflen |
cf |
column |
timestamp |
kvtype |
value |
intkeyLength = Bytes.toInt(bytes, offset, Bytes.SIZEOF_INT);
得到rowkey的长度记录的开始位置(不包含keylen与vlen)
offset += KeyValue.ROW_OFFSET;
rowkey的长度记录的开始位置
intinitialOffset = offset;
得到rowkey的长度
shortrowLength = Bytes.toShort(bytes, offset, Bytes.SIZEOF_SHORT);
得到rowkey的开始位置
offset += Bytes.SIZEOF_SHORT;
比较当前传入的kv的rowkey部分是否与当前行中第一个kv的rowkey部分相同。也就是是否是同一行的数据
intret = this.rowComparator.compareRows(row, this.rowOffset, this.rowLength,
bytes, offset, rowLength);
如果当前传入的kv中的rowkey大于当前行的kv的rowkey部分,表示现在传入的kv是下一行,
结束当前next操作,(不是结束scan,是结束当次的next,表示这个next的一行数据的所有kv都查找完了)
if (ret <= -1) {
returnMatchCode.DONE;
否则表示当前传入的kv是上一行的数据,需要把当前的scanner向下移动一行
} elseif (ret >= 1) {
// could optimize this, if necessary?
// Could also be called SEEK_TO_CURRENT_ROW, but this
// should be rare/never happens.
returnMatchCode.SEEK_NEXT_ROW;
}
优化配置,是否需要不执行下面流程,直接把当前的scanner向下移动一行
stickyNextRow的值为true的条件:
1.ColumnTracker.done返回为true,
2.ColumnTracker.checkColumn返回为SEEK_NEXT_ROW.
3.filter.filterKeyValue(kv);返回结果为NEXT_ROW。
4.ColumnTracker.checkVersions返回为INCLUDE_AND_SEEK_NEXT_ROW。
ColumnTracker的实现在scan的columns为null或者是compact scan时为ScanWildcardColumnTracker。
否则为ExplicitColumnTracker。
// optimize case.
if (this.stickyNextRow)
returnMatchCode.SEEK_NEXT_ROW;
在ScanWildcardColumnTracker实例中返回值为false,
在ExplicitColumnTracker实例中返回值是当前的kv是否大于或等于查找的column列表的总和
if (this.columns.done()) {
stickyNextRow = true;
returnMatchCode.SEEK_NEXT_ROW;
}
得到familylen的记录位置
//Passing rowLength
offset += rowLength;
得到family的长度
//Skipping family
bytefamilyLength = bytes [offset];
把位置移动到family的名称记录的位置
offset += familyLength + 1;
得到column的长度
intqualLength = keyLength -
(offset - initialOffset) - KeyValue.TIMESTAMP_TYPE_SIZE;
得到kv中timestamp的值
longtimestamp = Bytes.toLong(bytes, initialOffset + keyLength – KeyValue.TIMESTAMP_TYPE_SIZE);
检查timestamp是否在指定的范围内,主要检查ttl是否过期
// check for early out based on timestamp alone
if (columns.isDone(timestamp)) {
如果发现kv的ttl过期,在ScanWildcardColumnTracker实例中直接返回SEEK_NEXT_COL。这个在compact中是默认
在ExplicitColumnTracker实例中检查是否有下一个column如果有返回SEEK_NEXT_COL。否则返回SEEK_NEXT_ROW。
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
}
/*
* The delete logic is pretty complicated now.
* This is corroborated by the following:
* 1. The store might be instructed to keep deleted rows around.
* 2. A scan can optionally see past a delete marker now.
* 3. If deleted rows are kept, we have to find out when we can
* remove the delete markers.
* 4. Family delete markers are always first (regardless of their TS)
* 5. Delete markers should not be counted as version
* 6. Delete markers affect puts of the *same* TS
* 7. Delete marker need to be version counted together with puts
* they affect
*/
得到kv的类型。
bytetype = bytes[initialOffset + keyLength – 1];
如果kv是删除的kv
if (kv.isDelete()) {
在默认情况下,此keepDeletedCells值为false,这里的if检查会进去
if (!keepDeletedCells) {
// first ignore delete markers if the scanner can do so, and the
// range does not include the marker
//
// during flushes and compactions also ignore delete markers newer
// than the readpoint of any open scanner, this prevents deleted
// rows that could still be seen by a scanner from being collected
此时的值为true,scan中的tr默认为alltime=true
booleanincludeDeleteMarker = seePastDeleteMarkers ?
tr.withinTimeRange(timestamp) :
tr.withinOrAfterTimeRange(timestamp);
把删除的kv添加到DeleteTracker中。compact时的实现为ScanDeleteTracker。
if (includeDeleteMarker
&& kv.getMvccVersion() <= maxReadPointToTrackVersions) {
this.deletes.add(bytes, offset, qualLength, timestamp, type);
}
// Can't early out now, because DelFam come before any other keys
}
如果非minor compact时,
或者在compact的scan时,同时当前时间减去kv的timestamp还不到hbase.hstore.time.to.purge.deletes配置的时间,
默认的配置值为0,
或者kv的mvcc值大于现在最大的mvcc值时。此if会进行。目前在做major compact的scan,不进去
if (retainDeletesInOutput
|| (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes)
|| kv.getMvccVersion() > maxReadPointToTrackVersions) {
// always include or it is not time yet to check whether it is OK
// to purge deltes or not
if (!isUserScan) {
// if this is not a user scan (compaction), we can filter this deletemarker right here
// otherwise (i.e. a "raw" scan) we fall through to normal version and timerange checking
returnMatchCode.INCLUDE;
}
以下的检查通常情况不会进入
} elseif (keepDeletedCells) {
if (timestamp < earliestPutTs) {
// keeping delete rows, but there are no puts older than
// this delete in the store files.
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
}
// else: fall through and do version counting on the
// delete markers
如果kv是已经delete的kv,添加到DeleteTracker后,直接返回SKIP.
} else {
returnMatchCode.SKIP;
}
// note the following next else if...
// delete marker are not subject to other delete markers
} elseif (!this.deletes.isEmpty()) {
如果不是删除的KV时,检查删除的kv中是否包含此kv的版本。
a.如果KV是DeleteFamily。同时当前的KV的TIMESTAMP的值小于删除的KV的TIMESTAMP的值,返回FAMILY_DELETED。
b.如果KV是DeleteFamilyVersion已经删除掉的版本(删除时指定了timestamp)。返回FAMILY_VERSION_DELETED。
c.如果KV的是DeleteColumn,同时deleteTracker中包含的kv中部分qualifier的值
与传入的kv中部分qualifier的值相同。同时delete中包含的kv是DeleteColumn返回COLUMN_DELETED。
否则deleteTracker中包含的kv中部分qualifier的值与传入的kv中部分qualifier的值相同。
同时传入的kv中的timestamp的值是delete中的timestamp值,表示删除指定的版本,返回VERSION_DELETED。
d.否则表示没有删除的情况,返回NOT_DELETED。
DeleteResultdeleteResult = deletes.isDeleted(bytes, offset, qualLength,
timestamp);
switch (deleteResult) {
caseFAMILY_DELETED:
caseCOLUMN_DELETED:
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
caseVERSION_DELETED:
caseFAMILY_VERSION_DELETED:
returnMatchCode.SKIP;
caseNOT_DELETED:
break;
default:
thrownewRuntimeException("UNEXPECTED");
}
}
检查当前传入的kv的timestamp是否在包含的时间范围内,默认的scan是所有时间都包含
inttimestampComparison = tr.compare(timestamp);
如果当前kv的时间超过了最大的时间,返回SKIP。
if (timestampComparison >= 1) {
returnMatchCode.SKIP;
} elseif (timestampComparison <= -1) {
如果当前kv的时间小于了最小的时间,返回SEEK_NEXT_COL或者SEEK_NEXT_ROW。
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
}
如果时间在正常的范围内,columns.checkColumn如果是compact时的scan 此方法返回INCLUDE。
其它情况请参见ExplicitColumnTracker的实现。
// STEP 1: Check if the column is part of the requested columns
MatchCodecolChecker = columns.checkColumn(bytes, offset, qualLength, type);
此处的IF检查会进入
if (colChecker == MatchCode.INCLUDE) {
执行filter操作,并根据filter的响应返回相关的值,此处不在说明,比较容易看明白。
ReturnCodefilterResponse = ReturnCode.SKIP;
// STEP 2: Yes, the column is part of the requested columns. Check if filter is present
if (filter != null) {
// STEP 3: Filter the key value and return if it filters out
filterResponse = filter.filterKeyValue(kv);
switch (filterResponse) {
caseSKIP:
returnMatchCode.SKIP;
caseNEXT_COL:
returncolumns.getNextRowOrNextColumn(bytes, offset, qualLength);
caseNEXT_ROW:
stickyNextRow = true;
returnMatchCode.SEEK_NEXT_ROW;
caseSEEK_NEXT_USING_HINT:
returnMatchCode.SEEK_NEXT_USING_HINT;
default:
//It means it is either include or include and seek next
break;
}
}
/*
* STEP 4: Reaching this step means the column is part of the requested columns and either
* the filter is null or the filter has returned INCLUDE or INCLUDE_AND_NEXT_COL response.
* Now check the number of versions needed. This method call returns SKIP, INCLUDE,
* INCLUDE_AND_SEEK_NEXT_ROW, INCLUDE_AND_SEEK_NEXT_COL.
*
* FilterResponse ColumnChecker Desired behavior
* INCLUDE SKIP row has already been included, SKIP.
* INCLUDE INCLUDE INCLUDE
* INCLUDE INCLUDE_AND_SEEK_NEXT_COL INCLUDE_AND_SEEK_NEXT_COL
* INCLUDE INCLUDE_AND_SEEK_NEXT_ROW INCLUDE_AND_SEEK_NEXT_ROW
* INCLUDE_AND_SEEK_NEXT_COL SKIP row has already been included, SKIP.
* INCLUDE_AND_SEEK_NEXT_COL INCLUDE INCLUDE_AND_SEEK_NEXT_COL
* INCLUDE_AND_SEEK_NEXT_COL INCLUDE_AND_SEEK_NEXT_COL INCLUDE_AND_SEEK_NEXT_COL
* INCLUDE_AND_SEEK_NEXT_COL INCLUDE_AND_SEEK_NEXT_ROW INCLUDE_AND_SEEK_NEXT_ROW
*
* In all the above scenarios, we return the column checker return value except for
* FilterResponse (INCLUDE_AND_SEEK_NEXT_COL) and ColumnChecker(INCLUDE)
*/
colChecker =
columns.checkVersions(bytes, offset, qualLength, timestamp, type,
kv.getMvccVersion() > maxReadPointToTrackVersions);
//Optimize with stickyNextRow
stickyNextRow = colChecker == MatchCode.INCLUDE_AND_SEEK_NEXT_ROW ? true : stickyNextRow;
return (filterResponse == ReturnCode.INCLUDE_AND_NEXT_COL &&
colChecker == MatchCode.INCLUDE) ? MatchCode.INCLUDE_AND_SEEK_NEXT_COL
: colChecker;
}
stickyNextRow = (colChecker == MatchCode.SEEK_NEXT_ROW) ? true
: stickyNextRow;
returncolChecker;
}
major与minor的compact写入新storefile时的区别
如果是major的compact的写入,会在close掉writer时,
在meta中写入major==true的值MAJOR_COMPACTION_KEY=true。
此值主要用来控制做minor的compact时是否选择这个storefile文件。
if (writer != null) {
writer.appendMetadata(fd.maxSeqId, request.isMajor());
writer.close();
newFiles.add(writer.getPath());
}