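HBase MVCC (Multi Version Consistency Control)
MVCC (multi-version concurrency control) is an approach to handling concurrency that serves as an alternative to locking.
In HBase, when the write number is greater than the read number,
the memstore still has a write in progress, and a reader that needs to see that write must wait until it completes.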
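1. MVCC initialization
MVCC is initialized in the initializeRegionInternals method of HRegion. For each store it calls store.getMaxMemstoreTS() and keeps the largest value seen.
// getMaxMemstoreTS(): Return the largest memstoreTS found across all storefiles in the given list. Store files that were created by a mapreduce bulk load are ignored, ...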
long maxStoreMemstoreTS = store.getMaxMemstoreTS();
if (maxStoreMemstoreTS > maxMemstoreTS) {
  maxMemstoreTS = maxStoreMemstoreTS;
}
---
mvcc.initialize(maxMemstoreTS + 1);
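The initialization step above amounts to taking the maximum memstoreTS over all stores and starting MVCC one past it. A minimal sketch of that arithmetic, not HBase code: the class MvccInitSketch and the helper initialValue are hypothetical, plain long values stand in for each store's getMaxMemstoreTS(), and the starting value of -1 is an assumption.

```java
import java.util.List;

final class MvccInitSketch {
    // Hypothetical helper: each element stands in for one store's getMaxMemstoreTS().
    static long initialValue(List<Long> storeMaxMemstoreTs) {
        long maxMemstoreTS = -1; // assumption: -1 means "no store files seen yet"
        for (long maxStoreMemstoreTS : storeMaxMemstoreTs) {
            if (maxStoreMemstoreTS > maxMemstoreTS) {
                maxMemstoreTS = maxStoreMemstoreTS;
            }
        }
        // Same "+ 1" as in mvcc.initialize(maxMemstoreTS + 1).
        return maxMemstoreTS + 1;
    }

    public static void main(String[] args) {
        System.out.println(initialValue(List.of(7L, 42L, 13L))); // prints 43
    }
}
```

With the values 7, 42 and 13, the sketch starts at 43, so every new write gets a number larger than anything already persisted in the store files.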
2. Example: the internalFlushcache method in HRegion
First:
w = mvcc.beginMemstoreInsert();
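This mainly sets nextWriteNumber, creates a WriteEntry object e for it, and appends e to the tail of writeQueue (a LinkedList).
Put simply, it tells MVCC that a write to the current memstore has begun, and that the write position is nextWriteNumber.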
public WriteEntry beginMemstoreInsert() {
  synchronized (writeQueue) {
    long nextWriteNumber = ++memstoreWrite;
    WriteEntry e = new WriteEntry(nextWriteNumber);
    writeQueue.add(e);
    return e;
  }
}
mvcc.advanceMemstore(w);
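Its core is a while loop that takes WriteEntry objects from the head of writeQueue and checks them one at a time.
If nextReadValue > 0 and nextReadValue+1 != queueFirst.getWriteNumber(), it throws an exception.
If the WriteEntry at the head has completed, nextReadValue is updated and the entry is removed from writeQueue; otherwise the loop breaks.
After the loop, if any entry was removed, memstoreRead is updated and waiting readers are woken with readWaiters.notifyAll().
Put simply, this method advances memstoreRead, the position up to which data can be read, and notifies the waiting readers.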
boolean advanceMemstore(WriteEntry e) {
  synchronized (writeQueue) {
    e.markCompleted();
    long nextReadValue = -1;
    boolean ranOnce = false;
    while (!writeQueue.isEmpty()) {
      ranOnce = true;
      WriteEntry queueFirst = writeQueue.getFirst();
      if (nextReadValue > 0) {
        if (nextReadValue+1 != queueFirst.getWriteNumber()) {
          throw new RuntimeException("invariant in completeMemstoreInsert violated, prev: "
              + nextReadValue + " next: " + queueFirst.getWriteNumber());
        }
      }
      if (queueFirst.isCompleted()) {
        nextReadValue = queueFirst.getWriteNumber();
        writeQueue.removeFirst();
      } else {
        break;
      }
    }
    if (!ranOnce) {
      throw new RuntimeException("never was a first");
    }
    if (nextReadValue > 0) {
      synchronized (readWaiters) {
        memstoreRead = nextReadValue;
        readWaiters.notifyAll();
      }
    }
    if (memstoreRead >= e.getWriteNumber()) {
      return true;
    }
    return false;
  }
}
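The queue logic is easiest to see when writes complete out of order. The following is a simplified, self-contained sketch rather than the HBase class: MiniMvcc, begin, advance and readPoint are hypothetical names, the invariant check and the reader notification are left out, and the counters are assumed to start at 0.

```java
import java.util.LinkedList;

final class MiniMvcc {
    static final class WriteEntry {
        final long writeNumber;
        boolean completed;
        WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();
    private long memstoreWrite = 0; // assumption: counters start at 0
    private long memstoreRead = 0;

    // Counterpart of beginMemstoreInsert: hand out the next write number.
    synchronized WriteEntry begin() {
        WriteEntry e = new WriteEntry(++memstoreWrite);
        writeQueue.add(e);
        return e;
    }

    // Counterpart of advanceMemstore: mark e completed, then pop every
    // completed entry from the head of the queue, moving the read point along.
    synchronized boolean advance(WriteEntry e) {
        e.completed = true;
        while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
            memstoreRead = writeQueue.removeFirst().writeNumber;
        }
        return memstoreRead >= e.writeNumber;
    }

    synchronized long readPoint() { return memstoreRead; }

    public static void main(String[] args) {
        MiniMvcc mvcc = new MiniMvcc();
        WriteEntry w1 = mvcc.begin();
        WriteEntry w2 = mvcc.begin();
        System.out.println(mvcc.advance(w2)); // false: w1 is still at the head
        System.out.println(mvcc.readPoint()); // 0
        System.out.println(mvcc.advance(w1)); // true: w1 and w2 are both popped
        System.out.println(mvcc.readPoint()); // 2
    }
}
```

Completing w2 first does not move the read point, because the unfinished w1 still blocks the head of the queue. Once w1 completes, both entries are removed in one pass and the read point jumps straight to 2.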
3. Also called from the internalFlushcache method in HRegion:
mvcc.waitForRead(w);
This method waits until the memstore can be read. When is that?
Only when memstoreRead >= e.getWriteNumber().
public void waitForRead(WriteEntry e) {
  boolean interrupted = false;
  synchronized (readWaiters) {
    while (memstoreRead < e.getWriteNumber()) {
      try {
        readWaiters.wait(0);
      } catch (InterruptedException ie) {
        // We were interrupted... finish the loop -- i.e. cleanup --and then
        // on our way out, reset the interrupt flag.
        interrupted = true;
      }
    }
  }
  if (interrupted) Thread.currentThread().interrupt();
}
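In internalFlushcache, the main purpose of calling waitForRead is to wait, before flushing, for in-flight transactions to be committed to the HLog, and to keep uncommitted transactions from being written to an HFile.
The flush then proceeds.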
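The wait-until-readable behaviour of waitForRead can be reproduced with a plain Java monitor. This is a sketch under assumptions, not the HBase implementation: ReadPointGate, advanceTo, waitFor and current are hypothetical names, and a single monitor replaces the separate writeQueue and readWaiters locks.

```java
final class ReadPointGate {
    private long readPoint = 0; // assumption: starts at 0

    // Writer side: move the read point forward and wake all waiters.
    synchronized void advanceTo(long newReadPoint) {
        if (newReadPoint > readPoint) {
            readPoint = newReadPoint;
            notifyAll();
        }
    }

    // Reader or flusher side: block until the read point reaches writeNumber.
    synchronized void waitFor(long writeNumber) {
        boolean interrupted = false;
        while (readPoint < writeNumber) {
            try {
                wait();
            } catch (InterruptedException ie) {
                interrupted = true; // keep waiting, restore the flag on the way out
            }
        }
        if (interrupted) Thread.currentThread().interrupt();
    }

    synchronized long current() { return readPoint; }

    public static void main(String[] args) throws InterruptedException {
        ReadPointGate gate = new ReadPointGate();
        Thread waiter = new Thread(() -> {
            gate.waitFor(3);
            System.out.println("read point reached 3");
        });
        waiter.start();
        gate.advanceTo(1); // still below 3, the waiter keeps waiting
        gate.advanceTo(3); // the waiter is released
        waiter.join();
    }
}
```

The condition is re-checked in a while loop after every wake-up, as in waitForRead. A notification for a lower read point, or a spurious wake-up, therefore cannot release the waiter early.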
The doMiniBatchMutation method in HRegion uses MVCC in a similar way, so that data becomes readable as soon as its write has completed.
// ------------------------------------
// Acquire the latest mvcc number
// ----------------------------------
w = mvcc.beginMemstoreInsert();
// ------------------------------------------------------------------
// STEP 8. Advance mvcc. This will make this put visible to scanners and getters.
// ------------------------------------------------------------------
if (w != null) {
  mvcc.completeMemstoreInsert(w);
  w = null;
}
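To illustrate what "visible to scanners and getters" means, the sketch below tags each cell with the write number of its mutation and lets a scan return only cells at or below the read point. All names (VisibilitySketch, beginInsert, add, completeInsert, scan) are hypothetical, and the sketch assumes writes complete in order, so it needs no write queue.

```java
import java.util.ArrayList;
import java.util.List;

final class VisibilitySketch {
    // A cell tagged with the write number of the mutation that produced it.
    static final class Cell {
        final String value;
        final long memstoreTS;
        Cell(String value, long memstoreTS) {
            this.value = value;
            this.memstoreTS = memstoreTS;
        }
    }

    private final List<Cell> memstore = new ArrayList<>();
    private long writePoint = 0;
    private long readPoint = 0;

    // Acquire the next write number.
    synchronized long beginInsert() { return ++writePoint; }

    // Write a cell into the memstore; it is not yet visible to scans.
    synchronized void add(String value, long writeNumber) {
        memstore.add(new Cell(value, writeNumber));
    }

    // Simplification: assumes in-order completion, so the read point
    // can jump directly to writeNumber.
    synchronized void completeInsert(long writeNumber) {
        if (writeNumber > readPoint) readPoint = writeNumber;
    }

    // A scan sees only cells whose write number is at or below the read point.
    synchronized List<String> scan() {
        List<String> visible = new ArrayList<>();
        for (Cell c : memstore) {
            if (c.memstoreTS <= readPoint) visible.add(c.value);
        }
        return visible;
    }

    public static void main(String[] args) {
        VisibilitySketch region = new VisibilitySketch();
        long w = region.beginInsert();
        region.add("row1", w);
        System.out.println(region.scan()); // prints []
        region.completeInsert(w);
        System.out.println(region.scan()); // prints [row1]
    }
}
```

The cell sits in the memstore from the moment add is called, but a scan cannot see it until completeInsert advances the read point. The same ordering is used in doMiniBatchMutation: acquire the mvcc number, write, then advance mvcc.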