Blink minibatch source code analysis

I. What minibatch does
Related source files:
MiniBatchGroupAggFunction.scala
KeyedBundleOperator.java
MiniBatchAssignerOperator.java
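Minibatch serves two main purposes:

1. It accumulates records into a batch and processes the batch once it has been open longer than a certain time or holds more than a certain number of records.

2. Functionally, for computations that access state, it replaces one state access per incoming record with one state access per batch.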
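To make the two points above concrete, here is a minimal single-threaded Java sketch. All names (MiniBatchSketch, maxCount, intervalMs, stateAccesses) are hypothetical, a HashMap stands in for keyed state, and the time limit is only checked when a record arrives, whereas Blink drives it with processing-time timers and watermarks as described in section II.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of minibatching (hypothetical, not Blink code): records are
// pre-aggregated per key in memory, and "state" is touched once per key per
// batch instead of once per record.
class MiniBatchSketch {
    private final int maxCount;      // flush when this many records are buffered
    private final long intervalMs;   // flush when the batch has been open this long
    private final Map<String, Long> buffer = new HashMap<>();
    final Map<String, Long> state = new HashMap<>(); // stands in for keyed state
    int stateAccesses = 0;
    private int count = 0;
    private long batchStart = -1;

    MiniBatchSketch(int maxCount, long intervalMs) {
        this.maxCount = maxCount;
        this.intervalMs = intervalMs;
    }

    void add(String key, long value, long nowMs) {
        if (batchStart < 0) {
            batchStart = nowMs;                  // first record opens the batch
        }
        buffer.merge(key, value, Long::sum);     // pre-aggregate in memory
        count++;
        if (count >= maxCount || nowMs - batchStart >= intervalMs) {
            flush();
        }
    }

    void flush() {
        for (Map.Entry<String, Long> e : buffer.entrySet()) {
            stateAccesses++;                     // one state read+write per key
            state.merge(e.getKey(), e.getValue(), Long::sum);
        }
        buffer.clear();
        count = 0;
        batchStart = -1;
    }
}
```

With maxCount = 4, four records spread over two keys result in two state updates instead of four.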

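Advantages:

1. Trades latency for higher throughput.

Disadvantages:

1. The current implementation tells downstream operators to start computing by sending watermarks, so it is tightly coupled to watermarks and is unsuitable for scenarios that process data based on event time. A better approach would be a barrier-like mechanism that inserts special signals into the stream to delimit batches.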
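II. Source code analysis
Depending on configuration, Blink uses two operators for minibatch: MiniBatchAssignerOperator and KeyedBundleOperator. MiniBatchAssignerOperator inserts signals into the stream based on the configured batch interval to tell downstream operators that a batch is complete (the batch size limit is enforced inside KeyedBundleOperator by its count trigger), and KeyedBundleOperator processes the batched data.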

2.1 MiniBatchAssignerOperator
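This operator does not transform the data at all; it only deals with watermarks: it emits processing-time watermarks as batch signals and swallows the ones coming from upstream.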

// For each element from upstream, check whether processing time has moved into a new batch interval; if so, emit a watermark first. This watermark is therefore a batch signal, not a normal event-time watermark.

@Override
public void processElement(StreamRecord element) throws Exception {
    long now = getProcessingTimeService().getCurrentProcessingTime();
    long currentBatch = now - now % intervalMs;
    if (currentBatch > currentWatermark) {
        currentWatermark = currentBatch;
        // emit
        output.emitWatermark(new Watermark(currentBatch));
    }
    output.collect(element);
}
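The batch boundary is just the processing time rounded down to a multiple of intervalMs. The hypothetical helper below reproduces that expression and replays a few arrival times to show when a watermark is emitted; starting currentWatermark at Long.MIN_VALUE is an assumption made for the sketch.

```java
// Hypothetical helper mirroring: currentBatch = now - now % intervalMs
class BatchBoundary {
    static long batchStart(long now, long intervalMs) {
        return now - now % intervalMs;   // round down to a multiple of intervalMs
    }

    public static void main(String[] args) {
        long interval = 1000;
        long currentWatermark = Long.MIN_VALUE;   // assumed initial value
        long[] arrivals = {1000, 1450, 1999, 2001, 2500, 3700};
        for (long now : arrivals) {
            long currentBatch = batchStart(now, interval);
            if (currentBatch > currentWatermark) {   // first record of a new batch
                currentWatermark = currentBatch;
                System.out.println("emit Watermark(" + currentBatch + ") before record at " + now);
            }
        }
    }
}
```

Records at 1000, 1450 and 1999 fall into batch 1000, so only the first of them emits Watermark(1000). The record at 2001 emits Watermark(2000) before it is forwarded, so the downstream operator closes the previous batch before this record reaches it.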

// Called when a registered processing-time timer fires
@Override
public void onProcessingTime(long timestamp) throws Exception {
    long now = getProcessingTimeService().getCurrentProcessingTime();
    long currentBatch = now - now % intervalMs;
    if (currentBatch > currentWatermark) {
        currentWatermark = currentBatch;
        // emit
        output.emitWatermark(new Watermark(currentBatch));
    }
    getProcessingTimeService().registerTimer(currentBatch + intervalMs, this);
}
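Because the timer re-registers itself at currentBatch + intervalMs, batches keep closing even when no records arrive. The simulation below is a hypothetical model of that loop: it assumes the first timer has already been registered (that step is not part of the excerpt) and models late timer firing with a per-firing delay.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of onProcessingTime: each firing emits a watermark for
// the current batch (if it advanced) and schedules the next timer.
class TimerLoopSketch {
    static List<Long> simulate(long startNow, long intervalMs, long[] firingDelays) {
        List<Long> emitted = new ArrayList<>();
        long currentWatermark = Long.MIN_VALUE;   // assumed initial value
        long nextTimer = startNow;                // assume a timer is already pending here
        for (long delay : firingDelays) {
            long now = nextTimer + delay;         // timers may fire slightly late
            long currentBatch = now - now % intervalMs;
            if (currentBatch > currentWatermark) {
                currentWatermark = currentBatch;
                emitted.add(currentBatch);        // output.emitWatermark(...)
            }
            nextTimer = currentBatch + intervalMs; // registerTimer(currentBatch + intervalMs)
        }
        return emitted;
    }
}
```

Aligning the next timer to currentBatch rather than to now keeps firings on interval boundaries even when a timer fires late.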

// Handling of watermarks from upstream: since watermarks have been repurposed as batch signals, upstream watermarks are not forwarded downstream until the input ends.
/**
 * Override the base implementation to completely ignore watermarks propagated from
 * upstream (we rely only on the {@link AssignerWithPeriodicWatermarks} to emit
 * watermarks from here).
 */
@Override
public void processWatermark(Watermark mark) throws Exception {
    // if we receive a Long.MAX_VALUE watermark we forward it since it is used
    // to signal the end of input and to not block watermark progress downstream
    if (mark.getTimestamp() == Long.MAX_VALUE && currentWatermark != Long.MAX_VALUE) {
        currentWatermark = Long.MAX_VALUE;
        output.emitWatermark(mark);
    }
}
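The filtering rule can be isolated as a small predicate. This is a hypothetical stand-alone version of the method above, where returning true means "forward downstream".

```java
// Hypothetical stand-alone version of processWatermark: upstream watermarks are
// dropped, except the first Long.MAX_VALUE one, which marks end of input.
class WatermarkFilterSketch {
    private long currentWatermark = Long.MIN_VALUE;   // assumed initial value

    /** Returns true if the upstream watermark should be forwarded. */
    boolean onUpstreamWatermark(long timestamp) {
        if (timestamp == Long.MAX_VALUE && currentWatermark != Long.MAX_VALUE) {
            currentWatermark = Long.MAX_VALUE;
            return true;
        }
        return false;
    }
}
```

A side effect worth noting: once currentWatermark is Long.MAX_VALUE, the check `currentBatch > currentWatermark` in processElement can never succeed, so no further batch watermarks are emitted.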
2.2 KeyedBundleOperator
This operator has several important member fields:

// Counts the received elements; once the count reaches the batch size, processing starts
private final BundleTrigger bundleTrigger;
// Function that processes a batch
private final BundleFunction function;
// Cache that temporarily holds the elements
private transient Map buffer;
// State that persists the cache to guarantee exactly-once semantics
/** The state to store buffer to make it exactly once. */
private transient KeyedValueState bufferState;

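During initialization in open(), a CountBundleTrigger is set up; it decides whether the buffer should be processed based on the number of elements counted (see CountBundleTrigger for the details):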

bundleTrigger.registerBundleTriggerCallback(this);
//reset trigger
bundleTrigger.reset();
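A count-based trigger boils down to a counter plus a callback. The sketch below is a simplified, hypothetical version (TriggerCallbackSketch and CountTriggerSketch are invented names; the real BundleTrigger and CountBundleTrigger signatures may differ): the operator registers itself as the callback, flushes when called, and resets the trigger afterwards, which matches the reset() call at the end of finishBundle shown further below.

```java
// Simplified, hypothetical callback interface (not the real Blink interface).
interface TriggerCallbackSketch {
    void finishBundle();
}

// Hypothetical count trigger: invokes the callback when maxCount elements
// have been seen since the last reset.
class CountTriggerSketch<T> {
    private final long maxCount;
    private TriggerCallbackSketch callback;
    private long count = 0;

    CountTriggerSketch(long maxCount) {
        if (maxCount <= 0) {
            throw new IllegalArgumentException("maxCount must be > 0");
        }
        this.maxCount = maxCount;
    }

    void registerCallback(TriggerCallbackSketch callback) {
        this.callback = callback;
    }

    // Called once per buffered element.
    void onElement(T element) {
        count++;
        if (count >= maxCount) {
            callback.finishBundle();   // the operator flushes its buffer...
        }
    }

    void reset() {
        count = 0;                     // ...and resets the trigger afterwards
    }
}
```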
Now let's look at the actual processing:

// For each incoming record: if a bundle is currently being finished, wait; otherwise merge the record into the buffer entry for its key (via function.addInput) and pass it to bundleTrigger, which calls finishBundle when the bundle should be processed. No checkpoint can be taken while a bundle is being finished, which preserves exactly-once semantics.
@Override
public void processElement(StreamRecord element) throws Exception {
    while (isInFinishingBundle) {
        checkpointingLock.wait();
    }
    K key = (K) getCurrentKey();
    V value = buffer.get(key); // maybe null
    V newValue = function.addInput(value, element.getValue());
    buffer.put(key, newValue);
    numOfElements++;
    bundleTrigger.onElement(element.getValue());
}
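The buffer therefore holds one partially aggregated value per key, not one entry per record. The hypothetical, single-threaded model below keeps only the buffer logic of processElement (the wait loop and trigger are omitted) and uses a BiFunction in place of function.addInput.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical model of processElement: addInput folds each input into the
// value already buffered for its key (null for a key not yet in the bundle).
class BundleBufferSketch<K, V, IN> {
    private final Map<K, V> buffer = new HashMap<>();
    private final BiFunction<V, IN, V> addInput;   // (previous or null, input) -> new value
    private int numOfElements = 0;

    BundleBufferSketch(BiFunction<V, IN, V> addInput) {
        this.addInput = addInput;
    }

    void processElement(K key, IN input) {
        V value = buffer.get(key);                 // maybe null
        buffer.put(key, addInput.apply(value, input));
        numOfElements++;
    }

    Map<K, V> buffer() { return buffer; }

    int numOfElements() { return numOfElements; }
}
```

For a running sum, addInput could be `(acc, x) -> acc == null ? x : acc + x`; three records for two keys then leave two buffer entries while numOfElements is 3.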
// A watermark also triggers processing of the current batch
@Override
public void processWatermark(Watermark mark) throws Exception {
    while (isInFinishingBundle) {
        checkpointingLock.wait();
    }
    // bundle operator only used in unbounded group by which not need to handle watermark
    finishBundle();
    super.processWatermark(mark);
}
// Hand the buffer to the function for processing, then clear the buffer
@Override
public void finishBundle() throws Exception {
    assert(Thread.holdsLock(checkpointingLock));
    while (isInFinishingBundle) {
        checkpointingLock.wait();
    }
    isInFinishingBundle = true;
    if (!buffer.isEmpty()) {
        numOfElements = 0;
        function.finishBundle(buffer, collector);
        buffer.clear();
    }
    // reset trigger
    bundleTrigger.reset();
    checkpointingLock.notifyAll();
    isInFinishingBundle = false;
}
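Stripped of the locking, finishBundle hands the whole buffer to the bundle function in a single call and then clears it. The hypothetical model below shows just that part (a BiConsumer stands in for function.finishBundle and a List for the collector); the real method additionally expects the caller to hold checkpointingLock and resets the trigger.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical model of finishBundle: one function call per non-empty bundle,
// then the buffer is cleared; an empty buffer produces no call at all.
class FinishBundleSketch {
    final Map<String, Long> buffer = new HashMap<>();
    final List<String> out = new ArrayList<>();     // stands in for the collector
    int functionCalls = 0;
    private final BiConsumer<Map<String, Long>, List<String>> function;

    FinishBundleSketch(BiConsumer<Map<String, Long>, List<String>> function) {
        this.function = function;
    }

    void finishBundle() {
        // the real code asserts that the caller holds checkpointingLock here
        if (!buffer.isEmpty()) {
            functionCalls++;
            function.accept(buffer, out);   // like function.finishBundle(buffer, collector)
            buffer.clear();
        }
        // the real operator also calls bundleTrigger.reset() here
    }
}
```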
// To guarantee exactly-once semantics, the buffered data is saved to state on every checkpoint (unless the bundle is finished before the snapshot)
@Override
public void snapshotState(StateSnapshotContext context) throws Exception {
    while (isInFinishingBundle) {
        checkpointingLock.wait();
    }
    super.snapshotState(context);

    if (!finishBundleBeforeSnapshot) {
        // clear state first
        bufferState.removeAll();

        // update state
        bufferState.putAll(buffer);
    }
}
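Clearing bufferState before writing it matters: keys that were flushed since the previous checkpoint must not survive in the snapshot. The hypothetical model below uses a HashMap for bufferState; the restore method is an assumption, since the excerpt does not show how the buffer is reloaded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of snapshotState: bufferState is replaced wholesale by
// the current in-memory buffer.
class BufferSnapshotSketch {
    final Map<String, Long> buffer = new HashMap<>();
    final Map<String, Long> bufferState = new HashMap<>();  // stands in for KeyedValueState

    void snapshotState() {
        bufferState.clear();          // bufferState.removeAll()
        bufferState.putAll(buffer);   // bufferState.putAll(buffer)
    }

    // Assumed restore step (not shown in the excerpt): reload the buffer from state.
    void restore() {
        buffer.clear();
        buffer.putAll(bufferState);
    }
}
```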
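2.3 BundleFunction
Next we analyze a concrete BundleFunction implementation, MiniBatchGlobalGroupAggFunction, whose comments are very detailed. When a batch arrives, it fetches from state the intermediate results (accumulators) for the keys in the batch, iterates over the batch to compute and emit results, and writes the updated intermediate results back to state. Next time we will analyze how Flink state is read and written in detail.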

override def finishBundle(
    buffer: JMap[BaseRow, BaseRow],
    out: Collector[BaseRow]): Unit = {

  // batch get to cache
  val accMap = accState.getAll(buffer.keySet())

  val iter = buffer.entrySet().iterator()
  while (iter.hasNext) {
    val entry = iter.next()
    val currentAcc = entry.getValue
    val currentKey = entry.getKey
    // set current key to make dataview know current key
    ctx.setCurrentKey(currentKey)

    var firstRow = false

    // get acc from cache
    var acc = accMap.get(currentKey)
    if (acc == null) {
      acc = globalAgg.createAccumulators()
      firstRow = true
    }
    // set accumulator first
    globalAgg.setAccumulators(acc)
    // get previous aggregate result
    prevAggValue = globalAgg.getValue

    // merge currentAcc to acc
    globalAgg.merge(currentAcc)
    // get current aggregate result
    newAggValue = globalAgg.getValue
    // get new accumulator
    acc = globalAgg.getAccumulators

    if (!inputCounter.countIsZero(acc)) {
      // we aggregated at least one record for this key

      // update new acc to the cache, batch put to state later
      entry.setValue(acc)

      // if this was not the first row and we have to emit retractions
      if (!firstRow) {
        if (!equaliser.equalsWithoutHeader(prevAggValue, newAggValue)) {
          // new row is not same with prev row
          if (generateRetraction) {
            out.collect(prevResultRow(currentKey))
          }
          out.collect(newResultRow(currentKey))
        } else {
          // new row is same with prev row, no need to output
        }
      } else {
        // this is the first, output new result
        out.collect(newResultRow(currentKey))
      }

    } else {
      // we retracted the last record for this key

      // sent out a delete message
      if (!firstRow) {
        out.collect(prevResultRow(currentKey))
      }
      // and clear all state, remove directly not removeAll because removeAll calls remove
      accState.remove(currentKey)
      // remove the entry, in order to avoid batch put to state later
      iter.remove()
      // cleanup dataview under current key
      globalAgg.cleanup()
    }
  }

  // batch put to state
  if (!buffer.isEmpty) {
    accState.putAll(buffer)
  }
}
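The per-key decision logic (first row, changed result, unchanged result, last record retracted) can be traced with a simplified Java model. Everything here is a hypothetical stand-in: the accumulator is a {sum, count} pair for a SUM aggregate, `count != 0` plays the role of inputCounter.countIsZero, comparing sums replaces equaliser.equalsWithoutHeader, and "+key=v" / "-key=v" strings stand in for newResultRow / prevResultRow. Like the original, it updates buffer entries in place and writes them to state in one batched put at the end.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical model of MiniBatchGlobalGroupAggFunction.finishBundle for SUM.
class RetractSketch {
    final Map<String, long[]> accState = new HashMap<>();  // key -> {sum, count}
    final List<String> out = new ArrayList<>();
    boolean generateRetraction = true;

    // buffer: key -> partial accumulator {sum, count} collected in this batch
    void finishBundle(Map<String, long[]> buffer) {
        Iterator<Map.Entry<String, long[]>> iter = buffer.entrySet().iterator();
        while (iter.hasNext()) {
            Map.Entry<String, long[]> entry = iter.next();
            String key = entry.getKey();
            long[] prev = accState.get(key);               // stands in for the accMap lookup
            boolean firstRow = prev == null;
            long[] acc = firstRow ? new long[] {0, 0} : prev.clone();
            acc[0] += entry.getValue()[0];                 // merge the batch's partial sum
            acc[1] += entry.getValue()[1];                 // and its record count
            if (acc[1] != 0) {
                entry.setValue(acc);                       // written to state after the loop
                if (firstRow) {
                    out.add("+" + key + "=" + acc[0]);
                } else if (acc[0] != prev[0]) {            // visible result changed
                    if (generateRetraction) {
                        out.add("-" + key + "=" + prev[0]);
                    }
                    out.add("+" + key + "=" + acc[0]);
                }                                          // unchanged: emit nothing
            } else {                                       // last record was retracted
                if (!firstRow) {
                    out.add("-" + key + "=" + prev[0]);
                }
                accState.remove(key);
                iter.remove();                             // keep it out of the batch put
            }
        }
        accState.putAll(buffer);                           // one batched write
    }
}
```

Feeding four successive batches for key "a" with partial accumulators {5, 1}, {3, 1}, {0, 1} and {-8, -3} produces "+a=5", then "-a=5" and "+a=8", then nothing (the sum stays 8), then "-a=8", after which the key is gone from state.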
