A Source-Code Walkthrough of Minibatch in Blink

I. Purpose and function of minibatch
Relevant source files:
MiniBatchGroupAggFunction.scala
KeyedBundleOperator.java
MiniBatchAssignerOperator.java
Minibatch mainly does two things:

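1. It accumulates records into a batch and starts processing once the batch has been open for a configured time or holds a configured number of records.

2. For computations that access state, it replaces one state access per incoming record with one state access per batch.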
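
To make the second point concrete, here is a minimal sketch that counts state accesses in both modes. It does not use any Blink or Flink API: CountingState and both methods are hypothetical names invented for this illustration, and the "state" is just a HashMap with counters. The batch path here touches state once per distinct key in the batch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MiniBatchStateAccessSketch {
    // Per-record processing: one state read and one state write for every record.
    static void perRecord(List<String> keys, CountingState state) {
        for (String key : keys) {
            state.put(key, state.get(key) + 1);
        }
    }

    // Mini-batch processing: pre-aggregate in memory first,
    // then touch state once per distinct key in the batch.
    static void miniBatch(List<String> keys, CountingState state) {
        Map<String, Long> buffer = new HashMap<>();
        for (String key : keys) {
            buffer.merge(key, 1L, Long::sum);
        }
        for (Map.Entry<String, Long> e : buffer.entrySet()) {
            state.put(e.getKey(), state.get(e.getKey()) + e.getValue());
        }
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "a", "a", "b", "c");
        CountingState s1 = new CountingState();
        perRecord(keys, s1);
        CountingState s2 = new CountingState();
        miniBatch(keys, s2);
        System.out.println("per-record reads=" + s1.reads + ", mini-batch reads=" + s2.reads);
    }
}

// Hypothetical stand-in for a keyed state backend that counts its accesses.
class CountingState {
    final Map<String, Long> store = new HashMap<>();
    int reads = 0;
    int writes = 0;

    long get(String key) {
        reads++;
        return store.getOrDefault(key, 0L);
    }

    void put(String key, long value) {
        writes++;
        store.put(key, value);
    }
}
```

With six records over three keys, the per-record path performs six reads and six writes, while the batched path performs three of each and ends with the same stored counts.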

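Advantages:

1. It trades latency for higher throughput.

Disadvantages:

1. The current implementation uses watermarks to tell downstream operators when to start computing, so it is tightly coupled to the watermark mechanism and is not suitable for jobs that rely on event time. A better approach would be to insert a dedicated signal into the stream, similar to a checkpoint barrier, to delimit batches.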
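II. Source code analysis
Depending on configuration, Blink uses two operators for minibatch: MiniBatchAssignerOperator and KeyedBundleOperator. MiniBatchAssignerOperator inserts a signal into the stream at the configured batch interval (the code below uses only intervalMs) to tell downstream that a batch is complete. KeyedBundleOperator processes the batched data and also applies the count-based limit through its bundle trigger.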

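2.1 MiniBatchAssignerOperator
This operator does nothing to the data itself; it only deals with watermarks.

// On each element from upstream, check whether the batch interval has elapsed. If it has, emit a watermark.
// The watermark here is therefore no longer an ordinary watermark: it is a batch signal.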

@Override
public void processElement(StreamRecord element) throws Exception {
    long now = getProcessingTimeService().getCurrentProcessingTime();
    long currentBatch = now - now % intervalMs;
    if (currentBatch > currentWatermark) {
        currentWatermark = currentBatch;
        // emit
        output.emitWatermark(new Watermark(currentBatch));
    }
    output.collect(element);
}
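
The expression `now - now % intervalMs` rounds the current processing time down to the start of its interval, so a new signal is emitted at most once per interval. The sketch below replays that arithmetic in isolation. The class and method names are hypothetical, and the 1000 ms interval is an assumed value chosen only for the example.

```java
class BatchBoundarySketch {
    // Rounds a processing-time timestamp down to the start of its batch interval.
    static long batchStart(long nowMs, long intervalMs) {
        return nowMs - nowMs % intervalMs;
    }

    // True when a new batch signal would be emitted, mirroring the
    // `currentBatch > currentWatermark` check in the operator.
    static boolean shouldEmit(long nowMs, long intervalMs, long currentWatermark) {
        return batchStart(nowMs, intervalMs) > currentWatermark;
    }

    public static void main(String[] args) {
        long intervalMs = 1000L; // assumed interval, for illustration only
        long watermark = 0L;
        for (long now : new long[] {1200L, 1800L, 2050L}) {
            if (shouldEmit(now, intervalMs, watermark)) {
                watermark = batchStart(now, intervalMs);
                System.out.println("now=" + now + " -> emit " + watermark);
            } else {
                System.out.println("now=" + now + " -> no emit");
            }
        }
        // Prints:
        // now=1200 -> emit 1000
        // now=1800 -> no emit
        // now=2050 -> emit 2000
    }
}
```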

// If a processing-time timer has been registered, this method is called when the timer fires.
@Override
public void onProcessingTime(long timestamp) throws Exception {
    long now = getProcessingTimeService().getCurrentProcessingTime();
    long currentBatch = now - now % intervalMs;
    if (currentBatch > currentWatermark) {
        currentWatermark = currentBatch;
        // emit
        output.emitWatermark(new Watermark(currentBatch));
    }
    getProcessingTimeService().registerTimer(currentBatch + intervalMs, this);
}

// Handling of incoming watermarks. Because watermarks have been repurposed as batch signals here,
// upstream watermarks are not forwarded until the input ends.
/**
 * Override the base implementation to completely ignore watermarks propagated from
 * upstream (we rely only on the {@link AssignerWithPeriodicWatermarks} to emit
 * watermarks from here).
 */
@Override
public void processWatermark(Watermark mark) throws Exception {
    // if we receive a Long.MAX_VALUE watermark we forward it since it is used
    // to signal the end of input and to not block watermark progress downstream
    if (mark.getTimestamp() == Long.MAX_VALUE && currentWatermark != Long.MAX_VALUE) {
        currentWatermark = Long.MAX_VALUE;
        output.emitWatermark(mark);
    }
}
2.2 KeyedBundleOperator
This operator has several important member variables:

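// Counts the elements received; once the count exceeds the batch size, processing starts
private final BundleTrigger bundleTrigger;
// Function that performs the batch processing
private final BundleFunction function;
// Cache that temporarily holds the buffered elements
private transient Map buffer;
// State used to persist the cache, to guarantee exactly-once semantics
/** The state to store buffer to make it exactly once. */
private transient KeyedValueState bufferState;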

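During initialization in open(), the operator registers itself as the callback of the bundle trigger and resets it. The trigger used is CountBundleTrigger, which decides whether a bundle should be processed based on the number of elements counted so far; see CountBundleTrigger for the details.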

bundleTrigger.registerBundleTriggerCallback(this);
//reset trigger
bundleTrigger.reset();
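
CountBundleTrigger itself is not reproduced in this article. As a rough illustration of the register/reset/onElement pattern only, the following sketch implements a count-based trigger from scratch. SimpleCountTrigger and BundleCallback are hypothetical names, and the behavior (fire once the count reaches the limit, then reset) is an assumption about how such a trigger could work, not a copy of Blink's class.

```java
class CountTriggerSketch {
    // Hypothetical callback; it plays the role the operator plays for its trigger.
    interface BundleCallback {
        void finishBundle();
    }

    // Hypothetical count-based trigger: fires the callback after maxCount elements.
    static class SimpleCountTrigger {
        private final long maxCount;
        private long count = 0;
        private BundleCallback callback;

        SimpleCountTrigger(long maxCount) {
            this.maxCount = maxCount;
        }

        void registerCallback(BundleCallback callback) {
            this.callback = callback;
        }

        // Called once per element; finishes the bundle when the limit is reached.
        void onElement() {
            count++;
            if (count >= maxCount) {
                callback.finishBundle();
                reset();
            }
        }

        void reset() {
            count = 0;
        }
    }

    public static void main(String[] args) {
        SimpleCountTrigger trigger = new SimpleCountTrigger(3);
        trigger.registerCallback(() -> System.out.println("bundle finished"));
        trigger.reset();
        // Seven elements with a limit of three: the callback fires after the 3rd and 6th.
        for (int i = 0; i < 7; i++) {
            trigger.onElement();
        }
    }
}
```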
Now let's look at the actual processing:

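// When an element arrives, first check whether a bundle is currently being finished, and wait if so.
// Otherwise merge the element into the buffered value for its key (via function.addInput) and
// hand it to bundleTrigger, which calls finishBundle once the bundle is ready to be processed.
// No checkpoint can be taken while a bundle is being finished, which preserves exactly-once semantics.
@Override
public void processElement(StreamRecord element) throws Exception {
    while (isInFinishingBundle) {
        checkpointingLock.wait();
    }
    K key = (K) getCurrentKey();
    V value = buffer.get(key); // maybe null
    V newValue = function.addInput(value, element.getValue());
    buffer.put(key, newValue);
    numOfElements++;
    bundleTrigger.onElement(element.getValue());
}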
// An incoming watermark also triggers batch processing
@Override
public void processWatermark(Watermark mark) throws Exception {
    while (isInFinishingBundle) {
        checkpointingLock.wait();
    }
    // bundle operator only used in unbounded group by which not need to handle watermark
    finishBundle();
    super.processWatermark(mark);
}
// Invoke the function to process the bundle, then clear the buffer
@Override
public void finishBundle() throws Exception {
    assert(Thread.holdsLock(checkpointingLock));
    while (isInFinishingBundle) {
        checkpointingLock.wait();
    }
    isInFinishingBundle = true;
    if (!buffer.isEmpty()) {
        numOfElements = 0;
        function.finishBundle(buffer, collector);
        buffer.clear();
    }
    // reset trigger
    bundleTrigger.reset();
    checkpointingLock.notifyAll();
    isInFinishingBundle = false;
}
// To preserve exactly-once semantics, each checkpoint saves the data currently in the buffer
// (when finishBundleBeforeSnapshot is false)
@Override
public void snapshotState(StateSnapshotContext context) throws Exception {
    while (isInFinishingBundle) {
        checkpointingLock.wait();
    }
    super.snapshotState(context);

    if (!finishBundleBeforeSnapshot) {
        // clear state first
        bufferState.removeAll();

        // update state
        bufferState.putAll(buffer);
    }
}
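
The reason the buffer has to be written into state at checkpoint time is that buffered records have been consumed from the input but have not yet produced output. The sketch below imitates the clear-then-copy step with plain maps. All names are hypothetical, and the restore() method is an assumption added for illustration, since the article does not show how the operator rebuilds its buffer after a failure.

```java
import java.util.HashMap;
import java.util.Map;

class BufferSnapshotSketch {
    // In-memory bundle: key -> pre-aggregated value (here, a running sum).
    final Map<String, Long> buffer = new HashMap<>();
    // Hypothetical stand-in for the checkpointed buffer state.
    final Map<String, Long> bufferState = new HashMap<>();

    // Merges a new input into the buffered value for its key.
    void addInput(String key, long value) {
        buffer.merge(key, value, Long::sum);
    }

    // Mirrors the snapshot step: clear the old state, then copy the current buffer.
    void snapshot() {
        bufferState.clear();
        bufferState.putAll(buffer);
    }

    // Assumed recovery step: the in-memory buffer is lost and rebuilt from state.
    void restore() {
        buffer.clear();
        buffer.putAll(bufferState);
    }

    public static void main(String[] args) {
        BufferSnapshotSketch op = new BufferSnapshotSketch();
        op.addInput("a", 1);
        op.addInput("a", 2);
        op.snapshot();         // the checkpoint captures a=3
        op.addInput("a", 10);  // arrives after the checkpoint
        op.restore();          // simulated failure: roll back to the checkpoint
        System.out.println(op.buffer);
    }
}
```

After the simulated failure the buffer again holds the value captured by the checkpoint; the record that arrived afterwards would be replayed from the source rather than recovered from state.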
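2.3 BundleFunction
Next we look at one concrete BundleFunction implementation, MiniBatchGlobalGroupAggFunction, whose comments are very thorough. When a batch arrives, it fetches from state, in one call, the stored intermediate results (accumulators) for all keys in the batch. It then iterates over the batch, merges each buffered accumulator into the stored one, emits the results, and keeps the updated accumulators in the buffer so they can be written back to state in one batch put at the end. In a follow-up we will look at how Flink state reads and writes actually work.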

override def finishBundle(
    buffer: JMap[BaseRow, BaseRow],
    out: Collector[BaseRow]): Unit = {

  // batch get to cache
  val accMap = accState.getAll(buffer.keySet())

  val iter = buffer.entrySet().iterator()
  while (iter.hasNext) {
    val entry = iter.next()
    val currentAcc = entry.getValue
    val currentKey = entry.getKey
    // set current key to make dataview know current key
    ctx.setCurrentKey(currentKey)

    var firstRow = false

    // get acc from cache
    var acc = accMap.get(currentKey)
    if (acc == null) {
      acc = globalAgg.createAccumulators()
      firstRow = true
    }
    // set accumulator first
    globalAgg.setAccumulators(acc)
    // get previous aggregate result
    prevAggValue = globalAgg.getValue

    // merge currentAcc to acc
    globalAgg.merge(currentAcc)
    // get current aggregate result
    newAggValue = globalAgg.getValue
    // get new accumulator
    acc = globalAgg.getAccumulators

    if (!inputCounter.countIsZero(acc)) {
      // we aggregated at least one record for this key

      // update new acc to the cache, batch put to state later
      entry.setValue(acc)

      // if this was not the first row and we have to emit retractions
      if (!firstRow) {
        if (!equaliser.equalsWithoutHeader(prevAggValue, newAggValue)) {
          // new row is not same with prev row
          if (generateRetraction) {
            out.collect(prevResultRow(currentKey))
          }
          out.collect(newResultRow(currentKey))
        } else {
          // new row is same with prev row, no need to output
        }
      } else {
        // this is the first, output new result
        out.collect(newResultRow(currentKey))
      }

    } else {
      // we retracted the last record for this key

      // sent out a delete message
      if (!firstRow) {
        out.collect(prevResultRow(currentKey))
      }
      // and clear all state, remove directly not removeAll because removeAll calls remove
      accState.remove(currentKey)
      // remove the entry, in order to avoid batch put to state later
      iter.remove()
      // cleanup dataview under current key
      globalAgg.cleanup()
    }
  }

  // batch put to state
  if (!buffer.isEmpty) {
    accState.putAll(buffer)
  }
}
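
The branching that decides what to emit for each key can be hard to follow inside the full method. The sketch below isolates just that decision as a pure function. It is an illustration, not Blink code: the names are hypothetical, aggregate values are modeled as Long, and output rows are modeled as strings prefixed with "+" for a new result and "-" for a withdrawn previous result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class RetractionSketch {
    // Decides which messages to emit for one key after merging a batch.
    //   prev / next        : aggregate value before and after the merge
    //   firstRow           : no accumulator existed in state for this key
    //   countIsZero        : all records for this key have been retracted
    //   generateRetraction : whether downstream expects the old value to be withdrawn
    static List<String> emit(Long prev, Long next, boolean firstRow,
                             boolean countIsZero, boolean generateRetraction) {
        List<String> out = new ArrayList<>();
        if (countIsZero) {
            // The key has no records left: withdraw the previous result, if one was ever emitted.
            if (!firstRow) {
                out.add("-" + prev);
            }
            return out;
        }
        if (firstRow) {
            // First result for this key: nothing to withdraw.
            out.add("+" + next);
        } else if (!Objects.equals(prev, next)) {
            // The result changed: optionally withdraw the old value, then emit the new one.
            if (generateRetraction) {
                out.add("-" + prev);
            }
            out.add("+" + next);
        }
        // An unchanged result produces no output.
        return out;
    }

    public static void main(String[] args) {
        System.out.println(emit(null, 3L, true, false, true));  // first row
        System.out.println(emit(5L, 7L, false, false, true));   // changed result
        System.out.println(emit(5L, 5L, false, false, true));   // unchanged result
        System.out.println(emit(5L, 0L, false, true, true));    // last record retracted
    }
}
```

Keeping the decision free of state access makes each of the four cases easy to check on its own; the real method interleaves the same cases with accumulator updates and state cleanup.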
