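Recap: the previous post covered how individual patterns are defined in FlinkCEP and what AfterMatchSkipStrategy does. This post takes a closer look at ConsumingStrategy, quantifiers, and how they relate to combining patterns.
(Please credit the original author when reposting: https://blog.csdn.net/xiaozoom?type=blog)
(The first post went up only yesterday and has already been copied and pasted elsewhere. Mixed feelings.)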
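Before talking about quantifiers, we need to look at how an individual pattern matches an event. In a text regular expression, for example:
a means that exactly one lowercase character a must appear. How do we express the same thing for events?
To start with, we can set the simplest possible condition, such as event.price = 100. That is a single condition; in practice there will often be several, such as price = 100 and count = 10 or price = 0.
For a simple condition, extend the SimpleCondition class and implement its filter method: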
// SimpleCondition example from the official documentation
start.where(new SimpleCondition<Event>() {
    @Override
    public boolean filter(Event value) {
        return value.getName().startsWith("foo");
    }
});
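A complex condition is one that needs the events this individual pattern has already matched, or statistics computed from those events. To write one, extend the IterativeCondition class; its filter method receives a context through which those events can be read: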
private class MyCondition extends IterativeCondition<Event> {

    @Override
    public boolean filter(Event value, Context<Event> ctx) throws Exception {
        if (!value.getName().equals("middle")) {
            return false;
        }

        double sum = 0.0;
        for (Event e : ctx.getEventsForPattern("middle")) {
            sum += e.getPrice();
        }
        sum += value.getPrice();
        return Double.compare(sum, 5.0) <= 0;
    }
}
It is used the same way as SimpleCondition:
pattern.where(new IterativeCondition<Event>() {
    @Override
    public boolean filter(Event value, Context<Event> ctx) throws Exception {
        return ... // some condition
    }
});
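How a special unary condition is implemented (restricting to a subtype):
pattern.subtype(SubEvent.class);
// filter method of SubtypeCondition
@Override
public boolean filter(T value) throws Exception {
    return subtype.isAssignableFrom(value.getClass());
}
isAssignableFrom takes us all the way down to the underlying implementation. It is a method of java.lang.Class that checks whether subtype is the same class as, or a superclass (or superinterface) of, the class of the incoming event.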
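To make the direction of that check concrete, here is a minimal, self-contained sketch in plain Java. The nested Event and SubEvent classes are hypothetical stand-ins rather than Flink types; only Class.isAssignableFrom is the real JDK call.

```java
final class SubtypeCheckSketch {
    // Hypothetical event hierarchy, used only for this illustration.
    static class Event {}
    static class SubEvent extends Event {}

    /** Mirrors the one-line check in the filter method quoted above. */
    static boolean matchesSubtype(Class<?> subtype, Object value) {
        return subtype.isAssignableFrom(value.getClass());
    }

    public static void main(String[] args) {
        // subtype is the same class as the value's class
        System.out.println(matchesSubtype(SubEvent.class, new SubEvent())); // true
        // subtype is a superclass of the value's class
        System.out.println(matchesSubtype(Event.class, new SubEvent()));    // true
        // a plain Event is not a SubEvent, so the constraint rejects it
        System.out.println(matchesSubtype(SubEvent.class, new Event()));    // false
    }
}
```

The receiver of isAssignableFrom is the broader type and the argument is the concrete class, which is easy to get backwards.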
The Pattern class has four methods that deal with conditions, each of which is also covered in the official documentation:
where, or, until, and subtype.
/**
* Adds a condition that has to be satisfied by an event in order to be considered a match. If
* another condition has already been set, the new one is going to be combined with the previous
* with a logical {@code AND}. In other case, this is going to be the only condition.
*
* @param condition The condition as an {@link IterativeCondition}.
* @return The pattern with the new condition is set.
*/
public Pattern<T, F> where(IterativeCondition<F> condition) {
Preconditions.checkNotNull(condition, "The condition cannot be null.");
ClosureCleaner.clean(condition, ExecutionConfig.ClosureCleanerLevel.RECURSIVE, true);
if (this.condition == null) {
this.condition = condition;
} else {
this.condition = new RichAndCondition<>(this.condition, condition);
}
return this;
}
/**
* Adds a condition that has to be satisfied by an event in order to be considered a match. If
* another condition has already been set, the new one is going to be combined with the previous
* with a logical {@code OR}. In other case, this is going to be the only condition.
*
* @param condition The condition as an {@link IterativeCondition}.
* @return The pattern with the new condition is set.
*/
public Pattern<T, F> or(IterativeCondition<F> condition) {
Preconditions.checkNotNull(condition, "The condition cannot be null.");
ClosureCleaner.clean(condition, ExecutionConfig.ClosureCleanerLevel.RECURSIVE, true);
if (this.condition == null) {
this.condition = condition;
} else {
this.condition = new RichOrCondition<>(this.condition, condition);
}
return this;
}
/**
 * Applies a subtype constraint on the current pattern. This means that an event has to be of
 * the given subtype in order to be matched.
 *
 * @param subtypeClass Class of the subtype
 * @param <S> Type of the subtype
 * @return The same pattern with the new subtype constraint
 */
public <S extends F> Pattern<T, S> subtype(final Class<S> subtypeClass) {
    Preconditions.checkNotNull(subtypeClass, "The class cannot be null.");
    if (condition == null) {
        this.condition = new SubtypeCondition<F>(subtypeClass);
    } else {
        this.condition =
                new RichAndCondition<>(condition, new SubtypeCondition<F>(subtypeClass));
    }
    @SuppressWarnings("unchecked")
    Pattern<T, S> result = (Pattern<T, S>) this;
    return result;
}
/**
* Applies a stop condition for a looping state. It allows cleaning the underlying state.
*
* @param untilCondition a condition an event has to satisfy to stop collecting events into
* looping state
* @return The same pattern with applied untilCondition
*/
public Pattern<T, F> until(IterativeCondition<F> untilCondition) {
Preconditions.checkNotNull(untilCondition, "The condition cannot be null");
if (this.untilCondition != null) {
throw new MalformedPatternException("Only one until condition can be applied.");
}
if (!quantifier.hasProperty(Quantifier.QuantifierProperty.LOOPING)) {
throw new MalformedPatternException(
"The until condition is only applicable to looping states.");
}
ClosureCleaner.clean(untilCondition, ExecutionConfig.ClosureCleanerLevel.RECURSIVE, true);
this.untilCondition = untilCondition;
return this;
}
The Quantifier class carries this definition in its Javadoc:
A quantifier describing the Pattern. There are three main groups of {@link Quantifier}
single
looping
times
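Why does this matter? where, or, and subtype are all just thin wrappers around RichAndCondition and RichOrCondition. Only until looks completely different: it relies on the LOOPING property.
RichAndCondition, RichOrCondition, and RichNotCondition differ only in how they implement the filter method inherited from IterativeCondition; they stand for the basic logical relations AND, OR, and NOT. None of the four Pattern methods above uses NOT, but NFACompiler does use RichNotCondition later on, for the SKIP_TILL_NEXT case.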
//RichAndCondition
@Override
public boolean filter(T value, Context ctx) throws Exception {
return getLeft().filter(value, ctx) && getRight().filter(value, ctx);
}
//RichOrCondition
@Override
public boolean filter(T value, Context ctx) throws Exception {
return getLeft().filter(value, ctx) || getRight().filter(value, ctx);
}
//RichNotCondition
@Override
public boolean filter(T value, Context ctx) throws Exception {
return !getNestedConditions()[0].filter(value, ctx);
}
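AND and OR can expose a Left and a Right because the constructor of their parent class, RichCompositeIterativeCondition, takes a varargs list and so accepts several nested conditions at once: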
@SafeVarargs
public RichCompositeIterativeCondition(final IterativeCondition... nestedConditions) {
for (IterativeCondition condition : nestedConditions) {
Preconditions.checkNotNull(condition, "The condition cannot be null.");
}
this.nestedConditions = nestedConditions;
}
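The nesting that chained where and or calls produce can be sketched without Flink at all. The block below is a simplified stand-in and makes several assumptions: the Condition interface has no context argument and no checked exception, and the Order type with its price and count fields is hypothetical, taken from the "price = 100 and count = 10 or price = 0" example earlier in this post.

```java
final class ConditionCompositionSketch {
    /** Hypothetical event type with the two fields used in the example. */
    static final class Order {
        final double price;
        final int count;

        Order(double price, int count) {
            this.price = price;
            this.count = count;
        }
    }

    /** Simplified stand-in for a condition: no context, no checked exception. */
    interface Condition<T> {
        boolean filter(T value);
    }

    static <T> Condition<T> and(Condition<T> left, Condition<T> right) {
        return value -> left.filter(value) && right.filter(value);
    }

    static <T> Condition<T> or(Condition<T> left, Condition<T> right) {
        return value -> left.filter(value) || right.filter(value);
    }

    static <T> Condition<T> not(Condition<T> nested) {
        return value -> !nested.filter(value);
    }

    /** (price == 100 AND count == 10) OR price == 0, nested as where, where, or would nest. */
    static Condition<Order> example() {
        Condition<Order> priceIs100 = o -> o.price == 100;
        Condition<Order> countIs10 = o -> o.count == 10;
        Condition<Order> priceIs0 = o -> o.price == 0;
        return or(and(priceIs100, countIs10), priceIs0);
    }

    public static void main(String[] args) {
        System.out.println(example().filter(new Order(100, 10))); // true
        System.out.println(example().filter(new Order(100, 5)));  // false
        System.out.println(example().filter(new Order(0, 3)));    // true
    }
}
```

Because each new call wraps whatever condition is already set, the order of calls decides the grouping: calling where twice and then or yields (A AND B) OR C, not A AND (B OR C).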
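What about until? According to its source code, it only takes effect on a pattern that has the LOOPING property. These are the methods that set that property: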
public Pattern oneOrMore(@Nullable Time windowTime) {
checkIfNoNotPattern();
checkIfQuantifierApplied();
this.quantifier = Quantifier.looping(quantifier.getConsumingStrategy());
this.times = Times.of(1, windowTime);
return this;
}
public Pattern timesOrMore(int times, @Nullable Time windowTime) {
checkIfNoNotPattern();
checkIfQuantifierApplied();
this.quantifier = Quantifier.looping(quantifier.getConsumingStrategy());
this.times = Times.of(times, windowTime);
return this;
}
So until is only valid after oneOrMore or timesOrMore has been called.
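As a rough mental model of what a stop condition does to a looping pattern, the sketch below collects accepted events until the stop condition fires. It is not Flink code: the collectUntil helper is hypothetical, events are plain strings, and it assumes that events matching neither condition are simply skipped. It models only the "no more events are accepted after the stop event" behaviour, not how matches are enumerated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class UntilSketch {
    /**
     * Collects the events a looping pattern would accept, stopping at the first
     * event that satisfies the stop condition. Non-matching events are skipped.
     */
    static List<String> collectUntil(
            List<String> stream, Predicate<String> accept, Predicate<String> stop) {
        List<String> taken = new ArrayList<>();
        for (String event : stream) {
            if (stop.test(event)) {
                break; // the stop event closes the loop; later events are not accepted
            }
            if (accept.test(event)) {
                taken.add(event);
            }
        }
        return taken;
    }

    public static void main(String[] args) {
        List<String> stream = Arrays.asList("a1", "x", "a2", "stop", "a3");
        // a3 arrives after the stop event, so it is never collected
        System.out.println(
                collectUntil(stream, e -> e.startsWith("a"), e -> e.equals("stop"))); // [a1, a2]
    }
}
```

This also shows why a stop condition only makes sense on a looping pattern: a pattern that takes exactly one event has no open-ended collection to close.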
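To summarize:
Quantifier groups for individual patterns in FlinkCEP: single, looping, and times.
Basic condition types for individual patterns in FlinkCEP: SimpleCondition, IterativeCondition, and the subtype constraint.
The basic conditions of an individual pattern can also be combined. The combining methods are: where (AND), or (OR), subtype (AND with a type check), and until (stop condition, looping patterns only).
(Figure: the ways of combining conditions on an individual pattern)
Those are the pieces covered so far.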
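With this, we can finally express a simple match like "a" in Flink!
So how do we match several events at once, as in a+, a?, a until ....?
Next we need to learn how to combine individual patterns further into a "pattern group" (I could not find a better translation for the term). Only when we can fully define a complete expression such as a+ b1 b2 c can we say what a complete match and a partial match are, and how to configure the pruning strategy (AfterMatchSkipStrategy) when a complete match and several partial matches coexist.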
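First, the ways of linking patterns. There are four in total: begin, plus three levels ordered by how strict the contiguity is: next (strict), followedBy (relaxed), and followedByAny (non-deterministic relaxed).
For example, take the pattern group a b and the event stream a c b1 b2
(b1 and b2 are two different events of type B). According to the official documentation, next yields no match, followedBy yields {a b1}, and followedByAny yields {a b1} and {a b2}.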
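The three levels can be tried out on this exact stream with a small simulation. This is only a sketch for the two-step pattern "a b", not the NFA that FlinkCEP builds: the match helper is hypothetical, events are plain strings, and any name starting with "b" counts as a B event. The enum constants reuse the ConsumingStrategy names from the Flink source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ContiguitySketch {
    enum Strategy { STRICT, SKIP_TILL_NEXT, SKIP_TILL_ANY }

    /** Matches the two-step pattern "a b" against the stream under the given strategy. */
    static List<String> match(List<String> stream, Strategy strategy) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < stream.size(); i++) {
            if (!stream.get(i).equals("a")) {
                continue;
            }
            for (int j = i + 1; j < stream.size(); j++) {
                boolean isB = stream.get(j).startsWith("b");
                if (isB) {
                    matches.add("a " + stream.get(j));
                    if (strategy != Strategy.SKIP_TILL_ANY) {
                        break; // next and followedBy stop at the first b
                    }
                } else if (strategy == Strategy.STRICT) {
                    break; // a non-matching event right after "a" discards the strict match
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> stream = Arrays.asList("a", "c", "b1", "b2");
        System.out.println(match(stream, Strategy.STRICT));         // []
        System.out.println(match(stream, Strategy.SKIP_TILL_NEXT)); // [a b1]
        System.out.println(match(stream, Strategy.SKIP_TILL_ANY));  // [a b1, a b2]
    }
}
```

STRICT corresponds to next, SKIP_TILL_NEXT to followedBy, and SKIP_TILL_ANY to followedByAny.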
A pattern that contains a looping part behaves the same way. The official example is:
Pattern group: a b+ c
Event stream: "a", "b1", "d1", "b2", "d2", "b3" "c"
{a b1 c}, {a b1 b2 c}, {a b1 b2 b3 c} | {a b2 c}, {a b2 b3 c} | {a b3 c}
=> {a b1 c}, {a b1 b2 c}, {a b1 b3 c}, {a b1 b2 b3 c}, {a b2 c}, {a b2 b3 c}, {a b3 c}
To make this easier to follow, I have reordered the results from the official documentation, which lists them in event-stream order.
{a b1 c}, {a b1 b2 c}, {a b1 b2 b3 c} | {a b2 c}, {a b2 b3 c} | {a b3 c}
followedByAny => {a b1 c}, {a b2 c}, {a b3 c} | {a b1 b2 c}, {a b1 b3 c}, {a b2 b3 c} | {a b1 b2 b3 c} =>
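That is, every combination of b1, b2, and b3 of any size. But why is a b3 c allowed when the strategy is clearly STRICT?
And why does followedBy allow a b2 c and a b3 c? Why do the matches not all start from b1?
There is nothing magical about it: the fine print in the official documentation says so. What it means is that a and b are linked with followedByAny:
begin(a).followedByAny(b).oneOrMore().consecutive().followedBy(c)
In other words, the looping pattern B is itself consecutive, but the relationship between a and b+ is followedByAny.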
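The six-element and seven-element result sets listed above can be reproduced by plain enumeration over b1, b2, b3. The sketch below is not how the NFA computes matches; it assumes the d events have already been dropped, and both helper names are hypothetical. It only shows that the shorter list is exactly the contiguous runs of b events, while the longer one is every non-empty, order-preserving selection of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LoopCombinationSketch {
    /** Contiguous runs of the b events: the six-element result set. */
    static List<String> contiguousRuns(List<String> bs) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < bs.size(); start++) {
            for (int end = start; end < bs.size(); end++) {
                out.add("a " + String.join(" ", bs.subList(start, end + 1)) + " c");
            }
        }
        return out;
    }

    /** Every non-empty selection of b events, order preserved: the seven-element result set. */
    static List<String> allCombinations(List<String> bs) {
        List<String> out = new ArrayList<>();
        for (int mask = 1; mask < (1 << bs.size()); mask++) {
            List<String> picked = new ArrayList<>();
            for (int i = 0; i < bs.size(); i++) {
                if ((mask & (1 << i)) != 0) {
                    picked.add(bs.get(i));
                }
            }
            out.add("a " + String.join(" ", picked) + " c");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> bs = Arrays.asList("b1", "b2", "b3");
        System.out.println(contiguousRuns(bs).size());  // 6
        System.out.println(allCombinations(bs).size()); // 7
        // The only difference is the selection that skips b2.
        System.out.println(allCombinations(bs).contains("a b1 b3 c")); // true
        System.out.println(contiguousRuns(bs).contains("a b1 b3 c"));  // false
    }
}
```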
Let's look at the source code to show that I am not making this up.
When NFACompiler works with the ConsumingStrategy, it makes the following decision:
//org.apache.flink.cep.nfa.compiler.NFACompiler
private IterativeCondition getInnerIgnoreCondition(Pattern pattern) {
Quantifier.ConsumingStrategy consumingStrategy =
pattern.getQuantifier().getInnerConsumingStrategy();
if (headOfGroup(pattern)) {
// for the head pattern of a group pattern, we should consider the
// inner consume strategy of the group pattern
consumingStrategy = currentGroupPattern.getQuantifier().getInnerConsumingStrategy();
}
IterativeCondition innerIgnoreCondition = null;
switch (consumingStrategy) {
case STRICT:
innerIgnoreCondition = null;
break;
case SKIP_TILL_NEXT:
innerIgnoreCondition =
new RichNotCondition<>((IterativeCondition) pattern.getCondition());
break;
case SKIP_TILL_ANY:
innerIgnoreCondition = BooleanConditions.trueFunction();
break;
}
if (currentGroupPattern != null && currentGroupPattern.getUntilCondition() != null) {
innerIgnoreCondition =
extendWithUntilCondition(
innerIgnoreCondition,
(IterativeCondition) currentGroupPattern.getUntilCondition(),
false);
}
return innerIgnoreCondition;
}
/**
* @return The {@link IterativeCondition condition} for the {@code IGNORE} edge that
* corresponds to the specified {@link Pattern} and extended with stop(until) condition
* if necessary. For more on strategy see {@link Quantifier}
*/
@SuppressWarnings("unchecked")
private IterativeCondition getIgnoreCondition(Pattern pattern) {
Quantifier.ConsumingStrategy consumingStrategy =
pattern.getQuantifier().getConsumingStrategy();
if (headOfGroup(pattern)) {
// for the head pattern of a group pattern, we should consider the inner consume
// strategy
// of the group pattern if the group pattern is not the head of the TIMES/LOOPING
// quantifier;
// otherwise, we should consider the consume strategy of the group pattern
if (isCurrentGroupPatternFirstOfLoop()) {
consumingStrategy = currentGroupPattern.getQuantifier().getConsumingStrategy();
} else {
consumingStrategy =
currentGroupPattern.getQuantifier().getInnerConsumingStrategy();
}
}
IterativeCondition ignoreCondition = null;
switch (consumingStrategy) {
case STRICT:
ignoreCondition = null;
break;
case SKIP_TILL_NEXT:
ignoreCondition =
new RichNotCondition<>((IterativeCondition) pattern.getCondition());
break;
case SKIP_TILL_ANY:
ignoreCondition = BooleanConditions.trueFunction();
break;
}
if (currentGroupPattern != null && currentGroupPattern.getUntilCondition() != null) {
ignoreCondition =
extendWithUntilCondition(
ignoreCondition,
(IterativeCondition) currentGroupPattern.getUntilCondition(),
false);
}
return ignoreCondition;
}
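As I read these two methods, the link into a looping pattern is governed by the pattern's ConsumingStrategy (getIgnoreCondition), while the events taken inside the loop are governed by its InnerConsumingStrategy (getInnerIgnoreCondition). The inner strategy is defined like this: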
//org.apache.flink.cep.pattern.Quantifier
...
private ConsumingStrategy innerConsumingStrategy = ConsumingStrategy.SKIP_TILL_NEXT;
...
public void combinations() {
checkPattern(
!hasProperty(QuantifierProperty.SINGLE),
"Combinations not applicable to " + this + "!");
checkPattern(
innerConsumingStrategy != ConsumingStrategy.STRICT,
"You can apply either combinations or consecutive, not both!");
checkPattern(
innerConsumingStrategy != ConsumingStrategy.SKIP_TILL_ANY,
"Combinations already applied!");
innerConsumingStrategy = ConsumingStrategy.SKIP_TILL_ANY;
}
public void consecutive() {
checkPattern(
hasProperty(QuantifierProperty.LOOPING) || hasProperty(QuantifierProperty.TIMES),
"Consecutive not applicable to " + this + "!");
checkPattern(
innerConsumingStrategy != ConsumingStrategy.SKIP_TILL_ANY,
"You can apply either combinations or consecutive, not both!");
checkPattern(
innerConsumingStrategy != ConsumingStrategy.STRICT, "Consecutive already applied!");
innerConsumingStrategy = ConsumingStrategy.STRICT;
}
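So, for a looping pattern: the inner strategy defaults to SKIP_TILL_NEXT, consecutive() switches it to STRICT, and combinations() switches it to SKIP_TILL_ANY; the two cannot be applied together.
The NFACompiler code above also explains the difference between the three levels: STRICT has no ignore condition, so nothing may be skipped; SKIP_TILL_NEXT ignores only events that do not satisfy the pattern's condition; SKIP_TILL_ANY may ignore any event, including matching ones.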
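The switch statement in the two NFACompiler methods boils down to a small mapping from strategy to "which events may be skipped". Here is a reduced sketch of that mapping using java.util.function.Predicate in place of IterativeCondition. The ignoreCondition helper is a hypothetical simplification that leaves out the group-pattern and until handling.

```java
import java.util.function.Predicate;

final class IgnoreConditionSketch {
    enum ConsumingStrategy { STRICT, SKIP_TILL_NEXT, SKIP_TILL_ANY }

    /**
     * Returns the predicate that decides which events may be skipped,
     * or null when no event may be skipped at all.
     */
    static <T> Predicate<T> ignoreCondition(
            ConsumingStrategy strategy, Predicate<T> takeCondition) {
        switch (strategy) {
            case SKIP_TILL_NEXT:
                return takeCondition.negate(); // skip only events that do not match
            case SKIP_TILL_ANY:
                return event -> true;          // any event may be skipped, even a matching one
            case STRICT:
            default:
                return null;                   // no ignore edge: nothing may be skipped
        }
    }

    public static void main(String[] args) {
        Predicate<String> isB = e -> e.startsWith("b");
        System.out.println(ignoreCondition(ConsumingStrategy.STRICT, isB) == null);      // true
        System.out.println(ignoreCondition(ConsumingStrategy.SKIP_TILL_NEXT, isB).test("d1")); // true
        System.out.println(ignoreCondition(ConsumingStrategy.SKIP_TILL_NEXT, isB).test("b1")); // false
        System.out.println(ignoreCondition(ConsumingStrategy.SKIP_TILL_ANY, isB).test("b1"));  // true
    }
}
```

The last line is what makes SKIP_TILL_ANY non-deterministic: a matching event can be both taken and skipped, so one input can branch into several matches.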
But we cannot wait forever. Can a time limit be set? The Pattern class has a within method:
public Pattern within(Time windowTime) {
return within(windowTime, WithinType.FIRST_AND_LAST);
}
/**
* Defines the maximum time interval in which a matching pattern has to be completed in order to
* be considered valid. This interval corresponds to the maximum time gap between events.
*
* @param withinType Type of the within interval between events
* @param windowTime Time of the matching window
* @return The same pattern operator with the new window length
*/
public Pattern within(Time windowTime, WithinType withinType) {
if (windowTime != null) {
windowTimes.put(withinType, windowTime);
}
return this;
}
The within(Time.seconds(10)) shown on the official site therefore corresponds to WithinType.FIRST_AND_LAST.
package org.apache.flink.cep.pattern;
/** Type enum of time interval corresponds to the maximum time gap between events. */
public enum WithinType {
// Interval corresponds to the maximum time gap between the previous and current event.
PREVIOUS_AND_CURRENT,
// Interval corresponds to the maximum time gap between the first and last event.
FIRST_AND_LAST;
}
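The comments in the code are clear enough: PREVIOUS_AND_CURRENT limits the gap between the previous event and the current one, while FIRST_AND_LAST limits the gap between the first and the last event of the match.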
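The difference between the two interval types is easy to see on a list of timestamps. The sketch below is a hypothetical check written for this post, not Flink's timeout logic: timestamps are assumed to be ascending milliseconds, and treating a gap equal to the window as already too long is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;

final class WithinSketch {
    enum WithinType { PREVIOUS_AND_CURRENT, FIRST_AND_LAST }

    /** Checks whether ascending event timestamps (ms) stay inside the window. */
    static boolean withinWindow(List<Long> timestamps, long windowMillis, WithinType type) {
        if (timestamps.size() < 2) {
            return true;
        }
        if (type == WithinType.FIRST_AND_LAST) {
            long span = timestamps.get(timestamps.size() - 1) - timestamps.get(0);
            return span < windowMillis; // assumption: a span equal to the window is too long
        }
        for (int i = 1; i < timestamps.size(); i++) {
            long gap = timestamps.get(i) - timestamps.get(i - 1);
            if (gap >= windowMillis) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Long> timestamps = Arrays.asList(0L, 4000L, 8000L, 12000L);
        // Each gap is 4 s, but the whole match spans 12 s.
        System.out.println(withinWindow(timestamps, 10_000L, WithinType.PREVIOUS_AND_CURRENT)); // true
        System.out.println(withinWindow(timestamps, 10_000L, WithinType.FIRST_AND_LAST));       // false
    }
}
```

So a long sequence of closely spaced events can pass PREVIOUS_AND_CURRENT while failing the default FIRST_AND_LAST.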
To be rigorous, let's confirm that an individual Pattern and a pattern group do not differ with respect to within:
public class GroupPattern extends Pattern {
/** Group pattern representing the pattern definition of this group. */
private final Pattern groupPattern;
GroupPattern(
final Pattern previous,
final Pattern groupPattern,
final Quantifier.ConsumingStrategy consumingStrategy,
final AfterMatchSkipStrategy afterMatchSkipStrategy) {
super("GroupPattern", previous, consumingStrategy, afterMatchSkipStrategy);
this.groupPattern = groupPattern;
}
@Override
public Pattern where(IterativeCondition condition) {
throw new UnsupportedOperationException("GroupPattern does not support where clause.");
}
@Override
public Pattern or(IterativeCondition condition) {
throw new UnsupportedOperationException("GroupPattern does not support or clause.");
}
@Override
public Pattern subtype(final Class subtypeClass) {
throw new UnsupportedOperationException("GroupPattern does not support subtype clause.");
}
public Pattern getRawPattern() {
return groupPattern;
}
}
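Clearly, GroupPattern is just a wrapper class: it does not override within, and only rejects where, or, and subtype.
There is one more question: does AfterMatchSkipStrategy apply differently to individual patterns and to pattern groups?
Reading the Pattern class carefully gives this conclusion: AfterMatchSkipStrategy can only be set when the initial pattern is defined. The field is final, so once set there is no way to change it, and however individual patterns are linked into a pattern group, each new pattern simply inherits the same afterMatchSkipStrategy.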
private final AfterMatchSkipStrategy afterMatchSkipStrategy;
protected Pattern(
final String name,
final Pattern previous,
final ConsumingStrategy consumingStrategy,
final AfterMatchSkipStrategy afterMatchSkipStrategy) {
this.name = name;
this.previous = previous;
this.quantifier = Quantifier.one(consumingStrategy);
this.afterMatchSkipStrategy = afterMatchSkipStrategy;
}
...
public static Pattern begin(
final String name, final AfterMatchSkipStrategy afterMatchSkipStrategy) {
return new Pattern(name, null, ConsumingStrategy.STRICT, afterMatchSkipStrategy);
}
...
public Pattern next(final String name) {
return new Pattern<>(name, this, ConsumingStrategy.STRICT, afterMatchSkipStrategy);
}
/**
* Appends a new pattern to the existing one. The new pattern enforces that there is no event
* matching this pattern right after the preceding matched event.
*
* @param name Name of the new pattern
* @return A new pattern which is appended to this one
*/
public Pattern notNext(final String name) {
if (quantifier.hasProperty(Quantifier.QuantifierProperty.OPTIONAL)) {
throw new UnsupportedOperationException(
"Specifying a pattern with an optional path to NOT condition is not supported yet. "
+ "You can simulate such pattern with two independent patterns, one with and the other without "
+ "the optional part.");
}
return new Pattern<>(name, this, ConsumingStrategy.NOT_NEXT, afterMatchSkipStrategy);
}
...
public Pattern followedBy(final String name) {
return new Pattern<>(name, this, ConsumingStrategy.SKIP_TILL_NEXT, afterMatchSkipStrategy);
}
/**
* Appends a new pattern to the existing one. The new pattern enforces that there is no event
* matching this pattern between the preceding pattern and succeeding this one.
*
* NOTE: There has to be other pattern after this one.
*
* @param name Name of the new pattern
* @return A new pattern which is appended to this one
*/
public Pattern notFollowedBy(final String name) {
if (quantifier.hasProperty(Quantifier.QuantifierProperty.OPTIONAL)) {
throw new UnsupportedOperationException(
"Specifying a pattern with an optional path to NOT condition is not supported yet. "
+ "You can simulate such pattern with two independent patterns, one with and the other without "
+ "the optional part.");
}
return new Pattern<>(name, this, ConsumingStrategy.NOT_FOLLOW, afterMatchSkipStrategy);
}
...
/**
* Appends a new pattern to the existing one. The new pattern enforces non-strict temporal
* contiguity. This means that a matching event of this pattern and the preceding matching event
* might be interleaved with other events which are ignored.
*
* @param name Name of the new pattern
* @return A new pattern which is appended to this one
*/
public Pattern followedByAny(final String name) {
return new Pattern<>(name, this, ConsumingStrategy.SKIP_TILL_ANY, afterMatchSkipStrategy);
}
As expected, there is no notFollowedByAny and no notUntil, and both NOT methods state explicitly that a NOT link cannot follow an optional pattern.
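The way afterMatchSkipStrategy is handed down through every linking method can be reduced to a few lines. MiniPattern below is a hypothetical, heavily cut-down stand-in for Pattern, and the strategy is a plain string placeholder rather than a real AfterMatchSkipStrategy object.

```java
final class SkipStrategyPropagationSketch {
    /** Hypothetical, heavily reduced stand-in for a pattern node. */
    static final class MiniPattern {
        final String name;
        final MiniPattern previous;
        final String afterMatchSkipStrategy; // final: fixed at construction, no setter

        private MiniPattern(String name, MiniPattern previous, String strategy) {
            this.name = name;
            this.previous = previous;
            this.afterMatchSkipStrategy = strategy;
        }

        /** The only place where a strategy can be chosen. */
        static MiniPattern begin(String name, String strategy) {
            return new MiniPattern(name, null, strategy);
        }

        /** Every linking method copies the strategy of the pattern it is called on. */
        MiniPattern next(String name) {
            return new MiniPattern(name, this, afterMatchSkipStrategy);
        }

        MiniPattern followedBy(String name) {
            return new MiniPattern(name, this, afterMatchSkipStrategy);
        }
    }

    public static void main(String[] args) {
        MiniPattern last =
                MiniPattern.begin("a", "skipPastLastEvent").followedBy("b").next("c");
        System.out.println(last.name + " -> " + last.afterMatchSkipStrategy); // c -> skipPastLastEvent
    }
}
```

Since each linking call creates a new node from the previous node's value, the whole chain ends up sharing whatever was passed to begin.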
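That wraps up the definition, linking, and strategy configuration of individual patterns and pattern groups. In the next post I will dig deeper into how FlinkCEP actually executes pattern matching, along with more of the technical details.
(Bookmarks are appreciated.)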