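FlinkCEP (Complex Event Processing for Flink)
FlinkCEP is a complex event processing library implemented on top of Flink. It lets you detect specific event patterns in an unbounded stream, so that you can get hold of the important part of your data.
CEP is a technique for analyzing event streams in a dynamic environment. Events here are usually meaningful state changes. By analyzing the relationships between events with techniques such as filtering, correlation, and aggregation, you define detection rules based on the temporal and aggregation relationships between events, continuously query the stream for event sequences that satisfy those rules, and finally derive more complex composite events.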
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-cep_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
</dependency>
import com.atguigu.flink.java.chapter_5.WaterSensor;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import java.time.Duration;
import java.util.List;
import java.util.Map;
public class Flink01_CEP_BasicUse {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(2);
        SingleOutputStreamOperator<WaterSensor> waterSensorStream = env
            .readTextFile("input/sensor.txt")
            .map(new MapFunction<String, WaterSensor>() {
                @Override
                public WaterSensor map(String value) throws Exception {
                    // Each line has the form: id,timestampInSeconds,vc
                    String[] split = value.split(",");
                    return new WaterSensor(split[0],
                                           Long.parseLong(split[1]) * 1000,
                                           Integer.parseInt(split[2]));
                }
            })
            .assignTimestampsAndWatermarks(WatermarkStrategy
                .<WaterSensor>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withTimestampAssigner((element, recordTimestamp) -> element.getTs()));
        // 1. Define the pattern
        Pattern<WaterSensor, WaterSensor> pattern = Pattern
            .<WaterSensor>begin("start")
            .where(new SimpleCondition<WaterSensor>() {
                @Override
                public boolean filter(WaterSensor value) throws Exception {
                    return "sensor_1".equals(value.getId());
                }
            });
        // 2. Apply the pattern to the stream
        PatternStream<WaterSensor> waterSensorPS = CEP.pattern(waterSensorStream, pattern);
        // 3. Get the matched results
        waterSensorPS
            .select(new PatternSelectFunction<WaterSensor, String>() {
                @Override
                public String select(Map<String, List<WaterSensor>> pattern) throws Exception {
                    return pattern.toString();
                }
            })
            .print();
        try {
            env.execute();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Contents of sensor.txt:
sensor_1,1,10
sensor_1,2,20
sensor_2,3,30
sensor_1,4,40
sensor_2,5,50
sensor_1,6,60
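The Pattern API lets you define the complex pattern sequences you want to extract from the input stream.
A few concepts:
An individual pattern is either a singleton pattern or a looping pattern.
A singleton pattern accepts exactly one event. By default, every pattern is a singleton pattern. The earlier example is a singleton pattern.
A looping pattern can accept more than one event.
A singleton pattern combined with a quantifier becomes a looping pattern (much like regular expressions).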
// 1. Define the pattern
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .<WaterSensor>begin("start")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_1".equals(value.getId());
        }
    });
// 1.1 Quantifier: exactly 2 occurrences
Pattern<WaterSensor, WaterSensor> exactlyTwice = pattern.times(2);
// 1.2 Quantifier: range [2,4], i.e. 2, 3, or 4 occurrences
Pattern<WaterSensor, WaterSensor> twoToFour = pattern.times(2, 4);
// 1.3 Quantifier: 1 or more occurrences
Pattern<WaterSensor, WaterSensor> atLeastOnce = pattern.oneOrMore();
// 1.4 Quantifier: 2 or more occurrences
Pattern<WaterSensor, WaterSensor> atLeastTwice = pattern.timesOrMore(2);
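Because quantifiers behave much like regular-expression quantifiers, the analogy can be made concrete with java.util.regex. The following is only an illustrative sketch and does not use FlinkCEP. It assumes every event accepted by the "start" condition is written as the letter a, and the class name QuantifierAnalogy is hypothetical.

```java
import java.util.regex.Pattern;

// Illustration only: models CEP quantifiers with regex quantifiers.
// Assumption: every event accepted by the "start" condition is written as 'a'.
final class QuantifierAnalogy {
    // Returns true if the whole run of accepted events satisfies the regex quantifier.
    static boolean accepts(String regex, String acceptedEvents) {
        return Pattern.matches(regex, acceptedEvents);
    }

    public static void main(String[] args) {
        System.out.println(accepts("a{2}", "aa"));      // analog of times(2):       true
        System.out.println(accepts("a{2,4}", "aaaaa")); // analog of times(2, 4):    false
        System.out.println(accepts("a+", "a"));         // analog of oneOrMore():    true
        System.out.println(accepts("a{2,}", "a"));      // analog of timesOrMore(2): false
    }
}
```

The analogy covers only the counting. How non-matching events between the accepted ones are treated is a separate matter, governed by the contiguity settings.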
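For every pattern you can specify a condition that decides whether an incoming event is accepted into the pattern. The where used earlier is one such condition.
The iterative condition is the most general type of condition. With it you can specify a condition that accepts subsequent events based on properties of previously accepted events, or on a statistic over a subset of them.
// IterativeCondition is in the package org.apache.flink.cep.pattern.conditions
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .<WaterSensor>begin("start")
    .where(new IterativeCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value, Context<WaterSensor> ctx) throws Exception {
            return "sensor_1".equals(value.getId());
        }
    });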
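The iterative condition above never reads its Context argument, so it does not show what deciding "based on previously accepted events" looks like. The sketch below is a rough plain-Java model of that idea, not the FlinkCEP API: a value is accepted only while the sum of the already accepted values plus the new one stays under a limit. The interface HistoryCondition, the class IterativeConditionSketch, and the limit of 100 are all hypothetical. In FlinkCEP itself the history is read through the Context, for example with ctx.getEventsForPattern(patternName).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IterativeConditionSketch {
    // Hypothetical stand-in for a condition that can see previously accepted events.
    interface HistoryCondition<T> {
        boolean filter(T value, List<T> accepted);
    }

    // Accepts a value only if it plus the sum of all accepted values is below the limit.
    static HistoryCondition<Integer> sumBelow(int limit) {
        return (value, accepted) -> {
            int sum = value;
            for (int previous : accepted) {
                sum += previous;
            }
            return sum < limit;
        };
    }

    // Feeds values through the condition; accepted values become history for later ones.
    static List<Integer> accept(List<Integer> values, HistoryCondition<Integer> condition) {
        List<Integer> accepted = new ArrayList<>();
        for (Integer value : values) {
            if (condition.filter(value, accepted)) {
                accepted.add(value);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // 60 is rejected because 10 + 20 + 40 + 60 would reach 130.
        System.out.println(accept(Arrays.asList(10, 20, 40, 60, 5), sumBelow(100)));
    }
}
```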
// A simple condition decides using only the current event
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .<WaterSensor>begin("start")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            System.out.println(value);
            return "sensor_1".equals(value.getId());
        }
    });
Chaining where() calls combines the conditions with a logical AND. To combine conditions with OR, use the or() method as shown below.
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .<WaterSensor>begin("start")
    .where(new IterativeCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value, Context<WaterSensor> ctx) throws Exception {
            return "sensor_1".equals(value.getId());
        }
    })
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return value.getVc() > 30;
        }
    })
    .or(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return value.getTs() > 3000;
        }
    });
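How the three chained conditions group is easy to misread. The sketch below assumes the calls are applied left to right, so that where(A).where(B).or(C) behaves like (A AND B) OR C, and reproduces that grouping with java.util.function.Predicate rather than FlinkCEP. The class ConditionGrouping and its Reading type (a stand-in for WaterSensor) are hypothetical.

```java
import java.util.function.Predicate;

final class ConditionGrouping {
    // Hypothetical stand-in for WaterSensor with the same three fields.
    static final class Reading {
        final String id;
        final long ts;
        final int vc;

        Reading(String id, long ts, int vc) {
            this.id = id;
            this.ts = ts;
            this.vc = vc;
        }
    }

    static final Predicate<Reading> IS_SENSOR_1 = r -> "sensor_1".equals(r.id);
    static final Predicate<Reading> VC_OVER_30 = r -> r.vc > 30;
    static final Predicate<Reading> TS_OVER_3000 = r -> r.ts > 3000;

    // Mirrors where(A).where(B).or(C), assumed to mean (A AND B) OR C.
    static final Predicate<Reading> COMBINED = IS_SENSOR_1.and(VC_OVER_30).or(TS_OVER_3000);

    public static void main(String[] args) {
        System.out.println(COMBINED.test(new Reading("sensor_1", 1000, 10))); // false
        System.out.println(COMBINED.test(new Reading("sensor_1", 2000, 40))); // true
        System.out.println(COMBINED.test(new Reading("sensor_2", 5000, 50))); // true
    }
}
```

Under this grouping the last reading is accepted even though it comes from sensor_2, because the OR branch on the timestamp alone is enough.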
// until() sets a stop condition for a looping pattern:
// once an event satisfies it, no more events are accepted into the pattern
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .<WaterSensor>begin("start")
    .where(new IterativeCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value, Context<WaterSensor> ctx) throws Exception {
            return "sensor_1".equals(value.getId());
        }
    })
    .timesOrMore(2)
    .until(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return value.getVc() >= 40;
        }
    });
Combining several individual patterns gives a combined pattern (a pattern sequence). A combined pattern starts with an initial pattern (.begin(…)).
Strict contiguity (next): expects all matching events to appear strictly one after another, with no non-matching events in between.
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .<WaterSensor>begin("start")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_1".equals(value.getId());
        }
    })
    .next("end")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_2".equals(value.getId());
        }
    });
Note: use notNext if you do not want a particular event to directly follow another.
Relaxed contiguity (followedBy): ignores non-matching events that occur between matching events.
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .<WaterSensor>begin("start")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_1".equals(value.getId());
        }
    })
    .followedBy("end")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_2".equals(value.getId());
        }
    });
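Note: use notFollowedBy if you do not want a particular event to occur anywhere between two other events. (notFollowedBy cannot be the last pattern in a sequence.)
Non-deterministic relaxed contiguity (followedByAny): relaxes contiguity further by allowing additional matches that skip over some matching events.
For example, given the input a, c, b, b, followedBy produces one match, {a, b1}, while followedByAny produces two, {a, b1} and {a, b2}, where b1 and b2 denote the first and second b.
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .<WaterSensor>begin("start")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_1".equals(value.getId());
        }
    })
    .followedByAny("end")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_2".equals(value.getId());
        }
    });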
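The difference between next, followedBy, and followedByAny can be checked on the input a, c, b, b with a small plain-Java model. This is a simplified sketch, not FlinkCEP's matching engine: it handles only a two-step pattern, ignores time and skip strategies, and the class ContiguitySketch and its Mode enum are hypothetical names. Each match is reported as the index pair of the two events.

```java
import java.util.ArrayList;
import java.util.List;

final class ContiguitySketch {
    enum Mode { NEXT, FOLLOWED_BY, FOLLOWED_BY_ANY }

    // Finds matches of the two-step pattern "first then second" in events.
    // Each match is reported as "firstIndex-secondIndex".
    static List<String> match(String[] events, String first, String second, Mode mode) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < events.length; i++) {
            if (!events[i].equals(first)) {
                continue;
            }
            for (int j = i + 1; j < events.length; j++) {
                boolean hit = events[j].equals(second);
                if (hit) {
                    matches.add(i + "-" + j);
                }
                // NEXT inspects only the immediate successor;
                // FOLLOWED_BY stops at the first hit; FOLLOWED_BY_ANY keeps scanning.
                if (mode == Mode.NEXT || (hit && mode == Mode.FOLLOWED_BY)) {
                    break;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] events = {"a", "c", "b", "b"};
        System.out.println(match(events, "a", "b", Mode.NEXT));            // []
        System.out.println(match(events, "a", "b", Mode.FOLLOWED_BY));     // [0-2]
        System.out.println(match(events, "a", "b", Mode.FOLLOWED_BY_ANY)); // [0-2, 0-3]
    }
}
```

In this model next finds nothing because c sits directly after a, followedBy skips c and stops at the first b, and followedByAny also reports the second b.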
The contiguity options above can also be applied within a single looping pattern. Contiguity is then applied between the events accepted into that pattern.
// Default for looping patterns: relaxed contiguity between accepted events
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .<WaterSensor>begin("start")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_1".equals(value.getId());
        }
    })
    .times(2);
// consecutive(): strict contiguity between accepted events
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .<WaterSensor>begin("start")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_1".equals(value.getId());
        }
    })
    .times(2)
    .consecutive();
// allowCombinations(): non-deterministic relaxed contiguity between accepted events
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .<WaterSensor>begin("start")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_1".equals(value.getId());
        }
    })
    .times(2)
    .allowCombinations();
In a combined pattern, greedy makes a quantified pattern take as many repetitions as possible. It takes effect when an event satisfies two patterns at the same time.
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .<WaterSensor>begin("start")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_1".equals(value.getId());
        }
    }).times(2, 3).greedy()
    .next("end")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return value.getVc() == 30;
        }
    });
Data:
sensor_1,1,10
sensor_1,2,20
sensor_1,3,30
sensor_2,4,30
sensor_1,4,400
sensor_2,5,50
sensor_2,6,60
Result:
{start=[WaterSensor(id=sensor_1, ts=1000, vc=10), WaterSensor(id=sensor_1, ts=2000, vc=20), WaterSensor(id=sensor_1, ts=3000, vc=30)], end=[WaterSensor(id=sensor_2, ts=4000, vc=30)]}
{start=[WaterSensor(id=sensor_1, ts=2000, vc=20), WaterSensor(id=sensor_1, ts=3000, vc=30)], end=[WaterSensor(id=sensor_2, ts=4000, vc=30)]}
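Analysis:
The event sensor_1,3,30 can match both the first pattern and the second pattern. Because the first pattern has a quantifier and is greedy, the event is matched to the first pattern first, since greedy takes as many repetitions as possible.
Note:
Greedy produces fewer results than non-greedy.
You can use pattern.optional() to make any pattern optional, whether or not it is a looping pattern.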
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .<WaterSensor>begin("start")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_1".equals(value.getId());
        }
    }).times(2).optional() // 0 or 2 times
    .next("end")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_2".equals(value.getId());
        }
    });
Note:
The start pattern may be absent from a match.
In the code so far, a quantifier applies only to a single pattern. For example, in .begin(…).where(…).next(…).where(…).times(2) the quantifier applies only to the next pattern, not to the begin pattern.
To apply a quantifier across several patterns, use a pattern group.
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .begin(Pattern
        .<WaterSensor>begin("start")
        .where(new SimpleCondition<WaterSensor>() {
            @Override
            public boolean filter(WaterSensor value) throws Exception {
                return "sensor_1".equals(value.getId());
            }
        })
        .next("next")
        .where(new SimpleCondition<WaterSensor>() {
            @Override
            public boolean filter(WaterSensor value) throws Exception {
                return "sensor_2".equals(value.getId());
            }
        }))
    .times(2);
Result: sensor_1, sensor_2, sensor_1, sensor_2
Once a window length is set on a pattern with within, partially matched event sequences may be discarded because they exceed the window length.
// Time here is org.apache.flink.streaming.api.windowing.time.Time
Pattern<WaterSensor, WaterSensor> pattern = Pattern
    .<WaterSensor>begin("start")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_1".equals(value.getId());
        }
    })
    .next("end")
    .where(new SimpleCondition<WaterSensor>() {
        @Override
        public boolean filter(WaterSensor value) throws Exception {
            return "sensor_2".equals(value.getId());
        }
    })
    .within(Time.seconds(2));
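The effect of the window can be pictured with a small plain-Java model of the pattern above (sensor_1 directly followed by sensor_2, within 2 seconds). This is a sketch under stated assumptions, not FlinkCEP: it ignores watermarks and out-of-order events, and it assumes a pair is kept only when the gap between its two timestamps is strictly smaller than the window. The class WithinSketch and the sample timestamps are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class WithinSketch {
    // Pairs each "sensor_1" event with an immediately following "sensor_2" event,
    // keeping the pair only if it fits the window (assumed rule: gap < windowMs).
    static List<String> match(String[] ids, long[] ts, long windowMs) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i + 1 < ids.length; i++) {
            boolean sequence = "sensor_1".equals(ids[i]) && "sensor_2".equals(ids[i + 1]);
            if (sequence && ts[i + 1] - ts[i] < windowMs) {
                matches.add(ts[i] + "->" + ts[i + 1]);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] ids = {"sensor_1", "sensor_2", "sensor_1", "sensor_2"};
        long[] ts = {1000, 2000, 3000, 8000};
        // The second pair is 5 seconds apart, so the 2-second window drops it.
        System.out.println(match(ids, ts, 2000)); // [1000->2000]
    }
}
```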
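For a given pattern, the same event may be assigned to multiple successful matches. To control how many matches an event is assigned to, specify a skip strategy, AfterMatchSkipStrategy. There are five skip strategies: NO_SKIP, SKIP_TO_NEXT, SKIP_PAST_LAST_EVENT, SKIP_TO_FIRST, and SKIP_TO_LAST. The strategy is passed in when the pattern is created: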
AfterMatchSkipStrategy skipStrategy = ...
Pattern.begin("patternName", skipStrategy);
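To make the idea of a skip strategy concrete, the sketch below models two of them in plain Java: keeping every match, and discarding matches that start before the previously emitted match has ended. It is an illustration, not FlinkCEP code. Matches are represented as hypothetical {startIndex, endIndex} pairs, the class SkipStrategySketch is made up, and the sample assumes a pattern like "b+ c" over the input b1, b2, b3, c, which yields three overlapping matches when nothing is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SkipStrategySketch {
    // Model of "no skip": every match is emitted unchanged.
    static List<int[]> noSkip(List<int[]> matches) {
        return new ArrayList<>(matches);
    }

    // Model of "skip past last event": after emitting a match, drop every match
    // that starts at or before the last event of the emitted one.
    // Assumes matches are {startIndex, endIndex} pairs ordered by startIndex.
    static List<int[]> skipPastLastEvent(List<int[]> matches) {
        List<int[]> kept = new ArrayList<>();
        int lastEnd = -1;
        for (int[] match : matches) {
            if (match[0] > lastEnd) {
                kept.add(match);
                lastEnd = match[1];
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // b1 b2 b3 c, b2 b3 c, and b3 c, written as index ranges over the input.
        List<int[]> all = Arrays.asList(new int[]{0, 3}, new int[]{1, 3}, new int[]{2, 3});
        System.out.println(noSkip(all).size());            // 3
        System.out.println(skipPastLastEvent(all).size()); // 1
    }
}
```

In FlinkCEP the strategies are obtained from static factory methods on AfterMatchSkipStrategy, such as noSkip() and skipPastLastEvent(), and the resulting object takes the place of skipStrategy in the call above.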