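1. CEP
CEP (complex event processing): one or more streams of simple events are matched against a set of rules, and the output is the data the user is after, namely the complex events that satisfy those rules.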
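CEP supports pattern matching on streams. Depending on the pattern's conditions, matching is either contiguous or non-contiguous. A pattern may also carry a time constraint: if the conditions are not satisfied within that time range, the pattern match times out.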
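The difference between contiguous and non-contiguous matching can be pictured without Flink. The following is a minimal plain-Scala sketch, not Flink's implementation: `ContiguitySketch`, `strictMatches`, and `relaxedMatches` are hypothetical names, and "relaxed" is assumed to mean that each `a` pairs with the first later `b`, skipping anything in between.

```scala
// Plain-Scala sketch (no Flink): find "a then b" under two contiguity rules.
// All names here are hypothetical and exist only for illustration.
object ContiguitySketch {
  // Strict contiguity: b must be the very next event after a.
  def strictMatches(events: List[String], a: String, b: String): List[(Int, Int)] =
    events.zipWithIndex.sliding(2).collect {
      case List((x, i), (y, j)) if x == a && y == b => (i, j)
    }.toList

  // Relaxed contiguity: each a pairs with the first later b;
  // unrelated events in between are skipped.
  def relaxedMatches(events: List[String], a: String, b: String): List[(Int, Int)] =
    events.zipWithIndex.collect { case (x, i) if x == a => i }.flatMap { i =>
      val j = events.indexOf(b, i + 1)
      if (j >= 0) Some((i, j)) else None
    }
}
```

For the sequence `create, other, pay`, `strictMatches` finds nothing because `other` sits between the two events, while `relaxedMatches` returns the index pair `(0, 2)`.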
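In other words, CEP is pattern matching over the events of a stream. For example: if two consecutive login-failure log entries arrive no more than 2 seconds apart, raise an alert.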
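To make the login rule concrete, here is a small plain-Scala sketch that applies it to an in-memory list instead of a stream. `LoginFailSketch`, `Login`, and `alerts` are hypothetical names, timestamps are assumed to be epoch seconds, and the gap check follows the wording "no more than 2 seconds"; a CEP engine may treat the exact boundary differently.

```scala
// Plain-Scala sketch (no Flink) of "two consecutive login failures within N seconds".
object LoginFailSketch {
  // Hypothetical event type for illustration; ts is epoch seconds.
  final case class Login(userId: String, eventType: String, ts: Long)

  // Returns (userId, firstFailTs, secondFailTs) for every pair of adjacent
  // failures by the same user that are at most maxGapSec seconds apart.
  def alerts(events: List[Login], maxGapSec: Long): List[(String, Long, Long)] =
    events.groupBy(_.userId).toList.sortBy(_._1).flatMap { case (user, evs) =>
      evs.sortBy(_.ts).sliding(2).collect {
        case List(a, b)
            if a.eventType == "fail" && b.eventType == "fail" && b.ts - a.ts <= maxGapSec =>
          (user, a.ts, b.ts)
      }.toList
    }
}
```

Grouping by user first mirrors the role that `keyBy` plays on a real stream: events from different users never form a pair.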
2. CEP workflow
2.1 Obtain the stream
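import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import scala.collection.Map
// eventTime holds epoch seconds as a String
case class LoginEvent(userId: String, ip: String, eventType: String, eventTime: String)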
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.setParallelism(1)
val loginEventStream = env.fromCollection(List(
LoginEvent("1", "192.168.0.1", "fail", "1558430842"),
LoginEvent("1", "192.168.0.2", "fail", "1558430843"),
LoginEvent("1", "192.168.0.3", "fail", "1558430844"),
LoginEvent("2", "192.168.10.10", "success", "1558430845")
)).assignAscendingTimestamps(_.eventTime.toLong * 1000) // eventTime is in seconds; Flink timestamps are in milliseconds
2.2 Define the Pattern
val loginFailPattern = Pattern.begin[LoginEvent]("begin")
.where(_.eventType.equals("fail")) // one failed login
.next("next")
.where(_.eventType.equals("fail")) // the immediately following login event also fails
.within(Time.seconds(2)) // the two events must occur within two seconds
2.3 Apply the Pattern
PatternStream:
General form (an input stream plus the Pattern to match):
val input = ...
val pattern = ...
val patternStream = CEP.pattern(input, pattern)
For the login example:
val patternStream = CEP.pattern(loginEventStream.keyBy(_.userId), loginFailPattern)
2.4 Use select or flatSelect to extract the matching stream
select:
val loginFailDataStream = patternStream
.select((pattern: Map[String, Iterable[LoginEvent]]) => {
val first = pattern.getOrElse("begin", null).iterator.next()
val second = pattern.getOrElse("next", null).iterator.next()
(second.userId, second.ip, second.eventType)
})
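select returns exactly one record per match.
flatSelect: implement PatternFlatSelectFunction to get behavior similar to select. The only difference is that flatSelect can emit multiple records per match.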
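As of Flink 1.8, both are considered outdated; use a process function instead (in the CEP API, a PatternProcessFunction passed to patternStream.process).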
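The "one record per match" versus "several records per match" distinction between select and flatSelect is the same as the one between `map` and `flatMap` on a collection. The sketch below is a plain-Scala analogy, not the Flink API: `SelectVsFlatSelectSketch`, `Match`, `selectLike`, and `flatSelectLike` are hypothetical names, and a match is modeled as a Map from pattern name to the events captured under that name.

```scala
// Plain-Scala analogy (no Flink): select behaves like map, flatSelect like flatMap.
object SelectVsFlatSelectSketch {
  // Hypothetical stand-in for a CEP match: pattern name -> captured events.
  type Match = Map[String, List[String]]

  // select-style: exactly one output per match (here, the first "next" event).
  def selectLike(matches: List[Match]): List[String] =
    matches.map(m => m("next").head)

  // flatSelect-style: zero or more outputs per match (here, every captured event).
  def flatSelectLike(matches: List[Match]): List[String] =
    matches.flatMap(m => m("begin") ++ m("next"))
}
```

With a single match that captured `192.168.0.1` under `begin` and `192.168.0.2` under `next`, `selectLike` yields one element and `flatSelectLike` yields two.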
3. Examples
3.1 Two consecutive login failures by the same user within 2 seconds
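import java.util
import org.apache.flink.cep.PatternSelectFunction
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
// Definitions inferred from how the code below uses them (userId and eventTime are Long here,
// unlike section 2.1); the Warning field names are assumptions.
case class LoginEvent(userId: Long, ip: String, eventType: String, eventTime: Long)
case class Warning(userId: Long, firstFailTime: Long, lastFailTime: Long, warningMsg: String)
object LoginFailWithCep {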
def main(args: Array[String]): Unit = {
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.setParallelism(1)
// Hand-written test data
val loginStream = env.fromCollection( List(
LoginEvent(1, "192.168.0.1", "fail", 1558430842),
LoginEvent(1, "192.168.0.2", "success", 1558430843),
LoginEvent(1, "192.168.0.3", "fail", 1558430844),
LoginEvent(1, "192.168.0.3", "fail", 1558430847),
LoginEvent(1, "192.168.0.3", "fail", 1558430848),
LoginEvent(2, "192.168.10.10", "success", 1558430850)
) )
.assignAscendingTimestamps(_.eventTime * 1000)
// Define the pattern used to match the event stream
val loginFailPattern = Pattern.begin[LoginEvent]("begin")
.where(_.eventType == "fail")
.next("next")
.where(_.eventType == "fail")
.within(Time.seconds(2))
// Apply the pattern to the input stream to obtain the matching pattern stream
val patternStream = CEP.pattern( loginStream.keyBy(_.userId), loginFailPattern )
// Use select to extract the output data stream from the pattern stream
// import scala.collection.Map
// val loginFailDataStream : DataStream[Warning] = patternStream.select( ( patternEvents: Map[String, Iterable[LoginEvent]] ) => {
// // Take the login-failure events out of the Map and wrap them in a Warning
// val firstFailEvent = patternEvents.getOrElse("begin", null).iterator.next()
// val secondFailEvent = patternEvents.getOrElse("next", null).iterator.next()
// Warning( firstFailEvent.userId, firstFailEvent.eventTime, secondFailEvent.eventTime, "login fail warning" )
// } )
val loginFailDataStream = patternStream.select( new MySelectFunction() )
// Send the resulting warning stream to a sink
loginFailDataStream.print("warning")
env.execute("Login Fail Detect with CEP")
}
}
class MySelectFunction() extends PatternSelectFunction[LoginEvent, Warning] {
override def select(patternEvents: util.Map[String, util.List[LoginEvent]]): Warning = {
val firstFailEvent = patternEvents.getOrDefault("begin", null).iterator.next()
val secondFailEvent = patternEvents.getOrDefault("next", null).iterator.next()
Warning(firstFailEvent.userId, firstFailEvent.eventTime, secondFailEvent.eventTime, "login fail warning")
}
}
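3.2 Order not paid within 15 minutes of creation
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
// Input stream: order events
case class OrderEvent( orderId: Long, eventType: String, eventTime: Long )
// Output stream: order processing results
case class OrderResult( orderId: Long, resultMsg: String )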
object OrderTimeout {
def main(args: Array[String]): Unit = {
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
// Read in the order data
val orderEventStream = env.fromCollection(List(
OrderEvent(1, "create", 1558430842),
OrderEvent(2, "create", 1558430843),
OrderEvent(2, "other", 1558430845),
OrderEvent(2, "pay", 1558430850),
OrderEvent(1, "pay", 1558431920)
))
.assignAscendingTimestamps(_.eventTime * 1000)
// Define a time-limited pattern that selects orders that are created and later paid
val orderPayPattern = Pattern.begin[OrderEvent]("begin")
.where(_.eventType == "create")
.followedBy("follow") // relaxed contiguity: other events may occur in between
.where(_.eventType == "pay")
.within(Time.minutes(15))
// Define an output tag that identifies the side output stream
val orderTimeoutOutputTag = OutputTag[OrderResult]("orderTimeout")
// Apply the pattern to the input stream to obtain a pattern stream
val patternStream = CEP.pattern( orderEventStream.keyBy(_.orderId), orderPayPattern )
import scala.collection.Map
// Call select to obtain the combined output stream
val complexResult: DataStream[OrderResult] = patternStream.select(orderTimeoutOutputTag)(
// pattern timeout function
( orderPayEvents: Map[String, Iterable[OrderEvent]], timestamp: Long ) => {
val timeoutOrderId = orderPayEvents.getOrElse("begin", null).iterator.next().orderId
OrderResult( timeoutOrderId, "order time out" )
}
)(
// pattern select function
( orderPayEvents: Map[String, Iterable[OrderEvent]]) => {
val paidOrderId = orderPayEvents.getOrElse("follow", null).iterator.next().orderId
OrderResult( paidOrderId, "order paid successfully" )
}
)
// Stream of orders that were paid in time
complexResult.print("paid")
// Get the side output stream from the combined output stream
val timeoutResult = complexResult.getSideOutput( orderTimeoutOutputTag )
timeoutResult.print("timeout")
env.execute("Order Timeout Detect")
}
}
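The outcome expected from the test data in 3.2 can be reasoned through without Flink. The following plain-Scala sketch classifies each order as paid or timed out; `OrderTimeoutSketch`, `Order`, and `classify` are hypothetical names, timestamps are epoch seconds, and the sketch ignores watermarks, so it shows only the matching rule and not when Flink would actually emit a timeout.

```scala
// Plain-Scala sketch (no Flink) of "create followed by pay within a time limit".
object OrderTimeoutSketch {
  // Hypothetical event type for illustration; ts is epoch seconds.
  final case class Order(orderId: Long, eventType: String, ts: Long)

  // For each order that has a "create" event: "paid" if a "pay" event follows
  // within limitSec seconds, otherwise "timeout". Other event types are ignored.
  def classify(events: List[Order], limitSec: Long): Map[Long, String] =
    events.groupBy(_.orderId).toList.flatMap { case (id, evs) =>
      val sorted = evs.sortBy(_.ts)
      sorted.find(_.eventType == "create").map { c =>
        val paidInTime = sorted.exists { e =>
          e.eventType == "pay" && e.ts >= c.ts && e.ts - c.ts < limitSec
        }
        id -> (if (paidInTime) "paid" else "timeout")
      }
    }.toMap
}
```

With the five events from 3.2 and a limit of 900 seconds, order 2 is paid 7 seconds after creation and is classified as paid, while order 1 is paid 1078 seconds after creation and is classified as a timeout.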