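CEP Pattern Example in Detail
Pattern detection
Once you have defined the pattern sequence to look for, you can apply it to an input stream to detect potential matches.
Calling CEP.pattern() with the input stream and the pattern returns a PatternStream.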
val input : DataStream[Event] = ...
val pattern : Pattern[Event, _] = ...
val patternStream : PatternStream[Event] = CEP.pattern(input,pattern)
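Once the PatternStream exists, you can apply the select or flatSelect method to extract events from the detected event sequences.
The select() method takes a select function as its argument, and that function is called once for every successfully matched event sequence.
The select function receives the matched sequence as a Map[String, Iterable[IN]], where each key is the name of a pattern and each value is an Iterable of all the events accepted by that pattern.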
def selectFn(pattern : Map[String,Iterable[IN]]): OUT = {
  // Each value is an Iterable (which has no next method), so take its first element with head
  val startEvent = pattern("start").head
  val endEvent = pattern("end").head
  OUT(startEvent,endEvent)
}
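To make the shape of the select function's argument concrete, here is a minimal sketch in plain Scala with no Flink dependency. The Event case class and the hand-built map are hypothetical stand-ins for what CEP would pass in for a single match; only the pattern names "start" and "end" come from the snippet above.

```scala
// Hypothetical event type, introduced only for this sketch.
case class Event(id: Int, name: String)

// A select function: receives the matched events grouped by pattern name
// and builds one output value per match (here simply a pair).
def selectFn(matched: Map[String, Iterable[Event]]): (Event, Event) = {
  val startEvent = matched("start").head
  val endEvent = matched("end").head
  (startEvent, endEvent)
}

// Hand-built stand-in for the map that one successful match would produce.
val oneMatch: Map[String, Iterable[Event]] = Map(
  "start" -> List(Event(1, "a")),
  "end"   -> List(Event(2, "b"))
)

val result = selectFn(oneMatch)
```

Each value is an Iterable rather than a single event because a pattern that uses a quantifier can accept more than one event under the same name.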
LoginLog.csv data format (user ID, IP address, event type, event time in epoch seconds)
2133,50.16.19.13,success,1558430857
6745,66.249.73.185,success,1558430859
76456,110.136.166.128,success,1558430853
8345,46.105.14.53,success,1558430855
76456,110.136.166.128,success,1558430857
76456,110.136.166.128,success,1558430854
76456,110.136.166.128,fail,1558430859
76456,110.136.166.128,success,1558430861
3464,123.125.71.35,success,1558430860
76456,110.136.166.128,success,1558430865
65322,50.150.204.184,success,1558430866
23565,207.241.237.225,fail,1558430862
…
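As a sketch of how one such line maps onto an event object, the plain Scala snippet below parses a single row. The LoginEvent field name ip is an assumption (the job only refers to userId, eventType and eventTime), and parseLine is a hypothetical helper, not part of any library.

```scala
// Assumed shape of the event; the field name ip is illustrative.
case class LoginEvent(userId: Long, ip: String, eventType: String, eventTime: Long)

// Hypothetical helper: split one CSV line and convert the numeric fields.
def parseLine(line: String): LoginEvent = {
  val fields = line.split(",").map(_.trim)
  LoginEvent(fields(0).toLong, fields(1), fields(2), fields(3).toLong)
}

val sample = parseLine("76456,110.136.166.128,fail,1558430859")

// The file stores seconds, while Flink timestamps are in milliseconds.
val timestampMillis = sample.eventTime * 1000L
```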
Login anomaly detection
package com.atguigu.loginfail_detect

import java.util

import org.apache.flink.cep.PatternSelectFunction
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Input and output types. The original listing omits these definitions, so the
// field names ip, firstFailTime, lastFailTime and warningMsg are assumed.
case class LoginEvent(userId: Long, ip: String, eventType: String, eventTime: Long)
case class Warning(userId: Long, firstFailTime: Long, lastFailTime: Long, warningMsg: String)

object LoginFailWithCep {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    env.setParallelism(1)

    // 1. Read the event data and build the simple event stream
    val resource = getClass.getResource("/LoginLog.csv")
    val loginEventStream = env.readTextFile(resource.getPath)
      .map( data => {
        val dataArray = data.split(",")
        LoginEvent( dataArray(0).trim.toLong, dataArray(1).trim, dataArray(2).trim, dataArray(3).trim.toLong )
      } )
      // Event time is stored in seconds; allow events to arrive up to 5 seconds out of order
      .assignTimestampsAndWatermarks( new BoundedOutOfOrdernessTimestampExtractor[LoginEvent](Time.seconds(5)) {
        override def extractTimestamp(element: LoginEvent): Long = element.eventTime * 1000L
      } )
      .keyBy(_.userId)

    // 2. Define the pattern: a failed login immediately followed by another failed login, within 3 seconds
    val loginFailPattern = Pattern.begin[LoginEvent]("begin").where(_.eventType == "fail")
      .next("next").where(_.eventType == "fail")
      .within(Time.seconds(3))

    // 3. Apply the pattern to the event stream to get a pattern stream
    val patternStream = CEP.pattern(loginEventStream, loginFailPattern)

    // 4. Apply a select function to the pattern stream to extract the matched event sequences
    val loginFailDataStream = patternStream.select( new LoginFailMatch() )
    loginFailDataStream.print()

    env.execute("login fail with cep job")
  }
}

class LoginFailMatch() extends PatternSelectFunction[LoginEvent, Warning]{
  override def select(map: util.Map[String, util.List[LoginEvent]]): Warning = {
    // Look up the matched events in the map by pattern name
    val firstFail = map.get("begin").iterator().next()
    val lastFail = map.get("next").iterator().next()
    Warning( firstFail.userId, firstFail.eventTime, lastFail.eventTime, "login fail!" )
  }
}
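To illustrate what the begin/next/within pattern is looking for, the following sketch approximates it with plain Scala collections instead of CEP. It is not how Flink evaluates patterns. It assumes events are grouped per user and sorted by event time, and that a match requires the second failure to arrive strictly less than the window length after the first. consecutiveFails and the sample events are hypothetical.

```scala
// Assumed shape of the event; the field name ip is illustrative.
case class LoginEvent(userId: Long, ip: String, eventType: String, eventTime: Long)

// For each user, pair every event with its immediate successor and keep the
// pairs where both are failures and the gap is below the window (in seconds).
def consecutiveFails(events: Seq[LoginEvent], windowSeconds: Long): Seq[(LoginEvent, LoginEvent)] =
  events
    .groupBy(_.userId)
    .values
    .flatMap { perUser =>
      val ordered = perUser.sortBy(_.eventTime)
      ordered.zip(ordered.drop(1)).filter { case (first, second) =>
        first.eventType == "fail" && second.eventType == "fail" &&
          second.eventTime - first.eventTime < windowSeconds
      }
    }
    .toSeq

// Made-up events: user 1 fails twice in a row, user 2 has a success in between.
val sampleEvents = Seq(
  LoginEvent(1L, "10.0.0.1", "fail", 100L),
  LoginEvent(1L, "10.0.0.1", "fail", 101L),
  LoginEvent(1L, "10.0.0.1", "success", 102L),
  LoginEvent(1L, "10.0.0.1", "fail", 110L),
  LoginEvent(2L, "10.0.0.2", "fail", 100L),
  LoginEvent(2L, "10.0.0.2", "success", 101L),
  LoginEvent(2L, "10.0.0.2", "fail", 102L)
)

val pairs = consecutiveFails(sampleEvents, 3L)
```

Only user 1's first two failures form a pair here. User 2's failures are separated by a success, which mirrors the strict contiguity that next() asks for.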
.within(Time.seconds(3))
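When a pattern defines a detection window with the within keyword, some partial event sequences may be discarded because they exceed the window length. To handle these timed-out partial matches, the select and flatSelect API calls let you specify a timeout handler.
The timeout handler receives all the events the pattern has matched so far, and an OutputTag identifies the side output that receives the timed-out event sequences.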
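A minimal sketch of timeout handling for the login-failure job might look like the following. It assumes a Flink 1.x release that still ships the Scala CEP API, where PatternStream offers a curried select(outputTag)(timeoutFunction)(selectFunction). The case class shapes, the tag name "login-fail-timeout" and the TimeoutSketch object are illustrative assumptions, not part of the original job.

```scala
import org.apache.flink.cep.scala.PatternStream
import org.apache.flink.streaming.api.scala._

// Assumed shapes; the original listing does not show these case classes.
case class LoginEvent(userId: Long, ip: String, eventType: String, eventTime: Long)
case class Warning(userId: Long, firstFailTime: Long, lastFailTime: Long, warningMsg: String)

object TimeoutSketch {
  // Side-output tag for timed-out partial matches; the tag name is arbitrary.
  val timeoutTag: OutputTag[LoginEvent] = OutputTag[LoginEvent]("login-fail-timeout")

  // Returns the stream of full matches and the stream of timed-out first failures.
  def withTimeouts(patternStream: PatternStream[LoginEvent]): (DataStream[Warning], DataStream[LoginEvent]) = {
    val matches: DataStream[Warning] = patternStream.select(timeoutTag)(
      // Timeout handler: receives the events matched so far and the timeout timestamp.
      (partial: scala.collection.Map[String, Iterable[LoginEvent]], timeoutTimestamp: Long) =>
        partial("begin").head
    )(
      // Regular select function for complete matches.
      (matched: scala.collection.Map[String, Iterable[LoginEvent]]) => {
        val first = matched("begin").head
        val last = matched("next").head
        Warning(first.userId, first.eventTime, last.eventTime, "login fail!")
      }
    )
    // Timed-out partial matches are read back from the side output via the same tag.
    (matches, matches.getSideOutput(timeoutTag))
  }
}
```

In this sketch a timed-out partial match is a first failure that no second failure followed within the window. Emitting the first event unchanged is one possible choice; a dedicated timeout record type would work equally well.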