FlinkCEP: Extracting Matched Events

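CEP pattern example in detail
The first step is pattern detection.
Once you have defined the pattern sequence to look for, you can apply it to an input stream to detect potential matches.
Call CEP.pattern() with the input stream and the pattern to get a PatternStream.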

val input : DataStream[Event] = ...
val pattern : Pattern[Event, _] = ...
val patternStream : PatternStream[Event] = CEP.pattern(input,pattern)

Extracting matched events

Once the PatternStream exists, you can apply its select or flatSelect method to extract events from the detected event sequences.

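select() takes a select function as its argument, which is called once for each successfully matched event sequence.
The select function receives the matched sequence as a Map[String,Iterable[IN]]: each key is the name of a pattern, and each value is an Iterable of all events accepted by that pattern.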

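// IN and OUT are placeholders for the event type and the result type
def selectFn(pattern : Map[String,Iterable[IN]]): OUT = {
	// Look up each pattern by name and take the first event it matched
	val startEvent = pattern("start").head
	val endEvent = pattern("end").head
	OUT(startEvent,endEvent)
}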
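
To try the select function outside Flink, here is a minimal sketch in plain Scala. The Event and Alert types and the sample values are hypothetical stand-ins for IN and OUT; they are not part of the FlinkCEP API. The map is built by hand to imitate what CEP would pass in for one match.

```scala
// Hypothetical types for illustration only; not part of FlinkCEP.
case class Event(id: Long, name: String)
case class Alert(startId: Long, endId: Long)

object SelectFnSketch {
  // Keys are pattern names, values are the events matched by that pattern.
  def selectFn(pattern: Map[String, Iterable[Event]]): Alert = {
    val startEvent = pattern("start").head
    val endEvent = pattern("end").head
    Alert(startEvent.id, endEvent.id)
  }

  def main(args: Array[String]): Unit = {
    // One hand-built match, shaped like the map CEP hands to a select function.
    val oneMatch = Map(
      "start" -> List(Event(1L, "a")),
      "end" -> List(Event(2L, "b"))
    )
    println(selectFn(oneMatch))
  }
}
```

Taking the head of each Iterable is enough here because each pattern in this sketch matches exactly one event; a looping pattern would need to walk the whole Iterable.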

CEP example

LoginLog.csv data format

2133,50.16.19.13,success,1558430857
6745,66.249.73.185,success,1558430859
76456,110.136.166.128,success,1558430853
8345,46.105.14.53,success,1558430855
76456,110.136.166.128,success,1558430857
76456,110.136.166.128,success,1558430854
76456,110.136.166.128,fail,1558430859
76456,110.136.166.128,success,1558430861
3464,123.125.71.35,success,1558430860
76456,110.136.166.128,success,1558430865
65322,50.150.204.184,success,1558430866
23565,207.241.237.225,fail,1558430862
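
Each line appears to hold four comma-separated fields: a user ID, an IP address, the event type (success or fail), and a timestamp in epoch seconds. As a rough sketch, a line could be parsed defensively as shown here. The field names userId, eventType, and eventTime follow the way the job in this post uses them; the name ip and the LoginLogParser object are assumptions made for illustration.

```scala
import scala.util.Try

// Field names follow how the job uses LoginEvent; `ip` is an assumed name.
case class LoginEvent(userId: Long, ip: String, eventType: String, eventTime: Long)

// Hypothetical helper, not part of the original program.
object LoginLogParser {
  // Returns None for lines that do not have four well-formed fields.
  def parse(line: String): Option[LoginEvent] =
    line.split(",").map(_.trim) match {
      case Array(userId, ip, eventType, ts) =>
        Try(LoginEvent(userId.toLong, ip, eventType, ts.toLong)).toOption
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    println(parse("76456,110.136.166.128,fail,1558430859"))
    println(parse("not,a,valid,line"))
  }
}
```

Returning an Option lets a caller drop malformed lines instead of failing the whole job on a single bad record.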

Login anomaly detection

package com.atguigu.loginfail_detect

import java.util

import org.apache.flink.cep.PatternSelectFunction
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time


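// Input and output types; field names are assumed from how the code below uses them
case class LoginEvent( userId: Long, ip: String, eventType: String, eventTime: Long )
case class Warning( userId: Long, firstFailTime: Long, lastFailTime: Long, warningMsg: String )

object LoginFailWithCep {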
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    env.setParallelism(1)

    // 1. Read the event data and create a simple event stream
    val resource = getClass.getResource("/LoginLog.csv")
    val loginEventStream = env.readTextFile(resource.getPath)
      .map( data => {
        val dataArray = data.split(",")
        LoginEvent( dataArray(0).trim.toLong, dataArray(1).trim, dataArray(2).trim, dataArray(3).trim.toLong )
      } )
      .assignTimestampsAndWatermarks( new BoundedOutOfOrdernessTimestampExtractor[LoginEvent](Time.seconds(5)) {
        override def extractTimestamp(element: LoginEvent): Long = element.eventTime * 1000L
      } )
      .keyBy(_.userId)

    // 2. Define the pattern to match
    val loginFailPattern = Pattern.begin[LoginEvent]("begin").where(_.eventType == "fail")
      .next("next").where(_.eventType == "fail")
      .within(Time.seconds(3))

    // 3. Apply the pattern to the event stream to get a pattern stream
    val patternStream = CEP.pattern(loginEventStream, loginFailPattern)

    // 4. Apply a select function to the pattern stream to extract the matched event sequences
    val loginFailDataStream = patternStream.select( new LoginFailMatch() )

    loginFailDataStream.print()

    env.execute("login fail with cep job")
  }
}

class LoginFailMatch() extends PatternSelectFunction[LoginEvent, Warning]{
  override def select(map: util.Map[String, util.List[LoginEvent]]): Warning = {
    // Look up the matched events in the map by pattern name
//    val iter = map.get("begin").iterator()
    val firstFail = map.get("begin").iterator().next()
    val lastFail = map.get("next").iterator().next()
    Warning( firstFail.userId, firstFail.eventTime, lastFail.eventTime, "login fail!" )
  }
}

.within(Time.seconds(3))
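When a pattern defines a detection window with the within keyword, some partial event sequences may be discarded because they exceed the window length. To handle these timed-out partial matches, the select and flatSelect API calls let you specify a timeout handler.

The timeout handler receives all events the pattern has matched so far. An OutputTag identifies the side output that receives the timed-out event sequences.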
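
As a rough sketch of how this could be wired up for the login example: it assumes a Flink version whose Scala PatternStream offers the select overload that takes an OutputTag, a PatternTimeoutFunction, and a PatternSelectFunction. The LoginEvent and Warning definitions, the LoginFailTimeout class, the TimeoutWiring helper, and the tag name "login-fail-timeout" are all illustrative assumptions.

```scala
import java.util

import org.apache.flink.cep.{PatternSelectFunction, PatternTimeoutFunction}
import org.apache.flink.cep.scala.PatternStream
import org.apache.flink.streaming.api.scala._

// Assumed definitions, matching how the login job uses these types.
case class LoginEvent(userId: Long, ip: String, eventType: String, eventTime: Long)
case class Warning(userId: Long, firstFailTime: Long, lastFailTime: Long, warningMsg: String)

// Called for complete matches: both "begin" and "next" are present.
class LoginFailMatch extends PatternSelectFunction[LoginEvent, Warning] {
  override def select(map: util.Map[String, util.List[LoginEvent]]): Warning = {
    val firstFail = map.get("begin").iterator().next()
    val lastFail = map.get("next").iterator().next()
    Warning(firstFail.userId, firstFail.eventTime, lastFail.eventTime, "login fail!")
  }
}

// Called for partial matches that ran past the within() window.
class LoginFailTimeout extends PatternTimeoutFunction[LoginEvent, Warning] {
  override def timeout(map: util.Map[String, util.List[LoginEvent]],
                       timeoutTimestamp: Long): Warning = {
    // A timed-out partial match here contains only the "begin" event.
    val firstFail = map.get("begin").iterator().next()
    // timeoutTimestamp is in milliseconds; eventTime is in seconds.
    Warning(firstFail.userId, firstFail.eventTime, timeoutTimestamp / 1000L,
      "login fail pattern timed out")
  }
}

// Hypothetical helper showing where the timeout handler plugs in.
object TimeoutWiring {
  def attach(patternStream: PatternStream[LoginEvent]): (DataStream[Warning], DataStream[Warning]) = {
    val timeoutTag = new OutputTag[Warning]("login-fail-timeout")
    // Complete matches go to the main stream, timed-out partial matches to the side output.
    val matched = patternStream.select(timeoutTag, new LoginFailTimeout, new LoginFailMatch)
    (matched, matched.getSideOutput(timeoutTag))
  }
}
```

In the job, the two returned streams could each be printed or sent to a separate sink, so that timed-out partial matches stay distinguishable from real alerts.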
