Flink CEP: Usage Workflow and Worked Examples

1. CEP

 

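    Complex event processing (CEP) matches one or more streams of simple events against a set of rules and outputs the data the user is after: the complex events that satisfy those rules.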

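CEP supports pattern matching over streams. Depending on the pattern's conditions, the matched events must either be contiguous or may be non-contiguous. A pattern can also carry a time constraint: if the conditions are not satisfied within the allowed time, the pattern match times out.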
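The difference between contiguous and non-contiguous matching can be illustrated without Flink at all. The following is a minimal plain-Scala sketch; the object and method names are hypothetical, and it only mimics the idea of strict versus relaxed contiguity on a list of event types, not how Flink evaluates patterns internally.

```scala
object ContiguitySketch {
  // Strict contiguity: b must be the event directly after a.
  // Returns the index pairs (position of a, position of b).
  def strictPairs(events: Seq[String], a: String, b: String): Seq[(Int, Int)] =
    events.indices.dropRight(1).collect {
      case i if events(i) == a && events(i + 1) == b => (i, i + 1)
    }

  // Relaxed contiguity: for each a, the first later b matches;
  // unrelated events in between are skipped.
  def relaxedPairs(events: Seq[String], a: String, b: String): Seq[(Int, Int)] =
    events.indices.flatMap { i =>
      if (events(i) != a) None
      else {
        val j = events.indexOf(b, i + 1)
        if (j >= 0) Some((i, j)) else None
      }
    }
}
```

For the sequence "create", "other", "pay", the strict variant finds no create/pay pair because "other" sits in between, while the relaxed variant pairs index 0 with index 2.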


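In effect, CEP performs pattern matching over the events in a stream. For example: if two consecutive login-failure log entries are no more than 2 seconds apart, raise an alert.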
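To make the login rule concrete, here is a small plain-Scala sketch that applies it to an in-memory list. The `Login` type and `consecutiveFails` helper are hypothetical names introduced for illustration, timestamps are assumed to be in seconds, and treating a gap of exactly `maxGapSec` as a match is an assumption of this sketch rather than a statement about Flink's boundary handling.

```scala
// Hypothetical event type for illustration; ts is in seconds.
final case class Login(userId: String, eventType: String, ts: Long)

object ConsecutiveFailSketch {
  // For each user, return every pair of directly adjacent events that are
  // both "fail" and at most maxGapSec seconds apart.
  def consecutiveFails(events: Seq[Login], maxGapSec: Long): Seq[(Login, Login)] =
    events
      .groupBy(_.userId)
      .values
      .flatMap { perUser =>
        val sorted = perUser.sortBy(_.ts)
        // Pair each event with its direct successor.
        sorted.zip(sorted.drop(1)).filter { case (first, second) =>
          first.eventType == "fail" &&
          second.eventType == "fail" &&
          second.ts - first.ts <= maxGapSec
        }
      }
      .toSeq
}
```

Each returned pair corresponds to one alert. Grouping by user first plays the same role as keying the stream by `userId` before applying a pattern.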

 

2. CEP usage workflow

 

    2.1 Obtain the input stream

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._

case class LoginEvent(userId: String, ip: String, eventType: String, eventTime: String)

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) // match on event time
env.setParallelism(1)

val loginEventStream = env.fromCollection(List(
  LoginEvent("1", "192.168.0.1", "fail", "1558430842"),
  LoginEvent("1", "192.168.0.2", "fail", "1558430843"),
  LoginEvent("1", "192.168.0.3", "fail", "1558430844"),
  LoginEvent("2", "192.168.10.10", "success", "1558430845")
)).assignAscendingTimestamps(_.eventTime.toLong * 1000) // eventTime is in seconds; Flink timestamps are in milliseconds

 

    2.2 Define the Pattern

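import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.windowing.time.Time

val loginFailPattern = Pattern.begin[LoginEvent]("begin")
  .where(_.eventType == "fail") // a login failure
  .next("next")
  .where(_.eventType == "fail") // the very next login event also fails
  .within(Time.seconds(2))      // the two events are within two seconds of each other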

 

    2.3 Apply the Pattern

PatternStream:

val input = ...

val pattern = ...

 

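import org.apache.flink.cep.scala.CEP

// General form: an input stream plus the Pattern to match against it
val patternStream = CEP.pattern(input, pattern)

// Applied to the login example (this is the patternStream used in 2.4)
val patternStream = CEP.pattern(loginEventStream.keyBy(_.userId), loginFailPattern)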

 

    2.4 Extract the matching stream with select or flatSelect

 

    select:

import scala.collection.Map // the lambda receives a scala.collection.Map, not the default immutable Map

val loginFailDataStream = patternStream
  .select((pattern: Map[String, Iterable[LoginEvent]]) => {
    val first = pattern("begin").head // first failed login (not used in the output here)
    val second = pattern("next").head // second failed login

    (second.userId, second.ip, second.eventType)
  })

select returns exactly one record per match.

    flatSelect: implement PatternFlatSelectFunction to get behavior similar to select. The only difference is that flatSelect can return multiple records per match.

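    As of version 1.8, both are superseded: obtain the matches with a process function instead (PatternProcessFunction, passed to patternStream.process).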
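The relationship between select and flatSelect is much like that between `map` and `flatMap` on a collection of matches. The sketch below is plain Scala, not the Flink API: `Match`, `selectLike`, and `flatSelectLike` are hypothetical names, and events are represented as plain strings to keep the example self-contained.

```scala
object SelectVsFlatSelectSketch {
  // A match maps each pattern name ("begin", "next", ...) to the events
  // captured under it, mirroring the Map that select receives.
  type Match = Map[String, Seq[String]]

  // select-like: exactly one output record per match.
  def selectLike(matches: Seq[Match])(f: Match => String): Seq[String] =
    matches.map(f)

  // flatSelect-like: each match may yield zero, one, or many output records.
  def flatSelectLike(matches: Seq[Match])(f: Match => Seq[String]): Seq[String] =
    matches.flatMap(f)
}
```

With two matches, `selectLike` always produces two records, whereas `flatSelectLike` could, for instance, emit both the "begin" and the "next" event of every match and so produce four.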

 

3. Examples

    

3.1 Two consecutive login failures by the same user within 2 seconds


 

import java.util

import org.apache.flink.cep.PatternSelectFunction
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Input event. This example uses a numeric userId and an eventTime in seconds.
case class LoginEvent(userId: Long, ip: String, eventType: String, eventTime: Long)

// Output alert. The original does not show this class; the field names are assumed.
case class Warning(userId: Long, firstFailTime: Long, lastFailTime: Long, warningMsg: String)

object LoginFailWithCep {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    env.setParallelism(1)

    // Hand-written test data
    val loginStream = env.fromCollection(List(
      LoginEvent(1, "192.168.0.1", "fail", 1558430842),
      LoginEvent(1, "192.168.0.2", "success", 1558430843),
      LoginEvent(1, "192.168.0.3", "fail", 1558430844),
      LoginEvent(1, "192.168.0.3", "fail", 1558430847),
      LoginEvent(1, "192.168.0.3", "fail", 1558430848),
      LoginEvent(2, "192.168.10.10", "success", 1558430850)
    ))
      .assignAscendingTimestamps(_.eventTime * 1000) // seconds to milliseconds

    // Define the pattern used to match the event stream
    val loginFailPattern = Pattern.begin[LoginEvent]("begin")
      .where(_.eventType == "fail")
      .next("next")
      .where(_.eventType == "fail")
      .within(Time.seconds(2))

    // Apply the pattern to the keyed input stream to get a pattern stream
    val patternStream = CEP.pattern(loginStream.keyBy(_.userId), loginFailPattern)

    // Use select to extract the output stream from the pattern stream.
    // Alternative using a lambda instead of a PatternSelectFunction:
//    import scala.collection.Map
//    val loginFailDataStream: DataStream[Warning] = patternStream.select((patternEvents: Map[String, Iterable[LoginEvent]]) => {
//      // Take the login-failure events out of the Map and wrap them in a Warning
//      val firstFailEvent = patternEvents.getOrElse("begin", null).iterator.next()
//      val secondFailEvent = patternEvents.getOrElse("next", null).iterator.next()
//      Warning(firstFailEvent.userId, firstFailEvent.eventTime, secondFailEvent.eventTime, "login fail warning")
//    })
    val loginFailDataStream = patternStream.select(new MySelectFunction())

    // Send the resulting warning stream to a sink
    loginFailDataStream.print("warning")

    env.execute("Login Fail Detect with CEP")
  }
}

class MySelectFunction() extends PatternSelectFunction[LoginEvent, Warning] {
  override def select(patternEvents: util.Map[String, util.List[LoginEvent]]): Warning = {
    // Each pattern name maps to the list of events captured under it
    val firstFailEvent = patternEvents.get("begin").get(0)
    val secondFailEvent = patternEvents.get("next").get(0)
    Warning(firstFailEvent.userId, firstFailEvent.eventTime, secondFailEvent.eventTime, "login fail warning")
  }
}

 

3.2 Order not paid within 15 minutes of being placed


import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Input stream of order events (eventTime in seconds)
case class OrderEvent( orderId: Long, eventType: String, eventTime: Long )

// Output stream of order processing results
case class OrderResult( orderId: Long, resultMsg: String )

object OrderTimeout {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    // Read in the order data
    val orderEventStream = env.fromCollection(List(
      OrderEvent(1, "create", 1558430842),
      OrderEvent(2, "create", 1558430843),
      OrderEvent(2, "other", 1558430845),
      OrderEvent(2, "pay", 1558430850),
      OrderEvent(1, "pay", 1558431920)
    ))
      .assignAscendingTimestamps(_.eventTime * 1000) // seconds to milliseconds

    // Define a time-limited pattern: an order is created and later paid
    val orderPayPattern = Pattern.begin[OrderEvent]("begin")
      .where(_.eventType == "create")
      .followedBy("follow") // relaxed contiguity: other events may occur in between
      .where(_.eventType == "pay")
      .within(Time.minutes(15))

    // Define an output tag that identifies the side output stream
    val orderTimeoutOutputTag = OutputTag[OrderResult]("orderTimeout")

    // Apply the pattern to the keyed input stream to get a pattern stream
    val patternStream = CEP.pattern( orderEventStream.keyBy(_.orderId), orderPayPattern )

    import scala.collection.Map
    // Call select to get the combined output stream (main output plus side output)
    val complexResult: DataStream[OrderResult] = patternStream.select(orderTimeoutOutputTag)(
      // pattern timeout function: called for partial matches that timed out
      ( orderPayEvents: Map[String, Iterable[OrderEvent]], timestamp: Long ) => {
        val timeoutOrderId = orderPayEvents.getOrElse("begin", null).iterator.next().orderId
        OrderResult( timeoutOrderId, "order time out" )
      }
    )(
      // pattern select function: called for complete matches
      ( orderPayEvents: Map[String, Iterable[OrderEvent]]) => {
        val paidOrderId = orderPayEvents.getOrElse("follow", null).iterator.next().orderId
        OrderResult( paidOrderId, "order paid successfully" )
      }
    )

    // Stream of orders that were paid in time
    complexResult.print("paid")

    // Get the side output stream from the combined output stream
    val timeoutResult = complexResult.getSideOutput( orderTimeoutOutputTag )
    timeoutResult.print("timeout")

    env.execute("Order Timeout Detect")
  }
}
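To check what the job above should report for its test data, the same decision can be worked through in plain Scala. This is only a sketch of the rule (a "create" followed at some later point by a "pay" for the same order, inside the time limit); `Order`, `Outcome`, and `classify` are hypothetical names, timestamps are in seconds, and treating a payment at exactly the limit as too late is an assumption of this sketch.

```scala
// Hypothetical types for illustration; ts is in seconds.
final case class Order(orderId: Long, eventType: String, ts: Long)
final case class Outcome(orderId: Long, msg: String)

object OrderTimeoutSketch {
  // For each order, find its "create" event and check whether any later
  // "pay" event arrives less than limitSec seconds after it.
  def classify(events: Seq[Order], limitSec: Long): Seq[Outcome] =
    events.groupBy(_.orderId).toSeq.sortBy(_._1).flatMap { case (id, evs) =>
      // Skip anything before the first "create"; orders without one are ignored.
      evs.sortBy(_.ts).dropWhile(_.eventType != "create") match {
        case create +: later =>
          val paidInTime = later.exists { e =>
            e.eventType == "pay" && e.ts - create.ts < limitSec
          }
          Some(Outcome(id, if (paidInTime) "paid" else "timeout"))
        case _ => None
      }
    }
}
```

For the five order events used in this example and a limit of 15 * 60 seconds, order 1 is classified as a timeout (its payment arrives 1078 seconds after creation) and order 2 as paid (7 seconds, with an unrelated "other" event in between).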

 
