Flink CEP Simple Example: Detecting the Order of User Page Visits

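CEP (Complex Event Processing) is a library Flink introduced early on. It detects anomalous behavior by matching event streams against rule-based patterns, for example to catch web crawlers or users who abuse promotions to farm freebies.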

Below is a brief description of the project background, followed by pattern detection with CEP.

Requirement:

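The company is running a coupon giveaway for members. To stop freebie hunters from abusing it, abnormal behavior is detected with Flink CEP. Detection rule: if the same device ID (the driverId field in the logs) visits the login page, then the my page, then the ling quan (coupon-claiming) page, in that order, 5 or more times within 5 minutes, the matching data is printed to the console.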
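To make the rule concrete, here is a minimal plain-Python sketch of what counts as a hit, assuming records are dicts with the driverId, event and timestamp fields that the producer script below sends, and reading the threshold as five or more rounds (as timesOrMore(5) does). The names find_rounds, flag_devices, WINDOW_MS and MIN_ROUNDS are hypothetical. The greedy scan is a deliberate simplification, not how Flink CEP matches (CEP keeps many partial matches alive at once), so treat it only as a rough offline cross-check of what the job prints.

```python
from collections import defaultdict

WINDOW_MS = 5 * 60 * 1000  # 5 minutes
MIN_ROUNDS = 5             # five or more rounds, like timesOrMore(5)


def find_rounds(events):
    """Greedily extract login -> my -> ling quan rounds for ONE device.

    events: list of (event_name, timestamp_ms), sorted by timestamp.
    "my" must come immediately after "login" (like CEP next());
    "ling quan" may come any time later (like CEP followedBy()).
    Returns (start_ms, end_ms) for each round, in order.
    """
    rounds = []
    i = 0
    while i < len(events) - 1:
        name, start = events[i]
        if name == "login" and events[i + 1][0] == "my":
            for j in range(i + 2, len(events)):
                if events[j][0] == "ling quan":
                    rounds.append((start, events[j][1]))
                    i = j  # resume scanning after this round's last event
                    break
            else:
                break  # no later "ling quan", so no further round can complete
        i += 1
    return rounds


def flag_devices(records):
    """records: iterable of dicts with driverId, event and timestamp keys.
    Returns the sorted driverIds that have MIN_ROUNDS consecutive rounds whose
    first and last events are less than WINDOW_MS apart."""
    per_device = defaultdict(list)
    for r in records:
        per_device[r["driverId"]].append((r["event"], r["timestamp"]))
    flagged = []
    for device, evs in per_device.items():
        evs.sort(key=lambda e: e[1])
        rounds = find_rounds(evs)
        # slide over every run of MIN_ROUNDS consecutive rounds
        for k in range(len(rounds) - MIN_ROUNDS + 1):
            if rounds[k + MIN_ROUNDS - 1][1] - rounds[k][0] < WINDOW_MS:
                flagged.append(device)
                break
    return sorted(flagged)
```

For example, parsing the lines the producer prints with json.loads and passing them to flag_devices gives a list of driverIds to compare against the Flink job's output.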

A Python script is used here to simulate users' behavior logs.

1. Flink CEP detection code:


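import com.google.gson.Gson;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.functions.PatternProcessFunction;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.util.Collector;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CEPDemo {

    // Gson is not serializable, so use one static instance instead of a field in the MapFunction
    private static final Gson GSON = new Gson();

    // One log record; field names match the JSON keys sent by the Python producer
    public static class Event {
        public String driverId;
        public String event;
        public long timestamp;

        public Event() {
        }

        @Override
        public String toString() {
            return "Event{driverId=" + driverId + ", event=" + event + ", timestamp=" + timestamp + "}";
        }
    }

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment sEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        sEnv.setParallelism(1);
        // CEP only uses the timestamps/watermarks assigned below when running in event time
        sEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");
        p.setProperty("group.id", "test");
        DataStreamSource<String> ds = sEnv.addSource(new FlinkKafkaConsumer09<>("cep", new SimpleStringSchema(), p));

        KeyedStream<Event, String> keyedStream = ds
                // parse each JSON line into an Event
                .map(new MapFunction<String, Event>() {
                    @Override
                    public Event map(String value) throws Exception {
                        return GSON.fromJson(value, Event.class);
                    }
                })
                // the producer stamps events with the current time, so timestamps are ascending
                .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Event>() {
                    @Override
                    public long extractAscendingTimestamp(Event element) {
                        return element.timestamp;
                    }
                })
                // match the pattern separately for each device
                .keyBy(new KeySelector<Event, String>() {
                    @Override
                    public String getKey(Event value) throws Exception {
                        return value.driverId;
                    }
                });

        // One round: login, immediately followed by my (next), then ling quan later (followedBy)
        Pattern<Event, Event> round = Pattern.<Event>begin("first")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event value) throws Exception {
                        return "login".equals(value.event);
                    }
                })
                .next("second").where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event value) throws Exception {
                        return "my".equals(value.event);
                    }
                })
                .followedBy("end").where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event value) throws Exception {
                        return "ling quan".equals(value.event);
                    }
                });

        // timesOrMore() only quantifies the pattern it is called on, so calling it after
        // followedBy("end") would repeat just the "ling quan" step. Wrapping the sequence
        // in a group pattern repeats the whole login -> my -> ling quan round instead.
        Pattern<Event, Event> pattern = Pattern.begin(round)
                .timesOrMore(5)            // 5 or more rounds
                .within(Time.minutes(5));  // whole match within 5 minutes

        PatternStream<Event> patternStream = CEP.pattern(keyedStream, pattern);
        patternStream.process(new PatternProcessFunction<Event, String>() {
            @Override
            public void processMatch(Map<String, List<Event>> match, Context ctx, Collector<String> out) throws Exception {
                out.collect(match.toString());
            }
        }).print();

        sEnv.execute("CEP");
    }
}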

2. Python script that simulates the Kafka producer

import json
import random
import time

from kafka import KafkaProducer  # kafka-python package

if __name__ == '__main__':

    driver_ids = ["1001", "1002", "1003", "1004", "1005", "1006", "1007", "1008", "1009"]
    events = ["register", "login", "my", "search", "list", "detail", "order", "ling quan"]
    p = KafkaProducer(bootstrap_servers="localhost:9092")

    while True:
        # pick a random device and page, stamped with the current time in milliseconds
        driver_id = random.choice(driver_ids)
        event = random.choice(events)
        timestamp = int(time.time() * 1000)
        # same JSON fields the Flink job parses into Event
        v = json.dumps({"driverId": driver_id, "event": event, "timestamp": timestamp})
        print(v)
        p.send("cep", v.encode("utf-8"))
        p.flush()
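With purely random events, a match may take a while to appear. One way to exercise the job on demand, sketched here as an assumption rather than part of the original setup, is a hypothetical abuse_burst generator that emits back-to-back login -> my -> ling quan rounds for one driverId in the same JSON shape. Each line is stamped when it is pulled from the generator (the clock parameter is also hypothetical, there only so the timing can be faked), so timestamps stay ascending alongside the random events, which the AscendingTimestampExtractor in the Flink job expects.

```python
import json
import time

ROUND = ("login", "my", "ling quan")


def abuse_burst(driver_id, rounds=5, clock=lambda: int(time.time() * 1000)):
    """Yield JSON log lines (same fields as the random producer) forming
    `rounds` back-to-back login -> my -> ling quan sequences for one driverId.
    clock() is called lazily as each line is requested, so a line sent right
    after it is yielded carries the current time."""
    for _ in range(rounds):
        for name in ROUND:
            yield json.dumps({"driverId": driver_id, "event": name, "timestamp": clock()})
```

For example, inside the while loop one could occasionally run for line in abuse_burst(random.choice(driver_ids)): p.send("cep", line.encode("utf-8")), which should lead to a match for that driverId once the watermark moves past the burst.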
