Flink: Computing the Real-Time Top N of Hot Items Based on an Operations Configuration

What you will learn from this article:

1. How to join against a configuration file by broadcasting it (Broadcast state)
2. How to use Flink's flexible Window API
3. When State is needed, and how to use it
4. How to implement Top-N with a ProcessFunction

Business Scenario

Operations staff configure item ids through a front-end tool. For each configured item id, we monitor in real time how well it has been selling on the premium ("golden") ad slot over the last hour: items that sell poorly can be taken down and adjusted promptly, and items that sell well can be given more exposure and a bigger ad push.

Data Preparation

A simplified real-time Kafka stream:

{"userId":"11","itemId":"0001"}
{"userId":"11","itemId":"0002"}
{"userId":"22","itemId":"0002"}
{"userId":"22","itemId":"0002"}
{"userId":"22","itemId":"0003"}
{"userId":"33","itemId":"0003"}
{"userId":"33","itemId":"0004"}

A simplified version of the item ids configured by operations staff in the front end:

0001
0002
0003
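
To try this locally, you can push the sample orders into Kafka yourself. Below is a minimal producer sketch using the plain kafka-clients API; the class name SampleOrderProducer is just for illustration, while the topic name flink3 and broker address localhost:9092 match what the Flink job below assumes.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class SampleOrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // same broker the Flink job reads from
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        String[] orders = {
                "{\"userId\":\"11\",\"itemId\":\"0001\"}",
                "{\"userId\":\"11\",\"itemId\":\"0002\"}",
                "{\"userId\":\"22\",\"itemId\":\"0002\"}",
                "{\"userId\":\"22\",\"itemId\":\"0002\"}",
                "{\"userId\":\"22\",\"itemId\":\"0003\"}",
                "{\"userId\":\"33\",\"itemId\":\"0003\"}",
                "{\"userId\":\"33\",\"itemId\":\"0004\"}"
        };
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String order : orders) {
                producer.send(new ProducerRecord<>("flink3", order)); // "flink3" is the topic the job consumes
            }
        }
    }
}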

Walking Through the Computation

In production, the item ids that operations staff monitor are stored in MySQL; to keep this article simple, we read them from a text file instead (a MySQL-backed variant is sketched after the list below).

Overall approach:

1. Read the item ids that operations staff want to monitor
2. Broadcast the monitored item ids to all subtasks
3. Read order data from Kafka and keep only the orders whose itemId is monitored
4. Compute the Top N in real time
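
If you would rather pull the configuration from MySQL, one simple option is a custom source that periodically re-reads the table and emits each monitored item id; it can replace readTextFile in the job below. This is only a sketch: the JDBC URL, credentials, table monitor_config, column item_id and the one-minute poll interval are all assumptions for illustration.

import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch only: table/column names, credentials and poll interval are made up for illustration
public class MysqlItemIdSource extends RichSourceFunction<String> {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            // Re-read the full config table on every pass
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:mysql://localhost:3306/ops", "user", "password");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT item_id FROM monitor_config")) {
                while (rs.next()) {
                    ctx.collect(rs.getString("item_id")); // each id ends up in broadcast state
                }
            }
            Thread.sleep(60_000L); // poll once a minute
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

In the job you would then write env.addSource(new MysqlItemIdSource()).broadcast(descriptor) instead of reading itemId.txt.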

 

Implementation Code

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.state.*;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.util.Collector;

import java.net.URL;
import java.util.*;
public class FlinkTopN {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("zookeeper.connect", "localhost:2181");
        props.setProperty("group.id", "flink-kafka");
        URL fileUrl = FlinkTopN.class.getClassLoader().getResource("itemId.txt");
        // Broadcast the operations config file; you could also read it from MySQL periodically and broadcast that
        DataStream<String> itemIdConfigStream = env.readTextFile(fileUrl.getPath());
        MapStateDescriptor<String, String> descriptor = new MapStateDescriptor<>(
                "itemId_descriptor", BasicTypeInfo.STRING_TYPE_INFO, TypeInformation.of(String.class));
        BroadcastStream<String> itemIdBroadcastStream = itemIdConfigStream.broadcast(descriptor);
        FlinkKafkaConsumer08<String> consumer = new FlinkKafkaConsumer08<>("flink3", new SimpleStringSchema(), props);
        // Assign timestamps/watermarks (the job runs on processing time, so the extractor just returns "now")
        DataStream<String> orderStream = env.addSource(consumer)
                .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<String>() {
                    @Override
                    public long extractAscendingTimestamp(String str) {
                        return System.currentTimeMillis();
                    }
                });
        // Join the two streams in real time: operations staff monitor item ids 0001, 0002 and 0003,
        // so orders for item id 0004 are filtered out
        DataStream<Tuple2<String, Long>> data = orderStream.connect(itemIdBroadcastStream)
                .process(new BroadcastProcessFunction<String, String, Tuple2<String, Long>>() {
                    @Override
                    public void processElement(String value, ReadOnlyContext ctx,
                                               Collector<Tuple2<String, Long>> out) throws Exception {
                        // Look up the order's itemId in the broadcast state; only monitored ids pass through
                        ReadOnlyBroadcastState<String, String> state = ctx.getBroadcastState(descriptor);
                        try {
                            JSONObject json = JSON.parseObject(value);
                            if (state.contains(json.get("itemId").toString())) {
                                out.collect(new Tuple2<>(json.get("itemId").toString(), 1L));
                            }
                        } catch (Exception e) {
                            System.out.println(e);
                        }
                    }
                    @Override
                    public void processBroadcastElement(String value, Context ctx,
                                                        Collector<Tuple2<String, Long>> out) throws Exception {
                        // Each broadcast record is a monitored itemId; store it in the broadcast state
                        BroadcastState<String, String> state = ctx.getBroadcastState(descriptor);
                        state.put(value, value);
                    }
                });
        // Print the joined stream
        data.print();
        // Demo uses a short 5s tumbling window; for the "last hour" scenario you would widen it (see the note below)
        data.keyBy(0).timeWindow(Time.seconds(5), Time.seconds(5))
                .aggregate(new AggregateFunction<Tuple2<String, Long>, Long, Long>() {
                    @Override
                    public Long createAccumulator() {
                        return 0L;
                    }
                    @Override
                    public Long add(Tuple2<String, Long> tuple2, Long acc) {
                        return acc + 1;
                    }
                    @Override
                    public Long getResult(Long acc) {
                        return acc;
                    }
                    @Override
                    public Long merge(Long acc1, Long acc2) {
                        return acc1 + acc2;
                    }
                }, new WindowFunction<Long, ItemIdCount, Tuple, TimeWindow>() {
                    @Override
                    public void apply(Tuple key, TimeWindow window, Iterable<Long> input,
                                      Collector<ItemIdCount> out) throws Exception {
                        String itemId = ((Tuple1<String>) key).f0;
                        Long count = input.iterator().next();
                        out.collect(ItemIdCount.of(itemId, window.getEnd(), count));
                    }
                }).keyBy("windowEnd").process(new TopNHotItems(3)).print();
        env.execute("kaishi");
    }
}
class TopNHotItems extends KeyedProcessFunction<Tuple, ItemIdCount, String> {
    private final int topSize;
    public TopNHotItems(int topSize) {
        this.topSize = topSize;
    }
    // Buffers the per-item counts until all data for one window has arrived, then triggers the Top-N computation
    private ListState<ItemIdCount> itemIdCountListState;
    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        // Register the state
        ListStateDescriptor<ItemIdCount> itemsStateDesc = new ListStateDescriptor<>(
                "itemsState",
                ItemIdCount.class);
        itemIdCountListState = getRuntimeContext().getListState(itemsStateDesc);
    }
    @Override
    public void processElement(
            ItemIdCount input,
            Context context,
            Collector<String> collector) throws Exception {
        // Buffer every record in state
        itemIdCountListState.add(input);
        // Register a processing-time timer for windowEnd + 1; when it fires,
        // all counts for the window ending at windowEnd have been collected
        context.timerService().registerProcessingTimeTimer(input.windowEnd + 1);
    }
    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        // Gather the counts of all items for this window
        List<ItemIdCount> allUserOrder = new ArrayList<>();
        for (ItemIdCount item : itemIdCountListState.get()) {
            allUserOrder.add(item);
        }
        // Clear the state right away to free space
        itemIdCountListState.clear();
        // Sort by order count, descending
        allUserOrder.sort(new Comparator<ItemIdCount>() {
            @Override
            public int compare(ItemIdCount o1, ItemIdCount o2) {
                // Long.compare avoids the overflow risk of casting the difference to int
                return Long.compare(o2.orderCount, o1.orderCount);
            }
        });
        // Emit the top N items as JSON strings matching the {"itemId":...,"orderQt":...} output shown below
        for (int i = 0; i < allUserOrder.size() && i < topSize; i++) {
            ItemIdCount item = allUserOrder.get(i);
            JSONObject result = new JSONObject();
            result.put("itemId", item.itemId);
            result.put("orderQt", item.orderCount);
            out.collect(result.toJSONString());
        }
    }
}
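
The ItemIdCount POJO is referenced above but was not included in the original listing. Here is a minimal sketch consistent with how it is used: public fields itemId, windowEnd and orderCount (so Flink treats it as a POJO and keyBy("windowEnd") works), plus the static of(...) factory.

public class ItemIdCount {
    public String itemId;    // the item id
    public long windowEnd;   // end timestamp of the window this count belongs to
    public long orderCount;  // number of orders for the item within the window

    public ItemIdCount() {}  // no-arg constructor required by Flink's POJO serializer

    public static ItemIdCount of(String itemId, long windowEnd, long orderCount) {
        ItemIdCount result = new ItemIdCount();
        result.itemId = itemId;
        result.windowEnd = windowEnd;
        result.orderCount = orderCount;
        return result;
    }
}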

Final Result

Because operations staff did not configure item id 0004, lookups for 0004 miss the broadcast state when the two streams are joined, so those records are filtered out. (The 8>, 6>, 7> prefixes are the subtask indexes that print() adds.)

8> (0002,1)
6> (0001,1)
7> (0002,1)
6> (0003,1)
7> (0002,1)
6> (0003,1)
8> {"itemId":"0002","orderQt":3}
8> {"itemId":"0003","orderQt":2}
8> {"itemId":"0001","orderQt":1}
