Flink Common Application Cases: Code Examples

Prerequisites

Process Function

Ordinary transformation operators cannot access an event's timestamp or watermark information, which is essential in some applications. For example, a map transformation such as MapFunction has no access to the timestamp of the current event.

The DataStream API provides a family of low-level transformation operators, the process functions. They can access timestamps and watermarks, register timers, and emit special records such as timeout events. Process functions are used to build event-driven applications and to implement custom business logic that cannot be expressed with the window functions and transformation operators covered earlier. For example, Flink SQL is implemented on top of process functions.

Flink provides 8 process functions:

  • ProcessFunction
  • KeyedProcessFunction
  • CoProcessFunction
  • ProcessJoinFunction
  • BroadcastProcessFunction
  • KeyedBroadcastProcessFunction
  • ProcessWindowFunction
  • ProcessAllWindowFunction

Counting a Site's Total Page Views (PV)

The simplest metric of website traffic is the page view (PV) count. Each time a user opens a page, one PV is recorded; opening the same page repeatedly accumulates more views. PV generally correlates with the number of visitors, but it does not directly measure distinct visitors: a single visitor can produce a very high PV count just by refreshing the page. Below we implement PV statistics with the Flink operators covered earlier.

package com.atguigu.bean;
 
public class UserBehavior {
    /**
     * User ID
     */
    private Long userId;

    /**
     * Item ID
     */
    private Long itemId;

    /**
     * Category ID
     */
    private Integer categoryId;

    /**
     * Behavior type: pv, buy, fav, cart
     */
    private String behavior;

    /**
     * Timestamp
     */
    private Long timestamp;

    public UserBehavior() {
    }

    public UserBehavior(Long userId, Long itemId, Integer categoryId, String behavior, Long timestamp) {
        this.userId = userId;
        this.itemId = itemId;
        this.categoryId = categoryId;
        this.behavior = behavior;
        this.timestamp = timestamp;
    }

    public Long getUserId() {
        return userId;
    }

    public void setUserId(Long userId) {
        this.userId = userId;
    }

    public Long getItemId() {
        return itemId;
    }

    public void setItemId(Long itemId) {
        this.itemId = itemId;
    }

    public Integer getCategoryId() {
        return categoryId;
    }

    public void setCategoryId(Integer categoryId) {
        this.categoryId = categoryId;
    }

    public String getBehavior() {
        return behavior;
    }

    public void setBehavior(String behavior) {
        this.behavior = behavior;
    }

    public Long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(Long timestamp) {
        this.timestamp = timestamp;
    }

    @Override
    public String toString() {
        return "UserBehavior{" +
                "userId=" + userId +
                ", itemId=" + itemId +
                ", categoryId=" + categoryId +
                ", behavior='" + behavior + '\'' +
                ", timestamp=" + timestamp +
                '}';
    }
}

UserBehavior.csv 

Columns: user ID, item ID, category ID, behavior type, timestamp

543462,1715,1464116,pv,1511658000
662867,2244074,1575622,pv,1511658000
561558,3611281,965809,pv,1511658000
894923,3076029,1879194,pv,1511658000
834377,4541270,3738615,pv,1511658000
315321,942195,4339722,pv,1511658000
625915,1162383,570735,pv,1511658000 

package com.atguigu.chapter05;

import com.atguigu.bean.UserBehavior;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class Flink23_Case_PV {
    public static void main(String[] args) throws Exception {

        // 0. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // 1. Read the file and convert each line into a bean
        SingleOutputStreamOperator<UserBehavior> userBehaviorDS = env
                .readTextFile("input/UserBehavior.csv")
                .map(new MapFunction<String, UserBehavior>() {

                    @Override
                    public UserBehavior map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new UserBehavior(
                                Long.valueOf(datas[0]),
                                Long.valueOf(datas[1]),
                                Integer.valueOf(datas[2]),
                                datas[3],
                                Long.valueOf(datas[4])
                        );
                    }
                });

        // TODO Follow the WordCount approach to compute PV
        // 2. Process the data
        // 2.1 Keep only "pv" behavior
        SingleOutputStreamOperator<UserBehavior> userBehaviorFilter = userBehaviorDS.filter(data -> "pv".equals(data.getBehavior()));
        // 2.2 Map to the tuple ("pv", 1) => we only care about pv events, so the key is the fixed string "pv"
        SingleOutputStreamOperator<Tuple2<String, Integer>> pvAndOneTuple2 = userBehaviorFilter.map(new MapFunction<UserBehavior, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(UserBehavior value) throws Exception {
                return Tuple2.of("pv", 1);
            }
        });

        // 2.3 Group by the first tuple field => aggregations such as sum can only be called on a KeyedStream
        KeyedStream<Tuple2<String, Integer>, Tuple> pvAndOneKS = pvAndOneTuple2.keyBy(0);

        // 2.4 Sum the counts
        SingleOutputStreamOperator<Tuple2<String, Integer>> pvDS = pvAndOneKS.sum(1);

        // 3. Print the result
        pvDS.print("pv");

        env.execute();
    }

}
package com.atguigu.chapter05;

import com.atguigu.bean.UserBehavior;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class Flink24_Case_PVByProcess {
    public static void main(String[] args) throws Exception {

        // 0. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // 1. Read the file and convert each line into a bean
        SingleOutputStreamOperator<UserBehavior> userBehaviorDS = env
                .readTextFile("input/UserBehavior.csv")
                .map(new MapFunction<String, UserBehavior>() {

                    @Override
                    public UserBehavior map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new UserBehavior(
                                Long.valueOf(datas[0]),
                                Long.valueOf(datas[1]),
                                Integer.valueOf(datas[2]),
                                datas[3],
                                Long.valueOf(datas[4])
                        );
                    }
                });

        // TODO Compute PV
        // 2. Process the data
        // 2.1 Keep only "pv" behavior
        SingleOutputStreamOperator<UserBehavior> userBehaviorFilter = userBehaviorDS.filter(data -> "pv".equals(data.getBehavior()));
        // 2.2 Group by the statistics dimension: the pv behavior
        KeyedStream<UserBehavior, String> userBehaviorKS = userBehaviorFilter.keyBy(data -> data.getBehavior());
        // 2.3 Count => there is no built-in "count" aggregation operator;
        // when no ready-made operator fits, drop down to the low-level process
        SingleOutputStreamOperator<Long> resultDS = userBehaviorKS.process(
                new KeyedProcessFunction<String, UserBehavior, Long>() {
                    // a variable that counts the records (fine here since parallelism is 1)
                    private Long pvCount = 0L;

                    /**
                     * Called once per element
                     * @param value
                     * @param ctx
                     * @param out
                     * @throws Exception
                     */
                    @Override
                    public void processElement(UserBehavior value, Context ctx, Collector<Long> out) throws Exception {
                        // one more record => count + 1
                        pvCount++;
                        // emit the running total downstream
                        out.collect(pvCount);
                    }
                }
        );

        resultDS.print("pv by process");


        env.execute();
    }

}
package com.atguigu.chapter05;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import org.apache.flink.util.Collector;

 
public class Flink25_Case_PVByFlatmap {
    public static void main(String[] args) throws Exception {

        // 0. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // 1. Read the file; parse, filter and map in a single flatMap
        env
                .readTextFile("input/UserBehavior.csv")
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) throws Exception {
                        String[] datas = value.split(",");
                        if ("pv".equals(datas[3])) {
                            out.collect(Tuple2.of("pv", 1));
                        }
                    }
                })
                .keyBy(0)
                .sum(1)
                .print("pv by flatmap");


        env.execute();
    }

}

Counting a Site's Unique Visitors (UV)

In the previous case we counted every page-view action by every user, so the same user's repeated visits were counted multiple times. In practice we also want to know how many distinct users visited the site, so another important traffic metric is the unique visitor (UV) count.
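The essence of UV is deduplication by user ID; a minimal plain-Java sketch of that idea (class and method names are illustrative, not from the original code) uses a HashSet, exactly as the Flink version below does inside its process function:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UvSketch {
    // Unique visitors: deduplicate userIds (1st CSV column) of pv events with a Set.
    public static int countUv(List<String> lines) {
        Set<String> userIds = new HashSet<>();
        for (String line : lines) {
            String[] datas = line.split(",");
            if ("pv".equals(datas[3])) {
                userIds.add(datas[0]); // a Set silently ignores duplicates
            }
        }
        return userIds.size();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "543462,1715,1464116,pv,1511658000",
                "543462,2244074,1575622,pv,1511658001", // same user again
                "561558,3611281,965809,pv,1511658002");
        System.out.println("uv = " + countUv(lines)); // 2 distinct users
    }
}
```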

package com.atguigu.chapter05;

import com.atguigu.bean.UserBehavior;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

import java.util.HashSet;
import java.util.Set;
 
public class Flink26_Case_UV {
    public static void main(String[] args) throws Exception {

        // 0. Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // 1. Read the file and convert each line into a bean
        SingleOutputStreamOperator<UserBehavior> userBehaviorDS = env
                .readTextFile("input/UserBehavior.csv")
                .map(new MapFunction<String, UserBehavior>() {

                    @Override
                    public UserBehavior map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new UserBehavior(
                                Long.valueOf(datas[0]),
                                Long.valueOf(datas[1]),
                                Integer.valueOf(datas[2]),
                                datas[3],
                                Long.valueOf(datas[4])
                        );
                    }
                });

        // TODO Compute UV: deduplicate the userIds, then count them
        // => store the userIds in a Set
        // => the number of elements in the Set is the UV value

        // 2. Process the data
        // 2.1 Keep only "pv" behavior => UV is PV deduplicated by user, so the behavior is still pv
        SingleOutputStreamOperator<UserBehavior> userBehaviorFilter = userBehaviorDS.filter(data -> "pv".equals(data.getBehavior()));
        // 2.2 Map to the tuple ("uv", userId)
        // => the first field is the fixed key "uv": grouping on it lets us call sum, process, etc.
        // => the second field is the userId, which will be stored in the Set
        // => only the userId matters here; item, category and the rest are not needed
        SingleOutputStreamOperator<Tuple2<String, Long>> uvTuple2 = userBehaviorFilter.map(new MapFunction<UserBehavior, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> map(UserBehavior value) throws Exception {
                return Tuple2.of("uv", value.getUserId());
            }
        });

        // 2.3 Group by the "uv" key
        KeyedStream<Tuple2<String, Long>, String> uvKS = uvTuple2.keyBy(data -> data.f0);

        // 2.4 Deduplicate and count with process
        SingleOutputStreamOperator<Integer> uvDS = uvKS.process(
                new KeyedProcessFunction<String, Tuple2<String, Long>, Integer>() {
                    // a Set that stores the userIds and deduplicates them
                    private Set<Long> uvSet = new HashSet<>();

                    /**
                     * Called once per element
                     * @param value
                     * @param ctx
                     * @param out
                     * @throws Exception
                     */
                    @Override
                    public void processElement(Tuple2<String, Long> value, Context ctx, Collector<Integer> out) throws Exception {
                        // store the userId of each record in the Set
                        uvSet.add(value.f1);
                        // emit the current UV value: the size of the Set
                        out.collect(uvSet.size());
                    }
                }
        );

        uvDS.print("uv");


        env.execute();
    }

}

Counting Page Ad Clicks

Besides promoting its own app, an e-commerce site's marketing metrics also cover the ads placed on its pages (both its own products and third-party ads). Ad-related statistics are therefore an important marketing indicator.

The simplest and most important ad statistic is the page ad click count. Sites use click counts to set pricing strategies and adjust promotion methods, and can also collect user-preference information from them. More concretely, we can break the counts down by the user's geographic location to summarize how users in different provinces prefer different ads, which helps with targeted ad delivery.

The implementation below computes the total ad click counts, but it does not yet apply windowing or add ranking; for those, refer to the "hot items" (top popular items) case.
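Both of the following cases (channel_behavior and province_adId) reduce to counting occurrences of a concatenated key, which is what keyBy + sum does in the Flink code. A minimal plain-Java sketch of that pattern (names are illustrative, not from the original code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupCountSketch {
    // Count events per composite key, e.g. "province_adId" or "channel_behavior".
    public static Map<String, Integer> countByKey(List<String> keys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countByKey(List.of(
                "beijing_1", "shanghai_2", "beijing_1"));
        System.out.println(counts); // beijing_1 counted twice, shanghai_2 once
    }
}
```

Concatenating the dimensions into one string key is the same trick the Flink cases use to group on multiple dimensions at once.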

User Behavior Statistics per Channel

package com.atguigu.bean;

public class MarketingUserBehavior {
    /**
     * User ID
     */
    private Long userId;
    /**
     * Behavior: download, install, update, uninstall
     */
    private String behavior;
    /**
     * Channel: XIAOMI, HUAWEI, OPPO, VIVO
     */
    private String channel;
    /**
     * Timestamp
     */
    private Long timestamp;

    public MarketingUserBehavior() {
    }

    public MarketingUserBehavior(Long userId, String behavior, String channel, Long timestamp) {
        this.userId = userId;
        this.behavior = behavior;
        this.channel = channel;
        this.timestamp = timestamp;
    }

    public Long getUserId() {
        return userId;
    }

    public void setUserId(Long userId) {
        this.userId = userId;
    }

    public String getBehavior() {
        return behavior;
    }

    public void setBehavior(String behavior) {
        this.behavior = behavior;
    }

    public String getChannel() {
        return channel;
    }

    public void setChannel(String channel) {
        this.channel = channel;
    }

    public Long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(Long timestamp) {
        this.timestamp = timestamp;
    }

    @Override
    public String toString() {
        return "MarketingUserBehavior{" +
                "userId=" + userId +
                ", behavior='" + behavior + '\'' +
                ", channel='" + channel + '\'' +
                ", timestamp=" + timestamp +
                '}';
    }
}
package com.atguigu.chapter05;

import com.atguigu.bean.MarketingUserBehavior;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.Arrays;
import java.util.List;
import java.util.Random;

/**
 * Statistics per channel and per behavior
 */
public class Flink27_Case_APPMarketingAnalysis {
    public static void main(String[] args) throws Exception {
        // 0. Execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // 1. Read data from a custom source
        DataStreamSource<MarketingUserBehavior> appDS = env.addSource(new AppSource());

        // 2. Process the data: count per channel and per behavior
        // 2.1 Group by the statistics dimensions: channel and behavior; multiple dimensions can be concatenated into one key
        // 2.1.1 Map to the tuple (channel_behavior, 1)
        SingleOutputStreamOperator<Tuple2<String, Integer>> channelAndBehaviorTuple2 = appDS.map(new MapFunction<MarketingUserBehavior, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(MarketingUserBehavior value) throws Exception {
                return Tuple2.of(value.getChannel() + "_" + value.getBehavior(), 1);
            }
        });
        // 2.1.2 Group by channel_behavior
        KeyedStream<Tuple2<String, Integer>, String> channelAndBehaviorKS = channelAndBehaviorTuple2.keyBy(data -> data.f0);

        // 2.2 Sum the counts
        SingleOutputStreamOperator<Tuple2<String, Integer>> resultDS = channelAndBehaviorKS.sum(1);

        // 3. Print the result
        resultDS.print("app marketing analysis by channel and behavior");


        env.execute();
    }


    public static class AppSource implements SourceFunction<MarketingUserBehavior> {

        private boolean flag = true;
        private List<String> behaviorList = Arrays.asList("DOWNLOAD", "INSTALL", "UPDATE", "UNINSTALL");
        private List<String> channelList = Arrays.asList("XIAOMI", "HUAWEI", "OPPO", "VIVO");

        @Override
        public void run(SourceContext<MarketingUserBehavior> ctx) throws Exception {
            while (flag) {
                Random random = new Random();
                ctx.collect(
                        new MarketingUserBehavior(
                                Long.valueOf(random.nextInt(10)),
                                behaviorList.get(random.nextInt(behaviorList.size())),
                                channelList.get(random.nextInt(channelList.size())),
                                System.currentTimeMillis()
                        )
                );
                Thread.sleep(1000L);
            }
        }

        @Override
        public void cancel() {
            flag = false;
        }
    }
}

Ad Click Counts per Province

package com.atguigu.bean;


public class AdClickLog {

    /**
     * User ID
     */
    private Long userId;
    /**
     * Ad ID
     */
    private Long adId;
    /**
     * Province
     */
    private String province;
    /**
     * City
     */
    private String city;
    /**
     * Timestamp
     */
    private Long timestamp;

    public AdClickLog() {
    }

    public AdClickLog(Long userId, Long adId, String province, String city, Long timestamp) {
        this.userId = userId;
        this.adId = adId;
        this.province = province;
        this.city = city;
        this.timestamp = timestamp;
    }

    public Long getUserId() {
        return userId;
    }

    public void setUserId(Long userId) {
        this.userId = userId;
    }

    public Long getAdId() {
        return adId;
    }

    public void setAdId(Long adId) {
        this.adId = adId;
    }

    public String getProvince() {
        return province;
    }

    public void setProvince(String province) {
        this.province = province;
    }

    public String getCity() {
        return city;
    }

    public void setCity(String city) {
        this.city = city;
    }

    public Long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(Long timestamp) {
        this.timestamp = timestamp;
    }

    @Override
    public String toString() {
        return "AdClickLog{" +
                "userId=" + userId +
                ", adId=" + adId +
                ", province='" + province + '\'' +
                ", city='" + city + '\'' +
                ", timestamp=" + timestamp +
                '}';
    }
}
package com.atguigu.chapter05;

import com.atguigu.bean.AdClickLog;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Real-time click counts per province and per ad
 */
public class Flink29_Case_AdClickAnalysis {
    public static void main(String[] args) throws Exception {
        // 0. Execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // 1. Read data and convert to a bean
        SingleOutputStreamOperator<AdClickLog> adClickDS = env
                .readTextFile("input/AdClickLog.csv")
                .map(new MapFunction<String, AdClickLog>() {
                    @Override
                    public AdClickLog map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new AdClickLog(
                                Long.valueOf(datas[0]),
                                Long.valueOf(datas[1]),
                                datas[2],
                                datas[3],
                                Long.valueOf(datas[4])
                        );
                    }
                });

        // 2. Process the data: real-time click counts per province and per ad
        // 2.1 Group by the statistics dimensions: province and ad
        adClickDS
                .map(new MapFunction<AdClickLog, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(AdClickLog value) throws Exception {
                        return Tuple2.of(value.getProvince() + "_" + value.getAdId(), 1);
                    }
                })
                .keyBy(data -> data.f0)
                .sum(1)
                .print("ad click analysis");


        env.execute();
    }


}

Real-Time Order Payment Monitoring

On an e-commerce site, order payment is directly tied to revenue and is a critical step in the business flow. To keep the flow under control and encourage users to pay, sites usually set a payment expiration time: orders not paid within that window are cancelled. In addition, we should verify that payments are correct, which can be done by reconciling in real time against the transaction data of a third-party payment platform. This case matches order and transaction events coming from two streams.

For an order payment event, the user completing the payment is not the end of the story: we still need to confirm that the money actually arrived in the platform's account. That confirmation usually comes from a different log, so we read two streams and merge them.

1. Order data is read from OrderLog.csv, and transaction data from ReceiptLog.csv

OrderLog.csv

34729,create,,1558430842
34730,create,,1558430843
34729,pay,sd76f87d6,1558430844
34730,pay,3hu3k2432,1558430845
34731,create,,1558430846
34731,pay,35jue34we,1558430849
34732,create,,1558430852
34733,create,,1558430855
34734,create,,1558430859
34732,pay,32h3h4b4t,1558430861
34735,create,,1558430862

ReceiptLog.csv

ewr342as4,wechat,1558430845
sd76f87d6,wechat,1558430847
3hu3k2432,alipay,1558430848
8fdsfae83,alipay,1558430850
32h3h4b4t,wechat,1558430852
766lk5nk4,wechat,1558430855
435kjb45d,alipay,1558430859
5k432k4n,wechat,1558430862
435kjb45s,wechat,1558430866

2. Parse the log lines into POJOs for easier handling

public class OrderEvent {
    private Long orderId;
    private String eventType;
    private String txId;
    private Long eventTime;

    public OrderEvent() {
    }

    public OrderEvent(Long orderId, String eventType, String txId, Long eventTime) {
        this.orderId = orderId;
        this.eventType = eventType;
        this.txId = txId;
        this.eventTime = eventTime;
    }

    public Long getOrderId() {
        return orderId;
    }

    public void setOrderId(Long orderId) {
        this.orderId = orderId;
    }

    public String getEventType() {
        return eventType;
    }

    public void setEventType(String eventType) {
        this.eventType = eventType;
    }

    public String getTxId() {
        return txId;
    }

    public void setTxId(String txId) {
        this.txId = txId;
    }

    public Long getEventTime() {
        return eventTime;
    }

    public void setEventTime(Long eventTime) {
        this.eventTime = eventTime;
    }

    @Override
    public String toString() {
        return "OrderEvent{" +
                "orderId=" + orderId +
                ", eventType='" + eventType + '\'' +
                ", txId='" + txId + '\'' +
                ", eventTime=" + eventTime +
                '}';
    }
}

public class TxEvent {
    private String txId;
    private String payChannel;
    private Long eventTime;

    public TxEvent() {
    }

    public TxEvent(String txId, String payChannel, Long eventTime) {
        this.txId = txId;
        this.payChannel = payChannel;
        this.eventTime = eventTime;
    }

    public String getTxId() {
        return txId;
    }

    public void setTxId(String txId) {
        this.txId = txId;
    }

    public String getPayChannel() {
        return payChannel;
    }

    public void setPayChannel(String payChannel) {
        this.payChannel = payChannel;
    }

    public Long getEventTime() {
        return eventTime;
    }

    public void setEventTime(Long eventTime) {
        this.eventTime = eventTime;
    }

    @Override
    public String toString() {
        return "TxEvent{" +
                "txId='" + txId + '\'' +
                ", payChannel='" + payChannel + '\'' +
                ", eventTime=" + eventTime +
                '}';
    }
}
SingleOutputStreamOperator<OrderEvent> orderDS = env
                .readTextFile("input/OrderLog.csv")
                .map(new MapFunction<String, OrderEvent>() {
                    @Override
                    public OrderEvent map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new OrderEvent(
                                Long.valueOf(datas[0]),
                                datas[1],
                                datas[2],
                                Long.valueOf(datas[3])
                        );
                    }
                });
SingleOutputStreamOperator<TxEvent> txDS = env
                .readTextFile("input/ReceiptLog.csv")
                .map(new MapFunction<String, TxEvent>() {
                    @Override
                    public TxEvent map(String value) throws Exception {
                        String[] datas = value.split(",");
                        return new TxEvent(
                                datas[0],
                                datas[1],
                                Long.valueOf(datas[2])
                        );
                    }
                });

3. Connect the two streams, keying both by transaction ID

orderDS.connect(txDS)
        .keyBy(order -> order.getTxId(), tx -> tx.getTxId())
        .process(
                new CoProcessFunction<OrderEvent, TxEvent, String>() {
                    // matching logic shown in step 4
                });

4. Because events from the two streams can arrive in either order, each side buffers its unmatched events in a map until the counterpart arrives

new CoProcessFunction<OrderEvent, TxEvent, String>() {
        Map<String, OrderEvent> orderMap = new HashMap<>();
        Map<String, TxEvent> txMap = new HashMap<>();

        @Override
        public void processElement1(OrderEvent value, Context ctx, Collector<String> out) throws Exception {
            TxEvent txEvent = txMap.get(value.getTxId());
            if (txEvent == null) {
                // no matching transaction yet: buffer the order event
                orderMap.put(value.getTxId(), value);
            } else {
                out.collect("Order [" + value.getOrderId() + "] reconciled successfully");
                txMap.remove(value.getTxId());
            }
        }

        @Override
        public void processElement2(TxEvent value, Context ctx, Collector<String> out) throws Exception {
            OrderEvent orderEvent = orderMap.get(value.getTxId());
            if (orderEvent == null) {
                // no matching order yet: buffer the transaction event
                txMap.put(value.getTxId(), value);
            } else {
                out.collect("Order [" + orderEvent.getOrderId() + "] reconciled successfully");
                orderMap.remove(value.getTxId());
            }
        }
}
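The two-map matching above can be exercised outside Flink. This plain-Java sketch (illustrative names, events simplified to plain strings) interleaves the two streams and reconciles pairs regardless of which side arrives first:

```java
import java.util.HashMap;
import java.util.Map;

public class ReconcileSketch {
    private final Map<String, String> pendingOrders = new HashMap<>(); // txId -> orderId
    private final Map<String, String> pendingTxs = new HashMap<>();    // txId -> payChannel
    private int matched = 0;

    // An order paid with transaction txId arrives.
    public void onOrder(String txId, String orderId) {
        if (pendingTxs.remove(txId) != null) {
            matched++;                        // transaction already seen: reconciled
        } else {
            pendingOrders.put(txId, orderId); // buffer until the transaction arrives
        }
    }

    // A payment-platform transaction arrives.
    public void onTx(String txId, String payChannel) {
        if (pendingOrders.remove(txId) != null) {
            matched++;                        // order already seen: reconciled
        } else {
            pendingTxs.put(txId, payChannel); // buffer until the order arrives
        }
    }

    public int matchedCount() { return matched; }

    public static void main(String[] args) {
        ReconcileSketch r = new ReconcileSketch();
        r.onTx("sd76f87d6", "wechat");   // transaction arrives first
        r.onOrder("sd76f87d6", "34729"); // ...then the matching order
        r.onOrder("3hu3k2432", "34730"); // order first
        r.onTx("3hu3k2432", "alipay");   // ...then the transaction
        System.out.println("matched = " + r.matchedCount()); // both pairs reconciled
    }
}
```

Note that in the Flink version the maps would ideally live in keyed state (e.g. MapState) so the job can recover them from checkpoints; the plain fields used here and in the tutorial code are fine for a single-JVM demonstration.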

 
