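When a dimension-table join needs low latency, that is, when changes to the dimension data must be visible immediately, the lookup cannot use a caching strategy and has to query the dimension store directly for every record.
This post uses real-time lookups against Redis as the example. The Redis client must support asynchronous queries; Lettuce (the io.lettuce package) does, and it works with all three Redis deployment modes: standalone, Sentinel, and Cluster. Add the following to the pom: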
<dependency>
    <groupId>io.lettuce</groupId>
    <artifactId>lettuce-core</artifactId>
    <version>5.0.5.RELEASE</version>
</dependency>
<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.1.24.Final</version>
</dependency>
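For how to use each mode, see https://juejin.im/post/5d8eb73ff265da5ba5329c66, which covers them in some detail.
To keep testing simple, this post uses standalone mode. Continuing with the advertising example, it looks up the advertiser ID in Redis by ad slot ID.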
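As a rough illustration of how the three modes differ on the client side, the sketch below builds a Lettuce client for each one. The hosts, ports, the master name mymaster, and the object name LettuceModes are placeholders chosen for this example, not values from the original setup. Creating a client does not open a connection, so nothing here talks to a server.

```scala
import io.lettuce.core.{RedisClient, RedisURI}
import io.lettuce.core.cluster.RedisClusterClient

object LettuceModes {
  // Standalone: a single Redis node (placeholder address)
  val standaloneUri: RedisURI = RedisURI.create("redis://localhost:6379")

  // Sentinel: placeholder sentinel address and master name
  val sentinelUri: RedisURI =
    RedisURI.Builder.sentinel("localhost", 26379, "mymaster").build()

  // Cluster: placeholder seed node; the client discovers the rest of the topology
  val clusterUri: RedisURI = RedisURI.create("redis://localhost:7000")

  // Standalone and Sentinel share RedisClient; Cluster uses RedisClusterClient
  def standaloneClient(): RedisClient = RedisClient.create(standaloneUri)
  def sentinelClient(): RedisClient = RedisClient.create(sentinelUri)
  def clusterClient(): RedisClusterClient = RedisClusterClient.create(clusterUri)
}
```

In each case, calling connect() on the client and then async() on the connection yields the asynchronous command interface.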
Preparing the data in Redis:
hmset 1 aid 1 cid 1
hmset 2 aid 1 cid 2
The data is stored as hashes: the key is the ad slot ID, aid is the advertiser ID, and cid is the campaign ID.
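This key and field layout determines how a lookup result becomes an enriched record. As a minimal sketch, the mapping from an HGETALL result to an output record can be isolated so it is testable without Redis or Flink. It assumes the same AdData shape as the main flow; the helper name enrich and the object EnrichSketch are hypothetical and not part of the original code.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}
import scala.util.Try

// Same shape as the AdData case class used in the main flow
case class AdData(aId: Int, tId: Int, clientId: String, actionType: Int, time: Long)

object EnrichSketch {
  // Hypothetical helper: fills in aId from the hash fields returned by HGETALL.
  // Returns the input unchanged when the hash is missing, empty,
  // or has no aid field that parses as an integer.
  def enrich(input: AdData, fields: JMap[String, String]): AdData = {
    val aid = if (fields == null) null else fields.get("aid")
    if (aid == null) input
    else Try(aid.trim.toInt).map(v => input.copy(aId = v)).getOrElse(input)
  }

  def main(args: Array[String]): Unit = {
    val hash = new JHashMap[String, String]()
    hash.put("aid", "1")
    hash.put("cid", "2")
    val in = AdData(0, 2, "clientId1", 1, 1571646006000L)
    println(enrich(in, hash))                           // aId taken from the hash
    println(enrich(in, new JHashMap[String, String]())) // no match: aId stays 0
  }
}
```

Looking up the aid field directly, rather than iterating over all fields, makes it obvious that every input yields exactly one output.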
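Define RedisSide as a RichAsyncFunction that queries Redis asynchronously:
// Imports assume Lettuce 5.x and a Flink version whose Scala API provides RichAsyncFunction
import java.util
import java.util.function.{Consumer, Function => JFunction}

import io.lettuce.core.RedisClient
import io.lettuce.core.api.StatefulRedisConnection
import io.lettuce.core.api.async.RedisAsyncCommands
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala.async.{ResultFuture, RichAsyncFunction}

class RedisSide extends RichAsyncFunction[AdData, AdData] {
  private var redisClient: RedisClient = _
  private var connection: StatefulRedisConnection[String, String] = _
  private var async: RedisAsyncCommands[String, String] = _

  // Create one client and one connection per parallel subtask
  override def open(parameters: Configuration): Unit = {
    val redisUri = "redis://localhost"
    redisClient = RedisClient.create(redisUri)
    connection = redisClient.connect()
    async = connection.async()
  }

  override def asyncInvoke(input: AdData, resultFuture: ResultFuture[AdData]): Unit = {
    val tid = input.tId.toString
    async.hgetall(tid).thenAccept(new Consumer[util.Map[String, String]]() {
      override def accept(t: util.Map[String, String]): Unit = {
        // Read the aid field directly so the future is completed exactly once,
        // including when the key is missing or the hash has no aid field
        val aid = if (t == null) null else t.get("aid")
        if (aid == null) {
          resultFuture.complete(Iterable(input))
        } else {
          resultFuture.complete(Iterable(input.copy(aId = aid.toInt)))
        }
      }
    }).exceptionally(new JFunction[Throwable, Void] {
      // Surface Redis or parsing errors instead of waiting for the timeout
      override def apply(e: Throwable): Void = {
        resultFuture.completeExceptionally(e)
        null
      }
    })
  }

  // Release resources
  override def close(): Unit = {
    if (connection != null) connection.close()
    if (redisClient != null) redisClient.shutdown()
  }
}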
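Main flow:
// Imports assume the Flink Scala API and the universal Kafka connector
import java.util.Properties
import java.util.concurrent.TimeUnit

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.async.AsyncFunction
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
import org.apache.kafka.clients.consumer.ConsumerConfig

case class AdData(aId: Int, tId: Int, clientId: String, actionType: Int, time: Long)

object Demo1 {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    val kafkaConfig = new Properties()
    kafkaConfig.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    kafkaConfig.put(ConsumerConfig.GROUP_ID_CONFIG, "test1")
    val consumer = new FlinkKafkaConsumer[String]("topic1", new SimpleStringSchema(), kafkaConfig)
    val ds = env.addSource(consumer)
      .map(x => {
        // Input format: tId,clientId,actionType,time
        val a: Array[String] = x.split(",")
        AdData(0, a(0).toInt, a(1), a(2).toInt, a(3).toLong) // aId defaults to 0 until the lookup fills it in
      })
    val redisSide: AsyncFunction[AdData, AdData] = new RedisSide
    // 5-second timeout per lookup, at most 1000 lookups in flight
    AsyncDataStream.unorderedWait(ds, redisSide, 5L, TimeUnit.SECONDS, 1000)
      .print()
    env.execute("Demo1")
  }
}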
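The map step in the main flow assumes that every Kafka message has four well-formed, comma-separated fields; a malformed message would throw and fail the job. A minimal sketch of a more defensive parser follows, assuming the same field order (tId, clientId, actionType, time). The helper name parseAdData and the object ParseSketch are hypothetical and not part of the original code.

```scala
import scala.util.Try

// Same shape as the AdData case class used in the main flow
case class AdData(aId: Int, tId: Int, clientId: String, actionType: Int, time: Long)

object ParseSketch {
  // Hypothetical helper: returns None for malformed lines instead of throwing.
  // aId is set to 0 as a placeholder, exactly as in the main flow.
  def parseAdData(line: String): Option[AdData] = {
    line.split(",", -1).map(_.trim) match {
      case Array(tId, clientId, actionType, time) =>
        Try(AdData(0, tId.toInt, clientId, actionType.toInt, time.toLong)).toOption
      case _ => None // wrong number of fields
    }
  }
}
```

With a parser like this, the job can drop or side-line bad records before the async lookup rather than restarting on them.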
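Testing
Input records:
1,clientId1,1,1571646006000
3,clientId1,1,1571646006000
Output:
AdData(1,1,clientId1,1,1571646006000)
AdData(0,3,clientId1,1,1571646006000)
Ad slot 1 exists in Redis, so its record comes back with aId set to 1; ad slot 3 has no entry, so its aId stays at the default 0.
That completes the verification, and it also fills a gap in the dimension-table series.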