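Now that the production environment is open to other business teams, we need to count in real time how often each team accesses the data warehouse (Apache Doris), identify users issuing abnormal requests, and provide real-time monitoring and early warning. Slow-query logs also need to be retained so that their causes can be analyzed and their negative impact reduced.
Specific requirement: count how many times each user accesses the data warehouse in every 30 s window, and at the same time filter out slow queries and emit them separately for later analysis.
As the logs show, each record is key-value data delimited by "|". (One point deserves special mention: if a Doris query is slow, two log records are produced, one in the normal query format and one in the slow-query format.) We can split each record on "|" and define an entity class to hold the fields.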
# Log produced by a normal query:
2023-04-21 18:07:56,218 [query] |Client=127.0.0.1:45716|User=default_cluster:bi_team|Db=default_cluster:information_schema|State=EOF|Time=326|ScanBytes=35123|ScanRows=1188636|ReturnRows=41|StmtId=20149317|QueryId=25bec2d69ce44452-8a63da4232e9a64d|IsQuery=true|feIp=172.22.197.240|Stmt=SELECT * FROM base.test1 where `_is_delete`='1' LIMIT 5000|CpuTimeMS=12685|SqlHash=0841a9b7ad2049c4346f77fdca0129b1
# Log produced by a slow query:
2023-04-21 18:02:18,306 [slow_query] |Client=127.0.0.1:46080|User=default_cluster:datacenter|Db=default_cluster:base|State=EOF|Time=6924|ScanBytes=141439314|ScanRows=26983397|ReturnRows=1|StmtId=20148539|QueryId=b540f4d08ad64b8e-a9e8b090208fd3d7|IsQuery=true|feIp=172.22.197.240|Stmt=select count(*) ct from base.test2 where _is_delete = 1|CpuTimeMS=25330|SqlHash=bb0c5f8d00f311c556db053c439c59c0
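To make the "split on | and read key=value pairs" idea concrete, here is a minimal sketch in plain Java, independent of Flink. The class name `AuditLogLineParser` and the method `parse` are hypothetical names introduced only for illustration, and the input line in `main` is a shortened version of the slow-query sample above. Like the length check used later in the article, this approach assumes the SQL statement itself contains no "|".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the original project.
final class AuditLogLineParser {

    /** Splits one audit log line into its fields, keeping the field order. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        String[] parts = line.split("\\|");

        // parts[0] is the prefix: "<date> <time>,<millis> [<logType>] "
        String head = parts[0];
        fields.put("logType", head.substring(head.indexOf('[') + 1, head.indexOf(']')));
        fields.put("logDate", head.substring(0, head.indexOf(',')));

        // Every remaining token is key=value; split at the FIRST '=' only,
        // because values such as Stmt may contain '=' themselves.
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq < 0) {
                continue; // not a key=value token, ignore it
            }
            fields.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened version of the slow-query sample log.
        String line = "2023-04-21 18:02:18,306 [slow_query] |Client=127.0.0.1:46080"
                + "|User=default_cluster:datacenter|Time=6924"
                + "|Stmt=select count(*) ct from base.test2 where _is_delete = 1|CpuTimeMS=25330";
        Map<String, String> f = parse(line);
        // prints "slow_query default_cluster:datacenter 6924"
        System.out.println(f.get("logType") + " " + f.get("User") + " " + f.get("Time"));
    }
}
```

Reading fields by key instead of by position keeps working if Doris adds or reorders fields, at the cost of one map lookup per field.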
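This implementation uses the Flink 1.17 dependencies, the latest at the time of writing; the dependency declarations differ slightly from earlier versions.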
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.17.0</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients</artifactId>
        <version>1.17.0</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java</artifactId>
        <version>1.17.0</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka</artifactId>
        <version>1.17.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-statebackend-rocksdb</artifactId>
        <version>1.17.0</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-runtime-web</artifactId>
        <version>1.17.0</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
        <version>3.9</version>
    </dependency>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.4</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.7</version>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.12</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.25</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>1.7.25</version>
    </dependency>
</dependencies>
package com.bigdata.entity;

import lombok.Data;

/**
 * Created by:
 *
 * @Author:
 * @Date: 2023/04/20/14:00
 * @Description:
 */
@Data
public class AuditQueryLogEntity {
    private long logDate;
    private String logType;
    private String client;
    private String user;
    private String db;
    private String status;
    private long time;
    private long scanBytes;
    private long scanRows;
    private long returnRows;
    private String stmtId;
    private String queryId;
    private String isQuery;
    private String feIp;
    private String stmt;
    private long cpuTimeMS;
    private String sqlHash;

    public AuditQueryLogEntity(long logDate, String logType, String client, String user, String db, String status, long time, long scanBytes, long scanRows, long returnRows, String stmtId, String queryId, String isQuery, String feIp, String stmt, long cpuTimeMS, String sqlHash) {
        this.logDate = logDate;
        this.logType = logType;
        this.client = client;
        this.user = user;
        this.db = db;
        this.status = status;
        this.time = time;
        this.scanBytes = scanBytes;
        this.scanRows = scanRows;
        this.returnRows = returnRows;
        this.stmtId = stmtId;
        this.queryId = queryId;
        this.isQuery = isQuery;
        this.feIp = feIp;
        this.stmt = stmt;
        this.cpuTimeMS = cpuTimeMS;
        this.sqlHash = sqlHash;
    }

    public static AuditQueryLogBuild builder() {
        return new AuditQueryLogBuild();
    }

    public static class AuditQueryLogBuild {
        private long logDate;
        private String logType;
        private String client;
        private String user;
        private String db;
        private String status;
        private long time;
        private long scanBytes;
        private long scanRows;
        private long returnRows;
        private String stmtId;
        private String queryId;
        private String isQuery;
        private String feIp;
        private String stmt;
        private long cpuTimeMS;
        private String sqlHash;

        public AuditQueryLogBuild logDate(long logDate) {
            this.logDate = logDate;
            return this;
        }

        public AuditQueryLogBuild logType(String logType) {
            this.logType = logType;
            return this;
        }

        public AuditQueryLogBuild client(String client) {
            this.client = client;
            return this;
        }

        public AuditQueryLogBuild user(String user) {
            this.user = user;
            return this;
        }

        public AuditQueryLogBuild db(String db) {
            this.db = db;
            return this;
        }

        public AuditQueryLogBuild status(String status) {
            this.status = status;
            return this;
        }

        public AuditQueryLogBuild time(long time) {
            this.time = time;
            return this;
        }

        public AuditQueryLogBuild scanBytes(long scanBytes) {
            this.scanBytes = scanBytes;
            return this;
        }

        public AuditQueryLogBuild scanRows(long scanRows) {
            this.scanRows = scanRows;
            return this;
        }

        public AuditQueryLogBuild returnRows(long returnRows) {
            this.returnRows = returnRows;
            return this;
        }

        public AuditQueryLogBuild stmtId(String stmtId) {
            this.stmtId = stmtId;
            return this;
        }

        public AuditQueryLogBuild queryId(String queryId) {
            this.queryId = queryId;
            return this;
        }

        public AuditQueryLogBuild isQuery(String isQuery) {
            this.isQuery = isQuery;
            return this;
        }

        public AuditQueryLogBuild feIp(String feIp) {
            this.feIp = feIp;
            return this;
        }

        public AuditQueryLogBuild stmt(String stmt) {
            this.stmt = stmt;
            return this;
        }

        public AuditQueryLogBuild cpuTimeMS(long cpuTimeMS) {
            this.cpuTimeMS = cpuTimeMS;
            return this;
        }

        public AuditQueryLogBuild sqlHash(String sqlHash) {
            this.sqlHash = sqlHash;
            return this;
        }

        public AuditQueryLogEntity build() {
            return new AuditQueryLogEntity(this.logDate,
                    this.logType,
                    this.client,
                    this.user,
                    this.db,
                    this.status,
                    this.time,
                    this.scanBytes,
                    this.scanRows,
                    this.returnRows,
                    this.stmtId,
                    this.queryId,
                    this.isQuery,
                    this.feIp,
                    this.stmt,
                    this.cpuTimeMS,
                    this.sqlHash);
        }
    }
}
3. Define the Kafka deserialization format.
Implement DeserializationSchema to wrap each record in an AuditQueryLogEntity.
package com.bigdata.deserializationSchema;

import com.bigdata.entity.AuditQueryLogEntity;
// DateToTimeStampUtils is a project helper that is not shown in this article.
import com.bigdata.utils.DateToTimeStampUtils;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Created by:
 *
 * @Author:
 * @Date: 2023/04/20/11:13
 * @Description: Deserializes a Kafka message and wraps it in an AuditQueryLogEntity
 */
public class AuditQueryLogDeSerializer implements DeserializationSchema<AuditQueryLogEntity> {
    private static final long serialVersionUID = 1L;

    @Override
    public AuditQueryLogEntity deserialize(byte[] message) throws IOException {
        // Decode with an explicit charset instead of the platform default.
        String auditLog = new String(message, StandardCharsets.UTF_8);
        String[] logArray = auditLog.split("\\|");
        // Expect the prefix (timestamp + log type) plus 15 key=value fields.
        // Returning null makes Flink skip the record. Note that a statement
        // which itself contains "|" also fails this check and is dropped.
        if (logArray.length != 16) {
            return null;
        }
        try {
            String logType = logArray[0].substring(logArray[0].indexOf("[") + 1, logArray[0].indexOf("]"));
            String logDate = logArray[0].substring(0, logArray[0].indexOf(","));
            return AuditQueryLogEntity
                    .builder()
                    .logDate(DateToTimeStampUtils.toTimeStamp(logDate))
                    .logType(logType)
                    .client(sub(logArray[1]))
                    .user(sub(logArray[2]))
                    .db(sub(logArray[3]))
                    .status(sub(logArray[4]))
                    .time(Long.parseLong(sub(logArray[5])))
                    .scanBytes(Long.parseLong(sub(logArray[6])))
                    .scanRows(Long.parseLong(sub(logArray[7])))
                    .returnRows(Long.parseLong(sub(logArray[8])))
                    .stmtId(sub(logArray[9]))
                    .queryId(sub(logArray[10]))
                    .isQuery(sub(logArray[11]))
                    .feIp(sub(logArray[12]))
                    .stmt(sub(logArray[13]))
                    .cpuTimeMS(Long.parseLong(sub(logArray[14])))
                    .sqlHash(sub(logArray[15])).build();
        } catch (Exception e) {
            // Keep the original exception as the cause so the stack trace is not lost.
            throw new IOException("Failed to parse audit log: " + auditLog, e);
        }
    }

    @Override
    public boolean isEndOfStream(AuditQueryLogEntity nextElement) {
        return false;
    }

    @Override
    public TypeInformation<AuditQueryLogEntity> getProducedType() {
        // Return real type information rather than null.
        return TypeInformation.of(AuditQueryLogEntity.class);
    }

    /** Returns the value part of a "key=value" token. */
    private String sub(String str) {
        return str.substring(str.indexOf("=") + 1);
    }
}
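The deserializer above calls `DateToTimeStampUtils.toTimeStamp`, and the job below calls `DateToTimeStampUtils.getDateTime`, but the article does not show that helper. The following is only a guess at what it might look like, written with `java.time`. The class name `DateToTimeStampUtilsSketch`, the `yyyy-MM-dd HH:mm:ss` pattern, and especially the `Asia/Shanghai` time zone are assumptions; use whatever zone your Doris FE writes its logs in.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical reconstruction; the article's real DateToTimeStampUtils is not shown.
final class DateToTimeStampUtilsSketch {

    // Assumption: audit log timestamps are local times in Asia/Shanghai.
    private static final ZoneId ZONE = ZoneId.of("Asia/Shanghai");

    // Assumption: the date part of the log prefix, without the ",millis" suffix.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Converts "yyyy-MM-dd HH:mm:ss" to epoch milliseconds. */
    static long toTimeStamp(String dateTime) {
        return LocalDateTime.parse(dateTime, FORMAT).atZone(ZONE).toInstant().toEpochMilli();
    }

    /** Converts epoch milliseconds back to "yyyy-MM-dd HH:mm:ss". */
    static String getDateTime(long epochMillis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZONE).format(FORMAT);
    }

    public static void main(String[] args) {
        long ts = toTimeStamp("2023-04-21 18:25:44");
        System.out.println(ts);              // prints 1682072744000
        System.out.println(getDateTime(ts)); // prints 2023-04-21 18:25:44
    }
}
```

`DateTimeFormatter` is immutable and thread-safe, so a single static instance can be shared by all parallel subtasks, which is not true of the older `SimpleDateFormat`.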
Based on event time, count each user's queries per 30 s window and emit the slow-query logs separately.
package com.bigdata;

import com.bigdata.deserializationSchema.AuditQueryLogDeSerializer;
import com.bigdata.entity.AuditQueryLogEntity;
// DateToTimeStampUtils is a project helper that is not shown in this article.
import com.bigdata.utils.DateToTimeStampUtils;
import lombok.Data;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

import java.time.Duration;
import java.util.Objects;

/**
 * Created by:
 *
 * @Author:
 * @Date: 2023/04/20/15:13
 * @Description: Counts each user's queries per 30 s window and emits slow-query logs separately
 */
public class TumblingTimeWindow3 {
    @Data
    static class Event {
        long logTime;
        String user;
        int selectCount;
        String startTime;
        String endTime;

        public Event(long logTime, String user, int selectCount, String startTime, String endTime) {
            this.logTime = logTime;
            this.user = user;
            this.selectCount = selectCount;
            this.startTime = startTime;
            this.endTime = endTime;
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInteger(RestOptions.PORT, 8081);
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // Checkpoint interval (ms)
        env.enableCheckpointing(5000);
        // Checkpoint storage path
        env.getCheckpointConfig().setCheckpointStorage("file:///D:\\test\\flink-demo\\checkpoint");
        // Define the Kafka source
        KafkaSource<AuditQueryLogEntity> kafkaSource = KafkaSource.<AuditQueryLogEntity>builder()
                .setTopics("test_topic")
                .setValueOnlyDeserializer(new AuditQueryLogDeSerializer())
                .setBootstrapServers("127.0.0.1:9092")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setGroupId("test_v1").build();
        SingleOutputStreamOperator<AuditQueryLogEntity> dataStreamSource = env
                .fromSource(kafkaSource, WatermarkStrategy.noWatermarks(), "doris-fe-audit-source")
                .returns(AuditQueryLogEntity.class)
                .setParallelism(5);
        // Define the side-output tag (anonymous subclass so the generic type is retained)
        OutputTag<AuditQueryLogEntity> slowQueryOut = new OutputTag<AuditQueryLogEntity>("slow-query") {
        };
        // Keep normal query logs on the main stream; route everything else (slow queries) to the side output
        SingleOutputStreamOperator<AuditQueryLogEntity> outputStreamOperator = dataStreamSource
                .filter(Objects::nonNull)
                .process(new ProcessFunction<AuditQueryLogEntity, AuditQueryLogEntity>() {
                    @Override
                    public void processElement(AuditQueryLogEntity value, Context ctx, Collector<AuditQueryLogEntity> out) throws Exception {
                        if (Objects.equals(value.getLogType(), "query")) {
                            out.collect(value);
                        } else {
                            ctx.output(slowQueryOut, value);
                        }
                    }
                });
        // Print the side-output (slow-query) logs
        outputStreamOperator.getSideOutput(slowQueryOut).print("slow-query : ");
        // Count each user's queries with a 30 s tumbling window.
        // The original used TumblingProcessingTimeWindows, which ignores the watermarks assigned here;
        // an event-time assigner is used so that the windows follow the log timestamps.
        WindowedStream<Event, String, TimeWindow> window = outputStreamOperator
                .map(log -> new Event(log.getLogDate(), log.getUser(), 1, null, null))
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner(new SerializableTimestampAssigner<Event>() {
                            @Override
                            public long extractTimestamp(Event element, long recordTimestamp) {
                                return element.getLogTime();
                            }
                        }))
                .keyBy(e -> e.user)
                .window(TumblingEventTimeWindows.of(Time.seconds(30)));
        window.aggregate(new MyLoadFunc(), new MyLoadAggResult()).print();
        env.execute("sid-out-job-test");
    }

    /** Incrementally counts the events in a window. */
    static class MyLoadFunc implements AggregateFunction<Event, Integer, Integer> {
        @Override
        public Integer createAccumulator() {
            return 0;
        }

        @Override
        public Integer add(Event loadResult, Integer accumulator) {
            return accumulator + 1;
        }

        @Override
        public Integer getResult(Integer accumulator) {
            return accumulator;
        }

        @Override
        public Integer merge(Integer a, Integer b) {
            // Not invoked for tumbling windows, but return the sum rather than null to honor the contract.
            return a + b;
        }
    }

    /** Attaches the window start and end times to the per-user count. */
    static class MyLoadAggResult extends ProcessWindowFunction<Integer, Event, String, TimeWindow> {
        @Override
        public void process(String key, Context context, Iterable<Integer> iterable, Collector<Event> out) throws Exception {
            String start = DateToTimeStampUtils.getDateTime(context.window().getStart());
            String end = DateToTimeStampUtils.getDateTime(context.window().getEnd());
            out.collect(new Event(0L, key, iterable.iterator().next(), start, end));
        }
    }
}
The results show the per-user access counts for each 30 s window, along with one slow-query SQL statement emitted through the side output.
3> TumblingTimeWindow3.Event(logTime=0, user=default_cluster:user1, selectCount=94, startTime=2023-04-21 18:25:00, endTime=2023-04-21 18:25:30)
6> TumblingTimeWindow3.Event(logTime=0, user=default_cluster:user2, selectCount=28, startTime=2023-04-21 18:25:00, endTime=2023-04-21 18:25:30)
2> TumblingTimeWindow3.Event(logTime=0, user=default_cluster:user3, selectCount=1, startTime=2023-04-21 18:25:00, endTime=2023-04-21 18:25:30)
2> TumblingTimeWindow3.Event(logTime=0, user=default_cluster:user4, selectCount=5, startTime=2023-04-21 18:25:00, endTime=2023-04-21 18:25:30)
slow-query : :6> AuditQueryLogEntity(logDate=1682072744000, logType=slow_query, client=127.0.0.1:49920, user=default_cluster:user3, db=default_cluster:base, status=OK, time=5135, scanBytes=0, scanRows=0, returnRows=0, stmtId=20151608, queryId=2830914d231a4a0e-b528d4c1a5b5848c, isQuery=false, feIp=172.2.11.5, stmt= select * from test , cpuTimeMS=0, sqlHash=4d8733b6018f9f46cefc6906d7061c97)
2> TumblingTimeWindow3.Event(logTime=0, user=default_cluster:user2, selectCount=2, startTime=2023-04-21 18:25:30, endTime=2023-04-21 18:26:00)
4> TumblingTimeWindow3.Event(logTime=0, user=default_cluster:user3, selectCount=2, startTime=2023-04-21 18:25:30, endTime=2023-04-21 18:26:00)
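The window boundaries in the output above (18:25:00, 18:25:30, 18:26:00) always fall on multiples of 30 s, regardless of when the job was started. The sketch below shows that alignment arithmetic in plain Java, using the `logDate` value from the slow-query record above as the input. `TumblingWindowSketch` and its methods are hypothetical names; the formula is a simplified version (zero offset, non-negative timestamps) of what, as far as I know, Flink's tumbling window assigners do internally.

```java
// Hypothetical illustration of tumbling-window alignment; not Flink code.
final class TumblingWindowSketch {

    static final long WINDOW_SIZE_MS = 30_000L;

    /** Start (inclusive) of the 30 s window containing the timestamp. */
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_SIZE_MS);
    }

    /** End (exclusive) of the 30 s window containing the timestamp. */
    static long windowEnd(long timestampMs) {
        return windowStart(timestampMs) + WINDOW_SIZE_MS;
    }

    public static void main(String[] args) {
        long logDate = 1682072744000L; // logDate of the slow-query record in the output above
        System.out.println(windowStart(logDate)); // prints 1682072730000
        System.out.println(windowEnd(logDate));   // prints 1682072760000
    }
}
```

Because the start is computed from the timestamp alone, every record with a timestamp in the same 30 s slot lands in the same window for a given user, which is what lets the counts be compared across users and across restarts.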
This is only demo code and there is still plenty of room for optimization; corrections are welcome.