A previous post, "EFK logging follow-up: making the fluent-bit service standalone", covered the full pipeline from fluent-bit collection, through fluentd forwarding into Kafka, and on to Elasticsearch. It also mentioned syncing server logs over to the host running fluent-bit; that is a straightforward rsync incremental sync, so it is not detailed here. This post records how to monitor error logs in Kafka and send alert notifications.
fluentd configuration: filtering ERROR-level logs
```
# Re-tag incoming events by the 4th segment of the fbKey path (yielding e.g. fb.dapeng).
# NOTE: the <match>/<store>/<rule> directives below were stripped from the original
# post's HTML and have been reconstructed; "fluentbit.**" is a placeholder for the
# actual tag that fluent-bit events arrive with. (Tag rewriting via a `tag` parameter
# is provided by output-style plugins such as fluent-plugin-record_reformer; the
# plugin type is shown here as in the original post.)
<match fluentbit.**>
  @type record_transformer
  enable_ruby
  tag ${record["fbKey"].split('/')[3]}
  remove_keys fbKey
</match>

<match fb.dapeng>
  @type copy
  <store>
    @type rewrite_tag_filter
    <rule>
      key level
      pattern /^ERROR$/
      tag error.fb.dapeng
    </rule>
  </store>
  <store>
    @type kafka_buffered
    brokers <kafka-server-ip>:9092
    topic_key efk
    buffer_type file
    buffer_path /tmp/buffer
    flush_interval 5s
    default_topic efk
    output_data_type json
    compression_codec gzip
    max_send_retries 3
    required_acks -1
    discard_kafka_delivery_failed true
  </store>
</match>

<match error.fb.dapeng>
  @type kafka_buffered
  brokers <kafka-server-ip>:9092
  topic_key efk_error
  buffer_type file
  buffer_path /tmp/buffer_error
  flush_interval 5s
  default_topic efk_error
  output_data_type json
  compression_codec gzip
  max_send_retries 3
  required_acks -1
  discard_kafka_delivery_failed true
</match>
```
- copy: duplicates each event to multiple outputs; each `<store>` behaves like a `<match>`.
- rewrite_tag_filter: rewrites the tag of events matching its rules, re-emits them under the new tag, and processing restarts from the top of the configuration. Make sure the rewritten tag does not match the `<match>` block it lives in, or you create an infinite loop.
For messages tagged fb.dapeng:
1. events whose level is ERROR are re-tagged as error.fb.dapeng;
2. every event is sent directly to the Kafka topic efk.
Events re-tagged as error.fb.dapeng in step 1 are then sent to the Kafka topic efk_error.
This way Elasticsearch only needs to consume the Kafka topic efk, which carries logs of every level, while the alert monitor only consumes the topic efk_error, which contains nothing but ERROR-level logs.
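The net effect of the copy + rewrite_tag_filter rules can be illustrated outside fluentd. The helper below is purely an illustration (not fluentd code, and `EfkTopicRouter`/`topicsFor` are hypothetical names): it answers, for a given log level, which Kafka topics an event ends up on.

```java
import java.util.ArrayList;
import java.util.List;

public class EfkTopicRouter {
    /** Returns the Kafka topics an event with the given level is written to. */
    public static List<String> topicsFor(String level) {
        List<String> topics = new ArrayList<>();
        topics.add("efk"); // copy: every event goes to the efk topic
        if ("ERROR".equals(level)) {
            // rewrite_tag_filter: ERROR events are re-tagged error.fb.dapeng
            // and additionally routed to efk_error
            topics.add("efk_error");
        }
        return topics;
    }
}
```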
Note: the rewrite_tag_filter plugin must be installed; modify fluentd's Dockerfile and rebuild the image:
```
FROM fluent/fluentd:v1.2
# add the es, kafka and rewrite_tag_filter plugins
RUN fluent-gem install fluent-plugin-elasticsearch
RUN fluent-gem install fluent-plugin-kafka
RUN fluent-gem install fluent-plugin-rewrite-tag-filter
CMD exec fluentd -c /fluentd/etc/${FLUENTD_CONF} -p /fluentd/plugins $FLUENTD_OPT
```
The monitor project consumes Kafka messages
The Kafka consumer logic is:
- first parse the message JSON and extract sessionTid (which marks the service call chain: messages from the same invocation share a sessionTid);
- for a given sessionTid, record only the first error message (the sessionTid is cached for 10s and then evicted; errors from one invocation rarely span more than 10s);
- send a DingTalk group notification.
```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;

import javax.annotation.Resource;

import com.alibaba.fastjson.JSON;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class EfkAlarmConsumer {

    @Resource
    EfkAlarmApp efkAlarmApp;

    // sessionTid -> first-seen timestamp (millis)
    private final Map<String, Long> cache = new ConcurrentHashMap<>();
    private final Timer timer = new Timer();

    @KafkaListener(topics = {"efk_error"}, groupId = "efk_error_consumer")
    public void processEfkAlarm(ConsumerRecord<String, String> record) {
        String json = record.value();
        Log l = resolveLog(json);
        if (null == l) {
            log.error("invalid message: {}", json);
        } else {
            log.debug("received Log: {}", l);
            processLog(l);
        }
    }

    private void processLog(Log l) {
        final String tid = l.getSessionTid();
        Long t = cache.get(tid);
        if (t == null) {
            cache.put(tid, System.currentTimeMillis());
            // evict this tid's entry 10s later
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    cache.remove(tid);
                }
            }, 10000);

            String currIndex = String.format("dapeng_log_index-%s",
                    new SimpleDateFormat("yyyy.MM.dd").format(new Date()));
            // send the DingTalk notification ...
            String text = l.getMessage();
            String title = String.format("[%s] %s: %s[%s] ",
                    efkAlarmApp.getDingTag(), l.getLogtime(), l.getTag(), l.getHostname());
            String url = String.format(AppConst.ESHEAD_LINK_URI, currIndex, l.getSessionTid());
            DingService.send(efkAlarmApp.getDingWebHook(), msg(text, title, url));
        }
    }

    private Log resolveLog(String json) {
        Log l = null;
        try {
            l = JSON.parseObject(json, Log.class);
        } catch (Throwable e) {
            log.error("failed to parse message", e);
        }
        return l;
    }

    private String msg(String text, String title, String url) {
        return String.format(
                "{\n" +
                "  \"msgtype\": \"link\", \n" +
                "  \"link\": {\n" +
                "    \"text\": \"%s\", \n" +
                "    \"title\": \"%s\", \n" +
                "    \"picUrl\": \"\",\n" +
                "    \"messageUrl\": \"%s\"\n" +
                "  }\n" +
                "}",
                text, title, url);
    }
}
```
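The 10-second sessionTid dedup can be exercised in isolation. Below is a minimal, framework-free sketch of the same idea (the class `SessionTidDeduper` and method `shouldAlert` are hypothetical names, not part of the monitor project); time is passed in explicitly so the behavior is testable without waiting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SessionTidDeduper {
    private final Map<String, Long> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public SessionTidDeduper(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns true only for the first occurrence of a sessionTid within the TTL window. */
    public boolean shouldAlert(String sessionTid, long nowMillis) {
        Long seenAt = cache.putIfAbsent(sessionTid, nowMillis);
        if (seenAt == null) {
            return true;                      // first occurrence: alert
        }
        if (nowMillis - seenAt > ttlMillis) {
            cache.put(sessionTid, nowMillis); // window expired: alert again
            return true;
        }
        return false;                         // duplicate within window: suppress
    }
}
```

Unlike the consumer above, this sketch overwrites expired entries lazily instead of scheduling removal with a `Timer`; either way only one alert fires per invocation per window.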
Jumping from the message link
The link format ESHEAD_LINK_URI:
```java
public class AppConst {
    public static final String ES_BASE_URI = "<elasticsearch server base URL>";
    public static final String ESHEAD_BASE_URI = "<elasticsearch-head page URL>";
    public static final String ESHEAD_LINK_URI = ESHEAD_BASE_URI + "?curr_index=%s&sessionTid=%s&base_uri=" + ES_BASE_URI;
}
```
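To make the resulting link concrete, here is a self-contained sketch that formats one. The two base URLs are stand-in example values (the real addresses are deployment-specific), and `LinkDemo`/`link` are hypothetical names:

```java
public class LinkDemo {
    // placeholder values; substitute your real ES / es-head addresses
    static final String ES_BASE_URI = "http://es.example.com:9200";
    static final String ESHEAD_BASE_URI = "http://eshead.example.com:9100/index.html";
    static final String ESHEAD_LINK_URI =
            ESHEAD_BASE_URI + "?curr_index=%s&sessionTid=%s&base_uri=" + ES_BASE_URI;

    /** Builds the es-head URL embedded in the DingTalk message. */
    static String link(String index, String sessionTid) {
        return String.format(ESHEAD_LINK_URI, index, sessionTid);
    }
}
```

The `curr_index` and `sessionTid` query parameters are what the modified index.html reads to pre-fill the search against the given Elasticsearch index.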
Modify index.html in the elasticsearch-head project, then add a mount for it to the elasticsearch-head service in the dc-all.yml file:

```
- /data/workspace/elasticsearch-head/index.html:/usr/src/app/index.html
```
OK, just restart elasticsearch-head.
The DingTalk message looks like this:
Clicking the message jumps to es-head: