Modifying the Nightingale (n9e) Alert Template

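By default, Nightingale's alert template produces messages like the one below. I needed it to include the runbook link (预案链接) as well, but I couldn't find any documentation on how to do that, so here is the approach I used.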

[Figure 1: the default alert message]

Looking up the field name in the database

Query the alert history table (alert_his_event) for one of the alert rule names. It returns the following:

use n9e_v5;

select * from alert_his_event where rule_name='inode资源不足-使用率超过90'\G


*************************** 232. row ***************************
                id: 90899
      is_recovered: 1
              cate: prometheus
           cluster: Default
          group_id: 11
        group_name: 源码主机
              hash: 44a
           rule_id: 144
         rule_name: inode资源不足-使用率超过90
         rule_note: 
         rule_prod: 
         rule_algo: 
          severity: 2
 prom_for_duration: 60
           prom_ql: (100 - ((node_filesystem_files_free{job="yuanma_node_exporter"} * 100) / node_filesystem_files{job="yuanma_node_exporter"}))>90
prom_eval_interval: 15
         callbacks: https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=
       runbook_url: https://blog.csdn.net/xxxxxxx
  notify_recovered: 1
   notify_channels: wecom
     notify_groups: 12
 notify_cur_number: 1
      target_ident: 
       target_note: 
first_trigger_time: 1684326941
      trigger_time: 1684326941
     trigger_value: 6.39503
      recover_time: 1684327094
    last_eval_time: 1684327094
              tags: device=/dev/vdb,,env2=ym_node,,env=ym_jenkins,,fstype=ext4,,instance=172.21.0.100:9102,,job=yuanma_node_exporter,,mountpoint=/data,,rulename=inode资源不足-使用率超过90 

Now compare this with the alert template file:

cat /opt/n9e/etc/template/wecom.tpl

**级别状态**: {{if .IsRecovered}}S{{.Severity}} Recovered{{else}}S{{.Severity}} Triggered{{end}}
**规则标题**: {{.RuleName}}{{if .RuleNote}}
**规则备注**: {{.RuleNote}}{{end}}{{if .TargetIdent}}
**监控对象**: {{.TargetIdent}}{{end}}
**监控指标**: {{.TagsJSON}}{{if not .IsRecovered}}
**触发时值**: {{.TriggerValue}}{{end}}
{{if .IsRecovered}}**恢复时间**: {{timeformat .LastEvalTime}}{{else}}**首次触发时间**: {{timeformat .FirstTriggerTime}}{{end}}
{{$time_duration := sub now.Unix .FirstTriggerTime }}{{if .IsRecovered}}{{$time_duration = sub .LastEvalTime .FirstTriggerTime }}{{end}}**持续时长**: {{humanizeDurationInterface $time_duration}}
**发送时间**: {{timestamp}}
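Not every template variable is a straight copy of a column. The template prints {{.TagsJSON}}, while the row above only has a tags column whose labels are joined by `,,`. The sketch below assumes, based only on that row, that TagsJSON is that string split on `,,`; splitTags is a hypothetical helper, not Nightingale code. Since text/template prints a slice the same way fmt.Println does, the output is presumably what the **监控指标** line looks like in the message.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTags turns a tags column value (labels joined by ",,", as seen in the
// alert_his_event row above) into a slice of "key=value" strings.
// Assumption: this mirrors how the TagsJSON list relates to the stored column;
// it is not Nightingale's own implementation.
func splitTags(column string) []string {
	column = strings.TrimSpace(column) // the stored value may carry trailing spaces
	if column == "" {
		return nil
	}
	return strings.Split(column, ",,")
}

func main() {
	col := "device=/dev/vdb,,env2=ym_node,,mountpoint=/data"
	// Println formats a []string as [a b c], the same way text/template does.
	fmt.Println(splitTags(col))
}
```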

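Comparing the template with the database columns above reveals a pattern: each template variable is the column name converted from snake_case to CamelCase, i.e. the underscores are dropped and the first letter of every word is capitalized (rule_name → RuleName, trigger_value → TriggerValue, first_trigger_time → FirstTriggerTime).

Since the runbook link is stored in the runbook_url column, the corresponding template variable should be RunbookUrl.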
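The naming rule can be expressed as a small function. This is only a sketch of the pattern observed above (snakeToPascal is a hypothetical helper, not part of Nightingale). It does not produce Go-style initialisms such as URL or ID, so it matches names like RunbookUrl here but is not guaranteed to hold for every field.

```go
package main

import (
	"fmt"
	"strings"
)

// snakeToPascal converts a snake_case column name such as "runbook_url"
// into the CamelCase form ("RunbookUrl") observed in the template variables.
// Heuristic based on the comparison above, not code taken from Nightingale.
func snakeToPascal(column string) string {
	var b strings.Builder
	for _, part := range strings.Split(column, "_") {
		if part == "" {
			continue // skip empty segments from leading or doubled underscores
		}
		// Column names here are ASCII, so upper-casing the first byte is safe.
		b.WriteString(strings.ToUpper(part[:1]) + part[1:])
	}
	return b.String()
}

func main() {
	for _, col := range []string{"rule_name", "trigger_value", "first_trigger_time", "runbook_url"} {
		fmt.Printf("%-20s -> {{.%s}}\n", col, snakeToPascal(col))
	}
}
```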

The modified template, with a new **预案链接** (runbook link) line:

**级别状态**: {{if .IsRecovered}}S{{.Severity}} Recovered{{else}}S{{.Severity}} Triggered{{end}}
**规则标题**: {{.RuleName}}{{if .RuleNote}}
**规则备注**: {{.RuleNote}}{{end}}{{if .TargetIdent}}
**监控对象**: {{.TargetIdent}}{{end}}
**监控指标**: {{.TagsJSON}}{{if not .IsRecovered}}
**预案链接**: {{.RunbookUrl}}
**触发时值**: {{.TriggerValue}}{{end}}
{{if .IsRecovered}}**恢复时间**: {{timeformat .LastEvalTime}}{{else}}**首次触发时间**: {{timeformat .FirstTriggerTime}}{{end}}
{{$time_duration := sub now.Unix .FirstTriggerTime }}{{if .IsRecovered}}{{$time_duration = sub .LastEvalTime .FirstTriggerTime }}{{end}}**持续时长**: {{humanizeDurationInterface $time_duration}}
**发送时间**: {{timestamp}}

Finally, restart the n9e service:

ps -ef | grep './n9e server' | grep -v grep

kill xxx    # xxx = the PID printed by the ps command above

cd /opt/n9e

nohup ./n9e server &> server.log &

Here is the result:

[Figure 2: the alert message with the runbook link]

 
