openfalcon - Log Alerting Experiment

  • The falcon-logdog configuration is as follows

[Screenshot 1: falcon-logdog configuration]

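  • Alert strategy configuration. Note the error=* part: error is the name of a tag, and the intent is that whenever the log metric carries an error tag, an alert should fire no matter what the tag's value is.

[Screenshot 2: alert strategy configuration]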
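One way to check a strategy separately from logdog is to push a hand-built point with the same metric and tags. The sketch below only builds the JSON payload, using the field layout that logdog prints in its "pushing data" debug lines. The function name `build_point` is made up for illustration, and the commented `curl` line assumes the Open-Falcon agent is listening on its default HTTP address, 127.0.0.1:1988.

```shell
#!/bin/bash
# Hypothetical helper: build a one-element push payload shaped like logdog's.
# $1 = endpoint, $2 = value, $3 = tags (comma-separated k=v pairs)
build_point() {
    printf '[{"metric":"log","endpoint":"%s","timestamp":%s,"value":%s,"step":60,"counterType":"GAUGE","tags":"%s"}]\n' \
        "$1" "$(date +%s)" "$2" "$3"
}

# Print a sample payload for the tag combination used in this experiment.
build_point 10.0.100.85 1 'prefix=abc-cm,suffix=log,error=.null'

# Assumed default agent address; uncomment to actually send the point.
# curl -s -X POST -d "$(build_point 10.0.100.85 1 'prefix=abc-cm,suffix=log,error=.null')" \
#     http://127.0.0.1:1988/v1/push
```

If a point pushed this way triggers the strategy while logdog's data does not, the problem is more likely in how the tags are written in the strategy than in logdog itself.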

  • Write the test data

[Screenshot: writing the test data]

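  • logdog has found the keyword. Note that two lines are matched here: we have two exp patterns and one contains the other, so the test data falls within both and is matched twice.

[Screenshot: logdog output showing two matches]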

  • So does an alert fire? Sorry, it really does not.

[Screenshot 3: no alert]

  • What if we remove error=*? What happens then?

[Screenshot 4: strategy without error=*]

  • The alert appears. But why is there only one for error, and none for our nameMachineOffline tag?

[Screenshot 5: alert for the error tag only]

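  • My suspicion is that only the counter for "服务器下线: null" ("server offline: null") gets updated, while the counter for "服务器下线" ("server offline") is never incremented. Let's test a few more times.

  • This time, prepare a script first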

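#!/bin/bash
# Append one test line per second to the log file that logdog watches.
# "服务器下线: null" ("server offline: null") matches both configured keywords.

for x in $(seq 1 1000)
do
    echo "服务器下线: null" >> xcloud-cm.2016-10-17_0.info.log && echo "inserted one record"
    sleep 1
done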
  • Write the data

[Screenshot 6: script writing data]

  • logdog has matched the keywords, and both keyword exp patterns matched

[Screenshot 7: logdog matching both keywords]

  • Now look at the data that is finally reported: the value for "服务器下线: null" keeps climbing

[Screenshot 8: counter for "服务器下线: null" rising]

  • Look at the other counter, the one for the bare "服务器下线" keyword: its value is as flat as a highway.

[Screenshot 9: counter for "服务器下线" staying flat]

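  • From this it appears that, when logdog handles counters, one log line increments only one counter. That leaves two things to consider:
1. What happens if the order of the two entries in the configuration file is swapped?
2. What happens if we write only the keyword "服务器下线"?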
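To judge what logdog reports, it helps to have an independent count of how many lines each pattern matches in the log file. The following is a minimal sketch using `grep -c`; the function name `count_matches` and the demo file are made up for illustration, and the sketch says nothing about how logdog counts internally.

```shell
#!/bin/bash
# Hypothetical helper: count the lines in FILE that match PATTERN.
# Usage: count_matches PATTERN FILE
count_matches() {
    # grep -c prints the number of matching lines; it exits 1 when the
    # count is 0, so mask that exit status.
    grep -c -E -- "$1" "$2" || true
}

# Demo data: two lines with the long form, one with the short form.
demo_log=$(mktemp)
printf '%s\n' "服务器下线: null" "服务器下线: null" "服务器下线" > "$demo_log"

echo "short keyword: $(count_matches '服务器下线' "$demo_log")"
echo "long keyword:  $(count_matches '服务器下线: null' "$demo_log")"
rm -f "$demo_log"
```

Because the short keyword is contained in the long one, every line counted for the long pattern is also counted for the short one. If each pattern were counted independently, the short keyword's counter could never be lower than the long keyword's.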
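  • Do the second one first, since it is simple. The result, however, is hard to make sense of.

  • The reported value is already 60, yet the graph still shows nothing, and so of course there is no alert. This is worth a regression test: was the counter in the earlier test actually being reported after all?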

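#!/bin/bash
# Same loop as before, but write only the short keyword "服务器下线" ("server offline").

for x in $(seq 1 1000)
do
    #echo "服务器下线: null" >> xcloud-cm.2016-10-17_0.info.log && echo "inserted one record"
    echo "服务器下线" >> xcloud-cm.2016-10-17_0.info.log && echo "inserted one record"
    sleep 1
done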

[Screenshot: result of writing only "服务器下线"]

[Screenshot 10]

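  • Repeat the first test to see what values logdog sends when it pushes data. Although both exp patterns match, only one of them ends up with a value in the push; the other has none.

[Screenshot: logdog push output]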

  • Now swap the order in the configuration and look again. Both VALUEs are 60.
[10/17/16 23:33:27] [DEBG] pushing data: [{"metric":"log","endpoint":"10.0.100.85","timestamp":1476761607,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-private-api,suffix=log,error=closing.socket.connection.and.attempting.reconnect"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476761607,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-cm,suffix=log,nameMachineOffline=."},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476761607,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-dispatcher,suffix=log,error=closing.socket.connection.and.attempting.reconnect"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476761607,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-cm,suffix=log,error=closing.socket.connection.and.attempting.reconnect"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476761549,"value":60,"step":60,"counterType":"GAUGE","tags":"prefix=abc-cm,suffix=log,error=.null"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476761549,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-private-api,suffix=log,error=closing.socket.connection.and.attempting.reconnect"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476761548,"value":60,"step":60,"counterType":"GAUGE","tags":"prefix=abc-cm,suffix=log,nameMachineOffline=."}]
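  • At this point I am thoroughly confused.

  • Summary

    • Wildcards such as error=* do not work in falcon
    • Avoid having one log line matched by two patterns.
    • Take a break, then continue.
  • One more experiment: this time compare "服务器下线: null" with "null"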

[10/17/16 23:40:15] [DEBG] pushing data: [{"metric":"log","endpoint":"10.0.100.85","timestamp":1476762015,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-cm,suffix=log,nameMachineOffline=null"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476762015,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-dispatcher,suffix=log,error=closing.socket.connection.and.attempting.reconnect"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476762015,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-cm,suffix=log,error=closing.socket.connection.and.attempting.reconnect"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476761955,"value":844,"step":60,"counterType":"GAUGE","tags":"prefix=abc-cm,suffix=log,error=.null"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476761955,"value":845,"step":60,"counterType":"GAUGE","tags":"prefix=abc-cm,suffix=log,nameMachineOffline=null"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476761955,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-cm,suffix=log,error=closing.socket.connection.and.attempting.reconnect"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476762015,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-private-api,suffix=log,error=closing.socket.connection.and.attempting.reconnect"}]

[abc@ip-10-0-100-85 falcon-logdog]$ ./control tail | grep push
[10/17/16 23:43:15] [DEBG] pushing data: [{"metric":"log","endpoint":"10.0.100.85","timestamp":1476762195,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-private-api,suffix=log,error=closing.socket.connection.and.attempting.reconnect"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476762195,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-dispatcher,suffix=log,error=closing.socket.connection.and.attempting.reconnect"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476762195,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-cm,suffix=log,nameMachineOffline=null"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476762195,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-cm,suffix=log,error=closing.socket.connection.and.attempting.reconnect"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476762135,"value":39,"step":60,"counterType":"GAUGE","tags":"prefix=abc-cm,suffix=log,error=.null"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476762141,"value":38,"step":60,"counterType":"GAUGE","tags":"prefix=abc-cm,suffix=log,nameMachineOffline=null"},{"metric":"log","endpoint":"10.0.100.85","timestamp":1476762135,"value":0,"step":60,"counterType":"GAUGE","tags":"prefix=abc-cm,suffix=log,error=closing.socket.connection.and.attempting.reconnect"}]

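These results left me puzzled. The values in the first push are above 800, which should not be possible with one line written per second and a 60-second step. In the second push both values are around 38, but why is there an extra metric?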
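The debug lines above are hard to read by eye. The following sketch pulls out the timestamp, value, and tags of every non-zero point, assuming the exact field order shown in those lines (integer values, no nested braces). The function name `summarize_push` is hypothetical and not part of logdog.

```shell
#!/bin/bash
# Hypothetical helper: read logdog "pushing data" debug lines on stdin and
# print "timestamp value tags" for every point whose value is not 0.
summarize_push() {
    # 1. split the JSON array into one object per line
    # 2. keep only timestamp, value and tags from each object
    # 3. drop the points whose value is 0
    grep -o '{[^}]*}' \
      | sed -n 's/.*"timestamp":\([0-9]*\),"value":\([0-9]*\),.*"tags":"\([^"]*\)".*/\1 \2 \3/p' \
      | awk '$2 != 0'
}
```

For example, with the debug output saved to a file (the file name here is only an example), `summarize_push < push.log` lists the non-zero counters one per line, which makes it easier to see that the non-zero points carry an older timestamp than the zero-valued ones in the same push.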

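  • Tentative conclusions
    • Keep the use of logdog as minimal as possible
    • Read the code when there is time; it may reveal something
    • Worst of all, Chinese is not supported: remove Chinese characters from tags and replace spaces with dots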
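As an illustration of the last point, the sketch below rewrites a keyword into an ASCII-only tag value. The rule (keep letters, digits, dots, underscores and hyphens, then turn spaces into dots) is only inferred from tags seen in the debug output such as error=.null. It is an assumption, not logdog's actual code, and `sanitize_tag_value` is a made-up name.

```shell
#!/bin/bash
# Hypothetical helper: turn a keyword into an ASCII-only tag value.
# Assumed rule: drop every byte except letters, digits, '.', '_', '-' and
# space, then replace each remaining space with a dot.
sanitize_tag_value() {
    printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9._ -' | tr ' ' '.'
}

sanitize_tag_value "服务器下线: null"; echo
sanitize_tag_value "closing socket connection and attempting reconnect"; echo
```

Writing tag values this way in both the logdog configuration and the alert strategy keeps the two sides textually identical, which matters because the strategy matches tags as plain strings.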
