Recently I have been doing ops work: refactoring code, setting up ELK and setting up alerting. Here is a summary.
Environment:
Ubuntu 14
Elasticsearch 5.1.2
Kibana 5.1.2
Official documentation:
https://elastalert.readthedocs.io/en/latest/running_elastalert.html#tutorial
Run:
git clone https://github.com/Yelp/elastalert.git
cd elastalert
python setup.py install  # may require sudo
pip install -r requirements.txt  # may require sudo
cp config.yaml.example config.yaml
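Note: if you are using ELK 5.0, ElastAlert's master branch does not support it yet, so you need to switch to the support_es5 branch yourself (git checkout support_es5) before installing.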
The installation comes with three commands:
elastalert-create-index
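ElastAlert stores its execution records in an ES index, and this command creates that index. By default the index is named elastalert_status. It contains 4 _types, each with its own @timestamp field, so you can also use Kibana to browse the records in this index.
Note: in practice, Kibana 5.0 cannot open this index, possibly because of a compatibility issue. I will look into it later.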
elastalert-rule-from-kibana
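Reads the Filtering settings from a dashboard saved in Kibana 3 and helps generate the corresponding configuration for config.yaml. Note that it only reads filtering, not queries.
I have not used it.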
elastalert-test-rule
Tests the rule settings in a custom configuration.
Note: the test feature is not supported yet for ES5.
Run the following command to load all rules:
python -m elastalert.elastalert --config ./config.yaml
Or run a single rule from rules_folder:
python -m elastalert.elastalert --config ./config.yaml --rule ./example_rules/one_rule.yaml
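See: http://elastalert.readthedocs.io/en/latest/ruletypes.html#alert-content
Like Watcher (or rather, this is the only way to do it), ElastAlert's configuration is split into several parts, but it uses its own names for them. (Watcher really should ship an official tool; working purely through the API is exhausting.)
See: http://elastalert.readthedocs.io/en/latest/ruletypes.html#rule-configuration-cheat-sheet
Background: I had already set up ELK with Docker and the logs were flowing in; only alerting was missing.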
The highest-priority requirement: raise an alert when log entries have status >= 500.
First debug the query in Kibana. It is fairly simple:
status: >=500
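To see what the Kibana query becomes once it is limited to a time window, here is a sketch of an Elasticsearch 5 search body that combines the same query_string with an @timestamp range. The build_query helper is hypothetical and the layout is a simplification; ElastAlert builds its own query internally and its exact shape may differ.

```python
import json
from datetime import datetime, timedelta, timezone


def build_query(query_string, start, end, ts_field="@timestamp"):
    """Combine a Kibana-style query_string with a time range.

    Simplified illustration of the request shape, not ElastAlert's exact query.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # the same text that was tested in the Kibana search bar
                    {"query_string": {"query": query_string}},
                    # only documents inside the current query window
                    {"range": {ts_field: {"gte": start.isoformat(),
                                          "lte": end.isoformat()}}},
                ]
            }
        },
        # oldest first, so events can be processed in time order
        "sort": [{ts_field: {"order": "asc"}}],
    }


end = datetime(2017, 1, 29, 12, 10, tzinfo=timezone.utc)
body = build_query("status: >=500", end - timedelta(minutes=5), end)
print(json.dumps(body, indent=2))
```

Both clauses sit in a bool filter, so they are ANDed and do not affect scoring.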
Edit config.yaml in the elastalert directory:
rules_folder: example_rules
run_every:
  seconds: 5  # query ES every 5 seconds
# ElastAlert will buffer results from the most recent
# period of time, in case some log sources are not in real time
# Logs reach ELK with some delay, so each query looks back this far:
# here 5 minutes, i.e. @timestamp in [now-5m, now], which also catches late logs
buffer_time:
  minutes: 5
# The elasticsearch hostname for metadata writeback
# Note that every rule can have its own elasticsearch host
es_host: 192.168.1.100
es_port: 9200
# Required for email alerts
smtp_host: smtp.sina.com
smtp_port: 465
# File holding the SMTP login username and password
smtp_auth_file: example_rules/smtp_auth_file.yaml
from_addr: xxxx@sina.com
# Applies to the Elasticsearch connection, not to SMTP
use_ssl: False
# Optional basic-auth username and password for elasticsearch
#es_username: someusername
#es_password: somepassword
# Index where ElastAlert stores its own data in ES; the default name is fine
writeback_index: elastalert_status
# If an alert fails for some reason, ElastAlert will retry
# sending the alert until this time period has elapsed
alert_time_limit:
  days: 2
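With run_every at 5 seconds and buffer_time at 5 minutes, consecutive queries overlap almost completely, so the same document can be returned many times. The sketch below is my own simplified model, not ElastAlert's code: each run searches [now - buffer_time, now] (matching the "from 20:05 to 20:10" window in the log further down), and I am assuming repeats are dropped by document _id, which is how I understand ElastAlert avoids counting an event twice. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

RUN_EVERY = timedelta(seconds=5)
BUFFER_TIME = timedelta(minutes=5)


def query_window(now, buffer_time=BUFFER_TIME):
    """Window searched at time `now`: [now - buffer_time, now]."""
    return now - buffer_time, now


def new_hits(hits, seen_ids):
    """Keep only hits whose _id was not processed in an earlier run."""
    fresh = [h for h in hits if h["_id"] not in seen_ids]
    seen_ids.update(h["_id"] for h in fresh)
    return fresh


t0 = datetime(2017, 1, 29, 20, 10, 0)
seen = set()
# Two consecutive runs 5 s apart: their windows overlap almost entirely,
# so document "b" is returned by both queries.
run1 = [{"_id": "a"}, {"_id": "b"}]
run2 = [{"_id": "b"}, {"_id": "c"}]
first = new_hits(run1, seen)
second = new_hits(run2, seen)
print(query_window(t0))
print(query_window(t0 + RUN_EVERY))
print([h["_id"] for h in first], [h["_id"] for h in second])
```

A larger buffer_time tolerates more ingestion delay at the cost of re-reading more documents on every run.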
Edit example_rules/example_test.yaml with the following content:
# Alert when the rate of events exceeds a threshold
# (Optional)
# Elasticsearch host
#es_host: 192.168.1.100
# (Optional)
# Elasticsearch port
#es_port: 9200
# (Optional) Connect with SSL to Elasticsearch
#use_ssl: false
# (Optional) basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword
# (Required)
# Rule name, must be unique
name: name_alert_qycloud_status_error
# (Required)
# Type of alert.
# the frequency rule type alerts when num_events events occur within timeframe time
type: frequency
# (Required)
# Index to search, wildcard supported
# Index pattern this rule queries
index: monitor-*
# (Required, frequency specific)
# Alert when this many documents matching the query occur within a timeframe
num_events: 5
# (Required, frequency specific)
# num_events must occur within this amount of time to trigger an alert
timeframe:
  minutes: 5
# (Required)
# A list of Elasticsearch filters used to find events
# These filters are joined with AND and nested in a filtered query
# For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
# ES5 syntax
filter:
- query_string:
    query: "status: >=500"
# (Required)
# The alert is used when a match is found
alert:
#- "email"
- "debug"
- "command"
# I use a command as the alerter; a custom script is easier to adapt
pipe_match_json: true
command: ["/home/df/elastalert/php_alert.php"]
# (required, email specific)
# a list of email addresses to send alerts to
email:
- "xxxx@xxx.com"
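Since elastalert-test-rule does not work against ES5, a quick offline sanity check of the rule can catch a missing key before starting the service. This is a hypothetical helper, not ElastAlert's own validation: it only checks the keys marked "(Required)" in the rule file above plus the per-alerter keys. In real use you would load the file with PyYAML's yaml.safe_load (a third-party package); here the rule is written as a dict so the sketch runs with the standard library only.

```python
# Keys marked "(Required)" in example_test.yaml; the frequency type
# additionally needs num_events and timeframe.
COMMON_REQUIRED = {"name", "type", "index", "filter", "alert"}
TYPE_REQUIRED = {"frequency": {"num_events", "timeframe"}}
# Extra keys each alerter needs (assumption: only these two are modelled)
ALERT_REQUIRED = {"email": {"email"}, "command": {"command"}}


def missing_keys(rule):
    """Return the sorted list of required keys absent from a rule dict."""
    required = set(COMMON_REQUIRED)
    required |= TYPE_REQUIRED.get(rule.get("type"), set())
    for alerter in rule.get("alert", []):
        required |= ALERT_REQUIRED.get(alerter, set())
    return sorted(required - set(rule))


# Mirrors example_test.yaml
rule = {
    "name": "name_alert_qycloud_status_error",
    "type": "frequency",
    "index": "monitor-*",
    "num_events": 5,
    "timeframe": {"minutes": 5},
    "filter": [{"query_string": {"query": "status: >=500"}}],
    "alert": ["debug", "command"],
    "pipe_match_json": True,
    "command": ["/home/df/elastalert/php_alert.php"],
}
print(missing_keys(rule))
print(missing_keys({"name": "x", "type": "frequency", "alert": ["email"]}))
```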
The rule above means: if status: >=500 occurs 5 times within a 5-minute window, an alert is fired.
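The frequency rule can be modelled as a sliding window over the timestamps of matching events. This is a simplified sketch of the idea, not ElastAlert's implementation; it assumes events arrive in timestamp order and, as I understand ElastAlert's behaviour, resets the count after an alert fires. FrequencyCheck is a hypothetical name.

```python
from collections import deque
from datetime import datetime, timedelta


class FrequencyCheck:
    """Simplified frequency rule: alert when num_events matching events
    fall inside a sliding window of length timeframe."""

    def __init__(self, num_events, timeframe):
        self.num_events = num_events
        self.timeframe = timeframe
        self.window = deque()  # timestamps of matching events, oldest first

    def add(self, ts):
        """Record one matching event; return True if the threshold is hit."""
        self.window.append(ts)
        # drop events older than timeframe, measured from the newest event
        while self.window and ts - self.window[0] > self.timeframe:
            self.window.popleft()
        if len(self.window) >= self.num_events:
            self.window.clear()  # start counting afresh after an alert
            return True
        return False


check = FrequencyCheck(num_events=5, timeframe=timedelta(minutes=5))
start = datetime(2017, 1, 29, 20, 5)
# six status >= 500 events, one per minute
results = [check.add(start + timedelta(minutes=i)) for i in range(6)]
print(results)
```

With one error per minute the fifth event triggers the alert; errors spaced two minutes apart never put five events inside the same 5-minute window, so they would not fire.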
Start ElastAlert so that it watches Elasticsearch:
python -m elastalert.elastalert --verbose --rule example_rules/example_test.yaml
Check how it runs:
INFO:elastalert:Sleeping for 4 seconds
INFO:elastalert:Queried rule name_alert_qycloud_status_error from 2017-01-29 20:05 CST to 2017-01-29 20:10 CST: 6 / 6 hits
INFO:elastalert:Alert for name_alert_qycloud_status_error at 2017-01-29T12:10:21.651Z:
INFO:elastalert:name_alert_qycloud_status_error
At least 5 events occurred between 2017-01-29 20:05 CST and 2017-01-29 20:10 CST
@read_timestamp: 2017-01-29T12:10:24.043Z
@timestamp: 2017-01-29T12:10:21.651Z
@version: 1
_id: AVnqGcU6GEG-kKWj4PKc
_index: monitor-2017.01.29
_type: json_php_monitor
app: AYSaaS-master
beat: {
"hostname": "dfdeMacBook-Air.local",
"name": "dfdeMacBook-Air.local",
"version": "5.1.1"
}
client: 127.0.0.1
elapsed: 0
ent_id:
error:
......
.......
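Now the PHP script used for alerting. It just saves the first log entry of the alert. Because ElastAlert runs the command directly, the file needs a shebang line and execute permission (chmod +x php_alert.php):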
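#!/usr/bin/env php
<?php
// ElastAlert pipes the match JSON to stdin (pipe_match_json: true)
$fp = fopen('php://stdin', 'r');
$result = '';
while (!feof($fp)) {
    $result .= fgets($fp, 128);
}
fclose($fp);
// Write it to a file; each alert overwrites the previous one
file_put_contents('/tmp/alert_test', $result . "\r\n");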
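If PHP is not available on the ElastAlert host, the same idea can be written in Python. This is a hypothetical counterpart of php_alert.php, not something from the ElastAlert docs. I am assuming pipe_match_json sends a JSON array of match documents, so it keeps the first element; it also tolerates a single JSON object. Like the PHP version, it must be executable and listed under command.

```python
#!/usr/bin/env python3
"""Hypothetical Python counterpart of php_alert.php: read the JSON that
ElastAlert pipes to stdin (pipe_match_json: true) and save the first match."""
import json
import sys

OUTPUT_PATH = "/tmp/alert_test"  # same file the PHP script writes


def first_match(payload):
    """Return the first match from the piped JSON text.

    Assumes a JSON array of match documents; a single object is accepted too.
    """
    data = json.loads(payload)
    if isinstance(data, list):
        return data[0] if data else None
    return data


def main():
    payload = sys.stdin.read()
    if not payload.strip():
        return  # nothing was piped in; leave the file untouched
    match = first_match(payload)
    with open(OUTPUT_PATH, "w", encoding="utf-8") as fh:
        json.dump(match, fh, ensure_ascii=False, indent=2)
        fh.write("\n")


if __name__ == "__main__":
    main()
```

Parsing the JSON instead of copying raw bytes makes it easy to pull out fields such as status or app later.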
Email alerting: I have not managed to get it working yet.
The main configuration parameters to watch:
# Required for email alerts
smtp_host: smtp.sina.com
smtp_port: 465
# File holding the SMTP login username and password
smtp_auth_file: example_rules/smtp_auth_file.yaml
from_addr: xxxx@sina.com
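Contents of smtp_auth_file.yaml (YAML needs a space after the colon; user:xxx would be read as a plain string, not a key):
user: xxx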
password: xxx
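Two things may be worth checking here. Port 465 expects an SSL/TLS connection from the very first byte, while use_ssl in config.yaml controls the Elasticsearch connection rather than SMTP; ElastAlert's email alerter has a separate smtp_ssl option for this case (worth confirming in the docs for the branch you run). The credentials can also be tested outside ElastAlert with Python's standard smtplib. A minimal sketch; smtp_class_for_port and check_login are hypothetical helpers.

```python
import smtplib


def smtp_class_for_port(port):
    """Port 465 speaks TLS from the first byte (implicit TLS), so it needs
    SMTP_SSL; ports 25/587 start in plain text and upgrade with STARTTLS."""
    return smtplib.SMTP_SSL if port == 465 else smtplib.SMTP


def check_login(host, port, user, password, timeout=10):
    """Log in once and disconnect. Raises an exception (for example
    smtplib.SMTPAuthenticationError) if the connection or login fails."""
    cls = smtp_class_for_port(port)
    with cls(host, port, timeout=timeout) as server:
        if cls is smtplib.SMTP:
            server.starttls()
        server.login(user, password)


# Example (needs network access and a real account):
# check_login("smtp.sina.com", 465, "xxxx@sina.com", "xxx")
```

If this login succeeds but ElastAlert still cannot send, the SSL setting on the ElastAlert side is the next thing to look at.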