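Anyone who uses open-falcon will probably end up digging into how its alerting works, because alerting is the core function of a monitoring system and, ultimately, the whole point of monitoring. Wanting to understand the alerting mechanism is a curiosity every user shares. Not understanding something feels like a thorn stuck in the mind that has to be pulled out. I suspect this is less a devotion to knowledge than a symptom of obsessive-compulsiveness. Below is how I went about analyzing the way open-falcon processes alarm messages, covering: environment preparation, analysis, processing, and optimization of the processing.
System environment: Ubuntu 15.04 64-bit, open-falcon source code, redis, mysql, golang, gcc, etc.
1. Set up the development environment
1.1 Install the C toolchain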
sudo apt-get install build-essential
sudo tar -zxvf go1.4.2.linux-amd64.tar.gz -C /usr/local/
export GOROOT=/usr/local/go
export GOBIN=$GOROOT/bin
export PATH=$PATH:$GOBIN
export GOPATH=$HOME/goproj
source /etc/profile
go version
sudo apt-get install mysql-server mysql-client libmysqlclient*
wget http://download.redis.io/releases/redis-3.0.5.tar.gz
tar zxvf redis-3.0.5.tar.gz
cd redis-3.0.5/
sudo apt-get install tcl
make
sudo make install
mkdir $HOME/goproj
cd $HOME/goproj
mkdir -p src/github.com
cd src/github.com
git clone --recursive https://github.com/XiaoMi/open-falcon.git
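The alarm module is used as the example here. For the other modules, refer to the official documentation; I will probably also cover them in later blog posts.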
cd open-falcon/alarm/
sudo chmod 777 /usr/local/go/bin/
go get ./...
./control build
2.1 Look in the redis database
Connect to redis with redis-cli and check whether any alarm messages exist:
127.0.0.1:6379> KEYS *
1) "session:obj:fe557589a85711e58528000c29bd7b56"
2) "t:uids:4"
3) "foo"
4) "team:obj:5"
5) "team:id:alarm"
6) "user:obj:6"
7) "team:id:alarm_info"
8) "user:obj:7"
9) "user:obj:8"
10) "user:id:admin"
11) "user:obj:11"
12) "t:uids:6"
13) "t:uids:5"
14) "user:obj:1"
15) "user:obj:10"
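2.2 Read the open-falcon documentation
Alarm messages are generated by the judge module. Each one is recorded in redis, classified by alarm priority. So why are there no alarms in redis? Now look at the alarm module: each alarm is promptly delivered to users, and the complete alarm messages are visible in the web UI, yet none of them can be found in redis. The only option left is to read the redis-related source code of these two modules.
2.3 Read the open-falcon source code
The judge module writes alarm messages to redis with LPUSH (insert one or more elements at the head of a list), placing them in a redis list that acts as a queue for another process to consume. That gives the first clue: if the queued messages have already been dequeued, redis has nothing left to show. The alarm module fetches messages with BRPOP (remove and return the last element of a list, blocking until one is available), which dequeues each message and deletes it from redis.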
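The queue behaviour described above can be sketched without a redis server. The following is only a toy model, assuming a hypothetical in-memory `list` type; it is not open-falcon code and not a redis client. It shows that pushing at the head and popping at the tail yields first-in, first-out order, and that nothing is left to query once the consumer has popped everything.

```go
package main

import "fmt"

// list is a toy stand-in for a redis list; index 0 is the head (left end).
// It is a hypothetical type used only for illustration.
type list struct{ items []string }

// lpush inserts at the head, like the redis LPUSH command.
func (l *list) lpush(v string) {
	l.items = append([]string{v}, l.items...)
}

// rpop removes and returns the tail element, like the redis RPOP command.
// ok is false when the list is empty.
func (l *list) rpop() (v string, ok bool) {
	if len(l.items) == 0 {
		return "", false
	}
	last := len(l.items) - 1
	v = l.items[last]
	l.items = l.items[:last]
	return v, true
}

func main() {
	q := &list{}
	q.lpush("event-1") // producer side (the role judge plays)
	q.lpush("event-2")

	first, _ := q.rpop() // consumer side (the role alarm plays)
	second, _ := q.rpop()
	_, ok := q.rpop() // the list is now empty

	fmt.Println(first, second, ok) // prints: event-1 event-2 false
}
```

Because the consumer removes every element it reads, a later KEYS query finds no alarm key while the consumer is running.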
2.4 Modify the source to log alarm messages
In judge, log the alarm message that is written to redis:
log.Printf("redis key is %v", redisKey)
log.Printf("redis value is %v", string(bs))
2015/12/22 14:12:48 judge.go:82: redis key is event:p0
2015/12/22 14:12:48 judge.go:83: redis value is {"id":"s_7_9e899684e61cce209c14444cfb4e33bc","strategy":{"id":7,"metric":"mem.memfree.percent","tags":{},"func":"all(#3)","operator":"\u003c=","rightValue":100,"maxStep":3,"priority":0,"note":"memfree alarm test","tpl":{"id":2,"name":"memery","parentId":0,"actionId":1,"creator":"admin"}},"expression":null,"status":"PROBLEM","endpoint":"bogon","leftValue":33.335713095833036,"currentStep":3,"eventTime":1450764720,"pushedTags":{}}
2015/12/22 14:04:13 reader.go:65: the redis key is: [event:p0 event:p1 event:p2 event:p3 event:p4 event:p5 0]
2015/12/22 14:04:13 reader.go:66: the redis value is: [event:p0 {"id":"s_7_9e899684e61cce209c14444cfb4e33bc","strategy":{"id":7,"metric":"mem.memfree.percent","tags":{},"func":"all(#3)","operator":"\u003c=","rightValue":100,"maxStep":3,"priority":0,"note":"memfree alarm test","tpl":{"id":2,"name":"memery","parentId":0,"actionId":1,"creator":"admin"}},"expression":null,"status":"PROBLEM","endpoint":"bogon","leftValue":38.62473525142191,"currentStep":1,"eventTime":1450764120,"pushedTags":{}}]
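The logged value is a JSON document, so it can be decoded into a struct for further processing. The sketch below is a minimal example using Go's standard encoding/json package. The `Event` and `Strategy` types and the `parseEvent` helper are hypothetical names that cover only some of the fields visible in the log output; they are not open-falcon's own definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Strategy and Event are hypothetical types that mirror a subset of the
// fields seen in the logged JSON. Unknown fields are ignored when decoding.
type Strategy struct {
	ID         int     `json:"id"`
	Metric     string  `json:"metric"`
	Func       string  `json:"func"`
	Operator   string  `json:"operator"`
	RightValue float64 `json:"rightValue"`
	Priority   int     `json:"priority"`
	Note       string  `json:"note"`
}

type Event struct {
	ID          string   `json:"id"`
	Strategy    Strategy `json:"strategy"`
	Status      string   `json:"status"`
	Endpoint    string   `json:"endpoint"`
	LeftValue   float64  `json:"leftValue"`
	CurrentStep int      `json:"currentStep"`
	EventTime   int64    `json:"eventTime"`
}

// parseEvent decodes one alarm message as found in the redis list.
func parseEvent(raw string) (Event, error) {
	var e Event
	err := json.Unmarshal([]byte(raw), &e)
	return e, err
}

// sample is the value from the judge log line above.
const sample = `{"id":"s_7_9e899684e61cce209c14444cfb4e33bc","strategy":{"id":7,"metric":"mem.memfree.percent","tags":{},"func":"all(#3)","operator":"\u003c=","rightValue":100,"maxStep":3,"priority":0,"note":"memfree alarm test","tpl":{"id":2,"name":"memery","parentId":0,"actionId":1,"creator":"admin"}},"expression":null,"status":"PROBLEM","endpoint":"bogon","leftValue":33.335713095833036,"currentStep":3,"eventTime":1450764720,"pushedTags":{}}`

func main() {
	e, err := parseEvent(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// The JSON escape \u003c decodes to the character "<".
	fmt.Printf("P%d %s on %s: %s %s %v (left value %v)\n",
		e.Strategy.Priority, e.Status, e.Endpoint,
		e.Strategy.Metric, e.Strategy.Operator, e.Strategy.RightValue, e.LeftValue)
}
```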
./control stop
127.0.0.1:6379> KEYS *
1) "event:p0"
127.0.0.1:6379> TYPE event:p0
list
127.0.0.1:6379> lpop event:p0
"{\"id\":\"s_7_9e899684e61cce209c14444cfb4e33bc\",\"strategy\":{\"id\":7,\"metric\":\"mem.memfree.percent\",\"tags\":{},\"func\":\"all(#3)\",\"operator\":\"\\u003c=\",\"rightValue\":100,\"maxStep\":3,\"priority\":0,\"note\":\"memfree alarm test\",\"tpl\":{\"id\":2,\"name\":\"memery\",\"parentId\":0,\"actionId\":1,\"creator\":\"admin\"}},\"expression\":null,\"status\":\"PROBLEM\",\"endpoint\":\"bogon\",\"leftValue\":33.335713095833036,\"currentStep\":3,\"eventTime\":1450764720,\"pushedTags\":{}}"
The alarm message can also be fetched from a C program.
3.2 Fetch alarm messages in C
Connect to the redis database:
redisContext* conn = redisConnect("127.0.0.1",6379);
redisReply* reply = redisCommand(conn,"BRPOP event:p0 0 ");
or
redisReply* reply = redisCommand(conn,"RPOP event:p0");
BRPOP and RPOP are redis commands that dequeue from the tail of a list. BRPOP blocks, and a timeout of 0 means block indefinitely; RPOP does not block.
/* This is the reply object returned by redisCommand() */
typedef struct redisReply {
int type; /* REDIS_REPLY_* */
long long integer; /* The integer when type is REDIS_REPLY_INTEGER */
int len; /* Length of string */
char *str; /* Used for both REDIS_REPLY_ERROR and REDIS_REPLY_STRING */
size_t elements; /* number of elements, for REDIS_REPLY_ARRAY */
struct redisReply **element; /* elements vector for REDIS_REPLY_ARRAY */
} redisReply;
#define REDIS_REPLY_STRING 1
#define REDIS_REPLY_ARRAY 2
#define REDIS_REPLY_INTEGER 3
#define REDIS_REPLY_NIL 4
#define REDIS_REPLY_STATUS 5
#define REDIS_REPLY_ERROR 6
/* BRPOP replies with a two-element array: the key name, then the popped value. */
for (size_t i = 0; i < reply->elements; ++i) {
    redisReply* childReply = reply->element[i];
    if (childReply->type == REDIS_REPLY_STRING)
        printf("The value is %s.\n", childReply->str);
}
The value is event:p0.
The value is {"id":"s_3_9e899684e61cce209c14444cfb4e33bc","strategy":{"id":3,"metric":"mem.memfree.percent","tags":{},"func":"all(#3)","operator":"\u003c","rightValue":100,"maxStep":20,"priority":0,"note":"内存使用量太大","tpl":{"id":3,"name":"local","parentId":0,"actionId":2,"creator":"root"}},"expression":null,"status":"PROBLEM","endpoint":"bogon","leftValue":27.076937721615383,"currentStep":1,"eventTime":1450851060,"pushedTags":{}}.
/* "rw+" is not a valid fopen() mode; "a" appends and creates the file if needed. */
fd = fopen("./alarm_info_log", "a");
fwrite(childReply->str, 1, childReply->len, fd);
event:p0{"id":"s_3_9e899684e61cce209c14444cfb4e33bc","strategy":{"id":3,"metric":"mem.memfree.percent","tags":{},"func":"all(#3)","operator":"\u003c","rightValue":100,"maxStep":20,"priority":0,"note":"内存使用量太大","tpl":{"id":3,"name":"local","parentId":0,"actionId":2,"creator":"root"}},"expression":null,"status":"PROBLEM","endpoint":"bogon","leftValue":27.076937721615383,"currentStep":1,"eventTime":1450851060,"pushedTags":{}}liang@bogon:~/redis/proc