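Anyone who uses open-falcon will probably end up tinkering with its alarm pipeline, because alerting is the core function of a monitoring system and the ultimate purpose of monitoring. Understanding how a monitoring system raises alarms is something every user is naturally curious about. Until I have figured something out, it feels like a thorn stuck in my mind that I have to pull out. I suspect this is less a dedication to knowledge than a mild case of obsessive-compulsiveness. Below is how I went about analyzing the way open-falcon processes alarm events, covering environment preparation, the analysis, the handling, and the optimization of that handling.
System environment: Ubuntu 15.04 64-bit, open-falcon source code, Redis, MySQL, Go, gcc, etc.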
1. Setting up the development environment
1.1 Installing the C build environment
sudo apt-get install build-essential
sudo tar -zxvf go1.4.2.linux-amd64.tar.gz -C /usr/local/
Add the following to /etc/profile:
export GOROOT=/usr/local/go
export GOBIN=$GOROOT/bin
export PATH=$PATH:$GOBIN
export GOPATH=$HOME/goproj
source /etc/profile
go version
Install MySQL, then build and install Redis from source:
sudo apt-get install mysql-server mysql-client libmysqlclient*
wget http://download.redis.io/releases/redis-3.0.5.tar.gz
tar zxvf redis-3.0.5.tar.gz
cd redis-3.0.5/
sudo apt-get install tcl
make
sudo make install
Fetch the open-falcon source into GOPATH:
mkdir $HOME/goproj
cd $HOME/goproj
mkdir -p src/github.com
cd src/github.com
git clone --recursive https://github.com/XiaoMi/open-falcon.git
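Here I use the alarm module as an example; for the other modules, see the official documentation (I will probably cover them on my blog as well).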
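Build it (the chmod lets go get install binaries into GOBIN, i.e. /usr/local/go/bin, which is owned by root):
cd open-falcon/alarm/
sudo chmod 777 /usr/local/go/bin/
go get ./...
./control build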
2.1 Searching the Redis database
Connect to Redis with redis-cli and check whether any alarm events are stored:
KEYS *
1) "session:obj:fe557589a85711e58528000c29bd7b56"
2) "t:uids:4"
3) "foo"
4) "team:obj:5"
5) "team:id:alarm"
6) "user:obj:6"
7) "team:id:alarm_info"
8) "user:obj:7"
9) "user:obj:8"
10) "user:id:admin"
11) "user:obj:11"
12) "t:uids:6"
13) "t:uids:5"
14) "user:obj:1"
15) "user:obj:10"
2.2 Reading the open-falcon documentation
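According to the documentation, alarm events are generated by the judge module, every event is recorded in Redis, and events are divided into separate priority levels. So why are there no alarm events in Redis? On the other side, the alarm module delivers each alarm to users promptly, and the complete alarm information is visible in the web UI, yet none of it shows up in Redis. The only way forward is to read the Redis-related source code of these two modules.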
2.3 Reading the open-falcon source code
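The judge module writes alarm events to Redis with LPUSH, which pushes one or more elements onto the left (head) of a list. The events therefore sit in a Redis list used as a queue, waiting for another process to take them. This is the first real clue: once an event has been dequeued, Redis no longer holds it, which would explain why the search found nothing. The alarm module reads events with BRPOP, which removes and returns the last element of a list, blocking until one is available. Every event is thus deleted from Redis at the moment alarm dequeues it.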
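Together, LPUSH on the left and BRPOP on the right make each list a FIFO queue: the oldest pending event is the one alarm receives next. A minimal C sketch of that ordering, using a hypothetical fixed-size in-memory list (list_t, list_lpush and list_rpop are illustrative names, not Redis or hiredis APIs; the capacity of 16 is arbitrary):

```c
#include <stddef.h>

/* Hypothetical in-memory stand-in for one Redis list such as event:p0.
 * It only mimics the element order of LPUSH/RPOP; it is not Redis code. */
#define QUEUE_CAP 16

typedef struct {
    const char *items[QUEUE_CAP]; /* ring buffer of stored values */
    size_t head;                  /* index of the leftmost element */
    size_t len;                   /* number of stored elements */
} list_t;

/* LPUSH: insert at the left end. Returns 0 on success, -1 when full. */
int list_lpush(list_t *l, const char *value) {
    if (l->len == QUEUE_CAP)
        return -1;
    l->head = (l->head + QUEUE_CAP - 1) % QUEUE_CAP; /* step left, wrapping */
    l->items[l->head] = value;
    l->len++;
    return 0;
}

/* RPOP: remove and return the rightmost (oldest) element, or NULL when
 * the list is empty. BRPOP would block at this point instead. */
const char *list_rpop(list_t *l) {
    if (l->len == 0)
        return NULL;
    size_t tail = (l->head + l->len - 1) % QUEUE_CAP;
    l->len--;
    return l->items[tail];
}
```

Pushing "event-1" and then "event-2" and popping twice returns "event-1" first, which is the order in which alarm sees judge's events.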
2.4 Modifying the source to log alarm events
In judge, log the key and value of each alarm event written to Redis:
log.Printf("redis key is %v", redisKey)
log.Printf("redis value is %v", string(bs))
2015/12/22 14:12:48 judge.go:82: redis key is event:p0
2015/12/22 14:12:48 judge.go:83: redis value is {"id":"s_7_9e899684e61cce209c14444cfb4e33bc","strategy":{"id":7,"metric":"mem.memfree.percent","tags":{},"func":"all(#3)","operator":"\u003c=","rightValue":100,"maxStep":3,"priority":0,"note":"memfree alarm test","tpl":{"id":2,"name":"memery","parentId":0,"actionId":1,"creator":"admin"}},"expression":null,"status":"PROBLEM","endpoint":"bogon","leftValue":33.335713095833036,"currentStep":3,"eventTime":1450764720,"pushedTags":{}}
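The logged key suggests that judge picks the list from the strategy priority: this event has "priority":0 and lands in event:p0. A small sketch of that naming, assuming the pattern event:p<priority> and the 0 to 5 range seen in the alarm log that follows; both are inferred from the logs, not taken from the open-falcon source, and alarm_queue_key is a hypothetical name:

```c
#include <stdio.h>

/* Hypothetical helper: builds the Redis list name for an event priority.
 * ASSUMPTION: the "event:p<priority>" pattern and the 0..5 range are
 * inferred from the judge/alarm logs, not copied from open-falcon.
 * Returns 0 on success, -1 for an unexpected priority or a short buffer. */
int alarm_queue_key(char *buf, size_t size, int priority) {
    if (priority < 0 || priority > 5)
        return -1;
    int n = snprintf(buf, size, "event:p%d", priority);
    /* snprintf reports the length it wanted; >= size means truncation */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```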
With similar logging added to the alarm module, its reader.go prints the BRPOP arguments (the six priority lists followed by a timeout of 0) and the [key, value] reply:
2015/12/22 14:04:13 reader.go:65: the redis key is: [event:p0 event:p1 event:p2 event:p3 event:p4 event:p5 0]
2015/12/22 14:04:13 reader.go:66: the redis value is: [event:p0 {"id":"s_7_9e899684e61cce209c14444cfb4e33bc","strategy":{"id":7,"metric":"mem.memfree.percent","tags":{},"func":"all(#3)","operator":"\u003c=","rightValue":100,"maxStep":3,"priority":0,"note":"memfree alarm test","tpl":{"id":2,"name":"memery","parentId":0,"actionId":1,"creator":"admin"}},"expression":null,"status":"PROBLEM","endpoint":"bogon","leftValue":38.62473525142191,"currentStep":1,"eventTime":1450764120,"pushedTags":{}}]
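When BRPOP is given several keys, Redis checks them in the order listed and pops from the first non-empty list, replying with that list's name and the value. Passing event:p0 through event:p5 in that order therefore means a pending event:p0 event is always taken before anything in event:p1 and later lists. The sketch below models only that selection rule; prio_list_t and brpop_first are hypothetical names, and unlike real BRPOP it returns -1 instead of blocking when every list is empty.

```c
#include <stddef.h>

/* Hypothetical model of one priority list, oldest event at index head.
 * Not a Redis or hiredis type. */
#define MAX_EVENTS 8

typedef struct {
    const char *name;               /* e.g. "event:p0" */
    const char *events[MAX_EVENTS]; /* pending events, oldest first */
    size_t head, len;
} prio_list_t;

/* Non-blocking model of "BRPOP k0 k1 ... kn timeout": scan the lists in
 * the order given and pop the oldest event of the first non-empty one.
 * On success stores the list name in *key and the event in *value and
 * returns 0; returns -1 if all lists are empty. */
int brpop_first(prio_list_t *lists, size_t n,
                const char **key, const char **value) {
    for (size_t i = 0; i < n; i++) {
        prio_list_t *l = &lists[i];
        if (l->len > 0) {
            *key = l->name;
            *value = l->events[l->head];
            l->head++;
            l->len--;
            return 0;
        }
    }
    return -1;
}
```

The pair (key, value) mirrors the two-element reply in the reader.go log above, where the first element is event:p0.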
Next, stop the alarm module so that nothing consumes the queue:
./control stop
The events judge pushes now stay in Redis and can be inspected with redis-cli:
127.0.0.1:6379> KEYS *
127.0.0.1:6379> KEYS *
1) "event:p0"
127.0.0.1:6379> TYPE event:p0
list
127.0.0.1:6379> lpop event:p0
"{\"id\":\"s_7_9e899684e61cce209c14444cfb4e33bc\",\"strategy\":{\"id\":7,\"metric\":\"mem.memfree.percent\",\"tags\":{},\"func\":\"all(#3)\",\"operator\":\"\\u003c=\",\"rightValue\":100,\"maxStep\":3,\"priority\":0,\"note\":\"memfree alarm test\",\"tpl\":{\"id\":2,\"name\":\"memery\",\"parentId\":0,\"actionId\":1,\"creator\":\"admin\"}},\"expression\":null,\"status\":\"PROBLEM\",\"endpoint\":\"bogon\",\"leftValue\":33.335713095833036,\"currentStep\":3,\"eventTime\":1450764720,\"pushedTags\":{}}"
The alarm event can also be retrieved with a C program.
3.2 Retrieving alarm events in C
Connect to Redis with the hiredis client library:
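redisContext* conn = redisConnect("127.0.0.1", 6379);
if (conn == NULL || conn->err) { /* connection failed; conn->errstr holds the reason when conn != NULL */ }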
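Then pop an event with either
redisReply* reply = redisCommand(conn, "BRPOP event:p0 0");
or
redisReply* reply = redisCommand(conn, "RPOP event:p0");
BRPOP and RPOP both dequeue from the right end of a list. BRPOP blocks, and its last argument is the timeout in seconds, where 0 means block indefinitely; RPOP is non-blocking and returns nil when the list is empty. The replies also differ in shape: BRPOP returns a two-element array (the list name and the value), while RPOP returns the value as a plain string.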
hiredis returns every result as a redisReply object, defined as follows in the hiredis version used here:
/* This is the reply object returned by redisCommand() */
typedef struct redisReply {
    int type; /* REDIS_REPLY_* */
    long long integer; /* The integer when type is REDIS_REPLY_INTEGER */
    int len; /* Length of string */
    char *str; /* Used for both REDIS_REPLY_ERROR and REDIS_REPLY_STRING */
    size_t elements; /* number of elements, for REDIS_REPLY_ARRAY */
    struct redisReply **element; /* elements vector for REDIS_REPLY_ARRAY */
} redisReply;
The type field takes one of these values:
#define REDIS_REPLY_STRING 1
#define REDIS_REPLY_ARRAY 2
#define REDIS_REPLY_INTEGER 3
#define REDIS_REPLY_NIL 4
#define REDIS_REPLY_STATUS 5
#define REDIS_REPLY_ERROR 6
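Because BRPOP and RPOP reply with different shapes, code that may issue either one should branch on type before touching element. A minimal sketch, assuming the reply layout and type codes listed above: it redefines them locally only so that it compiles without hiredis, and event_from_reply is a hypothetical helper, not a hiredis function.

```c
#include <stddef.h>

/* Local copy of the layout and type codes shown above, so this sketch
 * compiles on its own. A real program includes <hiredis/hiredis.h>
 * instead of redefining them. */
#define REDIS_REPLY_STRING 1
#define REDIS_REPLY_ARRAY 2
#define REDIS_REPLY_NIL 4

typedef struct redisReply {
    int type;
    long long integer;
    int len;
    char *str;
    size_t elements;
    struct redisReply **element;
} redisReply;

/* Hypothetical helper: returns the event JSON carried by a reply, or NULL.
 *   BRPOP -> ARRAY {list name, value}; NIL on timeout
 *   RPOP  -> STRING value;             NIL if the list is empty
 * If key is non-NULL it receives the list name for BRPOP replies and
 * NULL otherwise. */
const char *event_from_reply(const redisReply *r, const char **key) {
    if (key != NULL)
        *key = NULL;
    if (r == NULL)
        return NULL;                         /* I/O or protocol error */
    if (r->type == REDIS_REPLY_STRING)
        return r->str;                       /* RPOP */
    if (r->type == REDIS_REPLY_ARRAY && r->elements == 2 &&
        r->element[0]->type == REDIS_REPLY_STRING &&
        r->element[1]->type == REDIS_REPLY_STRING) {
        if (key != NULL)
            *key = r->element[0]->str;
        return r->element[1]->str;           /* BRPOP */
    }
    return NULL;                             /* NIL, error, unexpected shape */
}
```

In a real program, call event_from_reply(reply, &key) on the result of redisCommand and release the reply with freeReplyObject(reply) only after the strings have been used, since they point into the reply.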
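Since the BRPOP reply is an array, check its type, then iterate over the elements and print the strings:
if (reply != NULL && reply->type == REDIS_REPLY_ARRAY) {
    for (size_t i = 0; i < reply->elements; ++i) {
        redisReply* childReply = reply->element[i];
        if (childReply->type == REDIS_REPLY_STRING)
            printf("The value is %s.\n", childReply->str);
    }
}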
Output (the Chinese note means "memory usage too high"):
The value is event:p0.
The value is {"id":"s_3_9e899684e61cce209c14444cfb4e33bc","strategy":{"id":3,"metric":"mem.memfree.percent","tags":{},"func":"all(#3)","operator":"\u003c","rightValue":100,"maxStep":20,"priority":0,"note":"内存使用量太大","tpl":{"id":3,"name":"local","parentId":0,"actionId":2,"creator":"root"}},"expression":null,"status":"PROBLEM","endpoint":"bogon","leftValue":27.076937721615383,"currentStep":1,"eventTime":1450851060,"pushedTags":{}}.
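The value is the event serialized as JSON. For quick checks, such as filtering on status or endpoint, a naive field lookup can be enough. The sketch below is explicitly not a JSON parser: it matches the first "name":" occurrence and copies up to the next double quote, so it ignores escaped quotes, numeric values, and keys that also appear in nested objects. json_string_field is a hypothetical name; a real program would use a JSON library.

```c
#include <stdio.h>
#include <string.h>

/* Naive lookup of a string field such as "status" or "endpoint" in one
 * event JSON. Matches the first "<name>":" and copies up to the next
 * double quote. Returns 0 on success, -1 if the field is absent, not a
 * string, or does not fit in out (size includes the terminating NUL). */
int json_string_field(const char *json, const char *name,
                      char *out, size_t size) {
    char pattern[64];
    int n = snprintf(pattern, sizeof pattern, "\"%s\":\"", name);
    if (n < 0 || (size_t)n >= sizeof pattern)
        return -1;
    const char *start = strstr(json, pattern);
    if (start == NULL)
        return -1;
    start += n;                           /* first character of the value */
    const char *end = strchr(start, '"');
    if (end == NULL || (size_t)(end - start) >= size)
        return -1;
    memcpy(out, start, (size_t)(end - start));
    out[end - start] = '\0';
    return 0;
}
```

Applied to the event above, "status" yields PROBLEM and "endpoint" yields bogon, while "leftValue" fails because its value is a number.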
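To keep the events, write each string element to a file inside the same loop instead of printing it:
FILE* fd = fopen("./alarm_info_log", "a"); /* "a": append, creating the file if it does not exist */
if (fd != NULL) { fwrite(childReply->str, 1, childReply->len, fd); fclose(fd); }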
The resulting file holds the key and the JSON back to back, with no separator:
event:p0{"id":"s_3_9e899684e61cce209c14444cfb4e33bc","strategy":{"id":3,"metric":"mem.memfree.percent","tags":{},"func":"all(#3)","operator":"\u003c","rightValue":100,"maxStep":20,"priority":0,"note":"内存使用量太大","tpl":{"id":3,"name":"local","parentId":0,"actionId":2,"creator":"root"}},"expression":null,"status":"PROBLEM","endpoint":"bogon","leftValue":27.076937721615383,"currentStep":1,"eventTime":1450851060,"pushedTags":{}}
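Because the key and the JSON run together and nothing ends the record, consecutive events are hard to split apart later. One option is to write each BRPOP result as a single line. The sketch below assumes a tab between key and JSON and a trailing newline; both separators are an illustrative choice, and append_event is a hypothetical helper.

```c
#include <stdio.h>

/* Hypothetical helper: appends one BRPOP result as "<key>\t<json>\n".
 * json_len is passed explicitly (e.g. from the reply's len field).
 * Returns 0 on success, -1 on any write error. */
int append_event(FILE *fp, const char *key,
                 const char *json, size_t json_len) {
    if (fprintf(fp, "%s\t", key) < 0)
        return -1;
    if (fwrite(json, 1, json_len, fp) != json_len)
        return -1;
    if (fputc('\n', fp) == EOF)
        return -1;
    return fflush(fp) == 0 ? 0 : -1; /* push the record to the file now */
}
```

Called from the BRPOP loop, for example as append_event(fd, reply->element[0]->str, reply->element[1]->str, (size_t)reply->element[1]->len) with fd opened once in "a" mode, each event lands on its own line, which suits line-oriented tools.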