Basic principles of Zabbix events and triggers

Zabbix 3.4 is used as the example.
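Each Zabbix trigger is identified by a unique trigger ID, and when a trigger's condition is met, Zabbix generates an event. For example, "CPU utilization above 90% for 5 consecutive minutes" is a condition, and a trigger can be defined on it. In Zabbix terminology, the CPU utilization data is an item. Zabbix monitors a large number of items, such as CPU, disk and network utilization, ping status, web service availability, and so on.
A trigger has exactly two states: "OK" means normal, and "Problem" means something is wrong, i.e. the defined threshold has been exceeded. An event occurs whenever a trigger changes state. A trigger in the Problem state is a Zabbix problem.
The response to an event is called an action. An action is an operation together with its result, for example sending an email notification.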
https://www.zabbix.com/documentation/3.4/manual/api/reference/event/object#host
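The relationship between triggers, state changes and events can be sketched in a few lines of Python. This is a simplified model, not Zabbix code:

- The `Trigger` class and its fields are hypothetical.
- The condition is checked against a single value.
- The default single problem event generation mode is assumed.

Under these assumptions, an event is recorded only when the trigger's OK/Problem value actually changes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

OK, PROBLEM = 0, 1  # same encoding as the events.value column for trigger events

@dataclass
class Trigger:
    triggerid: int
    condition: Callable[[float], bool]  # hypothetical: condition on the latest value
    value: int = OK                     # current trigger state
    events: List[Tuple[int, int]] = field(default_factory=list)  # (clock, new value)

    def evaluate(self, clock: int, item_value: float) -> None:
        new_value = PROBLEM if self.condition(item_value) else OK
        # An event is generated only when the state changes.
        if new_value != self.value:
            self.value = new_value
            self.events.append((clock, new_value))

t = Trigger(triggerid=15353, condition=lambda v: v > 300)
for clock, v in [(0, 250), (60, 320), (120, 330), (180, 290), (240, 280)]:
    t.evaluate(clock, v)
print(t.events)  # [(60, 1), (180, 0)]
```

Five new values produce only two events: one when the trigger enters Problem and one when it returns to OK.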
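When a monitored item meets a trigger condition, a trigger event is generated. When the problem recovers, the value field of that event in the events table is not updated. Instead, a new record is written to the event_recovery table. In event_recovery you can see each problem event paired with its recovery event.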
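A minimal Python sketch of this design follows. It uses in-memory dictionaries as hypothetical stand-ins for the two tables. The rows are the 90504/90522/90548 events examined later in this article.

The problem event keeps value=1 permanently. Whether it is resolved is decided only by looking it up in event_recovery.

```python
# Hypothetical in-memory copies of a few rows from the events and
# event_recovery tables (values taken from the examples in this article).
events = {
    90504: {"objectid": 15353, "clock": 1508766001, "value": 1},
    90522: {"objectid": 15353, "clock": 1508766301, "value": 0},
    90548: {"objectid": 15353, "clock": 1508918765, "value": 1},
}
# event_recovery: problem eventid -> recovery eventid (r_eventid)
event_recovery = {90504: 90522}

def recovery_of(eventid):
    """Return the recovery event row for a problem event, or None if unresolved."""
    r_eventid = event_recovery.get(eventid)
    return events[r_eventid] if r_eventid is not None else None

for eid in (90504, 90548):
    r = recovery_of(eid)
    # The problem event's own value stays 1 in both cases.
    status = "resolved at %d" % r["clock"] if r else "unresolved"
    print(eid, events[eid]["value"], status)
```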
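Calling the Zabbix API method problem.get returns the current unresolved problems, i.e. triggers that are in the Problem state and have not yet recovered. When the "recent" parameter is set to true, recently resolved problems are returned as well.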
Definition of the problem table:
MariaDB [zabbix]> describe problem;
+---------------+---------------------+------+-----+---------+-------+
| Field         | Type                | Null | Key | Default | Extra |
+---------------+---------------------+------+-----+---------+-------+
| eventid       | bigint(20) unsigned | NO   | PRI | NULL    |       |
| source        | int(11)             | NO   | MUL | 0       |       |
| object        | int(11)             | NO   |     | 0       |       |
| objectid      | bigint(20) unsigned | NO   |     | 0       |       |
| clock         | int(11)             | NO   |     | 0       |       |
| ns            | int(11)             | NO   |     | 0       |       |
| r_eventid     | bigint(20) unsigned | YES  | MUL | NULL    |       |
| r_clock       | int(11)             | NO   | MUL | 0       |       |
| r_ns          | int(11)             | NO   |     | 0       |       |
| correlationid | bigint(20) unsigned | YES  |     | NULL    |       |
| userid        | bigint(20) unsigned | YES  |     | NULL    |       |
+---------------+---------------------+------+-----+---------+-------+

Getting the current problems through the API:
# curl -s -S -X POST -H 'Content-Type: application/json-rpc' -d '{"jsonrpc": "2.0", "method": "problem.get", "params": {"output": ["eventid", "clock", "source", "object"], "selectAcknowledges": "extend", "selectTags": "extend", "recent": "true", "sortfield": ["eventid"], "sortorder": "DESC"}, "id": 27, "auth": "25311982e31f1b0a6815489e84d82a1c"}' http://10.10.144.21:80/zabbix/api_jsonrpc.php
{"jsonrpc":"2.0","result":[{"eventid":"90548","clock":"1508918765","source":"0","object":"0","acknowledges":[],"tags":[]},{"eventid":"15","clock":"1508217852","source":"0","object":"0","acknowledges":[],"tags":[]}],"id":27}
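The same request can also be built with Python's standard library. This is a sketch only, under these assumptions:

- `build_problem_get` is a hypothetical helper.
- The URL and auth token are the ones from the curl example; the token comes from a prior user.login call.
- "recent" is kept as the string "true", exactly as in the curl call.

The request is only constructed here. The commented lines show how it could be sent.

```python
import json
import urllib.request

API_URL = "http://10.10.144.21:80/zabbix/api_jsonrpc.php"  # from the curl example
AUTH = "25311982e31f1b0a6815489e84d82a1c"                   # token from user.login

def build_problem_get(auth, request_id=27):
    """Build a JSON-RPC 2.0 problem.get request equivalent to the curl call."""
    payload = {
        "jsonrpc": "2.0",
        "method": "problem.get",
        "params": {
            "output": ["eventid", "clock", "source", "object"],
            "selectAcknowledges": "extend",
            "selectTags": "extend",
            "recent": "true",  # also return recently resolved problems
            "sortfield": ["eventid"],
            "sortorder": "DESC",
        },
        "id": request_id,
        "auth": auth,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json-rpc"},
        method="POST",
    )

req = build_problem_get(AUTH)
# Sending it requires network access to the Zabbix server:
# with urllib.request.urlopen(req) as resp:
#     for p in json.load(resp)["result"]:
#         print(p["eventid"], p["clock"])
```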

Viewing the problems in the database:
MariaDB [zabbix]> select * from problem;
+---------+--------+--------+----------+------------+-----------+-----------+------------+-----------+---------------+--------+
| eventid | source | object | objectid | clock      | ns        | r_eventid | r_clock    | r_ns      | correlationid | userid |
+---------+--------+--------+----------+------------+-----------+-----------+------------+-----------+---------------+--------+
|      15 |      0 |      0 |    13496 | 1508217852 | 654551752 |      NULL |          0 |         0 |          NULL |   NULL |
|   90548 |      0 |      0 |    15353 | 1508918765 | 555842555 |      NULL |          0 |         0 |          NULL |   NULL |
|  125948 |      0 |      0 |    13468 | 1509145833 | 731969958 |    125980 | 1509164493 | 757545190 |          NULL |      0 |
|  125949 |      0 |      0 |    13491 | 1509145387 | 368586554 |    125963 | 1509164439 | 306719861 |          NULL |      0 |
|  125950 |      0 |      0 |    15247 | 1509145397 | 402078177 |    125964 | 1509164439 | 412572051 |          NULL |      0 |
|  125951 |      0 |      0 |    15264 | 1509145378 | 343791816 |    125965 | 1509164439 | 265666274 |          NULL |      0 |
|  125952 |      0 |      0 |    15298 | 1509145400 | 435134528 |    125966 | 1509164439 | 458663016 |          NULL |      0 |
|  125953 |      0 |      0 |    15327 | 1509145409 | 481315206 |    125967 | 1509164439 | 493946984 |          NULL |      0 |
|  125954 |      0 |      0 |    15348 | 1509145376 | 340446199 |    125968 | 1509164439 | 262846921 |          NULL |      0 |
|  125955 |      0 |      0 |    13470 | 1509145835 | 756466476 |    125998 | 1509164555 | 977155157 |          NULL |      0 |
|  125956 |      0 |      0 |    13560 | 1509145824 | 655417514 |    125997 | 1509164544 | 934757217 |          NULL |      0 |
|  125957 |      0 |      0 |    13472 | 1509145837 | 765698913 |    125982 | 1509164497 | 774275470 |          NULL |      0 |
|  125958 |      0 |      0 |    13474 | 1509145839 | 783852885 |    125984 | 1509164499 | 776906869 |          NULL |      0 |
|  125959 |      0 |      0 |    13483 | 1509145848 |  53121952 |    125985 | 1509164508 | 798965064 |          NULL |      0 |
|  125960 |      0 |      0 |    13484 | 1509145849 |  60008025 |    125986 | 1509164509 | 801412685 |          NULL |      0 |
|  125961 |      0 |      0 |    13471 | 1509145836 | 759186160 |    125981 | 1509164496 | 772208321 |          NULL |      0 |
|  125962 |      0 |      0 |    13473 | 1509146738 | 270746156 |    125983 | 1509164498 | 775775544 |          NULL |      0 |
|  125969 |      0 |      0 |    13479 | 1509164439 | 526244125 |    125999 | 1509164564 |  84841234 |          NULL |      0 |
|  129000 |      0 |      0 |    13498 | 1509182541 | 322822853 |    129011 | 1509182601 | 538799948 |          NULL |      0 |
+---------+--------+--------+----------+------------+-----------+-----------+------------+-----------+---------------+--------+
19 rows in set (0.01 sec)
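In the table above, rows whose r_eventid is NULL (and r_clock is 0) have not recovered. They are exactly the two events returned by problem.get: 15 and 90548. For the other rows, r_clock - clock is how long the problem lasted.

A small Python sketch over a few of these rows follows. The tuple layout is an assumption made for brevity.

```python
# A few rows from the problem table above:
# (eventid, objectid, clock, r_eventid, r_clock); None stands for SQL NULL.
rows = [
    (15, 13496, 1508217852, None, 0),
    (90548, 15353, 1508918765, None, 0),
    (125948, 13468, 1509145833, 125980, 1509164493),
    (129000, 13498, 1509182541, 129011, 1509182601),
]

def split_problems(rows):
    """Split problem rows into open (r_eventid is NULL) and resolved ones."""
    open_rows = [r for r in rows if r[3] is None]
    resolved = [r for r in rows if r[3] is not None]
    return open_rows, resolved

open_rows, resolved = split_problems(rows)
print([r[0] for r in open_rows])  # [15, 90548]
for eventid, _, clock, _, r_clock in resolved:
    print(eventid, "lasted", r_clock - clock, "s")  # 18660 s, then 60 s
```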

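How the Zabbix frontend code handles problems
/usr/share/zabbix/include/classes/mvc/CRouter.php is the overall route map of the Zabbix MVC architecture:
action: the request.
controller: handles the action.
view: generates the page content, such as HTML, CSV or JSON.
layout: renders the page.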

For example:
action      widget.problems.view
controller  CControllerWidgetProblemsView
layout      layout.widget
view        monitoring.widget.problems.view
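Conceptually, the route map is a lookup from an action name to a controller, a layout and a view. The following Python sketch mirrors that lookup for the single example route above. It is a hypothetical illustration, not a translation of the PHP in CRouter.php.

```python
from typing import NamedTuple, Optional

class Route(NamedTuple):
    controller: str
    layout: str
    view: str

# Hypothetical, simplified copy of one entry of the CRouter route map.
ROUTES = {
    "widget.problems.view": Route(
        controller="CControllerWidgetProblemsView",
        layout="layout.widget",
        view="monitoring.widget.problems.view",
    ),
}

def resolve(action: str) -> Optional[Route]:
    """Return the (controller, layout, view) triple for an action, or None."""
    return ROUTES.get(action)

route = resolve("widget.problems.view")
print(route.controller)  # CControllerWidgetProblemsView
```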

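Following this mapping, you can tell that the problem data for the widget is fetched in CControllerWidgetProblemsView.
/usr/share/zabbix/include/classes/api/services/CProblem.php
CProblem, the API service class behind problem.get, reads its data from the problem table of the Zabbix database.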


An example trigger and its events.
Look at the trigger with trigger ID 15353:
MariaDB [zabbix]> select * from triggers where triggerid='15353';
+-----------+-------------+-----------------------------------+-----+--------+-------+----------+------------+----------+-------+------------+------+-------+-------+---------------+---------------------+------------------+-----------------+--------------+
| triggerid | expression  | description                       | url | status | value | priority | lastchange | comments | error | templateid | type | state | flags | recovery_mode | recovery_expression | correlation_mode | correlation_tag | manual_close |
+-----------+-------------+-----------------------------------+-----+--------+-------+----------+------------+----------+-------+------------+------+-------+-------+---------------+---------------------+------------------+-----------------+--------------+
|     15353 | {16631}>300 | Too many processes on {HOST.NAME} |     |      0 |     1 |        2 | 1508918765 |          |       |      10190 |    0 |     0 |     0 |             0 |                     |                0 |                 |            0 |
+-----------+-------------+-----------------------------------+-----+--------+-------+----------+------------+----------+-------+------------+------+-------+-------+---------------+---------------------+------------------+-----------------+--------------+

The trigger expression above is {16631}>300, where 16631 is a function ID (functionid), i.e. a row in the functions table:

MariaDB [zabbix]> describe functions;
+------------+---------------------+------+-----+---------+-------+
| Field      | Type                | Null | Key | Default | Extra |
+------------+---------------------+------+-----+---------+-------+
| functionid | bigint(20) unsigned | NO   | PRI | NULL    |       |
| itemid     | bigint(20) unsigned | NO   | MUL | NULL    |       |
| triggerid  | bigint(20) unsigned | NO   | MUL | NULL    |       |
| function   | varchar(12)         | NO   |     |         |       |
| parameter  | varchar(255)        | NO   |     | 0       |       |
+------------+---------------------+------+-----+---------+-------+

MariaDB [zabbix]> select * from functions where functionid='16631';
+------------+--------+-----------+----------+-----------+
| functionid | itemid | triggerid | function | parameter |
+------------+--------+-----------+----------+-----------+
|      16631 |  28621 |     15353 | avg      | 5m        |
+------------+--------+-----------+----------+-----------+

MariaDB [zabbix]> select itemid,hostid,name,key_,description from items where itemid='28621';
+--------+--------+---------------------+------------+-----------------------------------------+
| itemid | hostid | name                | key_       | description                             |
+--------+--------+---------------------+------------+-----------------------------------------+
|  28621 |  10259 | Number of processes | proc.num[] | Total number of processes in any state. |
+--------+--------+---------------------+------------+-----------------------------------------+

Host ID 10259 corresponds to host gb21.
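Putting the three lookups together, a {functionid} placeholder can be expanded into the readable {host:key.function(parameter)} form. This Python sketch makes two simplifying assumptions:

- The rows from the queries above are hard-coded in hypothetical dictionaries (functions, items, hosts).
- Only simple {N} placeholders are handled.

```python
import re

# Rows copied from the queries above (hypothetical in-memory tables).
functions = {16631: {"itemid": 28621, "triggerid": 15353,
                     "function": "avg", "parameter": "5m"}}
items = {28621: {"hostid": 10259, "key_": "proc.num[]"}}
hosts = {10259: "gb21"}  # hostid 10259 is host gb21

def expand(expression):
    """Replace each {functionid} with {host:key.function(parameter)}."""
    def repl(m):
        f = functions[int(m.group(1))]
        item = items[f["itemid"]]
        host = hosts[item["hostid"]]
        return "{%s:%s.%s(%s)}" % (host, item["key_"], f["function"], f["parameter"])
    return re.sub(r"\{(\d+)\}", repl, expression)

print(expand("{16631}>300"))  # {gb21:proc.num[].avg(5m)}>300
```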

Trigger IDs are unique. For example, trigger 15353 is "Too many processes on {HOST.NAME}", and this trigger is applied to host gb21. On gb21:
[root@gb21 vmtest]# ps -ef|wc -l
336
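ps -ef also prints a header line, so this is 335 processes, above the threshold of 300. As long as the 5-minute average (function avg, parameter 5m) stays above 300, the trigger condition is met.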
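Because the function is avg with parameter 5m, the trigger compares the average over the last 5 minutes against 300, not a single ps reading. A Python sketch of that check follows. The one-minute sampling interval and the half-open window boundaries are assumptions, not Zabbix's documented internals.

```python
def avg_over_window(samples, now, window=300):
    """Average of (clock, value) samples with now - window < clock <= now."""
    vals = [v for clock, v in samples if now - window < clock <= now]
    return sum(vals) / len(vals) if vals else None

# ps -ef | wc -l printed 336; one line is the header, so 335 processes.
process_count = 336 - 1

# Hypothetical samples taken once a minute over the last 5 minutes.
samples = [(60 * i, process_count) for i in range(1, 6)]
avg = avg_over_window(samples, now=300)
print(avg, avg > 300)  # 335.0 True
```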

Below is a resolved problem:
http://10.10.144.21/zabbix/tr_events.php?triggerid=15353&eventid=90504

MariaDB [zabbix]> describe events;
+--------------+---------------------+------+-----+---------+-------+
| Field        | Type                | Null | Key | Default | Extra |
+--------------+---------------------+------+-----+---------+-------+
| eventid      | bigint(20) unsigned | NO   | PRI | NULL    |       |
| source       | int(11)             | NO   | MUL | 0       |       |
| object       | int(11)             | NO   |     | 0       |       |
| objectid     | bigint(20) unsigned | NO   |     | 0       |       |
| clock        | int(11)             | NO   |     | 0       |       |
| value        | int(11)             | NO   |     | 0       |       |
| acknowledged | int(11)             | NO   |     | 0       |       |
| ns           | int(11)             | NO   |     | 0       |       |
+--------------+---------------------+------+-----+---------+-------+
8 rows in set (0.00 sec)


MariaDB [zabbix]> select * from events where eventid='90504';
+---------+--------+--------+----------+------------+-------+--------------+-----------+
| eventid | source | object | objectid | clock      | value | acknowledged | ns        |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
|   90504 |      0 |      0 |    15353 | 1508766001 |     1 |            0 | 989854339 |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
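The numeric columns in events are codes. According to the event object reference linked at the beginning:

- source 0 means an event created by a trigger.
- object 0 means the related object is a trigger.
- For trigger events, value 1 is PROBLEM and value 0 is OK.
- clock is a Unix timestamp, and ns adds nanoseconds.

A Python sketch that decodes the row above follows. describe_event is a hypothetical helper.

```python
from datetime import datetime, timezone

TRIGGER_VALUES = {0: "OK", 1: "PROBLEM"}  # events.value for trigger events

def describe_event(eventid, source, obj, objectid, clock, value, acknowledged, ns):
    """Turn a raw events row into a readable string (trigger events only)."""
    if source != 0 or obj != 0:
        raise ValueError("this sketch only handles trigger events (source=0, object=0)")
    ts = datetime.fromtimestamp(clock, tz=timezone.utc)
    return "event %d: trigger %d -> %s at %s UTC (+%d ns), acknowledged=%d" % (
        eventid, objectid, TRIGGER_VALUES[value],
        ts.strftime("%Y-%m-%d %H:%M:%S"), ns, acknowledged)

# Row for event 90504 from the query above.
print(describe_event(90504, 0, 0, 15353, 1508766001, 1, 0, 989854339))
```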

Below is an unresolved problem, i.e. the trigger is still in the Problem state:
http://10.10.144.21/zabbix/tr_events.php?triggerid=15353&eventid=90548

MariaDB [zabbix]> select * from events where eventid='90548';
+---------+--------+--------+----------+------------+-------+--------------+-----------+
| eventid | source | object | objectid | clock      | value | acknowledged | ns        |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
|   90548 |      0 |      0 |    15353 | 1508918765 |     1 |            0 | 555842555 |
+---------+--------+--------+----------+------------+-------+--------------+-----------+

MariaDB [zabbix]> describe triggers;
+---------------------+---------------------+------+-----+---------+-------+
| Field               | Type                | Null | Key | Default | Extra |
+---------------------+---------------------+------+-----+---------+-------+
| triggerid           | bigint(20) unsigned | NO   | PRI | NULL    |       |
| expression          | varchar(2048)       | NO   |     |         |       |
| description         | varchar(255)        | NO   |     |         |       |
| url                 | varchar(255)        | NO   |     |         |       |
| status              | int(11)             | NO   | MUL | 0       |       |
| value               | int(11)             | NO   | MUL | 0       |       |
| priority            | int(11)             | NO   |     | 0       |       |
| lastchange          | int(11)             | NO   |     | 0       |       |
| comments            | text                | NO   |     | NULL    |       |
| error               | varchar(2048)       | NO   |     |         |       |
| templateid          | bigint(20) unsigned | YES  | MUL | NULL    |       |
| type                | int(11)             | NO   |     | 0       |       |
| state               | int(11)             | NO   |     | 0       |       |
| flags               | int(11)             | NO   |     | 0       |       |
| recovery_mode       | int(11)             | NO   |     | 0       |       |
| recovery_expression | varchar(2048)       | NO   |     |         |       |
| correlation_mode    | int(11)             | NO   |     | 0       |       |
| correlation_tag     | varchar(255)        | NO   |     |         |       |
| manual_close        | int(11)             | NO   |     | 0       |       |
+---------------------+---------------------+------+-----+---------+-------+

MariaDB [zabbix]> select count(*) from events where objectid='15353';
+----------+
| count(*) |
+----------+
|      847 |
+----------+
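When the trigger condition is met, Zabbix generates a trigger event. Note that this query filters only on objectid, so it counts every event with objectid 15353, including recovery (value=0) events, not just problem events.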

MariaDB [zabbix]> select count(*) from events where object='0' and value='1' and objectid='15353';
+----------+
| count(*) |
+----------+
|      424 |
+----------+
Does that mean there are 424 unresolved problems? No. You also have to check the event_recovery table:
MariaDB [zabbix]> describe event_recovery;
+---------------+---------------------+------+-----+---------+-------+
| Field         | Type                | Null | Key | Default | Extra |
+---------------+---------------------+------+-----+---------+-------+
| eventid       | bigint(20) unsigned | NO   | PRI | NULL    |       |
| r_eventid     | bigint(20) unsigned | NO   | MUL | NULL    |       |
| c_eventid     | bigint(20) unsigned | YES  | MUL | NULL    |       |
| correlationid | bigint(20) unsigned | YES  |     | NULL    |       |
| userid        | bigint(20) unsigned | YES  |     | NULL    |       |
+---------------+---------------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

MariaDB [zabbix]> select * from events where object='0' and value='1' and objectid='15353';
+---------+--------+--------+----------+------------+-------+--------------+-----------+
| eventid | source | object | objectid | clock      | value | acknowledged | ns        |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
......
|   90504 |      0 |      0 |    15353 | 1508766001 |     1 |            0 | 989854339 |
|   90548 |      0 |      0 |    15353 | 1508918765 |     1 |            0 | 555842555 |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
424 rows in set (0.01 sec)

Take events 90504 and 90548 as examples.

MariaDB [zabbix]> select * from event_recovery where eventid='90548';
Empty set (0.00 sec)

Event 90504 has been resolved: its recovery event is 90522, which has value 0 (OK).
MariaDB [zabbix]> select * from event_recovery where eventid='90504';
+---------+-----------+-----------+---------------+--------+
| eventid | r_eventid | c_eventid | correlationid | userid |
+---------+-----------+-----------+---------------+--------+
|   90504 |     90522 |      NULL |          NULL |   NULL |
+---------+-----------+-----------+---------------+--------+
1 row in set (0.00 sec)

MariaDB [zabbix]> select * from events where eventid='90522';
+---------+--------+--------+----------+------------+-------+--------------+-----------+
| eventid | source | object | objectid | clock      | value | acknowledged | ns        |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
|   90522 |      0 |      0 |    15353 | 1508766301 |     0 |            0 | 524849764 |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
1 row in set (0.00 sec)

Event 90548 has not been resolved: it has no row in event_recovery, so the trigger is still in the Problem state.
MariaDB [zabbix]> select * from events where eventid='90548';
+---------+--------+--------+----------+------------+-------+--------------+-----------+
| eventid | source | object | objectid | clock      | value | acknowledged | ns        |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
|   90548 |      0 |      0 |    15353 | 1508918765 |     1 |            0 | 555842555 |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
1 row in set (0.00 sec)
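The manual check above can be done in a single query: LEFT JOIN the problem events against event_recovery and keep those without a match. The sketch runs the query on an in-memory SQLite database loaded with the three events from this example, not on the real MariaDB zabbix database. The SELECT uses only standard SQL, so it should carry over, but verify it against your own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (eventid INTEGER PRIMARY KEY, source INTEGER, object INTEGER,
                     objectid INTEGER, clock INTEGER, value INTEGER,
                     acknowledged INTEGER, ns INTEGER);
CREATE TABLE event_recovery (eventid INTEGER PRIMARY KEY, r_eventid INTEGER,
                             c_eventid INTEGER, correlationid INTEGER, userid INTEGER);
INSERT INTO events VALUES
  (90504, 0, 0, 15353, 1508766001, 1, 0, 989854339),
  (90522, 0, 0, 15353, 1508766301, 0, 0, 524849764),
  (90548, 0, 0, 15353, 1508918765, 1, 0, 555842555);
INSERT INTO event_recovery VALUES (90504, 90522, NULL, NULL, NULL);
""")

# Problem events of a trigger that have no recovery event yet.
UNRESOLVED_SQL = """
SELECT e.eventid, e.clock
FROM events e
LEFT JOIN event_recovery er ON er.eventid = e.eventid
WHERE e.source = 0 AND e.object = 0 AND e.value = 1
  AND e.objectid = ? AND er.eventid IS NULL
ORDER BY e.eventid
"""
print(conn.execute(UNRESOLVED_SQL, (15353,)).fetchall())  # [(90548, 1508918765)]
```

On the real database, the same query should list only the open problem events of trigger 15353 (here 90548), rather than all 424 rows with value=1.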

