I. Creating and triggering an alarm with aodh + gnocchi
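1. Use the following three commands to find a metric for which gnocchi measures show actually returns sample values. Here I use the number of Swift containers in use, which is much easier to manipulate than something like a VM's CPU utilization: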
gnocchi resource list
gnocchi metric list
gnocchi measures show --resource-id 81930c5e-915a-447e-bd47-2d6675b84140 --aggregation min storage.objects.containers
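If you want to script the check in step 1, the sketch below reads the output of gnocchi measures show run with the cliff -f json formatter. It assumes that output is a list of objects with timestamp, granularity and value keys; the function name latest_value and the sample data are hypothetical.

```python
import json
from typing import Optional


def latest_value(measures_json: str) -> Optional[float]:
    """Return the most recent value from 'gnocchi measures show -f json' output.

    Assumes each row has 'timestamp', 'granularity' and 'value' keys.
    Returns None when the metric has no measures yet.
    """
    rows = json.loads(measures_json)
    if not rows:
        return None
    # ISO 8601 timestamps with the same offset sort chronologically as strings.
    newest = max(rows, key=lambda row: row["timestamp"])
    return float(newest["value"])


if __name__ == "__main__":
    # Illustrative sample only, not real output.
    sample = json.dumps([
        {"timestamp": "2016-05-01T10:00:00+00:00", "granularity": 300.0, "value": 2.0},
        {"timestamp": "2016-05-01T10:05:00+00:00", "granularity": 300.0, "value": 3.0},
    ])
    print(latest_value(sample))  # 3.0
```

An empty result means gnocchi has not stored any measures for that metric yet, so it is not a good candidate for the alarm.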
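2. Create an alarm that uses gnocchi as its storage backend. It fires when the number of containers under the account with resource-id=81930c5e-915a-447e-bd47-2d6675b84140 is greater than 8.
(Alarms created through the ceilometer client currently come in two flavors: ceilometer's native alarms, which read their data from MongoDB, and alarms handed over to aodh, which read their data from gnocchi. The code is analyzed later.)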
ceilometer alarm-gnocchi-resources-threshold-create --name gnocchi_storage_objects_containers_large --description 'storage.objects.containers' --alarm-action 'log://' --evaluation-periods 3 --aggregation-method max --comparison-operator gt --threshold 8 -m storage.objects.containers --resource-type swift_account --resource-id 81930c5e-915a-447e-bd47-2d6675b84140
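As far as I understand aodh's v2 API, the command above amounts to a POST /v2/alarms request whose body carries a gnocchi_resources_threshold_rule. The helper below is a sketch that only builds such a body under that assumption; build_alarm_body is a hypothetical name, granularity=60 is an assumed default you should match to your archive policy, and sending the request with a Keystone token is left out.

```python
import json


def build_alarm_body(name, resource_id, metric, threshold,
                     resource_type="swift_account",
                     aggregation_method="max",
                     comparison_operator="gt",
                     evaluation_periods=3,
                     granularity=60,
                     alarm_actions=("log://",)):
    """Build a JSON-serializable alarm body (assumed aodh v2 layout)."""
    return {
        "name": name,
        "type": "gnocchi_resources_threshold",
        "alarm_actions": list(alarm_actions),
        "gnocchi_resources_threshold_rule": {
            "metric": metric,
            "resource_type": resource_type,
            "resource_id": resource_id,
            "aggregation_method": aggregation_method,
            "comparison_operator": comparison_operator,
            "threshold": threshold,
            "evaluation_periods": evaluation_periods,
            # Must match a granularity in the metric's archive policy
            # unless _sanitize is patched as described in step 5.
            "granularity": granularity,
        },
    }


if __name__ == "__main__":
    body = build_alarm_body(
        name="gnocchi_storage_objects_containers_large",
        resource_id="81930c5e-915a-447e-bd47-2d6675b84140",
        metric="storage.objects.containers",
        threshold=8,
    )
    print(json.dumps(body, indent=2))
```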
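3. Use the swift client to create containers under the account for resource-id=81930c5e-915a-447e-bd47-2d6675b84140 (swift upload creates the container if it does not already exist):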
swift upload test001 test
swift upload test002 test
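To push the container count past the threshold of 8 without typing each command, a small wrapper can loop over the same swift upload command. This sketch assumes the swift client is on PATH, the credentials for that account are already in the environment, and a local file named test exists; create_containers is a hypothetical helper, and nine containers only exceed the threshold if the account starts with none.

```python
import subprocess


def create_containers(count, prefix="test", upload_file="test", dry_run=True):
    """Create `count` Swift containers by uploading one file into each.

    'swift upload <container> <file>' creates the container if needed.
    With dry_run=True the commands are only returned, not executed.
    """
    commands = [
        ["swift", "upload", "%s%03d" % (prefix, i), upload_file]
        for i in range(1, count + 1)
    ]
    if not dry_run:
        for cmd in commands:
            # Raises CalledProcessError if the swift client reports a failure.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in create_containers(9):
        print(" ".join(cmd))
```

Call create_containers(9, dry_run=False) to actually run the uploads.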
4. Check the alarm state. Once the number of containers exceeds 8, the state changes to alarm:
aodh alarm list
+--------------------------------------+-----------------------------+------------------------------------------+-------+----------+---------+
| alarm_id | type | name | state | severity | enabled |
+--------------------------------------+-----------------------------+------------------------------------------+-------+----------+---------+
| 1f2fcc31-aae9-4381-bc00-4aac36f7fdca | gnocchi_resources_threshold | gnocchi_storage_objects_containers_large | ok | low | True |
+--------------------------------------+-----------------------------+------------------------------------------+-------+----------+---------+
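The state change in step 4 is decided by aodh's threshold evaluator. The sketch below is a simplified model of that decision, not aodh's code: it looks at the last evaluation_periods values, moves to alarm if all of them breach the threshold, to ok if none do, and otherwise keeps the current state. Requiring a full window before deciding is an assumption, and evaluate_state is a hypothetical name.

```python
import operator

# Map aodh's comparison_operator names to Python operators.
COMPARATORS = {
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
    "eq": operator.eq, "ne": operator.ne,
}


def evaluate_state(values, threshold, comparison_operator,
                   evaluation_periods, current_state):
    """Return the new alarm state for aggregated values (oldest first)."""
    window = values[-evaluation_periods:]
    if len(window) < evaluation_periods:
        # Assumption: fewer points than evaluation_periods means no decision.
        return "insufficient data"
    compare = COMPARATORS[comparison_operator]
    breaches = [compare(v, threshold) for v in window]
    if all(breaches):
        return "alarm"
    if not any(breaches):
        return "ok"
    # Mixed results are ambiguous, so the previous state is kept.
    return current_state


if __name__ == "__main__":
    print(evaluate_state([7, 9, 10, 11], 8, "gt", 3, "ok"))  # alarm
    print(evaluate_state([9, 7, 10], 8, "gt", 3, "ok"))      # ok (unchanged)
```

With --evaluation-periods 3, three consecutive container counts above 8 are therefore needed before the state flips to alarm.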
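5. When aodh evaluates a gnocchi-based alarm, it only uses measures whose granularity equals the alarm rule's granularity (its statistics period), so the two must match. Gnocchi returns each measure as a (timestamp, granularity, value) triple; if the metric's archive policy stores no data at the rule's granularity, every measure is discarded and the alarm stays in insufficient data. To get the result above, I modified the code to remove this consistency check.
Decide for yourself whether you need the same change.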
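# In aodh/evaluator/gnocchi.py, GnocchiResourceThresholdEvaluator does not implement _sanitize itself; it inherits the method from its parent class GnocchiBase:
from aodh.evaluator import threshold  # imported at the top of the module

class GnocchiBase(threshold.ThresholdEvaluator):
    @staticmethod
    def _sanitize(rule, statistics):
        # Each measure is (timestamp, granularity, value); keep only the
        # values stored at the rule's granularity.
        statistics = [stats[2] for stats in statistics
                      if stats[1] == rule['granularity']]
        # Evaluate only the most recent evaluation_periods points.
        statistics = statistics[-rule['evaluation_periods']:]
        return statistics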
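# My version overrides _sanitize in the subclass and drops the if stats[1] == rule['granularity'] check:
class GnocchiResourceThresholdEvaluator(GnocchiBase):
    # ... the existing methods of this class stay unchanged ...

    @staticmethod
    def _sanitize(rule, statistics):
        # Keep every value regardless of its granularity.
        statistics = [stats[2] for stats in statistics]
        statistics = statistics[-rule['evaluation_periods']:]
        return statistics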
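To see why the check matters, the standalone sketch below copies the two _sanitize variants as plain functions and feeds them made-up (timestamp, granularity, value) triples, the shape the code above indexes with stats[1] and stats[2]. With a rule granularity of 60 and data stored only at 300-second granularity, the original filter keeps nothing; the patched one keeps the last three values. The sample data is illustrative only.

```python
def sanitize_original(rule, statistics):
    # Keep only values stored at the rule's granularity (GnocchiBase).
    values = [stats[2] for stats in statistics
              if stats[1] == rule['granularity']]
    return values[-rule['evaluation_periods']:]


def sanitize_patched(rule, statistics):
    # Keep every value, whatever its granularity (the modification above).
    values = [stats[2] for stats in statistics]
    return values[-rule['evaluation_periods']:]


if __name__ == "__main__":
    rule = {'granularity': 60, 'evaluation_periods': 3}
    # Hypothetical measures stored at 300-second granularity only.
    statistics = [
        ('2016-05-01T10:00:00+00:00', 300.0, 7.0),
        ('2016-05-01T10:05:00+00:00', 300.0, 9.0),
        ('2016-05-01T10:10:00+00:00', 300.0, 10.0),
        ('2016-05-01T10:15:00+00:00', 300.0, 11.0),
    ]
    print(sanitize_original(rule, statistics))  # []
    print(sanitize_patched(rule, statistics))   # [9.0, 10.0, 11.0]
```

One trade-off of the patch: it no longer distinguishes granularities, so if the archive policy defines several, the last evaluation_periods entries come from whichever granularity gnocchi happens to list last.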
II. Creating a metric alarm
ceilometer alarm-gnocchi-aggregation-by-metrics-threshold-create --name metric_alarm001 --severity low --alarm-action 'log://' --granularity 60 --evaluation-periods 3 --aggregation-method max --comparison-operator gt --threshold 8 -m test
+---------------------------+-----------------------------------------------------+
| Property | Value |
+---------------------------+-----------------------------------------------------+
| aggregation_method | max |
| alarm_actions | ["log://"] |
| alarm_id | bb032ca7-f6a5-4374-b654-9ce0c0112982 |
| comparison_operator | gt |
| description | gnocchi_aggregation_by_metrics_threshold alarm rule |
| enabled | True |
| evaluation_periods | 3 |
| granularity | 60 |
| insufficient_data_actions | [] |
| metrics | ["test"] |
| name | metric_alarm001 |
| ok_actions | [] |
| project_id | d0fb3737f91749139a2a520e26712986 |
| repeat_actions | False |
| severity | low |
| state | insufficient data |
| threshold | 8.0 |
| type | gnocchi_aggregation_by_metrics_threshold |
| user_id | b07d617fa2db48b4a80de16761ba8b05 |
+---------------------------+-----------------------------------------------------+
# Update the alarm, replacing its metric list with two metric IDs
ceilometer alarm-gnocchi-aggregation-by-metrics-threshold-update bb032ca7-f6a5-4374-b654-9ce0c0112982 -m b733912a-eef6-4b08-a92d-fd6441eb43e5 -m 37e5e476-40b1-4486-ae36-fa67e0dcec85
+---------------------------+-----------------------------------------------------+
| Property | Value |
+---------------------------+-----------------------------------------------------+
| aggregation_method | max |
| alarm_actions | ["log://"] |
| alarm_id | bb032ca7-f6a5-4374-b654-9ce0c0112982 |
| comparison_operator | gt |
| description | gnocchi_aggregation_by_metrics_threshold alarm rule |
| enabled | True |
| evaluation_periods | 3 |
| granularity | 60 |
| insufficient_data_actions | [] |
| metrics | ["b733912a-eef6-4b08-a92d-fd6441eb43e5", |
| | "37e5e476-40b1-4486-ae36-fa67e0dcec85"] |
| name | metric_alarm001 |
| ok_actions | [] |
| project_id | d0fb3737f91749139a2a520e26712986 |
| repeat_actions | False |
| severity | low |
| state | insufficient data |
| threshold | 8.0 |
| type | gnocchi_aggregation_by_metrics_threshold |
| user_id | b07d617fa2db48b4a80de16761ba8b05 |
+---------------------------+-----------------------------------------------------+
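A gnocchi_aggregation_by_metrics_threshold rule evaluates one series computed across all the listed metrics instead of a single metric. The sketch below models that idea locally: for each timestamp present in every metric's series, it applies the aggregation method across the metrics. Keeping only timestamps shared by all series reflects my understanding of gnocchi's default full-overlap behaviour but should be treated as an assumption; aggregate_across and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# Aggregation methods supported by this sketch.
AGGREGATORS = {
    "max": max,
    "min": min,
    "sum": sum,
    "mean": lambda vals: sum(vals) / len(vals),
}


def aggregate_across(series: List[Dict[str, float]],
                     method: str) -> List[Tuple[str, float]]:
    """Combine per-metric {timestamp: value} series into one series.

    Only timestamps present in every series are kept (assumed full overlap).
    """
    if not series:
        return []
    common = set(series[0])
    for s in series[1:]:
        common &= set(s)
    agg = AGGREGATORS[method]
    return [(ts, agg([s[ts] for s in series])) for ts in sorted(common)]


if __name__ == "__main__":
    metric_a = {"10:00": 3.0, "10:01": 9.0, "10:02": 12.0}
    metric_b = {"10:00": 5.0, "10:01": 4.0, "10:03": 1.0}
    print(aggregate_across([metric_a, metric_b], "max"))
    # [('10:00', 5.0), ('10:01', 9.0)]
```

The combined values then go through the same threshold comparison as a single-metric rule.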
III. Using a webhook to receive alarm notifications
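The webhook is based mainly on http://www.cnblogs.com/yippee/p/4737017.html: the alarm callback is configured by passing the webhook URL as the alarm action (--alarm-action 'http://127.0.0.1:8080/'). Note that 127.0.0.1 only works when the webhook runs on the same host as the aodh notifier.
See the linked post for how to set it up.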
1. Create the alarm, this time with the webhook URL as its action:
ceilometer alarm-gnocchi-resources-threshold-create --name gnocchi_storage_objects_containers_large --description 'storage.objects.containers' --alarm-action 'http://127.0.0.1:8080/' --evaluation-periods 3 --aggregation-method max --comparison-operator gt --threshold 8 -m storage.objects.containers --resource-type swift_account --resource-id 81930c5e-915a-447e-bd47-2d6675b84140
2. Set up the webhook site
pecan create ceilometeralarm
cd ceilometeralarm
python setup.py develop
Edit controllers/root.py in the generated project:
from pecan import expose


class RootController(object):

    def __init__(self):
        # Status returned on GET /; switched to 'alarm' once aodh posts a notification.
        self.status = 'hello world!!'

    # The generated template used: @expose(generic=True, template='index.html')
    @expose(generic=True, template='json')
    def index(self):
        # HTTP GET /: return the current status as JSON
        return self.status

    # HTTP POST /: aodh calls this URL when the alarm changes state
    @index.when(method='POST', template='json')
    def index_POST(self, **kw):
        self.status = 'alarm'
        return kw
3. Start the server
pecan serve config.py
Open http://127.0.0.1:8080 in a browser and the page shows hello world!!
After triggering the alarm condition, refresh the page and it shows alarm instead.
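If you only want to see what aodh posts and would rather not create a pecan project, a standard-library receiver can play the same role. This is a sketch, not the setup from the linked post: it assumes aodh sends the notification as a JSON request body (consistent with the kw fields listed in step 4), keeps the last payload in memory, and takes the port on the command line. The file name alarm_webhook.py and the names STATE, record_alarm and make_server are hypothetical.

```python
import json
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

# Current status and last notification; shared by all requests in this process.
STATE = {"status": "hello world!!", "last_alarm": None}


def record_alarm(raw: bytes) -> dict:
    """Parse one notification body and remember it (raises ValueError)."""
    payload = json.loads(raw.decode("utf-8") or "{}")
    STATE["status"] = "alarm"
    STATE["last_alarm"] = payload
    return payload


class AlarmHandler(BaseHTTPRequestHandler):
    def _send_json(self, obj, code=200):
        body = json.dumps(obj).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Like the pecan index(): report the current status.
        self._send_json(STATE["status"])

    def do_POST(self):
        # Like index_POST(): store the payload and echo it back.
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = record_alarm(self.rfile.read(length))
        except ValueError:
            self._send_json({"error": "invalid JSON"}, code=400)
            return
        self._send_json(payload)


def make_server(host="127.0.0.1", port=8080):
    return HTTPServer((host, port), AlarmHandler)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python alarm_webhook.py PORT   (e.g. 8080)")
    else:
        make_server(port=int(sys.argv[1])).serve_forever()
```

Run python alarm_webhook.py 8080 so the URL matches the alarm action created in step 1.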
4. In index_POST(self, **kw), kw contains the following information:
{'severity': 'low', 'alarm_name': 'gnocchi_storage_objects_containers_large', 'current': 'alarm', 'alarm_id': 'c35f14f3-e8c5-4d93-bd68-5d410dd620f7', 'reason': 'Transition to alarm due to 3 samples outside threshold, most recent: 4.0', 'reason_data': {'count': 3, 'most_recent': 4.0, 'type': 'threshold', 'disposition': 'outside'}, 'previous': 'ok'}
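This includes the alarm ID, which you can use to look up the alarm in aodh (for example with aodh alarm show <alarm_id>) and find the resource it is associated with.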
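Once the alarm definition has been fetched (for example from aodh's GET /v2/alarms/<alarm_id>), the associated resource can be read from its rule. The sketch below assumes the API response stores the rule under a key named after the alarm type plus _rule, such as gnocchi_resources_threshold_rule; resource_of_alarm and the sample dict are hypothetical, and the HTTP call itself is left out.

```python
from typing import Optional, Tuple


def resource_of_alarm(alarm: dict) -> Optional[Tuple[str, str]]:
    """Return (resource_type, resource_id) for a resource-based gnocchi alarm.

    Returns None for alarm types whose rule has no resource_id, such as
    gnocchi_aggregation_by_metrics_threshold.
    """
    rule = alarm.get(alarm.get("type", "") + "_rule") or {}
    if "resource_id" not in rule:
        return None
    return rule.get("resource_type", ""), rule["resource_id"]


if __name__ == "__main__":
    # Illustrative alarm document, not real API output.
    alarm = {
        "alarm_id": "c35f14f3-e8c5-4d93-bd68-5d410dd620f7",
        "type": "gnocchi_resources_threshold",
        "gnocchi_resources_threshold_rule": {
            "metric": "storage.objects.containers",
            "resource_type": "swift_account",
            "resource_id": "81930c5e-915a-447e-bd47-2d6675b84140",
        },
    }
    print(resource_of_alarm(alarm))
```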