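Stop the ceilometer services on all other nodes, leaving them running only on node-5 and node-4.
Since only one compute node is left, watching the collected data in real time means first finding a running VM on the compute node that runs the ceilometer-compute service, and getting its ID: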
ps aux | grep qemu
Pick one of the entries:
root 12139 0.2 2.9 5713836 1970028 ? Sl 6月29 57:44 /usr/libexec/qemu-kvm -name instance-000013f9 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -cpu SandyBridge,+erms,+smep,+fsgsbase,+pdpe1gb,+
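Picking the instance name out of a qemu-kvm process line can be scripted. The sketch below is a minimal helper (the function name `qemu_instance_name` is hypothetical) that splits the line like a shell would and returns the token after `-name`. It also strips a `guest=` prefix and trailing options, on the assumption that newer libvirt versions write `-name guest=<name>,debug-threads=on`; the sample line is shortened from the one above.

```python
import shlex
from typing import Optional


def qemu_instance_name(ps_line: str) -> Optional[str]:
    """Return the libvirt domain name passed to qemu-kvm via -name, if any."""
    tokens = shlex.split(ps_line)
    for i, tok in enumerate(tokens[:-1]):
        if tok == "-name":
            value = tokens[i + 1]
            # Assumption: newer libvirt writes "-name guest=<name>,debug-threads=on"
            if value.startswith("guest="):
                value = value[len("guest="):]
            # Drop any comma-separated sub-options after the name
            return value.split(",")[0]
    return None


line = ("root 12139 0.2 2.9 5713836 1970028 ? Sl 6月29 57:44 "
        "/usr/libexec/qemu-kvm -name instance-000013f9 -S "
        "-machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off")
print(qemu_instance_name(line))  # instance-000013f9
```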
Look up the instance ID using its name, instance-000013f9:
./detail.sh --auth ./openrc --name instance-000013f9
+--------------------------------------+----------------------------------------------------------+
| Property | Value |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig | AUTO |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | node-4.eayun.com |
| OS-EXT-SRV-ATTR:hypervisor_hostname | node-4.eayun.com |
| OS-EXT-SRV-ATTR:instance_name | instance-000013f9 |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-STS:task_state | - |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2015-06-29T06:16:53.000000 |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2015-06-29T06:16:05Z |
| flavor | m1.small (2) |
| hostId | b3c3baad8d219f1f9ffd4dbbcdda583f9550cd9aee1700ab674fbf6d |
| hunt-nettest network | 172.16.88.13, 25.0.0.185 |
| id | 0da3f008-60d7-4b10-a7eb-b143a842a25b |
| image | Attempt to boot from volume - no image supplied |
| key_name | hunt |
| metadata | {} |
| name | hunt-nettest-left5 |
| os-extended-volumes:volumes_attached | [{"id": "90a25db3-628c-4024-b070-081dce2fd7b6"}] |
| progress | 0 |
| security_groups | default |
| status | ACTIVE |
| tenant_id | 3846bfe69b4a49948b8056d5f9c76859 |
| updated | 2015-06-29T06:17:50Z |
| user_id | 3dbf0919d60d4025842e6ea149e4aeba |
+--------------------------------------+----------------------------------------------------------+
The ID is 0da3f008-60d7-4b10-a7eb-b143a842a25b.
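detail.sh is the author's own wrapper and is not shown here, so the following only relies on the table layout it prints. It is a small hypothetical parser (`parse_property_table` is an illustrative name) that turns a `| Property | Value |` table into a dict, so the `id` can be read programmatically; rows with an empty Property cell are treated as continuations of the previous value, as in wrapped `alarm_ids` lists.

```python
from typing import Dict, Optional


def parse_property_table(text: str) -> Dict[str, str]:
    """Convert a '| Property | Value |' ASCII table into a dict."""
    props: Dict[str, str] = {}
    last_key: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and blank lines
        cells = [c.strip() for c in line[1:-1].split("|")]
        if len(cells) != 2 or cells[0] == "Property":
            continue  # skip the header row and anything unexpected
        key, value = cells
        if key:
            props[key] = value
            last_key = key
        elif last_key is not None:
            props[last_key] += " " + value  # continuation row of a wrapped value
    return props


table = """\
+--------------------------------------+----------------------------------------------------------+
| Property | Value |
+--------------------------------------+----------------------------------------------------------+
| OS-EXT-SRV-ATTR:instance_name | instance-000013f9 |
| id | 0da3f008-60d7-4b10-a7eb-b143a842a25b |
| status | ACTIVE |
+--------------------------------------+----------------------------------------------------------+
"""
props = parse_property_table(table)
print(props["id"])
```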
ceilometer sample-list -q resource_id=0da3f008-60d7-4b10-a7eb-b143a842a25b -m cpu
This lists the most recently collected samples:
+--------------------------------------+------+------------+-------------+------+---------------------+
| Resource ID | Name | Type | Volume | Unit | Timestamp |
+--------------------------------------+------+------------+-------------+------+---------------------+
| 0da3f008-60d7-4b10-a7eb-b143a842a25b | cpu | cumulative | 3.47939e+12 | ns | 2015-07-13T08:20:57 |
| 0da3f008-60d7-4b10-a7eb-b143a842a25b | cpu | cumulative | 3.47846e+12 | ns | 2015-07-13T08:10:57 |
| 0da3f008-60d7-4b10-a7eb-b143a842a25b | cpu | cumulative | 3.47839e+12 | ns | 2015-07-13T08:10:08 |
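The `cpu` meter is cumulative CPU time in nanoseconds, so a utilisation percentage comes from the difference between two samples. The sketch below applies roughly the formula ceilometer's `cpu_util` derivation uses (100 × Δcpu_ns / (Δt_s × 10⁹ × vCPUs)); treat that correspondence as an assumption. It also assumes 1 vCPU (the default m1.small flavor). The inputs are the two samples 600 s apart from the table, which are rounded to 6 significant digits, so the result is approximate.

```python
from datetime import datetime


def cpu_util_percent(cpu_ns_old: float, ts_old: str,
                     cpu_ns_new: float, ts_new: str,
                     vcpus: int = 1) -> float:
    """Average CPU utilisation (%) between two cumulative 'cpu' samples."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    elapsed_s = (datetime.strptime(ts_new, fmt)
                 - datetime.strptime(ts_old, fmt)).total_seconds()
    used_ns = cpu_ns_new - cpu_ns_old  # CPU time consumed in the interval
    return 100.0 * used_ns / (elapsed_s * 1e9 * vcpus)


util = cpu_util_percent(3.47846e12, "2015-07-13T08:10:57",
                        3.47939e12, "2015-07-13T08:20:57")
print(f"{util:.3f}%")  # about 0.155%
```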
ceilometer statistics -q resource_id=0da3f008-60d7-4b10-a7eb-b143a842a25b -m cpu
Without -p, everything is aggregated into a single row, reported as of the earliest sample time.
ceilometer statistics -q resource_id=0da3f008-60d7-4b10-a7eb-b143a842a25b -m cpu -p 60
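This groups the data into consecutive 60-second periods (-p is given in seconds) and computes the statistics for each one.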
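To make the effect of -p concrete, here is a minimal sketch of period grouping: samples are placed into consecutive fixed-length buckets and min/max/avg/sum/count are computed per bucket. The assumptions here are that buckets are aligned to the first sample's timestamp and empty periods are omitted; the real API's alignment rules may differ. The data are the three samples from the table above.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def bucket_by_period(samples: List[Tuple[str, float]],
                     period_s: int) -> Dict[int, Dict[str, float]]:
    """Group (timestamp, value) samples into period_s-second buckets."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    parsed = sorted((datetime.strptime(ts, fmt), v) for ts, v in samples)
    start = parsed[0][0]  # assumption: align buckets to the earliest sample
    buckets: Dict[int, List[float]] = {}
    for ts, v in parsed:
        idx = int((ts - start).total_seconds() // period_s)
        buckets.setdefault(idx, []).append(v)
    # Only non-empty periods produce a row
    return {idx: {"min": min(vs), "max": max(vs), "avg": sum(vs) / len(vs),
                  "sum": sum(vs), "count": len(vs)}
            for idx, vs in sorted(buckets.items())}


samples = [("2015-07-13T08:20:57", 3.47939e12),
           ("2015-07-13T08:10:57", 3.47846e12),
           ("2015-07-13T08:10:08", 3.47839e12)]
for idx, stats in bucket_by_period(samples, 60).items():
    print(idx, stats["count"], stats["max"])
```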
In the alarm source code, the threshold evaluator obtains its statistics by passing the alarm's query and period to ceilometerclient.
An example threshold alarm:
+---------------------------+-----------------------------------------------------+
| Property | Value |
+---------------------------+-----------------------------------------------------+
| alarm_actions | [u'log://'] |
| alarm_id | e321ce64-9054-41d8-b924-1095a9657478 |
| comparison_operator | gt |
| description | overheating? |
| enabled | True |
| evaluation_periods | 1 |
| exclude_outliers | False |
| insufficient_data_actions | [] |
| meter_name | cpu_util |
| name | tester_cpu_high |
| ok_actions | [u'log://'] |
| period | 30 |
| project_id | |
| query | resource_id == 9af11e66-30ef-42cf-8f48-bc4bfb03cc03 |
| repeat_actions | True |
| state | insufficient data |
| statistic | avg |
| threshold | 10.0 |
| type | threshold |
| user_id | 3dbf0919d60d4025842e6ea149e4aeba |
+---------------------------+-----------------------------------------------------+
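As a sketch, the alarm above could plausibly be recreated from its fields with the option names listed in the alarm-create help text below. The helper just builds the argument list (it does not run it), and `alarm_create_argv` is a hypothetical name. The `-q` value uses the `key[op]value` form with `=` as the operator.

```python
import shlex
from typing import List


def alarm_create_argv(name: str, meter: str, threshold: float,
                      resource_id: str, description: str = "",
                      period: int = 30, evaluation_periods: int = 1,
                      statistic: str = "avg", operator: str = "gt",
                      action: str = "log://", repeat: bool = True) -> List[str]:
    """Build a 'ceilometer alarm-create' command line for a threshold alarm."""
    return ["ceilometer", "alarm-create",
            "--name", name,
            "--description", description,
            "--meter-name", meter,
            "--threshold", str(threshold),
            "--comparison-operator", operator,
            "--statistic", statistic,
            "--period", str(period),
            "--evaluation-periods", str(evaluation_periods),
            "--alarm-action", action,
            "--ok-action", action,
            "--repeat-actions", str(repeat),  # CLI expects True/False
            "-q", f"resource_id={resource_id}"]


argv = alarm_create_argv("tester_cpu_high", "cpu_util", 10.0,
                         "9af11e66-30ef-42cf-8f48-bc4bfb03cc03",
                         description="overheating?")
print(shlex.join(argv))
```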
ceilometer alarm-create
Optional arguments:
--name <NAME> Name of the alarm (must be unique per tenant).
Required.
--project-id <PROJECT_ID> Tenant to associate with alarm (only settable
by admin users).
--user-id <USER_ID> User to associate with alarm (only settable by
admin users).
--description <DESCRIPTION> Free text description of the alarm.
--state <STATE> State of the alarm, one of: ['ok', 'alarm',
'insufficient data']
--enabled {True|False} True if alarm evaluation/actioning is enabled.
--alarm-action <Webhook URL> URL to invoke when state transitions to alarm.
May be used multiple times. Defaults to None.
--ok-action <Webhook URL> URL to invoke when state transitions to OK.
May be used multiple times. Defaults to None.
--insufficient-data-action <Webhook URL>
URL to invoke when state transitions to
insufficient data. May be used multiple times.
Defaults to None.
--time-constraint <Time Constraint>
Only evaluate the alarm if the time at
evaluation is within this time constraint.
Start point(s) of the constraint are specified
with a cron expression , whereas its duration
is given in seconds. Can be specified multiple
times for multiple time constraints, format
is: name=<CONSTRAINT_NAME>;start=<CRON>;duration=<SECONDS>;
[description=<DESCRIPTION>;[timezone=<IANA Timezone>]]
Defaults to None.
This restricts evaluation of the alarm to the specified time windows.
-m <METRIC>, --meter-name <METRIC>
Metric to evaluate against. Required.
--period <PERIOD> Length of each period (seconds) to evaluate
over.
--evaluation-periods <COUNT> Number of periods to evaluate over.
--statistic <STATISTIC> Statistic to evaluate, one of: ['max', 'min',
'avg', 'sum', 'count'].
--comparison-operator <OPERATOR>
Operator to compare with, one of: ['lt', 'le',
'eq', 'ne', 'ge', 'gt'].
--threshold <THRESHOLD> Threshold to evaluate against. Required.
-q <QUERY>, --query <QUERY> key[op]data_type::value; list. data_type is
optional, but if supplied must be string,
integer, float, or boolean.
--repeat-actions {True|False}
True if actions should be repeatedly notified
while alarm remains in target state. Defaults
to False.
If the state from the current evaluation is the same as the previous one, this flag decides whether the actions are executed again.
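The rule just described can be written as a tiny decision function. This is a hypothetical sketch of the behaviour, not ceilometer's code (`actions_to_fire` is an illustrative name). Actions run on every state transition, and on an unchanged state only when repeat_actions is True. The action map mirrors the example alarm (log:// for alarm and ok, nothing for insufficient data).

```python
from typing import Dict, List


def actions_to_fire(prev_state: str, new_state: str, repeat_actions: bool,
                    actions_by_state: Dict[str, List[str]]) -> List[str]:
    """Return the actions to run after an evaluation."""
    if new_state == prev_state and not repeat_actions:
        return []  # no transition and repeats disabled: stay quiet
    return actions_by_state.get(new_state, [])


actions = {"alarm": ["log://"], "ok": ["log://"], "insufficient data": []}
print(actions_to_fire("ok", "ok", False, actions))     # []
print(actions_to_fire("ok", "ok", True, actions))      # ['log://']
print(actions_to_fire("ok", "alarm", False, actions))  # ['log://']
```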
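One issue:
An alarm has a period, the length of each evaluation period (say 60 s), and evaluation_periods, the number of periods to evaluate (say 2). The window actually queried is period × (evaluation_periods + look_back) = 60 s × (2 + 1) = 120 s + 60 s = 180 s, where look_back is 1 by default.
Samples also have a period, 600 s by default: data is collected only once every 10 minutes.
If the evaluation window falls entirely in a gap between two samples, the statistics come back empty and the state becomes insufficient data. So the window should be at least as long as the sample period: period × (evaluation_periods + look_back) ≥ sample interval.
Even then, when the alarm period is shorter than the sample period, several consecutive evaluations end up judging the same, unchanged data.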
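A quick simulation shows how often the window from the example above actually contains a sample. Assumptions: samples land exactly on multiples of the 600 s interval, the evaluator runs every 60 s (check the evaluation interval in your own configuration), and the window [t − window, t] is inclusive at both ends. With a 180 s window only 24 of 60 evaluations in an hour see data. A 600 s window (period 300, evaluation_periods 1) always does.

```python
def evaluation_window(period: int, evaluation_periods: int,
                      look_back: int = 1) -> int:
    """Seconds of history queried: period * (evaluation_periods + look_back)."""
    return period * (evaluation_periods + look_back)


def evaluations_with_data(window: int, sample_interval: int,
                          eval_interval: int, horizon: int) -> int:
    """Count evaluations whose window [t - window, t] contains a sample."""
    hits = 0
    for t in range(0, horizon, eval_interval):
        # Assumption: samples are taken exactly at multiples of sample_interval
        latest_sample = (t // sample_interval) * sample_interval
        if latest_sample >= t - window:
            hits += 1
    return hits


w = evaluation_window(60, 2)
print(w, evaluations_with_data(w, 600, 60, 3600))  # 180 24
print(evaluations_with_data(evaluation_window(300, 1), 600, 60, 3600))  # 60
```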
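During evaluation, the evaluator first checks exclude_outliers to choose look_back (it uses evaluation_periods instead when exclude_outliers is true), then computes the window size and derives a time range ending at the current time.
It then queries statistics for that time range and meter through the client. If no data comes back, the result is empty and the state is refreshed to insufficient data. If data does come back, it is compared with the threshold and the state is refreshed accordingly (for example to ok when the threshold is not crossed).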
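The evaluation steps above can be sketched as follows. It is loosely modeled on the described behaviour and is a simplification, not ceilometer's actual evaluator. Too few statistics give insufficient data. Otherwise the most recent evaluation_periods values are compared with the threshold: all breaching gives alarm, none breaching gives ok, and a mixed result keeps the current state (an assumption made for this sketch). The function names are hypothetical; the operator names match the CLI's lt/le/eq/ne/ge/gt.

```python
import operator
from typing import List

COMPARATORS = {"lt": operator.lt, "le": operator.le, "eq": operator.eq,
               "ne": operator.ne, "ge": operator.ge, "gt": operator.gt}


def look_back_periods(exclude_outliers: bool, evaluation_periods: int,
                      default_look_back: int = 1) -> int:
    """Extra periods added to the query window."""
    return evaluation_periods if exclude_outliers else default_look_back


def evaluate_threshold(stats: List[float], evaluation_periods: int,
                       comparison: str, threshold: float,
                       current_state: str) -> str:
    """Map per-period statistics to a new alarm state."""
    if len(stats) < evaluation_periods:
        return "insufficient data"
    recent = stats[-evaluation_periods:]
    breaches = [COMPARATORS[comparison](v, threshold) for v in recent]
    if all(breaches):
        return "alarm"
    if not any(breaches):
        return "ok"
    return current_state  # mixed results: keep the current state (simplified)


print(evaluate_threshold([5.0], 1, "gt", 10.0, "insufficient data"))   # ok
print(evaluate_threshold([12.0], 1, "gt", 10.0, "ok"))                 # alarm
print(evaluate_threshold([], 1, "gt", 10.0, "ok"))                     # insufficient data
```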
An example combination alarm:
+---------------------------+-------------------------------------------+
| Property | Value |
+---------------------------+-------------------------------------------+
| alarm_actions | [u'log://'] |
| alarm_id | 6619fa1d-8ca1-4bd6-bb9a-f08f066c5fc9 |
| alarm_ids | [u'28b38841-6035-4a22-be69-5a632a878ed6', |
| | u'e321ce64-9054-41d8-b924-1095a9657478'] |
| description | cpu and mem |
| enabled | True |
| insufficient_data_actions | [] |
| name | cpu_and_mem |
| ok_actions | [u'log://'] |
| operator | and |
| project_id | |
| repeat_actions | True |
| state | insufficient data |
| type | combination |
| user_id | 3dbf0919d60d4025842e6ea149e4aeba |
+---------------------------+-------------------------------------------+
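A combination alarm derives its state from the alarms listed in alarm_ids. The sketch below is simplified and hypothetical, not ceilometer's code. With `and`, every underlying alarm must be in alarm. With `or`, one is enough. An underlying insufficient data state makes the result insufficient data, except that under `or` a single alarming member already decides the outcome; treat that handling as an assumption to verify against your release.

```python
from typing import List

COMBINATORS = {"and": all, "or": any}


def evaluate_combination(underlying_states: List[str], op: str) -> str:
    """Combine the states of the alarms in alarm_ids with 'and' / 'or'."""
    unknown = "insufficient data" in underlying_states
    if unknown and not (op == "or" and "alarm" in underlying_states):
        return "insufficient data"
    if COMBINATORS[op](s == "alarm" for s in underlying_states):
        return "alarm"
    return "ok"


print(evaluate_combination(["alarm", "ok"], "and"))  # ok
print(evaluate_combination(["alarm", "ok"], "or"))   # alarm
```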
ceilometer alarm-combination-create
Optional arguments:
--name <NAME> Name of the alarm (must be unique per tenant).
Required.
--project-id <PROJECT_ID> Tenant to associate with alarm (only settable
by admin users).
--user-id <USER_ID> User to associate with alarm (only settable by
admin users).
--description <DESCRIPTION> Free text description of the alarm.
--state <STATE> State of the alarm, one of: ['ok', 'alarm',
'insufficient data']
--enabled {True|False} True if alarm evaluation/actioning is enabled.
--alarm-action <Webhook URL> URL to invoke when state transitions to alarm.
May be used multiple times. Defaults to None.
--ok-action <Webhook URL> URL to invoke when state transitions to OK.
May be used multiple times. Defaults to None.
--insufficient-data-action <Webhook URL>
URL to invoke when state transitions to
insufficient data. May be used multiple times.
Defaults to None.
--time-constraint <Time Constraint>
Only evaluate the alarm if the time at
evaluation is within this time constraint.
Start point(s) of the constraint are specified
with a cron expression , whereas its duration
is given in seconds. Can be specified multiple
times for multiple time constraints, format
is: name=<CONSTRAINT_NAME>;start=<CRON>;duration=<SECONDS>;
[description=<DESCRIPTION>;[timezone=<IANA Timezone>]]
Defaults to None.
--alarm_ids <ALARM IDS> List of alarm IDs. Required.
--operator <OPERATOR> Operator to compare with, one of: ['and',
'or'].
--repeat-actions {True|False}
True if actions should be repeatedly notified
while alarm remains in target state. Defaults
to False.
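Both alarm-create and alarm-combination-create accept --time-constraint in the `name=...;start=<CRON>;duration=<SECONDS>[;description=...][;timezone=...]` format from the help text. This hypothetical parser (`parse_time_constraint` is an illustrative name) checks such a string before passing it to the CLI. It only validates the fields; matching the cron expression against the current time is left to ceilometer. It assumes no field value contains `;`. The "workhours" constraint in the example is made up.

```python
from typing import Dict, Union


def parse_time_constraint(spec: str) -> Dict[str, Union[str, int]]:
    """Parse a --time-constraint string into its fields."""
    fields: Dict[str, Union[str, int]] = {}
    for part in spec.split(";"):
        if not part:
            continue  # tolerate a trailing ';'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {part!r}")
        fields[key.strip()] = value.strip()
    for required in ("name", "start", "duration"):
        if required not in fields:
            raise ValueError(f"missing required field: {required}")
    fields["duration"] = int(fields["duration"])  # duration is in seconds
    return fields


tc = parse_time_constraint(
    "name=workhours;start=0 9 * * 1-5;duration=32400;timezone=Asia/Shanghai")
print(tc)
```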