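Using ceilometer

Using ceilometer in depth

Stop the ceilometer services on the other nodes, keeping only node-5 and node-4.

Only one compute node is left, and we want to watch the collected data in real time. So first find a running virtual machine on the compute node that runs the ceilometer-compute service, and get its ID.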

ps aux | grep qemu

Pick one of the entries:

root     12139  0.2  2.9 5713836 1970028 ?     Sl   6月29  57:44 /usr/libexec/qemu-kvm -name instance-000013f9 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -cpu SandyBridge,+erms,+smep,+fsgsbase,+pdpe1gb,+
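The instance name can also be pulled out of the process line programmatically. The following is a minimal sketch, not part of the original procedure: the helper name `qemu_instance_name` is hypothetical, and the handling of a `guest=` prefix is an assumption about newer qemu command lines.

```python
import re


def qemu_instance_name(ps_line):
    """Return the value that follows -name in a qemu-kvm command line, or None."""
    match = re.search(r"\s-name\s+(\S+)", ps_line)
    if match is None:
        return None
    # Assumption: some qemu versions write "-name guest=<name>,debug-threads=on".
    value = match.group(1).split(",")[0]
    if value.startswith("guest="):
        value = value[len("guest="):]
    return value


# A shortened copy of the process line shown above.
ps_line = (
    "root     12139  0.2  2.9 5713836 1970028 ?     Sl   6月29  57:44 "
    "/usr/libexec/qemu-kvm -name instance-000013f9 -S "
    "-machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off"
)
print(qemu_instance_name(ps_line))
```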

Use its name, instance-000013f9, to find the ID:

./detail.sh --auth ./openrc --name instance-000013f9
+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | AUTO                                                     |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | node-4.eayun.com                                         |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | node-4.eayun.com                                         |
| OS-EXT-SRV-ATTR:instance_name        | instance-000013f9                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2015-06-29T06:16:53.000000                               |
| OS-SRV-USG:terminated_at             | -                                                        |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| config_drive                         |                                                          |
| created                              | 2015-06-29T06:16:05Z                                     |
| flavor                               | m1.small (2)                                             |
| hostId                               | b3c3baad8d219f1f9ffd4dbbcdda583f9550cd9aee1700ab674fbf6d |
| hunt-nettest network                 | 172.16.88.13, 25.0.0.185                                 |
| id                                   | 0da3f008-60d7-4b10-a7eb-b143a842a25b                     |
| image                                | Attempt to boot from volume - no image supplied          |
| key_name                             | hunt                                                     |
| metadata                             | {}                                                       |
| name                                 | hunt-nettest-left5                                       |
| os-extended-volumes:volumes_attached | [{"id": "90a25db3-628c-4024-b070-081dce2fd7b6"}]         |
| progress                             | 0                                                        |
| security_groups                      | default                                                  |
| status                               | ACTIVE                                                   |
| tenant_id                            | 3846bfe69b4a49948b8056d5f9c76859                         |
| updated                              | 2015-06-29T06:17:50Z                                     |
| user_id                              | 3dbf0919d60d4025842e6ea149e4aeba                         |
+--------------------------------------+----------------------------------------------------------+

The ID is 0da3f008-60d7-4b10-a7eb-b143a842a25b

ceilometer sample-list -q resource_id=0da3f008-60d7-4b10-a7eb-b143a842a25b -m cpu

This shows the most recently collected samples:

+--------------------------------------+------+------------+-------------+------+---------------------+
| Resource ID                          | Name | Type       | Volume      | Unit | Timestamp           |
+--------------------------------------+------+------------+-------------+------+---------------------+
| 0da3f008-60d7-4b10-a7eb-b143a842a25b | cpu  | cumulative | 3.47939e+12 | ns   | 2015-07-13T08:20:57 |
| 0da3f008-60d7-4b10-a7eb-b143a842a25b | cpu  | cumulative | 3.47846e+12 | ns   | 2015-07-13T08:10:57 |
| 0da3f008-60d7-4b10-a7eb-b143a842a25b | cpu  | cumulative | 3.47839e+12 | ns   | 2015-07-13T08:10:08 |
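+--------------------------------------+------+------------+-------------+------+---------------------+

ceilometer statistics -q resource_id=0da3f008-60d7-4b10-a7eb-b143a842a25b -m cpu
  • -m is a required argument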

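Without -p, all matching samples are aggregated into a single result, and its period starts at the timestamp of the earliest sample.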

ceilometer statistics -q resource_id=0da3f008-60d7-4b10-a7eb-b143a842a25b -m cpu -p 60
  • -p is the period, in seconds

That is, statistics are computed over successive 60-second periods.
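To make the effect of -p concrete, here is a small sketch that groups the three cpu samples listed earlier into fixed-length periods. It is a simplified model and not ceilometer's storage-driver code: the function name `group_by_period` is hypothetical, and counting periods from the earliest sample is an assumption.

```python
from datetime import datetime, timedelta


def group_by_period(samples, period_seconds):
    """Group (timestamp, volume) samples into fixed-length periods.

    Assumption: periods are counted from the earliest sample.
    """
    if not samples:
        return []
    ordered = sorted(samples)
    origin = ordered[0][0]
    buckets = {}
    for timestamp, volume in ordered:
        index = int((timestamp - origin).total_seconds() // period_seconds)
        buckets.setdefault(index, []).append(volume)
    rows = []
    for index in sorted(buckets):
        volumes = buckets[index]
        rows.append({
            "period_start": origin + timedelta(seconds=index * period_seconds),
            "count": len(volumes),
            "min": min(volumes),
            "max": max(volumes),
            "avg": sum(volumes) / len(volumes),
        })
    return rows


# The three cpu samples from the sample-list output above.
samples = [
    (datetime(2015, 7, 13, 8, 20, 57), 3.47939e12),
    (datetime(2015, 7, 13, 8, 10, 57), 3.47846e12),
    (datetime(2015, 7, 13, 8, 10, 8), 3.47839e12),
]
for row in group_by_period(samples, 60):
    print(row["period_start"], row["count"], row["avg"])
```

With a 60-second period, the two samples taken at 08:10 share one period, while the 08:20:57 sample lands in a later period of its own; the periods in between contain no samples at all.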

In the alarm source code, the threshold evaluator gets its statistics by querying through ceilometerclient with the alarm's query and period.

An example threshold alarm:

+---------------------------+-----------------------------------------------------+
| Property                  | Value                                               |
+---------------------------+-----------------------------------------------------+
| alarm_actions             | [u'log://']                                         |
| alarm_id                  | e321ce64-9054-41d8-b924-1095a9657478                |
| comparison_operator       | gt                                                  |
| description               | overheating?                                        |
| enabled                   | True                                                |
| evaluation_periods        | 1                                                   |
| exclude_outliers          | False                                               |
| insufficient_data_actions | []                                                  |
| meter_name                | cpu_util                                            |
| name                      | tester_cpu_high                                     |
| ok_actions                | [u'log://']                                         |
| period                    | 30                                                  |
| project_id                |                                                     |
| query                     | resource_id == 9af11e66-30ef-42cf-8f48-bc4bfb03cc03 |
| repeat_actions            | True                                                |
| state                     | insufficient data                                   |
| statistic                 | avg                                                 |
| threshold                 | 10.0                                                |
| type                      | threshold                                           |
| user_id                   | 3dbf0919d60d4025842e6ea149e4aeba                    |
+---------------------------+-----------------------------------------------------+

ceilometer alarm-threshold-create

Optional arguments:
  --name <NAME>                 Name of the alarm (must be unique per tenant).
                                Required.
  --project-id <PROJECT_ID>     Tenant to associate with alarm (only settable
                                by admin users).
  --user-id <USER_ID>           User to associate with alarm (only settable by
                                admin users).
  --description <DESCRIPTION>   Free text description of the alarm.
  --state <STATE>               State of the alarm, one of: ['ok', 'alarm',
                                'insufficient data']
  --enabled {True|False}        True if alarm evaluation/actioning is enabled.
  --alarm-action <Webhook URL>  URL to invoke when state transitions to alarm.
                                May be used multiple times. Defaults to None.
  --ok-action <Webhook URL>     URL to invoke when state transitions to OK.
                                May be used multiple times. Defaults to None.
  --insufficient-data-action <Webhook URL>
                                URL to invoke when state transitions to
                                insufficient data. May be used multiple times.
                                Defaults to None.
  --time-constraint <Time Constraint>
                                Only evaluate the alarm if the time at
                                evaluation is within this time constraint.
                                Start point(s) of the constraint are specified
                                with a cron expression , whereas its duration
                                is given in seconds. Can be specified multiple
                                times for multiple time constraints, format
                                is: name=<CONSTRAINT_NAME>;start=<CRON>;durati
                                on=<SECONDS>;[description=<DESCRIPTION>;[timez
                                one=<IANA Timezone>]] Defaults to None.
                                (Note: this lets you specify a list of time
                                windows in which the alarm is evaluated.)
  -m <METRIC>, --meter-name <METRIC>
                                Metric to evaluate against. Required.
  --period <PERIOD>             Length of each period (seconds) to evaluate
                                over.
  --evaluation-periods <COUNT>  Number of periods to evaluate over.
  --statistic <STATISTIC>       Statistic to evaluate, one of: ['max', 'min',
                                'avg', 'sum', 'count'].
  --comparison-operator <OPERATOR>
                                Operator to compare with, one of: ['lt', 'le',
                                'eq', 'ne', 'ge', 'gt'].
  --threshold <THRESHOLD>       Threshold to evaluate against. Required.
  -q <QUERY>, --query <QUERY>   key[op]data_type::value; list. data_type is
                                optional, but if supplied must be string,
                                integer, float, or boolean.
  --repeat-actions {True|False}
                                True if actions should be repeatedly notified
                                while alarm remains in target state. Defaults
                                to False.
                                (Note: when the state found by this evaluation
                                is the same as the previous one, this flag
                                decides whether the action runs again.)
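As an illustration of how these options fit together, the sketch below assembles a command line that would describe an alarm like the tester_cpu_high example shown earlier. It only builds and prints the arguments and does not run anything. The subcommand name alarm-threshold-create is an assumption; the option names come from the help text above, and the helper `alarm_threshold_argv` is hypothetical.

```python
import shlex


def alarm_threshold_argv(rule):
    """Build an argument list for creating a threshold alarm from a rule dict."""
    argv = [
        "ceilometer", "alarm-threshold-create",  # subcommand name is an assumption
        "--name", rule["name"],
        "--description", rule["description"],
        "--meter-name", rule["meter_name"],
        "--statistic", rule["statistic"],
        "--comparison-operator", rule["comparison_operator"],
        "--threshold", str(rule["threshold"]),
        "--period", str(rule["period"]),
        "--evaluation-periods", str(rule["evaluation_periods"]),
        "--query", rule["query"],
        "--repeat-actions", str(rule["repeat_actions"]),
    ]
    # --alarm-action and --ok-action may each be given several times.
    for url in rule["alarm_actions"]:
        argv += ["--alarm-action", url]
    for url in rule["ok_actions"]:
        argv += ["--ok-action", url]
    return argv


# Values copied from the example threshold alarm table.
EXAMPLE_RULE = {
    "name": "tester_cpu_high",
    "description": "overheating?",
    "meter_name": "cpu_util",
    "statistic": "avg",
    "comparison_operator": "gt",
    "threshold": 10.0,
    "period": 30,
    "evaluation_periods": 1,
    "query": "resource_id=9af11e66-30ef-42cf-8f48-bc4bfb03cc03",
    "repeat_actions": True,
    "alarm_actions": ["log://"],
    "ok_actions": ["log://"],
}
print(" ".join(shlex.quote(arg) for arg in alarm_threshold_argv(EXAMPLE_RULE)))
```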

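One issue to be aware of:
An alarm has a period, which is the length of each evaluation period (say 60 s), and an evaluation_periods count, which is the number of periods evaluated (say 2). The actual evaluation window is then ep(a) + look_back = 120 s + 60 s = 180 s, where ep(a) is period x evaluation_periods and look_back adds one extra period.
Samples also have a period, the polling interval. It defaults to 600 s, so data is collected only once every 10 minutes.
If the evaluation window falls entirely inside a gap in which no sample was collected, the statistics query returns nothing and the alarm goes to insufficient data. It is therefore best to keep ep(a) + l >= p(s), where l is the look_back time and p(s) is the sample period.
Even then, when the alarm period is shorter than the sample period, several consecutive evaluations are judged on the same, unchanged data.
During an evaluation, exclude_outliers is checked first to choose look_back (the default, or evaluation_periods when exclude_outliers is set). The window size is then computed, and the current time turns it into a time range.
That time range and the meter are used to query statistics through the client. If nothing comes back, the state is set to insufficient data; otherwise the statistics are compared with the threshold and the state is updated to ok or alarm.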
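The window arithmetic and the state decision described above can be sketched as follows. This is a simplified model under stated assumptions, not the ceilometer evaluator itself: the names `evaluation_window` and `next_state` are hypothetical, the default look_back of one period is taken from the 60 s example above, and the statistics are passed in as a plain list of values.

```python
import operator
from datetime import datetime, timedelta

DEFAULT_LOOK_BACK = 1  # assumption: one extra period, as in the 60 s example
COMPARATORS = {"lt": operator.lt, "le": operator.le, "eq": operator.eq,
               "ne": operator.ne, "ge": operator.ge, "gt": operator.gt}


def evaluation_window(rule, now):
    """Return (start, end) of the time range an evaluation would query."""
    if rule.get("exclude_outliers"):
        look_back = rule["evaluation_periods"]
    else:
        look_back = DEFAULT_LOOK_BACK
    window = rule["period"] * (rule["evaluation_periods"] + look_back)
    return now - timedelta(seconds=window), now


def next_state(rule, values):
    """Decide the alarm state from per-period statistic values (simplified)."""
    if not values:
        return "insufficient data"
    compare = COMPARATORS[rule["comparison_operator"]]
    recent = values[-rule["evaluation_periods"]:]
    breaches = [compare(value, rule["threshold"]) for value in recent]
    if all(breaches):
        return "alarm"
    if not any(breaches):
        return "ok"
    return None  # mixed result: this sketch leaves the state unchanged


rule = {"period": 60, "evaluation_periods": 2, "exclude_outliers": False,
        "comparison_operator": "gt", "threshold": 10.0}
now = datetime(2015, 7, 13, 8, 20, 0)
start, end = evaluation_window(rule, now)

# Sample times from the sample-list output; polling is roughly every 600 s.
sample_times = [datetime(2015, 7, 13, 8, 10, 57), datetime(2015, 7, 13, 8, 20, 57)]
in_window = [ts for ts in sample_times if start <= ts <= end]
print(start, end, in_window, next_state(rule, []))
```

Here the 180 s window ends at 08:20:00 and sits between two samples taken about 600 s apart, so nothing falls inside it and the result is insufficient data.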

An example combination alarm:

+---------------------------+-------------------------------------------+
| Property                  | Value                                     |
+---------------------------+-------------------------------------------+
| alarm_actions             | [u'log://']                               |
| alarm_id                  | 6619fa1d-8ca1-4bd6-bb9a-f08f066c5fc9      |
| alarm_ids                 | [u'28b38841-6035-4a22-be69-5a632a878ed6', |
|                           | u'e321ce64-9054-41d8-b924-1095a9657478']  |
| description               | cpu and mem                               |
| enabled                   | True                                      |
| insufficient_data_actions | []                                        |
| name                      | cpu_and_mem                               |
| ok_actions                | [u'log://']                               |
| operator                  | and                                       |
| project_id                |                                           |
| repeat_actions            | True                                      |
| state                     | insufficient data                         |
| type                      | combination                               |
| user_id                   | 3dbf0919d60d4025842e6ea149e4aeba          |
+---------------------------+-------------------------------------------+

ceilometer alarm-combination-create

Optional arguments:
  --name <NAME>                 Name of the alarm (must be unique per tenant).
                                Required.
  --project-id <PROJECT_ID>     Tenant to associate with alarm (only settable
                                by admin users).
  --user-id <USER_ID>           User to associate with alarm (only settable by
                                admin users).
  --description <DESCRIPTION>   Free text description of the alarm.
  --state <STATE>               State of the alarm, one of: ['ok', 'alarm',
                                'insufficient data']
  --enabled {True|False}        True if alarm evaluation/actioning is enabled.
  --alarm-action <Webhook URL>  URL to invoke when state transitions to alarm.
                                May be used multiple times. Defaults to None.
  --ok-action <Webhook URL>     URL to invoke when state transitions to OK.
                                May be used multiple times. Defaults to None.
  --insufficient-data-action <Webhook URL>
                                URL to invoke when state transitions to
                                insufficient data. May be used multiple times.
                                Defaults to None.
  --time-constraint <Time Constraint>
                                Only evaluate the alarm if the time at
                                evaluation is within this time constraint.
                                Start point(s) of the constraint are specified
                                with a cron expression , whereas its duration
                                is given in seconds. Can be specified multiple
                                times for multiple time constraints, format
                                is: name=<CONSTRAINT_NAME>;start=<CRON>;durati
                                on=<SECONDS>;[description=<DESCRIPTION>;[timez
                                one=<IANA Timezone>]] Defaults to None.
  --alarm_ids <ALARM IDS>       List of alarm IDs. Required.
  --operator <OPERATOR>         Operator to compare with, one of: ['and',
                                'or'].
  --repeat-actions {True|False}
                                True if actions should be repeatedly notified
                                while alarm remains in target state. Defaults
                                to False.
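How the operator might combine the states of the underlying alarms can be modelled in a few lines. This is only an illustrative sketch: the function `combined_state` is hypothetical, and treating any child in insufficient data as making the combination insufficient data is an assumption, chosen because it matches the example above, where the combination is in insufficient data while the tester_cpu_high alarm is as well.

```python
COMBINATORS = {"and": all, "or": any}


def combined_state(operator_name, child_states):
    """Combine child alarm states with 'and' or 'or' (simplified model)."""
    # Assumption: a child without a settled state blocks the evaluation.
    if any(state == "insufficient data" for state in child_states):
        return "insufficient data"
    combine = COMBINATORS[operator_name]
    in_alarm = combine(state == "alarm" for state in child_states)
    return "alarm" if in_alarm else "ok"


print(combined_state("and", ["alarm", "ok"]))
print(combined_state("or", ["alarm", "ok"]))
print(combined_state("and", ["alarm", "insufficient data"]))
```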