"No valid host" errors come up frequently when deploying with Ironic. One common cause is that the hypervisor has run out of resources.
Issue
The hypervisor reports 0 available resources:
[root@cloud-b12-02 ~]# source admin-openrc.sh
[root@cloud-b12-02 ~(admin)]# nova hypervisor-list
+----+----------------------------------------------+-------+----------+
| ID | Hypervisor hostname | State | Status |
+----+----------------------------------------------+-------+----------+
| 4 | cloud-sz-compute-b11-04.sz.cloud.genomics.cn | up | enabled |
| 7 | cloud-sz-compute-b10-01.sz.cloud.genomics.cn | up | enabled |
| 10 | cloud-sz-compute-b11-03.sz.cloud.genomics.cn | up | enabled |
| 13 | cloud-sz-compute-b10-02.sz.cloud.genomics.cn | up | enabled |
| 16 | cloud-sz-compute-b11-01.sz.cloud.genomics.cn | up | enabled |
| 19 | cloud-sz-compute-b11-02.sz.cloud.genomics.cn | up | enabled |
| 25 | cloud-sz-compute-f18-03.sz.cloud.genomics.cn | down | disabled |
| 55 | dbbda9bf-2fec-47ed-9a26-142f3d34c8d3 | up | enabled |
+----+----------------------------------------------+-------+----------+
[root@cloud-b12-02 ~(admin)]# nova hypervisor-show 55
+-------------------------+--------------------------------------+
| Property | Value |
+-------------------------+--------------------------------------+
| cpu_info | {} |
| current_workload | 0 |
| disk_available_least | 0 |
| free_disk_gb | 0 |
| free_ram_mb | 0 |
| host_ip | 10.54.12.23 |
| hypervisor_hostname | dbbda9bf-2fec-47ed-9a26-142f3d34c8d3 |
| hypervisor_type | ironic |
| hypervisor_version | 1 |
| id | 55 |
| local_gb | 0 |
| local_gb_used | 0 |
| memory_mb | 0 |
| memory_mb_used | 0 |
| running_vms | 0 |
| service_disabled_reason | None |
| service_host | cloud-b12-03-ironic |
| service_id | 154 |
| state | up |
| status | enabled |
| vcpus | 0 |
| vcpus_used | 0 |
+-------------------------+--------------------------------------+
With 0 available resources, deploying a bare-metal node fails with "No valid host".
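The check above can be automated. The following is a minimal sketch, assuming the two-column table layout shown in the `nova hypervisor-show` output above; the helper names `parse_property_table` and `has_no_capacity` are hypothetical and are not part of any OpenStack client.

```python
def parse_property_table(text):
    """Turn a two-column ASCII table (as pasted above) into a dict of strings."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders ("+---+") and shell prompts
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] in ("Property", "Field"):
            continue  # skip the header row and anything unexpected
        props[cells[0]] = cells[1]
    return props


def has_no_capacity(props):
    """True when the hypervisor advertises zero vCPUs, RAM and disk."""
    return all(int(props.get(k, 0)) == 0 for k in ("vcpus", "memory_mb", "local_gb"))


# Abbreviated sample based on the output for hypervisor 55 above.
sample = """
+-------------------------+--------------------------------------+
| Property                | Value                                |
+-------------------------+--------------------------------------+
| hypervisor_type         | ironic                               |
| local_gb                | 0                                    |
| memory_mb               | 0                                    |
| vcpus                   | 0                                    |
+-------------------------+--------------------------------------+
"""
props = parse_property_table(sample)
print(has_no_capacity(props))
```

A hypervisor for which this returns `True` cannot satisfy any flavor that requests CPU, RAM or disk, which matches the "No valid host" symptom described here.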
Investigation
On that node, look at the resource_tracker log entries:
2017-11-14 22:09:54.297 7 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: d563d14f74ef47c9990bcbd147e576d1 __call__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:299
2017-11-14 22:09:54.298 7 INFO nova.compute.resource_tracker [req-d27d05d9-d132-40d4-81c7-503ca2a42849 - - - - -] Final resource view: name=dbbda9bf-2fec-47ed-9a26-142f3d34c8d3 phys_ram=0MB used_ram=0MB phys_disk=0GB used_disk=0GB total_vcpus=0 used_vcpus=0 pci_stats=[]
2017-11-14 22:09:54.299 7 DEBUG nova.compute.resource_tracker [req-d27d05d9-d132-40d4-81c7-503ca2a42849 - - - - -] Compute_service record updated for cloud-b12-03-ironic:dbbda9bf-2fec-47ed-9a26-142f3d34c8d3 _update_available_resource /usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py:626
Periodic task
Running periodic task ComputeManager.update_available_resource
2017-11-14 22:07:52.018 7 DEBUG nova.compute.resource_tracker [req-d27d05d9-d132-40d4-81c7-503ca2a42849 - - - - -] Hypervisor: VCPU information unavailable _report_hypervisor_resource_view /usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py:658
2017-11-14 22:07:52.019 7 DEBUG nova.compute.resource_tracker [req-d27d05d9-d132-40d4-81c7-503ca2a42849 - - - - -] Hypervisor/Node resource view: name=dbbda9bf-2fec-47ed-9a26-142f3d34c8d3 free_ram=0MB free_disk=0GB free_vcpus=unknown pci_devices=None _report_hypervisor_resource_view /usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py:672
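To scan a long nova-compute log for these entries, the "Final resource view" line can be parsed with a regular expression. This is a sketch that assumes the exact line format shown in the excerpts above (other Nova releases may format it differently); `parse_final_view` is a hypothetical helper name.

```python
import re

# Pattern for the "Final resource view" line as it appears in the log above.
FINAL_VIEW = re.compile(
    r"Final resource view: name=(?P<name>\S+) "
    r"phys_ram=(?P<phys_ram>\d+)MB used_ram=(?P<used_ram>\d+)MB "
    r"phys_disk=(?P<phys_disk>\d+)GB used_disk=(?P<used_disk>\d+)GB "
    r"total_vcpus=(?P<total_vcpus>\d+) used_vcpus=(?P<used_vcpus>\d+)"
)


def parse_final_view(line):
    """Return the resource view as a dict, or None if the line does not match."""
    match = FINAL_VIEW.search(line)
    if match is None:
        return None
    # Every field except the node name is an integer.
    return {k: (v if k == "name" else int(v)) for k, v in match.groupdict().items()}


line = (
    "2017-11-14 22:09:54.298 7 INFO nova.compute.resource_tracker "
    "[req-d27d05d9-d132-40d4-81c7-503ca2a42849 - - - - -] Final resource view: "
    "name=dbbda9bf-2fec-47ed-9a26-142f3d34c8d3 phys_ram=0MB used_ram=0MB "
    "phys_disk=0GB used_disk=0GB total_vcpus=0 used_vcpus=0 pci_stats=[]"
)
view = parse_final_view(line)
print(view)
```

A node whose `phys_ram`, `phys_disk` and `total_vcpus` are all 0 is the case described in this section.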
Analysis
After a bare-metal node is added (ironic node-create), a corresponding hypervisor is created.
[root@cloud-b12-03 nova]# openstack hypervisor list
+----+----------------------------------------------+-----------------+--------------+-------+
| ID | Hypervisor Hostname | Hypervisor Type | Host IP | State |
+----+----------------------------------------------+-----------------+--------------+-------+
| 4 | cloud-sz-compute-b11-04.sz.cloud.genomics.cn | QEMU | 10.54.12.27 | up |
| 7 | cloud-sz-compute-b10-01.sz.cloud.genomics.cn | QEMU | 10.54.12.28 | up |
| 10 | cloud-sz-compute-b11-03.sz.cloud.genomics.cn | QEMU | 10.54.12.26 | up |
| 13 | cloud-sz-compute-b10-02.sz.cloud.genomics.cn | QEMU | 10.54.12.29 | up |
| 16 | cloud-sz-compute-b11-01.sz.cloud.genomics.cn | QEMU | 10.54.12.24 | up |
| 19 | cloud-sz-compute-b11-02.sz.cloud.genomics.cn | QEMU | 10.54.12.25 | up |
| 25 | cloud-sz-compute-f18-03.sz.cloud.genomics.cn | QEMU | 10.54.12.183 | down |
| 61 | ef89b610-96ab-473f-a0d8-9294d7efd4d8 | ironic | 10.54.12.23 | up |
+----+----------------------------------------------+-----------------+--------------+-------+
[root@cloud-b12-03 nova]# ironic node-list
+--------------------------------------+-------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------+---------------+-------------+--------------------+-------------+
| ef89b610-96ab-473f-a0d8-9294d7efd4d8 | node1 | None | power off | available | False |
+--------------------------------------+-------+---------------+-------------+--------------------+-------------+
[root@cloud-sz-kolla-2 ironic-deploy-test]# openstack hypervisor show 61
+----------------------+--------------------------------------+
| Field | Value |
+----------------------+--------------------------------------+
| aggregates | [u'baremetal-hosts'] |
| cpu_info | |
| current_workload | 0 |
| disk_available_least | 10 |
| free_disk_gb | 10 |
| free_ram_mb | 4000 |
| host_ip | 10.54.12.23 |
| hypervisor_hostname | ef89b610-96ab-473f-a0d8-9294d7efd4d8 |
| hypervisor_type | ironic |
| hypervisor_version | 1 |
| id | 61 |
| local_gb | 10 |
| local_gb_used | 0 |
| memory_mb | 4000 |
| memory_mb_used | 0 |
| running_vms | 0 |
| service_host | cloud-b12-03-ironic |
| service_id | 154 |
| state | up |
| status | enabled |
| vcpus | 1 |
| vcpus_used | 0 |
+----------------------+--------------------------------------+
## log
Under normal conditions:
```log
2017-11-14 22:40:26.081 7 DEBUG oslo_messaging._drivers.amqpdriver [req-d27d05d9-d132-40d4-81c7-503ca2a42849 - - - - -] CALL msg_id: 629e4b456d334e5ab4fd6b947b99ba5b exchange 'nova' topic 'conductor' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:442
2017-11-14 22:40:26.101 7 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 629e4b456d334e5ab4fd6b947b99ba5b __call__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:299
2017-11-14 22:40:26.139 7 DEBUG nova.virt.ironic.driver [req-d27d05d9-d132-40d4-81c7-503ca2a42849 - - - - -] Returning 1 available node(s) get_available_nodes /usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py:608
2017-11-14 23:38:41.266 6 DEBUG nova.servicegroup.drivers.db [req-3be3c1b2-64cc-4c74-97b2-150e8beb3458 - - - - -] DB_Driver: join new ServiceGroup member cloud-b12-03-ironic to the compute group, service = join /usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py:47
2017-11-14 23:53:53.658 6 INFO nova.compute.resource_tracker [req-7c6362f1-4f6c-4ce0-9c45-2913d9b694c3 - - - - -] Final resource view: name=ef89b610-96ab-473f-a0d8-9294d7efd4d8 phys_ram=0MB used_ram=1024MB phys_disk=0GB used_disk=10GB total_vcpus=0 used_vcpus=0 pci_stats=[]
```
After this node has been deployed, the hypervisor's available resources drop to 0.
Hypervisor resource statistics
The resource tracker figures in the log are derived as "registered values minus the values of the flavor used for the deployment".
ironic node-update node1 add \
properties/cpus=1 \
properties/memory_mb=4000 \
properties/local_gb=10 \
properties/cpu_arch="x86_64" \
properties/capabilities="boot_option:local"
[root@cloud-sz-kolla-b13-01 ironic]# openstack flavor show b1.half
+----------------------------+----------------------------------------------------+
| Field | Value |
+----------------------------+----------------------------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| access_project_ids | None |
| disk | 10 |
| id | 205bcb85-cf81-4911-83f6-65b392ac24f2 |
| name | b1.half |
| os-flavor-access:is_public | True |
| properties | baremetal='true', capabilities:boot_option='local' |
| ram | 1024 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 1 |
+----------------------------+----------------------------------------------------+
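As a worked example of the "registered values minus flavor values" rule stated above, the sketch below subtracts the `b1.half` flavor from the properties registered with `ironic node-update`. It takes that rule at face value and only does the arithmetic; it does not model how Nova's Ironic driver actually reports a deployed node, and `remaining_after_deploy` is a hypothetical name.

```python
# Values registered on the node via "ironic node-update" above.
node_properties = {"cpus": 1, "memory_mb": 4000, "local_gb": 10}

# Values of the b1.half flavor shown above.
flavor = {"vcpus": 1, "ram": 1024, "disk": 10}


def remaining_after_deploy(node, flv):
    """Subtract the flavor's request from the node's registered resources."""
    return {
        "vcpus": node["cpus"] - flv["vcpus"],
        "memory_mb": node["memory_mb"] - flv["ram"],
        "local_gb": node["local_gb"] - flv["disk"],
    }


remaining = remaining_after_deploy(node_properties, flavor)
print(remaining)  # {'vcpus': 0, 'memory_mb': 2976, 'local_gb': 0}
```

Under this reading, vCPUs and disk are already at 0 after a single deployment, so no further flavor that needs either would fit on the node.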
KB
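Deleting the instance during the BUILD phase
After the instance was deleted while it was still building, the hypervisor showed 0 available resources. I did not record what it showed 10 seconds later.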
[root@cloud-sz-kolla-2 ironic-deploy-centos6]# openstack server list
+------------------------+------------------------+--------+------------------------+-------------------------+
| ID | Name | Status | Networks | Image Name |
+------------------------+------------------------+--------+------------------------+-------------------------+
| 0f90dafe-21f6-42cf-979 | bare1 | BUILD | provision=10.54.0.103 | bm-user-half-centos6-os
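Repeatedly deploying to the same baremetal node
The deploy succeeded and the server was then deleted. Checking the hypervisor stats right afterwards showed 0 at first; a few seconds later the values returned to normal.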
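Because the stats recover only after a short delay, a redeploy script can poll until the hypervisor reports capacity again instead of retrying immediately. The sketch below is one way to do that under stated assumptions: `fetch_stats` is a hypothetical callable that returns a dict with `vcpus` and `memory_mb` (for example built from `openstack hypervisor show`), and the timeout and interval values are arbitrary.

```python
import time


def wait_for_capacity(fetch_stats, timeout=30.0, interval=2.0, sleep=time.sleep):
    """Poll fetch_stats() until vcpus and memory_mb are non-zero.

    Returns the first stats dict that shows capacity, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        stats = fetch_stats()
        if stats.get("vcpus", 0) > 0 and stats.get("memory_mb", 0) > 0:
            return stats
        if time.monotonic() >= deadline:
            return None
        sleep(interval)


# Stand-in for a real query: two empty readings, then the recovered values.
readings = iter([
    {"vcpus": 0, "memory_mb": 0},
    {"vcpus": 0, "memory_mb": 0},
    {"vcpus": 1, "memory_mb": 4000},
])
result = wait_for_capacity(lambda: next(readings), sleep=lambda seconds: None)
print(result)  # {'vcpus': 1, 'memory_mb': 4000}
```

The `sleep` parameter exists only so the example runs instantly; in real use the default `time.sleep` applies.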