Notes on repairing an OpenStack cluster with Kolla-ansible after an abnormal shutdown

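        After an unexpected power outage in the office server room, the OpenStack lab cluster could no longer start normally, so I tried restarting it with the kolla-ansible tool.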

I. Environment

[root@kolla-ansible-master ~]# cat /etc/centos-release
CentOS Linux release 7.8.2003 (Core)

[root@kolla-ansible-master ~]# ansible --version
ansible 2.7.18
[root@kolla-ansible-master ~]# pip list | grep kolla-ansible
kolla-ansible                    7.2.2.dev9

[root@kolla-ansible-master ~]# openstack --version
openstack 5.2.1

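II. Record

1. Status

    After power was restored and the machines were rebooted, some OpenStack containers kept restarting abnormally and the cluster could not work properly.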

 kolla-ansible-master:4000/kolla/centos-source-heat-engine:rocky               "dumb-init --single-…"   15 months ago       Up About a minute                                           heat_engine
c07e5d01adce        kolla-ansible-master:4000/kolla/centos-source-heat-api-cfn:rocky              "dumb-init --single-…"   15 months ago       Restarting (1) 1 second ago                                 heat_api_cfn
88b7a106dcd8        kolla-ansible-master:4000/kolla/centos-source-heat-api:rocky                  "dumb-init --single-…"   15 months ago       Restarting (1) Less than a second ago                       heat_api
82b5983614e0        kolla-ansible-master:4000/kolla/centos-source-neutron-server:rocky            "dumb-init --single-…"   15 months ago       Up About a minute                                           neutron_server
feaf96f16403        kolla-ansible-master:4000/kolla/centos-source-nova-compute-ironic:rocky       "dumb-init --single-…"   15 months ago       Up About a minute                                           nova_compute_ironic
cb9184ff5506        kolla-ansible-master:4000/kolla/centos-source-nova-novncproxy:rocky           "dumb-init --single-…"   15 months ago       Up About a minute                                           nova_novncproxy
17bf7758070d        kolla-ansible-master:4000/kolla/centos-source-nova-consoleauth:rocky          "dumb-init --single-…"   15 months ago       Up About a minute                                           nova_consoleauth
619d66b56612        kolla-ansible-master:4000/kolla/centos-source-nova-conductor:rocky            "dumb-init --single-…"   15 months ago       Up About a minute                                           nova_conductor
249b423c2728        kolla-ansible-master:4000/kolla/centos-source-nova-scheduler:rocky            "dumb-init --single-…"   15 months ago       Up About a minute                                           nova_scheduler
beace5f229e2        kolla-ansible-master:4000/kolla/centos-source-nova-api:rocky                  "dumb-init --single-…"   15 months ago       Restarting (1) 5 seconds ago                                nova_api

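2. Diagnosing the problem

The logs and the container list show that nova-api is failing and restarting in a loop:

"Restarting (1) 5 seconds ago                                nova_api"

while the services underneath it are running normally.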
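
The check above can be scripted rather than read off the listing by eye. The following is a minimal sketch, not the method used in this post: the function name restarting_containers is hypothetical, and it assumes the input is one container per line in "name<TAB>status" form, as produced by the docker ps format string shown in the comment.

```shell
# Print the names of containers whose status starts with "Restarting".
# Input (stdin): one container per line as "<name><TAB><status>".
restarting_containers() {
    awk -F '\t' '$2 ~ /^Restarting/ { print $1 }'
}

# Usage on a node (requires docker, so it is left commented out here):
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | restarting_containers
#   docker logs --tail 50 nova_api
```

docker ps --filter status=restarting gives a similar result directly; the function form is only convenient when the listing has already been saved to a file.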

3. Attempted fix

3.1 Stop the virtual machine (server)

[root@kolla-ansible-master ~]# openstack server list
+--------------------------------------+-------+--------+-----------------------+--------+---------+
| ID                                   | Name  | Status | Networks              | Image  | Flavor  |
+--------------------------------------+-------+--------+-----------------------+--------+---------+
| b4634124-a315-4fd8-aa4a-3df8cade2335 | demo1 | ACTIVE | demo-net=192.168.19.8 | cirros | m1.tiny |
+--------------------------------------+-------+--------+-----------------------+--------+---------+

[root@kolla-ansible-master ~]# openstack server stop demo1
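
With more than one instance, stopping them one by one gets tedious. The sketch below generalizes the command above to every ACTIVE server; the function name stop_active_servers is hypothetical, and it assumes the openstack client is already authenticated (for example via a sourced admin openrc file).

```shell
# Ask Nova to stop every server that is currently ACTIVE.
# Assumes the `openstack` CLI is on PATH and credentials are loaded.
stop_active_servers() {
    openstack server list --status ACTIVE -f value -c ID |
    while read -r id; do
        # Skip blank lines in the listing.
        [ -n "$id" ] || continue
        # </dev/null keeps the client from consuming the loop's stdin.
        openstack server stop "$id" </dev/null && echo "stop requested: $id"
    done
}

# Usage (left commented out because it changes cluster state):
#   stop_active_servers
```

The stop request is asynchronous, so the servers may still show ACTIVE for a short while after the function returns.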

3.2 Stop the Nova services

[root@kolla-ansible-master ~]# kolla-ansible -i ./multinode05  stop --tags nova
Stop Kolla containers : ansible-playbook -i ./multinode05 -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla  --tags nova /usr/share/kolla-ansible/ansible/stop.yml 

PLAY [all] ******************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ******************************************************************************************************************************************************************************************
ok: [localhost]
ok: [compute01]
ok: [compute03]
ok: [compute02]
ok: [network01]
ok: [controller01]

PLAY RECAP ******************************************************************************************************************************************************************************************************
compute01                  : ok=1    changed=0    unreachable=0    failed=0   
compute02                  : ok=1    changed=0    unreachable=0    failed=0   
compute03                  : ok=1    changed=0    unreachable=0    failed=0   
controller01               : ok=1    changed=0    unreachable=0    failed=0   
localhost                  : ok=1    changed=0    unreachable=0    failed=0   
network01                  : ok=1    changed=0    unreachable=0    failed=0

Note: the only task in this output is Gathering Facts, and the recap shows ok=1, changed=0 on every host, so this run does not appear to have stopped any container by itself.

3.3 Restart Nova

[root@kolla-ansible-master ~]# kolla-ansible -i ./multinode05  deploy --tags nova


PLAY RECAP ******************************************************************************************************************************************************************************************************
compute01                  : ok=42   changed=0    unreachable=0    failed=0   
compute02                  : ok=42   changed=0    unreachable=0    failed=0   
compute03                  : ok=42   changed=0    unreachable=0    failed=0   
controller01               : ok=56   changed=2    unreachable=0    failed=0   
localhost                  : ok=2    changed=0    unreachable=0    failed=0   
network01                  : ok=2    changed=0    unreachable=0    failed=0 
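
A recap with failed=0 only says the playbook finished; it does not confirm that the Nova services have reported in. One way to check, sketched under assumptions: the function name down_compute_services is hypothetical, and the input is expected as "binary host state" per line, as produced by the openstack command in the comment.

```shell
# Print "binary@host" for every compute service whose state is not "up".
# Input (stdin): one service per line as "<binary> <host> <state>".
down_compute_services() {
    awk 'NF >= 3 && $3 != "up" { print $1 "@" $2 }'
}

# Usage (requires an authenticated openstack client):
#   openstack compute service list -f value -c Binary -c Host -c State |
#       down_compute_services
```

Empty output means every listed service reports up.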

3.4 Restart the virtual machine

[root@kolla-ansible-master ~]# openstack server list
+--------------------------------------+-------+---------+-----------------------+--------+---------+
| ID                                   | Name  | Status  | Networks              | Image  | Flavor  |
+--------------------------------------+-------+---------+-----------------------+--------+---------+
| b4634124-a315-4fd8-aa4a-3df8cade2335 | demo1 | SHUTOFF | demo-net=192.168.19.8 | cirros | m1.tiny |
+--------------------------------------+-------+---------+-----------------------+--------+---------+
[root@kolla-ansible-master ~]# openstack server start demo1
[root@kolla-ansible-master ~]# 
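
openstack server start returns before the instance has finished booting, so the status is worth polling afterwards. This is a minimal sketch: the function name wait_for_status and its default retry count and delay are assumptions, not values from this environment.

```shell
# Poll a server until it reaches the wanted status.
# $1 = server name or ID, $2 = wanted status (e.g. ACTIVE),
# $3 = max attempts (default 30), $4 = seconds between attempts (default 2).
# Returns 0 once the status matches, 1 if attempts run out.
wait_for_status() {
    local name="$1"
    local want="$2"
    local tries="${3:-30}"
    local delay="${4:-2}"
    local i=0
    local status
    while [ "$i" -lt "$tries" ]; do
        status=$(openstack server show "$name" -f value -c status)
        [ "$status" = "$want" ] && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (requires an authenticated openstack client):
#   wait_for_status demo1 ACTIVE && echo "demo1 is back"
```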

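Closing note: in production, an abnormal power loss is very unlikely, and day-to-day maintenance is mostly a matter of replacing an individual device or host. In a lab environment, redeploying from scratch is also a perfectly good option; this post only records one approach to repairing the cluster.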
