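This post summarizes my recent experience deploying OpenStack Stein with kolla-ansible. I ran into many problems along the way and worked through them by repeatedly summarizing, diagnosing, and experimenting until the deployment finally succeeded.
Some parts are only roughly summarized, so readers will need to adjust the steps a little when building their own environment. Please bear with me; questions are welcome by email.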
https://www.jianshu.com/p/9ebf1ae47df2
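Following these steps deploys the Stein release completely. Provisioning a VM then fails, however; try the fixes described later. Deployment also fails when enable_octavia is added, because the required keypair is missing.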
https://blog.csdn.net/zhongbeida_xue/article/details/84587273
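A kolla-ansible deployment guide for the Rocky release, written in November 2018. At that time master was Rocky, whereas it is now the T release, so the git clone in that guide needs the extra argument -b stable/stein for the deployment to succeed.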
https://blog.csdn.net/Kgong/article/details/105579683
This link provides a complete set of commands for reference.
https://linux.cn/article-9516-1.html
How to check whether the CPU supports virtualization.
https://bugs.launchpad.net/kolla-ansible/+bug/1787760
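After deployment, VMs could not be provisioned. The logs showed that the problem was related to privsep, with two errors reported in succession: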
Incorrect configuration file: /etc/nova/rootwrap.conf
privsep helper command exited non-zero (97)
Executable not found: privsep-helper (filter match = privsep-helper)
privsep helper command exited non-zero (96)
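See the description in the troubleshooting section.
The official documentation's promise to "do right thing" is nonsense, so the "step by step" guide it opens with cannot possibly complete a deployment.
Still, the documentation offers material that helps in understanding kolla-ansible and the details of its modules. OpenStack Docs gives the impression that when you want a single leaf, it always hands you a forest, and you end up lost in it.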
kolla-stein
kolla-ansible/stein/quickstart.htm
OpenStack Docs: Advanced Configuration
OpenStack Docs: Get images
Deploying OpenStack with kolla-ansible (VMware, all-in-one) - Jianshu
Deploying OpenStack Stein with Kolla - Jianshu
Before deploying, you need to understand two things about kolla deployments:
customize configuration (advanced configuration)
https://docs.openstack.org/kolla-ansible/train/admin/advanced-configuration.html
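Two interfaces are required: one for management and the other for the neutron external network.
The zong-net network has already been attached to the router of the external network.
On VIO, hosts with two NICs often run into routing problems; the default route via zong-net needs to be deleted.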
Original configuration
interface | before | after
ifcfg-ens32 -> ifcfg-ens160 | TYPE=Ethernet BOOTPROTO=dhcp DEFROUTE=yes PEERDNS=yes PEERROUTES=yes IPV4_FAILURE_FATAL=no IPV6INIT=yes IPV6_AUTOCONF=yes IPV6_DEFROUTE=yes IPV6_PEERDNS=yes IPV6_PEERROUTES=yes IPV6_FAILURE_FATAL=no IPV6_ADDR_GEN_MODE=stable-privacy NAME=ens32 UUID=0127e0c2-fb46-4dfb-a1c6-209b3012f2eb DEVICE=ens32 ONBOOT=yes | TYPE=Ethernet BOOTPROTO=static DEFROUTE=yes PEERDNS=yes PEERROUTES=yes IPV4_FAILURE_FATAL=no IPADDR=10.145.64.104 NETMASK=255.255.192.0 GATEWAY=10.145.127.254 IPV6INIT=yes IPV6_AUTOCONF=yes IPV6_DEFROUTE=yes IPV6_PEERDNS=yes IPV6_PEERROUTES=yes IPV6_FAILURE_FATAL=no IPV6_ADDR_GEN_MODE=stable-privacy NAME=ens160 UUID=0127e0c2-fb46-4dfb-a1c6-209b3012f2eb DEVICE=ens160 ONBOOT=yes
ifcfg-ens192 | none | TYPE=Ethernet
systemctl stop NetworkManager firewalld
systemctl disable NetworkManager firewalld
sed -i "s/SELINUX=enforcing/SELINUX=disabled/" /etc/selinux/config
setenforce 0
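SELinux is a sophisticated access-control mechanism, but during early learning and experimentation it usually just gets switched off.
Installing Stein with kolla needs at least 30 GB of space to store the various Docker images.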
fdisk /dev/sda
partprobe
pvcreate /dev/sda4
vgextend -v cl /dev/sda4
lvextend -L 140G /dev/mapper/cl-root00
resize2fs /dev/mapper/cl-root00
Using a virtualenv avoids affecting the system Python environment.
yum install -y python-virtualenv
[root@kolla ~]# virtualenv /root/venv/kolla
New python executable in /root/venv/kolla/bin/python
Installing setuptools, pip, wheel...done.
[root@kolla ~]# source /root/venv/kolla/bin/activate
(kolla) [root@kolla ~]# pip install -U pip
(kolla) [root@kolla ~]# pip install ansible
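In practice, the deployment method described on the official OpenStack site (https://docs.openstack.org/project-deploy-guide/kolla-ansible/stein/quickstart.html#) could not be completed successfully, because each OpenStack release needs its own matching versions of kolla and kolla-ansible. Installing kolla-ansible through yum or pip as suggested there did not work for me; the only approach that worked was to git clone the source at the specific branch and install from it.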
git clone https://github.com/openstack/kolla -b stable/stein
git clone https://github.com/openstack/kolla-ansible -b stable/stein
pip install -r kolla/requirements.txt
pip install -r kolla-ansible/requirements.txt
cd kolla-ansible && python setup.py install
(kolla) [root@kolla kolla-ansible]# which kolla-ansible
/root/venv/kolla/bin/kolla-ansible
(kolla) [root@kolla kolla-ansible]# mkdir /etc/kolla && cp etc/kolla/* /etc/kolla
(kolla) [root@kolla kolla-ansible]# cp ansible/inventory/* ~
Edit the /etc/kolla/globals.yml file
(kolla) [root@kolla kolla-ansible]# diff etc/kolla/globals.yml /etc/kolla/globals.yml
15c15
< #kolla_base_distro: "centos"
---
> kolla_base_distro: "centos"
18c18
< #kolla_install_type: "binary"
---
> kolla_install_type: "source"
21c21
< #openstack_release: ""
---
> openstack_release: "stein"
31c31
< kolla_internal_vip_address: "10.10.10.254"
---
> kolla_internal_vip_address: "10.145.64.104"
89c89
< #network_interface: "eth0"
---
> network_interface: "ens160"
107c107
< #neutron_external_interface: "eth1"
---
> neutron_external_interface: "ens192"
192c192
< #enable_haproxy: "yes"
---
> enable_haproxy: "no"
459c459
< #nova_compute_virt_type: "kvm"
---
> nova_compute_virt_type: "qemu"
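kolla_install_type: "source": changed to source here. binary is nominally supported as well, but it is reportedly less stable than source, oddly enough.
openstack_release: set to stein, and it can only be stein, because we are using the stein branch of kolla and kolla-ansible.
We deploy with the /root/all-in-one inventory, which needs no changes.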
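The nova_compute_virt_type value in the diff above depends on whether the host CPU exposes hardware virtualization, which is what the linux.cn link near the top covers. A minimal sketch of that check, assuming a Linux host where /proc/cpuinfo lists the vmx (Intel) or svm (AMD) flag; suggest_virt_type is a hypothetical helper name, not something kolla-ansible ships:

```shell
#!/bin/sh
# suggest_virt_type is a hypothetical helper, not part of kolla-ansible.
# Usage: suggest_virt_type [CPUINFO_FILE]
# Prints "kvm" when the cpuinfo file advertises the Intel (vmx) or
# AMD (svm) hardware virtualization flag, and "qemu" otherwise.
suggest_virt_type() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # -w matches whole words only, so longer flag names do not count.
    if grep -Eqw 'vmx|svm' "$cpuinfo" 2>/dev/null; then
        echo kvm
    else
        echo qemu
    fi
}
```

Called with no argument it reads /proc/cpuinfo of the current host. A VM that does not expose nested virtualization will typically print qemu, which matches the value used in this setup.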
# kolla-ansible bootstrap-servers
This command bootstraps the host, for example by installing the Docker runtime
and the required yum packages.
cd /root/kolla-ansible/kolla_ansible/cmd
# python genpwd.py
If you like, change individual passwords; for example, set keystone_admin_password to default so that logging in to horizon later is easier.
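Changing a single password can be scripted. A small sketch, assuming the generated passwords live in /etc/kolla/passwords.yml as plain "key: value" lines and that GNU sed is available; set_kolla_password is a hypothetical helper, and the value must not contain | or & because it is passed through sed:

```shell
#!/bin/sh
# set_kolla_password is a hypothetical helper for illustration.
# Usage: set_kolla_password KEY VALUE [FILE]
# Rewrites the "KEY: ..." line of a kolla passwords file in place.
# Assumption: VALUE contains neither "|" nor "&".
set_kolla_password() {
    key="$1"
    value="$2"
    file="${3:-/etc/kolla/passwords.yml}"
    # Fail instead of silently doing nothing when the key is absent.
    grep -q "^${key}:" "$file" || return 1
    sed -i "s|^${key}:.*|${key}: ${value}|" "$file"
}
```

For the case in the text this would be set_kolla_password keystone_admin_password default, run after genpwd.py and before the deploy step.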
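localhost is both the ansible control node and the inventory target, so the /usr/bin/python environment on localhost also has to be set up separately, for example by installing the docker dependency packages.
Run the following commands; note that they must be run after leaving the current virtualenv.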
yum install epel-release
yum install python-pip
pip install -U pip
pip install argparse oauth pyserial # this line and the next are only needed if you run into problems
pip install --ignore-installed requests
pip install docker
# kolla-ansible prechecks
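This stage takes a long time because kolla-ansible pulls the Docker images. A first deployment usually goes smoothly, but repeated runs of kolla-ansible deploy occasionally leave some Docker container unable to start and stuck restarting. You have to read the logs to find the actual cause. The brute-force fix is to run kolla-ansible deploy again.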
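Before rerunning the deploy, it helps to see which container is stuck and what it last logged. A sketch using standard docker CLI options (ps --filter status=restarting, logs --tail); the two function names are hypothetical helpers:

```shell
#!/bin/sh
# list_restarting is a hypothetical helper: it prints the names of
# containers that Docker currently reports in the "restarting" state.
list_restarting() {
    docker ps --filter status=restarting --format '{{.Names}}'
}

# show_restart_logs is a hypothetical helper.
# Usage: show_restart_logs [LINES]
# Prints the last LINES log lines (50 by default) of each restarting
# container under a "== name ==" header.
show_restart_logs() {
    lines="${1:-50}"
    for name in $(list_restarting); do
        echo "== $name =="
        docker logs --tail "$lines" "$name" 2>&1
    done
}
```

The service logs themselves are also kept on the host under /var/lib/docker/volumes/kolla_logs/_data, which is where the nova logs quoted in the troubleshooting part come from.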
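kolla-ansible post-deploy generates /etc/kolla/admin-openrc.sh
horizon takes a while to start, so wait about 5 minutes after it starts before accessing http://10.145.64.104
From the first container starting to completion took 14 minutes in total: 31 containers, and 30 GB in /var/lib/docker.
Create a new venv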
virtualenv /root/venv/openstackclient
source /root/venv/openstackclient/bin/activate
pip install -U pip # upgrade pip to the latest version, otherwise the install fails
pip install python-openstackclient
source /etc/kolla/admin-openrc.sh
openstack server list
Run init-runonce
cd /root/kolla-ansible/tools
Edit the init-runonce file
EXT_NET_CIDR='192.168.100.0/24'
EXT_NET_RANGE='start=192.168.100.100,end=192.168.100.199'
EXT_NET_GATEWAY='192.168.100.1'
./init-runonce
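If this command fails, you need to delete whatever init-runonce has already created, such as the security group rules, networks, and image, before running it again. In other words, the command cannot be rerun safely (it is not idempotent).
For problems that occur when provisioning a VM, see the troubleshooting section.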
task path: /root/venv/kolla/share/kolla-ansible/ansible/roles/baremetal/tasks/install.yml:49
fatal: [localhost]: FAILED! => {
"msg": "[u'{{ docker_apt_package }}', u'git', u'{% if not easy_install_available %}python-pip{% endif %}', u'python-setuptools', u'{% if enable_host_ntp | bool %}ntp{% endif %}', u'{% if enable_ceph_nfs|bool %}rpcbind{% endif %}']: {{ not (ansible_distribution == 'Ubuntu' and\n ansible_distribution_major_version is version(18, 'ge'))\n and\n not (ansible_distribution == 'Debian' and\n ansible_distribution_major_version is version(10, 'ge')) }}: template error while templating string: no test named 'version'. String: {{ not (ansible_distribution == 'Ubuntu' and\n ansible_distribution_major_version is version(18, 'ge'))\n and\n not (ansible_distribution == 'Debian' and\n ansible_distribution_major_version is version(10, 'ge')) }}"
The ansible version is wrong. Do not use yum install ansible;
instead, run pip install ansible inside the Python virtualenv.
TASK [prechecks : Checking docker SDK version] *********************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "cmd": ["/usr/bin/python", "-c", "import docker; print docker.__version__"], "delta": "0:00:00.012579", "end": "2020-04-26 02:17:35.653507", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2020-04-26 02:17:35.640928", "stderr": "Traceback (most recent call last):\n File \"
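localhost is both the ansible control node and the inventory target, so the /usr/bin/python environment on localhost also has to be set up separately, for example by installing the docker dependency packages.
Run the following commands:
yum install epel-release
yum install python-pip
pip install -U pip
pip install argparse oauth pyserial # this line and the next are only needed if you run into problems
pip install --ignore-installed requests
pip install docker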
ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
Fix: pip install --ignore-installed PyYAML
ERROR: Command errored out with exit status 1: /root/venv/openstackclient/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-edS76A/subprocess32/setup.py'"'"'; __file__='"'"'/tmp/pip-install-edS76A/subprocess32/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-a6JBhp/install-record.txt --single-version-externally-managed --compile --install-headers /root/venv/openstackclient/include/site/python2.7/subprocess32 Check the logs for full command output.
The dependency packages must be installed beforehand, as follows.
yum -y install python-devel libffi-devel gcc openssl-devel libselinux-python
File "/root/venv/openstackclient/lib/python2.7/site-packages/openstack/utils.py", line 13, in
import queue
ImportError: No module named queue
Modify the source in two places:
/root/venv/openstackclient/lib/python2.7/site-packages/openstack/cloud/openstackcloud.py
/root/venv/openstackclient/lib/python2.7/site-packages/openstack/utils.py
Change import queue to: import Queue as queue
The root cause is that the module is named Queue in Python 2 and queue in Python 3.
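The same edit can be applied with sed instead of by hand. A sketch, assuming GNU sed and that each file has a bare import queue statement at the start of a line; patch_queue_import is a hypothetical helper:

```shell
#!/bin/sh
# patch_queue_import is a hypothetical helper for illustration.
# Usage: patch_queue_import FILE...
# Rewrites a bare "import queue" line to the Python 2 spelling while
# keeping the name "queue" for the rest of the module. Other import
# forms are left untouched.
patch_queue_import() {
    for f in "$@"; do
        sed -i 's/^import queue$/import Queue as queue/' "$f"
    done
}
```

It would be called with the two file paths listed above as arguments. The patch lives inside the virtualenv, so reinstalling or upgrading the package there undoes it.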
fatal: [localhost]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}
Destroy everything and redeploy.
The nova logs under /var/lib/docker/volumes/kolla_logs/_data/nova show the following:
nova/nova-compute.log:783:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 641, in create_image
nova/nova-compute.log:784:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078] _update_utime_ignore_eacces(base)
nova/nova-compute.log:785:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078] File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 72, in _update_utime_ignore_eacces
nova/nova-compute.log:786:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078] nova.privsep.path.utime(path)
nova/nova-compute.log:787:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078] File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 243, in _wrap
nova/nova-compute.log:788:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078] self.start()
nova/nova-compute.log:789:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078] File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 254, in start
nova/nova-compute.log:790:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078] channel = daemon.RootwrapClientChannel(context=self)
nova/nova-compute.log:791:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078] File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 328, in __init__
nova/nova-compute.log:792:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078] raise FailedToDropPrivileges(msg)
nova/nova-compute.log:793:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078] FailedToDropPrivileges: privsep helper command exited non-zero (97)
nova/nova-compute.log:794:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078]
The root cause is this WARNING:
2020-04-26 05:42:28.875 6 WARNING oslo.privsep.daemon [-] privsep log: /var/lib/kolla/venv/bin/nova-rootwrap: Incorrect configuration file: /etc/nova/rootwrap.conf
Fix:
After the deployment finishes, enter the nova_compute Docker container and do two things:
1 Restore the missing rootwrap.conf file.
cp /nova-base-source/nova-19.1.0/etc/nova/rootwrap.conf /etc/nova/
2 Add the directory that contains /var/lib/kolla/venv/bin/privsep-helper to exec_dirs in /etc/nova/rootwrap.conf
exec_dirs=/sbin,/usr/sbin,/bin,/usr/bin,/usr/local/sbin,/usr/local/bin,/var/lib/kolla/venv/bin
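Step 2 can be done without opening an editor. A sketch, assuming GNU sed and that rootwrap.conf already has an exec_dirs= line as in the file copied in step 1; add_exec_dir is a hypothetical helper:

```shell
#!/bin/sh
# add_exec_dir is a hypothetical helper for illustration.
# Usage: add_exec_dir DIR [FILE]
# Appends DIR to the comma-separated exec_dirs= line of a rootwrap.conf
# unless it is already listed there.
# Assumption: the file already contains an exec_dirs= line.
add_exec_dir() {
    dir="$1"
    file="${2:-/etc/nova/rootwrap.conf}"
    # Nothing to do when DIR is already one of the entries.
    if grep -Eq "^exec_dirs=(.*,)?${dir}(,.*)?\$" "$file"; then
        return 0
    fi
    # "&" re-emits the matched line, then the new directory is appended.
    sed -i "s|^exec_dirs=.*|&,${dir}|" "$file"
}
```

Inside the container (entered, for example, with docker exec -it -u root nova_compute bash) this would be called as add_exec_dir /var/lib/kolla/venv/bin. Changes made inside a container are lost when the container is recreated, so the fix may need to be reapplied after a redeploy.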
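Adding vif_plugging_is_fatal: False does indeed avoid this problem, but the underlying question is why the timeout happens at all: neutron has clearly finished plugging the VIF, yet nova never receives the notification.
There are two ways to address the problem:
1 Add vif_plugging_is_fatal: False and restart nova_compute. The open question is how to get the service restarted with the updated configuration.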
vif_plugging_is_fatal: False
vif_plugging_timeout: 0
See the customize configuration link below
https://docs.openstack.org/kolla-ansible/train/admin/advanced-configuration.html
2 Find out why neutron cannot reach nova to report success.
https://ask.openstack.org/en/question/26938/virtualinterfacecreateexception-virtual-interface-creation-failed/
try:
    with self.virtapi.wait_for_instance_event(
            instance, events, deadline=timeout,
            error_callback=self._neutron_failed_callback):
        self.plug_vifs(instance, network_info)
        self.firewall_driver.setup_basic_filtering(instance,
                                                   network_info)
        self.firewall_driver.prepare_instance_filter(instance,
                                                     network_info)
        with self._lxc_disk_handler(context, instance,
                                    instance.image_meta,
                                    block_device_info):
            guest = self._create_domain(
                xml, pause=pause, power_on=power_on,
                post_xml_callback=post_xml_callback)
        self.firewall_driver.apply_instance_filter(instance,
                                                   network_info)
except exception.VirtualInterfaceCreateException:
    # Neutron reported failure and we didn't swallow it, so
    # bail here
    with excutils.save_and_reraise_exception():
        self._cleanup_failed_start(context, instance, network_info,
                                   block_device_info, guest,
                                   destroy_disks_on_failure)
except eventlet.timeout.Timeout:
    # We never heard from Neutron
    LOG.warning('Timeout waiting for %(events)s for '
                'instance with vm_state %(vm_state)s and '
                'task_state %(task_state)s.',
                {'events': events,
                 'vm_state': instance.vm_state,
                 'task_state': instance.task_state},
                instance=instance)
    if CONF.vif_plugging_is_fatal:
        self._cleanup_failed_start(context, instance, network_info,
                                   block_device_info, guest,
                                   destroy_disks_on_failure)
        raise exception.VirtualInterfaceCreateException()
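In the timeout branch of the excerpt above, the exception is raised only when CONF.vif_plugging_is_fatal is true, so option 1 comes down to getting the two options into nova.conf. One way that fits the customize-configuration mechanism linked earlier is an override file that kolla-ansible merges into the generated config. A sketch, assuming the default override directory /etc/kolla/config (node_custom_config in globals.yml); write_nova_override is a hypothetical helper, and it overwrites any existing override file:

```shell
#!/bin/sh
# write_nova_override is a hypothetical helper for illustration.
# Usage: write_nova_override [CONFIG_DIR]
# Creates CONFIG_DIR/nova.conf with the two options from the text in
# INI syntax (note "=" rather than ":"). An existing file is replaced.
write_nova_override() {
    config_dir="${1:-/etc/kolla/config}"
    mkdir -p "$config_dir"
    cat > "$config_dir/nova.conf" <<'EOF'
[DEFAULT]
vif_plugging_is_fatal = False
vif_plugging_timeout = 0
EOF
}
```

After writing the file, running kolla-ansible reconfigure regenerates the service configuration and restarts the affected containers, which is one answer to the question of how to restart with the new settings. These options only hide the symptom; the missing notification from neutron still has to be explained.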
yum install psmisc
[Tue Apr 28 02:38:47.095960 2020] [:error] [pid 26] [client 172.18.209.103:59638] Target WSGI script not found or unable to stat: /var/lib/kolla/venv/lib/python2.7/site-packages/openstack_dashboard/wsgi
Fix: enter the Docker container and run cd /openstack-source-base && python setup.py install