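Deploying OpenStack Stein all-in-one with kolla-ansible

This post summarizes my recent experience deploying OpenStack Stein with kolla-ansible. I ran into many problems along the way and got the deployment working only after repeatedly diagnosing and fixing them.

Some parts are only roughly summarized, so you will need to adapt them slightly for your own setup. Questions are welcome by mail.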

 

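References

  1. Deploying OpenStack Stein with Kolla

https://www.jianshu.com/p/9ebf1ae47df2

Following these steps deploys Stein completely, but provisioning a VM then fails; the fixes described later in this post can be tried. Deployment also fails when enable_octavia is added, because the required keypair is missing.

  2. Installing and deploying OpenStack with Kolla-Ansible

https://blog.csdn.net/zhongbeida_xue/article/details/84587273

A kolla-ansible deployment guide for Rocky, written in November 2018, when master was still Rocky. Master is now the T release, so the git clone commands need the extra argument -b stable/stein for the deployment to succeed.

  3. Installing the F5 LBaaSv2 agent with Kolla (Stein), Pzhang's blog on CSDN

https://blog.csdn.net/Kgong/article/details/105579683

This link provides a complete set of commands for reference.

  4. How to tell whether a CPU supports virtualization technology (VT)

https://linux.cn/article-9516-1.html

How to confirm that the CPU supports virtualization.

  5. Zun: Error on running privsep helper command

https://bugs.launchpad.net/kolla-ansible/+bug/1787760

After deployment, VMs could not be provisioned, and the logs pointed to privsep. Two errors appeared in turn:

Incorrect configuration file: /etc/nova/rootwrap.conf

privsep helper command exited non-zero (97)

Executable not found: privsep-helper (filter match = privsep-helper)

privsep helper command exited non-zero (96)

See the Problems and diagnosis section for details.

  6. Official kolla-ansible documentation: https://docs.openstack.org/project-deploy-guide/kolla-ansible/stein/quickstart.html#

The official documentation's promise to "do right thing" did not hold up for me: following its "step by step" quick start did not produce a working deployment.

The documentation does, however, point to material that helps in understanding kolla-ansible and the individual modules. OpenStack Docs tends to hand you a forest when you ask for a single leaf, and you end up lost in it.

  7. Other related references:

kolla-stein

kolla-ansible/stein/quickstart.htm

OpenStack Docs: Advanced Configuration

OpenStack Docs: Get images

Deploying OpenStack with kolla-ansible (VMware, all-in-one), Jianshu

Deploying OpenStack Stein with Kolla, Jianshu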

 

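Complete deployment procedure

Two things about Kolla are worth knowing before deploying:

  1. How to customize configuration under /etc/kolla/config/

customize configuration (advanced configuration)

https://docs.openstack.org/kolla-ansible/train/admin/advanced-configuration.html

  2. Where Kolla puts each component's logs: under /var/lib/docker/volumes/kolla_logs/_data/, for example /var/lib/docker/volumes/kolla_logs/_data/nova for Nova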

 

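Create the virtual machine

The VM needs two interfaces: one for management and one for the Neutron external network.

The zong-net network has already been attached to the router of the external network.

In VIO, a VM with two NICs often ends up with conflicting routes; the default route via zong-net has to be deleted.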

 

Change the network configuration

Interface configuration before and after the change:

ifcfg-ens32 -> ifcfg-ens160, before:

TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens32
UUID=0127e0c2-fb46-4dfb-a1c6-209b3012f2eb
DEVICE=ens32
ONBOOT=yes

ifcfg-ens32 -> ifcfg-ens160, after:

TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPADDR=10.145.64.104
NETMASK=255.255.192.0
GATEWAY=10.145.127.254
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens160
UUID=0127e0c2-fb46-4dfb-a1c6-209b3012f2eb
DEVICE=ens160
ONBOOT=yes

ifcfg-ens192:

TYPE=Ethernet
BOOTPROTO=static
DEVICE=ens192
ONBOOT=yes

 

Disable the firewall, NetworkManager, and SELinux

 

systemctl stop NetworkManager firewalld
systemctl disable NetworkManager firewalld
sed -i "s/SELINUX=enforcing/SELINUX=disabled/" /etc/selinux/config
setenforce 0

 

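SELinux is a sophisticated access-control mechanism, but in learning and lab setups it is usually just switched off. Note that setenforce 0 only makes SELinux permissive for the current boot; the change in /etc/selinux/config takes effect after a reboot.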

 

Extend disk space with LVM

Deploying Stein with Kolla needs at least 30 GB of space for the Docker images.

 

fdisk /dev/sda

 

partprobe

 

pvcreate /dev/sda4

 

vgextend -v cl /dev/sda4

 

lvextend  -L 140G /dev/mapper/cl-root00

 

resize2fs /dev/mapper/cl-root00   # ext2/3/4 only; for an XFS root filesystem use xfs_growfs / instead
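Before and after extending the volume, it helps to confirm that the filesystem really has the 30 GB mentioned above. The following is a minimal sketch, not part of the original procedure; the function names free_gb and has_min_free_gb are made up for illustration, and the check uses only df and awk.

```shell
#!/usr/bin/env bash
# Sketch: check that the filesystem holding a path has enough free space.

# free_gb PATH -> prints the free space of the filesystem holding PATH, in whole GiB
free_gb() {
    # -P forces one line per filesystem; -k reports sizes in 1 KiB blocks
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# has_min_free_gb PATH MIN_GB -> exit status 0 if at least MIN_GB GiB are free
has_min_free_gb() {
    local avail
    avail="$(free_gb "$1")" || return 2
    [ "${avail}" -ge "$2" ]
}
```

For example, has_min_free_gb /var/lib 30 || echo "extend the root LV first" warns when the space is insufficient. /var/lib is used here because /var/lib/docker may not exist before Docker is installed.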

 

Install virtualenv

Using virtualenv avoids affecting the system Python environment.

yum install -y python-virtualenv

 

 

Create and activate the Python virtual environment

 

[root@kolla ~]# virtualenv /root/venv/kolla

New python executable in /root/venv/kolla/bin/python

Installing setuptools, pip, wheel...done.

[root@kolla ~]# source /root/venv/kolla/bin/activate

(kolla) [root@kolla ~]# pip install -U pip

 

Install the required pip packages

 

(kolla) [root@kolla ~]# pip install ansible

 

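Download kolla and kolla-ansible and install their dependencies

In my experience, the deployment method described in the official guide at https://docs.openstack.org/project-deploy-guide/kolla-ansible/stein/quickstart.html# cannot be completed as written, because each OpenStack release needs its own matching version of kolla and kolla-ansible. Installing kolla-ansible through yum or pip therefore did not work for me; the only approach that worked was installing from source with git clone on the specific branch.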

 

git clone https://github.com/openstack/kolla -b stable/stein

git clone https://github.com/openstack/kolla-ansible -b stable/stein

 

pip install -r kolla/requirements.txt

pip install -r kolla-ansible/requirements.txt

 

cd kolla-ansible &&  python setup.py install

 

(kolla) [root@kolla kolla-ansible]# which kolla-ansible

/root/venv/kolla/bin/kolla-ansible

 

Set up the deployment directory

 

(kolla) [root@kolla kolla-ansible]# mkdir /etc/kolla && cp etc/kolla/* /etc/kolla

(kolla) [root@kolla kolla-ansible]# cp ansible/inventory/* ~

 

Edit the /etc/kolla/globals.yml file

 

(kolla) [root@kolla kolla-ansible]# diff etc/kolla/globals.yml /etc/kolla/globals.yml

15c15

< #kolla_base_distro: "centos"

---

> kolla_base_distro: "centos"

18c18

< #kolla_install_type: "binary"

---

> kolla_install_type: "source"

21c21

< #openstack_release: ""

---

> openstack_release: "stein"

31c31

< kolla_internal_vip_address: "10.10.10.254"

---

> kolla_internal_vip_address: "10.145.64.104"

89c89

< #network_interface: "eth0"

---

> network_interface: "ens160"

107c107

< #neutron_external_interface: "eth1"

---

> neutron_external_interface: "ens192"

192c192

< #enable_haproxy: "yes"

---

> enable_haproxy: "no"

459c459

< #nova_compute_virt_type: "kvm"

---

> nova_compute_virt_type: "qemu"

 

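kolla_install_type: "source". Set this to source; binary is also supported, but it is reportedly less stable than source.

openstack_release: set this to stein, and only stein, because we are using the stable/stein branches of kolla and kolla-ansible.

We deploy with the /root/all-in-one inventory, which needs no changes.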
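The diff above sets nova_compute_virt_type to "qemu". That is the usual choice when the deployment host is itself a VM without hardware virtualization exposed to it; that this was the reason here is an assumption, not something the original notes state. A minimal sketch for checking the CPU flags follows. The function names are made up for illustration, and the presence of a flag does not by itself guarantee that KVM is usable.

```shell
#!/usr/bin/env bash
# Sketch: look for hardware virtualization flags in /proc/cpuinfo.

# count_virt_flags [CPUINFO_FILE] -> number of logical CPUs advertising
# vmx (Intel VT-x) or svm (AMD-V)
count_virt_flags() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # grep -c prints the number of matching lines; "|| true" keeps a
    # zero count from being reported as a failure
    grep -Ec '^flags.*\b(vmx|svm)\b' "${cpuinfo}" || true
}

# suggest_virt_type [CPUINFO_FILE] -> prints "kvm" or "qemu"
suggest_virt_type() {
    if [ "$(count_virt_flags "$1")" -gt 0 ]; then
        echo kvm
    else
        echo qemu
    fi
}
```

Running suggest_virt_type with no argument reads /proc/cpuinfo of the current host.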

 

Run bootstrap-servers

# kolla-ansible bootstrap-servers

This command prepares the host, for example by installing the Docker runtime and the required yum packages.

 

Run python genpwd.py or kolla-genpwd

 

cd /root/kolla-ansible/kolla_ansible/cmd

# python genpwd.py

If you like, change individual passwords afterwards. For example, set keystone_admin_password to default so that logging in to Horizon later is easier.
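The generated passwords end up in /etc/kolla/passwords.yml as simple "key: value" lines. As a sketch of the edit described above, the following hypothetical helper (set_kolla_password is not a Kolla command) rewrites one entry in place with sed. It assumes the value contains no "|", "&", or backslash characters.

```shell
#!/usr/bin/env bash
# Sketch: change one entry of a passwords.yml-style file.

# set_kolla_password FILE KEY VALUE
set_kolla_password() {
    local file="$1" key="$2" value="$3"
    # Fail loudly if the key is missing instead of silently doing nothing
    grep -q "^${key}:" "${file}" || { echo "no such key: ${key}" >&2; return 1; }
    # "|" is the sed delimiter, so VALUE must not contain it
    sed -i "s|^${key}:.*|${key}: ${value}|" "${file}"
}
```

Usage would be set_kolla_password /etc/kolla/passwords.yml keystone_admin_password default, run before kolla-ansible deploy so the services are created with that password.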

 

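Install the Docker dependencies on the inventory host

localhost is both the Ansible control node and the inventory host, so the /usr/bin/python environment on localhost has to be set up separately, for example by installing the docker Python package.

Run the following commands. Note that they must be run after leaving the current virtualenv.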

 

   yum install epel-release

   yum install python-pip

   pip install -U pip

 

   pip install argparse oauth pyserial     # this line and the next are only needed if you hit problems

   pip install --ignore-installed requests

 

   pip install docker

 

 

Run prechecks

 

# kolla-ansible prechecks

 

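Run kolla-ansible deploy

This stage takes a long time, because kolla-ansible pulls the Docker images. A first deployment usually goes smoothly, but repeated runs of kolla-ansible deploy occasionally leave a Docker container that cannot start and keeps restarting. Check the logs to find the actual cause. The crude fix is to run kolla-ansible deploy again.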

 

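Run kolla-ansible post-deploy

This generates /etc/kolla/admin-openrc.sh.

Horizon takes a while to start, so wait about 5 minutes after it starts before opening http://10.145.64.104

From the start of the first container to completion, the run took 14 minutes in total, with 31 containers and 30 GB used in /var/lib/docker.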
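The four kolla-ansible stages used above can be chained so that a failure stops the run at the stage that broke. This is a minimal sketch under a few assumptions: passwords have already been generated, the docker Python package is already installed for /usr/bin/python, and the inventory is passed explicitly with -i. KOLLA_CMD, INVENTORY, and run_kolla_stages are names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run the kolla-ansible stages in order and stop at the first failure.

KOLLA_CMD="${KOLLA_CMD:-kolla-ansible}"
INVENTORY="${INVENTORY:-/root/all-in-one}"

run_kolla_stages() {
    local stage
    for stage in bootstrap-servers prechecks deploy post-deploy; do
        echo "==> ${stage}"
        "${KOLLA_CMD}" -i "${INVENTORY}" "${stage}" || {
            echo "stage ${stage} failed" >&2
            return 1
        }
    done
}
```

Calling run_kolla_stages from inside the kolla virtualenv runs all four stages. On a first deployment it is still worth running them one at a time, because the fixes between stages described in this post are manual.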

 

Install openstackclient

Create a new venv

virtualenv /root/venv/openstackclient

 

source /root/venv/openstackclient/bin/activate

pip install -U pip  # upgrade pip to the latest version, otherwise the install fails

 

pip install python-openstackclient

 

Use openstackclient and run init-runonce

 

source /etc/kolla/admin-openrc.sh

openstack server list

 

Run init-runonce

 

cd /root/kolla-ansible/tools

 

 

Edit the init-runonce file

 

EXT_NET_CIDR='192.168.100.0/24'

EXT_NET_RANGE='start=192.168.100.100,end=192.168.100.199'

EXT_NET_GATEWAY='192.168.100.1'

 

./init-runonce

 

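If this command fails, remove whatever init-runonce has already created (security group rules, networks, image) and run it again. In other words, the command is not re-entrant (not idempotent).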
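A cleanup before re-running init-runonce could look like the sketch below. The resource names (demo-router, demo-subnet, demo-net, public1, cirros, mykey, and the m1.* flavors) are assumptions based on what the stable/stein script usually creates; check them against your copy of tools/init-runonce. The rules added to the default security group are not removed here. OPENSTACK_CMD, try_os, and cleanup_init_runonce are names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: delete the resources a (partial) init-runonce run may have created.

OPENSTACK_CMD="${OPENSTACK_CMD:-openstack}"

# Run one openstack command and ignore failures, because a partial
# init-runonce run leaves only some of the resources behind.
try_os() {
    "${OPENSTACK_CMD}" "$@" || echo "skipped: $*" >&2
}

cleanup_init_runonce() {
    # Router first: its subnet interface and gateway block network deletion
    try_os router remove subnet demo-router demo-subnet
    try_os router unset --external-gateway demo-router
    try_os router delete demo-router
    try_os network delete demo-net
    try_os network delete public1
    try_os image delete cirros
    try_os keypair delete mykey
    local flavor
    for flavor in m1.tiny m1.small m1.medium m1.large m1.xlarge; do
        try_os flavor delete "${flavor}"
    done
}
```

Source /etc/kolla/admin-openrc.sh first, then call cleanup_init_runonce and run ./init-runonce again.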

 

Try provisioning a VM

For the problems that occur when provisioning a VM, see the Problems and diagnosis section.

 

Problems and diagnosis

Problem: wrong Ansible version during bootstrap-servers

 

task path: /root/venv/kolla/share/kolla-ansible/ansible/roles/baremetal/tasks/install.yml:49

fatal: [localhost]: FAILED! => {

    "msg": "[u'{{ docker_apt_package }}', u'git', u'{% if not easy_install_available %}python-pip{% endif %}', u'python-setuptools', u'{% if enable_host_ntp | bool %}ntp{% endif %}', u'{% if enable_ceph_nfs|bool %}rpcbind{% endif %}']: {{ not (ansible_distribution == 'Ubuntu' and\n        ansible_distribution_major_version is version(18, 'ge'))\n   and\n   not (ansible_distribution == 'Debian' and\n        ansible_distribution_major_version is version(10, 'ge')) }}: template error while templating string: no test named 'version'. String: {{ not (ansible_distribution == 'Ubuntu' and\n        ansible_distribution_major_version is version(18, 'ge'))\n   and\n   not (ansible_distribution == 'Debian' and\n        ansible_distribution_major_version is version(10, 'ge')) }}"

 

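The Ansible version is wrong: the 'version' test used by the playbook was introduced in Ansible 2.5 and does not exist in older releases. Do not use yum install ansible.

Instead, run pip install ansible inside the Python virtualenv.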
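To see whether the installed Ansible is new enough for the 'version' test (Ansible 2.5 or later), the version can be compared with sort -V. This is a minimal sketch; version_ge and ansible_version are helper names made up here.

```shell
#!/usr/bin/env bash
# Sketch: compare the installed Ansible version against a minimum.

# version_ge A B -> exit status 0 if version A >= version B
version_ge() {
    # sort -V orders version strings; if B sorts first (or is equal), A >= B
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# ansible_version -> prints the version number from "ansible --version"
ansible_version() {
    ansible --version | head -n 1 | grep -Eo '[0-9]+(\.[0-9]+)+'
}
```

Inside the virtualenv, version_ge "$(ansible_version)" 2.5 || echo "Ansible too old" flags an installation that would hit the error above.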

 

 

Problem: ImportError: No module named docker when running prechecks

 

TASK [prechecks : Checking docker SDK version] *********************************************************************

fatal: [localhost]: FAILED! => {"changed": false, "cmd": ["/usr/bin/python", "-c", "import docker; print docker.__version__"], "delta": "0:00:00.012579", "end": "2020-04-26 02:17:35.653507", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2020-04-26 02:17:35.640928", "stderr": "Traceback (most recent call last):\n  File \"\", line 1, in \nImportError: No module named docker", "stderr_lines": ["Traceback (most recent call last):", "  File \"\", line 1, in ", "ImportError: No module named docker"], "stdout": "", "stdout_lines": []}

 

localhost is both the Ansible control node and the inventory host, so the /usr/bin/python environment on localhost has to be set up separately, for example by installing the docker Python package.

Run the following commands:

   yum install epel-release

   yum install python-pip

   pip install -U pip

   pip install argparse oauth pyserial     # this line and the next are only needed if you hit problems

   pip install --ignore-installed requests

   pip install docker

 

 

Problem: Cannot uninstall 'PyYAML'

 

ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

Fix: pip install --ignore-installed PyYAML

 

Problem: pip install python-openstackclient fails

 

ERROR: Command errored out with exit status 1: /root/venv/openstackclient/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-edS76A/subprocess32/setup.py'"'"'; __file__='"'"'/tmp/pip-install-edS76A/subprocess32/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-a6JBhp/install-record.txt --single-version-externally-managed --compile --install-headers /root/venv/openstackclient/include/site/python2.7/subprocess32 Check the logs for full command output.

 

The build dependencies have to be installed first:

yum -y install python-devel libffi-devel gcc openssl-devel libselinux-python

 

Problem: ImportError: No module named queue

 

  File "/root/venv/openstackclient/lib/python2.7/site-packages/openstack/utils.py", line 13, in

    import queue

ImportError: No module named queue

 

Patch two source files:

 

/root/venv/openstackclient/lib/python2.7/site-packages/openstack/cloud/openstackcloud.py

/root/venv/openstackclient/lib/python2.7/site-packages/openstack/utils.py

 

Change import queue to: import Queue as queue

The root cause is the module rename between Python 2 (Queue) and Python 3 (queue).
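The same edit can be applied with sed instead of by hand. This is a minimal sketch of the workaround described above, not an upstream fix; patch_queue_import is a made-up helper name, and it keeps a .bak copy of each file it touches.

```shell
#!/usr/bin/env bash
# Sketch: rewrite the Python 3 style import so it works under Python 2.

# patch_queue_import FILE
patch_queue_import() {
    local file="$1"
    # Only the bare top-level statement is rewritten; all other lines stay as-is.
    # -i.bak edits in place and leaves the original as FILE.bak
    sed -i.bak 's/^import queue$/import Queue as queue/' "${file}"
}
```

Call it once for each of the two files listed above. Any later pip upgrade of the package overwrites the patched files.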

 

Problem: the output has been hidden due to the fact that 'no_log: true'

 

fatal: [localhost]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}

 

Destroy everything (kolla-ansible destroy) and redeploy.

 

Problem: FailedToDropPrivileges: privsep helper command exited non-zero

The Nova logs under /var/lib/docker/volumes/kolla_logs/_data/nova show the following:

 

nova/nova-compute.log:783:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078]   File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 641, in create_image

nova/nova-compute.log:784:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078]     _update_utime_ignore_eacces(base)

nova/nova-compute.log:785:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078]   File "/var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/imagebackend.py", line 72, in _update_utime_ignore_eacces

nova/nova-compute.log:786:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078]     nova.privsep.path.utime(path)

nova/nova-compute.log:787:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078]   File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 243, in _wrap

nova/nova-compute.log:788:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078]     self.start()

nova/nova-compute.log:789:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078]   File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 254, in start

nova/nova-compute.log:790:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078]     channel = daemon.RootwrapClientChannel(context=self)

nova/nova-compute.log:791:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078]   File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 328, in __init__

nova/nova-compute.log:792:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078]     raise FailedToDropPrivileges(msg)

nova/nova-compute.log:793:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078] FailedToDropPrivileges: privsep helper command exited non-zero (97)

nova/nova-compute.log:794:2020-04-26 05:42:32.798 6 ERROR nova.compute.manager [instance: 8f722a45-2a75-44cb-a1f5-6d3f04b91078]

 

The root cause is this WARNING:

 

2020-04-26 05:42:28.875 6 WARNING oslo.privsep.daemon [-] privsep log: /var/lib/kolla/venv/bin/nova-rootwrap: Incorrect configuration file: /etc/nova/rootwrap.conf

 

Fix:

After the deployment finishes, do two things inside the nova_compute Docker container:

1 Restore the missing rootwrap.conf file.

cp /nova-base-source/nova-19.1.0/etc/nova/rootwrap.conf /etc/nova/

2 Add /var/lib/kolla/venv/bin, the directory that contains privsep-helper, to exec_dirs in /etc/nova/rootwrap.conf:

exec_dirs=/sbin,/usr/sbin,/bin,/usr/bin,/usr/local/sbin,/usr/local/bin,/var/lib/kolla/venv/bin
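Step 2 can be made repeatable with a small helper that appends a directory to the exec_dirs line only when it is missing. This is a sketch; add_exec_dir is a made-up name, and the file must already contain an exec_dirs= line, otherwise nothing changes.

```shell
#!/usr/bin/env bash
# Sketch: idempotently append a directory to exec_dirs in a rootwrap.conf.

# add_exec_dir FILE DIR
add_exec_dir() {
    local file="$1" dir="$2"
    # Already listed as a whole comma-separated element? Then nothing to do.
    if grep -Eq "^exec_dirs=(.*,)?${dir}(,.*)?\$" "${file}"; then
        return 0
    fi
    # Append DIR to the end of the existing list
    sed -i "s|^exec_dirs=\(.*\)\$|exec_dirs=\1,${dir}|" "${file}"
}
```

Inside the container this would be add_exec_dir /etc/nova/rootwrap.conf /var/lib/kolla/venv/bin. Changes made inside a running container are lost when the container is recreated, for example by a later kolla-ansible deploy, so the fix has to be reapplied then.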

 

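Problem: VirtualInterfaceCreateException: Virtual Interface creation failed

Setting vif_plugging_is_fatal: False does avoid this problem, but the underlying question remains: why does the wait time out? Neutron has clearly finished plugging the VIF, yet Nova never receives the notification.

There are two ways to address it:

1 Add vif_plugging_is_fatal: False and restart nova_compute. How to apply the updated configuration and restart the service is not obvious at first.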

 

vif_plugging_is_fatal: False

vif_plugging_timeout: 0

 

See the customize configuration documentation:

https://docs.openstack.org/kolla-ansible/train/admin/advanced-configuration.html
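Following the customize configuration mechanism linked above, the two options can be placed in an override file under /etc/kolla/config/ and then pushed out with kolla-ansible reconfigure, which also restarts the affected containers. The sketch below writes such an override. KOLLA_CONFIG_DIR and write_nova_vif_override are names introduced here, and the function overwrites any existing nova.conf override in that directory.

```shell
#!/usr/bin/env bash
# Sketch: write a nova.conf override that relaxes VIF plugging checks.

KOLLA_CONFIG_DIR="${KOLLA_CONFIG_DIR:-/etc/kolla/config}"

# write_nova_vif_override -> creates the override and prints its path
write_nova_vif_override() {
    local conf="${KOLLA_CONFIG_DIR}/nova.conf"
    mkdir -p "${KOLLA_CONFIG_DIR}"
    # nova.conf is INI-style; both options live in the [DEFAULT] section
    cat > "${conf}" <<'EOF'
[DEFAULT]
vif_plugging_is_fatal = False
vif_plugging_timeout = 0
EOF
    echo "${conf}"
}
```

After calling write_nova_vif_override, run kolla-ansible reconfigure from the kolla virtualenv. This only hides the timeout; the second approach below is still needed to find out why the event never arrives.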

 

2 Find out why Neutron cannot reach Nova to report success.

 

https://ask.openstack.org/en/question/26938/virtualinterfacecreateexception-virtual-interface-creation-failed/

 

        try:

            with self.virtapi.wait_for_instance_event(

                    instance, events, deadline=timeout,

                    error_callback=self._neutron_failed_callback):

                self.plug_vifs(instance, network_info)

                self.firewall_driver.setup_basic_filtering(instance,

                                                           network_info)

                self.firewall_driver.prepare_instance_filter(instance,

                                                             network_info)

                with self._lxc_disk_handler(context, instance,

                                            instance.image_meta,

                                            block_device_info):

                    guest = self._create_domain(

                        xml, pause=pause, power_on=power_on,

                        post_xml_callback=post_xml_callback)

 

                self.firewall_driver.apply_instance_filter(instance,

                                                           network_info)

        except exception.VirtualInterfaceCreateException:

            # Neutron reported failure and we didn't swallow it, so

            # bail here

            with excutils.save_and_reraise_exception():

                self._cleanup_failed_start(context, instance, network_info,

                                           block_device_info, guest,

                                           destroy_disks_on_failure)

        except eventlet.timeout.Timeout:

            # We never heard from Neutron

            LOG.warning('Timeout waiting for %(events)s for '

                        'instance with vm_state %(vm_state)s and '

                        'task_state %(task_state)s.',

                        {'events': events,

                         'vm_state': instance.vm_state,

                         'task_state': instance.task_state},

                        instance=instance)

            if CONF.vif_plugging_is_fatal:

                self._cleanup_failed_start(context, instance, network_info,

                                           block_device_info, guest,

                                           destroy_disks_on_failure)

                raise exception.VirtualInterfaceCreateException()

 

 

Problem: killall: command not found

 

yum install psmisc  

 

 

Problem: OpenStack Horizon cannot be accessed

 

[Tue Apr 28 02:38:47.095960 2020] [:error] [pid 26] [client 172.18.209.103:59638] Target WSGI script not found or unable to stat: /var/lib/kolla/venv/lib/python2.7/site-packages/openstack_dashboard/wsgi

 

Enter the Docker container and run cd /openstack-source-base && python setup.py install
