Troubleshooting Cold and Live Migration of OpenStack Instances

Contents

      • 1. Live migration issues
        • 1.1 libvirt remote connection refused
      • 2. Cold migration issues
        • 2.1 ssh command execution failed

1. Live migration issues

Goal: live-migrate an instance from node compute01 to node compute02.

1.1 libvirt remote connection refused

Error details

# Check the nova log on compute01: vim /var/log/nova/nova-compute.log

nova.virt.libvirt.driver [-] [instance: 41569ceb-335d-42ca-b6f8-9a562105081e] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://compute02/system: unable to connect to server at 'compute02:16509': Connection refused: libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+tcp://compute02/system: unable to connect to server at 'compute02:16509': Connection refused

libvirt on compute02 must be configured to accept remote connections, so that compute01 can reach it over TCP (port 16509).


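# 1. Keep the original file as libvirtd.conf.default, and strip comment and blank lines from the live copy
sed -i.default -e '/^#/d' -e '/^$/d' /etc/libvirt/libvirtd.conf

# 2. Edit /etc/libvirt/libvirtd.conf on compute02 so that it contains the following settings
# vim  /etc/libvirt/libvirtd.conf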
listen_addr = "0.0.0.0"
listen_tls = 0
listen_tcp = 1
unix_sock_group = "root"
unix_sock_rw_perms = "0777"
auth_unix_ro = "none"
auth_unix_rw = "none"
log_filters="2:qemu_monitor_json 2:qemu_driver"
log_outputs="2:file:/var/log/libvirt/libvirtd.log"
tcp_port = "16509"
auth_tcp = "none"

# 3. Edit /etc/sysconfig/libvirtd so that libvirtd starts with --listen
# vim /etc/sysconfig/libvirtd
LIBVIRTD_ARGS="--listen"

# 4. Restart the libvirtd service
[root@compute02 libvirt]# systemctl restart libvirtd
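
After the restart it can help to confirm that the edited file really carries the TCP settings listed in step 2. The following is a minimal sketch, not part of the original procedure: check_libvirtd_conf is a hypothetical helper name, and it only compares four of the values shown above (listen_tls, listen_tcp, tcp_port, auth_tcp) against the file it is given.

```shell
# Hypothetical helper: succeed only if the given libvirtd.conf sets the
# TCP-related keys to the values used in this article.
# $1 = path to libvirtd.conf
check_libvirtd_conf() {
    status=0
    for setting in 'listen_tls=0' 'listen_tcp=1' 'tcp_port="16509"' 'auth_tcp="none"'; do
        key=${setting%%=*}
        want=${setting#*=}
        # Use the last uncommented assignment found for the key,
        # with whitespace around "=" and at the end of the line removed.
        got=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$1" \
              | sed 's/[[:space:]]*$//' | tail -n 1)
        if [ "$got" != "$want" ]; then
            echo "mismatch: $key (want $want, got ${got:-unset})" >&2
            status=1
        fi
    done
    return $status
}

# Usage (on compute02):
#   check_libvirtd_conf /etc/libvirt/libvirtd.conf && echo "TCP settings present"
#   ss -ltn | grep 16509     # libvirtd should now be listening on this port
```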

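# 5. From compute01, test the connection to libvirt on compute02
[root@compute01 ~]# virsh connect qemu+tcp://compute02:16509/system

# 6. From the controller, retry the live migration on the command line
[root@controller ~]# openstack server migrate 41569ceb-335d-42ca-b6f8-9a562105081e --live compute02

Both the connection test from compute01 and the live migration started from the controller succeeded.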
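
The connectivity check from compute01 can also be scripted, which is convenient when several compute nodes have to be verified. This is a rough sketch under assumptions: port_open is a hypothetical helper name, the 3-second timeout is an arbitrary choice, and the probe relies on bash's /dev/tcp feature and the coreutils timeout command being available.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 3 seconds.
# $1 = host, $2 = port
port_open() {
    # /dev/tcp is a bash feature, so the probe runs in an explicit bash process;
    # inside it, $0 is the host and $1 is the port.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage (on compute01):
#   port_open compute02 16509 && virsh -c qemu+tcp://compute02/system list --all
```

A "Connection refused" from libvirt, as in the log above, corresponds to port_open returning a non-zero status.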

2. Cold migration issues

Goal: cold-migrate an instance from node compute01 to node compute02.

2.1 ssh command execution failed

Error details

# Check the nova log on compute01: vim /var/log/nova/nova-compute.log
2021-10-13 08:44:13.178 32957 ERROR oslo_messaging.rpc.server [req-560e23d7-c795-47a8-90a8-22484fedf434 03b0360129f84f9790081df4cebf7844 b5a1eb4ee8374fa1aa88cd4b59afda98 - default default] Exception during message handling: ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.
Command: ssh -o BatchMode=yes 192.168.204.175 mkdir -p /var/lib/nova/instances/41569ceb-335d-42ca-b6f8-9a562105081e
Exit code: 255
Stdout: u''
Stderr: u'Host key verification failed.\r\n'

Passwordless SSH login from compute01 to compute02 needs to be configured.

# Run on compute01
[root@compute01 nova]# ssh-copy-id root@192.168.204.175
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: ERROR: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ERROR: @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
ERROR: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
ERROR: IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
ERROR: Someone could be eavesdropping on you right now (man-in-the-middle attack)!
ERROR: It is also possible that a host key has just been changed.
ERROR: The fingerprint for the ECDSA key sent by the remote host is
ERROR: SHA256:GvRgHIb8ZFbDaHsQKcpZHg16WXhN1ZkD5WasJ4rnhak.
ERROR: Please contact your system administrator.
ERROR: Add correct host key in /root/.ssh/known_hosts to get rid of this message.
ERROR: Offending ECDSA key in /root/.ssh/known_hosts:4
ERROR: ECDSA host key for 192.168.204.175 has changed and you have requested strict checking.
ERROR: Host key verification failed.

# To fix this error, delete the entry for 192.168.204.175 (line 4, according to the message above) from /root/.ssh/known_hosts on compute01
[root@compute01 ~]# vim /root/.ssh/known_hosts
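
As an alternative to editing the file by hand, the stale entry can be removed with ssh-keygen -R, which deletes every key recorded for a host. A minimal sketch follows; forget_host is a hypothetical wrapper name, and the optional second argument exists only so the helper can be pointed at a file other than the default known_hosts.

```shell
# Hypothetical helper: drop every key recorded for a host from a known_hosts file.
# ssh-keygen keeps the previous contents in "<file>.old".
# $1 = host name or IP, $2 = known_hosts file (defaults to ~/.ssh/known_hosts)
forget_host() {
    ssh-keygen -R "$1" -f "${2:-$HOME/.ssh/known_hosts}"
}

# Usage (on compute01):
#   forget_host 192.168.204.175
```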
# Run again on compute01
[root@compute01 ~]# ssh-copy-id root@192.168.204.175
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@192.168.204.175's password: 
sh: .ssh/authorized_keys: Permission denied

# To fix this error, correct the permissions of ~/.ssh and authorized_keys on compute02
[root@compute02 .ssh]# chmod 700 ~/.ssh
[root@compute02 .ssh]# chmod 600 authorized_keys
# However, authorized_keys on compute02 turned out to be read-only: even root could not modify or delete it, so ssh-copy-id run from compute01 could not update the file
# List the extended attributes set on the file
[root@compute02 .ssh]# lsattr authorized_keys
----i----------- authorized_keys
# The "i" (immutable) attribute is set; remove it
[root@compute02 .ssh]# chattr -i authorized_keys
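
When more than one file may be affected, the lsattr output can be filtered for the immutable flag instead of being read by eye. The snippet below is only a sketch: immutable_files is a hypothetical helper name, and it assumes file names without spaces, since it treats the second whitespace-separated field as the name.

```shell
# Hypothetical helper: read lsattr output on stdin and print the names of
# files whose attribute string contains the immutable flag "i".
immutable_files() {
    # Field 1 is the attribute string, field 2 the file name.
    awk '$1 ~ /i/ { print $2 }'
}

# Usage (on compute02):
#   lsattr /root/.ssh/* | immutable_files
```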

# Run on compute01
[root@compute01 nova]# ssh-copy-id root@192.168.204.175
# Passwordless login to compute02 is now set up

Passwordless login now works, but the cold migration still fails, with the same error as before passwordless login was configured.

# Check the nova log on compute01: vim /var/log/nova/nova-compute.log
2021-10-13 08:44:13.178 32957 ERROR oslo_messaging.rpc.server [req-560e23d7-c795-47a8-90a8-22484fedf434 03b0360129f84f9790081df4cebf7844 b5a1eb4ee8374fa1aa88cd4b59afda98 - default default] Exception during message handling: ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.
Command: ssh -o BatchMode=yes 192.168.204.175 mkdir -p /var/lib/nova/instances/41569ceb-335d-42ca-b6f8-9a562105081e
Exit code: 255
Stdout: u''
Stderr: u'Host key verification failed.\r\n'

# Try running the failing command by hand on compute01
[root@compute01 nova]# ssh -o BatchMode=yes 192.168.204.175 mkdir -p /var/lib/nova/instances/41569ceb-335d-42ca-b6f8-9a562105081e

# The command succeeds, and the directory /var/lib/nova/instances/41569ceb-335d-42ca-b6f8-9a562105081e now exists on compute02
[root@compute02 ~]# ls /var/lib/nova/instances/41569ceb-335d-42ca-b6f8-9a562105081e

# Analysis
# Passwordless login was set up from root on compute01 to root on compute02, but the nova-compute service runs as the nova user, so the ssh command issued by the service still fails
# Change the nova-compute unit file to set User to root (the default is nova; the passwordless login above was configured for root)
[root@compute01 nova]# vim /usr/lib/systemd/system/openstack-nova-compute.service
[Service]
...
User=root

[root@compute01 nova]# systemctl daemon-reload
[root@compute01 nova]# systemctl restart openstack-nova-compute.service
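
The same User=root override can be expressed as a systemd drop-in rather than an edit to the packaged unit file: files under /usr/lib/systemd/system may be replaced by a package update, while drop-ins under /etc/systemd/system persist. This is a sketch of that alternative, not what the original steps did; write_user_override is a hypothetical helper name, and override.conf is just the conventional drop-in file name.

```shell
# Hypothetical helper: write a drop-in that overrides User= for a unit.
# $1 = systemd configuration root (normally /etc/systemd/system)
# $2 = unit name, $3 = user the service should run as
write_user_override() {
    dir="$1/$2.d"
    mkdir -p "$dir" &&
    printf '[Service]\nUser=%s\n' "$3" > "$dir/override.conf"
}

# Usage (on compute01):
#   write_user_override /etc/systemd/system openstack-nova-compute.service root
#   systemctl daemon-reload && systemctl restart openstack-nova-compute.service
#   systemctl show -p User openstack-nova-compute.service   # shows the effective user
```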

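Running the cold migration again now succeeds.

After a cold migration completes, the instance waits for the migration to be manually confirmed or reverted.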

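The migration can be confirmed automatically by setting resize_confirm_window in /etc/nova/nova.conf on the compute nodes.

[root@compute02 ~]# vim  /etc/nova/nova.conf
[DEFAULT]
...
resize_confirm_window=30

# Once an instance has been waiting for confirmation for longer than resize_confirm_window (in seconds), the migration is confirmed automatically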
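
To confirm that the option landed in the intended section on each compute node, the value can be read back from the file. The following is a minimal sketch: get_ini_value is a hypothetical helper name, and it handles only simple "key = value" lines and "#" or ";" comments, not every syntax that nova.conf accepts.

```shell
# Hypothetical helper: print the value of a key inside one section of an INI file.
# $1 = file, $2 = section name (without brackets), $3 = key
get_ini_value() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*[#;]/ { next }                      # skip comment lines
        /^[ \t]*\[.*\][ \t]*$/ {                    # section header
            line = $0
            sub(/^[ \t]*\[/, "", line)
            sub(/\][ \t]*$/, "", line)
            current = line
            next
        }
        current == section {
            pos = index($0, "=")
            if (pos == 0) next
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) value = v                 # the last assignment wins
        }
        END { if (value != "") print value }
    ' "$1"
}

# Usage (on a compute node):
#   get_ini_value /etc/nova/nova.conf DEFAULT resize_confirm_window
```

The nova-compute service has to be restarted after the file is edited for the new value to take effect.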
