[OpenStack] VM live migration failure: problem and solution

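Notice:

This blog may be reposted freely, but please keep the original author information.

Sina Weibo: @孔令贤HW

Blog: http://blog.csdn.net/lynn_kong

The content is my own study, research, and summary; if it resembles anyone else's work, I am honored!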


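Update history:

2013.07.17  This problem no longer exists on the latest trunk branch. Part of the nova-scheduler code has been refactored, which fixed this bug, whether intentionally or not.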


Version: OpenStack Grizzly 2013.1.2

Hypervisor: KVM

Shared storage: no


1. Problem description

My environment has one virtual machine with the following details:

root@controller:~# nova show ubuntu_bdm_with_keypair
+-------------------------------------+----------------------------------------------------------+
| Property                            | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| status                              | ACTIVE                                                   |
| updated                             | 2013-07-11T05:15:46Z                                     |
| OS-EXT-STS:task_state               | None                                                     |
| OS-EXT-SRV-ATTR:host                | controller                                               |
| key_name                            | mykey                                                    |
| image                               | Attempt to boot from volume - no image supplied          |
| hostId                              | 6c1b0e7f432cdca4fe62380f271d7b83999b3aeee6e6893cc90db44a |
| OS-EXT-STS:vm_state                 | active                                                   |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000005                                        |
| OS-EXT-SRV-ATTR:hypervisor_hostname | controller.konglingxian.com                              |
| flavor                              | kong.flavor (6)                                          |
| id                                  | 6cd558d9-e924-4598-8e63-e86a20929bd9                     |
| security_groups                     | [{u'name': u'default'}]                                  |
| demo_net1 network                   | 10.1.1.2, 128.3.11.101                                   |
| user_id                             | 9bfc5979e8774ea99f54ba78d07c3bc0                         |
| name                                | ubuntu_bdm_with_keypair                                  |
| created                             | 2013-07-10T07:48:04Z                                     |
| tenant_id                           | d9a9b59b0be94489a85f51ba3ced15ce                         |
| OS-DCF:diskConfig                   | MANUAL                                                   |
| metadata                            | {}                                                       |
| accessIPv4                          |                                                          |
| accessIPv6                          |                                                          |
| progress                            | 0                                                        |
| OS-EXT-STS:power_state              | 1                                                        |
| OS-EXT-AZ:availability_zone         | nova                                                     |
| config_drive                        |                                                          |
+-------------------------------------+----------------------------------------------------------+
The environment has two nodes:

root@controller:~# nova hypervisor-list
+----+-----------------------------+
| ID | Hypervisor hostname         |
+----+-----------------------------+
| 1  | compute.konglingxian.com    |
| 2  | controller.konglingxian.com |
+----+-----------------------------+
The VM is running normally. Run a live-migration operation on it:

nova live-migration --block-migrate 6cd558d9-e924-4598-8e63-e86a20929bd9

The following error is returned:

{
    "badRequest": {
        "message": "Live migration of instance 6cd558d9-e924-4598-8e63-e86a20929bd9 to host controller failed",
        "code": 400
    }
}


2. Problem analysis

First, look at the exception stack trace in the log:

2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions Traceback (most recent call last):
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 430, in _process_data
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions     rval = self.proxy.dispatch(ctxt, version, method, **args)
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 133, in dispatch
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions     return getattr(proxyobj, method)(ctxt, **kwargs)
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions   File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 117, in live_migration
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions     context, ex, {})
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions   File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions     self.gen.next()
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions   File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 96, in live_migration
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions     block_migration, disk_over_commit)
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions   File "/usr/lib/python2.7/dist-packages/nova/scheduler/driver.py", line 196, in schedule_live_migration
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions     ignore_hosts)
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions   File "/usr/lib/python2.7/dist-packages/nova/scheduler/driver.py", line 272, in _live_migration_dest_check
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions     filter_properties)[0]
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions   File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 146, in select_hosts
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions     request_spec, filter_properties, instance_uuids)]
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions   File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 336, in _schedule
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions     filter_properties)
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions   File "/usr/lib/python2.7/dist-packages/nova/scheduler/host_manager.py", line 342, in get_filtered_hosts
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions     hosts, filter_properties)
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions   File "/usr/lib/python2.7/dist-packages/nova/filters.py", line 53, in get_filtered_objects
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions     return list(objs)
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions   File "/usr/lib/python2.7/dist-packages/nova/filters.py", line 39, in filter_all
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions     if self._filter_one(obj, filter_properties):
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions   File "/usr/lib/python2.7/dist-packages/nova/scheduler/filters/__init__.py", line 30, in _filter_one
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions     return self.host_passes(obj, filter_properties)
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions   File "/usr/lib/python2.7/dist-packages/nova/scheduler/filters/image_props_filter.py", line 78, in host_passes
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions     image_props = spec.get('image', {}).get('properties', {})
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions AttributeError: 'NoneType' object has no attribute 'get'
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44.979 32348 TRACE nova.api.openstack.compute.contrib.admin_actions 
2013-07-10 15:07:44 INFO [nova.api.openstack.wsgi 673] [32348] HTTP exception thrown: Live migration of instance 6cd558d9-e924-4598-8e63-e86a20929bd9 to another host failed

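So the problem lies in scheduling. The log makes it clear: the exception is raised in image_props_filter, where spec.get('image', {}) returned None, so the chained .get('properties', {}) call failed with an AttributeError. Why did spec.get('image', {}) return None? Trace back through the code to see where the image entry in spec comes from: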

            if not instance_ref['image_ref']:
                image = None
            else:
                image = self.image_service.show(context,
                                                instance_ref['image_ref'])
            request_spec = {'instance_properties': instance_ref,
                            'instance_type': instance_type,
                            'instance_uuids': [instance_ref['uuid']],
                            'image': image}
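Looking back at the VM details, this VM was booted from a volume (boot from volume), so its image_ref is empty and request_spec carries 'image': None. Because the 'image' key is present, the {} default passed to spec.get() is never used. With that, the root cause is clear.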
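The behaviour behind the traceback can be reproduced without Nova. The sketch below uses plain dictionaries as stand-ins for the scheduler's request_spec; the function names are hypothetical, and the "safe" variant is only one possible defensive pattern, not necessarily how the trunk refactoring resolved the issue.

```python
# Minimal reproduction of the failure; the dictionaries below are
# illustrative stand-ins, not real Nova objects.

def image_props_unsafe(spec):
    # Same expression as the one shown in the traceback
    # (image_props_filter.py, host_passes).
    return spec.get('image', {}).get('properties', {})


def image_props_safe(spec):
    # Hypothetical defensive variant: `or {}` also covers an explicit None.
    return (spec.get('image') or {}).get('properties', {})


# Boot-from-volume instance: the 'image' key exists, but its value is None.
spec_boot_from_volume = {'image': None}
# Key missing entirely: only here does the default passed to get() apply.
spec_without_image_key = {}

print(image_props_unsafe(spec_without_image_key))   # default {} is used

try:
    image_props_unsafe(spec_boot_from_volume)
except AttributeError as exc:
    print('unsafe variant failed:', exc)

print(image_props_safe(spec_boot_from_volume))      # no exception
```

The point is that dict.get() falls back to its default only when the key is absent, not when the stored value is None.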


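3. Solution

There are two ways to solve this:
1) Change the arguments of the live-migration command to specify the destination host explicitly, which skips the scheduling stage. The command becomes the following (note that for a VM booted from a volume the --block-migrate option must not be used; for the detailed reason see http://blog.csdn.net/lynn_kong/article/details/9186201):
nova live-migration 6cd558d9-e924-4598-8e63-e86a20929bd9 compute
2) Edit the Nova configuration option scheduler_default_filters (the default is ['RetryFilter', 'AvailabilityZoneFilter', 'RamFilter', 'ComputeFilter', 'ComputeCapabilitiesFilter', 'ImagePropertiesFilter']) and remove ImagePropertiesFilter from it. After restarting the nova-scheduler process, I ran the migration again and it succeeded.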
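For the second option, the resulting configuration line can be derived mechanically from the default list quoted above. This is a small sketch, assuming that list-valued options in nova.conf are written as comma-separated values; the helper names are hypothetical, and the section in which the option belongs should be checked against your own nova.conf.

```python
# Default filter list as quoted in this post (Grizzly).
DEFAULT_FILTERS = [
    'RetryFilter', 'AvailabilityZoneFilter', 'RamFilter',
    'ComputeFilter', 'ComputeCapabilitiesFilter', 'ImagePropertiesFilter',
]


def filters_without(filters, unwanted):
    # Keep the original order and drop only the named filter.
    return [name for name in filters if name != unwanted]


def as_conf_line(filters):
    # Assumption: list options are serialized as comma-separated values.
    return 'scheduler_default_filters=' + ','.join(filters)


line = as_conf_line(filters_without(DEFAULT_FILTERS, 'ImagePropertiesFilter'))
print(line)
```

Note that removing ImagePropertiesFilter affects scheduling of all instances, not only live migration, so this option trades away image-property checks for every scheduling request.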
