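(Original) Resolving VM migration failures in a KVM environment
This post describes how virtual machine migration failures were resolved in a KVM environment managed by the open-source cloud tool OpenNebula 3.8.1.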
1. Log of VM migration failure 1:
Fri Mar 8 17:57:18 2013 [LCM][I]: New VM state is SAVE_MIGRATE
Fri Mar 8 17:57:30 2013 [VMM][I]: ExitCode: 0
Fri Mar 8 17:57:30 2013 [VMM][I]: Successfully execute virtualization driver operation: save.
Fri Mar 8 17:57:30 2013 [VMM][I]: ExitCode: 0
Fri Mar 8 17:57:30 2013 [VMM][I]: Successfully execute network driver operation: clean.
Fri Mar 8 17:58:14 2013 [LCM][I]: New VM state is PROLOG_MIGRATE
Fri Mar 8 17:58:14 2013 [TM][I]: mv: -------------------------/one_images_3.8.1/0/42/disk.0
Fri Mar 8 17:58:14 2013 [TM][I]: ExitCode: 0
Fri Mar 8 18:02:28 2013 [TM][I]: mv: Moving bcec162:/one_images_3.8.1/0/42 to node153:/one_images_3.8.1/0/42
Fri Mar 8 18:02:28 2013 [TM][I]: ExitCode: 0
Fri Mar 8 18:02:29 2013 [LCM][I]: New VM state is BOOT
Fri Mar 8 18:02:30 2013 [VMM][I]: ExitCode: 0
Fri Mar 8 18:02:30 2013 [VMM][I]: Successfully execute network driver operation: pre.
Fri Mar 8 18:02:33 2013 [VMM][I]: Command execution fail: /var/tmp/one/vmm/kvm/restore /one_images_3.8.1/0/42/checkpoint node153 42 node153
Fri Mar 8 18:02:33 2013 [VMM][E]: restore: Command "virsh --connect qemu:///system restore /one_images_3.8.1/0/42/checkpoint" failed: error: Failed to restore domain from /one_images_3.8.1/0/42/checkpoint
Fri Mar 8 18:02:33 2013 [VMM][I]: error: unable to set user and group to '0:0' on '/one_images_3.8.1/0/42/disk.1': No such file or directory
Fri Mar 8 18:02:33 2013 [VMM][E]: Could not restore from /one_images_3.8.1/0/42/checkpoint
Fri Mar 8 18:02:33 2013 [VMM][I]: ExitCode: 1
Fri Mar 8 18:02:33 2013 [VMM][I]: Failed to execute virtualization driver operation: restore.
Fri Mar 8 18:02:33 2013 [VMM][E]: Error restoring VM: Could not restore from /one_images_3.8.1/0/42/checkpoint
Fri Mar 8 18:02:34 2013 [DiM][I]: New VM state is FAILED
Sat Mar 9 09:23:46 2013 [DiM][I]: New VM state is DONE.
Sat Mar 9 09:23:46 2013 [TM][W]: Ignored: LOG I 42 ExitCode: 0
Sat Mar 9 09:23:47 2013 [TM][W]: Ignored: LOG I 42 delete: Deleting /one_images_3.8.1/0/42
Sat Mar 9 09:23:47 2013 [TM][W]: Ignored: LOG I 42 ExitCode: 0
Sat Mar 9 09:23:47 2013 [TM][W]: Ignored: TRANSFER SUCCESS 42 -
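Solution:
In the mv script, add $SUDO in front of the tar command that packs the VM directory on the source host (the first TAR_COPY line), so that tar runs with elevated privileges there.
The modified script is $ONE_LOCATION/var/remotes/tm/ssh/mv: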
#!/bin/bash
# -------------------------------------------------------------------------- #
# Copyright 2002-2012, OpenNebula Project Leads (OpenNebula.org) #
# #
# Licensed under the Apache License, Version 2.0 (the "License"); you may #
# not use this file except in compliance with the License. You may obtain #
# a copy of the License at #
# #
# http://www.apache.org/licenses/LICENSE-2.0 #
# #
# Unless required by applicable law or agreed to in writing, software #
# distributed under the License is distributed on an "AS IS" BASIS, #
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. #
# See the License for the specific language governing permissions and #
# limitations under the License. #
#--------------------------------------------------------------------------- #
# MV <hostA:system_ds/disk.i|hostB:system_ds/disk.i> vmid dsid
# <hostA:system_ds/|hostB:system_ds/>
# - hostX is the target host to deploy the VM
# - system_ds is the path for the system datastore in the host
# - vmid is the id of the VM
# - dsid is the target datastore (0 is the system datastore)
SRC=$1
DST=$2
VMID=$3
DSID=$4
if [ -z "${ONE_LOCATION}" ]; then
    TMCOMMON=/var/lib/one/remotes/tm/tm_common.sh
else
    TMCOMMON=$ONE_LOCATION/var/remotes/tm/tm_common.sh
fi
. $TMCOMMON
#-------------------------------------------------------------------------------
# Return if moving a disk, we will move them when moving the whole system_ds
# directory for the VM
#-------------------------------------------------------------------------------
SRC=`fix_dir_slashes $SRC`
DST=`fix_dir_slashes $DST`
SRC_PATH=`arg_path $SRC`
DST_PATH=`arg_path $DST`
SRC_HOST=`arg_host $SRC`
DST_HOST=`arg_host $DST`
DST_DIR=`dirname $DST_PATH`
SRC_DS_DIR=`dirname $SRC_PATH`
SRC_VM_DIR=`basename $SRC_PATH`
if [ `is_disk $DST_PATH` -eq 1 ]; then
    log "-------------------------$DST_PATH"
    exit 0
fi
if [ "$SRC" == "$DST" ]; then
    exit 0
fi
ssh_make_path "$DST_HOST" "$DST_DIR"
log "Moving $SRC to $DST"
ssh_exec_and_log "$DST_HOST" "rm -rf '$DST_PATH'" \
"Error removing target path to prevent overwrite errors"
TAR_COPY="$SSH $SRC_HOST '$SUDO $TAR -C $SRC_DS_DIR -cf - $SRC_VM_DIR'"
TAR_COPY="$TAR_COPY | $SSH $DST_HOST '$TAR -C $DST_DIR -xf -'"
exec_and_log "eval $TAR_COPY" "Error copying disk directory to target host"
exec_and_log "$SSH $SRC_HOST rm -rf $SRC_PATH"
exit 0
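One property of a pipeline like TAR_COPY is worth keeping in mind: by default the shell reports only the exit status of the last command, so an error on the packing side (for example a file that tar cannot read) can go unnoticed. That may be why the log shows ExitCode: 0 for mv even though disk.1 was missing on the target, although the log itself does not confirm this. The sketch below is a local stand-in for the pipeline, without ssh or sudo, that uses bash's pipefail option to surface such errors. The function name copy_vm_dir is hypothetical and not part of OpenNebula.

```shell
#!/bin/bash
# Local stand-in for the TAR_COPY pipeline in the mv script (no ssh, no sudo).
# Usage: copy_vm_dir <src_ds_dir> <vm_dir_name> <dst_dir>
copy_vm_dir() {
    local src_ds_dir=$1 vm_dir=$2 dst_dir=$3
    (
        # With pipefail the pipeline fails if either side fails,
        # not only the unpacking side on the right.
        set -o pipefail
        tar -C "$src_ds_dir" -cf - "$vm_dir" | tar -C "$dst_dir" -xf -
    )
}
```

Calling copy_vm_dir with a directory name that does not exist returns a non-zero status, whereas the same pipeline without pipefail would report success because the unpacking tar exits cleanly.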
-------------------------------------------------------------------------------------------
2. Log of VM migration failure 2:
Sat Mar 9 09:34:12 2013 [LCM][I]: New VM state is SAVE_MIGRATE
Sat Mar 9 09:34:24 2013 [VMM][I]: ExitCode: 0
Sat Mar 9 09:34:24 2013 [VMM][I]: Successfully execute virtualization driver operation: save.
Sat Mar 9 09:34:24 2013 [VMM][I]: ExitCode: 0
Sat Mar 9 09:34:24 2013 [VMM][I]: Successfully execute network driver operation: clean.
Sat Mar 9 09:34:25 2013 [LCM][I]: New VM state is PROLOG_MIGRATE
Sat Mar 9 09:34:25 2013 [TM][I]: mv: -------------------------/one_images_3.8.1/0/43/disk.0
Sat Mar 9 09:34:25 2013 [TM][I]: ExitCode: 0
Sat Mar 9 09:36:38 2013 [TM][I]: mv: Moving node153:/one_images_3.8.1/0/43 to bcec162:/one_images_3.8.1/0/43
Sat Mar 9 09:36:38 2013 [TM][I]: mv: -------------------target copyy
Sat Mar 9 09:36:38 2013 [TM][I]: mv: ++++++++++++++++++++++end copy
Sat Mar 9 09:36:38 2013 [TM][I]: ExitCode: 0
Sat Mar 9 09:36:38 2013 [LCM][I]: New VM state is BOOT
Sat Mar 9 09:36:38 2013 [VMM][I]: ExitCode: 0
Sat Mar 9 09:36:38 2013 [VMM][I]: Successfully execute network driver operation: pre.
Sat Mar 9 09:36:42 2013 [VMM][I]: Command execution fail: /var/tmp/one/vmm/kvm/restore /one_images_3.8.1/0/43/checkpoint bcec162 43 bcec162
Sat Mar 9 09:36:42 2013 [VMM][E]: restore: Command "virsh --connect qemu:///system restore /one_images_3.8.1/0/43/checkpoint" failed: error: Failed to restore domain from /one_images_3.8.1/0/43/checkpoint
Sat Mar 9 09:36:42 2013 [VMM][I]: error: internal error process exited while connecting to monitor: Supported machines are:
Sat Mar 9 09:36:42 2013 [VMM][I]: pc RHEL 6.0.0 PC (alias of rhel6.0.0)
Sat Mar 9 09:36:42 2013 [VMM][I]: rhel6.0.0 RHEL 6.0.0 PC (default)
Sat Mar 9 09:36:42 2013 [VMM][I]: rhel5.5.0 RHEL 5.5.0 PC
Sat Mar 9 09:36:42 2013 [VMM][I]: rhel5.4.4 RHEL 5.4.4 PC
Sat Mar 9 09:36:42 2013 [VMM][I]: rhel5.4.0 RHEL 5.4.0 PC
Sat Mar 9 09:36:42 2013 [VMM][E]: Could not restore from /one_images_3.8.1/0/43/checkpoint
Sat Mar 9 09:36:42 2013 [VMM][I]: ExitCode: 1
Sat Mar 9 09:36:42 2013 [VMM][I]: Failed to execute virtualization driver operation: restore.
Sat Mar 9 09:36:42 2013 [VMM][E]: Error restoring VM: Could not restore from /one_images_3.8.1/0/43/checkpoint
Sat Mar 9 09:36:42 2013 [DiM][I]: New VM state is FAILED
Log in to the node and run the restore manually:
[root@bcec162 43]# virsh restore checkpoint
error: Failed to restore domain from checkpoint
error: internal error process exited while connecting to monitor: Supported machines are:
pc RHEL 6.0.0 PC (alias of rhel6.0.0)
rhel6.0.0 RHEL 6.0.0 PC (default)
rhel5.5.0 RHEL 5.5.0 PC
rhel5.4.4 RHEL 5.4.4 PC
rhel5.4.0 RHEL 5.4.0 PC
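qemu prints "Supported machines are:" when it is asked for a machine type it does not know, so it can help to compare the machine type recorded in the saved domain XML with the list above. The following is a rough sketch that assumes the domain XML arrives on standard input (for example from virsh save-image-dumpxml checkpoint, if the installed libvirt provides that subcommand). The helper names machine_of_xml and is_supported_machine are made up for illustration.

```shell
#!/bin/bash
# Extract the value of the machine='...' (or machine="...") attribute
# from domain XML read on stdin.
machine_of_xml() {
    sed -n "s/.*machine=[\"']\([^\"']*\)[\"'].*/\1/p" | head -n 1
}

# Succeed if machine type $1 appears in the first column of a
# "Supported machines" list read on stdin.
is_supported_machine() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Possible use on the target node (paths and commands are assumptions):
#   machine=$(virsh save-image-dumpxml checkpoint | machine_of_xml)
#   /usr/libexec/qemu-kvm -M ? | is_supported_machine "$machine" \
#       || echo "machine type $machine is not supported on this node"
```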
The /etc/libvirt/qemu.conf file on node bcec162 was modified as follows:
# The user ID for QEMU processes run by the system instance
user = "root"
# The group ID for QEMU processes run by the system instance
group = "root"
# Whether libvirt should dynamically change file ownership
# to match the configured user/group above. Defaults to 1.
# Set to 0 to disable file ownership changes.
#dynamic_ownership = 0
Migration from node bcec162 to node153 then succeeded.
[root@node153 43]# ll
total 5075464
-rw-r--r-- 1 root root 287215779 Mar 8 11:11 checkpoint
-rw-r--r-- 1 oneadmin kvm 283538737 Mar 9 09:34 checkpoint.1362712278
-rw-r--r-- 1 oneadmin kvm 920 Mar 9 09:26 deployment.0
-rw-r--r-- 1 root root 4621008896 Mar 9 10:14 disk.0
-rw-r----- 1 root root 401408 Mar 9 09:26 disk.1
lrwxrwxrwx 1 oneadmin kvm 29 Mar 9 10:09 disk.1.iso -> /one_images_3.8.1/0/43/disk.1
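Because the outcome depends on which lines in qemu.conf are actually uncommented, it may help to print the effective values on each node before migrating. This is a small sketch. The function name qemu_conf_value is hypothetical, and it only handles simple key = value lines like the ones quoted in this post.

```shell
#!/bin/bash
# Print the value of an uncommented "key = value" line in a qemu.conf-style
# file, with surrounding spaces and double quotes removed.
# Prints nothing if the key is commented out or absent.
qemu_conf_value() {
    local key=$1 file=$2
    awk -F= -v k="$key" '
        /^[ \t]*#/ { next }    # skip commented-out lines
        {
            name = $1; gsub(/[ \t]/, "", name)
            if (name == k) {
                val = $2; gsub(/^[ \t"]+|[ \t"]+$/, "", val)
                print val
            }
        }' "$file"
}

# Possible use:
#   for key in user group dynamic_ownership; do
#       echo "$key = $(qemu_conf_value "$key" /etc/libvirt/qemu.conf)"
#   done
```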
--------------------------------------------------------------------------------------------------------
3. Modify only the /etc/libvirt/qemu.conf file on node node152:
# The user ID for QEMU processes run by the system instance
#user = "root"
# The group ID for QEMU processes run by the system instance
#group = "root"
# Whether libvirt should dynamically change file ownership
# to match the configured user/group above. Defaults to 1.
# Set to 0 to disable file ownership changes.
dynamic_ownership = 0
Migration from bcec162 to node152 failed, with the following log:
Sat Mar 9 10:31:47 2013 [LCM][I]: New VM state is SAVE_MIGRATE
Sat Mar 9 10:31:54 2013 [VMM][I]: save: Moving old checkpoint file /one_images_3.8.1/0/43/checkpoint
Sat Mar 9 10:31:54 2013 [VMM][I]: ExitCode: 0
Sat Mar 9 10:31:54 2013 [VMM][I]: Successfully execute virtualization driver operation: save.
Sat Mar 9 10:31:54 2013 [VMM][I]: ExitCode: 0
Sat Mar 9 10:31:54 2013 [VMM][I]: Successfully execute network driver operation: clean.
Sat Mar 9 10:31:55 2013 [LCM][I]: New VM state is PROLOG_MIGRATE
Sat Mar 9 10:31:55 2013 [TM][I]: mv: -------------------------/one_images_3.8.1/0/43/disk.0
Sat Mar 9 10:31:55 2013 [TM][I]: ExitCode: 0
Sat Mar 9 10:35:02 2013 [TM][I]: mv: Moving bcec162:/one_images_3.8.1/0/43 to node152:/one_images_3.8.1/0/43
Sat Mar 9 10:35:02 2013 [TM][I]: mv: -------------------target copyy
Sat Mar 9 10:35:02 2013 [TM][I]: mv: ++++++++++++++++++++++end copy
Sat Mar 9 10:35:02 2013 [TM][I]: ExitCode: 0
Sat Mar 9 10:35:02 2013 [LCM][I]: New VM state is BOOT
Sat Mar 9 10:35:03 2013 [VMM][I]: ExitCode: 0
Sat Mar 9 10:35:03 2013 [VMM][I]: Successfully execute network driver operation: pre.
Sat Mar 9 10:35:07 2013 [VMM][I]: Command execution fail: /var/tmp/one/vmm/kvm/restore /one_images_3.8.1/0/43/checkpoint node152 43 node152
Sat Mar 9 10:35:07 2013 [VMM][E]: restore: Command "virsh --connect qemu:///system restore /one_images_3.8.1/0/43/checkpoint" failed: error: Failed to restore domain from /one_images_3.8.1/0/43/checkpoint
Sat Mar 9 10:35:07 2013 [VMM][I]: error: operation failed: failed to retrieve chardev info in qemu with 'info chardev'
Sat Mar 9 10:35:07 2013 [VMM][E]: Could not restore from /one_images_3.8.1/0/43/checkpoint
Sat Mar 9 10:35:07 2013 [VMM][I]: ExitCode: 1
Sat Mar 9 10:35:07 2013 [VMM][I]: Failed to execute virtualization driver operation: restore.
Sat Mar 9 10:35:07 2013 [VMM][E]: Error restoring VM: Could not restore from /one_images_3.8.1/0/43/checkpoint
Sat Mar 9 10:35:07 2013 [DiM][I]: New VM state is FAILED
Log in to node152 and run the restore command:
[root@node152 43]# virsh restore checkpoint
error: Failed to restore domain from checkpoint
error: internal error process exited while connecting to monitor: qemu: could not open disk image /one_images_3.8.1/0/43/disk.0: Permission denied
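"Permission denied" on disk.0 means the qemu process could not open the image as the user it runs as. With user and group commented out, libvirt falls back to its built-in default account (typically not root), and with dynamic_ownership = 0 it does not change the ownership of the image files. A quick way to see what qemu is up against is to list owner, group and mode for every file in the VM directory. This is a sketch that assumes GNU find and stat, and the function name list_disk_perms is hypothetical.

```shell
#!/bin/bash
# Print "owner:group mode path" for every entry directly inside a VM directory.
list_disk_perms() {
    local dir=$1
    find "$dir" -mindepth 1 -maxdepth 1 -exec stat -c '%U:%G %a %n' {} + | sort -k3
}

# Possible use on the target node:
#   list_disk_perms /one_images_3.8.1/0/43
```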
In /etc/libvirt/qemu.conf, comment out dynamic_ownership = 0 and enable user = "root" and group = "root".
If the dynamic_ownership line is left active, restoring the VM still reports the error shown above.
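The edit to qemu.conf can be scripted so that it is applied identically on every node. The sketch below operates on whatever file path it is given, so it can be tried on a copy first. The function name use_root_qemu_user is hypothetical, and it assumes the lines look like the ones quoted in this post.

```shell
#!/bin/bash
# Uncomment user/group = "root" and comment out dynamic_ownership = 0
# in the given qemu.conf-style file, keeping a .bak copy of the original.
use_root_qemu_user() {
    local file=$1
    sed -i.bak \
        -e 's/^#[[:space:]]*\(user[[:space:]]*=[[:space:]]*"root"\)/\1/' \
        -e 's/^#[[:space:]]*\(group[[:space:]]*=[[:space:]]*"root"\)/\1/' \
        -e 's/^\(dynamic_ownership[[:space:]]*=[[:space:]]*0\)/#\1/' \
        "$file"
}

# Possible use:
#   cp /etc/libvirt/qemu.conf /tmp/qemu.conf.test
#   use_root_qemu_user /tmp/qemu.conf.test
```

libvirtd reads qemu.conf when it starts, so the daemon normally has to be restarted (for example with service libvirtd restart on a RHEL 6 style host) before the change takes effect.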
Restore the VM on node152:
[root@node152 43]# virsh restore checkpoint
Domain restored from checkpoint
[root@node152 43]# virsh list
Id Name State
----------------------------------
117 one-43 running
References:
https://wiki.archlinux.org/index.php/QEMU_(%E7%AE%80%E4%BD%93%E4%B8%AD%E6%96%87)
http://hi.baidu.com/juacm/item/f1fc3f98d8428ad07a7f01e2
When reposting, please keep the original link: http://www.blogjava.net/ldwblog/archive/2013/03/08/396187.html