(Original) Resolving Virtual Machine Migration Failures in a KVM Environment

Resolving virtual machine migration failures in a KVM environment managed with the open-source cloud tool OpenNebula 3.8.1.
1. Migration failure 1, log:
Fri Mar  8 17:57:18 2013 [LCM][I]: New VM state is SAVE_MIGRATE
Fri Mar  8 17:57:30 2013 [VMM][I]: ExitCode: 0
Fri Mar  8 17:57:30 2013 [VMM][I]: Successfully execute virtualization driver operation: save.
Fri Mar  8 17:57:30 2013 [VMM][I]: ExitCode: 0
Fri Mar  8 17:57:30 2013 [VMM][I]: Successfully execute network driver operation: clean.
Fri Mar  8 17:58:14 2013 [LCM][I]: New VM state is PROLOG_MIGRATE
Fri Mar  8 17:58:14 2013 [TM][I]: mv: -------------------------/one_images_3.8.1/0/42/disk.0
Fri Mar  8 17:58:14 2013 [TM][I]: ExitCode: 0
Fri Mar  8 18:02:28 2013 [TM][I]: mv: Moving bcec162:/one_images_3.8.1/0/42 to node153:/one_images_3.8.1/0/42
Fri Mar  8 18:02:28 2013 [TM][I]: ExitCode: 0
Fri Mar  8 18:02:29 2013 [LCM][I]: New VM state is BOOT
Fri Mar  8 18:02:30 2013 [VMM][I]: ExitCode: 0
Fri Mar  8 18:02:30 2013 [VMM][I]: Successfully execute network driver operation: pre.
Fri Mar  8 18:02:33 2013 [VMM][I]: Command execution fail: /var/tmp/one/vmm/kvm/restore /one_images_3.8.1/0/42/checkpoint node153 42 node153
Fri Mar  8 18:02:33 2013 [VMM][E]: restore: Command "virsh --connect qemu:///system restore /one_images_3.8.1/0/42/checkpoint" failed: error: Failed to restore domain from /one_images_3.8.1/0/42/checkpoint
Fri Mar  8 18:02:33 2013 [VMM][I]: error: unable to set user and group to '0:0' on '/one_images_3.8.1/0/42/disk.1': No such file or directory
Fri Mar  8 18:02:33 2013 [VMM][E]: Could not restore from /one_images_3.8.1/0/42/checkpoint
Fri Mar  8 18:02:33 2013 [VMM][I]: ExitCode: 1
Fri Mar  8 18:02:33 2013 [VMM][I]: Failed to execute virtualization driver operation: restore.
Fri Mar  8 18:02:33 2013 [VMM][E]: Error restoring VM: Could not restore from /one_images_3.8.1/0/42/checkpoint
Fri Mar  8 18:02:34 2013 [DiM][I]: New VM state is FAILED
Sat Mar  9 09:23:46 2013 [DiM][I]: New VM state is DONE.
Sat Mar  9 09:23:46 2013 [TM][W]: Ignored: LOG I 42 ExitCode: 0
Sat Mar  9 09:23:47 2013 [TM][W]: Ignored: LOG I 42 delete: Deleting /one_images_3.8.1/0/42
Sat Mar  9 09:23:47 2013 [TM][W]: Ignored: LOG I 42 ExitCode: 0
Sat Mar  9 09:23:47 2013 [TM][W]: Ignored: TRANSFER SUCCESS 42 -
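Solution:
Add $SUDO in front of the TAR copy command in the mv script, so that tar on the source host runs with root privileges and can read every file in the VM directory. The restore error above shows that disk.1 never arrived on the target host.
The modified script, $ONE_LOCATION/var/remotes/tm/ssh/mv: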
#!/bin/bash
# -------------------------------------------------------------------------- #
# Copyright 2002-2012, OpenNebula Project Leads (OpenNebula.org)             #
#                                                                            #
# Licensed under the Apache License, Version 2.0 (the "License"); you may    #
# not use this file except in compliance with the License. You may obtain    #
# a copy of the License at                                                   #
#                                                                            #
# http://www.apache.org/licenses/LICENSE-2.0                                 #
#                                                                            #
# Unless required by applicable law or agreed to in writing, software        #
# distributed under the License is distributed on an "AS IS" BASIS,          #
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.   #
# See the License for the specific language governing permissions and        #
# limitations under the License.                                             #
#--------------------------------------------------------------------------- #
# MV <hostA:system_ds/disk.i|hostB:system_ds/disk.i> vmid dsid
#    <hostA:system_ds/|hostB:system_ds/>
#   - hostX is the target host to deploy the VM
#   - system_ds is the path for the system datastore in the host
#   - vmid is the id of the VM
#   - dsid is the target datastore (0 is the system datastore)
SRC=$1
DST=$2
VMID=$3
DSID=$4
if [ -z "${ONE_LOCATION}" ]; then
    TMCOMMON=/var/lib/one/remotes/tm/tm_common.sh
else
    TMCOMMON=$ONE_LOCATION/var/remotes/tm/tm_common.sh
fi
. $TMCOMMON
#-------------------------------------------------------------------------------
# Return if moving a disk, we will move them when moving the whole system_ds
# directory for the VM
#-------------------------------------------------------------------------------
SRC=`fix_dir_slashes $SRC`
DST=`fix_dir_slashes $DST`
SRC_PATH=`arg_path $SRC`
DST_PATH=`arg_path $DST`
SRC_HOST=`arg_host $SRC`
DST_HOST=`arg_host $DST`
DST_DIR=`dirname $DST_PATH`
SRC_DS_DIR=`dirname  $SRC_PATH`
SRC_VM_DIR=`basename $SRC_PATH`
if [ `is_disk $DST_PATH` -eq 1 ]; then
    log "-------------------------$DST_PATH"
    exit 0
fi
if [ "$SRC" == "$DST" ]; then
    exit 0
fi
ssh_make_path "$DST_HOST" "$DST_DIR"
log "Moving $SRC to $DST"
ssh_exec_and_log "$DST_HOST" "rm -rf '$DST_PATH'" \
    "Error removing target path to prevent overwrite errors"
# Fix: run tar under $SUDO on the source host so root-owned disk files can be read
TAR_COPY="$SSH $SRC_HOST '$SUDO $TAR -C $SRC_DS_DIR -cf - $SRC_VM_DIR'"
TAR_COPY="$TAR_COPY | $SSH $DST_HOST '$TAR -C $DST_DIR -xf -'"
exec_and_log "eval $TAR_COPY" "Error copying disk directory to target host"
exec_and_log "$SSH $SRC_HOST rm -rf $SRC_PATH"
exit 0
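
The copy step in the script is a tar pipe: one tar packs the VM directory to stdout and a second tar unpacks it in the target directory. The sketch below reproduces that pipe locally, without SSH, so it can be tried on any machine. The function name copy_vm_dir is hypothetical and is not part of OpenNebula. On the real hosts the packing side is the one prefixed with $SUDO.

```shell
#!/bin/bash
# copy_vm_dir SRC_PATH DST_DIR
# Hypothetical local stand-in for the TAR_COPY pipeline in the mv script.
# It streams the directory SRC_PATH into DST_DIR, keeping its base name.
copy_vm_dir() {
    local src_path=$1 dst_dir=$2
    local src_ds_dir src_vm_dir

    src_ds_dir=$(dirname "$src_path")    # e.g. /one_images_3.8.1/0
    src_vm_dir=$(basename "$src_path")   # e.g. 42

    mkdir -p "$dst_dir" || return 1

    # Run in a subshell so pipefail does not leak into the caller.
    # With pipefail, a read error in the packing tar fails the whole copy
    # instead of being hidden by a successful unpack.
    (
        set -o pipefail
        tar -C "$src_ds_dir" -cf - "$src_vm_dir" | tar -C "$dst_dir" -xf -
    )
}
```

If the packing tar cannot read a file, it reports the error and the archive lacks that file, which would match the missing disk.1 seen in the first failure.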
-------------------------------------------------------------------------------------------
2. Migration failure 2, log:
Sat Mar  9 09:34:12 2013 [LCM][I]: New VM state is SAVE_MIGRATE
Sat Mar  9 09:34:24 2013 [VMM][I]: ExitCode: 0
Sat Mar  9 09:34:24 2013 [VMM][I]: Successfully execute virtualization driver operation: save.
Sat Mar  9 09:34:24 2013 [VMM][I]: ExitCode: 0
Sat Mar  9 09:34:24 2013 [VMM][I]: Successfully execute network driver operation: clean.
Sat Mar  9 09:34:25 2013 [LCM][I]: New VM state is PROLOG_MIGRATE
Sat Mar  9 09:34:25 2013 [TM][I]: mv: -------------------------/one_images_3.8.1/0/43/disk.0
Sat Mar  9 09:34:25 2013 [TM][I]: ExitCode: 0
Sat Mar  9 09:36:38 2013 [TM][I]: mv: Moving node153:/one_images_3.8.1/0/43 to bcec162:/one_images_3.8.1/0/43
Sat Mar  9 09:36:38 2013 [TM][I]: mv: -------------------target copyy
Sat Mar  9 09:36:38 2013 [TM][I]: mv: ++++++++++++++++++++++end copy
Sat Mar  9 09:36:38 2013 [TM][I]: ExitCode: 0
Sat Mar  9 09:36:38 2013 [LCM][I]: New VM state is BOOT
Sat Mar  9 09:36:38 2013 [VMM][I]: ExitCode: 0
Sat Mar  9 09:36:38 2013 [VMM][I]: Successfully execute network driver operation: pre.
Sat Mar  9 09:36:42 2013 [VMM][I]: Command execution fail: /var/tmp/one/vmm/kvm/restore /one_images_3.8.1/0/43/checkpoint bcec162 43 bcec162
Sat Mar  9 09:36:42 2013 [VMM][E]: restore: Command "virsh --connect qemu:///system restore /one_images_3.8.1/0/43/checkpoint" failed: error: Failed to restore domain from /one_images_3.8.1/0/43/checkpoint
Sat Mar  9 09:36:42 2013 [VMM][I]: error: internal error process exited while connecting to monitor: Supported machines are:
Sat Mar  9 09:36:42 2013 [VMM][I]: pc         RHEL 6.0.0 PC (alias of rhel6.0.0)
Sat Mar  9 09:36:42 2013 [VMM][I]: rhel6.0.0  RHEL 6.0.0 PC (default)
Sat Mar  9 09:36:42 2013 [VMM][I]: rhel5.5.0  RHEL 5.5.0 PC
Sat Mar  9 09:36:42 2013 [VMM][I]: rhel5.4.4  RHEL 5.4.4 PC
Sat Mar  9 09:36:42 2013 [VMM][I]: rhel5.4.0  RHEL 5.4.0 PC
Sat Mar  9 09:36:42 2013 [VMM][E]: Could not restore from /one_images_3.8.1/0/43/checkpoint
Sat Mar  9 09:36:42 2013 [VMM][I]: ExitCode: 1
Sat Mar  9 09:36:42 2013 [VMM][I]: Failed to execute virtualization driver operation: restore.
Sat Mar  9 09:36:42 2013 [VMM][E]: Error restoring VM: Could not restore from /one_images_3.8.1/0/43/checkpoint
Sat Mar  9 09:36:42 2013 [DiM][I]: New VM state is FAILED
Log in to the node:
[root@bcec162 43]# virsh restore checkpoint 
error: Failed to restore domain from checkpoint
error: internal error process exited while connecting to monitor: Supported machines are:
pc         RHEL 6.0.0 PC (alias of rhel6.0.0)
rhel6.0.0  RHEL 6.0.0 PC (default)
rhel5.5.0  RHEL 5.5.0 PC
rhel5.4.4  RHEL 5.4.4 PC
rhel5.4.0  RHEL 5.4.0 PC
Modified the /etc/libvirt/qemu.conf file on node bcec162:
# The user ID for QEMU processes run by the system instance
user = "root"
# The group ID for QEMU processes run by the system instance
group = "root"
# Whether libvirt should dynamically change file ownership
# to match the configured user/group above. Defaults to 1.
# Set to 0 to disable file ownership changes.
#dynamic_ownership = 0
Migration from node bcec162 to node153 succeeded.
[root@node153 43]# ll
total 5075464
-rw-r--r-- 1 root     root  287215779 Mar  8 11:11 checkpoint
-rw-r--r-- 1 oneadmin kvm   283538737 Mar  9 09:34 checkpoint.1362712278
-rw-r--r-- 1 oneadmin kvm         920 Mar  9 09:26 deployment.0
-rw-r--r-- 1 root     root 4621008896 Mar  9 10:14 disk.0
-rw-r----- 1 root     root     401408 Mar  9 09:26 disk.1
lrwxrwxrwx 1 oneadmin kvm          29 Mar  9 10:09 disk.1.iso -> /one_images_3.8.1/0/43/disk.1
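
When comparing nodes, it can help to print only the settings that are actually active in qemu.conf, since the file is mostly comments. The following is a minimal sketch; the function name show_qemu_identity is hypothetical, and the default path is the one used in this post.

```shell
#!/bin/bash
# show_qemu_identity [CONF]
# Hypothetical helper: print the uncommented user, group and
# dynamic_ownership lines of a libvirt qemu.conf file.
show_qemu_identity() {
    local conf=${1:-/etc/libvirt/qemu.conf}
    # Lines starting with '#' do not match, so only active settings are shown.
    grep -E '^[[:space:]]*(user|group|dynamic_ownership)[[:space:]]*=' "$conf"
}
```

No output means all three settings are commented out and libvirt falls back to its built-in defaults. libvirtd reads qemu.conf at startup, so it has to be restarted after an edit; on a RHEL 6 style host that would presumably be `service libvirtd restart`.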
--------------------------------------------------------------------------------------------------------
3. Modify only the /etc/libvirt/qemu.conf file on node152:
# The user ID for QEMU processes run by the system instance
#user = "root"
# The group ID for QEMU processes run by the system instance
#group = "root"
# Whether libvirt should dynamically change file ownership
# to match the configured user/group above. Defaults to 1.
# Set to 0 to disable file ownership changes.
dynamic_ownership = 0
Migration from node bcec162 to node152 failed. The log is as follows:
Sat Mar  9 10:31:47 2013 [LCM][I]: New VM state is SAVE_MIGRATE
Sat Mar  9 10:31:54 2013 [VMM][I]: save: Moving old checkpoint file /one_images_3.8.1/0/43/checkpoint
Sat Mar  9 10:31:54 2013 [VMM][I]: ExitCode: 0
Sat Mar  9 10:31:54 2013 [VMM][I]: Successfully execute virtualization driver operation: save.
Sat Mar  9 10:31:54 2013 [VMM][I]: ExitCode: 0
Sat Mar  9 10:31:54 2013 [VMM][I]: Successfully execute network driver operation: clean.
Sat Mar  9 10:31:55 2013 [LCM][I]: New VM state is PROLOG_MIGRATE
Sat Mar  9 10:31:55 2013 [TM][I]: mv: -------------------------/one_images_3.8.1/0/43/disk.0
Sat Mar  9 10:31:55 2013 [TM][I]: ExitCode: 0
Sat Mar  9 10:35:02 2013 [TM][I]: mv: Moving bcec162:/one_images_3.8.1/0/43 to node152:/one_images_3.8.1/0/43
Sat Mar  9 10:35:02 2013 [TM][I]: mv: -------------------target copyy
Sat Mar  9 10:35:02 2013 [TM][I]: mv: ++++++++++++++++++++++end copy
Sat Mar  9 10:35:02 2013 [TM][I]: ExitCode: 0
Sat Mar  9 10:35:02 2013 [LCM][I]: New VM state is BOOT
Sat Mar  9 10:35:03 2013 [VMM][I]: ExitCode: 0
Sat Mar  9 10:35:03 2013 [VMM][I]: Successfully execute network driver operation: pre.
Sat Mar  9 10:35:07 2013 [VMM][I]: Command execution fail: /var/tmp/one/vmm/kvm/restore /one_images_3.8.1/0/43/checkpoint node152 43 node152
Sat Mar  9 10:35:07 2013 [VMM][E]: restore: Command "virsh --connect qemu:///system restore /one_images_3.8.1/0/43/checkpoint" failed: error: Failed to restore domain from /one_images_3.8.1/0/43/checkpoint
Sat Mar  9 10:35:07 2013 [VMM][I]: error: operation failed: failed to retrieve chardev info in qemu with 'info chardev'
Sat Mar  9 10:35:07 2013 [VMM][E]: Could not restore from /one_images_3.8.1/0/43/checkpoint
Sat Mar  9 10:35:07 2013 [VMM][I]: ExitCode: 1
Sat Mar  9 10:35:07 2013 [VMM][I]: Failed to execute virtualization driver operation: restore.
Sat Mar  9 10:35:07 2013 [VMM][E]: Error restoring VM: Could not restore from /one_images_3.8.1/0/43/checkpoint
Sat Mar  9 10:35:07 2013 [DiM][I]: New VM state is FAILED
Log in to node152 and run the restore command:
[root@node152 43]# virsh restore checkpoint
error: Failed to restore domain from checkpoint
error: internal error process exited while connecting to monitor: qemu: could not open disk image /one_images_3.8.1/0/43/disk.0: Permission denied
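In /etc/libvirt/qemu.conf, comment out dynamic_ownership = 0 and enable user = "root" and group = "root".
If the dynamic_ownership = 0 line is left active, restoring the VM still fails with the error above.
Restore the VM on node152: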
[root@node152 43]# virsh restore checkpoint
Domain restored from checkpoint
[root@node152 43]# virsh list
 Id Name                 State
----------------------------------
117 one-43               running
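
The "Permission denied" error on disk.0 presumably means the QEMU process was running as a user that could not read the root-owned image. Before retrying a restore, the ownership of the files in the VM directory can be checked with a sketch like the one below. The function name report_owners is hypothetical, and it assumes GNU stat (`stat -c`), as found on Linux hosts.

```shell
#!/bin/bash
# report_owners DIR EXPECTED_UID
# Hypothetical helper: list disk.* and checkpoint files in DIR whose owner
# differs from EXPECTED_UID (numeric, e.g. 0 for root).
# Returns 0 when every file matches, 1 otherwise.
report_owners() {
    local dir=$1 expected_uid=$2
    local status=0 f owner_uid

    for f in "$dir"/disk.* "$dir"/checkpoint; do
        [ -e "$f" ] || continue          # skip unmatched globs and missing files
        owner_uid=$(stat -c '%u' "$f")   # numeric owner of the file itself
        if [ "$owner_uid" != "$expected_uid" ]; then
            echo "MISMATCH $f owner_uid=$owner_uid expected_uid=$expected_uid"
            status=1
        fi
    done
    return $status
}
```

For example, `report_owners /one_images_3.8.1/0/43 0` checks whether every disk and checkpoint file of VM 43 is owned by root, which is the user that qemu.conf is set to in this post.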
References:
https://wiki.archlinux.org/index.php/QEMU_(%E7%AE%80%E4%BD%93%E4%B8%AD%E6%96%87)
http://hi.baidu.com/juacm/item/f1fc3f98d8428ad07a7f01e2

When reposting, please keep the original link: http://www.blogjava.net/ldwblog/archive/2013/03/08/396187.html
