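Problem description:
CloudStack 4.0 integrated with KVM. The host can be added normally and everything works up to enabling the zone, but as soon as the system VMs start, errors and exceptions are reported.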
`/mnt/xx': Invalid argument
Cloudstack Management:
/var/log/cloud/management/management-server.log
2013-08-14 03:09:09,161 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Skip capacity scan due to there is no Primary Storage UPintenance mode
2013-08-14 03:09:09,721 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 0 routers.
2013-08-14 03:09:25,572 DEBUG [cloud.server.StatsCollector] (StatsCollector-1:null) VmStatsCollector is running...
2013-08-14 03:09:25,587 DEBUG [cloud.server.StatsCollector] (StatsCollector-3:null) StorageCollector is running...
2013-08-14 03:09:25,589 DEBUG [cloud.server.StatsCollector] (StatsCollector-1:null) HostStatsCollector is running...
2013-08-14 03:09:39,160 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Skip capacity scan due to there is no Primary Storage UPintenance mode
2013-08-14 03:09:39,721 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 0 routers
2013-08-13 15:28:01,634 WARN [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Exception while trying to start console proxy
com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to get answer that is of class com.cloud.agent.api.StartAnswer
    at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:847)
    at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:472)
    at com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:465)
    at com.cloud.consoleproxy.ConsoleProxyManagerImpl.startProxy(ConsoleProxyManagerImpl.java:627)
    at com.cloud.consoleproxy.ConsoleProxyManagerImpl.allocCapacity(ConsoleProxyManagerImpl.java:1164)
    at com.cloud.consoleproxy.ConsoleProxyManagerImpl.expandPool(ConsoleProxyManagerImpl.java:1981)
    at com.cloud.consoleproxy.ConsoleProxyManagerImpl.expandPool(ConsoleProxyManagerImpl.java:173)
    at com.cloud.vm.SystemVmLoadScanner.loadScan(SystemVmLoadScanner.java:113)
    at com.cloud.vm.SystemVmLoadScanner.access$100(SystemVmLoadScanner.java:34)
    at com.cloud.vm.SystemVmLoadScanner$1.reallyRun(SystemVmLoadScanner.java:83)
    at com.cloud.vm.SystemVmLoadScanner$1.run(SystemVmLoadScanner.java:73)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
Caused by: com.cloud.utils.exception.CloudRuntimeException: Unable to get answer that is of class com.cloud.agent.api.StartAnswer
    at com.cloud.agent.manager.Commands.getAnswer(Commands.java:80)
    at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:783)
    ... 19 more
KVM Host (Cloudstack Agent):
/var/log/cloud/agent/agent.log
com.cloud.utils.exception.CloudRuntimeException: org.libvirt.LibvirtException: cannot create path '/mnt/2c65613e-e5a3-3443-96c9-272fd60502ee/v-2-VM-patchdisk': Invalid argument
    at com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.createPhysicalDisk(LibvirtStorageAdaptor.java:556)
    at com.cloud.hypervisor.kvm.storage.LibvirtStoragePool.createPhysicalDisk(LibvirtStoragePool.java:101)
    at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.createPatchVbd(LibvirtComputingResource.java:2980)
    at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.createVbd(LibvirtComputingResource.java:2943)
    at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:2808)
    at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1035)
    at com.cloud.agent.Agent.processRequest(Agent.java:518)
    at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:831)
    at com.cloud.utils.nio.Task.run(Task.java:83)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
2013-08-13 17:41:17,886 WARN [cloud.agent.Agent] (agentRequest-Handler-2:null) Caught: java.lang.NullPointerException
    at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.cleanupVMNetworks(LibvirtComputingResource.java:3922)
    at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.handleVmStartFailure(LibvirtComputingResource.java:2709)
    at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:2834)
    at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1035)
    at com.cloud.agent.Agent.processRequest(Agent.java:518)
    at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:831)
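Problem analysis:
The CloudStack log files give little indication of how to fix this. Judging from the logs above, on both the CloudStack management node and the KVM node running the agent, the failure is somewhere in primary storage, yet it is not a permission problem.
Check the NFS configuration on the storage node: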
[root@storage252 ~]# cat /etc/exports
/primary *(rw,async,no_root_squash)
/secondary *(rw,async,no_root_squash)
[root@storage252 ~]# ll /primary/ /secondary/ -d
drwxrwxrwx 3 root root 4096 Aug 14 09:09 /primary/
drwxrwxrwx 3 root root 4096 Aug 13 18:33 /secondary/
[root@storage252 ~]# service nfs status
rpc.svcgssd is stopped
rpc.mountd (pid 26157) is running...
nfsd (pid 26222 26221 26220 26219 26218 26217 26216 26215) is running...
rpc.rquotad (pid 26153) is running...
[root@storage252 ~]# exportfs
/primary <world>
/secondary <world>
Both the NFS server configuration file and the exported directories look fine.
Manually mount the NFS exports on the KVM host:
[root@kvm01 ~]# showmount -e 192.168.150.252
Export list for 192.168.150.252:
/secondary *
/primary *
[root@kvm01 ~]# mkdir /mnt/1
[root@kvm01 ~]# mkdir /mnt/2
[root@kvm01 ~]# mount -t nfs 192.168.150.252:/primary /mnt/1
[root@kvm01 ~]# mount -t nfs 192.168.150.252:/secondary /mnt/2
[root@kvm01 ~]# ll /mnt/
total 8
drwxrwxrwx. 3 nobody nobody 4096 Aug 14 09:09 1
drwxrwxrwx. 3 nobody nobody 4096 Aug 13 18:33 2
Create a test file in each mount to check whether writes are restricted:
[root@kvm01 ~]# touch /mnt/1/test1
[root@kvm01 ~]# touch /mnt/2/test1
[root@kvm01 ~]# ll /mnt/1/
total 1
-rw-r--r--. 1 nobody nobody 0 Aug 14 09:35 test1
[root@kvm01 ~]# ll /mnt/2/
total 1
-rw-r--r--. 1 nobody nobody 0 Aug 14 09:35 test1
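So the KVM host can write to both the primary and the secondary storage directories, and the logs contain no "Operation xxx" error (such as "Operation not permitted").
However, the NFS directories mounted on the KVM host are owned by user and group nobody. The NFS server exports them with no_root_squash, so files created by root on the client should be owned by root.root, not nobody.nobody.
That led me to check the system log, /var/log/messages, on both nodes.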
Cloudstack Management:
Aug 13 16:50:25 storage252 rpc.idmapd[19778]: nss_getpwnam: name '0' does not map into domain 'clovem.com'
Aug 13 16:50:25 storage252 rpc.idmapd[19778]: nss_getpwnam: name '[email protected]' does not map into domain 'clovem.com'
Aug 13 16:55:54 storage252 rpc.idmapd[19778]: nss_getpwnam: name '[email protected]' does not map into domain 'clovem.com'
Aug 13 17:00:56 storage252 rpc.idmapd[19778]: nss_getpwnam: name '[email protected]' does not map into domain 'clovem.com'
Aug 13 17:06:24 storage252 rpc.idmapd[19778]: nss_getpwnam: name '[email protected]' does not map into domain 'clovem.com'
Aug 13 17:11:54 storage252 rpc.idmapd[19778]: nss_getpwnam: name '[email protected]' does not map into domain 'clovem.com'
Aug 13 17:17:24 storage252 rpc.idmapd[19778]: nss_getpwnam: name '[email protected]' does not map into domain 'clovem.com'
KVM Host (Cloudstack Agent):
Aug 13 15:23:35 kvm01 kernel: FS-Cache: Netfs 'nfs' registered for caching
Aug 13 15:23:35 kvm01 nfsidmap[13080]: nss_getpwnam: name '[email protected]' does not map into domain 'sjcloud.cn'
Aug 13 15:26:48 kvm01 kernel: NFS: v4 server 192.168.150.252 does not accept raw uid/gids. Reenabling the idmapper.
Aug 13 15:37:22 kvm01 kernel: lo: Disabled Privacy Extensions
Aug 13 15:40:33 kvm01 gnome-session[17824]: WARNING: GSIdleMonitor: IDLETIME counter not found
Aug 13 15:40:33 kvm01 gnome-session[17824]: WARNING: Unable to determine session: Unable to lookup session information for process '17824'
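Picking the idmapper complaints out of a busy /var/log/messages by eye is tedious. Below is a minimal sketch, not part of the original troubleshooting, that collects the local idmap domain each host reports. The function name idmap_domains and the regular expression are my own assumptions, written only against the message format shown in the logs above.

```python
import re

# Assumed pattern: matches the rpc.idmapd / nfsidmap complaint format quoted above.
IDMAP_RE = re.compile(
    r"(?P<host>\S+) (?:rpc\.idmapd|nfsidmap)\[\d+\]: nss_getpwnam: "
    r"name '(?P<name>[^']*)' does not map into domain '(?P<domain>[^']*)'"
)


def idmap_domains(lines):
    """Return {hostname: set of idmap domains} found in syslog lines."""
    found = {}
    for line in lines:
        match = IDMAP_RE.search(line)
        if match:
            found.setdefault(match.group("host"), set()).add(match.group("domain"))
    return found


if __name__ == "__main__":
    # Reads the local syslog; run it on each node and compare the results.
    with open("/var/log/messages", errors="replace") as log:
        for host, domains in idmap_domains(log).items():
            print(host, sorted(domains))
```

If the two nodes report different domains, as they do here (clovem.com versus sjcloud.cn), the NFSv4 ID mapping cannot translate names between them.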
Solution
The analysis above shows that the two nodes are in different domains, which breaks the NFS ID mapping.
Check the hostnames of the two nodes:
[root@storage252 ~]# hostname --fqdn
storage252.clovem.com
[root@kvm01 ~]# hostname --fqdn
kvm01.sjcloud.cn
Putting both nodes in the same domain fixes the problem.
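The comparison can be sketched in a few lines. This is only an illustration under the assumption that the idmapper falls back to the host's DNS domain when no Domain is set in /etc/idmapd.conf; the helper names dns_domain and same_domain are hypothetical.

```python
def dns_domain(fqdn):
    """Return everything after the first label of an FQDN, or '' if there is none."""
    _, _, domain = fqdn.strip().rstrip(".").partition(".")
    return domain.lower()


def same_domain(*fqdns):
    """True when all given FQDNs share one non-empty DNS domain."""
    domains = {dns_domain(name) for name in fqdns}
    return len(domains) == 1 and "" not in domains


if __name__ == "__main__":
    # The two hostnames reported by `hostname --fqdn` on the nodes.
    print(same_domain("storage252.clovem.com", "kvm01.sjcloud.cn"))
```

Feed it the output of hostname --fqdn from each node; once the domains are unified the check should pass for every pair of NFS server and client.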
If you are only using plain NFS and CloudStack is not involved, you can simply force an NFSv3 mount instead:
[root@kvm01 ~]# mount -t nfs -o vers=3 ip:/dir /localdir
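The NFSv3 workaround helps because ID mapping by name only happens in NFSv4; NFSv3 passes raw uid/gid values. To see which version a mount actually negotiated, something like the following sketch can parse /proc/mounts. The function name nfs_mounts is my own, and the exact option string in /proc/mounts varies between kernels, so treat the vers= lookup as an assumption.

```python
def nfs_mounts(mounts_text):
    """Yield (mount_point, fs_type, version) for NFS entries in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        # Turn "rw,vers=3,..." into {"rw": "", "vers": "3", ...}.
        options = dict(opt.partition("=")[::2] for opt in fields[3].split(","))
        yield fields[1], fields[2], options.get("vers", "")


if __name__ == "__main__":
    with open("/proc/mounts") as mounts:
        for mount_point, fs_type, version in nfs_mounts(mounts.read()):
            print(mount_point, fs_type, version or "unknown")
```

A mount that reports version 4 is subject to the idmapper and therefore to the domain mismatch described above.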
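Today I found the most convenient solution:
On the server, in /etc/exports, add an fsid=N parameter to the options of each exported directory (fsid=0 designates the root of the NFSv4 pseudo filesystem, so at most one export should use it).
For example: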
/export_dir *(rw,fsid=1,async,no_root_squash)
/export_dir *(rw,fsid=2,async,no_root_squash)
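To confirm that every export really carries an fsid, a small parser over /etc/exports is enough. The sketch below is an assumption-laden illustration: export_fsids is a hypothetical helper, and it only handles the simple "path client(options)" layout used in this post, not quoted paths or line continuations.

```python
import re


def export_fsids(exports_text):
    """Return (path, client, fsid) tuples; fsid is None when the option is absent."""
    result = []
    for raw in exports_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        path, *clients = line.split()
        for client in clients:
            match = re.fullmatch(r"([^()]*)\((.*)\)", client)
            host, opts = (match.group(1), match.group(2).split(",")) if match else (client, [])
            fsid = next((o.split("=", 1)[1] for o in opts if o.startswith("fsid=")), None)
            result.append((path, host, fsid))
    return result


if __name__ == "__main__":
    with open("/etc/exports") as exports:
        for path, host, fsid in export_fsids(exports.read()):
            print(path, host, fsid if fsid is not None else "no fsid")
```

After editing /etc/exports, re-export with exportfs -r on the server so the new options take effect.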