A First Look at oVirt: Usage Notes FAQ
2016/11/15
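【Q01】 How do I deploy an oVirt environment quickly?
A: As follows.
1. Open the firewalls between the relevant hosts on the internal network. Cache the oVirt packages in a local yum repository and configure every node to use it.
2. Configure the engine; do not choose automatic firewall configuration.
3. Install vdsm and vdsm-cli manually on each node.
4. Add the hosts from the engine web page; do not choose automatic firewall configuration.

【Q02】 Running virsh prompts for user authentication (Please enter your authentication name). Judging by the error, this seems related to SASL being used once the vdsm service is configured. How do I fix it?
A: Use the tool "saslpasswd2 - set a user's sasl password" to create a user.
This is what the problem looks like:
# virsh list
Please enter your authentication name:
Please enter your password:
error: Failed to reconnect to the hypervisor
error: no valid connection
error: authentication failed: Failed to step SASL negotiation: -1 (SASL(-1): generic failure: All-whitespace username.)
Create a user:
# saslpasswd2 -a libvirt mYusernAme
Password: mYpasswOrd
Again (for verification): mYpasswOrd
The -a option takes the appname; here it must be the libvirt service.
The reason: when a host is added to oVirt, vdsm additionally secures libvirt with SASL.
Test again:
# virsh list
Please enter your authentication name: mYusernAme
Please enter your password:
 Id    Name                           State
----------------------------------------------------
 1     tvm-test-template              running
 2     tvm-test-clone                 running
 3     tvm-test-clone-from-snapshot   running
 4     testpool001                    running
 5     testpool007                    running
 6     testpool006                    running
As expected.

【Q03】 After triggering a VM reboot from the oVirt web UI, the UI reports the status change, but the VM console shows that the VM never rebooted. What is going on?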
A: The VM has no agent installed. On Linux the agent is ovirt-guest-agent.
Install ovirt-guest-agent. First install the ovirt-release35.rpm yum repository on the VM.
# yum -y install http://plain.resources.ovirt.org/pub/yum-repo/ovirt-release35.rpm
# yum -y install ovirt-guest-agent
Start the service:
# service ovirt-guest-agent start
# chkconfig ovirt-guest-agent on

【Q04】 When cloning a VM, the disk stays "not ready" for a very long time.
A: Situation: the VM being cloned has a large (2T) disk attached.
Check the processes running on its host, find qemu-img, check whether it is hung, and kill it manually.

【Q05】 Using the glusterfs service fails with "glusterfs: failed to get the 'volume file' from server".
A: First check the gluster versions and keep them consistent. The version installed after enabling the gluster service on a host comes from the oVirt repositories and may be the latest upstream release.
By default the oVirt installation uses ovirt-3.5-dependencies.repo, which currently pulls in glusterfs/3.7.
Install the new upstream version manually on the client:
# wget http://download.gluster.org/pub/gluster/glusterfs/3.7/LATEST/CentOS/epel-6/x86_64/glusterfs-3.7.4-2.el6.x86_64.rpm
# wget http://download.gluster.org/pub/gluster/glusterfs/3.7/LATEST/CentOS/epel-6/x86_64/glusterfs-libs-3.7.4-2.el6.x86_64.rpm
# wget http://download.gluster.org/pub/gluster/glusterfs/3.7/LATEST/CentOS/epel-6/x86_64/glusterfs-client-xlators-3.7.4-2.el6.x86_64.rpm
# wget http://download.gluster.org/pub/gluster/glusterfs/3.7/LATEST/CentOS/epel-6/x86_64/glusterfs-fuse-3.7.4-2.el6.x86_64.rpm
# rpm -ivh *.rpm

【Q06】 How do I configure glusterfs myself instead of letting oVirt manage it? How is the data domain mounted, and which optimizations are applied?
A:
First, the optimization that oVirt applies does the following:
---
After optimization, the volume configuration is adjusted as follows:
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: enable
nfs.disable: off
performance.readdir-ahead: on
---
Second, every host in the cluster must be able to resolve the gluster node names to IPs (hosts entries or DNS A records are needed on all hosts, not only on the host chosen in "New Domain").
Third, the firewall.
The example below is the firewall configuration after enabling the gluster service in oVirt:
[root@n86 network-scripts]# cat /etc/sysconfig/iptables
# oVirt default firewall configuration. Automatically generated by vdsm bootstrap script.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
# vdsm
-A INPUT -p tcp --dport 54321 -j ACCEPT
# rpc.statd
-A INPUT -p tcp --dport 111 -j ACCEPT
-A INPUT -p udp --dport 111 -j ACCEPT
# SSH
-A INPUT -p tcp --dport 22 -j ACCEPT
# snmp
-A INPUT -p udp --dport 161 -j ACCEPT
# libvirt tls
-A INPUT -p tcp --dport 16514 -j ACCEPT
# guest consoles
-A INPUT -p tcp -m multiport --dports 5900:6923 -j ACCEPT
# migration
-A INPUT -p tcp -m multiport --dports 49152:49216 -j ACCEPT
# glusterd
-A INPUT -p tcp -m tcp --dport 24007 -j ACCEPT
# gluster swift
-A INPUT -p tcp -m tcp --dport 8080 -j ACCEPT
# portmapper
-A INPUT -p tcp -m tcp --dport 38465 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 38466 -j ACCEPT
# nfs
-A INPUT -p tcp -m tcp --dport 38467 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 2049 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 38469 -j ACCEPT
# nrpe
-A INPUT -p tcp --dport 5666 -j ACCEPT
# status
-A INPUT -p tcp -m tcp --dport 39543 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 55863 -j ACCEPT
# nlockmgr
-A INPUT -p tcp -m tcp --dport 38468 -j ACCEPT
-A INPUT -p udp -m udp --dport 963 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 965 -j ACCEPT
# ctdbd
-A INPUT -p tcp -m tcp --dport 4379 -j ACCEPT
# smbd
-A INPUT -p tcp -m tcp --dport 139 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 445 -j ACCEPT
# Ports for gluster volume bricks (default 100 ports)
-A INPUT -p tcp -m tcp --dport 24009:24108 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 49152:49251 -j ACCEPT
# Reject any other input traffic
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -m physdev ! --physdev-is-bridged -j REJECT --reject-with icmp-host-prohibited
COMMIT

【Configuring the gluster service manually】
1) NIC configuration (on n86, n72 and n73)
Note: when a host joins oVirt, a bridge named ovirtmgmt is created automatically by default and attached to one of the ports (for example em1).
em1 -> 10.50.200.0/24
em2+em3=bond1 -> br1 -> 10.60.200.0/24
[root@n86 network-scripts]# cat ifcfg-em1
DEVICE=em1
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
IPADDR=10.50.200.86
PREFIX=24
GATEWAY=10.50.200.1
[root@n86 network-scripts]# cat ifcfg-em2
DEVICE=em2
MASTER=bond1
SLAVE=yes
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no
[root@n86 network-scripts]# cat ifcfg-em3
DEVICE=em3
MASTER=bond1
SLAVE=yes
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no
[root@n86 network-scripts]# cat ifcfg-bond1
DEVICE=bond1
BONDING_OPTS='mode=5 miimon=100'
BRIDGE=br1
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no
HOTPLUG=no
[root@n86 network-scripts]# cat ifcfg-br1
DEVICE=br1
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
IPADDR=10.60.200.86
NETMASK=255.255.255.0
BOOTPROTO=none
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
HOTPLUG=no

2) Storage configuration: glusterfs cluster on n86, n72, n73 (the example provides a 3-replica volume as the data domain)
【Partitioning the data disk】
If the device is already mounted, unmount it first and remove the existing filesystem.
yum install lvm2 xfsprogs -y
pvcreate /dev/sdb
vgcreate vg0 /dev/sdb
lvcreate -l 100%FREE -n lv01 vg0
mkfs.xfs -f -i size=512 /dev/vg0/lv01
mkdir /data
cat <<_EOF >>/etc/fstab
UUID=$(blkid /dev/vg0/lv01 |cut -d'"' -f2) /data xfs defaults 0 0
_EOF
mount -a
# df -h |grep data
/dev/mapper/vg0-lv01   16T   33M   16T   1% /data
【Configuring the service】
[root@n86 ~]# yum install glusterfs-server
[root@n86 ~]# service glusterd start
[root@n86 ~]# chkconfig glusterd on
【Configuring the cluster】
[root@n86 ~]# gluster peer probe 10.60.200.72
[root@n86 ~]# gluster peer probe 10.60.200.73
Create the brick directory on every cluster node:
[root@n86 ~]# mkdir /data/gv1/brick1 -p
【Providing the data domain】
Create volume gv1 as the master data domain:
[root@n86 ~]# gluster volume create gv1 replica 3 transport tcp \
10.60.200.86:/data/gv1/brick1 \
10.60.200.72:/data/gv1/brick1 \
10.60.200.73:/data/gv1/brick1
【Start】
[root@n86 ~]# gluster volume start gv1
【Check the current state】
[root@n86 ~]# gluster volume info
Volume Name: gv1
Type: Replicate
Volume ID: 32b1866c-1743-4dd9-9429-6ecfdfa168a2
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.60.200.86:/data/gv1/brick1
Brick2: 10.60.200.72:/data/gv1/brick1
Brick3: 10.60.200.73:/data/gv1/brick1
--- Configure the volume, using gv1 as the example:
gluster volume set gv1 diagnostics.count-fop-hits on
gluster volume set gv1 diagnostics.latency-measurement on
gluster volume set gv1 storage.owner-gid 36
gluster volume set gv1 storage.owner-uid 36
gluster volume set gv1 cluster.server-quorum-type server
gluster volume set gv1 cluster.quorum-type auto
gluster volume set gv1 network.remote-dio enable
gluster volume set gv1 cluster.eager-lock enable
gluster volume set gv1 performance.stat-prefetch off
gluster volume set gv1 performance.io-cache off
gluster volume set gv1 performance.read-ahead off
gluster volume set gv1 performance.quick-read off
gluster volume set gv1 auth.allow \*
gluster volume set gv1 user.cifs enable
gluster volume set gv1 nfs.disable off
--- End of volume configuration
Mount volume gv1 on one node to test:
[root@n93 ~]# mount -t glusterfs 10.60.200.86:/gv1 /mnt
[root@n93 ~]# df -h /mnt
Filesystem           Size  Used  Avail  Use%  Mounted on
10.60.200.86:/gv1    16T   39M   16T    1%    /mnt

3) Configure storage (Storage)
【Data domain】
On the oVirt configuration page: "New Domain"
Name: data-gv1
Domain function: DATA/GlusterFS
Use host: pick any host
Path: 10.50.200.72:/gv1
Mount options: backupvolfile-server=10.50.200.73,backupvolfile-server=10.50.200.86

On the oVirt configuration page: "New Domain"
Name: data-gv1-bak
Domain function: DATA/NFS
Use host: pick any host
Path: 10.60.200.93:/data/ovirt/data
【ISO domain】
On the oVirt configuration page: "New Domain"
Name: iso
Domain function: ISO/NFS
Use host: pick any host
Path: 10.60.200.93:/data/ovirt/iso
【Export domain】
On the oVirt configuration page: "New Domain"
Name: export
Domain function: EXPORT/NFS
Use host: pick any host
Path: 10.60.200.93:/data/ovirt/export

【Q07】 The UI reports "Error while executing action Add Storage Connection: Problem while trying to mount target".
A: When filling in "Path", make sure there is no trailing space, otherwise the mount fails. The cause can be found in /var/log/vdsm.log on the node doing the mount. For example, the log shows:
Storage.StorageServer.MountConnection::(connect) Mount failed: (32, ';mount.nfs: access denied by server while mounting 10.50.200.93:/data/ovirt/iso \n')
【Wrong】 Path: 10.50.200.93:/data/ovirt/iso [a space follows "iso"]
【Right】 Path: 10.50.200.93:/data/ovirt/iso[no space after "iso"]
Assuming the ISO domain is already attached and we want to add OS images to it, here is a small trick:
Look at the path on the NFS server (10.50.200.93) that holds the ISO domain:
# pwd
/data/ovirt/iso/62a1b5e0-730f-47db-8057-3ed0fda7b83a/images/11111111-1111-1111-1111-111111111111
We can cd straight into this directory, upload the OS files here, and fix the ownership:
# chown -R 36:36 .
Back in the web UI, the images now show up in the ISO domain.

【Q08】 Configuring ovirt-hosted-engine-setup produces all kinds of errors. How do I resolve them?
A:
1) Firewall related
If services such as DNS run on the same physical machine, the firewall configuration is rewritten while the VM is configured and again when the host is added to the cluster, which breaks DNS resolution.
################## While configuring the VM ##################
【(1) Continue setup - VM installation is complete】
The firewall configuration has been rewritten to allow only ssh and vnc:
[root@n93 ~]# cat /etc/sysconfig/iptables
# Generated by ovirt-hosted-engine-setup installer
#filtering rules
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -p icmp -m icmp --icmp-type any -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 5900 -j ACCEPT
-A INPUT -p udp -m state --state NEW -m udp --dport 5900 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 5901 -j ACCEPT
-A INPUT -p udp -m state --state NEW -m udp --dport 5901 -j ACCEPT
#drop all rule
-A INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
################## While adding the host to the cluster ##################
【Enter the name of the cluster to which you want to add the host (Default) [Default]: 】
The firewall is rewritten to allow the vdsm-related services:
[root@n93 ~]# cat /etc/sysconfig/iptables
# oVirt default firewall configuration. Automatically generated by vdsm bootstrap script.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
# vdsm
-A INPUT -p tcp --dport 54321 -j ACCEPT
# rpc.statd
-A INPUT -p tcp --dport 111 -j ACCEPT
-A INPUT -p udp --dport 111 -j ACCEPT
# SSH
-A INPUT -p tcp --dport 22 -j ACCEPT
# snmp
-A INPUT -p udp --dport 161 -j ACCEPT
# libvirt tls
-A INPUT -p tcp --dport 16514 -j ACCEPT
# guest consoles
-A INPUT -p tcp -m multiport --dports 5900:6923 -j ACCEPT
# migration
-A INPUT -p tcp -m multiport --dports 49152:49216 -j ACCEPT
# Reject any other input traffic
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -m physdev ! --physdev-is-bridged -j REJECT --reject-with icmp-host-prohibited
COMMIT
Solution: use other hosts to provide the DNS and NFS services.

2) DNS related
################## While adding the host to the cluster ##################
Error:
To continue make a selection from the options below:
(1) Continue setup - engine installation is complete
(2) Power off and restart the VM
(3) Abort setup
(4) Destroy VM and abort setup
(1, 2, 3, 4)[1]:
[ INFO ] Engine replied: DB Up!Welcome to Health Status!
Enter the name of the cluster to which you want to add the host (Default) [Default]:
[ ERROR ] Cannot automatically add the host to cluster Default: Host address must be a FQDN or a valid IP address
[ ERROR ] Failed to execute stage 'Closing up': Cannot add the host to cluster Default
A likely cause of this exception: the host name is resolved through a DNS server rather than /etc/hosts, while we only configured hosts entries, so the name cannot be resolved.
The log shows a 400 error:
2015-09-28 15:28:59 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND Enter the name of the cluster to which you want to add the host (Default) [Default]:
2015-09-28 15:29:34 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._closeup:626 Adding the host to the cluster
2015-09-28 15:29:36 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._closeup:654 Cannot add the host to cluster Default
Traceback (most recent call last):
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/engine/add_host.py", line 645, in _closeup
    otopicons.NetEnv.IPTABLES_ENABLE
  File "/usr/lib/python2.6/site-packages/ovirtsdk/infrastructure/brokers.py", line 13280, in add
    headers={"Expect":expect, "Correlation-Id":correlation_id}
  File "/usr/lib/python2.6/site-packages/ovirtsdk/infrastructure/proxy.py", line 88, in add
    return self.request('POST', url, body, headers)
  File "/usr/lib/python2.6/site-packages/ovirtsdk/infrastructure/proxy.py", line 118, in request
    persistent_auth=self._persistent_auth)
  File "/usr/lib/python2.6/site-packages/ovirtsdk/infrastructure/proxy.py", line 146, in __doRequest
    persistent_auth=persistent_auth
  File "/usr/lib/python2.6/site-packages/ovirtsdk/web/connection.py", line 134, in doRequest
    raise RequestError, response
RequestError:
status: 400
reason: Bad Request
detail: Host address must be a FQDN or a valid IP address
2015-09-28 15:29:36 ERROR otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._closeup:662 Cannot automatically add the host to cluster Default: Host address must be a FQDN or a valid IP address
2015-09-28 15:29:36 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/engine/add_host.py", line 669, in _closeup
    cluster=cluster_name,
RuntimeError: Cannot add the host to cluster Default
2015-09-28 15:29:36 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Closing up': Cannot add the host to cluster Default
Guess: the host is added to the cluster by host name, and that host name has to be resolvable through a DNS service.
################## While adding the host to the cluster ##################
Error:
[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add hosted_engine_1 to the manager
Please shutdown the VM allowing the system to launch it as a monitored service.
The system will wait until the VM is down.
Going back through the setup steps to find where this string is configured: vm_hosted_e01
Note: this prompt sets the value of the "Name" field that the host gets in the web UI.
【Setup steps】
(omitted)
Please specify an alias for the Hosted Engine image [hosted_engine]:
Enter the name which will be used to identify this host inside the Administrator Portal [hosted_engine_1]:
(omitted)
[ INFO ] Stage: Setup validation
--== CONFIGURATION PREVIEW ==--
Engine FQDN : e01.test
Bridge name : ovirtmgmt
SSH daemon port : 22
Firewall manager : iptables
Gateway address : 10.50.200.1
Host name for web application : hosted_engine_1
Host ID : 1
Image alias : hosted_engine
Image size GB : 40
Storage connection : 10.50.200.93:/data/ovirt/images
Console type : vnc
Memory size MB : 8192
MAC address : 00:16:3e:7b:18:b9
Boot type : cdrom
Number of CPUs : 4
ISO image (for cdrom boot) : /data/ovirt/iso/CentOS-6.5-x86_64-bin-DVD1.iso
CPU Type : model_SandyBridge
Please confirm installation settings (Yes, No)[Yes]:
(omitted)
The VM has been started. Install the OS and shut down or reboot it.
To continue please make a selection:
(1) Continue setup - VM installation is complete
(2) Reboot the VM and restart installation
(3) Abort setup
(4) Destroy VM and abort setup
(1, 2, 3, 4)[1]:
(omitted)
The VM has been started. Install the OS and shut down or reboot it.
To continue please make a selection:
(1) Continue setup - VM installation is complete
(2) Reboot the VM and restart installation
(3) Abort setup
(4) Destroy VM and abort setup
(1, 2, 3, 4)[1]:
Waiting for VM to shut down...
[ INFO ] Creating VM
(omitted)
Please install and setup the engine in the VM.
You may also be interested in installing ovirt-guest-agent-common package in the VM.
To continue make a selection from the options below:
(1) Continue setup - engine installation is complete
(2) Power off and restart the VM
(3) Abort setup
(4) Destroy VM and abort setup
(1, 2, 3, 4)[1]:
[ INFO ] Engine replied: DB Up!Welcome to Health Status!
Enter the name of the cluster to which you want to add the host (Default) [Default]:
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO ] Still waiting for VDSM host to become operational...
[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add hosted_engine_1 to the manager
Please shutdown the VM allowing the system to launch it as a monitored service.
The system will wait until the VM is down.
Check the log:
2015-09-29 05:05:47,858 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.TimeBoundPollVDSCommand] (org.ovirt.thread.pool-8-thread-3) [5dad09df] Command TimeBoundPollVDSCommand(HostName = hosted_engine_1, HostId = 54878c22-956f-4102-91c9-f9b15e467814) execution failed. Exception: VDSNetworkException: VDSGenericException: VDSNetworkException: Timeout during xml-rpc call
2015-09-29 05:05:47,860 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.TimeBoundPollVDSCommand] (org.ovirt.thread.pool-8-thread-3) [5dad09df] Timeout waiting for VDSM response.
java.util.concurrent.TimeoutException
2015-09-29 05:05:47,867 ERROR [org.ovirt.engine.core.bll.InstallVdsInternalCommand] (org.ovirt.thread.pool-8-thread-3) [5dad09df] Host installation failed for host 54878c22-956f-4102-91c9-f9b15e467814, hosted_engine_1.: org.ovirt.engine.core.bll.VdsCommand$VdsInstallException: Host not reachable
There it is: host not reachable. The host "name" we configured has to be resolvable.
Conclusion: set up a separate DNS server that is not affected by the setup and let it provide name resolution for the cluster.
For the record, the earlier failures happened when the DNS service ran on the same host A on which I stepped through ovirt-hosted-engine-setup.
The procedure that eventually worked was to run the DNS service on host B with everything else unchanged; the setup then completed successfully.
I observed that host A has its firewall rewritten twice during the setup, which may have had an effect.
The first time: around the VM installation
The second time: around the engine installation

【Q09】 I force-removed a cluster, but 3 hosts were left behind. Removing them again fails with "No up server in cluster". How do I resolve this?
A: The correct way is to remove the hosts while they are in a normal state, one by one, until only the last host remains. Likewise, for the problem above, first try to activate one of the hosts, remove the other 2 hosts, and finally remove that one.

【Q10】 If the installation fails, check the logs. What if some packages failed to install, with a message like: 2015-11-04 17:13:42 ERROR otopi.plugins.otopi.packagers.yumpackager yumpackager.error:97 Yum [u'4:perl-libs-5.10.1-136.el6.i686 requires perl = 4:5.10.1-136.el6']
A: Install vdsm and vdsm-cli manually on the node as a test. If that confirms an exception like this:
Error: Package: 4:perl-libs-5.10.1-136.el6.i686 (base)
           Requires: perl = 4:5.10.1-136.el6
           Installed: 4:perl-5.10.1-141.el6.x86_64 (@base)
               perl = 4:5.10.1-141.el6
           Available: 4:perl-5.10.1-136.el6.x86_64 (base)
               perl = 4:5.10.1-136.el6
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
then downgrade the packages:
# yum downgrade perl*

【Q11】 How do I change the password of the ovirt-engine administrator admin?
A: Use the tool ovirt-aaa-jdbc-tool.
[root@e01 ~]# ovirt-aaa-jdbc-tool user password-reset admin
Password:
updating user admin...
user updated successfully
Reference: http://www.ovirt.org/Features/AAA_JDBC#Password_management

【Q12】 How do I join ovirt-engine to a domain?
A: Test: joining the existing office-network AD.
[root@engine ~]# engine-manage-domains add --provider=ad --domain=test.org --user=ovirtmgr
Enter password:
The domain test.org has been added to the engine as an authentication source but no users from that domain have been granted permissions within the oVirt Manager.
Users from this domain can be granted permissions by editing the domain using action edit and specifying --add-permissions or from the Web administration interface logging in as admin@internal user.
oVirt Engine restart is required in order for the changes to take place (service ovirt-engine restart).
Manage Domains completed successfully
As the output says, the engine must be restarted after adding the domain user. The option "--add-permissions" can be given when adding to grant system permissions right away; it can also be applied later with edit.
[root@engine ~]# service ovirt-engine restart
List the domains:
[root@engine ~]# engine-manage-domains list
Domain: test.org
User name: [email protected]
Manage Domains completed successfully
Edit the permissions:
[root@engine ~]# engine-manage-domains edit --provider=ad --domain=test.org --user=ovirtmgr --add-permissions
Enter password:
Successfully added domain test.org. oVirt Engine restart is required in order for the changes to take place (service ovirt-engine restart).
Manage Domains completed successfully
[root@engine ~]# service ovirt-engine restart
Log in to oVirt and look at this user: it has the SuperUser role, the same as admin@internal.

【Q13】 Logging in to the oVirt page fails with: Cannot Login. User Account is Disabled or Locked, Please contact your system administrator.
A: The user is locked, probably after 3 wrong password attempts.
Unlock it:
[root@e01 ~]# ovirt-aaa-jdbc-tool user unlock admin
updating user admin...
user updated successfully

【Q14】 After host1 goes offline, the VMs on it are shown with a question mark (?, unknown state) and cannot be migrated to host2. How do I resolve this?
A: Right-click the offline node host1 and choose "Confirm Host has been Rebooted".
The dialog warns: performing this operation on a host that was not properly rebooted manually may cause storage corruption if the VMs are started on multiple hosts.
Confirm the operation.
Result: as expected. The VMs are automatically migrated to host2.
Note: if host1 may only have had a network failure and was never actually rebooted or powered off, reboot it once before bringing it back online.

【Q15】 How do I configure email alerts?
A: Use the ovirt-engine-notifier service to send email notifications for selected events.
1) Configure the service
[root@engine ~]# vim /usr/share/ovirt-engine/services/ovirt-engine-notifier/ovirt-engine-notifier.conf
MAIL_SERVER=smtp.xxx.com
MAIL_PORT=465
[email protected]
MAIL_PASSWORD=xxxx
MAIL_SMTP_ENCRYPTION=ssl
HTML_MESSAGE_FORMAT=true
[email protected]
[root@engine ~]# chkconfig ovirt-engine-notifier on
[root@engine ~]# service ovirt-engine-notifier start
2) Configure the user
On the ovirt-engine page choose "System" - "Users".
Select the user (admin) and, in the lower pane, choose "Event Notifier" - "Manage Events".
Select the events to alert on and set the mail recipient.
Restart the service:
[root@engine ~]# service ovirt-engine-notifier restart
3) Test
Migrate a VM; the mail arrives after a delay of 1-3 minutes.
Check the log:
[root@engine ~]# tail /var/log/ovirt-engine/notifier/notifier.log
2015-12-24 10:40:57,692 INFO [org.ovirt.engine.core.notifier.EngineMonitorService initServerUrl] Engine health servlet URL is "http://e01.test:80/ovirt-engine/services/health".
2015-12-24 10:43:28,813 INFO [org.ovirt.engine.core.notifier.transport.smtp.Smtp idle] Send mail subject='alertMessage (e01.test), [Migration started (VM: tttttt, Source: n34.test, Destination: n33.test, User: admin@internal).]' to='[email protected]'
2015-12-24 10:43:31,090 INFO [org.ovirt.engine.core.notifier.transport.smtp.Smtp idle] Send mail subject='resolveMessage (e01.test), [Migration completed (VM: tttttt, Source: n34.test, Destination: n33.test, Duration: 1 minute 12 seconds, Total: 1 minute 12 seconds, Actual downtime: (N/A))]' to='[email protected]'
Received mail 1:
Subject: alertMessage (e01.test), [Migration started (VM: tttttt,Source: n34.test, Destination: n33.test, User: admin@internal).]
From: xxx
Date: Thursday, 24 December 2015, 10:43 AM
To: xxx
Body:
Time:2015-12-24 10:41:44.999
Message:Migration started (VM: tttttt, Source: n34.test, Destination: n33.test, User: admin@internal).
Severity:NORMAL
User Name: admin@internal
VM Name: tttttt
Host Name: n34.test
Template Name: tpl-m1
Data Center Name: SZ
Received mail 2:
From: xxx
Date: Thursday, 24 December 2015, 10:43 AM
To: xxx
Subject: resolveMessage (e01.test), [Migration completed (VM: tttttt,Source: n34.test, Destination: n33.test, Duration: 1 minute 12 seconds,Total: 1 minute 12 seconds, Actual downtime: (N/A))]
Body:
Time:2015-12-24 10:42:57.125
Message:Migration completed (VM: tttttt, Source: n34.test, Destination: n33.test, Duration: 1 minute 12 seconds, Total: 1 minute 12 seconds, Actual downtime: (N/A))
Severity:NORMAL
User Name: admin@internal
VM Name: tttttt
Host Name: n34.test
Template Name: tpl-m1
Data Center Name: SZ

【Q16】 How do I upgrade to a new version?
A: Follow the official documentation. The 3.5 -> 3.6 upgrade needs particular care.
A data center created on a 3.6 engine is 3.6-compatible by default; creating a 3.5 cluster in it fails with a compatibility error.
el6 systems are only supported as vdsm hosts up to version 3.5. A 3.6 vdsm host requires an el7 system, because the 3.6 vdsm rpm packages exist only in the el7 directory of the official yum repository.
http://resources.ovirt.org/pub/ovirt-3.6/rpm/el7/noarch/
See the system requirements on the official site:
http://www.ovirt.org/Download
-----------------------------------------------------
Minimum Hardware/Software
  4 GB memory
  20 GB disk space
Optional Hardware
  Network storage
Recommended browsers
  Latest Mozilla Firefox
  Latest Google Chrome
  IE10 and above
Supported Manager
  Fedora 22 (3.6 only)
  CentOS Linux 6.7, 7.2
  Red Hat Enterprise Linux 6.7, 7.2
  Scientific Linux 6.7, 7.2
Supported Hosts
  Fedora 21, 22
  CentOS Linux 6.7 (3.5 only), 7.2
  Red Hat Enterprise Linux 6.7 (3.5 only), 7.2
  Scientific Linux 6.7 (3.5 only), 7.2
-----------------------------------------------------

【Q17】 host xxx did not satisfy internal filter Memory because its swap value was illegal.
A: For details, see this thread on the mailing list:
ovirt users mailing list: http://lists.ovirt.org/pipermail/users/2015-December/036891.html

【Q18】 How do I seal a system into a template?
A:
----------------------- centos6 ------------------------------
1) Configure the epel repository
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-6.repo
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-6.repo
yum makecache
2) Configure ovirt-guest-agent
yum -y install http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
yum -y install ovirt-guest-agent
service ovirt-guest-agent start
chkconfig ovirt-guest-agent on
Check the result after rebooting the VM: as expected.
3) Configure cloud-init
yum -y install cloud-init
echo 'datasource_list: ["NoCloud", "ConfigDrive"]' >>/etc/cloud/cloud.cfg
Verify after shutting down the VM: as expected.
4) Manually clean up configuration that could cause conflicts when VMs are created from the template
--- Clean up cloud-init ---
rm /var/lib/cloud -fr
--- Clean up the hostname ---
cat <<'_EOF' >/etc/sysconfig/network
NETWORKING=yes
HOSTNAME=localhost.localdomain
_EOF
--- Clean up the NIC configuration ---
sed -i -e '/UUID/d' -e '/HWADDR/d' -e '/ONBOOT/d' -e '/BOOTPROTO/d' \
-e '/IPADDR/d' -e '/NETMASK/d' -e '/GATEWAY/d' \
-e '/TYPE=Ethernet/a\ONBOOT=no\nBOOTPROTO=dhcp' /etc/sysconfig/network-scripts/ifcfg-eth*
--- Clean up ssh ---
rm -f /etc/ssh/ssh_host_*
rm /root/.ssh -fr
--- Clean up logs ---
find /var/log -type f -delete
find /root -type f ! -name ".*" -delete
--- Last step ---
(Note: sys-unconfig can be run directly here. Besides cleaning up udev, it also makes several configuration services run on the next boot, for example password, network and time setup; see the man page for details. Since I do not want to reset the root password or the other services, I finish with the steps below instead.)
--- Clean up udev and history ---
rm /etc/udev/rules.d/*-persistent-*.rules -f
echo >~/.bash_history
history -c
--- Power off ---
# poweroff
----------------------- centos7 ------------------------------
1) Configure the epel repository
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
yum makecache
2) Install ovirt-guest-agent
yum -y install ovirt-guest-agent
systemctl start ovirt-guest-agent.service
systemctl enable ovirt-guest-agent.service
3) Install cloud-init
yum -y install cloud-init
echo 'datasource_list: ["NoCloud", "ConfigDrive"]' >>/etc/cloud/cloud.cfg
4) Manually clean up configuration that could cause conflicts when VMs are created from the template
--- Clean up cloud-init ---
rm /var/lib/cloud -fr
--- Clean up the hostname ---
cat <<'_EOF' >/etc/hostname
localhost.localdomain
_EOF
--- Clean up the NIC configuration ---
sed -i -e '/UUID/d' -e '/ONBOOT/d' -e '/BOOTPROTO/d' -e '/IPADDR/d' -e '/NETMASK/d' -e '/GATEWAY/d' \
-e '/TYPE=Ethernet/a\ONBOOT=no\nBOOTPROTO=dhcp' /etc/sysconfig/network-scripts/ifcfg-eth*
--- Clean up ssh ---
rm -f /etc/ssh/ssh_host_* /root/.ssh/*
--- Clean up logs ---
rm -f /root/anaconda-ks.cfg
find /var/log -type f -delete
--- Last step ---
(Note: sys-unconfig can be run directly here. Besides cleaning up udev, it also makes several configuration services run on the next boot, for example password, network and time setup; see the man page for details. Since I do not want to reset the root password or the other services, I finish with the steps below instead.)
--- Clean up udev and history ---
rm /etc/udev/rules.d/*-persistent-*.rules -f
echo >~/.bash_history
history -c
--- Power off ---
# poweroff

【Q19】 A heartbeat timeout alert appears: Heartbeat exeeded
A:
Event records on the engine page:
2016-1-6 11:31:33 AM Host n33.test power management was verified successfully.
2016-1-6 11:31:33 AM Status of host n33.test was set to Up.
2016-1-6 11:31:33 AM Executing power management status on Host n33.test using Proxy Host n34.test and Fence Agent ipmilan:10.50.200.43.
2016-1-6 11:31:30 AM Invalid status on Data Center SZ.
Setting Data Center status to Non Responsive (On host n33.test, Error: Network error during communication with the Host.).
2016-1-6 11:31:30 AM Host n33.test is not responding. It will stay in Connecting state for a grace period of 80 seconds and after that an attempt to fence the host will be issued.
2016-1-6 11:31:30 AM VDSM n33.test command failed: Heartbeat exeeded
The entries in engine.log agree: the heartbeat timed out while the engine was checking the node.
WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (org.ovirt.thread.pool-8-thread-27) [] Host 'n33.test' is not responding. It will stay in Connecting state for a grace period of 80 seconds and after that an attempt to fence the host will be issued.
ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand] (DefaultQuartzScheduler_Worker-25) [] Command 'ListVDSCommand(HostName = n33.test, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='999be037-0298-4506-afb6-665b6f00db2e', vds='Host[n33.test,999be037-0298-4506-afb6-665b6f00db2e]'})' execution failed: VDSGenericException: VDSNetworkException: Heartbeat exeeded
Try adjusting the heartbeat timeout interval:
[root@e01 ~]# engine-config -s vdsHeartbeatInSeconds=20
[root@e01 ~]# service ovirt-engine restart
This did not help; the alerts kept coming.
Then I noticed that the clocks on the engine and the node differed by 3-5 minutes. Checking the ntp server, a manual sync command failed, so the ntp server itself was at fault.
----
ntpdate xxx
Error: no server suitable for synchronization found
Conclusion: none of the public ntp servers could be reached for data.
The definition below makes the NTP server synchronize with itself: if none of the servers defined in /etc/ntp.conf is available, it serves its local time to the ntp clients.
server 127.127.1.0
fudge 127.127.1.0 stratum 8
----
After adjusting the ntp server configuration and synchronizing the time, the problem was resolved.

【Q20】 On UI pages such as the new storage dialog, input boxes cannot be typed into. What can be done?
A: Try the IE browser. Every case I have run into so far was a browser compatibility problem.

【Q21】 An oVirt node behaves abnormally on a ××× network.
A: Look at the configuration file rule-ovirtmgmt and analyze what its presence does to the network.
Example:
1. Current state
[root@n33 network-scripts]# cat rule-ovirtmgmt
# Generated by VDSM version 4.16.27-0.el6
from 10.50.200.0/24 table 3232235797
from all to 10.50.200.0/24 dev ovirtmgmt table 3232235797
The routes for table 3232235797 have to be read together with this configuration file:
[root@n33 network-scripts]# cat route-ovirtmgmt
# Generated by VDSM version 4.16.27-0.el6
0.0.0.0/0 via 10.50.200.1 dev ovirtmgmt table 3232235797
10.50.200.0/24 via 10.50.200.21 dev ovirtmgmt table 3232235797
[root@n33 network-scripts]# ip rule
0: from all lookup local
32764: from all to 10.50.200.0/24 iif ovirtmgmt lookup 3232235797
32765: from 10.50.200.0/24 lookup 3232235797
32766: from all lookup main
32767: from all lookup default
2. Possible symptom: even with a static route added manually, traffic still goes through the default gateway.
3. Worked example
---------------------------------------------------------------------------------------------------
Data flow:
10.50.200.100/24(server) ->10.50.200.1/24(gateway) ->10.50.200.254/24(*** server/172.16.17.0) <-> 172.16.17.6(client)
---------------------------------------------------------------------------------------------------
1)【Add a static route on 10.50.200.100】
[root@n33 network-scripts]# ip route add 172.16.0.0/16 via 10.50.200.254 dev ovirtmgmt
[root@n33 network-scripts]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.50.200.0     0.0.0.0         255.255.255.0   U     0      0        0 ovirtmgmt
172.16.0.0      10.50.200.254   255.255.0.0     UG    0      0        0 ovirtmgmt
169.254.0.0     0.0.0.0         255.255.0.0     U     1064   0        0 ovirtwan
169.254.0.0     0.0.0.0         255.255.0.0     U     1065   0        0 ovirtmgmt
0.0.0.0         10.50.200.1     0.0.0.0         UG    0      0        0 ovirtmgmt
2)【Start an http service on the client for testing】
[on 172.16.17.6] python -m SimpleHTTPServer 11111
3)【Test】
[on 10.50.200.100] curl -I http://172.16.17.6:11111/`hostname`
4)【Result analysis】
Expected result:
[on 172.16.17.6] 10.50.200.100 - - [date-time] "HEAD /n33.test.com HTTP/1.1" 404 -
Actual result:
[on 172.16.17.6] 10.50.200.1 - - [date-time] "HEAD /n33.test.com HTTP/1.1" 404 -
5)【Workarounds tested】
a) Delete the rule
ip rule del from 10.50.200.0/24 lookup 3232235797
b) Add a rule
ip rule add from 10.50.200.0/24 to 172.16.0.0/16 lookup main
6)【Cause analysis】
Read this together with the files route-ovirtmgmt and rule-ovirtmgmt shown above.
In the ip rule output the first column is the priority; the smaller the value, the earlier the rule is matched. The request in the example is therefore handled like this:
10.50.200.100 -> 172.16.17.6
Intended route (table main): forward to 10.50.200.254
Rule matched first: 32765, which hands the lookup to table 3232235797 before rule 32766 ever consults main
Route found in table 3232235797: forward to 10.50.200.1
Packet sent out
7)【Conclusion】
After adding a static route, also add a matching rule so that the intended traffic actually uses that static route.
a) Command line
ip route add 172.16.0.0/16 via 10.50.200.254 dev ovirtmgmt
ip rule add from 10.50.200.0/24 to 172.16.0.0/16 lookup main
b) Configuration
echo '172.16.0.0/16 via 10.50.200.254 dev ovirtmgmt' >>route-ovirtmgmt
echo 'from 10.50.200.0/24 to 172.16.0.0/16 table main' >>rule-ovirtmgmt
The final configuration files:
[root@n33 network-scripts]# cat route-ovirtmgmt
# Generated by VDSM version 4.16.27-0.el6
0.0.0.0/0 via 10.50.200.1 dev ovirtmgmt table 3232235797
10.50.200.0/24 via 10.50.200.21 dev ovirtmgmt table 3232235797
172.16.0.0/16 via 10.50.200.254 dev ovirtmgmt
[root@n33 network-scripts]# cat rule-ovirtmgmt
# Generated by VDSM version 4.16.27-0.el6
from 10.50.200.0/24 table 3232235797
from all to 10.50.200.0/24 dev ovirtmgmt table 3232235797
from 10.50.200.0/24 to 172.16.0.0/16 table main
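
The echo commands in the conclusion append a duplicate line every time they are re-run. The following is only a minimal sketch of how the same two lines could be persisted idempotently. The names add_line_once, persist_route_rule and NETSCRIPTS_DIR are hypothetical helpers made up for this illustration and are not part of oVirt or VDSM. Both files carry a "Generated by VDSM" header, so it is probably worth re-checking them after any network change made through the engine.

```shell
#!/bin/sh
# Sketch: persist a static route together with its matching policy rule.
# NETSCRIPTS_DIR, add_line_once and persist_route_rule are hypothetical names.

NETSCRIPTS_DIR="${NETSCRIPTS_DIR:-/etc/sysconfig/network-scripts}"

# Append LINE to FILE only if that exact line is not already present.
add_line_once() {
    file="$1"
    line="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >>"$file"
}

# persist_route_rule NETWORK DEST_CIDR GATEWAY SRC_CIDR
# Writes the route into route-NETWORK and the rule into rule-NETWORK.
persist_route_rule() {
    net="$1"
    dest="$2"
    gw="$3"
    src="$4"
    add_line_once "$NETSCRIPTS_DIR/route-$net" "$dest via $gw dev $net"
    add_line_once "$NETSCRIPTS_DIR/rule-$net" "from $src to $dest table main"
}

# Example call for the case above (run as root on the node):
# persist_route_rule ovirtmgmt 172.16.0.0/16 10.50.200.254 10.50.200.0/24
```

The helper only edits the files; the running system still needs the ip route add and ip rule add commands from a), or a restart of the network service, before the change takes effect.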