Basic design principle 1:
Avoid using static host files for host name resolution; they lead to inconsistency, which makes troubleshooting difficult.
Basic design principle 2:
In blade environments, divide hosts over all blade chassis and never exceed four hosts per chassis, to avoid having all primary nodes in a single chassis. If every primary node sat in one chassis, the failure of that chassis would take down HA for the whole cluster.
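PS: According to the rules by which HA primary nodes are elected, the first five hosts added to the cluster become the primary nodes, so it is best to add hosts to the cluster by picking them from different chassis in turn.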
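The ordering described above can be sketched in a few lines of Python. This is only an illustration: the chassis and host names are hypothetical, and the script merely computes an order in which to add hosts; it does not talk to vCenter.

```python
from itertools import zip_longest

MAX_HOSTS_PER_CHASSIS = 4   # guideline from principle 2
PRIMARY_NODE_COUNT = 5      # the first five hosts added become primary nodes


def join_order(chassis_hosts):
    """Interleave hosts so consecutive additions come from different chassis."""
    for chassis, hosts in chassis_hosts.items():
        if len(hosts) > MAX_HOSTS_PER_CHASSIS:
            raise ValueError(f"{chassis} holds {len(hosts)} hosts of this cluster")
    order = []
    # Take one host from every chassis per round, skipping exhausted chassis.
    for one_round in zip_longest(*chassis_hosts.values()):
        order.extend(host for host in one_round if host is not None)
    return order


# Hypothetical inventory: two chassis with three hosts each.
chassis_hosts = {
    "chassis-a": ["esx-a1", "esx-a2", "esx-a3"],
    "chassis-b": ["esx-b1", "esx-b2", "esx-b3"],
}
order = join_order(chassis_hosts)
primaries = order[:PRIMARY_NODE_COUNT]
print("Add hosts in this order:", order)
print("Expected primary nodes:", primaries)
```

With this order the expected primary nodes are spread over both chassis, so losing one chassis still leaves at least one primary node.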
Basic design principle 3:
For network-based storage (iSCSI, NFS, FCoE) it is recommended (pre-vSphere 4.0 Update 2) to set the isolation response to “Shut down” or “Power off”, which avoids a split-brain scenario. It is also recommended to have a secondary Service Console (ESX) or Management Network (ESXi) running on the same vSwitch as the storage network to detect a storage outage and avoid false positives for isolation detection.
Basic design principle 4:
Keep das.failuredetectiontime low for fast responses to failures. If an additional isolation validation address has been added through “das.isolationaddress”, add 5000 ms to the default “das.failuredetectiontime” of 15000 ms.
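As a quick worked example of that arithmetic, the sketch below builds the pair of advanced options for a cluster with one extra isolation address. The helper name and the IP address (taken from the documentation range 192.0.2.0/24) are assumptions made for illustration; only the two option names and the 15000/5000 values come from the principle above.

```python
DEFAULT_FAILURE_DETECTION_MS = 15000   # default das.failuredetectiontime
EXTRA_FOR_ISOLATION_ADDRESS_MS = 5000  # added when das.isolationaddress is set


def ha_advanced_options(isolation_address=None):
    """Return the HA advanced options suggested by principle 4."""
    detection_ms = DEFAULT_FAILURE_DETECTION_MS
    options = {}
    if isolation_address is not None:
        detection_ms += EXTRA_FOR_ISOLATION_ADDRESS_MS
        options["das.isolationaddress"] = isolation_address
    options["das.failuredetectiontime"] = str(detection_ms)
    return options


# Hypothetical extra isolation address.
print(ha_advanced_options("192.0.2.1"))
print(ha_advanced_options())
```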
Basic design principle 5:
Be really careful with reservations. If there is no need to have them on a per-virtual-machine basis, don't configure them, especially when using the “Host Failures Cluster Tolerates” admission control policy. If reservations are needed, resort to resource-pool-based reservations.
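PS: Resource-pool-level reservations apply to the pool as a whole and do not affect vCenter's estimate of the number of slots. Virtual-machine-level reservations feed directly into the slot estimate: a single VM with an excessively high reservation is enough to greatly reduce the number of slots available in the cluster.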
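The effect of one large per-VM reservation can be illustrated with a simplified model. The sketch assumes the slot is sized by the largest CPU reservation and the largest memory reservation among the powered-on VMs; the floor values used when nothing is reserved, and all capacities, are hypothetical, and memory overhead is ignored.

```python
def slot_size(vm_reservations, floor_cpu_mhz=256, floor_mem_mb=256):
    """Slot = largest CPU and memory reservation (simplified; floors are assumed)."""
    cpu = max([floor_cpu_mhz] + [vm["cpu_mhz"] for vm in vm_reservations])
    mem = max([floor_mem_mb] + [vm["mem_mb"] for vm in vm_reservations])
    return cpu, mem


def slots_per_host(host, slot):
    """A host holds as many slots as its scarcer resource allows."""
    slot_cpu, slot_mem = slot
    return min(host["cpu_mhz"] // slot_cpu, host["mem_mb"] // slot_mem)


host = {"cpu_mhz": 24000, "mem_mb": 65536}            # hypothetical host
no_reservations = [{"cpu_mhz": 0, "mem_mb": 0}] * 10  # ten VMs, nothing reserved
one_big_vm = no_reservations + [{"cpu_mhz": 4000, "mem_mb": 8192}]

print(slots_per_host(host, slot_size(no_reservations)))  # many small slots
print(slots_per_host(host, slot_size(one_big_vm)))       # far fewer, larger slots
```

In this model a single VM with a 4000 MHz / 8192 MB reservation shrinks the host from 93 slots to 6, which is the effect the note above warns about.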
Basic design principle 6:
Avoid using advanced settings (“das.slotCpuInMHz”, “das.slotMemInMB”) to decrease the slot size, as it could lead to more downtime and adds an extra layer of complexity. If there is a large discrepancy in size and reservations are set, it might help to put similar-size virtual machines into their own cluster.
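PS: Forcing the slot size can fragment resources in the cluster. When HA restarts a virtual machine, the cluster as a whole may show enough slots while no single host has enough slots to power on that virtual machine. In that case HA has to ask DRS to defragment the resources, which inevitably increases the time HA needs to recover the virtual machine. vCenter also does not guarantee that defragmentation will leave any one host with enough free slots to power on the given virtual machine.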
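The fragmentation case can be made concrete with a small check. This is a sketch under the assumption that a large VM needs several of the artificially small slots on one host; the host names and slot counts are invented.

```python
def restart_outlook(free_slots_per_host, slots_needed):
    """Compare the cluster-wide slot count with what any single host can offer."""
    cluster_has_enough = sum(free_slots_per_host.values()) >= slots_needed
    fitting_hosts = [host for host, free in free_slots_per_host.items()
                     if free >= slots_needed]
    return cluster_has_enough, fitting_hosts


# Hypothetical cluster: six free slots in total, but only two per host.
free_slots = {"esx-01": 2, "esx-02": 2, "esx-03": 2}
cluster_ok, hosts = restart_outlook(free_slots, slots_needed=4)
print("Cluster reports enough slots:", cluster_ok)
print("Hosts that can actually power on the VM:", hosts)
```

Here the cluster reports enough slots, yet the list of hosts that can take the VM is empty until DRS manages to free four slots on one host.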
Basic design principle 7:
When using Admission Control, balance your cluster (keep host capacities similar) and be conservative with reservations; an unbalanced cluster or large reservations lead to decreased consolidation ratios.
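PS: This principle applies only to the “Host Failures Cluster Tolerates” admission control policy. In an unbalanced cluster one or more hosts have more capacity than the others, but vCenter estimates the number of available slots pessimistically, so its algorithm may exclude some or all of those larger hosts, leaving fewer available slots than you expected.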
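A minimal sketch of that pessimistic estimate, assuming the worst case is modeled by discounting the hosts that hold the most slots. The slot counts are invented for illustration.

```python
def usable_slots(slots_per_host, host_failures_tolerated):
    """Slots left after discounting the largest hosts (worst-case failure)."""
    keep = len(slots_per_host) - host_failures_tolerated
    survivors = sorted(slots_per_host)[:keep]  # drop the hosts with the most slots
    return sum(survivors)


balanced = [50, 50, 50]     # three equal hosts, 150 slots in total
unbalanced = [25, 25, 100]  # same total, one much larger host

print(usable_slots(balanced, 1))    # 100 of 150 slots remain usable
print(usable_slots(unbalanced, 1))  # 50 of 150 slots remain usable
```

Both clusters own 150 slots, but the unbalanced one can only count on 50 of them, which is why balancing host capacity helps the consolidation ratio.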
Basic design principle 8:
Although vSphere 4.1 will utilize DRS to try to accommodate the resource requirements of a virtual machine that has to be restarted, a guarantee cannot be given. Do the math; verify that any single host has enough resources to power on your largest virtual machine. Also take restart priority into account for this/these virtual machine(s).
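"Doing the math" for the largest virtual machine can be as simple as the check below. It is a rough sketch: the host capacities and the VM reservation are hypothetical, and only CPU and memory reservations are compared.

```python
def hosts_too_small(host_capacity, largest_vm):
    """Return the hosts that could not power on the largest VM by themselves."""
    return sorted(
        name for name, cap in host_capacity.items()
        if cap["cpu_mhz"] < largest_vm["cpu_mhz"] or cap["mem_mb"] < largest_vm["mem_mb"]
    )


# Hypothetical capacities available for virtual machines on each host.
host_capacity = {
    "esx-01": {"cpu_mhz": 20000, "mem_mb": 49152},
    "esx-02": {"cpu_mhz": 20000, "mem_mb": 32768},
}
largest_vm = {"cpu_mhz": 8000, "mem_mb": 40960}  # reservation of the largest VM

print("Hosts that cannot restart the largest VM:", hosts_too_small(host_capacity, largest_vm))
```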
Basic design principle 9:
Admission Control guarantees that enough capacity is available for virtual machine failover. As such, we recommend enabling it.
Basic design principle 10:
Do the math, and take customer requirements into account. We recommend using a “Percentage”-based Admission Control Policy (“Percentage of Cluster Resources Reserved”), as it is the most flexible policy.
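For the percentage-based policy the math looks roughly like this. The sketch assumes the current failover capacity is computed as (total resources minus reserved resources) divided by total resources, and that a power-on is refused when this drops below the configured percentage; it covers one resource only, and all numbers are hypothetical.

```python
def failover_capacity_pct(total, reserved):
    """Share of the cluster's resources that is still unreserved, in percent."""
    return (total - reserved) / total * 100


def admits(total, reserved, new_reservation, configured_pct):
    """True if powering on the new VM keeps enough capacity free for failover."""
    return failover_capacity_pct(total, reserved + new_reservation) >= configured_pct


total_cpu_mhz = 96000     # hypothetical cluster CPU capacity
reserved_cpu_mhz = 60000  # sum of reservations of powered-on VMs
configured_pct = 25       # percentage of cluster resources reserved for failover

print(failover_capacity_pct(total_cpu_mhz, reserved_cpu_mhz))          # 37.5
print(admits(total_cpu_mhz, reserved_cpu_mhz, 10000, configured_pct))  # True
print(admits(total_cpu_mhz, reserved_cpu_mhz, 14000, configured_pct))  # False
```

The same check would be repeated for memory; a VM is only admitted when both resources stay above the configured percentage.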
Basic design principle 11:
VM Monitoring can substantially increase availability. It is part of the HA stack and we heavily recommend using it.
Reference:
"vSphere 4.1 HA and DRS Technical Deepdive", Duncan Epping & Frank Denneman, 2010
This article comes from the 虚拟人 website.