Excerpt: vSphere Design Principles from Abroad (HA Section)

Basic design principle 1:

Avoid using static host files as it leads to inconsistency, which makes troubleshooting difficult.

         Do not rely on static hosts files for host IP resolution; doing so can make later troubleshooting harder.

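Basic design principle 2:

In blade environments, divide hosts over all blade chassis and never exceed four hosts per chassis to avoid having all primary nodes in a single chassis.

         In blade environments, spread the HA primary nodes across different chassis rather than deploying all of them in one. Otherwise the failure of that one chassis disables HA for the whole cluster.

         PS: Under the rules vCenter uses to elect HA primary nodes, the first five hosts added to the cluster become the primary nodes, so it is best to add hosts to the cluster by picking them from different chassis in turn.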
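The "add hosts from different chassis in turn" advice can be sketched as a round-robin ordering. This is only an illustration: the chassis and host names are made up, and the function name `interleave_by_chassis` is hypothetical. It assumes, as the note above states, that the first five hosts added become primary nodes.

```python
from itertools import zip_longest

def interleave_by_chassis(chassis_hosts):
    """Return hosts ordered round-robin across chassis (hypothetical helper)."""
    order = []
    # zip_longest takes one host from each chassis per round and pads
    # shorter chassis with None, which is filtered out below.
    for round_of_hosts in zip_longest(*chassis_hosts.values()):
        order.extend(h for h in round_of_hosts if h is not None)
    return order

# Made-up inventory: two chassis with four hosts each.
chassis = {
    "chassis-a": ["a1", "a2", "a3", "a4"],
    "chassis-b": ["b1", "b2", "b3", "b4"],
}

add_order = interleave_by_chassis(chassis)
# Assumption from the note above: the first five hosts added become primaries.
primaries = add_order[:5]
print(add_order)
print(primaries)
```

With this ordering the five primaries land on both chassis (three on one, two on the other), so losing a single chassis still leaves primary nodes alive.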

Basic design principle 3:

For network-based storage (iSCSI, NFS, FCoE) it is recommended (pre-vSphere 4.0 Update 2) to set the isolation response to “Shut Down” or “Power Off”. It is also recommended to have a secondary Service Console (ESX) or Management Network (ESXi) running on the same vSwitch as the storage network to detect a storage outage and avoid false positives for isolation detection.

         With network storage, on systems earlier than 4.0 Update 2 it is best to set the virtual machine isolation response to “Shut Down” or “Power Off” to avoid split-brain. It is also advisable to configure a management port on the vSwitch used for network storage, so that a storage outage can be detected and false isolation detections avoided.

Basic design principle 4:

Keep das.failuredetectiontime low for fast responses to failures. If an isolation validation address has been added (“das.isolationaddress”), add 5000 to the default “das.failuredetectiontime” (15000).

Keep das.failuredetectiontime at a low value (default 15000 ms) so that HA responds to failures faster. If an additional isolation address has been configured in “das.isolationaddress”, add 5000 ms to the default das.failuredetectiontime (15000 ms).
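The arithmetic is small, but writing it down avoids off-by-a-zero mistakes in the advanced settings. The sketch below only encodes the rule quoted above (default 15000 ms, plus 5000 ms when an extra isolation address is configured); the function name is hypothetical and nothing here talks to vCenter.

```python
DEFAULT_FAILURE_DETECTION_MS = 15000   # default das.failuredetectiontime
EXTRA_ISOLATION_ADDRESS_MS = 5000      # added when das.isolationaddress is set

def recommended_failure_detection_ms(extra_isolation_address_configured):
    """Return the das.failuredetectiontime value suggested by principle 4."""
    if extra_isolation_address_configured:
        return DEFAULT_FAILURE_DETECTION_MS + EXTRA_ISOLATION_ADDRESS_MS
    return DEFAULT_FAILURE_DETECTION_MS

print(recommended_failure_detection_ms(False))  # 15000
print(recommended_failure_detection_ms(True))   # 20000
```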


 

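Basic design principle 5:

Be really careful with reservations. If there’s no need to have them on a per-virtual-machine basis, don’t configure them, especially when using Host Failures Cluster Tolerates. If reservations are needed, resort to resource pool based reservations.

         Set virtual machine reservations with care. If they are not necessary, do not use VM-level reservations, especially when the cluster uses the “Host Failures Cluster Tolerates” HA policy. If reservations are needed, prefer resource-pool-level reservations.

         PS: Resource-pool-level reservations are global and do not affect vCenter's estimate of the number of slots. VM-level reservations feed directly into the slot estimate: even a single VM with an excessively high reservation greatly reduces the number of slots available in the cluster.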
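To see why one oversized VM-level reservation hurts, here is a simplified slot calculation. It is a sketch under stated assumptions, not vCenter's actual algorithm: the slot size is taken as the largest CPU and memory reservation among the VMs, memory overhead is ignored, the fallback defaults are illustrative placeholders, and all host and VM figures are made up.

```python
def slot_size(reservations, default):
    """Largest reservation wins; fall back to a default when none is set."""
    largest = max(reservations, default=0)
    return largest if largest > 0 else default

def slots_per_host(host_cpu_mhz, host_mem_mb, cpu_slot_mhz, mem_slot_mb):
    """A host offers as many slots as its scarcer resource allows."""
    return min(host_cpu_mhz // cpu_slot_mhz, host_mem_mb // mem_slot_mb)

def cluster_slots(hosts, vms, default_cpu_mhz=256, default_mem_mb=256):
    # Defaults are placeholders for illustration (assumption).
    cpu_slot = slot_size([vm["cpu_mhz"] for vm in vms], default_cpu_mhz)
    mem_slot = slot_size([vm["mem_mb"] for vm in vms], default_mem_mb)
    return sum(
        slots_per_host(h["cpu_mhz"], h["mem_mb"], cpu_slot, mem_slot)
        for h in hosts
    )

# Made-up cluster: four identical hosts, ten small VMs.
hosts = [{"cpu_mhz": 20000, "mem_mb": 65536}] * 4
small_vms = [{"cpu_mhz": 500, "mem_mb": 1024}] * 10

before = cluster_slots(hosts, small_vms)
# One VM with a large reservation inflates the slot size for everyone.
after = cluster_slots(hosts, small_vms + [{"cpu_mhz": 4000, "mem_mb": 8192}])
print(before, after)  # 160 20
```

In this toy cluster a single VM reserving 4000 MHz and 8192 MB cuts the slot count from 160 to 20, which is the effect the note above warns about.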

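Basic design principle 6:

Avoid using advanced settings to decrease the slot size as it could lead to more down time and adds an extra layer of complexity. If there is a large discrepancy in size and reservations are set it might help to put similar size virtual machines into their own cluster.

Avoid forcing the slot size with the “das.slotCpuInMHz” and “das.slotMemInMB” advanced settings in vCenter; doing so leads to longer service interruptions and adds complexity. If virtual machine sizes vary widely and reservations are required, it is better to place similarly sized virtual machines in the same cluster.

PS: Forcing the slot size can fragment resources in the cluster. When HA restarts a virtual machine, the cluster as a whole may show enough slots while no single host has enough slots to start it. In that case HA has to ask DRS to defragment, which inevitably lengthens the time HA needs to recover the virtual machine. vCenter also does not guarantee that defragmentation will give any host enough free slots to start that virtual machine.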
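The fragmentation case in the note can be made concrete with a few numbers. The sketch below is illustrative only: the free-slot counts are invented, and `restart_check` is a hypothetical helper that compares the cluster-wide view with the per-host view.

```python
def restart_check(free_slots_per_host, slots_needed):
    """Compare the cluster-wide slot view with the per-host view.

    Returns (cluster_has_enough, some_host_has_enough).
    """
    cluster_has_enough = sum(free_slots_per_host.values()) >= slots_needed
    some_host_has_enough = any(
        free >= slots_needed for free in free_slots_per_host.values()
    )
    return cluster_has_enough, some_host_has_enough

# Made-up state: a forced small slot size means one large VM needs 4 slots.
free_slots = {"esx01": 2, "esx02": 2, "esx03": 1}
cluster_ok, host_ok = restart_check(free_slots, slots_needed=4)
print(cluster_ok, host_ok)  # True False
```

The cluster reports five free slots, yet no host can offer four on its own, so the restart has to wait for DRS to move other virtual machines first.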

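Basic design principle 7:

When using Admission Control, balance your cluster and be conservative with reservations as it leads to decreased consolidation ratios.

When HA Admission Control is enabled, keep the hosts in your HA cluster balanced in capacity and set reservations conservatively; otherwise the consolidation ratio is very likely to drop.

         PS: This principle applies only to the “Host Failures Cluster Tolerates” admission control policy. In an unbalanced cluster, one or a few hosts have more capacity than the others, but vCenter estimates the available slots pessimistically (worst case), so some or all of those larger hosts may be excluded by its algorithm. The result is fewer available slots than you expected.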
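The pessimistic estimate can be sketched as "assume the largest hosts are the ones that fail". This is a simplified model of the behaviour described in the note, not vCenter's real code; the per-host slot counts are made up and `usable_slots` is a hypothetical name.

```python
def usable_slots(slots_per_host, host_failures_tolerated):
    """Slots left after removing the N largest hosts (worst-case assumption)."""
    survivors = sorted(slots_per_host, reverse=True)[host_failures_tolerated:]
    return sum(survivors)

# Two made-up clusters with the same total of 160 slots.
balanced = [40, 40, 40, 40]
unbalanced = [100, 20, 20, 20]

print(usable_slots(balanced, 1))    # 120
print(usable_slots(unbalanced, 1))  # 60
```

Both clusters hold 160 slots in total, but with one host failure tolerated the unbalanced cluster can only count on 60 of them, half of what the balanced cluster keeps.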

Basic design principle 8:

Although vSphere 4.1 will utilize DRS to try to accommodate the resource requirements of this virtual machine, a guarantee cannot be given. Do the math; verify that any single host has enough resources to power on your largest virtual machine. Also take restart priority into account for this/these virtual machine(s).

         Although vSphere 4.1 can ask DRS to satisfy the resource requirements when restarting a virtual machine, success is not guaranteed. Careful calculation is therefore essential: make sure a single host has enough resources to start your largest virtual machine, and factor restart priority into the calculation as well.

Basic design principle 9:

Admission Control guarantees enough capacity is available for virtual machine failover. As such we recommend enabling it.

         It is best to enable Admission Control so that your cluster keeps enough resources for HA failover.

Basic design principle 10:

Do the math, and take customer requirements into account. We recommend using a “Percentage” based Admission Control Policy, as it is the most flexible policy.

         Once again: calculate resource capacity in detail. The more flexible “Percentage of Cluster Resources Reserved” admission control policy is recommended.
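"Doing the math" for a percentage-based policy can look like the sketch below. It assumes a simplified model in which the failover capacity is the share of total cluster resources not yet claimed by reservations, and a power-on is allowed only if that share stays at or above the configured percentage. The model, the function names, and all numbers are assumptions for illustration; the real check also accounts for details such as memory overhead.

```python
def failover_capacity_pct(total, reserved):
    """Share of cluster resources (in percent) not claimed by reservations."""
    return (total - reserved) / total * 100

def may_power_on(total, reserved, new_reservation, reserved_pct):
    """True if the capacity left after the power-on still meets the policy."""
    return failover_capacity_pct(total, reserved + new_reservation) >= reserved_pct

# Made-up CPU figures in MHz; the policy reserves 25% for failover.
total_mhz, reserved_mhz, policy_pct = 80000, 50000, 25

print(failover_capacity_pct(total_mhz, reserved_mhz))
print(may_power_on(total_mhz, reserved_mhz, 8000, policy_pct))   # True
print(may_power_on(total_mhz, reserved_mhz, 12000, policy_pct))  # False
```

The same check would be run for memory, and the stricter of the two results decides whether the virtual machine may start.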

Basic design principle 11:

VM Monitoring can substantially increase availability. It is part of the HA stack and we heavily recommend using it.

         In short, enable VM Monitoring to increase system availability.


Reference:
"vSphere 4.1 HA and DRS Technical Deepdive", Duncan Epping & Frank Denneman, 2010

                                                                                        Source: the 虚拟人 website
