openstack-instance-high-availability-Evacuate

This EtherPad was created to gather the requirements and other information for OpenStack Instance High Availability, so that a blueprint can be created.



_____________


I'm going to restructure this document.


I don't have time to work on this right now, but I hope to have it done before the end of this weekend.


The structure should probably be:


- introduction

- description of what is currently available in OpenStack

- description of what people might like to be able to do

- some nomenclature


General introduction of the pieces to handle failures:

- detection of a problem

- how to prevent split-brain situations

- fencing if it is a physical node

- evacuate/automatic startup


After which we can describe each of the points listed above and the different ways it can be done/handled, adding a list of the advantages and disadvantages of each approach.


Hopefully followed by some conclusions and solutions


Followed by notes/comments


Followed by links


Hopefully the new structure makes it a lot easier to read and add new content


_____________





The biggest missing part is handling baremetal node failure, so the initial focus of this document is on that problem.




There are, broadly, two things that are sometimes called High Availability:

- handle planned failures / maintenance

- handle unplanned failures

Sometimes these are done using the same tools.



Instance High Availability depends on the baremetal instance/node that hosts the instances.



Instance High Availability can be achieved with the following broad steps when a baremetal instance/node fails (a rough sketch of how these steps fit together follows this list):

- detect a node failure

- fence off the failed baremetal node from the network and shared storage (a simple example is to use IPMI to power off the node as a form of STONITH)

- if possible for the fencing method: check if fencing worked as intended

- mark the node as down

- ask the scheduler to Evacuate the instances
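
To make the order of these steps concrete, here is a minimal Python sketch of how they could be wired together. Everything in it is for illustration only: the helper functions are hypothetical stubs, and possible implementations for the detection, fencing and evacuation pieces are sketched further down this pad.

import time

FENCE_TIMEOUT = 60  # seconds to wait for fencing to be confirmed


# The helpers below are hypothetical stubs for this sketch; possible
# implementations for detection, fencing and evacuation are sketched
# further down this pad.
def detect_failed_nodes():
    """Return hostnames that look dead (e.g. via ZooKeeper or monitoring)."""
    return []


def fence_node(node):
    """Fence the failed node, e.g. IPMI power off as a form of STONITH."""


def node_is_fenced(node):
    """Return True once fencing is confirmed (e.g. IPMI power state is off)."""
    return False


def mark_down_and_evacuate(node):
    """Mark the node as down and ask the scheduler to Evacuate its instances."""


def handle_node_failures():
    """One pass over the broad steps listed above."""
    for node in detect_failed_nodes():        # 1. detect a node failure
        fence_node(node)                      # 2. fence it off
        deadline = time.time() + FENCE_TIMEOUT
        while not node_is_fenced(node):       # 3. check that fencing worked
            if time.time() > deadline:
                raise RuntimeError("fencing of %s not confirmed, "
                                   "refusing to evacuate" % node)
            time.sleep(5)
        mark_down_and_evacuate(node)          # 4 + 5. mark down, Evacuate


if __name__ == "__main__":
    while True:
        handle_node_failures()
        time.sleep(30)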



There are basically 2 models to detect failure:


- use clusters of machines, for example with pacemaker/corosync; this lets the other nodes in the cluster detect a node failure and fence the failed node

- use a more centralized method with ZooKeeper (which is already available in Nova) or an external monitoring system



Advantages of a model with clusters:

- the existing pacemaker framework already has ways of dealing with split-brain situations


Advantages of a centralized model:

- simpler to deploy: no clusters are needed, and it is probably more scalable

- for security reasons it would probably be best to go with a centralized method, so a Nova node does not need to contain any passwords, keys, etc. for killing other nodes

- Heat for TripleO needs to be able to drive no-downtime (rolling canary-based) upgrades of services deployed by Heat. When restarting a nova-compute host, Heat would need to move instances to other nodes. Being a "planned failure", this may or may not use the same approach. If an upgrade fails, it needs to be able to fence the failed node.

  Heat might also want to know about a failure so it does not automatically start to autoscale at the same time

- Instance HA can be made optional in Horizon -> should maybe be called: apply a Heat restarter template?

- OpenStack can provide an auto-start feature (basically the same thing) for starting the important instances back up when the whole datacenter has been shut down (after some catastrophic failure or maintenance). Central to the long-term story of using OpenStack to deploy OpenStack on bare metal is the ability to have specific instances resume automatically after a DC-wide power failure.


Disadvantages of a centralized model:

- there is currently no built-in monitoring in OpenStack

- there is not yet any fencing built into OpenStack either, so both monitoring and fencing would need to be created for the centralized model



The centralized model is probably a better fit for OpenStack/Nova-compute.
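
As an illustration of what the detection piece of a centralized model could look like, here is a minimal sketch using ZooKeeper ephemeral nodes via the kazoo Python library. The ZooKeeper ensemble address and the /nova-compute-alive path are placeholders made up for this example: each compute node keeps an ephemeral znode alive, and a monitor is notified when one disappears.

from kazoo.client import KazooClient

ZK_HOSTS = "zk1:2181,zk2:2181,zk3:2181"   # placeholder ZooKeeper ensemble
ALIVE_PATH = "/nova-compute-alive"        # hypothetical path for this sketch


def register_compute_node(hostname):
    """Run on each compute node: keep an ephemeral znode alive while it lives."""
    zk = KazooClient(hosts=ZK_HOSTS)
    zk.start()
    zk.ensure_path(ALIVE_PATH)
    # The ephemeral znode disappears automatically when this process's
    # ZooKeeper session dies (node crash, network partition, etc.).
    zk.create("%s/%s" % (ALIVE_PATH, hostname), ephemeral=True)
    return zk


def watch_compute_nodes(on_node_lost):
    """Run on the monitor: call on_node_lost(host) when a node's znode vanishes."""
    zk = KazooClient(hosts=ZK_HOSTS)
    zk.start()
    zk.ensure_path(ALIVE_PATH)
    known = set()

    @zk.ChildrenWatch(ALIVE_PATH)
    def _changed(children):
        current = set(children)
        for host in known - current:
            on_node_lost(host)   # heartbeat gone -> start the fencing workflow
        known.clear()
        known.update(current)

    return zk

How the monitor itself is made highly available, and how split-brain between monitors is avoided, is exactly the open question discussed further below.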


That does not mean models with clusters can't be used by operators together with OpenStack, but it does mean that all they get from OpenStack itself is the Evacuate API.



Fencing needs to support different implementations:

- IPMI or power distribution unit for STONITH

- use Quantum to 'disconnect' the failed node from the network

- use firewalling to 'disconnect' the failed node from the shared storage

- and many more


Maybe it should be possible to configure multiple implementations, so if IPMI does not work, the two other methods are applied.


Before starting the instances on another server it would be best to wait and check when/if the fencing has actually been applied (is the node's IPMI power state off?).
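
As a rough illustration of IPMI-based fencing with verification and a fallback chain, here is a sketch that shells out to the ipmitool CLI. The BMC address and credentials are placeholders, and the network/storage fallbacks are hypothetical stubs, not existing OpenStack APIs.

import subprocess


def _ipmitool(bmc, args):
    """Run an ipmitool command against the node's BMC (credentials are placeholders)."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", bmc["address"],
           "-U", bmc["user"], "-P", bmc["password"]] + args
    return subprocess.check_output(cmd)


def ipmi_power_off(bmc):
    """STONITH: power the failed node off through its BMC."""
    _ipmitool(bmc, ["chassis", "power", "off"])


def ipmi_is_powered_off(bmc):
    """Verify fencing worked: check the chassis power state."""
    out = _ipmitool(bmc, ["chassis", "power", "status"])
    return b"off" in out.lower()


def fence_via_network(node):
    """Hypothetical fallback: 'disconnect' the node's ports via Quantum/the switch."""
    raise NotImplementedError


def fence_via_storage_firewall(node):
    """Hypothetical fallback: firewall the node off from shared storage."""
    raise NotImplementedError


def fence(node, bmc):
    """Try IPMI first; if that fails, fall back to the other two methods."""
    try:
        ipmi_power_off(bmc)
        if ipmi_is_powered_off(bmc):
            return True
    except (subprocess.CalledProcessError, OSError):
        pass
    # IPMI did not work or could not be confirmed: apply the other methods.
    fence_via_network(node)
    fence_via_storage_firewall(node)
    return False   # fencing applied, but not independently confirmed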



The RedHat fence-agents are also available in other distributions like Debian/Ubuntu and could probably be used for doing some of the fencing




Can Heat be used to handle the different steps outlined at the top of this Etherpad? Probably not on a single OpenStack installation: Heat deals with instances, not with Nova nodes. Nova should take care of nodes.


But OpenStack is moving towards a model where TripleO (OpenStack on OpenStack) is probably very common, maybe even the "default" or an important reference implementation. In that model, a baremetal Nova node is just a baremetal instance. So Heat could be the one managing the failure process.


Heat could have a separate nestable template which deals with restarting failed baremetal instances. This template would include calling a fencing API.


Not everyone will be running TripleO, so a Nova compute node isn't always going to be a baremetal instance of the undercloud. In that case they could again use the Evacuate API.




There has to be something in place which can deal with split-brain situations. In certain pacemaker setups 2 network connections are used to prevent split-brain.

So if ZooKeeper is used for the monitoring: can we use multiple ZooKeeper installations? So monitoring can happen in different networks (like the public network, the storage network or the management network).
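
Purely as an illustration of one way to guard against split-brain in a centralized model: only treat a node as failed when independent checks over several networks agree. The individual check functions below are hypothetical stubs.

# Illustrative split-brain guard: only declare a node dead when a majority of
# independent checks, each over a different network, report it as failed.

def check_via_public_network(node):
    return False   # e.g. a ZooKeeper ensemble reachable over the public network

def check_via_storage_network(node):
    return False   # e.g. a second ZooKeeper ensemble on the storage network

def check_via_management_network(node):
    return False   # e.g. ping/monitoring over the management network

CHECKS = [check_via_public_network,
          check_via_storage_network,
          check_via_management_network]


def node_looks_dead(node):
    """True only if a majority of the independent checks report failure."""
    failures = sum(1 for check in CHECKS if check(node))
    return failures > len(CHECKS) // 2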


Heat can deal with multiple dependencies and can watch multiple resources.


When the undercloud's Heat is busy dealing with the failure of the baremetal instance, the overcloud should wait before starting the instances again (if they are pets [1] and are using shared storage). How would OpenStack handle that?



So as a recap:


When a Heat-template is included with a restarter for a baremetal Nova node:


- it should watch for an alarm to be triggered (no response for some unit of time from Nova node X)

- mark the node as down (if not already handled by something else)

- it would fence the node

- check that the node has been fenced

- call the Evacuate API (a sketch of the last two steps follows below)
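
For the last two recap steps (mark the node as down, call the Evacuate API), here is a minimal sketch using python-novaclient. The credentials and hostnames are placeholders, and exact client signatures differ between novaclient releases (older releases require an explicit target host for evacuate, see the find-host-and-evacuate-instance blueprint below), so treat this as an assumption rather than a reference.

from novaclient import client as nova_client

# Placeholder credentials/endpoint for this sketch.
nova = nova_client.Client("2", "admin", "secret", "admin",
                          "http://keystone.example.com:5000/v2.0")

FAILED_HOST = "compute-03.example.com"   # the baremetal node that was fenced


def mark_down_and_evacuate(host):
    # Mark the node as down: disable its nova-compute service so the
    # scheduler stops placing new instances on it.
    nova.services.disable(host, "nova-compute")

    # Rebuild every instance from the failed host somewhere else.
    for server in nova.servers.list(search_opts={"host": host,
                                                 "all_tenants": 1}):
        # Let the scheduler pick the target host (newer novaclient);
        # assumes the instances live on shared storage.
        nova.servers.evacuate(server, on_shared_storage=True)


mark_down_and_evacuate(FAILED_HOST)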



This is a list of some related information/blueprints/etc.:


https://blueprints.launchpad.net/nova/+spec/evacuate-host

https://blueprints.launchpad.net/nova/+spec/find-host-and-evacuate-instance

https://blueprints.launchpad.net/nova/+spec/unify-migrate-and-live-migrate

https://etherpad.openstack.org/HavanaUnifyMigrateAndLiveMigrate

https://blueprints.launchpad.net/nova/+spec/live-migration-scheduling

https://blueprints.launchpad.net/nova/+spec/bare-metal-fault-tolerance

http://openstacksummitapril2013.sched.org/event/92e3468e458c13616331e75f15685560#.UXUeVXyuiw4

https://blueprints.launchpad.net/nova/+spec/evacuate-instance-automatically

https://blueprints.launchpad.net/nova/+spec/rhev-m-ovirt-clusters-as-compute-resources


[1] http://it20.info/2012/12/vcloud-openstack-pets-and-cattle/
