1)What happens when a worker dies?
When a worker dies, the supervisor will restart it. If it repeatedly fails on startup and is unable to heartbeat to Nimbus, Nimbus will reassign the worker to another machine.
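The two recovery levels described here (local restart by the supervisor, then reassignment by Nimbus) can be illustrated with a toy model. This is only a sketch, not Storm's implementation: `recover_worker`, `can_start`, and `max_local_starts` are hypothetical names, and the fixed start limit stands in for Nimbus noticing that heartbeats have stopped.

```python
def recover_worker(machine, can_start, spare_machines, max_local_starts=3):
    """Toy model of worker recovery; returns (machine_running_worker, action_log).

    can_start is a hypothetical callable: machine name -> True if the worker
    survives startup there. max_local_starts is an assumption that stands in
    for Nimbus deciding, via missing heartbeats, that the worker is dead.
    """
    log = []
    candidates = [machine] + list(spare_machines)
    for index, host in enumerate(candidates):
        if index > 0:
            # Escalation: the worker never heartbeated, so Nimbus moves it.
            log.append(f"nimbus: reassign worker to {host}")
        for attempt in range(1, max_local_starts + 1):
            # First line of defence: the local supervisor (re)starts the worker.
            log.append(f"supervisor@{host}: start attempt {attempt}")
            if can_start(host):
                return host, log
    return None, log


if __name__ == "__main__":
    # Assume the worker can never start on node-a but starts fine elsewhere.
    host, actions = recover_worker(
        "node-a", lambda h: h != "node-a", ["node-b"], max_local_starts=2
    )
    print(host)
    for line in actions:
        print(line)
```

In this model the worker ends up on `node-b` after two failed local starts on `node-a`; a real cluster makes the same decision from heartbeat timeouts rather than from a counter.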
2)What happens when a node dies?
The tasks assigned to that machine will time out and Nimbus will reassign those tasks to other machines.
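Timeout-based reassignment can be sketched as follows. This is a simplified illustration under stated assumptions, not Storm's scheduler: the function name, the dictionaries, the timeout value, and the round-robin placement are all hypothetical.

```python
def reassign_dead_nodes(assignments, last_heartbeat, now, timeout_secs):
    """Move tasks away from nodes whose last heartbeat is older than timeout_secs.

    assignments:    dict mapping task id -> node name
    last_heartbeat: dict mapping node name -> timestamp of its last heartbeat
    Returns a new task -> node dict. Tasks on live nodes stay where they are;
    tasks on timed-out nodes are spread round-robin over the live nodes
    (the placement policy is an assumption made for this sketch).
    """
    live = sorted(
        node for node, seen in last_heartbeat.items() if now - seen <= timeout_secs
    )
    if not live:
        raise RuntimeError("no live nodes available to take over tasks")

    new_assignments = {}
    moved = 0
    for task in sorted(assignments):
        node = assignments[task]
        if node in live:
            new_assignments[task] = node
        else:
            new_assignments[task] = live[moved % len(live)]
            moved += 1
    return new_assignments


if __name__ == "__main__":
    tasks = {"t1": "a", "t2": "b", "t3": "b", "t4": "c"}
    heartbeats = {"a": 100, "b": 60, "c": 95}  # node b last reported 40s ago
    print(reassign_dead_nodes(tasks, heartbeats, now=100, timeout_secs=30))
    # {'t1': 'a', 't2': 'a', 't3': 'c', 't4': 'c'}
```

Node `b` has been silent for longer than the timeout, so its two tasks are redistributed while the tasks on `a` and `c` are left alone.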
3)What happens when Nimbus or Supervisor daemons die?
The Nimbus and Supervisor daemons are designed to be fail-fast (the process self-destructs whenever it encounters an unexpected situation) and stateless (all state is kept in Zookeeper or on disk). As described in Setting up a Storm cluster, the Nimbus and Supervisor daemons must be run under supervision using a tool like daemontools or monit. So if the Nimbus or Supervisor daemons die, they restart as if nothing had happened.
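Why fail-fast plus stateless makes restarts harmless can be shown with a small experiment. This is a minimal sketch, assuming a stand-in daemon and a bounded supervision loop; it is not how daemontools or monit are implemented, and `DAEMON_SOURCE`, `supervise`, and `run_demo` are hypothetical names. The daemon keeps its only state in a file, does one unit of work, then exits with an error to simulate an unexpected situation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical fail-fast, stateless daemon: it reloads its progress from disk,
# records one more unit of work, then exits non-zero to simulate a crash.
DAEMON_SOURCE = """
import sys
from pathlib import Path
state_file = Path(sys.argv[1])
done = int(state_file.read_text()) if state_file.exists() else 0
state_file.write_text(str(done + 1))
sys.exit(1)
"""


def supervise(cmd, restarts):
    """Run cmd and start it again every time it exits.

    A real supervision tool loops forever; this sketch stops after
    `restarts` restarts so that the demo terminates.
    """
    exit_codes = []
    for _ in range(restarts + 1):
        exit_codes.append(subprocess.run(cmd).returncode)
    return exit_codes


def run_demo(restarts=3):
    """Return (exit codes seen by the supervisor, work units recorded on disk)."""
    with tempfile.TemporaryDirectory() as tmp:
        state_file = Path(tmp) / "state.txt"
        cmd = [sys.executable, "-c", DAEMON_SOURCE, str(state_file)]
        exit_codes = supervise(cmd, restarts)
        return exit_codes, int(state_file.read_text())


if __name__ == "__main__":
    print(run_demo(restarts=3))
```

Every run of the daemon crashes, yet no progress is lost, because each new process picks up from the state on disk. That is the property that lets Nimbus and the Supervisors be restarted freely.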
4)Is Nimbus a single point of failure?
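If the Nimbus node is lost, the workers keep running, and supervisors continue to restart workers that die. Without Nimbus, however, workers cannot be reassigned to other machines when that becomes necessary, for example when a worker machine is lost. So Nimbus can be considered a SPOF, but its failure is far less serious than the failure of a Hadoop JobTracker.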
Storm provides mechanisms to guarantee data processing even if nodes die or messages are lost. See Guaranteeing message processing for the details.