Symptom (Heartbeat kills itself some time after starting, and its processes disappear):
heartbeat[4864]: 2010/04/09_17:58:40 CRIT: Emergency Shutdown: Attempting to kill everything ourselves
heartbeat[4864]: 2010/04/09_17:58:40 info: killing HBREAD process 5764 with signal 9
heartbeat[4864]: 2010/04/09_17:58:40 info: killing HBWRITE process 5765 with signal 9
heartbeat[4864]: 2010/04/09_17:58:40 info: killing HBREAD process 5766 with signal 9
heartbeat[4864]: 2010/04/09_17:58:40 info: killing HBFIFO process 5757 with signal 9
heartbeat[4864]: 2010/04/09_17:58:40 info: killing HBWRITE process 5761 with signal 9
heartbeat[4864]: 2010/04/09_17:58:40 info: killing HBREAD process 5762 with signal 9
heartbeat[4864]: 2010/04/09_17:58:40 info: killing HBWRITE process 5763 with signal 9
The Heartbeat log shows these errors:
heartbeat[7341]: 2010/04/09_20:49:45 debug: displaying uuid table
heartbeat[7341]: 2010/04/09_20:49:45 debug: uuid=f33f33a8-80ff-459a-9e56-5f706e2e0f9b, name=ha-04
heartbeat[7341]: 2010/04/09_20:49:47 WARN: nodename ha-03 uuid changed to ha-04
heartbeat[7341]: 2010/04/09_20:49:47 debug: displaying uuid table
heartbeat[7341]: 2010/04/09_20:49:47 debug: uuid=f33f33a8-80ff-459a-9e56-5f706e2e0f9b, name=ha-03
heartbeat[7341]: 2010/04/09_20:49:47 ERROR: should_drop_message: attempted replay attack [ha-04]? [gen = 1270099870, curgen = 1270099871]
heartbeat[7341]: 2010/04/09_20:49:47 ERROR: should_drop_message: attempted replay attack [ha-04]? [gen = 1270099870, curgen = 1270099871]
heartbeat[7341]: 2010/04/09_20:49:47 ERROR: should_drop_message: attempted replay attack [ha-04]? [gen = 1270099870, curgen = 1270099871]
heartbeat[7341]: 2010/04/09_20:49:47 WARN: nodename ha-04 uuid changed to ha-03
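In the uuid table lines above, the same UUID is listed first under ha-04 and then under ha-03. A quick way to spot this in a longer log is to collect every uuid=..., name=... pair and report any UUID that appears with more than one node name. The sketch below assumes the debug line layout shown above; the function name find_shared_uuids is made up for illustration.

```shell
#!/bin/sh
# find_shared_uuids (hypothetical helper): read Heartbeat debug log lines on
# stdin and print every UUID that was reported with more than one node name.
find_shared_uuids() {
    awk '
        /uuid=/ && /name=/ {
            # Pull the values out of "uuid=<id>, name=<node>".
            uuid = $0; sub(/.*uuid=/, "", uuid); sub(/,.*/, "", uuid)
            name = $0; sub(/.*name=/, "", name); sub(/[ \t\r]*$/, "", name)
            # Count each (uuid, name) pair only once.
            if (!((uuid, name) in seen)) {
                seen[uuid, name] = 1
                if (uuid in names)
                    names[uuid] = names[uuid] " " name
                else
                    names[uuid] = name
                count[uuid]++
            }
        }
        END {
            for (u in count)
                if (count[u] > 1)
                    print u ": " names[u]
        }
    '
}

# Usage: find_shared_uuids < your-heartbeat-debug-log
```

Any line it prints names a UUID that two or more nodes are using. Run against the excerpt above, it reports the f33f33a8-... UUID together with ha-04 and ha-03.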
Cause: the virtual machine was copied with VMware, so both nodes (ha-03 and ha-04) ended up with the same Heartbeat UUID.
Fix: on one of the cloned nodes, run rm /var/lib/heartbeat/hb_uuid and restart Heartbeat so that it generates a new UUID.
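A slightly more cautious variant of the fix is sketched below. It moves the UUID file aside instead of deleting it, so the old value can be restored if needed. The function name reset_hb_uuid and the .bak suffix are assumptions for illustration. Stopping and starting Heartbeat is left as comments because the service command differs between distributions.

```shell
#!/bin/sh
# reset_hb_uuid (hypothetical helper): move the stored UUID file aside so that
# Heartbeat can create a fresh one on its next start.
# $1 is the UUID file; it defaults to the path given in this note.
reset_hb_uuid() {
    uuid_file=${1:-/var/lib/heartbeat/hb_uuid}
    if [ ! -f "$uuid_file" ]; then
        echo "no UUID file at $uuid_file" >&2
        return 1
    fi
    # Keep a backup rather than deleting the file outright.
    mv "$uuid_file" "$uuid_file.bak" && echo "moved $uuid_file to $uuid_file.bak"
}

# Intended order on ONE of the cloned nodes:
#   1. stop Heartbeat (init script or service command of your distribution)
#   2. reset_hb_uuid
#   3. start Heartbeat again
```

Do this on only one of the two nodes. Resetting both is unnecessary, since the goal is simply for the two UUIDs to differ.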
Reposted from:
http://hi.baidu.com/baoping2007/item/5805c630d6ac19f5a9842856