When a mnesia cluster suffers a network partition (network_partition), each partition may continue to accept writes, so the data on the two sides can diverge. When the partition heals, mnesia reports an inconsistent_database system event, but the data itself remains inconsistent. This post briefly analyzes how the inconsistent_database event is produced.
1. How mnesia detects the up/down state of other nodes
Within the mnesia application, the mnesia_monitor process monitors the connection state of the nodes in the cluster. During initialization it calls net_kernel:monitor_nodes(true) to subscribe to node status changes; when a node connects or disconnects, mnesia_monitor receives the corresponding message.
handle_call(init, _From, State) ->
    net_kernel:monitor_nodes(true),
    EarlyNodes = State#state.early_connects,
    State2 = State#state{tm_started = true},
    {reply, EarlyNodes, State2};

handle_info({nodeup, Node}, State) ->
    ...
handle_info({nodedown, _Node}, State) ->
    ...
In addition, the nodes of a mnesia cluster negotiate a protocol version with each other; once negotiation completes, the mnesia_monitor processes on the two nodes link to each other.
%% From remote monitor..
handle_call({negotiate_protocol, Mon, Version, Protocols}, From, State)
  when node(Mon) /= node() ->
    Protocol = protocol_version(),
    MyVersion = mnesia:system_info(version),
    case lists:member(Protocol, Protocols) of
        true ->
            accept_protocol(Mon, MyVersion, Protocol, From, State);
        false ->
            %% in this release we should be able to handle the previous
            %% protocol
            case hd(Protocols) of
                ?previous_protocol_version ->
                    accept_protocol(Mon, MyVersion, ?previous_protocol_version, From, State);
                {7,6} ->
                    accept_protocol(Mon, MyVersion, {7,6}, From, State);
                _ ->
                    verbose("Connection with ~p rejected. "
                            "version = ~p, protocols = ~p, "
                            "expected version = ~p, expected protocol = ~p~n",
                            [node(Mon), Version, Protocols, MyVersion, Protocol]),
                    {reply, {node(), {reject, self(), MyVersion, Protocol}}, State}
            end
    end;

accept_protocol(Mon, Version, Protocol, From, State) ->
    ...
    case lists:member(Node, State#state.going_down) of
        true ->
            ...
        false ->
            link(Mon),  %% link to remote Monitor
            ...
2. What mnesia does when a network partition occurs
When a network partition occurs, the mnesia_monitor process receives {nodedown, Node} and {'EXIT', Pid, _Reason} messages. It ignores the {nodedown, Node} message. For an {'EXIT', Pid, _Reason} message, it checks whether Pid belongs to the local node; if not, it assumes that the mnesia_monitor process on some other node in the cluster has terminated. In that case, mnesia_monitor sends a mnesia_down message, in turn, to the mnesia_recover, mnesia_controller, mnesia_tm, and mnesia_locker processes so that each can react accordingly.
On receiving mnesia_down, mnesia_recover handles transactions whose outcome has not yet been decided.
On receiving mnesia_down, mnesia_controller records the mnesia_down information in the latest log file and in the mnesia_decision table, resets per-table attributes such as where_to_commit, active_replicas, where_to_write, and where_to_wlock, and performs other related cleanup.
On receiving mnesia_down, mnesia_tm handles the transactions initiated by, or participated in by, the local node.
mnesia_locker releases the locks held by the departed node.
Finally, mnesia_monitor records the mnesia_down event, delivers it to subscribers, and then cleans up its own related state.
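The mnesia_down events delivered to subscribers in the last step can be observed from application code via mnesia:subscribe(system). A minimal sketch (the module name down_watcher is illustrative; it assumes mnesia is already started on the local node):

```erlang
%% Subscribe to mnesia's system events and log node down/up notifications.
-module(down_watcher).
-export([start/0]).

start() ->
    %% Deliver {mnesia_system_event, Event} messages to this process.
    {ok, _Node} = mnesia:subscribe(system),
    loop().

loop() ->
    receive
        {mnesia_system_event, {mnesia_down, Node}} ->
            io:format("mnesia_down from ~p~n", [Node]),
            loop();
        {mnesia_system_event, {mnesia_up, Node}} ->
            io:format("mnesia_up from ~p~n", [Node]),
            loop();
        {mnesia_system_event, _Other} ->
            loop()
    end.
```

Running start/0 in a long-lived process gives the application the same view of node failures that mnesia's own subsystems react to.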
3. What mnesia does after the network partition heals
After the partition heals, the mnesia_monitor process receives a {nodeup, Node} message. If both the local node and the remote node regard each other as having been down, i.e. each has a recorded mnesia_down event for the other, then the inconsistent_database event is reported, and mnesia performs no further schema merging or table data synchronization.
handle_info({nodeup, Node}, State) ->
    %% Ok, we are connected to yet another Erlang node
    %% Let's check if Mnesia is running there in order
    %% to detect if the network has been partitioned
    %% due to communication failure.
    HasDown   = mnesia_recover:has_mnesia_down(Node),
    ImRunning = mnesia_lib:is_running(),
    if
        %% If I'm not running the test will be made later.
        HasDown == true, ImRunning == yes ->
            spawn_link(?MODULE, detect_partitioned_network, [self(), Node]);
        true ->
            ignore
    end,
    {noreply, State};

detect_partitioned_network(Mon, Node) ->
    detect_inconcistency([Node], running_partitioned_network),
    unlink(Mon),
    exit(normal).

detect_inconcistency([], _Context) ->
    ok;
detect_inconcistency(Nodes, Context) ->
    Downs = [N || N <- Nodes, mnesia_recover:has_mnesia_down(N)],
    {Replies, _BadNodes} =
        rpc:multicall(Downs, ?MODULE, has_remote_mnesia_down, [node()]),
    report_inconsistency(Replies, Context, ok).

report_inconsistency([{true, Node} | Replies], Context, _Status) ->
    %% Oops, Mnesia is already running on the
    %% other node AND we both regard each
    %% other as down. The database is
    %% potentially inconsistent and we has to
    %% do tell the applications about it, so
    %% they may perform some clever recovery
    %% action.
    Msg = {inconsistent_database, Context, Node},
    mnesia_lib:report_system_event(Msg),
    report_inconsistency(Replies, Context, inconsistent_database);
report_inconsistency([{false, _Node} | Replies], Context, Status) ->
    report_inconsistency(Replies, Context, Status);
report_inconsistency([{badrpc, _Reason} | Replies], Context, Status) ->
    report_inconsistency(Replies, Context, Status);
report_inconsistency([], _Context, Status) ->
    Status.
After a network partition, recovery can be done crudely with mnesia:set_master_nodes(Nodes) or mnesia:change_config(extra_db_nodes, Nodes), but this approach can easily lose data, since one side simply reloads its tables from the other and discards its own divergent writes. A better approach is to subscribe to mnesia's system events and implement your own recovery logic, which runs when the inconsistent_database message is received.
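Such a handler can be sketched as below. The module name partition_healer and the winner-selection policy (blindly trusting the remote node) are illustrative only; a real handler would compare the diverged data or apply an application-specific merge, as set_master_nodes followed by a restart discards everything written locally during the partition:

```erlang
%% A hedged sketch of an inconsistent_database handler. Assumes mnesia
%% is started; the recovery policy here is deliberately simplistic.
-module(partition_healer).
-export([start/0]).

start() ->
    {ok, _} = mnesia:subscribe(system),
    loop().

loop() ->
    receive
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            error_logger:warning_msg(
              "inconsistent_database (~p) with ~p; reloading from it~n",
              [Context, Node]),
            %% Declare the remote node the master and restart mnesia so
            %% local tables are reloaded from it. Local writes made
            %% during the partition are lost.
            ok = mnesia:set_master_nodes([Node]),
            mnesia:stop(),
            mnesia:start(),
            loop();
        {mnesia_system_event, _Other} ->
            loop()
    end.
```

Libraries such as unsplit take this further by merging table contents entry by entry instead of picking a wholesale winner.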