The inconsistent_database event in Mnesia

When a Mnesia cluster suffers a network partition, the separated partitions may each accept writes, so their data diverges. After the partition heals, Mnesia reports an inconsistent_database system event, and the data remains inconsistent. This post briefly analyzes how the inconsistent_database event is produced.

1. How Mnesia detects the up/down state of other nodes

Within the Mnesia application, the mnesia_monitor process is responsible for monitoring the connection state of the cluster's nodes. During initialization it calls net_kernel:monitor_nodes(true) to subscribe to node status changes, so whenever a node connects or disconnects, mnesia_monitor receives a corresponding message.

handle_call(init, _From, State) ->
    net_kernel:monitor_nodes(true),
    EarlyNodes = State#state.early_connects,
    State2 = State#state{tm_started = true},
    {reply, EarlyNodes, State2};


handle_info({nodeup,Node}, State) ->
    ...

handle_info({nodedown, _Node}, State) ->
    ...
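
The two handle_info clauses above match exactly the messages that net_kernel:monitor_nodes(true) delivers to the subscribing process. For illustration, the same subscription can be tried from a plain Erlang shell on any distributed node (the node name 'b@host' is just a placeholder):

1> net_kernel:monitor_nodes(true).
ok
2> %% after 'b@host' connects and later disconnects:
2> flush().
Shell got {nodeup,'b@host'}
Shell got {nodedown,'b@host'}
ok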

In addition, the nodes of a Mnesia cluster negotiate a protocol with each other; once the negotiation succeeds, the mnesia_monitor processes on the two nodes link to each other.

%% From remote monitor..
handle_call({negotiate_protocol, Mon, Version, Protocols}, From, State)
  when node(Mon) /= node() ->
    Protocol = protocol_version(),
    MyVersion = mnesia:system_info(version),
    case lists:member(Protocol, Protocols) of
	true ->
	    accept_protocol(Mon, MyVersion, Protocol, From, State);
	false ->
	    %% in this release we should be able to handle the previous
	    %% protocol
	    case hd(Protocols) of
		?previous_protocol_version ->
		    accept_protocol(Mon, MyVersion, ?previous_protocol_version, From, State);
		{7,6} ->
		    accept_protocol(Mon, MyVersion, {7,6}, From, State);
		_ ->
		    verbose("Connection with ~p rejected. "
			    "version = ~p, protocols = ~p, "
			    "expected version = ~p, expected protocol = ~p~n",
			    [node(Mon), Version, Protocols, MyVersion, Protocol]),
		    {reply, {node(), {reject, self(), MyVersion, Protocol}}, State}
	    end
    end;

accept_protocol(Mon, Version, Protocol, From, State) ->
    ...
    case lists:member(Node, State#state.going_down) of
    true ->
        ...
    false ->
        link(Mon),  %% link to remote Monitor
        ...
[Figure 1]

2. What Mnesia does when a network partition occurs

When a network partition occurs, the mnesia_monitor process receives a {nodedown, Node} message and an {'EXIT', Pid, _Reason} message. It does nothing with {nodedown, Node}. For {'EXIT', Pid, _Reason} it checks whether the exiting pid belongs to the local node; if it does not, mnesia_monitor assumes that the mnesia_monitor process of another cluster node has terminated, and it then sends a mnesia_down notification, in turn, to the mnesia_recover, mnesia_controller, mnesia_tm and mnesia_locker processes so that each can perform its part of the cleanup.

[Figure 2]
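
The dispatch can be pictured with the schematic sketch below. It is not the verbatim OTP source: the clause shape follows mnesia_monitor:handle_info/2 as quoted above, but the mnesia_down helper calls are names assumed from the description, and the real modules may spell or split them differently.

%% Schematic sketch, not the verbatim OTP source. The helper calls below
%% mirror the prose above; their exact names/arities are assumptions.
handle_info({'EXIT', Pid, _Reason}, State) when node(Pid) /= node() ->
    Node = node(Pid),
    %% A remotely linked process died; assume it was Node's mnesia_monitor.
    mnesia_recover:mnesia_down(Node),     %% settle not-yet-decided transactions
    mnesia_controller:mnesia_down(Node),  %% log the event, reset replica info
    mnesia_tm:mnesia_down(Node),          %% clean up coordinated/participated transactions
    mnesia_locker:mnesia_down(Node),      %% release locks held by Node
    {noreply, State#state{going_down = [Node | State#state.going_down]}};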

On receiving mnesia_down, mnesia_recover handles transactions whose commit/abort decision is still pending.

On receiving mnesia_down, mnesia_controller records the event in the "latest" log file and in the mnesia_decision table, resets per-table properties such as where_to_commit, active_replicas, where_to_write and where_to_wlock (an effect that is easy to observe, see the shell snippet after these steps), and performs other related processing.

On receiving mnesia_down, mnesia_tm handles the transactions coordinated by this node as well as those it participates in.

mnesia_locker releases the locks held by the down node.
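
One effect of this cleanup that is easy to observe from a shell is that the down node disappears from a table's write set. The table name foo and the node names below are purely illustrative:

%% before the partition
(a@host)1> mnesia:table_info(foo, where_to_write).
['b@host','a@host']
%% after mnesia_down has been processed for 'b@host'
(a@host)2> mnesia:table_info(foo, where_to_write).
['a@host']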

Finally, the mnesia_monitor process records the mnesia_down event, delivers it to event subscribers, and then cleans up its own state.
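
An application can observe these mnesia_down events itself by subscribing to Mnesia's system event category (shell session for illustration; the node names are placeholders):

1> mnesia:subscribe(system).
{ok,'a@host'}
2> flush().   %% after mnesia goes down on 'b@host'
Shell got {mnesia_system_event,{mnesia_down,'b@host'}}
ok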

3. What Mnesia does after the network partition heals

After the partition heals, mnesia_monitor receives a {nodeup, Node} message. If the local node and the remote node both consider the other to have been down, i.e. each has a recorded mnesia_down event for the other, an inconsistent_database event is reported, and Mnesia will not perform a schema merge or synchronize table data.

handle_info({nodeup, Node}, State) ->
    %% Ok, we are connected to yet another Erlang node
    %% Let's check if Mnesia is running there in order
    %% to detect if the network has been partitioned
    %% due to communication failure.

    HasDown   = mnesia_recover:has_mnesia_down(Node),
    ImRunning = mnesia_lib:is_running(),

    if
	%% If I'm not running the test will be made later.
	HasDown == true, ImRunning == yes ->
	    spawn_link(?MODULE, detect_partitioned_network, [self(), Node]);
	true ->
	    ignore
    end,
    {noreply, State};

detect_partitioned_network(Mon, Node) ->
    detect_inconcistency([Node], running_partitioned_network),
    unlink(Mon),
    exit(normal).

detect_inconcistency([], _Context) ->
    ok;
detect_inconcistency(Nodes, Context) ->
    Downs = [N || N <- Nodes, mnesia_recover:has_mnesia_down(N)],
    {Replies, _BadNodes} =
	rpc:multicall(Downs, ?MODULE, has_remote_mnesia_down, [node()]),
    report_inconsistency(Replies, Context, ok).

report_inconsistency([{true, Node} | Replies], Context, _Status) ->
    %% Oops, Mnesia is already running on the
    %% other node AND we both regard each
    %% other as down. The database is
    %% potentially inconsistent and we has to
    %% do tell the applications about it, so
    %% they may perform some clever recovery
    %% action.
    Msg = {inconsistent_database, Context, Node},
    mnesia_lib:report_system_event(Msg),
    report_inconsistency(Replies, Context, inconsistent_database);
report_inconsistency([{false, _Node} | Replies], Context, Status) ->
    report_inconsistency(Replies, Context, Status);
report_inconsistency([{badrpc, _Reason} | Replies], Context, Status) ->
    report_inconsistency(Replies, Context, Status);
report_inconsistency([], _Context, Status) ->
    Status.

Once a partition has occurred, recovery can be done simply with mnesia:set_master_nodes(Nodes) or mnesia:change_config(extra_db_nodes, Nodes), but this approach easily loses data. A better approach is to subscribe to Mnesia's system events and implement your own recovery procedure that runs when the inconsistent_database message arrives, as sketched below.
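
A minimal sketch of such a handler follows. It assumes a deliberately crude policy: treat MasterNodes as the source of truth and reload everything from them by restarting Mnesia, which simply discards whatever the other partition wrote. The module name partition_healer and the node-selection policy are illustrative choices, not something Mnesia prescribes.

-module(partition_healer).
-export([start/1]).

%% Minimal sketch: subscribe to mnesia system events and, when an
%% inconsistent_database event arrives, declare MasterNodes the source
%% of truth and restart mnesia so tables are reloaded from them.
%% This throws away the other partition's writes; a real handler would
%% usually merge or replay data instead.
start(MasterNodes) ->
    spawn_link(fun() ->
                       {ok, _Node} = mnesia:subscribe(system),
                       loop(MasterNodes)
               end).

loop(MasterNodes) ->
    receive
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            error_logger:warning_msg(
              "inconsistent_database (~p) against ~p, reloading from ~p~n",
              [Context, Node, MasterNodes]),
            ok = mnesia:set_master_nodes(MasterNodes),
            stopped = mnesia:stop(),
            ok = mnesia:start(),
            %% the old subscription dies with mnesia, so subscribe again
            {ok, _} = mnesia:subscribe(system),
            loop(MasterNodes);
        _Other ->
            loop(MasterNodes)
    end.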
