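Mnesia's mechanics are a little odd. I had a quiet day today, so I ran some tests to see how it actually behaves.
Goal:
Add N Mnesia nodes one at a time and keep the data on all of them in sync.
The procedure: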
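1. A distributed Mnesia setup can start from a single node and grow gradually.
2. Before adding a node, make sure mnesia:start() has already been called on the new node.
3. On every node known to be alive, call mnesia:change_config(extra_db_nodes, [NewNode]) (rpc:call works for this). This tells each node that a new node is about to join.
4. Change the storage type of the schema table on NewNode: mnesia:change_table_copy_type(schema, NewNode, disc_copies)
5. Restart Mnesia on NewNode and wait a moment. (This is probably because the schema change on the remote new node does not take effect immediately; it may not be necessary.)
6. Add the tables in TableList to NewNode: mnesia:add_table_copy(Table, NewNode, disc_copies). This is called once per user table, so possibly many times.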
---- Over All ----
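The six steps above let you add new nodes one after another and keep the data in sync.
The corresponding code (?TableList is a macro holding the list of user tables):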
addNode(NewNode) ->
    io:format("New Node = ~p~n", [NewNode]),
    RunningNodeList = mnesia:system_info(running_db_nodes),
    io:format("-----------Adding Extra Node---------~n"),
    addExtraNode(RunningNodeList, NewNode),
    io:format("-----------Change schema -> disc_copies---------~n"),
    Rtn = mnesia:change_table_copy_type(schema, NewNode, disc_copies),
    io:format("Rtn=~p~n", [Rtn]),
    io:format("-----------Reboot Remote Node Mnesia---------~n"),
    rpc:call(NewNode, mnesia, stop, []),
    timer:sleep(1000),
    rpc:call(NewNode, mnesia, start, []),
    timer:sleep(1000),
    io:format("-----------Adding Table List---------~n"),
    addTableList(?TableList, NewNode),
    io:format("-----------Over All---------~n").

addExtraNode([], _NewNode) ->
    null;
addExtraNode(_RunningNodeList = [Node | T], NewNode) ->
    Rtn = rpc:call(Node, mnesia, change_config, [extra_db_nodes, [NewNode]]),
    io:format("Node = ~p, Rtn=~p~n", [Node, Rtn]),
    addExtraNode(T, NewNode).

addTableList([], _NewNode) ->
    null;
addTableList(_TableList = [Table | T], NewNode) ->
    Rtn = mnesia:add_table_copy(Table, NewNode, disc_copies),
    io:format("Table = ~p, Rtn = ~p~n", [Table, Rtn]),
    addTableList(T, NewNode).
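The fixed timer:sleep(1000) pauses above are only a guess at how long the remote node needs. As a possible complement, here is a rough sketch that asks the new node to wait until the given tables are actually loaded, using mnesia:wait_for_tables/2 through rpc:call. The module name wait_sketch and the function wait_for_copies/3 are made up for illustration and are not part of the code above.

```erlang
-module(wait_sketch).
-export([wait_for_copies/3]).

%% Hypothetical helper (not from the original post): ask Node to wait
%% until every table in Tables is loaded there, or until Timeout
%% milliseconds have passed.
wait_for_copies(Node, Tables, Timeout) ->
    case rpc:call(Node, mnesia, wait_for_tables, [Tables, Timeout]) of
        ok ->
            %% All tables are accessible on Node.
            ok;
        {timeout, Missing} ->
            %% Some tables were still not loaded when the timeout hit.
            {error, {not_loaded, Missing}};
        {error, Reason} ->
            %% Mnesia itself reported a problem (e.g. not running).
            {error, Reason};
        {badrpc, Reason} ->
            %% The rpc call failed (e.g. Node is unreachable).
            {error, {badrpc, Reason}}
    end.
```

It could be called after step 6, for example wait_sketch:wait_for_copies(NewNode, ?TableList, 5000), to confirm the copies are loaded before relying on the new node. Whether it can replace the sleeps around the restart entirely is something I have not verified.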
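One more case can come up: node A has dropped off, and a new node B is added while it is away. When A comes back online, it may not notice that B is supposed to be in sync with it, which can leave the data out of sync. The fix is net_adm:ping(Node): as soon as a node comes up, i.e. right after mnesia:start(), it looks up the nodes it is connected to (mnesia:system_info(db_nodes)) and pings each of them with net_adm:ping() to announce that it is back. That solves the problem.
The corresponding code: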
-module(ping).
-compile(export_all).

ping() ->
    case whereis(ping) of
        undefined ->
            null;
        OldPid ->
            OldPid ! {exit},
            unregister(ping)
    end,
    PingID = spawn(?MODULE, pingMain, []),
    register(ping, PingID),
    PingID.

pingMain() ->
    AllNodeList = mnesia:system_info(db_nodes),
    pingList(AllNodeList),
    NodeListCount = length(AllNodeList),
    receiveMsg(0, 0, NodeListCount).

receiveMsg(PingOK, PingFailed, NodeListCount) ->
    receive
        {ping, Result} ->
            case Result of
                true ->
                    NewPingOK = PingOK + 1,
                    NewPingFailed = PingFailed;
                false ->
                    NewPingOK = PingOK,
                    NewPingFailed = PingFailed + 1
            end,
            case (NewPingOK + NewPingFailed < NodeListCount) of
                true ->
                    receiveMsg(NewPingOK, NewPingFailed, NodeListCount);
                false ->
                    io:format("-------Ping Over---------~n"),
                    io:format("Ping OK = ~p~n", [NewPingOK]),
                    io:format("Ping Failed = ~p~n", [NewPingFailed])
            end;
        {exit} ->
            io:format("Receive to exit~n");
        _Any ->
            receiveMsg(PingOK, PingFailed, NodeListCount)
    after 30000 ->
        io:format("Error : Time out~n")
    end.

pingList([]) ->
    null;
pingList(_NodeList = [Node | T]) ->
    spawn(?MODULE, pingOne, [Node]),
    pingList(T).

pingOne(Node) ->
    Rtn = net_adm:ping(Node),
    PingID = whereis(ping),
    case Rtn of
        pong ->
            PingID ! {ping, true};
        pang ->
            PingID ! {ping, false}
    end.
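For comparison, here is a much shorter sequential sketch of the same idea: ping every node and split the list into those that answered and those that did not. The module name ping_sketch and both function names are made up for illustration. Unlike the version above, it pings the nodes one after another, so a single unreachable node can hold up the rest.

```erlang
-module(ping_sketch).
-export([ping_all/1, ping_db_nodes/0]).

%% Hypothetical helper: ping each node in turn and return
%% {Reachable, Unreachable}. net_adm:ping/1 returns pong on success
%% and pang on failure.
ping_all(Nodes) ->
    lists:partition(fun(Node) -> net_adm:ping(Node) =:= pong end, Nodes).

%% Ping every node Mnesia knows about, except the local one.
%% Assumes Mnesia has already been started on this node.
ping_db_nodes() ->
    ping_all(mnesia:system_info(db_nodes) -- [node()]).
```

Calling ping_sketch:ping_db_nodes() right after mnesia:start() would cover the same case. The spawned-process version above is still the better choice when there are many nodes or when some of them are slow to respond.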
------------------------------------------------
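Off topic: the Erlang compiler is a bit odd. On WinXP, if I use something like a -record definition and then a %%-comment, it reports an error @__@...... which is why my code above carries no comments... If some kind soul knows why this happens, please let me know. Thanks~~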