http://www.cnblogs.com/me-sa/archive/2012/01/10/erlang0030.html
Supervisors are used to build an hierarchical process structure called a supervision tree, a nice way to structure a fault tolerant application.
--Erlang/OTP Doc
Supervisor的基本思想就是通过建立层级结构实现错误隔离和管理,具体方法是通过重启的方式保持子进程一直活着.如果supervisor是进程树的一部分,它会被它的supervisor自动终止,当它的supervisor让它shutdown的时候,它会按照子进程启动顺序的逆序终止其所有的子进程,最后终止掉自己.重启的目的是让系统回归到一个稳定的状态,回归稳定状态后再出现异常可以进行重试,如果初始化都不稳定,后续的监控-重启策略意义不大.换句话说,Application初始化的阶段要有可靠性的保障,初始化阶段可能读取配置文件或者从数据库加载恢复数据,哪怕执行时间长一点都等待同步执行完.如果application依赖非本地数据库或外部服务就可以采取更快的异步启动,因为这种服务在正常使用过程中也经常出状况,早一点还是晚一点启动没有什么关系.
[Erlang 0025]理解Erlang/OTP - Application以log4erl项目为学习了Erlang/OTP application,我们说到application在start的方法中启动了log4erl的顶层监控树.今天我们继续跟进,看log4erl的监控树是怎么构建起来的,并做实验看supervisor如何通过重启恢复服务的.使用application:start(log4erl).启动起来之后的进程树:
下面是log4erl_sup文件的start_link方法,supervisor:start_link方法的执行是同步的,直到所有的子进程都启动了才会返回. supervisor:start_link会使用回调函数init/1.
start_link(Default_logger) ->
R = supervisor:start_link({local, ?MODULE}, ?MODULE, []),
%log4erl:start_link(Default_logger),
add_logger(Default_logger),
?LOG2("Result in supervisor is ~p~n",[R]),
R.
%%回调的方法init/1
init([]) ->
{ok,
{
{one_for_one,3,10},
[]
}
}.
%%rabbit_sup.erl 来自大名鼎鼎的rabbitmq
init([]) ->
{ok, {{one_for_all, 0, 1}, []}}.
%%yaws_sup.erl Yaws项目 - Yet Another Web Server
init([]) ->
ChildSpecs = child_specs(),
%% 0, 1 means that we never want supervisor restarts
{ok,{{one_for_all, 0, 1}, ChildSpecs}}.
%%ejabberd_sup ejabberd项目
init([]) ->
Hooks =
{ejabberd_hooks,
{ejabberd_hooks, start_link, []},
%%......................... 省略代码
{ok, {{one_for_one, 10, 1},
[Hooks,
GlobalRouter,
Cluster,
..................
Listener]}}.
Note: one of the big differences between one_for_one and simple_one_for_one is that one_for_one holds a list of all the children it has (and had, if you don't clear it), started in order, while simple_one_for_one holds a single definition for all its children and works using a dict to hold its data. Basically, when a process crashes, the simple_one_for_one supervisor will be much faster when you have a large number of children.Note: it is important to note thatsimple_one_for_one
children are not respecting this rule with the Shutdown time. In the case ofsimple_one_for_one
, the supervisor will just exit and it will be left to each of the workers to terminate on their own, after their supervisor is gone.For the most part, writing asimple_one_for_one
supervisor is similar to writing any other type of supervisor, except for one thing. The argument list in the{M,F,A}
tuple is not the whole thing, but is going to be appended to what you call it with when you dosupervisor:start_child(Sup, Args)
. That's right,supervisor:start_child/2
changes API. So instead of doingsupervisor:start_child(Sup, Spec)
, which would callerlang:apply(M,F,A)
, we now havesupervisor:start_child(Sup, Args)
, which callserlang:apply(M,F,Args++A)
.
add_logger(Name) when is_atom(Name) ->
N = atom_to_list(Name),
add_logger(N);
add_logger(Name) when is_list(Name) ->
C1 = {Name,
{log_manager, start_link ,[Name]},
permanent,
10000,
worker,
[log_manager]},
?LOG2("Adding ~p to ~p~n",[C1, ?MODULE]),
supervisor:start_child(?MODULE, C1).
Modules is a list of one element, the name of the callback module used by the child behavior. The exception to that is when you have callback modules whose identity you do not know beforehand (such as event handlers in an event manager). In this case, the value of
Modules should be
dynamic
so that the whole OTP system knows who to contact when using more advanced features, such as
releases.
%%log4erl.conf文件 内容我做了简单的缩排%%mod
logger default_logger{
file_appender default_app{
dir = "./log", level = debug, file = default_log, type = size, max = 1000000, suffix = log, rotation = 50, format = ' %d %h:%m:%s.%i %l%n'
}
}
%%mail mod
logger mail_logger{
file_appender mail_app{
dir = "./log", level = debug, file = mail_log, type = size, max = 1000000, suffix = log, rotation = 50, format = ' %d %h:%m:%s.%i %l%n'
}
}
我们沿着调用关系,逐步跟进代码:
%==== File : log4erl_conf =======
%%log4erl_conf:conf(File).
conf(File) ->
application:start(log4erl), %%启动log4erl
Tree = parse(leex(File)), %%解析配置文件
traverse(Tree). %%遍历配置项构造监控树
%%跟进遍历的逻辑,对于每一条配置执行的是element/1方法
traverse([]) ->
ok;
traverse([H|Tree]) ->
element(H),
traverse(Tree).
%%对于我们自定义的logger走的是{logger, Logger, Appenders}逻辑
element({cutoff_level, CutoffLevel}) ->
log_filter_codegen:set_cutoff_level(CutoffLevel);
element({default_logger, Appenders}) ->
appenders(Appenders);
element({logger, Logger, Appenders}) ->
log4erl:add_logger(Logger),
appenders(Logger, Appenders).
%==== File : log4erl =======
%%继续跟进我们走到log4erl:add_logger/1
add_logger(Logger) ->
try_msg({add_logger, Logger}).
%%try_msg 是的添加了异常捕获的通用方法
try_msg(Msg) ->
try
handle_call(Msg)
catch
exit:{noproc, _M} ->
io:format("log4erl has not been initialized yet. To do so, please run~n"),
io:format("> application:start(log4erl).~n"),
{error, log4erl_not_started};
E:M ->
?LOG2("Error message received by log4erl is ~p:~p~n",[E, M]),
{E, M}
end.
%%handle_call的代码片段
handle_call({add_logger, Logger}) ->
log_manager:add_logger(Logger);
%==== File : log_manager =======
%%逻辑转到log_manager的add_logger(Logger)
%%最终调用的是log4erl_sup:add_logger(Logger).这个我们上面已经分析过了
add_logger(Logger) ->
log4erl_sup:add_logger(Logger).
%%element方法在添加loger之后会添加appender
appenders([]) ->
ok;
appenders([H|Apps]) ->
appender(H),
appenders(Apps).
appenders(_, []) ->
ok;
appenders(Logger, [H|Apps]) ->
appender(Logger, H),
appenders(Logger, Apps).
appender({appender, App, Name, Conf}) ->
log4erl:add_appender({App, Name}, {conf, Conf}).
appender(Logger, {appender, App, Name, Conf}) ->
log4erl:add_appender(Logger, {App, Name}, {conf, Conf}).
%==== File : log4erl =======
%% Appender = {Appender, Name}
add_appender(Logger, Appender, Conf) ->
try_msg({add_appender, Logger, Appender, Conf}).
handle_call({add_appender, Logger, Appender, Conf}) ->
log_manager:add_appender(Logger, Appender, Conf);
%==== File : log_manager =======
add_appender(Logger, {Appender, Name} , Conf) ->
?LOG2("add_appender ~p with name ~p to ~p with Conf ~p ~n",[Appender, Name, Logger, Conf]),
log4erl_sup:add_guard(Logger, Appender, Name, Conf).
%==== File : log4erl_sup =======
add_guard(Logger, Appender, Name, Conf) ->
C = {Name,
{logger_guard, start_link ,[Logger, Appender, Name, Conf]},
permanent,
10000,
worker,
[logger_guard]},
?LOG2("Adding ~p to ~p~n",[C, ?MODULE]),
supervisor:start_child(?MODULE, C).
%==== File : logger_guard =======
start_link(Logger, Appender, Name, Conf) ->
%?LOG2("starting guard for logger ~p~n",[Logger]),
{ok, Pid} = gen_server:start_link(?MODULE, [Appender, Name], []),
case add_sup_handler(Pid, Logger, Conf) of
{error, E} ->
gen_server:call(Pid, stop),
{error, E};
_R ->
{ok, Pid}
end.
add_sup_handler(G_pid, Logger, Conf) ->
?LOG("add_sup()~n"),
gen_server:call(G_pid, {add_sup_handler, Logger, Conf}).
handle_call({add_sup_handler, Logger, Conf}, _From, [{appender, Appender, Name}] = State) ->
?LOG2("Adding handler ~p with name ~p for ~p From ~p~n",[Appender, Name, Logger, _From]),
try
Res = gen_event:add_sup_handler(Logger, {Appender, Name}, Conf),
{reply, Res, State}
catch
E:R ->
{reply, {error, {E,R}}, State}
end;
gen_event:add_sup_handler会建立EventManager与Event Handler之间的link的关系,所以我们修改一下,注释掉这段,看看监控树是什么样子:
% ?LOG("add_sup()~n"),
% gen_server:call(G_pid, {add_sup_handler, Logger, Conf}).
ok.
3> whereis(default_logger).
<0.45.0>
4> exit(whereis(default_logger),some_reason).
true
5> whereis(default_logger).
<0.45.0>
6> exit(whereis(default_logger),some_reason). %%由于gen_event默认process_flag(trap_exit, true),所以some_reason的退出消息并没有把它干掉
true
7> whereis(default_logger).
<0.45.0>
8> exit(whereis(default_logger),kill). %%向进程发送强制退出消息,
true
=SUPERVISOR REPORT==== 10-Jan-2012::10:35:21 === %首先能够看到log4erl报出的子进程终止的报告
Supervisor: {local,log4erl_sup}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.45.0>},
{name,"default_logger"},
{mfargs,{log_manager,start_link,["default_logger"]}},
{restart_type,permanent},
{shutdown,10000},
{child_type,worker}]
=PROGRESS REPORT==== 10-Jan-2012::10:35:21 === %log4erl_sup重建default_logger,新进程pid是<0.69.0>
supervisor: {local,log4erl_sup}
started: [{pid,<0.69.0>},
{name,"default_logger"},
{mfargs,{log_manager,start_link,["default_logger"]}},
{restart_type,permanent},
{shutdown,10000},
{child_type,worker}]
=SUPERVISOR REPORT==== 10-Jan-2012::10:35:21 === %default_logger退出消息转变成为killed继续广播给link的进程,对应的logger_guard终止
Supervisor: {local,log4erl_sup}
Context: child_terminated
Reason: killed
Offender: [{pid,<0.46.0>},
{name,default_app},
{mfargs,
{logger_guard,start_link,
[default_logger,file_appender,default_app,
{conf, [{dir,"./log"},{level,debug},{file,default_log},{type,size},
{max,1000000},{suffix,log}, {rotation,50},
{format," %d %h:%m:%s.%i %l%n"}]}]}},
{restart_type,permanent},
{shutdown,10000},
{child_type,worker}]
=PROGRESS REPORT==== 10-Jan-2012::10:35:21 === %logger_guard 重建
supervisor: {local,log4erl_sup}
started: [{pid,<0.70.0>},
{name,default_app},
{mfargs,
{logger_guard,start_link,
[default_logger,file_appender,default_app,
{conf,
[{dir,"./log"},{level,debug}, {file,default_log},{type,size},
{max,1000000}, {suffix,log},{rotation,50},
{format," %d %h:%m:%s.%i %l%n"}]}]}},
{restart_type,permanent},
{shutdown,10000},
{child_type,worker}]
9> whereis(default_logger).
<0.69.0>
10> is_process_alive(pid(0,70,0)). %这是新启动的logger_guard进程
true
11> exit(pid(0,70,0),some_reason). %向进程发送一个退出消息
true
=SUPERVISOR REPORT==== 10-Jan-2012::11:07:51 ===
Supervisor: {local,log4erl_sup}
Context: child_terminated
Reason: some_reason
Offender: [{pid,<0.70.0>},
{name,default_app},
{mfargs,
{logger_guard,start_link,
[default_logger,file_appender,default_app,
{conf,
[{dir,"./log"},{level,debug},{file,default_log},{type,size},{max,1000000},
{suffix,log},{rotation,50},{format," %d %h:%m:%s.%i %l%n"}]}]}},
{restart_type,permanent},
{shutdown,10000},
{child_type,worker}]
12>
=PROGRESS REPORT==== 10-Jan-2012::11:07:51 ===
supervisor: {local,log4erl_sup}
started: [{pid,<0.76.0>},
{name,default_app},
{mfargs,
{logger_guard,start_link,
[default_logger,file_appender,default_app,
{conf,
[{dir,"./log"},{level,debug},{file,default_log},{type,size},{max,1000000},
{suffix,log},{rotation,50},{format," %d %h:%m:%s.%i %l%n"}]}]}},
{restart_type,permanent},
{shutdown,10000},{child_type,worker}]
12> is_process_alive(pid(0,70,0)).
false
13> whereis(default_logger). %退出消息广播对default_logger没有影响
<0.69.0>
14> whereis(log4erl_sup).
<0.44.0>
15> exit(whereis(log4erl_sup),some_reason). % Supervisor 初始化的时候也会设置 process_flag(trap_exit, true),
true
16> whereis(log4erl_sup).
<0.44.0>
17> exit(whereis(log4erl_sup),kill). %杀掉log4erl_sup 应用程序停止
true
=CRASH REPORT==== 10-Jan-2012::13:26:23 ===
crasher:
initial call: gen_event:init_it/6
pid: <0.69.0>
registered_name: default_logger
exception exit: killed
in function gen_event:terminate_server/4
ancestors: [log4erl_sup,<0.43.0>]
messages: [{'EXIT',<0.76.0>,killed}]
links: [#Port<0.1891>,#Port<0.1885>]
dictionary: []
trap_exit: true
status: running
heap_size: 610
stack_size: 24
reductions: 720
neighbours:
18>
=CRASH REPORT==== 10-Jan-2012::13:26:23 ===
crasher:
initial call: gen_event:init_it/6
pid: <0.47.0>
registered_name: mail_logger
exception exit: killed
in function gen_event:terminate_server/4
ancestors: [log4erl_sup,<0.43.0>]
messages: [{'EXIT',<0.48.0>,killed}]
links: [#Port<0.546>]
dictionary: []
trap_exit: true
status: running
heap_size: 377
stack_size: 24
reductions: 411
neighbours:
18>
=CRASH REPORT==== 10-Jan-2012::13:26:25 ===
crasher:
initial call: application_master:init/4
pid: <0.42.0>
registered_name: []
exception exit: killed
in function application_master:terminate/2
ancestors: [<0.41.0>]
messages: []
links: [<0.6.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 610
stack_size: 24
reductions: 1555
neighbours:
18>
=INFO REPORT==== 10-Jan-2012::13:26:25 ===
application: log4erl
exited: killed
type: temporary
18>
最后再贴一次log4erl项目的地址: http://code.google.com/p/log4erl/,建议下载下来代码自己动手做一下上面的实验.