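Overview
Cowboy is an HTTP server framework for Erlang. When a client disconnects before the response has been sent (what nginx logs as HTTP status 499), Cowboy kills the handler process outright. This can easily cause bugs.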
Example code
See https://ninenines.eu/docs/en/...
Take the following handler:
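-module(hello_handler).
-behavior(cowboy_handler).

-export([init/2]).

init(Req0, State) ->
    erlang:display("before_sleep"),
    %% Simulate slow work so the client can disconnect mid-request.
    timer:sleep(3000),
    erlang:display("after_sleep"),
    %% Bind the result to a fresh variable: cowboy_req:reply/4 returns an
    %% updated request object, so matching it against Req0 would fail.
    Req = cowboy_req:reply(
        200,
        #{<<"content-type">> => <<"text/plain">>},
        <<"Hello Erlang!">>,
        Req0
    ),
    {ok, Req, State}.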
Running
curl http://localhost:8080
prints:
([email protected])1> "before_sleep"
"after_sleep"
With a very short client timeout:
curl http://localhost:8080 --max-time 0.001
curl: (28) Resolving timed out after 4 milliseconds
only the first line is printed:
([email protected])1> "before_sleep"
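This shows that the handler process was forcibly cut off partway through. If the code touches resources outside the process, for example by taking a lock, that lock will clearly never be released properly.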
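The lock problem can be reproduced without Cowboy at all. The sketch below is a hypothetical standalone module (kill_demo, the ETS table, and the lock row are all made up for illustration, and are not Cowboy code). It assumes the worker is killed with the same kind of signal that appears in the Cowboy source, exit(Pid, shutdown), and shows that the `after` clause does not run when a process that is not trapping exits is killed from outside.

```erlang
-module(kill_demo).
-export([run/0]).

%% Hypothetical demo: the worker "takes a lock" by inserting a row into a
%% public ETS table and tries to release it in an `after` clause.
run() ->
    Tab = ets:new(locks, [public, set]),
    Worker = spawn(fun() ->
        ets:insert(Tab, {lock, self()}),
        try
            timer:sleep(3000)         % simulated slow work
        after
            ets:delete(Tab, lock)     % skipped if the process is killed
        end
    end),
    timer:sleep(100),                 % give the worker time to take the lock
    exit(Worker, shutdown),           % external exit signal, not an exception
    timer:sleep(100),                 % give the worker time to die
    Leaked = ets:member(Tab, lock),   % is the lock row still there?
    ets:delete(Tab),
    {is_process_alive(Worker), Leaked}.
```

Calling kill_demo:run() from the shell returns {false, true}: the worker is dead and the lock row was never removed. `try ... after` only protects against exceptions raised inside the process, not against exit signals sent to it.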
Root cause
See the loop function in cowboy_http.erl:
loop(State=#state{parent=Parent, socket=Socket, transport=Transport, opts=Opts,
        buffer=Buffer, timer=TimerRef, children=Children, in_streamid=InStreamID,
        last_streamid=LastStreamID}) ->
    Messages = Transport:messages(),
    InactivityTimeout = maps:get(inactivity_timeout, Opts, 300000),
    receive
        %% Discard data coming in after the last request
        %% we want to process was received fully.
        {OK, Socket, _} when OK =:= element(1, Messages), InStreamID > LastStreamID ->
            loop(State);
        %% Socket messages.
        {OK, Socket, Data} when OK =:= element(1, Messages) ->
            parse(<< Buffer/binary, Data/binary >>, State);
        {Closed, Socket} when Closed =:= element(2, Messages) ->
            terminate(State, {socket_error, closed, 'The socket has been closed.'});
        {Error, Socket, Reason} when Error =:= element(3, Messages) ->
            terminate(State, {socket_error, Reason, 'An error has occurred on the socket.'});
        {Passive, Socket} when Passive =:= element(4, Messages);
                %% Hardcoded for compatibility with Ranch 1.x.
                Passive =:= tcp_passive; Passive =:= ssl_passive ->
            setopts_active(State),
            loop(State);
        %% Timeouts.
        %% (excerpt ends here; the remaining clauses are omitted)
When the socket is closed, terminate is called, and in the end the child processes are killed by sending them an exit signal:
-spec terminate(children()) -> ok.
terminate(Children) ->
    %% For each child, either ask for it to shut down,
    %% or cancel its shutdown timer if it already is.
    %%
    %% We do not need to flush stray timeout messages out because
    %% we are either terminating or switching protocols,
    %% and in the latter case we flush all messages.
    _ = [case TRef of
        undefined -> exit(Pid, shutdown);
        _ -> erlang:cancel_timer(TRef, [{async, true}, {info, false}])
    end || #child{pid=Pid, timer=TRef} <- Children],
    before_terminate_loop(Children).
Because the child processes do not trap exits, they terminate without any log output and without any chance to handle the shutdown.
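For contrast, here is a minimal standalone sketch of what trapping exits changes. The module name trap_demo and the message shapes are hypothetical, and this is not Cowboy code. Once a process sets process_flag(trap_exit, true), the shutdown signal arrives as an ordinary {'EXIT', From, Reason} message, so the process gets a chance to clean up.

```erlang
-module(trap_demo).
-export([run/0]).

%% Hypothetical demo: the worker traps exits, so exit(Worker, shutdown)
%% is delivered to it as a message instead of killing it immediately.
run() ->
    Parent = self(),
    Worker = spawn(fun() ->
        process_flag(trap_exit, true),
        receive
            {'EXIT', _From, Reason} ->
                %% Cleanup (releasing locks, logging) would go here.
                Parent ! {cleaned_up, Reason}
        after 3000 ->
            Parent ! finished
        end
    end),
    timer:sleep(100),                 % let the worker set the flag first
    exit(Worker, shutdown),
    receive
        {cleaned_up, _} = Msg -> Msg;
        finished -> finished
    after 1000 ->
        timeout
    end.
```

trap_demo:run() returns {cleaned_up, shutdown}. Two caveats apply when carrying this over to a real handler. A handler blocked in timer:sleep or other long-running work only sees the 'EXIT' message the next time it does a receive. The terminate code above also keeps a shutdown timer per child, so a trapping handler probably does not get unlimited time to finish; how that timer behaves is not verified here.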
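Summary
Because Cowboy kills the handler process as soon as the peer disconnects, bugs are easy to introduce. If nginx sits in front of Cowboy, setting proxy_ignore_client_abort on keeps client disconnects from being passed on to the backend, which works around the problem.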
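As a rough illustration of that workaround, a minimal nginx fragment might look like the following. The listen port and the location are assumptions. The upstream address is taken from the curl example above, on the assumption that nginx and Cowboy run on the same host.

```nginx
server {
    listen 80;  # assumed front-end port

    location / {
        # Cowboy listener from the example above (assumed to be local).
        proxy_pass http://127.0.0.1:8080;

        # Keep the upstream request running even if the client goes away,
        # so Cowboy never sees the early disconnect.
        proxy_ignore_client_abort on;
    }
}
```

The trade-off is that the backend keeps working on requests nobody is waiting for, and this only covers disconnects that reach Cowboy through nginx.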