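Motivation
When writing an RPC server/client in Elixir, I had to decide which options to pass through to gen_tcp. Some options, such as sndbuf and recbuf, should be exposed so that users can tune them for their workload; others should be hidden to keep the API easier to understand.
So I took a closer look at the gen_tcp options.
Code version
The file names and line numbers quoted in this article refer to the following code version: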
- erlang: OTP-21.0.9
options
Available options for tcp:connect
inet.erl:723
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Available options for tcp:connect
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
connect_options() ->
    [tos, tclass, priority, reuseaddr, keepalive, linger, sndbuf, recbuf, nodelay,
     header, active, packet, packet_size, buffer, mode, deliver, line_delimiter,
     exit_on_close, high_watermark, low_watermark, high_msgq_watermark,
     low_msgq_watermark, send_timeout, send_timeout_close, delay_send, raw,
     show_econnreset, bind_to_device].
tos
Type of service, the TOS field of the IP header.
Figure (not reproduced here): from TCP/IP Illustrated, Volume 1.
tclass
IPV6_TCLASS
{tclass, Integer}
Sets IPV6_TCLASS IP level options on platforms where this is implemented.
The behavior and allowed range varies between different systems.
The option is ignored on platforms where it is not implemented. Use with caution.
I don't know what this means in practice, so I'm skipping it.
priority
SO_PRIORITY
Set the protocol-defined priority for all packets to be sent
on this socket. Linux uses this value to order the networking
queues: packets with a higher priority may be processed first
depending on the selected device queueing discipline. Setting
a priority outside the range 0 to 6 requires the CAP_NET_ADMIN
capability.
reuseaddr
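SO_REUSEADDR
Note: {reuseaddr, Boolean} maps to SO_REUSEADDR, which relaxes the address check in bind(2) so that a local address can be reused. It is not SO_REUSEPORT. The man page excerpt below describes SO_REUSEPORT, a different option, and is kept only for contrast.
SO_REUSEPORT (since Linux 3.9)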
Permits multiple AF_INET or AF_INET6 sockets to be bound to an
identical socket address. This option must be set on each
socket (including the first socket) prior to calling bind(2)
on the socket. To prevent port hijacking, all of the processes
binding to the same address must have the same effective UID.
This option can be employed with both TCP and UDP sockets.
For TCP sockets, this option allows accept(2) load distribution
in a multi-threaded server to be improved by using a distinct
listener socket for each thread. This provides improved
load distribution as compared to traditional techniques such
as using a single accept(2)ing thread that distributes connections,
or having multiple threads that compete to accept(2)
from the same socket.
For UDP sockets, the use of this option can provide better
distribution of incoming datagrams to multiple processes (or
threads) as compared to the traditional technique of having
multiple processes compete to receive datagrams on the same
socket.
keepalive
SO_KEEPALIVE
Enable sending of keep-alive messages on connection-oriented
sockets. Expects an integer boolean flag.
Tunable keepalive parameters and what they mean:
root@1ba6f31f7bc3:/# cat /proc/sys/net/ipv4/tcp_keepalive_time
1800
the interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the connection is marked to need keepalive, this counter is not used any further
root@1ba6f31f7bc3:/# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
the interval between subsequential keepalive probes, regardless of what the connection has exchanged in the meantime
root@1ba6f31f7bc3:/# cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
the number of unacknowledged probes to send before considering the connection dead and notifying the application layer
Main problems:
- Probes do not pass through a load balancer.
- Detection is too slow.
- It knows nothing about the application's state.
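To put a number on "too slow", here is a minimal C sketch. It assumes Linux, where the three system-wide values above can be overridden per socket with TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT. The function names keepalive_detect_secs and enable_keepalive are made up for this illustration, and the formula is a rough worst case rather than an exact kernel guarantee.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Rough worst-case seconds before an idle, dead peer is reported:
 * the idle time before the first probe plus one interval per probe. */
long keepalive_detect_secs(long idle, long intvl, long probes)
{
    return idle + intvl * probes;
}

/* Enables keepalive on fd and overrides the three system-wide defaults
 * for this socket only (the TCP_KEEP* options are Linux-specific).
 * Returns 0 on success, -1 on error. */
int enable_keepalive(int fd, int idle, int intvl, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) != 0)
        return -1;
    return 0;
}
```

With the values printed above, keepalive_detect_secs(1800, 75, 9) gives 2475 seconds, roughly 41 minutes. gen_tcp's {keepalive, true} corresponds only to the first setsockopt call in enable_keepalive.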
linger
SO_LINGER
Sets or gets the SO_LINGER option. The argument is a linger
structure.
struct linger {
    int l_onoff;    /* linger active */
    int l_linger;   /* how many seconds to linger for */
};
When enabled, a close(2) or shutdown(2) will not return until
all queued messages for the socket have been successfully sent
or the linger timeout has been reached. Otherwise, the call
returns immediately and the closing is done in the background.
When the socket is closed as part of exit(2), it always
lingers in the background.
In short: whether close/shutdown waits for all queued data to be sent before returning.
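As a small illustration of the struct above, the following C sketch turns lingering on and reads the setting back. The helper names set_linger and get_linger are hypothetical; on the Erlang side the equivalent option is {linger, {true, Seconds}}.

```c
#include <sys/socket.h>

/* Makes close(2) block for up to `seconds` while queued data drains.
 * Returns 0 on success, -1 on error. */
int set_linger(int fd, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;          /* linger active */
    lg.l_linger = seconds;   /* how long to wait */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Reads the current linger setting into *out.
 * Returns 0 on success, -1 on error. */
int get_linger(int fd, struct linger *out)
{
    socklen_t len = sizeof(*out);

    return getsockopt(fd, SOL_SOCKET, SO_LINGER, out, &len);
}
```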
sndbuf recbuf buffer
SO_SNDBUF
Sets or gets the maximum socket send buffer in bytes. The
kernel doubles this value (to allow space for bookkeeping
overhead) when it is set using setsockopt(2), and this doubled
value is returned by getsockopt(2). The default value is set
by the /proc/sys/net/core/wmem_default file and the maximum
allowed value is set by the /proc/sys/net/core/wmem_max file.
The minimum (doubled) value for this option is 2048.
SO_RCVBUF
Sets or gets the maximum socket receive buffer in bytes. The
kernel doubles this value (to allow space for bookkeeping
overhead) when it is set using setsockopt(2), and this doubled
value is returned by getsockopt(2). The default value is set
by the /proc/sys/net/core/rmem_default file, and the maximum
allowed value is set by the /proc/sys/net/core/rmem_max file.
The minimum (doubled) value for this option is 256.
inet_drv.c:6708
case INET_OPT_SNDBUF:
{
    arg.ival = get_int32(curr); curr += 4;
    proto = SOL_SOCKET;
    type = SO_SNDBUF;
    arg_ptr = (char*) (&arg.ival);
    arg_sz = sizeof(arg.ival);
    /* Adjust the size of the user-level recv buffer, so it's not
       smaller than the kernel one: */
    if (desc->bufsz <= arg.ival)
        desc->bufsz = arg.ival;
    break;
}
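As the code shows, buffer is the user-level buffer, and this branch raises it so that it is never smaller than the kernel buffer. Yet the buffer value I actually read back was smaller than recbuf and sndbuf.
My guess: buffer only changes when recbuf or sndbuf is set explicitly.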
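The doubling described in the man page excerpt is easy to observe directly. The C sketch below requests a send buffer size and returns whatever the kernel reports afterwards; set_and_read_sndbuf is a made-up helper name, and the exact value returned depends on the OS and on wmem_max. It says nothing about the Erlang-level buffer option, which could be checked separately with inet:getopts(Socket, [sndbuf, recbuf, buffer]).

```c
#include <sys/socket.h>

/* Requests a send buffer of `bytes` and returns the size the kernel
 * reports afterwards, or -1 on error. On Linux the reported value is
 * normally twice the requested one. */
int set_and_read_sndbuf(int fd, int bytes)
{
    int got = 0;
    socklen_t len = sizeof(got);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) != 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &got, &len) != 0)
        return -1;
    return got;
}
```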
nodelay
TCP_NODELAY
DISCUSSION:
The Nagle algorithm is generally as follows:
If there is unacknowledged data (i.e., SND.NXT >
SND.UNA), then the sending TCP buffers all user
data (regardless of the PSH bit), until the
outstanding data has been acknowledged or until
the TCP can send a full-sized segment (Eff.snd.MSS
bytes; see Section 4.2.2.6).
Some applications (e.g., real-time display window
updates) require that the Nagle algorithm be turned
off, so small data segments can be streamed out at the
maximum rate.
Combined with delayed ACKs on the receiving side, Nagle's algorithm can add considerable latency.
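The rule in the RFC excerpt boils down to a single condition, sketched here in C together with the setsockopt call that {nodelay, true} corresponds to. The names nagle_can_send and disable_nagle are invented for this example, and the predicate is a simplification of the excerpt, not a model of any real TCP stack.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Nagle's rule as quoted above: send immediately only if nothing is
 * in flight (SND.NXT == SND.UNA) or a full-sized segment is ready. */
bool nagle_can_send(size_t unacked_bytes, size_t pending_bytes, size_t mss)
{
    return unacked_bytes == 0 || pending_bytes >= mss;
}

/* Turns Nagle's algorithm off for one socket.
 * Returns 0 on success, -1 on error. */
int disable_nagle(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```

With 100 bytes unacknowledged and only 10 bytes pending, nagle_can_send returns false: the small write waits for the outstanding ACK, and if the receiver delays that ACK, the wait grows accordingly.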
header
http://erlang.org/doc/man/ine...
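A fixed-size header: with {header, Size} (binary mode only), the first Size bytes of the received data are split off from the rest. Handy when parsing a protocol with a fixed-length header.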
active
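Selects between passive mode ({active, false}, where data is read explicitly with gen_tcp:recv) and active mode (where received data is delivered asynchronously as messages to the owning process).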
packet, raw
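Length of the packet header, i.e. how many bytes (1, 2, or 4) encode the packet length. {packet, raw} is equivalent to {packet, 0}: no framing at all. Note that the raw entry in connect_options() is a different option, {raw, Protocol, OptionNum, ValueBin}, which passes an arbitrary socket option through to the OS.
packet_size
Maximum packet length: the largest packet body the socket will accept.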
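To make the framing concrete, here is a minimal C sketch of what {packet, 4} does on the wire: a 4-byte big-endian length followed by the body. The functions frame_encode and frame_decode are hypothetical helpers written for this article, and the max_size parameter only imitates the role of packet_size; this is not the driver's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes a 4-byte big-endian length header followed by the payload.
 * Returns the total frame size, or 0 if `out` is too small. */
size_t frame_encode(const uint8_t *payload, uint32_t len,
                    uint8_t *out, size_t out_cap)
{
    if (out_cap < 4 || out_cap - 4 < len)
        return 0;
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    memcpy(out + 4, payload, len);
    return (size_t)len + 4;
}

/* Tries to extract one frame from buf.
 * Returns 1 and sets *payload and *len when a whole frame is present,
 * 0 when more bytes are needed, and -1 when the announced length
 * exceeds max_size. */
int frame_decode(const uint8_t *buf, size_t buf_len, uint32_t max_size,
                 const uint8_t **payload, uint32_t *len)
{
    uint32_t n;

    if (buf_len < 4)
        return 0;
    n = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
        ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    if (n > max_size)
        return -1;
    if (buf_len - 4 < n)
        return 0;
    *payload = buf + 4;
    *len = n;
    return 1;
}
```

Encoding the 5-byte payload "hello" yields a 9-byte frame whose header bytes are 0, 0, 0, 5. With gen_tcp this bookkeeping is done by the socket itself, so the application only ever sees whole packet bodies.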
mode
{mode, Mode :: binary | list}
Received Packet is delivered as defined by Mode.
deliver
{deliver, port | term}
When {active, true}, data is delivered on the form port : {S, {data, [H1,..Hsz | Data]}} or term : {tcp, S, [H1..Hsz | Data]}.
line_delimiter
{line_delimiter, Char}(TCP/IP sockets)
Sets the line delimiting character for line-oriented protocols (line). Defaults to $\n.
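As a rough picture of what line mode has to do with that character, the C sketch below finds the first complete line in a receive buffer for a configurable delimiter. first_line_len is a made-up name, and counting the delimiter as part of the line is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Returns the length of the first complete line in buf, including the
 * delimiter, or 0 if no delimiter has arrived yet. */
size_t first_line_len(const char *buf, size_t len, char delim)
{
    const char *p = memchr(buf, delim, len);

    return p ? (size_t)(p - buf) + 1 : 0;
}
```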
exit_on_close
{exit_on_close, Boolean}
This option is set to true by default.
The only reason to set it to false is if you want to continue sending data to the socket after a close is detected, for example, if the peer uses gen_tcp:shutdown/2 to shut down the write side.
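high_watermark, low_watermark, high_msgq_watermark, low_msgq_watermark
These control when the socket switches into and out of the busy state.
Open questions I still need to clear up:
- What exactly is the socket busy state? For example, what do send/receive calls return while it is set?
- msgq data size versus socket data size: is the socket data size the same thing as buffer?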
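This does not answer the questions above, but the high/low pair itself is a plain hysteresis: the queue becomes busy when it reaches the high mark and only stops being busy once it falls below the low mark. A minimal C sketch of that switching, with invented names (struct busy_state, busy_update) and illustrative limits of 8192 and 4096 bytes:

```c
#include <stdbool.h>
#include <stddef.h>

struct busy_state {
    size_t high;   /* enter busy when queued >= high */
    size_t low;    /* leave busy when queued < low */
    bool busy;
};

/* Updates the busy flag for the current queue size and returns it. */
bool busy_update(struct busy_state *s, size_t queued)
{
    if (!s->busy && queued >= s->high)
        s->busy = true;
    else if (s->busy && queued < s->low)
        s->busy = false;
    return s->busy;
}
```

The gap between the two marks is what keeps the state from flapping when the queue size hovers around a single threshold.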
send_timeout
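Send timeout. The default is to wait indefinitely.
send_timeout_close
Whether the socket is closed automatically when a send times out.
delay_send
Coalesces small sends at the application level. Off by default; worth considering turning on.
show_econnreset
Whether a TCP RST from the peer is reported as an econnreset error instead of being treated as a normal close (the default).
bind_to_device
Binds the socket to a specific network device (NIC).
References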
- http://erlang.org/doc/man/gen...
- http://man7.org/linux/man-pag...
- http://erlang.org/doc/man/ine...
- https://github.com/erlang/otp
- https://tools.ietf.org/html/r...
- https://www.ietf.org/rfc/rfc3...
- https://tools.ietf.org/html/r...