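Author: willorfang
Date: September 6, 2011
Summary: Error handling is one of the trickier parts of network programming, and enterprise-grade network programs in particular have to account for all kinds of unexpected conditions. This article summarizes the causes of Linux socket errors and how to handle them. Questions and corrections are welcome.
Source: http://blog.csdn.net/fangwei1235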
-----------------------------------------------------------------------
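Preface
On UNIX systems, most system calls return -1 on failure and set the global variable errno. socket(), bind(), accept(), and listen() all behave this way.
errno holds a positive integer that identifies the error from the most recent failed system call. It is set only when a call fails; after a successful call its value is unspecified. Check (or save) errno immediately after a failed call, before another call can overwrite it.
In a multithreaded program each thread has its own errno, so it is not shared state and needs no synchronization. The error values are constants defined in the header <errno.h>. perror() is the usual way to print the corresponding message.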
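As a minimal sketch of the advice above, the helper below saves errno right after the failing call and only then does anything else. The name close_and_report is hypothetical and exists only for this illustration; close() is used simply because it is easy to make it fail.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: closes fd and returns 0 on success, or the errno
 * value captured immediately after the failure. */
int close_and_report(int fd)
{
    if (close(fd) == -1) {
        int saved = errno;      /* save before any other call can change it */
        fprintf(stderr, "close(%d) failed: %s\n", fd, strerror(saved));
        errno = saved;          /* restore in case fprintf() modified it */
        return saved;
    }
    return 0;
}
```

Calling close_and_report(-1) returns EBADF, because -1 is never a valid descriptor. perror("close") could replace the fprintf() line, as long as it is called before anything else that may touch errno.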
Error handling for socket() (from the Linux man page):
ERRORS
EACCES Permission to create a socket of the specified type and/or protocol is denied.
EAFNOSUPPORT
The implementation does not support the specified address family.
EINVAL Unknown protocol, or protocol family not available.
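EMFILE Process file table overflow. The per-process limit on open files has been reached.
Notes:
1. Inspecting a process's open files: under /proc every process has a directory named after its PID. Its fd subdirectory contains one symbolic link per open file descriptor; the link name is the descriptor number and the link target is the file the descriptor refers to.
2. Raising the per-process open-file limit: the default limit on Linux is 1024 and can be checked with ulimit -n. On many systems it can be changed in /etc/security/limits.conf, which is well commented and explains the syntax. To raise the limit to 65535 for all users, add these two lines:
* soft nofile 65535
* hard nofile 65535
The change can also be restricted to a single user or group; see the comments in the file. The limits are applied at login, so they take effect for sessions started after the change; logging in again is normally enough, and a reboot also works.
ENFILE The system limit on the total number of open files has been reached.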
ENOBUFS or ENOMEM
Insufficient memory is available. The socket cannot be created until sufficient
resources are freed.
EPROTONOSUPPORT
The protocol type or the specified protocol is not supported within this domain.
Other errors may be generated by the underlying protocol modules.
Handling: correct the arguments and call socket() again.
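For the EMFILE case discussed in the notes above, a process can also raise its own soft limit up to the hard limit at run time. The following is a sketch under that assumption, not a recommended policy; raise_nofile_limit and socket_retry_on_emfile are hypothetical names introduced here.

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/socket.h>

/* Raise the soft RLIMIT_NOFILE to the hard limit.
 * Returns 0 on success, -1 on error (errno set). */
int raise_nofile_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    if (rl.rlim_cur == rl.rlim_max)
        return 0;               /* already at the hard limit */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &rl);
}

/* socket() that retries once after raising the limit when the first
 * attempt fails with EMFILE. Returns the fd, or -1 with errno set. */
int socket_retry_on_emfile(int domain, int type, int protocol)
{
    int fd = socket(domain, type, protocol);

    if (fd == -1 && errno == EMFILE && raise_nofile_limit() == 0)
        fd = socket(domain, type, protocol);
    return fd;
}
```

Errors such as EAFNOSUPPORT or EPROTONOSUPPORT are passed straight back to the caller, since retrying with the same arguments cannot succeed.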
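Error handling for connect():
- EADDRNOTAVAIL: The requested address is not available. The usual handling is to pick a different address and reconnect, or to notify the user, close the socket, and terminate.
- ECONNREFUSED: Connection refused. When a client calls connect() to start a TCP connection and no process is listening on the target port (for example, the server has not been started), the remote system answers with a reset (RST) segment and connect() fails with ECONNREFUSED. The usual handling is to wait and retry for a while, then exit if the connection still cannot be made.
- EINTR: The call was interrupted by the delivery of a signal. For most calls the usual handling is to call the function again. For connect() the connection attempt continues in the background, so calling connect() again can fail with EALREADY; wait until the socket becomes writable and read the result from the SO_ERROR socket option instead.
- ENETUNREACH: Network is unreachable. A router failure leaves no usable route to the destination network; if no ICMP response comes back either, the system keeps retrying until it times out. The usual handling is to notify the user and exit.
- ENXIO: The server exited before the connection was established. The usual handling is to notify the user and exit.
- ETIMEDOUT: Connection timed out. After the client sends the connection request (SYN), the server did not respond within the timeout (for example, the host is down). The timeout is traditionally 75 seconds on BSD-derived systems; on Linux it depends on the net.ipv4.tcp_syn_retries setting. The usual handling is to notify the user and exit.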
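Notes:
ETIMEDOUT, "connection timed out": this usually happens when the server host has crashed. The client TCP keeps retransmitting the segment for some time (implementation dependent), trying to obtain an ACK from the server TCP. When it finally gives up (the server not having restarted in the meantime), the kernel returns ETIMEDOUT to the client process.
If an intermediate router determines that the server host is unreachable, it generally responds with an ICMP "destination unreachable" message, and the client process gets EHOSTUNREACH or ENETUNREACH.
Once the server has rebooted, its TCP state is lost and the previous connection information no longer exists, so it answers segments from the client with an RST.
If the client needs to detect a crashed server host even when it is not actively sending data, another technique is required, such as setting the SO_KEEPALIVE socket option or implementing an application-level heartbeat.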
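The SO_KEEPALIVE approach mentioned above could look roughly like this on Linux. The function name enable_keepalive and its three tuning parameters are assumptions made for this sketch; TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT are Linux-specific options from <netinet/tcp.h>.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keepalive on fd.
 *   idle_s  - idle seconds before the first probe is sent
 *   intvl_s - seconds between probes
 *   probes  - unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on error (errno set). */
int enable_keepalive(int fd, int idle_s, int intvl_s, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == -1)
        return -1;
    /* Linux-specific per-socket tuning; without it the system defaults apply. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) == -1)
        return -1;
    return 0;
}
```

With values such as enable_keepalive(fd, 60, 10, 5), a dead peer would be noticed after roughly 60 + 10 * 5 seconds of silence, at which point a blocked recv() fails with ETIMEDOUT.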
- ENOMEM: Not enough memory is available. The usual handling is to notify the user and exit.
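The connect() cases above can be sketched as one helper that completes an interrupted connect() with poll() and SO_ERROR, and hands every other error back to the caller. connect_ipv4 and finish_interrupted_connect are hypothetical names, and the sketch assumes a blocking IPv4 TCP socket.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Finish a connect() that was interrupted by a signal: wait until the
 * socket is writable, then read the final status from SO_ERROR. */
static int finish_interrupted_connect(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int err = 0;
    socklen_t len = sizeof err;

    while (poll(&pfd, 1, -1) == -1) {
        if (errno != EINTR)
            return -1;
    }
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return -1;
    if (err != 0) {
        errno = err;            /* e.g. ECONNREFUSED or ETIMEDOUT */
        return -1;
    }
    return 0;
}

/* Connect to an IPv4 address given in dotted-decimal form. Returns a
 * connected fd, or -1 with errno set for the caller to act on. */
int connect_ipv4(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;         /* not a valid IPv4 address string */
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    int rc = connect(fd, (struct sockaddr *)&addr, sizeof addr);
    if (rc == -1 && errno == EINTR)
        rc = finish_interrupted_connect(fd);
    if (rc == -1) {
        int saved = errno;      /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A caller would typically sleep and retry on ECONNREFUSED, and report ETIMEDOUT, ENETUNREACH, or EHOSTUNREACH to the user. A fresh socket is created for every attempt because the state of a socket after a failed connect() is unspecified.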
Error handling for bind() (from the Linux man page):
ERRORS:
EADDRINUSE
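The given address is already in use. This sometimes happens when a socket is closed and the same address is bound again shortly afterwards; setting the SO_REUSEADDR option with setsockopt() before calling bind() resolves it.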
EBADF sockfd is not a valid descriptor.
EINVAL The socket is already bound to an address.
ENOTSOCK
sockfd is a descriptor for a file, not a socket.
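A minimal sketch of the SO_REUSEADDR remedy for EADDRINUSE follows. The helper name bind_reuseaddr is hypothetical, and binding to INADDR_ANY is an assumption of this example rather than something the text prescribes.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to INADDR_ANY:port with SO_REUSEADDR set.
 * Returns the fd, or -1 with errno set. */
int bind_reuseaddr(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    int on = 1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    /* The option must be set before bind() to affect it. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) == -1 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        int saved = errno;      /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

SO_REUSEADDR lets a restarted server bind while old connections on the port are still in TIME_WAIT. It does not allow two sockets to listen on the same address and port at once, so EADDRINUSE can still occur when another process really is using the port.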
Error handling for listen() (from the Linux man page):
ERRORS
EADDRINUSE
Another socket is already listening on the same port.
EBADF The argument sockfd is not a valid descriptor.
ENOTSOCK
The argument sockfd is not a socket.
EOPNOTSUPP
The socket is not of a type that supports the listen() operation.
Error handling for accept():
ERRORS
EAGAIN or EWOULDBLOCK
The socket is marked non-blocking and no connections are present to be accepted. A non-blocking socket can return this from accept(); simply call accept() again later.
EBADF The descriptor is invalid.
ECONNABORTED
A connection has been aborted.
EINTR The system call was interrupted by a signal that was caught before a valid connection
arrived. Simply retry.
EINVAL Socket is not listening for connections, or addrlen is invalid (e.g., is negative).
EMFILE The per-process limit of open file descriptors has been reached.
ENFILE The system limit on the total number of open files has been reached.
ENOTSOCK
The descriptor references a file, not a socket.
EOPNOTSUPP
The referenced socket is not of type SOCK_STREAM.
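The retry rules for accept() listed above might be wrapped as follows. This is a sketch; accept_retry is a hypothetical name, and treating ECONNABORTED as retryable is a choice made here, since the aborted connection is simply gone and the next one can be accepted.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Accept one connection on listen_fd, retrying transient failures.
 * Returns the new fd, or -1 with errno set. EAGAIN/EWOULDBLOCK means
 * nothing is pending: wait with poll()/select() and call again. */
int accept_retry(int listen_fd)
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd >= 0)
            return fd;
        if (errno == EINTR || errno == ECONNABORTED)
            continue;           /* transient: try again right away */
        return -1;              /* EAGAIN, EMFILE, EBADF, ...: caller decides */
    }
}
```

EMFILE and ENFILE are left to the caller on purpose: retrying in a tight loop would spin until some descriptor is closed elsewhere.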
Error handling for recv():
Note: a return value of 0 means the TCP connection has been closed by the peer.
ERRORS
These are some standard errors generated by the socket layer. Additional errors may be generated
and returned from the underlying protocol modules; see their manual pages.
EAGAIN The socket is marked non-blocking and the receive operation would block, or a receive
timeout had been set and the timeout expired before data was received. Simply try to receive again.
EBADF The argument s is an invalid descriptor.
ECONNREFUSED
A remote host refused to allow the network connection (typically because it is not running
the requested service).
EFAULT The receive buffer pointer(s) point outside the process's address space.
EINTR The receive was interrupted by delivery of a signal before any data were available.
EINVAL Invalid argument passed.
ENOMEM Could not allocate memory for recvmsg().
ENOTCONN
The socket is associated with a connection-oriented protocol and has not been connected
(see connect(2) and accept(2)).
ENOTSOCK
The argument s does not refer to a socket.
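The two recv() results that need special care, a return value of 0 and EINTR, can be handled in a small read loop. This sketch assumes a blocking stream socket; recv_all is a hypothetical helper name.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read up to len bytes from a blocking stream socket. Returns the number
 * of bytes read (less than len only if the peer closed the connection
 * first), or -1 with errno set on error. */
ssize_t recv_all(int fd, void *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = recv(fd, (char *)buf + got, len - got, 0);
        if (n > 0) {
            got += (size_t)n;
        } else if (n == 0) {
            break;              /* peer closed the connection (EOF) */
        } else if (errno == EINTR) {
            continue;           /* interrupted before any data arrived: retry */
        } else {
            return -1;          /* ECONNRESET, or EAGAIN on a receive timeout, ... */
        }
    }
    return (ssize_t)got;
}
```

On a non-blocking socket the EAGAIN case would instead go back to poll() or select() and resume later, so the caller would have to keep track of how many bytes have already arrived.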
Error handling for send():
ERRORS
EAGAIN or EWOULDBLOCK
The socket is marked non-blocking and the requested operation would block.
EBADF An invalid descriptor was specified.
ECONNRESET
Connection reset by peer.
EDESTADDRREQ
The socket is not connection-mode, and no peer address is set.
EFAULT An invalid user space address was specified for a parameter.
EINTR A signal occurred before any data was transmitted.
EINVAL Invalid argument passed.
EISCONN
The connection-mode socket was connected already but a recipient was specified. (Now
either this error is returned, or the recipient specification is ignored.)
EMSGSIZE
The socket type requires that message be sent atomically, and the size of the message to
be sent made this impossible.
ENOBUFS
The output queue for a network interface was full. This generally indicates that the
interface has stopped sending, but may be caused by transient congestion. (Normally,
this does not occur in Linux. Packets are just silently dropped when a device queue
overflows.)
ENOMEM No memory available.
ENOTCONN
The socket is not connected, and no target has been given.
ENOTSOCK
The argument s is not a socket.
EOPNOTSUPP
Some bit in the flags argument is inappropriate for the socket type.
EPIPE The local end has been shut down on a connection oriented socket. In this case the
process will also receive a SIGPIPE unless MSG_NOSIGNAL is set.
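A send loop that follows the man page text above could look like this. It is a sketch for a blocking stream socket; send_all is a hypothetical name, and MSG_NOSIGNAL is used so that a broken connection shows up as EPIPE instead of a SIGPIPE signal.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send exactly len bytes on a blocking stream socket.
 * Returns 0 on success, -1 with errno set on error. */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before any data was sent: retry */
            return -1;          /* EPIPE, ECONNRESET, EAGAIN, ... */
        }
        p += n;                 /* a short write is normal: send the rest */
        len -= (size_t)n;
    }
    return 0;
}
```

After EPIPE or ECONNRESET the connection is unusable, so the caller should close the descriptor instead of retrying.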
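Appendix:
Typical cases of EPIPE:
1. The local end has closed or shut down the socket, but the variable holding the descriptor was not reset to -1 and the program keeps sending on it. The failed write raises SIGPIPE, whose default action kills the process. Network programs therefore usually ignore or block SIGPIPE at startup, so that a descriptor that was not updated in time does not get the process killed.
2. A write() on a socket that has been closed at the other end causes SIGPIPE.
3. The error is described as "broken pipe". It usually occurs when the client process ignores (or does not handle in time) a socket error and keeps writing more data to the server TCP. The kernel then sends SIGPIPE to the client process, and the default action terminates it (without a core dump). Writing to a server TCP that is in FIN_WAIT2 (it has received the ACK for its FIN) is not a problem in itself, but writing to a socket that has already received an RST is an error.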
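The usual way to keep SIGPIPE from killing the process, as described in the cases above, is to ignore the signal once at startup and then handle EPIPE as an ordinary error. A sketch with hypothetical helper names:

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Ignore SIGPIPE for the whole process.
 * Returns 0 on success, -1 on failure. */
int ignore_sigpipe(void)
{
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Returns 1 if sending on fd fails with EPIPE, 0 otherwise.
 * Assumes ignore_sigpipe() has been called, otherwise the process is
 * terminated by the signal before send() can return. */
int send_hits_broken_pipe(int fd, const void *buf, size_t len)
{
    return send(fd, buf, len, 0) == -1 && errno == EPIPE;
}
```

Ignoring the signal affects every descriptor in the process, including pipes. When that is too broad, passing MSG_NOSIGNAL to individual send() calls has the same effect for just those calls.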
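Typical cases of ECONNRESET:
1. In a client/server program, the client exits abnormally without releasing and closing its resources. The server first gets ECONNRESET and then EPIPE.
2. The connection was closed by the remote host. Possible causes: the remote host stopped the service or rebooted; or an operation failed and the connection was dropped because the "keep alive" option was set, in which case the error usually appears together with ENETRESET.
3. The remote end performed a "hard" or "abortive" close. The application should close the socket, because it is no longer usable. On a UDP socket, this error indicates that a previous send operation resulted in an ICMP "port unreachable" message.
4. If the client closes the connection, select() on the server does not report an error (it does not return -1; select() was used here as a non-blocking check on a single socket), but a subsequent send() on that socket fails with ECONNRESET. Reading the socket with recv() did not return an error.
5. The error is described as "connection reset by peer". It usually occurs when the server process terminates before the client process. When the server process terminates, its TCP sends a FIN to the client TCP, the client TCP replies with an ACK, and the server TCP moves to FIN_WAIT2. If the client process does not handle the FIN at that point (for example, it is blocked in another call and has not closed the socket), the client TCP stays in CLOSE_WAIT. When the client process then sends data to the server TCP in FIN_WAIT2, the server TCP immediately responds with an RST. This situation can also trigger another application-level failure: after sending its data, the client usually waits to receive from the network, typically with read or readline. Depending on timing, if that call runs before the RST arrives, the client gets an unexpected EOF and generally prints "server terminated prematurely".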
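Cases 4 and 5 come down to telling an orderly close (recv() returns 0) apart from a reset (ECONNRESET). One way to make that explicit is a small classification function, sketched below. The enum and the function name are hypothetical and introduced only for this illustration.

```c
#include <errno.h>
#include <sys/types.h>

enum peer_state {
    PEER_DATA,      /* n bytes were received */
    PEER_CLOSED,    /* orderly shutdown: the peer sent FIN */
    PEER_RESET,     /* abortive close: the peer sent RST */
    PEER_RETRY,     /* nothing wrong, call again later */
    PEER_ERROR      /* any other failure */
};

/* Classify the outcome of recv()/read(). n is the return value and err
 * is the errno value saved right after the call. */
enum peer_state classify_recv(ssize_t n, int err)
{
    if (n > 0)
        return PEER_DATA;
    if (n == 0)
        return PEER_CLOSED;
    if (err == ECONNRESET)
        return PEER_RESET;
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK)
        return PEER_RETRY;
    return PEER_ERROR;
}
```

A caller would close the socket on PEER_CLOSED and PEER_RESET alike. The distinction mostly matters for logging and for deciding whether data sent earlier may have been lost.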