说起CLOSE_WAIT状态,如果不知道的话,还是先瞧一下TCP的状态转移图吧。
关闭socket分为主动关闭(Active closure)和被动关闭(Passive closure)两种情况。前者是指有本地主机主动发起的关闭;而后者则是指本地主机检测到远程主机发起关闭之后,作出回应,从而关闭整个连接。将关闭部分的状态转移摘出来,就得到了下图:
产生原因
通过图上,我们来分析,什么情况下,连接处于CLOSE_WAIT状态呢?
在被动关闭连接情况下,在已经接收到FIN,但是还没有发送自己的FIN的时刻,连接处于CLOSE_WAIT状态。
通常来讲,CLOSE_WAIT状态的持续时间应该很短,正如SYN_RCVD状态。但是在一些特殊情况下,就会出现连接长时间处于CLOSE_WAIT状态的情况。
出现大量close_wait的现象,主要原因是某种情况下对方关闭了socket链接,但是我方忙与读或者写,没有关闭连接。代码需要判断socket,一旦读到0,断开连接,read返回负,检查一下errno,如果不是AGAIN,就断开连接。
参考资料4中描述,通过发送SYN-FIN报文来达到产生CLOSE_WAIT状态连接,没有进行具体实验。不过个人认为协议栈会丢弃这种非法报文,感兴趣的同学可以测试一下,然后把结果告诉我;-)
为了更加清楚的说明这个问题,我们写一个测试程序,注意这个测试程序是有缺陷的。
只要我们构造一种情况,使得对方关闭了socket,我们还在read,或者是直接不关闭socket就会构造这样的情况。
server.c:
#include <stdio.h> #include <string.h> #include <netinet/in.h> #define MAXLINE 80 #define SERV_PORT 8000 int main(void) { struct sockaddr_in servaddr, cliaddr; socklen_t cliaddr_len; int listenfd, connfd; char buf[MAXLINE]; char str[INET_ADDRSTRLEN]; int i, n; listenfd = socket(AF_INET, SOCK_STREAM, 0); int opt = 1; setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)); bzero(&servaddr, sizeof(servaddr)); servaddr.sin_family = AF_INET; servaddr.sin_addr.s_addr = htonl(INADDR_ANY); servaddr.sin_port = htons(SERV_PORT); bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr)); listen(listenfd, 20); printf("Accepting connections ...\n"); while (1) { cliaddr_len = sizeof(cliaddr); connfd = accept(listenfd, (struct sockaddr *)&cliaddr, &cliaddr_len); //while (1) { n = read(connfd, buf, MAXLINE); if (n == 0) { printf("the other side has been closed.\n"); break; } printf("received from %s at PORT %d\n", inet_ntop(AF_INET, &cliaddr.sin_addr, str, sizeof(str)), ntohs(cliaddr.sin_port)); for (i = 0; i < n; i++) buf[i] = toupper(buf[i]); write(connfd, buf, n); } //这里故意不关闭socket,或者是在close之前加上一个sleep都可以 //sleep(5); //close(connfd); } }
client.c:
#include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <sys/socket.h> #include <netinet/in.h> #define MAXLINE 80 #define SERV_PORT 8000 int main(int argc, char *argv[]) { struct sockaddr_in servaddr; char buf[MAXLINE]; int sockfd, n; char *str; if (argc != 2) { fputs("usage: ./client message\n", stderr); exit(1); } str = argv[1]; sockfd = socket(AF_INET, SOCK_STREAM, 0); bzero(&servaddr, sizeof(servaddr)); servaddr.sin_family = AF_INET; inet_pton(AF_INET, "127.0.0.1", &servaddr.sin_addr); servaddr.sin_port = htons(SERV_PORT); connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr)); write(sockfd, str, strlen(str)); n = read(sockfd, buf, MAXLINE); printf("Response from server:\n"); write(STDOUT_FILENO, buf, n); write(STDOUT_FILENO, "\n", 1); close(sockfd); return 0; }
结果如下:
debian-wangyao:~$ ./client a Response from server: A debian-wangyao:~$ ./client b Response from server: B debian-wangyao:~$ ./client c Response from server: C debian-wangyao:~$ netstat -antp | grep CLOSE_WAIT (Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.) tcp 1 0 127.0.0.1:8000 127.0.0.1:58309 CLOSE_WAIT 6979/server tcp 1 0 127.0.0.1:8000 127.0.0.1:58308 CLOSE_WAIT 6979/server tcp 1 0 127.0.0.1:8000 127.0.0.1:58307 CLOSE_WAIT 6979/server
解决方法
基本的思想就是要检测出对方已经关闭的socket,然后关闭它。
1.代码需要判断socket,一旦read返回0,断开连接,read返回负,检查一下errno,如果不是AGAIN,也断开连接。(注:在UNP 7.5节的图7.6中,可以看到使用select能够检测出对方发送了FIN,再根据这条规则就可以处理CLOSE_WAIT的连接)
2.给每一个socket设置一个时间戳last_update,每接收或者是发送成功数据,就用当前时间更新这个时间戳。定期检查所有的时间戳,如果时间戳与当前时间差值超过一定的阈值,就关闭这个socket。
3.使用一个Heart-Beat线程,定期向socket发送指定格式的心跳数据包,如果接收到对方的RST报文,说明对方已经关闭了socket,那么我们也关闭这个socket。
4.设置SO_KEEPALIVE选项,并修改内核参数
前提是启用socket的KEEPALIVE机制:
//启用socket连接的KEEPALIVE
int iKeepAlive = 1;
setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, (void *)&iKeepAlive, sizeof(iKeepAlive));
tcp_keepalive_intvl (integer; default: 75; since Linux 2.4)
The number of seconds between TCP keep-alive probes.
tcp_keepalive_probes (integer; default: 9; since Linux 2.2)
The maximum number of TCP keep-alive probes to send before giving up and killing the connection if no response is obtained from the other end.
tcp_keepalive_time (integer; default: 7200; since Linux 2.2)
The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes. Keep-alives are only sent when the SO_KEEPALIVE socket option is enabled. The default value is 7200 seconds (2 hours). An idle connec�\tion is terminated after approximately an additional 11 minutes (9 probes an interval of 75 seconds apart) when keep-alive is enabled.
echo 120 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 2 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 1 > /proc/sys/net/ipv4/tcp_keepalive_probes
除了修改内核参数外,可以使用setsockopt修改socket参数,参考man 7 socket。
int KeepAliveProbes=1; int KeepAliveIntvl=2; int KeepAliveTime=120; setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, (void *)&KeepAliveProbes, sizeof(KeepAliveProbes)); setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, (void *)&KeepAliveTime, sizeof(KeepAliveTime)); setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, (void *)&KeepAliveIntvl, sizeof(KeepAliveIntvl));
参考:
http://blog.chinaunix.net/u/20146/showart_1217433.html
http://blog.csdn.net/eroswang/archive/2008/03/10/2162986.aspx
http://haka.sharera.com/blog/BlogTopic/32309.htm
http://learn.akae.cn/media/ch37s02.html
http://faq.csdn.net/read/208036.html
http://www.cndw.com/tech/server/2006040430203.asp
http://davidripple.bokee.com/1741575.html
http://doserver.net/post/keepalive-linux-1.php
man 7 tcp
转自:http://blog.chinaunix.net/uid-20357359-id-1963662.html