Author: henrystark [email protected]
Blog: http://henrystark.blog.chinaunix.net/
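Date: 20140419
This article is licensed under CC Attribution-NonCommercial-NoDerivs 2.5 (https://creativecommons.org/licenses/by-nc-nd/2.5/cn/). You may copy and repost it freely, but please keep the document intact and credit the original author and the original link. If you find any errors, please point them out.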
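In an interview I was asked how system calls are implemented, which is a hard question to answer well. Going deep, it involves the implementation of NR…… and similar interrupt vectors 【Ref 3】; staying shallow, it is about how the interfaces the system provides are implemented in kernel code. I started with the relationship between printf and the write system call, but got stuck halfway 【Note 1】, so I switched to talking about the shutdown and close system calls instead.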
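Both system calls are common in socket network programming. The main difference: shutdown closes the connection on the socket itself, no matter how many descriptors refer to it, while close only decrements the reference count and tears the connection down once that count reaches zero.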
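The difference can be observed from user space. The following is a minimal sketch, not taken from the original article: it assumes an AF_UNIX socketpair as a stand-in for a TCP connection (so no FIN is actually sent), and peer_sees_eof is a hypothetical helper name. It keeps a second descriptor open with dup() and checks what the peer observes after close versus shutdown.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the peer sees end-of-file,
 * 0 if the peer merely sees "no data yet", and -1 on setup failure.
 * use_shutdown != 0: shutdown(SHUT_WR) on one descriptor.
 * use_shutdown == 0: close() one descriptor while a duplicate stays open.
 */
int peer_sees_eof(int use_shutdown)
{
    int sv[2];
    int extra;
    char c;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* A second descriptor for the same socket: the reference count is now 2. */
    extra = dup(sv[0]);
    if (extra < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (use_shutdown)
        shutdown(sv[0], SHUT_WR);   /* acts on the socket, ignores the count */
    close(sv[0]);                   /* only drops one reference */

    /* Non-blocking read on the peer: 0 means EOF, -1 means nothing to read yet. */
    n = recv(sv[1], &c, 1, MSG_DONTWAIT);

    close(extra);
    close(sv[1]);
    return n == 0 ? 1 : 0;
}
```

Calling peer_sees_eof(1) should report EOF even though a duplicate descriptor is still open, while peer_sees_eof(0) should not, because close alone leaves the socket alive as long as another reference exists.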
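The precise definition is in 【Ref 1】. shutdown can close the read side only, the write side only, or both at once. The call sequence for shutdown is described in 【Ref 2】. shutdown ignores the reference count and shuts the socket down directly. The source is as follows: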
linux/net/ipv4/tcp.c

/*
 *  Shutdown the sending side of a connection. Much like close except
 *  that we don't receive shut down or sock_set_flag(sk, SOCK_DEAD).
 */
void tcp_shutdown(struct sock *sk, int how)
{
    /*  We need to grab some memory, and put together a FIN,
     *  and then put it into the queue to be sent.
     *      Tim MacKenzie([email protected]) 4 Dec '92.
     */
    if (!(how & SEND_SHUTDOWN))
        return;

    /* If we've already sent a FIN, or it's a closed state, skip this. */
    if ((1 << sk->sk_state) &
        (TCPF_ESTABLISHED | TCPF_SYN_SENT |
         TCPF_SYN_RECV | TCPF_CLOSE_WAIT)) {
        /* Clear out any half completed packets. FIN if needed. */
        if (tcp_close_state(sk))
            tcp_send_fin(sk);
    }
}
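As the comment says, this function is responsible for shutting down the write (sending) side of the socket. Note that the bitwise AND against SEND_SHUTDOWN only works because the how variable has already been adjusted; see inet_shutdown, the socket-layer implementation of the shutdown system call.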
linux/net/ipv4/af_inet.c

int inet_shutdown(struct socket *sock, int how)
{
    struct sock *sk = sock->sk;
    int err = 0;

    /* This should really check to make sure
     * the socket is a TCP socket. (WHY AC...)
     */
    how++; /* maps 0->1 has the advantage of making bit 1 rcvs and
                   1->2 bit 2 snds.
                   2->3 */
    if ((how & ~SHUTDOWN_MASK) || !how) /* MAXINT->0 */
        return -EINVAL;
    ………………………………………………………………………………………………………………………………………………………………………………

linux/include/net/sock.h

#define SHUTDOWN_MASK   3
#define RCV_SHUTDOWN    1
#define SEND_SHUTDOWN   2
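The how++ trick can be replayed in user space. The sketch below is an illustration only: how_to_mask is a hypothetical name, the K_ constants are copies of the values from sock.h above, and it assumes the usual values SHUT_RD = 0, SHUT_WR = 1, SHUT_RDWR = 2. Unlike the kernel code it adds 1 in unsigned arithmetic, so that a very large how does not overflow a signed int.

```c
#include <sys/socket.h>

/* Copies of the kernel values quoted above, renamed to avoid any clash. */
enum {
    K_RCV_SHUTDOWN  = 1,
    K_SEND_SHUTDOWN = 2,
    K_SHUTDOWN_MASK = 3
};

/*
 * Hypothetical helper mirroring the check in inet_shutdown:
 * returns the bit mask for a valid how, or -1 where the kernel
 * would return -EINVAL.
 */
int how_to_mask(int how)
{
    unsigned int bits = (unsigned int)how + 1u;  /* 0->1, 1->2, 2->3 */

    if ((bits & ~(unsigned int)K_SHUTDOWN_MASK) || bits == 0)
        return -1;
    return (int)bits;
}
```

After the increment, bit 1 means "receive side" and bit 2 means "send side", so SHUT_RDWR naturally becomes both bits set, and tcp_shutdown can test the send side with a single AND.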
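The question is: how does the read side get closed? In practice, shutdown causes the process to discard data that has not been read yet or that arrives later. This is handled in the other TCP receive-path functions, such as tcp_poll and tcp_recvmsg.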
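One visible effect of a read-side shutdown is that a read on a socket with nothing queued reports end-of-file instead of waiting for data. The sketch below shows only that effect; it does not demonstrate the discarding of data. It assumes an AF_UNIX socketpair in place of a TCP connection, and recv_on_idle_socket is a hypothetical helper name.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: performs a non-blocking recv() on a socket that
 * has no pending data and returns recv()'s result:
 *   -1 when nothing is available (EAGAIN),
 *    0 when the socket reports end-of-file.
 * Returns -2 if the setup itself fails.
 */
int recv_on_idle_socket(int shut_rd_first)
{
    int sv[2];
    char c;
    int n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -2;

    if (shut_rd_first && shutdown(sv[0], SHUT_RD) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }

    /* The descriptor stays open; only the receive direction was shut down. */
    n = (int)recv(sv[0], &c, 1, MSG_DONTWAIT);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With shut_rd_first set, the receive path sees the shutdown flag on the socket and returns 0; without it, the same call fails with EAGAIN because the socket is simply idle.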
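The reference count that close decrements belongs to the file object behind the descriptor. When the last reference is dropped, the socket's release function runs (inet_release for IPv4 sockets), and it ends by calling the protocol's close function (tcp_close), which deals with any remaining data and sends the FIN.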
linux/net/ipv4/af_inet.c

int inet_release(struct socket *sock)
{
    struct sock *sk = sock->sk;

    if (sk) {
        long timeout;

        // Per-socket cleanup: reset the RPS flow and leave multicast groups.
        // The reference count has already reached zero by the time this runs.
        sock_rps_reset_flow(sk);

        /* Applications forget to leave groups before exiting */
        ip_mc_drop_socket(sk);

        /* If linger is set, we don't return until the close
         * is complete.  Otherwise we return immediately. The
         * actually closing is done the same either way.
         *
         * If the close is due to the process exiting, we never
         * linger..
         */
        timeout = 0;
        if (sock_flag(sk, SOCK_LINGER) &&
            !(current->flags & PF_EXITING))
            timeout = sk->sk_lingertime;
        sock->sk = NULL;
        sk->sk_prot->close(sk, timeout); // this calls tcp_close()
    }
    return 0;
}

linux/net/ipv4/tcp.c

void tcp_close(struct sock *sk, long timeout)
{
    ……………………………………………………………………………………………………………………………………………………
    if (data_was_unread) {
        /* Unread data was tossed, zap the connection. */
        NET_INC_STATS_USER(sock_net(sk), LINUX_MIB_TCPABORTONCLOSE);
        tcp_set_state(sk, TCP_CLOSE);
        tcp_send_active_reset(sk, sk->sk_allocation);
    } else if (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime) {
        /* Check zero linger _after_ checking for unread data. */
        sk->sk_prot->disconnect(sk, 0);
        NET_INC_STATS_USER(sock_net(sk), LINUX_MIB_TCPABORTONDATA);
    } else if (tcp_close_state(sk)) {
        tcp_send_fin(sk); // the FIN is sent here
    }

    sk_stream_wait_close(sk, timeout);

adjudge_to_death:
    state = sk->sk_state;
    sock_hold(sk);
    sock_orphan(sk);

    /* It is the last release_sock in its life. It will remove backlog. */
    release_sock(sk);
    ……………………………………………………………………………………………………………………………………………………………………………………………………
}
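The data_was_unread branch in tcp_close has a visible consequence: closing a TCP socket that still holds unread data sends a reset rather than a FIN. The sketch below tries to show this over the loopback interface. It is an illustration under assumptions: it expects Linux behavior and a usable 127.0.0.1, and client_result_after_server_close is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: the server side closes its accepted socket, then
 * the client reads. Returns 0 if the client saw a clean EOF (FIN), the
 * errno value if recv() failed (ECONNRESET after a reset), or -1 if the
 * setup failed.
 */
int client_result_after_server_close(int leave_unread)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct pollfd pfd;
    char c;
    ssize_t n;
    int result = -1;
    int afd = -1;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a port */

    if (lfd < 0 || cfd < 0) goto out;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto out;
    if (listen(lfd, 1) < 0) goto out;
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) goto out;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) goto out;
    afd = accept(lfd, NULL, NULL);
    if (afd < 0) goto out;

    if (leave_unread) {
        /* Queue one byte on the server side and never read it. */
        if (send(cfd, "x", 1, 0) != 1) goto out;
        pfd.fd = afd;
        pfd.events = POLLIN;
        if (poll(&pfd, 1, 2000) != 1) goto out;
    }

    close(afd);                             /* server-side close */
    afd = -1;

    /* Wait (with a timeout) for the FIN or RST to reach the client. */
    pfd.fd = cfd;
    pfd.events = POLLIN;
    if (poll(&pfd, 1, 2000) < 1) goto out;
    n = recv(cfd, &c, 1, 0);
    result = (n == 0) ? 0 : (n < 0 ? errno : -1);

out:
    if (afd >= 0) close(afd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return result;
}
```

With leave_unread set, the client is expected to get ECONNRESET, matching the tcp_send_active_reset call; without it, the close goes through the tcp_send_fin branch and the client reads a normal end-of-file.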
As the code shows, in the normal case both the shutdown and close system calls end up calling tcp_send_fin to terminate the connection.
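【Ref 3】 describes the system call mechanism in detail: the kernel defines a number for each system call, and the application uses a software interrupt to tell the system to switch to kernel mode, passing the arguments along.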
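The "number plus arguments" idea can be seen from user space through the C library's generic syscall() entry point. The sketch below is Linux-specific and only illustrative: raw_write and raw_write_roundtrip are hypothetical names, and how the trap into the kernel is actually issued (a software interrupt or a dedicated instruction) depends on the architecture and is hidden inside syscall().

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: invokes the write system call by its number
 * (SYS_write) instead of going through the write() wrapper.
 */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}

/*
 * Returns 1 if bytes written with raw_write() come back unchanged
 * through a pipe, 0 otherwise.
 */
int raw_write_roundtrip(void)
{
    static const char msg[] = "hello";
    char buf[sizeof(msg)];
    int p[2];
    int ok = 0;

    if (pipe(p) < 0)
        return 0;

    if (raw_write(p[1], msg, sizeof(msg)) == (long)sizeof(msg) &&
        read(p[0], buf, sizeof(buf)) == (ssize_t)sizeof(msg) &&
        memcmp(buf, msg, sizeof(msg)) == 0)
        ok = 1;

    close(p[0]);
    close(p[1]);
    return ok;
}
```

The ordinary write() wrapper does the same thing internally: it places the system call number and the three arguments where the kernel expects them and then switches to kernel mode.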
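References:
【1】 Description of the shutdown function. http://pubs.opengroup.org/onlinepubs/007908799/xns/shutdown.html
【2】 The shutdown call sequence; the parameter definitions differ slightly. http://www.ibm.com/developerworks/cn/aix/library/au-tcpsystemcalls/#shutdown
【3】 How system calls are implemented. http://blog.chinaunix.net/uid-20321537-id-1966859.html
Notes:
【1】 printf is a library function and write is a system call. The difference between system calls and library functions is also a complicated topic; 【Ref 3】 covers part of it. For the implementation details of printf, see http://blog.csdn.net/dog250/article/details/23000909.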