Sockets and port reuse

A socket is the [implementation] through which programs operate the TCP/IP protocol stack

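What is a socket?
TCP/IP is a protocol stack, so it needs a concrete implementation in the operating system, and the OS must also expose that implementation through interfaces. Just as the OS provides standard programming interfaces, TCP/IP must provide a programming interface too: the socket object and its methods. Sockets hide the low-level details of TCP/IP network communication from application code; to a program, network communication looks no different from file I/O, since both are reads and writes on a file descriptor.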
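A minimal sketch of that claim, assuming a Unix-like OS (on Windows, os.read/os.write do not accept socket descriptors). It pushes bytes through a connected pair of sockets using only the plain file-descriptor calls os.write and os.read; socket.socketpair() stands in for a real TCP connection to keep the example self-contained, and roundtrip_via_fd is a hypothetical helper name.

```python
import os
import socket

def roundtrip_via_fd(message: bytes) -> bytes:
    """Send bytes through a connected socket pair using plain file-descriptor I/O."""
    a, b = socket.socketpair()  # two connected sockets (AF_UNIX on Linux)
    try:
        # os.write/os.read are the same calls one would use on a regular file's fd
        os.write(a.fileno(), message)
        return os.read(b.fileno(), 1024)
    finally:
        a.close()
        b.close()

print(roundtrip_via_fd(b"hello"))
```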

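Socket programming

Steps for server-side socket programming:

socket->bind->listen->accept
The listening socket only establishes connections; the ordinary sockets returned by accept() carry out the actual per-connection logic.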

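import socket
import threading

def tcplink(sock, addr):  # hypothetical handler (not defined in the original): echoes data back
    with sock:
        while data := sock.recv(1024):
            sock.sendall(data)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Bind the address and port, then start listening (accept() requires listen()):
s.bind(('127.0.0.1', 9999))
s.listen(5)
while True:
    # Accept a new TCP connection:
    sock, addr = s.accept()
    # Create a new thread to handle the TCP connection:
    t = threading.Thread(target=tcplink, args=(sock, addr))
    t.start()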

socket.accept():
Accept a connection. The socket must be bound to an address and listening for connections. 
The return value is a pair `(conn, address)` where *conn* is a *new* socket object usable to send and receive data on the connection, and *address* is the address bound to the socket on the other end of the connection.
The newly created socket is non-inheritable.

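Sockets and the TCP three-way handshake

Note: after s.listen(), the kernel completes three-way handshakes on the listening port and creates a new socket for each established connection; s.accept() hands one of them back as a new socket object (ns). How does ns differ from s?
ns and s have different file descriptors;
ns is in the ESTABLISHED state while s is in the LISTEN state, so they play different roles;
ns is a clone of s (the kernel creates it by cloning the listening socket), so ns is also bound to the same ip:port. But this binding comes from the clone rather than from a call to bind(), so it does not fail with bind: address already in use.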
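Two of the three properties above can be checked from Python. This sketch opens a loopback connection (127.0.0.1 with port 0, so the OS picks a free port; both are assumptions for the demo) and compares the listening socket s with the socket ns returned by accept(). The LISTEN vs ESTABLISHED states are not checked here; they can be observed with a tool such as `ss -tan` while a server runs. compare_listen_and_conn is a hypothetical helper name.

```python
import socket

def compare_listen_and_conn():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))     # port 0: let the OS choose a free port
    s.listen()
    client = socket.create_connection(s.getsockname())
    ns, peer = s.accept()        # ns: new socket for this one connection
    try:
        return {
            "different_fd": s.fileno() != ns.fileno(),
            "same_local_addr": s.getsockname() == ns.getsockname(),
            "peer_is_client": peer == client.getsockname(),
        }
    finally:
        for sock in (ns, client, s):
            sock.close()

print(compare_listen_and_conn())
```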

Why is a socket a file?

Linux's VFS (virtual file system) hides the different kinds of underlying file systems from the rest of the OS:

os
vfs
-----
ext4, xfs, sockfs, etc

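An inode is the structure VFS defines as a common abstraction that fits all file systems.
The inode in VFS is actually allocated by the concrete file system. For example, struct ext4_inode_info allocated by ext4 and struct socket_alloc allocated by sockfs each embed a struct inode, so both can be treated as VFS inodes.

The socket object in network programming is therefore actually a handle to a file in sockfs: a file descriptor referring to the socket_alloc allocated by sockfs.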
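A small sketch to observe this from user space, assuming Linux for the /proc part: fstat() on a socket's descriptor reports file type S_IFSOCK, and on Linux the entry under /proc/self/fd reads as socket:[N], where N is the sockfs inode number (the same value fstat reports as st_ino). describe_socket_fd is a hypothetical helper; on systems without /proc it returns None for the link.

```python
import os
import socket
import stat

def describe_socket_fd(sock: socket.socket):
    """Show that a socket's fd refers to an inode whose file type is 'socket'."""
    st = os.fstat(sock.fileno())
    link = None
    if os.path.isdir("/proc/self/fd"):                   # Linux: /proc exposes fd targets
        link = os.readlink(f"/proc/self/fd/{sock.fileno()}")  # e.g. "socket:[<inode>]"
    return stat.S_ISSOCK(st.st_mode), st.st_ino, link

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    is_sock, ino, link = describe_socket_fd(s)
    print(is_sock, ino, link)
```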

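Normally, a given ip:port can be bound by only one socket object; other sockets cannot bind that ip:port again (otherwise bind fails with: address already in use).

Multiple processes can still serve one ip:port without a second bind(), by sharing one socket. For example, a parent process listens on port P and then forks several child processes (this is nginx's basic model: the master [listening] process forks children that act as worker processes).
In this case the parent and the forked children all reference the same socket object. When a new connection arrives, all processes waiting in accept() on this socket may be woken up, but only one of them accepts it successfully and the others find nothing to accept (a non-blocking accept() fails with EAGAIN); this is the accept "thundering herd" problem. On modern Linux kernels a blocking accept() wakes only one waiting process; the herd effect mainly reappears when the processes wait through select/epoll instead.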
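A minimal sketch of the pre-fork model described above, assuming a Unix system with os.fork(). The parent binds and listens once, forks workers that all call accept() on the inherited socket, and each worker serves exactly one connection and exits. The worker count, port 0, and the helper name prefork_demo are arbitrary demo choices; this is not how nginx itself is implemented.

```python
import os
import socket

def prefork_demo(n_workers: int = 3):
    """Parent listens once; forked children share that socket and compete in accept()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    workers = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:
            # Child: uses the inherited listening socket; no bind() happens here
            try:
                conn, _ = listener.accept()   # one waiting child wins each connection
                with conn:
                    conn.sendall(str(os.getpid()).encode())
            finally:
                os._exit(0)                   # never return into the parent's code
        workers.append(pid)

    listener.close()  # drops the parent's reference; the children still hold the socket
    served_by = set()
    for _ in range(n_workers):
        with socket.create_connection(("127.0.0.1", port)) as c:
            reply = b"".join(iter(lambda: c.recv(64), b""))  # read until the worker closes
            served_by.add(int(reply))
    for pid in workers:
        os.waitpid(pid, 0)
    return served_by, set(workers)

served_by, workers = prefork_demo()
print(served_by == workers)
```

Every connection is answered by a different child PID, yet bind() was called only once, in the parent.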

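SO_REUSEADDR and SO_REUSEPORT

case1:
During socket communication, after the server and client have connected, if you stop the server with Ctrl+C and run it again, you will very likely get the error bind: address already in use.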


Cause: the ip:port is still held by a socket in the TIME_WAIT state. Because the server closed the connection first, the socket it used for the client connection enters TIME_WAIT instead of being released immediately, so when the program is run again, bind() fails with: bind: address already in use.

Solution: the server can set the SO_REUSEADDR socket option (before calling bind()).
To prevent re-using an address+port combination, that may still be considered open by some remote peer, the system will not immediately consider a socket as dead after sending the last ACK but instead put the socket into a state commonly referred to as TIME_WAIT. It can be in that state for minutes (system dependent setting).
The SO_REUSEADDR flag tells the kernel to reuse a local socket in TIME_WAIT state, without waiting for its natural timeout to expire.
The code that decides if the bind will succeed or fail only inspects the SO_REUSEADDR flag of the socket fed into the bind() call, for all other sockets inspected, this flag is not even looked at.
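A minimal sketch of the fix, assuming a loopback test on Linux. The hypothetical helper make_server sets SO_REUSEADDR before bind(), and restart_after_active_close (also hypothetical) reproduces the "server closes first, then restarts on the same port" scenario: when the restarted server binds, the old connection's server-side socket is still lingering (FIN_WAIT_2 or TIME_WAIT) on that port. Both the original and the restarted server set the option here; on Linux the kernel's conflict check looks at the flag on both sockets, and the lingering connection socket inherits it from its listening socket.

```python
import socket

def make_server(host="127.0.0.1", port=0, reuse=True):
    """Create a listening TCP socket, optionally with SO_REUSEADDR set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse:
        # Must be set BEFORE bind(); setting it afterwards does not affect this bind
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen()
    return s

def restart_after_active_close():
    """Server closes a connection first, then a new server rebinds the same ip:port."""
    server = make_server()
    port = server.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = server.accept()
    conn.close()      # server closes first: its side heads toward TIME_WAIT
    server.close()    # like Ctrl+C: the listener is gone, the old connection is not
    try:
        restarted = make_server(port=port)  # rebind the same ip:port
        restarted.close()
        return True
    finally:
        client.close()

print(restart_after_active_close())
```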

What's more, if SO_REUSEADDR is enabled on a socket prior to binding it, the socket can be successfully bound unless there is a conflict with another socket bound to exactly the same combination of source address and port.

Without SO_REUSEADDR, binding socketA to 0.0.0.0:21 and then binding socketB to 192.168.0.1:21 will fail (with error EADDRINUSE), since 0.0.0.0 means "any local IP address", thus all local IP addresses are considered in use by this socket and this includes 192.168.0.1, too.

With SO_REUSEADDR it will succeed, since 0.0.0.0 and 192.168.0.1 are not exactly the same address, one is a wildcard for all local addresses and the other one is a very specific local address.

Note that the statement above is true regardless in which order socketA and socketB are bound; without SO_REUSEADDR it will always fail, with SO_REUSEADDR it will always succeed.

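To give a better overview, let's make a table here and list all possible combinations. (These rules, including the statement that only the flag of the socket being bound is inspected, describe BSD behavior as in the linked Stack Overflow answer. Linux differs in some cases: it requires SO_REUSEADDR on both sockets, and it ignores the option for a port while a TCP socket bound to that port is listening.)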

SO_REUSEADDR       socketA        socketB       Result
---------------------------------------------------------------------
  ON/OFF       192.168.0.1:21   192.168.0.1:21    Error (EADDRINUSE)
  ON/OFF       192.168.0.1:21      10.0.0.1:21    OK
  ON/OFF          10.0.0.1:21   192.168.0.1:21    OK
   OFF             0.0.0.0:21   192.168.0.1:21    Error (EADDRINUSE)
   OFF         192.168.0.1:21       0.0.0.0:21    Error (EADDRINUSE)
   ON              0.0.0.0:21   192.168.0.1:21    OK
   ON          192.168.0.1:21       0.0.0.0:21    OK
  ON/OFF           0.0.0.0:21       0.0.0.0:21    Error (EADDRINUSE)
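The wildcard rows of the table can be reproduced with a short sketch. Assumptions: 127.0.0.1 stands in for 192.168.0.1 because loopback always exists, port 0 lets the OS pick a free port instead of 21, and neither socket calls listen(), which matters on Linux (see the note above the table). try_bind_pair is a hypothetical helper that binds socketA and then socketB to the same port and reports the result.

```python
import errno
import socket

def try_bind_pair(addr_a: str, addr_b: str, reuseaddr: bool) -> str:
    """Bind socketA, then socketB to the same port; return 'OK' or the errno name."""
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuseaddr:
            for s in (a, b):
                s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        a.bind((addr_a, 0))          # port 0: let the OS pick a free port
        port = a.getsockname()[1]
        try:
            b.bind((addr_b, port))   # same port, different (wildcard vs specific) address
            return "OK"
        except OSError as e:
            return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        a.close()
        b.close()

for reuse in (False, True):
    print(reuse, try_bind_pair("0.0.0.0", "127.0.0.1", reuse),
          try_bind_pair("127.0.0.1", "0.0.0.0", reuse))
```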

case2:
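How can multiple sockets be bound to the same IP:port?
In fact, in socket->bind->listen->accept, the [listening socket] and the [sockets that handle individual connections] are already bound to the same ip:port, but that comes from the kernel cloning the listening socket (after s.bind() and s.listen()) for each new connection, not from calling bind() again.

Can multiple sockets then bind to the same IP:port by each calling bind()? Yes, with SO_REUSEPORT.

Unlike in case of SO_REUSEADDR, the code handling SO_REUSEPORT will not only verify that the currently bound socket has SO_REUSEPORT set but it will also verify that the socket with a conflicting address and port had SO_REUSEPORT set when it was bound.
To let multiple sockets bind and listen on the same ip:port, every socket bound to and listening on that ip:port must set SO_REUSEPORT (before bind()); a socket without SO_REUSEPORT is treated as wanting the ip:port exclusively. On Linux, the sockets must also belong to the same effective user ID.

This is what is called port reuse: allowing multiple sockets to bind to the same ip:port. For TCP on Linux (3.9 and later), the kernel then distributes incoming connections across the listening sockets.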
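A minimal sketch of port reuse, assuming Linux 3.9+ or a BSD (socket.SO_REUSEPORT is not available on Windows). Two sockets each call bind() and listen() on the same ip:port because both set SO_REUSEPORT; a third socket without the option is refused. listen_reuseport is a hypothetical helper, and 127.0.0.1 with port 0 are demo choices.

```python
import errno
import socket

def listen_reuseport(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Hypothetical helper: listening TCP socket with SO_REUSEPORT set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)  # must precede bind()
    s.bind((host, port))
    s.listen()
    return s

first = listen_reuseport(0)          # port 0: let the OS pick a free port
port = first.getsockname()[1]
second = listen_reuseport(port)      # second bind() + listen() on the same ip:port
same = first.getsockname() == second.getsockname()

# A socket WITHOUT SO_REUSEPORT wants exclusive use of the ip:port, so bind() fails
plain = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    plain.bind(("127.0.0.1", port))
    plain_result = "OK"
except OSError as e:
    plain_result = errno.errorcode.get(e.errno, str(e.errno))
finally:
    for s in (plain, first, second):
        s.close()

print(same, plain_result)
```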

see
https://docs.python.org/3/library/socket.html
https://stackoverflow.com/questions/14388706/how-do-so-reuseaddr-and-so-reuseport-differ

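Blocking I/O and non-blocking I/O

Blocking I/O: when a process/thread performs I/O, it is blocked and only continues after the I/O completes.
Non-blocking I/O: when a process/thread performs I/O, the I/O function returns a result immediately, whether or not the I/O has completed, so the process/thread can keep running.
Take Lao Wang buying a train ticket as an example (getting the ticket means the I/O is complete):
Blocking I/O: he goes to the station, finds no tickets left, and waits there seven days and nights, without eating or sleeping, until someone returns a ticket.
Non-blocking I/O: he goes to the station and finds no tickets; he comes back the next day to ask, still none; he comes back the day after, and so on until there is a ticket.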
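The difference can be seen on a single socket. This sketch, assuming Linux/Unix and using socket.socketpair() as a stand-in for a network connection, puts one end into non-blocking mode: the first recv() finds no data and fails immediately (Python raises BlockingIOError for EAGAIN/EWOULDBLOCK) instead of waiting, and a later recv() succeeds once data has arrived. nonblocking_read_demo is a hypothetical name.

```python
import socket

def nonblocking_read_demo():
    a, b = socket.socketpair()
    a.setblocking(False)               # recv() now returns or raises immediately
    try:
        try:
            a.recv(1024)               # no data yet: Lao Wang finds no ticket
            first = "got data"
        except BlockingIOError:        # Python's form of EAGAIN/EWOULDBLOCK
            first = "no data yet, try again later"
        b.sendall(b"ticket")           # the peer finally sends something
        second = a.recv(1024)          # "coming back later" now succeeds
        return first, second
    finally:
        a.close()
        b.close()

print(nonblocking_read_demo())
```

With a blocking socket, the first recv() would instead wait indefinitely, like Lao Wang camping at the station.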

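I/O multiplexing

Definition first: when one thread performs the I/O of multiple sockets by interleaving between them, that is called I/O multiplexing. To be clear, "multiplexing" here means reusing the same thread.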

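Background:
If each socket is handled by its own thread, then the following statement in that thread

int iresult = recv(s, buffer, 1024, 0);

waits for the peer to send data; if the peer sends nothing, the call blocks here until there is data to read. So blocking I/O can leave a large number of threads blocked waiting for data, wasting resources.
Of course, we could use non-blocking I/O instead: when there is no data, the read returns an error flag, and the thread checks again some time later. But in the interval between a failed read and the next attempt, the thread has nothing to do either, so it is just as wasted as a blocked one.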

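Improvement:
As shown above, whenever there is no data to read, the thread handling a socket is stuck one way or another. So we can hand many sockets to a single thread: even if none of them has data, only one thread is blocked. That is I/O multiplexing.
A process or thread achieves I/O multiplexing by calling select()/poll()/epoll().
select()/poll()/epoll() are system calls (strictly, epoll is a family of calls: epoll_create1/epoll_ctl/epoll_wait) that a process or thread can use to implement I/O multiplexing. Calling one switches the caller from user space into kernel space until the call returns.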
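A minimal sketch of one thread servicing several sockets with select(), assuming Linux/Unix. socket.socketpair() provides stand-ins for remote peers, only some of which send data; serve_with_select is a hypothetical name. Python's select.select() already returns just the ready subset (CPython scans the fd_set bitmap internally), and the watched sockets are non-blocking for the spurious-readiness reason discussed in the select() notes.

```python
import select
import socket

def serve_with_select(n_clients: int = 3):
    """One thread services several connected sockets using select()."""
    pairs = [socket.socketpair() for _ in range(n_clients)]
    server_side = [p[0] for p in pairs]   # sockets our single thread watches
    client_side = [p[1] for p in pairs]   # stand-ins for remote peers
    for s in server_side:
        s.setblocking(False)              # a spurious "readable" must not block us
    # Only some peers send data; the others stay silent
    client_side[0].sendall(b"from peer 0")
    client_side[2].sendall(b"from peer 2")

    received = {}
    readable, _, _ = select.select(server_side, [], [], 1.0)  # wait up to 1 s
    for s in readable:
        try:
            received[server_side.index(s)] = s.recv(1024)
        except BlockingIOError:
            pass                          # reported ready, but nothing to read
    for s in server_side + client_side:
        s.close()
    return received

print(serve_with_select())
```

A real server would wrap the select() call in a loop; a single round is enough to show that one thread waits on all sockets at once.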
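- select(): based on polling over an array (a fixed-size fd_set bitmap). The thread adds the file descriptors (fds) of the sockets it wants to monitor to the set, then calls select() and blocks until this system call returns. When data arrives, the fd's state changes, the corresponding socket becomes ready, and select() returns (note that what comes back is the whole set with the ready fds marked; the thread must scan it once to find out which sockets are readable). The thread then issues read requests, reads the data, and continues.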
  
  [Figure: Linux I/O multiplexing with select() (image from the internet)]
  
  Note that when the thread then reads from the kernel, it must use non-blocking I/O: if there is no data to read, the thread must not block there. The reason is that select() reporting a socket as readable does not guarantee that data can actually be read. From the select(2) man page: "Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has the wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block." With non-blocking I/O, a read that finds no data returns immediately with an error such as EWOULDBLOCK instead of blocking the thread.
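- poll(): the same principle as select(), but instead of a fixed-size bitmap it takes a caller-supplied array of struct pollfd (which the kernel stores internally as a linked list). Therefore, the number of sockets select() can monitor at once is limited by the fd_set size (FD_SETSIZE, typically 1024), while poll() has no such fixed limit.
- epoll(): select() and poll() do not tell the caller which socket is ready, so the caller has to scan all fds. epoll improves on this: each fd is registered once (epoll_ctl) with a callback, and when its I/O becomes ready the callback puts it on a ready list, so epoll_wait returns only the ready fds. With many connections, of which only a few are active at a time, this is much more efficient than select and poll.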
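A minimal sketch of the epoll difference, assuming Linux (select.epoll does not exist elsewhere). Several sockets are registered once, only one peer sends data, and epoll.poll() hands back only that ready descriptor, so no scan over all registered fds is needed. socket.socketpair() again stands in for real connections; ready_fds_with_epoll is a hypothetical name.

```python
import select
import socket

def ready_fds_with_epoll(n: int = 4):
    """Register n sockets once; epoll.poll() returns only the fds that are ready."""
    pairs = [socket.socketpair() for _ in range(n)]
    ep = select.epoll()
    try:
        for watched, _ in pairs:
            watched.setblocking(False)
            ep.register(watched.fileno(), select.EPOLLIN)  # one-time registration
        pairs[1][1].sendall(b"x")    # only the peer of socket 1 sends data
        events = ep.poll(1.0)        # list of (fd, event mask), ready fds only
        ready = {fd for fd, mask in events if mask & select.EPOLLIN}
        return ready, pairs[1][0].fileno()
    finally:
        ep.close()
        for a, b in pairs:
            a.close()
            b.close()

ready, expected_fd = ready_fds_with_epoll()
print(ready == {expected_fd})
```

Unlike select(), the interest set is not passed in again on every call; it lives in the kernel between calls.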

These are working notes; I will correct any misunderstandings if I find them later.

Linux port reuse and I/O multiplexing, 2020-03-23 (do not repost without permission)