Here are a few entries I picked out that I think are particularly good.
On modern Windows stacks, yes, it is, within limits.
It is safe, for instance, to have one thread calling send() and another thread calling recv() on a single socket.
By contrast, it’s a bad idea for two threads to both be calling send() on a single socket. This is “thread-safe” in the limited sense that your program shouldn’t crash, and you certainly shouldn’t be able to crash the kernel, which is handling these send() calls. The fact that it is “safe” doesn’t answer key questions about the actual effect of doing this. Which call’s data goes out first on the connection? Does it get interleaved somehow? Don’t do this.
Instead of multiple threads accessing a single socket, you may want to consider setting up a pair of network I/O queues. Then, give one thread sole ownership of the socket: this thread sends data from one I/O queue and enqueues received data on the other. Then other threads can access the queues (with suitable synchronization).
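Here is a minimal sketch of that ownership pattern, assuming a mutex-protected linked list as the outbound queue and a Win32 auto-reset event to wake the owner thread. The names (SendItem, QueueSend, SocketOwnerThread) are illustrative, not part of any API, and the receive-side queue and error handling are omitted for brevity.

```c
/* Sketch only: one owner thread drains a mutex-protected send queue, so no
   other thread ever touches the socket directly. */
#include <winsock2.h>
#include <windows.h>
#include <stdlib.h>
#include <string.h>

typedef struct SendItem {
    struct SendItem *next;
    int  len;
    char data[1];                 /* payload follows the header */
} SendItem;

static CRITICAL_SECTION g_lock;   /* InitializeCriticalSection() at startup */
static HANDLE           g_wake;   /* auto-reset event from CreateEvent() */
static SendItem        *g_head, *g_tail;

/* Any thread may call this instead of calling send() itself. */
void QueueSend(const char *buf, int len)
{
    SendItem *item = (SendItem *)malloc(sizeof(SendItem) + len);
    item->next = NULL;
    item->len  = len;
    memcpy(item->data, buf, len);

    EnterCriticalSection(&g_lock);
    if (g_tail) g_tail->next = item; else g_head = item;
    g_tail = item;
    LeaveCriticalSection(&g_lock);

    SetEvent(g_wake);             /* wake the owner thread */
}

/* The one and only thread that ever calls send() on this socket. */
DWORD WINAPI SocketOwnerThread(LPVOID param)
{
    SOCKET s = (SOCKET)(ULONG_PTR)param;
    for (;;) {
        WaitForSingleObject(g_wake, INFINITE);

        for (;;) {                /* drain everything currently queued */
            SendItem *item;
            EnterCriticalSection(&g_lock);
            item = g_head;
            if (item) { g_head = item->next; if (!g_head) g_tail = NULL; }
            LeaveCriticalSection(&g_lock);
            if (!item) break;

            send(s, item->data, item->len, 0);  /* error handling omitted */
            free(item);
        }
    }
    return 0;
}
```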
Applications that use some kind of non-synchronous socket typically have some I/O queue already. Of particular interest in this case is overlapped I/O or I/O completion ports, because these I/O strategies are also thread-friendly. You can tell Winsock about several OVERLAPPED blocks, and Winsock will finish sending one before it moves on to the next. This means you can keep a chain of these OVERLAPPED blocks, each perhaps added to the chain by a different thread. Each thread can also call WSASend() on the block it added, making your main loop simpler.
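The following sketch shows the shape of one such overlapped send: each buffer gets its own WSAOVERLAPPED block and event, and the block must stay alive until the send completes. The SendBlock structure and its 512-byte payload size are illustrative assumptions, not something Winsock requires.

```c
#include <winsock2.h>
#include <string.h>

typedef struct {
    WSAOVERLAPPED ov;        /* must stay valid until this send completes */
    WSABUF        wsabuf;
    char          data[512]; /* illustrative fixed payload size */
} SendBlock;

/* Post one overlapped send; assumes len <= sizeof(blk->data) and that the
   socket has the overlapped attribute (sockets from socket() do by default). */
int PostOverlappedSend(SOCKET s, SendBlock *blk, const char *buf, int len)
{
    DWORD sent = 0;

    memcpy(blk->data, buf, len);
    blk->wsabuf.buf = blk->data;
    blk->wsabuf.len = (ULONG)len;

    memset(&blk->ov, 0, sizeof(blk->ov));
    blk->ov.hEvent = WSACreateEvent();   /* signaled when this block finishes */

    if (WSASend(s, &blk->wsabuf, 1, &sent, 0, &blk->ov, NULL) == SOCKET_ERROR &&
        WSAGetLastError() != WSA_IO_PENDING)
        return -1;                       /* real failure */

    /* Completed immediately or still pending: wait on blk->ov.hEvent (or call
       WSAGetOverlappedResult()) before reusing or freeing the block. */
    return 0;
}
```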
The Nagle algorithm is an optimization to TCP that makes the stack wait until all previously sent data has been acknowledged on the connection before it sends more data. The exception is that Nagle will not cause the stack to wait for an ACK if it has enough enqueued data that it can fill a network frame. (Without this exception, the Nagle algorithm would effectively disable TCP’s sliding window algorithm.) For a full description of the Nagle algorithm, see RFC 896.
So, you ask, what’s the purpose of the Nagle algorithm?
The ideal case in networking is that each program always sends a full frame of data with each call to send(). That maximizes the percentage of useful program data in a packet.
The basic TCP and IPv4 headers are 20 bytes each. The worst-case protocol overhead percentage, therefore, is 40/41, or roughly 98%. Since an Ethernet frame can carry at most 1500 bytes of IP data, the best-case protocol overhead percentage is 40/1500, less than 3%.
While the Nagle algorithm is causing the stack to wait for data to be ACKed by the remote peer, the local program can make more calls to send(). Because TCP is a stream protocol, it can coalesce the data in those send() calls into a single TCP packet, increasing the percentage of useful data.
Imagine a simple Telnet program: the bulk of a Telnet conversation consists of sending one character, and receiving an echo of that character back from the remote host. Without the Nagle algorithm, this results in TCP’s worst case: one byte of user data wrapped in dozens of bytes of protocol overhead. With the Nagle algorithm enabled, the TCP stack won’t send that one Telnet character out until the previous characters have all been acknowledged. By then, the user may well have typed another character or two, reducing the relative protocol overhead.
This simple optimization interacts with other features of the TCP protocol suite, too:
Most stacks implement the delayed ACK algorithm: this causes the remote stack to delay ACKs under certain circumstances, which allows the local stack a bit of time to “Nagle” some more bytes into a single packet.
The Nagle algorithm tends to improve the percentage of useful data in packets more on slow networks than on fast networks, because ACKs take longer to come back.
TCP allows an ACK packet to also contain data. If the local stack decides it needs to send out an ACK packet and the Nagle algorithm has caused data to build up in the output buffer, the enqueued data will go out along with the ACK packet.
The Nagle algorithm is on by default in Winsock, but it can be turned off on a per-socket basis with the TCP_NODELAY option of setsockopt(). This option should not be turned off except in a very few situations.
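For reference, turning it off looks like this; a small sketch, with error handling left to the caller:

```c
#include <winsock2.h>

/* Disable the Nagle algorithm on one socket. As the text says, do this only
   when packet timing matters more than throughput. */
int DisableNagle(SOCKET s)
{
    BOOL flag = TRUE;   /* nonzero = no delay (Nagle off) */
    return setsockopt(s, IPPROTO_TCP, TCP_NODELAY,
                      (const char *)&flag, sizeof(flag));
}
```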
Beware of depending on the Nagle algorithm too heavily. send() involves a trip into the kernel, so every call to send() takes much more time than a regular function call. Your application should coalesce its own data as much as is practical to minimize the number of calls to send().
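One way to do that coalescing is a small application-level send buffer that batches pieces and flushes them with a single send(). This is only a sketch; the CoalescingSender type and the 1400-byte threshold (a rough “close to one Ethernet frame” guess) are assumptions, not values from this FAQ.

```c
#include <winsock2.h>
#include <string.h>

#define SEND_BUF_SIZE 1400   /* illustrative batch size */

typedef struct {
    SOCKET s;
    int    used;
    char   buf[SEND_BUF_SIZE];
} CoalescingSender;

/* Flush whatever has accumulated with a single send() call. */
void FlushSender(CoalescingSender *cs)
{
    if (cs->used > 0) {
        send(cs->s, cs->buf, cs->used, 0);   /* error handling omitted */
        cs->used = 0;
    }
}

/* Queue a small piece of data; only call send() when the buffer fills up. */
void QueueData(CoalescingSender *cs, const char *data, int len)
{
    if (len >= SEND_BUF_SIZE) {   /* too big to buffer: flush and send directly */
        FlushSender(cs);
        send(cs->s, data, len, 0);
        return;
    }
    if (cs->used + len > SEND_BUF_SIZE)
        FlushSender(cs);
    memcpy(cs->buf + cs->used, data, len);
    cs->used += len;
}
```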
Almost never.
Inexperienced Winsockers usually try disabling the Nagle algorithm when they are trying to impose some kind of packet scheme on a TCP data stream. That is, they want to be able to send, say, two packets, one 40 bytes and the other 60, and have the receiver get a 40-byte packet followed by a separate 60-byte packet. (With the Nagle algorithm enabled, TCP will often coalesce these two packets into a single 100 byte packet.) Unfortunately, this is futile, for the following reasons:
Even if the sender manages to send its packets individually, the receiving TCP/IP stack may still coalesce the received packets into a single packet. This can happen any time the sender can send data faster than the receiver can deal with it.
Winsock Layered Service Providers (LSPs) may coalesce or fragment stream data, especially LSPs that modify the data as it passes.
Turning off the Nagle algorithm in a client program will not affect the way that the server sends packets, and vice versa.
Routers and other intermediaries on the network can fragment packets, and there is no guarantee of “proper” reassembly with stream protocols.
If a packet arrives that is larger than the available space in the stack’s buffers, it may fragment a packet, queuing up as many bytes as it has buffer space for and discarding the rest. (The remote peer will resend the remaining data later.)
Winsock is not required to give you all the data it has queued on a socket even if your recv() call gave Winsock enough buffer space. It may require several calls to recv() to get all the data queued on a socket; see the sketch after this list.
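That last point is why receivers typically loop on recv() until they have a complete application-level message. A minimal sketch, assuming a fixed, known message length:

```c
#include <winsock2.h>

/* Keep calling recv() until `len` bytes have arrived, the peer closes the
   connection, or an error occurs. TCP is a byte stream, so one message may
   arrive across several recv() calls. */
int RecvAll(SOCKET s, char *buf, int len)
{
    int total = 0;
    while (total < len) {
        int n = recv(s, buf + total, len - total, 0);
        if (n == 0)
            return 0;              /* connection closed by peer */
        if (n == SOCKET_ERROR)
            return SOCKET_ERROR;   /* check WSAGetLastError() */
        total += n;
    }
    return total;                  /* got everything */
}
```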
Aside from these problems, disabling the Nagle algorithm almost always causes a program’s throughput to degrade. The only time you should disable the algorithm is when some other consideration, such as packet timing, is more important than throughput.
Often, programs that deal with real-time user input will disable the Nagle algorithm to achieve the snappiest possible response, at the expense of network bandwidth. Two examples are X Window servers and multiplayer network games. In these cases, it is more important that there be as little delay between packets as possible than it is to conserve network bandwidth.
For more on this topic, see the Lame List and the FAQ article How to Use TCP Effectively.
When a connection request comes into a network stack, it first checks to see if any program is listening on the requested port. If so, the stack replies to the remote peer, completing the connection. The stack stores the connection information in a queue called the connection backlog. (When there are connections in the backlog, the accept() call simply causes the stack to remove the oldest connection from the connection backlog and return a socket for it.)
The purpose of the listen() call is to set the size of the connection backlog for a particular socket. When the backlog fills up, the stack begins rejecting connection attempts.
Rejecting connections is a good thing if your program is written to accept new connections as fast as it reasonably can. If the backlog fills up despite your program’s best efforts, it means your server has hit its load limit. If the stack were to accept more connections, your program wouldn’t be able to handle them as well as it should, so the client will think your server is hanging. At least if the connection is rejected, the client will know the server is too busy and will try again later.
The proper value for backlog depends on how many connections you expect to see in the time between accept() calls. Let’s say you expect an average of 1000 connections per second, with a burst value of 3000 connections per second. [Ed. I picked these values because they’re easy to manipulate, not because they’re representative of the real world!] To handle the burst load with a short connection backlog, your server’s time between accept() calls must be under 0.3 milliseconds. Let’s say you’ve measured your time-to-accept under load, and it’s 0.8 milliseconds: fast enough to handle the normal load, but too slow to handle your burst value. In this case, you could make backlog relatively large to let the stack queue up connections under burst conditions. Assuming that these bursts are short, your program will quickly catch up and clear out the connection backlog.
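For concreteness, here is a bare-bones listener that passes a larger backlog to listen(). The value 100 and the function name RunListener are illustrative, and error handling and the real per-connection work are stubbed out.

```c
#include <winsock2.h>
#include <string.h>

int RunListener(unsigned short port)
{
    WSADATA wsa;
    SOCKET listener, client;
    struct sockaddr_in addr;

    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
        return -1;

    listener = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(port);
    bind(listener, (struct sockaddr *)&addr, sizeof(addr));

    /* The second parameter is the connection backlog discussed above;
       the stack silently caps it at its own maximum. */
    listen(listener, 100);

    for (;;) {
        /* accept() pops the oldest completed connection off the backlog. */
        client = accept(listener, NULL, NULL);
        if (client == INVALID_SOCKET)
            break;
        /* Hand `client` off to whatever handles connections, then loop back
           quickly so the backlog stays short. */
        closesocket(client);   /* placeholder: a real server would not close here */
    }

    closesocket(listener);
    WSACleanup();
    return 0;
}
```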
The traditional value for listen()’s backlog parameter is 5. On some stacks, that is also the maximum value: this includes the workstation-class Windows NT derivatives and the Windows 95 derivatives. On the server-class Windows NT derivatives, the maximum connection backlog size is 200, unless the dynamic backlog feature is enabled. (More info on dynamic backlogs below.) The stack will use its maximum backlog value if you pass in a larger value. There is no standard way to find out what backlog value the stack chose to use.
If your program is quick about calling accept(), low backlog limits are not normally a problem. However, it does mean that concerted attempts to make lots of connections in a short period of time can fill the backlog queue. This makes non-Server flavors of Windows a bad choice for a high-load server: either a legitimate load or a SYN flood attack can overload a server on such a platform. (See below for more on SYN attacks.)
There is a special constant you can use for the backlog size, SOMAXCONN. This tells the underlying service provider to set the backlog queue to the largest possible size. SOMAXCONN is defined as 5 in winsock.h and as 0x7FFFFFFF in winsock2.h, so the winsock.h definition limits its usefulness somewhat.
There is an even better reason not to use SOMAXCONN: large backlogs make SYN flood attacks much more, shall we say, effective. When Winsock creates the backlog queue, it starts small and grows as required. Since the backlog queue is in non-pageable system memory, a SYN flood can cause the queue to eat a lot of this precious memory resource.
After the first SYN flood attacks in 1996, Microsoft added a feature to Windows NT called "dynamic backlog". (The feature is in service pack 3 and higher.) This feature is normally off for backwards compatibility, but when you turn it on, the stack can increase or decrease the size of the connection backlog in response to network conditions. (It can even increase the backlog beyond the "normal" maximum of 200, in order to soak up malicious SYNs.) The Microsoft Knowledge Base article that describes the feature also has some good practical discussion about connection backlogs.
You will note that SYN attacks are dangerous for systems with both short and very long backlog queues. The point is that a middle ground is the best course if you expect your server to withstand SYN attacks. Either use Microsoft’s dynamic backlog feature, or pick a value somewhere in the 20-200 range and tune it as required.
A program can rely too much on the backlog feature. Consider a single-threaded blocking server: the design means it can only handle one connection at a time. However, it can set up a large backlog, making the stack accept and hold connections until the program gets around to handling the next one. (See this example to see the technique at work.) You should not take advantage of the feature this way unless your connection rate is very low and the connection times are very short. (Pedagogues excepted.)
There are two 64-socket limitations:
The Win32 event mechanism (e.g. WaitForMultipleObjects()) can only wait on 64 event objects at a time. Winsock 2 provides the WSAEventSelect() function, which lets you use Win32’s event mechanism to wait for events on sockets. Because it uses Win32’s event mechanism, you can only wait for events on 64 sockets at a time. If you want to wait on more than 64 Winsock event objects at a time, you need to use multiple threads, each waiting on no more than 64 of the sockets.
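A sketch of that per-thread loop, using one event per socket and WSA_MAXIMUM_WAIT_EVENTS (which is 64) as the group size; the function name WaitOnSocketGroup is illustrative and most error checks are trimmed:

```c
#include <winsock2.h>

void WaitOnSocketGroup(SOCKET *socks, int count)   /* count <= 64 */
{
    WSAEVENT events[WSA_MAXIMUM_WAIT_EVENTS];
    int i;

    for (i = 0; i < count; ++i) {
        events[i] = WSACreateEvent();
        /* Register interest in readability and closure for each socket. */
        WSAEventSelect(socks[i], events[i], FD_READ | FD_CLOSE);
    }

    for (;;) {
        DWORD rc = WSAWaitForMultipleEvents((DWORD)count, events,
                                            FALSE, WSA_INFINITE, FALSE);
        int idx;
        WSANETWORKEVENTS ne;

        if (rc == WSA_WAIT_FAILED)
            break;
        idx = (int)(rc - WSA_WAIT_EVENT_0);

        /* This reports which events fired and resets the event object. */
        WSAEnumNetworkEvents(socks[idx], events[idx], &ne);

        if (ne.lNetworkEvents & FD_READ) {
            /* recv() on socks[idx] here */
        }
        if (ne.lNetworkEvents & FD_CLOSE) {
            /* connection closed; clean up socks[idx] here */
        }
    }
}
```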
The select() function is also limited in certain situations to waiting on 64 sockets at a time. The FD_SETSIZE constant defined in winsock.h determines the size of the fd_set structures you pass to select(). It is defined as 64 by default. You can define this constant to a higher value before you #include winsock.h, and that will override the default. Unfortunately, at least one non-Microsoft Winsock stack and some Layered Service Providers assume the default of 64; they will ignore sockets beyond the 64th in larger fd_sets.
You can write a test program to try this on the systems you plan on supporting, to see whether they are limited. If they are, you can get around the limit with threads, just as you would with event objects.
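The override itself is just a matter of ordering; a sketch, assuming 256 is enough for your program (whether the stack and any LSPs honor it is exactly what the test program above should verify):

```c
#define FD_SETSIZE 256          /* must come before the Winsock include */
#include <winsock2.h>

int WaitForReadable(SOCKET *socks, int count)     /* count <= FD_SETSIZE */
{
    fd_set readfds;
    int i;

    FD_ZERO(&readfds);
    for (i = 0; i < count; ++i)
        FD_SET(socks[i], &readfds);

    /* On Winsock the first select() parameter is ignored; pass 0. */
    return select(0, &readfds, NULL, NULL, NULL);  /* blocks until readable */
}
```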
All of the I/O strategies discussed in the I/O strategies article have some way of indicating that the connection is closed.
First, keep in mind that TCP is a full-duplex network protocol. That means that you can close the connection half-way and still send data on the other half. An example of this is the old HTTP 1.0 web protocol. The browser sends a short request to the web server, then closes its half of the connection. The web server then sends back the requested data on the other half of the connection, and closes its sending side, which terminates the TCP session. (HTTP 1.1 is more complex than this, but at the end, the same basic thing happens.)
Normal TCP programs only close the sending half, which the remote peer perceives as the receiving half. So, what you normally want to detect is whether the remote peer closed its sending half, meaning you won’t be receiving data from them any more.
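In Winsock, the usual way to close just your sending half is shutdown() with SD_SEND. A sketch along the lines of the HTTP 1.0 example above; the hard-coded request string is purely illustrative:

```c
#include <winsock2.h>
#include <string.h>

void FetchAndHalfClose(SOCKET s)
{
    const char *req = "GET / HTTP/1.0\r\n\r\n";
    char buf[4096];
    int n;

    send(s, req, (int)strlen(req), 0);

    shutdown(s, SD_SEND);          /* done sending; receiving still works */

    while ((n = recv(s, buf, sizeof(buf), 0)) > 0) {
        /* process n bytes of the reply here */
    }
    /* n == 0: peer closed its sending half; n < 0: error */

    closesocket(s);
}
```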
With asynchronous sockets, Winsock sends you an FD_CLOSE message when the connection drops. Event objects are similar: the system signals the event object with an FD_CLOSE notification.
With blocking and non-blocking sockets, you probably have a loop that calls recv() on that socket. recv() returns 0 when the remote peer closes the connection. As you would expect, if you are using select(), the SOCKET descriptor in the read_fds parameter gets set when the connection drops. As normal, you’ll call recv() and see the 0 return value.
As you might have guessed from the discussion above, it is also possible to close the receiving half of the connection. If the remote peer then tries to send you data, the stack will drop that data on the floor and send a TCP RST to the remote peer.
See below for information on handling abnormal disconnects.