The Linux Programming Interface (English edition) reading notes: Alternative I/O Models (63.2)

63.2 I/O Multiplexing

63.2.4 Comparison of select() and poll()

Within the Linux kernel, select() and poll() both employ the same set of kernel internal poll routines.

The implementation of the poll() system call involves calling the kernel poll routine for each file descriptor and placing the resulting information in the corresponding revents field.

To implement select(), a set of macros is used to convert the information returned by the kernel poll routines into the corresponding event types returned by select():

#define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR)
#define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
#define POLLEX_SET (POLLPRI)

The only additional information we need to complete the picture is that poll() returns POLLNVAL in the revents field if one of the monitored file descriptors was closed at the time of the call, while select() returns –1 with errno set to EBADF.

API differences

  • The use of the fd_set data type places an upper limit (FD_SETSIZE) on the range of file descriptors that can be monitored by select(). By default, this limit is 1024 on Linux, and changing it requires recompiling the application. By contrast, poll() places no intrinsic limit on the range of file descriptors that can be monitored.
  • Because the fd_set arguments of select() are value-result, we must reinitialize them if making repeated select() calls from within a loop. By using separate events (input) and revents (output) fields, poll() avoids this requirement.
  • If one of the file descriptors being monitored was closed, then poll() informs us exactly which one, via the POLLNVAL bit in the corresponding revents field. By contrast, select() merely returns –1 with errno set to EBADF, leaving us to determine which file descriptor is closed by checking for an error when performing an I/O system call on the descriptor.

Performance
The performance of poll() and select() is similar if either of the following is true:
- The range of file descriptors to be monitored is small
- A large number of file descriptors are being monitored, but they are densely packed (i.e., most or all of the file descriptors from 0 up to some limit are being monitored).

However, the performance of select() and poll() can differ noticeably if the set of file descriptors to be monitored is sparse. In this case, poll() can perform better than select().

63.2.5 Problems with select() and poll()

  • On each call to select() or poll(), the kernel must check all of the specified file descriptors to see if they are ready.
  • In each call to select() or poll(), the program must pass a data structure to the kernel describing all of the file descriptors to be monitored, and, after checking the descriptors, the kernel returns a modified version of this data structure to the program. (Furthermore, for select(), we must initialize the data structure before each call.) For poll(), the size of the data structure increases with the number of file descriptors being monitored, and the task of copying it from user to kernel space and back again consumes a noticeable amount of CPU time when monitoring many file descriptors.
  • After the call to select() or poll(), the program must inspect every element of the returned data structure to see which file descriptors are ready.

The poor scaling performance of select() and poll() stems from a simple limitation of these APIs: typically, a program makes repeated calls to monitor the same set of file descriptors; however, the kernel doesn’t remember the list of file descriptors to be monitored between successive calls.

63.3 Signal-Driven I/O

With signal-driven I/O, a process requests that the kernel send it a signal when I/O is possible on a file descriptor.
To use signal-driven I/O, a program performs the following steps:

  1. Establish a handler for the signal delivered by the signal-driven I/O mechanism. By default, this notification signal is SIGIO.
  2. Set the owner of the file descriptor, that is, the process or process group that is to receive signals when I/O is possible on the file descriptor:

fcntl(fd, F_SETOWN, pid);

  3. Enable nonblocking I/O by setting the O_NONBLOCK open file status flag.
  4. Enable signal-driven I/O by turning on the O_ASYNC open file status flag. Steps 3 and 4 can be combined in a single fcntl() call:

flags = fcntl(fd, F_GETFL);
fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);

  5. The calling process can now perform other tasks. When I/O becomes possible, the kernel generates the notification signal for the process, invoking the handler established in step 1.
  6. Signal-driven I/O provides edge-triggered notification (Section 63.1.1). This means that once the process has been notified that I/O is possible, it should perform as much I/O (e.g., read as many bytes) as possible. Assuming a nonblocking file descriptor, this means executing a loop that performs I/O system calls until a call fails with the error EAGAIN or EWOULDBLOCK.

On Linux 2.4 and earlier, signal-driven I/O can be employed with file descriptors for sockets, terminals, pseudoterminals, and certain other types of devices. Linux 2.6 additionally allows signal-driven I/O to be employed with pipes and FIFOs. Since Linux 2.6.25, signal-driven I/O can also be used with inotify file descriptors.

Signal-driven I/O should not be confused with POSIX AIO: using POSIX AIO, a process requests the kernel to perform an I/O operation, and the kernel initiates the operation but immediately passes control back to the calling process; the process is later notified when the I/O operation completes or an error occurs.

Because the default action of SIGIO is to terminate the process, we should establish the handler for SIGIO before enabling signal-driven I/O on a file descriptor.

For the fcntl() F_SETOWN operation, if pid is positive, it is interpreted as a process ID; if pid is negative, its absolute value specifies a process group ID.

In glibc versions before 2.11, if a file descriptor is owned by a process group ID less than 4096, then, instead of returning that ID as a negative function result from the fcntl() F_GETOWN operation, glibc misinterprets it as a system call error. Consequently, the fcntl() wrapper function returns –1, and errno contains the (positive) process group ID.

This is a consequence of the fact that the kernel system call interface indicates errors by returning a negative errno value as a function result. glibc interprets negative system call return values in the range –1 to –4095 as indicating an error, copies this (absolute) value into errno, and returns –1 as the function result for the application program.

This limitation means that an application that uses process groups to receive “I/O possible” signals (which is unusual) can’t reliably use F_GETOWN to discover which process group owns a file descriptor.

Since glibc version 2.11, the fcntl() wrapper function fixes the problem of F_GETOWN with process group IDs less than 4096. It does this by implementing F_GETOWN in user space using the F_GETOWN_EX operation (Section 63.3.2), which is provided by Linux 2.6.32 and later.

63.3.1 When Is “I/O Possible” Signaled?

Terminals and pseudoterminals
For terminals and pseudoterminals, a signal is generated whenever new input becomes available, even if previous input has not yet been read. “Input possible” is also signaled if an end-of-file condition occurs on a terminal (but not on a pseudoterminal).
There is no “output possible” signaling for terminals. A terminal disconnect is also not signaled.
Starting with kernel 2.4.19, Linux provides “output possible” signaling for the slave side of a pseudoterminal. This signal is generated whenever input is consumed on the master side of the pseudoterminal.

Pipes and FIFOs
For the read end of a pipe or FIFO, a signal is generated in these circumstances:
- Data is written to the pipe (even if there was already unread input available).
- The write end of the pipe is closed.
For the write end of a pipe or FIFO, a signal is generated in these circumstances:
- A read from the pipe increases the amount of free space in the pipe so that it is now possible to write PIPE_BUF bytes without blocking.
- The read end of the pipe is closed.

Sockets
Signal-driven I/O works for datagram sockets in both the UNIX and the Internet domains. A signal is generated in the following circumstances:
- An input datagram arrives on the socket (even if there were already unread datagrams waiting to be read).
- An asynchronous error occurs on the socket.

Signal-driven I/O works for stream sockets in both the UNIX and the Internet domains. A signal is generated in the following circumstances:
- A new connection is received on a listening socket.
- A TCP connect() request completes; that is, the active end of a TCP connection enters the ESTABLISHED state, as shown in Figure 61-5. The analogous condition is not signaled for UNIX domain sockets.
- New input is received on the socket (even if there was already unread input available).
- The peer closes its writing half of the connection using shutdown(), or closes its socket altogether using close().
- Output is possible on the socket (e.g., space has become available in the socket send buffer).
- An asynchronous error occurs on the socket.

inotify file descriptors
A signal is generated when an event occurs for one of the files being monitored via the inotify file descriptor (i.e., when a read from the inotify file descriptor would return an event without blocking).

63.3.2 Refining the Use of Signal-Driven I/O

To take full advantage of signal-driven I/O, we must perform two steps:
- Employ a Linux-specific fcntl() operation, F_SETSIG, to specify a realtime signal that should be delivered instead of SIGIO when I/O is possible on a file descriptor.
- Specify the SA_SIGINFO flag when using sigaction() to establish the handler for the realtime signal employed in the previous step (see Section 21.4).

if (fcntl(fd, F_SETSIG, sig) == -1)
    errExit("fcntl");

The F_GETSIG operation performs the converse of F_SETSIG, retrieving the signal currently set for a file descriptor:

sig = fcntl(fd, F_GETSIG);
if (sig == -1)
    errExit("fcntl");

(In order to obtain the definitions of the F_SETSIG and F_GETSIG constants from
fcntl.h, we must define the _GNU_SOURCE feature test macro.)

  • The default “I/O possible” signal, SIGIO, is one of the standard, nonqueuing signals. If multiple I/O events are signaled while SIGIO is blocked (perhaps because the SIGIO handler is already being invoked), all notifications except the first will be lost. If we instead use F_SETSIG to specify a realtime signal as the “I/O possible” signal, multiple notifications can be queued.
  • If the handler for the signal is established using a sigaction() call in which the SA_SIGINFO flag is specified in the sa.sa_flags field, then a siginfo_t structure is passed as the second argument to the signal handler (Section 21.4). This structure contains fields identifying the file descriptor on which the event occurred, as well as the type of event.

For an “I/O possible” event, the fields of interest in the siginfo_t structure
passed to the signal handler are as follows:
- si_signo: the number of the signal that caused the invocation of the handler. This value is the same as the first argument to the signal handler.
- si_fd: the file descriptor for which the I/O event occurred.
- si_code: a code indicating the type of event that occurred. The values that can appear in this field, along with their general descriptions, are shown in Table 63-7.
- si_band: a bit mask containing the same bits as are returned in the revents field by the poll() system call. The value set in si_code has a one-to-one correspondence with the bit-mask setting in si_band, as shown in Table 63-7.

[Table 63-7: si_code and si_band values for “I/O possible” events (table image not reproduced)]

In an application that is purely input-driven, we can further refine the use of F_SETSIG. Instead of monitoring I/O events via a signal handler, we can block the nominated “I/O possible” signal, and then accept the queued signals via calls to sigwaitinfo() or sigtimedwait() (Section 22.10). These system calls return a structure that contains the same information as is passed to a signal handler established with SA_SIGINFO.

Handling signal-queue overflow
A properly designed application using F_SETSIG to establish a realtime signal as the “I/O possible” notification mechanism must also establish a handler for SIGIO. If SIGIO is delivered, then the application can drain the queue of realtime signals using sigwaitinfo() and temporarily revert to the use of select() or poll() to obtain a complete list of file descriptors with outstanding I/O events.

Using signal-driven I/O with multithreaded applications

The F_SETOWN_EX operation is like F_SETOWN, but as well as allowing the target to be specified as a process or process group, it also permits a thread to be specified as the target for “I/O possible” signals.

The third argument to these fcntl() operations is a pointer to a structure of the following form:

struct f_owner_ex {
    int   type;
    pid_t pid;
};

The type field defines the meaning of the pid field, taking one of the following values:

F_OWNER_PGRP
The pid field specifies the ID of a process group that is to be the target of“I/O possible” signals. Unlike with F_SETOWN, a process group ID is specified as a positive value.
F_OWNER_PID
The pid field specifies the ID of a process that is to be the target of “I/O possible” signals.
F_OWNER_TID
The pid field specifies the ID of a thread that is to be the target of “I/O possible” signals. The ID specified in pid is a value returned by clone() or gettid().

Because the F_SETOWN_EX and F_GETOWN_EX operations represent process group IDs as positive values, F_GETOWN_EX doesn’t suffer the problem described earlier for F_GETOWN when using process group IDs less than 4096.
