Within the Linux kernel, select() and poll() both employ the same set of kernel internal poll routines.
The implementation of the poll() system call involves calling the kernel poll routine for each file descriptor and placing the resulting information in the corresponding revents field.
To implement select(), a set of macros is used to convert the information returned by the kernel poll routines into the corresponding event types returned by select():
#define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR)
#define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
#define POLLEX_SET (POLLPRI)
The only additional information we need to complete the picture is that poll() returns POLLNVAL in the revents field if one of the monitored file descriptors was closed at the time of the call, while select() returns –1 with errno set to EBADF.
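The difference in how the two calls report a closed descriptor can be seen with a minimal sketch. The helper names `poll_flags_invalid()` and `select_fails_ebadf()` are hypothetical, chosen only for this illustration; both use a zero timeout so that they never block.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if poll() marks fd with POLLNVAL. */
int poll_flags_invalid(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    assert(fd >= 0);                /* poll() ignores negative descriptors */
    int n = poll(&pfd, 1, 0);       /* zero timeout: return immediately */
    return n == 1 && (pfd.revents & POLLNVAL) != 0;
}

/* Hypothetical helper: returns 1 if select() fails with EBADF for fd. */
int select_fails_ebadf(int fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };   /* zero timeout: return immediately */

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv) == -1 && errno == EBADF;
}
```

Note that poll() itself succeeds here and reports the problem per descriptor, whereas select() fails as a whole and gives no indication of which descriptor was bad.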
API differences
Performance
The performance of poll() and select() is similar if either of the following is true:
- The range of file descriptors to be monitored is small
- A large number of file descriptors are being monitored, but they are densely packed (i.e., most or all of the file descriptors from 0 up to some limit are being monitored).
However, the performance of select() and poll() can differ noticeably if the set of file descriptors to be monitored is sparse. In this case, poll() can perform better than select().
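A rough sketch of why sparseness matters, assuming just two descriptors are monitored, one with a low number and one with a high number. With select(), the nfds argument must be one greater than the highest descriptor, so the kernel has to examine every descriptor number in that range; with poll(), only the two listed entries are examined. The function names below are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: duplicate fd onto a high descriptor number
   in order to create a sparse set for the comparison. */
int make_sparse_copy(int fd, int high_fd)
{
    return dup2(fd, high_fd);
}

/* select(): the kernel scans descriptor numbers 0..hi_fd, although
   only two of them are of interest. */
int wait_two_select(int lo_fd, int hi_fd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };

    assert(lo_fd >= 0 && lo_fd < hi_fd && hi_fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(lo_fd, &rfds);
    FD_SET(hi_fd, &rfds);
    return select(hi_fd + 1, &rfds, NULL, NULL, &tv);
}

/* poll(): the kernel examines exactly the two entries in the array. */
int wait_two_poll(int lo_fd, int hi_fd)
{
    struct pollfd pfds[2] = {
        { .fd = lo_fd, .events = POLLIN },
        { .fd = hi_fd, .events = POLLIN },
    };

    return poll(pfds, 2, 0);
}
```

Both functions return the number of ready descriptors; the difference lies only in how much work the kernel does to arrive at that count.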
The poor scaling performance of select() and poll() stems from a simple limitation of these APIs: typically, a program makes repeated calls to monitor the same set of file descriptors; however, the kernel doesn’t remember the list of file descriptors to be monitored between successive calls.
With signal-driven I/O, a process requests that the kernel send it a signal when I/O is possible on a file descriptor.
To use signal-driven I/O, a program performs the following steps:
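fcntl(fd, F_SETOWN, pid);                           /* Set the owner: the process or group that receives the signal */
flags = fcntl(fd, F_GETFL);                         /* Fetch the current open file status flags */
fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);   /* Enable "I/O possible" signaling and nonblocking I/O */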
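Put together with error checking, these steps might look like the following sketch. It also establishes a SIGIO handler before enabling signaling, since the default action of SIGIO is to terminate the process. The names `enable_sigio()`, `sigio_handler()`, and `got_sigio` are hypothetical, and making the calling process the owner is an assumption of this example.

```c
#define _GNU_SOURCE             /* ensure O_ASYNC is exposed by <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigio = 0;

static void sigio_handler(int sig)
{
    (void) sig;
    got_sigio = 1;              /* just set a flag; do the I/O in the main program */
}

/* Hypothetical helper: returns 0 on success, -1 on error (errno is set). */
int enable_sigio(int fd)
{
    struct sigaction sa;
    int flags;

    assert(fd >= 0);

    /* Establish the handler first, so that an early SIGIO can't kill us */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigio_handler;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) == -1)
        return -1;

    /* Make the calling process the owner of the descriptor */
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;

    /* Enable signaling and nonblocking I/O, preserving the other flags */
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK) == -1 ? -1 : 0;
}
```

After a successful call, the main program would typically loop, checking `got_sigio` and performing nonblocking reads until they fail with EAGAIN.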
On Linux 2.4 and earlier, signal-driven I/O can be employed with file descriptors for sockets, terminals, pseudoterminals, and certain other types of devices. Linux 2.6 additionally allows signal-driven I/O to be employed with pipes and FIFOs. Since Linux 2.6.25, signal-driven I/O can also be used with inotify file descriptors.
Using POSIX AIO, a process requests the kernel to perform an I/O operation, and the kernel initiates the operation, but immediately passes control back to the calling process; the process is then later notified when the I/O operation completes or an error occurs.
Because the default action of SIGIO is to terminate the process, we should establish the handler for SIGIO before enabling signal-driven I/O on a file descriptor.
If pid is positive, it is interpreted as a process ID. If pid is negative, its absolute value specifies a process group ID.
If a file descriptor is owned by a process group ID less than 4096, then, instead of returning that ID as a negative function result from the fcntl() F_GETOWN operation, glibc misinterprets it as a system call error. Consequently, the fcntl() wrapper function returns –1, and errno contains the (positive) process group ID.
This is a consequence of the fact that the kernel system call interface indicates errors by returning a negative errno value as a function result. The glibc wrapper interprets negative system call returns in the range –1 to –4095 as indicating an error, copies the absolute value into errno, and returns –1 as the function result for the application program.
This limitation means that an application that uses process groups to receive “I/O possible” signals (which is unusual) can’t reliably use F_GETOWN to discover which process group owns a file descriptor.
Since glibc version 2.11, the fcntl() wrapper function fixes the problem of F_GETOWN with process group IDs less than 4096. It does this by implementing F_GETOWN in user space using the **F_GETOWN_EX** operation (Section 63.3.2), which is provided by Linux 2.6.32 and later.
Terminals and pseudoterminals
For terminals and pseudoterminals, a signal is generated whenever new input becomes available, even if previous input has not yet been read. “Input possible” is also signaled if an **end-of-file** condition occurs on a terminal (but not on a pseudoterminal).
There is no “output possible” signaling for terminals. A terminal disconnect is also not signaled.
Starting with kernel 2.4.19, Linux provides “output possible” signaling for the slave side of a pseudoterminal. This signal is generated whenever input is consumed on the master side of the pseudoterminal.
Pipes and FIFOs
For the read end of a pipe or FIFO, a signal is generated in these circumstances:
- Data is written to the pipe (even if there was already unread input available).
- The write end of the pipe is closed.
For the write end of a pipe or FIFO, a signal is generated in these circumstances:
- A read from the pipe increases the amount of free space in the pipe so that it is now possible to write PIPE_BUF bytes without blocking.
- The read end of the pipe is closed.
Sockets
Signal-driven I/O works for datagram sockets in both the UNIX and the Internet domains. A signal is generated in the following circumstances:
- An input datagram arrives on the socket (even if there were already unread datagrams waiting to be read).
- An asynchronous error occurs on the socket.
Signal-driven I/O works for stream sockets in both the UNIX and the Internet domains. A signal is generated in the following circumstances:
- A new connection is received on a listening socket.
- A TCP connect() request completes; that is, the active end of a TCP connection entered the ESTABLISHED state, as shown in Figure 61-5. The analogous condition is not signaled for UNIX domain sockets.
- New input is received on the socket (even if there was already unread input available).
- The peer closes its writing half of the connection using shutdown(), or closes its socket altogether using close().
- Output is possible on the socket (e.g., space has become available in the socket send buffer).
- An asynchronous error occurs on the socket.
inotify file descriptors
To take full advantage of signal-driven I/O, we must perform two steps:
- Employ a Linux-specific fcntl() operation, F_SETSIG, to specify a realtime signal that should be delivered instead of SIGIO when I/O is possible on a file descriptor.
- Specify the SA_SIGINFO flag when using sigaction() to establish the handler for the realtime signal employed in the previous step (see Section 21.4).
if (fcntl(fd, F_SETSIG, sig) == -1)
    errExit("fcntl");
The F_GETSIG operation performs the converse of F_SETSIG, retrieving the signal currently set for a file descriptor:
sig = fcntl(fd, F_GETSIG);
if (sig == -1)
    errExit("fcntl");
(In order to obtain the definitions of the F_SETSIG and F_GETSIG constants from <fcntl.h>, we must define the _GNU_SOURCE feature test macro.)
For an “I/O possible” event, the fields of interest in the siginfo_t structure passed to the signal handler are as follows:
- si_signo: the number of the signal that caused the invocation of the handler. This value is the same as the first argument to the signal handler.
- si_fd: the file descriptor for which the I/O event occurred.
- si_code: a code indicating the type of event that occurred. The values that can appear in this field, along with their general descriptions, are shown in Table 63-7.
- si_band: a bit mask containing the same bits as are returned in the revents field by the poll() system call. The value set in si_code has a one-to-one correspondence with the bit-mask setting in si_band, as shown in Table 63-7.
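The two steps and the siginfo_t fields can be combined in a sketch like the one below. The names `enable_rt_io_signal()`, `io_handler()`, `last_fd`, and `last_readable` are hypothetical; the caller chooses the realtime signal (for example, SIGRTMIN), and the calling process is assumed to be the owner.

```c
#define _GNU_SOURCE             /* exposes F_SETSIG in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t last_fd = -1;
static volatile sig_atomic_t last_readable = 0;

/* SA_SIGINFO handler: record which descriptor the event was for. */
static void io_handler(int sig, siginfo_t *si, void *ucontext)
{
    (void) sig;
    (void) ucontext;
    last_fd = si->si_fd;                            /* descriptor with the event */
    last_readable = (si->si_band & POLLIN) != 0;    /* same bits as poll() revents */
}

/* Hypothetical helper: returns 0 on success, -1 on error (errno is set). */
int enable_rt_io_signal(int fd, int sig)
{
    struct sigaction sa;
    int flags;

    assert(fd >= 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = io_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) == -1)
        return -1;

    if (fcntl(fd, F_SETSIG, sig) == -1)             /* deliver sig instead of SIGIO */
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;

    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK) == -1 ? -1 : 0;
}
```

Because realtime signals are queued, each event arrives with its own siginfo_t, so the handler can tell which descriptor to service without rescanning all of them.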
In an application that is purely input-driven, we can further refine the use of F_SETSIG. Instead of monitoring I/O events via a signal handler, we can block the nominated **“I/O possible” signal**, and then accept the queued signals via calls to sigwaitinfo() or sigtimedwait() (Section 22.10). These system calls return a structure that contains the same information as is passed to a signal handler established with SA_SIGINFO.
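A minimal sketch of this handler-free approach follows. The function names `setup_queued_io()` and `next_ready_fd()` are hypothetical, and sigtimedwait() is used rather than sigwaitinfo() so that the example cannot block indefinitely.

```c
#define _GNU_SOURCE             /* exposes F_SETSIG in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Block sig and arrange for it to be queued when I/O is possible on fd.
   Returns 0 on success, -1 on error. */
int setup_queued_io(int fd, int sig)
{
    sigset_t set;
    int flags;

    assert(fd >= 0);
    sigemptyset(&set);
    sigaddset(&set, sig);
    if (sigprocmask(SIG_BLOCK, &set, NULL) == -1)   /* queue it, don't deliver it */
        return -1;
    if (fcntl(fd, F_SETSIG, sig) == -1)
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Accept the next queued event, waiting at most timeout_ms milliseconds.
   Returns the descriptor the event refers to, or -1 on timeout or error. */
int next_ready_fd(int sig, int timeout_ms)
{
    sigset_t set;
    siginfo_t si;
    struct timespec ts = { timeout_ms / 1000, (timeout_ms % 1000) * 1000000L };

    sigemptyset(&set);
    sigaddset(&set, sig);
    if (sigtimedwait(&set, &si, &ts) == -1)
        return -1;              /* EAGAIN on timeout, EINTR if interrupted */
    return si.si_fd;
}
```

An event loop built this way would call `next_ready_fd()` repeatedly and perform nonblocking I/O on whichever descriptor is returned.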
Handling signal-queue overflow
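The number of realtime signals that can be queued is limited; when that limit is reached, the kernel reverts to delivering the default SIGIO signal for “I/O possible” notifications. For this reason, a properly designed application using F_SETSIG to establish a realtime signal as the “I/O possible” notification mechanism must also establish a handler for SIGIO. If SIGIO is delivered, then the application can drain the queue of realtime signals using sigwaitinfo() and temporarily revert to the use of select() or poll() to obtain a complete list of file descriptors with outstanding I/O events.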
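The recovery path might be sketched as follows, assuming the realtime signal is kept blocked and accepted synchronously. The names `drain_queued_signal()` and `recover_from_overflow()` are hypothetical; sigtimedwait() with a zero timeout is used in place of sigwaitinfo() so that draining stops as soon as the queue is empty.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <time.h>

/* Discard all pending instances of sig (which must be blocked).
   Returns the number of signals discarded, or -1 on error. */
int drain_queued_signal(int sig)
{
    sigset_t set;
    siginfo_t si;
    struct timespec zero = { 0, 0 };
    int drained = 0;

    assert(sig > 0);
    sigemptyset(&set);
    sigaddset(&set, sig);
    for (;;) {
        if (sigtimedwait(&set, &si, &zero) == -1) {
            if (errno == EAGAIN)
                return drained;     /* nothing left in the queue */
            if (errno == EINTR)
                continue;           /* interrupted by another signal; retry */
            return -1;
        }
        drained++;
    }
}

/* After SIGIO indicates an overflow: empty the queue, then ask poll()
   for the complete set of descriptors with outstanding events. */
int recover_from_overflow(int sig, struct pollfd *pfds, nfds_t nfds)
{
    if (drain_queued_signal(sig) == -1)
        return -1;
    return poll(pfds, nfds, 0);     /* zero timeout: just take a snapshot */
}
```

Once the descriptors reported by poll() have been serviced, the application can go back to accepting queued realtime signals.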
Using signal-driven I/O with multithreaded applications
The F_SETOWN_EX operation is like F_SETOWN, but as well as allowing the target to be specified as a process or process group, it also permits a thread to be specified as the target for “I/O possible” signals.
struct f_owner_ex {
    int   type;     /* F_OWNER_PGRP, F_OWNER_PID, or F_OWNER_TID */
    pid_t pid;      /* ID of the process group, process, or thread */
};
F_OWNER_PGRP
The pid field specifies the ID of a process group that is to be the target of “I/O possible” signals. Unlike with F_SETOWN, a process group ID is specified as a positive value.
F_OWNER_PID
The pid field specifies the ID of a process that is to be the target of “I/O possible” signals.
F_OWNER_TID
The pid field specifies the ID of a thread that is to be the target of “I/O possible” signals. The ID specified in pid is a value returned by clone() or gettid().
Because the F_SETOWN_EX and F_GETOWN_EX operations represent process group IDs as positive values, F_GETOWN_EX doesn’t suffer the problem described earlier for F_GETOWN when using process group IDs less than 4096.
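A short sketch of both operations is shown below. The helper names `own_by_calling_thread()` and `get_owner()` are hypothetical. The thread ID is obtained with syscall(SYS_gettid) on the assumption that the gettid() wrapper may not be available in older glibc versions.

```c
#define _GNU_SOURCE             /* exposes F_SETOWN_EX, F_GETOWN_EX, struct f_owner_ex */
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make the calling thread the target of
   "I/O possible" signals for fd. Returns 0 on success, -1 on error. */
int own_by_calling_thread(int fd)
{
    struct f_owner_ex owner;

    assert(fd >= 0);
    owner.type = F_OWNER_TID;
    owner.pid = (pid_t) syscall(SYS_gettid);
    return fcntl(fd, F_SETOWN_EX, &owner) == -1 ? -1 : 0;
}

/* Hypothetical helper: fetch the owner of fd. The ID is always positive,
   and *type tells whether it names a thread, a process, or a group. */
int get_owner(int fd, int *type, pid_t *id)
{
    struct f_owner_ex owner;

    assert(type != NULL && id != NULL);
    if (fcntl(fd, F_GETOWN_EX, &owner) == -1)
        return -1;
    *type = owner.type;
    *id = owner.pid;
    return 0;
}
```

Since the kind of owner is carried in the type field rather than in the sign of the ID, a small process group ID can never be confused with an error return.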