Avoiding races with Unix signals and select()
Some thinking out loud about signals

The problem with signals: you never know when they're delivered. They are completely asynchronous, so delivery may happen before the next system call is made as well as while inside the system call.

This means that a setup where you test a signal flag that's incremented by a handler, just before going into a blocking call such as select(), has a nasty race: the handler may have been called and the flag incremented just after the test, but before the select().

If that happens, select() won't be interrupted, because the signal has already been handled as far as the kernel is concerned. And for sure, doing the test closer to the select() doesn't solve the race condition.

Using blocking masks is no solution either: although signal delivery is deferred till the unmasking call, it can still happen after the unmasking and handling, but before the select().

As far as I can see, there are three obvious solutions to this problem:
The other option is the last one, but as it limits control and requires the work associated with signals to happen in an isolated function (the signal handler), it isn't very attractive either.

Requirements

It seems that what we want from a real solution is this:
As said, using a pipe as the queue is attractive as it also solves the problem of interrupting select() even though the signal is already delivered when we enter it, by including the reading end in the select(). However, there's still the risk of overflowing the pipe and missing signals.

Still, the pipe must always have something in it as long as the event hasn't been completely handled and cleared yet, or select() will block.

Solution

What if we do it like this: the signal handler empties the pipe first and then writes a byte to it. To all parties concerned, this appears to happen atomically; a signal handler won't interrupt itself (well, we can prevent it on systems that have sigaction()), and surely the main program can never interrupt the signal handler.

That means that the first byte in the pipe is the actual 'signalled' flag. Once the handler has been called, this flag is always set.

As long as this flag is set, select() will always exit immediately if you include the reading end of the pipe. And testing and resetting 'atomically' (as far as the handler is concerned) can be done by reading the single byte from it in non-blocking mode.

If the read returns no data, there's nothing to be done. If the read returns a byte, we just tested and cleared the flag (which is the existence of the data byte in the pipe) atomically.

The atomicity comes from the fact that if the reading end of a pipe has two readers (effectively, the signal handler that empties it and puts the byte back, and the main program), a byte present in it has to go either to one or the other, but it can never disappear, or go to both.

Ok! Now we want to refine the technique a bit, as we want to know which signals are in the set. The idea is this: instead of a byte, we use a 32-bit value or even a real sigset_t that holds the 'pending since last test' set in the pipe.
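Before going into that refinement, here is a minimal sketch in C of the single-byte scheme from the Solution section. The names flag_pipe, flag_handler, flag_setup, flag_test_and_clear and flag_wait are made up for this illustration and don't come from any library. Filling sa_mask with sigfillset() is just one assumed way to make sure the handler can't be interrupted by itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static int flag_pipe[2];            /* [0] = reading end, [1] = writing end */

static void flag_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    assert(flags != -1);            /* sketch: treat failure as a bug */
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Handler: empty the pipe first, then write one byte (the 'signalled' flag). */
static void flag_handler(int signo)
{
    int saved_errno = errno;
    char c;

    (void)signo;
    while (read(flag_pipe[0], &c, 1) == 1)
        ;                           /* drain; stops when the pipe is empty */
    c = 1;
    if (write(flag_pipe[1], &c, 1) < 0) {
        /* not expected on a pipe we just emptied; nothing useful to do */
    }
    errno = saved_errno;
}

/* Create the pipe and install the handler for one signal (hypothetical helper). */
static int flag_setup(int signo)
{
    struct sigaction sa;

    if (pipe(flag_pipe) < 0)
        return -1;
    flag_nonblocking(flag_pipe[0]);
    flag_nonblocking(flag_pipe[1]);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = flag_handler;
    sigfillset(&sa.sa_mask);        /* assumption: block everything in the handler */
    return sigaction(signo, &sa, NULL);
}

/* Test-and-clear: 1 if the signal arrived since the last call, else 0. */
static int flag_test_and_clear(void)
{
    char c;
    return read(flag_pipe[0], &c, 1) == 1;
}

/* Sleep in select() until the flag is set, then consume it. */
static int flag_wait(void)
{
    for (;;) {
        fd_set rfds;

        FD_ZERO(&rfds);
        FD_SET(flag_pipe[0], &rfds);    /* a real program adds its own fds too */
        if (select(flag_pipe[0] + 1, &rfds, NULL, NULL, NULL) < 0 && errno != EINTR)
            return -1;
        if (flag_test_and_clear())
            return 1;
    }
}
```

Because the handler drains the pipe before writing, any number of deliveries leaves exactly one byte behind, so the pipe can't overflow. If the signal was already handled before flag_wait() is entered, the byte is sitting in the pipe and select() returns right away.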
The signal handler, when called, 'empties' the pipe by reading the word (using zero if the pipe didn't have any data), OR'ing the received signal into it and writing it back.

The main program needs to ensure that the read in the signal handler doesn't interfere with the read that it uses to test-and-clear the set. This used to be easy because of the single byte, but if two readers each attempt to read four bytes from a pipe, you can't guarantee that one doesn't get a short read of two, causing the other two bytes to go to the other.

But if the main program temporarily blocks signal delivery using sigprocmask() around the nonblocking read, we're done.

This way, unhandled signals will cause select() to exit immediately, while the main program can do an atomic test-and-clear to handle them at any desired place, without any races!
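The refined scheme could be sketched in C roughly as follows. This is an illustration under a few assumptions, not a definitive implementation: the names set_pipe, set_handler, set_setup, set_test_and_clear and set_wait are invented here, the 'pending' set is a uint32_t with bit (signo - 1) standing for signal signo (so only signals 1 to 32 are covered), and all signals are blocked inside the handler so that handlers for different signals can't interleave their read and write.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static int set_pipe[2];             /* [0] = reading end, [1] = writing end */

static uint32_t sig_bit(int signo)  /* assumed encoding: bit (signo - 1) */
{
    return UINT32_C(1) << (signo - 1);
}

/* Handler: read the pending word (zero if none), OR in signo, write it back. */
static void set_handler(int signo)
{
    int saved_errno = errno;
    uint32_t pending = 0;

    if (read(set_pipe[0], &pending, sizeof pending) != (ssize_t)sizeof pending)
        pending = 0;
    if (signo >= 1 && signo <= 32)
        pending |= sig_bit(signo);
    if (write(set_pipe[1], &pending, sizeof pending) < 0) {
        /* not expected on a pipe we just emptied; nothing useful to do */
    }
    errno = saved_errno;
}

/* Create the pipe and install the handler for n signals (hypothetical helper). */
static int set_setup(const int *signos, int n)
{
    struct sigaction sa;
    int i;

    if (pipe(set_pipe) < 0)
        return -1;
    for (i = 0; i < 2; i++) {
        int flags = fcntl(set_pipe[i], F_GETFL);
        if (flags < 0 || fcntl(set_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = set_handler;
    sigfillset(&sa.sa_mask);        /* assumption: block everything in the handler */
    for (i = 0; i < n; i++)
        if (sigaction(signos[i], &sa, NULL) < 0)
            return -1;
    return 0;
}

/* Test-and-clear: returns the pending set (0 if nothing arrived). */
static uint32_t set_test_and_clear(void)
{
    sigset_t all, old;
    uint32_t pending = 0;
    ssize_t n;

    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &old);     /* keep the handler out of our read */
    n = read(set_pipe[0], &pending, sizeof pending);
    sigprocmask(SIG_SETMASK, &old, NULL);

    /* with signals blocked, we expect a whole word or nothing at all */
    assert(n == -1 || n == (ssize_t)sizeof pending);
    return n == (ssize_t)sizeof pending ? pending : 0;
}

/* Sleep in select() until at least one signal is pending, then return the set. */
static uint32_t set_wait(void)
{
    for (;;) {
        fd_set rfds;
        uint32_t pending;

        FD_ZERO(&rfds);
        FD_SET(set_pipe[0], &rfds);         /* a real program adds its own fds too */
        if (select(set_pipe[0] + 1, &rfds, NULL, NULL, NULL) < 0 && errno != EINTR)
            return 0;
        pending = set_test_and_clear();
        if (pending != 0)
            return pending;
    }
}
```

A caller would check the returned word with something like `if (pending & sig_bit(SIGUSR1))`. Using a real sigset_t instead of the 32-bit word works the same way, as long as the handler and the main program agree on the size they read and write.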