The lack of condition variables in Win32 makes it harder to implement certain concurrency abstractions, such as thread-safe message queues and thread pools. This article explores various techniques and patterns for implementing POSIX condition variables correctly and/or fairly on Win32. Section 2 explains what condition variables are and shows how to use them. Section 3 explains alternative strategies for implementing POSIX condition variables using Win32 synchronization primitives. A subsequent article will describe how the Wrapper Facade pattern and various C++ language features can help reduce common mistakes that occur when programming condition variables.
In general, condition variables are more appropriate than mutexes for situations involving complex condition expressions or scheduling behaviors. For instance, condition variables are often used to implement thread-safe message queues [Schmidt], which provide a ``producer/consumer'' communication mechanism for passing messages between multiple threads. In this use-case, condition variables block producer threads when a message queue is ``full'' and block consumer threads when the queue is ``empty.''
pthread_cond_wait (pthread_cond_t *cv, pthread_mutex_t *external_mutex) -- This function waits for condition cv to be notified. When called, it atomically (1) releases the associated external_mutex (which the caller must hold while evaluating the condition expression) and (2) goes to sleep awaiting a subsequent notification from another thread (via the pthread_cond_signal or pthread_cond_broadcast operations described next). The external_mutex will be locked when pthread_cond_wait returns.

pthread_cond_signal (pthread_cond_t *cv) -- This function notifies one thread waiting on condition variable cv.

pthread_cond_broadcast (pthread_cond_t *cv) -- This function notifies all threads that are currently waiting on condition variable cv.

pthread_cond_timedwait (pthread_cond_t *cv, pthread_mutex_t *external_mutex, const struct timespec *abstime) -- This function is a time-based variant of pthread_cond_wait. It waits until the absolute time abstime for cv to be notified. If abstime elapses before cv is notified, the function returns to the caller with an ETIMEDOUT result, signifying that a timeout has occurred. Even in the case of timeouts, the external_mutex will be locked when pthread_cond_timedwait returns.

The Pthreads API also provides pthread_cond_init and pthread_cond_destroy, which initialize and destroy instances of POSIX condition variables, respectively.
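To make these operations concrete, here's a minimal sketch of the bounded ``producer/consumer'' message queue described above. The msg_queue_t type, QSIZE, and the function names are illustrative rather than part of any standard API; initialization of the mutex and condition variables (via pthread_mutex_init and pthread_cond_init) and all error handling are omitted:

#include <pthread.h>

enum { QSIZE = 16 };

typedef struct
{
  int buffer_[QSIZE];
  // Fixed-size ring of messages.
  int count_, head_, tail_;
  // Number of queued messages and the dequeue/enqueue positions.
  pthread_mutex_t lock_;
  // Serialize access to the queue state.
  pthread_cond_t not_full_, not_empty_;
  // Block producers when "full" and consumers when "empty."
} msg_queue_t;

void
msg_queue_put (msg_queue_t *q, int msg)
{
  pthread_mutex_lock (&q->lock_);
  // Block producer threads while the queue is "full."
  while (q->count_ == QSIZE)
    pthread_cond_wait (&q->not_full_, &q->lock_);
  q->buffer_[q->tail_] = msg;
  q->tail_ = (q->tail_ + 1) % QSIZE;
  q->count_++;
  // Wake one consumer blocked on an "empty" queue.
  pthread_cond_signal (&q->not_empty_);
  pthread_mutex_unlock (&q->lock_);
}

int
msg_queue_get (msg_queue_t *q)
{
  pthread_mutex_lock (&q->lock_);
  // Block consumer threads while the queue is "empty."
  while (q->count_ == 0)
    pthread_cond_wait (&q->not_empty_, &q->lock_);
  int msg = q->buffer_[q->head_];
  q->head_ = (q->head_ + 1) % QSIZE;
  q->count_--;
  // Wake one producer blocked on a "full" queue.
  pthread_cond_signal (&q->not_full_);
  pthread_mutex_unlock (&q->lock_);
  return msg;
}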
Counting semaphores are a synchronization mechanism commonly used to serialize or coordinate multiple threads of control. Conceptually, counting semaphores are non-negative integers that can be incremented and decremented atomically. If a thread tries to decrement a semaphore whose value equals 0, the thread is suspended until another thread increments the semaphore count above 0.
Counting semaphores are often used to keep track of changes in the state of objects shared by multiple threads in a process. For instance, they can record the occurrence of a particular event. Unlike condition variables, semaphores maintain state. Therefore, they allow threads to make decisions based upon this state, even if the event has occurred in the past.
The standard POSIX Pthreads interface doesn't contain a counting semaphore abstraction. However, it's straightforward to create one using mutexes and condition variables, as shown below.
First, we define a struct that holds the state information necessary to implement a counting semaphore:
struct sema_t
{
  u_int count_;
  // Current count of the semaphore.
  u_long waiters_count_;
  // Number of threads that have called <sema_wait>.
  pthread_mutex_t lock_;
  // Serialize access to <count_> and <waiters_count_>.
  pthread_cond_t count_nonzero_;
  // Condition variable that blocks while <count_> is 0.
};
Instances of type sema_t are created by the following sema_init factory (error checking is omitted to save space):
int
sema_init (sema_t *s, u_int initial_count)
{
  pthread_mutex_init (&s->lock_, NULL);
  pthread_cond_init (&s->count_nonzero_, NULL);
  s->count_ = initial_count;
  s->waiters_count_ = 0;
  return 0;
}
The sema_wait function shown below blocks the calling thread until the semaphore count pointed to by s is greater than 0, at which point the count is atomically decremented.
int
sema_wait (sema_t *s)
{
  // Acquire mutex to enter critical section.
  pthread_mutex_lock (&s->lock_);

  // Keep track of the number of waiters so that <sema_post> works correctly.
  s->waiters_count_++;

  // Wait until the semaphore count is > 0, then atomically release
  // <lock_> and wait for <count_nonzero_> to be signaled.
  while (s->count_ == 0)
    pthread_cond_wait (&s->count_nonzero_,
                       &s->lock_);

  // <s->lock_> is now held.

  // Decrement the waiters count.
  s->waiters_count_--;

  // Decrement the semaphore's count.
  s->count_--;

  // Release mutex to leave critical section.
  pthread_mutex_unlock (&s->lock_);
  return 0;
}
The counting semaphore implementation shown above uses a ``caller manipulates waiter count'' idiom to simplify the synchronization between the sema_wait and sema_post functions. This synchronization idiom is used elsewhere in this article.
The sema_post function atomically increments the semaphore count pointed to by s. Only one thread will be unblocked and allowed to continue, regardless of the number of threads blocked on the semaphore.
int
sema_post (sema_t *s)
{
  pthread_mutex_lock (&s->lock_);

  // Always allow one thread to continue if it is waiting.
  if (s->waiters_count_ > 0)
    pthread_cond_signal (&s->count_nonzero_);

  // Increment the semaphore's count.
  s->count_++;

  pthread_mutex_unlock (&s->lock_);
  return 0;
}
To further clarify the behavior of counting semaphores, assume there are two threads, T1 and T2, both of which are collaborating via semaphore sema_t s. If T1 calls sema_wait when s's semaphore count is 0, it will block until thread T2 calls sema_post. When T2 subsequently calls sema_post, this function detects that the shared data waiters_count_ is > 0, which signifies that thread T1 is blocked waiting to acquire semaphore s.
When thread T2 signals the condition variable count_nonzero_ in sema_post, thread T1 is awakened in pthread_cond_wait. When thread T1 is rescheduled and dispatched, it re-evaluates its condition expression, i.e., s->count_ == 0. If count_ is now > 0, the sema_wait function decrements count_ and returns to thread T1.
Note that a while loop is used to wait on the condition variable count_nonzero_ in sema_wait. This condition variable looping idiom [Lea] is necessary since another thread, e.g., T3, may have acquired the semaphore first and decremented s->count_ back to 0 before thread T1 is awakened. In this example, the while loop ensures that T1 doesn't return until the semaphore count is > 0. In general, the while loop idiom ensures that threads don't return from pthread_cond_wait prematurely.
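To see this scenario in code, here's a hypothetical driver (a minimal sketch; the thread function names are illustrative and error handling is omitted) in which T1 blocks in sema_wait until T2 calls sema_post:

#include <pthread.h>

static sema_t s;

void *
consumer_T1 (void *)
{
  // Blocks until T2 posts, since the count starts at 0.
  sema_wait (&s);
  // ...process the event recorded by T2...
  return 0;
}

void *
producer_T2 (void *)
{
  // ...produce an event...
  sema_post (&s); // Wakes T1, or lets a later <sema_wait> proceed.
  return 0;
}

int
main ()
{
  // An initial count of 0 forces T1 to block.
  sema_init (&s, 0);

  pthread_t t1, t2;
  pthread_create (&t1, NULL, consumer_T1, NULL);
  pthread_create (&t2, NULL, producer_T2, NULL);
  pthread_join (t1, NULL);
  pthread_join (t2, NULL);
  return 0;
}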
The standard POSIX Pthreads API supports condition variables via the pthread_cond_t type and its supporting operations. In contrast, the Microsoft Win32 interface does not support condition variables natively. To work around this omission, many developers write POSIX condition variable emulations using Win32 synchronization primitives, such as mutexes, critical sections, semaphores, and events, which are outlined in the sidebar.
Win32 mutexes are accessed via HANDLEs and can serialize threads within one process or across multiple processes. A single mutex is acquired via WaitForSingleObject, and WaitForMultipleObjects can acquire multiple mutexes. A mutex is released via the ReleaseMutex operation.
Win32 critical sections are similar to mutexes but can serialize threads only within a single process. A critical section is acquired via EnterCriticalSection and released via LeaveCriticalSection.
Although Win32 mutexes are slower than critical sections in some situations, experiments show that critical sections provide no performance benefit at all when running on an SMP system where two threads on different processors contend for the same critical section. Moreover, Win32 mutexes are functionally richer than critical sections. For instance, WaitForMultipleObjects can be used to wait for multiple waitable objects (mutexes, semaphores, threads, processes, etc.). By enabling the waitAll parameter, all waitable object HANDLEs passed to WaitForMultipleObjects can be acquired atomically. The waitAll feature supports ``all or nothing'' synchronization semantics that are not available from critical sections.
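For example, here's a minimal sketch of this ``all or nothing'' idiom (the acquire_both function is illustrative, not a Win32 API), which acquires two mutexes atomically and thereby avoids the lock-ordering deadlocks that can arise when acquiring them one at a time:

#include <windows.h>

int
acquire_both (HANDLE m1, HANDLE m2)
{
  HANDLE locks[2] = { m1, m2 };
  // Block until *both* mutexes can be acquired as a unit.
  DWORD result = WaitForMultipleObjects (2,        // number of HANDLEs
                                         locks,
                                         TRUE,     // waitAll
                                         INFINITE);
  if (result != WAIT_OBJECT_0)
    return -1;
  // ...access the state protected by both mutexes...
  ReleaseMutex (m2);
  ReleaseMutex (m1);
  return 0;
}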
Win32 counting semaphores are also accessed via HANDLEs. A single semaphore is acquired via WaitForSingleObject, and WaitForMultipleObjects can acquire multiple semaphores. A semaphore is released via the ReleaseSemaphore operation, which increments the semaphore value by a user-specified amount, typically 1.
Win32 events come in two types: manual-reset and auto-reset. The semantics of the event operations SetEvent, PulseEvent, and ResetEvent differ depending on what type of event they're operating on. A single event of either type can be waited upon via WaitForSingleObject, and WaitForMultipleObjects can wait for multiple events.
When a manual-reset event is signaled by SetEvent, it sets the event into the signaled state and wakes up all threads waiting on this event. In contrast, when an auto-reset event is signaled by SetEvent and there are threads waiting, it wakes up one thread and resets the event to the non-signaled state. If there are no threads waiting, the event remains signaled until a single thread waits on it and is released.
When a manual-reset event is signaled by PulseEvent, it wakes up all waiting threads and atomically resets the event. In contrast, if an auto-reset event is signaled by PulseEvent and there are threads waiting, it wakes up one thread. In either case, the auto-reset event is left in the non-signaled state.
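The following snippet recaps these rules side by side (a sketch with no waiting threads; the comments summarize the semantics described above):

#include <windows.h>

int
main ()
{
  HANDLE manual = CreateEvent (NULL, TRUE, FALSE, NULL);  // manual-reset
  HANDLE autore = CreateEvent (NULL, FALSE, FALSE, NULL); // auto-reset

  SetEvent (manual);   // Wakes all current waiters; stays signaled
                       // until <ResetEvent> is called.
  SetEvent (autore);   // Wakes one waiter and resets; if no thread is
                       // waiting, stays signaled until one waits.
  PulseEvent (manual); // Wakes all current waiters, then resets.
  PulseEvent (autore); // Wakes one waiter (if any), then resets.

  CloseHandle (manual);
  CloseHandle (autore);
  return 0;
}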
Both Win32 events and POSIX condition variables provide similar waiting, signaling, and broadcasting features. For instance, WaitForMultipleObjects can acquire a mutex and wait on an event simultaneously via the waitAll flag, and SignalObjectAndWait can release a mutex and wait on an event atomically. These functions provide semantics akin to pthread_cond_wait and pthread_cond_signal. Thus, there are instances where events and condition variables can be used interchangeably.
However, extreme care must be taken with Win32 events to ensure that there are no race conditions introduced when switching from one mechanism to another. Unfortunately, there's no way to release just one waiting thread with a manual-reset event. Likewise, there's no way to release all waiting threads with an auto-reset event. This limitation is a major source of difficulty when implementing condition variables, as shown in Section 3.
After years of repeatedly seeing Win32 implementations of condition variables posted in newsgroups like comp.programming.threads, it became apparent that many Win32 implementations are either incorrect or contain subtle problems that can lead to starvation, unfairness, or race conditions. To help developers avoid these problems, this article evaluates common strategies for implementing POSIX condition variables on Win32, illustrating common traps and pitfalls and ways to avoid them.
All the strategies described in this article are designed for threads in the same process. Note that we use a C style of C++ programming to focus on the steps in each solution.
The first solution uses a pthread_cond_t data structure that contains a pair of Win32 events:
typedef struct
{
enum {
SIGNAL = 0,
BROADCAST = 1,
MAX_EVENTS = 2
};
HANDLE events_[MAX_EVENTS];
// Signal and broadcast event HANDLEs.
} pthread_cond_t;
In this solution, an auto-reset event (events_[SIGNAL]) implements pthread_cond_signal and a manual-reset event (events_[BROADCAST]) implements pthread_cond_broadcast.
Using these synchronization mechanisms, the pthread_cond_init initialization function is defined as follows:
int
pthread_cond_init (pthread_cond_t *cv,
                   const pthread_condattr_t *)
{
  // Create an auto-reset event.
  cv->events_[SIGNAL] = CreateEvent (NULL,  // no security
                                     FALSE, // auto-reset event
                                     FALSE, // non-signaled initially
                                     NULL); // unnamed

  // Create a manual-reset event.
  cv->events_[BROADCAST] = CreateEvent (NULL,  // no security
                                        TRUE,  // manual-reset
                                        FALSE, // non-signaled initially
                                        NULL); // unnamed
  return 0;
}
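For completeness, a pthread_cond_destroy for this representation can simply close the two event HANDLEs (a minimal sketch; error checking omitted):

int
pthread_cond_destroy (pthread_cond_t *cv)
{
  CloseHandle (cv->events_[SIGNAL]);
  CloseHandle (cv->events_[BROADCAST]);
  return 0;
}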
The following three functions implement the core operations of POSIX condition variables (for simplicity, most error handling is omitted in this and other solutions). We'll start by defining a mutual exclusion type for the POSIX pthread_mutex_t using a Win32 CRITICAL_SECTION, as follows:
typedef CRITICAL_SECTION pthread_mutex_t;
The pthread_cond_wait function first releases the associated external_mutex of type pthread_mutex_t, which must be held when the caller checks the condition expression. The function then uses WaitForMultipleObjects to wait for either the SIGNAL or BROADCAST event to become signaled:
int
pthread_cond_wait (pthread_cond_t *cv,
                   pthread_mutex_t *external_mutex)
{
  // Release the <external_mutex> here and wait for either event
  // to become signaled, due to <pthread_cond_signal> being
  // called or <pthread_cond_broadcast> being called.
  LeaveCriticalSection (external_mutex);

  WaitForMultipleObjects (2,           // Wait on both <events_>.
                          cv->events_,
                          FALSE,       // Wait for either event to be signaled.
                          INFINITE);   // Wait "forever."

  // Reacquire the mutex before returning.
  EnterCriticalSection (external_mutex);
  return 0;
}
Note that the pthread_cond_timedwait function can be implemented by mapping its struct timespec *abstime parameter into a non-INFINITE final argument to WaitForMultipleObjects.
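For instance, the mapping might look like the following sketch, where the helper name is illustrative and second-granularity time() is used for brevity (a production version would use a higher-resolution clock):

#include <time.h>
#include <windows.h>

static DWORD
timespec_to_relative_msec (const struct timespec *abstime)
{
  time_t now = time (NULL);
  if (abstime->tv_sec < now)
    return 0; // The deadline has already passed.
  return (DWORD) ((abstime->tv_sec - now) * 1000
                  + abstime->tv_nsec / 1000000);
}

pthread_cond_timedwait would then pass this value as the final argument to WaitForMultipleObjects and map a WAIT_TIMEOUT result to ETIMEDOUT.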
The pthread_cond_signal function tries to wake up one thread waiting on the condition variable cv, as follows:
int
pthread_cond_signal (pthread_cond_t *cv)
{
  // Try to release one waiting thread.
  PulseEvent (cv->events_[SIGNAL]);
  return 0;
}
The call to PulseEvent on the auto-reset event cv->events_[SIGNAL] awakens one waiting thread. The SIGNAL event is atomically reset to the non-signaled state, even if no threads are waiting on this event.
The pthread_cond_broadcast function is similar, though it wakes up all threads waiting on the condition variable cv:
int
pthread_cond_broadcast (pthread_cond_t *cv)
{
  // Try to release all waiting threads.
  PulseEvent (cv->events_[BROADCAST]);
  return 0;
}
Calling PulseEvent on a manual-reset event releases all threads that are waiting on cv->events_[BROADCAST]. As with pthread_cond_signal, the state of the event is atomically reset to non-signaled, even if no threads are waiting.
Although the PulseEvent solution is concise and ``intuitive,'' it is also incorrect. The root of the problem is known as the lost wakeup bug. This bug stems from the lack of atomicity between (1) releasing the external mutex and (2) the start of the wait, i.e.:
LeaveCriticalSection (external_mutex);
// Potential "lost wakeup bug" here!
WaitForMultipleObjects (2, cv->events_, FALSE, INFINITE);
The lost wakeup bug manifests itself when the calling thread is preempted after releasing the external_mutex, but before calling WaitForMultipleObjects. In this case, a different thread could acquire the external_mutex and perform a pthread_cond_signal in between the two calls. Because our signal implementation pulses the auto-reset event, the event remains unsignaled. Therefore, when the preempted calling thread resumes and calls WaitForMultipleObjects, it may find both events non-signaled. In this case, it might wait indefinitely since its wakeup signal got ``lost.''
Note that this problem could be solved trivially if Win32 provided some type of ``SignalObjectAndWaitMultiple'' function that released a mutex or critical section and atomically waited for an array of HANDLEs to be signaled.
The lost wakeup bug can be avoided by using the SetEvent Win32 function instead of PulseEvent. The second solution, shown below, implements this approach using a pthread_cond_t data structure that's similar to the first one. The difference is that it maintains a count of the threads waiting on the condition variable, along with a CRITICAL_SECTION to protect this count, as follows:
typedef struct
{
u_int waiters_count_;
// Count of the number of waiters.
CRITICAL_SECTION waiters_count_lock_;
// Serialize access to <waiters_count_>.
// Same as before...
} pthread_cond_t;
The pthread_cond_init initialization function is defined as follows:
int
pthread_cond_init (pthread_cond_t *cv,
                   const pthread_condattr_t *)
{
  // Initialize the count to 0 and the critical section that guards it.
  cv->waiters_count_ = 0;
  InitializeCriticalSection (&cv->waiters_count_lock_);
  // Create an auto-reset and manual-reset event, as before...
  return 0;
}
The pthread_cond_wait function waits for a condition cv and atomically releases the associated external_mutex that it holds while checking the condition expression:
int
pthread_cond_wait (pthread_cond_t *cv,
                   pthread_mutex_t *external_mutex)
{
  // Avoid race conditions.
  EnterCriticalSection (&cv->waiters_count_lock_);
  cv->waiters_count_++;
  LeaveCriticalSection (&cv->waiters_count_lock_);

  // It's ok to release the <external_mutex> here since Win32
  // manual-reset events maintain state when used with
  // <SetEvent>. This avoids the "lost wakeup" bug...
  LeaveCriticalSection (external_mutex);

  // Wait for either event to become signaled due to <pthread_cond_signal>
  // being called or <pthread_cond_broadcast> being called.
  int result = WaitForMultipleObjects (2, cv->events_, FALSE, INFINITE);

  EnterCriticalSection (&cv->waiters_count_lock_);
  cv->waiters_count_--;
  int last_waiter =
    result == WAIT_OBJECT_0 + BROADCAST
    && cv->waiters_count_ == 0;
  LeaveCriticalSection (&cv->waiters_count_lock_);

  // Some thread called <pthread_cond_broadcast>.
  if (last_waiter)
    // We're the last waiter to be notified or to stop waiting, so
    // reset the manual event.
    ResetEvent (cv->events_[BROADCAST]);

  // Reacquire the <external_mutex>.
  EnterCriticalSection (external_mutex);
  return 0;
}
The release of the external_mutex and the subsequent wait for either of the events_ to become signaled is still non-atomic. This implementation avoids the lost wakeup bug, however, by relying on the ``stickiness'' of the manual-reset event, the use of SetEvent rather than PulseEvent, and the waiters_count_ count. Note how the last waiter thread in pthread_cond_wait resets the BROADCAST manual-reset event to non-signaled before exiting the function. This event is signaled in pthread_cond_broadcast, which wakes up all threads waiting on the condition variable cv, as follows:
int
pthread_cond_broadcast (pthread_cond_t *cv)
{
  // Avoid race conditions.
  EnterCriticalSection (&cv->waiters_count_lock_);
  int have_waiters = cv->waiters_count_ > 0;
  LeaveCriticalSection (&cv->waiters_count_lock_);

  if (have_waiters)
    SetEvent (cv->events_[BROADCAST]);
  return 0;
}
Calling SetEvent on a manual-reset event sets the cv->events_[BROADCAST] event to the signaled state. This releases all waiting threads, and the event remains signaled until it's manually reset in the pthread_cond_wait function above.
The pthread_cond_signal function wakes up a thread waiting on the condition variable cv:
int
pthread_cond_signal (pthread_cond_t *cv)
{
  // Avoid race conditions.
  EnterCriticalSection (&cv->waiters_count_lock_);
  int have_waiters = cv->waiters_count_ > 0;
  LeaveCriticalSection (&cv->waiters_count_lock_);

  if (have_waiters)
    SetEvent (cv->events_[SIGNAL]);
  return 0;
}
Calling SetEvent on an auto-reset event sets the cv->events_[SIGNAL] event to the signaled state. This releases a single thread and atomically resets the event to the non-signaled state. If there are no waiting threads, the SetEvent call is skipped.
Although the SetEvent solution fixes the lost wakeup bug that afflicts the PulseEvent implementation, it does have the following drawbacks:

Unfairness -- The semantics of the POSIX pthread_cond_broadcast function are to wake up all threads currently blocked in wait calls on the condition variable. The awakened threads then compete for the external_mutex. To ensure fairness, all of these threads should be released from their pthread_cond_wait calls and allowed to recheck their condition expressions before other threads can successfully complete a wait on the condition variable. Unfortunately, the SetEvent implementation above does not guarantee that all threads sleeping on the condition variable when pthread_cond_broadcast is called will acquire the external_mutex and check their condition expressions. Although the Pthreads specification does not mandate this degree of fairness, the lack of fairness can cause starvation.
To illustrate the unfairness problem, imagine there are two threads, C1 and C2, that are blocked in pthread_cond_wait on condition variable not_empty_, which is guarding a thread-safe message queue. Another thread, P1, then places two messages onto the queue and calls pthread_cond_broadcast. If C1 returns from pthread_cond_wait, dequeues and processes the message, and immediately waits again, then it and only it may end up acquiring both messages. Thus, C2 will never get a chance to dequeue a message and run.
The following illustrates the sequence of events:

1. Thread C1 attempts to dequeue and waits on condition variable not_empty_.
2. Thread C2 attempts to dequeue and waits on condition variable not_empty_.
3. Thread P1 enqueues two messages and calls pthread_cond_broadcast on not_empty_.
4. Thread C1 wakes up from not_empty_, dequeues a message, and runs.
5. Thread C1 waits again on not_empty_, immediately dequeues the second message, and runs.
6. Thread C2, still blocked on not_empty_, will never be signaled.
This sequence of events is legal, since the Pthreads specification does not require fairness for pthread_cond_broadcast. However, there are many cases where fair implementations of condition variables can simplify application code.
Incorrectness -- The SetEvent solution can also wake up threads that were not waiting on not_empty_ when a broadcast occurred. To illustrate this, we will use the same scenario as above: two threads, C1 and C2, are blocked dequeuing messages from the message queue. Another thread, P1, then places two messages onto the queue and calls pthread_cond_broadcast. C1 returns from pthread_cond_wait and dequeues and processes a message. Meanwhile, a third thread, C3, acquires the external_mutex, calls pthread_cond_wait, and waits on the events in WaitForMultipleObjects. Since C2 has not had a chance to run yet, the BROADCAST event is still signaled. C3 therefore returns from WaitForMultipleObjects and dequeues and processes the message in the queue. Thus, C2 will never get a chance to dequeue a message and run.
The following illustrates the sequence of events:

1. Thread C1 attempts to dequeue and waits on condition variable not_empty_.
2. Thread C2 attempts to dequeue and waits on condition variable not_empty_.
3. Thread P1 enqueues two messages and calls pthread_cond_broadcast on not_empty_.
4. Thread C1 wakes up from not_empty_, dequeues a message, and runs.
5. Thread C3 calls pthread_cond_wait on not_empty_, immediately dequeues the second message, and runs.
6. Thread C2, still blocked on not_empty_, will never be signaled.
Serialization overhead -- The SetEvent solution uses the waiters_count_lock_ critical section to protect the waiters_count_ from being corrupted by race conditions. This additional serialization overhead is not necessary as long as pthread_cond_signal and pthread_cond_broadcast are always called by a thread that has locked the same external_mutex used by pthread_cond_wait. For instance, application code that always uses the following idiom does not require this additional serialization:
pthread_mutex_t external_mutex;
pthread_cond_t cv;
void
release_resources (int resources_released)
{
// Acquire the lock.
pthread_mutex_lock (&external_mutex);
// Atomically modify shared state here...
// Could also use <pthread_cond_broadcast>.
pthread_cond_signal (&cv);
// Release the lock.
pthread_mutex_unlock (&external_mutex);
}
In contrast, application code that uses the following optimization does require the additional serialization:
void
release_resources (int resources_released)
{
// Acquire the lock.
pthread_mutex_lock (&external_mutex);
int can_signal = 0;
// Atomically modify shared state here...
// Keep track of whether we can signal or not
// as a result of the updated shared state.
can_signal = 1;
// Release the lock.
pthread_mutex_unlock (&external_mutex);
if (can_signal)
// Could also use <pthread_cond_broadcast>.
pthread_cond_signal (&cv);
}
The code shown above puts the pthread_cond_signal after the pthread_mutex_unlock to minimize the number of trips through the condition variable scheduler. The problem with this optimization, of course, is that race conditions can occur on the waiters_count_ if it is tested or modified without being protected by the external_mutex. Note that the PulseEvent solution did not have this race condition since it did not have any directly mutable internal state. In general, race conditions can occur with condition variable implementations that use internal state if threads don't acquire the external_mutex when calling pthread_cond_signal and pthread_cond_broadcast. The POSIX Pthreads specification allows these functions to be called by a thread whether or not that thread currently has the external_mutex locked. Therefore, although the use of the internal lock can increase overhead, it is necessary to provide a robust, standards-compliant condition variable programming model.
The drawbacks of the SetEvent solution can be alleviated by adding a ``generation count'' to the pthread_cond_t struct:
typedef struct
{
int waiters_count_;
// Count of the number of waiters.
CRITICAL_SECTION waiters_count_lock_;
// Serialize access to <waiters_count_>.
int release_count_;
// Number of threads to release via a <pthread_cond_broadcast> or a
// <pthread_cond_signal>.
int wait_generation_count_;
// Keeps track of the current "generation" so that we don't allow
// one thread to steal all the "releases" from the broadcast.
HANDLE event_;
// A manual-reset event that's used to block and release waiting
// threads.
} pthread_cond_t;
As shown in the pthread_cond_t struct above, this solution uses just one Win32 event: a manual-reset event that blocks threads in pthread_cond_wait and releases one or more threads in pthread_cond_signal and pthread_cond_broadcast, respectively. To enhance fairness, this scheme uses (1) a wait_generation_count_ data member to track real signals/broadcasts and (2) a release_count_ to control how many threads should be notified for each signal/broadcast.
The following function creates and initializes a condition variable using the generation count implementation:
int
pthread_cond_init (pthread_cond_t *cv,
                   const pthread_condattr_t *)
{
  cv->waiters_count_ = 0;
  cv->wait_generation_count_ = 0;
  cv->release_count_ = 0;
  InitializeCriticalSection (&cv->waiters_count_lock_);

  // Create a manual-reset event.
  cv->event_ = CreateEvent (NULL,  // no security
                            TRUE,  // manual-reset
                            FALSE, // non-signaled initially
                            NULL); // unnamed
  return 0;
}
The pthread_cond_wait implementation shown below waits for condition cv and atomically releases the associated external_mutex that must be held while checking the condition expression:
int
pthread_cond_wait (pthread_cond_t *cv,
                   pthread_mutex_t *external_mutex)
{
  // Avoid race conditions.
  EnterCriticalSection (&cv->waiters_count_lock_);

  // Increment count of waiters.
  cv->waiters_count_++;

  // Store current generation in our activation record.
  int my_generation = cv->wait_generation_count_;

  LeaveCriticalSection (&cv->waiters_count_lock_);
  LeaveCriticalSection (external_mutex);

  for (;;) {
    // Wait until the event is signaled.
    WaitForSingleObject (cv->event_, INFINITE);

    EnterCriticalSection (&cv->waiters_count_lock_);
    // Exit the loop when the <cv->event_> is signaled and
    // there are still waiting threads from this <wait_generation>
    // that haven't been released from this wait yet.
    int wait_done = cv->release_count_ > 0
                    && cv->wait_generation_count_ != my_generation;
    LeaveCriticalSection (&cv->waiters_count_lock_);

    if (wait_done)
      break;
  }

  EnterCriticalSection (external_mutex);
  EnterCriticalSection (&cv->waiters_count_lock_);
  cv->waiters_count_--;
  cv->release_count_--;
  int last_waiter = cv->release_count_ == 0;
  LeaveCriticalSection (&cv->waiters_count_lock_);

  if (last_waiter)
    // We're the last waiter to be notified, so reset the manual event.
    ResetEvent (cv->event_);

  return 0;
}
This function loops until the event_ HANDLE is signaled and there are still waiting threads from this ``generation'' that haven't been released from the wait. The wait_generation_count_ field is incremented every time the event_ is signaled via pthread_cond_broadcast or pthread_cond_signal. The loop tries to eliminate the fairness problems with the SetEvent solution by not responding to signal or broadcast notifications that occurred in a previous ``generation,'' i.e., before the current group of threads started waiting.
The following function notifies a single thread waiting on a condition variable:
int
pthread_cond_signal (pthread_cond_t *cv)
{
  EnterCriticalSection (&cv->waiters_count_lock_);
  if (cv->waiters_count_ > cv->release_count_) {
    SetEvent (cv->event_); // Signal the manual-reset event.
    cv->release_count_++;
    cv->wait_generation_count_++;
  }
  LeaveCriticalSection (&cv->waiters_count_lock_);
  return 0;
}
Note that we only signal the event if there are more waiters than threads in the generation that is in the midst of being released.
The following implementation of pthread_cond_broadcast notifies all threads waiting on a condition variable:
int
pthread_cond_broadcast (pthread_cond_t *cv)
{
  EnterCriticalSection (&cv->waiters_count_lock_);
  if (cv->waiters_count_ > 0) {
    SetEvent (cv->event_);
    // Release all the threads in this generation.
    cv->release_count_ = cv->waiters_count_;
    // Start a new generation.
    cv->wait_generation_count_++;
  }
  LeaveCriticalSection (&cv->waiters_count_lock_);
  return 0;
}
The unfairness of the SetEvent solution is nominally avoided by having each thread test whether there has actually been a new signal since it started waiting, or whether there are only left-over releases. This prevents one thread from taking more than one release without waiting in a new ``generation.'' However, this approach has the following subtle traps and pitfalls:

Busy waiting -- After pthread_cond_broadcast signals the manual-reset event_, it remains signaled. Therefore, the highest priority thread may cycle endlessly through the for loop in pthread_cond_wait and never sleep on the event.
Unfairness -- The for loop in pthread_cond_wait leaves the critical section before calling WaitForSingleObject. Thus, it's possible for another thread to acquire the external_mutex and call pthread_cond_signal or pthread_cond_broadcast again during this unprotected region. If this situation occurs, the wait_generation_count_ will increase, which may cause the waiting thread to break out of the loop prematurely. In this case, the waiting thread can steal a release that was intended for another thread. Therefore, the Generation Count solution isn't entirely fair after all!
In addition, compared with the SetEvent pthread_cond_t implementation, the internal serialization logic of the Generation Count solution is more complex.
A fourth solution implements fair condition variables using the Win32 SignalObjectAndWait function, in conjunction with the following three synchronization mechanisms:

A Win32 semaphore (sema_) -- which is used to queue up threads waiting for the condition to become signaled.

A Win32 auto-reset event (waiters_done_) -- which is used by the broadcast or signaling thread to wait for all the waiting thread(s) to wake up and be released from the semaphore.

A Win32 critical section (waiters_count_lock_) -- which serializes access to the count of waiting threads.
These three mechanisms are combined in the following pthread_cond_t struct to implement POSIX condition variables on Win32:
typedef struct
{
int waiters_count_;
// Number of waiting threads.
CRITICAL_SECTION waiters_count_lock_;
// Serialize access to <waiters_count_>.
HANDLE sema_;
// Semaphore used to queue up threads waiting for the condition to
// become signaled.
HANDLE waiters_done_;
// An auto-reset event used by the broadcast/signal thread to wait
// for all the waiting thread(s) to wake up and be released from the
// semaphore.
size_t was_broadcast_;
// Keeps track of whether we were broadcasting or signaling. This
// allows us to optimize the code if we're just signaling.
} pthread_cond_t;
For variety's sake, this solution optimizes the serialization required by assuming that the external_mutex is held when pthread_cond_signal or pthread_cond_broadcast is called. If this optimization is not possible, an extra mutex can be added to pthread_cond_t and held for the entire duration of the pthread_cond_signal, pthread_cond_broadcast, pthread_cond_wait, and pthread_cond_timedwait calls.
The remainder of this section illustrates how to implement condition variables on Windows NT 4.0 using the SignalObjectAndWait function, which atomically signals one HANDLE and waits for another HANDLE to become signaled. Since mutexes are accessed via HANDLEs, whereas CRITICAL_SECTIONs are not, we'll need to typedef a Win32 mutex for the POSIX pthread_mutex_t instead of a CRITICAL_SECTION:
typedef HANDLE pthread_mutex_t;
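With this typedef, the corresponding mutex operations map directly onto Win32 calls, e.g. (a sketch; error checking omitted as usual):

int
pthread_mutex_lock (pthread_mutex_t *m)
{
  return WaitForSingleObject (*m, INFINITE) == WAIT_OBJECT_0 ? 0 : -1;
}

int
pthread_mutex_unlock (pthread_mutex_t *m)
{
  return ReleaseMutex (*m) ? 0 : -1;
}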
The pthread_cond_init function is defined as follows:
int
pthread_cond_init (pthread_cond_t *cv,
                   const pthread_condattr_t *)
{
  cv->waiters_count_ = 0;
  cv->was_broadcast_ = 0;
  cv->sema_ = CreateSemaphore (NULL,       // no security
                               0,          // initially 0
                               0x7fffffff, // max count
                               NULL);      // unnamed
  InitializeCriticalSection (&cv->waiters_count_lock_);
  cv->waiters_done_ = CreateEvent (NULL,  // no security
                                   FALSE, // auto-reset
                                   FALSE, // non-signaled initially
                                   NULL); // unnamed
  return 0;
}
The pthread_cond_wait function uses two steps:

1. It waits for the sema_ semaphore to be signaled, which indicates that a pthread_cond_broadcast or pthread_cond_signal has occurred.

2. It puts the waiters_done_ auto-reset event into the signaled state when the last waiting thread is about to leave the pthread_cond_wait critical section.

The second step is used by the pthread_cond_broadcast function described below to ensure fairness. But first, here's the implementation of pthread_cond_wait:
int
pthread_cond_wait (pthread_cond_t *cv,
                   pthread_mutex_t *external_mutex)
{
  // Avoid race conditions.
  EnterCriticalSection (&cv->waiters_count_lock_);
  cv->waiters_count_++;
  LeaveCriticalSection (&cv->waiters_count_lock_);

  // This call atomically releases the mutex and waits on the
  // semaphore until <pthread_cond_signal> or <pthread_cond_broadcast>
  // are called by another thread.
  SignalObjectAndWait (*external_mutex, cv->sema_, INFINITE, FALSE);

  // Reacquire lock to avoid race conditions.
  EnterCriticalSection (&cv->waiters_count_lock_);

  // We're no longer waiting...
  cv->waiters_count_--;

  // Check to see if we're the last waiter after <pthread_cond_broadcast>.
  int last_waiter = cv->was_broadcast_ && cv->waiters_count_ == 0;

  LeaveCriticalSection (&cv->waiters_count_lock_);

  // If we're the last waiter thread during this particular broadcast
  // then let all the other threads proceed.
  if (last_waiter)
    // This call atomically signals the <waiters_done_> event and waits until
    // it can acquire the <external_mutex>. This is required to ensure fairness.
    SignalObjectAndWait (cv->waiters_done_, *external_mutex, INFINITE, FALSE);
  else
    // Always regain the external mutex since that's the guarantee we
    // give to our callers.
    WaitForSingleObject (*external_mutex, INFINITE);

  return 0;
}
This implementation of pthread_cond_wait ensures that the external_mutex is held until all threads waiting on cv have a chance to wait again on the external_mutex before returning to their callers. This solution relies on the fact that Windows NT mutex requests are queued in FIFO order, rather than in, e.g., priority order [MSMutex]. Because the external_mutex queue is serviced in FIFO order, all waiting threads will acquire the external mutex before any of them can reacquire it a second time. This property is essential to ensure fairness.
We use the SignalObjectAndWait function to ensure atomicity when signaling one synchronization object (external_mutex) and waiting for another synchronization object to become signaled (cv->sema_). This is particularly important for the last waiter thread, i.e., the one that signals the broadcasting thread when all waiters are done. If we didn't leverage the atomicity of SignalObjectAndWait, our solution would be potentially unfair, since the last waiter thread might not get the chance to wait on the external_mutex before one of the other waiters acquires it, performs some work, releases it, and then immediately reacquires external_mutex.
The pthread_cond_signal function is straightforward since it just releases one waiting thread. Therefore, it simply increments the sema_ semaphore by 1:
int
pthread_cond_signal (pthread_cond_t *cv)
{
  EnterCriticalSection (&cv->waiters_count_lock_);
  int have_waiters = cv->waiters_count_ > 0;
  LeaveCriticalSection (&cv->waiters_count_lock_);

  // If there aren't any waiters, then this is a no-op.
  if (have_waiters)
    ReleaseSemaphore (cv->sema_, 1, 0);
  return 0;
}
The pthread_cond_broadcast function is more complex and requires two steps:

1. It wakes up all the threads waiting on the sema_ semaphore, which can be done atomically by passing the waiters_count_ to ReleaseSemaphore.

2. It blocks on the waiters_done_ event until the last thread in the group of waiting threads exits the pthread_cond_wait critical section.
int
pthread_cond_broadcast (pthread_cond_t *cv)
{
  // This is needed to ensure that <waiters_count_> and <was_broadcast_> are
  // consistent relative to each other.
  EnterCriticalSection (&cv->waiters_count_lock_);
  int have_waiters = 0;

  if (cv->waiters_count_ > 0) {
    // We are broadcasting, even if there is just one waiter...
    // Record that we are broadcasting, which helps optimize
    // <pthread_cond_wait> for the non-broadcast case.
    cv->was_broadcast_ = 1;
    have_waiters = 1;
  }

  if (have_waiters) {
    // Wake up all the waiters atomically.
    ReleaseSemaphore (cv->sema_, cv->waiters_count_, 0);

    LeaveCriticalSection (&cv->waiters_count_lock_);

    // Wait for all the awakened threads to acquire the counting
    // semaphore.
    WaitForSingleObject (cv->waiters_done_, INFINITE);

    // This assignment is okay, even without the <waiters_count_lock_> held
    // because no other waiter threads can wake up to access it.
    cv->was_broadcast_ = 0;
  }
  else
    LeaveCriticalSection (&cv->waiters_count_lock_);

  return 0;
}
As mentioned above, this solution requires that pthread_cond_signal and pthread_cond_broadcast are only called by a thread that has locked the same external_mutex.
The implementation of pthread_cond_wait shown above relies on the SignalObjectAndWait function, which was first released with Windows NT 4.0. Although this solution provides fairness, it has more overhead than the earlier solutions. For instance, the external_mutex is a Win32 mutex rather than a critical section, which increases locking and unlocking overhead. Moreover, the pthread_cond_wait and pthread_cond_broadcast implementations require additional synchronization operations to ensure fairness when broadcasting.
Unfortunately, the SignalObjectAndWait function is not available in Windows CE, Windows '95, or Windows NT 3.51. Therefore, emulating condition variables can be trickier, less ``fair,'' and more computationally expensive on these Win32 platforms. The next article in this series will show how to develop a portable condition variable implementation for platforms that lack SignalObjectAndWait.
The SignalObjectAndWait solution in Section 3.4 is a good approach if fairness is paramount. However, this approach is not as efficient as the other solutions, nor is it as portable. Therefore, if efficiency or portability are more important than fairness, the SetEvent approach described in Section 3.2 may be more suitable. Naturally, the easiest solution would be for Microsoft to simply provide condition variables in the Win32 API.
The C++ source code for the POSIX condition variables on Win32 described in this article is freely available with the ACE framework at http://www.cs.wustl.edu/~schmidt/ACE.html.
[GoF] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995.
[Lea] Doug Lea, Concurrent Programming in Java: Design Principles and Patterns, Addison-Wesley, 1996.
[MSMutex] ``Mutex Wait is FIFO but Can Be Interrupted,'' in Win32 SDK Books Online, Microsoft Corporation, 1995.
[Pthreads] ``Threads Extension for Portable Operating Systems,'' IEEE P1003.4a, February, 1996.
[Richter] Jeffrey Richter, Advanced Windows, Microsoft Press, Redmond, WA, 1997.
[Schmidt] Douglas C. Schmidt and Steve Vinoski, ``Comparing Alternative Programming Techniques for Multi-threaded CORBA Servers: Thread Pool,'' C++ Report, Volume 8, No. 4, April, 1996.