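First, I want to correct a common belief: that condition variables lose signals. Many people reach this conclusion after using a condition variable in a scenario that does not meet its requirements.
The so-called lost signal actually comes from not satisfying the precondition for using a condition variable: wait first, then signal.
If you do not satisfy this precondition, then on Linux the signal must be lost. Note that I say must, not may. The reason lies in how condition variables are implemented underneath. To borrow an analogy from circuits, on Linux a condition variable's signal is edge-triggered, not level-triggered. When you call the signal function, the implementation checks whether any thread is currently waiting; if nobody is waiting, it returns immediately. That means a design that signals first and waits afterwards will "lose" the signal.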
I also recommend reading this blog post:
https://blog.csdn.net/absurd/article/details/1402433
Let's start with a small example:
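#include <stdio.h>    /* printf */
#include <unistd.h>   /* sleep */
#include <pthread.h>  /* mutex and condition variable */
pthread_mutex_t m_pthreadMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t m_pthreadCond = PTHREAD_COND_INITIALIZER;
int is_ok = 0;  /* declared but never used in this demo */
void *ThreadFunc(void *i)
{
    pthread_mutex_lock(&m_pthreadMutex);
    while (1)
    {
        /* Releases the mutex while blocked, re-acquires it before returning. */
        pthread_cond_wait(&m_pthreadCond, &m_pthreadMutex);
        printf("wait-----------\n");
    }
    /* Never reached: the loop above does not exit. */
    pthread_mutex_unlock(&m_pthreadMutex);
    return NULL;
}
int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, ThreadFunc, NULL);
    sleep(4);  /* give the waiter time to block in pthread_cond_wait */
    int cnt = 10;
    while (cnt--)
    {
        /* Ten signals sent back to back, with no shared state recording them. */
        pthread_mutex_lock(&m_pthreadMutex);
        pthread_cond_signal(&m_pthreadCond);
        printf("signal\n");
        pthread_mutex_unlock(&m_pthreadMutex);
    }
    pthread_join(tid, NULL);  /* blocks forever, since the waiter never returns */
    return 0;
}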
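Clearly, the result matches neither our requirement nor our understanding. The intent was to print one "wait" per signal sent, but in practice a burst of signals produces only a single "wait". The first signal wakes the waiter, but the waiter must re-acquire the mutex before it can return, and the main thread keeps grabbing the mutex again in its loop. The remaining signals are sent while no thread is blocked on the condition variable, so they have no effect.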
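Acquire the lock
Release the lock
[window in which another thread can grab the lock]
Sleep
Acquire the lock [may not succeed right away]
Release the lock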
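With a condition variable, the following can happen:
1. Several signals are sent before the other side has started to wait.
2. Several signals arrive together while the waiter has released the lock and is waiting.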
As the saying goes: in front of the source code, there are no secrets.
https://github.com/lattera/glibc/blob/895ef79e04a953cac1493863bcae29ad85657ee1/nptl/pthread_cond_wait.c
The logic is basically as follows:
static __always_inline int
__pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
const struct timespec *abstime)
{
const int maxspin = 0;
int err;
int result = 0;
LIBC_PROBE (cond_wait, 2, cond, mutex);
/* Acquire a position (SEQ) in the waiter sequence (WSEQ). We use an
atomic operation because signals and broadcasts may update the group
switch without acquiring the mutex. We do not need release MO here
because we do not need to establish any happens-before relation with
signalers (see __pthread_cond_signal); modification order alone
establishes a total order of waiters/signals. We do need acquire MO
to synchronize with group reinitialization in
__condvar_quiesce_and_switch_g1. */
uint64_t wseq = __condvar_fetch_add_wseq_acquire (cond, 2);
/* Find our group's index. We always go into what was G2 when we acquired
our position. */
unsigned int g = wseq & 1;
uint64_t seq = wseq >> 1;
/* Increase the waiter reference count. Relaxed MO is sufficient because
we only need to synchronize when decrementing the reference count. */
unsigned int flags = atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
int private = __condvar_get_private (flags);
/* Now that we are registered as a waiter, we can release the mutex.
Waiting on the condvar must be atomic with releasing the mutex, so if
the mutex is used to establish a happens-before relation with any
signaler, the waiter must be visible to the latter; thus, we release the
mutex after registering as waiter.
If releasing the mutex fails, we just cancel our registration as a
waiter and confirm that we have woken up. */
err = __pthread_mutex_unlock_usercnt (mutex, 0);
if (__glibc_unlikely (err != 0))
{
__condvar_cancel_waiting (cond, seq, g, private);
__condvar_confirm_wakeup (cond, private);
return err;
}
/* Now wait until a signal is available in our group or it is closed.
Acquire MO so that if we observe a value of zero written after group
switching in __condvar_quiesce_and_switch_g1, we synchronize with that
store and will see the prior update of __g1_start done while switching
groups too. */
unsigned int signals = atomic_load_acquire (cond->__data.__g_signals + g);
do
{
while (1)
{
/* Spin-wait first.
Note that spinning first without checking whether a timeout
passed might lead to what looks like a spurious wake-up even
though we should return ETIMEDOUT (e.g., if the caller provides
an absolute timeout that is clearly in the past). However,
(1) spurious wake-ups are allowed, (2) it seems unlikely that a
user will (ab)use pthread_cond_wait as a check for whether a
point in time is in the past, and (3) spinning first without
having to compare against the current time seems to be the right
choice from a performance perspective for most use cases. */
unsigned int spin = maxspin;
while (signals == 0 && spin > 0)
{
/* Check that we are not spinning on a group that's already
closed. */
if (seq < (__condvar_load_g1_start_relaxed (cond) >> 1))
goto done;
/* TODO Back off. */
/* Reload signals. See above for MO. */
signals = atomic_load_acquire (cond->__data.__g_signals + g);
spin--;
}
/* If our group will be closed as indicated by the flag on signals,
don't bother grabbing a signal. */
if (signals & 1)
goto done;
/* If there is an available signal, don't block. */
if (signals != 0)
break;
/* No signals available after spinning, so prepare to block.
We first acquire a group reference and use acquire MO for that so
that we synchronize with the dummy read-modify-write in
__condvar_quiesce_and_switch_g1 if we read from that. In turn,
in this case this will make us see the closed flag on __g_signals
that designates a concurrent attempt to reuse the group's slot.
We use acquire MO for the __g_signals check to make the
__g1_start check work (see spinning above).
Note that the group reference acquisition will not mask the
release MO when decrementing the reference count because we use
an atomic read-modify-write operation and thus extend the release
sequence. */
atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0)
|| (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)))
{
/* Our group is closed. Wake up any signalers that might be
waiting. */
__condvar_dec_grefs (cond, g, private);
goto done;
}
// Now block.
struct _pthread_cleanup_buffer buffer;
struct _condvar_cleanup_buffer cbuffer;
cbuffer.wseq = wseq;
cbuffer.cond = cond;
cbuffer.mutex = mutex;
cbuffer.private = private;
__pthread_cleanup_push (&buffer, __condvar_cleanup_waiting, &cbuffer);
if (abstime == NULL)
{
/* Block without a timeout. */
err = futex_wait_cancelable (
cond->__data.__g_signals + g, 0, private);
}
else
{
/* Block, but with a timeout.
Work around the fact that the kernel rejects negative timeout
values despite them being valid. */
if (__glibc_unlikely (abstime->tv_sec < 0))
err = ETIMEDOUT;
else if ((flags & __PTHREAD_COND_CLOCK_MONOTONIC_MASK) != 0)
{
/* CLOCK_MONOTONIC is requested. */
struct timespec rt;
if (__clock_gettime (CLOCK_MONOTONIC, &rt) != 0)
__libc_fatal ("clock_gettime does not support "
"CLOCK_MONOTONIC");
/* Convert the absolute timeout value to a relative
timeout. */
rt.tv_sec = abstime->tv_sec - rt.tv_sec;
rt.tv_nsec = abstime->tv_nsec - rt.tv_nsec;
if (rt.tv_nsec < 0)
{
rt.tv_nsec += 1000000000;
--rt.tv_sec;
}
/* Did we already time out? */
if (__glibc_unlikely (rt.tv_sec < 0))
err = ETIMEDOUT;
else
err = futex_reltimed_wait_cancelable
(cond->__data.__g_signals + g, 0, &rt, private);
}
else
{
/* Use CLOCK_REALTIME. */
err = futex_abstimed_wait_cancelable
(cond->__data.__g_signals + g, 0, abstime, private);
}
}
__pthread_cleanup_pop (&buffer, 0);
if (__glibc_unlikely (err == ETIMEDOUT))
{
__condvar_dec_grefs (cond, g, private);
/* If we timed out, we effectively cancel waiting. Note that
we have decremented __g_refs before cancellation, so that a
deadlock between waiting for quiescence of our group in
__condvar_quiesce_and_switch_g1 and us trying to acquire
the lock during cancellation is not possible. */
__condvar_cancel_waiting (cond, seq, g, private);
result = ETIMEDOUT;
goto done;
}
else
__condvar_dec_grefs (cond, g, private);
/* Reload signals. See above for MO. */
signals = atomic_load_acquire (cond->__data.__g_signals + g);
}
}
/* Try to grab a signal. Use acquire MO so that we see an up-to-date value
of __g1_start below (see spinning above for a similar case). In
particular, if we steal from a more recent group, we will also see a
more recent __g1_start below. */
while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
&signals, signals - 2));
/* We consumed a signal but we could have consumed from a more recent group
that aliased with ours due to being in the same group slot. If this
might be the case our group must be closed as visible through
__g1_start. */
uint64_t g1_start = __condvar_load_g1_start_relaxed (cond);
if (seq < (g1_start >> 1))
{
/* We potentially stole a signal from a more recent group but we do not
know which group we really consumed from.
We do not care about groups older than current G1 because they are
closed; we could have stolen from these, but then we just add a
spurious wake-up for the current groups.
We will never steal a signal from current G2 that was really intended
for G2 because G2 never receives signals (until it becomes G1). We
could have stolen a signal from G2 that was conservatively added by a
previous waiter that also thought it stole a signal -- but given that
that signal was added unnecessarily, it's not a problem if we steal
it.
Thus, the remaining case is that we could have stolen from the current
G1, where "current" means the __g1_start value we observed. However,
if the current G1 does not have the same slot index as we do, we did
not steal from it and do not need to undo that. This is the reason
for putting a bit with G2's index into__g1_start as well. */
if (((g1_start & 1) ^ 1) == g)
{
/* We have to conservatively undo our potential mistake of stealing
a signal. We can stop trying to do that when the current G1
changes because other spinning waiters will notice this too and
__condvar_quiesce_and_switch_g1 has checked that there are no
futex waiters anymore before switching G1.
Relaxed MO is fine for the __g1_start load because we need to
merely be able to observe this fact and not have to observe
something else as well.
??? Would it help to spin for a little while to see whether the
current G1 gets closed? This might be worthwhile if the group is
small or close to being closed. */
unsigned int s = atomic_load_relaxed (cond->__data.__g_signals + g);
while (__condvar_load_g1_start_relaxed (cond) == g1_start)
{
/* Try to add a signal. We don't need to acquire the lock
because at worst we can cause a spurious wake-up. If the
group is in the process of being closed (LSB is true), this
has an effect similar to us adding a signal. */
if (((s & 1) != 0)
|| atomic_compare_exchange_weak_relaxed
(cond->__data.__g_signals + g, &s, s + 2))
{
/* If we added a signal, we also need to add a wake-up on
the futex. We also need to do that if we skipped adding
a signal because the group is being closed because
while __condvar_quiesce_and_switch_g1 could have closed
the group, it might stil be waiting for futex waiters to
leave (and one of those waiters might be the one we stole
the signal from, which cause it to block using the
futex). */
futex_wake (cond->__data.__g_signals + g, 1, private);
break;
}
/* TODO Back off. */
}
}
}
done:
/* Confirm that we have been woken. We do that before acquiring the mutex
to allow for execution of pthread_cond_destroy while having acquired the
mutex. */
__condvar_confirm_wakeup (cond, private);
/* Woken up; now re-acquire the mutex. If this doesn't fail, return RESULT,
which is set to ETIMEDOUT if a timeout occured, or zero otherwise. */
err = __pthread_mutex_cond_lock (mutex);
/* XXX Abort on errors that are disallowed by POSIX? */
return (err != 0) ? err : result;
}
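The visible logic is this: the thread first registers itself as a waiter, then releases the mutex with __pthread_mutex_unlock_usercnt. After that it atomically checks the signal count of its group (the spin loop does not actually run in this version, because maxspin is 0), and if no signal is available it goes to sleep on the futex. Once woken, it consumes a signal and re-acquires the mutex with __pthread_mutex_cond_lock before returning.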