On pthread cancellation points

Preface

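Today I ran into a thread deadlock. Normally, printing the mutex's owner in gdb identifies the thread that holds the lock, but this time the owner was a thread that no longer existed, meaning it had already exited. Going through the threads that use this lock, I found one that the main thread cancels, so I suspected that the child thread was cancelled while holding the lock and never released it. My first question was: is pthread_mutex_lock itself a cancellation point?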


Analysis

1. The man page does not say that pthread_mutex_lock is a cancellation point:

Description

The mutex object referenced by mutex shall be locked by calling pthread_mutex_lock(). If the mutex is already locked, the calling thread shall block until the mutex becomes available. This operation shall return with the mutex object referenced by mutex in the locked state with the calling thread as its owner.

If the mutex type is PTHREAD_MUTEX_NORMAL, deadlock detection shall not be provided. Attempting to relock the mutex causes deadlock. If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, undefined behavior results.

If the mutex type is PTHREAD_MUTEX_ERRORCHECK, then error checking shall be provided. If a thread attempts to relock a mutex that it has already locked, an error shall be returned. If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, an error shall be returned.

If the mutex type is PTHREAD_MUTEX_RECURSIVE, then the mutex shall maintain the concept of a lock count. When a thread successfully acquires a mutex for the first time, the lock count shall be set to one. Every time a thread relocks this mutex, the lock count shall be incremented by one. Each time the thread unlocks the mutex, the lock count shall be decremented by one. When the lock count reaches zero, the mutex shall become available for other threads to acquire. If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, an error shall be returned.

If the mutex type is PTHREAD_MUTEX_DEFAULT, attempting to recursively lock the mutex results in undefined behavior. Attempting to unlock the mutex if it was not locked by the calling thread results in undefined behavior. Attempting to unlock the mutex if it is not locked results in undefined behavior.

The pthread_mutex_trylock() function shall be equivalent to pthread_mutex_lock(), except that if the mutex object referenced by mutex is currently locked (by any thread, including the current thread), the call shall return immediately. If the mutex type is PTHREAD_MUTEX_RECURSIVE and the mutex is currently owned by the calling thread, the mutex lock count shall be incremented by one and the pthread_mutex_trylock() function shall immediately return success.

The pthread_mutex_unlock() function shall release the mutex object referenced by mutex. The manner in which a mutex is released is dependent upon the mutex's type attribute. If there are threads blocked on the mutex object referenced by mutex when pthread_mutex_unlock() is called, resulting in the mutex becoming available, the scheduling policy shall determine which thread shall acquire the mutex.

(In the case of PTHREAD_MUTEX_RECURSIVE mutexes, the mutex shall become available when the count reaches zero and the calling thread no longer has any locks on this mutex.)

If a signal is delivered to a thread waiting for a mutex, upon return from the signal handler the thread shall resume waiting for the mutex as if it was not interrupted.

Return Value

If successful, the pthread_mutex_lock() and pthread_mutex_unlock() functions shall return zero; otherwise, an error number shall be returned to indicate the error.

The pthread_mutex_trylock() function shall return zero if a lock on the mutex object referenced by mutex is acquired. Otherwise, an error number is returned to indicate the error.

Errors

The pthread_mutex_lock() and pthread_mutex_trylock() functions shall fail if:

EINVAL
The mutex was created with the protocol attribute having the value PTHREAD_PRIO_PROTECT and the calling thread's priority is higher than the mutex's current priority ceiling.
The pthread_mutex_trylock() function shall fail if:

EBUSY
The mutex could not be acquired because it was already locked.
The pthread_mutex_lock(), pthread_mutex_trylock(), and pthread_mutex_unlock() functions may fail if:

EINVAL
The value specified by mutex does not refer to an initialized mutex object.
EAGAIN
The mutex could not be acquired because the maximum number of recursive locks for mutex has been exceeded.
The pthread_mutex_lock() function may fail if:

EDEADLK
The current thread already owns the mutex.
The pthread_mutex_unlock() function may fail if:

EPERM
The current thread does not own the mutex.
These functions shall not return an error code of [EINTR].

The following sections are informative.

Examples

None.

Application Usage

None.

Rationale

Mutex objects are intended to serve as a low-level primitive from which other thread synchronization functions can be built. As such, the implementation of mutexes should be as efficient as possible, and this has ramifications on the features available at the interface.

The mutex functions and the particular default settings of the mutex attributes have been motivated by the desire to not preclude fast, inlined implementations of mutex locking and unlocking.

For example, deadlocking on a double-lock is explicitly allowed behavior in order to avoid requiring more overhead in the basic mechanism than is absolutely necessary. (More "friendly" mutexes that detect deadlock or that allow multiple locking by the same thread are easily constructed by the user via the other mechanisms provided. For example, pthread_self() can be used to record mutex ownership.) Implementations might also choose to provide such extended features as options via special mutex attributes.

Since most attributes only need to be checked when a thread is going to be blocked, the use of attributes does not slow the (common) mutex-locking case.

Likewise, while being able to extract the thread ID of the owner of a mutex might be desirable, it would require storing the current thread ID when each mutex is locked, and this could incur unacceptable levels of overhead. Similar arguments apply to a mutex_tryunlock operation.

2. man pthreads

This page explicitly lists the functions that are, or may be, cancellation points.

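pthread_mutex_lock does not appear in that list, so it is not a cancellation point. The likely explanation, then, is that the thread reached some other cancellation point while holding the lock and had not registered any cleanup handlers, so the lock was never released when the thread was cancelled.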
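The failure mode can be reproduced with a small C program. This is a minimal sketch, assuming Linux with glibc; the names g_lock, worker and demo_cancel_without_cleanup are hypothetical and not taken from the original code. The worker locks g_lock (not a cancellation point) and then blocks in sleep() (a cancellation point). The main thread cancels it there, and because no cleanup handler is registered, g_lock stays locked after the worker is gone.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical names, for illustration only. */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;       /* the lock that leaks */
static pthread_mutex_t g_state_lock = PTHREAD_MUTEX_INITIALIZER; /* protects g_locked */
static pthread_cond_t g_state_cond = PTHREAD_COND_INITIALIZER;
static int g_locked = 0; /* set once the worker owns g_lock */

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&g_lock); /* not a cancellation point */

    /* Tell the main thread that g_lock is now held. */
    pthread_mutex_lock(&g_state_lock);
    g_locked = 1;
    pthread_cond_signal(&g_state_cond);
    pthread_mutex_unlock(&g_state_lock);

    sleep(60);                     /* cancellation point: the cancel acts here */
    pthread_mutex_unlock(&g_lock); /* never reached when cancelled */
    return NULL;
}

/* Returns the result of trying to take g_lock after the worker was cancelled. */
int demo_cancel_without_cleanup(void)
{
    pthread_t t;
    void *ret;
    int rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);

    /* Wait until the worker holds g_lock. */
    pthread_mutex_lock(&g_state_lock);
    while (!g_locked)
        pthread_cond_wait(&g_state_cond, &g_state_lock);
    pthread_mutex_unlock(&g_state_lock);

    rc = pthread_cancel(t);
    assert(rc == 0);
    rc = pthread_join(t, &ret);
    assert(rc == 0);
    assert(ret == PTHREAD_CANCELED);
    (void)ret;

    /* The worker is gone, but g_lock is still marked as owned by it. */
    return pthread_mutex_trylock(&g_lock);
}
```

On glibc, calling demo_cancel_without_cleanup() from main() returns EBUSY: the mutex is still owned by a thread that no longer exists, which is the same symptom as the deadlock described above.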

Note, however, that the read-write lock functions do appear in that list, as functions that may be cancellation points:

pthread_rwlock_rdlock()
pthread_rwlock_timedrdlock()
pthread_rwlock_timedwrlock()
pthread_rwlock_wrlock()


Afterword

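If the main thread sends a cancellation request to a child thread with pthread_cancel, the child thread should normally register cleanup handlers (pthread_cleanup_push, pthread_cleanup_pop) so that any locks it holds are released when it is cancelled.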
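A minimal sketch of the cleanup-handler approach, again assuming Linux with glibc and using hypothetical names (cl_lock, unlock_mutex, cl_worker, demo_cancel_with_cleanup). The handler is pushed right after the lock is taken, so a cancellation at sleep() runs it and releases the mutex; on the normal path, pthread_cleanup_pop(1) pops the handler and runs it, which performs the unlock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical names, for illustration only. */
static pthread_mutex_t cl_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cl_state_lock = PTHREAD_MUTEX_INITIALIZER; /* protects cl_locked */
static pthread_cond_t cl_state_cond = PTHREAD_COND_INITIALIZER;
static int cl_locked = 0; /* set once the worker owns cl_lock */

/* Cleanup handler: releases the mutex passed as arg. */
static void unlock_mutex(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *cl_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&cl_lock);
    pthread_cleanup_push(unlock_mutex, &cl_lock); /* registered while cl_lock is held */

    /* Tell the main thread that cl_lock is now held. */
    pthread_mutex_lock(&cl_state_lock);
    cl_locked = 1;
    pthread_cond_signal(&cl_state_cond);
    pthread_mutex_unlock(&cl_state_lock);

    sleep(60); /* cancellation point: on cancel, unlock_mutex runs */

    pthread_cleanup_pop(1); /* normal path: pop the handler and run it */
    return NULL;
}

/* Returns the result of trying to take cl_lock after the worker was cancelled. */
int demo_cancel_with_cleanup(void)
{
    pthread_t t;
    void *ret;
    int rc = pthread_create(&t, NULL, cl_worker, NULL);
    assert(rc == 0);

    /* Wait until the worker holds cl_lock. */
    pthread_mutex_lock(&cl_state_lock);
    while (!cl_locked)
        pthread_cond_wait(&cl_state_cond, &cl_state_lock);
    pthread_mutex_unlock(&cl_state_lock);

    rc = pthread_cancel(t);
    assert(rc == 0);
    rc = pthread_join(t, &ret);
    assert(rc == 0);
    assert(ret == PTHREAD_CANCELED);
    (void)ret;

    rc = pthread_mutex_trylock(&cl_lock); /* 0: the handler released it */
    if (rc == 0)
        pthread_mutex_unlock(&cl_lock);
    return rc;
}
```

pthread_cleanup_push and pthread_cleanup_pop may be implemented as macros that open and close a block, so they must appear as a pair in the same lexical scope, and the function must not return between them.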

For simplicity, though, I personally prefer to give the child thread a running flag and have the main thread stop the child by clearing that flag.
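A minimal sketch of the running-flag approach, using a C11 atomic_bool; the names rf_running, rf_lock, rf_worker and demo_stop_with_flag are hypothetical. The worker checks the flag once per iteration and only leaves its loop while it is not holding rf_lock, so there is nothing for a cleanup handler to release.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical names, for illustration only. */
static atomic_bool rf_running = true; /* cleared by the main thread */
static pthread_mutex_t rf_lock = PTHREAD_MUTEX_INITIALIZER;
static long rf_iterations = 0; /* protected by rf_lock */

static void *rf_worker(void *arg)
{
    (void)arg;
    while (atomic_load(&rf_running)) {
        pthread_mutex_lock(&rf_lock);
        rf_iterations++; /* stand-in for real work done under the lock */
        pthread_mutex_unlock(&rf_lock);

        struct timespec nap = {0, 1000000}; /* 1 ms */
        nanosleep(&nap, NULL);
    }
    return NULL; /* reached only while rf_lock is not held */
}

/* Stops the worker via the flag, then reports whether rf_lock is free. */
int demo_stop_with_flag(void)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, rf_worker, NULL);
    assert(rc == 0);

    struct timespec run_for = {0, 20000000}; /* let the worker run for about 20 ms */
    nanosleep(&run_for, NULL);

    atomic_store(&rf_running, false); /* ask the worker to stop */
    rc = pthread_join(t, NULL);
    assert(rc == 0);

    /* The worker left its loop on its own, so rf_lock is free. */
    rc = pthread_mutex_trylock(&rf_lock);
    if (rc == 0)
        pthread_mutex_unlock(&rf_lock);
    return rc;
}
```

The trade-off is latency: the worker only notices the flag between iterations, so a thread blocked in a long call, such as read() on an idle socket, will not stop until that call returns.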


Further Thoughts

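1) Why can a mutex still be held by a thread that has already exited?

2) Is there a mechanism that automatically unlocks the locks a thread holds when it exits?

3) How is thread cancellation implemented?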


References:

http://stackoverflow.com/questions/16224469/pthreads-cancel-blocking-thread

http://man7.org/linux/man-pages/man3/pthread_cancel.3.html

http://man7.org/linux/man-pages/man7/pthreads.7.html

http://man7.org/linux/man-pages/man3/pthread_cleanup_push.3.html
