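Thread concepts

Why use threads

Advantages of using threads

Threads reduce OS overhead: creating a thread and switching between threads is cheaper than doing the same with processes.

Threads in the same process are tightly coupled, so sharing resources is easy.

Threads improve performance by overlapping useful work with waits for processing or I/O.

Threads exploit multiprocessors by running concurrently on several cores.

In short, compared with processes, threads are cheaper to switch, share resources more easily, hide the latency of I/O and processing waits, and make full use of multi-core processors.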

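Per-thread global variables: TLS

Basic understanding of threads

Each thread has its own stack.

The argument (a pointer) passed to a thread at creation time is placed on the new thread's stack.

A thread can allocate TLS (thread local storage), which gives each thread a small array of pointers. A TLS slot can only be accessed by the thread that owns it, which ensures that threads do not modify one another's data.

In short, each thread has its own execution stack and receives its data through a void* pointer at creation time. Each thread also has thread local storage: a set of pointer slots, each pointing to a block of memory that is accessed only by code running in that thread.

In the Win32 thread API: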

/*Allocates a thread local storage (TLS) index. Any thread of the process can subsequently use this index to store and retrieve values that are local to the thread, because each thread receives its own slot for the index.
the return value is a TLS index.*/
DWORD TlsAlloc();
/*Releases a thread local storage (TLS) index, making it available for reuse.*/
BOOL TlsFree(
  DWORD dwTlsIndex
);
/*Retrieves the value in the calling thread's thread local storage (TLS) slot for the specified TLS index. Each thread of a process has its own slot for each TLS index.*/
LPVOID TlsGetValue(
  DWORD dwTlsIndex
);
/*Stores a value in the calling thread's thread local storage (TLS) slot for the specified TLS index. Each thread of a process has its own slot for each TLS index.*/
BOOL TlsSetValue(
  DWORD  dwTlsIndex,
  LPVOID lpTlsValue
);
/*Thread local storage can be used in a DLL to replace global storage; each calling thread then has its own copy of the "global" storage (for a thread-safe DLL)*/

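TLS matters a great deal for writing thread-safe DLLs: when TLS replaces global variables, every thread that calls into the DLL gets its own copy of what would otherwise be shared static memory.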
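The same allocate / set / get / free pattern exists in Pthread as thread-specific data. The sketch below is a POSIX analogue, not the Win32 API itself: pthread_key_create() stands in for TlsAlloc(), pthread_setspecific() and pthread_getspecific() for TlsSetValue() and TlsGetValue(), and pthread_key_delete() for TlsFree(). The names `slot_key`, `tls_worker`, and `tls_demo` are hypothetical and exist only for this illustration.

```c
#include <pthread.h>

static pthread_key_t slot_key;   /* plays the role of a TLS index */

/* Each worker stores its own pointer in the slot and reads it back. */
static void *tls_worker(void *arg)
{
    pthread_setspecific(slot_key, arg);      /* analogue of TlsSetValue() */
    return pthread_getspecific(slot_key);    /* analogue of TlsGetValue() */
}

/* Returns 1 if every thread saw only its own value, 0 if not, -1 on error. */
int tls_demo(void)
{
    pthread_t t1, t2;
    void *r1 = NULL, *r2 = NULL;
    int v1 = 1, v2 = 2;

    if (pthread_key_create(&slot_key, NULL) != 0)   /* analogue of TlsAlloc() */
        return -1;
    if (pthread_create(&t1, NULL, tls_worker, &v1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, tls_worker, &v2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);

    /* The calling thread never stored anything, so its slot is still NULL. */
    void *mine = pthread_getspecific(slot_key);

    pthread_key_delete(slot_key);                   /* analogue of TlsFree() */
    return (r1 == &v1 && r2 == &v2 && mine == NULL) ? 1 : 0;
}
```

One key is shared by all threads, yet each thread reads back only what it stored itself, which is the property a thread-safe library relies on.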

Thread safety in the C standard library

The C library was originally written to operate in single-threaded processes. Some functions use static storage to hold intermediate results, so they are not thread-safe when two separate threads access and modify that static storage simultaneously.

In other words, such functions are not reentrant, and calling them from several threads at the same time can cause data races.

/*Split string into tokens
A sequence of calls to this function splits str into tokens, which are sequences of contiguous characters separated by any of the characters that are part of delimiters*/
char * strtok ( char * str, const char * delimiters );
/*strtok(), which scans a string to find the next occurrence of a token, is an example: it maintains persistent state between successive function calls*/

For example, strtok() keeps a static variable inside the function between successive calls, so it is not thread-safe. Many current C library implementations are thread-safe or offer reentrant alternatives.
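As an illustration of a reentrant alternative, POSIX provides strtok_r(), which keeps the parsing position in a pointer owned by the caller instead of in hidden static storage. The sketch below assumes a POSIX system; `count_tokens` is a hypothetical helper name, and the 128-byte buffer is an arbitrary limit chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Count the tokens in `text`. All parsing state lives in `save`,
 * a local variable, so concurrent callers do not interfere. */
int count_tokens(const char *text, const char *delims)
{
    char buf[128];
    char *save = NULL;
    int count = 0;

    /* strtok_r() modifies its input, so work on a private copy. */
    strncpy(buf, text, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (char *tok = strtok_r(buf, delims, &save);
         tok != NULL;
         tok = strtok_r(NULL, delims, &save)) {
        count++;
    }
    return count;
}
```

Because nothing is shared between calls, two threads can tokenize different strings at the same time without a lock.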

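Thread states

Running: executing on a processor.

Wait: blocked or sleeping, waiting on a nonsignaled handle (thread, process, or synchronization object).

Ready: the scheduler can put the thread in the Running state at any time.

Suspend: a thread that would otherwise be ready but will not be run.

Terminated: the thread has finished.

A thread passes through five states during its lifetime. A thread in the Running state is executing on some processor. A thread in the Wait state is blocked, waiting for a signal. A thread in the Ready state sits in the queue of runnable threads and may be scheduled at any time. A thread in the Suspend state is also queued but will not be run. A thread in the Terminated state has ended; it remains only to provide information to other threads.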

Processor affinity

/*Sets a processor affinity mask for the specified thread.*/
DWORD_PTR SetThreadAffinityMask(
  HANDLE    hThread,
  DWORD_PTR dwThreadAffinityMask
);
/*Retrieves the process affinity mask for the specified process and the system affinity mask for the system.*/
BOOL GetProcessAffinityMask(
  HANDLE     hProcess,
  PDWORD_PTR lpProcessAffinityMask,
  PDWORD_PTR lpSystemAffinityMask
);
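/*limits the processors on which a specific thread may run; it does not by itself prevent other threads from using those processors*/

Setting processor affinity restricts a thread to being scheduled on specific processor cores. It does not by itself keep other threads off those cores; to reserve a core, the affinity masks of the other threads must be restricted as well.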

Threading models

Advantages of applying a threading model when designing a multithreaded program

Models are well understood and well tested, which avoids many mistakes.

Models help developers obtain the best performance.

Models correspond naturally to the structure of programming problems.

Troubleshooting is easier when a program is analyzed in terms of a model: underlying problems (race conditions, deadlocks) show up as violations of the model's basic principles.

Multithreading adds complexity to programming; adopting a model reduces that complexity and helps clarify the logic.

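Introduction to the Pthread library

Pthread (POSIX threads) is an implementation of a thread concurrency model. It provides:

Thread management functions to create, destroy, join, and detach threads.

Synchronization functions to manage the synchronization of threads, including mutexes, condition variables, and barriers.

That is, it offers two groups of functions: thread management and thread synchronization. This article uses the Pthread functions to summarize the concepts of threads and concurrent programming.

Pthread is provided by glibc, but traditionally in a separate library that requires explicit linkage.
The -pthread flag also sets certain preprocessor definitions.

On Linux, although Pthread ships with the C library, you traditionally have to link against the Pthread library, libpthread, when using it.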

gcc -Wall -Werror -pthread beard.c -o beard

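Pthread thread management

Creating and terminating threads

When creating a thread, you specify the start routine and its argument. The pthread_attr_t argument may be NULL to request the default attributes; otherwise it can specify the thread's stack size, scheduling parameters, and initial detach state. A thread obtains its own ID with pthread_self(). Because the pthread_t type is implemented differently on different platforms, the Pthread library provides pthread_equal() to test whether two thread IDs are equal.

#include <pthread.h>

//the pthread_attr_t object pointed to by attr defines thread attributes such as
//  stack size, scheduling parameters, and initial detach state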
int pthread_create(pthread_t *thread,
                   const pthread_attr_t *attr,
                   void *(*start_routine)(void *),
                   void *arg);
//obtain thread id at runtime
pthread_t pthread_self(void);
//equality operator for thread id
int pthread_equal(pthread_t t1, pthread_t t2);
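The three calls listed above can be exercised together in a minimal sketch. The structure `id_check` and the functions `compare_ids` and `create_demo` are hypothetical names used only here. The creator passes its own ID to the new thread, which compares IDs with pthread_equal() rather than with `==`, since pthread_t may be a structure on some platforms.

```c
#include <pthread.h>

struct id_check {
    pthread_t creator;      /* ID of the thread that called pthread_create() */
    int same_as_creator;    /* filled in by the new thread */
    int same_as_self;       /* filled in by the new thread */
};

/* Start routine: compares its own ID against the creator's and against itself. */
static void *compare_ids(void *arg)
{
    struct id_check *c = arg;
    pthread_t me = pthread_self();

    c->same_as_creator = pthread_equal(me, c->creator) != 0;
    c->same_as_self    = pthread_equal(me, pthread_self()) != 0;
    return NULL;
}

/* Returns 1 if the new thread has a distinct ID, 0 if not, -1 on error. */
int create_demo(void)
{
    struct id_check c = { pthread_self(), -1, -1 };
    pthread_t tid;

    /* NULL attributes request the defaults (joinable, default stack size). */
    if (pthread_create(&tid, NULL, compare_ids, &c) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return (c.same_as_creator == 0 && c.same_as_self == 1) ? 1 : 0;
}
```

The comparison is done inside the new thread while both threads are still alive, so both IDs are valid at the moment they are compared.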

Like a process, a thread can terminate in three ways:

thread termination
- return from the start routine (the analogue of "falling off the end of main()")
- call pthread_exit()
- be cancelled by another thread via pthread_cancel()

process termination
- return from its main()
- call exit()
- execute a new binary image via execve()

That is, a thread ends by returning from its start routine, by calling pthread_exit(), or when another thread calls pthread_cancel() on it.

//terminate the calling thread, even from deep in a function call stack;
// retval is made available to any thread that joins the terminating thread
void pthread_exit(void *retval);
//request cancellation of another thread; returns 0 on success, but since the
//  actual termination occurs asynchronously, a successful return only
//  indicates that the cancellation request was issued
int pthread_cancel(pthread_t thread);
//a thread's cancelability state can be enabled or disabled
int pthread_setcancelstate(int state, int *oldstate);
//a thread's cancellation type is either asynchronous or deferred; asynchronous
//  cancellation may kill the thread at any point after the request is made, while
//  deferred cancellation only kills it at specific cancellation points (safe),
//  which avoids leaving shared data in an undefined state (e.g. cancelling a
//  thread in the middle of a critical region)
int pthread_setcanceltype(int type, int *oldtype);

A thread can use pthread_setcancelstate() to allow or forbid other threads from cancelling it. A thread calls pthread_cancel() with a thread ID to cancel that thread. Cancellation is asynchronous, so a return value of 0 only means that the request was issued successfully, not that termination has completed. There are two cancellation types, asynchronous and deferred: with the former, termination can happen at any moment; with the latter, it is postponed until a safe execution context (a cancellation point). Consider what would happen if a thread were terminated inside a critical region while holding a lock.
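A minimal sketch of deferred cancellation follows. The names `run_until_cancelled` and `cancel_demo` are hypothetical. The worker can only be cancelled where it calls pthread_testcancel(), and the canceller learns that termination has really happened only when pthread_join() returns PTHREAD_CANCELED.

```c
#include <pthread.h>

/* Worker: loops forever and can be cancelled only at the explicit
 * cancellation point inside the loop. */
static void *run_until_cancelled(void *arg)
{
    int old_type;
    (void)arg;

    /* Deferred is already the default; it is set here for illustration. */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_type);

    for (;;) {
        /* ... work that must not be interrupted would go here ... */
        pthread_testcancel();   /* act on a pending cancellation request */
    }
}

/* Returns 1 if the worker was cancelled, 0 if not, -1 on error. */
int cancel_demo(void)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, run_until_cancelled, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)        /* only issues the request */
        return -1;
    if (pthread_join(tid, &result) != 0) /* waits for the actual termination */
        return -1;
    return result == PTHREAD_CANCELED;
}
```

If the worker held a mutex, it would release it before reaching pthread_testcancel(), so cancellation could never strike in the middle of the critical region.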

Joining and detaching threads

//allows one thread to block while waiting for the termination of another,
//  joining allows thread to synchronize their execution against the lifetime
//  of other threads,
//return EDEADLK when deadlock was detected
int pthread_join(pthread_t thread, void **retval);
//make the thread no longer joinable
int pthread_detach (pthread_t thread);

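Calling pthread_join() on a target thread ID blocks the calling thread until the target thread finishes executing. A thread is joinable by default; calling pthread_detach() lets it run detached, after which it can no longer be joined.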
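A short sketch of how a value travels from pthread_exit() to pthread_join(): the worker squares a small integer and hands the result back through the retval pointer. The names `square` and `join_demo` are hypothetical, and packing the integer into the pointer through intptr_t is a convenience chosen for this example, not something the API requires.

```c
#include <pthread.h>
#include <stdint.h>

/* Worker: exits with n * n packed into the exit-value pointer. */
static void *square(void *arg)
{
    intptr_t n = (intptr_t)arg;
    pthread_exit((void *)(n * n));   /* same effect as returning the value */
}

/* Returns n * n as computed by a joined worker thread, or -1 on error. */
long join_demo(long n)
{
    pthread_t tid;
    void *ret = NULL;

    if (pthread_create(&tid, NULL, square, (void *)(intptr_t)n) != 0)
        return -1;
    /* Blocks until the worker terminates, then collects its exit value. */
    if (pthread_join(tid, &ret) != 0)
        return -1;
    return (long)(intptr_t)ret;
}
```

Had the worker been detached with pthread_detach(), there would be no way to collect this value, because a detached thread cannot be joined.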

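Pthread thread synchronization

Different threads in the same process can access the shared process address space, so communicating through shared memory is a common approach. However, concurrent access to shared memory by multiple threads leads to data races. Thread synchronization primitives serialize such concurrent accesses. What follows is a short summary of the Pthread synchronization primitives and related issues.

Mutex

A mutex (mutual exclusion lock) protects shared data so that only one thread accesses it at a time. While one thread holds the lock, other threads that try to acquire it block and sleep until the mutex is released.

The mutex API is listed below. A mutex must be initialized before use and destroyed afterwards, because the implementation may manage dynamic memory internally. A thread calls pthread_mutex_lock() to acquire the lock and pthread_mutex_unlock() to release it. A thread that calls pthread_mutex_trylock() avoids blocking and can handle the case where the lock is unavailable. pthread_mutex_timedlock() bounds how long a thread waits to acquire the lock: it fails with ETIMEDOUT once the given absolute time has passed.

#include <pthread.h>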
/* define and initialize a mutex named `mutex' */
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int pthread_mutex_init(pthread_mutex_t *restrict mutex,
    const pthread_mutexattr_t *restrict attr);
int pthread_mutex_destroy(pthread_mutex_t *mutex);
//Both return: 0 if OK, error number on failure
int pthread_mutex_lock(pthread_mutex_t *mutex);
//try to lock without blocking (lock the mutex conditionally), return EBUSY
//  if the mutex is already locked
int pthread_mutex_trylock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);
int pthread_mutex_timedlock(pthread_mutex_t *restrict mutex, const struct timespec *restrict tsptr);
//All return: 0 if OK, error number on failure

A simple example: [https://github.com/KevinACoder/apue.3e/blob/master/threads/mutex3.c]
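Independently of the linked file, here is a minimal self-contained sketch of the lock / unlock pattern: several threads increment one shared counter, and the mutex makes each increment atomic with respect to the others. The names `counter_lock`, `add_many`, and `mutex_demo`, and the thread and iteration counts, are hypothetical choices for this illustration.

```c
#include <pthread.h>

#define NUM_THREADS 4
#define INCREMENTS  100000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Each worker adds INCREMENTS to the shared counter, one step at a time. */
static void *add_many(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                       /* the protected critical region */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Returns the final counter value, or -1 if a thread could not be created. */
long mutex_demo(void)
{
    pthread_t tids[NUM_THREADS];
    int started = 0;

    counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&tids[i], NULL, add_many, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == NUM_THREADS ? counter : -1;
}
```

Without the lock, the read-modify-write in `counter++` could interleave between threads and some increments would be lost.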

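Deadlock

Misusing locks easily leads to deadlock. For example: a thread locks the same mutex twice; thread A holds mutex1 and thread B holds mutex2, and at the same time A tries to lock mutex2 while B tries to lock mutex1; or ten threads each send one signal on completion, but the work waits for eleven signals. In each of these cases the threads can never make progress, which is a deadlock.

There are many ways to avoid deadlock when using locks. For example, when several mutexes are involved, acquire them in the same order in every thread; or use pthread_mutex_trylock(), and if the lock cannot be acquired, release some of the mutexes the thread already holds and try again after a while.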
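The consistent-ordering rule can be sketched as follows. Two threads transfer money in opposite directions between two accounts, which is the classic A/B deadlock setup, but `lock_pair` always locks the account with the lower address first. Ordering by address is one possible convention assumed for this example; `account`, `lock_pair`, `transfer`, and `ordering_demo` are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>

#define TRANSFERS 50000

struct account {
    pthread_mutex_t lock;
    long balance;
};

static struct account acct_a = { PTHREAD_MUTEX_INITIALIZER, 0 };
static struct account acct_b = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Lock both accounts in a global order (lower address first), whatever
 * order the caller names them in. Assumes x and y are different accounts. */
static void lock_pair(struct account *x, struct account *y)
{
    if ((uintptr_t)x > (uintptr_t)y) {
        struct account *tmp = x;
        x = y;
        y = tmp;
    }
    pthread_mutex_lock(&x->lock);
    pthread_mutex_lock(&y->lock);
}

static void transfer(struct account *from, struct account *to, long amount)
{
    lock_pair(from, to);
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(&from->lock);
    pthread_mutex_unlock(&to->lock);
}

static void *a_to_b(void *arg)
{
    (void)arg;
    for (int i = 0; i < TRANSFERS; i++)
        transfer(&acct_a, &acct_b, 1);
    return NULL;
}

static void *b_to_a(void *arg)
{
    (void)arg;
    for (int i = 0; i < TRANSFERS; i++)
        transfer(&acct_b, &acct_a, 2);
    return NULL;
}

/* Returns the final balance of account A, or -1 on error. */
long ordering_demo(void)
{
    pthread_t t1, t2;

    acct_a.balance = 1000;
    acct_b.balance = 1000;
    if (pthread_create(&t1, NULL, a_to_b, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, b_to_a, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return acct_a.balance;
}
```

If each thread instead locked `from` first and `to` second, the two threads would take the mutexes in opposite orders and could block each other forever.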

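Reader-Writer Lock

A reader-writer lock works much like a mutex. The difference is that while the write lock can be held by only one thread, the read lock can be held by several threads at once. Reader-writer locks suit data structures that are read more often than they are written.

#include <pthread.h>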
int pthread_rwlock_init(pthread_rwlock_t *restrict rwlock,
    const pthread_rwlockattr_t *restrict attr);
int pthread_rwlock_destroy(pthread_rwlock_t *rwlock);
//Both return: 0 if OK, error number on failure
int pthread_rwlock_rdlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_wrlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_unlock(pthread_rwlock_t *rwlock);
//All return: 0 if OK, error number on failure
int pthread_rwlock_tryrdlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_trywrlock(pthread_rwlock_t *rwlock);
//Both return: 0 if OK, error number on failure
int pthread_rwlock_timedrdlock(pthread_rwlock_t *restrict rwlock,
    const struct timespec *restrict tsptr);
int pthread_rwlock_timedwrlock(pthread_rwlock_t *restrict rwlock,
    const struct timespec *restrict tsptr);
//Both return: 0 if OK, error number on failure

Example: [https://github.com/KevinACoder/apue.3e/blob/master/threads/rwlock.c]
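Separately from the linked file, the following sketch shows the read / write split on a small table. One writer updates every slot under the write lock; two readers scan the table under the read lock and count how often they see a half-updated table. The names `shared_table`, `table_reader`, `table_writer`, and `rwlock_demo`, and the sizes, are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>

#define SLOTS  8
#define ROUNDS 20000

static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
static int shared_table[SLOTS];

/* Reader: under the read lock, every slot should hold the same value. */
static void *table_reader(void *arg)
{
    intptr_t torn = 0;
    (void)arg;
    for (int r = 0; r < ROUNDS; r++) {
        pthread_rwlock_rdlock(&table_lock);   /* shared with other readers */
        for (int i = 1; i < SLOTS; i++)
            if (shared_table[i] != shared_table[0])
                torn++;
        pthread_rwlock_unlock(&table_lock);
    }
    return (void *)torn;
}

/* Writer: rewrites the whole table while holding the lock exclusively. */
static void *table_writer(void *arg)
{
    (void)arg;
    for (int r = 1; r <= ROUNDS; r++) {
        pthread_rwlock_wrlock(&table_lock);
        for (int i = 0; i < SLOTS; i++)
            shared_table[i] = r;
        pthread_rwlock_unlock(&table_lock);
    }
    return NULL;
}

/* Returns the number of inconsistent reads observed, or -1 on error. */
long rwlock_demo(void)
{
    pthread_t w, r1, r2;
    void *torn1 = NULL, *torn2 = NULL;

    if (pthread_create(&w, NULL, table_writer, NULL) != 0)
        return -1;
    if (pthread_create(&r1, NULL, table_reader, NULL) != 0) {
        pthread_join(w, NULL);
        return -1;
    }
    if (pthread_create(&r2, NULL, table_reader, NULL) != 0) {
        pthread_join(w, NULL);
        pthread_join(r1, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r1, &torn1);
    pthread_join(r2, &torn2);
    return (long)((intptr_t)torn1 + (intptr_t)torn2);
}
```

Both readers may hold the lock at the same time, but neither can overlap with the writer, so a reader never observes a partially written table.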

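Spin locks

A spin lock is also a mutual exclusion lock, but it behaves differently: a thread blocked on a mutex goes to sleep, whereas a thread blocked on a spin lock busy-waits. Not sleeping avoids the cost of descheduling and rescheduling the thread, but the waiting thread keeps consuming CPU. A spin lock must therefore be released as quickly as possible.

Spin locks are useful in nonpreemptive kernel code, where the locked section cannot be preempted by the scheduler and interrupts are blocked while the lock is held, so an interrupt handler cannot deadlock by trying to take a lock that is already held. At user level they are mainly useful under a real-time scheduling class that does not allow preemption. Under ordinary time-sharing scheduling, a thread can still be descheduled while holding a spin lock when its time slice expires or a higher-priority thread becomes runnable, and the threads waiting for the lock then spin longer than intended.

#include <pthread.h>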
int pthread_spin_init(pthread_spinlock_t *lock, int pshared);
int pthread_spin_destroy(pthread_spinlock_t *lock);
int pthread_spin_lock(pthread_spinlock_t *lock);
int pthread_spin_trylock(pthread_spinlock_t *lock);
int pthread_spin_unlock(pthread_spinlock_t *lock);
//All return: 0 if OK, error number on failure
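A minimal sketch of the spin lock calls, assuming a platform that implements the optional POSIX spin lock interface (glibc on Linux does). The critical region is a single increment, which is the kind of very short section spin locks are meant for. The names `spin`, `spin_counter`, `spin_worker`, and `spin_demo` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define SPIN_INCREMENTS 50000

static pthread_spinlock_t spin;
static long spin_counter;

/* Worker: waiters busy-wait instead of sleeping, so the locked
 * section is kept as short as possible. */
static void *spin_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < SPIN_INCREMENTS; i++) {
        pthread_spin_lock(&spin);
        spin_counter++;
        pthread_spin_unlock(&spin);
    }
    return NULL;
}

/* Returns the final counter value, or -1 on error. */
long spin_demo(void)
{
    pthread_t t1, t2;

    /* PTHREAD_PROCESS_PRIVATE: the lock is used only by threads of this process. */
    if (pthread_spin_init(&spin, PTHREAD_PROCESS_PRIVATE) != 0)
        return -1;
    spin_counter = 0;

    if (pthread_create(&t1, NULL, spin_worker, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, spin_worker, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_spin_destroy(&spin);
    return spin_counter;
}
```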

Condition variable

A condition variable acts like a starting pistol: one thread sends a signal, and the threads waiting for that signal are unblocked.

#include <pthread.h>
//init or free condition variable
int pthread_cond_init(pthread_cond_t *restrict cond,
    const pthread_condattr_t *restrict attr);
int pthread_cond_destroy(pthread_cond_t *cond);
//wait for a condition to be true.
int pthread_cond_wait(pthread_cond_t *restrict cond,
    pthread_mutex_t *restrict mutex);
int pthread_cond_timedwait(pthread_cond_t *restrict cond,
    pthread_mutex_t *restrict mutex,
    const struct timespec *restrict tsptr);
//wake one waiting thread (signal) or all waiting threads (broadcast)
int pthread_cond_signal(pthread_cond_t *cond);
int pthread_cond_broadcast(pthread_cond_t *cond);
//All return: 0 if OK, error number on failure

Example: [https://github.com/KevinACoder/apue.3e/blob/master/threads/condvar.c]
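Apart from the linked file, the usual wait / signal pattern can be sketched in a few lines. The waiter checks a predicate (`ready`) in a loop while holding the mutex, and the signaller changes the predicate under the same mutex before signalling. The names `ready`, `payload`, `wait_for_payload`, and `condvar_demo` are hypothetical.

```c
#include <pthread.h>

static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready_cond = PTHREAD_COND_INITIALIZER;
static int ready;      /* the predicate protected by ready_lock */
static int payload;    /* the data published together with the predicate */

/* Waiter: sleeps until `ready` becomes true, then copies the payload out. */
static void *wait_for_payload(void *arg)
{
    int *out = arg;

    pthread_mutex_lock(&ready_lock);
    while (!ready)   /* re-check after every wakeup: wakeups may be spurious */
        pthread_cond_wait(&ready_cond, &ready_lock);  /* unlocks while waiting */
    *out = payload;
    pthread_mutex_unlock(&ready_lock);
    return NULL;
}

/* Returns the value the waiter received, or -1 on error. */
int condvar_demo(int value)
{
    pthread_t tid;
    int received = -1;

    ready = 0;
    if (pthread_create(&tid, NULL, wait_for_payload, &received) != 0)
        return -1;

    pthread_mutex_lock(&ready_lock);
    payload = value;
    ready = 1;                         /* change the predicate under the lock */
    pthread_mutex_unlock(&ready_lock);
    pthread_cond_signal(&ready_cond);  /* fire the starting pistol */

    pthread_join(tid, NULL);
    return received;
}
```

The loop around pthread_cond_wait() also covers the case where the signal is sent before the waiter starts waiting: the predicate is already true, so the waiter never blocks.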

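Barriers

A barrier is a synchronization mechanism that coordinates multiple threads running in parallel. pthread_join() is a kind of barrier: it makes one thread wait for another thread to finish.

#include <pthread.h>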
int pthread_barrier_init(pthread_barrier_t *restrict barrier,
    const pthread_barrierattr_t *restrict attr,
    unsigned int count);
/*
 count: argument to specify the number of threads that must 
    reach the barrier
*/
int pthread_barrier_destroy(pthread_barrier_t *barrier);
//Both return: 0 if OK, error number on failure
int pthread_barrier_wait(pthread_barrier_t *barrier);
//wait for all the other threads to catch up;
//  the calling thread is put to sleep if the count is not yet satisfied
//once the barrier has been reached it can be used again, but the count is immutable
//Returns: 0 or PTHREAD_BARRIER_SERIAL_THREAD if OK, error number on failure

Example: [https://github.com/KevinACoder/apue.3e/blob/master/threads/barrier.c]
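Apart from the linked file, a two-phase computation makes a compact barrier sketch. Each worker writes its own slot in phase one, waits at the barrier, and then reads every slot in phase two. The names `phase_barrier`, `phase_worker`, and `barrier_demo`, and the worker count, are hypothetical. The sketch assumes the platform implements the optional POSIX barrier interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define WORKERS 4

struct worker_slot {
    int id;
    int sum_seen;     /* sum of all phase-one results, read in phase two */
    int was_serial;   /* 1 for the one thread that got the special return */
};

static pthread_barrier_t phase_barrier;
static int phase_one[WORKERS];
static struct worker_slot slots[WORKERS];

static void *phase_worker(void *arg)
{
    struct worker_slot *s = arg;

    phase_one[s->id] = s->id + 1;                 /* phase one: own slot only */

    int rc = pthread_barrier_wait(&phase_barrier); /* wait for all workers */
    s->was_serial = (rc == PTHREAD_BARRIER_SERIAL_THREAD);

    s->sum_seen = 0;                               /* phase two: read all slots */
    for (int i = 0; i < WORKERS; i++)
        s->sum_seen += phase_one[i];
    return NULL;
}

/* Returns 1 if every worker saw the complete phase-one data and exactly
 * one worker was designated the serial thread, 0 if not, -1 on error. */
int barrier_demo(void)
{
    pthread_t tids[WORKERS];
    int all_complete = 1, serial_threads = 0;

    if (pthread_barrier_init(&phase_barrier, NULL, WORKERS) != 0)
        return -1;
    for (int i = 0; i < WORKERS; i++) {
        slots[i].id = i;
        /* A real program would need cleanup here; the sketch just gives up. */
        if (pthread_create(&tids[i], NULL, phase_worker, &slots[i]) != 0)
            return -1;
    }
    for (int i = 0; i < WORKERS; i++)
        pthread_join(tids[i], NULL);
    pthread_barrier_destroy(&phase_barrier);

    for (int i = 0; i < WORKERS; i++) {
        if (slots[i].sum_seen != 1 + 2 + 3 + 4)
            all_complete = 0;
        serial_threads += slots[i].was_serial;
    }
    return (all_complete && serial_threads == 1) ? 1 : 0;
}
```

Unlike pthread_join(), the barrier lets all workers continue running after the rendezvous, and the single PTHREAD_BARRIER_SERIAL_THREAD return value gives one of them a natural place to do work that must happen only once.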

Summary of thread concepts: Pthread and Win32 threads