processes, threads and signals in Linux

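Earlier I wanted to understand how processes and threads are scheduled in the Linux kernel, and later I wanted to know how Linux supports multithreaded applications. Thanks to several friends on OSC for the materials and help they provided. I now have a rough understanding of both questions. Below is a record of my analysis; since it consists of fairly basic material and analysis, I have not organized or summarized it further.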

 

===========================================
Basics


When a process executes a fork call, a new copy of the process is created with its
own variables and its own PID. This new process is scheduled independently, and (in general)
executes almost independently of the process that created it. When we create a new thread in a
process, in contrast, the new thread of execution gets its own stack (and hence local variables) but shares global variables, file descriptors, signal handlers, and its current directory state with the process that created it.
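To make the contrast concrete, here is a minimal sketch (the function and variable names are my own, not from the book) in which a forked child and a new thread each increment the same global variable. After fork() the parent's copy is unchanged, but after the thread finishes the creator sees its increment.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One global variable: a forked child gets a private copy, a thread shares it. */
static int shared_counter = 0;

static void *thread_body(void *arg)
{
    (void)arg;
    shared_counter++;                 /* writes the single copy all threads see */
    return NULL;
}

/* Returns how much the parent's shared_counter changed after a forked child
   incremented it: 0, because the child only modified its own copy. */
int delta_after_fork(void)
{
    int before = shared_counter;
    pid_t pid = fork();
    if (pid < 0)
        return -1;                    /* fork failed */
    if (pid == 0) {                   /* child */
        shared_counter++;
        _exit(0);
    }
    waitpid(pid, NULL, 0);            /* parent waits for the child */
    return shared_counter - before;
}

/* Returns how much shared_counter changed after a new thread incremented it:
   1, because the thread and its creator share global variables. */
int delta_after_thread(void)
{
    int before = shared_counter;
    pthread_t tid;
    if (pthread_create(&tid, NULL, thread_body, NULL) != 0)
        return -1;
    pthread_join(tid, NULL);          /* after join, the thread's write is visible */
    return shared_counter - before;
}
```

Calling both functions from main() and printing the results is enough to see the difference.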

The performance of an application that mixes input, calculation, and output may be improved
by running these as three separate threads. While the input or output thread is waiting for a
connection, one of the other threads can continue with calculations.

Writing multithreaded programs requires very careful design. The potential for introducing
subtle timing faults, or faults caused by the unintentional sharing of variables in a multithreaded
program is considerable.

Re-entrant code can be called more than once, whether by different threads or through nested invocations, and still function correctly. For this reason, a re-entrant section of code usually must use only local variables, so that each call to the code gets its own unique copy of the data.

In multithreaded programs, you tell the compiler that you need this feature by defining the _REENTRANT macro before any #include lines in your program. This does three things:
=>     Some functions get prototypes for a re-entrant safe equivalent. These are normally the same
       function name, but with _r appended so that, for example, gethostbyname is changed to
       gethostbyname_r.
=>     Some stdio.h functions that are normally implemented as macros become proper re-entrant
       safe functions.
=>     The variable errno, from errno.h, is changed to call a function, which can determine the real
       errno value in a multithread safe way.
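The signature of gethostbyname_r is long, so this sketch uses strtok() and its re-entrant counterpart strtok_r(), which follow the same _r naming convention. The helper name count_words is my own. Nesting two tokenizers breaks with plain strtok(), which keeps its position in one hidden static variable; strtok_r() keeps it in a caller-supplied pointer, so each loop has its own state.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Counts the space-separated words on every line of `text` (modified in place).
   Each loop keeps its own saveptr, which plain strtok() cannot do. */
int count_words(char *text)
{
    int words = 0;
    char *line_save, *word_save;

    for (char *line = strtok_r(text, "\n", &line_save); line != NULL;
         line = strtok_r(NULL, "\n", &line_save)) {
        for (char *w = strtok_r(line, " ", &word_save); w != NULL;
             w = strtok_r(NULL, " ", &word_save))
            words++;
    }
    return words;
}
```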

The terms process, thread, and task are used rather interchangeably in the Linux kernel; all three are represented by the same structure:
 ==> struct task_struct
 
==================pthread library functions=====================================
#include <pthread.h>
int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine)(void *), void *arg);
Using fork causes execution to continue in the same location with a different
return code, whereas using a new thread explicitly provides a pointer to a function where the new
thread should start executing.

pthread_create
pthread_exit
pthread_join: the thread equivalent of wait; the resources of a joinable thread (including its stack) are not released
       until another thread, usually the main thread, calls this function.
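A minimal sketch of the create/exit/join cycle (sum_to_n and run_sum_thread are my own names): the argument travels into the thread through the void * parameter and the result comes back through pthread_join, much as wait() collects a child's exit status.

```c
#include <pthread.h>
#include <stdint.h>

/* Thread start routine: sums 1..n, with n passed through the void * argument,
   and hands the result back through the void * return value. */
static void *sum_to_n(void *arg)
{
    intptr_t n = (intptr_t)arg;
    intptr_t sum = 0;
    for (intptr_t i = 1; i <= n; i++)
        sum += i;
    return (void *)sum;               /* same effect as pthread_exit((void *)sum) */
}

/* Starts one thread and waits for it. Returns the thread's result,
   or -1 if creating or joining the thread fails. */
long run_sum_thread(long n)
{
    pthread_t tid;
    void *result;
    if (pthread_create(&tid, NULL, sum_to_n, (void *)(intptr_t)n) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)   /* blocks until sum_to_n returns */
        return -1;
    return (long)(intptr_t)result;
}
```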

Everything except local (stack) variables is shared between the threads of a process.

synchronization with semaphores
sem_init
sem_wait
sem_post
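A minimal sketch using an unnamed POSIX semaphore with initial value 1 as a lock around a shared counter. The thread and iteration counts and the function names are arbitrary choices for illustration.

```c
#include <pthread.h>
#include <semaphore.h>

#define NTHREADS 4
#define NITERS 100000

static sem_t lock_sem;                /* binary semaphore: initial value 1 */
static long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        sem_wait(&lock_sem);          /* decrement; blocks while the value is 0 */
        counter++;                    /* critical section */
        sem_post(&lock_sem);          /* increment; wakes a waiter if there is one */
    }
    return NULL;
}

/* Runs NTHREADS workers and returns the final counter value. */
long run_semaphore_demo(void)
{
    pthread_t tids[NTHREADS];
    counter = 0;
    sem_init(&lock_sem, 0, 1);        /* pshared = 0: shared by threads of this process */
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    sem_destroy(&lock_sem);
    return counter;
}
```

Without the sem_wait/sem_post pair, the unsynchronized increments can be lost and the result may fall short of NTHREADS * NITERS.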

synchronization with mutexes
pthread_mutex_init
pthread_mutex_lock
pthread_mutex_unlock
pthread_mutex_destroy
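A minimal sketch showing all four mutex calls on a counter object that carries its own mutex. The struct and function names are hypothetical, and the cap of 16 threads is an arbitrary limit for the example.

```c
#include <pthread.h>

/* A counter that carries its own mutex. */
struct safe_counter {
    pthread_mutex_t mutex;
    long value;
};

static void *add_many(void *arg)
{
    struct safe_counter *c = arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&c->mutex);     /* only one thread at a time past here */
        c->value++;
        pthread_mutex_unlock(&c->mutex);
    }
    return NULL;
}

/* Runs `nthreads` (at most 16) threads that each add 100000; returns the total. */
long run_mutex_demo(int nthreads)
{
    struct safe_counter c;
    pthread_t tids[16];

    if (nthreads > 16)
        nthreads = 16;
    pthread_mutex_init(&c.mutex, NULL);    /* NULL: default mutex attributes */
    c.value = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, add_many, &c);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&c.mutex);       /* no thread holds it any more */
    return c.value;
}
```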

Thread Attributes
int pthread_attr_init(pthread_attr_t *attr);
int pthread_attr_setxxxxxxx
int pthread_attr_setschedpolicy(pthread_attr_t *attr, int policy);
int pthread_attr_setschedparam(pthread_attr_t *attr, const struct sched_param *param);
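A minimal sketch of using an attribute object, here with pthread_attr_setdetachstate(), another member of the pthread_attr_setxxx family. A detached thread cannot be joined, so the sketch has the thread announce completion through a mutex and condition variable (which the notes above do not cover); the helper names are my own.

```c
#include <pthread.h>

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static int done_flag;
static int job_result;

static void *background_job(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&done_lock);
    job_result = 42;                      /* the "work" */
    done_flag = 1;
    pthread_cond_signal(&done_cond);      /* nobody can join us, so announce completion */
    pthread_mutex_unlock(&done_lock);
    return NULL;                          /* resources are released automatically */
}

/* Creates a detached thread via an attribute object and returns the value
   it produced, or -1 on error. */
int run_detached_thread(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int state, rc;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_attr_getdetachstate(&attr, &state);      /* read the setting back */
    if (state != PTHREAD_CREATE_DETACHED) {
        pthread_attr_destroy(&attr);
        return -1;
    }

    done_flag = 0;
    rc = pthread_create(&tid, &attr, background_job, NULL);
    pthread_attr_destroy(&attr);          /* attr is only needed during pthread_create */
    if (rc != 0)
        return -1;

    pthread_mutex_lock(&done_lock);
    while (!done_flag)                    /* loop guards against spurious wakeups */
        pthread_cond_wait(&done_cond, &done_lock);
    rc = job_result;
    pthread_mutex_unlock(&done_lock);
    return rc;                            /* no pthread_join: the thread is detached */
}
```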

Cancelling a thread
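A minimal sketch of cancellation with pthread_cancel() (the helper names are my own). Under the default deferred cancellation type, the request takes effect at the next cancellation point, such as sleep(). The canceller must still join the thread, and the exit status it receives is PTHREAD_CANCELED.

```c
#include <pthread.h>
#include <unistd.h>

static void *loop_forever(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);        /* sleep() is a cancellation point: a pending cancel acts here */
    return NULL;         /* never reached */
}

/* Returns 1 if the cancelled thread's exit status is PTHREAD_CANCELED. */
int cancel_demo(void)
{
    pthread_t tid;
    void *status;
    if (pthread_create(&tid, NULL, loop_forever, NULL) != 0)
        return 0;
    pthread_cancel(tid);          /* only a request; deferred by default */
    pthread_join(tid, &status);   /* joining still reclaims the thread's resources */
    return status == PTHREAD_CANCELED;
}
```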

==============================================================
Related System Calls
kill
exec
fork
clone
tkill
tgkill

On Linux, the system call clone clones a task, with a configurable level of sharing, among which are:

    * CLONE_FILES: share the same file descriptor table (instead of creating a copy)
    * CLONE_PARENT: give the new task the caller's parent as its parent, instead of making it a child of the caller (without this flag, child's getppid() == parent's getpid())
    * CLONE_VM: share the same memory space (instead of creating a COW copy)

fork() calls clone() with the least sharing, and pthread_create() calls clone() with the most sharing.
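A minimal sketch of the CLONE_VM difference, using glibc's clone() wrapper. The helper names and the 64 KiB stack size are my own choices, and it assumes a downward-growing stack (as on x86), so the top of the buffer is passed. Without CLONE_VM the child writes to its copy-on-write copy of `value`; with CLONE_VM it writes to the parent's memory.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (64 * 1024)

static int child_fn(void *arg)
{
    int *value = arg;
    *value = 1;          /* with CLONE_VM this is the parent's variable */
    return 0;
}

/* Runs child_fn in a new task created with `flags` (plus SIGCHLD so that
   waitpid() can reap it) and returns what the parent then sees in `value`. */
int value_seen_after_clone(int flags)
{
    int value = 0;
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;
    pid_t pid = clone(child_fn, stack + STACK_SIZE, flags | SIGCHLD, &value);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);   /* the child has exited, so its stack can be freed */
    free(stack);
    return value;
}
```

value_seen_after_clone(0) behaves like fork() and returns 0, while value_seen_after_clone(CLONE_VM) returns 1.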

 


================================================================
=======   Thread Model                          ================
================================================================

Kernel thread
A kernel thread is a way to implement background tasks inside the kernel.
Kernel threads are similar to user processes, except that they live in kernel space and have access to kernel functions and data structures. Like user processes, kernel threads appear to monopolize the processor because of preemptive scheduling.
***********************************************************************
Difference between processes and threads
The difference between processes and threads under Linux 2.4 is that threads share more parts of their state (address space, file handles, etc.) with each other, whereas processes usually don't. The NPTL under Linux 2.6 makes this a bit clearer by grouping threads into "thread groups", which are a bit like "processes" in Win32 and Solaris.

Perhaps the most important difference is that in Windows processes are heavy and expensive compared to threads, whereas in Linux the difference is much smaller, so the trade-off balances at a different point.

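A task has both a PID and a TGID. The PID is actually similar to a thread ID in Windows, while the TGID (thread group ID) corresponds to a PID in Windows. For an independent process, the PID is simply its PID, and in this case PID == TGID.
For processes that share an address space with other processes, each has its own PID, but they all have the same TGID.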
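This can be observed from user space. In the sketch below (the struct and helper names are my own), getpid() returns the TGID, and syscall(SYS_gettid) returns the kernel's per-task PID. glibc 2.30 and later also provide a gettid() wrapper; the raw syscall is used here for portability. Two threads report the same getpid() but different thread IDs, and in the main thread (the group leader) the two values coincide.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

struct ids { pid_t pid; pid_t tid; };

static struct ids current_ids(void)
{
    struct ids r;
    r.pid = getpid();                      /* TGID: the same in every thread */
    r.tid = (pid_t)syscall(SYS_gettid);    /* the kernel's per-task PID */
    return r;
}

static void *record_ids(void *arg)
{
    *(struct ids *)arg = current_ids();
    return NULL;
}

/* Records the IDs of the calling thread and of a newly created thread. */
int collect_ids(struct ids *caller, struct ids *other)
{
    pthread_t t;
    *caller = current_ids();
    if (pthread_create(&t, NULL, record_ids, other) != 0)
        return -1;
    pthread_join(t, NULL);
    return 0;
}
```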

***********************************************************************
Linux uses a 1-1 threading model, with (to the kernel) no distinction between processes and threads -- everything is simply a runnable task.
***********************************************************************
On the cost of creating and switching between processes/threads
fork() costs a little more than pthread_create() because of copying tables and creating COW mappings for memory, but the Linux kernel developers have tried (and succeeded) to minimize those costs.

Switching between tasks, if they share the same memory space and various tables, will be a tiny bit cheaper than if they aren't shared, because the data may already be loaded in cache. However, switching tasks is still very fast even if nothing is shared -- this is something else that Linux kernel developers try to ensure (and succeed at ensuring).

In fact, if you are on a multi-processor system, not sharing may actually be beneficial to performance: if each task is running on a different processor, synchronizing shared memory is expensive.
***********************************************************************
Security Comparison between Processes and Threads
Processes are safer and more secure than threads, because each process runs in its own virtual address space. If one process crashes or has a buffer overrun, it does not affect any other process, whereas if a thread crashes, it takes down all the other threads in the process, and a buffer overrun in one thread opens up a security hole in all of them.
***********************************************************************
Models
1:1 (Kernel-level threading)
currently adopted by Linux

Threads created by the user are in 1-1 correspondence with schedulable entities in the kernel. This is the simplest possible threading implementation. Win32 used this approach from the start. On Linux, the usual C library implements this approach (via the NPTL or older LinuxThreads). The same approach is used by Solaris, NetBSD and FreeBSD.

N:1 (User-level threading)

An N:1 model implies that all application-level threads map to a single kernel-level scheduled entity; the kernel has no knowledge of the application threads. With this approach, context switching can be done very quickly and, in addition, it can be implemented even on simple kernels which do not support threading. One of the major drawbacks, however, is that it cannot benefit from the hardware acceleration on multi-threaded processors or multi-processor computers: there is never more than one thread being scheduled at the same time. For example, if one of the threads executes a blocking I/O request, the whole process is blocked and the threading advantage cannot be utilized. GNU Portable Threads uses user-level threading.

M:N (Hybrid threading)

M:N maps some N number of application threads onto some M number of kernel entities, or "virtual processors." This is a compromise between kernel-level ("1:1") and user-level ("N:1") threading. In general, "M:N" threading systems are more complex to implement than either kernel or user threads, because changes to both kernel and user-space code are required. In the M:N implementation, the threading library is responsible for scheduling user threads on the available schedulable entities; this makes context switching of threads very fast, as it avoids system calls. However, this increases complexity and the likelihood of priority inversion, as well as suboptimal scheduling without extensive (and expensive) coordination between the userland scheduler and the kernel scheduler.
***********************************************************************
Two ways for operating system to schedule threads
1. preemptive scheduling (currently adopted by linux)
2. cooperative scheduling
***********************************************************************
processes, kernel threads, user threads and fibers
A kernel thread is the "lightest" unit of kernel scheduling. At least one kernel thread exists within each process.
Kernel threads do not own resources except for a stack, a copy of the registers including the program counter, and thread-local storage (if any).
Threads are sometimes implemented in userspace libraries, thus called user threads. The kernel is not aware of them, so they are managed and scheduled in userspace.
Fibers are an even lighter unit of scheduling which are cooperatively scheduled.
***********************************************************************
Thread and fiber issues
User thread or fiber implementations are typically entirely in userspace. As a result, context switching between user threads or fibers within the same process is extremely efficient because it does not require any interaction with the kernel at all: a context switch can be performed by locally saving the CPU registers used by the currently executing user thread or fiber and then loading the registers required by the user thread or fiber to be executed.
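A minimal sketch of cooperative fiber switching, using the ucontext functions (getcontext, makecontext, swapcontext). These functions were removed from POSIX.1-2008 but are still provided by glibc. The fiber and the main flow run inside one kernel task and hand control to each other explicitly. Strictly speaking, glibc's swapcontext() still makes one system call to save and restore the signal mask; hand-written fiber libraries avoid even that. The names and the 64 KiB stack are my own choices.

```c
#include <ucontext.h>

static ucontext_t main_ctx, fiber_ctx;
static char fiber_stack[64 * 1024];
int fiber_trace[8];                       /* records the order in which steps run */
int fiber_trace_len;

static void fiber_body(void)
{
    fiber_trace[fiber_trace_len++] = 2;
    swapcontext(&fiber_ctx, &main_ctx);   /* yield: save our registers, resume main */
    fiber_trace[fiber_trace_len++] = 4;
}                                         /* returning continues at uc_link (main_ctx) */

/* Interleaves the main flow with one fiber; returns the number of steps
   recorded, with fiber_trace holding 1, 2, 3, 4, 5 in that order. */
int run_fiber_demo(void)
{
    fiber_trace_len = 0;
    getcontext(&fiber_ctx);
    fiber_ctx.uc_stack.ss_sp = fiber_stack;
    fiber_ctx.uc_stack.ss_size = sizeof fiber_stack;
    fiber_ctx.uc_link = &main_ctx;        /* where to go when fiber_body returns */
    makecontext(&fiber_ctx, fiber_body, 0);

    fiber_trace[fiber_trace_len++] = 1;
    swapcontext(&main_ctx, &fiber_ctx);   /* switch to the fiber */
    fiber_trace[fiber_trace_len++] = 3;
    swapcontext(&main_ctx, &fiber_ctx);   /* resume the fiber after its yield */
    fiber_trace[fiber_trace_len++] = 5;
    return fiber_trace_len;
}
```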

However, the use of blocking system calls in user threads (as opposed to kernel threads) or fibers can be problematic: when one thread blocks in the kernel, all the others block too, because the kernel sees only one schedulable task and the scheduling of the user threads is performed in user space.

A common solution to the problem of blocking system calls in user threads is to provide an API that implements a synchronous interface by using non-blocking operations internally.
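A minimal sketch of such a wrapper. green_read() looks like a blocking read() to its caller, but it switches the descriptor to O_NONBLOCK internally. Whenever no data is ready (EAGAIN), it calls a yield hook instead of sleeping in the kernel. The yield hook stands in for a user-level scheduler and is hypothetical, as are all the names here. The demo "scheduler" plays the part of another user thread that writes into a pipe.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hook into a user-level scheduler: runs other user threads. */
typedef void (*yield_fn)(void *ctx);

/* Synchronous interface, non-blocking implementation. */
ssize_t green_read(int fd, void *buf, size_t len, yield_fn yield, void *ctx)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    ssize_t n;
    for (;;) {
        n = read(fd, buf, len);
        if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
            break;                        /* data, EOF, or a real error */
        yield(ctx);                       /* nothing ready: let another user thread run */
    }
    int saved = errno;
    fcntl(fd, F_SETFL, flags);            /* restore the original file status flags */
    errno = saved;
    return n;
}

/* Demo scheduler state: on its first call it "runs" a writer thread. */
struct demo { int write_fd; int yields; };

static void demo_yield(void *ctx)
{
    struct demo *d = ctx;
    if (d->yields++ == 0) {
        ssize_t w = write(d->write_fd, "hi", 2);
        (void)w;
    }
}

/* Reads two bytes from an initially empty pipe into `out` (size >= 3) and
   returns how many times the reader had to yield. */
int green_read_demo(char *out)
{
    int fds[2];
    struct demo d = { 0, 0 };
    if (pipe(fds) == -1)
        return -1;
    d.write_fd = fds[1];
    ssize_t n = green_read(fds[0], out, 2, demo_yield, &d);
    out[n > 0 ? n : 0] = '\0';
    close(fds[0]);
    close(fds[1]);
    return d.yields;
}
```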

================================================================
=======   NPTL Design                           ================
================================================================
The First Implementation
=> use processes to emulate threads, these processes share almost all resources
=> no usable synchronization primitives in the kernel
 => use signals to implement thread synchronization
=> no concept of 'kernel thread groups' in the kernel
All these lead to non-compliant and fragile signal handling in the thread library.

API and ABI
ABI: application binary interface
API: application programming interface

Problem with the existing implementation
=> the signal system is severely broken
=> high latency and complexity (because of the use of signals to implement synchronization primitives)
=> compatibility problems with other POSIX thread implementations (because of the absence of the concept of 'kernel thread groups')
On the kernel side there are also problems
=> too many processes
=> misuse of signals

Design Decisions
1. 1-on-1 vs M-on-N
 => 1-on-1 model is better suited for Linux.
2. There should be no manager thread, because it would cause serious scalability problems and would add unnecessary complexity to the design.
3. Memory Allocation
 => The thread data structure and the thread-local storage are placed on the stack.
 
================================================================
=======   Understanding the Linux Kernel, Chapter 3                           ================
================================================================
M-on-1 model, user-level scheduling
(cannot benefit from SMP)
  ||
  \/
1-on-1 model, LinuxThreads
(problems with the signal system, and a performance bottleneck caused by the manager thread)
  ||
  \/
1-on-1 model, NPTL
(supports the concept of "thread groups")

getpid() returns the tgid of a task, not its pid

struct task_struct
{
 volatile long state;
 pid_t pid;
 pid_t tgid;
 struct task_struct *group_leader;
 /* PID/PID hash table linkage. */
 struct pid_link pids[PIDTYPE_MAX];
 struct list_head thread_group;
 struct thread_struct thread;
 /* ... many other fields omitted ... */
};

struct pid_link
{
 struct hlist_node node;
 struct pid *pid;
};

struct thread_struct {
 unsigned long ksp; /* Kernel stack pointer. */
 unsigned long usp; /* User stack pointer. */
 unsigned long ccs; /* Saved flags register. */
};

union thread_union {
 struct thread_info thread_info;
 unsigned long stack[THREAD_SIZE/sizeof(long)];
};

static struct hlist_head *pid_hash;
static unsigned int pidhash_shift = 4;
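A user-space sketch of why a PID lookup is O(1) on average: the PID selects a bucket, and only the short chain in that bucket is walked. Everything here is a hypothetical, simplified stand-in. The struct and function names are my own, and the hash function just takes the low bits; the kernel's pid_hashfn() uses a multiplicative hash instead, and its hlist nodes also carry a pprev pointer so that an entry can be removed in O(1). The 16 buckets mirror pidhash_shift = 4.

```c
#include <stddef.h>
#include <sys/types.h>

#define PIDHASH_SHIFT 4
#define PIDHASH_SIZE (1u << PIDHASH_SHIFT)    /* 16 buckets */

/* Hypothetical, heavily simplified process descriptor. */
struct toy_task {
    pid_t pid;
    struct toy_task *hash_next;   /* chains tasks whose PIDs share a bucket */
};

static struct toy_task *toy_pid_table[PIDHASH_SIZE];

static unsigned toy_pid_hashfn(pid_t pid)
{
    return (unsigned)pid & (PIDHASH_SIZE - 1);   /* low bits: illustration only */
}

void toy_attach_pid(struct toy_task *t)
{
    unsigned b = toy_pid_hashfn(t->pid);
    t->hash_next = toy_pid_table[b];             /* push onto the front of the chain */
    toy_pid_table[b] = t;
}

struct toy_task *toy_find_task(pid_t pid)
{
    for (struct toy_task *t = toy_pid_table[toy_pid_hashfn(pid)]; t != NULL;
         t = t->hash_next)
        if (t->pid == pid)
            return t;
    return NULL;
}
```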

================================================================
=======   Understanding the Linux Kernel, Chapter 11                          ================
================================================================
The POSIX standard has some stringent requirements for signal handling of multithreaded application.
=> signal handlers must be shared among all threads of a multithreaded application; however, each thread
     should have its own mask of pending and blocked signals. (RQ1)
=> kill() and sigqueue() must send signals to whole multithreaded application (RQ2)
=> Each signal sent to a multithreaded application will be delivered to only one thread, which is arbitrarily
 chosen by the kernel. (RQ3)
=> if a fatal signal is sent to a multithreaded application, the kernel will kill all threads of the application. (RQ4)

struct task_struct
{
 struct signal_struct *signal;
 struct sighand_struct *sighand;
 sigset_t blocked, real_blocked;
 struct sigpending pending;

 unsigned long sas_ss_sp;
 size_t sas_ss_size;
 int (*notifier)(void *priv);
 void *notifier_data;
 sigset_t *notifier_mask;
 /* ... many other fields omitted ... */
};

typedef struct {
 unsigned long sig[2]; //2 x 32-bit longs = 64 signal bits on 32-bit x86
} sigset_t;

In signal_struct, many fields are related to the concept of a 'thread group'.

The signal descriptor is shared by all processes belonging to the same thread group, that is, all processes
created by invoking the clone() system call with the CLONE_THREAD flag set.
clone
 => sys_clone
  => do_fork
   => copy_process
   /* CLONE_THREAD requires CLONE_SIGHAND ... */
   if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
    return ERR_PTR(-EINVAL);
   /* ... and CLONE_SIGHAND requires CLONE_VM */
   if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
    return ERR_PTR(-EINVAL);
    => copy_sighand
    if (clone_flags & CLONE_SIGHAND) {
     /* share the parent's signal handler table; just take a reference */
     atomic_inc(&current->sighand->count);
     return 0;
    }
    => copy_signal
    if (clone_flags & CLONE_THREAD)
     return 0;   /* share the parent's signal descriptor (signal_struct) */
   /* back in copy_process: link the new task into the thread group */
   if (clone_flags & CLONE_THREAD) {
    current->signal->nr_threads++;
    atomic_inc(&current->signal->live);
    atomic_inc(&current->signal->sigcnt);
    p->group_leader = current->group_leader;
    list_add_tail_rcu(&p->thread_group, &p->group_leader->thread_group);
   }
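The two EINVAL checks at the top of copy_process() can be observed from user space. This sketch (the helper names are my own) calls glibc's clone() wrapper with flag combinations that copy_process() rejects and reports the resulting errno. No child should be created, so the stack is only a placeholder.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdlib.h>

#define STACK_SIZE (64 * 1024)

static int never_runs(void *arg)
{
    (void)arg;
    return 0;
}

/* Returns the errno clone() fails with for `flags`, or 0 if the call
   unexpectedly succeeds. Meant only for combinations the kernel rejects. */
int clone_errno(int flags)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return ENOMEM;
    if (clone(never_runs, stack + STACK_SIZE, flags, NULL) == -1) {
        int err = errno;
        free(stack);
        return err;
    }
    return 0;    /* unexpected success: leak the stack, the child may still use it */
}
```

clone_errno(CLONE_THREAD) reports EINVAL because CLONE_SIGHAND is missing, and clone_errno(CLONE_SIGHAND) reports EINVAL because CLONE_VM is missing.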


  

 

 

 

===========================================
Questions

1. Is a thread a lightweight process in Linux? What's the difference between a thread and a process?

2. Say there's a single-threaded process A and a multi-threaded process B. Are these two processes
   scheduled differently by the kernel? Do they have the same average CPU sleep time?

3. How can thread information be displayed in Linux?

4. How does the kernel schedule processes and threads?

5. How does Linux support threads?

6. Is signal delivery shared between threads in a multi-threaded application?

7. How does the kernel handle signals in a multi-threaded application?

8. How does the kernel find a process descriptor from a PID in O(1) time?

9. How does the kernel satisfy the requirements of the POSIX standard?

 

 

 

===========================================
Analysis
1. Yes. A thread is a lightweight process in Linux. Threads usually share many resources with the process that
   creates them, whereas a child process gets its own copy of most of the parent's resources.

2. From question 1, we know that a thread is essentially a process. We also know that when
   the kernel creates a child process, the parent's remaining time slice is split in two, one half for the
   parent and one for the child. This policy prevents a program from grabbing more CPU time simply by creating
   children. Thus, if we don't change the scheduling policy of the threads in B, A and B have almost the same
   average CPU sleep time.

3. ps -eLf

4. In the kernel, both threads and processes are represented by the task_struct data structure, and every task
    (process or thread) is scheduled independently. This gives the scheduler the most flexibility in scheduling execution contexts.

5. => set up the concept of 'thread groups'
    => rewrite several system calls, such as kill, getpid, etc
    => creation
    => synchronization
    => destruction
    => signals

6. refer to 9

7. refer to 9

8. pid_hash: four hash tables (one per PID type), whose buckets are chained lists (hlist)

9. RQ1: If CLONE_THREAD is set, CLONE_SIGHAND must be set, which in turn requires CLONE_VM.
  From the source of copy_process() above, if CLONE_THREAD is set, the child shares the address space
  with the parent as well as the signal handler table and the signal descriptor; copy_process() only
  increments their reference counters. The blocked mask and the private pending queue, however, are
  fields of each task_struct, so every thread keeps its own.
    RQ2: In the underlying signal-sending routine __send_signal(), a parameter named 'group' specifies
      whether the signal should be sent to the whole thread group or just to the specific process. "Sending"
      a signal here means updating data structures in the destination process's descriptor, so the parameter
      'group' selects whether the shared pending signal list or the private pending signal list is updated.
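static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
   int group, int from_ancestor_ns)
{
 struct sigpending *pending;
 struct sigqueue *q;
 int override_rlimit;

 /* ... prepare_signal() and other checks omitted ... */
 pending = group ? &t->signal->shared_pending : &t->pending;
 if (legacy_queue(pending, sig)) //a non-real-time signal that is already pending is not queued again
  return 0;
 //whether the RLIMIT_SIGPENDING limit may be exceeded for this signal
 if (sig < SIGRTMIN)
  override_rlimit = (is_si_special(info) || info->si_code >= 0);
 else
  override_rlimit = 0;
 q = __sigqueue_alloc(sig, t, GFP_ATOMIC | __GFP_NOTRACK_FALSE_POSITIVE,
  override_rlimit);
 if (q) {
  list_add_tail(&q->list, &pending->list);
  //update q->info
 }
 sigaddset(&pending->signal, sig);
 complete_signal(sig, t, group); //pick a thread and wake it up
 return 0;
}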

 RQ3 and RQ4: The kernel scans the processes in the thread group, looking for a process to receive the signal, and wakes it up;
  a process is selected only if it satisfies certain conditions (see wants_signal() below).
  __send_signal
   => update signal-related data structures in the process
   => complete_signal(sig, t, group);

static void complete_signal(int sig, struct task_struct *p, int group)
{
 struct signal_struct *signal = p->signal;
 struct task_struct *t;

 if (wants_signal(sig, p)) //for SIGKILL this returns 1 unless p is already exiting
  t = p;
 else if (!group || thread_group_empty(p))
  //the signal is private, or p is the only thread in its group, and p does not want it now:
  //do nothing. The signal stays in the pending list, but no process is woken up.
  return;
 else {
  /*
   * Otherwise try to find a suitable thread.
   */
  t = signal->curr_target;
  while (!wants_signal(sig, t)) {
   t = next_thread(t);
   if (t == signal->curr_target) //we have iterated over all threads in this group
    /*
     * No thread needs to be woken.
     * Any eligible threads will see
     * the signal in the queue soon.
     */
    //see wants_signal for more details
    return;
  }
  signal->curr_target = t;
 }

 if (sig_fatal(p, sig) /* && several further checks, omitted here */) {
  /*
   * Start a group exit and wake everybody up.
   * This way we don't have other threads
   * running and doing things after a slower
   * thread has the fatal signal pending.
   */
  signal->flags = SIGNAL_GROUP_EXIT;
  signal->group_exit_code = sig;
  signal->group_stop_count = 0;
  t = p;
  do {
   sigaddset(&t->pending.signal, SIGKILL);
   signal_wake_up(t, 1);
  } while_each_thread(p, t);
  return;
 }
 /*
  * The signal is already in the shared-pending queue.
  * Tell the chosen thread to wake up and dequeue it.
  */
 signal_wake_up(t, sig == SIGKILL);
 return;
}

static inline int wants_signal(int sig, struct task_struct *p)
{
 if (sigismember(&p->blocked, sig))
  return 0;
 if (p->flags & PF_EXITING)
  return 0;
 if (sig == SIGKILL)
  return 1;
 if (task_is_stopped_or_traced(p))
  return 0;
 return task_curr(p) || !signal_pending(p); //a task that is not running on a CPU and already has TIF_SIGPENDING
 //set is skipped (SIGKILL was accepted above); this works well because once that task returns from kernel
 //mode to user mode, TIF_SIGPENDING is checked and it deals with all its pending signals then:
 //do_notify_resume -> do_signal
}
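The thread-selection loop of complete_signal() can be modelled in user space, which may help with Task 2 below. This is a toy model, not kernel code. The struct and function names are hypothetical, and it keeps only the blocked / running / already-pending checks of wants_signal(), leaving out PF_EXITING, SIGKILL, stopped or traced tasks and the fatal-signal path. Threads are indices in an array, and next_thread() becomes a wrap-around increment.

```c
#include <stdbool.h>

#define MAX_THREADS 8

/* Only the state that the simplified wants_signal() looks at. */
struct toy_thread {
    bool blocks_sig;      /* stands in for sigismember(&p->blocked, sig) */
    bool sig_pending;     /* stands in for signal_pending(p) (TIF_SIGPENDING) */
    bool on_cpu;          /* stands in for task_curr(p) */
};

struct toy_group {
    struct toy_thread threads[MAX_THREADS];
    int nr_threads;
    int curr_target;      /* mirrors signal->curr_target */
};

static bool toy_wants_signal(const struct toy_thread *t)
{
    if (t->blocks_sig)
        return false;
    return t->on_cpu || !t->sig_pending;
}

/* For a group-wide signal sent to thread `target`, returns the index of the
   thread to wake, or -1 if no thread wants it now (it stays queued). */
int toy_pick_thread(struct toy_group *g, int target)
{
    if (toy_wants_signal(&g->threads[target]))
        return target;
    if (g->nr_threads == 1)
        return -1;                        /* the thread_group_empty() case */
    int t = g->curr_target;
    while (!toy_wants_signal(&g->threads[t])) {
        t = (t + 1) % g->nr_threads;      /* next_thread() */
        if (t == g->curr_target)
            return -1;                    /* went all the way round */
    }
    g->curr_target = t;                   /* the next search starts here */
    return t;
}
```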

  


===========================================
Tasks

1. Write a program that performs automatic backups while the text is still being edited (to practice multithreaded programming).

2. Design a virtual process scheduling system to emulate kernel scheduling of multithreaded application.

 

============================================
Key Words

Thread Model
NPTL
POSIX Thread
Kernel thread
signal handler table
interprocess communication
interthread communication
thread registers
Locating thread-local data
Linux Signals
real-time signals
__send_signal
signal.c

 

 

==================================================
References

http://www.linuxdiyf.com/viewarticle.php?id=19984
(kernel thread)

http://stackoverflow.com/questions/807506/threads-vs-processes-in-linux
(Threads vs Processes in Linux)

http://www.linuxjournal.com/article/3814
(The Linux Process Model)

http://en.wikipedia.org/wiki/Thread_%28computer_science%29

Beginning Linux Programming, 4th edition

The Native POSIX Thread Library for Linux
(nptl-design.pdf)

Understanding the Linux Kernel, 3rd edition
(chapter 3, 7, 11)

2.4.31 source code
2.6.35 source code

http://blog.sina.com.cn/s/blog_508d2c500100gdnp.html
(hlist_head/hlist_node)

 

 
