October 23rd, Friday

  Today I read an article on the "thundering herd" problem in the accept() system call.  As a rule, to build a server program that can deal with thousands of requests at once, we use multiple threads or processes to wait for a growing number of concurrent incoming connections.  By pre-creating these threads, a network server can handle connections and requests at a faster rate than with a single thread.
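  To make the pre-created worker idea concrete, here is a minimal sketch that uses processes rather than threads. The names prefork_demo and make_listener, the loopback address, the backlog of 16 and the 5-second alarm are all my own assumptions for illustration, not something taken from the article. Every worker blocks in accept() on the same listening socket, and the parent then opens one connection per worker.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Open a TCP listener on 127.0.0.1; the kernel picks a free port. */
static int make_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;
    if (bind(fd, (struct sockaddr *)addr, len) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical demo: fork nworkers children that all wait in accept()
 * on one shared socket, then connect once per worker from the parent.
 * Returns how many workers accepted a connection, or -1 on setup error.
 */
int prefork_demo(int nworkers)
{
    struct sockaddr_in addr;
    int started = 0, served = 0;
    int lfd;

    assert(nworkers > 0);
    lfd = make_listener(&addr);
    if (lfd < 0)
        return -1;

    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            /* Worker: all workers sleep on the same socket's wait queue. */
            alarm(5); /* safety net so a worker never waits forever */
            int cfd = accept(lfd, NULL, NULL);
            if (cfd >= 0)
                close(cfd);
            _exit(cfd >= 0 ? 0 : 1);
        }
        started++;
    }

    /* Parent acts as the client: one connection per worker. */
    for (int i = 0; i < started; i++) {
        int c = socket(AF_INET, SOCK_STREAM, 0);
        if (c < 0)
            continue;
        connect(c, (struct sockaddr *)&addr, sizeof(addr));
        close(c);
    }

    /* Count workers that exited normally after a successful accept(). */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            served++;
    }
    close(lfd);
    return served;
}
```

  Calling prefork_demo(4) forks four workers and should report that each one served a connection. A real server would loop in accept() instead of exiting after the first connection.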

 

  In Linux, when multiple threads call accept() on the same TCP socket, they all sleep on the same wait queue, waiting for an incoming connection to wake them up.  In the Linux 2.2.9 kernel (and earlier), when an incoming TCP connection is accepted, the wake_up_interruptible() function is invoked to awaken the waiting threads.  This function walks the socket's wait queue and awakens everybody.  All but one of the threads, however, will put themselves back on the wait queue to wait for the next connection.  This unnecessary awakening is commonly referred to as the "thundering herd" problem, and it creates scalability problems for network server applications.
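  The cost is easy to put into numbers with a toy model. The sketch below is plain user-space C, not kernel code, and every name in it (wake_policy, wake_stats, simulate_accepts) is made up for illustration. It assumes that exactly one woken task gets each connection and that every other woken task goes straight back to sleep, as described above.

```c
#include <assert.h>

/* Hypothetical model of how a wait queue is woken; not the kernel API. */
enum wake_policy { WAKE_ALL, WAKE_ONE };

struct wake_stats {
    long wakeups; /* tasks made runnable */
    long wasted;  /* tasks that woke up, found nothing, and slept again */
};

/* Count wakeups for a given number of waiters and incoming connections. */
struct wake_stats simulate_accepts(int waiters, int connections,
                                   enum wake_policy policy)
{
    struct wake_stats s = {0, 0};

    assert(waiters >= 0 && connections >= 0);
    if (waiters == 0)
        return s;
    for (int c = 0; c < connections; c++) {
        int woken = (policy == WAKE_ALL) ? waiters : 1;
        s.wakeups += woken;
        s.wasted += woken - 1; /* only one task takes the connection */
    }
    return s;
}
```

  With 100 waiters and 10 connections, the wake-everybody policy gives 1000 wakeups, 990 of them wasted, while waking a single task gives 10 wakeups and none wasted.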

 

  The socket structure in Linux contains a virtual operations vector, similar to VFS inodes, that lists six methods (referred to as call-backs in some kernel comments).  When a socket is created, these methods initially point to a set of generic functions shared by all sockets.  Each socket protocol family (e.g., TCP) has the option to override these defaults and point a method to a function specific to the protocol family.  TCP overrides just one of these methods for TCP sockets.  The four most commonly used socket methods for TCP sockets are as follows:

 

  • sock->state_change.................... (pointer to sock_def_wakeup)
  • sock->data_ready...................... (pointer to sock_def_readable)
  • sock->write_space..................... (pointer to tcp_write_space)
  • sock->error_report.................... (pointer to sock_def_error_report)

  The code for each one of these methods invokes the wake_up_interruptible() function.  This means that every time one of these methods is called, tasks may be unnecessarily awakened.  In fact, in the accept() call alone, Linux invokes three of these methods, essentially tripling the impact of the "thundering herd" problem.  The three methods invoked in every call to accept() in the 2.2.9 kernel are tcp_write_space(), sock_def_readable() and sock_def_wakeup(), in that order.

  Because the most frequently used socket methods call wake_up_interruptible(), the thundering herd problem extends beyond the accept() system call and into the rest of the TCP code.  In fact, it is rarely necessary for these methods to wake up the entire wait queue.  Thus, almost any TCP socket operation unnecessarily awakens tasks and returns them to sleep.  This inefficient practice robs valuable CPU cycles from server applications.
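
  The operations vector and the tripling effect can be sketched in user-space C as well. Only the four field names come from the list above. The struct model_sock, all model_* functions and the waiters and wakeups counters are hypothetical stand-ins, and model_wake_all() merely imitates what wake_up_interruptible() is described as doing.

```c
#include <assert.h>

/* Hypothetical model of a socket with an operations vector. */
struct model_sock {
    int  waiters; /* tasks sleeping on this socket's wait queue */
    long wakeups; /* total task wakeups recorded so far */
    void (*state_change)(struct model_sock *sk);
    void (*data_ready)(struct model_sock *sk);
    void (*write_space)(struct model_sock *sk);
    void (*error_report)(struct model_sock *sk);
};

/* Stand-in for wake_up_interruptible(): wakes every waiter. */
static void model_wake_all(struct model_sock *sk)
{
    sk->wakeups += sk->waiters;
}

/* Generic defaults, each of which wakes the whole queue. */
static void model_def_wakeup(struct model_sock *sk)       { model_wake_all(sk); }
static void model_def_readable(struct model_sock *sk)     { model_wake_all(sk); }
static void model_def_write_space(struct model_sock *sk)  { model_wake_all(sk); }
static void model_def_error_report(struct model_sock *sk) { model_wake_all(sk); }

/* Protocol-specific override, which also wakes the whole queue. */
static void model_tcp_write_space(struct model_sock *sk)  { model_wake_all(sk); }

/* Point every method at the generic default. */
void model_sock_init(struct model_sock *sk, int waiters)
{
    assert(waiters >= 0);
    sk->waiters = waiters;
    sk->wakeups = 0;
    sk->state_change = model_def_wakeup;
    sk->data_ready   = model_def_readable;
    sk->write_space  = model_def_write_space;
    sk->error_report = model_def_error_report;
}

/* TCP keeps the defaults and overrides a single method. */
void model_tcp_init(struct model_sock *sk, int waiters)
{
    model_sock_init(sk, waiters);
    sk->write_space = model_tcp_write_space;
}

/* One accept(): three methods fire, in the order given in the text. */
void model_accept(struct model_sock *sk)
{
    sk->write_space(sk);
    sk->data_ready(sk);
    sk->state_change(sk);
}
```

  In this model, one model_accept() on a socket with 100 waiters records 300 wakeups, which is the tripling described above.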

     

     

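  Summary:

  In server programs built on a multi-threaded or multi-process model, accept() is used to take incoming connections.  Because the default handlers in the Linux 2.2.9 kernel wake every waiting thread or process whenever a connection is established, this leads to the "thundering herd" problem: for example, out of 100 threads only one is chosen to serve the connection.  The rest do a lot of unnecessary work, which wastes both time and CPU resources.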
