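Anyone who has written services in C/C++ will be familiar with making a multithreaded C process exit gracefully in response to Ctrl-C (the SIGINT signal) or kill (the SIGTERM signal). Typical pseudocode looks like this: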
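    #include <pthread.h>
    #include <signal.h>
    #include <string.h>

    static volatile sig_atomic_t g_quit = 0;   /* exit flag (illustrative name) */

    static void term(int signum) { (void)signum; g_quit = 1; }

    static void *worker(void *arg)
    {
        while (!g_quit) { /* ... do work ... */ }
        return arg;
    }

    int main(int argc, char *argv[])
    {
        pthread_t thrd_id;
        // 1. do some init work
        /* ... init() ... */
        // 2. install signal handlers: SIGINT for Ctrl-C, SIGTERM for kill
        struct sigaction action;
        memset(&action, 0, sizeof(struct sigaction));
        action.sa_handler = term;
        sigaction(SIGINT, &action, NULL);
        sigaction(SIGTERM, &action, NULL);
        // 3. create thread(s)
        pthread_create(&thrd_id, NULL, worker, NULL);
        // 4. wait for thread(s) to quit
        pthread_join(thrd_id, NULL);
        return 0;
    }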
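The main steps are:
1) Install the handler for SIGINT/SIGTERM
2) Create the worker thread(s)
3) The main thread blocks in pthread_join()
4) When the process receives the SIGINT generated by Ctrl-C, the signal handler is called; it usually sets an exit flag for the worker threads
5) The worker thread's spin loop checks the flag and breaks out of the loop once the flag says it is time to quit
6) After the worker thread has released its resources and exited, pthread_join() returns in the main thread, the main thread exits, and the process is destroyed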
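The flag handshake in steps 4 to 6 can be sketched in Python as well. This is a minimal sketch, assuming a `threading.Event` as the exit flag; the names `quit_flag` and `worker` are illustrative, and signal delivery is deliberately left out.

```python
import threading

quit_flag = threading.Event()   # exit flag (illustrative name)

def worker():
    # Step 5: check the flag on every pass and leave the loop once it is set.
    while not quit_flag.is_set():
        quit_flag.wait(0.01)    # stand-in for one unit of real work
    # Resources would be released here before the thread returns.

thrd = threading.Thread(target=worker)
thrd.start()                    # step 2: create the worker thread
quit_flag.set()                 # step 4: what the signal handler would do
thrd.join()                     # steps 3 and 6: returns once the worker exits
```

Setting the flag by hand stands in for the signal handler, so the sketch only shows how the worker notices the request and how `join()` returns afterwards.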
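However, when a module with a similar flow is implemented with Python's threading libraries (threading or thread), beginners easily get it wrong: once the process is started, Ctrl-C has no effect, even kill cannot end the process, and only kill -9 works.
The examples below illustrate this.
Common mistake 1: trying to catch the KeyboardInterrupt raised by Ctrl-C in order to exit the process. Example pseudocode: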
    import sys

    # thread1 is a threading.Thread created elsewhere
    def main():
        try:
            thread1.start()
            thread1.join()   # blocks with no timeout
        except KeyboardInterrupt:
            print("Ctrl-c pressed ...")
            sys.exit(1)
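The code above creates the worker thread in the main thread and then blocks in join(). The expected behavior is that pressing Ctrl-C lets the process catch the keyboard interrupt. However, the Caveats section of the Python thread documentation [1] says: "Threads interact strangely with interrupts: the KeyboardInterrupt exception will be received by an arbitrary thread. (When the signal module is available, interrupts always go to the main thread.)" Where the signal module is available, the keyboard interrupt is therefore delivered only to the main thread. The main thread, however, is blocked in thread1.join() without a timeout and, in Python 2, cannot respond to the interrupt while it is blocked there. The end result is that Ctrl-C cannot end the process.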
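Fix:
Pass a timeout to thread.join() and check is_alive() in a loop, for example:
    def main():
        try:
            thread1.start()
            # thread1 blocks indefinitely, e.g. in a while True loop
            while True:
                thread1.join(2)
                if not thread1.is_alive():
                    break
        except KeyboardInterrupt:
            print("Ctrl-c pressed ...")
            # note: the process only exits here if thread1 is a daemon thread
            # or has been told to stop; the interpreter waits for other threads
            sys.exit(1)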
This approach does let Ctrl-C end the process. However, as someone points out in Python Issue1167930 [3], calling join() with a timeout adds CPU overhead because of the constant polling.
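The polling loop can be wrapped in a small helper so that the poll interval, and with it the polling cost, is explicit. This is only a sketch: `join_interruptibly` and `poll_interval` are hypothetical names, not part of the `threading` API.

```python
import threading
import time

def join_interruptibly(thread, poll_interval=0.5):
    """Wait for `thread` without blocking the main thread indefinitely.

    Each join() call blocks for at most `poll_interval` seconds, so the
    main thread regains control at least that often. A larger interval
    means fewer wake-ups but a slower reaction to Ctrl-C.
    """
    while thread.is_alive():
        thread.join(poll_interval)

# Usage: a worker that finishes on its own after a short while.
worker = threading.Thread(target=time.sleep, args=(0.2,))
worker.start()
join_interruptibly(worker, poll_interval=0.05)
```

Wrapping the call in `try`/`except KeyboardInterrupt`, as in the fix above, keeps the handling of Ctrl-C in one place.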
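Common mistake 2: registering a SIGINT or SIGTERM handler and trying to destroy the threads and exit the main process by catching the signal sent by Ctrl-C or kill. Example: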
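    #!/usr/bin/env python
    # -*- encoding: utf-8 -*-
    import logging
    import sys
    import time
    import signal
    import threading

    def signal_handler(sig, frame):
        # the original logs through its own g_log_inst logger;
        # the standard logging module is used here instead
        logging.warning('Caught signal: %s, process quit now ...', sig)
        sys.exit(0)

    def thread_routine():
        # placeholder worker (assumed, not shown in the original): loops forever
        while True:
            time.sleep(1)

    def main():
        # install signal handlers for graceful shutdown
        signal.signal(signal.SIGTERM, signal_handler)
        signal.signal(signal.SIGINT, signal_handler)
        # start the thread
        thrd = threading.Thread(target=thread_routine)
        thrd.start()
        # Block the main thread; if it exited right away, the threads it
        # created would show undefined behavior. See the Caveats section
        # of the Python thread documentation.
        thrd.join()

    if '__main__' == __name__:
        main()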
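If you run the code above, you will find that neither Ctrl-C nor kill can end this multithreaded Python process. So where is the problem?
According to the Python signal documentation [4], we need to keep in mind "Some general rules for working with signals and their handlers". The rule most relevant to this article, the 7th, is quoted below: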
Some care must be taken if both signals and threads are used in the same program. The fundamental thing to remember in using signals and threads simultaneously is: always perform signal() operations in the main thread of execution. Any thread can perform an alarm(), getsignal(), pause(), setitimer() or getitimer(); only the main thread can set a new signal handler, and the main thread will be the only one to receive signals (this is enforced by the Python signal module, even if the underlying thread implementation supports sending signals to individual threads). This means that signals can’t be used as a means of inter-thread communication. Use locks instead.
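As stated above, when the signal and threading modules are used together, signal handlers can only be set by the main thread and signals are only received by the main thread, which is blocked after calling thrd.join().
So Ctrl-C and kill fail for much the same reason as in common mistake 1, and the fix is similar, so it is not repeated here.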
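For reference, the same fix applied to the signal-handler version might look like the sketch below. It assumes a `threading.Event` named `stop_event` (not in the original) as the exit flag. The handler only sets the flag instead of calling `sys.exit()`, because the interpreter waits for non-daemon threads before the process can exit.

```python
import signal
import threading

stop_event = threading.Event()  # exit flag shared by the handler and the worker

def signal_handler(sig, frame):
    # Runs in the main thread; only record the request and return.
    stop_event.set()

def thread_routine():
    # Worker loop: check the flag on every pass and leave once it is set.
    while not stop_event.is_set():
        stop_event.wait(0.1)    # stand-in for one unit of real work

def main():
    # Handlers must be installed from the main thread.
    signal.signal(signal.SIGTERM, signal_handler)
    signal.signal(signal.SIGINT, signal_handler)
    thrd = threading.Thread(target=thread_routine)
    thrd.start()
    # Join with a timeout so the main thread can still run the handler.
    while thrd.is_alive():
        thrd.join(0.5)
    return 0
```

Calling `main()` from the script's entry point and then pressing Ctrl-C or sending kill should let the worker finish its current pass and the process exit normally.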
[1] Python Doc: thread Caveats
[2] Python Doc: threading
[3] Python Issue: threading.Thread.join() cannot be interrupted by a Ctrl-C
[4] Python Doc: signal