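Requirement: an existing crawler (named CNSubAllInd) must be kept running in the background (restarted immediately whenever it finishes), and its run log must be recorded.
Python's logging module records the log, and the subprocess module interacts with the system to execute commands. Once the child process is detected to have exited, it is started again.
The code, keeprunning.py, is as follows (CNSubAllInd is the program to keep running in the background):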
# -*- coding: UTF-8 -*-
#!DATE: 2018/10/9
#!@Author: yingying
#keeprunning.py
import os
import subprocess
# logging
# require python2.6.6 and later
import logging
from logging.handlers import RotatingFileHandler
## log settings: SHOULD BE CONFIGURED BY config
LOG_PATH_FILE = r"D:\workspace\PyCharmProject\CompanyInfoSpider\my_service_mgr.log" # raw string keeps the backslashes literal
LOG_MODE = 'a'
LOG_MAX_SIZE = 10 * 1024 * 1024 # 10M per file
LOG_MAX_FILES = 10 # 10 Files: my_service_mgr.log.1, my_service_mgr.log.2, ...
LOG_LEVEL = logging.DEBUG
LOG_FORMAT = "%(asctime)s %(levelname)-10s[%(filename)s:%(lineno)d(%(funcName)s)] %(message)s"
handler = RotatingFileHandler(LOG_PATH_FILE, LOG_MODE, LOG_MAX_SIZE, LOG_MAX_FILES)
formatter = logging.Formatter(LOG_FORMAT)
handler.setFormatter(formatter)
Logger = logging.getLogger()
Logger.setLevel(LOG_LEVEL)
Logger.addHandler(handler)
# color output
#
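pid = os.getpid()
def print_error(s):
    # red text, then reset the terminal color
    print('\033[31m[%d: ERROR] %s\033[0m' % (pid, s))
def print_info(s):
    # green text, then reset the terminal color
    print('\033[32m[%d: INFO] %s\033[0m' % (pid, s))
def print_warning(s):
    # yellow text, then reset the terminal color
    print('\033[33m[%d: WARNING] %s\033[0m' % (pid, s))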
def start_child_proc(command, merged):
    try:
        if command is None:
            raise OSError("Invalid command")
        child = None
        if merged is True:
            # merge stdout and stderr
            child = subprocess.Popen(command)
            # child = subprocess.Popen(command,
            #                          stderr=subprocess.STDOUT, # send the child's stderr to its stdout as well
            #                          stdout=subprocess.PIPE    # create a new pipe
            #                          )
        else:
            # DO NOT merge stdout and stderr
            child = subprocess.Popen(command)
            # child = subprocess.Popen(command,
            #                          stderr=subprocess.PIPE,
            #                          stdout=subprocess.PIPE)
        return child
    except subprocess.CalledProcessError:
        pass # handle errors in the called executable
    except OSError:
        raise OSError("Failed to run command!")
def run_forever(command):
    print_info("start child process with command: " + ' '.join(command))
    Logger.info("start child process with command: " + ' '.join(command))
    merged = False
    child = start_child_proc(command, merged)
    failover = 0
    while True:
        # poll() returns None while the child is still running
        while child.poll() is not None:
            failover = failover + 1
            print_warning("child process shutdown with return code: " + str(child.returncode))
            Logger.critical("child process shutdown with return code: " + str(child.returncode))
            print_warning("restart child process again, times=%d" % failover)
            Logger.info("restart child process again, times=%d" % failover)
            child = start_child_proc(command, merged)
        # wait for the child to exit; out and err are None unless Popen was given pipes
        out, err = child.communicate()
        returncode = child.returncode
        if returncode != 0:
            Logger.info("execute child process failed")
            for errorline in (err or '').splitlines():
                Logger.info(errorline)
    Logger.exception("!!!should never run to this!!!")
if __name__ == "__main__":
    run_forever(['scrapy', 'crawl', 'CNSubAllInd'])
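The restart logic above can be condensed into a short sketch. This is an illustration rather than the script itself: it assumes Python 3, uses wait() in place of the poll()/communicate() pair, and takes a hypothetical max_restarts parameter so that the loop terminates. The demo command is a stand-in for the scrapy call.

```python
import subprocess
import sys


def run_with_restarts(command, max_restarts):
    """Run command, restarting it each time it exits, up to max_restarts times.

    max_restarts is a hypothetical limit added for illustration only;
    keeprunning.py itself restarts the child without any limit.
    Returns the list of return codes, one per run.
    """
    return_codes = []
    for _ in range(max_restarts + 1):
        # No pipes are attached, so the child writes straight to the console.
        child = subprocess.Popen(command)
        # wait() blocks until the child exits and returns its exit status.
        return_codes.append(child.wait())
    return return_codes


if __name__ == "__main__":
    # Stand-in for the crawler: a child process that exits with status 3.
    demo_command = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_with_restarts(demo_command, max_restarts=2))
```

Because no pipe is attached to the child here, wait() cannot block on a full pipe buffer; that concern only arises once stdout or stderr is redirected to subprocess.PIPE.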
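Running it on Windows: enter the command start pythonw keeprunning.py at the command line, which opens a pythonw window.
Note: this window cannot be closed for good, because keeprunning is running in the background. As soon as it detects that the crawler has exited, it opens a new window (that is, it starts the program again). The only way to stop it is to end the pythonw.exe process in Task Manager, which stops the monitoring; the crawler run in progress then finishes and is not restarted.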
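However, the original author's approach of fetching the child's output with read (shown below) deadlocked when I used it: execution got stuck at read every time and never moved on.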
while True:
    while child.poll() != None:
        failover = failover + 1
        print_warning("child process shutdown with return code: " + str(child.returncode))
        Logger.critical("child process shutdown with return code: " + str(child.returncode))
        print_warning("restart child process again, times=%d" % failover)
        Logger.info("restart child process again, times=%d" % failover)
        child = start_child_proc(command, merged)
    # deadlock!!!
    ch = child.stdout.read(1)
    if ch != '' and ch != '\n':
        line += ch
    if ch == '\n':
        print_info(line)
        line = ''
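After going through related material (such as "Python Popen().stdout.read() hang") and the official documentation, I found that this is exactly where the problem lies. As I read the documentation, the call on .stdout hangs because the pipe is empty once the last line has been read: a blocking read() does not return until more data arrives or the child closes its end of the pipe.
To prevent this, use communicate() instead of .stdout.read(). See the official documentation for how to use communicate.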
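As a minimal sketch of that advice, the following captures a child's output with communicate() instead of reading the pipes by hand. It assumes Python 3 and that both streams are redirected to subprocess.PIPE (which the final keeprunning.py does not do); run_and_capture and the demo child command are hypothetical names used only for illustration.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run command to completion and return (returncode, stdout_text, stderr_text)."""
    child = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the output to str instead of bytes
    )
    # communicate() reads both pipes until end-of-file and then waits for the
    # child to exit, so neither pipe can fill up and stall the child.
    out, err = child.communicate()
    return child.returncode, out, err


if __name__ == "__main__":
    # Demo child: writes a large amount of data to stderr, then one line to stdout.
    script = "import sys; sys.stderr.write('x' * 200000); print('done')"
    code, out, err = run_and_capture([sys.executable, "-c", script])
    print(code, out.strip(), len(err))
```

One design point to keep in mind: the subprocess documentation notes that communicate() buffers everything it reads in memory, so this pattern suits children with bounded output. It also returns only after the child has exited, so the output is logged at the end of each run rather than as it is produced.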