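I recently wrote a wiki watchdog (wiki-watchdog) that monitors changes to a wiki and posts them to a group chat through a DingTalk bot. The script is not very robust and may crash from time to time, so I needed a service that brings it back up promptly after a crash. After some searching, Supervisor turned out to be a good fit.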
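Introduction
Supervisor is a tool for managing and monitoring processes on UNIX-like operating systems. It is written in Python and uses a typical client/server architecture:
- supervisord is the server side, which starts and supervises the services;
- supervisorctl is the client, which connects to the server and operates on the processes indirectly;
- echo_supervisord_conf prints a sample configuration file whose comments explain each option, which is friendly to a newcomer like me.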
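Installation
Because Supervisor is written in Python, it can be installed with easy_install or pip. Ubuntu users can also install it conveniently with apt. I won't go into the details here; there are plenty of tutorials online.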
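Loading the configuration
When supervisord is started without an explicitly specified configuration file, it searches the following locations in order.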
$CWD/supervisord.conf
$CWD/etc/supervisord.conf
/etc/supervisord.conf
/etc/supervisor/supervisord.conf (since Supervisor 3.3.0)
../etc/supervisord.conf (Relative to the executable)
../supervisord.conf (Relative to the executable)
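Here $CWD is the directory from which the supervisord command is run, while "relative to the executable" means relative to the directory that contains the supervisord executable itself.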
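The lookup order above can be sketched as a small shell function. This only illustrates the listed order and is not Supervisor's actual code; the function name `find_supervisord_conf` and its two arguments (the working directory, and the directory holding the supervisord executable) are made up for this example.

```shell
#!/bin/sh
# Sketch: print the first existing config file, following the search order
# listed above. find_supervisord_conf is a hypothetical helper, not part of
# Supervisor itself.
#   $1 = working directory ($CWD)
#   $2 = directory that contains the supervisord executable
find_supervisord_conf() {
    cwd=$1
    bindir=$2
    for candidate in \
        "$cwd/supervisord.conf" \
        "$cwd/etc/supervisord.conf" \
        "/etc/supervisord.conf" \
        "/etc/supervisor/supervisord.conf" \
        "$bindir/../etc/supervisord.conf" \
        "$bindir/../supervisord.conf"
    do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1   # nothing found in any of the locations
}
```

On a machine with Supervisor installed, something like `find_supervisord_conf "$PWD" "$(dirname "$(command -v supervisord)")"` should show which file would be chosen.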
Managing configuration
At the end of /etc/supervisor/supervisord.conf there is the following passage.
; The [include] section can just contain the "files" setting. This
; setting can list multiple files (separated by whitespace or
; newlines). It can also contain wildcards. The filenames are
; interpreted as relative to this file. Included files *cannot*
; include files themselves.
[include]
files = /etc/supervisor/conf.d/*.conf
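So when managing several processes, each one can get its own file ending in .conf, dropped into the conf.d directory. supervisord reads these files together with the main configuration when it starts and then manages our services; for an instance that is already running, supervisorctl reread followed by supervisorctl update picks up the changes.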
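To see which files the `files = /etc/supervisor/conf.d/*.conf` pattern would pick up, a quick check like the following may help. `list_included_confs` is a hypothetical helper for this post; it mimics the wildcard with a plain shell glob rather than asking Supervisor.

```shell
#!/bin/sh
# Sketch: list the files in a conf.d-style directory that a "*.conf"
# include pattern would match. list_included_confs is a made-up name.
#   $1 = directory to inspect (defaults to /etc/supervisor/conf.d)
list_included_confs() {
    dir=${1:-/etc/supervisor/conf.d}
    for f in "$dir"/*.conf; do
        # When nothing matches, the pattern stays unexpanded; skip it.
        [ -f "$f" ] || continue
        basename "$f"
    done
}
```

A file named, say, mytest.conf.bak or notes.txt in the same directory would not be listed, which is a handy way to disable a program temporarily.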
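Managing processes
Supervisor's whole purpose is to manage services for us, so the configuration files in conf.d deserve some care. The echo_supervisord_conf command mentioned earlier prints the following options.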
; The below sample program section shows all possible program subsection values,
; create one or more 'real' program: sections to be able to control them under
; supervisor.
;[program:theprogramname]
;command=/bin/cat ; the program (relative uses PATH, can take args)
;process_name=%(program_name)s ; process_name expr (default %(program_name)s)
;numprocs=1 ; number of processes copies to start (def 1)
;directory=/tmp ; directory to cwd to before exec (def no cwd)
;umask=022 ; umask for process (default None)
;priority=999 ; the relative start priority (default 999)
;autostart=true ; start at supervisord start (default: true)
;startsecs=1 ; # of secs prog must stay up to be running (def. 1)
;startretries=3 ; max # of serial start failures when starting (default 3)
;autorestart=unexpected ; when to restart if exited after running (def: unexpected)
;exitcodes=0,2 ; 'expected' exit codes used with autorestart (default 0,2)
;stopsignal=QUIT ; signal used to kill process (default TERM)
;stopwaitsecs=10 ; max num secs to wait b4 SIGKILL (default 10)
;stopasgroup=false ; send stop signal to the UNIX process group (default false)
;killasgroup=false ; SIGKILL the UNIX process group (def false)
;user=chrism ; setuid to this UNIX account to run the program
;redirect_stderr=true ; redirect proc stderr to stdout (default false)
;stdout_logfile=/a/path ; stdout log path, NONE for none; default AUTO
;stdout_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB)
;stdout_logfile_backups=10 ; # of stdout logfile backups (default 10)
;stdout_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0)
;stdout_events_enabled=false ; emit events on stdout writes (default false)
;stderr_logfile=/a/path ; stderr log path, NONE for none; default AUTO
;stderr_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB)
;stderr_logfile_backups=10 ; # of stderr logfile backups (default 10)
;stderr_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0)
;stderr_events_enabled=false ; emit events on stderr writes (default false)
;environment=A="1",B="2" ; process environment additions (def no adds)
;serverurl=AUTO ; override serverurl computation (childutils)
There are many parameters, but we don't need all of them. Let's walk through an example to see how a configuration file is written.
[program:mytest]
command=bash /etc/supervisor/conf.d/mybash.sh
directory=/etc/supervisor/conf.d
stdout_logfile=/tmp/stdout.log
startsecs=0
autostart=false
autorestart=false
In my testing, it is best to give the script in command as an absolute path.
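Since the original motivation was bringing a crashed script back up, a program section for that case might look like the sketch below. The program name wiki-watchdog comes from the beginning of this post, but the script path, the log path and the helper `write_watchdog_conf` are placeholders I made up; only the option names are taken from the sample configuration above.

```shell
#!/bin/sh
# Sketch: write a program section that restarts the process whenever it exits.
# write_watchdog_conf and all paths below are hypothetical placeholders.
#   $1 = file to write, e.g. /etc/supervisor/conf.d/wiki-watchdog.conf
write_watchdog_conf() {
    cat > "$1" <<'EOF'
[program:wiki-watchdog]
command=/usr/bin/python /opt/wiki-watchdog/watchdog.py
directory=/opt/wiki-watchdog
autostart=true
autorestart=true
startsecs=5
startretries=3
redirect_stderr=true
stdout_logfile=/var/log/wiki-watchdog.log
EOF
}
```

With autorestart=true the process is restarted no matter how it exited, whereas the default value unexpected only restarts it when the exit code is not listed in exitcodes.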
Example
Before letting Supervisor manage processes for us, make sure the supervisord service is actually running.
root@Server218 ~# ps aux | grep supervisor
root 20090 0.0 0.9 59580 19464 ? Ss 19:58 0:00 /usr/bin/python /usr/bin/supervisord -n -c /etc/supervisor/supervisord.conf
root 22776 0.0 0.0 14196 860 pts/1 S 20:51 0:00 grep --color=auto supervisor
Other processes are managed through the client, supervisorctl, using commands of this form:
supervisorctl start programname
supervisorctl stop programname
supervisorctl restart programname
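Now let's write a simple shell script that takes a little while to run.
#!/usr/bin/env bash
# Print a counter once per second for about 100 seconds.
i=1
while [ "$i" -le 100 ]
do
    let i++
    echo "$i"
    sleep 1
done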
Start the service:
root@Server218 /e/s/conf.d# supervisorctl start mytest
mytest: started
Check the status:
root@Server218 /e/s/conf.d# supervisorctl
mytest RUNNING pid 22937, uptime 0:00:21
supervisor> status
mytest RUNNING pid 22937, uptime 0:00:27
supervisor> help
default commands (type help <topic>):
=====================================
add exit open reload restart start tail
avail fg pid remove shutdown status update
clear maintail quit reread signal stop version
Stop the service:
root@Server218 /e/s/conf.d# supervisorctl stop mytest
mytest: stopped
That is all it takes to manage an external service with Supervisor.
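The status line shown earlier (`mytest RUNNING pid 22937, uptime 0:00:21`) is easy to check from a script. The sketch below assumes that layout, with the state in the second whitespace-separated column; `state_of` is a name invented for this example.

```shell
#!/bin/sh
# Sketch: extract the state column from `supervisorctl status` output.
# state_of is a hypothetical helper; it reads status lines on stdin and
# prints the state of the program named in $1 (nothing if it is not listed).
state_of() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Example with the line captured above; on a live system one would pipe in
# the real output instead:  supervisorctl status | state_of mytest
echo 'mytest RUNNING pid 22937, uptime 0:00:21' | state_of mytest   # prints RUNNING
```

A cron job or monitoring script could compare the result against RUNNING and send an alert when it differs.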
Problems encountered
1 supervisor.sock refused connection.
Fix: restart the supervisord service.
2 unix:///tmp/supervisor.sock no such file
Fix: loosen the permissions with chmod 777 /xxx/supervisor.sock
Also move every /tmp path in the configuration: change /tmp/supervisor.sock to /var/run/supervisor.sock, /tmp/supervisord.log to /var/log/supervisor.log, and /tmp/supervisord.pid to /var/run/supervisor.pid. Otherwise these files are easily cleaned up automatically by Linux.
3 Startup fails with IOError: [Errno 13] Permission denied: '/var/log/supervisord.log'
Fix: make the file or its directory writable, then remember to restart the supervisord service.
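The path changes from problem 2 can be applied with a few sed substitutions. This is a sketch that only rewrites the three paths named above; `relocate_tmp_paths` is a made-up name, and it filters stdin to stdout so nothing is modified in place.

```shell
#!/bin/sh
# Sketch: rewrite the default /tmp paths to the locations suggested above.
# relocate_tmp_paths is a hypothetical helper; it reads a config on stdin
# and writes the adjusted config to stdout.
relocate_tmp_paths() {
    sed \
        -e 's#/tmp/supervisor\.sock#/var/run/supervisor.sock#g' \
        -e 's#/tmp/supervisord\.log#/var/log/supervisor.log#g' \
        -e 's#/tmp/supervisord\.pid#/var/run/supervisor.pid#g'
}
```

One way to use it is `relocate_tmp_paths < supervisord.conf > supervisord.conf.new`, then review the new file before swapping it in and restarting supervisord.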