【产生背景】
在一个分布式环境中,每台机器上可能需要启动和停止多个进程,使用命令行方式一个一个手动启动和停止非常麻烦,而且查看每个进程的状态也很不方便。如果有一个工具能够实现每台机器上多个进程的简单高效中心化管理将是非常方便的。于是Supervisord工具应运而生。与Supervisord类似的工具包括monit, daemontools和runit。本文只涉及Supervisord。
【简介】
supervisord的官网:http://supervisord.org。看懂英文的可以不用看我的博客,直接看文档就行了,文档写得非常好。点个赞!!
Supervisor是一个客户/服务器系统,它可以在类Unix系统中管理控制大量进程。Supervisor使用python开发,有多年历史,目前很多生产环境下的服务器都在使用Supervisor。
Supervisor的服务器端称为supervisord,主要负责在启动自身时启动管理的子进程,响应客户端的命令,重启崩溃或退出的子进程,记录子进程stdout和stderr输出,生成和处理子进程生命周期中的事件。可以在一个配置文件中配置相关参数,包括Supervisord自身的状态,其管理的各个子进程的相关属性。配置文件一般位于/etc/supervisord.conf。
Supervisor的客户端称为supervisorctl,它提供了一个类shell的接口(即命令行)来使用supervisord服务端提供的功能。通过supervisorctl,用户可以连接到supervisord服务器进程,获得服务器进程控制的子进程的状态,启动和停止子进程,获得正在运行的进程列表。客户端通过Unix域套接字或者TCP套接字与服务端进行通信,服务器端具有身份凭证认证机制,可以有效提升安全性。当客户端和服务器位于同一台机器上时,客户端与服务器共用同一个配置文件/etc/supervisord.conf,通过不同标签来区分两者的配置。
Supervisor也提供了一个web页面来查看和管理进程状态,这个功能用得人比较少。
【平台要求】
Supervisor可以运行在大多数Unix系统上,但不支持在Windows系统上运行。
Supervisor需要Python2.4及以上版本,但任何Python 3版本都不支持。
【安装】
当前Supervisor的最高版本是3.0,之前尝试使用2.x版本管理实验集群中的若干mdrill进程,发现使用客户端无法有效启动和停止服务器端管理的各个子进程,从网上搜索错误发现2.x版本有一些bug,建议升级到3.0版本。因此我卸载了2.x版本,重新安装了3.0版本,发现3.0版本很好使。3.0版本相对2.x版本,配置文件不同部分的配置项都发生了变化,详见官方文档。
我使用的操作系统是CentOS 6.4, 默认已经安装了Python,且是2.6.6版本。如果Python没有安装,请自行安装符合要求的版本。
1. 安装easy_install
sudo yum install python-setuptools-devel
2. 安装Supervisor
easy_install supervisor
3. 生成配置文件
echo_supervisord_conf > /etc/supervisord.conf
经过上面两步,Supervisor就安装好了。配置文件模版已经在/etc/supervisord.conf中生成。当然配置文件的位置不一定放在/etc/supervisord.conf中。在不指定配置文件位置的情况下,supervisord和supervisorctl会按照以下三个顺序来搜索配置文件的位置。
接下来就可以根据需要修改配置文件中的相应部分,对需要管理的进程进行配置了。具体的配置可以参考下一部分。
【配置】
我在一个三个结点的集群中部署了mdrill,三台主机名称分别为cma01, cma02, cma03,在cma01和cma02上打算启动Supervisor进程(这里的Supervisor进程是storm的进程,与supervisor进程管理工具不是一回事),在cma03上打算启动NimbusServer,Mdrillui和Supervisor三个进程。在三台主机上都安装supervisor进程控制系统后,分别在每台主机的/etc/supervisord.conf文件中配置每台主机要管理的进程。具体配置如下,其中以;开始的行为注释行:
[cma01 /etc/supervisord.conf]
; Sample supervisor config file. ; ; For more information on the config file, please see: ; http://supervisord.org/configuration.html ; ; Note: shell expansion ("~" or "$HOME") is not supported. Environment ; variables can be expanded using this syntax: "%(ENV_HOME)s". [unix_http_server] file=/tmp/supervisor.sock ; (the path to the socket file) ;chmod=0700 ; socket file mode (default 0700) ;chown=nobody:nogroup ; socket file uid:gid owner ;username=user ; (default is no username (open server)) ;password=123 ; (default is no password (open server)) [inet_http_server] ; inet (TCP) server disabled by default port=cma01:9001 ; (ip_address:port specifier, *:port for all iface) ;username=user ; (default is no username (open server)) ;password=123 ; (default is no password (open server)) [supervisord] logfile=/tmp/supervisord.log ; (main log file;default $CWD/supervisord.log) logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB) logfile_backups=10 ; (num of main logfile rotation backups;default 10) loglevel=info ; (log level;default info; others: debug,warn,trace) pidfile=/tmp/supervisord.pid ; (supervisord pidfile;default supervisord.pid) nodaemon=false ; (start in foreground if true;default false) minfds=1024 ; (min. avail startup file descriptors;default 1024) minprocs=200 ; (min. avail process descriptors;default 200) ;umask=022 ; (process file creation umask;default 022) ;user=chrism ; (default is current user, required if root) ;identifier=supervisor ; (supervisord identifier, default is 'supervisor') ;directory=/tmp ; (default is not to cd during start) ;nocleanup=true ; (don't clean up tempfiles at start;default false) ;childlogdir=/tmp ; ('AUTO' child log dir, default $TEMP) ;environment=KEY="value" ; (key value pairs to add to environment) ;strip_ansi=false ; (strip ansi escape codes in logs; def. false) ; the below section must remain in the config file for RPC ; (supervisorctl/web interface) to work, additional interfaces may be ; added by defining them in separate rpcinterface: sections [rpcinterface:supervisor] supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface [supervisorctl] serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket ;serverurl=http://127.0.0.1:9001 ; use an http:// url to specify an inet socket ;username=chris ; should be same as http_username if set ;password=123 ; should be same as http_password if set ;prompt=mysupervisor ; cmd line prompt (default "supervisor") ;history_file=~/.sc_history ; use readline history if available ; The below sample program section shows all possible program subsection values, ; create one or more 'real' program: sections to be able to control them under ; supervisor. ;[program:theprogramname] ;command=/bin/cat ; the program (relative uses PATH, can take args) ;process_name=%(program_name)s ; process_name expr (default %(program_name)s) ;numprocs=1 ; number of processes copies to start (def 1) ;directory=/tmp ; directory to cwd to before exec (def no cwd) ;umask=022 ; umask for process (default None) ;priority=999 ; the relative start priority (default 999) ;autostart=true ; start at supervisord start (default: true) ;autorestart=unexpected ; whether/when to restart (default: unexpected) ;startsecs=1 ; number of secs prog must stay running (def. 1) ;startretries=3 ; max # of serial start failures (default 3) ;exitcodes=0,2 ; 'expected' exit codes for process (default 0,2) ;stopsignal=QUIT ; signal used to kill process (default TERM) ;stopwaitsecs=10 ; max num secs to wait b4 SIGKILL (default 10) ;stopasgroup=false ; send stop signal to the UNIX process group (default false) ;killasgroup=false ; SIGKILL the UNIX process group (def false) ;user=chrism ; setuid to this UNIX account to run the program ;redirect_stderr=true ; redirect proc stderr to stdout (default false) ;stdout_logfile=/a/path ; stdout log path, NONE for none; default AUTO ;stdout_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB) ;stdout_logfile_backups=10 ; # of stdout logfile backups (default 10) ;stdout_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0) ;stdout_events_enabled=false ; emit events on stdout writes (default false) ;stderr_logfile=/a/path ; stderr log path, NONE for none; default AUTO ;stderr_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB) ;stderr_logfile_backups=10 ; # of stderr logfile backups (default 10) ;stderr_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0) ;stderr_events_enabled=false ; emit events on stderr writes (default false) ;environment=A="1",B="2" ; process environment additions (def no adds) ;serverurl=AUTO ; override serverurl computation (childutils) [program:spvs-01] command=/root/alimama/adhoc-core/bin/bluewhale supervisor process_name=%(program_name)s numprocs=1 user=root autostart=true autorestart=true startsecs=5 startretries=3 redirect_stderr=true stdout_logfile=/root/alimama/adhoc-core/bin/supervisor.log stdout_logfile_maxbytes=20MB stdout_logfile_backups=10 stopasgroup=true ; The below sample eventlistener section shows all possible ; eventlistener subsection values, create one or more 'real' ; eventlistener: sections to be able to handle event notifications ; sent by supervisor. ;[eventlistener:theeventlistenername] ;command=/bin/eventlistener ; the program (relative uses PATH, can take args) ;process_name=%(program_name)s ; process_name expr (default %(program_name)s) ;numprocs=1 ; number of processes copies to start (def 1) ;events=EVENT ; event notif. types to subscribe to (req'd) ;buffer_size=10 ; event buffer queue size (default 10) ;directory=/tmp ; directory to cwd to before exec (def no cwd) ;umask=022 ; umask for process (default None) ;priority=-1 ; the relative start priority (default -1) ;autostart=true ; start at supervisord start (default: true) ;autorestart=unexpected ; whether/when to restart (default: unexpected) ;startsecs=1 ; number of secs prog must stay running (def. 1) ;startretries=3 ; max # of serial start failures (default 3) ;exitcodes=0,2 ; 'expected' exit codes for process (default 0,2) ;stopsignal=QUIT ; signal used to kill process (default TERM) ;stopwaitsecs=10 ; max num secs to wait b4 SIGKILL (default 10) ;stopasgroup=false ; send stop signal to the UNIX process group (default false) ;killasgroup=false ; SIGKILL the UNIX process group (def false) ;user=chrism ; setuid to this UNIX account to run the program ;redirect_stderr=true ; redirect proc stderr to stdout (default false) ;stdout_logfile=/a/path ; stdout log path, NONE for none; default AUTO ;stdout_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB) ;stdout_logfile_backups=10 ; # of stdout logfile backups (default 10) ;stdout_events_enabled=false ; emit events on stdout writes (default false) ;stderr_logfile=/a/path ; stderr log path, NONE for none; default AUTO ;stderr_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB) ;stderr_logfile_backups ; # of stderr logfile backups (default 10) ;stderr_events_enabled=false ; emit events on stderr writes (default false) ;environment=A="1",B="2" ; process environment additions ;serverurl=AUTO ; override serverurl computation (childutils) ; The below sample group section shows all possible group values, ; create one or more 'real' group: sections to create "heterogeneous" ; process groups. ;[group:thegroupname] ;programs=progname1,progname2 ; each refers to 'x' in [program:x] definitions ;priority=999 ; the relative start priority (default 999) ; The [include] section can just contain the "files" setting. This ; setting can list multiple files (separated by whitespace or ; newlines). It can also contain wildcards. The filenames are ; interpreted as relative to this file. Included files *cannot* ; include files themselves. ;[include] ;files = relative/directory/*.ini
关键部分是program:spvs-01,该部分为主机cma01配置了让supervisord管理的storm进程Supervisor。
[cma02 /etc/supervisord.conf]
; Sample supervisor config file. ; ; For more information on the config file, please see: ; http://supervisord.org/configuration.html ; ; Note: shell expansion ("~" or "$HOME") is not supported. Environment ; variables can be expanded using this syntax: "%(ENV_HOME)s". [unix_http_server] file=/tmp/supervisor.sock ; (the path to the socket file) ;chmod=0700 ; socket file mode (default 0700) ;chown=nobody:nogroup ; socket file uid:gid owner ;username=user ; (default is no username (open server)) ;password=123 ; (default is no password (open server)) [inet_http_server] ; inet (TCP) server disabled by default port=cma02:9001 ; (ip_address:port specifier, *:port for all iface) ;username=user ; (default is no username (open server)) ;password=123 ; (default is no password (open server)) [supervisord] logfile=/tmp/supervisord.log ; (main log file;default $CWD/supervisord.log) logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB) logfile_backups=10 ; (num of main logfile rotation backups;default 10) loglevel=info ; (log level;default info; others: debug,warn,trace) pidfile=/tmp/supervisord.pid ; (supervisord pidfile;default supervisord.pid) nodaemon=false ; (start in foreground if true;default false) minfds=1024 ; (min. avail startup file descriptors;default 1024) minprocs=200 ; (min. avail process descriptors;default 200) ;umask=022 ; (process file creation umask;default 022) ;user=chrism ; (default is current user, required if root) ;identifier=supervisor ; (supervisord identifier, default is 'supervisor') ;directory=/tmp ; (default is not to cd during start) ;nocleanup=true ; (don't clean up tempfiles at start;default false) ;childlogdir=/tmp ; ('AUTO' child log dir, default $TEMP) ;environment=KEY="value" ; (key value pairs to add to environment) ;strip_ansi=false ; (strip ansi escape codes in logs; def. false) ; the below section must remain in the config file for RPC ; (supervisorctl/web interface) to work, additional interfaces may be ; added by defining them in separate rpcinterface: sections [rpcinterface:supervisor] supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface [supervisorctl] serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket ;serverurl=http://127.0.0.1:9001 ; use an http:// url to specify an inet socket ;username=chris ; should be same as http_username if set ;password=123 ; should be same as http_password if set ;prompt=mysupervisor ; cmd line prompt (default "supervisor") ;history_file=~/.sc_history ; use readline history if available ; The below sample program section shows all possible program subsection values, ; create one or more 'real' program: sections to be able to control them under ; supervisor. ;[program:theprogramname] ;command=/bin/cat ; the program (relative uses PATH, can take args) ;process_name=%(program_name)s ; process_name expr (default %(program_name)s) ;numprocs=1 ; number of processes copies to start (def 1) ;directory=/tmp ; directory to cwd to before exec (def no cwd) ;umask=022 ; umask for process (default None) ;priority=999 ; the relative start priority (default 999) ;autostart=true ; start at supervisord start (default: true) ;autorestart=unexpected ; whether/when to restart (default: unexpected) ;startsecs=1 ; number of secs prog must stay running (def. 1) ;startretries=3 ; max # of serial start failures (default 3) ;exitcodes=0,2 ; 'expected' exit codes for process (default 0,2) ;stopsignal=QUIT ; signal used to kill process (default TERM) ;stopwaitsecs=10 ; max num secs to wait b4 SIGKILL (default 10) ;stopasgroup=false ; send stop signal to the UNIX process group (default false) ;killasgroup=false ; SIGKILL the UNIX process group (def false) ;user=chrism ; setuid to this UNIX account to run the program ;redirect_stderr=true ; redirect proc stderr to stdout (default false) ;stdout_logfile=/a/path ; stdout log path, NONE for none; default AUTO ;stdout_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB) ;stdout_logfile_backups=10 ; # of stdout logfile backups (default 10) ;stdout_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0) ;stdout_events_enabled=false ; emit events on stdout writes (default false) ;stderr_logfile=/a/path ; stderr log path, NONE for none; default AUTO ;stderr_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB) ;stderr_logfile_backups=10 ; # of stderr logfile backups (default 10) ;stderr_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0) ;stderr_events_enabled=false ; emit events on stderr writes (default false) ;environment=A="1",B="2" ; process environment additions (def no adds) ;serverurl=AUTO ; override serverurl computation (childutils) [program:spvs-02] command=/root/alimama/adhoc-core/bin/bluewhale supervisor process_name=%(program_name)s numprocs=1 user=root autostart=true autorestart=true startsecs=5 startretries=3 redirect_stderr=true stdout_logfile=/root/alimama/adhoc-core/bin/supervisor.log stdout_logfile_maxbytes=20MB stdout_logfile_backups=10 stopasgroup=true ; The below sample eventlistener section shows all possible ; eventlistener subsection values, create one or more 'real' ; eventlistener: sections to be able to handle event notifications ; sent by supervisor. ;[eventlistener:theeventlistenername] ;command=/bin/eventlistener ; the program (relative uses PATH, can take args) ;process_name=%(program_name)s ; process_name expr (default %(program_name)s) ;numprocs=1 ; number of processes copies to start (def 1) ;events=EVENT ; event notif. types to subscribe to (req'd) ;buffer_size=10 ; event buffer queue size (default 10) ;directory=/tmp ; directory to cwd to before exec (def no cwd) ;umask=022 ; umask for process (default None) ;priority=-1 ; the relative start priority (default -1) ;autostart=true ; start at supervisord start (default: true) ;autorestart=unexpected ; whether/when to restart (default: unexpected) ;startsecs=1 ; number of secs prog must stay running (def. 1) ;startretries=3 ; max # of serial start failures (default 3) ;exitcodes=0,2 ; 'expected' exit codes for process (default 0,2) ;stopsignal=QUIT ; signal used to kill process (default TERM) ;stopwaitsecs=10 ; max num secs to wait b4 SIGKILL (default 10) ;stopasgroup=false ; send stop signal to the UNIX process group (default false) ;killasgroup=false ; SIGKILL the UNIX process group (def false) ;user=chrism ; setuid to this UNIX account to run the program ;redirect_stderr=true ; redirect proc stderr to stdout (default false) ;stdout_logfile=/a/path ; stdout log path, NONE for none; default AUTO ;stdout_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB) ;stdout_logfile_backups=10 ; # of stdout logfile backups (default 10) ;stdout_events_enabled=false ; emit events on stdout writes (default false) ;stderr_logfile=/a/path ; stderr log path, NONE for none; default AUTO ;stderr_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB) ;stderr_logfile_backups ; # of stderr logfile backups (default 10) ;stderr_events_enabled=false ; emit events on stderr writes (default false) ;environment=A="1",B="2" ; process environment additions ;serverurl=AUTO ; override serverurl computation (childutils) ; The below sample group section shows all possible group values, ; create one or more 'real' group: sections to create "heterogeneous" ; process groups. ;[group:thegroupname] ;programs=progname1,progname2 ; each refers to 'x' in [program:x] definitions ;priority=999 ; the relative start priority (default 999) ; The [include] section can just contain the "files" setting. This ; setting can list multiple files (separated by whitespace or ; newlines). It can also contain wildcards. The filenames are ; interpreted as relative to this file. Included files *cannot* ; include files themselves. ;[include] ;files = relative/directory/*.ini
关键部分是program:spvs-02,该部分为主机cma02配置了让supervisord管理的storm进程Supervisor。
[cma03 /etc/supervisord.conf]
; Sample supervisor config file. ; ; For more information on the config file, please see: ; http://supervisord.org/configuration.html ; ; Note: shell expansion ("~" or "$HOME") is not supported. Environment ; variables can be expanded using this syntax: "%(ENV_HOME)s". [unix_http_server] file=/tmp/supervisor.sock ; (the path to the socket file) ;chmod=0700 ; socket file mode (default 0700) ;chown=nobody:nogroup ; socket file uid:gid owner ;username=user ; (default is no username (open server)) ;password=123 ; (default is no password (open server)) [inet_http_server] ; inet (TCP) server disabled by default port=cma03:9001 ; (ip_address:port specifier, *:port for all iface) ;username=user ; (default is no username (open server)) ;password=123 ; (default is no password (open server)) [supervisord] logfile=/tmp/supervisord.log ; (main log file;default $CWD/supervisord.log) logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB) logfile_backups=10 ; (num of main logfile rotation backups;default 10) loglevel=info ; (log level;default info; others: debug,warn,trace) pidfile=/tmp/supervisord.pid ; (supervisord pidfile;default supervisord.pid) nodaemon=false ; (start in foreground if true;default false) minfds=1024 ; (min. avail startup file descriptors;default 1024) minprocs=200 ; (min. avail process descriptors;default 200) ;umask=022 ; (process file creation umask;default 022) ;user=chrism ; (default is current user, required if root) ;identifier=supervisor ; (supervisord identifier, default is 'supervisor') ;directory=/tmp ; (default is not to cd during start) ;nocleanup=true ; (don't clean up tempfiles at start;default false) ;childlogdir=/tmp ; ('AUTO' child log dir, default $TEMP) ;environment=KEY="value" ; (key value pairs to add to environment) ;strip_ansi=false ; (strip ansi escape codes in logs; def. false) ; the below section must remain in the config file for RPC ; (supervisorctl/web interface) to work, additional interfaces may be ; added by defining them in separate rpcinterface: sections [rpcinterface:supervisor] supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface [supervisorctl] serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket ;serverurl=http://127.0.0.1:9001 ; use an http:// url to specify an inet socket ;username=chris ; should be same as http_username if set ;password=123 ; should be same as http_password if set ;prompt=mysupervisor ; cmd line prompt (default "supervisor") ;history_file=~/.sc_history ; use readline history if available ; The below sample program section shows all possible program subsection values, ; create one or more 'real' program: sections to be able to control them under ; supervisor. ;[program:theprogramname] ;command=/bin/cat ; the program (relative uses PATH, can take args) ;process_name=%(program_name)s ; process_name expr (default %(program_name)s) ;numprocs=1 ; number of processes copies to start (def 1) ;directory=/tmp ; directory to cwd to before exec (def no cwd) ;umask=022 ; umask for process (default None) ;priority=999 ; the relative start priority (default 999) ;autostart=true ; start at supervisord start (default: true) ;autorestart=unexpected ; whether/when to restart (default: unexpected) ;startsecs=1 ; number of secs prog must stay running (def. 1) ;startretries=3 ; max # of serial start failures (default 3) ;exitcodes=0,2 ; 'expected' exit codes for process (default 0,2) ;stopsignal=QUIT ; signal used to kill process (default TERM) ;stopwaitsecs=10 ; max num secs to wait b4 SIGKILL (default 10) ;stopasgroup=false ; send stop signal to the UNIX process group (default false) ;killasgroup=false ; SIGKILL the UNIX process group (def false) ;user=chrism ; setuid to this UNIX account to run the program ;redirect_stderr=true ; redirect proc stderr to stdout (default false) ;stdout_logfile=/a/path ; stdout log path, NONE for none; default AUTO ;stdout_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB) ;stdout_logfile_backups=10 ; # of stdout logfile backups (default 10) ;stdout_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0) ;stdout_events_enabled=false ; emit events on stdout writes (default false) ;stderr_logfile=/a/path ; stderr log path, NONE for none; default AUTO ;stderr_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB) ;stderr_logfile_backups=10 ; # of stderr logfile backups (default 10) ;stderr_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0) ;stderr_events_enabled=false ; emit events on stderr writes (default false) ;environment=A="1",B="2" ; process environment additions (def no adds) ;serverurl=AUTO ; override serverurl computation (childutils) [program:nmbs] command=/root/alimama/adhoc-core/bin/bluewhale nimbus process_name=%(program_name)s numprocs=1 user=root priority=700 autostart=true autorestart=true startsecs=5 startretries=3 redirect_stderr=true stdout_logfile=/root/alimama/adhoc-core/bin/nimbus.log stdout_logfile_maxbytes=20MB stdout_logfile_backups=10 stopasgroup=true [program:spvs-03] command=/root/alimama/adhoc-core/bin/bluewhale supervisor process_name=%(program_name)s numprocs=1 user=root priority=800 autostart=true autorestart=true startsecs=5 startretries=3 redirect_stderr=true stdout_logfile=/root/alimama/adhoc-core/bin/supervisor.log stdout_logfile_maxbytes=20MB stdout_logfile_backups=10 stopasgroup=true [program:ui] command=/root/alimama/adhoc-core/bin/bluewhale mdrillui 1107 /root/alimama/adhoc-core/lib/adhoc-web-0.18-beta.jar process_name=%(program_name)s numprocs=1 user=root priority=900 autostart=true autorestart=true startsecs=5 startretries=3 redirect_stderr=true stdout_logfile=/root/alimama/adhoc-core/bin/ui.log stdout_logfile_maxbytes=20MB stdout_logfile_backups=10 stopasgroup=true ; The below sample eventlistener section shows all possible ; eventlistener subsection values, create one or more 'real' ; eventlistener: sections to be able to handle event notifications ; sent by supervisor. ;[eventlistener:theeventlistenername] ;command=/bin/eventlistener ; the program (relative uses PATH, can take args) ;process_name=%(program_name)s ; process_name expr (default %(program_name)s) ;numprocs=1 ; number of processes copies to start (def 1) ;events=EVENT ; event notif. types to subscribe to (req'd) ;buffer_size=10 ; event buffer queue size (default 10) ;directory=/tmp ; directory to cwd to before exec (def no cwd) ;umask=022 ; umask for process (default None) ;priority=-1 ; the relative start priority (default -1) ;autostart=true ; start at supervisord start (default: true) ;autorestart=unexpected ; whether/when to restart (default: unexpected) ;startsecs=1 ; number of secs prog must stay running (def. 1) ;startretries=3 ; max # of serial start failures (default 3) ;exitcodes=0,2 ; 'expected' exit codes for process (default 0,2) ;stopsignal=QUIT ; signal used to kill process (default TERM) ;stopwaitsecs=10 ; max num secs to wait b4 SIGKILL (default 10) ;stopasgroup=false ; send stop signal to the UNIX process group (default false) ;killasgroup=false ; SIGKILL the UNIX process group (def false) ;user=chrism ; setuid to this UNIX account to run the program ;redirect_stderr=true ; redirect proc stderr to stdout (default false) ;stdout_logfile=/a/path ; stdout log path, NONE for none; default AUTO ;stdout_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB) ;stdout_logfile_backups=10 ; # of stdout logfile backups (default 10) ;stdout_events_enabled=false ; emit events on stdout writes (default false) ;stderr_logfile=/a/path ; stderr log path, NONE for none; default AUTO ;stderr_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB) ;stderr_logfile_backups ; # of stderr logfile backups (default 10) ;stderr_events_enabled=false ; emit events on stderr writes (default false) ;environment=A="1",B="2" ; process environment additions ;serverurl=AUTO ; override serverurl computation (childutils) ; The below sample group section shows all possible group values, ; create one or more 'real' group: sections to create "heterogeneous" ; process groups. ;[group:thegroupname] ;programs=progname1,progname2 ; each refers to 'x' in [program:x] definitions ;priority=999 ; the relative start priority (default 999) ; The [include] section can just contain the "files" setting. This ; setting can list multiple files (separated by whitespace or ; newlines). It can also contain wildcards. The filenames are ; interpreted as relative to this file. Included files *cannot* ; include files themselves. ;[include] ;files = relative/directory/*.ini
关键部分是program:nmbs, program:spvs-03和program:ui,这三部分为主机cma03配置了让supervisord管理的storm进程NimbusServer, Supervisor和Mdrillui。
有几个配置项值得解释一下,可以根据需要自行设置。
1. stopasgroup=true。这一配置项的作用是:如果supervisord管理的进程px又产生了若干子进程,使用supervisorctl停止px进程,停止信号会传播给px产生的所有子进程,确保子进程也一起停止。这一配置项对希望停止所有进程的需求是非常有用的。
2. autostart=true。这一配置项的作用是:当启动supervisord的时候会将该配置项设置为true的所有进程自动启动。
【启动supervisord】
确保配置无误后可以在每台主机上使用下面的命令启动supervisor的服务器端supervisord
supervisord
【停止supervisord】
supervisorctl shutdown
【重新加载配置文件】
supervisorctl reload
【进程管理】
1. 启动supervisord管理的所有进程
supervisorctl start all
2. 停止supervisord管理的所有进程
supervisorctl stop all
3. 启动supervisord管理的某一个特定进程
supervisorctl start program-name // program-name为[program:xx]中的xx
4. 停止supervisord管理的某一个特定进程
supervisorctl stop program-name // program-name为[program:xx]中的xx
5. 重启所有进程或所有进程
supervisorctl restart all // 重启所有
supervisorctl reatart program-name // 重启某一进程,program-name为[program:xx]中的xx
6. 查看supervisord当前管理的所有进程的状态
supervisorctl status
【遇到问题及解决方案】
在使用命令supervisorctl start all启动控制进程时,遇到如下错误
unix:///tmp/supervisor.sock no such file
出现上述错误的原因是supervisord并未启动,只要在命令行中使用命令sudo supervisord启动supervisord即可。
【遗留问题】
当集群规模扩大后,登录到每台主机使用supervior控制进程也是很麻烦的,能不能用一台主机作为客户端,同时连接集群中的所有主机,以一种中心化的方式统一管理集群中的所有进程?之前一直使用的方式是使用交互式工具expect。supervisor本身有没有提供一种机制实现集群中所有进程的中央化管理?
目前测试成功的是使用一台主机作为客户端(supervisorctl),控制另一台服务器(supervisord)主机上的状态。方法是在服务器端配置[inet_http_server]部分,开启TCP端口监听。客户端配置[supervisorctl]部分,指定服务器端的serverurl,连接服务器端监听的端口。但是一个客户端只能连接一个服务器,无法指定多个服务器。