1. Configuration details:
1.1 Monitoring frequency
Monitoring frequency: one check every 300 s
1.2 Monitoring page URL
Web monitoring page: http://XXX:2812/
A username and password are required to log in.
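Besides the browser, the same page can be queried from a script. The following is a minimal sketch, assuming Monit's HTTP interface exposes its status as XML under `/_status?format=xml` and that Basic Authentication is enabled as described above. The function names, `myuser`, and `mypassword` are hypothetical placeholders, not values from this setup.

```shell
#!/bin/sh
# Build the status URL for a Monit instance.
# $1 = host, $2 = port (defaults to 2812, the port used in this setup)
monit_status_url() {
    printf 'http://%s:%s/_status?format=xml\n' "$1" "${2:-2812}"
}

# Fetch the status document using HTTP Basic Authentication.
# $1 = host, $2 = user, $3 = password (all placeholders)
fetch_monit_status() {
    curl -s -u "$2:$3" "$(monit_status_url "$1" 2812)"
}

# Example call (not executed here):
#   fetch_monit_status XXX myuser mypassword
```

Keeping the URL construction separate from the network call makes it easy to check the address without contacting the server.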
1.3 System monitoring configuration:
check system myhost.mydomain.tld
  if loadavg (1min) > 4 then alert
  if loadavg (5min) > 2 then alert
  if memory usage > 75% then alert
  if swap usage > 25% then alert
  if cpu usage (user) > 70% then alert
  if cpu usage (system) > 30% then alert
  if cpu usage (wait) > 20% then alert
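To get a feel for the two load-average rules, they can be reproduced by hand. This is only an illustrative sketch, not how Monit itself is implemented: `exceeds` and `check_loadavg` are hypothetical helper names, and reading `/proc/loadavg` assumes a Linux host.

```shell
#!/bin/sh
# Succeeds (exit status 0) when VALUE is strictly greater than LIMIT.
# $1 = value, $2 = limit; awk is used because sh has no float arithmetic.
exceeds() {
    awk -v v="$1" -v limit="$2" 'BEGIN { exit !(v > limit) }'
}

# Apply the thresholds copied from the "check system" block.
# $1 = 1-minute load average, $2 = 5-minute load average
check_loadavg() {
    exceeds "$1" 4 && echo "loadavg (1min) > 4"
    exceeds "$2" 2 && echo "loadavg (5min) > 2"
    return 0
}

# On Linux, the first two fields of /proc/loadavg are the 1- and 5-minute averages.
if [ -r /proc/loadavg ]; then
    read -r one five _ < /proc/loadavg
    check_loadavg "$one" "$five"
fi
```

The script prints one line per exceeded threshold and nothing when both values are within limits.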
1.4 Event server monitoring configuration:
# Event Server
check process eventserver
  matching "Console eventserver"
  start program = "/etc/monit/modebug /data/monit/monit_PredictionIO/event_scripts.sh start"
  stop program = "/etc/monit/modebug /data/monit/monit_PredictionIO/event_scripts.sh stop"
  if cpu usage > 95% for 10 cycles then restart
1.5 Engine server monitoring configuration:
# Engine
check process pioengine
  matching "Console deploy"
  start program = "/etc/monit/modebug /data/monit/monit_PredictionIO/engine_scripts.sh start"
  stop program = "/etc/monit/modebug /data/monit/monit_PredictionIO/engine_scripts.sh stop"
  if cpu usage > 95% for 10 cycles then restart
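The `for 10 cycles` qualifier in the two process checks interacts with the 300 s poll interval from section 1.1. A quick calculation, assuming one cycle equals one poll interval, shows roughly how long the CPU must stay above 95% before a restart is triggered:

```shell
#!/bin/sh
poll_seconds=300   # poll interval from section 1.1
cycles=10          # "for 10 cycles" in the process checks

# Approximate duration of sustained high CPU before Monit restarts the process.
total=$((poll_seconds * cycles))
echo "restart after about ${total}s (~$((total / 60)) min) of CPU usage above 95%"
```

With these settings a restart only happens after sustained load on the order of 50 minutes, so short CPU spikes do not cause restarts.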
1.6 Monitoring configuration for pioengine-http crashes:
check program pioengine-http with path "/data/monit/monit_PredictionIO/check_engine.sh"
  start program = "/etc/monit/modebug /data/monit/monit_PredictionIO/engine_scripts.sh start"
  stop program = "/etc/monit/modebug /data/monit/monit_PredictionIO/engine_scripts.sh stop"
  if status != 1 then restart
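The contents of check_engine.sh are not shown here. Because the rule restarts the service whenever the exit status differs from 1, the script has to exit with 1 when the engine is healthy. A minimal sketch of such a check follows, assuming the engine answers HTTP requests on `http://localhost:8000/` (an assumption about this deployment); `exit_status_for` and `probe_engine` are hypothetical names.

```shell
#!/bin/sh
# Assumed engine address; override via the ENGINE_URL environment variable.
ENGINE_URL="${ENGINE_URL:-http://localhost:8000/}"

# Map an HTTP status code to the exit status the Monit rule expects:
# 1 = healthy (no restart), 0 = unhealthy (triggers "if status != 1 then restart").
exit_status_for() {
    if [ "$1" = "200" ]; then
        echo 1
    else
        echo 0
    fi
}

# Query the engine and print the exit status to use.
# curl reports code 000 when the connection fails or times out (10 s limit).
probe_engine() {
    code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$ENGINE_URL")
    exit_status_for "$code"
}

# When saved as a standalone check script, finish with:
#   exit "$(probe_engine)"
```

An HTTP probe is used because a process-level check cannot tell whether the engine still answers queries, only whether the process exists.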
1.7 Email alerts:
set mailserver XXX
  username "XXX" password "XXX"
set mail-format { from:[email protected] }
set alert XXX
set mail-format {
  from: monit@$HOST
  subject: monit alert -- $EVENT $SERVICE
  message: $EVENT Service $SERVICE
    Date: $DATE
    Action: $ACTION
    Host: $HOST
    Description: $DESCRIPTION#
  Your faithful employee,
  Monit
}
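2. Monitoring page:
Notes:
Process: monitors the event server and the recommendation engine server. The current threshold is CPU usage above 95%; per the configuration in 1.4 and 1.5, when this persists for 10 cycles Monit raises an alert and restarts the process.
Program: covers the case where the process is still running but the engine has stopped serving. If the Akka HTTP REST API used by PredictionIO crashes, the engine process keeps running, but queries to the engine fail. In that case Monit raises an alert and restarts the service.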
3. Monitoring details:
Parameter | Value
Monit ID | ed652ace7517e5334c830b732eb324df
Host | myhost.mydomain.tld
Process id | 119966
Effective user running Monit | root
Controlfile | /etc/monitrc
Logfile | /var/log/monit.log
Pidfile | /run/monit.pid
State file | /root/.monit.state
Debug | False
Log | True
Use syslog | False
Mail server(s) | XXX:25
Default mail from | monit@$HOST
Default mail subject | monit alert -- $EVENT $SERVICE
Default mail message | $EVENT Service $SERVICE Date: $DATE Action: $ACTION Host: $HOST Description: $DESCRIPTION# Your faithful employee, Monit
Limit for Send/Expect buffer | 256 B
Limit for file content buffer | 512 B
Limit for HTTP content buffer | 1 MB
Limit for program output | 512 B
Limit for network timeout | 5 s
Limit for check program timeout | 5 m
Limit for service stop timeout | 30 s
Limit for service start timeout | 30 s
Limit for service restart timeout | 30 s
On reboot | start
Poll time | 300 seconds with start delay 0 seconds
httpd bind address | Any/All
httpd portnumber | 2812
httpd signature | True
httpd auth. style | Basic Authentication and Host/Net allow list
Alert mail to |
Alert on | All events