Monitoring a PredictionIO System with Monit

I. Configuration details:

1.1 Polling interval

Polling interval: once every 300 s

1.2 Monitoring web page address

Web monitoring page: http://XXX:2812/

A username and password are required to log in.

1.3 System check configuration:

check system myhost.mydomain.tld
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if memory usage > 75% then alert
    if swap usage > 25% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert

1.4 Event Server check:

# Event Server
check process eventserver
    matching "Console eventserver"
    start program = "/etc/monit/modebug /data/monit/monit_PredictionIO/event_scripts.sh start"
    stop program = "/etc/monit/modebug /data/monit/monit_PredictionIO/event_scripts.sh stop"
    if cpu usage > 95% for 10 cycles then restart
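The start and stop commands above call event_scripts.sh, which this post does not show. Below is a minimal sketch of what such a wrapper could look like. It assumes that pio lives at the hypothetical path /opt/PredictionIO/bin/pio and that output goes to a hypothetical log file /var/log/pio_eventserver.log; the real script may differ.

```shell
#!/bin/sh
# Hypothetical sketch of event_scripts.sh; not the author's actual script.
# PIO_BIN and LOG_FILE are assumed locations; adjust them to the local install.
PIO_BIN="${PIO_BIN:-/opt/PredictionIO/bin/pio}"
LOG_FILE="${LOG_FILE:-/var/log/pio_eventserver.log}"

start_eventserver() {
    # Run in the background so the Event Server outlives this script.
    nohup "$PIO_BIN" eventserver >>"$LOG_FILE" 2>&1 &
}

stop_eventserver() {
    # Uses the same pattern the Monit check matches on.
    # pkill exits 1 when nothing matched, which is not an error here.
    pkill -f "Console eventserver" || true
}

# Dispatch on the first argument; return 2 for anything but start/stop.
dispatch() {
    case "${1:-}" in
        start) start_eventserver ;;
        stop)  stop_eventserver ;;
        *)     echo "usage: event_scripts.sh {start|stop}" >&2; return 2 ;;
    esac
}

# Act only when executed under the expected file name,
# so the functions can also be sourced on their own.
case "$0" in
    *event_scripts.sh) dispatch "$@" ;;
esac
```

An engine_scripts.sh wrapper could follow the same shape, for example starting pio deploy from the engine directory and stopping processes that match "Console deploy".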

 

1.5 Engine server check:

# Engine
check process pioengine
    matching "Console deploy"
    start program = "/etc/monit/modebug /data/monit/monit_PredictionIO/engine_scripts.sh start"
    stop program = "/etc/monit/modebug /data/monit/monit_PredictionIO/engine_scripts.sh stop"
    if cpu usage > 95% for 10 cycles then restart

1.6 Check for a crashed pioengine-http:

check program pioengine-http with path "/data/monit/monit_PredictionIO/check_engine.sh"
    start program = "/etc/monit/modebug /data/monit/monit_PredictionIO/engine_scripts.sh start"
    stop program = "/etc/monit/modebug /data/monit/monit_PredictionIO/engine_scripts.sh stop"
    if status != 1 then restart
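check_engine.sh is not shown in this post. Because the rule restarts the service whenever the exit status is not 1, the script has to exit with 1 while the engine is healthy. Below is a minimal sketch under that assumption. ENGINE_URL is hypothetical (port 8000 is the default for pio deploy), and the sketch assumes that a healthy engine answers GET / with HTTP 200.

```shell
#!/bin/sh
# Hypothetical sketch of check_engine.sh; not the author's actual script.
# ENGINE_URL is an assumed address; point it at wherever the engine listens.
ENGINE_URL="${ENGINE_URL:-http://localhost:8000}"

# Map an HTTP status code to the exit status the Monit rule expects:
# 1 means healthy (no action); anything else makes Monit restart the engine.
exit_status_for() {
    if [ "$1" = "200" ]; then
        echo 1
    else
        echo 0
    fi
}

# Request the engine's root page and print only the HTTP status code.
# curl prints 000 when the connection fails or times out.
probe_engine() {
    curl --silent --output /dev/null --max-time 10 \
         --write-out '%{http_code}' "$ENGINE_URL"
}

# Probe only when executed under the expected file name,
# so the helper functions can also be sourced on their own.
case "$0" in
    *check_engine.sh) exit "$(exit_status_for "$(probe_engine)")" ;;
esac
```

Exiting with 1 on success is the reverse of the usual shell convention. It is done here only to match the "if status != 1 then restart" rule above.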

 

1.7 Email alerts:

set mailserver XXX
    username "XXX" password "XXX"
set mail-format { from:[email protected] }
set alert XXX
set mail-format {
    from: monit@$HOST
    subject: monit alert -- $EVENT $SERVICE
    message: $EVENT Service $SERVICE
    Date: $DATE
    Action: $ACTION
    Host: $HOST
    Description: $DESCRIPTION#
    Your faithful employee,
    Monit
}

II. Monitoring page:

[Figure 1: screenshot of the Monit monitoring page]

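Notes:

Process: monitors the Event Server and the recommendation engine server. With the current rule, a process whose CPU usage stays above 95% for 10 consecutive cycles (about 50 minutes at the 300 s polling interval) is restarted and an alert is sent.

Program: covers the case where the process is still running but the engine has stopped serving. If the Akka HTTP REST API used by PredictionIO crashes, the engine process keeps running but queries to the engine fail. In that case Monit sends an alert and restarts the service.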

III. Monitoring details:

Parameter                           Value
Monit ID                            ed652ace7517e5334c830b732eb324df
Host                                myhost.mydomain.tld
Process id                          119966
Effective user running Monit        root
Controlfile                         /etc/monitrc
Logfile                             /var/log/monit.log
Pidfile                             /run/monit.pid
State file                          /root/.monit.state
Debug                               False
Log                                 True
Use syslog                          False
Mail server(s)                      XXX:25
Default mail from                   monit@$HOST
Default mail subject                monit alert -- $EVENT $SERVICE
Default mail message                $EVENT Service $SERVICE Date: $DATE Action: $ACTION Host: $HOST Description: $DESCRIPTION# Your faithful employee, Monit
Limit for Send/Expect buffer        256 B
Limit for file content buffer       512 B
Limit for HTTP content buffer       1 MB
Limit for program output            512 B
Limit for network timeout           5 s
Limit for check program timeout     5 m
Limit for service stop timeout      30 s
Limit for service start timeout     30 s
Limit for service restart timeout   30 s
On reboot                           start
Poll time                           300 seconds with start delay 0 seconds
httpd bind address                  Any/All
httpd portnumber                    2812
httpd signature                     True
httpd auth. style                   Basic Authentication and Host/Net allow list
Alert mail to                       [email protected]
Alert on                            All events
