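The older approach described in this post did not work very well in practice, so I have since written a new template and script. See:
http://www.cnops.top/posts/748ad64f.html
If you are still interested in the original approach, read on.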
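Monitoring JVMs (microservice processes) with Zabbix
The Zabbix server needs Java installed, and Zabbix must be built with the --enable-java configure option.
The configure options used for this installation were:
./configure --prefix=/data/zabbix/ --enable-server --enable-agent --with-mysql --enable-ipv6 --with-net-snmp --with-libcurl --with-libxml2 --enable-java
The Zabbix agent side needs not only zabbix_agentd but also zabbix_sender. Packages are available at http://repo.zabbix.com/zabbix/3.0/rhel/7/x86_64/ (pick the version that matches your system).
Install it: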
rpm -ivh http://repo.zabbix.com/zabbix/3.0/rhel/7/x86_64/zabbix-sender-3.0.9-1.el7.x86_64.rpm
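Characteristics of the microservices:
1. Each process is started directly with java -jar service.jar and does not depend on Tomcat or any other web container.
2. The number of microservices on each server is not fixed; services can be added or removed freely.
3. The startup parameters of each microservice already configure many ports.
Given this, monitoring the microservices the traditional way would mean frequently adding and removing configuration by hand in the web interface, and port management on the servers would become messy.
So the microservices are monitored through low-level discovery instead, and the data for each microservice is sent to the Zabbix server with zabbix_sender.
First, the Java version is JDK 1.8: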
# java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
Information about the microservices is obtained mainly through jstat, as follows:
# ps -ef|grep java
root 28131 1 0 11:17 ? 00:00:56 java -Xms100M -Xmx500M -Xmn150M -jar /data/work/service_jar/manageMiddle.jar --server.port=20000 --management.port=20001 --config.profile=test
root 28305 1 0 11:26 ? 00:00:51 java -Xms100M -Xmx300M -Xmn100M -jar /data/work/service_jar/resourceService.jar --server.port=18000 --management.port=18001 --config.profile=test
root 29067 1 0 11:59 ? 00:00:54 java -Xms100M -Xmx500M -Xmn150M -jar /data/work/service_jar/systemService.jar --server.port=21000 --management.port=21001 --config.profile=test
root 31345 29980 0 14:03 pts/0 00:00:00 grep --color=auto java
# jstat -gcutil 28131
S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
67.75 0.00 74.28 81.92 97.29 94.90 74 1.248 7 1.065 2.313
# jstat -gc 28131
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
14336.0 14848.0 9712.2 0.0 122880.0 91488.9 55808.0 45716.6 56704.0 55169.1 7296.0 6924.1 74 1.248 7 1.065 2.313
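Meaning of the columns in the output (jstat reports sizes in KB):
S0C: capacity of the first survivor space in the young generation (KB)
S1C: capacity of the second survivor space in the young generation (KB)
S0U: space currently used in the first survivor space (KB)
S1U: space currently used in the second survivor space (KB)
EC: capacity of Eden in the young generation (KB)
EU: space currently used in Eden (KB)
OC: capacity of the old generation (KB)
OU: space currently used in the old generation (KB)
PC: capacity of Perm, the permanent generation (KB)
PU: space currently used in Perm (KB)
YGC: number of young generation GCs from application start to the time of sampling
YGCT: time spent in young generation GCs from application start to the time of sampling (s)
FGC: number of full GCs from application start to the time of sampling
FGCT: time spent in full GCs from application start to the time of sampling (s)
GCT: total time spent in GC from application start to the time of sampling (s)
NGCMN: initial (minimum) capacity of the young generation (KB)
NGCMX: maximum capacity of the young generation (KB)
NGC: current capacity of the young generation (KB)
OGCMN: initial (minimum) capacity of the old generation (KB)
OGCMX: maximum capacity of the old generation (KB)
OGC: current capacity of the old generation (KB)
PGCMN: initial (minimum) capacity of Perm (KB)
PGCMX: maximum capacity of Perm (KB)
PGC: current capacity of Perm (KB)
S0: used percentage of the current capacity of the first survivor space
S1: used percentage of the current capacity of the second survivor space
E: used percentage of the current capacity of Eden
O: used percentage of the current capacity of the old generation
P: used percentage of the current capacity of Perm
S0CMX: maximum capacity of the first survivor space (KB)
S1CMX: maximum capacity of the second survivor space (KB)
ECMX: maximum capacity of Eden (KB)
DSS: survivor capacity currently required (KB) (Eden full)
TT: tenuring threshold
MTT: maximum tenuring threshold
JDK 1.8 removed the permanent generation (Perm), so the P* columns above apply only to older JDKs. In the JDK 1.8 output shown earlier their place is taken by the Metaspace columns (MC, MU, M) and the compressed class space columns (CCSC, CCSU, CCS).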
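To make the relationship between these columns concrete, here is a minimal Python sketch that pairs the header line of the jstat -gc sample shown earlier with its value line and derives the used heap in bytes. The names parse_jstat and heap_used_bytes are hypothetical and chosen only for illustration; the sample text is copied from the output above.

```python
# Minimal sketch: parse two-line jstat output into a dict (hypothetical helper names).
SAMPLE_GC = (
    "S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT\n"
    "14336.0 14848.0 9712.2 0.0 122880.0 91488.9 55808.0 45716.6 "
    "56704.0 55169.1 7296.0 6924.1 74 1.248 7 1.065 2.313\n"
)


def parse_jstat(text):
    """Map each column name in the header line to its value as a float."""
    header, values = text.strip().split("\n", 1)
    return dict(zip(header.split(), (float(v) for v in values.split())))


def heap_used_bytes(stats):
    """Sum used survivor, Eden and old-generation space (reported in KB) as bytes."""
    used_kb = stats["S0U"] + stats["S1U"] + stats["EU"] + stats["OU"]
    return used_kb * 1024


stats = parse_jstat(SAMPLE_GC)
print(stats["YGC"], heap_used_bytes(stats))
```

This is the same header/value pairing and KB-to-bytes conversion that the collection script later in this post applies to the live jstat output.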
All microservice jars are placed in one fixed directory. The script that discovers them automatically is:
# cat java_discovery.py
#!/usr/bin/python
# Low-level discovery: list the microservice jars on this server as Zabbix LLD JSON.
import subprocess
import os
import json
import glob

java_names_file = 'java_names.txt'
javas = []
names = []
if os.path.isfile(java_names_file):
    # Pass the file name in with %s; a Python variable is not expanded
    # inside the ''' ''' quotes.
    args = '''awk -F':' '{print $1":"$2}' %s''' % (java_names_file)
    names = subprocess.Popen(args, shell=True, stdout=subprocess.PIPE,
                             universal_newlines=True).communicate()[0].splitlines()
elif glob.glob('/data/work/service_jar/*.jar'):
    t = subprocess.Popen('cd /data/work/service_jar && ls *.jar|grep jar', shell=True,
                         stdout=subprocess.PIPE, universal_newlines=True)
    names = t.communicate()[0].splitlines()
for java in names:
    if len(java) != 0:
        javas.append({'{#JAVA_NAME}': java.strip('\n').strip(':')})
print(json.dumps({'data': javas}, indent=4, separators=(',', ':')))
The directory used in the script can be changed as needed.
The output is in JSON format:
# python java_discovery.py
{
"data":[
{
"{#JAVA_NAME}":"insuranceService.jar"
},
{
"{#JAVA_NAME}":"manageMiddle.jar"
},
{
"{#JAVA_NAME}":"resourceService.jar"
},
{
"{#JAVA_NAME}":"systemService.jar"
}
]
}
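The same discovery JSON can also be produced without shelling out to ls. The following is only a sketch, assuming the jars sit directly in one directory; the function name discover_jars is hypothetical, and the default path is the one used in the script above.

```python
import glob
import json
import os


def discover_jars(jar_dir="/data/work/service_jar"):
    """Return Zabbix low-level discovery data for every *.jar file in jar_dir."""
    paths = glob.glob(os.path.join(jar_dir, "*.jar"))
    names = sorted(os.path.basename(p) for p in paths)
    return {"data": [{"{#JAVA_NAME}": name} for name in names]}


if __name__ == "__main__":
    # Same JSON layout as the output shown above.
    print(json.dumps(discover_jars(), indent=4, separators=(",", ":")))
```

If the directory does not exist or holds no jars, the function returns an empty data list rather than raising an error.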
The script that collects the statistics for a microservice and sends them with zabbix_sender is:
# cat jstat_status.py
#!/usr/bin/python
# Collect jstat statistics for one microservice and push them to Zabbix with zabbix_sender.
import subprocess
import sys
import os

__maintainer__ = "Francis"

jps = '/data/jdk1.8/bin/jps'
jstat = '/data/jdk1.8/bin/jstat'
zabbix_sender = "/usr/bin/zabbix_sender"
zabbix_conf = "/etc/zabbix/zabbix_agentd.conf"
send_to_zabbix = 1  # set to 0 to only print the zabbix_sender commands
ip = os.popen("ifconfig|grep 'inet '|grep -v '127.0'|xargs|awk -F '[ :]' '{print $3}'").readline().rstrip()
serverip = "172.19.138.53"


def usage():
    """Display program usage"""
    print("\nUsage : %s java_name alive|all" % sys.argv[0])
    print("Modes : \n\talive : Return pid of running process\n\tall : Send jstat stats as well")
    sys.exit(1)


class Jprocess:

    def __init__(self, arg):
        # pdict holds the process name, its pid and the raw jstat columns.
        self.pdict = {
            "jpname": arg,
        }
        # zdict holds the values that are sent to Zabbix.
        self.zdict = {
            "Heap_used": 0,
            "Heap_ratio": 0,
            "Heap_max": 0,
            "Perm_used": 0,
            "Perm_ratio": 0,
            "Perm_max": 0,
            "S0_used": 0,
            "S0_ratio": 0,
            "S0_max": 0,
            "S1_used": 0,
            "S1_ratio": 0,
            "S1_max": 0,
            "Eden_used": 0,
            "Eden_ratio": 0,
            "Eden_max": 0,
            "Old_used": 0,
            "Old_ratio": 0,
            "Old_max": 0,
            "YGC": 0,
            "YGCT": 0,
            "YGCT_avg": 0,
            "FGC": 0,
            "FGCT": 0,
            "FGCT_avg": 0,
            "GCT": 0,
            "GCT_avg": 0,
        }

    def chk_proc(self):
        # Find the pid of the java process whose command line contains the jar name.
        pidarg = '''ps -ef|grep java|grep %s|grep -v grep | grep -v jstat_status.py |awk '{print $2}' ''' % (self.pdict['jpname'])
        pid = subprocess.check_output(pidarg, shell=True, universal_newlines=True).strip()
        # An empty string means the process was not found.
        self.pdict['pid'] = pid
        return self.pdict['pid']

    def get_jstats(self):
        if self.pdict['pid'] == "":
            return False
        self.pdict.update(self.fill_jstats("-gc"))
        self.pdict.update(self.fill_jstats("-gccapacity"))
        self.pdict.update(self.fill_jstats("-gcutil"))

    def fill_jstats(self, opts):
        # Run jstat with the given option and pair each header column with its value.
        jstatout = subprocess.Popen([jstat, opts, self.pdict['pid']],
                                    stdout=subprocess.PIPE, universal_newlines=True)
        stdout, stderr = jstatout.communicate()
        legend, data = stdout.split('\n', 1)
        mydict = dict(zip(legend.split(), data.split()))
        return mydict

    def compute_jstats(self):
        if self.pdict['pid'] == "":
            return False
        # jstat reports sizes in KB; multiply by 1024 to send bytes.
        self.zdict['S0_used'] = format(float(self.pdict['S0U']) * 1024, '0.2f')
        self.zdict['S0_max'] = format(float(self.pdict['S0C']) * 1024, '0.2f')
        self.zdict['S0_ratio'] = format(float(self.pdict['S0']), '0.2f')
        self.zdict['S1_used'] = format(float(self.pdict['S1U']) * 1024, '0.2f')
        self.zdict['S1_max'] = format(float(self.pdict['S1C']) * 1024, '0.2f')
        self.zdict['S1_ratio'] = format(float(self.pdict['S1']), '0.2f')
        self.zdict['Old_used'] = format(float(self.pdict['OU']) * 1024, '0.2f')
        self.zdict['Old_max'] = format(float(self.pdict['OC']) * 1024, '0.2f')
        self.zdict['Old_ratio'] = format(float(self.pdict['O']), '0.2f')
        self.zdict['Eden_used'] = format(float(self.pdict['EU']) * 1024, '0.2f')
        self.zdict['Eden_max'] = format(float(self.pdict['EC']) * 1024, '0.2f')
        self.zdict['Eden_ratio'] = format(float(self.pdict['E']), '0.2f')
        # JDK 1.8 has no Perm columns, so the Perm_* values stay at 0.
        # self.zdict['Perm_used'] = format(float(self.pdict['PU']) * 1024,'0.2f')
        # self.zdict['Perm_max'] = format(float(self.pdict['PC']) * 1024,'0.2f')
        # self.zdict['Perm_ratio'] = format(float(self.pdict['P']),'0.2f')
        self.zdict['Heap_used'] = format((float(self.pdict['EU']) + float(self.pdict['S0U']) + float(self.pdict['S1U']) + float(self.pdict['OU'])) * 1024, '0.2f')
        self.zdict['Heap_max'] = format((float(self.pdict['EC']) + float(self.pdict['S0C']) + float(self.pdict['S1C']) + float(self.pdict['OC'])) * 1024, '0.2f')
        self.zdict['Heap_ratio'] = format(float(self.zdict['Heap_used']) / float(self.zdict['Heap_max']) * 100, '0.2f')
        self.zdict['YGC'] = self.pdict['YGC']
        self.zdict['FGC'] = self.pdict['FGC']
        self.zdict['YGCT'] = format(float(self.pdict['YGCT']), '0.3f')
        self.zdict['FGCT'] = format(float(self.pdict['FGCT']), '0.3f')
        self.zdict['GCT'] = format(float(self.pdict['GCT']), '0.3f')
        # Average time per GC; guard against division by zero.
        if self.pdict['YGC'] == '0':
            self.zdict['YGCT_avg'] = '0'
        else:
            self.zdict['YGCT_avg'] = format(float(self.pdict['YGCT']) / float(self.pdict['YGC']), '0.3f')
        if self.pdict['FGC'] == '0':
            self.zdict['FGCT_avg'] = '0'
        else:
            self.zdict['FGCT_avg'] = format(float(self.pdict['FGCT']) / float(self.pdict['FGC']), '0.3f')
        if self.pdict['YGC'] == '0' and self.pdict['FGC'] == '0':
            self.zdict['GCT_avg'] = '0'
        else:
            self.zdict['GCT_avg'] = format(float(self.pdict['GCT']) / (float(self.pdict['YGC']) + float(self.pdict['FGC'])), '0.3f')

    def send_to_zabbix(self, metric):
        # Item key format: java.discovery_status[<jar name>,<metric>],
        # e.g. java.discovery_status[{#JAVA_NAME},Perm_used] in the template.
        key = "java.discovery_status[" + self.pdict['jpname'] + "," + metric + "]"
        # send_to_zabbix below is the module-level flag, not this method.
        if self.pdict['pid'] != "" and send_to_zabbix > 0:
            try:
                subprocess.call([zabbix_sender, "-c", zabbix_conf, "-k", key, "-o", str(self.zdict[metric])], stdout=FNULL, stderr=FNULL, shell=False)
            except OSError as detail:
                print("Something went wrong while executing zabbix_sender : %s" % detail)
        else:
            print("Simulation: the following command would be executed :\n%s -c %s -k %s -o %s\n" % (zabbix_sender, zabbix_conf, key, self.zdict[metric]))


accepted_modes = ['alive', 'all']
if len(sys.argv) == 3 and sys.argv[2] in accepted_modes:
    java_name = sys.argv[1]
    mode = sys.argv[2]
else:
    usage()

# Check if process is running / get PID
jproc = Jprocess(java_name)
pid = jproc.chk_proc()
if pid != "" and mode == 'all':
    jproc.get_jstats()
    jproc.compute_jstats()
    FNULL = open(os.devnull, 'w')
    for key in jproc.zdict:
        jproc.send_to_zabbix(key)
    FNULL.close()
elif pid != "" and mode == 'alive':
    # 'alive' mode: report the pid, as described in usage().
    print(pid)
else:
    print(0)
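jstat_status.py starts one zabbix_sender process per metric. zabbix_sender can also read many values from an input file given with -i, where each line has the form <hostname> <key> <value> and a hostname of - means the host name is taken from the configuration file or the -s option. The sketch below only builds such lines from a metrics dict; build_sender_lines and the sample values are hypothetical, the key format follows the script above, and the approach has not been tested against the template.

```python
def build_sender_lines(java_name, metrics):
    """Build zabbix_sender input-file lines: '<hostname> <key> <value>' per metric."""
    lines = []
    for metric in sorted(metrics):
        key = "java.discovery_status[%s,%s]" % (java_name, metric)
        # "-" asks zabbix_sender to use the host name from its config file or -s.
        lines.append("- %s %s" % (key, metrics[metric]))
    return lines


# Hypothetical sample values, for illustration only.
sample = {"YGC": "74", "FGC": "7"}
print("\n".join(build_sender_lines("manageMiddle.jar", sample)))
```

The resulting text could be written to a file and sent in one call with zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -i followed by the file name.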
The script that triggers the collection is:
# cat java_discovery_status_sender.py
#!/usr/bin/python
# Run jstat_status.py in "all" mode for every microservice jar; meant to be called from cron.
import subprocess
import os
import glob

java_names_file = 'java_names.txt'
names = []
if os.path.isfile(java_names_file):
    args = '''awk -F':' '{print $1":"$2}' %s''' % (java_names_file)
    names = subprocess.check_output(args, shell=True, universal_newlines=True).splitlines()
elif glob.glob('/data/work/service_jar/*.jar'):
    names = subprocess.check_output('cd /data/work/service_jar && ls *.jar|grep jar',
                                    shell=True, universal_newlines=True).splitlines()
for java in names:
    java = java.strip().strip(':')
    if java:
        # Collect and send the statistics for this jar.
        out = subprocess.check_output("python /etc/zabbix/scripts/java/jstat_status.py %s all" % java, shell=True)
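Note that the Host name configured in the web interface must be exactly the same as the Hostname parameter in the agent's configuration file, because zabbix_sender takes the host name from that file (-c) and the server matches incoming values by host name.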
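A quick way to check which host name is configured is to read the Hostname line from the agent configuration file. The following sketch parses configuration text passed in as a string; read_hostname is a hypothetical helper, and the sample configuration is made up for illustration.

```python
def read_hostname(conf_text):
    """Return the value of the first uncommented Hostname= line, or None if absent."""
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == "Hostname":
            return value.strip()
    return None


# Made-up configuration text, for illustration only.
sample_conf = "# Hostname=ignored\nServer=192.0.2.10\nHostname=app-server-01\n"
print(read_hostname(sample_conf))
```

In practice the text would be read from /etc/zabbix/zabbix_agentd.conf, and the result should equal the Host name shown in the web interface.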
Add the script java_discovery_status_sender.py to crontab:
*/1 * * * * root /usr/bin/python /etc/zabbix/scripts/java/java_discovery_status_sender.py
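This runs once a minute and sends the data to the server.
The UserParameter configuration file has the following path and contents: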
# pwd
/etc/zabbix/zabbix_agentd.d
# cat userparameter_java_discovery_status.conf
UserParameter=java.discovery,/usr/bin/python /etc/zabbix/scripts/java/java_discovery.py
UserParameter=java.discovery_status[*],/usr/bin/python /etc/zabbix/scripts/java/jstat_status.py $1 $2
UserParameter=java.discovery_status_sender,/usr/bin/python /etc/zabbix/scripts/java/java_discovery_status_sender.py
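UserParameter=java.discovery_status[*] and UserParameter=java.discovery_status_sender do the same thing in the same way. Both work when run on the agent host, but both fail when called from the server through zabbix_get, and I have not yet found a better solution. That is why the monitoring data is pushed to the server on a schedule from crontab. If you can solve this problem, or know a better approach, please contact me.
Restart the Zabbix agent.
Template download: https://download.csdn.net/download/fjp824/10462387
Import the template and link it to the corresponding hosts.
On the Monitoring > Latest data page you can filter by host and check the monitoring data received by the Zabbix trapper items.
Once the graphs display correctly, the monitoring setup is complete.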