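Ganglia is a monitoring system that monitors and displays status information for the nodes in a cluster, such as CPU, memory and disk utilization, I/O load, and network traffic. It can also present historical data as graphs on PHP web pages.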
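The Ganglia server (gmetad) can collect the data of every client on the same network segment by polling just one of those clients (gmond), because the gmond daemons on a segment exchange their metrics with each other. In the same way, a Ganglia server at a higher level can collect the data of all the clients beneath it by polling a single lower-level server.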
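Because each gmond holds the state of the whole cluster, asking any one of them for its XML report should return every node. A minimal Python sketch of that idea, assuming gmond answers on its default TCP port 8649; `fetch_ganglia_xml` and `cluster_hosts` are hypothetical helper names, not part of Ganglia:

```python
import socket
import xml.etree.ElementTree as ET


def fetch_ganglia_xml(host, port=8649, timeout=5.0):
    """Return the raw XML report a gmond daemon sends to any TCP client.

    Port 8649 is assumed to be gmond's tcp_accept_channel; adjust it if gmond.conf differs.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection once the report is sent
                break
            chunks.append(data)
    # Kept as bytes so the parser honours the encoding declared in the XML header.
    return b"".join(chunks)


def cluster_hosts(xml_report):
    """Map each CLUSTER name in a Ganglia XML report to the HOST names listed under it."""
    root = ET.fromstring(xml_report)
    return {
        cluster.get("NAME"): [host.get("NAME") for host in cluster.iter("HOST")]
        for cluster in root.iter("CLUSTER")
    }
```

With both gmond daemons running in the same cluster, `cluster_hosts(fetch_ganglia_xml("172.25.85.3"))` would be expected to list server2 and server3 even though only server3 was asked.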
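Ganglia relies on a web server to display cluster status, on rrdtool to store data and draw the graphs, on expat for XML parsing, and on libconfuse to parse its configuration files. The Apache httpd server must also support PHP 4 or later, and a few other dependencies are needed as well.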
server2.example.com 172.25.85.2
server3.example.com 172.25.85.3
This Ganglia setup builds on the Nagios installation done previously.
1. Installing and configuring Ganglia
server2 (monitoring server):
yum install rpm-build -y
rpmbuild -tb ganglia-3.4.0.tar.gz ### the first run lists the missing build dependencies, installed below
yum install gcc-c++ python-devel pcre-devel expat-devel apr-devel \
    rrdtool-devel-1.3.8-6.el6.x86_64.rpm \
    libconfuse-2.6-3.el6.x86_64.rpm \
    libconfuse-devel-2.6-3.el6.x86_64.rpm -y
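The first `rpmbuild -tb` run is only there to learn which build dependencies are missing. A small Python sketch that pulls the package names out of that error output, assuming rpmbuild's usual `<dependency> is needed by <package>` line format; `missing_build_deps` is a hypothetical helper:

```python
import re

# Matches lines such as "\tlibconfuse-devel is needed by ganglia-3.4.0-1.x86_64"
# and versioned ones such as "\tapr-devel >= 1.0 is needed by ganglia-3.4.0-1.x86_64".
NEEDED = re.compile(r"^\s*(\S+)(?:\s+[<>=]+\s*\S+)?\s+is needed by\s+\S+")


def missing_build_deps(rpmbuild_output):
    """Return the package names rpmbuild reports as missing, in order, without duplicates."""
    deps = []
    for line in rpmbuild_output.splitlines():
        match = NEEDED.match(line)
        if match and match.group(1) not in deps:
            deps.append(match.group(1))
    return deps
```

The resulting names can then be passed to `yum install`; packages missing from the configured repositories (here rrdtool-devel and libconfuse) still have to be supplied as local RPM files, as in the command above.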
rpmbuild -tb ganglia-3.4.0.tar.gz
rpmbuild -tb ganglia-web-3.4.2.tar.gz
cd /root/rpmbuild/RPMS/noarch
yum install php php-gd -y
rpm -ivh ganglia-web-3.4.2-1.noarch.rpm
cd /root/rpmbuild/RPMS/x86_64
ls
ganglia-devel-3.4.0-1.x86_64.rpm ganglia-gmetad-3.4.0-1.x86_64.rpm ganglia-gmond-3.4.0-1.x86_64.rpm ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm libganglia-3.4.0-1.x86_64.rpm
rpm -ivh *
scp ganglia-gmond-3.4.0-1.x86_64.rpm ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm libganglia-3.4.0-1.x86_64.rpm 172.25.85.3:/root
cd /root
scp libconfuse-2.6-3.el6.x86_64.rpm libconfuse-devel-2.6-3.el6.x86_64.rpm 172.25.85.3:/root
server3:
cd /root
rpm -ivh ganglia-gmond-3.4.0-1.x86_64.rpm \
    ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm \
    libganglia-3.4.0-1.x86_64.rpm \
    libconfuse-2.6-3.el6.x86_64.rpm \
    libconfuse-devel-2.6-3.el6.x86_64.rpm
server2:
vim /etc/ganglia/gmetad.conf
data_source "wei cluster" localhost
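A data_source line names the cluster, optionally gives a polling interval in seconds, and then lists one or more gmond addresses, each with an optional `:port`. A Python sketch that splits such a line, assuming the defaults described in the comments of the stock gmetad.conf (15-second polling, port 8649); `parse_data_source` is a hypothetical helper:

```python
import shlex

DEFAULT_POLL_SECONDS = 15  # assumed gmetad default when no interval is given
DEFAULT_GMOND_PORT = 8649  # assumed default when an address has no :port


def parse_data_source(line):
    """Split a gmetad.conf data_source line into (name, interval, [(host, port), ...])."""
    tokens = shlex.split(line)  # keeps quoted names such as "wei cluster" together
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: %r" % line)
    name, rest = tokens[1], tokens[2:]
    interval = DEFAULT_POLL_SECONDS
    if rest[0].isdigit():  # the optional polling interval comes right after the name
        interval = int(rest[0])
        rest = rest[1:]
    sources = []
    for address in rest:
        host, sep, port = address.partition(":")
        sources.append((host, int(port) if sep else DEFAULT_GMOND_PORT))
    return name, interval, sources
```

For the line above this yields the cluster name "wei cluster", polled every 15 seconds from the gmond on localhost:8649.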
/etc/init.d/gmetad start
vim /etc/ganglia/gmond.conf
/etc/init.d/gmond start
server3:
vim /etc/ganglia/gmond.conf
/etc/init.d/gmond start
server2:
cd /var/www/html/gweb
ls
/etc/init.d/httpd start
Verify in a browser:
http://172.25.85.2/gweb
2. Nagios checks based on Ganglia metrics (server2):
cd /var/lib/ganglia/rrds
cd wei\ cluster/
cd /root
tar zxf ganglia-3.4.0.tar.gz
cd /root/ganglia-3.4.0/contrib
cp check_ganglia.py /usr/local/nagios/libexec/
cd /usr/local/nagios/libexec/
chown nagios.nagios check_ganglia.py
vim check_ganglia.py
ganglia_host = '172.25.85.2'
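# Threshold check, edited so it also handles metrics where lower values are
# worse (e.g. free disk space, free memory), not only ones where higher is worse.
if critical > warning:
    # Higher is worse: alert once the value climbs to a threshold
    if value >= critical:
        print("CHECKGANGLIA CRITICAL: %s is %.2f" % (metric, value))
        sys.exit(2)
    elif value >= warning:
        print("CHECKGANGLIA WARNING: %s is %.2f" % (metric, value))
        sys.exit(1)
    else:
        print("CHECKGANGLIA OK: %s is %.2f" % (metric, value))
        sys.exit(0)
else:
    # Lower is worse (critical <= warning): alert once the value drops to a threshold
    if critical >= value:
        print("CHECKGANGLIA CRITICAL: %s is %.2f" % (metric, value))
        sys.exit(2)
    elif warning >= value:
        print("CHECKGANGLIA WARNING: %s is %.2f" % (metric, value))
        sys.exit(1)
    else:
        print("CHECKGANGLIA OK: %s is %.2f" % (metric, value))
        sys.exit(0)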
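To try the edited threshold rules without a running gmetad, the same logic can be pulled into a pure function. A minimal sketch; `evaluate` and `report` are hypothetical names, not part of check_ganglia.py, and the exit codes are the standard Nagios plugin ones:

```python
import sys

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def evaluate(value, warning, critical):
    """Return (exit_code, label) following the edited check_ganglia.py rules."""
    if critical > warning:
        # Higher is worse, e.g. load_one: alert once the value reaches a threshold.
        if value >= critical:
            return CRITICAL, "CRITICAL"
        if value >= warning:
            return WARNING, "WARNING"
    else:
        # Lower is worse, e.g. disk_free_percent_rootfs or mem_free.
        if value <= critical:
            return CRITICAL, "CRITICAL"
        if value <= warning:
            return WARNING, "WARNING"
    return OK, "OK"


def report(metric, value, warning, critical):
    """Print a plugin-style status line and exit with the matching code."""
    code, label = evaluate(value, warning, critical)
    print("CHECKGANGLIA %s: %s is %.2f" % (label, metric, value))
    sys.exit(code)
```

With `-w 20 -c 10` on disk_free_percent_rootfs, 15% free is a WARNING and 5% free is CRITICAL, which is why the original higher-is-worse comparison alone was not enough.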
./check_ganglia.py -h server2.example.com -m disk_free_percent_rootfs -w 20 -c 10
cd /var/lib/ganglia/rrds/wei\ cluster/server2.example.com ### each .rrd file here is one metric; its name is the value to pass to -m
/usr/local/nagios/libexec/check_ganglia.py -h server2.example.com -m disk_free_percent_rootfs -w 20 -c 10
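The metric names accepted by `-m` can be read off the RRD files gmetad keeps for each host. A Python sketch, assuming one `<metric>.rrd` file per metric in the host directory; `rrd_metrics` is a hypothetical helper:

```python
import os


def rrd_metrics(host_dir):
    """Return the metric names found as *.rrd files in a gmetad host directory."""
    return sorted(
        name[:-len(".rrd")]
        for name in os.listdir(host_dir)
        if name.endswith(".rrd")
    )
```

For example, `rrd_metrics("/var/lib/ganglia/rrds/wei cluster/server2.example.com")` should include disk_free_percent_rootfs and mem_free when those metrics are being collected; the space in the path needs no escaping in Python.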
server2:
cd /usr/local/nagios/etc/objects
vim command.cfg
define command {
    command_name    check_ganglia
    command_line    $USER1$/check_ganglia.py -h $HOSTADDRESS$ -m $ARG1$ -w $ARG2$ -c $ARG3$
}
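Nagios splits a check_command such as `check_ganglia!disk_free_percent_rootfs!20!10` on `!`: the first field selects the command and the remaining fields become $ARG1$, $ARG2$, $ARG3$. A simplified Python sketch of that substitution (real Nagios macro processing handles many more cases); `expand_check` is a hypothetical helper, and the $USER1$ value used below is assumed to be the usual /usr/local/nagios/libexec from resource.cfg:

```python
def expand_check(command_line, check_command, macros):
    """Expand a Nagios-style check_command against a command_line template.

    `macros` supplies non-argument macros such as $USER1$ and $HOSTADDRESS$.
    Returns (command_name, expanded_command_line).
    """
    name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values["$ARG%d$" % i] = arg
    # Replace longer macro names first so that $ARG10$ is not clobbered by $ARG1$.
    for macro in sorted(values, key=len, reverse=True):
        command_line = command_line.replace(macro, values[macro])
    return name, command_line
```

For server4 the disk check would therefore run roughly as `/usr/local/nagios/libexec/check_ganglia.py -h 172.25.85.4 -m disk_free_percent_rootfs -w 20 -c 10`.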
vim host.cfg
define host {
    use          linux-server
    host_name    server4.example.com
    address      172.25.85.4
}
define hostgroup {
    hostgroup_name  linux-servers    ; The name of the hostgroup
    alias           Linux Servers    ; Long name of the group
    members         server2.example.com,server3.example.com    ; Comma separated list of hosts that belong to this group
}
define hostgroup {
    hostgroup_name  ganglia-servers
    alias           ganglia-servers
    members         server4.example.com
}
vim service.cfg
define servicegroup {
    servicegroup_name  ganglia-metrics
    alias              Ganglia Metrics
}
define service {
    use                  ganglia-server
    service_description  Root partition free percent
    check_command        check_ganglia!disk_free_percent_rootfs!20!10
}
define service {
    use                  ganglia-server
    service_description  Free memory
    check_command        check_ganglia!mem_free!50000!30000
}
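Each additional Ganglia metric needs another block of the same shape, so the definitions can be generated from a short list. A Python sketch; `ganglia_services` is a hypothetical helper, and the `ganglia-server` service template it refers to must already be defined in the Nagios configuration:

```python
def ganglia_services(checks, template="ganglia-server"):
    """Render one Nagios service block per (description, metric, warning, critical) tuple."""
    blocks = []
    for description, metric, warning, critical in checks:
        blocks.append(
            "define service {\n"
            "    use                  %s\n"
            "    service_description  %s\n"
            "    check_command        check_ganglia!%s!%s!%s\n"
            "}" % (template, description, metric, warning, critical)
        )
    return "\n".join(blocks) + "\n"
```

Note that for lower-is-worse metrics the warning threshold must stay above the critical one, so that the edited check_ganglia.py takes its "lower is worse" branch.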
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
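The pre-flight check ends with "Total Warnings" and "Total Errors" counts, and the reload below only makes sense when the error count is zero. A Python sketch that runs the check and reads those counts, assuming the paths used in this article; `preflight_counts` and `config_is_valid` are hypothetical helpers:

```python
import re
import subprocess


def preflight_counts(output):
    """Return (warnings, errors) from the summary printed by `nagios -v`."""
    warnings = re.search(r"Total Warnings:\s*(\d+)", output)
    errors = re.search(r"Total Errors:\s*(\d+)", output)
    if warnings is None or errors is None:
        raise ValueError("no pre-flight summary found in nagios -v output")
    return int(warnings.group(1)), int(errors.group(1))


def config_is_valid(nagios="/usr/local/nagios/bin/nagios",
                    cfg="/usr/local/nagios/etc/nagios.cfg"):
    """Run the pre-flight check and report whether it found zero errors."""
    result = subprocess.run([nagios, "-v", cfg], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    _, errors = preflight_counts(result.stdout)
    return errors == 0
```

Warnings (for example a host with no services) do not block a reload, which is why only the error count is checked.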
/etc/init.d/nagios reload
http://172.25.85.2/nagios
server4's resources are now being monitored.