Environment:
Operating system
CentOS 6.0
The Hadoop cluster consists of 3 servers
server01 -> master 192.168.255.128
server02 -> slave 192.168.255.130
server03 -> slave 192.168.255.131
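Software repository: EPEL
Use the ganglia packages from the EPEL repository directly (building and installing from source is a bit of a hassle).
1. Install the EPEL repository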
wget http://download.fedora.redhat.com/pub/epel/6/x86_64/epel-release-6-5.noarch.rpm -P /usr/local/src
rpm -ivh /usr/local/src/epel-release-6-5.noarch.rpm
rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
2. Install gmetad and gmond on the ganglia server
yum install ganglia ganglia-devel ganglia-gmetad ganglia-gmond ganglia-web ganglia-gmond-python
The required dependencies are installed automatically.
3. The other servers (acting as clients) only need gmond
yum install ganglia ganglia-gmond
4. Configure ganglia's gmetad
cd /etc/ganglia
vi gmetad.conf
data_source "ganglia_hadoop" 192.168.255.128 192.168.255.130 192.168.255.131
Only the data_source line needs to be changed. Its general form is:
data_source "name" ip01:port01 ip02:port02 ...
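Note: the addresses that follow the name are the hosts to be monitored, and the number after each colon is the port that host's gmond listens on (default 8649, used when no port is given).
Start the service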
service gmetad start
chkconfig gmetad on
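The data_source line above relies on the default port. As a minimal sketch, assuming you want the port spelled out for every host, a small helper can build the line from a host list. The function name build_data_source is made up for this illustration and is not part of ganglia.

```shell
#!/bin/sh
# Hypothetical helper: print a gmetad data_source line with an
# explicit port on every host.
# Usage: build_data_source NAME PORT HOST...
build_data_source() {
  name=$1
  port=$2
  shift 2
  line="data_source \"$name\""
  for host in "$@"; do
    line="$line $host:$port"
  done
  printf '%s\n' "$line"
}

# The three servers of this cluster on the default gmond port.
build_data_source ganglia_hadoop 8649 \
  192.168.255.128 192.168.255.130 192.168.255.131
```

The printed line can be pasted into /etc/ganglia/gmetad.conf in place of the existing data_source line.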
5. Configure the gmond client on all servers (using multicast)
vi /etc/ganglia/gmond.conf
cluster {
name = "ganglia_hadoop"
...
Just set the cluster name to the name used in the data_source line of gmetad.conf.
Start the service
service gmond start
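Since the same edit has to be made on every server, it can be scripted instead of done by hand in vi. The following is only a sketch: set_cluster_name is a hypothetical helper, and it assumes the cluster block starts with "cluster {" at the beginning of a line and ends with a "}" at the beginning of a line, as in the packaged gmond.conf.

```shell
#!/bin/sh
# Hypothetical helper: set the name inside the cluster { ... } block
# of a gmond.conf file, leaving "name" entries in other blocks alone.
# Usage: set_cluster_name FILE NAME
# NAME must not contain "/" or "&" (they are special to sed).
set_cluster_name() {
  conf=$1
  name=$2
  tmp=$(mktemp)
  sed '/^cluster[[:space:]]*{/,/^}/ s/^\([[:space:]]*name[[:space:]]*=[[:space:]]*\).*/\1"'"$name"'"/' \
    "$conf" > "$tmp" && cat "$tmp" > "$conf"
  rm -f "$tmp"
}

# Example (run as root on each server, then restart gmond):
#   set_cluster_name /etc/ganglia/gmond.conf ganglia_hadoop
#   service gmond restart
```

Writing through a temporary file and copying the contents back keeps the original file's owner and permissions.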
6. Configure nginx
vi /usr/local/nginx/conf/vhosts/ganglia.conf
server
{
listen 80;
server_name your_domain;
index index.html index.htm index.php;
root /usr/share/ganglia;
location ~ ^(.*)\/\.svn\/
{
deny all;
}
location ~ .*\.(php|php5)?$
{
# fastcgi_pass unix:/tmp/php-cgi.sock;
fastcgi_pass php_server01;
fastcgi_index index.php;
include fcgi.conf;
}
location ~ .*\.(gif|jpg|jpeg|png|bmp|swf)$
{
expires 30d;
access_log off;
}
location ~ .*\.(js|css)?$
{
expires 1h;
access_log off;
}
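# log_format is only allowed in the http context, so define it in the http block of nginx.conf rather than here:
# log_format ganglia '$remote_addr - $remote_user [$time_local] [$request_time] "$request" '
# '$status $body_bytes_sent "$http_referer" '
# '"$http_user_agent" $http_x_forwarded_for';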
access_log off;
}
The document root is /usr/share/ganglia
You can also use nginx to require a username and password and to restrict access by IP address.
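One way to do this, sketched here under assumptions, is to generate a small snippet of nginx directives and include it inside the server block. The helper name write_access_snippet, the snippet path, and the password file path are all hypothetical; allow, deny, auth_basic and auth_basic_user_file are standard nginx directives. The password file has to be created separately, for example with the htpasswd tool.

```shell
#!/bin/sh
# Hypothetical helper: write nginx access-control directives to a file
# that can be pulled into the server block with "include".
# Usage: write_access_snippet OUTPUT_FILE ALLOWED_SUBNET PASSWORD_FILE
write_access_snippet() {
  out=$1
  subnet=$2
  passwd_file=$3
  cat > "$out" <<EOF
# Allow only the given subnet, deny everyone else.
allow $subnet;
deny all;
# Require a username and password (HTTP basic auth).
auth_basic "ganglia";
auth_basic_user_file $passwd_file;
EOF
}

# Example (paths are assumptions, adjust to your layout):
#   write_access_snippet /usr/local/nginx/conf/ganglia_access.inc \
#     192.168.255.0/24 /usr/local/nginx/conf/ganglia.htpasswd
# then add inside the server block of ganglia.conf:
#   include /usr/local/nginx/conf/ganglia_access.inc;
```

Placing the directives at server level rather than in a single location means they also apply to the PHP and static-file locations.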
Visit http://your_domain
It reports an error:
Notice: Undefined variable: private in /usr/share/ganglia/auth.php on line 27
This happens because my php-fpm runs as the user nobody. auth.php calls fopen on the file private_clusters, which is a symlink to /etc/ganglia/private_clusters. Check who owns that file:
ls -l /etc/ganglia/private_clusters
-rw-r----- 1 root apache 1222 Feb 17 2010 /etc/ganglia/private_clusters
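The file is readable only by root and the group apache, so php-fpm running as nobody cannot open it. Change the group to the one php-fpm runs as: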
chown root:nobody /etc/ganglia/private_clusters
7. Monitor Hadoop
I am using hadoop-0.20.205.0.tar.gz. In this version the metrics configuration file used for ganglia has changed to hadoop-metrics2.properties
Edit the configuration file
vi $HADOOP_HOME/conf/hadoop-metrics2.properties
# for Ganglia 3.1 support
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
# default for supportsparse is false
*.sink.ganglia.supportsparse=true
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
namenode.sink.ganglia.servers=239.2.11.71:8649
datanode.sink.ganglia.servers=239.2.11.71:8649
jobtracker.sink.ganglia.servers=239.2.11.71:8649
tasktracker.sink.ganglia.servers=239.2.11.71:8649
maptask.sink.ganglia.servers=239.2.11.71:8649
reducetask.sink.ganglia.servers=239.2.11.71:8649
All that is needed is to uncomment the relevant lines in the ganglia section of the file.
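The six servers lines differ only in their daemon prefix, so they can be generated rather than typed. This is a small sketch; ganglia_sink_lines is a hypothetical helper, and the address passed in is the multicast address and port used in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: print one *.sink.ganglia.servers line for each
# Hadoop daemon prefix used in hadoop-metrics2.properties.
# Usage: ganglia_sink_lines ADDRESS:PORT
ganglia_sink_lines() {
  addr=$1
  for prefix in namenode datanode jobtracker tasktracker maptask reducetask; do
    printf '%s.sink.ganglia.servers=%s\n' "$prefix" "$addr"
  done
}

# Same multicast address and port as in the listing above.
ganglia_sink_lines 239.2.11.71:8649
```

If gmond is ever switched to a different multicast address or port, regenerating the lines keeps all six entries consistent.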
Note: which of the following two lines stays commented out depends on your ganglia version
# for Ganglia 3.0 support
# *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink30
#
# for Ganglia 3.1 support
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
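The choice between the two classes can be written down as a small lookup. This is a sketch with a hypothetical helper name, ganglia_sink_class; it assumes that anything other than a 3.0.x release should use the 3.1 sink. The version string has to be supplied by you, for example from the output of rpm -q ganglia-gmond.

```shell
#!/bin/sh
# Hypothetical helper: map a ganglia version string to the Hadoop
# metrics2 sink class named in hadoop-metrics2.properties.
# Assumption: every version that is not 3.0.x uses the 3.1 sink.
# Usage: ganglia_sink_class VERSION
ganglia_sink_class() {
  case $1 in
    3.0|3.0.*)
      echo org.apache.hadoop.metrics2.sink.ganglia.GangliaSink30 ;;
    *)
      echo org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31 ;;
  esac
}

# Example: print the property line for a 3.1.x installation.
printf '*.sink.ganglia.class=%s\n' "$(ganglia_sink_class 3.1.7)"
```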
The hadoop-metrics2.properties file must be changed on every server in the Hadoop cluster
Restart Hadoop
stop-all.sh
start-all.sh
8. Open the ganglia monitoring page; the Hadoop metrics should now appear
For example: dfs.datanode metrics
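The same check can be made from the command line, because gmond serves the cluster state as XML to any client that connects to its TCP port (8649 by default). The sketch below counts metrics whose name starts with a given prefix. count_metrics is a hypothetical helper, it assumes one METRIC element per line of XML, and the exact metric names Hadoop publishes may differ from the "dfs." prefix used in the example.

```shell
#!/bin/sh
# Hypothetical helper: count the lines of gmond XML (read from stdin)
# that contain a METRIC element whose name starts with PREFIX.
# Usage: count_metrics PREFIX < gmond.xml
count_metrics() {
  # -F treats the pattern as a fixed string, -c prints the line count.
  grep -cF "<METRIC NAME=\"$1" || true
}

# Example (needs nc and a running gmond on the master):
#   nc 192.168.255.128 8649 | count_metrics dfs.
```

A count of 0 after the restart suggests that the sink settings in hadoop-metrics2.properties are not taking effect on that host.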