hadoop ganglia configuration


Author: hovlj_1130 | May be reposted freely, provided the repost links to the original source and includes the author information and copyright notice
http://hi.baidu.com/hovlj_1130/blog/item/e8fe89c3e9a67e160ff47755.html

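#Preparation
Scenario:
Install the monitoring tool ganglia on the hadoop cluster machines to collect cpu, memory, disk, process, and network data, and at the same time collect hadoop-related values (metrics)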
server:hadoopbj01~hadoopbj08
hadoop-version:hadoop-0.20.2
ganglia-version:ganglia-3.1.1

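Preparation:
On the machine that will run the ganglia server-side daemon gmetad (called ganglia_m below), run rhn_register (requires a Red Hat registration code) so that yum install can be used; otherwise resolving the dependency packages by hand is very painful.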

step1:(in ganglia_m)
#Install the base dependency packages
yum -y install apr-devel apr-util check-devel cairo-devel pango-devel libxml2-devel \
rpm-build glib2-devel dbus-devel freetype-devel fontconfig-devel gcc-c++ expat-devel \
python-devel libXrender-devel pcre pcre-devel

step2:(in ganglia_m)
yum install libconfuse libconfuse-devel

step3:(in ganglia_m)
#Install RRDTool

wget http://dag.wieers.com/rpm/packages/rpmforge-release/rpmforge-release-0.3.6-1.el5.rf.i386.rpm

rpm -ivh rpmforge-release-0.3.6-1.el5.rf.i386.rpm

This creates some yum repository files in the /etc/yum.repos.d directory

yum install rrdtool

yum install rrdtool-devel

which rrdtool
ldconfig  -p | grep rrd # make sure you have the new rrdtool libraries linked.
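
The library check above can be made scriptable. The following is a minimal sketch; the helper name `has_shared_lib` is an assumption of this example and is not part of rrdtool or ganglia. It reads `ldconfig -p` style output on standard input and succeeds if a library named `lib<NAME>.so*` is listed.

```shell
# has_shared_lib NAME
# Read `ldconfig -p` output on stdin and return 0 if a library called
# libNAME.so* is listed. (Hypothetical helper, for illustration only.)
has_shared_lib() {
    grep -q "^[[:space:]]*lib$1\.so"
}

# Intended use on ganglia_m (not executed here):
#   ldconfig -p | has_shared_lib rrd && echo "librrd is linked"
```

If the helper fails right after installing rrdtool, running `ldconfig` once as root to refresh the linker cache is the usual first thing to try.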

step4:(in ganglia_m: includes sub-steps 0~4)
#Sub-step 0: install ganglia
cd /tmp/
wget -O ganglia-3.1.1.tar.gz http://sourceforge.net/projects/ganglia/files/ganglia%20monitoring%20core/3.1.1%20\(Wien\)/ganglia-3.1.1.tar.gz/download  # -O: without it the file is saved as "download"
tar zxvf ganglia*gz
cd ganglia-3.1.1/
./configure --with-gmetad
make -j8
make install

#Sub-step 1: set up the command-line files
cd /tmp/ganglia-3.1.1/   # you should already be in this directory
mkdir -p /var/www/html/ganglia/  # make sure you have apache installed
cp -a web/* /var/www/html/ganglia/   # this is the web interface
cp gmetad/gmetad.init /etc/rc.d/init.d/gmetad  # startup script
cp gmond/gmond.init /etc/rc.d/init.d/gmond
mkdir /etc/ganglia  # where config files go
gmond -t | tee /etc/ganglia/gmond.conf  # generate initial gmond config
cp gmetad/gmetad.conf /etc/ganglia/  # initial gmetad configuration
mkdir -p /var/lib/ganglia/rrds  # place where RRDTool graphs will be stored
chown nobody:nobody /var/lib/ganglia/rrds  # make sure RRDTool can write here.
chkconfig --add gmetad  # make sure gmetad starts up at boot time
chkconfig --add gmond # make sure gmond starts up at boot time

#Sub-step 2: edit gmond.conf and gmetad.conf under /etc/ganglia
vi gmetad.conf
#Add the machines to monitor and the monitoring group name. The line below creates the monitoring group hadoop-bj, which covers machines hadoopbj01~hadoopbj08
data_source "hadoop-bj" hadoopbj01 hadoopbj02 hadoopbj03 hadoopbj04 hadoopbj05 hadoopbj06 hadoopbj07 hadoopbj08
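
For clusters with many numbered hosts, the data_source line above can be generated rather than typed. This is only a sketch: `make_data_source` is a hypothetical helper name, and it assumes host names consist of a prefix plus a two-digit, zero-padded number, as with hadoopbj01~hadoopbj08.

```shell
# make_data_source GROUP PREFIX FIRST LAST
# Print a gmetad data_source line for hosts PREFIX<FIRST>..PREFIX<LAST>,
# numbered with two digits. FIRST and LAST are plain integers (1, not 01).
# (Hypothetical helper; the naming scheme is an assumption.)
make_data_source() {
    group=$1 prefix=$2 first=$3 last=$4
    line="data_source \"$group\""
    i=$first
    while [ "$i" -le "$last" ]; do
        line="$line $(printf '%s%02d' "$prefix" "$i")"
        i=$((i + 1))
    done
    printf '%s\n' "$line"
}
```

For example, `make_data_source hadoop-bj hadoopbj 1 8` prints the same line as the one shown above.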

vi gmond.conf
#Now edit /etc/ganglia/gmond.conf to name the cluster. The cluster defined above is named "hadoop-bj", so change name="unspecified" to name="hadoop-bj"
#Here udp_send_channel and udp_recv_channel both use the default mcast_join and port; you can of course change them
cluster {
name = "hadoop-bj"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}

udp_send_channel {
mcast_join = 239.2.11.71
port = 8649
ttl = 1
}

udp_recv_channel {
mcast_join = 239.2.11.71
port = 8649
bind = 239.2.11.71
}
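
The cluster name change can also be done with sed instead of vi. The sketch below assumes the layout that `gmond -t` generates, where `cluster {` and its closing `}` start at column 0; `set_cluster_name` is a hypothetical helper name. It only touches the name line inside the cluster block and writes the result to standard output, so the original file is left unchanged.

```shell
# set_cluster_name FILE NAME
# Inside the cluster { ... } block of FILE only, change
#   name = "unspecified"
# to the given NAME and print the whole file to stdout.
# NAME must not contain '/' or '&', which are special to sed here.
# (Hypothetical helper, for illustration only.)
set_cluster_name() {
    sed "/^cluster[[:space:]]*{/,/^}/ s/^\([[:space:]]*\)name = \"unspecified\"/\1name = \"$2\"/" "$1"
}

# Intended use (not executed here):
#   set_cluster_name /etc/ganglia/gmond.conf hadoop-bj > /tmp/gmond.conf.new
```

Writing to a new file first makes it easy to diff the result against the original before copying it into place.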

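#Sub-step 3: watch out for multi-homed machines
#In my cluster, eth0 carries the system's public IP address, but the monitoring server talks to the nodes on the private cluster network through eth1. I need to make sure the multicast traffic Ganglia uses is bound to eth1. This is done by creating the file /etc/sysconfig/network-scripts/route-eth1 with the content 239.2.11.71 dev eth1.
#You can then restart networking with service network restart and confirm that the routing table shows this IP going through eth1. Note: use 239.2.11.71 because it is ganglia's default multicast channel. If you use a different channel or add more channels, change it accordingly.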
touch /etc/sysconfig/network-scripts/route-eth1
echo "239.2.11.71 dev eth1">>/etc/sysconfig/network-scripts/route-eth1
service network restart
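
Because `>>` appends on every run, repeating the echo command above would add the same route entry again. A small guard avoids that. This is a sketch only, and `append_once` is a hypothetical helper name.

```shell
# append_once FILE LINE
# Append LINE to FILE only if FILE does not already contain an identical
# line. The file is created if it does not exist.
# (Hypothetical helper, for illustration only.)
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Intended use (not executed here):
#   append_once /etc/sysconfig/network-scripts/route-eth1 "239.2.11.71 dev eth1"
```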

#Sub-step 4: start it on the management server
service gmond start
service gmetad start
service httpd restart
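
With the default configuration generated by `gmond -t`, gmond answers TCP connections on port 8649 with an XML dump of the cluster state, which gives a quick check before opening the web page. The sketch below lists the hosts found in such a dump; `list_reporting_hosts` is a hypothetical helper name, and it assumes each host appears as a `<HOST NAME="...">` element.

```shell
# list_reporting_hosts
# Read gmond XML on stdin and print each HOST NAME once, sorted.
# Assumes NAME is the first attribute of the HOST element.
# (Hypothetical helper, for illustration only.)
list_reporting_hosts() {
    grep -o '<HOST NAME="[^"]*"' | sed 's/^<HOST NAME="//; s/"$//' | sort -u
}

# Intended use, assuming nc (netcat) is installed (not executed here):
#   nc localhost 8649 | list_reporting_hosts
```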

step5:(in ganglia_m)
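#Use a script to configure the nodes to be monitored. Note that here ganglia_m is also hadoopbj01, where gmond is already configured, so the script below only needs to configure hadoopbj02~hadoopbj08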
touch /tmp/mynodes
vi  /tmp/mynodes
hadoopbj02
hadoopbj03
hadoopbj04
hadoopbj05
hadoopbj06
hadoopbj07
hadoopbj08

touch /tmp/configure-all-ganglia
vi /tmp/configure-all-ganglia
for i in `cat /tmp/mynodes`; do 
scp /usr/sbin/gmond $i:/usr/sbin/gmond
ssh $i "mkdir -p /etc/ganglia/"
scp /etc/ganglia/gmond.conf $i:/etc/ganglia/
scp /etc/init.d/gmond $i:/etc/init.d/
scp /usr/lib64/libganglia-3.1.1.so.0 $i:/usr/lib64/
scp /lib64/libexpat.so.0 $i:/lib64/
scp /usr/lib64/libconfuse.so.0 $i:/usr/lib64/
scp /usr/lib64/libapr-1.so.0 $i:/usr/lib64/
scp -r /usr/lib64/ganglia $i:/usr/lib64/
scp /etc/sysconfig/network-scripts/route-eth1 $i:/etc/sysconfig/network-scripts/
ssh $i "service network restart"
ssh $i "service gmond start"
ssh $i "chkconfig --add gmond"
done
sh /tmp/configure-all-ganglia
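
A slightly more defensive way to walk the node list is sketched below. `for_each_node` is a hypothetical helper name; it skips blank lines and `#` comments in the list and redirects each command's standard input from /dev/null, because a command such as ssh would otherwise read the rest of the node list from the loop's input.

```shell
# for_each_node FILE CMD [ARG...]
# Run CMD once per host listed in FILE, with the host name appended as the
# last argument. Blank lines and lines starting with '#' are skipped.
# (Hypothetical helper, for illustration only.)
for_each_node() {
    file=$1
    shift
    while IFS= read -r node; do
        case $node in
            ''|'#'*) continue ;;
        esac
        # stdin from /dev/null so CMD cannot consume the node list
        "$@" "$node" < /dev/null
    done < "$file"
}

# Preview which hosts would be touched (not executed here):
#   for_each_node /tmp/mynodes echo would-configure
```

To do real work, the scp and ssh lines of the script above could be wrapped in a shell function that takes the host name as `$1` and passed to `for_each_node` in place of `echo`.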

#Check that it is running
http://hadoopbj01/ganglia/

step6:(in ganglia_m)
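#We use hadoop-0.20.2 here; collecting hadoop data with ganglia-3.1.1 requires the patch from HADOOP-4675
#Below we apply the patch to hadoop
#Test the patching, and the ant package rebuild below, in a test environment first; once it passes, copy the generated hadoop core jar to the production environment and replace the old one.
#Download the attachment HADOOP-4675-v9.patch from https://issues.apache.org/jira/browse/HADOOP-4675 (running wget on this URL fetches only the issue's HTML page, not the patch)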
cp HADOOP-4675-v9.patch HADOOP-4675-v9.patch.bak
vi HADOOP-4675-v9.patch
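#The downloaded patch needs some editing; the details are shown below. After editing, compile to produce a new hadoop core jar that contains the new org.apache.hadoop.metrics.ganglia.GangliaContext31 class,
#which wraps org.apache.hadoop.metrics.ganglia.GangliaContext;
#(Why can't the downloaded patch be applied directly? What has everyone else seen? This patch cost me a lot of time.)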
diff HADOOP-4675-v9.patch HADOOP-4675-v9.patch.bak 
237c237
< Index: src/core/org/apache/hadoop/metrics/ganglia/GangliaContext.java
---
> Index: src/java/org/apache/hadoop/metrics/ganglia/GangliaContext.java
239,240c239,240
< --- src/core/org/apache/hadoop/metrics/ganglia/GangliaContext.java    (revision 771522)
< +++ src/core/org/apache/hadoop/metrics/ganglia/GangliaContext.java    (working copy)
---
> --- src/java/org/apache/hadoop/metrics/ganglia/GangliaContext.java    (revision 771522)
> +++ src/java/org/apache/hadoop/metrics/ganglia/GangliaContext.java    (working copy)
325c325
< Index: src/core/org/apache/hadoop/metrics/ganglia/GangliaContext31.java
---
> Index: src/java/org/apache/hadoop/metrics/ganglia/GangliaContext31.java
327,328c327,328
< --- src/core/org/apache/hadoop/metrics/ganglia/GangliaContext31.java    (revision 0)
< +++ src/core/org/apache/hadoop/metrics/ganglia/GangliaContext31.java    (revision 0)
---
> --- src/java/org/apache/hadoop/metrics/ganglia/GangliaContext31.java    (revision 0)
> +++ src/java/org/apache/hadoop/metrics/ganglia/GangliaContext31.java    (revision 0)
cd $HADOOP_HOME
patch -p0 < HADOOP-4675-v9.patch
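
The hand edits to the patch shown in the diff above can also be scripted. The sketch below assumes, as that diff suggests, that the only lines needing a change are the ones that reference the ganglia package under src/java/; `retarget_patch` is a hypothetical helper name.

```shell
# retarget_patch
# Read a patch on stdin and rewrite the ganglia source paths from src/java/
# (the layout the patch was made against) to src/core/ (the layout of the
# hadoop-0.20.2 tree). The result goes to stdout.
# (Hypothetical helper, for illustration only.)
retarget_patch() {
    sed 's#src/java/org/apache/hadoop/metrics/ganglia/#src/core/org/apache/hadoop/metrics/ganglia/#'
}

# Intended use (not executed here):
#   retarget_patch < HADOOP-4675-v9.patch.bak > HADOOP-4675-v9.patch
#   patch -p0 --dry-run < HADOOP-4675-v9.patch   # GNU patch: check without changing files
```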
vi build.xml
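#Comment out lines 904 and 908
#(I am not familiar with ant. A colleague said it does not matter that lines 904 and 908 fail to build, but why? Ideally you would write your own build.xml that produces a hadoop core jar containing the org.apache.hadoop.metrics.ganglia.GangliaContext31 class. I do not know how to write one, so I used the bundled build.xml.)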
<target name="forrest.check" unless="forrest.home" depends="java5.check">
<!--fail message="'forrest.home' is not defined. Please pass -Dforrest.home=&lt;base of Apache Forrest installation&gt; to Ant on the command-line." /-->
</target>

<target name="java5.check" unless="java5.home">
<!--fail message="'java5.home' is not defined.  Forrest requires Java 5.  Please pass -Djava5.home=&lt;base of Java 5 distribution&gt; to Ant on the command-line." /-->
</target>
ant package
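#After ant package succeeds, a build directory is created; the newly compiled hadoop core jar is inside it
#Replace the original jar with the newly built hadoop core jar that contains the org.apache.hadoop.metrics.ganglia.GangliaContext31 class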
mv $HADOOP_HOME/hadoop-0.20.2-core.jar $HADOOP_HOME/hadoop-0.20.2-core.jar.bak  # mv, not cp, so the old jar no longer sits next to the new one
cp $HADOOP_HOME/build/hadoop-0.20.3-dev-core.jar $HADOOP_HOME/
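
Before restarting anything, it is worth confirming that the new jar really contains the patched class. The sketch below checks a jar listing for a class name; `jar_has_class` is a hypothetical helper name, and the listing is expected on standard input with one entry per line, as printed by `jar tf` or `unzip -Z1`.

```shell
# jar_has_class CLASS
# Read a jar listing (one entry per line) on stdin and return 0 if the
# class with the given dotted name is present.
# (Hypothetical helper, for illustration only.)
jar_has_class() {
    entry="$(printf '%s' "$1" | tr . /).class"
    grep -qxF -- "$entry"
}

# Intended use (not executed here):
#   jar tf $HADOOP_HOME/hadoop-0.20.3-dev-core.jar \
#     | jar_has_class org.apache.hadoop.metrics.ganglia.GangliaContext31 \
#     && echo "GangliaContext31 is in the jar"
```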

#Update hadoop-metrics.properties to collect hadoop data with the new GangliaContext31 class
vi $HADOOP_HOME/conf/hadoop-metrics.properties
# Configuration of the "dfs" context for ganglia
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=239.2.11.71:8649

# Configuration of the "mapred" context for null
#mapred.class=org.apache.hadoop.metrics.spi.NullContext

# Configuration of the "mapred" context for file
#mapred.class=org.apache.hadoop.metrics.file.FileContext
#mapred.period=10
#mapred.fileName=/tmp/mrmetrics.log

# Configuration of the "mapred" context for ganglia
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.servers=239.2.11.71:8649


# Configuration of the "jvm" context for null
#jvm.class=org.apache.hadoop.metrics.spi.NullContext

# Configuration of the "jvm" context for file
#jvm.class=org.apache.hadoop.metrics.file.FileContext
#jvm.period=10
#jvm.fileName=/tmp/jvmmetrics.log

# Configuration of the "jvm" context for ganglia
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=239.2.11.71:8649

rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rpc.period=10
rpc.servers=239.2.11.71:8649
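
The four ganglia stanzas above differ only in the context name, so they can be generated from one template. This is a sketch, not a replacement for the file shipped with hadoop; `emit_ganglia_contexts` is a hypothetical helper name.

```shell
# emit_ganglia_contexts SERVERS PERIOD CONTEXT...
# Print class/period/servers properties for each metrics context, using
# the GangliaContext31 class added by the patch.
# (Hypothetical helper, for illustration only.)
emit_ganglia_contexts() {
    servers=$1 period=$2
    shift 2
    for ctx in "$@"; do
        printf '%s.class=org.apache.hadoop.metrics.ganglia.GangliaContext31\n' "$ctx"
        printf '%s.period=%s\n' "$ctx" "$period"
        printf '%s.servers=%s\n\n' "$ctx" "$servers"
    done
}

# Intended use (not executed here):
#   emit_ganglia_contexts 239.2.11.71:8649 10 dfs mapred jvm rpc
```

Keeping the multicast address and port in one place makes it harder for a context to drift out of sync with the udp_send_channel settings in gmond.conf.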

#Restart the hadoop cluster
#namenode
$HADOOP_HOME/bin/stop-dfs.sh
#jobtracker
$HADOOP_HOME/bin/stop-mapred.sh
#namenode
$HADOOP_HOME/bin/start-dfs.sh
#jobtracker
$HADOOP_HOME/bin/start-mapred.sh
#Restart the ganglia cluster
service gmond restart
service gmetad restart
service httpd restart
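
One way to confirm that hadoop metrics are arriving is to look for new RRD files, since gmetad stores one file per metric under its rrds directory, in cluster/host subdirectories. The sketch below assumes that the hadoop metric files start with the context names configured above (dfs, mapred, jvm, rpc); that naming is an assumption, and `list_hadoop_rrds` is a hypothetical helper name.

```shell
# list_hadoop_rrds DIR
# Print the RRD files under DIR whose names start with one of the four
# configured context prefixes. The prefix convention is an assumption.
# (Hypothetical helper, for illustration only.)
list_hadoop_rrds() {
    find "$1" -type f \( -name 'dfs.*.rrd' -o -name 'mapred.*.rrd' \
        -o -name 'jvm.*.rrd' -o -name 'rpc.*.rrd' \) | sort
}

# Intended use (not executed here):
#   list_hadoop_rrds /var/lib/ganglia/rrds/hadoop-bj
```

An empty result a few minutes after the restart suggests checking the servers address in hadoop-metrics.properties and the multicast route first.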

#If nothing went wrong, ganglia can now collect hadoop metrics.

Reference URLs:
#http://wiki.yepn.net/ganglia#%E9%85%8D%E7%BD%AE_node%E7%9A%84hadoop%E7%9B%91%E6%8E%A7
#http://www.pginjp.org/modules/newbb/viewtopic.php?topic_id=1235&forum=22
#http://www.pginjp.org/modules/newbb/viewtopic.php?topic_id=1234&forum=7
#http://www.ibm.com/developerworks/cn/linux/l-ganglia-nagios-1/index.html
#sed usage http://hi.baidu.com/hovlj_1130/blog/item/e9721b7b31e8dbe02e73b3b2.html
