Ganglia Monitoring for Hadoop: A Complete Deployment Guide

Environment of the cluster used for this Ganglia installation:
Linux version:

[root@cloud0 hadoop]# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.3.1611 (Core)
Release:        7.3.1611
Codename:       Core

If the command is not available:

[root@cloud0 hadoop]# lsb_release -a
bash: lsb_release: command not found...

Install it with:

[root@cloud0 hadoop]# yum install lsb

Hadoop version:

[root@cloud0 hadoop]# hadoop version
Hadoop 2.7.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b165c4fe8a74265c792ce23f546c64604acf0e41
Compiled by jenkins on 2016-01-26T00:08Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /usr/local/hadoop/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar

Test cluster: each machine has two network interfaces, one internal and one external; make sure the external interface is up.
Bring up an interface with: ifup <interface name>
Make sure every machine can reach the Internet; a quick check is shown below.
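For example (the interface name ens33 is only an example; substitute your external NIC's name):

[root@cloud0 ~]# ifup ens33                      # bring up the external interface (example name)
[root@cloud0 ~]# ping -c 3 dl.fedoraproject.org  # confirm outbound connectivity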

(If any explanation here is wrong, let's discuss it in the comments. Thanks!)

Installation layout:
Server 1 (master): install gmond, gmetad, and the web front end
Server 2 (slave1): install gmond only
Server 3 (slave2): install gmond only
Server n (slaven): install gmond only

Cluster used in this test: cloud0 is the master (gmond, gmetad, web); cloud2, cloud3, and cloud4 are slaves (gmond only).

Installation procedure:
First, install EPEL on every machine. EPEL is an extra yum repository that carries many packages absent from the base repos; without it, yum cannot find the Ganglia packages. Any of the following three methods works:
Method 1:
wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -vih epel-release-latest-7.noarch.rpm
Method 2:
rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Method 3:
yum install epel-release

Step 1: Enable the EPEL yum repository
On cloud0:

[root@cloud0 hadoop]# wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
[root@cloud0 hadoop]# rpm  -vih epel-release-latest-7.noarch.rpm

On cloud2:

[root@cloud2 hadoop]# wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
[root@cloud2 hadoop]# rpm  -vih epel-release-latest-7.noarch.rpm

On cloud3:

[root@cloud3 hadoop]# wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
[root@cloud3 hadoop]# rpm  -vih epel-release-latest-7.noarch.rpm

On cloud4:

[root@cloud4 hadoop]# wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
[root@cloud4 hadoop]# rpm  -vih epel-release-latest-7.noarch.rpm

Check that the EPEL repo was installed successfully:
If epel/x86_64 appears among the repo ids, the install succeeded.

[root@cloud0 ~]# yum repolist
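What success looks like, for example (the package count varies over time and is shown here only as a placeholder):

[root@cloud0 ~]# yum repolist | grep epel
epel/x86_64    Extra Packages for Enterprise Linux 7 - x86_64    <package count>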

Step 2: Install Ganglia (run as root)
a: On cloud0 (the master additionally gets gmetad and the web front end):

[root@cloud0 ~]# yum -y install ganglia-gmetad
[root@cloud0 ~]# yum -y install ganglia-web

b: On cloud0, cloud2, cloud3, and cloud4 (every node gets gmond):

[root@cloud0 ~]# yum -y install ganglia-gmond
[root@cloud2 ~]# yum -y install ganglia-gmond
[root@cloud3 ~]# yum -y install ganglia-gmond
[root@cloud4 ~]# yum -y install ganglia-gmond

Step 3: Configuration
File overview:
gmetad.conf - configured on the collector (master); lists which machines to monitor.
gmond.conf - configures each monitored machine's gmond agent.

a: Edits on cloud0

[root@cloud0 ~]# vim /etc/ganglia/gmetad.conf
# change
# data_source "my cluster" localhost
# to
data_source "MyCluster_TEST" cloud0 cloud2 cloud3 cloud4
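For reference, a data_source line also accepts an optional polling interval (in seconds) and an explicit port per host, so a more explicit equivalent would be (the 15-second interval here is illustrative):

# data_source "cluster name" [polling interval in seconds] host1[:port] host2[:port] ...
data_source "MyCluster_TEST" 15 cloud0:8649 cloud2:8649 cloud3:8649 cloud4:8649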

[root@cloud0 ~]# vim /etc/httpd/conf.d/ganglia.conf

Order deny,allow
# Deny from all              <-- comment out this line
Allow from all

# Require local
# Require ip 10.1.2.3
# Require host example.org

Apache's main configuration file, httpd.conf, is modified as follows:

[root@cloud0 ~]# vim /etc/httpd/conf/httpd.conf

#AllowOverride none          <-- comment out these two directives
#Require all denied

Options FollowSymLinks       <-- and use these instead
AllowOverride None
Order deny,allow
allow from all

Note: without these changes, the web UI reports "403 Forbidden" when you try to open the graphical interface.

b: Edits on cloud0, cloud2, cloud3, and cloud4

[root@cloud0 ~]# vi /etc/ganglia/gmond.conf
cluster {
  name = "MyCluster_TEST"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

Note: the name here must match the data_source name configured in /etc/ganglia/gmetad.conf.
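For context, the stock EPEL gmond.conf exchanges metrics over multicast, which is why no host list needs editing here; the relevant default channel sections look like this (these are the packaged defaults, not values this guide changes):

udp_send_channel {
  mcast_join = 239.2.11.71   # default multicast group
  port = 8649
  ttl = 1
}
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}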

Step 4: Start the services and enable them at boot
a: On cloud0, start gmetad, gmond, and Apache:

[root@cloud0 ~]# service gmetad start  
[root@cloud0 ~]# service gmond  start  
[root@cloud0 ~]# service httpd  start  

Check that they started successfully:

[root@cloud0 ~]# service gmetad status
...active (running) ...
[root@cloud0 ~]# service gmond status
...active (running)...
[root@cloud0 ~]# service httpd status
... active (running) ...

Enable them at boot:

[root@cloud0 ~]# chkconfig gmetad on
[root@cloud0 ~]# chkconfig gmond on
[root@cloud0 ~]# systemctl enable httpd.service  
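Note that on CentOS 7 the service and chkconfig commands above are simply forwarded to systemd, so the native equivalents would be:

[root@cloud0 ~]# systemctl start gmetad gmond httpd
[root@cloud0 ~]# systemctl enable gmetad gmond httpd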

b: On cloud2, cloud3, and cloud4:

[root@cloud2 ~]# service gmond  start
[root@cloud3 ~]# service gmond  start
[root@cloud4 ~]# service gmond  start

[root@cloud2 ~]# chkconfig gmond on
[root@cloud3 ~]# chkconfig gmond on
[root@cloud4 ~]# chkconfig gmond on

Finally, open http://service_ip/ganglia in a browser, where service_ip is the address of the master (cloud0).
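A quick sanity check from the master itself (an HTTP 200 is expected once Apache and the Ganglia web app are serving):

[root@cloud0 ~]# curl -sI http://localhost/ganglia/ | head -n 1
HTTP/1.1 200 OK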

A few things to note:
1. The metrics gmetad collects are stored as RRD files under /var/lib/ganglia/rrds/.

2. You can check whether metric data is actually being transmitted with:
tcpdump port 8649
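For example, the following pair of checks confirms both that packets are flowing on the gmond port and that RRD files are being written (the subdirectory name matches the data_source name):

[root@cloud0 ~]# tcpdump -i any -c 5 port 8649   # capture a few metric packets, then exit
[root@cloud0 ~]# ls /var/lib/ganglia/rrds/       # expect a "MyCluster_TEST" subdirectory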




Hadoop configuration:

Configure Hadoop and HBase
1. Configure Hadoop
Edit hadoop-metrics2.properties (in Hadoop's configuration directory). Replace the 78.79.12.9 addresses below with the IP of the machine running gmetad (cloud0 in this deployment):
# syntax: [prefix].[source|sink|jmx].[instance].[options]
# See package.html for org.apache.hadoop.metrics2 for details

*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink

#namenode.sink.file.filename=namenode-metrics.out
#datanode.sink.file.filename=datanode-metrics.out
#jobtracker.sink.file.filename=jobtracker-metrics.out
#tasktracker.sink.file.filename=tasktracker-metrics.out
#maptask.sink.file.filename=maptask-metrics.out
#reducetask.sink.file.filename=reducetask-metrics.out

# Below are for sending metrics to Ganglia
#
# for Ganglia 3.0 support
# *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink30
#
# for Ganglia 3.1 support
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31

*.sink.ganglia.period=10

# default for supportsparse is false
*.sink.ganglia.supportsparse=true

*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40

namenode.sink.ganglia.servers=78.79.12.9:8649
datanode.sink.ganglia.servers=78.79.12.9:8649
jobtracker.sink.ganglia.servers=78.79.12.9:8649
tasktracker.sink.ganglia.servers=78.79.12.9:8649
maptask.sink.ganglia.servers=78.79.12.9:8649
reducetask.sink.ganglia.servers=78.79.12.9:8649
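Since Hadoop 2.7.2 runs YARN rather than the MR1 jobtracker/tasktracker daemons listed above, you will likely also want sink entries for the YARN daemons. The prefixes below are the standard metrics2 prefixes from the stock Hadoop 2.x hadoop-metrics2.properties (again, substitute your own gmetad host's address):

resourcemanager.sink.ganglia.servers=78.79.12.9:8649
nodemanager.sink.ganglia.servers=78.79.12.9:8649
mrappmaster.sink.ganglia.servers=78.79.12.9:8649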

2. Configure HBase
Edit hadoop-metrics.properties (in HBase's conf directory). As above, replace the 10.171.29.191 addresses with your gmetad machine's IP. Where a property such as hbase.class is assigned twice below, the later assignment wins, so the GangliaContext31 settings are the ones that take effect:
# See http://wiki.apache.org/hadoop/GangliaMetrics
# Make sure you know whether you are using ganglia 3.0 or 3.1.
# If 3.1, you will have to patch your hadoop instance with HADOOP-4675
# And, yes, this file is named hadoop-metrics.properties rather than
# hbase-metrics.properties because we're leveraging the hadoop metrics
# package and hadoop-metrics.properties is an hardcoded-name, at least
# for the moment.
#
# See also http://hadoop.apache.org/hbase/docs/current/metrics.html
# GMETADHOST_IP is the hostname (or) IP address of the server on which the ganglia
# meta daemon (gmetad) service is running

# Configuration of the "hbase" context for NullContextWithUpdateThread
# NullContextWithUpdateThread is a null context which has a thread calling
# periodically when monitoring is started. This keeps the data sampled
# correctly.
hbase.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
hbase.period=10

# Configuration of the "hbase" context for file
# hbase.class=org.apache.hadoop.hbase.metrics.file.TimeStampingFileContext
# hbase.fileName=/tmp/metrics_hbase.log

# HBase-specific configuration to reset long-running stats (e.g. compactions)
# If this variable is left out, then the default is no expiration.
hbase.extendedperiod = 3600

# Configuration of the "hbase" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext
hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
hbase.period=10
hbase.servers=10.171.29.191:8649

# Configuration of the "jvm" context for null
jvm.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
jvm.period=10

# Configuration of the "jvm" context for file
# jvm.class=org.apache.hadoop.hbase.metrics.file.TimeStampingFileContext
# jvm.fileName=/tmp/metrics_jvm.log

# Configuration of the "jvm" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=10.171.29.191:8649

# Configuration of the "rpc" context for null
rpc.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
rpc.period=10

# Configuration of the "rpc" context for file
# rpc.class=org.apache.hadoop.hbase.metrics.file.TimeStampingFileContext
# rpc.fileName=/tmp/metrics_rpc.log

# Configuration of the "rpc" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext
rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rpc.period=10
rpc.servers=10.171.29.191:8649

# Configuration of the "rest" context for ganglia
# Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
# rest.class=org.apache.hadoop.metrics.ganglia.GangliaContext
rest.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rest.period=10
rest.servers=10.171.29.191:8649

Restart Hadoop and HBase so the new metrics configuration is picked up; one way to do this is sketched below.
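A minimal restart sketch, assuming the stock Hadoop/HBase control scripts and that HADOOP_HOME and HBASE_HOME point at your installations (run each on the appropriate master node):

# Hadoop (run on the NameNode / ResourceManager host)
$HADOOP_HOME/sbin/stop-yarn.sh && $HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh && $HADOOP_HOME/sbin/start-yarn.sh

# HBase (run on the HBase master host)
$HBASE_HOME/bin/stop-hbase.sh && $HBASE_HOME/bin/start-hbase.sh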




Monitoring the Hadoop cluster (via the older hadoop-metrics.properties)
As an alternative for deployments still using the older metrics1 framework, edit Hadoop's configuration file /etc/hadoop/hadoop-metrics.properties and, following the comments in the file, change three places:
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=30
dfs.servers=192.168.52.105:8649
 
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=30
mapred.servers=192.168.52.105:8649
 
# jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext   (pick one: Ganglia 3.0 above, 3.1 below)
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=30
jvm.servers=192.168.52.105:8649
 
Set every *.servers entry to the IP of the machine where gmetad is installed (192.168.52.105 in this example).
Restart the Hadoop datanode: service hadoop-datanode restart
Restart gmond: service gmond restart
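Once everything is back up, a quick way to confirm that Hadoop metrics are actually reaching Ganglia is to dump gmond's XML feed (served on its TCP accept channel, port 8649 by default; nc is in the nmap-ncat package if missing) and count metric entries:

[root@cloud0 ~]# nc localhost 8649 | grep -c 'METRIC NAME'   # non-zero means metrics are flowing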


