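This setup uses jmx_exporter and Prometheus to collect Hadoop metrics, and Grafana to display them and raise alerts.
1. Install jmx_exporter and create its configuration files
1. Download the jmx_exporter agent jar from
https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.1/jmx_prometheus_javaagent-0.3.1.jar
2. Create the configuration files, one per daemon (namenode.yaml and datanode.yaml)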
# namenode.yaml
startDelaySeconds: 0
hostPort: localhost:1234 # 1234 is the JMX port to use (any unused port will do)
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false

# datanode.yaml
startDelaySeconds: 0
hostPort: localhost:1235 # 1235 is the JMX port to use (any unused port will do)
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
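Since the per-daemon files differ only in the port, they can be generated. The following is a minimal sketch, not part of the original setup: the helper name write_jmx_config is hypothetical, and it writes exactly the five settings shown above.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal jmx_exporter config for one daemon.
#   $1 = output file
#   $2 = JMX port (must match -Dcom.sun.management.jmxremote.port of that daemon)
write_jmx_config() {
  cat > "$1" <<EOF
startDelaySeconds: 0
hostPort: localhost:$2
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
EOF
}

# Example calls (paths and ports as used in this document):
#   write_jmx_config /usr/local/prometheus_jmx_export_0.3.1/namenode.yaml 1234
#   write_jmx_config /usr/local/prometheus_jmx_export_0.3.1/datanode.yaml 1235
```

The function only writes the file; ownership still has to be set with chown afterwards.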
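3. Put the three files above (the jar, namenode.yaml, and datanode.yaml) in /usr/local/prometheus_jmx_export_0.3.1
and run chown -R hadoop:root /usr/local/prometheus_jmx_export_0.3.1
4. Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh (note: ports 1234 and 1235 must match the JMX ports set in the YAML files)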
export HADOOP_NAMENODE_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1234 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9222:/usr/local/prometheus_jmx_export_0.3.1/namenode.yaml"
export HADOOP_DATANODE_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1235 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9322:/usr/local/prometheus_jmx_export_0.3.1/datanode.yaml"
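The two option strings differ only in the JMX port, the agent's HTTP port, and the YAML file name, so they can be built by one function. This is a sketch under that assumption; the names jmx_agent_opts, AGENT_DIR, and AGENT_JAR are hypothetical and do not come from Hadoop.

```shell
#!/bin/sh
# Directory and jar name as used in this document.
AGENT_DIR=/usr/local/prometheus_jmx_export_0.3.1
AGENT_JAR="$AGENT_DIR/jmx_prometheus_javaagent-0.3.1.jar"

# Hypothetical helper: print the JVM options for one daemon.
#   $1 = JMX port, $2 = HTTP port the agent serves /metrics on, $3 = config file name
jmx_agent_opts() {
  printf '%s %s %s %s %s\n' \
    "-Dcom.sun.management.jmxremote.authenticate=false" \
    "-Dcom.sun.management.jmxremote.ssl=false" \
    "-Dcom.sun.management.jmxremote.local.only=false" \
    "-Dcom.sun.management.jmxremote.port=$1" \
    "-javaagent:$AGENT_JAR=$2:$AGENT_DIR/$3"
}

# Example use in hadoop-env.sh:
#   export HADOOP_NAMENODE_JMX_OPTS="$(jmx_agent_opts 1234 9222 namenode.yaml)"
#   export HADOOP_DATANODE_JMX_OPTS="$(jmx_agent_opts 1235 9322 datanode.yaml)"
```

Every daemon that runs on the same host needs its own pair of ports, otherwise the second JVM fails to bind.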
5. Edit $HADOOP_HOME/bin/hdfs and change the namenode and datanode startup options as follows
if [ "$COMMAND" = "namenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_JMX_OPTS $HADOOP_NAMENODE_OPTS"
.......
elif [ "$COMMAND" = "datanode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_JMX_OPTS"
  if [ "$starting_secure_dn" = "true" ]; then
    HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
  else
    HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
  fi
6. Restart the Hadoop DFS cluster. Metrics are then available at http://xxx:9222/metrics on the NameNode host and at http://xxx:9322/metrics on each DataNode host.
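A quick way to confirm that an endpoint really exports data is to count its sample lines. The sketch below assumes the Prometheus text exposition format, in which lines starting with # are HELP/TYPE comments; the function name count_samples is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the sample lines of a /metrics response read from stdin.
# Comment lines (starting with #) and blank lines are skipped.
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Example use (xxx is the host placeholder used in this document):
#   curl -s http://xxx:9222/metrics | count_samples
```

A result of 0, or a curl connection error, usually means the agent did not start with the daemon.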
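2. Install Prometheus and configure it
1. Download the Linux build of Prometheus from https://github.com/prometheus/prometheus/releases/download/v2.3.2/prometheus-2.3.2.linux-amd64.tar.gz to /usr/local/,
extract it, and run chown -R hadoop:root prometheus-2.3.2.linux-amd64
2. Edit the configuration file prometheus.yml and add the jobs below under scrape_configs (note: these entries come from a test environment; every host to be monitored must be listed as a target, and several hosts can share one job. Indentation matters in this file.)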
  - job_name: hadoop-namenode
    static_configs:
      - targets: ['binamenode01:9222']
  - job_name: hadoop-datanode
    static_configs:
      - targets: ['bidatanode01:9322']
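With more than one DataNode, the hosts can go into the targets list of a single job instead of one job per host. The sketch below generates such a stanza; the function name scrape_job is hypothetical, and bidatanode02 in the example is an assumed second host that does not appear in the original setup.

```shell
#!/bin/sh
# Hypothetical helper: print one scrape job for prometheus.yml.
#   $1 = job name, $2 = agent HTTP port, remaining arguments = host names
scrape_job() {
  job=$1
  port=$2
  shift 2
  targets=""
  for host in "$@"; do
    # Append 'host:port', separated by ", " after the first entry.
    targets="${targets:+$targets, }'$host:$port'"
  done
  printf '  - job_name: %s\n    static_configs:\n      - targets: [%s]\n' "$job" "$targets"
}

# Example use (bidatanode02 is an assumed extra host):
#   scrape_job hadoop-datanode 9322 bidatanode01 bidatanode02
```

The output is indented to sit directly under scrape_configs.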
3. Start Prometheus as user hadoop
cd /usr/local/prometheus-2.3.2.linux-amd64
./startPromethous.sh
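The contents of startPromethous.sh are not shown in these notes. The following is only a guess at what such a wrapper might do, assuming it launches the prometheus binary in the background with the prometheus.yml from the same directory; the function name start_prometheus and the log file name prometheus.log are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: start Prometheus in the background.
#   $1 = Prometheus install directory
start_prometheus() {
  prom_home=$1
  if [ ! -x "$prom_home/prometheus" ]; then
    echo "prometheus binary not found in $prom_home" >&2
    return 1
  fi
  cd "$prom_home" || return 1
  # --config.file is the Prometheus 2.x flag for the configuration path.
  nohup ./prometheus --config.file=prometheus.yml > prometheus.log 2>&1 &
  echo "started prometheus, pid $!"
}

# Example use:
#   start_prometheus /usr/local/prometheus-2.3.2.linux-amd64
```

Before starting, ./promtool check config prometheus.yml (promtool ships in the same tarball) can catch indentation mistakes in the configuration.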
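4. Open http://master:9090/targets to check that the targets were added (Prometheus listens on port 9090 by default)
Clicking a target's endpoint link, for example http://binamenode01:9222/metrics, shows its metrics data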
3. Install Grafana and configure it
1. Download Grafana and extract it
cd /usr/local
wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.2.2.linux-amd64.tar.gz
tar -zxvf grafana-5.2.2.linux-amd64.tar.gz
chown -R hadoop:root grafana-5.2.2
2. Start Grafana as user hadoop
cd /usr/local/grafana-5.2.2/bin/
nohup ./grafana-server start &
3. Once started, Grafana is reachable at http://master:3000/ (the default login is admin/admin and the default port is 3000)
4. Connect Grafana to Prometheus
Click Data Sources
Click Add data source, fill in the form, and save
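The same data source can also be created without the UI, through Grafana's HTTP API (POST /api/datasources). The sketch below only builds the JSON body; the function name prometheus_datasource_json is hypothetical, the data source name and URL in the example are assumptions, and the values are not JSON-escaped, so they must not contain double quotes.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON body for a Prometheus data source.
#   $1 = data source name, $2 = Prometheus URL
prometheus_datasource_json() {
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$1" "$2"
}

# Example use with the default admin/admin login:
#   curl -s -u admin:admin -H 'Content-Type: application/json' \
#     -X POST http://master:3000/api/datasources \
#     -d "$(prometheus_datasource_json Prometheus http://master:9090)"
```

With "access":"proxy" the Grafana server, not the browser, connects to Prometheus, so the URL only has to be reachable from the Grafana host.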
4. Configure Grafana alert emails
1. Check whether mailx is installed
rpm -qa | grep mailx
If it is not installed, install it with the following command
yum -y install mailx
2. Edit /usr/local/grafana-5.2.2/conf/defaults.ini
...
#################################### SMTP / Emailing #####################
[smtp]
enabled = true
host = smtp.luckincoffee.com:587
user = [email protected]
# If the password contains # or ; wrap it in triple double quotes, e.g. """QWER123;4!@#$"""
password = xxxxxxx # mailbox password
cert_file =
key_file =
skip_verify = true
from_address = [email protected]
from_name = sys_sender
ehlo_identity =
[emails]
welcome_email_on_sign_up = false
templates_pattern = emails/*.html
...
#################################### Alerting ############################
[alerting]
# Disable alerting engine & UI features
enabled = true
# Makes it possible to turn off alert rule execution but alerting UI is visible
execute_alerts = true
3. Test Grafana email delivery
Edit an email notification and click Test; it should report success
=======================================================================================
Added 2018-08-27:
Hooking up YARN works in much the same way
In ${HADOOP_HOME}/etc/hadoop/yarn-env.sh, add the options that enable metrics and specify the ports
export YARN_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1236 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9422:/usr/local/prometheus_jmx_export_0.3.1/yarn.yaml"
Then edit ${HADOOP_HOME}/bin/yarn
elif [ "$COMMAND" = "resourcemanager" ] ; then
  CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/rm-config/log4j.properties
  CLASS='org.apache.hadoop.yarn.server.resourcemanager.ResourceManager'
  YARN_OPTS="$YARN_OPTS $YARN_JMX_OPTS $YARN_RESOURCEMANAGER_OPTS"
  if [ "$YARN_RESOURCEMANAGER_HEAPSIZE" != "" ]; then
    JAVA_HEAP_MAX="-Xmx""$YARN_RESOURCEMANAGER_HEAPSIZE""m"
  fi
......
elif [ "$COMMAND" = "nodemanager" ] ; then
  CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/nm-config/log4j.properties
  CLASS='org.apache.hadoop.yarn.server.nodemanager.NodeManager'
  YARN_OPTS="$YARN_OPTS $YARN_JMX_OPTS -server $YARN_NODEMANAGER_OPTS"
  if [ "$YARN_NODEMANAGER_HEAPSIZE" != "" ]; then
    JAVA_HEAP_MAX="-Xmx""$YARN_NODEMANAGER_HEAPSIZE""m"
  fi
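Add the yarn.yaml file to /usr/local/prometheus_jmx_export_0.3.1 (the agent reads it when the JVM starts, so it must exist before the restart)
Restart YARN
Edit the configuration file prometheus.yml
  - job_name: yarn
    static_configs:
      - targets: ['binamenode01:9422']
Restart Prometheus and you are done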
=======================================================================================
Added 2018-08-29
Monitoring HBase:
Edit $HBASE_HOME/bin/hbase
In that file, at the line
# figure out which class to run
add:
#======================================= prometheus jmx export start===================================
HBASE_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1237 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9522:/usr/local/prometheus_jmx_export_0.3.1/hbase.yaml"
#======================================= prometheus jmx export end ===================================
......
elif [ "$COMMAND" = "master" ] ; then
  CLASS='org.apache.hadoop.hbase.master.HMaster'
  if [ "$1" != "stop" ] && [ "$1" != "clear" ] ; then
    HBASE_OPTS="$HBASE_OPTS $HBASE_JMX_OPTS $HBASE_MASTER_OPTS"
  fi
elif [ "$COMMAND" = "regionserver" ] ; then
  CLASS='org.apache.hadoop.hbase.regionserver.HRegionServer'
  if [ "$1" != "stop" ] ; then
    HBASE_OPTS="$HBASE_OPTS $HBASE_JMX_OPTS $HBASE_REGIONSERVER_OPTS"
  fi
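Add the hbase.yaml file to /usr/local/prometheus_jmx_export_0.3.1 (the agent reads it when the JVM starts, so it must exist before the restart)
Restart HBase
Edit the configuration file prometheus.yml
  - job_name: hbase
    static_configs:
      - targets: ['binamenode01:9522']
Restart Prometheus and you are done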
=======================================================================================
Added 2018-09-01
Adding Kylin monitoring
Edit kylin.sh and add the following to its startup options
-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1239 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9722:/usr/local/prometheus_jmx_export_0.3.1/kylin.yaml \
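Add the kylin.yaml file to /usr/local/prometheus_jmx_export_0.3.1 (the agent reads it when the JVM starts, so it must exist before the restart)
Restart Kylin
Edit the configuration file prometheus.yml
  - job_name: kylin
    static_configs:
      - targets: ['binamenode01:9722']
Restart Prometheus and you are done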
=======================================================================================
Added 2018-09-01
Adding Hive monitoring
Edit the file
${HIVE_HOME}/conf/hive-env.sh and add the following code
if [ "$SERVICE" = "hiveserver2" ] ; then
  HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1240 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9822:/usr/local/prometheus_jmx_export_0.3.1/hive_hiveserver2.yaml"
fi
if [ "$SERVICE" = "metastore" ] ; then
  HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1241 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9922:/usr/local/prometheus_jmx_export_0.3.1/hive_metastore.yaml"
fi
Add the hive_metastore.yaml and hive_hiveserver2.yaml files to /usr/local/prometheus_jmx_export_0.3.1
Restart the Hive metastore and hiveserver2
Edit the configuration file prometheus.yml
  - job_name: hive
    static_configs:
      - targets: ['binamenode01:9822','binamenode01:9922']
Restart Prometheus and you are done
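Job names in prometheus.yml must be unique; Prometheus refuses to load a configuration in which two scrape jobs share a name, which is easy to get wrong when copying stanzas from one component to the next. The check below is a rough sketch: the function name duplicate_jobs is hypothetical, and it assumes each job is written as a plain "- job_name: name" line without quotes or trailing comments.

```shell
#!/bin/sh
# Hypothetical helper: print every job_name that occurs more than once.
#   $1 = path to prometheus.yml
duplicate_jobs() {
  # Strip the leading "- job_name:" and keep only the name, then report repeats.
  sed -n 's/^[[:space:]]*-[[:space:]]*job_name:[[:space:]]*//p' "$1" | sort | uniq -d
}

# Example use:
#   duplicate_jobs /usr/local/prometheus-2.3.2.linux-amd64/prometheus.yml
```

Empty output means no job name is repeated.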