Monitoring Hadoop and YARN with Prometheus + Grafana

This setup uses jmx_exporter and Prometheus to collect Hadoop metrics, and Grafana to display them and send alerts.
1. Install jmx_exporter and create its config files

1. Download the jmx exporter jar from

https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.1/jmx_prometheus_javaagent-0.3.1.jar
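
The download step can be scripted. A minimal sketch, assuming wget is installed; the helper name jmx_agent_url is illustrative and not part of jmx_exporter, and the target directory is the one used in the later steps:

```shell
#!/bin/sh
# Build the Maven Central URL of the jmx_prometheus_javaagent jar for a version.
# jmx_agent_url is a hypothetical helper name used only in this sketch.
jmx_agent_url() {
  version="$1"
  echo "https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/${version}/jmx_prometheus_javaagent-${version}.jar"
}

# Usage (performs the actual download, so it is left commented out here):
# wget -P /usr/local/prometheus_jmx_export_0.3.1 "$(jmx_agent_url 0.3.1)"
```

Keeping the version in one argument makes it harder for the jar name and the directory in the URL to drift apart.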

2. Create the config files

namenode.yaml:

startDelaySeconds: 0
hostPort: localhost:1234  # 1234 is the JMX port to use (any unused port will do)
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false

 

datanode.yaml:

startDelaySeconds: 0
hostPort: localhost:1235  # 1235 is the JMX port to use (any unused port will do)
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
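
Since the config files differ only in the port, they can be generated. A minimal sketch; write_jmx_config is a hypothetical helper, and the file names namenode.yaml and datanode.yaml are taken from the -javaagent options used in hadoop-env.sh:

```shell
#!/bin/sh
# Write a jmx_exporter config file with the same keys as the config files above.
# write_jmx_config is a hypothetical helper; $1 = output file, $2 = JMX port.
write_jmx_config() {
  cat > "$1" <<EOF
startDelaySeconds: 0
hostPort: localhost:$2
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
EOF
}

# Usage:
# write_jmx_config namenode.yaml 1234
# write_jmx_config datanode.yaml 1235
```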

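3. Place the three files above (the jar and the two config files) in /usr/local/prometheus_jmx_export_0.3.1

 and run chown -R hadoop:root /usr/local/prometheus_jmx_export_0.3.1

4. Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh (note: ports 1234 and 1235 must match the JMX ports set in the config files)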

 

export HADOOP_NAMENODE_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false   -Dcom.sun.management.jmxremote.port=1234 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9222:/usr/local/prometheus_jmx_export_0.3.1/namenode.yaml"
export HADOOP_DATANODE_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false   -Dcom.sun.management.jmxremote.port=1235 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9322:/usr/local/prometheus_jmx_export_0.3.1/datanode.yaml"
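
The two option strings differ only in the JMX port, the exporter HTTP port, and the config file. A minimal sketch that builds them from those three values; jmx_opts, AGENT_DIR, and AGENT_JAR are names invented for this example:

```shell
#!/bin/sh
# Compose the JVM options from three values, so the JMX port and the
# exporter HTTP port are each written in one place only.
# jmx_opts is a hypothetical helper: $1 = JMX port, $2 = exporter HTTP port,
# $3 = config file name inside AGENT_DIR.
AGENT_DIR=/usr/local/prometheus_jmx_export_0.3.1
AGENT_JAR="$AGENT_DIR/jmx_prometheus_javaagent-0.3.1.jar"

jmx_opts() {
  printf '%s %s %s %s %s\n' \
    "-Dcom.sun.management.jmxremote.authenticate=false" \
    "-Dcom.sun.management.jmxremote.ssl=false" \
    "-Dcom.sun.management.jmxremote.local.only=false" \
    "-Dcom.sun.management.jmxremote.port=$1" \
    "-javaagent:$AGENT_JAR=$2:$AGENT_DIR/$3"
}

# Usage inside hadoop-env.sh:
# export HADOOP_NAMENODE_JMX_OPTS="$(jmx_opts 1234 9222 namenode.yaml)"
# export HADOOP_DATANODE_JMX_OPTS="$(jmx_opts 1235 9322 datanode.yaml)"
```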

5. Edit $HADOOP_HOME/bin/hdfs and change the namenode and datanode startup options as follows

 

if [ "$COMMAND" = "namenode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_JMX_OPTS $HADOOP_NAMENODE_OPTS"
.......
elif [ "$COMMAND" = "datanode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_JMX_OPTS"
  if [ "$starting_secure_dn" = "true" ]; then
    HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
  else
    HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
  fi

 

6. Restart the HDFS cluster. The metrics are then available at http://xxx:9222/metrics on the namenode machine and at http://xxx:9322/metrics on the datanode machines
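
A quick way to confirm that an exporter is serving data is to count the sample lines it returns. A minimal sketch, assuming curl is installed; count_samples is a hypothetical helper name:

```shell
#!/bin/sh
# Count the sample lines in Prometheus text-format output read from stdin.
# Lines starting with '#' are HELP/TYPE comments; blank lines are skipped too.
# count_samples is a hypothetical helper name.
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Usage against a live exporter (host xxx as in the URLs above):
# curl -s http://xxx:9222/metrics | count_samples
```

A count greater than zero suggests the agent is up and exporting metrics.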

 
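2. Install Prometheus and edit its config file

1. Download the Linux build of Prometheus from https://github.com/prometheus/prometheus/releases/download/v2.3.2/prometheus-2.3.2.linux-amd64.tar.gz to /usr/local/,

extract it, and run chown -R hadoop:root prometheus-2.3.2.linux-amd64

2. Edit the config file prometheus.yml (note: the snippet below was only run in a test environment; add it under scrape_configs, list every machine you want to monitor as a target, and watch the indentation)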

 

- job_name: hadoop-namenode
  static_configs:
  - targets: ['binamenode01:9222']
- job_name: hadoop-datanode
  static_configs:
  - targets: ['bidatanode01:9322']

3. Start Prometheus as user hadoop

cd /usr/local/prometheus-2.3.2.linux-amd64
./startPromethous.sh
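
The contents of startPromethous.sh are not shown here. A minimal sketch of what such a script might contain, assuming it simply runs the prometheus binary in the background with the prometheus.yml in the same directory; the log file name prometheus.log and the helper write_start_script are assumptions:

```shell
#!/bin/sh
# Write a start script into the Prometheus directory given as $1.
# The script contents are an assumption; the original startPromethous.sh is not shown.
write_start_script() {
  cat > "$1/startPromethous.sh" <<'EOF'
#!/bin/sh
# Run Prometheus in the background with the config file in this directory.
cd "$(dirname "$0")" || exit 1
nohup ./prometheus --config.file=prometheus.yml > prometheus.log 2>&1 &
EOF
  chmod +x "$1/startPromethous.sh"
}

# Usage:
# write_start_script /usr/local/prometheus-2.3.2.linux-amd64
```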


 

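4. Open http://master:9090/targets to check that the targets were added (Prometheus listens on port 9090 by default)

Clicking a target's endpoint link, for example http://binamenode01:9222/metrics, shows its metrics data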

 
3. Install Grafana and edit its config file

1. Download Grafana and extract it

 

 

cd /usr/local
wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.2.2.linux-amd64.tar.gz
tar -zxvf grafana-5.2.2.linux-amd64.tar.gz
chown -R hadoop:root grafana-5.2.2

2. Start Grafana as user hadoop

 

cd /usr/local/grafana-5.2.2/bin/
nohup ./grafana-server start &

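3. Once started, Grafana is reachable at http://master:3000/ (the default username/password is admin/admin; Grafana listens on port 3000 by default)

4. Connect Grafana to Prometheus

Click Data Sources

Click Add data source, fill in the Prometheus details, and save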
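
As an alternative to the UI steps, Grafana 5.x can also read data sources from provisioning files at startup. A minimal sketch, assuming Prometheus runs at http://master:9090 and that the provisioning directory is the default conf/provisioning under the Grafana home; the data source name Prometheus, the file name prometheus.yaml, and the helper write_datasource are illustrative choices:

```shell
#!/bin/sh
# Write a Grafana data source provisioning file.
# $1 = Grafana home directory, $2 = Prometheus URL.
# write_datasource is a hypothetical helper name.
write_datasource() {
  dir="$1/conf/provisioning/datasources"
  mkdir -p "$dir"
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $2
    isDefault: true
EOF
}

# Usage (restart Grafana afterwards so the file is read):
# write_datasource /usr/local/grafana-5.2.2 http://master:9090
```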

 
4. Configure Grafana alert emails

1. Check whether mailx is installed

rpm -qa | grep mailx

If it is not installed, install it with the following command

 

yum -y install mailx

2. Edit /usr/local/grafana-5.2.2/conf/defaults.ini

 

...
 
#################################### SMTP / Emailing #####################
[smtp]
enabled = true
host = smtp.luckincoffee.com:587
user = [email protected]
# If the password contains # or ; it must be wrapped in triple double quotes, e.g. """QWER123;4!@#$"""
password = xxxxxxx # the mailbox password
cert_file =
key_file =
skip_verify = true
from_address = [email protected]
from_name = sys_sender
ehlo_identity =
[emails]
welcome_email_on_sign_up = false
templates_pattern = emails/*.html
 
...
 
#################################### Alerting ############################
[alerting]
# Disable alerting engine & UI features
enabled = true
# Makes it possible to turn off alert rule execution but alerting UI is visible
execute_alerts = true
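
The quoting rule for the SMTP password is easy to get wrong by hand. A minimal sketch that prints the password line with triple double quotes only when they are needed; smtp_password_line is a hypothetical helper name:

```shell
#!/bin/sh
# Print the password line for the [smtp] section.
# Passwords containing '#' or ';' are wrapped in triple double quotes,
# following the rule stated in the config comment.
# smtp_password_line is a hypothetical helper name.
smtp_password_line() {
  case "$1" in
    *'#'*|*';'*) printf 'password = """%s"""\n' "$1" ;;
    *)           printf 'password = %s\n' "$1" ;;
  esac
}

# Usage:
# smtp_password_line 'QWER123;4!@#$'
```

Single quotes around the argument keep the shell from interpreting characters such as $ and # in the password.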
 
 

 

3. Test Grafana email delivery

Edit the email notification settings and click the test button; the test should report OK

=======================================================================================

Added 2018-08-27:

Hooking up YARN works in much the same way.

In ${HADOOP_HOME}/etc/hadoop/yarn-env.sh, add the options that enable metrics and specify the ports

export YARN_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false   -Dcom.sun.management.jmxremote.port=1236 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9422:/usr/local/prometheus_jmx_export_0.3.1/yarn.yaml"
 

 

Then edit ${HADOOP_HOME}/bin/yarn


 

elif [ "$COMMAND" = "resourcemanager" ] ; then
  CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/rm-config/log4j.properties
  CLASS='org.apache.hadoop.yarn.server.resourcemanager.ResourceManager'
  YARN_OPTS="$YARN_OPTS $YARN_JMX_OPTS $YARN_RESOURCEMANAGER_OPTS"
  if [ "$YARN_RESOURCEMANAGER_HEAPSIZE" != "" ]; then
    JAVA_HEAP_MAX="-Xmx""$YARN_RESOURCEMANAGER_HEAPSIZE""m"
  fi
......
elif [ "$COMMAND" = "nodemanager" ] ; then
  CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/nm-config/log4j.properties
  CLASS='org.apache.hadoop.yarn.server.nodemanager.NodeManager'
  YARN_OPTS="$YARN_OPTS $YARN_JMX_OPTS -server $YARN_NODEMANAGER_OPTS"
  if [ "$YARN_NODEMANAGER_HEAPSIZE" != "" ]; then
    JAVA_HEAP_MAX="-Xmx""$YARN_NODEMANAGER_HEAPSIZE""m"
  fi
 

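Add the yarn.yaml file to /usr/local/prometheus_jmx_export_0.3.1 (it has to exist before the restart, because the agent reads it at JVM startup)

Restart YARN

Edit the config file prometheus.yml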

- job_name: yarn
  static_configs:
  - targets: ['binamenode01:9422']

Restart Prometheus and you are done

=======================================================================================

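Added 2018-08-29

Monitoring HBase:

Edit the script $HBASE_HOME/bin/hbase

At the line

# figure out which class to run

add the HBASE_JMX_OPTS definition, then update the master and regionserver branches as shown: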

#======================================= prometheus jmx export start===================================
HBASE_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false   -Dcom.sun.management.jmxremote.port=1237 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9522:/usr/local/prometheus_jmx_export_0.3.1/hbase.yaml"
#======================================= prometheus jmx export end ===================================
......
elif [ "$COMMAND" = "master" ] ; then
  CLASS='org.apache.hadoop.hbase.master.HMaster'
  if [ "$1" != "stop" ] && [ "$1" != "clear" ] ; then
    HBASE_OPTS="$HBASE_OPTS $HBASE_JMX_OPTS $HBASE_MASTER_OPTS"
  fi
elif [ "$COMMAND" = "regionserver" ] ; then
  CLASS='org.apache.hadoop.hbase.regionserver.HRegionServer'
  if [ "$1" != "stop" ] ; then
    HBASE_OPTS="$HBASE_OPTS $HBASE_JMX_OPTS $HBASE_REGIONSERVER_OPTS"
  fi

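Add the hbase.yaml file to /usr/local/prometheus_jmx_export_0.3.1 (it has to exist before the restart, because the agent reads it at JVM startup)

Restart HBase

Edit the config file prometheus.yml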

 

- job_name: hbase
  static_configs:
  - targets: ['binamenode01:9522']

 

Restart Prometheus and you are done

=======================================================================================

Added 2018-09-01

Adding Kylin monitoring

Edit kylin.sh and add the following to its startup options

 

 

 

-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false   -Dcom.sun.management.jmxremote.port=1239 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9722:/usr/local/prometheus_jmx_export_0.3.1/kylin.yaml \

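Add the kylin.yaml file to /usr/local/prometheus_jmx_export_0.3.1 (it has to exist before the restart, because the agent reads it at JVM startup)

Restart Kylin

Edit the config file prometheus.yml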

 

 

 

- job_name: kylin
  static_configs:
  - targets: ['binamenode01:9722']

 

 

 

Restart Prometheus and you are done

=======================================================================================

Added 2018-09-01

Adding Hive monitoring

Edit ${HIVE_HOME}/conf/hive-env.sh and add the following code

if [ "$SERVICE" = "hiveserver2" ] ; then
        HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1240 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9822:/usr/local/prometheus_jmx_export_0.3.1/hive_hiveserver2.yaml"
fi
if [ "$SERVICE" = "metastore" ] ; then
        HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1241 -javaagent:/usr/local/prometheus_jmx_export_0.3.1/jmx_prometheus_javaagent-0.3.1.jar=9922:/usr/local/prometheus_jmx_export_0.3.1/hive_metastore.yaml"
fi

Add the hive_metastore.yaml and hive_hiveserver2.yaml files to /usr/local/prometheus_jmx_export_0.3.1

Restart the Hive metastore and hiveserver2

Edit the config file prometheus.yml

 

- job_name: hive
  static_configs:
  - targets: ['binamenode01:9822','binamenode01:9922']

 

Restart Prometheus and you are done
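
Every exporter in this guide needs its own JMX port and its own HTTP port on a given host. A small sketch for spotting accidental reuse, fed with the ports used in this guide; duplicate_ports is a hypothetical helper name:

```shell
#!/bin/sh
# Print every port that appears more than once in the argument list.
# duplicate_ports is a hypothetical helper name.
duplicate_ports() {
  printf '%s\n' "$@" | sort | uniq -d
}

# Usage with the ports from this guide (prints nothing when all are unique):
# duplicate_ports 1234 1235 1236 1237 1239 1240 1241 \
#                 9222 9322 9422 9522 9722 9822 9922
```

Note that the YARN and HBase snippets share one option string between two daemons (resourcemanager and nodemanager, master and regionserver), so running both daemons of a pair on the same host would likely make them compete for the same ports.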

 
 
