Daily O&M Guide for Big Data Platform Components (Hadoop/Zookeeper/Kafka/ES/MySQL/Spark/Flume/Logstash/Tomcat)

Hadoop Daily O&M Operations

hdfs

The production Hadoop deployment is a cluster of 30 servers with a uniform installation and configuration, version 2.7.7.
Deployment path: /opt/hadoop

Startup user: hadoop

Configuration files:

  • /opt/hadoop/config/hdfs-site.xml
  • /opt/hadoop/config/core-site.xml

hadoop runtime environment variable files:

  • hadoop-env.sh
  • journalnode.env
  • datanode.env
  • namenode.env

hadoop systemd unit files:

  • zkfc.service
  • journalnode.service
  • namenode.service
  • datanode.service

Snapshot storage directory: /data/hadoop/data
Runtime log directory: /data/hadoop/logs

When Hadoop is running normally, the following ports are listening:

  • 50010 HDFS DataNode service port, used for data transfer
  • 50075 HDFS DataNode HTTP service port
  • 50020 HDFS DataNode IPC service port
  • 50070 HDFS NameNode HTTP service port (web UI), shown below on the active NameNode
  • 8020 HDFS NameNode RPC port for client connections, used to fetch filesystem metadata

[hadoop@hostname-2 ~]$ netstat -ln|egrep "(50010|50075|50475|50020|50070|50470|8020|8019)"
tcp        0      0 172.0.0.2:50070      0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN
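
The port check above can be scripted with a small helper that reads `netstat -ln` output and reports any expected port with no listening socket (a sketch; the port list and netstat invocation are the ones shown above):

```shell
# check_ports PORT... : read `netstat -ln` output on stdin and report
# each expected port that has no listening socket.
check_ports() {
  listening=$(cat)              # capture the netstat output once
  missing=""
  for port in "$@"; do
    # match ":PORT" followed by whitespace, e.g. 0.0.0.0:50070 or 172.0.0.2:50070
    printf '%s\n' "$listening" | grep -Eq ":$port[[:space:]]" || missing="$missing $port"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}

# Usage: netstat -ln | check_ports 50010 50075 50020 50070 8020
```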

Hadoop official reference documentation

Starting and stopping hadoop components

# Start
sudo systemctl start namenode.service
sudo systemctl start datanode.service
sudo systemctl start journalnode.service
# Stop
sudo systemctl stop namenode.service
sudo systemctl stop datanode.service
sudo systemctl stop journalnode.service
# Check status
sudo systemctl status namenode.service
sudo systemctl status datanode.service
sudo systemctl status journalnode.service
# Enable at boot
sudo systemctl enable namenode.service
sudo systemctl enable datanode.service
sudo systemctl enable journalnode.service

Checking hadoop component status

# List the configured namenode hosts
[hadoop@hostname-2 ~]$ hdfs getconf -namenodes
hostname-3 hostname-2
# Show the datanode include file (the slaves list)
[hadoop@hostname-2 ~]$ hdfs getconf -includeFile
/opt/hadoop/config/slaves
# List the namenode RPC addresses
[hadoop@hostname-2 ~]$ hdfs getconf -nnRpcAddresses
hostname-3:9000
hostname-2:9000

hdfs getconf -confKey [key]
# dfsadmin
[hadoop@hostname-2 ~]$  hdfs dfsadmin -report -live
Configured Capacity: 422346469376 (393.34 GB)
Present Capacity: 317439557632 (295.64 GB)
DFS Remaining: 315510235136 (293.84 GB)
...
-------------------------------------------------
Live datanodes (3):

Name: 172.0.0.3:50010 (hostname-3)
Hostname: iZ8vbacq1jxnabyu7992d1Z
Decommission Status : Normal
...

Name: 172.0.0.1:50010 (hostname-1)
Hostname: iZ8vb2s7y1j8fqmqbmufz9Z
Decommission Status : Normal
...

Name: 172.0.0.2:50010 (iZ8vbacq1jxnabyu7992d2Z)
Hostname: iZ8vbacq1jxnabyu7992d2Z
Decommission Status : Normal
...

# haadmin: check which namenode is active
[hadoop@hostname-2 ~]$ hdfs haadmin -getServiceState hostname-2
active
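
For HA with two NameNodes, the active one can be picked out in a single pipeline. The helper below (a sketch) expects lines of `<host> <state>`, as produced by looping `hdfs haadmin -getServiceState` over the hosts returned by `hdfs getconf -namenodes`:

```shell
# Print every host whose reported HA state is "active".
# Expected stdin format, one host per line:
#   hostname-3 standby
#   hostname-2 active
active_namenode() {
  awk '$2 == "active" { print $1 }'
}

# Usage:
#   for h in $(hdfs getconf -namenodes); do
#     echo "$h $(hdfs haadmin -getServiceState "$h")"
#   done | active_namenode
```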

yarn

Startup user: hadoop

Configuration file:

  • /opt/hadoop/config/yarn-site.xml

Environment variable files:

  • yarn.env
  • zkfc.env

systemd unit files:

  • yarn-nm.service
  • yarn-rm.service
  • zkfc.service

When the hadoop Yarn component is running normally, the following ports are listening:

  • 8030 YARN ResourceManager scheduler IPC port
  • 8031 YARN ResourceManager resource-tracker RPC port
  • 8032 YARN ResourceManager applications manager (ASM) port
  • 8033 YARN ResourceManager admin IPC port
  • 8088 YARN ResourceManager HTTP service port
  • 10020 YARN JobHistory Server IPC port
  • 18080 YARN JobHistory Server HTTP service port

[hadoop@hostname-2 ~]$ netstat -ln|egrep "(8032|8030|8031|8033|8088)"
tcp        0      0 172.0.0.2:8088       0.0.0.0:*               LISTEN
tcp        0      0 172.0.0.2:8030       0.0.0.0:*               LISTEN
tcp        0      0 172.0.0.2:8031       0.0.0.0:*               LISTEN
tcp        0      0 172.0.0.2:8032       0.0.0.0:*               LISTEN
tcp        0      0 172.0.0.2:8033       0.0.0.0:*               LISTEN
[hadoop@hostname-1 ~]$ netstat -ln|egrep "(10020|18080)"
tcp        0      0 0.0.0.0:18080           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:10020           0.0.0.0:*               LISTEN

Yarn official documentation

Starting and stopping yarn services

# Start
sudo systemctl start yarn-rm.service
sudo systemctl start yarn-nm.service
# Stop
sudo systemctl stop yarn-rm.service
sudo systemctl stop yarn-nm.service
# Check status
sudo systemctl status yarn-rm.service
sudo systemctl status yarn-nm.service
# Enable at boot
sudo systemctl enable yarn-rm.service
sudo systemctl enable yarn-nm.service

yarn status check commands


[hadoop@hostname-2 ~]$ yarn node -list
Total Nodes:2
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
iZ8vbacq1jxnabyu7992d1Z:46719           RUNNING iZ8vbacq1jxnabyu7992d1Z:8042                               0
iZ8vb2s7y1j8fqmqbmufz9Z:40138           RUNNING iZ8vb2s7y1j8fqmqbmufz9Z:8042                               0
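
For scripted monitoring, the RUNNING nodes in the `yarn node -list` output can be counted with a short awk filter (a sketch based on the column layout shown above):

```shell
# Count nodes whose Node-State column is RUNNING in `yarn node -list` output.
# Header lines do not match because their second field is not "RUNNING".
running_nodes() {
  awk '$2 == "RUNNING" { n++ } END { print n + 0 }'
}

# Usage: yarn node -list | running_nodes
```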

Check the ResourceManager HA state

[hadoop@hostname-2 ~]$ yarn rmadmin -getServiceState hostname-2
active

Request a health check of the service. If the check fails, the RMAdmin tool exits with a non-zero exit code.

[hadoop@hostname-2 ~]$ yarn rmadmin -checkHealth hostname-2 ; echo $?
0

Zookeeper daily O&M operations

The production zookeeper deployment is a three-server cluster with a uniform installation and configuration, version 3.4.14.

Startup user: logmanager

Deployment path: /opt/zookeeper
Configuration file: /opt/zookeeper/conf/zoo.cfg
Snapshot storage directory: /data/zookeeper/data
Transaction log directory: /data/zookeeper/logs
Runtime log directory: /data/zookeeper/logs

When zookeeper is running normally there are 3 ports: 2181, 2888 and 3888.

  • 2181 is the client service port, listening on every node
  • 2888 is the port for Leader-Follower interaction; it listens only on the leader
  • 3888 is the port used for zookeeper Leader Election, listening on every node

[hadoop@hostname-3 ~]$ netstat -ln|egrep "(2181|2888|3888)"
tcp        0      0 0.0.0.0:2181         0.0.0.0:*               LISTEN
tcp        0      0 172.0.0.3:2888       0.0.0.0:*               LISTEN
tcp        0      0 172.0.0.3:3888       0.0.0.0:*               LISTEN

Starting and stopping zookeeper

# Start
sudo systemctl start zookeeper.service
# Check status
sudo systemctl status zookeeper.service
# Stop
sudo systemctl stop zookeeper.service
# Enable at boot
sudo systemctl enable zookeeper.service

Checking zookeeper node state

Method 1

[hadoop@hostname-1 ~]$ /opt/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: leader

Method 2

[hadoop@hostname-1 ~]$ echo stat | nc 127.0.0.1 2181
Zookeeper version: 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
Clients:
 /127.0.0.1:60696[0](queued=0,recved=1,sent=0)
 /172.0.0.2:53934[1](queued=0,recved=595720,sent=595742)
 /172.0.0.3:42448[1](queued=0,recved=594837,sent=594837)

Latency min/avg/max: 0/0/137
Received: 1190603
Sent: 1190624
Connections: 3
Outstanding: 0
Zxid: 0x1240000e71a
Mode: follower
Node count: 229
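
For scripted checks, the Mode line can be extracted from the stat response (a sketch; it works on the four-letter-word output shown above):

```shell
# Extract the role (leader / follower / standalone) from `stat` output.
zk_mode() {
  awk -F': ' '$1 == "Mode" { print $2 }'
}

# Usage: echo stat | nc 127.0.0.1 2181 | zk_mode
```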

To test whether the server is up, send the ruok command; a reply of imok means it is running.

[hadoop@hostname-1 ~]$ echo ruok | nc 127.0.0.1 2181
imok

Kafka daily O&M operations

The production kafka deployment is a three-server cluster with a uniform installation and configuration, version 2.11-1.10.

Startup user: logmanager

Deployment path: /opt/kafka
Configuration file: /opt/kafka/config/server.properties
Data directory: /data/kafka/data
Log output directory: /data/kafka/logs

When kafka is running normally, 2 ports are involved: 2181 and 9092.

  • 2181 is zookeeper's service port; kafka stores broker and consumer metadata in zookeeper. Zookeeper records the liveness of every broker, and each broker reports its own state by sending heartbeat requests to zookeeper.
  • 9092 is the port used for communication with and between kafka brokers

[hadoop@hostname-3 ~]$ netstat -ln|egrep "(2181|9092)"
tcp        0      0 0.0.0.0:2181            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:9092            0.0.0.0:*               LISTEN

Starting and stopping the service

# Start
sudo systemctl start kafka.service
# Check status
sudo systemctl status kafka.service
# Stop
sudo systemctl stop kafka.service
# Enable at boot
sudo systemctl enable kafka.service

List the current kafka topics

[hadoop@hostname-1 opt]$ kafka-topics.sh -list --zookeeper 172.0.0.3:2181  
EXECUTE_LOG_TOPIC
METRICBEAT_LOG
ROBOT_MAIN_PROCESS_EXECUTE_MESSAGE
__consumer_offsets
agent-status
flume-sink
logmanager-filebeat
logmanager-flume
logstash-filebeat
origin-biz-log

Show topic details

[hadoop@hostname-1 opt]$ kafka-topics.sh --zookeeper 172.0.0.2:2181 --topic "agent-status" --describe
Topic: agent-status     PartitionCount: 1       ReplicationFactor: 3    Configs:
        Topic: agent-status     Partition: 0    Leader: 1       Replicas: 1,2,3 Isr: 1,2,3
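
A partition is under-replicated when its Isr list is shorter than its Replicas list. That condition can be flagged straight from the `--describe` output (a sketch tied to the column labels shown above):

```shell
# Print each partition line whose in-sync replica set (Isr) is smaller
# than its configured replica set (Replicas).
under_replicated() {
  awk '/Partition:/ {
    for (i = 1; i <= NF; i++) {
      if ($i == "Replicas:") reps = $(i + 1)
      if ($i == "Isr:")      isr  = $(i + 1)
    }
    if (split(reps, a, ",") > split(isr, b, ",")) print
  }'
}

# Usage: kafka-topics.sh --zookeeper 172.0.0.2:2181 --describe | under_replicated
```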

Show details for a given consumer group

[hadoop@hostname-1 opt]$ ./kafka-consumer-groups.sh --new-consumer --bootstrap-server 192.168.52.131:9092 --group test2 --describe

Check the version

[hadoop@hostname-1 opt]$ find ./libs/ -name \*kafka_\* | head -1 | grep -o '\kafka [^\n]*'

Describe the cluster's topics

[hadoop@hostname-1 opt]$ bin/kafka-topics.sh --describe --zookeeper 127.0.0.1:2181

Elasticsearch daily O&M operations

The production elasticsearch deployment is a three-server cluster with a uniform installation and configuration, version 5.4.3.

Startup user: logmanager

Deployment path: /opt/elasticsearch
Configuration file: /opt/elasticsearch/config/elasticsearch.yml
Data directory: /data/elasticsearch/data
Log output directory: /data/elasticsearch/logs

When elasticsearch is running normally there are 2 ports: 9200 and 9300.

  • 9200 is the HTTP service port, listening on every node
  • 9300 is the transport port used for communication between cluster nodes, also listening on every node

[hadoop@hostname-3 ~]$ netstat -ln|egrep "(9300|9200)"
tcp        0      0 0.0.0.0:9200            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:9300            0.0.0.0:*               LISTEN

Starting and stopping the service

# Start
sudo systemctl start elasticsearch.service
# Check status
sudo systemctl status elasticsearch.service
# Stop
sudo systemctl stop elasticsearch.service
# Enable at boot
sudo systemctl enable elasticsearch.service

Check the es version information

[user@hostname-1 ~]$ curl -XGET localhost:9200
{
  "name" : "hostname-1",
  "cluster_name" : "elastic-cyclone",
  "cluster_uuid" : "-OfufJGMQfylFBm34d0SKg",
  "version" : {
    "number" : "5.4.3",
    "build_hash" : "eed30a8",
    "build_date" : "2017-06-22T00:34:03.743Z",
    "build_snapshot" : false,
    "lucene_version" : "6.5.1"
  },
  "tagline" : "You Know, for Search"
}

Operational commands

Adjust the replica count: `curl -XPUT 'localhost:9200/yunxiaobai/_settings?pretty' -d '{"settings":{"index":{"number_of_replicas":"10"}}}'`
Create an index: `curl -XPUT 'localhost:9200/yunxiaobai?pretty'`
Insert a document: `curl -XPUT 'localhost:9200/yunxiaobai/external/1?pretty' -d '{ "name":"yunxiaobai" }'`
Get a document: `curl -XGET 'localhost:9200/yunxiaobai/external/1?pretty'`
Delete an index: `curl -XDELETE 'localhost:9200/jiaozhenqing?pretty'`
Exclude a node from allocation: `curl -XPUT '127.0.0.1:9200/_cluster/settings?pretty' -d '{ "transient" :{"cluster.routing.allocation.exclude._ip" : "10.0.0.1"}}'`
Delete a template: `curl -XDELETE 'http://127.0.0.1:9200/_template/metricbeat-6.2.4'`
Adjust the shard refresh interval: `curl -XPUT 'http://localhost:9200/metricbeat-6.2.4-2018.05.21/_settings?pretty' -d '{"settings":{"index":{"refresh_interval":"30s"}}}'`
Submit a template configuration file: `curl -XPUT 'localhost:9200/_template/devops-logstore-template' -d @devops-logstore.json`
Query a template: `curl -XGET 'localhost:9200/_template/devops-logstore-template'`
Query the thread pool: `curl -XGET 'localhost:9200/_cat/thread_pool/bulk?v&h=ip,name,active,rejected,completed'`
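
Beyond the per-index commands above, overall cluster health is worth a scripted probe. The helper below pulls the `status` field (green/yellow/red) out of the `_cluster/health` response with sed (a sketch; `_cluster/health` is the standard health API, though this runbook's command list does not mention it):

```shell
# Extract "status" from _cluster/health JSON, e.g. green / yellow / red.
es_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Usage: curl -s 'localhost:9200/_cluster/health' | es_status
```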

MySQL daily O&M operations

The production MySQL deployment is a three-server cluster with a uniform installation and configuration, version 5.7.

Startup user: mysql

Deployment path: /usr/share/mysql
Configuration file: /etc/my.cnf
Data directory: /var/lib/mysql/mysql
Access log path: /var/log/mysqld.log

Binary log path: /var/lib/mysql

When MySQL is running normally it listens on 1 port, 3306.

  • 3306 is the port MySQL serves clients on

[hadoop@hostname-3 ~]$ netstat -ln|egrep 3306
tcp6       0      0 :::3306                 :::*                    LISTEN

Starting and stopping the service

# Start
sudo systemctl start mysqld.service
# Check status
sudo systemctl status mysqld.service
# Stop
sudo systemctl stop mysqld.service
# Enable at boot
sudo systemctl enable mysqld.service

Log in to mysql and list the databases

mysql -u <username> -p
# enter the password when prompted
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| test               |
+--------------------+
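
For monitoring, `mysqladmin status` gives a one-line summary whose uptime can be parsed out (a sketch; it assumes mysqladmin is installed alongside the server packages):

```shell
# Pull the uptime (in seconds) out of `mysqladmin status` output,
# which looks like: "Uptime: 1234  Threads: 5  Questions: 60 ..."
mysql_uptime() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "Uptime:") print $(i + 1) }'
}

# Usage: mysqladmin -u <username> -p status | mysql_uptime
```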

Spark daily O&M operations

Production Spark is deployed in spark-on-yarn mode: jobs running in Spark are scheduled by yarn, while the Spark history server is configured separately and listens on port 18080. The Spark version is 1.7.0.

Startup user: hadoop

Deployment path: /opt/spark
Configuration file: /opt/spark/conf/spark-conf.properties
Data directory: /dat/spark/data
Access log path: /data/spark/logs

When Spark is running normally, 2 ports are involved: 18080 and 8088.

  • 18080 is the spark history server port, used to view records of past jobs
  • 8088 is the YARN ResourceManager HTTP port, where running Spark applications can be tracked

# Start
systemctl start spark-history.service
# Enable at boot
systemctl enable spark-history.service
# Stop
systemctl stop spark-history.service
# Check status
systemctl status spark-history.service

Flume daily O&M operations

Production flume runs as multiple independent nodes, deployed on whichever servers need it, each installed and configured independently, version 1.7.0.

Startup user: logmanager

Deployment path: /opt/flume
Configuration file: /opt/flume/conf/flume-conf.properties
Data directory: /dat/flume/data
Access log path: /data/flume/logs

When flume is running normally it listens on 1 port, 4541.

  • 4541 is the port flume serves on

[hadoop@hostname-2 ~]$ netstat -ln|egrep 4541
tcp        0      0 172.0.0.2:4541       0.0.0.0:*               LISTEN

Starting and stopping the service

# Start
sudo systemctl start flume.service
# Check status
sudo systemctl status flume.service
# Stop
sudo systemctl stop flume.service
# Enable at boot
sudo systemctl enable flume.service

Check the listening port on the master node

[hadoop@hostname-2 ~]$ sudo netstat -lntp |grep 4541
tcp        0      0 172.0.0.2:4541       0.0.0.0:*               LISTEN  7774/java

Inspecting queued messages

To inspect messages in the queue, install kafka-tools and dump part of a topic's data to view its contents.

Logstash daily O&M operations

The production logstash deployment is a three-server cluster with a uniform installation and configuration, version 2.4.1.

Startup user: logmanager

Deployment path: /opt/logstash
Configuration file: /opt/logstash/conf/logstash.yml
Data directory: /dat/logstash/data
Access log path: /data/logstash/logs

When logstash is running normally there are 2 ports: 5044 and 9600.

  • 5044 is the port logstash serves on, used to receive data
  • 9600 is used to fetch basic logstash information

[hadoop@hostname-2 ~]$ netstat -ln|egrep "(9600|5044)"
tcp        0      0 0.0.0.0:5044            0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:9600          0.0.0.0:*               LISTEN

Starting and stopping the service

# Start
sudo systemctl start logstash.service
# Check status
sudo systemctl status logstash.service
# Stop
sudo systemctl stop logstash.service
# Enable at boot
sudo systemctl enable logstash.service

Port 9600 serves basic logstash information:

[hadoop@hostname-2 ~]$ curl -XGET 'localhost:9600/?pretty'
{
  "host" : "iZ8vbacq1jxnabyu7992d2Z",
  "version" : "7.7.0",
  "http_address" : "127.0.0.1:9600",
  "id" : "c9662897-7c12-4eb3-a92c-772da4536730",
  "name" : "logmanager",
  "ephemeral_id" : "99c86cbf-182a-46c5-9cc9-05f5bd13075b",
  "status" : "green",
  "snapshot" : false,
  "pipeline" : {
    "workers" : 8,
    "batch_size" : 125,
    "batch_delay" : 50
  },
  "build_date" : "2020-05-12T04:34:14+00:00",
  "build_sha" : "d8ed01157be10d78e9910f1fb21b137c5d25529e",
  "build_snapshot" : false
}
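
The same endpoint is handy for scripted checks; for example, the configured pipeline worker count can be pulled out of the response (a crude sed extraction, adequate for the flat `"workers" : N` field shown above):

```shell
# Extract pipeline.workers from the 9600 node-info JSON.
ls_workers() {
  sed -n 's/.*"workers"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage: curl -s 'localhost:9600/?pretty' | ls_workers
```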

Tomcat daily O&M operations

Production tomcat is a single node; clustering can be achieved through a load balancer. Version 8.5.60.

Startup user: logmanager

Deployment path: /opt/tomcat
Configuration file: /opt/tomcat/conf/server.xml
Data directory: /dat/tomcat/data
Access log path: /data/tomcat/logs

When tomcat is running normally it listens on 8009 plus 8080 or 8761.

  • 8009 is the tomcat AJP connector port
  • 8080 is the port tomcat serves web traffic on
  • 8761 is the Spring Eureka registry service port

[hadoop@hostname-1 conf]$ netstat -ln|egrep "(8009|8080)"
tcp        0      0 0.0.0.0:8009            0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN

Starting and stopping the service

# Start
sudo systemctl start tomcat.service
# Check status
sudo systemctl status tomcat.service
# Stop
sudo systemctl stop tomcat.service
# Enable at boot
sudo systemctl enable tomcat.service
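
A liveness probe against the web port can round out the systemd status check (a sketch; it assumes curl is available and that the port defaults to 8080 as configured above):

```shell
# Return 0 iff an HTTP server answers on the given port (default 8080).
tomcat_http_ok() {
  code=$(curl -s -o /dev/null -w '%{http_code}' \
         --connect-timeout 2 "http://127.0.0.1:${1:-8080}/" 2>/dev/null)
  # curl reports 000 when the connection itself fails
  [ -n "$code" ] && [ "$code" != "000" ]
}

# Usage: tomcat_http_ok 8080 && echo up || echo down
```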
