The "Building a Distributed Big Data Environment with Ansible" series shows how to use Ansible, a powerful tool for distributed operations, to quickly stand up a complete big data stack: Hadoop 2, Spark 2, Hive 2, ZooKeeper 3, Flink 1.7, ElasticSearch 5, and more. This is the sixth article in the series; stay tuned for the later installments.
(1) Building a Distributed Big Data Environment with Ansible: Preparing the Environment
(2) Building a Distributed Big Data Environment with Ansible: Creating the Ansible Project
(3) Building a Distributed Big Data Environment with Ansible: Writing the First Playbook
(4) Building a Distributed Big Data Environment with Ansible: Common Ansible Modules
(5) Building a Distributed Big Data Environment with Ansible: Setting Up a ZooKeeper Cluster
(6) Building a Distributed Big Data Environment with Ansible: Setting Up a Highly Available Hadoop Cluster
(7) Building a Distributed Big Data Environment with Ansible: Installing MySQL
We have finally reached the main event: building the distributed Hadoop cluster. Hadoop is the core of the open source big data stack and the foundation that Hive, HBase, and friends are built on; today's hottest engines, Spark and Flink, can also run distributed by delegating to Hadoop (YARN, to be precise).
The playbook for building the Hadoop cluster has two main steps:
tasks/main.yaml
---
# Create the DataNode data directory; data_base is defined in the outer group_vars
- name: Create DataNode Data Directory
  file: path={{data_base}}/hdfs/datanode state=directory
# Create the NameNode data directory
- name: Create NameNode Data Directory
  file: path={{data_base}}/hdfs/namenode state=directory
# Create the JournalNode data directory
- name: Create JOURNAL Data Directory
  file: path={{data_base}}/jnode state=directory
# Download Hadoop. The actual mirror URL looks like
# http://mirror.bit.edu.cn/apache/hadoop/core/hadoop-2.7.7/hadoop-2.7.7.tar.gz;
# variables such as {{filename}} are defined in this role's vars/main.yaml.
# One tricky detail: if get_url's dest points at a concrete file and that file
# already exists on the target, get_url skips the download unless force=yes is
# set; if dest is only a directory, the file is re-downloaded every time.
- name: Download hadoop file
  get_url: url='{{download_server}}/{{path}}/{{filename}}-{{version}}/{{filename}}-{{version}}.{{suffix}}' dest="{{download_base}}/{{filename}}-{{version}}.{{suffix}}" remote_src=yes
  register: download_result
# Check whether the extracted directory already exists
- name: Check whether registered
  stat:
    path: "{{app_base}}/{{filename}}-{{version}}"
  register: node_files
# Debug output
- debug:
    msg: "{{node_files.stat.exists}}"
# Extract only if the download succeeded and the target directory does not yet
# exist, so an existing installation is never overwritten
- name: Extract archive
  unarchive: dest={{app_base}} src='{{download_base}}/{{filename}}-{{version}}.{{suffix}}' remote_src=yes
  when: download_result.state == 'file' and node_files.stat.exists == False
# Create a soft link
- name: Add soft link
  file: src="{{app_base}}/{{filename}}-{{version}}" dest={{app_base}}/{{appname}} state=link
# Add the HADOOP_HOME environment variable; the following tasks all work the
# same way, and become: yes means the task runs as root
- name: Add ENV HADOOP_HOME
  become: yes
  lineinfile: dest=/etc/profile.d/app_bin.sh line="export HADOOP_HOME={{app_base}}/{{appname}}"
- name: Add ENV HADOOP_PREFIX
  become: yes
  lineinfile: dest=/etc/profile.d/app_bin.sh line="export HADOOP_PREFIX=$HADOOP_HOME"
- name: Add ENV HADOOP_COMMON_HOME
  become: yes
  lineinfile: dest=/etc/profile.d/app_bin.sh line="export HADOOP_COMMON_HOME=$HADOOP_HOME"
- name: Add ENV YARN_CONF_DIR
  become: yes
  lineinfile: dest=/etc/profile.d/app_bin.sh line="export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop"
- name: Add ENV HADOOP_CONF_DIR
  become: yes
  lineinfile: dest=/etc/profile.d/app_bin.sh line="export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop"
- name: Export PATH
  become: yes
  lineinfile: dest=/etc/profile.d/app_bin.sh line="export PATH=$PATH:$HADOOP_HOME/bin"
All of the Hadoop-specific variables used above are defined in roles/hadoop/vars/main.yaml, so they are visible only to the Hadoop role.
vars/main.yaml
path: hadoop/core
filename: hadoop
version: 2.7.7
suffix: tar.gz
appname: hadoop
# JVM options used when starting the NameNode/DataNode
HADOOP_HEAPSIZE: 1024
HADOOP_NAMENODE_INIT_HEAPSIZE: 1024
HADOOP_NAMENODE_OPTS: -Xms1g -Xmx1g -XX:+UseG1GC -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m -XX:G1RSetUpdatingPauseTimePercent=5 -XX:InitiatingHeapOccupancyPercent=70 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=20 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:PrintFLSStatistics=1 -Xloggc:/data/bigdata/log/namenode-gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
HADOOP_DATANODE_OPTS: -Xms1g -Xmx1g -XX:+UseG1GC -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m -XX:G1RSetUpdatingPauseTimePercent=5 -XX:InitiatingHeapOccupancyPercent=70 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=20 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:PrintFLSStatistics=1 -Xloggc:/data/bigdata/log/datanode-gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
HADOOP_SECONDARYNAMENODE_OPTS: -Xms1g -Xmx1g -XX:+UseG1GC -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m -XX:G1RSetUpdatingPauseTimePercent=5 -XX:InitiatingHeapOccupancyPercent=70 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=20 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:PrintFLSStatistics=1 -Xloggc:/data/bigdata/log/secondarynamenode-gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
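Besides the role-local variables above, the tasks also reference data_base, app_base, download_base, and download_server, which live at the group level (set up in the earlier parts of this series). A plausible group_vars sketch, with illustrative values only (your actual paths and mirror from part two may differ):

```yaml
# group_vars/cluster.yaml -- illustrative values, not the exact ones from part two
download_server: http://mirror.bit.edu.cn/apache   # Apache mirror used by get_url
download_base: /data/bigdata/download              # where downloaded archives are kept
app_base: /data/bigdata/app                        # where archives are extracted
data_base: /data/bigdata/data                      # root for HDFS/journal data
```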
core-site.xml configures the Hadoop Common components; for a test cluster the following three properties are enough. Note that with an HA nameservice, fs.defaultFS uses the logical name without a port:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:///data/bigdata/data/hadoop</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>master1:2181,master2:2181,slave1:2181</value>
  </property>
</configuration>
The slaves file tells the NameNode which DataNodes exist. It only needs to be present on the NameNode hosts, with one DataNode hostname/IP per line:
master1
master2
slave1
Here we build an HA-based, highly available HDFS cluster, which requires the following properties (see the official documentation for what each one does):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/bigdata/data/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/bigdata/data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>master1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>master2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>master1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>master2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://master2:8485;slave1:8485;master1:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/bigdata/data/hdfs/jnode</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence
shell(/bin/true)</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
yarn-site.xml configures the YARN cluster. Here we set up an HA scheme with Active/Standby ResourceManagers to keep the whole cluster highly available.
Pay attention to the "yarn.nodemanager.resource.memory-mb" setting: my VMs have 32 GB, so I gave 24 GB to the Hadoop cluster.
Size it according to your machine's actual memory; if it exceeds the host's physical memory, the NodeManager may fail to start.
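As a quick sanity check on those numbers (24576 MB per NodeManager, with the 1024 MB minimum and 5632 MB maximum container allocations configured in this file), we can compute how many containers fit on one node:

```shell
# Container capacity implied by the yarn-site.xml memory settings
echo $((24576 / 1024))   # minimum-sized containers per NodeManager
echo $((24576 / 5632))   # maximum-sized containers per NodeManager
```

This prints 24 and 4: plenty of small containers, but only four tasks at the maximum allocation per node.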
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>ns1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.recover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>master1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>master2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>master1:2181,master2:2181,slave1:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>master1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>master2:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>master1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>master2:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>master1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>master2:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>master1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>master2:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>master1:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>master2:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.nodemanager.bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$YARN_HOME/share/hadoop/yarn/*</value>
    <description>Classpath for typical applications.</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>24576</value>
    <description>Maximum memory available to the NodeManager</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
    <description>Minimum physical memory a single task may request</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>16</value>
    <description>Maximum number of virtual CPU cores available to the NodeManager</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>5632</value>
    <description>Maximum physical memory a single task may request</description>
  </property>
</configuration>
mapred-site.xml holds the configuration MapReduce jobs need:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>master1:8021</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master1:19888</value>
  </property>
  <property>
    <name>mapred.max.maps.per.node</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.max.reduces.per.node</name>
    <value>1</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1408</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1126M</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2816</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2252M</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
</configuration>
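Note how the -Xmx values relate to the container sizes: the heap is roughly 80% of the container's memory, a common rule of thumb (not something Hadoop enforces) that leaves headroom for non-heap JVM memory:

```shell
# -Xmx is ~80% of the mapreduce.{map,reduce}.memory.mb container size
echo $((1408 * 8 / 10))   # map heap, matches -Xmx1126M
echo $((2816 * 8 / 10))   # reduce heap, matches -Xmx2252M
```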
With all the configuration files ready, we extend tasks/main.yaml to render the templates and copy them to the remote hosts:
tasks/main.yaml
- name: Copy core-site.xml
template: src=core-site.xml dest="{{app_base}}/{{appname}}/etc/hadoop/core-site.xml" mode=0755
- name: Copy hdfs-site.xml
template: src=hdfs-site.xml dest="{{app_base}}/{{appname}}/etc/hadoop/hdfs-site.xml" mode=0755
- name: Copy mapred-site.xml
template: src=mapred-site.xml dest="{{app_base}}/{{appname}}/etc/hadoop/mapred-site.xml" mode=0755
- name: Copy yarn-site.xml
template: src=yarn-site.xml dest="{{app_base}}/{{appname}}/etc/hadoop/yarn-site.xml" mode=0755
- name: Copy slaves
template: src=slaves dest="{{app_base}}/{{appname}}/etc/hadoop/slaves" mode=0755
For the hadoop-env.sh script we edit the remote file directly with the lineinfile module instead of using a template:
tasks/main.yaml
- name: Update ENV JAVA_HOME
lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/hadoop-env.sh" line="export JAVA_HOME={{JAVA_HOME}}"
- name: Update ENV HADOOP_HEAPSIZE
lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/hadoop-env.sh" line="export HADOOP_HEAPSIZE={{HADOOP_HEAPSIZE}}"
- name: Update ENV HADOOP_NAMENODE_INIT_HEAPSIZE
lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/hadoop-env.sh" line="export HADOOP_NAMENODE_INIT_HEAPSIZE={{HADOOP_NAMENODE_INIT_HEAPSIZE}}"
- name: Update HADOOP_NAMENODE_OPTS
lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/hadoop-env.sh" line='export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} {{HADOOP_NAMENODE_OPTS}}"'
- name: Update HADOOP_DATANODE_OPTS
lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/hadoop-env.sh" line='export HADOOP_DATANODE_OPTS="${HADOOP_DATANODE_OPTS} {{HADOOP_DATANODE_OPTS}}"'
- name: Update HADOOP_SECONDARYNAMENODE_OPTS
lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/hadoop-env.sh" line='export HADOOP_SECONDARYNAMENODE_OPTS="${HADOOP_SECONDARYNAMENODE_OPTS} {{HADOOP_SECONDARYNAMENODE_OPTS}}"'
- name: Update mapred-env.sh ENV JAVA_HOME
lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/mapred-env.sh" line="export JAVA_HOME={{JAVA_HOME}}"
- name: Update yarn-env.sh ENV JAVA_HOME
lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/yarn-env.sh" line="export JAVA_HOME={{JAVA_HOME}}"
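The top-level playbook that ties the hadoop role to the cluster hosts is not shown above. A minimal hadoop.yaml sketch, assuming the inventory group is named cluster (as the PLAY [cluster] output below suggests) and whatever remote user was set up in the earlier parts:

```yaml
# hadoop.yaml -- minimal site playbook for the hadoop role (illustrative)
- hosts: cluster
  roles:
    - hadoop
```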
Now for the exciting moment:
$] ansible-playbook hadoop.yaml -i production/hosts
PLAY [cluster] *********************************************************************************************************************************************************************************
TASK [Gathering Facts] *************************************************************************************************************************************************************************
ok: [master1]
ok: [master2]
ok: [slave1]
TASK [hadoop : Create Hadoop Data Directory] ***************************************************************************************************************************************************
ok: [slave1]
ok: [master2]
ok: [master1]
TASK [hadoop : Create DataNode Data Directory] *************************************************************************************************************************************************
ok: [master1]
ok: [slave1]
ok: [master2]
TASK [hadoop : Create NameNode Data Directory] *************************************************************************************************************************************************
ok: [slave1]
ok: [master1]
ok: [master2]
TASK [hadoop : Create JOURNAL Data Directory] **************************************************************************************************************************************************
ok: [master2]
ok: [slave1]
ok: [master1]
TASK [hadoop : Download hadoop file] ***********************************************************************************************************************************************************
ok: [slave1]
ok: [master2]
ok: [master1]
......
TASK [hadoop : Update yarn-env.sh ENV JAVA_HOME] ***********************************************************************************************************************************************
ok: [master2]
ok: [master1]
ok: [slave1]
PLAY RECAP *************************************************************************************************************************************************************************************
master1 : ok=29 changed=1 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
master2 : ok=29 changed=1 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
slave1 : ok=29 changed=1 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
Then SSH into master1 and start the cluster:
sbin]# $HADOOP_HOME/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /data/bigdata/app/hadoop/logs/yarn-root-resourcemanager-master.out
......
sbin]#$HADOOP_HOME/sbin/start-dfs.sh
......
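Note that a brand-new HA cluster needs one-time initialization before the daemons come up cleanly, which the playbook above does not cover. A typical sequence (start the JournalNodes first, format on master1 only, then bootstrap master2):

```shell
# On each JournalNode host (master1/master2/slave1):
$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode

# On master1 only: format the namespace and the ZKFC znode
$HADOOP_HOME/bin/hdfs namenode -format
$HADOOP_HOME/bin/hdfs zkfc -formatZK

# On master2 only: copy the freshly formatted namespace from master1
$HADOOP_HOME/bin/hdfs namenode -bootstrapStandby
```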
Check the running processes with jps:
~]$ jps
1938861 Jps
3240019 JournalNode
433395 ResourceManager
3239463 NameNode
1280395 QuorumPeerMain
433534 NodeManager
1090267 Kafka
3240349 DFSZKFailoverController
3239695 DataNode
Open a browser at http://master1:50070/ to inspect HDFS. If the page loads, namenode1 started successfully; likewise, if master2:50070/ loads, namenode2 is running. One of the two should show
Overview 'master1:8020' (active)
and the other
Overview 'master2:8020' (standby)
(or master1 standby and master2 active),
which means our HDFS HA setup works.
Next, open http://master1:8088/cluster to inspect the YARN ResourceManager; if it loads normally, YARN is up as well.
The highly available Hadoop cluster is now complete.
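Besides the web UIs, the failover state can also be checked from the command line, and a sample job makes a good end-to-end smoke test:

```shell
# Which NameNode / ResourceManager is currently active?
$HADOOP_HOME/bin/hdfs haadmin -getServiceState nn1
$HADOOP_HOME/bin/hdfs haadmin -getServiceState nn2
$HADOOP_HOME/bin/yarn rmadmin -getServiceState rm1
$HADOOP_HOME/bin/yarn rmadmin -getServiceState rm2

# End-to-end smoke test: run the bundled pi example on YARN
$HADOOP_HOME/bin/hadoop jar \
  $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 2 10
```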