(6) Building a Distributed Big Data Infrastructure with Ansible: Setting Up a Hadoop High-Availability Cluster

The "Building a Distributed Big Data Infrastructure with Ansible" series covers, end to end, how to use Ansible, a powerful tool for distributed operations, to quickly stand up a complete big data stack: Hadoop 2, Spark 2, Hive 2, ZooKeeper 3, Flink 1.7, ElasticSearch 5, and more. This is the sixth article in the series; more follow-up articles are on the way.
(1) Building a Distributed Big Data Infrastructure with Ansible: Environment Preparation
(2) Building a Distributed Big Data Infrastructure with Ansible: Creating the Ansible Project
(3) Building a Distributed Big Data Infrastructure with Ansible: Writing the First Playbook
(4) Building a Distributed Big Data Infrastructure with Ansible: Common Ansible Modules
(5) Building a Distributed Big Data Infrastructure with Ansible: Setting Up a ZooKeeper Cluster
(6) Building a Distributed Big Data Infrastructure with Ansible: Setting Up a Hadoop High-Availability Cluster
(7) Building a Distributed Big Data Infrastructure with Ansible: Installing MySQL

We have finally reached the main event: building the Hadoop distributed cluster. Hadoop is the core of the open-source big data stack and the foundation of technologies such as Hive and HBase,
and today's hottest engines like Spark and Flink can also run in distributed mode by being hosted on Hadoop (more precisely, on YARN).

The playbook for the whole Hadoop cluster setup breaks down into three steps:

  1. Download and extract the Hadoop tarball, and set up the environment variables.
  2. Prepare core-site.xml, hdfs-site.xml, slaves, yarn-site.xml and mapred-site.xml, and copy them into the Hadoop configuration directory.
  3. Run the ansible-playbook command to finish the cluster setup, then SSH to a NameNode host, start the whole cluster with start-dfs.sh/start-yarn.sh, and verify that it came up (a minimal sketch of the entry playbook follows this list).
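For reference, here is a minimal sketch of what the top-level entry playbook hadoop.yaml could look like. The cluster host group and the hadoop role name are taken from the command and output shown in step 3; the remote_user is an assumption (the series appears to run as a hadoop user, judging by /home/hadoop/.ssh/id_rsa later on):

---
# hadoop.yaml: apply the hadoop role to every host in the "cluster" group
- hosts: cluster
  remote_user: hadoop   # assumed login user, not confirmed by this article
  roles:
    - hadoop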

1. Download, extract, and set up environment variables

tasks/main.yaml


---
# Create the DataNode data directory; data_base is defined in the outer group_vars
- name: Create DataNode Data Directory
  file: path={{data_base}}/hdfs/datanode state=directory

# Create the NameNode data directory
- name: Create NameNode Data Directory
  file: path={{data_base}}/hdfs/namenode state=directory

# Create the JournalNode data directory
- name: Create JOURNAL Data Directory
  file: path={{data_base}}/jnode state=directory

# Download Hadoop. The actual mirror URL looks like http://mirror.bit.edu.cn/apache/hadoop/core/hadoop-2.7.7/hadoop-2.7.7.tar.gz; variables such as {{filename}} are defined in this role's vars/main.yaml
# One tricky point about get_url: if dest points to a concrete file that already exists on the target host, get_url will not download again unless force=yes; if dest is only a directory, it downloads every time
- name: Download hadoop file
  get_url: url='{{download_server}}/{{path}}/{{filename}}-{{version}}/{{filename}}-{{version}}.{{suffix}}' dest="{{download_base}}/{{filename}}-{{version}}.{{suffix}}" remote_src=yes
  register: download_result

# Check whether the extraction target directory already exists
- name: Check whether registered
  stat:
    path: "{{app_base}}/{{filename}}-{{version}}"
  register: node_files

# Debug output
- debug:
    msg: "{{node_files.stat.exists}}"

# Extract only when the download succeeded and the target directory does not exist yet, so an existing installation is not overwritten
- name: Extract archive
  unarchive: dest={{app_base}} src='{{download_base}}/{{filename}}-{{version}}.{{suffix}}' remote_src=yes
  when: download_result.state == 'file' and node_files.stat.exists == False

# Create a soft link
- name: Add soft link
  file: src="{{app_base}}/{{filename}}-{{version}}" dest={{app_base}}/{{filename}} state=link

# Add the HADOOP_HOME environment variable; the tasks below all work the same way, and become: yes means the task runs as root
- name: Add ENV HADOOP_HOME
  become: yes
  lineinfile: dest=/etc/profile.d/app_bin.sh line="export HADOOP_HOME={{app_base}}/{{filename}}"

- name: Add ENV HADOOP_PREFIX
  become: yes
  lineinfile: dest=/etc/profile.d/app_bin.sh line="export HADOOP_PREFIX=$HADOOP_HOME"

- name: Add ENV HADOOP_COMMON_HOME
  become: yes
  lineinfile: dest=/etc/profile.d/app_bin.sh line="export HADOOP_COMMON_HOME=$HADOOP_HOME"

- name: Add ENV YARN_CONF_DIR
  become: yes
  lineinfile: dest=/etc/profile.d/app_bin.sh line="export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop"

- name: Add ENV HADOOP_CONF_DIR
  become: yes
  lineinfile: dest=/etc/profile.d/app_bin.sh line="export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop"

- name: Export PATH
  become: yes
  lineinfile: dest=/etc/profile.d/app_bin.sh line="export PATH=$PATH:$HADOOP_HOME/bin"
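After this part of the role has run, a quick sanity check is an ad-hoc Ansible command that prints HADOOP_HOME on every host (the cluster group and inventory path are the same ones used in step 3):

$] ansible cluster -i production/hosts -m shell -a 'source /etc/profile && echo $HADOOP_HOME'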

The Hadoop-specific variables used above are defined in roles/hadoop/vars/main.yaml for the current role; only the Hadoop role uses them.
vars/main.yaml

path: hadoop/core
filename: hadoop
version: 2.7.7
suffix: tar.gz
appname: hadoop
# JVM options used when starting the NameNode/DataNode
HADOOP_HEAPSIZE: 1024
HADOOP_NAMENODE_INIT_HEAPSIZE: 1024
HADOOP_NAMENODE_OPTS: -Xms1g -Xmx1g -XX:+UseG1GC -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m -XX:G1RSetUpdatingPauseTimePercent=5 -XX:InitiatingHeapOccupancyPercent=70 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=20 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:PrintFLSStatistics=1 -Xloggc:/data/bigdata/log/namenode-gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
HADOOP_DATANODE_OPTS: -Xms1g -Xmx1g -XX:+UseG1GC -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m -XX:G1RSetUpdatingPauseTimePercent=5 -XX:InitiatingHeapOccupancyPercent=70 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=20 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:PrintFLSStatistics=1 -Xloggc:/data/bigdata/log/datanode-gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
HADOOP_SECONDARYNAMENODE_OPTS: -Xms1g -Xmx1g -XX:+UseG1GC -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m -XX:G1RSetUpdatingPauseTimePercent=5 -XX:InitiatingHeapOccupancyPercent=70 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=20 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:PrintFLSStatistics=1 -Xloggc:/data/bigdata/log/secondarynamenode-gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
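The other variables referenced by the tasks ({{download_server}}, {{download_base}}, {{app_base}}, {{data_base}}, JAVA_HOME) are shared by all roles and live in group_vars, which earlier articles in the series set up. As a reminder, here is a sketch of plausible values; app_base and data_base match the paths that appear in the configs and logs below, while download_base and JAVA_HOME are assumptions:

# group_vars (illustrative values only)
download_server: http://mirror.bit.edu.cn/apache
download_base: /data/bigdata/download   # assumed download cache directory
app_base: /data/bigdata/app             # install root seen in the startup logs below
data_base: /data/bigdata/data           # data root seen in hdfs-site.xml below
JAVA_HOME: /usr/java/default            # assumed JDK location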

2. Prepare the core-site.xml, yarn-site.xml, slaves and mapred-site.xml configuration files and copy them to the $HADOOP_HOME/etc/hadoop configuration directory on every remote host

1. templates/core-site.xml

core-site.xml is the configuration file of the common component; for a test cluster the following three properties are enough:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///data/bigdata/data/hadoop</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>master1:2181,master2:2181,slave1:2181</value>
    </property>
</configuration>

2. templates/slaves

slaves tells the NameNode which DataNodes exist. It only needs to be present on the NameNode hosts, with one DataNode hostname/IP per line:

master1
master2
slave1

3. templates/hdfs-site.xml

Here we build an HA-based, highly available HDFS cluster, which requires the following properties; see the official documentation for what each property does:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/bigdata/data/hdfs/datanode</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/bigdata/data/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>master1:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>master2:8020</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>master1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>master2:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://master2:8485;slave1:8485;master1:8485/mycluster</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/data/bigdata/data/hdfs/jnode</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
</configuration>

4. templates/yarn-site.xml

yarn-site.xml is the configuration file for the YARN cluster. Here we set up an HA scheme with Active/Standby ResourceManagers to improve the availability of the whole cluster.
Pay attention to the yarn.nodemanager.resource.memory-mb setting: my VMs have 32 GB of RAM, so I gave 24 GB to the Hadoop cluster.
How much to allocate depends on the actual memory of your machines; if it exceeds the host's physical memory, the NodeManager may fail to start.
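Incidentally, instead of hard-coding 24576 into the template, one option is to derive the value from Ansible facts. This is only a sketch of the idea, not what the role above actually does: define a variable in roles/hadoop/vars/main.yaml and reference it from the template as {{yarn_nodemanager_memory_mb}}:

# Hypothetical: give the NodeManager roughly 75% of the host's physical memory
yarn_nodemanager_memory_mb: "{{ (ansible_memtotal_mb * 0.75) | int }}"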




    
    
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>ns1</value>
    </property>

    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.recover.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>master1</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>master2</value>
    </property>

    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
    </property>

    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>master1:2181,master2:2181,slave1:2181</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>master1:8030</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>master2:8030</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>master1:8031</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>master2:8031</value>
    </property>

    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>master1:8032</value>
    </property>

    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>master2:8032</value>
    </property>

    <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>master1:8033</value>
    </property>

    <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>master2:8033</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>master1:8088</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>master2:8088</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>

    <property>
        <name>yarn.nodemanager.hostname</name>
        <value>0.0.0.0</value>
    </property>

    <property>
        <name>yarn.nodemanager.bind-host</name>
        <value>0.0.0.0</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    <property>
        <description>Classpath for typical applications.</description>
        <name>yarn.application.classpath</name>
        <value>$HADOOP_CONF_DIR
            ,$HADOOP_COMMON_HOME/share/hadoop/common/*
            ,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*
            ,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*
            ,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*
            ,$YARN_HOME/share/hadoop/yarn/*
        </value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>24576</value>
        <description>Maximum memory the NodeManager may use</description>
    </property>

    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
        <description>Minimum physical memory a single task can request</description>
    </property>

    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>16</value>
        <description>Maximum number of virtual CPU cores the NodeManager may use</description>
    </property>

    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>5632</value>
        <description>Maximum physical memory a single task can request</description>
    </property>
</configuration>

5. templates/mapred-site.xml

mapred-site.xml is the configuration file required by MapReduce jobs:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.address</name>
        <value>master1:8021</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master1:19888</value>
    </property>
    <property>
        <name>mapred.max.maps.per.node</name>
        <value>2</value>
    </property>
    <property>
        <name>mapred.max.reduces.per.node</name>
        <value>1</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1408</value>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1126M</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>2816</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx2252M</value>
    </property>
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>512</value>
    </property>
    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>100</value>
    </property>
</configuration>

6. tasks/main.yaml

With all the configuration files ready, we continue tasks/main.yaml to render the templates and copy them to the remote hosts.
tasks/main.yaml

- name: Copy core-site.xml
  template: src=core-site.xml dest="{{app_base}}/{{appname}}/etc/hadoop/core-site.xml" mode=0755

- name: Copy hdfs-site.xml
  template: src=hdfs-site.xml dest="{{app_base}}/{{appname}}/etc/hadoop/hdfs-site.xml" mode=0755

- name: Copy mapred-site.xml
  template: src=mapred-site.xml dest="{{app_base}}/{{appname}}/etc/hadoop/mapred-site.xml" mode=0755

- name: Copy yarn-site.xml
  template: src=yarn-site.xml dest="{{app_base}}/{{appname}}/etc/hadoop/yarn-site.xml" mode=0755

- name: Copy slaves
  template: src=slaves dest="{{app_base}}/{{appname}}/etc/hadoop/slaves" mode=0755
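The five tasks above push every file to every host, which keeps the playbook simple. Since slaves is only read on the NameNode hosts, you could optionally narrow that last task; this is a sketch assuming your inventory defines a namenodes group (the series' inventory may not):

- name: Copy slaves (NameNode hosts only)
  template: src=slaves dest="{{app_base}}/{{appname}}/etc/hadoop/slaves" mode=0755
  when: inventory_hostname in groups['namenodes']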

7. Modify $HADOOP_HOME/etc/hadoop/hadoop-env.sh on the remote hosts to give the DataNode/NameNode suitable startup options

We edit this script in place on the remote hosts with the lineinfile module, rather than using a template file.
tasks/main.yaml

- name: Update ENV JAVA_HOME
  lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/hadoop-env.sh" line="export JAVA_HOME={{JAVA_HOME}}"

- name: Update ENV HADOOP_HEAPSIZE
  lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/hadoop-env.sh" line="export HADOOP_HEAPSIZE={{HADOOP_HEAPSIZE}}"

- name: Update ENV HADOOP_NAMENODE_INIT_HEAPSIZE
  lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/hadoop-env.sh" line="export HADOOP_NAMENODE_INIT_HEAPSIZE={{HADOOP_NAMENODE_INIT_HEAPSIZE}}"

- name: Update HADOOP_OPTS
  lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/hadoop-env.sh" line="export HADOOP_HEAPSIZE={{HADOOP_NAMENODE_INIT_HEAPSIZE}}"

- name: Update HADOOP_NAMENODE_OPTS
  lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/hadoop-env.sh" line='export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} {{HADOOP_NAMENODE_OPTS}}"'

- name: Update HADOOP_DATANODE_OPTS
  lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/hadoop-env.sh" line='export HADOOP_DATANODE_OPTS="${HADOOP_DATANODE_OPTS} {{HADOOP_DATANODE_OPTS}}"'

- name: Update HADOOP_SECONDARYNAMENODE_OPTS
  lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/hadoop-env.sh" line='export HADOOP_SECONDARYNAMENODE_OPTS="${HADOOP_SECONDARYNAMENODE_OPTS} {{HADOOP_SECONDARYNAMENODE_OPTS}}"'

- name: Update mapred-env.sh ENV JAVA_HOME
  lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/mapred-env.sh" line="export JAVA_HOME={{JAVA_HOME}}"

- name: Update yarn-env.sh ENV JAVA_HOME
  lineinfile: dest="{{app_base}}/{{appname}}/etc/hadoop/yarn-env.sh" line="export JAVA_HOME={{JAVA_HOME}}"

3. Run the ansible-playbook command to finish the Hadoop cluster setup, then SSH to a NameNode host, start the whole cluster with start-dfs.sh/start-yarn.sh, and verify that it started successfully

Alright, the exciting moment has arrived:

$] ansible-playbook hadoop.yaml -i production/hosts
PLAY [cluster] *********************************************************************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************************************************************
ok: [master1]
ok: [master2]
ok: [slave1]

TASK [hadoop : Create Hadoop Data Directory] ***************************************************************************************************************************************************
ok: [slave1]
ok: [master2]
ok: [master1]

TASK [hadoop : Create DataNode Data Directory] *************************************************************************************************************************************************
ok: [master1]
ok: [slave1]
ok: [master2]

TASK [hadoop : Create NameNode Data Directory] *************************************************************************************************************************************************
ok: [slave1]
ok: [master1]
ok: [master2]

TASK [hadoop : Create JOURNAL Data Directory] **************************************************************************************************************************************************
ok: [master2]
ok: [slave1]
ok: [master1]

TASK [hadoop : Download hadoop file] ***********************************************************************************************************************************************************
ok: [slave1]
ok: [master2]
ok: [master1]
......

TASK [hadoop : Update yarn-env.sh ENV JAVA_HOME] ***********************************************************************************************************************************************
ok: [master2]
ok: [master1]
ok: [slave1]

PLAY RECAP *************************************************************************************************************************************************************************************
master1 : ok=29   changed=1    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
master2 : ok=29   changed=1    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
slave1 : ok=29   changed=1    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0  

Then SSH to master1 and start the cluster:

sbin]# $HADOOP_HOME/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /data/bigdata/app/hadoop/logs/yarn-root-resourcemanager-master.out
......
sbin]#$HADOOP_HOME/sbin/start-dfs.sh
......
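One caveat: on a brand-new cluster being started for the very first time, the HA components need a one-time initialization before start-dfs.sh can bring everything up. A sketch of the usual Hadoop 2.x commands (run manually on the hosts noted in the comments; they are not part of this playbook):

# 1. Start a JournalNode on each JournalNode host (master1/master2/slave1)
$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode
# 2. On nn1 (master1): format HDFS and start the first NameNode
$HADOOP_HOME/bin/hdfs namenode -format
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
# 3. On nn2 (master2): copy the metadata over from nn1
$HADOOP_HOME/bin/hdfs namenode -bootstrapStandby
# 4. On nn1: initialize the HA state znode in ZooKeeper
$HADOOP_HOME/bin/hdfs zkfc -formatZK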

Check the running processes with jps:

~]$ jps
1938861 Jps
3240019 JournalNode
433395 ResourceManager
3239463 NameNode
1280395 QuorumPeerMain
433534 NodeManager
1090267 Kafka
3240349 DFSZKFailoverController
3239695 DataNode

Open a browser at http://master1:50070/ to view the HDFS overview; if the page loads, namenode1 started successfully. Likewise, open http://master2:50070/ to check namenode2. One of the two pages should show

Overview ‘master1:8020’ (active)

and the other shows

Overview ‘master2:8020’ (standby)

(or master1 standby and master2 active), which confirms that our HDFS HA configuration works.

Next, open http://master1:8088/cluster to view the YARN ResourceManager UI; if it loads, YARN started successfully as well.
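Besides the web UIs, the HA state can also be checked from the command line; these commands print active or standby and should agree with what the pages show:

$] hdfs haadmin -getServiceState nn1
$] hdfs haadmin -getServiceState nn2
$] yarn rmadmin -getServiceState rm1
$] yarn rmadmin -getServiceState rm2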

With that, the whole Hadoop high-availability cluster is up and running.
