Tags: Hadoop
Installing Distributed Hadoop 2.6.0 + HBase + Hive on CentOS 7.5 (offline install from CDH 5.14.2 tar packages)
- Installing Distributed Hadoop 2.6.0 + HBase + Hive on CentOS 7.5 (offline install from CDH 5.14.2 tar packages)
- Host Environment
- Software Environment
- Host Planning
- Pre-installation Preparation
- Installing JDK 1.8
- Installing ZooKeeper
- Installing Hadoop
- Configuring HDFS
- Configuring YARN
- Cluster Initialization
- Starting HDFS
- Starting YARN
- Cluster Start/Stop Order
- Start
- Stop
- Installing HBase
- Installing Hive
Host Environment
Basic configuration:
Nodes | 5 |
---|---|
OS | CentOS Linux release 7.5.1804 (Core) |
Memory | 8GB |
Recommended configuration:
Nodes | 5 |
---|---|
OS | CentOS Linux release 7.5.1804 (Core) |
Memory | 16GB |
Note: In production, allocate memory according to actual requirements; if you are only building the VMs in VMware, 1-2 GB per host is sufficient.
Software Environment
Software | Version | Download |
---|---|---|
jdk | jdk-8u172-linux-x64 | Download |
hadoop | hadoop-2.6.0-cdh5.14.2 | Download |
zookeeper | zookeeper-3.4.5-cdh5.14.2 | Download |
hbase | hbase-1.2.0-cdh5.14.2 | Download |
hive | hive-1.1.0-cdh5.14.2 | Download |
Note: All CDH5 packages can be downloaded from http://archive.cloudera.com/cdh5/cdh/5/
Host Planning
The roles of the 5 nodes are planned as follows:
Hostname | CDHNode1 | CDHNode2 | CDHNode3 | CDHNode4 | CDHNode5 |
---|---|---|---|---|---|
IP | 192.168.223.201 | 192.168.223.202 | 192.168.223.203 | 192.168.223.204 | 192.168.223.205 |
namenode | yes | yes | no | no | no |
datanode | no | no | yes | yes | yes |
resourcemanager | yes | yes | no | no | no |
journalnode | yes | yes | yes | yes | yes |
zookeeper | yes | yes | yes | no | no |
hmaster (hbase) | yes | yes | no | no | no |
regionserver (hbase) | no | no | yes | yes | yes |
hive (hiveserver2) | no | no | yes | yes | yes |
Note: Keep an odd number of JournalNode and ZooKeeper instances, and use no fewer than 3 of each if high availability is required. The detailed reasons will be covered in a later post.
Pre-installation Preparation
- Disable SELinux on all nodes
sed -i 's/^SELINUX=.*$/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
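To confirm the change, getenforce should now report Permissive (and Disabled after the next reboot):
getenforce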
- Disable the firewall (firewalld or iptables) on all nodes
systemctl disable firewalld
systemctl stop firewalld
systemctl disable iptables
systemctl stop iptables
- Enable time synchronization on all nodes with ntpdate
echo "*/5 * * * * /usr/sbin/ntpdate asia.pool.ntp.org | logger -t NTP" >> /var/spool/cron/root
- Set the language encoding and timezone on all nodes
echo 'export TZ=Asia/Shanghai' >> /etc/profile
echo 'export LANG=en_US.UTF-8' >> /etc/profile
. /etc/profile
- Add the hadoop user on all nodes
useradd -m hadoop
echo '123456' | passwd --stdin hadoop
# Set PS1 and a couple of safety aliases
su - hadoop
echo 'export PS1="\u@\h:\$PWD>"' >> ~/.bash_profile
echo "alias mv='mv -i'" >> ~/.bash_profile
echo "alias rm='rm -i'" >> ~/.bash_profile
. ~/.bash_profile
- Set up passwordless SSH login between the hadoop users. First, generate a key pair on CDHNode1:
su - hadoop
ssh-keygen -t rsa # press Enter at every prompt to generate the hadoop user's public/private key pair
cd .ssh
vi id_rsa.pub # remove the trailing hostname "hadoop@CDHNode1" from the end of the public key
cat id_rsa.pub > authorized_keys
chmod 600 authorized_keys
Zip up the .ssh directory:
su - hadoop
zip -r ssh.zip .ssh
Then distribute ssh.zip to the hadoop user's home directory on CDHNode2-5 and unzip it there to complete the passwordless login setup, as sketched below.
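A minimal sketch of that distribution step (it assumes password authentication is still enabled and unzip is installed on every node):
for h in CDHNode2 CDHNode3 CDHNode4 CDHNode5; do
  scp ssh.zip ${h}:/home/hadoop/
  ssh ${h} 'cd /home/hadoop && unzip -o ssh.zip && chmod 700 .ssh'
done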
- Tune the kernel parameters and limits such as max open files and max processes on every host. The optimal values differ between hosts, so no specific tuning recipe is given here; however, if the Hadoop environment will serve production traffic, this tuning is mandatory, because the Linux defaults can leave the cluster performing poorly. An illustrative starting point is sketched below.
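As an illustrative starting point only (real values depend on your hardware and workload), raising the open-file and process limits for the hadoop user as root might look like:
cat >> /etc/security/limits.conf <<'EOF'
hadoop soft nofile 65536
hadoop hard nofile 65536
hadoop soft nproc  65536
hadoop hard nproc  65536
EOF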
- On the datanode nodes (CDHNode3-5), mount a 15 GB data disk at /chunk1; after mounting, the directory must be chowned to the hadoop user, as shown below.
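For example, as root (the device name /dev/sdb1 is an assumption; substitute your actual data disk):
mkdir -p /chunk1
mount /dev/sdb1 /chunk1 # /dev/sdb1 is illustrative -- use your real device
chown -R hadoop:hadoop /chunk1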
Note: All of the operations above must be performed as the root user. The operating system environment is now fully prepared; the actual installation starts below. Unless otherwise stated, all subsequent operations are performed as the hadoop user.
Installing JDK 1.8
The JDK must be installed on every node, and the procedure is identical everywhere. Extract jdk-8u172-linux-x64.tar.gz:
tar zxvf jdk-8u172-linux-x64.tar.gz
mkdir -p /home/hadoop/app
mv jdk1.8.0_172 /home/hadoop/app/jdk # the tarball extracts to jdk1.8.0_172
rm -f jdk-8u172-linux-x64.tar.gz
Configure environment variables: vi ~/.bash_profile
Add the following:
#java
export JAVA_HOME=/home/hadoop/app/jdk
export CLASSPATH=.:$JAVA_HOME/lib:$CLASSPATH
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
Load the environment variables:
. ~/.bash_profile
Verify the installation:
java -version
java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
If you see the output above, the installation succeeded.
Installing ZooKeeper
Install on CDHNode1 first.
Extract zookeeper-3.4.5-cdh5.14.2.tar.gz:
tar zxvf zookeeper-3.4.5-cdh5.14.2.tar.gz
mv zookeeper-3.4.5-cdh5.14.2 /home/hadoop/app/zookeeper
rm -f zookeeper-3.4.5-cdh5.14.2.tar.gz
Set environment variables: vi ~/.bash_profile
Add the following:
#zk
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Load the environment variables:
. ~/.bash_profile
Create the configuration file: vi /home/hadoop/app/zookeeper/conf/zoo.cfg
Add the following:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# data directory and transaction log directory
dataDir=/home/hadoop/data/zookeeper/zkdata
dataLogDir=/home/hadoop/data/zookeeper/zkdatalog
# the port at which the clients will connect
clientPort=2181
# server.<id>=<hostname>:<port for sync/communication between nodes>:<leader election port>
server.1=CDHNode1:2888:3888
server.2=CDHNode2:2888:3888
server.3=CDHNode3:2888:3888
# When the membership changes, simply add or remove the corresponding
# entries here (the configuration on every node must be updated), then
# start the added nodes or stop the removed ones.
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
Create the required directories:
mkdir -p /home/hadoop/data/zookeeper/zkdata
mkdir -p /home/hadoop/data/zookeeper/zkdatalog
mkdir -p /home/hadoop/app/zookeeper/logs
Create the myid file: vim /home/hadoop/data/zookeeper/zkdata/myid, and add:
1
Note: This number comes from the server.1=CDHNode1:2888:3888 line in zoo.cfg (the 1 after "server."), so CDHNode2 gets 2 and CDHNode3 gets 3.
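Equivalently, the file can be written in a single command:
echo 1 > /home/hadoop/data/zookeeper/zkdata/myid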
Configure the log directory: vim /home/hadoop/app/zookeeper/libexec/zkEnv.sh and change the following parameters to:
ZOO_LOG_DIR="$ZOOKEEPER_HOME/logs"
ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
Note: /home/hadoop/app/zookeeper/libexec/zkEnv.sh and /home/hadoop/app/zookeeper/bin/zkEnv.sh have identical contents. The startup script /home/hadoop/app/zookeeper/bin/zkServer.sh reads /home/hadoop/app/zookeeper/libexec/zkEnv.sh first, and falls back to /home/hadoop/app/zookeeper/bin/zkEnv.sh only when the former does not exist.
Then edit vim /home/hadoop/app/zookeeper/conf/log4j.properties and change the following parameters to:
zookeeper.root.logger=INFO, ROLLINGFILE
zookeeper.log.dir=/home/hadoop/app/zookeeper/logs
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
Copy ZooKeeper to CDHNode2-3:
scp ~/.bash_profile CDHNode2:/home/hadoop
scp ~/.bash_profile CDHNode3:/home/hadoop
scp -pr /home/hadoop/app/zookeeper CDHNode2:/home/hadoop/app
scp -pr /home/hadoop/app/zookeeper CDHNode3:/home/hadoop/app
ssh CDHNode2 "mkdir -p /home/hadoop/data/zookeeper/zkdata; mkdir -p /home/hadoop/data/zookeeper/zkdatalog; mkdir -p /home/hadoop/app/zookeeper/logs"
ssh CDHNode2 "echo 2 > /home/hadoop/data/zookeeper/zkdata/myid"
ssh CDHNode3 "mkdir -p /home/hadoop/data/zookeeper/zkdata; mkdir -p /home/hadoop/data/zookeeper/zkdatalog; mkdir -p /home/hadoop/app/zookeeper/logs"
ssh CDHNode3 "echo 3 > /home/hadoop/data/zookeeper/zkdata/myid"
Start ZooKeeper on all 3 nodes:
/home/hadoop/app/zookeeper/bin/zkServer.sh start
Check the node status:
/home/hadoop/app/zookeeper/bin/zkServer.sh status
If one node is the leader and the other two are followers, ZooKeeper has been installed successfully.
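To check all three nodes in one pass (a small convenience sketch; it assumes passwordless SSH between the nodes is already working), note that zkServer.sh status prints a "Mode:" line on each node:
for h in CDHNode1 CDHNode2 CDHNode3; do
  echo -n "$h: "
  ssh $h '/home/hadoop/app/zookeeper/bin/zkServer.sh status 2>&1 | grep Mode'
done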
Check the process:
jps
The QuorumPeerMain process is ZooKeeper.
To stop ZooKeeper:
/home/hadoop/app/zookeeper/bin/zkServer.sh stop
Installing Hadoop
Install on CDHNode1 first, then copy to the other nodes.
Extract hadoop-2.6.0-cdh5.14.2.tar.gz:
tar zxvf hadoop-2.6.0-cdh5.14.2.tar.gz
mv hadoop-2.6.0-cdh5.14.2 /home/hadoop/app/hadoop
rm -f hadoop-2.6.0-cdh5.14.2.tar.gz
Set environment variables: vi ~/.bash_profile
Add the following:
#hadoop
HADOOP_HOME=/home/hadoop/app/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME PATH
Load the environment variables:
. ~/.bash_profile
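To confirm that the variables took effect, print the Hadoop build information; the first line should report 2.6.0-cdh5.14.2:
hadoop version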
Configuring HDFS
Edit /home/hadoop/app/hadoop/etc/hadoop/hadoop-env.sh and modify the following line:
export JAVA_HOME=/home/hadoop/app/jdk
Edit /home/hadoop/app/hadoop/etc/hadoop/core-site.xml:
"1.0" encoding="UTF-8"?> "text/xsl" href="configuration.xsl"?> <configuration> <property> <name>fs.defaultFSname> <value>hdfs://cluster1value> property> <property> <name>hadoop.tmp.dirname> <value>/home/hadoop/data/tmpvalue> property> <property> <name>ha.zookeeper.quorumname> <value>CDHNode1:2181,CDHNode2:2181,CDHNode3:2181value> property> configuration>
Edit /home/hadoop/app/hadoop/etc/hadoop/hdfs-site.xml:
"1.0" encoding="UTF-8"?> "text/xsl" href="configuration.xsl"?> <configuration> <property> <name>dfs.replicationname> <value>3value> property> <property> <name>dfs.name.dirname> <value>/home/hadoop/data/hdfs/namevalue> property> <property> <name>dfs.data.dirname> <value>/chunk1value> property> <property> <name>dfs.permissionsname> <value>falsevalue> property> <property> <name>dfs.permissions.enabledname> <value>falsevalue> property> <property> <name>dfs.nameservicesname> <value>cluster1value> property> <property> <name>dfs.ha.namenodes.cluster1name> <value>CDHNode1,CDHNode2value> property> <property> <name>dfs.namenode.rpc-address.cluster1.CDHNode1name> <value>CDHNode1:9000value> property> <property> <name>dfs.namenode.http-address.cluster1.CDHNode1name> <value>CDHNode1:50070value> property> <property> <name>dfs.namenode.rpc-address.cluster1.CDHNode2name> <value>CDHNode2:9000value> property> <property> <name>dfs.namenode.http-address.cluster1.CDHNode2name> <value>CDHNode2:50070value> property> <property> <name>dfs.ha.automatic-failover.enabledname> <value>truevalue> property> <property> <name>dfs.namenode.shared.edits.dirname> <value>qjournal://CDHNode1:8485;CDHNode2:8485;CDHNode3:8485;CDHNode4:8485;CDHNode5:8485/cluster1value> property> <property> <name>dfs.client.failover.proxy.provider.cluster1name> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvidervalue> property> <property> <name>dfs.journalnode.edits.dirname> <value>/home/hadoop/data/journaldata/jnvalue> property> <property> <name>dfs.ha.fencing.methodsname> <value>shell(/bin/true)value>