Prerequisites
- Software versions: hadoop-2.7.7, hbase-2.1.5, jdk1.8.0_211, zookeeper-3.4.10, apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz
- At least 3 CentOS servers with the host names hadoop0001, hadoop0002, and hadoop0003
- All software is installed under /home/hadoop/app, owned by the hadoop user
- Configure hosts on every server
[hadoop@hadoop0001 ~]$ sudo vim /etc/hosts
The hosts file should contain:
# 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
# ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.2.1.102 hadoop0001
10.2.1.103 hadoop0002
10.2.1.104 hadoop0003
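A quick way to confirm that a node really has all three entries is to scan its hosts file for the names. The following is a minimal sketch under the host names used above; the function name `check_hosts` and its optional file argument are illustrative assumptions, not part of any tool.

```shell
#!/usr/bin/env bash
# check_hosts [FILE]: report cluster host names that are missing from FILE.
# Defaults to /etc/hosts. The function name is hypothetical.
check_hosts() {
  local file="${1:-/etc/hosts}" missing=0 name
  for name in hadoop0001 hadoop0002 hadoop0003; do
    # Ignore commented-out lines, then match the name as a whole word.
    if ! grep -Ev '^[[:space:]]*#' "$file" | grep -qw "$name"; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_hosts` on each server should print nothing and return 0 once the file is complete.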
- Passwordless SSH login (optional, but without it Hadoop asks for a password on every start)
Run the following commands in a terminal on hadoop0001:
[hadoop@hadoop0001 ~]$ ssh-keygen -t rsa -P ""   # press Enter at every prompt
[hadoop@hadoop0001 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop0001 ~]$ ssh-copy-id hadoop@hadoop0002
[hadoop@hadoop0001 ~]$ ssh-copy-id hadoop@hadoop0003
Run the following commands in a terminal on hadoop0002:
[hadoop@hadoop0002 ~]$ ssh-keygen -t rsa -P ""   # press Enter at every prompt
[hadoop@hadoop0002 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop0002 ~]$ ssh-copy-id hadoop@hadoop0001
[hadoop@hadoop0002 ~]$ ssh-copy-id hadoop@hadoop0003
Run the following commands in a terminal on hadoop0003:
[hadoop@hadoop0003 ~]$ ssh-keygen -t rsa -P ""   # press Enter at every prompt
[hadoop@hadoop0003 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop0003 ~]$ ssh-copy-id hadoop@hadoop0001
[hadoop@hadoop0003 ~]$ ssh-copy-id hadoop@hadoop0002
Verify passwordless login
[hadoop@hadoop0001 ~]$ ssh localhost
Last login: Fri Jan 4 13:45:54 2019   # this output means passwordless login works
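The check above only covers localhost. A sketch like the following could test every node in one pass; it relies on the OpenSSH option `BatchMode=yes`, which makes ssh fail instead of prompting for a password. The function name `check_passwordless` and the `SSH` override variable are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# check_passwordless HOST...: try a no-op command on each host as user hadoop.
# SSH may be set to an alternative client command (hypothetical override hook).
check_passwordless() {
  local host failed=0
  for host in "$@"; do
    # BatchMode=yes disables password prompts, so a missing key shows up as a failure.
    if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "hadoop@$host" true; then
      echo "ok: $host"
    else
      echo "needs password or unreachable: $host"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `check_passwordless hadoop0001 hadoop0002 hadoop0003` can be run from each of the three nodes.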
- JDK installation
JDK version:
Linux: jdk-8u192-linux-x64.tar.gz
JDK environment variables:
# In the user's home directory
[hadoop@hadoop0001 ~]$ vim .bashrc
Add the following:
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_192
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$JAVA_HOME/bin:$HOME/bin:$HOME/.local/bin:$PATH
Then apply the environment variables:
# In the user's home directory
[hadoop@hadoop0001 ~]$ . .bashrc
Verify the JDK:
[hadoop@hadoop0001 ~]$ java -version
java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)
Copy the JDK and the environment settings from hadoop0001 to the other servers
[hadoop@hadoop0001 app]$ scp -r jdk1.8.0_192/ hadoop@hadoop0002:~/app/jdk1.8.0_192/
[hadoop@hadoop0001 app]$ scp -r jdk1.8.0_192/ hadoop@hadoop0003:~/app/jdk1.8.0_192/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0002:~/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0003:~/
- NTP service setup
Install ntp on every server
[hadoop@hadoop0001 ~]$ sudo yum install -y ntp
Configure ntp on hadoop0001
[hadoop@hadoop0001 ~]$ sudo vim /etc/ntp.conf
Add the following settings:
restrict 10.2.1.0 mask 255.255.255.0 nomodify notrap
logfile /var/log/ntpd.log
server ntp1.aliyun.com
server ntp2.aliyun.com
server ntp3.aliyun.com
server 127.127.1.0
fudge 127.127.1.0 stratum 10
Complete configuration file (ntp.conf):
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
logfile /var/log/ntpd.log
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
restrict 10.2.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
server ntp1.aliyun.com
server ntp2.aliyun.com
server ntp3.aliyun.com
server 127.127.1.0
fudge 127.127.1.0 stratum 10
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
# Disable the monitoring facility to prevent amplification attacks using ntpdc
# monlist command when default restrict does not include the noquery flag. See
# CVE-2013-5211 for more details.
# Note: Monitoring will not be disabled with the limited restriction flag.
disable monitor
For other time servers, see: https://www.pool.ntp.org/zone/asia
Synchronize the time:
[hadoop@hadoop0001 ~]$ sudo ntpdate -u ntp1.aliyun.com
16 Jul 16:46:39 ntpdate[12700]: adjust time server 120.25.115.20 offset -0.002546 sec
Start the time service:
[hadoop@hadoop0001 ~]$ sudo systemctl start ntpd
Enable the time service at boot:
[hadoop@hadoop0001 ~]$ sudo systemctl enable ntpd
Configure the ntp client on hadoop0002 and hadoop0003
Add the following line to /etc/ntp.conf on both nodes, then start and enable ntpd there as well
server hadoop0001
Check whether ntp is synchronized
The following output means it is not synchronized
[root@hadoop0002 ~]# ntpstat
unsynchronised
time server re-starting
polling server every 8 s
The following output means it is synchronized
[root@hadoop0001 ~]# ntpstat
synchronised to NTP server (120.25.115.20) at stratum 3
time correct to within 976 ms
polling server every 64 s
Note: synchronization takes about 10 minutes
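Because synchronization takes a while, it can help to classify the `ntpstat` output in a script rather than by eye. The sketch below only looks at the first line of the two sample outputs shown above; the function name `ntp_state` is an assumption for illustration.

```shell
#!/usr/bin/env bash
# ntp_state: read `ntpstat` output on stdin and print synced, unsynced, or unknown.
# Only the first line is inspected, matching the sample outputs in this guide.
ntp_state() {
  local first=""
  IFS= read -r first || true
  case "$first" in
    synchronised*)   echo synced ;;
    unsynchronised*) echo unsynced ;;
    *)               echo unknown ;;
  esac
}
```

A typical use would be `ntpstat | ntp_state` on each node, repeated until every node reports `synced`.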
Hadoop installation
Download Hadoop
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
Extract Hadoop
tar -zxvf hadoop-2.7.7.tar.gz
Configure hadoop-env.sh
# Adjust to your workload
export HADOOP_HEAPSIZE=1024
Configure mapred-env.sh
export JAVA_HOME=${JAVA_HOME}
Configure yarn-env.sh
# Adjust to your workload
JAVA_HEAP_MAX=-Xmx512m
YARN_HEAPSIZE=1024
Configure core-site.xml
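<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop0001:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/app/hadoop-2.7.7/data</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>14400</value>
  </property>
</configuration>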
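The value of fs.trash.interval is expressed in minutes, so 14400 is easier to read once converted to days. A small arithmetic sketch, with a helper name chosen only for illustration:

```shell
#!/usr/bin/env bash
# minutes_to_days MINUTES: convert a trash interval in minutes to whole days.
# The helper name is hypothetical.
minutes_to_days() { echo $(( $1 / 60 / 24 )); }

minutes_to_days 14400   # prints 10
```

With this setting, deleted files stay in the HDFS trash for about 10 days before they are removed.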
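Configure yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop0001</value>
    <description>Address of the YARN ResourceManager</description>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <description>Enable log aggregation</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>How reducers fetch data</description>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
    <description>Keep aggregated logs for 7 days</description>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>15000</value>
    <description>Memory available on each node, in MB</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>100</value>
    <description>Minimum memory a single task can request; the default is 1024 MB</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>15000</value>
    <description>Maximum memory a single task can request; the default is 8192 MB</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
    <description>Total number of virtual CPUs available to the NodeManager</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
    <description>Minimum number of virtual CPUs a single task can request. For example, if set to 1, each task of a MapReduce job requests at least 1 virtual CPU</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>4</value>
    <description>Maximum number of virtual CPUs a single task can request. For example, if set to 4, a task of a MapReduce job can request at most 4 virtual CPUs</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
    <value>0.8</value>
  </property>
</configuration>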
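The memory and vcore settings interact, and a little arithmetic shows which one is the tighter limit on a node. The sketch below assumes the scheduler enforces both memory and vcores, which depends on the Fair Scheduler policy in use; the helper names are illustrative only.

```shell
#!/usr/bin/env bash
# Rough capacity arithmetic for the yarn-site.xml values in this guide.
# Helper names are hypothetical and not part of any Hadoop tool.

# max_containers NODE_MEM_MB MIN_ALLOC_MB NODE_VCORES MIN_ALLOC_VCORES
# Prints the smaller of the memory-bound and vcore-bound container counts.
max_containers() {
  local by_mem=$(( $1 / $2 ))
  local by_cpu=$(( $3 / $4 ))
  if [ "$by_mem" -lt "$by_cpu" ]; then echo "$by_mem"; else echo "$by_cpu"; fi
}

# seconds_to_days SECONDS: convert a retention period to whole days.
seconds_to_days() { echo $(( $1 / 86400 )); }

max_containers 15000 100 2 1   # prints 2
seconds_to_days 604800         # prints 7
```

Under that assumption the 2 vcores per node, not the 15000 MB of memory, limit how many minimum-size containers fit. It also suggests that a request for 4 vcores, although allowed by yarn.scheduler.maximum-allocation-vcores, may not fit on a node that offers only 2.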
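Configure hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop0001:50090</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop0001:50070</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>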
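Configure mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop0001:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop0001:19888</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Configure slaves (etc/hadoop/slaves under /home/hadoop/app/hadoop-2.7.7)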
hadoop0001
hadoop0002
hadoop0003
Configure the Hadoop environment variables
In .bashrc in the user's home directory
# added by Hadoop installer
export HADOOP_HOME=/home/hadoop/app/hadoop-2.7.7
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
Apply the environment:
. .bashrc
Send the configured Hadoop to the other servers
[hadoop@hadoop0001 app]$ scp -r hadoop-2.7.7 hadoop@hadoop0002:~/app/hadoop-2.7.7
[hadoop@hadoop0001 app]$ scp -r hadoop-2.7.7 hadoop@hadoop0003:~/app/hadoop-2.7.7
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0002:~/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0003:~/
Initialize the NameNode on the primary master
hdfs namenode -format
Start the Hadoop cluster
# The cluster is up when jps shows NameNode and SecondaryNameNode on the master node and DataNode on the other machines
start-all.sh
Stop the cluster
stop-all.sh
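The daemon check described above can be scripted by filtering the output of `jps`, which prints one `<pid> <main class>` pair per line. This is a minimal sketch; the function name `has_daemons` is an assumption for illustration.

```shell
#!/usr/bin/env bash
# has_daemons NAME...: read `jps` output on stdin and fail if any NAME is absent.
# The function name is hypothetical.
has_daemons() {
  local listing name
  listing=$(cat)
  for name in "$@"; do
    # Compare the second column exactly, so NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$listing" |
        awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "not running: $name"
      return 1
    fi
  done
}
```

On the master this could be used as `jps | has_daemons NameNode SecondaryNameNode`, and on the other machines as `jps | has_daemons DataNode`.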
ZooKeeper distributed cluster setup
Download ZooKeeper
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz
Extract ZooKeeper
tar -zxvf zookeeper-3.4.10.tar.gz
Configure zoo.cfg (in the conf directory)
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
With the following content:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=20
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=10
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/hadoop/app/zookeeper-3.4.10/data
dataLogDir=/home/hadoop/app/zookeeper-3.4.10/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hadoop0001:2888:3888
server.2=hadoop0002:2888:3888
server.3=hadoop0003:2888:3888
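Create the data and logs directories under the ZooKeeper root directory
mkdir data
mkdir logs
Create myid in the data directory
vim myid
With the content:
1
Configure the ZooKeeper environment variables
In .bashrc in the user's home directory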
# added by zookeeper installer
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper-3.4.10
export CLASSPATH=$CLASSPATH:$ZOOKEEPER_HOME/lib
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Send the configured ZooKeeper to the other machines
[hadoop@hadoop0001 app]$ scp -r zookeeper-3.4.10 hadoop@hadoop0002:~/app/zookeeper-3.4.10
[hadoop@hadoop0001 app]$ scp -r zookeeper-3.4.10 hadoop@hadoop0003:~/app/zookeeper-3.4.10
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0002:~/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0003:~/
Change myid on the other machines
Set myid to 2 on hadoop0002 and to 3 on hadoop0003, so that each value matches the node's server.N entry in zoo.cfg and is unique within the cluster
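Since each myid has to agree with the node's `server.N` line in zoo.cfg, the value can be derived from that file instead of typed by hand. The following sketch assumes the `server.N=host:port:port` layout shown above; the function name `myid_for` is illustrative.

```shell
#!/usr/bin/env bash
# myid_for HOST CFG: print the N of the `server.N=HOST:...` line in CFG.
# Prints nothing if HOST is not listed. The function name is hypothetical.
myid_for() {
  local host="$1" cfg="$2"
  sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$cfg"
}
```

From the ZooKeeper root directory, something like `myid_for "$(hostname)" conf/zoo.cfg > data/myid` would write the file, provided `hostname` returns the same short name that zoo.cfg uses.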
Start the ZooKeeper service
Run on every machine:
zkServer.sh start
Check the ZooKeeper status
zkServer.sh status
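In a healthy three-node ensemble, the status command is expected to report one leader and two followers. The sketch below extracts the role from the `Mode:` line that `zkServer.sh status` prints; the function name `zk_mode` is an assumption for illustration.

```shell
#!/usr/bin/env bash
# zk_mode: read `zkServer.sh status` output on stdin and print the role,
# for example leader or follower. Prints nothing if no Mode line is present.
zk_mode() { sed -n 's/^Mode: //p'; }
```

Usage on each machine could look like `zkServer.sh status 2>/dev/null | zk_mode`.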
HBase HA distributed cluster setup
Download HBase
wget http://mirror.bit.edu.cn/apache/hbase/2.1.5/hbase-2.1.5-bin.tar.gz
Extract HBase
tar -zxvf hbase-2.1.5-bin.tar.gz
Configure hbase-site.xml
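<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop0001:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>16000</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop0001,hadoop0002,hadoop0003</value>
  </property>
  <property>
    <name>hbase.regionserver.restart.on.zk.expire</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.coprocessor.abortonerror</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/app/zookeeper-3.4.10/data</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
    <description>
      Controls whether HBase will check for stream capabilities (hflush/hsync).
      Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
      with the 'file://' scheme, but be mindful of the NOTE below.
      WARNING: Setting this to false blinds you to potential data loss and
      inconsistent system state in the event of process and/or node failures. If
      HBase is complaining of an inability to use hsync or hflush it's most
      likely not a false positive.
    </description>
  </property>
</configuration>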
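The value of hbase.rootdir has to point at the same NameNode address as fs.defaultFS in core-site.xml, and a mismatch in host or port is an easy mistake to make. A minimal sketch of that comparison, with a function name chosen only for illustration:

```shell
#!/usr/bin/env bash
# rootdir_matches_fs FS_DEFAULT ROOTDIR: succeed if ROOTDIR is a path
# below FS_DEFAULT. The function name is hypothetical.
rootdir_matches_fs() {
  local fs="${1%/}" rootdir="$2"
  case "$rootdir" in
    "$fs"/*) return 0 ;;
    *)       return 1 ;;
  esac
}

rootdir_matches_fs hdfs://hadoop0001:8020 hdfs://hadoop0001:8020/hbase && echo consistent
```

With the values used in this guide the check succeeds; it would fail if, say, one file used port 8020 and the other port 9000.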
Configure regionservers
Add the following entries to the regionservers file in the conf directory under the HBase root directory:
# Host names, as configured in hosts
hadoop0001
hadoop0002
hadoop0003
Configure the HBase environment variables
In .bashrc in the user's home directory
# added by hbase installer
export HBASE_HOME=/home/hadoop/app/hbase-2.1.5
export CLASSPATH=$CLASSPATH:$HBASE_HOME/lib
export PATH=$PATH:$HBASE_HOME/bin
Send the configured HBase to the other machines
[hadoop@hadoop0001 app]$ scp -r hbase-2.1.5 hadoop@hadoop0002:~/app/hbase-2.1.5
[hadoop@hadoop0001 app]$ scp -r hbase-2.1.5 hadoop@hadoop0003:~/app/hbase-2.1.5
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0002:~/
[hadoop@hadoop0001 ~]$ scp .bashrc hadoop@hadoop0003:~/
Configure backup-masters (standby master nodes)
Add the following entry to the backup-masters file in the conf directory under the HBase root directory:
# Standby master nodes; more than one can be listed
hadoop0002
Start the HBase cluster
start-hbase.sh
Note: the HBase cluster is up when HMaster and HRegionServer (this may be absent, which is normal) appear on the primary node, HMaster and HRegionServer appear on the standby node, and HRegionServer appears on the remaining nodes.
Stop the HBase cluster
stop-hbase.sh
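Phoenix cluster installation
Download Phoenix
wget http://mirror.bit.edu.cn/apache/phoenix/apache-phoenix-5.0.0-HBase-2.0/bin/apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz
Extract Phoenix
tar -zxvf apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz
Copy the following jars into the lib directory under the HBase root directory on every node, then restart HBase so that the jars are loaded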
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ cp phoenix-5.0.0-HBase-2.0-queryserver.jar ~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-5.0.0-HBase-2.0-queryserver.jar hadoop@hadoop0002:~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-5.0.0-HBase-2.0-queryserver.jar hadoop@hadoop0003:~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ cp phoenix-5.0.0-HBase-2.0-server.jar ~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-5.0.0-HBase-2.0-server.jar hadoop@hadoop0002:~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-5.0.0-HBase-2.0-server.jar hadoop@hadoop0003:~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ cp phoenix-core-5.0.0-HBase-2.0.jar ~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-core-5.0.0-HBase-2.0.jar hadoop@hadoop0002:~/app/hbase-2.1.5/lib/
[hadoop@hadoop0001 apache-phoenix-5.0.0-HBase-2.0-bin]$ scp phoenix-core-5.0.0-HBase-2.0.jar hadoop@hadoop0003:~/app/hbase-2.1.5/lib/
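The nine copy commands above follow one pattern: three jars, each copied locally and to the two other nodes. A sketch like the following prints the same commands from a nested loop so they can be reviewed before being run; the function name `phoenix_copy_cmds` is an assumption for illustration.

```shell
#!/usr/bin/env bash
# phoenix_copy_cmds: print one cp and two scp commands per Phoenix jar.
# Nothing is copied here; the output is meant to be reviewed first.
phoenix_copy_cmds() {
  local jar host
  for jar in \
      phoenix-5.0.0-HBase-2.0-queryserver.jar \
      phoenix-5.0.0-HBase-2.0-server.jar \
      phoenix-core-5.0.0-HBase-2.0.jar; do
    echo "cp $jar ~/app/hbase-2.1.5/lib/"
    for host in hadoop0002 hadoop0003; do
      echo "scp $jar hadoop@$host:~/app/hbase-2.1.5/lib/"
    done
  done
}
```

Run from the extracted Phoenix directory, `phoenix_copy_cmds | sh` would execute the printed commands.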
Configure the Phoenix environment variables (no need to copy them to the other nodes)
# added by phoenix installer
export PHOENIX_HOME=/home/hadoop/app/apache-phoenix-5.0.0-HBase-2.0-bin
export CLASSPATH=$CLASSPATH:$PHOENIX_HOME
export PATH=$PATH:$PHOENIX_HOME/bin
Start the Phoenix Query Server
queryserver.py start
Stop the Phoenix Query Server
queryserver.py stop
Connect to the Phoenix Query Server
sqlline-thin.py hadoop0001:8765
Client JDBC connection (jdbcUrl)
jdbc:phoenix:thin:url=http://10.2.1.102:8765?doAs=alice
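For clients that do not need the doAs parameter, the thin-driver URL can be assembled from the host and port of the Query Server. The sketch below assumes the Query Server uses its default PROTOBUF serialization; the function name `thin_url` is illustrative.

```shell
#!/usr/bin/env bash
# thin_url HOST [PORT]: build a Phoenix thin-client JDBC URL.
# PORT defaults to 8765, the Query Server port used in this guide.
# The function name is hypothetical.
thin_url() {
  echo "jdbc:phoenix:thin:url=http://$1:${2:-8765};serialization=PROTOBUF"
}

thin_url hadoop0001
```

The resulting string can be passed to any JDBC client that has the Phoenix thin-client jar on its classpath.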