Configure a high-availability cluster with Hadoop + YARN + HBase + Storm + Kafka + Spark + ZooKeeper, and install the related components: JDK, MySQL, Hive, and Flume.
Number of virtual machines: 8
Operating system: CentOS-7-x86_64-Minimal-1611.iso
Each virtual machine is configured as follows:
VM name | CPU cores | Memory (GB) | Disk (GB) | NICs |
---|---|---|---|---|
hadoop1 | 2 | 8 | 100 | 2 |
hadoop2 | 2 | 8 | 100 | 2 |
hadoop3 | 2 | 8 | 100 | 2 |
hadoop4 | 2 | 8 | 100 | 2 |
hadoop5 | 2 | 8 | 100 | 2 |
hadoop6 | 2 | 8 | 100 | 2 |
hadoop7 | 2 | 8 | 100 | 2 |
hadoop8 | 2 | 8 | 100 | 2 |
8-node Hadoop + YARN + Spark + HBase + Kafka + Storm + ZooKeeper HA cluster layout:
Cluster | VM nodes |
---|---|
Hadoop HA cluster | hadoop1,hadoop2,hadoop3,hadoop4,hadoop5,hadoop6,hadoop7,hadoop8 |
YARN HA cluster | hadoop1,hadoop2,hadoop3,hadoop4,hadoop5,hadoop6,hadoop7,hadoop8 |
ZooKeeper cluster | hadoop3,hadoop4,hadoop5 |
HBase cluster | hadoop3,hadoop4,hadoop5,hadoop6,hadoop7 |
Kafka cluster | hadoop6,hadoop7,hadoop8 |
Storm cluster | hadoop3,hadoop4,hadoop5,hadoop6,hadoop7 |
Spark HA cluster | hadoop1,hadoop2,hadoop3,hadoop4,hadoop5,hadoop6,hadoop7,hadoop8 |
Detailed cluster plan:
VM name | IP | Installed software | Processes | Role |
---|---|---|---|---|
hadoop1 | 11.11.11.121 | jdk,hadoop,mysql | NameNode,ResourceManager,DFSZKFailoverController(zkfc),Master(spark) | Hadoop NameNode, Spark Master, YARN ResourceManager |
hadoop2 | 11.11.11.122 | jdk,hadoop,spark | NameNode,ResourceManager,DFSZKFailoverController(zkfc),Worker(spark) | Standby node for Hadoop (YARN) and for Spark |
hadoop3 | 11.11.11.123 | jdk,hadoop,zookeeper,hbase,storm,spark | DataNode,NodeManager,JournalNode,QuorumPeerMain(zk),HMaster,…(storm),Worker(spark) | Master node for Storm, HBase, and ZooKeeper |
hadoop4 | 11.11.11.124 | jdk,hadoop,zookeeper,hbase,storm,spark | DataNode,NodeManager,JournalNode,QuorumPeerMain(zk),HRegionServer,…(storm),Worker(spark) | |
hadoop5 | 11.11.11.125 | jdk,hadoop,zookeeper,hbase,storm,spark | DataNode,NodeManager,JournalNode,QuorumPeerMain(zk),HRegionServer,…(storm),Worker(spark) | |
hadoop6 | 11.11.11.126 | jdk,hadoop,hbase,storm,kafka,spark | DataNode,NodeManager,JournalNode,Kafka,HRegionServer,…(storm),Worker(spark) | Kafka master node |
hadoop7 | 11.11.11.127 | jdk,hadoop,hbase,storm,kafka,spark | DataNode,NodeManager,JournalNode,Kafka,HRegionServer,…(storm),Worker(spark) | |
hadoop8 | 11.11.11.128 | jdk,hadoop,kafka,spark | DataNode,NodeManager,JournalNode,Kafka,Worker(spark) | |
JDK version: jdk-8u65-linux-x64.tar.gz
Hadoop version: hadoop-2.7.6.tar.gz
ZooKeeper version: zookeeper-3.4.12.tar.gz
HBase version: hbase-1.2.6-bin.tar.gz
Storm version: apache-storm-1.1.3.tar.gz
Kafka version: kafka_2.11-2.0.0.tgz
MySQL version: mysql-5.6.41-linux-glibc2.12-x86_64.tar.gz
Hive version: apache-hive-2.3.3-bin.tar.gz
Flume version: apache-flume-1.8.0-bin.tar.gz
Spark version: spark-2.3.1-bin-hadoop2.7.tgz
Apply the same settings on every host.
Important: do not configure the cluster as root; use a regular user.
$> groupadd centos
$> useradd centos -g centos
$> passwd centos
$> nano /etc/sudoers
Add the following lines:
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
centos ALL=(ALL) ALL
$> sudo nano /etc/hostname
Set the hostname on each machine: hadoop1, hadoop2, ...
$> sudo nano /etc/hosts
Add the following entries:
127.0.0.1 localhost
11.11.11.121 hadoop1
11.11.11.122 hadoop2
11.11.11.123 hadoop3
11.11.11.124 hadoop4
11.11.11.125 hadoop5
11.11.11.126 hadoop6
11.11.11.127 hadoop7
11.11.11.128 hadoop8
Change the shell prompt so that it shows the full working directory (pwd): ~ becomes /home/centos, which makes it easy to see where the current file lives.
[centos@hadoop1 ~]$ sudo nano /etc/profile
Append at the end:
export PS1='[\u@\h `pwd`]\$'
// run source /etc/profile to make it take effect immediately
[centos@hadoop1 /home/centos]$
hadoop1 and hadoop2 are the failover nodes (they remove the single point of failure), so besides reaching each other they must also be able to log in to every other node; set up passwordless SSH.
[centos@hadoop1 /home/centos]$ yum list installed | grep ssh
[centos@hadoop1 /home/centos]$ ps -Af | grep sshd
[centos@hadoop1 /home/centos]$ mkdir .ssh
[centos@hadoop1 /home/centos]$ chmod 700 ~/.ssh
//generate the key pair
[centos@hadoop1 /home/centos]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
//change into ~/.ssh
[centos@hadoop1 /home/centos]$ cd ~/.ssh
//append the public key to ~/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ cat id_rsa.pub >> authorized_keys
// change the permissions of authorized_keys to 644
[centos@hadoop1 /home/centos/.ssh]$ chmod 644 authorized_keys
//rename the public key
[centos@hadoop1 /home/centos/.ssh]$ mv id_rsa.pub id_rsa_hadoop1.pub
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop2:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop3:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop4:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop5:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop6:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop7:/home/centos/.ssh/authorized_keys
[centos@hadoop1 /home/centos/.ssh]$ scp id_rsa_hadoop1.pub centos@hadoop8:/home/centos/.ssh/authorized_keys
//generate the key pair on hadoop2
[centos@hadoop2 /home/centos]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
//rename the public key
[centos@hadoop2 /home/centos/.ssh]$ mv id_rsa.pub id_rsa_hadoop2.pub
//copy id_rsa_hadoop2.pub to hadoop1 (scp), then append it to authorized_keys there
[centos@hadoop1 /home/centos/.ssh]$ cat id_rsa_hadoop2.pub >> authorized_keys
//distribute the combined authorized_keys to the other nodes
[centos@hadoop1 /home/centos/.ssh]$ scp authorized_keys centos@hadoop2:/home/centos/.ssh/
... repeat for hadoop3 through hadoop8
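A quick way to confirm that passwordless login works is a small loop from hadoop1 (and the same from hadoop2); this is only a sanity-check sketch:
// each hostname should print without a password prompt
[centos@hadoop1 /home/centos]$ for i in 1 2 3 4 5 6 7 8; do ssh hadoop$i hostname; done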
To make sure the cluster can start properly, disable the firewall on every host first. Useful commands:
[CentOS 6.5 and earlier]
$>sudo service iptables stop //stop the service
$>sudo service iptables start //start the service
$>sudo service iptables status //check the status
[CentOS 7]
$>sudo systemctl enable firewalld.service //enable start on boot
$>sudo systemctl disable firewalld.service //disable start on boot
$>sudo systemctl start firewalld.service //start the firewall
$>sudo systemctl stop firewalld.service //stop the firewall
$>sudo systemctl status firewalld.service //check the firewall status
[Auto-start on boot (SysV style)]
$>sudo chkconfig firewalld on //enable auto-start on boot
$>sudo chkconfig firewalld off //disable auto-start on boot
Tip: place the helper scripts in /usr/local/bin so they are available everywhere. They only need to be configured on hadoop1 and hadoop2.
//create xcall.sh as the regular centos user
$>touch ~/xcall.sh
//move it to /usr/local/bin
$>sudo mv ~/xcall.sh /usr/local/bin
//make it executable
$>sudo chmod a+x /usr/local/bin/xcall.sh
//edit the script
$>sudo nano /usr/local/bin/xcall.sh
#!/bin/bash
# xcall.sh: run the given command on hadoop1 .. hadoop8 over SSH
params=$@
i=1
for (( i=1 ; i <= 8 ; i = $i + 1 )) ; do
echo ============= hadoop$i $params =============
ssh hadoop$i "$params"
done
Create the distribution script xsync.sh the same way (also in /usr/local/bin); its content:
#!/bin/bash
if [[ $# -lt 1 ]] ; then echo no params ; exit ; fi
p=$1
#echo p=$p
dir=`dirname $p`
#echo dir=$dir
filename=`basename $p`
#echo filename=$filename
cd $dir
fullpath=`pwd -P .`
#echo fullpath=$fullpath
user=`whoami`
for (( i = 1 ; i <= 8 ; i = $i + 1 )) ; do
echo ======= hadoop$i =======
rsync -lr $p ${user}@hadoop$i:$fullpath
done ;
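Typical usage of the two scripts, assuming both sit in /usr/local/bin and passwordless SSH is in place (note that non-interactive SSH sessions do not load /etc/profile, so prefer absolute paths in the command you pass):
// run a command on all 8 nodes
$>xcall.sh "date"
// copy a file or directory to the same path on every node (example path; run after the directory exists)
$>xsync.sh /soft/jdk1.8.0_65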
Prepare the JDK (jdk-8u65-linux-x64.tar.gz) and upload it to /home/centos/localsoft on hadoop1; this directory holds all of the installation packages.
Create a soft directory under the root (/) and change its owner and group to centos; all software will be installed under it.
//create the soft directory
[centos@hadoop1 /home/centos]$ sudo mkdir /soft
//change ownership to the cluster user
[centos@hadoop1 /home/centos]$ sudo chown centos:centos /soft
// extract from /home/centos/localsoft into /soft
[centos@hadoop1 /home/centos/localsoft]$ tar -xzvf jdk-8u65-linux-x64.tar.gz -C /soft
// create a symbolic link
[centos@hadoop1 /soft]$ ln -s /soft/jdk1.8.0_65 jdk
// edit /etc/profile
[centos@hadoop1 /home/centos]$ sudo nano /etc/profile
// environment variables
# jdk
export JAVA_HOME=/soft/jdk
export PATH=$PATH:$JAVA_HOME/bin
// apply immediately with source
[centos@hadoop1 /home/centos]$ source /etc/profile
[centos@hadoop1 /home/centos]$ java -version
// the output looks like:
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
// extract from /home/centos/localsoft into /soft
[centos@hadoop1 /home/centos/localsoft]$ tar -xzvf hadoop-2.7.6.tar.gz -C /soft
// create a symbolic link
[centos@hadoop1 /soft]$ ln -s /soft/hadoop-2.7.6 hadoop
// edit /etc/profile
[centos@hadoop1 /home/centos]$ sudo nano /etc/profile
// environment variables
# hadoop
export HADOOP_HOME=/soft/hadoop
export PATH=$PATH:$HADOOP_HOME/bin/:$HADOOP_HOME/sbin
// apply immediately with source
[centos@hadoop1 /home/centos]$ source /etc/profile
// verify the installation
[centos@hadoop1 /home/centos]$ hadoop version
The output looks like:
Hadoop 2.7.6
Subversion https://shv@git-wip-us.apache.org/repos/asf/hadoop.git -r 085099c66cf28be31604560c376fa282e69282b8
Compiled by kshvachk on 2018-04-18T01:33Z
Compiled with protoc 2.5.0
From source with checksum 71e2695531cb3360ab74598755d036
This command was run using /soft/hadoop-2.7.6/share/hadoop/common/hadoop-common-2.7.6.jar
Tip: everything so far is done on hadoop1 only; there is no need to install or configure the other nodes yet. Once the configuration is finished it will be distributed to them in one go, which saves a lot of work.
Set up Hadoop's native NameNode HA first; it will be integrated with the ZooKeeper cluster later for automatic failover (YARN + NameNode).
[centos@hadoop1 /soft/hadoop/etc]$ cp -r hadoop ha
[centos@hadoop1 /soft/hadoop/etc]$ cp -r hadoop full
[centos@hadoop1 /soft/hadoop/etc]$ cp -r hadoop pesudo
// remove the original directory and make hadoop a symbolic link that points at the ha configuration
[centos@hadoop1 /soft/hadoop/etc]$ rm -rf hadoop
[centos@hadoop1 /soft/hadoop/etc]$ ln -s /soft/hadoop/etc/ha hadoop
[core-site.xml]
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<!-- new local (tmp) directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/centos/hadoop</value>
</property>
<property>
<name>ipc.client.connect.max.retries</name>
<value>20</value>
</property>
<property>
<name>ipc.client.connect.retry.interval</name>
<value>5000</value>
</property>
</configuration>
[hdfs-site.xml]
<configuration>
<!-- nameservice id -->
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<!-- the two NameNode ids under mycluster -->
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of each NameNode -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>hadoop1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>hadoop2:8020</value>
</property>
<!-- web UI addresses -->
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>hadoop1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>hadoop2:50070</value>
</property>
<!-- shared edits directory (JournalNodes) -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop3:8485;hadoop4:8485;hadoop5:8485;hadoop6:8485;hadoop7:8485;hadoop8:8485/mycluster</value>
</property>
<!-- Java class the client uses to determine which NameNode is active -->
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- script(s) or Java class used to fence the active NameNode during failover -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/centos/.ssh/id_rsa</value>
</property>
<!-- local path where the JournalNodes store edits -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/centos/hadoop/journal</value>
</property>
</configuration>
[mapred-site.xml]
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
[yarn-site.xml]
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
// configure the DataNode (worker) hosts
[centos@hadoop1 /soft/hadoop/etc/ha]$ nano slaves
Add:
hadoop3
hadoop4
hadoop5
hadoop6
hadoop7
hadoop8
// fix one detail in hadoop-env.sh
[centos@hadoop1 /soft/hadoop/etc/ha]$ nano hadoop-env.sh
Change export JAVA_HOME=${JAVA_HOME} to export JAVA_HOME=/soft/jdk
// distribute to all nodes
[centos@hadoop1 /soft]$ xsync.sh hadoop-2.7.6
// distribute the symbolic link to the other hosts as well (repeat for hadoop3 through hadoop8)
[centos@hadoop1 /soft]$ rsync -lr hadoop centos@hadoop2:/soft/
Note: /etc/profile still has to be edited on every host individually, with the same content as on hadoop1:
// environment variables
# hadoop
export HADOOP_HOME=/soft/hadoop
export PATH=$PATH:$HADOOP_HOME/bin/:$HADOOP_HOME/sbin
// start a JournalNode on each of hadoop3 through hadoop8
$>hadoop-daemon.sh start journalnode
// on hadoop1
[centos@hadoop1 /home/centos]$ hadoop namenode -format
// on hadoop2 (alternatively, skip this format and sync from hadoop1 with hdfs namenode -bootstrapStandby, as in the automatic-failover section below)
[centos@hadoop2 /home/centos]$ hadoop namenode -format
[centos@hadoop1 /home/centos]$ hdfs namenode -initializeSharedEdits
[hadoop1]
$>hadoop-daemon.sh start namenode //start the NameNode
$>hadoop-daemons.sh start datanode //start all DataNodes
[hadoop2]
$>hadoop-daemon.sh start namenode //start the NameNode
Manual HA administration commands:
$>hdfs haadmin -transitionToActive nn1 //switch to active
$>hdfs haadmin -transitionToStandby nn1 //switch to standby
$>hdfs haadmin -transitionToActive --forceactive nn2 //force activation
$>hdfs haadmin -failover nn1 nn2 //simulate a failover from nn1 to nn2
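To check which NameNode is currently active before or after switching (a quick sanity check):
$>hdfs haadmin -getServiceState nn1 //prints active or standby
$>hdfs haadmin -getServiceState nn2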
Note!
The ZooKeeper nodes are hadoop3, hadoop4, and hadoop5. Install and configure on hadoop3 first, then distribute to hadoop4 and hadoop5.
// extract
[centos@hadoop3 /home/centos/localsoft]$ tar -xzvf zookeeper-3.4.12.tar.gz -C /soft/
// create a symbolic link
[centos@hadoop3 /soft]$ln -s /soft/zookeeper-3.4.12 zk
[centos@hadoop3 /home/centos]$sudo nano /etc/profile
//environment variables
export ZK_HOME=/soft/zk
export PATH=$PATH:$ZK_HOME/bin
// copy the sample configuration
[centos@hadoop3 /soft/zk/conf]$cp zoo_sample.cfg zoo.cfg
// edit the configuration
[centos@hadoop3 /soft/zk/conf]$nano zoo.cfg
// configuration:
tickTime=2000
initLimit=10
syncLimit=5
# data directory
dataDir=/home/centos/zookeeper
clientPort=2181
server.1=hadoop3:2888:3888
server.2=hadoop4:2888:3888
server.3=hadoop5:2888:3888
// copy the extracted files to hadoop4
[centos@hadoop3 /soft]$scp -r zookeeper-3.4.12 centos@hadoop4:/soft/
// copy the extracted files to hadoop5
[centos@hadoop3 /soft]$scp -r zookeeper-3.4.12 centos@hadoop5:/soft/
// distribute the symbolic link to hadoop4
[centos@hadoop3 /soft]$rsync -lr zk centos@hadoop4:/soft/
// distribute the symbolic link to hadoop5
[centos@hadoop3 /soft]$rsync -lr zk centos@hadoop5:/soft/
//add the environment variables on hadoop4 and hadoop5 as well
export ZK_HOME=/soft/zk
export PATH=$PATH:$ZK_HOME/bin
Write each server's myid into the dataDir (create /home/centos/zookeeper first if it does not exist):
[hadoop3]
[centos@hadoop3 /home/centos]$ echo 1 > /home/centos/zookeeper/myid
[hadoop4]
[centos@hadoop4 /home/centos]$ echo 2 > /home/centos/zookeeper/myid
[hadoop5]
[centos@hadoop5 /home/centos]$ echo 3 > /home/centos/zookeeper/myid
//start the server (on each of the three nodes)
$> zkServer.sh start
//stop
$> zkServer.sh stop
//check the status
$>zkServer.sh status
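Because zkServer.sh has to be run on each of hadoop3, hadoop4, and hadoop5, a small loop from hadoop1 can save some typing. This is only a sketch: it assumes passwordless SSH and uses the absolute path, since a non-interactive shell does not load /etc/profile (if Java is not found this way, log in to each node and start ZooKeeper there instead):
[centos@hadoop1 /home/centos]$ for h in hadoop3 hadoop4 hadoop5; do ssh $h "/soft/zk/bin/zkServer.sh start"; done
[centos@hadoop1 /home/centos]$ for h in hadoop3 hadoop4 hadoop5; do ssh $h "/soft/zk/bin/zkServer.sh status"; done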
Basic ZooKeeper commands:
$>zkCli.sh -server hadoop3:2181 //open the zk command line
$zk]help //show help
$zk]quit //quit
$zk]create /a tom //create node /a with data tom
$zk]get /a //read the data
$zk]ls / //list child nodes
$zk]set /a tom //set the data
$zk]delete /a //delete a node
$zk]rmr /a //recursively delete a node and its children
Re-initialize HDFS for the HA setup (clean up any old data first, then format again):
// delete the logs on every node
[centos@hadoop1 /home/centos]$ xcall.sh "rm -rf /soft/hadoop/logs/*"
// delete the local data on every node
[centos@hadoop1 /home/centos]$ xcall.sh "rm -rf /home/centos/hadoop/*"
// start a JournalNode on each of hadoop3 through hadoop8
$> hadoop-daemon.sh start journalnode
[centos@hadoop1 /home/centos]$ hadoop namenode -format
[centos@hadoop1 /home/centos]$ scp -r ~/hadoop/* centos@hadoop2:/home/centos/hadoop
Bootstrap the standby on the NameNode that was not formatted (hadoop2):
[centos@hadoop1 /home/centos]$ hadoop-daemon.sh start namenode
[centos@hadoop2 /home/centos]$ hdfs namenode -bootstrapStandby
[centos@hadoop1 /home/centos]$ hdfs namenode -initializeSharedEdits
Note: if the bootstrap is blocked because the directory is locked, delete the in_use.lock file under /home/centos/hadoop/dfs/name.
Start all DataNodes:
[centos@hadoop1 /home/centos]$ hadoop-daemons.sh start datanode
// start the standby NameNode
[centos@hadoop2 /home/centos]$ hadoop-daemon.sh start namenode
// stop everything before configuring automatic failover
[centos@hadoop1 /home/centos]$ stop-all.sh
//edit core-site.xml and point it at the ZooKeeper quorum
[core-site.xml]
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop3:2181,hadoop4:2181,hadoop5:2181</value>
</property>
//edit hdfs-site.xml and enable automatic failover
[hdfs-site.xml]
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
//distribute the files to the other hosts
[centos@hadoop1 /soft/hadoop/etc/ha]$ xsync.sh hdfs-site.xml
[centos@hadoop1 /soft/hadoop/etc/ha]$ xsync.sh core-site.xml
// start ZooKeeper on hadoop3, hadoop4, and hadoop5
$> zkServer.sh start
// initialize the HA state in ZooKeeper
[centos@hadoop1 /home/centos]$ hdfs zkfc -formatZK
// open the zk client and check that the hadoop-ha znode was created
[centos@hadoop3 /home/centos]$ zkCli.sh
[centos@hadoop1 /home/centos]$ start-dfs.sh
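After start-dfs.sh, both zkfc processes should be running and exactly one NameNode should be active. A simple way to verify automatic failover (a sketch; it assumes nn1 on hadoop1 is currently active):
// check the processes on every node (absolute path because non-interactive SSH does not load /etc/profile)
[centos@hadoop1 /home/centos]$ xcall.sh /soft/jdk/bin/jps
// check which NameNode is active
[centos@hadoop1 /home/centos]$ hdfs haadmin -getServiceState nn1
[centos@hadoop1 /home/centos]$ hdfs haadmin -getServiceState nn2
// stop the active NameNode and confirm that nn2 becomes active within a few seconds
[centos@hadoop1 /home/centos]$ hadoop-daemon.sh stop namenode
[centos@hadoop1 /home/centos]$ hdfs haadmin -getServiceState nn2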
Configuring YARN HA is relatively simple; add the following to yarn-site.xml:
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop2</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop1:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop2:8088</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop3:2181,hadoop4:2181,hadoop5:2181</value>
</property>
[hadoop1]
[centos@hadoop1 /home/centos]$ start-yarn.sh
[hadoop2]
[centos@hadoop2 /home/centos]$ yarn-daemon.sh start resourcemanager
//administration commands
//check the state of an RM
$>yarn rmadmin -getServiceState rm1
//switch an RM to standby
$>yarn rmadmin -transitionToStandby rm1
Web UIs:
hadoop1:8088
hadoop2:8088
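To confirm that the YARN HA setup actually accepts jobs, submit the bundled example job (the jar path below is the one shipped inside hadoop-2.7.6; adjust it if your layout differs):
[centos@hadoop1 /home/centos]$ hadoop jar /soft/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar pi 2 10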
Five nodes: hadoop3, hadoop4, hadoop5, hadoop6, hadoop7. hadoop3 is the master node, the others are slave nodes.
HBase is a master/slave cluster, so hadoop3 must be able to log in to the other four hosts without a password; set up passwordless SSH.
// generate the key pair
[centos@hadoop3 /home/centos]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
// rename the public key
[centos@hadoop3 /home/centos/.ssh]$ mv id_rsa.pub id_rsa_hadoop3.pub
// append the public key to authorized_keys
[centos@hadoop3 /home/centos/.ssh]$ cat id_rsa_hadoop3.pub >> authorized_keys
[centos@hadoop3 /home/centos/.ssh]$scp id_rsa_hadoop3.pub centos@hadoop4:~/.ssh/
[centos@hadoop3 /home/centos/.ssh]$scp id_rsa_hadoop3.pub centos@hadoop5:~/.ssh/
[centos@hadoop3 /home/centos/.ssh]$scp id_rsa_hadoop3.pub centos@hadoop6:~/.ssh/
[centos@hadoop3 /home/centos/.ssh]$scp id_rsa_hadoop3.pub centos@hadoop7:~/.ssh/
// on each of hadoop4 through hadoop7, append the public key to authorized_keys
$> cat id_rsa_hadoop3.pub >> authorized_keys
// extract
[centos@hadoop3 /home/centos/localsoft]$ tar -xzvf hbase-1.2.6-bin.tar.gz -C /soft/
// create a symbolic link
[centos@hadoop3 /soft]$ln -s /soft/hbase-1.2.6 hbase
[centos@hadoop3 /home/centos]$ sudo nano /etc/profile
Environment variables:
export HBASE_HOME=/soft/hbase
export PATH=$PATH:$HBASE_HOME/bin
// hbase version shows:
HBase 1.2.6
Source code repository file:///home/busbey/projects/hbase/hbase-assembly/target/hbase-1.2.6 revision=Unknown
Compiled by busbey on Mon May 29 02:25:32 CDT 2017
From source with checksum 7e8ce83a648e252758e9dae1fbe779c9
// edit hbase-env.sh
[centos@hadoop3 /soft/hbase/conf]$ nano hbase-env.sh
// find and change the following settings
export JAVA_HOME=/soft/jdk
export HBASE_MANAGES_ZK=false
export HBASE_PID_DIR=/home/centos/hbase/pids
export HBASE_CLASSPATH=$HBASE_CLASSPATH:/soft/hadoop/etc/hadoop
// configure regionservers (the slave hosts)
[centos@hadoop3 /soft/hbase/conf]$ nano regionservers
hadoop4
hadoop5
hadoop6
hadoop7
[centos@hadoop3 /soft/hbase/conf]$nano hbase-site.xml
[hbase-site.xml]
<!-- fully distributed mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- where HBase stores its data on HDFS -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://mycluster/hbase</value>
</property>
<!-- ZooKeeper quorum -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop3:2181,hadoop4:2181,hadoop5:2181</value>
</property>
<!-- local ZooKeeper data directory -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/centos/zookeeper</value>
</property>
// link Hadoop's hdfs-site.xml into HBase's conf directory so that hdfs://mycluster resolves
[centos@hadoop3 /soft/hbase/conf]$ ln -s /soft/hadoop/etc/hadoop/hdfs-site.xml /soft/hbase/conf/hdfs-site.xml
// distribute the extracted files
[centos@hadoop3 /soft]$ scp -r hbase-1.2.6 centos@hadoop4:/soft/
[centos@hadoop3 /soft]$ scp -r hbase-1.2.6 centos@hadoop5:/soft/
[centos@hadoop3 /soft]$ scp -r hbase-1.2.6 centos@hadoop6:/soft/
[centos@hadoop3 /soft]$ scp -r hbase-1.2.6 centos@hadoop7:/soft/
// distribute the symbolic link
[centos@hadoop3 /soft]$ rsync -lr hbase centos@hadoop4:/soft/
[centos@hadoop3 /soft]$ rsync -lr hbase centos@hadoop5:/soft/
[centos@hadoop3 /soft]$ rsync -lr hbase centos@hadoop6:/soft/
[centos@hadoop3 /soft]$ rsync -lr hbase centos@hadoop7:/soft/
// add the environment variables to /etc/profile on these 4 hosts and run source /etc/profile
export HBASE_HOME=/soft/hbase
export PATH=$PATH:$HBASE_HOME/bin
Note: before starting the HBase cluster, make sure the Hadoop cluster is running and one NameNode is active; otherwise you will get the exception: Operation category READ is not supported in state standby
[centos@hadoop3 /home/centos]$ start-hbase.sh
Basic HBase commands
//start the HBase cluster:
$> start-hbase.sh
//start a single HMaster process:
$> hbase-daemon.sh start master
//stop a single HMaster process:
$> hbase-daemon.sh stop master
//start a single HRegionServer process:
$> hbase-daemon.sh start regionserver
//stop a single HRegionServer process:
$> hbase-daemon.sh stop regionserver
//enter the hbase shell
$> hbase shell
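A minimal smoke test inside the HBase shell (the table name t1 and column family f1 below are arbitrary examples):
$hbase>create 't1','f1' //create table t1 with column family f1
$hbase>put 't1','row1','f1:name','tom' //write one cell
$hbase>scan 't1' //read it back
$hbase>disable 't1'
$hbase>drop 't1' //clean up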
Build the Kafka cluster on hadoop6, hadoop7, and hadoop8.
// extract
[centos@hadoop6 /home/centos/localsoft]$tar -xzvf kafka_2.11-2.0.0.tgz -C /soft/
// create a symbolic link
[centos@hadoop6 /soft]$ln -s /soft/kafka_2.11-2.0.0 kafka
[centos@hadoop6 /soft]$sudo nano /etc/profile
Environment variables:
export KAFKA_HOME=/soft/kafka
export PATH=$PATH:$KAFKA_HOME/bin
[centos@hadoop6 /soft/kafka/config]$nano server.properties
# The id of the broker. This must be set to a unique integer for each broker.
# broker.id is 6, 7, and 8 on hadoop6, hadoop7, and hadoop8 respectively
broker.id=6
listeners=PLAINTEXT://:9092
# A comma separated list of directories under which to store log files
log.dirs=/home/centos/kafka/logs
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=hadoop3:2181,hadoop4:2181,hadoop5:2181
// copy the extracted files
[centos@hadoop6 /soft]$scp -r kafka_2.11-2.0.0 centos@hadoop7:/soft/
[centos@hadoop6 /soft]$scp -r kafka_2.11-2.0.0 centos@hadoop8:/soft/
// distribute the symbolic link
[centos@hadoop6 /soft]$rsync -lr kafka centos@hadoop7:/soft/
[centos@hadoop6 /soft]$rsync -lr kafka centos@hadoop8:/soft/
Environment variables (on hadoop7 and hadoop8 as well):
export KAFKA_HOME=/soft/kafka
export PATH=$PATH:$KAFKA_HOME/bin
Change broker.id on each host (7 on hadoop7, 8 on hadoop8).
Start the Kafka servers:
Start ZooKeeper first on hadoop3, hadoop4, and hadoop5: zkServer.sh start
Then start the brokers on hadoop6, hadoop7, and hadoop8:
// run the broker in the background (repeat on hadoop7 and hadoop8)
[centos@hadoop6 /home/centos]$ kafka-server-start.sh /soft/kafka/config/server.properties &
// verify that the broker is listening on port 9092
[centos@hadoop6 /home/centos]$ netstat -anop | grep 9092
// list topics
[centos@hadoop6 /home/centos]$ kafka-topics.sh --list --zookeeper hadoop3:2181
// create a topic named first
[centos@hadoop6 /home/centos]$ kafka-topics.sh --create --zookeeper hadoop3:2181 --replication-factor 3 --partitions 3 --topic first
// delete a topic
[centos@hadoop6 /home/centos]$ kafka-topics.sh --delete --zookeeper hadoop3:2181 --topic first
Parameter notes:
--zookeeper: the ZooKeeper connection string
--topic: the topic name
--replication-factor: the number of replicas
--partitions: the number of partitions
Note: the replication factor cannot be larger than the number of brokers (the number of Kafka hosts).
Produce messages:
[centos@hadoop6 /home/centos]$ kafka-console-producer.sh --broker-list hadoop6:9092 --topic first
Consume messages:
[centos@hadoop6 /home/centos]$ kafka-console-consumer.sh --bootstrap-server hadoop6:9092 --from-beginning --topic first
--from-beginning: consume from the beginning, i.e. every message that has been produced on the topic.
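To see how the partitions and replicas of a topic are spread over the three brokers, describe it:
[centos@hadoop6 /home/centos]$ kafka-topics.sh --describe --zookeeper hadoop3:2181 --topic first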
Build the Storm cluster on hadoop3, hadoop4, hadoop5, hadoop6, and hadoop7.
// extract
[centos@hadoop3 /home/centos/localsoft]$ tar -xzvf apache-storm-1.1.3.tar.gz -C /soft/
// create a symbolic link
[centos@hadoop3 /soft]$ ln -s /soft/apache-storm-1.1.3 storm
[centos@hadoop3 /soft]$ sudo nano /etc/profile
Environment variables:
export STORM_HOME=/soft/storm
export PATH=$PATH:$STORM_HOME/bin
[centos@hadoop3 /soft/storm/conf]$ nano storm.yaml
storm.zookeeper.servers:
- "hadoop3"
- "hadoop4"
- "hadoop5"
nimbus.seeds: ["hadoop3"]
storm.local.dir: "/home/centos/storm"
storm.zookeeper.port: 2181
ui.host: 0.0.0.0
ui.port: 8080
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
// distribute the extracted files
[centos@hadoop3 /soft]$scp -r apache-storm-1.1.3 centos@hadoop4:/soft/
[centos@hadoop3 /soft]$scp -r apache-storm-1.1.3 centos@hadoop5:/soft/
[centos@hadoop3 /soft]$scp -r apache-storm-1.1.3 centos@hadoop6:/soft/
[centos@hadoop3 /soft]$scp -r apache-storm-1.1.3 centos@hadoop7:/soft/
// distribute the symbolic link
[centos@hadoop3 /soft]$rsync -lr storm centos@hadoop4:/soft/
[centos@hadoop3 /soft]$rsync -lr storm centos@hadoop5:/soft/
[centos@hadoop3 /soft]$rsync -lr storm centos@hadoop6:/soft/
[centos@hadoop3 /soft]$rsync -lr storm centos@hadoop7:/soft/
Environment variables (on hadoop4 through hadoop7 as well):
export STORM_HOME=/soft/storm
export PATH=$PATH:$STORM_HOME/bin
Start the cluster
Start the ZooKeeper cluster first: zkServer.sh start (on hadoop3, hadoop4, hadoop5)
Start the nimbus process on hadoop3:
[centos@hadoop3 /home/centos]$ storm nimbus &
// start a supervisor on hadoop4 through hadoop7 as well
[centos@hadoop3 /home/centos]$ storm supervisor &
[centos@hadoop3 /home/centos]$ storm ui &
Check the web UI at hadoop3:8080
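To verify the Storm cluster end to end, the storm-starter examples shipped with the binary distribution can be submitted; treat the jar path below as an assumption and adjust it to your unpacked layout:
// submit the WordCount example topology under the name wc
[centos@hadoop3 /home/centos]$ storm jar /soft/storm/examples/storm-starter/storm-starter-topologies-1.1.3.jar org.apache.storm.starter.WordCountTopology wc
// list running topologies, then kill the test topology again
[centos@hadoop3 /home/centos]$ storm list
[centos@hadoop3 /home/centos]$ storm kill wc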
Set up the Spark cluster on all 8 nodes: hadoop1 is the Master and hadoop2 through hadoop8 are Workers.
// extract
[centos@hadoop1 /home/centos/localsoft]$tar -xzvf spark-2.3.1-bin-hadoop2.7.tgz -C /soft/
// create a symbolic link
[centos@hadoop1 /soft]$ln -s /soft/spark-2.3.1-bin-hadoop2.7 spark
[centos@hadoop1 /home/centos]$sudo nano /etc/profile
# spark
export SPARK_HOME=/soft/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
// list the other 7 hosts (the workers) in the slaves file
hadoop2
hadoop3
hadoop4
hadoop5
hadoop6
hadoop7
hadoop8
// in spark-env.sh set
SPARK_MASTER_HOST=hadoop1
SPARK_MASTER_PORT=7077
// also add the following (history server) to spark-env.sh
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=4000 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://hadoop1:8020/directory"
// in spark-defaults.conf set
spark.master spark://hadoop1:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop1:8020/directory
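The event-log directory has to exist on HDFS before Spark applications are started, otherwise they fail on startup; create it first (this assumes the HDFS cluster is already running):
[centos@hadoop1 /home/centos]$ hdfs dfs -mkdir -p /directory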
// distribute the extracted files
[centos@hadoop1 /soft]$xsync.sh spark-2.3.1-bin-hadoop2.7
// distribute the symbolic link
[centos@hadoop1 /soft]$rsync -lr spark centos@hadoop2:/soft/
[centos@hadoop1 /soft]$rsync -lr spark centos@hadoop3:/soft/
[centos@hadoop1 /soft]$rsync -lr spark centos@hadoop4:/soft/
[centos@hadoop1 /soft]$rsync -lr spark centos@hadoop5:/soft/
[centos@hadoop1 /soft]$rsync -lr spark centos@hadoop6:/soft/
[centos@hadoop1 /soft]$rsync -lr spark centos@hadoop7:/soft/
[centos@hadoop1 /soft]$rsync -lr spark centos@hadoop8:/soft/
Environment variables (on every node):
export SPARK_HOME=/soft/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
[centos@hadoop1 /soft/spark/sbin]$./start-all.sh
// start the spark shell (spark-shell lives in /soft/spark/bin, which is on the PATH)
[centos@hadoop1 /home/centos]$ spark-shell
// run a word count
scala> sc.textFile("hdfs://hadoop1:8020/wc.txt").flatMap(_.split(" "))
.map((_,1)).reduceByKey(_+_).saveAsTextFile("hdfs://hadoop1:8020/out")
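After the job finishes, the result can be checked directly on HDFS (wc.txt must exist beforehand and /out must not):
[centos@hadoop1 /home/centos]$ hdfs dfs -cat /out/part-*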
Stop the Spark cluster: ./stop-all.sh (in /soft/spark/sbin)
For Spark HA, edit spark-env.sh
// add the following (when ZooKeeper recovery is used, SPARK_MASTER_HOST is usually commented out)
#ZK HA
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop3:2181,hadoop4:2181,hadoop5:2181 -Dspark.deploy.zookeeper.dir=/spark"
MASTER=spark://hadoop1:7077,hadoop2:7077 bin/spark-shell
[centos@hadoop1 /soft/spark/sbin]$ ./start-all.sh
[centos@hadoop2 /soft/spark/sbin]$./start-master.sh
// all master nodes have to be listed
[centos@hadoop1 /home/centos]$spark-shell --master spark://hadoop1:7077,hadoop2:7077
For common exceptions see: https://www.cnblogs.com/arachis/p/Spark_Exception.html
All done!
MySQL, Hive, Flume …
There are generally two ways to install MySQL on Linux. The first is via yum (lightweight, small download, but it pulls in many dependencies and fails easily). The second is to extract the MySQL tarball (a larger download, but less error-prone). This guide uses the second approach.
Download mysql-5.6.41-linux-glibc2.12-x86_64.tar.gz and put it in the shared directory (/mnt/hgfs/bigdata/soft/…).
Switch to root and remove the bundled MariaDB:
$> rpm -qa|grep mariadb // list the installed mariadb packages
$> rpm -e --nodeps <package-name> // remove mariadb; <package-name> is what the previous command printed
$> rm /etc/my.cnf
//create the mysql group
$> groupadd mysql
//create a user named mysql and add it to the mysql group
$> useradd -g mysql mysql
$> cp /mnt/hgfs/bigdata/soft/mysql-5.6.41-linux-glibc2.12-x86_64.tar.gz /usr/local
//extract the archive
$> tar -zxvf mysql-5.6.41-linux-glibc2.12-x86_64.tar.gz
$> mv mysql-5.6.41-linux-glibc2.12-x86_64 mysql //rename the extracted directory
[root@s201 /usr/local/mysql]# cp my-default.cnf /etc/my.cnf
[root@s201 /usr/local/mysql]# nano /etc/my.cnf
[mysql]
# default character set for the mysql client
default-character-set=utf8
socket=/var/lib/mysql/mysql.sock
[mysqld]
skip-name-resolve
# port
port = 3306
socket=/var/lib/mysql/mysql.sock
# mysql installation directory
basedir=/usr/local/mysql
# data directory
datadir=/usr/local/mysql/data
# maximum number of connections
max_connections=200
# server character set (the default would be the 8-bit latin1)
character-set-server=utf8
# default storage engine for new tables
default-storage-engine=INNODB
lower_case_table_names=1
max_allowed_packet=16M
# run the server as the mysql user
user=mysql
$> yum install -y perl
$> yum install -y perl-Module-Install.noarch
On CentOS, also edit /etc/selinux/config, change SELINUX=enforcing to SELINUX=disabled, save, and reboot the machine.
Go into the MySQL installation directory and initialize the database:
[root@hadoop1 ~]# cd /usr/local/mysql
[root@hadoop1 /usr/local/mysql]# chown -R mysql:mysql ./ // make the mysql user the owner of the current directory
[root@hadoop1 /usr/local/mysql]# ./scripts/mysql_install_db --user=mysql --basedir=/usr/local/mysql/ --datadir=/usr/local/mysql/data/
[root@hadoop1 /usr/local/mysql]# chown -R mysql:mysql data
[root@hadoop1 /usr/local/mysql]# chmod 644 /etc/my.cnf // make the config file readable (MySQL ignores world-writable config files)
[root@hadoop1 /usr/local/mysql]# cp ./support-files/mysql.server /etc/rc.d/init.d/mysqld
[root@hadoop1 /usr/local/mysql]# chmod +x /etc/rc.d/init.d/mysqld
[root@hadoop1 /usr/local/mysql]# chkconfig --add mysqld
[root@hadoop1 /usr/local/mysql]# chkconfig --list mysqld
[root@hadoop1 /usr/local/mysql]# mkdir /var/lib/mysql
//set permissions
[root@hadoop1 /usr/local/mysql]# chmod 777 /var/lib/mysql
[root@hadoop1 /usr/local/mysql]# nano ~/.bash_profile
//append the following so the MySQL binaries are on the PATH
export PATH=$PATH:/usr/local/mysql/bin
//apply the change immediately:
[root@hadoop1 /usr/local/mysql]# source ~/.bash_profile
//start the mysql service
[root@hadoop1 /usr/local/mysql]# service mysqld start
//stop the mysql service
[root@hadoop1 /usr/local/mysql]# service mysqld stop
[root@hadoop1 /usr/local/mysql]# mysql -u root -p
Note: mysql -u root -p may fail with: -bash: mysql: command not found. That happens because there is no mysql command in the default command directory /usr/bin; create a symbolic link there:
[root@hadoop1 /usr/local/mysql]# ln -s /usr/local/mysql/bin/mysql /usr/bin
mysql>use mysql
mysql>update user set password=password('root') where user='root' and host='localhost';
mysql>flush privileges;
List users:
select Host,User,Password from mysql.user;
Create a user:
create user test identified by '123456';
Grant privileges:
grant all privileges on *.* to 'test'@'%' identified by '123456' with grant option;
//grant privileges on every database to root for remote access
grant all privileges on *.* to 'root'@'%' identified by 'root';
Explanation: the first quoted value is the user name, '%' means the user may connect from any IP, and the value after identified by is the password. Disable remote access if you do not need it.
Flush privileges:
flush privileges ;
Change a user's password:
update mysql.user set password=password('新密码') where User="test" and Host="localhost";
Delete a user:
delete from user where User='test' and Host='localhost';
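These statements can also be run non-interactively from the shell, which is handy when scripting the setup (a sketch; substitute your own root password):
[root@hadoop1 /usr/local/mysql]# mysql -u root -proot -e "select Host,User from mysql.user;"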
Reload the firewall:
firewall-cmd --reload
Stop the firewall:
systemctl stop firewalld.service
Disable the firewall on boot:
systemctl disable firewalld.service
Remove a port rule:
firewall-cmd --zone=public --remove-port=80/tcp --permanent
Turn off the firewall (iptables, CentOS 6):
1) Permanent, survives a reboot
enable: chkconfig iptables on
disable: chkconfig iptables off
2) Immediate, reverts after a reboot
start: service iptables start
stop: service iptables stop
If MySQL reports: The server quit without updating PID file, see:
1. https://www.cnblogs.com/wangshaojun/p/5065298.html
2. http://blog.51cto.com/fengyunshan911/2070818
For bash: mysql: command not found, see:
https://www.cnblogs.com/jr1260/p/6590860.html
This MySQL section closely follows https://blog.csdn.net/u013421629/article/details/79638315; many thanks to the original author for their work!
Install Hive on hadoop3, hadoop4, and hadoop5.
// extract
[centos@hadoop3 /home/centos/localsoft]$ tar -xzvf apache-hive-2.3.3-bin.tar.gz -C /soft
// create a symbolic link
[centos@hadoop3 /soft]$ ln -s /soft/apache-hive-2.3.3-bin hive
[centos@hadoop3 /home/centos]$ sudo nano /etc/profile
Environment variables:
export HIVE_HOME=/soft/hive
export PATH=$PATH:$HIVE_HOME/bin
Use MySQL to store the Hive metastore. Copy the MySQL JDBC driver into Hive's lib directory:
[centos@hadoop3 /home/centos/localsoft]$cp mysql-connector-java-5.1.44.jar /soft/hive/lib/
$> cp /soft/hive/conf/hive-default.xml.template /soft/hive/conf/hive-site.xml
//configure the following properties
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://59.68.29.79:3306/hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/home/centos/hive</value>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/home/centos/hive/downloads</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/home/centos/hive/querylog</value>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/home/centos/hive/server2_logs</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
// create the metastore database in MySQL first
mysql> create database hive;
// initialize the metastore schema
[centos@hadoop3 /soft/hive/bin]$schematool -dbType mysql -initSchema
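Hive also expects its scratch and warehouse directories on HDFS; if they do not exist yet, create them up front (these are the Hive default paths):
[centos@hadoop3 /home/centos]$ hdfs dfs -mkdir -p /tmp /user/hive/warehouse
[centos@hadoop3 /home/centos]$ hdfs dfs -chmod g+w /tmp /user/hive/warehouse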
// distribute the extracted files
[centos@hadoop3 /soft]$scp -r apache-hive-2.3.3-bin centos@hadoop4:/soft/
[centos@hadoop3 /soft]$scp -r apache-hive-2.3.3-bin centos@hadoop5:/soft/
// distribute the symbolic link
[centos@hadoop3 /soft]$rsync -lr hive centos@hadoop4:/soft/
[centos@hadoop3 /soft]$rsync -lr hive centos@hadoop5:/soft/
Environment variables (on hadoop4 and hadoop5 as well):
export HIVE_HOME=/soft/hive
export PATH=$PATH:$HIVE_HOME/bin
Basic Hive commands:
// start hive
[centos@hadoop3 /home/centos]$ hive
$hive>create database mydb2 ; //create database mydb2
$hive>show databases ;
$hive>use mydb2 ;
$hive>create table mydb2.t(id int,name string,age int);
$hive>drop table t ;
$hive>drop table mydb2.t ;
$hive>select * from mydb2.t ; //query a table in a specific database
$hive>exit ; //exit
Flume is a log-collection system; configure it on all 8 hosts.
// extract
[centos@hadoop1 /home/centos/localsoft]$tar -xzvf apache-flume-1.8.0-bin.tar.gz -C /soft/
// create a symbolic link
[centos@hadoop1 /soft]$ln -s /soft/apache-flume-1.8.0-bin flume
[centos@hadoop1 /home/centos]$ sudo nano /etc/profile
Environment variables:
export FLUME_HOME=/soft/flume
export PATH=$PATH:$FLUME_HOME/bin
// distribute the extracted files
[centos@hadoop1 /soft]$xsync.sh apache-flume-1.8.0-bin
// distribute the symbolic link
[centos@hadoop1 /soft]$rsync -lr flume centos@hadoop2:/soft/
[centos@hadoop1 /soft]$rsync -lr flume centos@hadoop3:/soft/
[centos@hadoop1 /soft]$rsync -lr flume centos@hadoop4:/soft/
[centos@hadoop1 /soft]$rsync -lr flume centos@hadoop5:/soft/
[centos@hadoop1 /soft]$rsync -lr flume centos@hadoop6:/soft/
[centos@hadoop1 /soft]$rsync -lr flume centos@hadoop7:/soft/
[centos@hadoop1 /soft]$rsync -lr flume centos@hadoop8:/soft/
Environment variables (add on the other 7 hosts as well):
export FLUME_HOME=/soft/flume
export PATH=$PATH:$FLUME_HOME/bin
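The steps above only install Flume. As a quick functional check, a minimal netcat-to-logger agent can be defined and started; the file name example.conf and the agent name a1 are arbitrary:
# ~/example.conf: one netcat source, one memory channel, one logger sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
// start the agent; in another terminal, send test lines with: nc localhost 44444
[centos@hadoop1 /home/centos]$ flume-ng agent --conf /soft/flume/conf --conf-file ~/example.conf --name a1 -Dflume.root.logger=INFO,console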