Big Data Cluster Environment Setup

Table of Contents

    • Software Packages
    • Hostnames and IP Mappings
    • Linux System Environment Settings
    • Installing JDK 8
    • Installing ZooKeeper
    • Installing Hadoop
      • Hadoop HDFS High-Availability Cluster (NameNode HA with QJM)
      • Hadoop YARN High-Availability Cluster (ResourceManager HA)
    • Installing Hive
    • Installing HBase

Software Packages

linux CentOS-6.5-x86_64-minimal.iso
jdk jdk-8u192-linux-x64.tar.gz
hadoop hadoop-2.6.0-cdh5.16.1.tar.gz
zookeeper zookeeper-3.4.5-cdh5.16.1.tar.gz
hive hive-1.1.0-cdh5.16.1.tar.gz
hbase hbase-1.2.0-cdh5.16.1.tar.gz
sqoop sqoop-1.4.6-cdh5.16.1.tar.gz
flume flume-ng-1.6.0-cdh5.16.1.tar.gz
oozie oozie-4.1.0-cdh5.16.1.tar.gz
hue hue-3.9.0-cdh5.16.1.tar.gz

Hostnames and IP Mappings

hostname ip
master 192.168.93.7
slaver1 192.168.93.8
slaver2 192.168.93.9

Linux System Environment Settings

  • Stop and disable the firewall (do not start it on boot)
# service iptables status
# service iptables stop
# chkconfig iptables off
  • Disable SELinux
# vi /etc/sysconfig/selinux
Change:	SELINUX=disabled
  • Set the open-file limit and the maximum number of user processes (on every node)
#  vi /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535
* soft nproc 32000
* hard nproc 32000
  • Configure the Aliyun yum repository
# Install wget, the most common download tool on Linux (skip if it is already installed)
yum -y install wget

# Back up the current yum repo configuration
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup

# Download the Aliyun yum repo configuration
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-6.repo

# Clear the old cache and rebuild it for the newly added repo files
yum clean all
yum makecache
  • Enable automatic time synchronization
Set the local time zone and the NTP service:
yum -y install ntp ntpdate
rm -rf /etc/localtime
ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
/usr/sbin/ntpdate -u pool.ntp.org
Or synchronize against Aliyun's NTP server:
ntpdate ntp1.aliyun.com

Automatic synchronization:
# Add a cron job that syncs every 10 minutes; you can edit /etc/crontab,
# but it is simpler to add the entry with crontab -e.
*/10 * * * *  /usr/sbin/ntpdate -u pool.ntp.org >/dev/null 2>&1
# Restart the cron service
service crond restart
# Check the date
date
  • Clear the node's MAC address record
Delete /etc/udev/rules.d/70-persistent-net.rules.
This file records the machine's MAC address; the VM generates it automatically on first boot.
Since we are about to clone this VM, the file must be removed. Otherwise every clone keeps
the same MAC address, which causes conflicts and leaves the cloned machines without usable IPs.
# cd /etc/udev/rules.d
# rm -rf 70-persistent-net.rules 
  • Take a snapshot

  • Clone the VM nodes

  • Configure the IP address, hostname, and host mappings on each cloned node

   # vi /etc/sysconfig/network-scripts/ifcfg-eth0
   Change the following entries:
   	->ONBOOT=yes
   	->BOOTPROTO=static
   	->IPADDR=192.168.93.7
   	->NETMASK=255.255.255.0
   	->GATEWAY=192.168.93.2
   	->DNS1=114.114.114.114
   	->DNS2=8.8.8.8

   Set the hostname permanently: # vi /etc/sysconfig/network
   NETWORKING=yes
   HOSTNAME=master
   
   Configure the IP mappings: # vi /etc/hosts
   127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
   ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

   192.168.93.7	master
   192.168.93.8	slaver1
   192.168.93.9	slaver2
   
   Reboot the VM: # init 6
  • Host mappings on Windows
C:\Windows\System32\drivers\etc\hosts:
   192.168.93.7	master
   192.168.93.8	slaver1
   192.168.93.9	slaver2

  • Passwordless SSH login
Run on all 3 machines:
   # ssh-keygen -t rsa 
Press Enter three times; two new files are generated under ~/.ssh:
	id_rsa.pub (public key) and id_rsa (private key)
Then, also on all 3 machines, run:
   # ssh-copy-id master;ssh-copy-id slaver1;ssh-copy-id slaver2

Test passwordless login (a quick check across all nodes is sketched below):
   [root@master~]# ssh slaver1
   [root@slaver1~]# 
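A minimal convenience check, run on each node, that passwordless login works to every host; it only assumes the three hostnames above resolve correctly:

for h in master slaver1 slaver2; do
    # BatchMode makes ssh fail instead of prompting for a password
    ssh -o BatchMode=yes "$h" "hostname" || echo "passwordless login to $h failed"
done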

Create storage directories

Create 5 directories under /opt (used to store the cluster's software and data):

# cd /opt
# mkdir -p datas modules softwares tools test
# ls
datas  modules  softwares  test  tools
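The later scp steps assume the same directory layout exists on slaver1 and slaver2. A minimal sketch, assuming passwordless SSH is already configured:

for h in slaver1 slaver2; do
    # create the same five directories under /opt on the other nodes
    ssh "$h" "mkdir -p /opt/datas /opt/modules /opt/softwares /opt/tools /opt/test"
done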

Installing JDK 8

# mkdir -p /opt/tools/java
# tar -zxvf jdk-8u192-linux-x64.tar.gz -C /opt/tools/java
# mv /opt/tools/java/jdk1.8.0_192 /opt/tools/java/jdk1.8

Add the JDK to the environment variables

# vi + /etc/profile
Append the following at the end:
# JAVA_HOME
export JAVA_HOME=/opt/tools/java/jdk1.8
export PATH=$PATH:$JAVA_HOME/bin

Apply the change
# source /etc/profile

Distribute the JDK to the other nodes (run from /opt/tools)

# scp -r java slaver1:`pwd`
# scp -r java slaver2:`pwd`

Configure the environment variables on those nodes as well (a quick check is sketched below)
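A minimal sketch for verifying the JDK on every node; it assumes passwordless SSH is set up and that /etc/profile has been updated on each node:

for h in master slaver1 slaver2; do
    echo "== $h =="
    # load the profile in the remote shell, then print the JDK version
    ssh "$h" "source /etc/profile && java -version"
done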

Installing ZooKeeper

  • On the master node
1. Unpack
	# tar -zxvf zookeeper-3.4.5-cdh5.16.1.tar.gz -C /opt/modules/cdh/
	# mv zookeeper-3.4.5-cdh5.16.1 zookeeper-3.4.5
	
2. Edit the ZooKeeper configuration
	# cd /opt/modules/cdh/zookeeper-3.4.5/conf/
	# cp zoo_sample.cfg zoo.cfg
	# vi + zoo.cfg
	Change:
		dataDir=/opt/datas/zookeeper/zkData
	And append at the end:
		server.1=master:2888:3888
		server.2=slaver1:2888:3888
		server.3=slaver2:2888:3888
		
3. Under the dataDir directory (/opt/datas/zookeeper/zkData), create a file named myid (the name is mandatory) containing 1 (this 1 matches server.1 above)
	# cd /opt/datas/zookeeper/zkData/
	# vi myid
	1
  • Add ZooKeeper to the environment variables
# vi /etc/profile
#ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/opt/modules/cdh/zookeeper-3.4.5

export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin

# source /etc/profile
  • Distribute ZooKeeper to slaver1 and slaver2, then change their myid values to 2 and 3 respectively
# scp -r /opt/modules/cdh/zookeeper-3.4.5/ slaver1:`pwd`     (run from /opt/modules/cdh)
# scp -r /opt/modules/cdh/zookeeper-3.4.5/ slaver2:`pwd`
# scp -r /opt/datas/zookeeper/zkData slaver1:`pwd`           (run from /opt/datas/zookeeper)
# scp -r /opt/datas/zookeeper/zkData slaver2:`pwd`
  • Start the ZooKeeper service on all 3 nodes
# cd /opt/modules/cdh/zookeeper-3.4.5/
# zkServer.sh start   //start ZooKeeper
# zkServer.sh status  //check ZooKeeper's status
# zkCli.sh            //connect with the ZooKeeper client
  • With the ZooKeeper cluster running, one node reports leader and the other two report follower; that means the cluster was set up successfully (a start/check loop is sketched below)
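A convenience sketch for starting and checking ZooKeeper on all three nodes from master; it assumes passwordless SSH and that /etc/profile on every node puts zkServer.sh on the PATH:

for h in master slaver1 slaver2; do
    ssh "$h" "source /etc/profile && zkServer.sh start"
done
for h in master slaver1 slaver2; do
    echo "== $h =="
    # expect one node to report Mode: leader and the other two Mode: follower
    ssh "$h" "source /etc/profile && zkServer.sh status"
done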

Installing Hadoop

Hadoop HDFS High-Availability Cluster (NameNode HA with QJM)

  • Cluster plan
hostname   NN-1   NN-2   DN   zookeeper   zkfc   journalnode
master     √                  √           √      √
slaver1           √      √    √           √      √
slaver2                  √    √                  √
  • On the master node, set up Hadoop
Unpack:
	# tar -zxvf hadoop-2.6.0-cdh5.16.1.tar.gz -C /opt/modules/cdh/
	# mv hadoop-2.6.0-cdh5.16.1 hadoop-2.6.0
Delete the doc folder to save space:
	# cd /opt/modules/cdh/hadoop-2.6.0/share
	# rm -rf doc/
  • Add Hadoop to the environment variables
# vi /etc/profile
#HADOOP_HOME
export HADOOP_HOME=/opt/modules/cdh/hadoop-2.6.0

export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# source /etc/profile

Go to $HADOOP_HOME/etc/hadoop/ and edit the following files

  • hadoop-env.sh
export JAVA_HOME=/opt/tools/java/jdk1.8
  • mapred-env.sh
export JAVA_HOME=/opt/tools/java/jdk1.8
  • yarn-env.sh
export JAVA_HOME=/opt/tools/java/jdk1.8
  • core-site.xml
<configuration>

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/datas/hadoop/ha-hadoop</value>
    </property>

    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>

    <property>
        <name>ha.zookeeper.quorum</name>
        <value>master:2181,slaver1:2181,slaver2:2181</value>
    </property>
</configuration>
  • hdfs-site.xml
<configuration>
   <property>
       <name>dfs.replication</name>
       <value>3</value>
   </property>

   <property>
   	<name>dfs.permissions.enabled</name>
   	<value>false</value>
   </property>

   <property>
       <name>dfs.nameservices</name>
       <value>mycluster</value>
   </property>

   <property>
   	<name>dfs.ha.namenodes.mycluster</name>
   	<value>nn1,nn2</value>
   </property>

   <property>
   	<name>dfs.namenode.rpc-address.mycluster.nn1</name>
   	<value>master:8020</value>
   </property>
   <property>
   	<name>dfs.namenode.rpc-address.mycluster.nn2</name>
   	<value>slaver1:8020</value>
   </property>

   <property>
   	<name>dfs.namenode.http-address.mycluster.nn1</name>
   	<value>master:50070</value>
   </property>
   <property>
   	<name>dfs.namenode.http-address.mycluster.nn2</name>
   	<value>slaver1:50070</value>
   </property>

   <property>
   	<name>dfs.namenode.shared.edits.dir</name>
   	<value>qjournal://master:8485;slaver1:8485;slaver2:8485/mycluster</value>
   </property>

   <property>
   	<name>dfs.journalnode.edits.dir</name>
   	<value>/opt/datas/hadoop/ha-hadoop/journaldata</value>
   </property>

   <property>
   	<name>dfs.ha.automatic-failover.enabled</name>
   	<value>true</value>
   </property>

   <property>
   	<name>dfs.client.failover.proxy.provider.mycluster</name>
   	<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
   </property>

   <property>
   	<name>dfs.ha.fencing.methods</name>
   	<value>sshfence</value>
   </property>

   <property>
   	<name>dfs.ha.fencing.ssh.private-key-files</name>
   	<value>/root/.ssh/id_rsa</value>
   </property>

   <property>
   	<name>dfs.ha.fencing.ssh.connect-timeout</name>
   	<value>30000</value>
   </property>
</configuration>
  • slaves
slaver1
slaver2
  • Distribute Hadoop to the same directory on slaver1 and slaver2, and configure the corresponding environment variables there
[root@master cdh]# scp -r hadoop-2.6.0/ slaver1:`pwd`
[root@master cdh]# scp -r hadoop-2.6.0/ slaver2:`pwd`
  • Start the cluster (follow these steps strictly, in order)
 1. Start ZooKeeper on all 3 nodes (one leader and two followers means it started successfully)
	# zkServer.sh start
	# zkServer.sh status	
 2. Start the JournalNodes (run on master, slaver1, and slaver2)
	# cd /opt/modules/cdh/hadoop-2.6.0
	# sbin/hadoop-daemon.sh start journalnode
	# jps
   	1360 JournalNode
   	1399 Jps
   	1257 QuorumPeerMain
   A JournalNode process in the jps output means the journalnode started successfully.
   
 3. Format the NameNode (format only one node and sync the other; formatting both is a mistake! Here: on the master node)
   	# bin/hdfs namenode -format
   If a line like the following appears near the end of the output (about 4 lines from the bottom), formatting succeeded:
   INFO common.Storage: Storage directory /opt/datas/hadoop/ha-hadoop/dfs/name has been successfully formatted.

   Start the NameNode
   # sbin/hadoop-daemon.sh start namenode
   # jps
   1540 NameNode
   1606 Jps
   1255 QuorumPeerMain
   1358 JournalNode
   
[root@master hadoop-2.6.0]# cat /opt/datas/hadoop/ha-hadoop/dfs/name/current/VERSION
#Fri Apr 26 18:11:14 CST 2019
namespaceID=382254375
clusterID=CID-acc0fe63-c016-421b-b10e-5d5ff2344a1d
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1527463736-192.168.93.7-1556273474794
layoutVersion=-60

   
 4. Sync the master's metadata to slaver1 (the NameNode on master must already be running)
 	On the slaver1 node, run:
	# bin/hdfs namenode -bootstrapStandby
	If the output contains: INFO common.Storage: Storage directory /opt/datas/hadoop/ha-hadoop/dfs/name has been successfully formatted. the sync succeeded
	
[root@slaver1 hadoop-2.6.0]# cat /opt/datas/hadoop/ha-hadoop/dfs/name/current/VERSION 
#Fri Apr 26 18:23:49 CST 2019
namespaceID=382254375
clusterID=CID-acc0fe63-c016-421b-b10e-5d5ff2344a1d
cTime=0
storageType=NAME_NODE
blockpoolID=BP-1527463736-192.168.93.7-1556273474794
layoutVersion=-60

If master and slaver1 show the same information, the NameNode metadata sync succeeded
   
 5. Format ZKFC for automatic failover (run once, on master)
 	# bin/hdfs zkfc -formatZK
   If the output shows (around the 4th line from the bottom): INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK. the ZKFC formatting succeeded
   The automatic-failover setup essentially creates a hadoop-ha znode in ZooKeeper
   # bin/zkCli.sh
   [zk: localhost:2181(CONNECTED) 0] ls /
   [zookeeper, hadoop-ha]
   [zk: localhost:2181(CONNECTED) 1] ls /zookeeper
   [quota]
   [zk: localhost:2181(CONNECTED) 2] ls /zookeeper/quota
   []
   [zk: localhost:2181(CONNECTED) 3] ls /hadoop-ha
   [mycluster]
   [zk: localhost:2181(CONNECTED) 4] ls /hadoop-ha/mycluster
   [ActiveBreadCrumb, ActiveStandbyElectorLock]  
   [zk: localhost:2181(CONNECTED) 5] ls /hadoop-ha/mycluster/ActiveBreadCrumb 
   []
   [zk: localhost:2181(CONNECTED) 6] ls /hadoop-ha/mycluster/ActiveStandbyElectorLock
   []
   
 6. Start HDFS (on master)
    # sbin/start-dfs.sh
    
 7. Start the NameNode on slaver1
 	# sbin/hadoop-daemon.sh start namenode

 8. Open master:50070 and slaver1:50070 in a browser; one NameNode should be active and the other standby (a quick failover check is sketched below)
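The NameNode states can also be checked from the command line, and failover can be exercised by stopping the active NameNode. A minimal sketch for testing only, assuming the commands are run from $HADOOP_HOME on the master node:

# bin/hdfs haadmin -getServiceState nn1    # one of the two should report active
# bin/hdfs haadmin -getServiceState nn2    # the other should report standby
# On whichever node is active, kill its NameNode and confirm the other takes over:
# kill -9 `jps | grep ' NameNode' | awk '{print $1}'`
# bin/hdfs haadmin -getServiceState nn2
# Restart the killed NameNode afterwards:
# sbin/hadoop-daemon.sh start namenode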

Hadoop YARN High-Availability Cluster (ResourceManager HA)

  • Cluster plan
hostname   NN-1   NN-2   DN   zookeeper   zkfc   journalnode   RM   NM
master     √                  √           √      √
slaver1           √      √    √           √      √             √    √
slaver2                  √    √                  √             √    √
  • mapred-site.xml
<configuration>

   <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
   </property>
   <property>
   	<name>mapreduce.jobhistory.address</name>
   	<value>master:10020</value>
   </property>
   <property>
   	<name>mapreduce.jobhistory.webapp.address</name>
   	<value>master:19888</value>
   </property>
</configuration>
  • yarn-site.xml
<configuration>

   <property>
       <name>yarn.resourcemanager.ha.enabled</name>
       <value>true</value>
   </property>

   <property>
       <name>yarn.resourcemanager.cluster-id</name>
       <value>yrc</value>
   </property>

   <property>
       <name>yarn.resourcemanager.ha.rm-ids</name>
       <value>rm1,rm2</value>
   </property>

   <property>
       <name>yarn.resourcemanager.hostname.rm1</name>
       <value>slaver1</value>
   </property>
   <property>
       <name>yarn.resourcemanager.hostname.rm2</name>
       <value>slaver2</value>
   </property>
   <property>
       <name>yarn.resourcemanager.webapp.address.rm1</name>
       <value>slaver1:8088</value>
   </property>
   <property>
       <name>yarn.resourcemanager.webapp.address.rm2</name>
       <value>slaver2:8088</value>
   </property>

   <property>
       <name>yarn.resourcemanager.zk-address</name>
       <value>master:2181,slaver1:2181,slaver2:2181</value>
   </property>

   <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
   </property>
   <property>
       <name>yarn.log-aggregation-enable</name>
       <value>true</value>
   </property>
   <property>
       <name>yarn.log-aggregation.retain-seconds</name>
       <value>106800</value>
   </property>

   <property>
      <name>yarn.resourcemanager.recovery.enabled</name>
      <value>true</value>
   </property>

   <property>
      <name>yarn.resourcemanager.store.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
   </property>
</configuration>
  • Distribute the modified files to the same directory on slaver1 and slaver2
# scp yarn-site.xml slaver1:`pwd`
# scp mapred-site.xml slaver1:`pwd`
# scp yarn-site.xml slaver2:`pwd`
# scp mapred-site.xml slaver2:`pwd`
 
  • Start YARN
   Start ZooKeeper first: zkServer.sh start
   Then start HDFS: start-dfs.sh
   Then start YARN: start-yarn.sh
  • Check the web UI on port 8088
When slaver1's ResourceManager is Active, visiting slaver2's ResourceManager web UI redirects automatically to slaver1's page
Test the HA setup (a failover sketch follows the output below)
# yarn rmadmin -getServiceState rm1   ## check rm1's state
# yarn rmadmin -getServiceState rm2   ## check rm2's state

[root@slaver2 hadoop-2.6.0]# bin/yarn rmadmin -getServiceState rm1
19/04/26 20:07:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
active
[root@slaver2 hadoop-2.6.0]# bin/yarn rmadmin -getServiceState rm2
19/04/26 20:07:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
standby
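Note that in Hadoop 2.x, start-yarn.sh starts a ResourceManager only on the node where it is run, so the second ResourceManager usually has to be started by hand. A minimal failover sketch, assuming rm1 on slaver1 is currently active:

# Make sure the standby ResourceManager on slaver2 is running:
[root@slaver2 hadoop-2.6.0]# sbin/yarn-daemon.sh start resourcemanager
# Kill the active ResourceManager on slaver1 and watch rm2 take over:
[root@slaver1 hadoop-2.6.0]# kill -9 `jps | grep ResourceManager | awk '{print $1}'`
[root@slaver2 hadoop-2.6.0]# bin/yarn rmadmin -getServiceState rm2    # should now report active
# Bring the killed ResourceManager back; it should rejoin as standby:
[root@slaver1 hadoop-2.6.0]# sbin/yarn-daemon.sh start resourcemanager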

Installing Hive

Hive is installed in multi-user mode, with a server side and a client side

  • Cluster plan
hostname   MySQL (metastore DB)   Hive server   Hive client
master     √
slaver1                           √
slaver2                                         √

Install the Hive server on slaver1 and the Hive client on slaver2
On the slaver1 node (server side)

Unpack
	# tar -zxvf hive-1.1.0-cdh5.16.1.tar.gz -C /opt/modules/cdh/
	# mv hive-1.1.0-cdh5.16.1/ hive-1.1.0
Configure the environment variables
	# vi /etc/profile
	#HIVE_HOME
	export HIVE_HOME=/opt/modules/cdh/hive-1.1.0
	export PATH=$PATH:$HIVE_HOME/bin
	# source /etc/profile
  • Edit $HIVE_HOME/conf/hive-log4j.properties
# cp hive-log4j.properties.template hive-log4j.properties
hive.log.dir=/opt/modules/cdh/hive-1.1.0/logs
  • Edit $HIVE_HOME/conf/hive-env.sh
# cp hive-env.sh.template hive-env.sh
# vi hive-env.sh
JAVA_HOME=/opt/tools/java/jdk1.8
HADOOP_HOME=/opt/modules/cdh/hadoop-2.6.0
export HIVE_CONF_DIR=/opt/modules/cdh/hive-1.1.0/conf
  • Edit $HIVE_HOME/conf/hive-site.xml
# vi hive-site.xml



<configuration>
   <property>
   	<name>hive.metastore.warehouse.dir</name>
   	<value>/user/hive/warehouse</value>
   </property>

   <property>
   	<name>javax.jdo.option.ConnectionURL</name>
   	<value>jdbc:mysql://master:3306/metastore?createDatabaseIfNotExist=true&amp;useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF8&amp;serverTimezone=GMT</value>
   </property>

   <property>
   	<name>javax.jdo.option.ConnectionDriverName</name>
   	<value>com.mysql.cj.jdbc.Driver</value>
   </property>

   <property>
   	<name>javax.jdo.option.ConnectionUserName</name>
   	<value>root</value>
   </property>

   <property>
   	<name>javax.jdo.option.ConnectionPassword</name>
   	<value>123456</value>
   </property>

   <property>
   	<name>hive.cli.print.current.db</name>
   	<value>true</value>
   </property>

   <property>
   	<name>hive.cli.print.header</name>
   	<value>true</value>
   </property>
</configuration>
  • Copy Hive to the slaver2 (client) node and edit $HIVE_HOME/conf/hive-site.xml there
# scp -r hive-1.1.0/ slaver2:`pwd`
# vi hive-site.xml



<configuration>
   <property>
   	<name>hive.metastore.warehouse.dir</name>
   	<value>/user/hive/warehouse</value>
   </property>

   <property>
   	<name>hive.metastore.uris</name>
   	<value>thrift://slaver1:9083</value>
   </property>

   <property>
   	<name>hive.cli.print.current.db</name>
   	<value>true</value>
   </property>

   <property>
   	<name>hive.cli.print.header</name>
   	<value>true</value>
   </property>
</configuration>
  • Initialize Hive on the server node (slaver1)
Upload mysql-connector-java-8.0.15.jar to $HIVE_HOME/lib on the slaver1 node

1) Download the jar
    # wget http://central.maven.org/maven2/mysql/mysql-connector-java/8.0.15/mysql-connector-java-8.0.15.jar
	# cp mysql-connector-java-8.0.15.jar /opt/modules/cdh/hive-1.1.0/lib/

2) Initialize Hive (from Hive 2.x onward this step is mandatory)
	# schematool -dbType mysql -initSchema

3) Start Hive on slaver1 for a test (prerequisites: ZooKeeper, HDFS, and YARN are running)
    # zkServer.sh start
	# start-all.sh
	
4) Check the NameNode states (if HA is configured)
    # hdfs haadmin -getServiceState nn1
	# hdfs haadmin -getServiceState nn2

5) Start Hive
   Start the server side (on slaver1):
	# hive --service metastore &    --> the trailing & runs it in the background
   Start the client side (on slaver2):
    # hive 
hive (default)> show databases;
OK
database_name
default
Time taken: 29.177 seconds, Fetched: 1 row(s)

# Create the database mydb
hive (default)> create database mydb;
OK
Time taken: 0.682 seconds
hive (default)> use mydb;

# Create the table test
hive (mydb)> create table if not exists test (name string comment 'String Column',age int comment 'Integer Column') row format delimited fields terminated by '\t';

# Ways to insert data
   1) Plain insert
   	hive (mydb)> insert into test values ("zhangsan",18);
   2) Load multiple rows from a local path (with the local keyword; repeated loads append)
   	hive (mydb)> load data local inpath '/opt/datas/testdatas/testhive1.txt' into table test;
   Overwrite the existing data (add the overwrite keyword)
   	hive (mydb)> load data local inpath '/opt/datas/testdatas/testhive1.txt' overwrite into table test;
   3) Load multiple rows from an HDFS path (without local; upload the data to HDFS first)
   Upload the data to HDFS:
   	# bin/hdfs dfs -put /opt/datas/testdatas/testhive1.txt /user/datas/testdata
   Load the data (from HDFS):
   	hive (mydb)> load data inpath '/user/datas/testdata/testhive1.txt' into table test;
   	
# Query
hive (mydb)> select * from test;
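As a quick cross-check (assuming the default warehouse layout, where database mydb maps to a mydb.db directory under hive.metastore.warehouse.dir), the loaded files should also be visible on HDFS:

# list the files backing the test table
# hdfs dfs -ls /user/hive/warehouse/mydb.db/test
# print their contents
# hdfs dfs -cat /user/hive/warehouse/mydb.db/test/*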

Note: to use Hive from the client, the metastore service must be running on the server side; on the slaver1 node:
hive --service metastore &
Problem:
	After starting Hive, a long string of warnings appears: Establishing SSL connection without server's identity verification is not recommended… ...
	
	This comes from the MySQL driver. Fix: set the useSSL parameter in the JDBC connection to MySQL, i.e. useSSL=false.
	In general, appending &useSSL=false to the JDBC connection URL is enough:
		jdbc:mysql://c7node2:3306/hive?createDatabaseIfNotExist=true&useSSL=false
	However, in hive-site.xml (an XML file) the & character must be written as &amp;, so the configured URL becomes:
	jdbc:mysql://master:3306/metastore?createDatabaseIfNotExist=true&amp;useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF8&amp;serverTimezone=GMT


Installing HBase

  • Cluster plan
hostname   regionservers   backup-masters   zookeeper
master                                      √
slaver1    √                                √
slaver2    √               √                √
  • On the master node
Unpack:
# tar -zxvf hbase-1.2.0-cdh5.16.1.tar.gz -C /opt/modules/cdh/
# cd /opt/modules/cdh/
# mv hbase-1.2.0-cdh5.16.1/ hbase-1.2.0
# cd hbase-1.2.0/
# rm -rf doc
  • Add HBase to the environment variables
# vi /etc/profile

#HBASE_HOME
export HBASE_HOME=/opt/modules/cdh/hbase-1.2.0

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin

# source /etc/profile
  • hbase-env.sh
export JAVA_HOME=/opt/tools/java/jdk1.8
export HBASE_MANAGES_ZK=false
  • hbase-site.xml
<configuration>
   <property>
   	<name>hbase.tmp.dir</name>
   	<value>/opt/modules/cdh/hbase-1.2.0/data/tmp</value>
   </property>

   <property>
   	<name>hbase.rootdir</name>
   	<value>hdfs://mycluster/hbase</value>
   </property>

   <property>
   	<name>hbase.cluster.distributed</name>
   	<value>true</value>
   </property>

   <property>
   	<name>hbase.zookeeper.quorum</name>
   	<value>master,slaver1,slaver2</value>
   </property>
</configuration>
  • regionservers
# vi regionservers
slaver1
slaver2
  • Backup master node(s) (at least one, more is allowed; HBase is normally not started from a backup node, so here it can be started from master or slaver1)
# vi backup-masters	## create this file by hand; the name must be exactly this
slaver2
  • Copy $HADOOP_HOME/etc/hadoop/hdfs-site.xml into $HBASE_HOME/conf/
# cp /opt/modules/cdh/hadoop-2.6.0/etc/hadoop/hdfs-site.xml /opt/modules/cdh/hbase-1.2.0/conf/
  • Distribute HBase to slaver1 and slaver2 and configure the corresponding environment variables there
# scp -r hbase-1.2.0/ slaver1:`pwd`
# scp -r hbase-1.2.0/ slaver2:`pwd`
  • Start HBase (on the master node; ZooKeeper and HDFS must be started first)
# zkServer.sh start
# start-dfs.sh
# start-hbase.sh 
# jps
2913 JournalNode
1063 QuorumPeerMain
3690 NameNode
5867 HMaster
3755 DataNode
3484 NodeManager
5998 Jps 
  • Test the cluster
1. Check the web UIs
  HMaster UI:
   	master:60010  or  slaver2:60010
   	
  RegionServer UI:
   	slaver1:60030  or  slaver2:60030
   	
2. Two changes after HBase starts
   1) On HDFS
   	/hbase/data  -- the storage directory for HBase tables
   	/hbase/data/default -- the default namespace; tables created without an explicit namespace go here
   	/hbase/data/hbase  -- HBase's own system tables
   		meta table: metadata
   		namespace: namespace information
   		
   2) In ZooKeeper, a new hbase znode appears
   # bin/zkCli.sh
   [zk: localhost:2181(CONNECTED) 0] ls /
   [zookeeper, yarn-leader-election, hadoop-ha, hbase, rmstore]
   [zk: localhost:2181(CONNECTED) 1] ls /hbase
   [replication, meta-region-server, rs, splitWAL, backup-masters, table-lock, flush-table-proc, master-maintenance, region-in-transition, online-snapshot, master, switch, running, recovering-regions, draining, namespace, hbaseid, table]    
   [zk: localhost:2181(CONNECTED) 2] ls /hbase/rs
   [slaver2,60020,1556291770042, slaver1,60020,1556291817162]
3. kill -9 the HMaster process and check the slaver2:60010 page,
   then restart the master and observe again (an HBase shell smoke test is sketched below)
	# hbase-daemon.sh start master
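A minimal smoke test from the HBase shell; the table name smoke_test and the column family cf are made up for illustration:

# hbase shell
hbase(main):001:0> create 'smoke_test', 'cf'
hbase(main):002:0> put 'smoke_test', 'row1', 'cf:msg', 'hello'
hbase(main):003:0> scan 'smoke_test'
hbase(main):004:0> disable 'smoke_test'
hbase(main):005:0> drop 'smoke_test'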
