Software environment:
Linux: CentOS 6.7
Hadoop: 2.6.5
ZooKeeper: 3.4.8
Host configuration:
Three machines in total, m1, m2 and m3; the username on every host is centos
192.168.179.201: m1
192.168.179.202: m2
192.168.179.203: m3
m1: Zookeeper, Namenode, DataNode, ResourceManager, NodeManager, Master, Worker
m2: Zookeeper, Namenode, DataNode, ResourceManager, NodeManager, Worker
m3: Zookeeper, DataNode, NodeManager, Worker
Cluster setup:
I. Set up a basic Hive (note: Hive only needs to be installed on one node)
1. Download the Hive 2.1.1 release
http://www.apache.org/dyn/closer.cgi/hive/
2. Extract it (the archive name must match the 2.1.1 release downloaded above, and the target path must match the HIVE_HOME used below)
tar -zxvf apache-hive-2.1.1-bin.tar.gz -C /home/centos/soft
mv /home/centos/soft/apache-hive-2.1.1-bin /home/centos/soft/hive  ## rename so the path matches HIVE_HOME
3. Configure the environment variables
vi /etc/profile
# Hive
export HIVE_HOME=/home/centos/soft/hive
export HIVE_CONF_DIR=$HIVE_HOME/conf
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib
export PATH=$PATH:$HIVE_HOME/bin
source /etc/profile
4. Configure MySQL (note: switch to the root user)
- Remove the MySQL that ships with CentOS
rpm -qa | grep mysql
rpm -e mysql-libs-5.1.66-2.el6_3.i686 --nodeps
- Install the MySQL server
yum -y install mysql-server
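Before the initialization step below, the MySQL daemon must actually be running. A minimal sketch, assuming the stock CentOS 6 mysql-server package and its SysV service name:

```shell
# Start the MySQL daemon now, and have it start again after a reboot
service mysqld start
chkconfig mysqld on
```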
- Initialize MySQL
(1) Change the MySQL password (run with root privileges)
cd /usr/bin
./mysql_secure_installation
(2) Enter the current password of the MySQL root user; initially root has no password, so just press Enter
Enter current password for root (enter for none):
(3) Set the password for the MySQL root user (it must match the Hive configuration below, which uses 123)
Set root password? [Y/n] Y
New password:
Re-enter new password:
Password updated successfully!
Reloading privilege tables..
... Success!
(4) Remove anonymous users
Remove anonymous users? [Y/n] Y
... Success!
(5) When asked whether to disallow remote root login, choose N (remote access is needed later)
Disallow root login remotely? [Y/n] N
... Success!
(6) Remove the test database
Remove test database and access to it? [Y/n] Y
- Dropping test database...
... Success!
- Removing privileges on test database...
... Success!
(7) Reload the privilege tables
Reload privilege tables now? [Y/n] Y
... Success!
(8) Done
All done! If you've completed all of the above steps, your MySQL
installation should now be secure.
Thanks for using MySQL!
(9) Log in to MySQL and grant remote access (the password here must match javax.jdo.option.ConnectionPassword in hive-site.xml below)
mysql -uroot -p
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123' WITH GRANT OPTION;
FLUSH PRIVILEGES;
exit;
MySQL configuration is now complete.
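To confirm the grant took effect, you can log back in with the new password and list the user table. A quick sanity check, assuming the password 123 used in this guide:

```shell
# Log in with the new password and verify the wildcard-host ('%') entry
# created by the GRANT statement exists for root
mysql -uroot -p123 -e "SELECT user, host FROM mysql.user;"
```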
5. Configure Hive
1. Copy the hive-env.sh.template file to hive-env.sh and edit hive-env.sh:
JAVA_HOME=/home/centos/soft/jdk
HADOOP_HOME=/home/centos/soft/hadoop
HIVE_HOME=/home/centos/soft/hive
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HIVE_AUX_JARS_PATH=$SPARK_HOME/lib/spark-assembly-1.6.0-hadoop2.6.0.jar
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$HADOOP_HOME/lib:$HIVE_HOME/lib
export HADOOP_OPTS="-Dorg.xerial.snappy.tempdir=/tmp -Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib $HADOOP_OPTS"
2. Configure hive-site.xml: copy the hive-default.xml.template file to hive-site.xml, then edit hive-site.xml, deleting all of the generated properties so that only the empty <configuration></configuration> root element remains
Configuration reference:
hive.server2.thrift.port ## TCP listen port; default 10000
hive.server2.thrift.bind.host ## host the TCP service binds to; default localhost
hive.server2.thrift.min.worker.threads ## minimum number of worker threads; default 5
hive.server2.thrift.max.worker.threads ## maximum number of worker threads; default 500
hive.server2.transport.mode ## default binary (TCP); optional value http
hive.server2.thrift.http.port ## HTTP listen port; default 10001
hive.server2.thrift.http.path ## service endpoint name; default cliservice
hive.server2.thrift.http.min.worker.threads ## minimum worker threads in the service pool; default 5
hive.server2.thrift.http.max.worker.threads ## maximum worker threads in the service pool; default 500
Settings used in this guide (one property per line, as name: value ## description):
javax.jdo.option.ConnectionURL: jdbc:mysql://m1:3306/hive?createDatabaseIfNotExist=true ## JDBC connect string for a JDBC metastore
javax.jdo.option.ConnectionDriverName: com.mysql.jdbc.Driver ## Driver class name for a JDBC metastore
javax.jdo.option.ConnectionUserName: root ## username to use against metastore database
javax.jdo.option.ConnectionPassword: 123 ## password to use against metastore database
datanucleus.autoCreateSchema: true
datanucleus.autoCreateTables: true
datanucleus.autoCreateColumns: true
hive.metastore.warehouse.dir: /hive ## location of default database for the warehouse
hive.downloaded.resources.dir: /home/centos/soft/hive/tmp/resources ## temporary local directory for added resources
hive.exec.dynamic.partition: true
hive.exec.dynamic.partition.mode: nonstrict
hive.exec.local.scratchdir: /home/centos/soft/hive/tmp/HiveJobsLog ## local scratch space for Hive jobs
hive.downloaded.resources.dir: /home/centos/soft/hive/tmp/ResourcesLog ## set twice in the original config; the later value wins
hive.querylog.location: /home/centos/soft/hive/tmp/HiveRunLog ## location of Hive runtime structured log file
hive.server2.logging.operation.log.location: /home/centos/soft/hive/tmp/OpertitionLog ## top-level directory where operation logs are stored if logging is enabled
hive.hwi.war.file: /home/centos/soft/hive/lib/hive-hwi-2.1.1.jar ## path to the HWI war file, relative to ${HIVE_HOME}
hive.hwi.listen.host: m1 ## host address the Hive Web Interface listens on
hive.hwi.listen.port: 9999 ## port the Hive Web Interface listens on
hive.server2.thrift.bind.host: m1
hive.server2.thrift.port: 10000
hive.server2.thrift.http.port: 10001
hive.server2.thrift.http.path: cliservice
hive.server2.webui.host: m1
hive.server2.webui.port: 10002
hive.scratch.dir.permission: 755
hive.aux.jars.path: file:///home/centos/soft/spark/lib/spark-assembly-1.6.0-hadoop2.6.0.jar
hive.server2.enable.doAs: false
hive.auto.convert.join: false
spark.dynamicAllocation.enabled: true ## allocate Spark resources dynamically
spark.driver.extraJavaOptions: -XX:PermSize=128M -XX:MaxPermSize=512M
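As a concrete sketch of the file layout, the first two entries above are written into hive-site.xml as <property> elements like this (the remaining entries follow the same pattern; CONF_DIR stands in for $HIVE_HOME/conf):

```shell
# Recreate hive-site.xml with only the <configuration> root and two sample
# properties; defaults to the current directory if HIVE_CONF_DIR is unset
CONF_DIR=${HIVE_CONF_DIR:-.}
cat > "$CONF_DIR/hive-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://m1:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
</configuration>
EOF
grep -c '<property>' "$CONF_DIR/hive-site.xml"  # prints 2
```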
3. Configure the log location (Hive 2.x uses Log4j 2, so the file is hive-log4j2.properties):
cp hive-log4j2.properties.template hive-log4j2.properties
vi hive-log4j2.properties
property.hive.log.dir = /home/centos/soft/hive/tmp ## move hive.log into the ${HIVE_HOME}/tmp directory
mkdir -p ${HIVE_HOME}/tmp
4. Configure the $HIVE_HOME/bin/hive-config.sh file
## add the following three lines
export JAVA_HOME=/home/centos/soft/jdk
export HIVE_HOME=/home/centos/soft/hive
export HADOOP_HOME=/home/centos/soft/hadoop
## and modify this line
HIVE_CONF_DIR=$HIVE_HOME/conf
6. Put the MySQL JDBC driver jar into the $HIVE_HOME/lib directory
cp /home/centos/soft/tar.gz/mysql-connector-java-5.1.6-bin.jar /home/centos/soft/hive/lib/
7. Copy the jline jar
Copy jline-2.12.jar from the $HIVE_HOME/lib directory into the $HADOOP_HOME/share/hadoop/yarn/lib directory, and delete the older jline jar already in $HADOOP_HOME/share/hadoop/yarn/lib (otherwise Hive fails at startup with a jline version conflict).
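The copy and delete above can be sketched as follows; the exact filename of the old jline shipped with Hadoop may differ, so confirm it with ls first:

```shell
# Replace the jline bundled with Hadoop by Hive's jline 2.12
cp $HIVE_HOME/lib/jline-2.12.jar $HADOOP_HOME/share/hadoop/yarn/lib/
rm -f $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar  # old version; check the name with ls first
```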
8. Copy tools.jar from the $JAVA_HOME/lib directory into $HIVE_HOME/lib
cp $JAVA_HOME/lib/tools.jar ${HIVE_HOME}/lib
9. Initialize Hive
Use either MySQL or Derby as the metastore database.
Note: first check whether MySQL already contains leftover Hive metadata; if it does, delete it before initializing.
schematool -dbType mysql -initSchema ## MySQL as the metastore database
Here mysql selects MySQL as the database that stores the Hive metadata; to use Derby instead, run:
schematool -dbType derby -initSchema ## Derby as the metastore database
The hive-schema-2.1.0.mysql.sql script then creates the initial tables in the configured metastore database.
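After initialization you can check that the metastore tables were actually created (assuming the MySQL password 123 used in this guide):

```shell
# The hive database should now contain metastore tables
# such as VERSION, DBS and TBLS
mysql -uroot -p123 -e "USE hive; SHOW TABLES;"
```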
10. Start the metastore service
Before running Hive, the metastore service must be started, otherwise Hive fails with a connection error:
./hive --service metastore
Then open another terminal window and start the Hive process there.
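To keep the metastore alive after the terminal closes, it can instead be started in the background; 9083 is the default metastore port (a sketch, assuming the ${HIVE_HOME}/tmp directory created earlier):

```shell
# Run the metastore in the background and confirm it is listening
cd $HIVE_HOME/bin
nohup ./hive --service metastore > $HIVE_HOME/tmp/metastore.log 2>&1 &
netstat -tnlp | grep 9083
```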
11. Test the installation
hive
show databases;
show tables;
create table book (id bigint, name string) row format delimited fields terminated by '\t';
select * from book;
select count(*) from book;
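Since hive-site.xml above also configures HiveServer2 on port 10000, you can additionally start it and connect from any node with beeline. A sketch; because hive.server2.enable.doAs is false, the -n user is only nominal:

```shell
# Start HiveServer2 in the background, then connect with beeline
nohup hive --service hiveserver2 > $HIVE_HOME/tmp/hiveserver2.log 2>&1 &
beeline -u jdbc:hive2://m1:10000 -n centos
```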