Before installing Apache Hive, a Hadoop cluster must already be in place. Hive only needs to be installed on the Hadoop NameNode host; it does not need to be installed on the DataNode machines.
Note also that although editing the configuration files does not require Hadoop to be running, this article uses HDFS commands, and Hadoop must be running when you execute them. Starting Hive itself also requires a working Hadoop cluster, so it is best to start the Hadoop cluster first.
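Before going further, it is worth confirming that the cluster is actually up. A minimal check on the NameNode host (assuming HADOOP_HOME is already set by your Hadoop installation):
# jps
#The output should include the NameNode process (plus SecondaryNameNode/ResourceManager, depending on your layout)
# $HADOOP_HOME/bin/hdfs dfsadmin -report
#The report should list your live DataNodes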
The software versions used in this installation are: CentOS 7.3.x, Hadoop 2.9.0, Apache Hive 2.3.2, and MySQL 5.7 (for the metastore).
For how to set up a Hadoop cluster on CentOS 7.3.x, see my earlier post: CentOS7.3.x + Hadoop 2.9.0 集群搭建实战.
Download page: http://hive.apache.org/downloads.html
Pick one of the mirror links on the download page (a mirror close to you is preferable). This installation uses version 2.3.2; the download URL used here is:
http://ftp.cuhk.edu.hk/pub/packages/apache.org/hive/hive-2.3.2/apache-hive-2.3.2-bin.tar.gz
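For example, the package can be fetched directly on the NameNode host with wget (the mirror above is just the one used in this article; any Apache mirror works):
# wget http://ftp.cuhk.edu.hk/pub/packages/apache.org/hive/hive-2.3.2/apache-hive-2.3.2-bin.tar.gz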
2. Installing Apache Hive
Copy apache-hive-2.3.2-bin.tar.gz to the Hadoop NameNode host and extract it under /opt.
# cp apache-hive-2.3.2-bin.tar.gz /opt
# cd /opt ; tar zxvf apache-hive-2.3.2-bin.tar.gz
# vim /etc/profile
#Append the following lines at the end of the file:
export HIVE_HOME=/opt/apache-hive-2.3.2-bin/
export PATH=$PATH:$HIVE_HOME/bin
# . /etc/profile
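A quick sanity check that the new variables are visible in the current shell:
# echo $HIVE_HOME
# which hive
#Both should point under /opt/apache-hive-2.3.2-bin/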
Go into $HIVE_HOME/conf and copy hive-default.xml.template to hive-site.xml:
cd $HIVE_HOME/conf ; cp hive-default.xml.template hive-site.xml
Set the following properties in hive-site.xml (you can point them at different directories in your own environment):
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/data/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/data/hive/tmp</value>
  <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/ is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
  <name>hive.druid.broker.address.default</name>
  <value>10.70.27.12:8082</value>
  <description>Address of the Druid broker. If we are querying Druid from Hive, this address needs to be declared</description>
</property>
<property>
  <name>hive.druid.coordinator.address.default</name>
  <value>10.70.27.8:8081</value>
  <description>Address of the Druid coordinator. It is used to check the load status of newly created segments</description>
</property>
#Create the directory /data/hive/warehouse
# $HADOOP_HOME/bin/hdfs dfs -mkdir -p /data/hive/warehouse
#Grant read/write permissions on the new directory
# $HADOOP_HOME/bin/hdfs dfs -chmod 777 /data/hive/warehouse
#Check the permissions after the change
# $HADOOP_HOME/bin/hdfs dfs -ls /data/hive
Found 1 items
drwxrwxrwx - root supergroup 0 2018-03-19 20:25 /data/hive/warehouse
#Create the directory /data/hive/tmp with the hadoop command
# $HADOOP_HOME/bin/hdfs dfs -mkdir -p /data/hive/tmp
#Grant read/write permissions on /data/hive/tmp
# $HADOOP_HOME/bin/hdfs dfs -chmod 777 /data/hive/tmp
#Check the directories that were created
# $HADOOP_HOME/bin/hdfs dfs -ls /data/hive/
Found 2 items
drwxrwxrwx - root supergroup 0 2018-03-19 20:32 /data/hive/tmp
drwxrwxrwx - root supergroup 0 2018-03-19 20:25 /data/hive/warehouse
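The two commands below create a local tmp directory under the Hive installation; presumably it serves as Hive's local scratch/temp directory (adjust to wherever your hive-site.xml points its local temp settings):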
[root@apollo conf]# cd $HIVE_HOME
[root@apollo hive]# mkdir tmp
For installing MySQL 5.7 on CentOS 7, see: CentOS7 rpm包安装mysql5.7; those steps are not repeated here.
Download the MySQL Connector/J from the official site:
https://dev.mysql.com/downloads/connector/j/
This installation uses mysql-connector-java-5.1.46.tar.gz. Copy the jar into the Hive installation as follows:
# tar zxvf mysql-connector-java-5.1.46.tar.gz
# cd mysql-connector-java-5.1.46; cp mysql-connector-java-5.1.46.jar $HIVE_HOME/lib
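A quick check that the driver jar is now in Hive's lib directory:
# ls $HIVE_HOME/lib | grep mysql-connector
#The jar copied above should be listed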
Search for javax.jdo.option.ConnectionURL in hive-site.xml and change its value to your MySQL address. Adjust the related connection properties as well:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://10.70.27.12:3306/hive?createDatabaseIfNotExist=true</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
  </description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive888</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
  <description>
    Enforce metastore schema version consistency.
    True: Verify that version information stored in is compatible with one from Hive jars. Also disable automatic
    schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
    proper metastore schema migration. (Default)
    False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
  </description>
</property>
# cd $HIVE_HOME/conf
#Copy hive-env.sh.template to hive-env.sh
# cp hive-env.sh.template hive-env.sh
#Open hive-env.sh and add the following lines
# vim hive-env.sh
export HADOOP_HOME=/opt/hadoop-2.9.0
export HIVE_CONF_DIR=/opt/apache-hive-2.3.2-bin/conf
export HIVE_AUX_JARS_PATH=/opt/apache-hive-2.3.2-bin/lib
First log in to MySQL as root to create the user, the database, and the grants. After logging in, run the following commands.
create user 'hive'@'%' identified by 'hive888';
create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
GRANT ALL ON hive.* TO 'hive'@'%';
flush privileges;
quit;
Then go into $HIVE_HOME/bin
# cd $HIVE_HOME/bin
#Initialize the metastore database:
# schematool -initSchema -dbType mysql
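If the initialization succeeds, the schema version it wrote can optionally be double-checked with schematool's info mode (using the same MySQL connection settings configured above):
# schematool -dbType mysql -info
The newly created metastore tables can also be inspected directly in MySQL, as shown below.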
mysql> use hive;
Database changed
mysql> show tables;
.....
hive>
hive>show functions;
OK
...
hive> describe database bigtreetrial;
OK
bigtreetrial hdfs://hadoopServer3:9000/data/hive/warehouse/bigtreetrial.db root USER
Time taken: 0.02 seconds, Fetched: 1 row(s)
hive>
hive> desc function sum;
OK
sum(x) - Returns the sum of a set of numbers
Time taken: 0.008 seconds, Fetched: 1 row(s)
#Create a database
hive> create database bigtreeTrial;
#Create a table
hive> use bigtreeTrial;
hive> create table student(id int, name string) row format delimited fields terminated by '\t';
hive> desc student;
OK
id int
name string
Time taken: 0.114 seconds, Fetched: 2 row(s)
hive> select * from student;
OK
Time taken: 1.089 seconds
# cd $HIVE_HOME
Create a file named student.dat
# touch student.dat
Add the following content to the file (the fields must be separated by tabs, matching the table's field delimiter):
[root@apollo hive]# vim student.dat
001 daniel
002 bill
003 bruce
004 xin
hive> load data local inpath '/opt/apache-hive-2.3.2-bin/student.dat' into table bigtreeTrial.student;
Loading data to table bigtreetrial.student
OK
Time taken: 4.844 seconds
hive> use bigtreeTrial;
OK
Time taken: 0.022 seconds
hive> select * from student;
OK
1 daniel
2 bill
3 bruce
4 xin
Time taken: 1.143 seconds, Fetched: 4 row(s)
Open the following URL in a browser (pointing at the Hadoop NameNode) to view Hive's data in HDFS.
http://10.70.27.3:50070/explorer.html#/data/hive/warehouse
Note: you can also open http://10.70.27.3:50070 first, then use the rightmost menu Utilities -> Browse the file system, enter /, and click Go to browse the tree step by step.
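The same warehouse directory can be listed from the command line on the NameNode as well, for example:
# $HADOOP_HOME/bin/hdfs dfs -ls -R /data/hive/warehouse
The table's metadata can likewise be confirmed in the MySQL metastore, as the query below shows.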
mysql> select * from hive.TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
| 1 | 1521517202 | 6 | 0 | root | 0 | 1 | student | MANAGED_TABLE | NULL | NULL | |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
1 row in set (0.00 sec)
This fix is only available in 3.0.0, so for now we have to apply the patch by hand. The steps are as follows:
Note: JDK 8 and Maven must be installed on the build machine.
1. Add the following mirror inside the <mirrors> section of Maven's /usr/share/maven/conf/settings.xml; it speeds up the build considerably.
<mirror>
  <id>alimaven</id>
  <name>aliyun maven</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  <mirrorOf>central</mirrorOf>
</mirror>
2. cd apache-hive-2.3.2-src; mvn clean package -Pdist -DskipTests
The build takes quite a while; wait for it to finish.
# cd ./packaging/target/
The newly built apache-hive-2.3.2-bin.tar.gz will be in this directory.
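One way to use the rebuilt package is to install it over /opt the same way as in the installation steps above; this is only a sketch, so back up the existing installation (in particular its conf/ directory) before overwriting anything:
# cp ./packaging/target/apache-hive-2.3.2-bin.tar.gz /opt/
# cd /opt ; tar zxvf apache-hive-2.3.2-bin.tar.gz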
Step 1: configure and start the Tranquility server
step1: download tranquility-distribution-0.8.2.tar to /opt/
step2: # tar xvf tranquility-distribution-0.8.2.tar
step3: # cd /opt/tranquility-distribution-0.8.2/conf
vi server.json
{
"dataSources" : {
"pageviews" : {
"spec" : {
"dataSchema" : {
"dataSource" : "pageviews",
"parser" : {
"type" : "string",
"parseSpec" : {
"timestampSpec" : {
"format": "auto",
"column": "time"
},
"dimensionsSpec" : {
"dimensions": ["url", "user"]
},
"format" : "json"
}
},
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "hour",
"queryGranularity" : "none"
},
"metricsSpec" : [
{"name": "views", "type": "count"},
{"name": "latencyMs", "type": "doubleSum", "fieldName": "latencyMs"}
]
},
"ioConfig" : {
"type" : "realtime"
},
"tuningConfig" : {
"type" : "realtime",
"maxRowsInMemory" : "100000",
"intermediatePersistPeriod" : "PT10M",
"windowPeriod" : "PT10M"
}
},
"properties" : {
"task.partitions" : "1",
"task.replicants" : "1"
}
} },
"properties" : {
"zookeeper.connect" : "10.70.27.8:2181,10.70.27.10:2181,10.70.27.12:2181",
"druid.discovery.curator.path" : "/druid/discovery",
"druid.selectors.indexing.serviceName" : "druid/overlord",
"http.port" : "8200",
"http.threads" : "8"
}
}
Start the Tranquility server:
# cd /opt/tranquility-distribution-0.8.2 ; ./bin/tranquility server conf/server.json
....
2018-03-28 02:00:24,210 [main] INFO o.e.jetty.server.ServerConnector - Started ServerConnector@406ca9fc{HTTP/1.1}{0.0.0.0:8200}
2018-03-28 02:00:24,210 [main] INFO org.eclipse.jetty.server.Server - Started @3868ms
Step 2: send data to the Tranquility server
POST to: http://10.70.27.8:8200/v1/post/pageviews
// 10.70.27.8 is the host the Tranquility server runs on; pageviews is the data source name from the configuration file above.
Request body (text/plain; raw):
{"time": "2018-03-27T12:42:49Z", "url": "/foo/bar", "user": "billhongbin", "latencyMs": 45}
Step 3: check the Druid tasks
http://【overlord server IP】:8090/console.html
You should see that the task has completed.
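If your Druid version exposes the standard Overlord HTTP task endpoints, the task list can also be queried without the console, for example:
# curl http://【overlord server IP】:8090/druid/indexer/v1/completeTasks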
Step 4: download Hive and configure the Druid settings in Hive (the hive.druid.* properties shown in hive-site.xml earlier)
Step 5: query the data from Hive
# /opt/apache-hive-2.3.2-bin/bin/hive
hive> show databases;
OK
bigtreetrial
default
Time taken: 3.255 seconds, Fetched: 2 row(s)
hive> use bigtreetrial;
hive > CREATE EXTERNAL TABLE bill_druid_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "pageviews");
hive> describe formatted bill_druid_table;
OK
# col_name data_type comment
__time timestamp from deserializer
latencyms string from deserializer
url string from deserializer
user string from deserializer
views bigint from deserializer
# Detailed Table Information
Database: bigtreetrial
Owner: root
CreateTime: Tue Mar 27 20:48:43 CST 2018
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://hadoopServer3:9000/data/hive/warehouse/bigtreetrial.db/bill_druid_table
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
EXTERNAL TRUE
druid.datasource pageviews
numFiles 0
numRows 0
rawDataSize 0
storage_handler org.apache.hadoop.hive.druid.DruidStorageHandler
totalSize 0
transient_lastDdlTime 1522154923
# Storage Information
SerDe Library: org.apache.hadoop.hive.druid.serde.DruidSerDe
InputFormat: null
OutputFormat: null
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.046 seconds, Fetched: 37 row(s)
hive> select * from bill_druid_table;
OK
2018-03-28 11:37:04 NULL /datang/machine billtang 1
2018-03-28 11:37:04 NULL /datang/machine tiger 1
2018-03-28 12:42:15 NULL /datang/machine billtang 1
2018-03-28 12:48:15 NULL /datang/machine billtang 1
2018-03-28 12:48:15 NULL /sina/machine bigtree 1
Time taken: 2.037 seconds, Fetched: 5 row(s)