This article describes how to set up a Hive 1.2.2 environment on CentOS 7 and run a simple query.
Covered:
1. Installation
2. Basic operations
1. Installation
1.1. Download
Download apache-hive-1.2.2-bin.tar.gz from the Apache Hive release archive.
1.2. Extract
tar -zxvf /opt/soft-install/apache-hive-1.2.2-bin.tar.gz -C /opt/soft
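Optionally, export HIVE_HOME so the commands in the following sections can be run from anywhere. A minimal sketch, assuming the extract path used above:

```shell
# Assumed install path from the tar command above; adjust if yours differs.
export HIVE_HOME=/opt/soft/apache-hive-1.2.2-bin
export PATH=$PATH:$HIVE_HOME/bin
```

Adding these lines to /etc/profile (or ~/.bashrc) makes them persistent across sessions.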
1.3. Edit the configuration files
1.3.1. Edit conf/hive-env.sh
cp hive-env.sh.template hive-env.sh
Append:
HADOOP_HOME=/opt/soft/hadoop-2.7.3
1.3.2. Create conf/hive-site.xml
vi hive-site.xml
Append:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
This configures MySQL as the backing store for Hive's metastore, where:
hive.cli.print.current.db=true shows the current database in the CLI prompt
hive.cli.print.header=true prints column headers in query results
hive.metastore.warehouse.dir=/user/hive/warehouse sets the HDFS path under which Hive stores table data
1.4. Copy the MySQL JDBC driver jar into hive/lib
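For example (the connector jar name and version here are an assumption; use whichever mysql-connector-java jar matches your MySQL server):

```shell
# Hypothetical jar name/version - substitute the actual
# mysql-connector-java jar you downloaded from MySQL.
cp /opt/soft-install/mysql-connector-java-5.1.46-bin.jar /opt/soft/apache-hive-1.2.2-bin/lib/
```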
(On Hive 2, required) Initialize Hive's metastore schema in MySQL; the schema scripts live under $HIVE_HOME/scripts:
./bin/schematool -initSchema -dbType mysql
2. Basic operations
2.1. Start the shell
The Hadoop cluster must be running first.
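The cluster can be brought up with Hadoop's standard start scripts; a sketch assuming the Hadoop install path from section 1.3.1:

```shell
# Assumed Hadoop install path; adjust to your environment.
/opt/soft/hadoop-2.7.3/sbin/start-dfs.sh
/opt/soft/hadoop-2.7.3/sbin/start-yarn.sh
```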
bin/hive
2.2. Queries
show tables;
show databases;
show schemas;
show partitions table_name;
show functions;
desc extended table_name;
desc formatted table_name;
describe database database_name;
describe table_name;
2.3. DDL
2.3.1. Create a database
create database if not exists test;
List databases:
show databases;
Switch to the database:
use test;
2.3.2. Create a table
1. Create an internal (managed) table
create table student(sno int,sname string,sex string,sage int,sdept string) row format delimited fields terminated by ',';
2. Drop the table
drop table student;
2.3.3. Load data
load data local inpath '/opt/soft-install/data/student.txt' overwrite into table student;
The contents of student.txt:
1001,张三,男,22,高一
1002,李四,女,25,高二
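Outside Hive, the sample file can be created and the effect of `fields terminated by ','` previewed with standard shell tools. A sketch (using /tmp here instead of the article's /opt/soft-install/data):

```shell
# Recreate the sample data file from the article (writable /tmp path used here).
mkdir -p /tmp/data
printf '%s\n' \
  '1001,张三,男,22,高一' \
  '1002,李四,女,25,高二' > /tmp/data/student.txt

# Preview how Hive will split each line on ',' into the table's columns:
awk -F',' '{print "sno="$1, "sname="$2, "sage="$4}' /tmp/data/student.txt
```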
2.4. DML
2.4.1. Query data
hive> select * from student;
OK
1001 张三 男 22 高一
1002 李四 女 25 高二
Time taken: 1.464 seconds, Fetched: 2 row(s)
hive> select count(*) from student;
Query ID = hadoop_20180514002841_6dd61d4a-6c8c-4c22-aada-5e3bc89b9cbb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1526265117233_0001, Tracking URL = http://hadoop1:8088/proxy/application_1526265117233_0001/
Kill Command = /opt/soft/hadoop-2.7.3/bin/hadoop job -kill job_1526265117233_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-05-14 00:28:59,129 Stage-1 map = 0%, reduce = 0%
2018-05-14 00:29:09,174 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.62 sec
2018-05-14 00:29:19,709 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.02 sec
MapReduce Total cumulative CPU time: 4 seconds 20 msec
Ended Job = job_1526265117233_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.02 sec HDFS Read: 6856 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 20 msec
OK
2
Time taken: 39.698 seconds, Fetched: 1 row(s)
2.4.2. Delete data
1. Empty the table (two equivalent ways; note that truncate only works on managed tables)
insert overwrite table student select * from student where 1=0;
truncate table student;