Hive Exercises:
1. Start the Hive data warehouse on the XianDian (先电) big data platform, launch the Hive client, and use Hive to view the Hadoop file path listing. (Write all database commands in lowercase.)
[root@master ~]# hive
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.1.0-129/0/hive-log4j.properties
hive> dfs -ls;
Found 2 items
drwxr-xr-x - root hdfs 0 2019-05-07 12:00 .hiveJars
drwx------ - root hdfs 0 2019-05-06 21:46 .staging
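Note that dfs -ls with no path lists only the current user's HDFS home directory. If the goal is every file path under the HDFS root, the same dfs command accepts a target path and the standard hadoop fs recursive flag (a minimal sketch, not part of the original answer):
hive> dfs -ls -R /;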
2. Use Hive to create a table xd_phy_course and import phy_course_xd.txt into it; the structure of xd_phy_course is given in the table below. After the import completes, use Hive to query the HDFS file location listing for the data in table xd_phy_course. (Write all database commands in lowercase.)
stname (string) | stID (int) | class (string) | opt_cour (string)
[root@master ~]# hive
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.1.0-129/0/hive-log4j.properties
hive> create table xd_phy_course(stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 2.698 seconds
hive> load data local inpath '/opt/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, numRows=0, totalSize=450, rawDataSize=0]
OK
Time taken: 1.053 seconds
hive> dfs -ls /apps/hive/warehouse;
Found 1 items
drwxrwxrwx - root hadoop 0 2019-05-07 12:18 /apps/hive/warehouse/xd_phy_course
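Listing /apps/hive/warehouse works here because xd_phy_course is a managed table in the default database. An alternative that prints the table's storage location directly is the standard desc formatted command, whose output includes a Location: row:
hive> desc formatted xd_phy_course;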
3. Use Hive to create a table xd_phy_course, define it as an external table with external storage location /1daoyun/data/hive, and import phy_course_xd.txt into it; the structure of xd_phy_course is given in the table below. After the import completes, query the structure of table xd_phy_course in Hive. (Write all database commands in lowercase.)
stname (string) | stID (int) | class (string) | opt_cour (string)
[root@master ~]# hive
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.1.0-129/0/hive-log4j.properties
hive> create external table xd_phy_course(stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/1daoyun/data/hive';
OK
Time taken: 1.028 seconds
hive> load data local inpath '/opt/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, totalSize=450]
OK
Time taken: 0.863 seconds
hive> desc xd_phy_course;
OK
stname string
stid int
class string
opt_cour string
Time taken: 0.414 seconds, Fetched: 4 row(s)
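Because the table is declared external, dropping it removes only the metadata and leaves the files under /1daoyun/data/hive intact. To confirm the table type and location beyond the plain desc output, the standard show create table command prints the full DDL, including the external keyword and location clause:
hive> show create table xd_phy_course;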
4. Use Hive to find all information on the members of a university's Software_1403 class who signed up for the volleyball elective in phy_course_xd.txt. The file's data structure is given in the table below; the elective field is opt_cour and the class field is class. (Write all database commands in lowercase.)
stname (string) | stID (int) | class (string) | opt_cour (string)
[root@master ~]# hive
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.1.0-129/0/hive-log4j.properties
hive> create table phy_course_xd(stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.254 seconds
hive> load data local inpath '/opt/phy_course_xd.txt' into table phy_course_xd;
Loading data to table default.phy_course_xd
Table default.phy_course_xd stats: [numFiles=1, numRows=0, totalSize=450, rawDataSize=0]
OK
Time taken: 0.797 seconds
hive> select * from phy_course_xd where class='Software_1403' and opt_cour='volleyball';
OK
student409 10120408 Software_1403 volleyball
student411 10120410 Software_1403 volleyball
student413 10120412 Software_1403 volleyball
student419 10120418 Software_1403 volleyball
student421 10120420 Software_1403 volleyball
student422 10120421 Software_1403 volleyball
student424 10120423 Software_1403 volleyball
student432 10120431 Software_1403 volleyball
student438 10120437 Software_1403 volleyball
student447 10120446 Software_1403 volleyball
Time taken: 0.093 seconds, Fetched: 10 row(s)
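To verify the number of matching members without listing them, a count over the same predicate works (a minimal variant of the query above, not part of the original answer):
hive> select count(*) from phy_course_xd where class='Software_1403' and opt_cour='volleyball';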
5. Use Hive to count the total number of students at a university enrolled in each physical-education elective in phy_course_xd.txt. The file's data structure is given in the table below; the elective field is opt_cour. Load the counts into table phy_opt_count, then query the contents of phy_opt_count with a select statement. (Write all database commands in lowercase.)
stname (string) | stID (int) | class (string) | opt_cour (string)
[root@master ~]# hive
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.1.0-129/0/hive-log4j.properties
hive> create table phy_course_xd(stname string,stID int,class string,opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.254 seconds
hive> load data local inpath '/opt/phy_course_xd.txt' into table phy_course_xd;
Loading data to table default.phy_course_xd
Table default.phy_course_xd stats: [numFiles=1, numRows=0, totalSize=450, rawDataSize=0]
OK
Time taken: 0.797 seconds
hive> create table phy_opt_count(opt_cour string,cour_count int) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.397 seconds
hive> insert overwrite table phy_opt_count select phy_course_xd.opt_cour,count(distinct phy_course_xd.stID) from phy_course_xd group by phy_course_xd.opt_cour;
Query ID = root_20190508160534_538b0a0d-7e07-4ee1-bf5e-4a5d55b5a816
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening…
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1556972378662_0019)
Loading data to table default.phy_opt_count
Table default.phy_opt_count stats: [numFiles=1, numRows=1, totalSize=14, rawDataSize=13]
OK
Time taken: 71.7 seconds
hive> select * from phy_opt_count;
OK
volleyball 10
Time taken: 0.248 seconds, Fetched: 1 row(s)
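If the per-course totals should be ranked, an order by on the count column sorts the result; the descending direction below is an arbitrary choice for illustration:
hive> select * from phy_opt_count order by cour_count desc;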
6. Use Hive to find all information on the members of a university's Software_1403 class whose physical-education elective score is 90 or above in phy_course_score_xd.txt. The file's data structure is given in the table below; the elective field is opt_cour and the score field is score. (Write all database commands in lowercase.)
stname (string) | stID (int) | class (string) | opt_cour (string) | score (float)
[root@master ~]# hive
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.1.0-129/0/hive-log4j.properties
hive> create table phy_course_score_xd(stname string,stID int,class string,opt_cour string,score float) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 1.255 seconds
hive>load data local inpath '/opt/phy_course_score_xd.txt' into table phy_course_score_xd;
Loading data to table default.phy_course_score_xd
Table default.phy_course_score_xd stats: [numFiles=1, numRows=0, totalSize=354, rawDataSize=0]
OK
Time taken: 0.78 seconds
hive> select * from phy_course_score_xd where class='Software_1403' and score>=90;
OK
student433 10120432 Software_1403 football 98.0
student444 10120443 Software_1403 swimming 99.0
student445 10120444 Software_1403 tabletennis 97.0
student450 10120449 Software_1403 basketball 97.0
Time taken: 0.703 seconds, Fetched: 4 row(s)
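If the matching rows should come back highest score first, an order by can be appended (a minimal variant of the query above, not part of the original answer):
hive> select * from phy_course_score_xd where class='Software_1403' and score>=90 order by score desc;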
7. Use Hive to compute the average physical-education score for each class at a university from phy_course_score_xd.txt, using the round function to keep two decimal places. The file's data structure is given in the table below; the class field is class and the score field is score. (Write all database commands in lowercase.)
stname (string) | stID (int) | class (string) | opt_cour (string) | score (float)
[root@master ~]# hive
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.1.0-129/0/hive-log4j.properties
hive> create table phy_course_score_xd(stname string,stID int,class string,opt_cour string,score float) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 1.255 seconds
hive> load data local inpath '/opt/phy_course_score_xd.txt' into table phy_course_score_xd;
Loading data to table default.phy_course_score_xd
Table default.phy_course_score_xd stats: [numFiles=1, numRows=0, totalSize=354, rawDataSize=0]
OK
Time taken: 0.78 seconds
hive> select class,round(avg(score),2) from phy_course_score_xd group by class;
Query ID = root_20190508200705_810d57d4-27a5-448b-91da-93d211a6382b
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1556972378662_0021)
OK
Software_1403 98.0
Time taken: 5.829 seconds, Fetched: 4 row(s)
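To keep only classes whose average clears a threshold, the filter on the aggregate goes in a having clause rather than where; the 90-point cutoff below is an arbitrary example, not part of the original task:
hive> select class,round(avg(score),2) from phy_course_score_xd group by class having avg(score)>=90;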
8. Use Hive to find the highest physical-education score for each class at a university from phy_course_score_xd.txt. The file's data structure is given in the table below; the class field is class and the score field is score. (Write all database commands in lowercase.)
stname (string) | stID (int) | class (string) | opt_cour (string) | score (float)
hive> select class,max(score) from phy_course_score_xd group by class;
Query ID = root_20190509102108_c1d4fb39-ac37-4d61-a509-cdc554adfe1d
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening…
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1556972378662_0022)
OK
Software_1403 99.0
Time taken: 54.38 seconds, Fetched: 4 row(s)
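If the lowest and average scores are wanted alongside the maximum, all three aggregates can be computed in one pass over the table (a minimal variant, not part of the original task):
hive> select class,max(score),min(score),round(avg(score),2) from phy_course_score_xd group by class;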
9. In the Hive data warehouse, merge the separate request_date and request_time fields of the web log weblog_entries.txt into a single value joined by an underscore "_", as shown in the query output below. The data structure of weblog_entries.txt is given in the table below. (Write all database commands in lowercase.)
md5 (STRING) | url (STRING) | request_date (STRING) | request_time (STRING) | ip (STRING)
hive> create table weblog_entries(md5 string,url string,request_date string,request_time string,ip string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.237 seconds
hive> load data local inpath '/opt/weblog_entries.txt' into table weblog_entries;
Loading data to table default.weblog_entries
Table default.weblog_entries stats: [numFiles=1, numRows=0, totalSize=251130, rawDataSize=0]
OK
Time taken: 0.845 seconds
hive> select concat_ws('_',request_date,request_time) from weblog_entries;
OK
2012-05-10_21:25:44
2012-05-10_21:11:20
2012-05-10_21:32:08
2012-05-10_21:25:17
2012-05-10_21:18:10
2012-05-10_21:24:12
2012-05-10_21:23:00
2012-05-10_21:15:40
2012-05-10_21:28:18
2012-05-10_21:07:31
Time taken: 0.134 seconds, Fetched: 3000 row(s)
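concat_ws takes the separator as its first argument and skips null inputs; the plain concat builtin gives the same result on this data but returns null if any argument is null (an equivalent form, for comparison):
hive> select concat(request_date,'_',request_time) from weblog_entries;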
10. In the Hive data warehouse, perform a simple inner join between the IP field of the web log weblog_entries.txt and the country matching each IP in ip_to_country, producing the output shown below. The data structure of weblog_entries.txt is given in the table below. (Write all database commands in lowercase.)
md5 (STRING) | url (STRING) | request_date (STRING) | request_time (STRING) | ip (STRING)
hive> create table ip_to_country(ip string,country string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.208 seconds
hive> load data local inpath '/opt/ip_to_country.txt' into table ip_to_country;
Loading data to table default.ip_to_country
Table default.ip_to_country stats: [numFiles=1, numRows=0, totalSize=3922915, rawDataSize=0]
OK
Time taken: 0.81 seconds
hive> select wle.*,itc.country from weblog_entries wle join ip_to_country itc on wle.ip=itc.ip;
Query ID = root_20190509113135_78213e7d-72b3-4150-8df9-50bf6af38f38
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening…
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1556972378662_0024)
OK
Time taken: 15.494 seconds
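An inner join drops log rows whose IP has no match in ip_to_country. If unmatched rows should instead be kept with a null country, a left outer join does that (a sketch of the variant, not part of the original task):
hive> select wle.*,itc.country from weblog_entries wle left outer join ip_to_country itc on wle.ip=itc.ip;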
11. Use Hive to dynamically create a Hive table from a query over the web log weblog_entries.txt. Create a new table named weblog_entries_url_length that defines three fields of the web log database: url, request_date, and request_time. In addition, define in the table a new field named url_length that holds the length of the url string. The data structure of weblog_entries.txt is given in the table below. When finished, query the contents of the weblog_entries_url_length table. (Write all database commands in lowercase.)
md5 (STRING) | url (STRING) | request_date (STRING) | request_time (STRING) | ip (STRING)
hive> create table weblog_entries_url_length as select url,request_date,request_time,length(url) as url_length from weblog_entries;
Query ID = root_20190509115805_a09cac28-bd57-408a-b3cb-4260f7a772d3
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening…
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1556972378662_0025)
Moving data to directory hdfs://master.hadoop:8020/apps/hive/warehouse/weblog_entries_url_length
Table default.weblog_entries_url_length stats: [numFiles=1, numRows=3000, totalSize=121379, rawDataSize=118379]
OK
Time taken: 17.004 seconds
hive> select * from weblog_entries_url_length;
OK
/jcwbtvnkkujo.html 2012-05-10 21:25:44 18
/ckyhatbpxu.html 2012-05-10 21:11:20 16
/rr.html 2012-05-10 21:32:08 8
/illrd.html 2012-05-10 21:25:17 11
/tdevxhsb.html 2012-05-10 21:18:10 14
/mxheswcltscr.html 2012-05-10 21:24:12 18
/copswmadilmr.html 2012-05-10 21:23:00 18
/syrljptrmvibfneyh.html 2012-05-10 21:15:40 23
/jgexfstojyfox.html 2012-05-10 21:28:18 19
/abgobopvr.html 2012-05-10 21:07:31 15
/tqtjguo.html 2012-05-10 21:26:52 13
/rsokcykmibgrmjdhn.html 2012-05-10 21:06:37 23
/mwwqgc.html 2012-05-10 21:24:47 12
/hyoiybyfedteqeeeutbp.html 2012-05-10 21:32:36 26
Time taken: 0.091 seconds, Fetched: 3000 row(s)
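Since create table ... as select accepts any query, the same pattern can materialize a filtered subset; the table name weblog_entries_long_urls and the 20-character cutoff below are hypothetical, chosen only for illustration:
hive> create table weblog_entries_long_urls as select url,request_date,request_time,length(url) as url_length from weblog_entries where length(url)>20;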