Start the Hive data warehouse on the big data platform, launch the Hive client, and use Hive to view all Hadoop file paths (all database commands must be written in lowercase). Submit the query results as text in the answer box.
[root@master ~]# hive
WARNING: Use "yarn jar" to launch YARN applications.
Logging initialized using configuration in file:/etc/hive/2.4.3.0-227/0/hive-log4j.properties
hive> dfs -ls;
Found 5 items
drwx------   - root hdfs          0 2017-04-20 18:56 .Trash
drwxr-xr-x   - root hdfs          0 2017-05-07 05:59 .hiveJars
drwx------   - root hdfs          0 2017-05-07 05:43 .staging
drwxr-xr-x   - root hdfs          0 2017-05-07 05:43 hbase-staging
drwxr-xr-x   - root hdfs          0 2017-04-20 18:56 samll-file
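Note that a bare dfs -ls only lists the current user's HDFS home directory (/user/root here). A minimal sketch, not part of the recorded session, that walks the entire filesystem instead:
-- recursive listing of the whole HDFS tree rather than just the home directory
dfs -ls -R /;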
27. Use Hive to create the table xd_phy_course and import phy_course_xd.txt into it; the structure of xd_phy_course is shown in the table below. After the import, use Hive to query the HDFS file location listing for the data in xd_phy_course. Submit the commands used (all database commands must be written in lowercase) and the output as text in the answer box.
hive> create table xd_phy_course (stname string, stID int, class string, opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;
OK
Time taken: 4.067 seconds
hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]
OK
Time taken: 1.422 seconds
hive> dfs -ls /apps/hive/warehouse;
Found 1 items
drwxrwxrwx   - hive hdfs          0 2017-05-19 03:30 /apps/hive/warehouse/xd_phy_course
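As an optional cross-check that is not part of the recorded session, the table's HDFS location can also be read straight from its metadata, assuming the xd_phy_course table created above:
-- prints detailed table metadata, including the Location: field
describe formatted xd_phy_course;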
28. Use Hive to create the table xd_phy_course, define it as an external table with the external storage location /1daoyun/data/hive, and import phy_course_xd.txt into it; the structure of xd_phy_course is shown in the table below. After the import, query the schema of xd_phy_course in Hive. Submit the commands used (all database commands must be written in lowercase) and the output as text in the answer box.
hive> create external table xd_phy_course (stname string, stID int, class string, opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/1daoyun/data/hive';
OK
Time taken: 1.197 seconds
hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]
OK
Time taken: 0.96 seconds
hive> desc xd_phy_course;
OK
stname string
stid int
class string
opt_cour string
Time taken: 0.588 seconds, Fetched: 4 row(s)
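As an optional verification (not in the recorded output), show create table reproduces the full DDL, so the external keyword and the /1daoyun/data/hive location can be confirmed at a glance, assuming the table created above:
-- dumps the complete create statement, including whether the table is external and its location
show create table xd_phy_course;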
29. Use Hive to find all information on the members of class Software_1403 at a university who signed up for the volleyball elective in phy_course_xd.txt; the file's structure is shown in the table below, with the elective subject in the opt_cour field and the class in the class field. Submit the commands used (all database commands must be written in lowercase) and the output as text in the answer box.
hive> create table xd_phy_course (stname string, stID int, class string, opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 4.067 seconds
hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]
OK
Time taken: 1.422 seconds
hive> select * from xd_phy_course where class='Software_1403' and opt_cour='volleyball';
OK
student409 10120408 Software_1403 volleyball
student411 10120410 Software_1403 volleyball
student413 10120412 Software_1403 volleyball
student419 10120418 Software_1403 volleyball
student421 10120420 Software_1403 volleyball
student422 10120421 Software_1403 volleyball
student424 10120423 Software_1403 volleyball
student432 10120431 Software_1403 volleyball
student438 10120437 Software_1403 volleyball
student447 10120446 Software_1403 volleyball
Time taken: 0.985 seconds, Fetched: 10 row(s)
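A small follow-up sketch, not part of the recorded session, that returns only the headcount instead of the full rows, assuming the same xd_phy_course table and data:
-- number of Software_1403 students who chose volleyball
select count(*) from xd_phy_course where class='Software_1403' and opt_cour='volleyball';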
30. Use Hive to count the total number of students at a university who signed up for each physical education elective in phy_course_xd.txt; the file's structure is shown in the table below, with the elective subject in the opt_cour field. Load the counts into the table phy_opt_count, then query phy_opt_count with a select statement. Submit the counting statement and the query command (all database commands must be written in lowercase) together with the output as text in the answer box.
hive> create table xd_phy_course (stname string, stID int, class string, opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 4.067 seconds
hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;
Loading data to table default.xd_phy_course
Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]
OK
Time taken: 1.422 seconds
hive> create table phy_opt_count (opt_cour string, cour_count int) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 1.625 seconds
hive> insert overwrite table phy_opt_count select xd_phy_course.opt_cour, count(distinct xd_phy_course.stID) from xd_phy_course group by xd_phy_course.opt_cour;
Query ID = root_20170507125642_6af22d21-ae88-4daf-a346-4b1cbcd7d9fe
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1494149668396_0004)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 4.51 s
--------------------------------------------------------------------------------
Loading data to table default.phy_opt_count
Table default.phy_opt_count stats: [numFiles=1, numRows=10, totalSize=138, rawDataSize=128]
OK
Time taken: 13.634 seconds
hive> select * from phy_opt_count;
OK
badminton 234
basketball 224
football 206
gymnastics 220
opt_cour 0
swimming 234
table tennis 277
taekwondo 222
tennis 223
volleyball 209
Time taken: 0.065 seconds, Fetched: 10 row(s)
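The opt_cour 0 row most likely comes from the header line of phy_course_xd.txt being loaded as an ordinary data row. A minimal sketch (the table name xd_phy_course_nohdr is illustrative only), assuming the file really does start with a header line, that tells Hive to skip it at load time:
-- same schema, but the first line of each loaded file is treated as a header and skipped
create table xd_phy_course_nohdr (stname string, stID int, class string, opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n' tblproperties ('skip.header.line.count'='1');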
31. Use Hive to find all information on the members of class Software_1403 at a university whose physical education elective score is above 90 in phy_course_score_xd.txt; the file's structure is shown in the table below, with the elective subject in the opt_cour field and the score in the score field. Submit the commands used (all database commands must be written in lowercase) and the output as text in the answer box.
hive> create table phy_course_score_xd (stname string, stID int, class string, opt_cour string, score float) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.339 seconds
hive> load data local inpath '/root/phy_course_score_xd.txt' into table phy_course_score_xd;
Loading data to table default.phy_course_score_xd
Table default.phy_course_score_xd stats: [numFiles=1, totalSize=1910]
OK
Time taken: 1.061 seconds
hive> select * from phy_course_score_xd where class='Software_1403' and score>90;
OK
student433 10120432 Software_1403 football 98.0
student434 10120433 Software_1403 table tennis 97.0
student438 10120437 Software_1403 volleyball 93.0
student439 10120438 Software_1403 badminton 100.0
student444 10120443 Software_1403 swimming 99.0
student445 10120444 Software_1403 table tennis 97.0
student450 10120449 Software_1403 basketball 97.0
Time taken: 0.21 seconds, Fetched: 7 row(s)
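An optional refinement, not in the recorded output: the same filter with the highest scores listed first, assuming the phy_course_score_xd table above:
-- Software_1403 members above 90 points, highest score first
select * from phy_course_score_xd where class='Software_1403' and score>90 order by score desc;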
32. Use Hive to compute the average physical education score of each class in phy_course_score_xd.txt, using the round function to keep two decimal places; the file's structure is shown in the table below, with the class in the class field and the score in the score field. Submit the commands used (all database commands must be written in lowercase) and the output as text in the answer box.
hive> select class, round(avg(score)) from phy_course_score_xd group by class;
Query ID = root_20170507131823_0bfb1faf-3bfb-42a5-b7eb-3a6a284081ae
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1494149668396_0005)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 26.68 s
--------------------------------------------------------------------------------
OK
Network_1401 73.0
Software_1403 72.0
class NULL
Time taken: 27.553 seconds, Fetched: 3 row(s)
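Two caveats about the recorded query: round(avg(score)) rounds to a whole number while the task asks for two decimal places, and the class NULL row is most likely the header line of phy_course_score_xd.txt loaded as data. A hedged sketch addressing both, assuming the same table:
-- per-class average kept to two decimals, ignoring the loaded header row
select class, round(avg(score), 2) from phy_course_score_xd where class != 'class' group by class;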
33. Use Hive to find the highest physical education score of each class in phy_course_score_xd.txt; the file's structure is shown in the table below, with the class in the class field and the score in the score field. Submit the commands used (all database commands must be written in lowercase) and the output as text in the answer box.
hive> select class, max(score) from phy_course_score_xd group by class;
Query ID = root_20170507131942_86a2bf55-49ac-4c2e-b18b-8f63191ce349
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1494149668396_0005)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 5.08 s
--------------------------------------------------------------------------------
OK
Network_1401 95.0
Software_1403 100.0
class NULL
Time taken: 144.035 seconds, Fetched: 3 row(s)
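A related sketch, not part of the recorded session, that reports the highest and lowest score per class in a single pass, assuming the same phy_course_score_xd table:
-- per-class maximum and minimum score in one query
select class, max(score), min(score) from phy_course_score_xd group by class;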
34. In the Hive data warehouse, merge the separate request_date and request_time fields of the web log weblog_entries.txt, joining them with a single underscore "_", as shown in the figure below; the structure of weblog_entries.txt is shown in the table below. Submit the commands used (all database commands must be written in lowercase) and the last ten lines of output as text in the answer box.
hive> create external table weblog_entries (md5 string, url string, request_date string, request_time string, ip string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/data/hive/weblog/';
OK
Time taken: 0.502 seconds
hive> load data local inpath '/root/weblog_entries.txt' into table weblog_entries;
Loading data to table default.weblog_entries
Table default.weblog_entries stats: [numFiles=1, totalSize=251130]
OK
Time taken: 1.203 seconds
hive> select concat_ws('_', request_date, request_time) from weblog_entries;
2012-05-10_21:29:01
2012-05-10_21:13:47
2012-05-10_21:12:37
2012-05-10_21:34:20
2012-05-10_21:27:00
2012-05-10_21:33:53
2012-05-10_21:10:19
2012-05-10_21:12:05
2012-05-10_21:25:58
2012-05-10_21:34:28
Time taken: 0.265 seconds, Fetched: 3000 row(s)
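If the merged value needs to be kept rather than only printed, a minimal sketch is to materialize it with a create-table-as-select; the table name weblog_entries_merged is illustrative only and assumes the weblog_entries table above:
-- persist the merged date_time field alongside the other columns
create table weblog_entries_merged as select md5, url, concat_ws('_', request_date, request_time) as request_datetime, ip from weblog_entries;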
35. In the Hive data warehouse, perform a simple inner join between the IP field of the web log weblog_entries.txt and the corresponding country for each IP in ip_to_country, with the output as shown in the figure below; the structure of weblog_entries.txt is shown in the table below. Submit the commands used (all database commands must be written in lowercase) and the last ten lines of output as text in the answer box.
hive> create table ip_to_country (ip string, country string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/data/hive/ip_to_county/';
OK
Time taken: 0.425 seconds
hive> load data local inpath '/root/ip_to_country.txt' into table ip_to_country;
Loading data to table default.ip_to_country
Table default.ip_to_country stats: [numFiles=1, totalSize=75728]
OK
Time taken: 2.016 seconds
hive> select wle.*, itc.country from weblog_entries wle join ip_to_country itc on wle.ip=itc.ip;
Query ID = root_20170507064740_a52870a0-2405-4fd4-85c2-43f8a229b3c3
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1494136863427_0002)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Map 2 .......... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 6.30 s
--------------------------------------------------------------------------------
OK
3e8146764aefe5d87353dd4e0ae9ac5 /qnrxlxqacgiudbtfggcg.html 2012-05-10 21:29:01 164.210.124.152 United States
fdb388d28c8466d4eb7d93677af194 /sbbiuot.html 2012-05-10 21:13:47 168.17.158.38 United States
4a1a345f85fa5fa2659e27f623dff11 /ofxi.html 2012-05-10 21:12:37 174.24.173.11 United States
6a09d25407766a7bb8653d359feca4 /hjmdhaoogwqhp.html 2012-05-10 21:34:20 143.64.173.176 United States
aeecff9b31d1134c8843248bedbca5bd /angjbmea.html 2012-05-10 21:27:00 160.164.158.125 Italy
f61954aad39de057cd6f51ba3deed241 /mmdttqsnjfifkihcvqu.html 2012-05-10 21:33:53 15.111.128.4 United States
7cdf2c1efd653867278417dd465c1a65 /eorxuryjadhkiwsf.html 2012-05-10 21:10:19 22.71.176.163 United States
22b2549649dcc284ba8bf7d4993ac62 /e.html 2012-05-10 21:12:05 105.57.100.182 Morocco
3ab7888ffe27c2f98d48eb296449d5 /khvc.html 2012-05-10 21:25:58 111.147.83.42 China
65827078a9f7ccce59632263294782db /c.html 2012-05-10 21:34:28 137.157.65.89 Australia
Time taken: 15.331 seconds, Fetched: 3000 row(s)
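An alternative sketch, not in the recorded output: a left outer join keeps log rows whose IP has no match in ip_to_country (country comes back as NULL), assuming the same two tables:
-- keep every log row even when the ip has no country match
select wle.*, itc.country from weblog_entries wle left outer join ip_to_country itc on wle.ip=itc.ip;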
36. Use Hive to create a Hive table dynamically from the query results over the web log weblog_entries.txt. Create a new table named weblog_entries_url_length that defines three fields of the new web log database: url, request_date, and request_time. In addition, define a new field named "url_length" that holds the length of the url string; the structure of weblog_entries.txt is shown in the table below. When done, query the contents of weblog_entries_url_length. Submit the commands used (all database commands must be written in lowercase) and the last ten lines of output as text in the answer box.
hive> create table weblog_entries_url_length as select url, request_date, request_time, length(url) as url_length from weblog_entries;
Query ID = root_20170507065123_e3105d8b-84b6-417f-ab58-21ea15723e0a
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1494136863427_0002)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 4.10 s
--------------------------------------------------------------------------------
Moving data to: hdfs://master:8020/apps/hive/warehouse/weblog_entries_url_length
Table default.weblog_entries_url_length stats: [numFiles=1, numRows=3000, totalSize=121379, rawDataSize=118379]
OK
Time taken: 5.874 seconds
hive> select * from weblog_entries_url_length;
/qnrxlxqacgiudbtfggcg.html 2012-05-10 21:29:01 26
/sbbiuot.html 2012-05-10 21:13:47 13
/ofxi.html 2012-05-10 21:12:37 10
/hjmdhaoogwqhp.html 2012-05-10 21:34:20 19
/angjbmea.html 2012-05-10 21:27:00 14
/mmdttqsnjfifkihcvqu.html 2012-05-10 21:33:53 25
/eorxuryjadhkiwsf.html 2012-05-10 21:10:19 22
/e.html 2012-05-10 21:12:05 7
/khvc.html 2012-05-10 21:25:58 10
/c.html 2012-05-10 21:34:28 7
Time taken: 0.08 seconds, Fetched: 3000 row(s)
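As a final optional check, not part of the recorded session, the schema Hive derived for the new table (including the url_length int column) can be confirmed with:
-- list the columns and types Hive inferred for the CTAS table
describe weblog_entries_url_length;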