Big Data Platform Operations: Hive

Start the Hive data warehouse on the big data platform, launch the Hive client, and use Hive to view all Hadoop file paths (write all database commands in lowercase). Submit the query results as text in the answer box.

[root@master ~]# hive

WARNING: Use "yarn jar" to launch YARN applications.

 

Logging initialized using configuration in file:/etc/hive/2.4.3.0-227/0/hive-log4j.properties

 

 

hive> dfs -ls;

Found 5 items

drwx------   - root hdfs          0 2017-04-20 18:56 .Trash

drwxr-xr-x   - root hdfs          0 2017-05-07 05:59 .hiveJars

drwx------   - root hdfs          0 2017-05-07 05:43 .staging

drwxr-xr-x   - root hdfs          0 2017-05-07 05:43 hbase-staging

drwxr-xr-x   - root hdfs          0 2017-04-20 18:56 samll-file

 

27. Use Hive to create a table xd_phy_course and import phy_course_xd.txt into it; the schema of xd_phy_course is shown in the table below. After the import, query through Hive the HDFS file location of the data in table xd_phy_course. Submit the commands (all database commands in lowercase) and their output as text in the answer box.

Updated:

hive> create table xd_phy_course (stname string, stID int, class string, opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

OK

Time taken: 4.067 seconds

 

hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;

Loading data to table default.xd_phy_course

Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]

OK

Time taken: 1.422 seconds

 

hive> dfs -ls /apps/hive/warehouse;

Found 1 items

drwxrwxrwx   - hive hdfs          0 2017-05-19 03:30 /apps/hive/warehouse/xd_phy_course
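What the row format clause above means in practice: Hive splits each line of the text file on tabs and maps the pieces onto the declared columns. A minimal Python sketch of that parsing (the sample line is made up for illustration):

```python
# Sketch of Hive's delimited row format for xd_phy_course:
# split each line on '\t' and map values to the declared columns.
FIELDS = ["stname", "stid", "class", "opt_cour"]

def parse_line(line):
    """Split one tab-delimited record into a column->value dict."""
    return dict(zip(FIELDS, line.rstrip("\n").split("\t")))

row = parse_line("student409\t10120408\tSoftware_1403\tvolleyball\n")
print(row["opt_cour"])  # volleyball
```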

 

28. Use Hive to create a table xd_phy_course, define it as an external table with the external storage location /1daoyun/data/hive, and import phy_course_xd.txt into it; the schema of xd_phy_course is shown in the table below. After the import, query the schema of table xd_phy_course in Hive. Submit the commands (all database commands in lowercase) and their output as text in the answer box.

hive> create external table xd_phy_course (stname string, stID int, class string, opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/1daoyun/data/hive';

OK

Time taken: 1.197 seconds

 

hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;

Loading data to table default.xd_phy_course

Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]

OK

Time taken: 0.96 seconds

 

hive> desc xd_phy_course;

OK

stname                 string                                     

stid                   int                                         

class                  string                                     

opt_cour               string                                     

Time taken: 0.588 seconds, Fetched: 4 row(s)

 

 

29. Use Hive to find all records in phy_course_xd.txt for members of class Software_1403 who signed up for the volleyball elective; the schema of phy_course_xd.txt is shown in the table below, with the elective field opt_cour and the class field class. Submit the commands (all database commands in lowercase) and their output as text in the answer box.

Updated:

hive> create table xd_phy_course (stname string, stID int, class string, opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';

OK

Time taken: 4.067 seconds

 

hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;

Loading data to table default.xd_phy_course

Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]

OK

Time taken: 1.422 seconds

 

hive> select * from xd_phy_course where class='Software_1403' and opt_cour='volleyball';

OK

student409     10120408        Software_1403   volleyball

student411     10120410        Software_1403   volleyball

student413     10120412        Software_1403   volleyball

student419     10120418        Software_1403   volleyball

student421     10120420        Software_1403   volleyball

student422     10120421        Software_1403   volleyball

student424     10120423        Software_1403   volleyball

student432     10120431        Software_1403   volleyball

student438     10120437        Software_1403   volleyball

student447     10120446        Software_1403   volleyball

Time taken: 0.985 seconds, Fetched: 10 row(s)
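The where clause above is a plain row filter. The same logic in Python, over made-up rows in the table's column order:

```python
# Equivalent of: select * from xd_phy_course
#                where class='Software_1403' and opt_cour='volleyball';
# rows is a made-up stand-in for the table contents.
rows = [
    ("student409", 10120408, "Software_1403", "volleyball"),
    ("student410", 10120409, "Software_1403", "football"),
    ("student500", 10120499, "Network_1401", "volleyball"),
]

matches = [r for r in rows
           if r[2] == "Software_1403" and r[3] == "volleyball"]
print(matches)  # [('student409', 10120408, 'Software_1403', 'volleyball')]
```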

 

30. Use Hive to count, from phy_course_xd.txt, the total number of students who signed up for each PE elective; the schema of phy_course_xd.txt is shown in the table below, with the elective field opt_cour. Load the statistics into the table phy_opt_count and query its contents with a select statement. Submit the statistics statement, the query command (all database commands in lowercase), and their output as text in the answer box.

hive> create table xd_phy_course (stname string, stID int, class string, opt_cour string) row format delimited fields terminated by '\t' lines terminated by '\n';

OK

Time taken: 4.067 seconds

 

hive> load data local inpath '/root/phy_course_xd.txt' into table xd_phy_course;

Loading data to table default.xd_phy_course

Table default.xd_phy_course stats: [numFiles=1, totalSize=89444]

OK

Time taken: 1.422 seconds

 

hive> create table phy_opt_count (opt_cour string, cour_count int) row format delimited fields terminated by '\t' lines terminated by '\n';

OK

Time taken: 1.625 seconds

 

hive> insert overwrite table phy_opt_count select xd_phy_course.opt_cour, count(distinct xd_phy_course.stID) from xd_phy_course group by xd_phy_course.opt_cour;

Query ID = root_20170507125642_6af22d21-ae88-4daf-a346-4b1cbcd7d9fe

Total jobs = 1

Launching Job 1 out of 1

Tez session was closed. Reopening...

Session re-established.

 

 

Status: Running (Executing on YARN cluster with App id application_1494149668396_0004)

 

--------------------------------------------------------------------------------

       VERTICES      STATUS  TOTAL COMPLETED  RUNNING  PENDING FAILED  KILLED

--------------------------------------------------------------------------------

Map 1 ..........  SUCCEEDED      1          1        0       0       0       0

Reducer 2 ......  SUCCEEDED      1          1        0       0       0       0

--------------------------------------------------------------------------------

VERTICES: 02/02 [==========================>>] 100%  ELAPSED TIME: 4.51 s    

--------------------------------------------------------------------------------

Loading data to table default.phy_opt_count

Table default.phy_opt_count stats: [numFiles=1, numRows=10, totalSize=138, rawDataSize=128]

OK

Time taken: 13.634 seconds

 

hive> select * from phy_opt_count;

OK

badminton      234

basketball     224

football       206

gymnastics     220

opt_cour       0

swimming       234

table tennis   277

taekwondo      222

tennis  223

volleyball     209

Time taken: 0.065 seconds, Fetched: 10 row(s)
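The insert ... select above performs a grouped distinct count. A plain-Python sketch of the same aggregation, over made-up (stID, opt_cour) pairs:

```python
# Equivalent of: select opt_cour, count(distinct stID)
#                from xd_phy_course group by opt_cour;
rows = [
    (10120408, "volleyball"),
    (10120408, "volleyball"),  # duplicate stID: counted once
    (10120409, "volleyball"),
    (10120410, "swimming"),
]

# Collect the distinct student IDs per elective, then count them.
distinct_ids = {}
for stid, cour in rows:
    distinct_ids.setdefault(cour, set()).add(stid)

counts = {cour: len(ids) for cour, ids in distinct_ids.items()}
print(counts)  # {'volleyball': 2, 'swimming': 1}
```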

 

 

31. Use Hive to find all records in phy_course_score_xd.txt for members of class Software_1403 whose PE elective score is above 90; the schema of phy_course_score_xd.txt is shown in the table below, with the elective field opt_cour and the score field score. Submit the commands (all database commands in lowercase) and their output as text in the answer box.

hive> create table phy_course_score_xd (stname string, stID int, class string, opt_cour string, score float) row format delimited fields terminated by '\t' lines terminated by '\n';

OK

Time taken: 0.339 seconds

 

hive> load data local inpath '/root/phy_course_score_xd.txt' into table phy_course_score_xd;

Loading data to table default.phy_course_score_xd

Table default.phy_course_score_xd stats: [numFiles=1, totalSize=1910]

OK

Time taken: 1.061 seconds

 

hive> select * from phy_course_score_xd where class='Software_1403' and score>90;

OK

student433     10120432        Software_1403   football        98.0

student434     10120433        Software_1403   table tennis    97.0

student438     10120437        Software_1403   volleyball      93.0

student439     10120438        Software_1403   badminton       100.0

student444      10120443        Software_1403   swimming        99.0

student445     10120444        Software_1403   table tennis    97.0

student450     10120449        Software_1403   basketball      97.0

Time taken: 0.21 seconds, Fetched: 7 row(s)

 

32. Use Hive to compute the average PE score for each class in phy_course_score_xd.txt, using the round function to keep two decimal places; the schema of phy_course_score_xd.txt is shown in the table below, with the class field class and the score field score. Submit the commands (all database commands in lowercase) and their output as text in the answer box.

hive> select class, round(avg(score), 2) from phy_course_score_xd group by class;

Query ID = root_20170507131823_0bfb1faf-3bfb-42a5-b7eb-3a6a284081ae

Total jobs = 1

Launching Job 1 out of 1

 

 

Status: Running (Executing on YARN cluster with App id application_1494149668396_0005)

 

--------------------------------------------------------------------------------

       VERTICES      STATUS  TOTAL COMPLETED  RUNNING  PENDING FAILED  KILLED

--------------------------------------------------------------------------------

Map 1 ..........  SUCCEEDED      1          1        0       0       0       0

Reducer 2 ......  SUCCEEDED      1          1        0       0       0       0

--------------------------------------------------------------------------------

VERTICES: 02/02 [==========================>>] 100%  ELAPSED TIME: 26.68 s   

--------------------------------------------------------------------------------

OK

Network_1401   73.0

Software_1403  72.0

class   NULL

Time taken: 27.553 seconds, Fetched: 3 row(s)
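round(avg(score), 2) computes a per-group mean and rounds it to two decimal places. The same computation in Python, with made-up per-class scores:

```python
# Equivalent of: select class, round(avg(score), 2)
#                from phy_course_score_xd group by class;
scores = {
    "Network_1401": [70.0, 76.5, 73.0],
    "Software_1403": [65.0, 79.5],
}

averages = {cls: round(sum(s) / len(s), 2) for cls, s in scores.items()}
print(averages)  # {'Network_1401': 73.17, 'Software_1403': 72.25}
```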

 

 

33. Use Hive to find the highest PE score for each class in phy_course_score_xd.txt; the schema of phy_course_score_xd.txt is shown in the table below, with the class field class and the score field score. Submit the commands (all database commands in lowercase) and their output as text in the answer box.

hive> select class, max(score) from phy_course_score_xd group by class;

Query ID = root_20170507131942_86a2bf55-49ac-4c2e-b18b-8f63191ce349

Total jobs = 1

Launching Job 1 out of 1

 

 

Status: Running (Executing on YARN cluster with App id application_1494149668396_0005)

 

--------------------------------------------------------------------------------

       VERTICES      STATUS  TOTAL COMPLETED  RUNNING  PENDING FAILED  KILLED

--------------------------------------------------------------------------------

Map 1 ..........  SUCCEEDED      1          1        0       0       0       0

Reducer 2 ......  SUCCEEDED      1          1        0       0       0       0

--------------------------------------------------------------------------------

VERTICES: 02/02 [==========================>>] 100%  ELAPSED TIME: 5.08 s    

--------------------------------------------------------------------------------

OK

Network_1401    95.0

Software_1403  100.0

class   NULL

Time taken: 144.035 seconds, Fetched: 3 row(s)

 

 

34. In the Hive data warehouse, merge the separate request_date and request_time fields of the web log weblog_entries.txt, joined by an underscore "_", as shown in the figure below; the schema of weblog_entries.txt is shown in the table below. Submit the commands (all database commands in lowercase) and the last ten lines of output as text in the answer box.

hive> create external table weblog_entries (md5 string, url string, request_date string, request_time string, ip string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/data/hive/weblog/';

OK

Time taken: 0.502 seconds

 

hive> load data local inpath '/root/weblog_entries.txt' into table weblog_entries;

Loading data to table default.weblog_entries

Table default.weblog_entries stats: [numFiles=1, totalSize=251130]

OK

Time taken: 1.203 seconds

 

hive> select concat_ws('_', request_date, request_time) from weblog_entries;

2012-05-10_21:29:01

2012-05-10_21:13:47

2012-05-10_21:12:37

2012-05-10_21:34:20

2012-05-10_21:27:00

2012-05-10_21:33:53

2012-05-10_21:10:19

2012-05-10_21:12:05

2012-05-10_21:25:58

2012-05-10_21:34:28

Time taken: 0.265 seconds, Fetched: 3000 row(s) 
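concat_ws joins its arguments with the given separator and, unlike concat, skips NULL arguments. A small Python imitation of that behavior:

```python
def concat_ws(sep, *cols):
    # Hive's concat_ws ignores NULL (None) arguments rather than failing.
    return sep.join(c for c in cols if c is not None)

print(concat_ws("_", "2012-05-10", "21:29:01"))  # 2012-05-10_21:29:01
print(concat_ws("_", "2012-05-10", None))        # 2012-05-10
```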

35. In the Hive data warehouse, perform a simple inner join between the IP field of the web log weblog_entries.txt and the country corresponding to each IP in ip_to_country; the output is shown in the figure below, and the schema of weblog_entries.txt is shown in the table below. Submit the commands (all database commands in lowercase) and the last ten lines of output as text in the answer box.

 

hive> create table ip_to_country (ip string, country string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/data/hive/ip_to_county/';

OK

Time taken: 0.425 seconds

 

hive> load data local inpath '/root/ip_to_country.txt' into table ip_to_country;

Loading data to table default.ip_to_country

Table default.ip_to_country stats: [numFiles=1, totalSize=75728]

OK

Time taken: 2.016 seconds

 

hive> select wle.*, itc.country from weblog_entries wle join ip_to_country itc on wle.ip=itc.ip;

Query ID = root_20170507064740_a52870a0-2405-4fd4-85c2-43f8a229b3c3

Total jobs = 1

Launching Job 1 out of 1

Tez session was closed. Reopening...

Session re-established.

 

 

Status: Running (Executing on YARN cluster with App id application_1494136863427_0002)

 

--------------------------------------------------------------------------------

       VERTICES      STATUS  TOTAL COMPLETED  RUNNING  PENDING FAILED  KILLED

--------------------------------------------------------------------------------

Map 1 ..........  SUCCEEDED      1          1        0       0       0       0

Map 2 ..........  SUCCEEDED      1          1        0       0       0       0

--------------------------------------------------------------------------------

VERTICES: 02/02 [==========================>>] 100%  ELAPSED TIME: 6.30 s    

--------------------------------------------------------------------------------

OK

3e8146764aefe5d87353dd4e0ae9ac5 /qnrxlxqacgiudbtfggcg.html     2012-05-10      21:29:01        164.210.124.152 United States

fdb388d28c8466d4eb7d93677af194  /sbbiuot.html   2012-05-10      21:13:47        168.17.158.38   United States

4a1a345f85fa5fa2659e27f623dff11 /ofxi.html      2012-05-10      21:12:37        174.24.173.11   United States

6a09d25407766a7bb8653d359feca4  /hjmdhaoogwqhp.html     2012-05-10      21:34:20        143.64.173.176  United States

aeecff9b31d1134c8843248bedbca5bd        /angjbmea.html  2012-05-10      21:27:00        160.164.158.125 Italy

f61954aad39de057cd6f51ba3deed241        /mmdttqsnjfifkihcvqu.html       2012-05-10      21:33:53        15.111.128.4    United States

7cdf2c1efd653867278417dd465c1a65        /eorxuryjadhkiwsf.html  2012-05-10      21:10:19        22.71.176.163   United States

22b2549649dcc284ba8bf7d4993ac62 /e.html 2012-05-10      21:12:05        105.57.100.182  Morocco

3ab7888ffe27c2f98d48eb296449d5  /khvc.html      2012-05-10      21:25:58        111.147.83.42   China

65827078a9f7ccce59632263294782db        /c.html 2012-05-10      21:34:28        137.157.65.89   Australia

Time taken: 15.331 seconds, Fetched: 3000 row(s)
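The join above matches weblog rows to countries on the ip column, effectively a hash join: build a lookup on the smaller table, then probe it with each log row. A Python sketch over made-up rows:

```python
# Equivalent of: select wle.*, itc.country from weblog_entries wle
#                join ip_to_country itc on wle.ip=itc.ip;
# Build side: the small lookup table, keyed by ip.
ip_to_country = {
    "164.210.124.152": "United States",
    "160.164.158.125": "Italy",
}

# Probe side: weblog rows as (md5, url, request_date, request_time, ip).
weblog_entries = [
    ("md5a", "/a.html", "2012-05-10", "21:29:01", "164.210.124.152"),
    ("md5b", "/b.html", "2012-05-10", "21:27:00", "10.0.0.1"),  # no match
]

# Inner join: keep only rows whose ip appears in the lookup table.
joined = [row + (ip_to_country[row[4]],)
          for row in weblog_entries if row[4] in ip_to_country]
print(joined[0][-1])  # United States
```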

 

36. Use Hive to create a Hive table dynamically from a query over the web log weblog_entries.txt. Create a new table named weblog_entries_url_length that defines three fields of the web log database, url, request_date, and request_time, plus a new field named url_length holding the length of the url string; the schema of weblog_entries.txt is shown in the table below. When done, query the contents of weblog_entries_url_length. Submit the commands (all database commands in lowercase) and the last ten lines of output as text in the answer box.

 

hive> create table weblog_entries_url_length as select url, request_date, request_time, length(url) as url_length from weblog_entries;

Query ID = root_20170507065123_e3105d8b-84b6-417f-ab58-21ea15723e0a

Total jobs = 1

Launching Job 1 out of 1

 

 

Status: Running (Executing on YARN cluster with App id application_1494136863427_0002)

 

--------------------------------------------------------------------------------

        VERTICES      STATUS TOTAL  COMPLETED  RUNNING PENDING  FAILED  KILLED

--------------------------------------------------------------------------------

Map 1 ..........   SUCCEEDED      1         1        0        0      0       0

--------------------------------------------------------------------------------

VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 4.10 s

--------------------------------------------------------------------------------

Moving data to: hdfs://master:8020/apps/hive/warehouse/weblog_entries_url_length

Table default.weblog_entries_url_length stats: [numFiles=1, numRows=3000, totalSize=121379, rawDataSize=118379]

OK

Time taken: 5.874 seconds

 

hive> select * from weblog_entries_url_length;

/qnrxlxqacgiudbtfggcg.html      2012-05-10      21:29:01        26

/sbbiuot.html   2012-05-10      21:13:47        13

/ofxi.html      2012-05-10      21:12:37        10

/hjmdhaoogwqhp.html     2012-05-10      21:34:20        19

/angjbmea.html  2012-05-10     21:27:00        14

/mmdttqsnjfifkihcvqu.html       2012-05-10      21:33:53        25

/eorxuryjadhkiwsf.html  2012-05-10      21:10:19        22

/e.html 2012-05-10      21:12:05        7

/khvc.html      2012-05-10      21:25:58        10

/c.html 2012-05-10      21:34:28        7

Time taken: 0.08 seconds, Fetched: 3000 row(s)
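The create table ... as select above materializes a derived column; the derivation itself is just the string length of url per row, as this Python sketch (made-up rows) shows:

```python
# Equivalent of: create table weblog_entries_url_length as
#   select url, request_date, request_time, length(url) as url_length ...
rows = [
    ("/qnrxlxqacgiudbtfggcg.html", "2012-05-10", "21:29:01"),
    ("/e.html", "2012-05-10", "21:12:05"),
]

# Append url_length = len(url) to each (url, request_date, request_time) row.
with_length = [(url, d, t, len(url)) for url, d, t in rows]
print(with_length[0])  # ('/qnrxlxqacgiudbtfggcg.html', '2012-05-10', '21:29:01', 26)
```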
