Hive实战之Youtube数据集

1 数据来源

本次实战的数据来自于"YouTube视频统计与社交网络"的数据集,是西蒙弗雷泽大学计算机学院在2008年所爬取的数据
数据集地址

1. 1 Youtube视频表格式如下:

列名 注释
视频ID 一个11位字符串,是唯一的
上传 一个字符串的视频上传者的用户名
年龄 视频上传日期和2007年2月15日之间的整数天(YouTube的设立)
类别 由上传者选择的视频类别的字符串
长度 视频长度的整数v
观看数 一整数的视图
一个浮点数的视频速率
评分 整数的评分
评论数 一整数的评论
相关视频ID 最多20个字符串的相关视频ID

数据之间采用"\t"作为分隔符

具体数据如下:

video ID uploader age category length views rate ratings comments related IDs
ifnlnji-Y4s Hooran 1162 Travel & Events 239 189 4.8 10 3 tpAL3I0urI4 ... ifnlnji-Y4s

数据量大小为1G,条数为500万+

1.2 用户表

列名 uploader videos friends
类型 string int int
解释 上传者 上传视频数 朋友数

2 实战演练准备

2.1 环境搭建

使用环境为
hive-1.1.0-cdh5.4.5
hadoop-2.6.0-cdh5.4.5

演示形式为使用hive shell

2.2 数据清洗

我们一起来看看数据

video ID uploader age category length views rate ratings comments related IDs
ifnlnji-Y4s Hooran 1162 Travel & Events 239 189 4.8 10 3 tpAL3I0urI4 ... ifnlnji-Y4s

主要的问题在于category和relatedIDs处理,由于Hive是支持array格式的,所以我们想到的是使用array来存储category和relatedIDs,但是我们发现category的分割符是"&"而realatedIDs的分隔符是"\t",我们在创建表格的时候能够指定array的分隔符,但是只能指定一个,所以再将数据导入到Hive表格之前我们需要对数据进行一定转换和清洗

并且数据中肯定会存在一些不完整数据和一些奇怪的格式,所以数据的清洗是必要的,我在这里所使用的数据清洗方式是使用Spark进行清洗,也可以使用自定义UDF函数来进行清洗

数据清洗注意点
1)我们可以看到每行数据以"\t"作为分隔符,每行有十列数据,最后一列关联ID可以为空,那么我们对数据进行split之后数组的大小要大于8
2)数据中存在"uNiKXDA8eyQ KRQE 1035 News & Politics 107"这样格式的数据,所以在处理category时需要注意 News & Politics中间的&

处理后的数据如下:

video ID uploader age category length views rate ratings comments related IDs
PkGUU_ggO3k theresident 704 Entertainment 262 11235 3.85 247 280 PkGUU_ggO3k&EYC5bWF0ss8&...shU2hfHKmU0&p0lq5-8IDqY
RX24KLBhwMI lemonette 697 People&Blogs 512 24149 4.22 315 474 t60tW0WevkE&WZgoejVDZlo&...s8xf4QX1UvA&2cKd9ERh5-8

下面的实战都是基于数据清洗后的数据进行的

2.3 创建表格和数据导入

2.3.1 youtube表格的创建和导入

1)youtube1的创建,文件格式为textfile
create table youtube1(videoId string, uploader string, age int, category array, length int, views int, rate float, ratings int, comments int,relatedId array)
row format delimited
fields terminated by "\t"
collection items terminated by "&"
stored as textfile;

2)youtube2的创建,文件格式为orc
create table youtube2(videoId string, uploader string, age int, category array, length int, views int, rate float, ratings int, comments int,relatedId array)
row format delimited
fields terminated by "\t"
collection items terminated by "&"
stored as orc;

3)youtube3的创建,文件格式为orc,进行桶分区
create table youtube3(videoId string, uploader string, age int, category array, length int, views int, rate float, ratings int, comments int,relatedId array)
clustered by (uploader) into 8 buckets
row format delimited
fields terminated by "\t"
collection items terminated by "&"
stored as orc;

数据导入:
1)load data inpath "path" into table youtube1;
2)由于无法将textfile格式的数据导入到orc格式的表格,所以数据需要从youtube1导入到youtube2和youtube3:
insert into table youtube2 select * from youtube1;
insert into table youtube3 select * from youtube1;

2.3.2 user表格的创建和导入

1)user_tmp的创建,文件格式textfile,24buckets
create table user_tmp(uploader string,videos int,friends int)
clustered by (uploader) into 24 buckets
row format delimited
fields terminated by "\t"
stored as textfile;

2)user的创建,文件格式orc,24buckets
create table user(uploader string,videos int,friends int)
clustered by (uploader) into 24 buckets
row format delimited
fields terminated by "\t"
stored as orc;

user表的数据导入也是同理
数据导入:
1)load data inpath "path" into table user_tmp;
2)由于无法将textfile格式的数据导入到orc格式的表格,所以数据需要从user_tmp导入到user:
insert into table user select * from user_tmp;

3 实战需求

1)统计出观看数最多的10个视频
2)统计出视频类别热度的前10个类型
3)统计出视频观看数最高的50个视频的所属类别
4)统计出观看数最多的前N个视频所关联的视频的所属类别排行
5)筛选出每个类别中热度最高的前10个视频
6)筛选出每个类别中评分最高的前10个视频
7)找出用户中上传视频最多的10个用户的所有视频
8)筛选出每个类别中观看数Top10

4 实战演练

4.1 统计出观看数最多的10个视频

select * from youtube3 order by views desc limit 10;

结果如下:

hive> select * from youtube3 order by views desc limit 10;
Query ID = hadoop_20170710155353_4bc057f0-bbd5-4bfe-a66c-ec0e17cb3ca9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
Starting Job = job_1499153664137_0101, Tracking URL = http://master:8088/proxy/application_1499153664137_0101/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0101
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
2017-07-10 15:53:55,297 Stage-1 map = 0%,  reduce = 0%
...
2017-07-10 15:56:06,210 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 106.93 sec
MapReduce Total cumulative CPU time: 1 minutes 46 seconds 930 msec
Ended Job = job_1499153664137_0101
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 4  Reduce: 1   Cumulative CPU: 106.93 sec   HDFS Read: 574632526 HDFS Write: 2916 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 46 seconds 930 msec
OK
dMH0bHeiRNg judsonlaipply   415 ["Comedy"]  360 79897120    4.65    287260  131356  ["HSoVKUVOnfQ","pv5zWaTEVkI","v1U8f7TmYkA","ZCT3MIUTc3o","mDiBOF8XI44","P9LmHXXWiJs","fo_QVq2lGMs","EtGQgSY9Nn4","5P6UU6m3cqk","61heClTFc5w","h3gdSHGcUU4","OPmYbP0F4Zw","gsOaQGF7kiQ","H2gw9VE16mo","rGkIUlYEQT8","innfyQZHPpo","cu8tUy14zQo","cQ25-glGRzI","WqDbn2iLwwY","jvz0bvYmnto"]
cQ25-glGRzI RCARecords  742 ["Music"]   227 77674728    4.45    188154  186858  ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]
12Z3J1uzd0Q kaejane 404 ["Film","Animation"]    615 65341925    3.03    9189    5508    ["innfyQZHPpo","-_CSo1gOd48","1Al4crLW9EM","nhSZs-aAZbo","Af9aPlG2WFw","5P6UU6m3cqk","uBHsOTYO6AI","lW19DnWz6vg","vr3x_RRJdd4","5GE82tqcYYQ","D2kJZOfq7zk","lsO6D1rwrKc","dMH0bHeiRNg","kNLXjXxj3J8","vQbYvjmfbr4","G-HonZGBWus","MuOvqeABHvQ","-2caf6KlSyw","cQ25-glGRzI","pv5zWaTEVkI"]
LpAI8TzQDes IMVUinc 848 ["Entertainment"]   68  65078772    1.38    5678    2499    ["c2eP-UADAi8","5P6UU6m3cqk","cQ25-glGRzI","FYbqcgd97NQ","12Z3J1uzd0Q","VS4wFOWFUt8","RB-wUgnyGv0","sBQLq2VmZcA","w2xUzv6iZWo","oOF3T9Zvizc","244qR7SvvX0","pv5zWaTEVkI","dMH0bHeiRNg","b3u65f4CRLk"]
7AVHXe-ol-s internmarket    603 ["Music"]   264 60349673    3.16    1033    594 ["trrRo3kGRv0","CCsGkN1PEec","l5mQKJDO-nY","DP_4rQcU0G8","7xz5aOvP2YA","kCjKIus3iBA","nD0MILEXZP0","IQcyLMa716k","d05KUL8aW4E","szeeHnu2DM8","7b_wObF6vgs","UazqVaOg9uc","LRBD9l3Nv6Y","108YigkYFgM","_qCau44yfX0","NkkGj4_1m9A","kuyRSrklGU8","uLr7IJYyNnM","IEuyk_277xY","XmAaYo-CQNw"]
244qR7SvvX0 donotasyoudo    960 ["Entertainment"]   6   57790943    1.45    66412   14913   ["v3ARyAb_1Bs","nhSZs-aAZbo","2pNTrYd-4FQ","dMH0bHeiRNg","kHmvkRoEowc","yh0xYO3dYx8","5GE82tqcYYQ","ktUSIJEiOug","6PrDw6T3-rM","-xEzGIuY7kw","innfyQZHPpo","pv5zWaTEVkI","E7QOUJfRs3Y","9oRQJK61MWs"]
ePyRrb2-fzs 1988basti   826 ["Music"]   204 45984219    4.87    51039   27065   ["fm0T7_SGee4","h8oBykb_Pqs","iWg3IMN_rhU","MdOAr_4FJvc","xNp7OxgFJJM","nzaw_Gk9BAU","u9rl4apDFfY","fMzMm4sJYGo","b5eEa5a2Rio","aNR0tW_afdk","Kj_MceyjS6g","GojTUmjxVHU","CGPUuPHdHQg","SIM4DCn7AlE","ktUSIJEiOug","Dn6kGzRSjhU","gxxU68z8quM","HRSrFm_8YK8","2__Qdd11rfA","XewBty95JVg"]
xsRWpK4pf90 universalmusicgroup 902 ["Music"]   240 44614530    4.73    99373   53441   ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]
ktUSIJEiOug aliciakeys  953 ["Music"]   251 43583367    4.85    103074  59672   ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]
b3u65f4CRLk srcrecords  729 ["Music"]   256 43511791    4.76    55148   27818   ["4fZF9UsS8LY","5f7kfhHHH_A","H1vaszd6NnA","6Di-dAIR9RA","HiBcQeIax4g","JVciSFc5Wrw","Md6rURKhZmA","IG5ReXP0SSg","D9g2szHsoz0","EflOarvoeVs","7PxBGHjABnU","Gjq1g3j7WFc","mDvCcrU-Ob8","V9YE8dHMxvw","iWg3IMN_rhU","Un2dbprtZFE","WpgMuHwAdC4","RtQ5JENdx5I","ZEYgAgKVuO4","D_2jb8D8AOI"]
Time taken: 172.566 seconds, Fetched: 10 row(s)

4.2 统计出视频类别热度的前10个类型

select tagId, count(a.videoid) as sum from (select videoid,tagId from youtube3 lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc limit 10;

结果:

hive> select tagId, count(a.videoid) as sum from (select videoid,tagId from youtube3 lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc limit 10;
Query ID = hadoop_20170710155757_614ec9dd-ffa0-465d-8d62-c47b5ad585f0
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Defaulting to jobconf value of: 2
Starting Job = job_1499153664137_0102, Tracking URL = http://master:8088/proxy/application_1499153664137_0102/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0102
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 2
2017-07-10 15:58:23,797 Stage-1 map = 0%,  reduce = 0%
...
2017-07-10 15:59:16,341 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 19.51 sec
MapReduce Total cumulative CPU time: 19 seconds 510 msec
Ended Job = job_1499153664137_0102
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1499153664137_0103, Tracking URL = http://master:8088/proxy/application_1499153664137_0103/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0103
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2017-07-10 15:59:51,645 Stage-2 map = 0%,  reduce = 0%
2017-07-10 16:00:18,373 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 0.97 sec
2017-07-10 16:00:48,410 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 2.57 sec
MapReduce Total cumulative CPU time: 2 seconds 570 msec
Ended Job = job_1499153664137_0103
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 4  Reduce: 2   Cumulative CPU: 19.51 sec   HDFS Read: 47327614 HDFS Write: 900 SUCCESS
Stage-Stage-2: Map: 1  Reduce: 1   Cumulative CPU: 2.57 sec   HDFS Read: 5596 HDFS Write: 148 SUCCESS
Total MapReduce CPU Time Spent: 22 seconds 80 msec
OK
Entertainment   1304724
Music   1274825
Comedy  449652
Blogs   447581
People  447581
Film    442109
Animation   442109
Sports  390619
Politics    186753
News    186753
Time taken: 185.399 seconds, Fetched: 10 row(s)

4.3 统计出视频观看数最高的50个视频的所属类别

select tagId, count(a.videoid) as sum from (select videoid,tagId from (select * from youtube3 order by views desc limit 20) e lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc;

结果:

hive> select tagId, count(a.videoid) as sum from (select videoid,tagId from (select * from youtube3 order by views desc limit 20) e lateral view explode(category) catetory as tagId) a group by a.tagId order by sum desc;
Query ID = hadoop_20170710160909_c6cbbe29-4df3-4c0b-ad70-bd34857acc80
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
Starting Job = job_1499153664137_0104, Tracking URL = http://master:8088/proxy/application_1499153664137_0104/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0104
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
2017-07-10 16:09:48,197 Stage-1 map = 0%,  reduce = 0%
2017-07-10 16:10:17,734 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 3.09 sec
...
2017-07-10 16:10:59,048 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 47.74 sec
MapReduce Total cumulative CPU time: 47 seconds 740 msec
Ended Job = job_1499153664137_0104
Launching Job 2 out of 3
Number of reduce tasks not specified. Defaulting to jobconf value of: 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1499153664137_0105, Tracking URL = http://master:8088/proxy/application_1499153664137_0105/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0105
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 2
2017-07-10 16:11:35,860 Stage-2 map = 0%,  reduce = 0%
2017-07-10 16:12:04,633 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 1.07 sec
2017-07-10 16:12:23,187 Stage-2 map = 100%,  reduce = 50%, Cumulative CPU 2.61 sec
2017-07-10 16:12:32,480 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 3.93 sec
MapReduce Total cumulative CPU time: 3 seconds 930 msec
Ended Job = job_1499153664137_0105
Launching Job 3 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1499153664137_0106, Tracking URL = http://master:8088/proxy/application_1499153664137_0106/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0106
Hadoop job information for Stage-3: number of mappers: 2; number of reducers: 1
2017-07-10 16:13:09,244 Stage-3 map = 0%,  reduce = 0%
2017-07-10 16:13:37,263 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.18 sec
2017-07-10 16:14:05,048 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 3.5 sec
MapReduce Total cumulative CPU time: 3 seconds 500 msec
Ended Job = job_1499153664137_0106
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 4  Reduce: 1   Cumulative CPU: 47.74 sec   HDFS Read: 57839263 HDFS Write: 228 SUCCESS
Stage-Stage-2: Map: 1  Reduce: 2   Cumulative CPU: 3.93 sec   HDFS Read: 6065 HDFS Write: 324 SUCCESS
Stage-Stage-3: Map: 2  Reduce: 1   Cumulative CPU: 3.5 sec   HDFS Read: 6600 HDFS Write: 53 SUCCESS
Total MapReduce CPU Time Spent: 55 seconds 170 msec
OK
Music   12
Comedy  3
Entertainment   3
Film    2
Animation   2
Time taken: 297.466 seconds, Fetched: 5 row(s)

4.4 统计出观看数最多的前50个视频所关联的视频的所属类别排行

思路:

  1. 首先筛选出前50个视频所关联的视频,
    select * from youtube3 order by views desc limit 50
  2. 再将结果和youtube3进行join
    select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId
  3. 然后去重
    select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId
  4. 最后得出所有视频的类别排行
    select tagId, count(e.videoid) as sum from (select videoid,tagId from (select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId) d lateral view explode(category) catetory as tagId) e group by tagId order by sum desc;

结果:

hive> select tagId, count(e.videoid) as sum from (select videoid,tagId from (select distinct(b.videoId),c.category from (select explode(relatedId) as videoId from (select * from youtube3 order by views desc limit 50) a) b join youtube3 c on b.videoId = c.videoId) d lateral view explode(category) catetory as tagId) e group by tagId order by sum desc;
Query ID = hadoop_20170710170808_cfbfde68-c016-4c5d-860b-fe840a2d50cb
Total jobs = 7
Launching Job 1 out of 7
Number of reduce tasks determined at compile time: 1

Starting Job = job_1499153664137_0111, Tracking URL = http://master:8088/proxy/application_1499153664137_0111/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0111
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
2017-07-10 17:08:48,662 Stage-1 map = 0%,  reduce = 0%
2017-07-10 17:09:25,175 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 20.83 sec
...
2017-07-10 17:10:15,276 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 81.01 sec
MapReduce Total cumulative CPU time: 1 minutes 21 seconds 10 msec
Ended Job = job_1499153664137_0111
Stage-10 is filtered out by condition resolver.
Stage-11 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
17/07/10 17:10:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Execution log at: /tmp/hadoop/hadoop_20170710170808_cfbfde68-c016-4c5d-860b-fe840a2d50cb.log
2017-07-10 05:10:23 Starting to launch local task to process map join;  maximum memory = 518979584
2017-07-10 05:10:34 Dump the side-table for tag: 0 with group count: 743 into file: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-10_17-08-08_628_1950388697756865753-1/-local-10009/HashTable-Stage-8/MapJoin-mapfile30--.hashtable
2017-07-10 05:10:34 Uploaded 1 File to: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-10_17-08-08_628_1950388697756865753-1/-local-10009/HashTable-Stage-8/MapJoin-mapfile30--.hashtable (22611 bytes)
2017-07-10 05:10:34 End of local task; Time Taken: 11.392 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 3 out of 7
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1499153664137_0112, Tracking URL = http://master:8088/proxy/application_1499153664137_0112/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0112
Hadoop job information for Stage-8: number of mappers: 4; number of reducers: 0
2017-07-10 17:11:13,493 Stage-8 map = 0%,  reduce = 0%
2017-07-10 17:11:57,229 Stage-8 map = 100%,  reduce = 0%, Cumulative CPU 15.79 sec
MapReduce Total cumulative CPU time: 15 seconds 790 msec
Ended Job = job_1499153664137_0112
Launching Job 4 out of 7
Number of reduce tasks not specified. Defaulting to jobconf value of: 2
Starting Job = job_1499153664137_0113, Tracking URL = http://master:8088/proxy/application_1499153664137_0113/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0113
Hadoop job information for Stage-3: number of mappers: 3; number of reducers: 2
2017-07-10 17:12:31,982 Stage-3 map = 0%,  reduce = 0%
2017-07-10 17:13:17,467 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 6.53 sec
MapReduce Total cumulative CPU time: 6 seconds 530 msec
Ended Job = job_1499153664137_0113
Launching Job 5 out of 7
Number of reduce tasks not specified. Defaulting to jobconf value of: 2
Starting Job = job_1499153664137_0114, Tracking URL = http://master:8088/proxy/application_1499153664137_0114/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0114
Hadoop job information for Stage-4: number of mappers: 2; number of reducers: 2
2017-07-10 17:13:55,123 Stage-4 map = 0%,  reduce = 0%
2017-07-10 17:14:22,876 Stage-4 map = 50%,  reduce = 0%, Cumulative CPU 0.91 sec
2017-07-10 17:14:23,905 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 1.99 sec
2017-07-10 17:14:40,601 Stage-4 map = 100%,  reduce = 50%, Cumulative CPU 3.35 sec
2017-07-10 17:14:52,033 Stage-4 map = 100%,  reduce = 100%, Cumulative CPU 4.96 sec
MapReduce Total cumulative CPU time: 4 seconds 960 msec
Ended Job = job_1499153664137_0114
Launching Job 6 out of 7
Number of reduce tasks determined at compile time: 1
Starting Job = job_1499153664137_0115, Tracking URL = http://master:8088/proxy/application_1499153664137_0115/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0115
Hadoop job information for Stage-5: number of mappers: 2; number of reducers: 1
2017-07-10 17:15:27,647 Stage-5 map = 0%,  reduce = 0%
2017-07-10 17:15:54,560 Stage-5 map = 50%,  reduce = 0%, Cumulative CPU 0.91 sec
2017-07-10 17:15:55,587 Stage-5 map = 100%,  reduce = 0%, Cumulative CPU 2.03 sec
2017-07-10 17:16:23,357 Stage-5 map = 100%,  reduce = 100%, Cumulative CPU 3.59 sec
MapReduce Total cumulative CPU time: 3 seconds 590 msec
Ended Job = job_1499153664137_0115
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 4  Reduce: 1   Cumulative CPU: 81.01 sec   HDFS Read: 463074822 HDFS Write: 26949 SUCCESS
Stage-Stage-8: Map: 4   Cumulative CPU: 15.79 sec   HDFS Read: 47322670 HDFS Write: 30829 SUCCESS
Stage-Stage-3: Map: 3  Reduce: 2   Cumulative CPU: 6.53 sec   HDFS Read: 43694 HDFS Write: 1126 SUCCESS
Stage-Stage-4: Map: 2  Reduce: 2   Cumulative CPU: 4.96 sec   HDFS Read: 8888 HDFS Write: 725 SUCCESS
Stage-Stage-5: Map: 2  Reduce: 1   Cumulative CPU: 3.59 sec   HDFS Read: 7001 HDFS Write: 207 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 51 seconds 880 msec
OK
Music   399
Entertainment   103
Comedy  94
People  36
Blogs   36
Film    19
Animation   19
Pets    12
Animals 12
UNA 11
Style   6
Politics    6
News    6
Howto   6
Sports  4
Vehicles    3
Autos   3
Travel  2
Events  2
Technology  1
Science 1
Time taken: 496.822 seconds, Fetched: 21 row(s)

4.5 筛选出某个类别(如music)中热度最高的前10个视频

思路:

  • 创建一个表格,存储每个视频对应一个标签的信息
    create table youtube_category(videoId string, uploader string, age int, categoryId string, length int, views int, rate float, ratings int, comments int,relatedId array)
    row format delimited
    fields terminated by "\t"
    collection items terminated by "&"
    stored as orc;
  • 将转换后的数据进行插入
    insert into table youtube_category select videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId from youtube3 lateral view explode(category) catetory as categoryId;
  • 根据观看数和类别进行查询
    select * from youtube_category where categoryId="Music" order by views desc limit 10;

结果如下:

hive> create table youtube_category(videoId string, uploader string, age int, categoryId string, length int, views int, rate float, ratings int, comments int,relatedId array)
    > row format delimited
    > fields terminated by "\t"
    > collection items terminated by "&"
    > stored as orc;
OK
Time taken: 0.256 seconds
hive> insert into youtube_category select videoid,uploader,age,categoryId,length,views,rate,ratings,comments,relatedId from youtube3 lateral view explode(category) catetory as categoryId;
Query ID = hadoop_20170711091010_7c76d7bd-5af0-43f9-812f-e30c612ee60b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1499153664137_0116, Tracking URL = http://master:8088/proxy/application_1499153664137_0116/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0116
Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 0
2017-07-11 09:11:18,741 Stage-1 map = 0%,  reduce = 0%
...
2017-07-11 09:19:05,239 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 402.71 sec
MapReduce Total cumulative CPU time: 6 minutes 42 seconds 710 msec
Ended Job = job_1499153664137_0116
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://bydcluster1/user/hive/warehouse/youtube_category/.hive-staging_hive_2017-07-11_09-10-38_970_8191962088714912782-1/-ext-10000
Loading data to table default.youtube_category
Table default.youtube_category stats: [numFiles=4, numRows=6742209, totalSize=608748677, rawDataSize=10187373874]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 4   Cumulative CPU: 404.25 sec   HDFS Read: 574634998 HDFS Write: 608749047 SUCCESS
Total MapReduce CPU Time Spent: 6 minutes 44 seconds 250 msec
OK
Time taken: 511.914 seconds
hive> select * from youtube_category where categoryId="music" order by views desc limit 10;
Query ID = hadoop_20170711091919_a48aa98b-9cfa-4f58-a106-9da85484f0dd
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1499153664137_0117, Tracking URL = http://master:8088/proxy/application_1499153664137_0117/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0117
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2017-07-11 09:20:37,607 Stage-1 map = 0%,  reduce = 0%
...
2017-07-11 09:21:32,382 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 36.15 sec
MapReduce Total cumulative CPU time: 36 seconds 150 msec
Ended Job = job_1499153664137_0117
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 36.15 sec   HDFS Read: 607706126 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 36 seconds 150 msec
OK
Time taken: 96.836 seconds
hive> select * from youtube_category where catagoryId="Music" order by views desc limit 10;
FAILED: SemanticException [Error 10004]: Line 1:37 Invalid table alias or column reference 'catagoryId': (possible column names are: videoid, uploader, age, categoryid, length, views, rate, ratings, comments, relatedid)
hive> select * from youtube_category where catagoryid="Music" order by views desc limit 10;
FAILED: SemanticException [Error 10004]: Line 1:37 Invalid table alias or column reference 'catagoryid': (possible column names are: videoid, uploader, age, categoryid, length, views, rate, ratings, comments, relatedid)
hive> select * from youtube_category where categoryid="Music" order by views desc limit 10;
Query ID = hadoop_20170711092525_53f32ccf-7d04-4615-b446-9009cf16dc7f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1499153664137_0118, Tracking URL = http://master:8088/proxy/application_1499153664137_0118/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0118
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2017-07-11 09:26:08,727 Stage-1 map = 0%,  reduce = 0%
...
2017-07-11 09:27:17,297 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 57.89 sec
MapReduce Total cumulative CPU time: 57 seconds 890 msec
Ended Job = job_1499153664137_0118
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 58.14 sec   HDFS Read: 607706243 HDFS Write: 3043 SUCCESS
Total MapReduce CPU Time Spent: 58 seconds 140 msec
OK
cQ25-glGRzI RCARecords  742 Music   227 77674728    4.45    188154  186858  ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]
7AVHXe-ol-s internmarket    603 Music   264 60349673    3.16    1033    594 ["trrRo3kGRv0","CCsGkN1PEec","l5mQKJDO-nY","DP_4rQcU0G8","7xz5aOvP2YA","kCjKIus3iBA","nD0MILEXZP0","IQcyLMa716k","d05KUL8aW4E","szeeHnu2DM8","7b_wObF6vgs","UazqVaOg9uc","LRBD9l3Nv6Y","108YigkYFgM","_qCau44yfX0","NkkGj4_1m9A","kuyRSrklGU8","uLr7IJYyNnM","IEuyk_277xY","XmAaYo-CQNw"]
ePyRrb2-fzs 1988basti   826 Music   204 45984219    4.87    51039   27065   ["fm0T7_SGee4","h8oBykb_Pqs","iWg3IMN_rhU","MdOAr_4FJvc","xNp7OxgFJJM","nzaw_Gk9BAU","u9rl4apDFfY","fMzMm4sJYGo","b5eEa5a2Rio","aNR0tW_afdk","Kj_MceyjS6g","GojTUmjxVHU","CGPUuPHdHQg","SIM4DCn7AlE","ktUSIJEiOug","Dn6kGzRSjhU","gxxU68z8quM","HRSrFm_8YK8","2__Qdd11rfA","XewBty95JVg"]
xsRWpK4pf90 universalmusicgroup 902 Music   240 44614530    4.73    99373   53441   ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]
ktUSIJEiOug aliciakeys  953 Music   251 43583367    4.85    103074  59672   ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]
b3u65f4CRLk srcrecords  729 Music   256 43511791    4.76    55148   27818   ["4fZF9UsS8LY","5f7kfhHHH_A","H1vaszd6NnA","6Di-dAIR9RA","HiBcQeIax4g","JVciSFc5Wrw","Md6rURKhZmA","IG5ReXP0SSg","D9g2szHsoz0","EflOarvoeVs","7PxBGHjABnU","Gjq1g3j7WFc","mDvCcrU-Ob8","V9YE8dHMxvw","iWg3IMN_rhU","Un2dbprtZFE","WpgMuHwAdC4","RtQ5JENdx5I","ZEYgAgKVuO4","D_2jb8D8AOI"]
iWg3IMN_rhU TimbalandMusic  853 Music   216 43323757    4.8 58761   36142   ["SIM4DCn7AlE","GojTUmjxVHU","ePyRrb2-fzs","2__Qdd11rfA","P_TKpULdDvo","Dn6kGzRSjhU","1iLDIj0pDHk","xsRWpK4pf90","XewBty95JVg","y4jiyjDQGLY","h8oBykb_Pqs","xNp7OxgFJJM","b3u65f4CRLk","S-783rzHIS4","cZd1Js0QaOI","kZGEgVxyPHU","43o0vwAmFM8","BMMUWvavORI","v_-1peCW6Ok","9gkjyM7ZOUc"]
innfyQZHPpo chai0322    468 Music   210 41564032    2.61    13564   6126    ["nhSZs-aAZbo","hh0nVc0NYTE","12Z3J1uzd0Q","V1s9queYhF8","1Al4crLW9EM","61e1h4vALS0","8h98jb9Lk74","Z-HjmP7BCVc","o_pIQIV_NuU","82BDV1gYjdM","Af9aPlG2WFw","cC27PTFP4V8","-_CSo1gOd48","00c08ijIlHo","cIKxlWuTviE","dMH0bHeiRNg","iWg3IMN_rhU","v3ARyAb_1Bs","5P6UU6m3cqk","QjA5faZF1A8"]
Lt6o8NlrbHg seankingston    867 Music   257 41171303    4.74    101352  67579   ["qwflMOAaOf0","aVB5ViG_0yE","_QmajsQI9SM","Skz6h2gb-t8","t2dkCGDgVzk","HJgsbsMSe5w","BXhN2DbI0hc","sFJQAfW_jgw","RZJ32D-lrTE","U1bf7XLGGRE","ktUSIJEiOug","TWEez0U-9ag","BhkIjh-nuRo","a4X7eFbP3u4","b3u65f4CRLk","sm2fTDpuyyM","9pzro0T8rgY","TBzfqR4mrgE","c2cyose42jU","iWg3IMN_rhU"]
QjA5faZF1A8 guitar90    308 Music   320 40294882    4.83    329108  169563  ["r2BOApUvFpw","ATub40Npxik","m7Jh1BV1EOc","owAj5LiXG5w","Ddn4MGaS3N4","GxplDa3M5Io","by8oyJztzwo","aZpD0btOZx8","pZ9jrBg4Lwc","heISA256CRo","6wpPk8qk3uQ","i4BYMvVvMg0","2xjJXT0C0X4","p-VR-cXghko","-9ao_vOsZkg","wfqu4YEufYc","WetVXbYRfWk","JdxkVQy7QLM","fdjamy75C4A","dVUgd8ot6BE"]
Time taken: 111.631 seconds, Fetched: 10 row(s)

4.6 筛选出每个类别中评分最高的前10个视频

select * from youtube_category where categoryId="Music" order by ratings desc limit 10;

结果如下:

hive> select * from youtube_category where categoryId="Music" order by ratings desc limit 10;
Query ID = hadoop_20170711093838_6ab5573a-964f-486e-b0dc-77e5853fcfb5
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1499153664137_0119, Tracking URL = http://master:8088/proxy/application_1499153664137_0119/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0119
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1
2017-07-11 09:39:27,903 Stage-1 map = 0%,  reduce = 0%
...
2017-07-11 09:40:36,298 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 57.41 sec
MapReduce Total cumulative CPU time: 57 seconds 410 msec
Ended Job = job_1499153664137_0119
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3  Reduce: 1   Cumulative CPU: 57.41 sec   HDFS Read: 607706243 HDFS Write: 2957 SUCCESS
Total MapReduce CPU Time Spent: 57 seconds 410 msec
OK
QjA5faZF1A8 guitar90    308 Music   320 40294882    4.83    329108  169563  ["r2BOApUvFpw","ATub40Npxik","m7Jh1BV1EOc","owAj5LiXG5w","Ddn4MGaS3N4","GxplDa3M5Io","by8oyJztzwo","aZpD0btOZx8","pZ9jrBg4Lwc","heISA256CRo","6wpPk8qk3uQ","i4BYMvVvMg0","2xjJXT0C0X4","p-VR-cXghko","-9ao_vOsZkg","wfqu4YEufYc","WetVXbYRfWk","JdxkVQy7QLM","fdjamy75C4A","dVUgd8ot6BE"]
cQ25-glGRzI RCARecords  742 Music   227 77674728    4.45    188154  186858  ["otMB3WVQNVg","CAoo71VEYhs","6gqSk1UBFyo","pULa7F1gWZM","C2eUCfREAbw","neZw2CH1h9s","2fjW33W7424","aVnl9QCfNyE","DKhnmUdmz74","tPW70sgrHiY","W4kR8OQCrlQ","-WtcXpnMIT8","YhRZx2QIJKg","5iaYFN5eERA","KChJyERIclc","xsRWpK4pf90","lOq-Q60zWQs","ywNZPURFJNU","dMH0bHeiRNg","tvkh29RKFRY"]
pv5zWaTEVkI OkGo    531 Music   184 32022043    4.83    121516  39974   ["dMH0bHeiRNg","x49WZRyXGe0","RbdbVhBGETQ","yRmqZRPgK1w","DjCL0_0Il7w","gq7r3F1SoX0","k66epna2Sss","1LNbzqoOPu4","vr3x_RRJdd4","ERV-wh4VwZI","xsRWpK4pf90","iAQZ_uui1SY","5P6UU6m3cqk","bav63MWNUKg"]
ktUSIJEiOug aliciakeys  953 Music   251 43583367    4.85    103074  59672   ["_VD-PDzmW4M","vmjl5ARtQWM","X_SmJpAmirw","glBHQSasVMQ","IGj3T7C-RdE","6U_4ANczMPY","oh3Pkw6cokk","j97WOHviXbE","FqsGSvrcfDE","2PWfB4lurT4","Yci8zVNMEdg","sF84pIhP5UM","dDfEjBSWj54","Zdm7FJktDOM","a4X7eFbP3u4","jhPAK8HjcPI","ePyRrb2-fzs","8_8KJfSNgC4","kN9vm95SocU","4DC4Rb9quKk"]
Lt6o8NlrbHg seankingston    867 Music   257 41171303    4.74    101352  67579   ["qwflMOAaOf0","aVB5ViG_0yE","_QmajsQI9SM","Skz6h2gb-t8","t2dkCGDgVzk","HJgsbsMSe5w","BXhN2DbI0hc","sFJQAfW_jgw","RZJ32D-lrTE","U1bf7XLGGRE","ktUSIJEiOug","TWEez0U-9ag","BhkIjh-nuRo","a4X7eFbP3u4","b3u65f4CRLk","sm2fTDpuyyM","9pzro0T8rgY","TBzfqR4mrgE","c2cyose42jU","iWg3IMN_rhU"]
xsRWpK4pf90 universalmusicgroup 902 Music   240 44614530    4.73    99373   53441   ["a4X7eFbP3u4","4WAapKx2TvM","PgZgubuT2DM","FIUv3dOBbCk","Dk8NQB_T1ws","7NAbTHGIPP8","mekQxrnhQ3I","OouU4PFUw3Y","rwgXvL9o0QQ","4eeyhtlJp5A","ktUSIJEiOug","UZStqFeYk3Y","sF84pIhP5UM","FKXm3Qg7sBo","a1ti0KccvOg","ERV-wh4VwZI","iWg3IMN_rhU","i1PKmEaUqes","gsAN_Zfc1nc","CmBCLWkrysY"]
K2cYWfq--Nw FrEckleStudios  841 Music   224 17005186    4.82    90991   50788   ["SyIC3Munnyw","cZd1Js0QaOI","lLYD_-A_X5E","alqM0IYeH54","wQVEPFzkhaM","oGECJP3phyY","nPLOiBM8hLk","VpBDqtUEWcM","MJPdVVOmbz4","bPZJYQXQsm8","nPBmXEO3yUU","3jzSh_MLNcY","_EXeBLvmllg","Cva_sGN_0VA","Sr2JneittqQ","SYpYmkcadRA","71eBjNbdXDg","DgBgnoEY4iM","bl6RJyZdBSU","FAK_jtOf70g"]
-xEzGIuY7kw alyankovic  580 Music   171 24095019    4.84    90091   45517   ["HYokLWfqbaU","N26KWq7MmSc","JCAt9WcCFbM","Rt1_6uz_sVU","Nh9mVsBKwYs","p9Zt8mn14hY","8n7ncJEFuSw","FT060JGp9sQ","E6Zc9NyYH-k","ixyTNd-Ln38","zIllRdSzSug","GsfVw9xxoNY","XkDeJgGrtdU","fqz1ojIQTBk","5GE82tqcYYQ","ODdGhOOUOpI","v3ARyAb_1Bs","XbVtbc_XzrI","U1ULxKM75rY","Jw00EUh0GT4"]
EwTZ2xpQwpA TayZonday   796 Music   292 16841569    4.23    83514   129200  ["2x2W12A8Qow","P6dUCOS1bM0","NattlyH0IeM","nTQOpibv_OA","9mSKBgvHdoE","hjD6iigdB-g","caIBKOztlAo","xUz2YMmiq0k","aWY3eYOX3U0","mD5_GUovjiM","1oFS-q8BIps","deXAEN70CDY","0pElTyjfxe0","eyDuGwlrFRs","m6SjPfc_xNA","qYGvGWY1FDs","N0amCfgnwY8","JPu4uErBFks","pgSA-ErKd8c","ZZgGGlOGyUg"]
xWHf_vYZzQ8 universalmusicgroup 771 Music   262 25607299    4.79    83419   70529   ["jvz0bvYmnto","zElEs8yw7fw","9FcBnaLjxY4","95wgKdSJGDo","ueOgZPBXY4A","k3O5uy-MBBk","O3sGTaQ9s9c","aT434G38OBg","4PdDPrwIwhI","Y-VifE8EK8w","9wpxno6qUd0","B17SYROT3GI","wJtUuxmm-B0","LBDnkJ5h1ho","CfzpBjKpSPE","QF2AmC2xyXM","eoa6Gx4HxTc","mvPvcV44rCc"]
Time taken: 109.053 seconds, Fetched: 10 row(s)

4.7 找出用户中上传视频最多的10个用户的所有视频

思路:

  • 筛选出用户表中上传视频数最多的前十位
    select * from user order by videos desc limit 10;
  • 将筛选结果跟youtube3进行join得出结果
    select b.*,a.videos,a.friends from (select * from user order by videos desc limit 10) a join youtube3 b on a.uploader = b.uploader order by views desc limit 20;

结果如下:

hive> select b.*,a.videos,a.friends from (select * from user order by videos desc limit 10) a join youtube3 b on a.uploader = b.uploader order by views desc limit 20;
Query ID = hadoop_20170711101616_d9ebb69e-7fbe-4f09-ad55-54f4dfd11ebe
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1499153664137_0125, Tracking URL = http://master:8088/proxy/application_1499153664137_0125/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0125
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-07-11 10:17:30,594 Stage-1 map = 0%,  reduce = 0%
2017-07-11 10:18:03,439 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 6.03 sec
2017-07-11 10:18:25,025 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 10.28 sec
MapReduce Total cumulative CPU time: 10 seconds 280 msec
Ended Job = job_1499153664137_0125
Stage-8 is filtered out by condition resolver.
Stage-9 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
17/07/11 10:18:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Execution log at: /tmp/hadoop/hadoop_20170711101616_d9ebb69e-7fbe-4f09-ad55-54f4dfd11ebe.log
2017-07-11 10:18:33 Starting to launch local task to process map join;  maximum memory = 518979584
2017-07-11 10:18:44 Dump the side-table for tag: 0 with group count: 10 into file: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-11_10-16-50_653_2968482866464413222-1/-local-10007/HashTable-Stage-6/MapJoin-mapfile90--.hashtable
2017-07-11 10:18:44 Uploaded 1 File to: file:/tmp/hadoop/146c18a2-9a94-440d-9302-72e50ede944e/hive_2017-07-11_10-16-50_653_2968482866464413222-1/-local-10007/HashTable-Stage-6/MapJoin-mapfile90--.hashtable (611 bytes)
2017-07-11 10:18:44 End of local task; Time Taken: 11.244 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 3 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1499153664137_0126, Tracking URL = http://master:8088/proxy/application_1499153664137_0126/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0126
Hadoop job information for Stage-6: number of mappers: 4; number of reducers: 0
2017-07-11 10:19:25,285 Stage-6 map = 0%,  reduce = 0%
...
2017-07-11 10:20:26,162 Stage-6 map = 100%,  reduce = 0%, Cumulative CPU 44.09 sec
MapReduce Total cumulative CPU time: 44 seconds 90 msec
Ended Job = job_1499153664137_0126
Launching Job 4 out of 5
Number of reduce tasks determined at compile time: 1
Starting Job = job_1499153664137_0127, Tracking URL = http://master:8088/proxy/application_1499153664137_0127/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0127
Hadoop job information for Stage-3: number of mappers: 2; number of reducers: 1
2017-07-11 10:21:01,746 Stage-3 map = 0%,  reduce = 0%
2017-07-11 10:21:30,462 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 4.27 sec
2017-07-11 10:21:48,968 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 6.36 sec
MapReduce Total cumulative CPU time: 6 seconds 360 msec
Ended Job = job_1499153664137_0127
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 10.28 sec   HDFS Read: 9796823 HDFS Write: 434 SUCCESS
Stage-Stage-6: Map: 4   Cumulative CPU: 44.09 sec   HDFS Read: 574641054 HDFS Write: 2133846 SUCCESS
Stage-Stage-3: Map: 2  Reduce: 1   Cumulative CPU: 6.36 sec   HDFS Read: 2144607 HDFS Write: 6335 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 0 seconds 730 msec
OK
lUOe76YPY7M expertvillage   801 ["Howto","Style"]   119 1014983 3.36    596 689 ["5cEsa-rswrg","HOoQPbcF8Rk","ht3DMDx1spo","VFnf17cp3Eg","IP35rm_31RQ","hXuiuwZPX0M","s8pH5jgSuP8","KIEkwm_6DD4","ovFSahIPaVQ","rQtwcSDh3Hk","COeT3WR7SFc","ZNLBKRpWt7w","qncPBZ9DRRk","tOBqg3yDlyE","YgfWUXsbr7w","qBB3AsgjN1M","XX8ouWx6xm8","BtF9dYRuiGU","NRanI5qR81I","evfl0NGLjVE"]   86228   5659
2Aj2wx9PYxI expertvillage   815 ["Howto","Style"]   170 918553  3.82    1596    1533    ["hb5QaCfm7bg","YVDPSZe21KI","C1elpMjZ4wE","8gD1RG-tnNk","d-IDrRSVkCQ","IAOr6SYGAyw","3mbx03mP5eg","o-pN8qAiZhQ","K_qw03-3gFg","JgsO3A_vbtk","ffKr98Ium-M","WKYcBKa8_7Q","dgeX8CZmqvo","jgUCs72RDHs","PaNhROW-wEo","WhRCVm-1r2k","5I5O8P-r5Rk","reJSIZ3ugsE","-51iHvqP0Rs","9wItsn3r_kc"]   86228   5659
VBGHer3yFyc libertaddigitaltv   998 ["News","Politics"] 29  828616  4.68    1337    10763   ["HL9p8I3lOFA","Fb3HZ7K1gTk","AARX6XOA5h0","t3DPDKbRxio","VAmz8MNZdlk","WrC6_uBe4eA","d9X8DYOA5DE","pZuZv8UK7YA","NwswzwoA4pA","g1J2gvjp4fs","3SMhrQZdABc","h6M42Il4kN8","izM_JwEkWPs","_FM_IoPIFq4","uMZCCPtl_Do","5cZFinVFubQ","om0iHwO4d6o","lUZxlXkbaxM","iSJ3qKpC-3o","tRdq9_Si0Ws"]   6874    1
07jnqD8wvyE expertvillage   776 ["Howto","Style"]   217 574365  4.55    1073    1526    ["3wNpOp50uGI","-taU9d26wT4","_kly-fVUi1I","kZtRmnly_sU","LKe50AJvhz8","0GAUnuuBkW4","eM6Bpz6-dio","i9xf62PKC5M","mwFfk7igYC0","ilLTA4p4MMw","qyoLuTjguJA","aC-KOYQsIvU","0V6LWQZjYRk","_uZkvzYEXp0","auQbi_fkdGE","qgiUSEpg8Xc","4I79yc3uKfE","7Lz-XGjynN8","taHOwsdQXWI","2Aj2wx9PYxI"]   86228   5659
NDwZcLGekE8 expertvillage   801 ["Howto","Style"]   94  491582  4.35    553 588 ["CEmYX76UJy0","0sF-PoXR9ZU","qDfoNYpozxk","T4P3yp6mFfY","rzcj_Wq6BIE","A2LTVhqHAdo","4gyH0mJPqY8","m3PsQCMz3sY","wckYj_PAhgM","YkvwdM9Xstc","Ph2UAjGfeB4","vT_uXPfSAVE","89mAouUu8YM","0t5pPZ4AXX0","sLNFN75CYbw","QQGHXEPfMcQ","p6gcz4hkOmw","qk6R-X-YmUw","NR_9VlHFOu8","0gA_3BAxtVM"]   86228   5659
CEmYX76UJy0 expertvillage   801 ["Howto","Style"]   90  422284  4.48    448 459 ["NDwZcLGekE8","jRAKgd50Jh0","0sF-PoXR9ZU","IjjJWXgqWL4","m3PsQCMz3sY","8Di58l1dfPA","DpEVqy954OY","dmYMeImNy1s","qk6R-X-YmUw","wckYj_PAhgM","qZSD3JLPURk","YkvwdM9Xstc","fdSFh_z0HJY","n5LDyWyHJTo","Dmm5O1DPQYc","A2LTVhqHAdo","I2mM0r76b7o","iE16Z9Gh3TQ","rzcj_Wq6BIE","8szPfWiK-ag"]   86228   5659
E6VWi7IaroA expertvillage   946 ["Howto","Style"]   228 338889  4.46    300 251 ["fLy3M2jfe4w","cC1FBmbYgMk","xEJhInSxsQg","RxMLoqzKZ8s","Z9a--4RQ9O8","7neOXKLrgzk","JS-lS9ngMtY","aKIFfgPLY8I","EsqPNhtkWIo","NJUPNtZN2rc","vV7pajY0zOk","GTtZapaRDbg","-0UVDD6ZwYE","B9jNaUP59Vo","2QDGPe50E4Y","lJDZO_pX4aw","wuD2CemlvP0","ZTx2ZFI1spg","CzZghYdupiQ","5ueEKSJ8J-U"]   86228   5659
0sF-PoXR9ZU expertvillage   801 ["Sports"]  77  326365  4.21    512 663 ["24FJ58Q22D4","m3PsQCMz3sY","qk6R-X-YmUw","CEmYX76UJy0","NDwZcLGekE8","QiIunH47qew","I2mM0r76b7o","c_qSPdLPReM","5eYBaQ5Cg-c","fdSFh_z0HJY","8szPfWiK-ag","Dmm5O1DPQYc","A2LTVhqHAdo","n-Cvzk84X8o","T4P3yp6mFfY","1vo3CCBIcaY","fnGFR1AFCJc","n5LDyWyHJTo","YkvwdM9Xstc","emNfLmLTQb0"]    86228  5659
ohdiRHLSw5Y expertvillage   447 ["Howto","Style"]   78  321521  3.79    139 102 ["12_nJamoyTk","dXLhjYgMZ68","ukzFwRoFj4k","sOUgrJo2kIg","HhO39nCDfMg","3iVP0tzwhVc","MFNA0PqLynY","ZlcMzLhiBjg","7F8ajh_DDYs","6vaPIz6S6sk","-L_uhGXOtF0","-libzR5AV58","CSdSH7XKTtQ","KZX5jXPAWIk","Vkj5fbrQpNs","YCgnFIk5Acg","hjPDmVf6KJw","BZnhMl85dq4","iuqVpMdb1NM","GO2_3q6euug"]   86228   5659
0sF-PoXR9ZU expertvillage   801 ["Howto","Style"]   77  294737  4.21    438 599 ["24FJ58Q22D4","m3PsQCMz3sY","CEmYX76UJy0","NDwZcLGekE8","A2LTVhqHAdo","qk6R-X-YmUw","5eYBaQ5Cg-c","fdSFh_z0HJY","jGdXcitOUzY","n-Cvzk84X8o","wckYj_PAhgM","I2mM0r76b7o","QMZaxtjhZ5k","Dmm5O1DPQYc","vT_uXPfSAVE","8szPfWiK-ag","QiIunH47qew","YkvwdM9Xstc","DpEVqy954OY","c_qSPdLPReM"]   86228   5659
3Xc-kxSznZ0 expertvillage   430 ["Howto","Style"]   107 292329  3.33    565 964 ["9YAKfkVvvEc","wEkcPjBaHjs","zMl3ixv1kHw","Q4ZUPEgbzu0","-jIEGZwLPvo","YxdxGLBCCRA","Jp0LaJ_ftT0","PgCRWvwdFHM","yZL2MOFk6I0","qDJk3-ofbk0","e4nHAAuDiPE","cuRk9bTbU6A","z53cPMciVio","MREw0dIWaQ0","237AQAvXtw8","a3dnOupmt7Y","3-va5g_QVss","mFdaiY8YhEY","D6li4tYmKf0","bEVHBHKqBfQ"]   86228   5659
VxYkdycN3WE expertvillage   801 ["Howto","Style"]   77  257702  4.19    153 137 ["ZpCE7mnYJK0","NYDdCGtbr8E","904S9W9AZg8","Rl5KEODnCfI","7-i2gl9288I","TC1XlhRwhsU","rD_4nCKf5Ns","pFKZZ120fyU","oN4Esih54V8","N1Ov6cCUXkc","rDM97S4jPiw","IK2uLNND3i0","OZXbM6kRuuA","5pvyXkFTa18","bExUJAnCLZw","VoREkZvkyI8","BWhfpzAItx4","EEybT3IeyzA","QteH0KOJx0g","dbXHvd_SGkk"]   86228   5659
-libzR5AV58 expertvillage   447 ["Howto","Style"]   391 249682  3.34    73  62  ["-L_uhGXOtF0","4LvTYBbgMjE","-wO07skEauo","CSdSH7XKTtQ","kzQDoXKDgTM","e1kzLj-FZ6k","Guhgb3pWRAQ","KIbyySDSGEc","DGUBzs-GNxE","KZX5jXPAWIk","ohdiRHLSw5Y","6vaPIz6S6sk","5MrAuq_F2dU","gOv-yPqZ3LE","BZnhMl85dq4","oCDdDCqygOg","12_nJamoyTk","7F8ajh_DDYs","HhO39nCDfMg","X3-6IT-7S80"]   86228   5659
HL9p8I3lOFA libertaddigitaltv   998 ["News","Politics"] 17  235313  4.61    114 581 ["VBGHer3yFyc","Fb3HZ7K1gTk","Z3Pe8ff37gk","t3DPDKbRxio","pZuZv8UK7YA","_FM_IoPIFq4","1P_EQjd7lq4","yEzNx2g7ors","_neKPvLdG8Y","7i_WDprxACU","zCM0nsym0cE","om0iHwO4d6o","h6M42Il4kN8","Bjk1McjrWtg","VAmz8MNZdlk","-t2hwljZmA8","gH6PHFrA6g8","ysAwzxqbWD8","U5lX_EIFOOA","4zu_X_3qHLA"]   6874    1
oE5ZhBHy6Rs expertvillage   776 ["Howto","Style"]   143 234551  4.64    358 280 ["rCWRUtqJgzI","476vNb6thyM","eg3eo8x_9rw","grK43Poye1U","HGncDLMNPRI","iylaKSDfLSc","JFqiDcvRW2Y","XrbXGRLIDTc","NmxoR9lsu-c","JW8FJefw82k","n0Mi550FDng","QXsW44Wh6tY","6oJDYT5MlEA","IJy05ssELhg","TvT-M7tTQHE","GYNVEWCO_X0","5pJ_o19GVRA","q_Nn_CnvD9s","uqpQNkFxVAU","KKP8vWeEc30"]   86228   5659
wOnP_oAXUMA expertvillage   429 ["Howto","Style"]   120 228842  3.56    236 601 ["BlDWdfTAx8o","WF1kRLJ_4B8","6bu9csQC45c","xTAYAl4g7HE","871VTEF7qIY","UheCchftswc","TKg_cdwq9l4","ADQ4Bjq42wE","cSJCDcAKShA","SYklbxHP2tI","fktuPPNkQpQ","L6267-JZu3E","_IwJ3Aj2XrQ","hf-4ppNmH-U","aO8evzfTR8E","E-kNUEv0YgA","Hsr1-xcXFL8","VCiLZMjo_PQ","rMEgKgZOJi8","YeYU_OE5oIU"]   86228   5659
MVPCc4eODys expertvillage   448 ["Howto","Style"]   69  225879  4.29    171 315 ["iPPgmvcsJNw","xc8sysT31mQ","svt_fnKE_80","kBjUDCyDCuI","ysa50-plo48","yutDlQL9Ct0","gCra4qOrjFw","CN1QJDGinwE","dFJjaj7pXsA","LvJzFdcYSag","4I79yc3uKfE","AcryZ1o4RQE","WyyB0M6wAV8","A0sOVKbYR20","8XCyd0XEejc","5BmVKKpu_CU","8_QNzZ9WPRo","pNiX_l-HEGM","CQJSZs-euZU","ZwHHYzK8UCg"]   86228   5659
7neOXKLrgzk expertvillage   946 ["Howto","Style"]   111 215943  4.07    215 314 ["xEJhInSxsQg","PSR5kinMX3Y","E6VWi7IaroA","5ovGW0CzQp4","DeMcNhz0Vz8","CzZghYdupiQ","72EYsQEw35k","tScm-eZInBE","Hz4lnt7d88s","dgLUbXSqwSs","G3P4VFrB_zM","pU0O2-o67C0","pDGP8Q3Zzb8","hQhdK8l6CuY","VK9VflDsunI","LWgzG8I9bWY","csL7WNTWK9U","LE93Q5k5-fI","Uirspi_t3xM","wGLNJK4zcdE"]   86228   5659
AaO1aLwk53Y expertvillage   443 ["Howto","Style"]   156 204450  4.14    210 290 ["Q8YYZGvKM-8","Bn59zha-uAQ","BjnkKuI-YAY","aQeXHXkL6ow","Qtp2ibwd2Ms","nstbrkjk3OM","Odj4ATUPh9w","v2mm2-9JQ-I","328YVJVtMKU","B-jzkyKxDLo","yB-yGNxNEZg","b0msCCnzmNA","sfh5AJhRto8","SEILiSkGzrY","jm4uxgVDU6o","svOBjuvvy-k","y_1nj8fBQOE","zb8ArkvxOxo","VgWayeJdV0w","Yor1dHH6orE"]   86228   5659
m3PsQCMz3sY expertvillage   801 ["Howto","Style"]   99  203912  4.38    177 207 ["CEmYX76UJy0","0sF-PoXR9ZU","NDwZcLGekE8","qk6R-X-YmUw","DpEVqy954OY","Ph2UAjGfeB4","fdSFh_z0HJY","YkvwdM9Xstc","8szPfWiK-ag","Dmm5O1DPQYc","T4P3yp6mFfY","I2mM0r76b7o","7GtVwKZBf9U","mugebyUqK6o","qDfoNYpozxk","XA7dRqkp8YA","a28QcMXjHyI","n5LDyWyHJTo","ovoNU_CbmfA","Q2j6MlACIoc"]   86228   5659
Time taken: 300.414 seconds, Fetched: 20 row(s)

4.8 筛选出每个类别中观看数Top10

** select a.* from (select videoId,categoryId,views,row_number() over(partition by categoryId order by views desc) rank from youtube_category) a where rank<=10;**

hive> select a.* from (select videoId,categoryId,views,row_number() over(partition by categoryId order by views desc) rank from youtube_category) a where rank<=10;
Query ID = hadoop_20170713170101_76ffdd80-e72c-4c7f-b9ad-9b8c3f7e17ab
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 3
Starting Job = job_1499153664137_0143, Tracking URL = http://master:8088/proxy/application_1499153664137_0143/
Kill Command = /home/hadoop/app/hadoop/bin/hadoop job  -kill job_1499153664137_0143
Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 3
2017-07-13 17:01:52,690 Stage-1 map = 0%,  reduce = 0%
2017-07-13 17:02:24,374 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 10.49 sec
...
2017-07-13 17:03:08,803 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 89.07 sec
MapReduce Total cumulative CPU time: 1 minutes 29 seconds 70 msec
Ended Job = job_1499153664137_0143
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3  Reduce: 3   Cumulative CPU: 89.07 sec   HDFS Read: 73070026 HDFS Write: 7487 SUCCESS
Total MapReduce CPU Time Spent: 1 minutes 29 seconds 70 msec
OK
qruSOZq-wJg Activism    2673823 1
tFxk7glmMbo Activism    1063928 2
dYN278GB2Kg Activism    831836  3
n-Akqik3hME Activism    778987  4
LtqRAYWjz2Q Activism    768422  5
2v-whrgAbf4 Activism    742172  6
ja4d02s0WKQ Activism    733014  7
IpAOYskH1s8 Activism    714580  8
ieuVTOJyLBs Activism    640994  9
6CXBgbP_ZKs Activism    562163  10
12Z3J1uzd0Q Animation   65341925    1
bFytHZHFXhA Animation   38937813    2
sdUUx5FdySs Animation   16151661    3
vQbYvjmfbr4 Animation   12114231    4
rNKefRRvV7g Animation   11884311    5
6HIavxnUHls Animation   11735507    6
gm5DHKI8o5o Animation   9381674 7
316BF17k5d8 Animation   9372064 8
ZuQMn6Z0k_w Animation   9195588 9
6B26asyGKDo Animation   8915176 10
dMH0bHeiRNg Comedy  79897120    1
5P6UU6m3cqk Comedy  42525795    2
Tx1XIm6q4r4 Comedy  38378910    3
Q5im0Ssyyus Comedy  21170471    4
k66epna2Sss Comedy  17944367    5
AYxu_MQSTTY Comedy  15665340    6
pYak2F1hUYA Comedy  15634357    7
_OBlgSz8sSM Comedy  14567743    8
wCF3ywukQYA Comedy  14227802    9
sHzdsFiBbFc Comedy  13606292    10
Mz9BlgiV45I Education   2807232 1
VnGb6UQvwJs Education   2441432 2
j6lADdhmAec Education   2306689 3
pYfTQpsErsM Education   2066429 4
7GNE6AhL0qI Education   2037257 5
NiGZf15W9xc Education   1867126 6
-FeAK-q5Cok Education   1741504 7
N5P6o2YaqVI Education   1656512 8
Dl-agXA63nw Education   1529566 9
JX3VmDgiFnY Education   1373008 10
cQ25-glGRzI Music   77674728    1
7AVHXe-ol-s Music   60349673    2
ePyRrb2-fzs Music   45984219    3
xsRWpK4pf90 Music   44614530    4
ktUSIJEiOug Music   43583367    5
b3u65f4CRLk Music   43511791    6
iWg3IMN_rhU Music   43323757    7
innfyQZHPpo Music   41564032    8
Lt6o8NlrbHg Music   41171303    9
QjA5faZF1A8 Music   40294882    10
qruSOZq-wJg Nonprofits  2673823 1
tFxk7glmMbo Nonprofits  1063928 2
dYN278GB2Kg Nonprofits  831836  3
n-Akqik3hME Nonprofits  778987  4
LtqRAYWjz2Q Nonprofits  768422  5
2v-whrgAbf4 Nonprofits  742172  6
ja4d02s0WKQ Nonprofits  733014  7
IpAOYskH1s8 Nonprofits  714580  8
ieuVTOJyLBs Nonprofits  640994  9
6CXBgbP_ZKs Nonprofits  562163  10
SkELRp4wKPs Politics    12730681    1
AFFQrUyi8-s Politics    11953598    2
YgW7or1TuFk Politics    8516433 3
JgiGrXpOhYg Politics    8211043 4
up5jmbSjWkw Politics    7869023 5
hr23tpWX8lM Politics    7209885 6
Kje7NUNebL8 Politics    6707868 7
I4u3449L5VI Politics    6675798 8
jjXyqcx-mYY Politics    6462051 9
a9Vde3FHMmc Politics    6028642 10
8h98jb9Lk74 UNA 33880568    1
AR5yq0aMxI8 UNA 18409445    2
DH7FVB15EPU UNA 15487752    3
Qvfx6UiAAG0 UNA 13977592    4
n-FdoZdfpmE UNA 12241555    5
MddPeH1DAvY UNA 12211192    6
sI8Sus_KRpY UNA 10460047    7
14oqm9ywLIw UNA 10344366    8
LytRWuhn4BM UNA 9899270 9
iSByZARPJSU UNA 9316897 10
Oro28yg7W74 Gaming  727015  1
0to1HDxtkM4 Gaming  529346  2
M-w-gLVfi30 Gaming  396471  3
i6aH2F1WjsY Gaming  382350  4
EgbUSsblCSQ Gaming  266718  5
7SMnWr9MqwQ Gaming  210147  6
NQMBIRipp5A Gaming  207783  7
vi1lVqJSbsM Gaming  186709  8
aId2hK4pI2M Gaming  186054  9
tWPWcyLdrko Gaming  181979  10
sLGLum5SyKQ Howto   31121122    1
eMhGpzyFdhE Howto   16320501    2
KPOOWvP_dd8 Howto   11131527    3
91wuBqlny50 Howto   8579613 4
mr5ghuaTK14 Howto   8361812 5
GfPJeDssBOM Howto   6101232 6
6gmP4nk0EOE Howto   4970382 7
mM-30cmM33s Howto   4936417 8
STQ3nhXuuEM Howto   4655542 9
XZGgeGHU1Bs Howto   4424698 10
LU8DDYz68kM Pets    27721690    1
epUk3T2Kfno Pets    10352882    2
z3U0udLH974 Pets    9461084 3
kkT7A3jegBc Pets    9269896 4
TZ860P4iTaM Pets    9009434 5
7tRWRSfcDuQ Pets    8538635 6
Qit3ALTelOo Pets    7939352 7
Zi9GOvR3Ynw Pets    7351184 8
Kxa0mnDj0bs Pets    7289545 9
PadauuWF94w Pets    6271287 10
W1czBcnX1Ww Science 3234852 1
D99NHb6B03s Science 3176792 2
tk_F2Y-F2kE Science 3121903 3
nhyH7lQ6D2k Science 2879861 4
JCbKv9yiLiQ Science 2672391 5
U5vs2ly_grk Science 2611389 6
M0ODskdEPnQ Science 2555284 7
p4ebtj1jR7c Science 2536109 8
8wTlureUMP8 Science 2477804 9
NZNTgglPbUA Science 2230729 10
vt4X7zFfv4k Sports  12598542    1
P-bWsOK-h98 Sports  12101588    2
OS5tQvQOB-Y Sports  9047732 3
P9LmHXXWiJs Sports  7415813 4
NIKdK-T-jZM Sports  7175403 5
euMu1SKi-ak Sports  7040023 6
zKQgTiqhPbw Sports  7017699 7
yeXoxNP8_xY Sports  6422039 8
q8t7iSGAKik Sports  6312667 9
COcczatkNP4 Sports  6258611 10
sLGLum5SyKQ Style   31121122    1
eMhGpzyFdhE Style   16320501    2
KPOOWvP_dd8 Style   11131527    3
91wuBqlny50 Style   8579613 4
mr5ghuaTK14 Style   8361812 5
GfPJeDssBOM Style   6101232 6
6gmP4nk0EOE Style   4970382 7
mM-30cmM33s Style   4936417 8
STQ3nhXuuEM Style   4655542 9
XZGgeGHU1Bs Style   4424698 10
ZeBd_F2Bz5Y Vehicles    8623041 1
ju6t-yyoU8s Vehicles    7325889 2
uUurALr_Ckk Vehicles    7258142 3
WShY1ObPvhQ Vehicles    7047156 4
q3idQKi5EqM Vehicles    6470816 5
9JWywFpZkg4 Vehicles    6465830 6
tth9krDtxII Vehicles    6394563 7
npTRXr4Sgxg Vehicles    5450317 8
aCamHfJwSGU Vehicles    5354163 9
S-ppGiwc0wQ Vehicles    5039415 10
LU8DDYz68kM Animals 27721690    1
epUk3T2Kfno Animals 10352882    2
z3U0udLH974 Animals 9461084 3
kkT7A3jegBc Animals 9269896 4
TZ860P4iTaM Animals 9009434 5
7tRWRSfcDuQ Animals 8538635 6
Qit3ALTelOo Animals 7939352 7
Zi9GOvR3Ynw Animals 7351184 8
Kxa0mnDj0bs Animals 7289545 9
PadauuWF94w Animals 6271287 10
ZeBd_F2Bz5Y Autos   8623041 1
ju6t-yyoU8s Autos   7325889 2
uUurALr_Ckk Autos   7258142 3
WShY1ObPvhQ Autos   7047156 4
q3idQKi5EqM Autos   6470816 5
9JWywFpZkg4 Autos   6465830 6
tth9krDtxII Autos   6394563 7
npTRXr4Sgxg Autos   5450317 8
aCamHfJwSGU Autos   5354163 9
S-ppGiwc0wQ Autos   5039415 10
v3ARyAb_1Bs Blogs   31812447    1
5GE82tqcYYQ Blogs   30209692    2
ervaMPt4Ha0 Blogs   23859297    3
uWow42TCwzg Blogs   22389389    4
-_CSo1gOd48 Blogs   21176701    5
D2kJZOfq7zk Blogs   20458159    6
nhSZs-aAZbo Blogs   20423158    7
GuMMfgWhm3g Blogs   18634621    8
4jbkRGPxvaM Blogs   15980887    9
hMnk7lh9M3o Blogs   13553069    10
LpAI8TzQDes Entertainment   65078772    1
244qR7SvvX0 Entertainment   57790943    2
1uwOL4rB-go Entertainment   39883413    3
w2xUzv6iZWo Entertainment   25222946    4
vr3x_RRJdd4 Entertainment   25093671    5
lj3iNxZ8Dww Entertainment   23737579    6
RB-wUgnyGv0 Entertainment   23067889    7
lsO6D1rwrKc Entertainment   21758300    8
7iYWxfNSjYk Entertainment   19029677    9
5pGJCkCDK5A Entertainment   18858695    10
p0aQvKDA1K0 Events  12239023    1
bNF_P281Uu4 Events  9125026 2
AlPqL7IUT6M Events  6441558 3
J833f9fqWBA Events  4776832 4
xIvIWJbzimo Events  4284770 5
QuTj9a04o-s Events  4053317 6
eejQPUyeNiY Events  3573812 7
3QL97xldoXc Events  3010296 8
r43yCiKlbCo Events  2806995 9
z42fchrzhHY Events  2565257 10
12Z3J1uzd0Q Film    65341925    1
bFytHZHFXhA Film    38937813    2
sdUUx5FdySs Film    16151661    3
vQbYvjmfbr4 Film    12114231    4
rNKefRRvV7g Film    11884311    5
6HIavxnUHls Film    11735507    6
gm5DHKI8o5o Film    9381674 7
316BF17k5d8 Film    9372064 8
ZuQMn6Z0k_w Film    9195588 9
6B26asyGKDo Film    8915176 10
SkELRp4wKPs News    12730681    1
AFFQrUyi8-s News    11953598    2
YgW7or1TuFk News    8516433 3
JgiGrXpOhYg News    8211043 4
up5jmbSjWkw News    7869023 5
hr23tpWX8lM News    7209885 6
Kje7NUNebL8 News    6707868 7
I4u3449L5VI News    6675798 8
jjXyqcx-mYY News    6462051 9
a9Vde3FHMmc News    6028642 10
v3ARyAb_1Bs People  31812447    1
5GE82tqcYYQ People  30209692    2
ervaMPt4Ha0 People  23859297    3
uWow42TCwzg People  22389389    4
-_CSo1gOd48 People  21176701    5
D2kJZOfq7zk People  20458159    6
nhSZs-aAZbo People  20423158    7
GuMMfgWhm3g People  18634621    8
4jbkRGPxvaM People  15980887    9
hMnk7lh9M3o People  13553069    10
W1czBcnX1Ww Technology  3234852 1
D99NHb6B03s Technology  3176792 2
tk_F2Y-F2kE Technology  3121903 3
nhyH7lQ6D2k Technology  2879861 4
JCbKv9yiLiQ Technology  2672391 5
U5vs2ly_grk Technology  2611389 6
M0ODskdEPnQ Technology  2555284 7
p4ebtj1jR7c Technology  2536109 8
8wTlureUMP8 Technology  2477804 9
NZNTgglPbUA Technology  2230729 10
p0aQvKDA1K0 Travel  12239023    1
bNF_P281Uu4 Travel  9125026 2
AlPqL7IUT6M Travel  6441558 3
J833f9fqWBA Travel  4776832 4
xIvIWJbzimo Travel  4284770 5
QuTj9a04o-s Travel  4053317 6
eejQPUyeNiY Travel  3573812 7
3QL97xldoXc Travel  3010296 8
r43yCiKlbCo Travel  2806995 9
z42fchrzhHY Travel  2565257 10
Time taken: 121.381 seconds, Fetched: 250 row(s)

5 总结

上面的8个例子向大家展示Hive中简单和稍微复杂的操作,有的启动一个Job就可以完成,有的则需要启动多个Job才能完成,我们也可以看到,启动的Job越多运行时间就会越长,但是实际工作中的操作只会远比我们所演示要更加复杂,越是复杂的操作就更加需要去优化,来达到减少运行时间的目的,所以下一篇我们来看看Hive的优化实践

你可能感兴趣的:(Hive实战之Youtube数据集)