Component versions
Hadoop cluster:
- hive 2.1.1
- hive on spark 1.6.3
- tez 0.8.5
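In the tests below, the mr, tez and spark columns refer to the execution engine Hive uses for each query. The following is a minimal sketch of switching engines for a single Hive CLI session. It assumes Tez and Hive on Spark are already installed and configured on the cluster. `hive.execution.engine` is a standard Hive property, but the exact engine settings used for these tests are not recorded here.

```sql
-- Assumes Tez and Hive on Spark are already configured on the cluster.
-- Pick the engine for the current session; run only the line you need.
set hive.execution.engine=tez;
-- set hive.execution.engine=mr;    -- MapReduce (Hive 2.x prints a deprecation warning)
-- set hive.execution.engine=spark; -- Hive on Spark
-- Print the current value to confirm which engine is active.
set hive.execution.engine;
```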
Data preparation
Create report.data_security_lab with the same schema as on the Yangquan cluster:
CREATE EXTERNAL TABLE `report.data_security_lab`(
xxx)
PARTITIONED BY (
`stat_date` string,
`log_id` string)
stored as ORC;
Add the partitions:
alter table report.data_security_lab add partition (stat_date=20170614,log_id=xxxxxxx);
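The ALTER TABLE statement registers one (stat_date, log_id) partition at a time. Hive can instead discover all of them in one pass when the partition directories already exist under the table location in Hive's default `stat_date=.../log_id=...` layout. The sketch below assumes the data for 20170614 has already been copied under the table's location. It is not necessarily how the test data was loaded.

```sql
-- Assumes partition directories already exist under the table location.
-- Scan the table location and add any partition directories missing from the metastore.
msck repair table report.data_security_lab;
-- List the partitions registered for the test date to confirm the result.
show partitions report.data_security_lab partition (stat_date='20170614');
```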
Performance test: Hive CLI
test 1: PV count
select count(1) from report.data_security_lab where stat_date=20170614;
| mr | tez | yq01-mr |
| 90.228 seconds | 69.559 seconds (with container reuse: 30-40 seconds) | 127.341 seconds |
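The figures in parentheses appear to be reruns that benefit from Tez container reuse. In those runs, tasks go to containers the Tez application master already holds rather than to newly requested YARN containers. Below is a sketch of the related settings. The property names are real Tez and Hive properties, but the values are illustrative assumptions, not the ones used in these tests. To our understanding, Tez AM settings are read when Hive opens its Tez session, so they may need to be set before the session's first Tez query.

```sql
-- Tez AM settings for container reuse; may only apply to a newly opened Tez session.
set tez.am.container.reuse.enabled=true;                      -- Tez default is true
-- Keep idle containers around so a follow-up query can reuse them
-- (illustrative values, assumed; not the settings used in these tests).
set tez.am.container.idle.release-timeout-min.millis=30000;
set tez.am.container.idle.release-timeout-max.millis=60000;
-- Optionally pre-launch containers when the Hive session opens.
set hive.prewarm.enabled=true;
set hive.prewarm.numcontainers=10;
```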
test 2: UV (distinct cuid) per log_id
select log_id,count(1) from( select cuid,log_id from report.data_security_lab where stat_date=20170614 group by 1,2)tmp group by 1;
| mr | tez | yq01-mr |
| 368.222 seconds | 324.259 seconds (with container reuse: 300 seconds) | 229.018 seconds |
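Tests 2 and 3 group by column position (`group by 1,2`). To our understanding, Hive 2.1.1 resolves positional references in GROUP BY only when `hive.groupby.orderby.position.alias` is enabled, and that property defaults to false. Without it, the numbers are treated as constants and the query fails because cuid is not a grouping key. The sketch below shows the session setting, plus an equivalent version of test 2 that names the columns and so does not depend on it.

```sql
-- Let GROUP BY 1,2 refer to the 1st and 2nd select-list columns (Hive 2.1.x property name).
set hive.groupby.orderby.position.alias=true;

-- Equivalent form of test 2 that groups by column names instead of positions.
select log_id, count(1)
from (select cuid, log_id
      from report.data_security_lab
      where stat_date=20170614
      group by cuid, log_id) tmp
group by log_id;
```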
test 3: PV and UV per log_id
select log_id,count(1) ,sum(pv) from( select cuid,log_id,count(1) pv from report.data_security_lab where stat_date=20170614 group by 1,2)tmp group by 1;
| mr | tez | yq01-mr |
| 392.168 seconds | 352.286 seconds (with container reuse: 330 seconds) | 218.734 seconds |
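The yq cluster is faster in tests 2 and 3. Its Hive 1.2.1 produces a different execution plan from Hive 2.1.1: 2 stages under Hive 1.2.1 versus 1 stage under Hive 2.1.1.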
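One way to compare the stage counts is to print the plan on each cluster without running the query. Here is a sketch using test 3's query, with column names written out instead of positions. The STAGE DEPENDENCIES section of the `EXPLAIN` output lists every stage. Stage-0 is usually the final fetch, so the remaining stages are the jobs the query actually launches.

```sql
-- Print the execution plan only; the query itself is not run.
explain
select log_id, count(1), sum(pv)
from (select cuid, log_id, count(1) pv
      from report.data_security_lab
      where stat_date=20170614
      group by cuid, log_id) tmp
group by log_id;
```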
test 4: UV
select count(1) from (select cuid from report.data_security_lab where stat_date=20170614 group by 1) tmp;
| mr | tez | yq01-mr |
| 146.33 seconds | 129.805 seconds | 193.618 seconds |
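Test 4 counts distinct cuid through a GROUP BY subquery rather than with `count(distinct cuid)`. The direct form below is shorter. However, without a GROUP BY, Hive generally finishes a global `count(distinct ...)` in a single reducer unless the optimizer rewrites it, which tends to be slow on large inputs. The two results should agree except for NULL cuid: the GROUP BY form counts NULL as one group, while `count(distinct)` ignores it. This comparison query was not part of the tests above.

```sql
-- Single-step distinct count; in the classic MR plan one reducer does the final dedup.
select count(distinct cuid)
from report.data_security_lab
where stat_date=20170614;
```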
test 5: join
select count(1) from
(select cuid,stat_date from report.data_security_lab where stat_date=20170614 and log_id=1003003 ) a
join
(select cuid from report.data_security_lab where stat_date=20170614 and log_id=1011105 ) b
on a.cuid = b.cuid
join
(select cuid,stat_date from report.data_security_lab where stat_date=20170614 and log_id=1007102 ) c
on a.cuid = c.cuid;
| mr | tez | yq01-mr |
| 360.74 seconds | 318.365 seconds (with container reuse: 290 seconds) | 475.085 seconds |
Performance test: HiveServer2 (same performance as the Hive CLI):
join (same query as test 5)
select count(1) from
(select cuid,stat_date from report.data_security_lab where stat_date=20170614 and log_id=1003003 ) a
join
(select cuid from report.data_security_lab where stat_date=20170614 and log_id=1011105 ) b
on a.cuid = b.cuid
join
(select cuid,stat_date from report.data_security_lab where stat_date=20170614 and log_id=1007102 ) c
on a.cuid = c.cuid;
| mr | tez | yq01-mr |
| 5 mins 54 secs | 5 mins 12 secs (containers cannot be reused) | |
Performance test: small data volume (10G)
Data (7.6 GB): hdfs://szth-ns1/user/hive/warehouse/report.db/data_security_lab/stat_date=20170614/log_id=1003123
test 1: PV
select count(1) from report.data_security_lab where stat_date=20170614 and log_id=1003123;
| mr | tez | spark | yq01-mr |
| 24.151 seconds | 18.18 seconds (with container reuse: 6.131 seconds) | 16.125 seconds | 35.223 seconds |
test 2: UV
select count(1) from(select cuid from report.data_security_lab where stat_date=20170614 and log_id=1003123 group by 1)tmp;
| mr | tez | spark | yq01-mr |
| 62.181 seconds | 26.182 seconds (with container reuse: 16.198 seconds) | 296.828 seconds | 96.494 seconds |
test 3: multi-stage join
select count(1) from
(select cuid,stat_date from report.data_security_lab where
stat_date=20170614 and log_id=1003003 group by 1,2) a
join
(select cuid from report.data_security_lab where stat_date=20170614
and log_id=1011105 group by 1) b
on a.cuid = b.cuid
join
(select cuid,stat_date from report.data_security_lab where
stat_date=20170614 and log_id=1007102 group by 1,2) c
on a.cuid = c.cuid;
| | mr | tez | spark | yq01-mr |
| duration | 142.052 seconds | 98.273 seconds (some tasks failed) | 296.828 seconds | 837.784 seconds |
| result | 13501072 | 13501072 | 46225003 | 13501072 |
The Spark result is wrong: it returns 46225003, while mr, tez and yq01-mr all return 13501072.
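To check which result is correct without relying on a join, you can count the cuids that appear under all three log_ids in a single aggregation. Each subquery in test 3 is deduplicated by cuid, and the join drops NULL keys. The join count should therefore equal the number of non-null cuids present for all three log_ids. This is a sketch that was not run as part of these tests. The log_id values are quoted because log_id is a string partition column.

```sql
-- Hypothetical cross-check, not part of the original tests.
-- Run on an engine whose result is trusted (e.g. mr) and compare with the join counts.
set hive.execution.engine=mr;
select count(1)
from (select cuid
      from report.data_security_lab
      where stat_date=20170614
        and log_id in ('1003003', '1011105', '1007102')
        and cuid is not null
      group by cuid
      having count(distinct log_id) = 3) t;
```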