Performance Comparison: Hadoop Cluster vs. Pseudo-Distributed Mode

Image: CentOS 6.5, with hadoop-2.6.0, mysql-5.4.0, hive-1.2.1

Cluster configuration: master with 4 GB RAM, 1 core, 2 processors; worker nodes (2): 1 GB RAM, 1 core, 1 processor each

Pseudo-distributed: 4 GB RAM, 1 core, 2 processors

Tests were run with HQL:

Data:

hive> select * from course;
01 语文 02
02 数学 01
03 英语 03

hive> select * from teacher;
01 张三
02 李四
03 王五

hive> select * from student;
01 赵雷 1990-01-01 男
02 钱电 1990-12-21 男
03 孙风 1990-05-20 男
04 李云 1990-08-06 男
05 周梅 1991-12-01 女
06 吴兰 1992-03-01 女
07 郑竹 1989-07-01 女
08 王菊 1990-01-20 女

hive> select * from score;
01 01 80
01 02 90
01 03 99
02 01 70
02 02 60
02 03 80
03 01 80
03 02 80
03 03 80
04 01 50
04 02 30
04 03 20
05 01 76
05 02 87
06 01 31
06 03 34
07 02 89
07 03 98

Test runs:

select distinct s1.s_id from score s1 where s1.c_id='01' and s1.s_id not in (select distinct s2.s_id from score s2 where s2.c_id='02');

Cluster: Time taken: 127 seconds

Pseudo-distributed: Time taken: 117.481 seconds, Fetched: 1 row(s)

select a.s_id from (select s_id from score where c_id =1 ) a left join (select s_id from score where c_id =2 ) b on a.s_id = b.s_id where b.s_id is null;

Cluster: Time taken: 18 seconds

Pseudo-distributed: Time taken: 15.239 seconds, Fetched: 1 row(s)
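
The two queries above ask the same question (which students took course 01 but not course 02), first with NOT IN and then with a LEFT JOIN that keeps only unmatched rows. As a rough sanity check outside Hive, the sketch below loads the score rows listed earlier into an in-memory SQLite database and runs both forms. SQLite is only a stand-in here and says nothing about Hive timings; the column types and the helper name load_scores are assumptions made for illustration, and c_id is compared as the string '01' because SQLite does not coerce '01' to 1 the way the Hive query relies on.

```python
import sqlite3

# Rows copied from the score table above: (s_id, c_id, s_score).
SCORE_ROWS = [
    ("01", "01", 80), ("01", "02", 90), ("01", "03", 99),
    ("02", "01", 70), ("02", "02", 60), ("02", "03", 80),
    ("03", "01", 80), ("03", "02", 80), ("03", "03", 80),
    ("04", "01", 50), ("04", "02", 30), ("04", "03", 20),
    ("05", "01", 76), ("05", "02", 87),
    ("06", "01", 31), ("06", "03", 34),
    ("07", "02", 89), ("07", "03", 98),
]


def load_scores():
    """Create an in-memory SQLite table holding the sample score rows."""
    conn = sqlite3.connect(":memory:")
    # Column types are assumed; the post does not show the Hive DDL.
    conn.execute("create table score (s_id text, c_id text, s_score integer)")
    conn.executemany("insert into score values (?, ?, ?)", SCORE_ROWS)
    return conn


# First formulation: NOT IN with a subquery.
NOT_IN_SQL = """
select distinct s1.s_id from score s1
where s1.c_id = '01'
  and s1.s_id not in (select distinct s2.s_id from score s2 where s2.c_id = '02')
"""

# Second formulation: LEFT JOIN, keeping rows with no match on the right side.
LEFT_JOIN_SQL = """
select a.s_id
from (select s_id from score where c_id = '01') a
left join (select s_id from score where c_id = '02') b on a.s_id = b.s_id
where b.s_id is null
"""

conn = load_scores()
not_in_result = conn.execute(NOT_IN_SQL).fetchall()
left_join_result = conn.execute(LEFT_JOIN_SQL).fetchall()
print(not_in_result, left_join_result)
```

On this sample data both forms return a single student, which matches the "Fetched: 1 row(s)" reported for both queries.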

select student.* from student join (select count(c_id) num1 from course) tmp1
left join (select s_id, count(c_id) num2 from score group by s_id) tmp2
on student.s_id=tmp2.s_id and tmp1.num1=tmp2.num2
where tmp2.s_id is null;
Cluster: Time taken: 67.308 seconds

Pseudo-distributed: Time taken: 62.53 seconds, Fetched: 4 row(s)

select student.*,a.s_score as 01_score,b.s_score as 02_score from student
join score a on a.c_id='01'
join score b on b.c_id='02'
where a.s_id=student.s_id and b.s_id=student.s_id and a.s_score>b.s_score;
Cluster: Time taken: 16.281 seconds

Pseudo-distributed: Time taken: 14.016 seconds, Fetched: 2 row(s)
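
To make the comparison easier to read, the timings above can be turned into a relative overhead for the cluster. The sketch below is only arithmetic on the numbers reported in this post. The query labels and the function name overhead_percent are made up for illustration, and the cluster times for the first two queries are assumed to be in seconds, since they were recorded without a unit.

```python
# (label, cluster seconds, pseudo-distributed seconds), copied from the timings above.
# The labels are illustrative names, not taken from the original post.
TIMINGS = [
    ("not in subquery", 127.0, 117.481),
    ("left join anti-join", 18.0, 15.239),
    ("students missing a course", 67.308, 62.53),
    ("score 01 > score 02", 16.281, 14.016),
]


def overhead_percent(cluster, pseudo):
    """Return how much slower the cluster run was, as a percentage of the pseudo-distributed time."""
    return (cluster - pseudo) / pseudo * 100.0


for label, cluster, pseudo in TIMINGS:
    print(f"{label}: cluster {cluster}s vs pseudo {pseudo}s, "
          f"{overhead_percent(cluster, pseudo):.1f}% slower on the cluster")
```

In all four tests the cluster run was slower than the pseudo-distributed run. The tables hold only a handful of rows, so these numbers probably reflect fixed job overhead more than data processing, though that was not measured here.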
