Hadoop Hive vs. Spark: a count() benchmark on 7-million-plus rows

Just over 7 million rows of data; each query is executed twice.

------------------ RDD ------------------

val rdd = sc.textFile("hdfs://master:9000/spark/SogouQ/") // read the SogouQ logs from HDFS
rdd.cache()  // lazy: only marks the RDD for in-memory caching
rdd.count()  // first action: reads from HDFS and materializes the cache


6/09/09 19:19:11 INFO scheduler.DAGScheduler: Job 1 finished: count at :24, took 15.594766 s
res1: Long = 7265051
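Since cache() is lazy, the 15.6 s above still includes the initial HDFS read that materializes the cache; only a second count() would be served purely from memory. A self-contained sketch of timing a cold and a warm run (the local master URL and app name are assumptions, not part of the original shell session, which already had an `sc`):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: in the original experiment `sc` came from the spark-shell.
val sc = new SparkContext(new SparkConf().setAppName("count-bench").setMaster("local[*]"))
val rdd = sc.textFile("hdfs://master:9000/spark/SogouQ/")
rdd.cache() // nothing is cached yet

val t1 = System.nanoTime
rdd.count() // cold run: reads HDFS and fills the cache
val coldMs = (System.nanoTime - t1) / 1e6

val t2 = System.nanoTime
rdd.count() // warm run: served from the in-memory cache
val warmMs = (System.nanoTime - t2) / 1e6

println(f"cold: $coldMs%.0f ms, warm: $warmMs%.0f ms")
sc.stop()
```

On a dataset this size the warm count is typically much faster than the cold one, which is the effect the benchmark is exploiting.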
-------------- Spark SQL --------------
select count(*) from hive_test


7265051
Time taken: 15.448 seconds, Fetched 1 row(s)
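The session that produced this timing is not shown; one hedged way to issue the same query from the spark-shell on Spark 1.x is through a HiveContext (the table name hive_test comes from the queries above; everything else here is an assumption):

```scala
// Requires a Hive-enabled Spark build so the Hive metastore tables are visible.
val hiveCtx = new org.apache.spark.sql.hive.HiveContext(sc)
val result = hiveCtx.sql("select count(*) from hive_test")
result.show() // prints one row containing the table's row count
```

Because Spark SQL reads the same Hive table but executes the plan on Spark rather than MapReduce, its timing lands close to the raw RDD count.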


---------------- Hive ----------------
select count(*) from hive_test


First run:

OK
7265051

Time taken: 168.611 seconds, Fetched: 1 row(s)

Second run:

OK
7265051
Time taken: 96.413 seconds, Fetched: 1 row(s)

---------------------------------------------------------------------


Summary: Spark is by far the fastest. Both the cached RDD and Spark SQL counted all 7,265,051 rows in about 15 seconds, while the same query on Hive (running as a MapReduce job) took 168.6 s on the first run and 96.4 s on the second.
