Spark SQL test

Table scan

  • SQL
select count(1) from dmp.trait_zamplus_supply_v2;

Table info

  type                value
  input files         600
  input size          296.7 GB
  average file size   500 MB
  rows                1795165725

  • Test result

  dimension              MapReduce   Spark Test1   Spark Test2
  cores used             ~400        ~400          ~400
  time spent (seconds)   181.089     313.455       71.575
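
For scale, dividing the input size by the wall-clock time gives an approximate scan throughput for each run (derived from the table above):

  MapReduce:    296.7 GB / 181.089 s ≈ 1.6 GB/s
  Spark Test1:  296.7 GB / 313.455 s ≈ 0.9 GB/s
  Spark Test2:  296.7 GB / 71.575 s  ≈ 4.1 GB/s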
  • MapReduce

  locality          map tasks
  Data-local map    704
  Rack-local map    419
  Other local map   64
  All maps          1187

  Average map time       25 sec
  Average shuffle time   56 sec
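
For comparison with Spark's "Total Time Across All Tasks" figures below, 1187 maps at ~25 s each come to roughly 29700 s, i.e. about 8.2 h of aggregate map time.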

  • Spark Test1

  locality level   tasks
  NODE_LOCAL       2374
  RACK_LOCAL       26
  ALL TASKS        2400

Total Time Across All Tasks: 5.9 h
Input Size / Records: 296.7 GB / 1795165725
Shuffle Write: 72.7 KB / 2400
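
Per task, the 5.9 h total works out to roughly 21240 s / 2400 ≈ 8.9 s.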

  • Spark Test2

  locality level   tasks
  NODE_LOCAL       2381
  RACK_LOCAL       19
  ALL TASKS        2400

Total Time Across All Tasks: 4.2 h
Input Size / Records: 296.7 GB / 1795165725
Shuffle Write: 72.7 KB / 2400
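
Per task, the 4.2 h total works out to roughly 15120 s / 2400 ≈ 6.3 s.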

  • Note
    Our Hadoop block size is 64 MB. In Hive I set mapreduce.input.fileinputformat.split.maxsize to 256000000 (~256 MB). Spark Test1 was run with mapred.max.split.size=64M, and Spark Test2 with mapred.max.split.size=256000000.
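
As a minimal sketch, these settings can be applied per session with SET right before the query. The property names and values come from the note above; 67108864 bytes is my expansion of the "64M" shorthand, since these Hadoop properties take byte counts:

    -- Hive on MapReduce: cap input splits at ~256 MB
    SET mapreduce.input.fileinputformat.split.maxsize=256000000;
    select count(1) from dmp.trait_zamplus_supply_v2;

    -- Spark Test1: one split per 64 MB HDFS block
    SET mapred.max.split.size=67108864;
    select count(1) from dmp.trait_zamplus_supply_v2;

    -- Spark Test2: larger, ~256 MB splits
    SET mapred.max.split.size=256000000;
    select count(1) from dmp.trait_zamplus_supply_v2;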
