Spark performance tuning, part 1: common tuning

    • 1. Allocate more resources
    • 2. Set the parallelism of the Spark application
    • 3. Refactor and optimize the RDD architecture
    • 4. Broadcast large variables
    • 5. Use Kryo serialization in the project
    • 6. Use the fastutil framework in the project
    • 7. Tune the locality wait time

1. Allocate more resources

# --driver-memory:   driver memory (usually has little impact on job performance)
# --num-executors:   number of executors
# --executor-memory: memory per executor
# --executor-cores:  number of CPU cores per executor
# (comments must not follow a trailing "\", or the line continuation breaks)
bin/spark-submit \
 --class cn.spark.sparktest.core.WordCountCluster \
 --driver-memory 100m \
 --num-executors 3 \
 --executor-memory 100m \
 --executor-cores 3 \
 /usr/local/SparkTest-0.0.1-SNAPSHOT-jar-with-dependencies.jar

2. Set the parallelism of the Spark application

SparkConf conf = new SparkConf().set("spark.default.parallelism", "500");
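
Besides the global `spark.default.parallelism` setting, most shuffle operators also accept an explicit partition count per operation. A minimal sketch (the `pairs` RDD is hypothetical, and 500 is illustrative, not a recommendation):

```java
import org.apache.spark.api.java.JavaPairRDD;

// Operator-level parallelism: the second argument overrides the default
// number of reduce-side partitions for this one shuffle.
JavaPairRDD<String, Integer> counts =
        pairs.reduceByKey((a, b) -> a + b, 500);
```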


3. Refactor and optimize the RDD architecture
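
This section has no body in the original. "RDD architecture refactoring" conventionally means reusing one RDD for multiple computations instead of rebuilding the same lineage, and persisting it so it is computed only once. A hedged sketch, assuming an existing `JavaSparkContext sc` and a hypothetical HDFS path:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.storage.StorageLevel;

// Reuse a single RDD for both actions below, and persist it so the
// first action caches it for the second.
JavaRDD<String> sessionRdd = sc.textFile("hdfs://ns1/session/log");  // hypothetical path
sessionRdd.persist(StorageLevel.MEMORY_ONLY_SER());  // serialized cache pairs well with Kryo

long total = sessionRdd.count();  // materializes the RDD and fills the cache
long withUser = sessionRdd.filter(line -> line.contains("userid")).count();  // reads from cache
```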

4. Broadcast large variables

final Broadcast<Map<String, Map<String, List<Integer>>>> dateHourExtractMapBroadcast = sc.broadcast(dateHourExtractMap);

Map<String, Map<String, List<Integer>>> dateHourExtractMap = dateHourExtractMapBroadcast.value();
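
In practice `value()` is read inside the task closure, so each executor fetches one copy of the broadcast instead of the map being serialized with every task. A hedged fragment (`sessionRdd` and `parseDate` are hypothetical):

```java
// Capture only the Broadcast handle in the closure; read the actual map
// per task via value(). Capturing dateHourExtractMap directly would ship
// the full map with every task.
JavaRDD<String> filtered = sessionRdd.filter(
        line -> dateHourExtractMapBroadcast.value().containsKey(parseDate(line)));
```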

5. Use Kryo serialization in the project

set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
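
Kryo pays off most when the custom classes that flow through shuffles and caches are registered, so Kryo can write a small class id instead of the full class name per object. A config sketch (`CategorySortKey` is a hypothetical application class):

```java
SparkConf conf = new SparkConf()
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        // Register application classes used in shuffles/caches with Kryo.
        .registerKryoClasses(new Class[]{CategorySortKey.class});
```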

6. Use the fastutil framework in the project

import it.unimi.dsi.fastutil.ints.IntArrayList;
import it.unimi.dsi.fastutil.ints.IntList;

        // Convert Map<String, Map<String, List<Integer>>> into the equivalent
        // structure backed by fastutil's primitive IntList, reducing memory
        // footprint and GC pressure from boxed Integers.
        Map<String, Map<String, IntList>> fastutilDateHourExtractMap = new HashMap<String, Map<String, IntList>>();
        for (Map.Entry<String, Map<String, List<Integer>>> dateHourExtractEntry : dateHourExtractMap.entrySet()) {
            String date = dateHourExtractEntry.getKey();
            Map<String, List<Integer>> hourExtractMap = dateHourExtractEntry.getValue();
            Map<String, IntList> fastutilHourExtractMap = new HashMap<String, IntList>();

            for (Map.Entry<String, List<Integer>> hourExtractEntry : hourExtractMap.entrySet()) {
                String hour = hourExtractEntry.getKey();
                List<Integer> extractList = hourExtractEntry.getValue();

                // Copy the boxed Integer list into a primitive-backed IntList.
                IntList fastutilExtractList = new IntArrayList();
                for (int i = 0; i < extractList.size(); i++) {
                    fastutilExtractList.add(extractList.get(i));
                }
                fastutilHourExtractMap.put(hour, fastutilExtractList);
            }
            fastutilDateHourExtractMap.put(date, fastutilHourExtractMap);
        }
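
The element-by-element copy loop above can likely be collapsed: `IntArrayList` has a constructor that accepts a `Collection` of boxed `Integer`s (verify against your fastutil version). A sketch:

```java
// Equivalent to the inner copy loop: unboxes the List<Integer>
// into a primitive int[]-backed list in one call.
IntList fastutilExtractList = new IntArrayList(extractList);
```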

7. Tune the locality wait time

SparkConf conf = new SparkConf()
                        .setAppName(Constants.SPARK_APP_NAME_SESSION)
                        .setMaster("local")
                        .set("spark.default.parallelism", "500")
                        .set("spark.locality.wait", "10")
                        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
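
`spark.locality.wait` is the shared default; Spark also accepts per-level overrides, so each locality downgrade can wait a different amount of time. A hedged config sketch (the values are illustrative, and how a bare number is interpreted as a duration depends on the Spark version, so check yours):

```java
SparkConf conf = new SparkConf()
        .set("spark.locality.wait", "10")          // default wait for every locality level
        .set("spark.locality.wait.process", "10")  // before downgrading PROCESS_LOCAL -> NODE_LOCAL
        .set("spark.locality.wait.node", "8")      // before downgrading NODE_LOCAL -> RACK_LOCAL
        .set("spark.locality.wait.rack", "5");     // before downgrading RACK_LOCAL -> ANY
```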
