Common Spark RDD operators - reduceByKey

def reduceByKey(func: (V, V) => V): RDD[(K, V)]

def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]

def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]

This function merges the values for each key K in an RDD[(K, V)], combining them with the supplied reduce function func.

The numPartitions parameter specifies the number of partitions of the resulting RDD;

The partitioner parameter specifies the partitioner used for the resulting RDD (see the sketch after the example below).

reduceByKey operator example

import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function2;

import scala.Tuple2;

List<Tuple2<String, Integer>> list = Arrays.asList(
        new Tuple2<>("w1", 1),
        new Tuple2<>("w2", 2),
        new Tuple2<>("w3", 3),
        new Tuple2<>("w2", 22),
        new Tuple2<>("w1", 11)
);

// javaSparkContext is an already-created JavaSparkContext
JavaPairRDD<String, Integer> pairRdd = javaSparkContext.parallelizePairs(list);

// Sum the values of each key; the resulting RDD has 2 partitions
JavaPairRDD<String, Integer> result = pairRdd.reduceByKey(new Function2<Integer, Integer, Integer>() {
    @Override
    public Integer call(Integer integer, Integer integer2) throws Exception {
        return integer + integer2;
    }
}, 2);

System.out.println(result.collect());
// Result: [(w3,3), (w1,12), (w2,24)]
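
The example above covers the func and numPartitions overloads. As a minimal sketch of the third signature, the same aggregation can be written with an explicit partitioner; this assumes Spark's built-in HashPartitioner and uses a Java 8 lambda in place of the anonymous Function2:

// requires: import org.apache.spark.HashPartitioner;

// Same per-key sum, but the result RDD's partitioning is set by an explicit Partitioner
JavaPairRDD<String, Integer> resultWithPartitioner =
        pairRdd.reduceByKey(new HashPartitioner(2), (a, b) -> a + b);

System.out.println(resultWithPartitioner.collect());
// Prints the same per-key sums, e.g. [(w3,3), (w1,12), (w2,24)] (ordering may vary)

Passing a partitioner this way is mainly useful when a downstream operation on the same keys (such as a join) can reuse the partitioning and avoid an extra shuffle.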

 
