作为JDK8新特性之一,stream引入了许多新的方法,reduce就是其中一种。
Reduction 操作
首先来看什么是reduce。在官方文档中有详细的说明【1】,说的很详细,这里全文引用。
A reduction operation (also called a fold) takes a sequence of input elements and combines them into a single summary result by repeated application of a combining operation, such as finding the sum or maximum of a set of numbers, or accumulating elements into a list. The streams classes have multiple forms of general reduction operations, called
reduce()
andcollect()
, as well as multiple specialized reduction forms such assum()
,max()
, orcount()
.
reduction 操作(也称为fold),通过反复的对一个输入序列的元素进行某种组合操作(如对数的集合求和、求最大值,或者将所有元素放入一个列表),最终将其组合为一个单一的概要信息。stream类包含多种形式的通用reduction操作,如reduce和collect,以及其他多种专用reduction形式:sum,max或者count。
Of course, such operations can be readily implemented as simple sequential loops, as in:
int sum = 0;
for (int x : numbers) {
sum += x;
}
当然,这种操作可以很容易的以简单序列循环的方式实现:
int sum = 0;
for (int x : numbers) {
sum += x;
}
However, there are good reasons to prefer a reduce operation over a mutative accumulation such as the above. Not only is a reduction "more abstract" -- it operates on the stream as a whole rather than individual elements -- but a properly constructed reduce operation is inherently parallelizable, so long as the function(s) used to process the elements are associative and stateless.
然而,reduce 操作有其优势。不仅仅是因为redution“更加抽象”——它将stream作为一个整体进行操作而不是作为多个独立的个体,而是因为合理构造的reduction操作天生就是支持并行执行的,只要处理元素的函数满足结合律并且是无状态的的。
For example, given a stream of numbers for which we want to find the sum, we can write:
int sum = numbers.stream().reduce(0, (x,y) -> x+y);
or:
int sum = numbers.stream().reduce(0, Integer::sum);
These reduction operations can run safely in parallel with almost no modification:
int sum = numbers.parallelStream().reduce(0, Integer::sum);
例如,给定一个数的stream,可以对其进行求和:
int sum = numbers.stream().reduce(0, (x,y) -> x+y);
或者:
int sum = numbers.stream().reduce(0, Integer::sum);
这些reduction操作可以安全的并发执行而几乎不需要任何修改。
int sum = numbers.parallelStream().reduce(0, Integer::sum);
Reduction parallellizes well because the implementation can operate on subsets of the data in parallel, and then combine the intermediate results to get the final correct answer. (Even if the language had a "parallel for-each" construct, the mutative accumulation approach would still required the developer to provide thread-safe updates to the shared accumulating variable
sum
, and the required synchronization would then likely eliminate any performance gain from parallelism.) Usingreduce()
instead removes all of the burden of parallelizing the reduction operation, and the library can provide an efficient parallel implementation with no additional synchronization required.
Reduction之所以能很好的支持并行,是因为其实现可以并行的操作数据的子集,然后再将中间结果组合起来以得到最终的正确结果。(就算语言内包含了“parallel for-each”的结构,“mutative accumulation”方案下(前面看到的for loop),仍然需要开发者为共享的累计变量sum提供线程安全的保证,而其所需的同步很可能会抵消并行带来的好处)。使用reduce()则可以去除reduction操作在并发上的负担,库可以提供有效的并行实现而不需要任何同步。
The "widgets" examples shown earlier shows how reduction combines with other operations to replace for loops with bulk operations. If
widgets
is a collection ofWidget
objects, which have agetWeight
method, we can find the heaviest widget with:
OptionalInt heaviest = widgets.parallelStream()
.mapToInt(Widget::getWeight)
.max();
前面的“widgets”例子展示了reduction是如何和其他操作合作,从而将for循环替换为块操作。如果widgets是一个widget的集合,而widget有一个getWidget方法,以下代码可以帮我们找到最重的widget:
OptionalInt heaviest = widgets.parallelStream()
.mapToInt(Widget::getWeight)
.max();
In its more general form, a
reduce
operation on elements of typeyielding a result of type
requires three parameters:
U reduce(U identity,
BiFunction accumulator,
BinaryOperator combiner);
更为一般的形式,一个运行于
U reduce(U identity,
BiFunction accumulator,
BinaryOperator combiner);
Here, the identity element is both an initial seed value for the reduction and a default result if there are no input elements. The accumulator function takes a partial result and the next element, and produces a new partial result. The combiner function combines two partial results to produce a new partial result. (The combiner is necessary in parallel reductions, where the input is partitioned, a partial accumulation computed for each partition, and then the partial results are combined to produce a final result.)
这里,identity不仅是reduction的初始值,而且在没有输入元素的情况下,它还是默认的结果。accumulator函数接受一个局部结果和下一个元素,产生一个新的局部结果。combiner函数将两个局部结果组合起来生成一个新的局部结果。(combiner在并行reduction中是必须的,在并行reduction中,输入是partitioned的,对于每个partition,由局部累计函数计算其结果,然后这些局部结果会被组合以生成最终结果。)
More formally, the
identity
value must be an identity for the combiner function. This means that for allu
,combiner.apply(identity, u)
is equal tou
. Additionally, thecombiner
function must be associative and must be compatible with theaccumulator
function: for allu
andt
,combiner.apply(u, accumulator.apply(identity, t))
must beequals()
toaccumulator.apply(u, t)
.
更为正式说法,identity 的值对于combiner来说,必须是identify,意味着对于所有的u,combiner.apply(identity, u) 必须等于 u。此外,combiner函数必须满足结合律,同时必须和accumulator函数兼容,即对于所有的u和t,combiner.apply(u, accumulator.apply(identity, t)) 和accumulator.apply(u, t)的结果必须是equals()的。
The three-argument form is a generalization of the two-argument form, incorporating a mapping step into the accumulation step. We could re-cast the simple sum-of-weights example using the more general form as follows:
int sumOfWeights = widgets.stream()
.reduce(0,
(sum, b) -> sum + b.getWeight())
Integer::sum);
三参数重载形式是两参数重载形式的一般化,它在累积函数的基础上,加入了Mapping的步骤。可以将sum-of-weights用一般化的形式重写:
int sumOfWeights = widgets.stream()
.reduce(0,
(sum, b) -> sum + b.getWeight())
Integer::sum);
though the explicit map-reduce form is more readable and therefore should usually be preferred. The generalized form is provided for cases where significant work can be optimized away by combining mapping and reducing into a single function.
尽管map-reduce的显式形式因为可读性更强而更为常用,一般化的形式仍然有其用武之地,即通过将mapping和reducing组合为一个函数,优化繁重的工作。
reduce的三种形式
reduce有三种重载形式:
Optional reduce(BinaryOperator accumulator)
T reduce(T identity, BinaryOperator accumulator)
U reduce (U identity,
BiFunction accumulator,
BinaryOperator combiner)
简化形式
Optional reduce(BinaryOperator accumulator)
该方法接受一个BinaryOperator
型的变量,返回Optional
的结果。
BinaryOperator
是一个函数式接口【2】,代表一个在两个操作数上执行的操作,生成一个和操作数类型相同的结果。
Optional
是JDK8的有一个特性【3】,这是一个容器对象,可以包含或者不包含一个非空的值。get()方法将获取其包含的值,如果其不包含一个非空的值,get()将抛出异常。
private static void testReduce1Empty(){
List list = new ArrayList<>();
System.out.println(list.stream().reduce((a,b)->a+b)); // Optional.empty
}
private static void testReduce1EmptyGet(){
List list = new ArrayList<>();
System.out.println(list.stream().reduce((a,b)->a+b).get());
// Exception in thread "main" java.util.NoSuchElementException:
//No value present
// https://docs.oracle.com/javase/8/docs/api/java/util/Optional.html
}
应用举例1
private static void testReduce1(){
List list = new ArrayList<>();
list.add(1);
System.out.println(list.stream().reduce((a,b)->a+b)); // Optional[1]
}
private static void testReduce1Get(){
List list = new ArrayList<>();
list.add(1);
System.out.println(list.stream().reduce((a,b)->a+b).get()); // 1
}
类库中的具体实现方法
private static void testTransferSum(){
List> lists = new ArrayList<>();
List list1 = new ArrayList<>();
list1.add(1);
List list2 = new ArrayList<>();
list2.add(2);
lists.add(list1);
lists.add(list2);
int rel = lists.stream().mapToInt(x->x.get(0)).sum();
System.out.println(rel);
}
private static void testTransferSumEmpty(){
List> lists = new ArrayList<>();
int rel = lists.stream().mapToInt(x->x.get(0)).sum();
System.out.println(rel);
}
两参数重载形式
T reduce(T identity, BinaryOperator accumulator)
相比较于简化形式,两参数重载形式主要有两点增强:
- 自动处理stream为空的情况
private static void testTwoParaSumEmpty(){
List list = new ArrayList<>();
int rel = list.stream().reduce(1,Integer::sum);
System.out.println(rel);
}
- 自定义初始值
private static void testTwoParaSum(){
List list = new ArrayList<>();
list.add(1);
list.add(2);
int rel = list.stream().reduce(1,Integer::sum);
System.out.println(rel);
}
三参数重载形式
U reduce (U identity,
BiFunction accumulator,
BinaryOperator combiner)
三参数重载形式引入了一个新的变量combiner,主要提供了两项增强:
- 可以返回同stream内元素类型不同的结果。
- 在并行条件下和parallelStream一起使用,提高效率。
Interface BiFunction
consumer的类型是 Interface BiFunction
T - 函数第一个参数的类型(实际为U,accumulator函数返回结果的类型)
U - 函数第二个参数的类型(实际为T,stream内元素的类型)
R - 函数返回值的类型 (这里实际为U)
实例
private static void testThreeParaSum(){
List> lists = new ArrayList<>();
List list1 = new ArrayList<>();
list1.add(1);
List list2 = new ArrayList<>();
list2.add(2);
lists.add(list1);
lists.add(list2);
int rel = lists.stream().reduce(1, (a,b)->a + b.get(0), (x,y)->x+y);
System.out.println(rel);
}
在非并行的情况下,该代码没有问题,它解决了stream中的元素类型和返回结果类型不同的问题,但是其显然不满足文档中对 combiner的定义:identity 的值对于combiner来说,必须是identify,意味着对于所有的u,combiner.apply(identity, u) 必须等于 u。
combiner的限制
private static void testThreeParaSumParallel(){
List> lists = new ArrayList<>();
List list1 = new ArrayList<>();
list1.add(1);
List list2 = new ArrayList<>();
list2.add(2);
lists.add(list1);
lists.add(list2);
int rel = lists.parallelStream().reduce(1, (a,b)->a + b.get(0), (x,y)->x+y);
System.out.println(rel); // return 5 instead of 4
}
而在非并行的情况下,conbiner其实并没有发挥作用:
private static void testThreeParaSumNonSense(){
List> lists = new ArrayList<>();
List list1 = new ArrayList<>();
list1.add(1);
List list2 = new ArrayList<>();
list2.add(2);
lists.add(list1);
lists.add(list2);
int rel = lists.stream().reduce(1, (a,b)->a + b.get(0), (x,y)->0);
System.out.println(rel); // return 4
}
【1】Reduction
【2】BinaryOperator
【3】Optional
【4】BiFunction