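The previous post covered how to use streams and collectors in Java 8. This one records the problems and pitfalls you may run into when writing a custom collector.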
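The Collector interface defines an enum, Characteristics, with three values: CONCURRENT, UNORDERED, and IDENTITY_FINISH. A custom collector has to implement the characteristics() method and return a Set<Characteristics>.
The relevant part of the class is shown here: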
enum Characteristics {
/**
* Indicates that this collector is concurrent, meaning that
* the result container can support the accumulator function being
* called concurrently with the same result container from multiple
* threads.
*
* If a {@code CONCURRENT} collector is not also {@code UNORDERED},
* then it should only be evaluated concurrently if applied to an
* unordered data source.
*/
CONCURRENT,
/**
* Indicates that the collection operation does not commit to preserving
* the encounter order of input elements. (This might be true if the
* result container has no intrinsic order, such as a {@link Set}.)
*/
UNORDERED,
/**
* Indicates that the finisher function is the identity function and
* can be elided. If set, it must be the case that an unchecked cast
* from A to R will succeed.
*/
IDENTITY_FINISH
}
The code is the best evidence.
See the last return statement of the collect method in the ReferencePipeline class.
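One way to observe from the outside what IDENTITY_FINISH changes is the small sketch below. It is not the JDK source. The class IdentityFinishDemo and the method finisherCalls are hypothetical names introduced for illustration. The sketch builds the same list collector twice with Collector.of, once with and once without IDENTITY_FINISH, and counts how often the finisher runs. The accumulation type and the result type are both List<String>, so the unchecked cast from A to R required by the Javadoc is valid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class IdentityFinishDemo {

    /** Collects three strings and returns how many times the finisher was applied. */
    static int finisherCalls(boolean declareIdentityFinish) {
        AtomicInteger calls = new AtomicInteger();
        Collector.Characteristics[] flags = declareIdentityFinish
                ? new Collector.Characteristics[] {Collector.Characteristics.IDENTITY_FINISH}
                : new Collector.Characteristics[0];
        Collector<String, List<String>, List<String>> collector = Collector.of(
                ArrayList::new,
                List::add,
                (left, right) -> { left.addAll(right); return left; },
                // The finisher is an identity function that also counts its invocations.
                list -> { calls.incrementAndGet(); return list; },
                flags);
        Stream.of("a", "b", "c").collect(collector);
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("without IDENTITY_FINISH: " + finisherCalls(false));
        System.out.println("with IDENTITY_FINISH: " + finisherCalls(true));
    }
}
```

With the stream implementation in the JDK, the finisher runs once in the first case and is skipped entirely in the second, because the container itself is returned as the result.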
Here is an example that verifies point 3 of the description:
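import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class MySetCollector2<T> implements Collector<T, Set<T>, Map<T, T>> {
    // Records every container handed to the accumulator.
    // Synchronized because several threads append to it.
    private final List<Set<T>> setList = Collections.synchronizedList(new ArrayList<>());

    @Override
    public Supplier<Set<T>> supplier() {
        System.out.println("supplier invoked...");
        // HashSet is not thread-safe. It is used here only to show that one container is shared;
        // a real CONCURRENT collector needs a concurrent container.
        return HashSet::new;
    }

    @Override
    public BiConsumer<Set<T>, T> accumulator() {
        System.out.println("accumulator invoked...");
        return (set, item) -> {
            set.add(item);
            setList.add(set);
            // The set content is not printed: iterating a HashSet while other threads
            // modify it can throw ConcurrentModificationException.
            System.out.println(Thread.currentThread().getName() + ": " + item + ", current set content is :not display. is the same set? " + isSameAddress(setList));
        };
    }

    @Override
    public BinaryOperator<Set<T>> combiner() {
        System.out.println("combiner invoked...");
        return (set1, set2) -> {
            set1.addAll(set2);
            System.out.println("really to combin... ");
            return set1;
        };
    }

    @Override
    public Function<Set<T>, Map<T, T>> finisher() {
        System.out.println("finisher invoked...");
        return (set) -> {
            Map<T, T> map = new HashMap<>();
            set.forEach(item -> map.put(item, item));
            return map;
        };
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.UNORDERED, Characteristics.CONCURRENT));
    }

    /**
     * Checks whether every recorded Set is the same object.
     *
     * @param list the containers seen by the accumulator so far
     * @return true if all entries refer to one object
     */
    private boolean isSameAddress(List<Set<T>> list) {
        for (int i = 0; i < list.size() - 1; i++) {
            for (int j = i + 1; j < list.size(); j++) {
                if (list.get(i) != list.get(j)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("hello", "world", "hello world", "a", "b", "c");
        Map<String, String> result = list.parallelStream().collect(new MySetCollector2<>());
        System.out.println(result);
    }
}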
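The characteristics method above returns both UNORDERED and CONCURRENT, and collect is invoked on a parallel stream.
The isSameAddress method checks whether the result container is the same object on every call.
The output is as follows: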
supplier invoked...
accumulator invoked...
main: a, current set content is :not display. is the same set? true
ForkJoinPool.commonPool-worker-1: world, current set content is :not display. is the same set? true
main: c, current set content is :not display. is the same set? true
main: b, current set content is :not display. is the same set? true
ForkJoinPool.commonPool-worker-2: hello, current set content is :not display. is the same set? true
ForkJoinPool.commonPool-worker-1: hello world, current set content is :not display. is the same set? true
finisher invoked...
{a=a, b=b, world=world, c=c, hello world=hello world, hello=hello}
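In the output above every line reports the same set and the combiner function never runs, which is consistent with a single container shared by all threads. That also means the container must tolerate concurrent writes, which HashSet does not guarantee. The following is a minimal sketch of a collector that declares CONCURRENT and backs it with a thread-safe set from ConcurrentHashMap.newKeySet(). The class ConcurrentSetCollectorDemo, the method toConcurrentSet, and the counter containersCreated are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;

public class ConcurrentSetCollectorDemo {

    // Counts how many result containers the supplier creates.
    static final AtomicInteger containersCreated = new AtomicInteger();

    /** A set collector whose container is safe to share between threads. */
    static <T> Collector<T, Set<T>, Set<T>> toConcurrentSet() {
        // The four-argument Collector.of adds IDENTITY_FINISH on its own.
        return Collector.of(
                () -> {
                    containersCreated.incrementAndGet();
                    return ConcurrentHashMap.<T>newKeySet();
                },
                Set::add,
                (left, right) -> { left.addAll(right); return left; },
                Collector.Characteristics.CONCURRENT,
                Collector.Characteristics.UNORDERED);
    }

    /** Resets the counter and collects the words with a parallel stream. */
    static Set<String> collectInParallel(List<String> words) {
        containersCreated.set(0);
        return words.parallelStream().collect(toConcurrentSet());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "world", "hello world", "a", "b", "c");
        Set<String> result = collectInParallel(words);
        System.out.println(result.size() + " elements, containers created: " + containersCreated.get());
    }
}
```

The design choice here is to keep the CONCURRENT promise honest: the flag tells the stream it may call the accumulator from several threads on one container, so the container has to be one that supports that.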
Change the code slightly to verify point 5 of the description:
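import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class MySetCollector2<T> implements Collector<T, Set<T>, Map<T, T>> {
    // Records every container handed to the accumulator.
    // Synchronized because several threads append to it.
    private final List<Set<T>> setList = Collections.synchronizedList(new ArrayList<>());

    @Override
    public Supplier<Set<T>> supplier() {
        System.out.println("supplier invoked...");
        return HashSet::new;
    }

    @Override
    public BiConsumer<Set<T>, T> accumulator() {
        System.out.println("accumulator invoked...");
        return (set, item) -> {
            //System.out.println("accumulator thread:" + Thread.currentThread().getName());
            set.add(item);
            setList.add(set);
            System.out.println(Thread.currentThread().getName() + ": " + item + ", current set content is :" + set + "" + "is the same set? " + isSameAddress(setList));
        };
    }

    @Override
    public BinaryOperator<Set<T>> combiner() {
        System.out.println("combiner invoked...");
        return (set1, set2) -> {
            set1.addAll(set2);
            System.out.println("really to combin... ");
            return set1;
        };
    }

    @Override
    public Function<Set<T>, Map<T, T>> finisher() {
        System.out.println("finisher invoked...");
        return (set) -> {
            Map<T, T> map = new HashMap<>();
            set.forEach(item -> map.put(item, item));
            return map;
        };
    }

    @Override
    public Set<Characteristics> characteristics() {
        // CONCURRENT is no longer declared, so each subtask gets its own container.
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.UNORDERED));
    }

    /**
     * Checks whether every recorded Set is the same object.
     *
     * @param list the containers seen by the accumulator so far
     * @return true if all entries refer to one object
     */
    private boolean isSameAddress(List<Set<T>> list) {
        for (int i = 0; i < list.size() - 1; i++) {
            for (int j = i + 1; j < list.size(); j++) {
                if (list.get(i) != list.get(j)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("hello", "world", "hello world", "a", "b", "c");
        Map<String, String> result = list.parallelStream().collect(new MySetCollector2<>());
        System.out.println(result);
    }
}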
This time the set returned by the characteristics method does not contain CONCURRENT, and collect is again invoked on a parallel stream.
The output is as follows:
supplier invoked...
accumulator invoked...
combiner invoked...
main: a, current set content is :[a]is the same set? false
main: c, current set content is :[c]is the same set? false
main: b, current set content is :[b]is the same set? false
really to combin...
really to combin...
ForkJoinPool.commonPool-worker-3: hello, current set content is :[hello]is the same set? false
ForkJoinPool.commonPool-worker-1: world, current set content is :[world]is the same set? false
ForkJoinPool.commonPool-worker-2: hello world, current set content is :[hello world]is the same set? false
really to combin...
really to combin...
really to combin...
finisher invoked...
{a=a, b=b, world=world, c=c, hello=hello, hello world=hello world}
The output shows that the program now works on several mutable result containers, and that the function returned by combiner is actually executed.
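As for UNORDERED: add it when the data can be treated as unordered, for example a Set. Otherwise do not add it, because this characteristic makes no promise that the stored order matches the order in which the elements were encountered.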
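A quick way to get a feel for these flags is to ask the JDK's own collectors what they declare. The sketch below is only an inspection helper. The class BuiltInCharacteristicsDemo and its methods are hypothetical names, and the collectors it queries are the standard ones from java.util.stream.Collectors.

```java
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class BuiltInCharacteristicsDemo {

    /** Returns true if the collector declares the given characteristic. */
    static boolean has(Collector<?, ?, ?> collector, Collector.Characteristics flag) {
        return collector.characteristics().contains(flag);
    }

    static boolean toListIsUnordered() {
        return has(Collectors.toList(), Collector.Characteristics.UNORDERED);
    }

    static boolean toSetIsUnordered() {
        return has(Collectors.toSet(), Collector.Characteristics.UNORDERED);
    }

    static boolean toConcurrentMapIsConcurrent() {
        Collector<String, ?, ConcurrentMap<String, String>> collector =
                Collectors.toConcurrentMap(s -> s, s -> s);
        return has(collector, Collector.Characteristics.CONCURRENT);
    }

    public static void main(String[] args) {
        System.out.println("toList declares UNORDERED: " + toListIsUnordered());
        System.out.println("toSet declares UNORDERED: " + toSetIsUnordered());
        System.out.println("toConcurrentMap declares CONCURRENT: " + toConcurrentMapIsConcurrent());
    }
}
```

toList keeps encounter order and so does not declare UNORDERED, while toSet does, which matches the rule of thumb above.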
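In my view, the Characteristics enum expresses a contract. Once a collector declares one of these values, the program assumes the contract holds and runs accordingly. Developers therefore need to understand which scenario each value is meant for; otherwise the program will produce incorrect results or throw an exception.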
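As a concrete case of a broken contract, the sketch below declares IDENTITY_FINISH on a collector whose accumulation type (Set) is not its result type (Map). The class BrokenContractDemo and its methods are hypothetical and exist only to illustrate the failure. Because the flag claims that the container can be cast to the result type, the stream hands back the HashSet unchanged, and the assignment to a Map variable fails.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class BrokenContractDemo {

    /** Accumulates into a Set, promises a Map, and wrongly declares IDENTITY_FINISH. */
    static Collector<String, Set<String>, Map<String, String>> misdeclared() {
        return Collector.of(
                HashSet::new,
                Set::add,
                (left, right) -> { left.addAll(right); return left; },
                set -> {
                    Map<String, String> map = new HashMap<>();
                    set.forEach(item -> map.put(item, item));
                    return map;
                },
                // Wrong on purpose: a Set cannot be cast to a Map.
                Collector.Characteristics.IDENTITY_FINISH);
    }

    /** Returns true if collecting with the misdeclared collector throws ClassCastException. */
    static boolean failsWithClassCastException() {
        try {
            Map<String, String> result = Stream.of("a", "b").collect(misdeclared());
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ClassCastException thrown: " + failsWithClassCastException());
    }
}
```

Removing the IDENTITY_FINISH flag lets the finisher run and the same collector returns a proper Map.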