Java 8 Study Notes 3: Pitfalls of Streams and Custom Collectors

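 The previous post covered how to use streams and collectors in Java 8. This one records the problems and pitfalls you may run into when writing a custom collector.

 The Collector interface defines an enum, Characteristics, with three values: CONCURRENT, UNORDERED, and IDENTITY_FINISH. A custom collector has to implement the characteristics method and return a Set of these values. So what should you watch out for when returning them?

 Here is the relevant part of the class: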

enum Characteristics {
    /**
     * Indicates that this collector is concurrent, meaning that
     * the result container can support the accumulator function being
     * called concurrently with the same result container from multiple
     * threads.
     *
     * <p>If a {@code CONCURRENT} collector is not also {@code UNORDERED},
     * then it should only be evaluated concurrently if applied to an
     * unordered data source.
     */
    CONCURRENT,

    /**
     * Indicates that the collection operation does not commit to preserving
     * the encounter order of input elements.  (This might be true if the
     * result container has no intrinsic order, such as a {@link Set}.)
     */
    UNORDERED,

    /**
     * Indicates that the finisher function is the identity function and
     * can be elided.  If set, it must be the case that an unchecked cast
     * from A to R will succeed.
     */
    IDENTITY_FINISH
}
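To get a feel for how these flags are used in practice, it can help to look at what a few built-in collectors report. The following is only a small sketch: the class name BuiltInCharacteristics and the helper flagsOf are made up for illustration, and the comments describe what the JDK's Collectors implementation is expected to report rather than anything guaranteed by the Collectors Javadoc.

```java
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.Collectors;

public class BuiltInCharacteristics {

    /** Hypothetical helper: returns the flags a collector declares. */
    static Set<Characteristics> flagsOf(Collector<?, ?, ?> collector) {
        return collector.characteristics();
    }

    public static void main(String[] args) {
        // toList accumulates into a List and returns it as is, so IDENTITY_FINISH is expected
        System.out.println("toList:          " + flagsOf(Collectors.toList()));

        // toSet has no intrinsic order, so UNORDERED is expected as well
        System.out.println("toSet:           " + flagsOf(Collectors.toSet()));

        // toConcurrentMap accumulates into one shared concurrent map, so CONCURRENT is expected
        System.out.println("toConcurrentMap: "
                + flagsOf(Collectors.<String, String, String>toConcurrentMap(s -> s, s -> s)));

        // joining needs a real finisher (StringBuilder to String), so no flags are expected
        System.out.println("joining:         " + flagsOf(Collectors.joining()));
    }
}
```

A custom collector should make the same kind of decision for each flag: declare it only when the container and the functions really behave that way.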

1. IDENTITY_FINISH

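  1. When the Set contains IDENTITY_FINISH, the finisher function is not executed. Once this value is present, the framework treats the mutable result container as the final result, so there is nothing left to convert.
  2. When the Set contains IDENTITY_FINISH, it is up to us to guarantee that the intermediate container type really is the final result type. Otherwise the unchecked cast fails and the code throws a ClassCastException.

The code is the best evidence. The last return statement of the collect method in the ReferencePipeline class casts the container to the result type when IDENTITY_FINISH is present and applies the finisher otherwise:
[Figure 1: the final return statement of ReferencePipeline.collect (image not available)]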
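The two points above can be reproduced with a small experiment. This is a sketch under assumptions, not code from the JDK: the class IdentityFinishDemo and the helpers setToMap and failsWithClassCast are hypothetical names. The collector accumulates into a Set but promises a Map as its result, so declaring IDENTITY_FINISH on it is deliberately wrong.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.Stream;

public class IdentityFinishDemo {

    /** Hypothetical helper: a Set-to-Map collector that declares whatever flags it is given. */
    static Collector<String, Set<String>, Map<String, String>> setToMap(Characteristics... flags) {
        Supplier<Set<String>> supplier = HashSet::new;
        BiConsumer<Set<String>, String> accumulator = Set::add;
        BinaryOperator<Set<String>> combiner = (left, right) -> {
            left.addAll(right);
            return left;
        };
        Function<Set<String>, Map<String, String>> finisher = set -> {
            Map<String, String> map = new HashMap<>();
            set.forEach(item -> map.put(item, item));
            return map;
        };
        return Collector.of(supplier, accumulator, combiner, finisher, flags);
    }

    /** Returns true if collecting with the given flags ends in a ClassCastException. */
    static boolean failsWithClassCast(Characteristics... flags) {
        try {
            // The compiler inserts a cast to Map here; it fails if collect hands back the raw Set
            Map<String, String> result = Stream.of("a", "b").collect(setToMap(flags));
            System.out.println("collected: " + result);
            return false;
        } catch (ClassCastException e) {
            System.out.println("failed: " + e.getClass().getSimpleName());
            return true;
        }
    }

    public static void main(String[] args) {
        // Without the flag the finisher runs and a Map comes back
        System.out.println(failsWithClassCast());
        // With the flag the finisher is skipped and the Set is returned as if it were a Map
        System.out.println(failsWithClassCast(Characteristics.IDENTITY_FINISH));
    }
}
```

Note that the exception surfaces at the assignment in the calling code, not inside collect, because the cast inside collect is unchecked.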

2. CONCURRENT

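  1. When the Set contains CONCURRENT, the framework assumes that multiple threads (provided the stream is parallel) can call the accumulator function concurrently and still produce a correct result.
  2. When the Set contains CONCURRENT, the threads of a parallel stream all operate on the same result container, so the developer is responsible for making concurrent access to that container correct.
  3. When the Set contains CONCURRENT (together with UNORDERED, or on an unordered source, as the Javadoc above requires), the function returned by combiner is never called, even on a parallel stream. All threads work on the same container, so there is nothing to merge.
  4. Because of point 2, do not do extra work inside the accumulator function when using a parallel stream, such as printing (that is, iterating over) the set. Doing so may throw a ConcurrentModificationException ("it is not generally permissible for one thread to modify a Collection while another thread is iterating over it").
  5. When the Set does **not** contain CONCURRENT, the collect call on a parallel stream works on several separate result containers, and the function returned by combiner is called to merge them. Compare this with point 3.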
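Points 3 and 5, together with the Javadoc note about UNORDERED, can also be checked by counting calls rather than reading log output. The sketch below assumes a thread-safe container (ConcurrentHashMap.newKeySet()); the class ConcurrentPathDemo and the helper countCalls are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class ConcurrentPathDemo {

    /** Hypothetical helper: collects the stream and returns {supplier calls, combiner calls}. */
    static int[] countCalls(Stream<Integer> stream, Characteristics... flags) {
        AtomicInteger supplierCalls = new AtomicInteger();
        AtomicInteger combinerCalls = new AtomicInteger();

        Supplier<Set<Integer>> supplier = () -> {
            supplierCalls.incrementAndGet();
            return ConcurrentHashMap.newKeySet(); // safe to share between threads
        };
        BiConsumer<Set<Integer>, Integer> accumulator = Set::add;
        BinaryOperator<Set<Integer>> combiner = (left, right) -> {
            combinerCalls.incrementAndGet();
            left.addAll(right);
            return left;
        };

        stream.collect(Collector.of(supplier, accumulator, combiner, flags));
        return new int[] { supplierCalls.get(), combinerCalls.get() };
    }

    public static void main(String[] args) {
        List<Integer> source = IntStream.range(0, 1000).boxed().collect(Collectors.toList());

        // CONCURRENT + UNORDERED: one shared container, combiner not used
        int[] both = countCalls(source.parallelStream(),
                Characteristics.CONCURRENT, Characteristics.UNORDERED);

        // CONCURRENT alone on an ordered source: several containers merged by the combiner
        int[] orderedSource = countCalls(source.parallelStream(), Characteristics.CONCURRENT);

        // CONCURRENT alone on an unordered stream: one shared container again
        int[] unorderedSource = countCalls(source.parallelStream().unordered(),
                Characteristics.CONCURRENT);

        System.out.println("CONCURRENT+UNORDERED:          " + Arrays.toString(both));
        System.out.println("CONCURRENT, ordered source:    " + Arrays.toString(orderedSource));
        System.out.println("CONCURRENT, unordered source:  " + Arrays.toString(unorderedSource));
    }
}
```

The exact counts in the middle case depend on how the stream is split, so only "more than one container, at least one merge" should be relied on there.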

Here is an example that verifies point 3:

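import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class MySetCollector2<T> implements Collector<T, Set<T>, Map<T, T>> {

    // Every container passed to the accumulator; synchronized because several threads append to it
    private final List<Set<T>> setList = Collections.synchronizedList(new ArrayList<>());

    @Override
    public Supplier<Set<T>> supplier() {
        System.out.println("supplier invoked...");
        // CONCURRENT means all threads share this container, so it has to be thread-safe
        return ConcurrentHashMap::newKeySet;
    }

    @Override
    public BiConsumer<Set<T>, T> accumulator() {
        System.out.println("accumulator invoked...");
        return (set, item) -> {
            set.add(item);
            setList.add(set);
            // The set content is not printed here (see point 4)
            System.out.println(Thread.currentThread().getName() + ": " + item + ", current set content is :not display. is the same set? " + isSameAddress(setList));
        };
    }

    @Override
    public BinaryOperator<Set<T>> combiner() {
        System.out.println("combiner invoked...");
        return (set1, set2) -> {
            set1.addAll(set2);
            System.out.println("really to combin... ");
            return set1;
        };
    }

    @Override
    public Function<Set<T>, Map<T, T>> finisher() {
        System.out.println("finisher invoked...");
        return set -> {
            Map<T, T> map = new HashMap<>();
            set.forEach(item -> map.put(item, item));
            return map;
        };
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.UNORDERED, Characteristics.CONCURRENT));
    }

    /**
     * Checks whether every recorded Set is the same object.
     *
     * @param list the containers seen by the accumulator so far
     * @return true if all entries refer to one and the same Set instance
     */
    private boolean isSameAddress(List<Set<T>> list) {
        for (int i = 0; i < list.size() - 1; i++) {
            for (int j = i + 1; j < list.size(); j++) {
                if (list.get(i) != list.get(j)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("hello", "world", "hello world", "a", "b", "c");
        Map<String, String> result = list.parallelStream().collect(new MySetCollector2<>());
        System.out.println(result);
    }
}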

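 In the code above, the characteristics method returns both UNORDERED and CONCURRENT, and collect is called on a parallel stream.
 The isSameAddress method checks whether the result container is always the same object.

The output looks like this: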

supplier invoked...
accumulator invoked...
main: a, current set content is :not display. is the same set? true
ForkJoinPool.commonPool-worker-1: world, current set content is :not display. is the same set? true
main: c, current set content is :not display. is the same set? true
main: b, current set content is :not display. is the same set? true
ForkJoinPool.commonPool-worker-2: hello, current set content is :not display. is the same set? true
ForkJoinPool.commonPool-worker-1: hello world, current set content is :not display. is the same set? true
finisher invoked...
{a=a, b=b, world=world, c=c, hello world=hello world, hello=hello}

Now change the code slightly to verify point 5:

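import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class MySetCollector2<T> implements Collector<T, Set<T>, Map<T, T>> {

    // Every container passed to the accumulator; synchronized because several threads append to it
    private final List<Set<T>> setList = Collections.synchronizedList(new ArrayList<>());

    @Override
    public Supplier<Set<T>> supplier() {
        System.out.println("supplier invoked...");
        // Without CONCURRENT each task gets its own container, so a plain HashSet is enough
        return HashSet::new;
    }

    @Override
    public BiConsumer<Set<T>, T> accumulator() {
        System.out.println("accumulator invoked...");
        return (set, item) -> {
            set.add(item);
            setList.add(set);
            // Printing the set is safe here because no other thread touches this container
            System.out.println(Thread.currentThread().getName() + ": " + item + ", current set content is :" + set + "is the same set? " + isSameAddress(setList));
        };
    }

    @Override
    public BinaryOperator<Set<T>> combiner() {
        System.out.println("combiner invoked...");
        return (set1, set2) -> {
            set1.addAll(set2);
            System.out.println("really to combin... ");
            return set1;
        };
    }

    @Override
    public Function<Set<T>, Map<T, T>> finisher() {
        System.out.println("finisher invoked...");
        return set -> {
            Map<T, T> map = new HashMap<>();
            set.forEach(item -> map.put(item, item));
            return map;
        };
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.UNORDERED));
    }

    /**
     * Checks whether every recorded Set is the same object.
     *
     * @param list the containers seen by the accumulator so far
     * @return true if all entries refer to one and the same Set instance
     */
    private boolean isSameAddress(List<Set<T>> list) {
        for (int i = 0; i < list.size() - 1; i++) {
            for (int j = i + 1; j < list.size(); j++) {
                if (list.get(i) != list.get(j)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("hello", "world", "hello world", "a", "b", "c");
        Map<String, String> result = list.parallelStream().collect(new MySetCollector2<>());
        System.out.println(result);
    }
}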

In the code above, the characteristics method no longer returns CONCURRENT, and collect is again called on a parallel stream.
The output looks like this:

supplier invoked...
accumulator invoked...
combiner invoked...
main: a, current set content is :[a]is the same set? false
main: c, current set content is :[c]is the same set? false
main: b, current set content is :[b]is the same set? false
really to combin... 
really to combin... 
ForkJoinPool.commonPool-worker-3: hello, current set content is :[hello]is the same set? false
ForkJoinPool.commonPool-worker-1: world, current set content is :[world]is the same set? false
ForkJoinPool.commonPool-worker-2: hello world, current set content is :[hello world]is the same set? false
really to combin... 
really to combin... 
really to combin... 
finisher invoked...
{a=a, b=b, world=world, c=c, hello=hello, hello world=hello world}

 The output shows that the program works on several mutable result containers and that the function returned by combiner is executed.

3. UNORDERED

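 Add this characteristic when order does not matter for the result, for example when the result container has no intrinsic order, such as a Set. Otherwise do not add it, because with this characteristic the collector does not promise that elements are stored in the order in which they were encountered.

4. Summary

 In my view, the Characteristics enum expresses a contract. When a collector declares one of the values, the framework assumes the contract holds and runs accordingly. Developers therefore need to understand in which situations each value applies; otherwise the program will produce wrong results or throw exceptions.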
