Stream-数据操作

java 中数据聚合分解的神器

对比例子

---一个排序、取值
---------传统写法
List groceryTransactions = new Arraylist<>();
for(Transaction t: transactions){
 if(t.getType() == Transaction.GROCERY){
 groceryTransactions.add(t);
 }
}
Collections.sort(groceryTransactions, new Comparator(){
 public int compare(Transaction t1, Transaction t2){
 return t2.getValue().compareTo(t1.getValue());
 }
});
List transactionIds = new ArrayList<>();
for(Transaction t: groceryTransactions){
 transactionsIds.add(t.getId());
}

----------stream写法:代码更加简洁易读;而且使用并发模式,程序执行速度更快
List transactionsIds = transactions.parallelStream().
 filter(t -> t.getType() == Transaction.GROCERY).
 sorted(comparing(Transaction::getValue).reversed()).
 map(Transaction::getId).
 collect(toList());

数据聚合3个步骤

  • 取数:形成stream
  • 中间方法(Intermediate):如过滤、排序、去重、模型转换等
  • 吐出终端数据(Terminal):给出想要呈现的数据形式、或结果数据

stream 方法

  • Intermediate:map (mapToInt, flatMap 等)、 filter、 distinct、 sorted、 peek、 skip、 parallel、 sequential、 unordered

  • Terminal:forEach、 forEachOrdered、 toArray、 reduce、 collect、 min、 max、 count、iterator

  • Short-circuiting:
    anyMatch、 allMatch、 noneMatch、 findFirst、 findAny、 limit

Intermediate

concat

将两个Stream连接在一起,合成一个Stream。若两个输入的Stream都是排序的,则新Stream也是排序的;若输入的Stream中任何一个是并行的,则新的Stream也是并行的;若关闭新的Stream时,原两个输入的Stream都将执行关闭处理。

Stream.concat(Stream.of(1, 2, 3), Stream.of(4, 5))
       .forEach(integer -> System.out.print(integer + "  "));
// 打印结果
// 1  2  3  4  5  

distinct

除掉原Stream中重复的元素,生成的新Stream中没有没有重复的元素。

Stream.of(1,2,3,1,2,3)
        .distinct()
        .forEach(System.out::println); // 打印结果:1,2,3

filter

对原Stream按照指定条件过滤,在新建的Stream中,只包含满足条件的元素,将不满足条件的元素过滤掉。

Stream.of(1, 2, 3, 4, 5)
        .filter(item -> item > 3)
        .forEach(System.out::println);// 打印结果:4,5

map

map方法将对于Stream中包含的元素使用给定的转换函数进行转换操作,新生成的Stream只包含转换生成的元素。
为了提高处理效率,官方已封装好了,三种变形:mapToDouble,mapToInt,mapToLong。如果想将原Stream中的数据类型,转换为double,int或者是long是可以调用相对应的方法。

Stream.of("a", "b", "hello")
        .map(item-> item.toUpperCase())
        .forEach(System.out::println);

Stream.of("1", "2", "3")
  .mapToDouble(e->Double.valueOf(e))
  .forEach(d-> System.out.println("d = " + d));

flatMap

flatMap方法与map方法类似,都是将原Stream中的每一个元素通过转换函数转换,不同的是,该换转函数的对象是一个Stream,也不会再创建一个新的Stream,而是将原Stream的元素取代为转换的Stream。如果转换函数生产的Stream为null,应由空Stream取代。flatMap有三个对于原始类型的变种方法,分别是:flatMapToInt,flatMapToLong和flatMapToDouble。

Stream.of(1, 2, 3)
    .flatMap(integer -> Stream.of(integer * 10))
    .forEach(System.out::println);
    // 打印结果
    // 10,20,30

peek

peek方法生成一个包含原Stream的所有元素的新Stream,同时会提供一个消费函数(Consumer实例),新Stream每个元素被消费的时候都会执行给定的消费函数,并且消费函数优先执行

Stream.of(1, 2, 3, 4, 5)
        .peek(integer -> System.out.println("accept:" + integer))
        .forEach(System.out::println);
// 打印结果
// accept:1
//  1
//  accept:2
//  2
...

skip

skip方法将过滤掉原Stream中的前N个元素,返回剩下的元素所组成的新Stream。如果原Stream的元素个数大于N,将返回原Stream的后(原Stream长度-N)个元素所组成的新Stream;如果原Stream的元素个数小于或等于N,将返回一个空Stream。

Stream.of(1, 2, 3,4,5) 
.skip(2) 
.forEach(System.out::println); 
// 打印结果 
// 3,4,5

sorted

对原Stream进行排序,返回一个有序列的新Stream。sorterd有两种变体sorted(),sorted(Comparator),前者将默认使用Object.equals(Object)进行排序,而后者接受一个自定义排序规则函数(Comparator),可按照意愿排序。

Stream.of(5, 4, 3, 2, 1)
                .sorted()
                .forEach(System.out::println);

        Stream.of(1, 2, 3, 4, 5)
                .sorted( (a, b)-> a >= b ? -1 : 1)
                .forEach(System.out::println);

Terminal

collect 系列

在Stream接口提供了Collect的方法:

 R collect(Supplier supplier, //提供数据容器
                  BiConsumer accumulator, // 如何添加到容器
                  BiConsumer combiner // 多个容器的聚合策略
);
如:
String concat = stringStream.collect(StringBuilder::new, StringBuilder::append,StringBuilder::append).toString();
//等价于上面,这样看起来应该更加清晰
String concat = stringStream.collect(() -> new StringBuilder(),(l, x) -> l.append(x), (r1, r2) -> r1.append(r2)).toString();
// List转Map
Lists.newArrayList().stream()
        .collect(() -> new HashMap>(),
            (h, x) -> {
              List value = h.getOrDefault(x.getType(), Lists.newArrayList());
              value.add(x);
              h.put(x.getType(), value);
            },
            HashMap::putAll
        );

  R collect(Collector collector);
// 提供初始容器->加入元素到容器->并发下多容器聚合->对聚合后结果进行操作

Collector是Stream的可变减少操作接口(可变减少操作如:集合转换;计算元素相关的统计信息,例如sum,min,max或average等)

提供初始容器->加入元素到容器->并发下多容器聚合->对聚合后结果进行操作

Stream是支持并发操作的,为了避免竞争,对于reduce线程都会有独立的result,combiner的作用在于合并每个线程的result得到最终结果。

---collector 接口定义
* @param  the type of input elements to the reduction operation
 * @param  the mutable accumulation type of the reduction operation (often
 *            hidden as an implementation detail)
 * @param  the result type of the reduction operation
 * @since 1.8
 */
public interface Collector {
    /**
     * A function that creates and returns a new mutable result container.
     *
     * @return a function which returns a new, mutable result container
     */
    Supplier supplier();

    /**
     * A function that folds a value into a mutable result container.
     *
     * @return a function which folds a value into a mutable result container
     */
    BiConsumer accumulator();

    /**
     * A function that accepts two partial results and merges them.  The
     * combiner function may fold state from one argument into the other and
     * return that, or may return a new result container.
     *
     * @return a function which combines two partial results into a combined
     * result
     */
    BinaryOperator combiner();

    /**
     * Perform the final transformation from the intermediate accumulation type
     * {@code A} to the final result type {@code R}.
     *
     * 

If the characteristic {@code IDENTITY_TRANSFORM} is * set, this function may be presumed to be an identity transform with an * unchecked cast from {@code A} to {@code R}. * * @return a function which transforms the intermediate result to the final * result */ Function finisher(); /** * Returns a {@code Set} of {@code Collector.Characteristics} indicating * the characteristics of this Collector. This set should be immutable. * * @return an immutable set of collector characteristics */ Set characteristics(); /** * Returns a new {@code Collector} described by the given {@code supplier}, * {@code accumulator}, and {@code combiner} functions. The resulting * {@code Collector} has the {@code Collector.Characteristics.IDENTITY_FINISH} * characteristic. * * @param supplier The supplier function for the new collector * @param accumulator The accumulator function for the new collector * @param combiner The combiner function for the new collector * @param characteristics The collector characteristics for the new * collector * @param The type of input elements for the new collector * @param The type of intermediate accumulation result, and final result, * for the new collector * @throws NullPointerException if any argument is null * @return the new {@code Collector} */ public static Collector of(Supplier supplier, BiConsumer accumulator, BinaryOperator combiner, Characteristics... characteristics) { Objects.requireNonNull(supplier); Objects.requireNonNull(accumulator); Objects.requireNonNull(combiner); Objects.requireNonNull(characteristics); Set cs = (characteristics.length == 0) ? Collectors.CH_ID : Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH, characteristics)); return new Collectors.CollectorImpl<>(supplier, accumulator, combiner, cs); } /** * Returns a new {@code Collector} described by the given {@code supplier}, * {@code accumulator}, {@code combiner}, and {@code finisher} functions. * * @param supplier The supplier function for the new collector * @param accumulator The accumulator function for the new collector * @param combiner The combiner function for the new collector * @param finisher The finisher function for the new collector * @param characteristics The collector characteristics for the new * collector * @param The type of input elements for the new collector * @param The intermediate accumulation type of the new collector * @param The final result type of the new collector * @throws NullPointerException if any argument is null * @return the new {@code Collector} */ public static Collector of(Supplier supplier, BiConsumer accumulator, BinaryOperator combiner, Function finisher, Characteristics... characteristics) { Objects.requireNonNull(supplier); Objects.requireNonNull(accumulator); Objects.requireNonNull(combiner); Objects.requireNonNull(finisher); Objects.requireNonNull(characteristics); Set cs = Collectors.CH_NOID; if (characteristics.length > 0) { cs = EnumSet.noneOf(Characteristics.class); Collections.addAll(cs, characteristics); cs = Collections.unmodifiableSet(cs); } return new Collectors.CollectorImpl<>(supplier, accumulator, combiner, finisher, cs); } /** * Characteristics indicating properties of a {@code Collector}, which can * be used to optimize reduction implementations. */ enum Characteristics { /** * Indicates that this collector is concurrent, meaning that * the result container can support the accumulator function being * called concurrently with the same result container from multiple * threads. * *

If a {@code CONCURRENT} collector is not also {@code UNORDERED}, * then it should only be evaluated concurrently if applied to an * unordered data source. */ CONCURRENT, /** * Indicates that the collection operation does not commit to preserving * the encounter order of input elements. (This might be true if the * result container has no intrinsic order, such as a {@link Set}.) */ UNORDERED, /** * Indicates that the finisher function is the identity function and * can be elided. If set, it must be the case that an unchecked cast * from A to R will succeed. */ IDENTITY_FINISH } }

-- jdk inf
 R collect(Supplier supplier,
                  BiConsumer accumulator,
                  BiConsumer combiner);
-- emp
 List asList = stringStream.collect(ArrayList::new, ArrayList::add,
                                                    ArrayList::addAll);
String concat = stringStream.collect(StringBuilder::new, StringBuilder::append,
                                               StringBuilder::append)
                                      .toString();

-- jdk inf
 R collect(Collector collector);

-- emp
List asList = stringStream.collect(Collectors.toList());

Map> peopleByCity
              = personStream.collect(Collectors.groupingBy(Person::getCity));
Map>> peopleByStateAndCity
              = personStream.collect(Collectors.groupingBy(Person::getState,
                                                         Collectors.groupingBy(Person::getCity)));

Collector接受三个泛型参数,对可变减少操作的数据类型作相应限制:
T:输入元素类型
A:可变处理函数
R:结果类型

Collector接口声明了4个函数,一起协作,将元素放入容器,经过转换输出想要结果:

todo 自定义Collector

转换成其他集合

对于前面提到了很多Stream的链式操作,但是,我们总是要将Strea生成一个集合,比如:

  • 将流转换成集合
  • 在集合上进行一系列链式操作后, 最终希望生成一个值
  • 写单元测试时, 需要对某个具体的集合做断言
toList、toSet、...、toCollection、toMap
List collectList = Stream.of(1, 2, 3, 4)
        .collect(Collectors.toList());

// Collectors.toList() 内部实现
 public static 
    Collector> toList() {
        return new CollectorImpl<>((Supplier>) ArrayList::new, List::add,
                                   (left, right) -> { left.addAll(right); return left; },
                                   CH_ID);
    }

如希望生成一个不是由Stream类库自动指定的一种类型(如TreeSet)。此时使用toCollection,它接受一个函数作为参数, 来创建集合。

toMap最少应接受两个参数,一个用来生成key,另外一个用来生成value。toMap方法有三种变形:

toMap(Function keyMapper,Function valueMapper)

转成值

使用collect可以将Stream转换成值。如maxBy和minBy允许用户按照某个特定的顺序生成一个值。

Optional collectMaxBy = Stream.of(1, 2, 3, 4)
            .collect(Collectors.maxBy(Comparator.comparingInt(o -> o)));
分割数据块 partitioningBy

collect的一个常用操作将Stream分解成两个集合。

Map> collectParti = Stream.of(1, 2, 3, 4)
            .collect(Collectors.partitioningBy(it -> it % 2 == 0));
数据分组 groupingBy

数据分组是一种更自然的分割数据操作, 与将数据分成true和false两部分不同,可以使用任意值对数据分组。

调用Stream的collect方法,传入一个收集器,groupingBy接受一个分类函数,用来对数据分组,就像partitioningBy一样,接受一个
Predicate对象将数据分成true和false两部分。我们使用的分类器是一个Function对象,和map操作用到的一样。

Map> collectGroup= Stream.of(1, 2, 3, 4)
            .collect(Collectors.groupingBy(it -> it > 3));

// 内部实现
   public static >
    Collector groupingBy(Function classifier,  // 对key分类
                                  Supplier mapFactory, // Map的容器具体类型
                                  Collector downstream // 对Value的收集操作
) {
       .......
    }

// 举例
//原生形式
   Lists.newArrayList().stream()
        .collect(() -> new HashMap>(),
            (h, x) -> {
              List value = h.getOrDefault(x.getType(), Lists.newArrayList());
              value.add(x);
              h.put(x.getType(), value);
            },
            HashMap::putAll
        );
//groupBy形式
Lists.newArrayList().stream()
        .collect(Collectors.groupingBy(Person::getType, HashMap::new, Collectors.toList()));

//因为对值有了操作,因此我可以更加灵活的对值进行转换
// 返回Map>
Lists.newArrayList().stream()
        .collect(Collectors.groupingBy(Person::getType, HashMap::new, Collectors.mapping(Person::getName,Collectors.toSet())));
聚合 reducing
public static  Collector
    reducing(T identity, BinaryOperator op) {
        return new CollectorImpl<>(
                boxSupplier(identity), // 一个长度为1的Object[]数组 作为容器,为何用数组?因为不可变类型
                (a, t) -> { a[0] = op.apply(a[0], t); }, // 加入容器操作
                (a, b) -> { a[0] = op.apply(a[0], b[0]); return a; }, // 多容器合并
                a -> a[0], // 取聚合后的结果
                CH_NOID);
    }

// --------举例
//原生操作
final Integer[] integers = Lists.newArrayList(1, 2, 3, 4, 5)
        .stream()
        .collect(() -> new Integer[]{0}, (a, x) -> a[0] += x, (a1, a2) -> a1[0] += a2[0]);
//reducing操作
final Integer collect = Lists.newArrayList(1, 2, 3, 4, 5)
        .stream()
        .collect(Collectors.reducing(0, Integer::sum));    
//当然Stream也提供了reduce操作
final Integer collect = Lists.newArrayList(1, 2, 3, 4, 5)
        .stream().reduce(0, Integer::sum)
字符串拼接
String strJoin = Stream.of("1", "2", "3", "4")
        .collect(Collectors.joining(",", "[", "]"));
System.out.println("strJoin: " + strJoin);
// 打印结果
// strJoin: [1,2,3,4]

// Collectors.joining 内部实现:
public static Collector joining() {
        return new CollectorImpl(
                StringBuilder::new, StringBuilder::append,
                (r1, r2) -> { r1.append(r2); return r1; },
                StringBuilder::toString, CH_NOID);
    }

组合Collector

可以将Colletor 组合起来使用

Map partiCount = Stream.of(1, 2, 3, 4)
        .collect(Collectors.partitioningBy(it -> it.intValue() % 2 == 0,
                Collectors.counting()));
自定义Collector

求一段数字的和,如果是奇数,直接相加;如果是偶数,乘以2后在相加

定义一个类IntegerSum作为过渡容器
public class IntegerSum { 
Integer sum;
public IntegerSum(Integer sum) {
    this.sum = sum;
}

public IntegerSum doSum(Integer item) {
    if (item % 2 == 0) {
        this.sum += item * 2;
    } else {
        this.sum += item;
    }
    return this;

}

public IntegerSum doCombine(IntegerSum it) {
    this.sum += it.sum;
    return this;
}

public Integer toValue() {
    return this.sum;
}
}


        Integer sumRes = Stream.of(1, 2, 3, 4, 5).collect(new Collector() {
            /**
             * A function that creates and returns a new mutable result container.
             *
             * @return a function which returns a new, mutable result container
             */
            @Override
            public Supplier supplier() {
                return ()-> new IntegerSum(0);
            }

            /**
             * A function that folds a value into a mutable result container.
             *
             * @return a function which folds a value into a mutable result container
             */
            @Override
            public BiConsumer accumulator() {
                // return (a, b) -> a.doSum(b);
                return IntegerSum::doSum;
            }

            /**
             * A function that accepts two partial results and merges them.  The
             * combiner function may fold state from one argument into the other and
             * return that, or may return a new result container.
             *
             * @return a function which combines two partial results into a combined
             * result
             */
            @Override
            public BinaryOperator combiner() {
                // return (a,b)->a.doCombine(b);
                return IntegerSum::doCombine;
            }

            /**
             * Perform the final transformation from the intermediate accumulation type
             * {@code A} to the final result type {@code R}.
             *
             * 

If the characteristic {@code IDENTITY_TRANSFORM} is * set, this function may be presumed to be an identity transform with an * unchecked cast from {@code A} to {@code R}. * * @return a function which transforms the intermediate result to the final * result */ @Override public Function finisher() { return IntegerSum::toValue; } /** * Returns a {@code Set} of {@code Collector.Characteristics} indicating * the characteristics of this Collector. This set should be immutable. * * @return an immutable set of collector characteristics */ @Override public Set characteristics() { return Collections.emptySet(); } });

forEachOrdered

forEachOrdered方法与forEach类似,都是遍历Stream中的所有元素,不同的是,如果该Stream预先设定了顺序,会按照预先设定的顺序执行(Stream是无序的),默认为元素插入的顺序。

Stream.of(5,2,1,4,3)
        .forEachOrdered(integer -> {
            System.out.println("integer:"+integer);
        }); 
        // 打印结果
        // integer:5
        // integer:2
        // integer:1
        // integer:4
        // integer:3
max、min
// Comparator 可指定排序规则
Optional max = Stream.of(1, 2, 3, 4, 5)
        .max((o1, o2) -> o2 - o1);
System.out.println("max:" + max.get());// 打印结果:max:1
Optional max = Stream.of(1, 2, 3, 4, 5)
        .max((o1, o2) -> o1 - o2);
System.out.println("max:" + max.get());// 打印结果:min:5

Short-circuiting

  • allMatch:判断Stream中的元素是否全部满足指定条件。如果全部满足条件返回true,否则返回false。
  • anyMatch:是否有满足指定条件的元素。如果最少有一个满足条件返回true,否则返回false。
  • findAny:获取含有Stream中的某个元素的Optional,如果Stream为空,则返回一个空的Optional。
  • findFirst:获取含有Stream中的第一个元素的Optional
  • limit方法将截取原Stream,截取后Stream的最大长度不能超过指定值N。如果原Stream的元素个数大于N,将截取原Stream的前N个元素;如果原Stream的元素个数小于或等于N,将截取原Stream中的所有元素。
  • noneMatch方法将判断Stream中的所有元素是否满足指定的条件,如果所有元素都不满足条件,返回true;否则,返回false.

Ref:
https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
https://www.ibm.com/developerworks/cn/java/j-lo-java8streamapi/
https://www.baeldung.com/java-8-collectors
https://blog.csdn.net/IO_Field/article/details/54971608

你可能感兴趣的:(Stream-数据操作)