java 中数据聚合分解的神器
对比例子
---一个排序、取值
---------传统写法
List groceryTransactions = new Arraylist<>();
for(Transaction t: transactions){
if(t.getType() == Transaction.GROCERY){
groceryTransactions.add(t);
}
}
Collections.sort(groceryTransactions, new Comparator(){
public int compare(Transaction t1, Transaction t2){
return t2.getValue().compareTo(t1.getValue());
}
});
List transactionIds = new ArrayList<>();
for(Transaction t: groceryTransactions){
transactionsIds.add(t.getId());
}
----------stream写法:代码更加简洁易读;而且使用并发模式,程序执行速度更快
List transactionsIds = transactions.parallelStream().
filter(t -> t.getType() == Transaction.GROCERY).
sorted(comparing(Transaction::getValue).reversed()).
map(Transaction::getId).
collect(toList());
数据聚合3个步骤
- 取数:形成stream
- 中间方法(Intermediate):如过滤、排序、去重、模型转换等
- 吐出终端数据(Terminal):给出想要呈现的数据形式、或结果数据
stream 方法
Intermediate:map (mapToInt, flatMap 等)、 filter、 distinct、 sorted、 peek、 skip、 parallel、 sequential、 unordered
Terminal:forEach、 forEachOrdered、 toArray、 reduce、 collect、 min、 max、 count、iterator
Short-circuiting:
anyMatch、 allMatch、 noneMatch、 findFirst、 findAny、 limit
Intermediate
concat
将两个Stream连接在一起,合成一个Stream。若两个输入的Stream都是排序的,则新Stream也是排序的;若输入的Stream中任何一个是并行的,则新的Stream也是并行的;若关闭新的Stream时,原两个输入的Stream都将执行关闭处理。
Stream.concat(Stream.of(1, 2, 3), Stream.of(4, 5))
.forEach(integer -> System.out.print(integer + " "));
// 打印结果
// 1 2 3 4 5
distinct
除掉原Stream中重复的元素,生成的新Stream中没有没有重复的元素。
Stream.of(1,2,3,1,2,3)
.distinct()
.forEach(System.out::println); // 打印结果:1,2,3
filter
对原Stream按照指定条件过滤,在新建的Stream中,只包含满足条件的元素,将不满足条件的元素过滤掉。
Stream.of(1, 2, 3, 4, 5)
.filter(item -> item > 3)
.forEach(System.out::println);// 打印结果:4,5
map
map方法将对于Stream中包含的元素使用给定的转换函数进行转换操作,新生成的Stream只包含转换生成的元素。
为了提高处理效率,官方已封装好了,三种变形:mapToDouble,mapToInt,mapToLong。如果想将原Stream中的数据类型,转换为double,int或者是long是可以调用相对应的方法。
Stream.of("a", "b", "hello")
.map(item-> item.toUpperCase())
.forEach(System.out::println);
Stream.of("1", "2", "3")
.mapToDouble(e->Double.valueOf(e))
.forEach(d-> System.out.println("d = " + d));
flatMap
flatMap方法与map方法类似,都是将原Stream中的每一个元素通过转换函数转换,不同的是,该换转函数的对象是一个Stream,也不会再创建一个新的Stream,而是将原Stream的元素取代为转换的Stream。如果转换函数生产的Stream为null,应由空Stream取代。flatMap有三个对于原始类型的变种方法,分别是:flatMapToInt,flatMapToLong和flatMapToDouble。
Stream.of(1, 2, 3)
.flatMap(integer -> Stream.of(integer * 10))
.forEach(System.out::println);
// 打印结果
// 10,20,30
peek
peek方法生成一个包含原Stream的所有元素的新Stream,同时会提供一个消费函数(Consumer实例),新Stream每个元素被消费的时候都会执行给定的消费函数,并且消费函数优先执行
Stream.of(1, 2, 3, 4, 5)
.peek(integer -> System.out.println("accept:" + integer))
.forEach(System.out::println);
// 打印结果
// accept:1
// 1
// accept:2
// 2
...
skip
skip方法将过滤掉原Stream中的前N个元素,返回剩下的元素所组成的新Stream。如果原Stream的元素个数大于N,将返回原Stream的后(原Stream长度-N)个元素所组成的新Stream;如果原Stream的元素个数小于或等于N,将返回一个空Stream。
Stream.of(1, 2, 3,4,5)
.skip(2)
.forEach(System.out::println);
// 打印结果
// 3,4,5
sorted
对原Stream进行排序,返回一个有序列的新Stream。sorterd有两种变体sorted(),sorted(Comparator),前者将默认使用Object.equals(Object)进行排序,而后者接受一个自定义排序规则函数(Comparator),可按照意愿排序。
Stream.of(5, 4, 3, 2, 1)
.sorted()
.forEach(System.out::println);
Stream.of(1, 2, 3, 4, 5)
.sorted( (a, b)-> a >= b ? -1 : 1)
.forEach(System.out::println);
Terminal
collect 系列
在Stream接口提供了Collect的方法:
R collect(Supplier supplier, //提供数据容器
BiConsumer accumulator, // 如何添加到容器
BiConsumer combiner // 多个容器的聚合策略
);
如:
String concat = stringStream.collect(StringBuilder::new, StringBuilder::append,StringBuilder::append).toString();
//等价于上面,这样看起来应该更加清晰
String concat = stringStream.collect(() -> new StringBuilder(),(l, x) -> l.append(x), (r1, r2) -> r1.append(r2)).toString();
// List转Map
Lists.newArrayList().stream()
.collect(() -> new HashMap>(),
(h, x) -> {
List value = h.getOrDefault(x.getType(), Lists.newArrayList());
value.add(x);
h.put(x.getType(), value);
},
HashMap::putAll
);
R collect(Collector super T, A, R> collector);
// 提供初始容器->加入元素到容器->并发下多容器聚合->对聚合后结果进行操作
Collector是Stream的可变减少操作接口(可变减少操作如:集合转换;计算元素相关的统计信息,例如sum,min,max或average等)
提供初始容器->加入元素到容器->并发下多容器聚合->对聚合后结果进行操作
Stream是支持并发操作的,为了避免竞争,对于reduce线程都会有独立的result,combiner的作用在于合并每个线程的result得到最终结果。
---collector 接口定义
* @param the type of input elements to the reduction operation
* @param the mutable accumulation type of the reduction operation (often
* hidden as an implementation detail)
* @param the result type of the reduction operation
* @since 1.8
*/
public interface Collector {
/**
* A function that creates and returns a new mutable result container.
*
* @return a function which returns a new, mutable result container
*/
Supplier supplier();
/**
* A function that folds a value into a mutable result container.
*
* @return a function which folds a value into a mutable result container
*/
BiConsumer accumulator();
/**
* A function that accepts two partial results and merges them. The
* combiner function may fold state from one argument into the other and
* return that, or may return a new result container.
*
* @return a function which combines two partial results into a combined
* result
*/
BinaryOperator combiner();
/**
* Perform the final transformation from the intermediate accumulation type
* {@code A} to the final result type {@code R}.
*
* If the characteristic {@code IDENTITY_TRANSFORM} is
* set, this function may be presumed to be an identity transform with an
* unchecked cast from {@code A} to {@code R}.
*
* @return a function which transforms the intermediate result to the final
* result
*/
Function finisher();
/**
* Returns a {@code Set} of {@code Collector.Characteristics} indicating
* the characteristics of this Collector. This set should be immutable.
*
* @return an immutable set of collector characteristics
*/
Set characteristics();
/**
* Returns a new {@code Collector} described by the given {@code supplier},
* {@code accumulator}, and {@code combiner} functions. The resulting
* {@code Collector} has the {@code Collector.Characteristics.IDENTITY_FINISH}
* characteristic.
*
* @param supplier The supplier function for the new collector
* @param accumulator The accumulator function for the new collector
* @param combiner The combiner function for the new collector
* @param characteristics The collector characteristics for the new
* collector
* @param The type of input elements for the new collector
* @param The type of intermediate accumulation result, and final result,
* for the new collector
* @throws NullPointerException if any argument is null
* @return the new {@code Collector}
*/
public static Collector of(Supplier supplier,
BiConsumer accumulator,
BinaryOperator combiner,
Characteristics... characteristics) {
Objects.requireNonNull(supplier);
Objects.requireNonNull(accumulator);
Objects.requireNonNull(combiner);
Objects.requireNonNull(characteristics);
Set cs = (characteristics.length == 0)
? Collectors.CH_ID
: Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH,
characteristics));
return new Collectors.CollectorImpl<>(supplier, accumulator, combiner, cs);
}
/**
* Returns a new {@code Collector} described by the given {@code supplier},
* {@code accumulator}, {@code combiner}, and {@code finisher} functions.
*
* @param supplier The supplier function for the new collector
* @param accumulator The accumulator function for the new collector
* @param combiner The combiner function for the new collector
* @param finisher The finisher function for the new collector
* @param characteristics The collector characteristics for the new
* collector
* @param The type of input elements for the new collector
* @param The intermediate accumulation type of the new collector
* @param The final result type of the new collector
* @throws NullPointerException if any argument is null
* @return the new {@code Collector}
*/
public static Collector of(Supplier supplier,
BiConsumer accumulator,
BinaryOperator combiner,
Function finisher,
Characteristics... characteristics) {
Objects.requireNonNull(supplier);
Objects.requireNonNull(accumulator);
Objects.requireNonNull(combiner);
Objects.requireNonNull(finisher);
Objects.requireNonNull(characteristics);
Set cs = Collectors.CH_NOID;
if (characteristics.length > 0) {
cs = EnumSet.noneOf(Characteristics.class);
Collections.addAll(cs, characteristics);
cs = Collections.unmodifiableSet(cs);
}
return new Collectors.CollectorImpl<>(supplier, accumulator, combiner, finisher, cs);
}
/**
* Characteristics indicating properties of a {@code Collector}, which can
* be used to optimize reduction implementations.
*/
enum Characteristics {
/**
* Indicates that this collector is concurrent, meaning that
* the result container can support the accumulator function being
* called concurrently with the same result container from multiple
* threads.
*
* If a {@code CONCURRENT} collector is not also {@code UNORDERED},
* then it should only be evaluated concurrently if applied to an
* unordered data source.
*/
CONCURRENT,
/**
* Indicates that the collection operation does not commit to preserving
* the encounter order of input elements. (This might be true if the
* result container has no intrinsic order, such as a {@link Set}.)
*/
UNORDERED,
/**
* Indicates that the finisher function is the identity function and
* can be elided. If set, it must be the case that an unchecked cast
* from A to R will succeed.
*/
IDENTITY_FINISH
}
}
-- jdk inf
R collect(Supplier supplier,
BiConsumer accumulator,
BiConsumer combiner);
-- emp
List asList = stringStream.collect(ArrayList::new, ArrayList::add,
ArrayList::addAll);
String concat = stringStream.collect(StringBuilder::new, StringBuilder::append,
StringBuilder::append)
.toString();
-- jdk inf
R collect(Collector super T, A, R> collector);
-- emp
List asList = stringStream.collect(Collectors.toList());
Map> peopleByCity
= personStream.collect(Collectors.groupingBy(Person::getCity));
Map>> peopleByStateAndCity
= personStream.collect(Collectors.groupingBy(Person::getState,
Collectors.groupingBy(Person::getCity)));
Collector
T:输入元素类型
A:可变处理函数
R:结果类型
Collector接口声明了4个函数,一起协作,将元素放入容器,经过转换输出想要结果:
- Supplier supplier(): 创建新的结果
- BiConsumer accumulator(): 将元素添加到结果容器
- BinaryOperator combiner(): 将两个结果容器合并为一个结果容器
- Function finisher(): 对结果容器作相应的变换
todo 自定义Collector
转换成其他集合
对于前面提到了很多Stream的链式操作,但是,我们总是要将Strea生成一个集合,比如:
- 将流转换成集合
- 在集合上进行一系列链式操作后, 最终希望生成一个值
- 写单元测试时, 需要对某个具体的集合做断言
toList、toSet、...、toCollection、toMap
List collectList = Stream.of(1, 2, 3, 4)
.collect(Collectors.toList());
// Collectors.toList() 内部实现
public static
Collector> toList() {
return new CollectorImpl<>((Supplier>) ArrayList::new, List::add,
(left, right) -> { left.addAll(right); return left; },
CH_ID);
}
如希望生成一个不是由Stream类库自动指定的一种类型(如TreeSet)。此时使用toCollection,它接受一个函数作为参数, 来创建集合。
toMap最少应接受两个参数,一个用来生成key,另外一个用来生成value。toMap方法有三种变形:
toMap(Function super T, ? extends K> keyMapper,Function super T, ? extends U> valueMapper)
转成值
使用collect可以将Stream转换成值。如maxBy和minBy允许用户按照某个特定的顺序生成一个值。
Optional collectMaxBy = Stream.of(1, 2, 3, 4)
.collect(Collectors.maxBy(Comparator.comparingInt(o -> o)));
分割数据块 partitioningBy
collect的一个常用操作将Stream分解成两个集合。
- 两次过滤,如果过滤操作复杂,每个流上都要执行这样的操作, 代码也会变得冗余。
-
partitioningBy方法,它接受一个流,并将其分成两部分:使用Predicate对象,指定条件并判断一个元素应该属于哪个部分,并根据布尔值返回一个Map到列表
Map> collectParti = Stream.of(1, 2, 3, 4)
.collect(Collectors.partitioningBy(it -> it % 2 == 0));
数据分组 groupingBy
数据分组是一种更自然的分割数据操作, 与将数据分成true和false两部分不同,可以使用任意值对数据分组。
调用Stream的collect方法,传入一个收集器,groupingBy接受一个分类函数,用来对数据分组,就像partitioningBy一样,接受一个
Predicate对象将数据分成true和false两部分。我们使用的分类器是一个Function对象,和map操作用到的一样。
Map> collectGroup= Stream.of(1, 2, 3, 4)
.collect(Collectors.groupingBy(it -> it > 3));
// 内部实现
public static >
Collector groupingBy(Function super T, ? extends K> classifier, // 对key分类
Supplier mapFactory, // Map的容器具体类型
Collector super T, A, D> downstream // 对Value的收集操作
) {
.......
}
// 举例
//原生形式
Lists.newArrayList().stream()
.collect(() -> new HashMap>(),
(h, x) -> {
List value = h.getOrDefault(x.getType(), Lists.newArrayList());
value.add(x);
h.put(x.getType(), value);
},
HashMap::putAll
);
//groupBy形式
Lists.newArrayList().stream()
.collect(Collectors.groupingBy(Person::getType, HashMap::new, Collectors.toList()));
//因为对值有了操作,因此我可以更加灵活的对值进行转换
// 返回Map>
Lists.newArrayList().stream()
.collect(Collectors.groupingBy(Person::getType, HashMap::new, Collectors.mapping(Person::getName,Collectors.toSet())));
聚合 reducing
public static Collector
reducing(T identity, BinaryOperator op) {
return new CollectorImpl<>(
boxSupplier(identity), // 一个长度为1的Object[]数组 作为容器,为何用数组?因为不可变类型
(a, t) -> { a[0] = op.apply(a[0], t); }, // 加入容器操作
(a, b) -> { a[0] = op.apply(a[0], b[0]); return a; }, // 多容器合并
a -> a[0], // 取聚合后的结果
CH_NOID);
}
// --------举例
//原生操作
final Integer[] integers = Lists.newArrayList(1, 2, 3, 4, 5)
.stream()
.collect(() -> new Integer[]{0}, (a, x) -> a[0] += x, (a1, a2) -> a1[0] += a2[0]);
//reducing操作
final Integer collect = Lists.newArrayList(1, 2, 3, 4, 5)
.stream()
.collect(Collectors.reducing(0, Integer::sum));
//当然Stream也提供了reduce操作
final Integer collect = Lists.newArrayList(1, 2, 3, 4, 5)
.stream().reduce(0, Integer::sum)
字符串拼接
String strJoin = Stream.of("1", "2", "3", "4")
.collect(Collectors.joining(",", "[", "]"));
System.out.println("strJoin: " + strJoin);
// 打印结果
// strJoin: [1,2,3,4]
// Collectors.joining 内部实现:
public static Collector joining() {
return new CollectorImpl(
StringBuilder::new, StringBuilder::append,
(r1, r2) -> { r1.append(r2); return r1; },
StringBuilder::toString, CH_NOID);
}
组合Collector
可以将Colletor 组合起来使用
Map partiCount = Stream.of(1, 2, 3, 4)
.collect(Collectors.partitioningBy(it -> it.intValue() % 2 == 0,
Collectors.counting()));
自定义Collector
求一段数字的和,如果是奇数,直接相加;如果是偶数,乘以2后在相加
定义一个类IntegerSum作为过渡容器
public class IntegerSum {
Integer sum;
public IntegerSum(Integer sum) {
this.sum = sum;
}
public IntegerSum doSum(Integer item) {
if (item % 2 == 0) {
this.sum += item * 2;
} else {
this.sum += item;
}
return this;
}
public IntegerSum doCombine(IntegerSum it) {
this.sum += it.sum;
return this;
}
public Integer toValue() {
return this.sum;
}
}
Integer sumRes = Stream.of(1, 2, 3, 4, 5).collect(new Collector() {
/**
* A function that creates and returns a new mutable result container.
*
* @return a function which returns a new, mutable result container
*/
@Override
public Supplier supplier() {
return ()-> new IntegerSum(0);
}
/**
* A function that folds a value into a mutable result container.
*
* @return a function which folds a value into a mutable result container
*/
@Override
public BiConsumer accumulator() {
// return (a, b) -> a.doSum(b);
return IntegerSum::doSum;
}
/**
* A function that accepts two partial results and merges them. The
* combiner function may fold state from one argument into the other and
* return that, or may return a new result container.
*
* @return a function which combines two partial results into a combined
* result
*/
@Override
public BinaryOperator combiner() {
// return (a,b)->a.doCombine(b);
return IntegerSum::doCombine;
}
/**
* Perform the final transformation from the intermediate accumulation type
* {@code A} to the final result type {@code R}.
*
* If the characteristic {@code IDENTITY_TRANSFORM} is
* set, this function may be presumed to be an identity transform with an
* unchecked cast from {@code A} to {@code R}.
*
* @return a function which transforms the intermediate result to the final
* result
*/
@Override
public Function finisher() {
return IntegerSum::toValue;
}
/**
* Returns a {@code Set} of {@code Collector.Characteristics} indicating
* the characteristics of this Collector. This set should be immutable.
*
* @return an immutable set of collector characteristics
*/
@Override
public Set characteristics() {
return Collections.emptySet();
}
});
forEachOrdered
forEachOrdered方法与forEach类似,都是遍历Stream中的所有元素,不同的是,如果该Stream预先设定了顺序,会按照预先设定的顺序执行(Stream是无序的),默认为元素插入的顺序。
Stream.of(5,2,1,4,3)
.forEachOrdered(integer -> {
System.out.println("integer:"+integer);
});
// 打印结果
// integer:5
// integer:2
// integer:1
// integer:4
// integer:3
max、min
// Comparator 可指定排序规则
Optional max = Stream.of(1, 2, 3, 4, 5)
.max((o1, o2) -> o2 - o1);
System.out.println("max:" + max.get());// 打印结果:max:1
Optional max = Stream.of(1, 2, 3, 4, 5)
.max((o1, o2) -> o1 - o2);
System.out.println("max:" + max.get());// 打印结果:min:5
Short-circuiting
- allMatch:判断Stream中的元素是否全部满足指定条件。如果全部满足条件返回true,否则返回false。
- anyMatch:是否有满足指定条件的元素。如果最少有一个满足条件返回true,否则返回false。
- findAny:获取含有Stream中的某个元素的Optional,如果Stream为空,则返回一个空的Optional。
- findFirst:获取含有Stream中的第一个元素的Optional
- limit方法将截取原Stream,截取后Stream的最大长度不能超过指定值N。如果原Stream的元素个数大于N,将截取原Stream的前N个元素;如果原Stream的元素个数小于或等于N,将截取原Stream中的所有元素。
- noneMatch方法将判断Stream中的所有元素是否满足指定的条件,如果所有元素都不满足条件,返回true;否则,返回false.
Ref:
https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
https://www.ibm.com/developerworks/cn/java/j-lo-java8streamapi/
https://www.baeldung.com/java-8-collectors
https://blog.csdn.net/IO_Field/article/details/54971608