Preface:
Stream's distinct() can deduplicate elements. Internally it records the elements it has already seen: an unordered parallel stream tracks them in a ConcurrentHashMap via putIfAbsent(), while a sequential stream tracks them in a HashSet (see java/util/stream/DistinctOps.java).
Like HashMap, these containers look up and compare entries through hashCode() and equals(). If the element's class does not override hashCode() and equals(), two objects created with new are never judged "equal", even when every field is identical (String comparison works precisely because String overrides hashCode() and equals()).
In other words, distinct() can only deduplicate primitives, their wrapper types, and objects whose classes override hashCode() and equals(); it cannot deduplicate instances of an ordinary class that does not override them. Hence the workaround below.
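Before the workaround, a minimal illustration of the problem (the Point class and the DistinctDemo class below are hypothetical, used only to show the behavior):

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class Point {
    int x;
    Point(int x) { this.x = x; }   // no hashCode()/equals() override
}

public class DistinctDemo {
    public static void main(String[] args) {
        // Two Point instances with identical fields: distinct() keeps both,
        // because Object's identity-based equals()/hashCode() are used.
        List<Point> points = Stream.of(new Point(1), new Point(1)).distinct().collect(Collectors.toList());
        System.out.println(points.size()); // 2
        // Strings with the same content are deduplicated, because String overrides equals()/hashCode().
        List<String> strings = Stream.of(new String("a"), new String("a")).distinct().collect(Collectors.toList());
        System.out.println(strings.size()); // 1
    }
}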
Example code:
// Imports assumed for the snippets below (JSONObject here is fastjson's com.alibaba.fastjson.JSONObject):
import com.alibaba.fastjson.JSONObject;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Deduplicates objects by a key.
 * @param keyExtractor extracts the property (or properties) to deduplicate on
 * @param <T> the element type
 * @return a stateful predicate that returns true only for the first element with a given key
 */
private static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    // Records the keys that have already been seen.
    ConcurrentSkipListMap<Object, Boolean> skipListMap = new ConcurrentSkipListMap<>();
    // Extract the key and call putIfAbsent(): the first time a key appears it is stored (the value is
    // fixed to Boolean.TRUE) and null is returned; on every later occurrence the existing value is returned.
    // JSONObject.toJSONString(keyExtractor.apply(t)) serializes the key, which sidesteps both the null-key
    // restriction and the lack of equals()/hashCode() on arbitrary objects.
    // Stream's own distinct() handles null keys with a HashSet (java/util/stream/DistinctOps.java:90),
    // but that does not solve the object-comparison problem.
    // The serialization costs some performance, but there is no better general-purpose option here.
    Predicate<T> predicate = t -> skipListMap.putIfAbsent(JSONObject.toJSONString(keyExtractor.apply(t)), Boolean.TRUE) == null;
    return predicate;
}
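If the extracted key is guaranteed to be non-null and already has a value-based equals()/hashCode() (for example a List of field values), the serialization step can be skipped. The sketch below (a hypothetical distinctByNonNullKey helper, not part of the original code) shows that variant; note that, unlike the JSON-based version, it rejects null keys:

import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: assumes keyExtractor never returns null and that the key type implements equals()/hashCode().
private static <T> Predicate<T> distinctByNonNullKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    // Set.add() returns true only the first time a key is inserted.
    return t -> seen.add(keyExtractor.apply(t));
}

// Possible usage, keying on a List so that several (possibly null) field values still compare by value:
// collect.stream().filter(distinctByNonNullKey(test -> Arrays.asList(test.getAid(), test.getUid()))).collect(Collectors.toList());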
private static class Test {
    // A plain data holder that does not override hashCode()/equals(), so distinct() cannot deduplicate it.
    Integer aid;
    Integer uid;
    public Test(Integer aid, Integer uid) {
        this.aid = aid;
        this.uid = uid;
    }
    public Integer getAid() {
        return aid;
    }
    public Test setAid(Integer aid) {
        this.aid = aid;
        return this;
    }
    public Integer getUid() {
        return uid;
    }
    public Test setUid(Integer uid) {
        this.uid = uid;
        return this;
    }
}
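For comparison: if the class itself can be changed, adding equals() and hashCode() overrides (a sketch under that assumption, using java.util.Objects; the methods would go inside Test) lets a plain collect.stream().distinct() deduplicate by aid + uid without any helper:

// Sketch, assuming Test may be modified.
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Test)) return false;
    Test other = (Test) o;
    return Objects.equals(aid, other.aid) && Objects.equals(uid, other.uid);
}

@Override
public int hashCode() {
    return Objects.hash(aid, uid);
}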
public static void main(String[] args) {
    // The list to deduplicate.
    List<Test> collect = Stream.of(new Test(2, 1), new Test(2, 1), new Test(2, 2), new Test(1, 1), new Test(1, 1), new Test(1, 2)).collect(Collectors.toList());
    // filter: deduplicate by a single property.
    System.out.println(JSONObject.toJSONString(collect.stream().filter(distinctByKey(Test::getAid)).collect(Collectors.toList())));
    // filter: deduplicate by multiple properties. The key is an Object[], which has no value-based
    // equals()/hashCode(), but that is fine here because distinctByKey() compares the serialized JSON.
    System.out.println(JSONObject.toJSONString(collect.stream().filter(distinctByKey(test -> Stream.of(test.getAid(), test.getUid()).toArray())).collect(Collectors.toList())));
    // collectingAndThen: deduplicate by a single property; the original order is lost and the result is sorted by the comparator (ascending aid).
    System.out.println(JSONObject.toJSONString(collect.stream().collect(Collectors.collectingAndThen(Collectors.toCollection(() -> new ConcurrentSkipListSet<>(Comparator.comparing(Test::getAid))), ArrayList::new))));
    // collectingAndThen: deduplicate by multiple properties; the original order is lost and the result is sorted by the comparator (ascending aid, then uid).
    System.out.println(JSONObject.toJSONString(collect.stream().collect(Collectors.collectingAndThen(Collectors.toCollection(() -> new ConcurrentSkipListSet<>(Comparator.comparing(Test::getAid).thenComparing(Test::getUid))), ArrayList::new))));
}
Output:
[{"aid":2,"uid":1},{"aid":1,"uid":1}]
[{"aid":2,"uid":1},{"aid":2,"uid":2},{"aid":1,"uid":1},{"aid":1,"uid":2}]
[{"aid":1,"uid":1},{"aid":2,"uid":1}]
[{"aid":1,"uid":1},{"aid":1,"uid":2},{"aid":2,"uid":1},{"aid":2,"uid":2}]
Summary:
The filter approach needs the extra distinctByKey() helper and pays the cost of JSON serialization, but it preserves the original order of the data.
The collectingAndThen approach is short and performs well, but it reorders the data into the comparator's sort order.
Which one to choose depends on your actual situation.