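When Group By comes up, the first thing that comes to mind is usually SQL's group by operation, which groups query results. The Collector in the Java 8 Streams API also supports grouping and partitioning the elements of a stream. This article briefly introduces how to use groupingBy and partitioningBy to group and partition stream elements.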
groupingBy
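First, here is what grouping a List looked like before Java 8:
@Test
public void groupListBeforeJava8() {
    Map<String, List<Employee>> result = new HashMap<>();
    for (Employee e : employees) {
        String city = e.getCity();
        List<Employee> empsInCity = result.get(city);
        if (empsInCity == null) {
            // First employee seen for this city: create the bucket
            empsInCity = new ArrayList<>();
            result.put(city, empsInCity);
        }
        empsInCity.add(e);
    }
    System.out.println(result);
    assertEquals(2, result.get("London").size());
}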
With the groupingBy collector from Java 8 Streams, the same operation becomes:
/**
 * Group the list by city with the Java 8 stream groupingBy collector
 */
@Test
public void groupingByTest() {
    Map<String, List<Employee>> employeesByCity =
            employees.stream().collect(Collectors.groupingBy(Employee::getCity));
    System.out.println(employeesByCity);
    assertEquals(2, employeesByCity.get("London").size());
}
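The snippets in this article rely on an Employee class and an employees list that are not shown. The following is a minimal self-contained sketch of the same grouping; the Employee fields, the sample data, and the class name GroupingByDemo are assumptions chosen to match the assertions used in the article (two employees in London), not the author's actual fixtures.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingByDemo {

    // Hypothetical stand-in for the article's Employee class.
    static class Employee {
        private final String name;
        private final String city;
        private final int sales;

        Employee(String name, String city, int sales) {
            this.name = name;
            this.city = city;
            this.sales = sales;
        }

        String getName() { return name; }
        String getCity() { return city; }
        int getSales() { return sales; }
    }

    // Assumed sample data: two employees in London, one in New York.
    static List<Employee> sampleEmployees() {
        return Arrays.asList(
                new Employee("Alice", "London", 200),
                new Employee("Bob", "London", 150),
                new Employee("Charles", "New York", 160));
    }

    // groupingBy with a single classifier collects each group into a List.
    static Map<String, List<Employee>> groupByCity(List<Employee> employees) {
        return employees.stream()
                .collect(Collectors.groupingBy(Employee::getCity));
    }

    public static void main(String[] args) {
        Map<String, List<Employee>> byCity = groupByCity(sampleEmployees());
        byCity.forEach((city, emps) ->
                System.out.println(city + " -> " + emps.size() + " employee(s)"));
    }
}
```

The single-argument groupingBy makes no promise about the concrete Map or List types it returns, so the printed order of cities should not be relied on.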
That is the most common use of the groupingBy collector. Some other uses are introduced below:
Counting the elements in each group
/**
 * Group the list by city with groupingBy and count the elements in each group
 */
@Test
public void groupingByCountTest() {
    Map<String, Long> employeesByCity =
            employees.stream().collect(Collectors.groupingBy(Employee::getCity, Collectors.counting()));
    System.out.println(employeesByCity);
    assertEquals(2L, employeesByCity.get("London").longValue());
}
Averaging within each group
/**
 * Group the list by city with groupingBy and compute the average sales of each group
 */
@Test
public void groupingByAverageTest() {
    Map<String, Double> employeesByCity =
            employees.stream().collect(Collectors.groupingBy(Employee::getCity, Collectors.averagingInt(Employee::getSales)));
    System.out.println(employeesByCity);
    assertEquals(175, employeesByCity.get("London").intValue());
}
Summing within each group
/**
 * Group the list by city with groupingBy and compute the total sales of each group
 */
@Test
public void groupingBySumTest() {
    Map<String, Long> employeesByCity =
            employees.stream().collect(Collectors.groupingBy(Employee::getCity, Collectors.summingLong(Employee::getSales)));
    // Sort the map by per-group sales total in descending order
    Map<String, Long> finalMap = new LinkedHashMap<>();
    employeesByCity.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue()
                    .reversed()).forEachOrdered(e -> finalMap.put(e.getKey(), e.getValue()));
    System.out.println(finalMap);
    assertEquals(350L, finalMap.get("London").longValue());
}
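The descending sort above fills a LinkedHashMap through a side effect in forEachOrdered. As an alternative sketch, the four-argument Collectors.toMap can build the ordered map directly. The class name SortByValueDemo, the helper names, and the sample totals are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class SortByValueDemo {

    // Assumed per-city totals, standing in for the result of summingLong.
    static Map<String, Long> sampleTotals() {
        Map<String, Long> totals = new HashMap<>();
        totals.put("Paris", 90L);
        totals.put("London", 350L);
        totals.put("New York", 160L);
        return totals;
    }

    // Returns a new map whose iteration order is by value, largest first.
    static Map<String, Long> sortByValueDescending(Map<String, Long> totals) {
        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,          // keys are unique here, so this merge is never invoked
                        LinkedHashMap::new)); // LinkedHashMap keeps insertion (sorted) order
    }

    public static void main(String[] args) {
        System.out.println(sortByValueDescending(sampleTotals()));
    }
}
```

The explicit `<String, Long>` type witness on comparingByValue is needed because chaining `.reversed()` prevents the compiler from inferring the entry type on its own.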
Joining the grouped list
/**
 * Group the list by type and join the titles of each group into one string
 */
@Test
public void groupingByConvertResultTest(){
    List<BlogPost> blogPostList = new ArrayList<>();
    blogPostList.add(new BlogPost("post1", "zhuoli", 1, 30));
    blogPostList.add(new BlogPost("post2", "zhuoli", 1, 40));
    blogPostList.add(new BlogPost("post3", "zhuoli", 2, 15));
    blogPostList.add(new BlogPost("post4", "zhuoli", 3, 33));
    blogPostList.add(new BlogPost("post5", "Alice", 1, 99));
    blogPostList.add(new BlogPost("post6", "Michael", 3, 65));
    Map<Integer, String> postsPerType = blogPostList.stream()
            .collect(Collectors.groupingBy(BlogPost::getType,
                    Collectors.mapping(BlogPost::getTitle, Collectors.joining(", ", "Post titles: [", "]"))));
    System.out.println(postsPerType);
}
Converting the grouped result: List<Employee> -> List<String>
/**
 * Group the list by city with groupingBy and convert each group's List of employees into a List of names
 */
@Test
public void groupingByCityMapList(){
    Map<String, List<String>> namesByCity =
            employees.stream().collect(Collectors.groupingBy(Employee::getCity, Collectors.mapping(Employee::getName, Collectors.toList())));
    System.out.println(namesByCity);
    assertThat(namesByCity.get("London"), contains("Alice", "Bob"));
}
Converting the grouped result: List<Employee> -> Set<String>
/**
 * Group the list by city with groupingBy and convert each group's List of employees into a Set of names
 */
@Test
public void groupingByCityMapListToSet(){
    Map<String, Set<String>> namesByCity =
            employees.stream().collect(Collectors.groupingBy(Employee::getCity, Collectors.mapping(Employee::getName, Collectors.toSet())));
    System.out.println(namesByCity);
    assertThat(namesByCity.get("London"), containsInAnyOrder("Alice", "Bob"));
}
Grouping a List by an object key
/**
 * Group the list with groupingBy, using an object built from several fields as the key
 */
@Test
public void groupingByObjectTest(){
    List<BlogPost> blogPostList = new ArrayList<>();
    blogPostList.add(new BlogPost("post1", "zhuoli", 1, 30));
    blogPostList.add(new BlogPost("post2", "zhuoli", 1, 40));
    blogPostList.add(new BlogPost("post3", "zhuoli", 2, 15));
    blogPostList.add(new BlogPost("post4", "zhuoli", 3, 33));
    blogPostList.add(new BlogPost("post5", "Alice", 1, 99));
    blogPostList.add(new BlogPost("post6", "Michael", 3, 65));
    Map<Tuple, List<BlogPost>> postsPerTypeAndAuthor = blogPostList.stream()
            .collect(Collectors.groupingBy(post -> new Tuple(post.getAuthor(), post.getType())));
    System.out.println(postsPerTypeAndAuthor);
}
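Grouping by an object key only works as intended if that key class implements equals and hashCode, because groupingBy stores the groups in a Map. The Tuple class is not shown in the article, so the following is a hypothetical sketch of what it would need; the field names, the simplified BlogPost (including the guess that the fourth constructor argument is a like count), and the class name TupleKeyDemo are all assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

public class TupleKeyDemo {

    // Hypothetical composite key; the article's Tuple may look different.
    static final class Tuple {
        private final String author;
        private final int type;

        Tuple(String author, int type) {
            this.author = author;
            this.type = type;
        }

        // Two keys are equal when both fields match; without this,
        // every post would land in its own group.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Tuple)) return false;
            Tuple other = (Tuple) o;
            return type == other.type && Objects.equals(author, other.author);
        }

        @Override
        public int hashCode() {
            return Objects.hash(author, type);
        }

        @Override
        public String toString() {
            return "(" + author + ", " + type + ")";
        }
    }

    // Hypothetical stand-in for BlogPost; "likes" is a guess at the 4th argument.
    static class BlogPost {
        private final String title;
        private final String author;
        private final int type;
        private final int likes;

        BlogPost(String title, String author, int type, int likes) {
            this.title = title;
            this.author = author;
            this.type = type;
            this.likes = likes;
        }

        String getTitle() { return title; }
        String getAuthor() { return author; }
        int getType() { return type; }
    }

    static List<BlogPost> samplePosts() {
        return Arrays.asList(
                new BlogPost("post1", "zhuoli", 1, 30),
                new BlogPost("post2", "zhuoli", 1, 40),
                new BlogPost("post3", "zhuoli", 2, 15),
                new BlogPost("post4", "zhuoli", 3, 33),
                new BlogPost("post5", "Alice", 1, 99),
                new BlogPost("post6", "Michael", 3, 65));
    }

    static Map<Tuple, List<BlogPost>> groupByAuthorAndType(List<BlogPost> posts) {
        return posts.stream()
                .collect(Collectors.groupingBy(p -> new Tuple(p.getAuthor(), p.getType())));
    }

    public static void main(String[] args) {
        groupByAuthorAndType(samplePosts())
                .forEach((key, posts) -> System.out.println(key + " -> " + posts.size()));
    }
}
```

With the sample data, the posts fall into five groups, and the key (zhuoli, 1) holds post1 and post2.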
Grouping a List by two fields
/**
 * Group the list by author, then by type within each author
 */
@Test
public void groupingByMultiItemTest(){
    List<BlogPost> blogPostList = new ArrayList<>();
    blogPostList.add(new BlogPost("post1", "zhuoli", 1, 30));
    blogPostList.add(new BlogPost("post2", "zhuoli", 1, 40));
    blogPostList.add(new BlogPost("post3", "zhuoli", 2, 15));
    blogPostList.add(new BlogPost("post4", "zhuoli", 3, 33));
    blogPostList.add(new BlogPost("post5", "Alice", 1, 99));
    blogPostList.add(new BlogPost("post6", "Michael", 3, 65));
    Map<String, Map<Integer, List<BlogPost>>> map = blogPostList.stream()
            .collect(Collectors.groupingBy(BlogPost::getAuthor, Collectors.groupingBy(BlogPost::getType)));
    System.out.println(map);
}
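A nested groupingBy yields a Map of Maps, so a lookup needs two steps and either level may be missing. The sketch below shows one null-safe way to read such a result with getOrDefault. The simplified three-field BlogPost, the helper names, and the class name NestedGroupingDemo are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NestedGroupingDemo {

    // Hypothetical, simplified BlogPost with only the fields used here.
    static class BlogPost {
        private final String title;
        private final String author;
        private final int type;

        BlogPost(String title, String author, int type) {
            this.title = title;
            this.author = author;
            this.type = type;
        }

        String getTitle() { return title; }
        String getAuthor() { return author; }
        Integer getType() { return type; }
    }

    static List<BlogPost> samplePosts() {
        return Arrays.asList(
                new BlogPost("post1", "zhuoli", 1),
                new BlogPost("post2", "zhuoli", 1),
                new BlogPost("post3", "zhuoli", 2),
                new BlogPost("post5", "Alice", 1));
    }

    // Outer key: author. Inner key: type.
    static Map<String, Map<Integer, List<BlogPost>>> group(List<BlogPost> posts) {
        return posts.stream()
                .collect(Collectors.groupingBy(BlogPost::getAuthor,
                        Collectors.groupingBy(BlogPost::getType)));
    }

    // Returns the titles for (author, type), or an empty list if either level is absent.
    static List<String> titlesFor(Map<String, Map<Integer, List<BlogPost>>> grouped,
                                  String author, int type) {
        return grouped.getOrDefault(author, Collections.emptyMap())
                .getOrDefault(type, Collections.emptyList())
                .stream()
                .map(BlogPost::getTitle)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, List<BlogPost>>> grouped = group(samplePosts());
        System.out.println(titlesFor(grouped, "zhuoli", 1));
        System.out.println(titlesFor(grouped, "nobody", 9));
    }
}
```

Compared with the composite-key approach, the nested form makes it easy to fetch everything for one author, at the cost of a two-step lookup for a specific (author, type) pair.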
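Deduplicating grouped results with a custom distinctByKey
My use of groupingBy started with a requirement at work, involving the following data structure: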
@Data
@AllArgsConstructor
public class TestData {
private Integer scene;
private Integer placement;
private Long bid;
}
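The task was to group a List of TestData and find which placements were already occupied in each scene. I first used groupingBy directly, which produced a Map<Integer, List<TestData>> in which the same placement can appear more than once within a scene, so the placements in each group still had to be deduplicated: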
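public class DistinctByKey {
    @Test
    public void distinctByKeyTest() {
        TestData testData1 = new TestData(1, 1, 100L);
        TestData testData2 = new TestData(1, 2, 1000L);
        TestData testData3 = new TestData(1, 3, 100L);
        TestData testData4 = new TestData(1, 1, 80L);
        TestData testData5 = new TestData(2, 1, 1600L);
        TestData testData6 = new TestData(2, 2, 1030L);
        TestData testData7 = new TestData(2, 2, 1001L);
        TestData testData8 = new TestData(2, 2, 1500L);
        TestData testData9 = new TestData(3, 5, 1500L);
        List<TestData> testDataList = Arrays.asList(testData1, testData2, testData3, testData4,
                testData5, testData6, testData7, testData8, testData9);
        /* Deduplicating directly by placement: the scene 2 elements with placement 1 and 2 are dropped */
        List<TestData> distinctBykeyList = testDataList.stream()
                .filter(distinctByKey(TestData::getPlacement))
                .collect(Collectors.toList());
        System.out.println(distinctBykeyList);
        /* Group by scene first, then deduplicate the placements inside each group */
        Map<Integer, List<Integer>> resultMap = testDataList.stream()
                .collect(Collectors.groupingBy(TestData::getScene))
                .entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                        entry -> entry.getValue().stream().filter(distinctByKey(TestData::getPlacement)).map(TestData::getPlacement).collect(Collectors.toList())));
        System.out.println(resultMap);
    }

    private static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();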