实用的Java8集合按指定大小分组的方法

实用的Java8集合按指定大小分组的方法

转自:https://e.printstacktrace.blog/divide-a-list-to-lists-of-n-size-in-Java-8/

Divide a list to lists of n size in Java 8

Every Java developer works with lists daily. There are many popular list (or collection) operations implemented in the standard Java 8 library, but there is one that is useful and commonly used, yet missing - partitioning. In this blog post, I would like to show you how you can split any list into chunks of fixed size without using any 3rd party library. Let’s start!

Introduction

Partitioning (also known as chunking) is an operation that transforms a collection of elements into a collection of chunks of a given size. For instance, let’s say we have a list of 7 elements (incrementing numbers from 1 to 7) and we want to split it into a list of chunks of size 2.

[1,2,3,4,5,6,7] -> [[1,2], [3,4], [5,6], [7]]

Let’s implement this operation using a new type of list, called Partition. We will extend the AbstractList> class, and we will implement two methods - get(int index) and size(). The whole idea here is to hide the original list inside this class and modify its behavior, so instead of returning single elements from the original list, we are going to return chunks of a given size. Let’s take a look at the code and the following example to make this concept much easier to understand.

Listing 1. src/main/java/com/github/wololock/Partition.java

package com.github.wololock;

import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public final class Partition extends AbstractList> {

    private final List list;
    private final int chunkSize;

    public Partition(List list, int chunkSize) {
        this.list = new ArrayList<>(list);
        this.chunkSize = chunkSize;
    }

    public static  Partition ofSize(List list, int chunkSize) {
        return new Partition<>(list, chunkSize);
    }

    @Override
    public List get(int index) {
        int start = index * chunkSize;
        int end = Math.min(start + chunkSize, list.size());

        if (start > end) {
            throw new IndexOutOfBoundsException("Index " + index + " is out of the list range <0," + (size() - 1) + ">");
        }

        return new ArrayList<>(list.subList(start, end));
    }

    @Override
    public int size() {
        return (int) Math.ceil((double) list.size() / (double) chunkSize);
    }
}

All we have to do is to override those two methods, and our list of elements starts behaving like a list of chunks of the input elements. Those overridden methods affect methods like toString(), forEach(), or iterator(). Let’s take a look at a simple usage example to see how this Partition object behaves:

Listing 2. An example of Partition class usage

final List numbers = Arrays.asList(1,2,3,4,5,6,7);

System.out.println(Partition.ofSize(numbers, 3));
System.out.println(Partition.ofSize(numbers, 2));

Running this example will produce the following input to the console:

Listing 3. The console output

[[1, 2, 3], [4, 5, 6], [7]]
[[1, 2], [3, 4], [5, 6], [7]]

Why bother with creating a new class?

We can also ask ourselves if creating such a Partition class is mandatory. Maybe there is an alternative way to achieve the same effect without using 3rd party libraries? Well, the good news is - there are at least two different ways to get the same result. Let’s take a look at the first of them.

Imperative for-loop

If we are OK with the imperative programming style, we can use Java’s for-loop and do the partitioning while iterating the input list. In this case, we need to control if the given number belongs to the existing chunk or does it belong to the next one (by its index number). The code could look like this:

Listing 4. Imperative for-loop method

final List numbers = Arrays.asList(1,2,3,4,5,6,7);
final int chunkSize = 3;
final AtomicInteger counter = new AtomicInteger();

for (int number : numbers) {
    if (counter.getAndIncrement() % chunkSize == 0) {
        result.add(new ArrayList<>());
    }
    result.get(result.size() - 1).add(number);
}

System.out.println(result);

It produces the following console output:

[[1, 2, 3], [4, 5, 6], [7]]

Using Java 8 Stream API + grouping collector

Alternatively, we could use Java 8 Stream API and its Collectors.groupingBy() collector method. Using this method we produce a map from a stream, but we can invoke values() method on the final map to get a collection of all its values. Then we can decide if we want to stay with the Collection> or if we want to cast it to List> by passing the collection to the, e.g. ArrayList constructor. Below you can find an exemplary code.

final List numbers = Arrays.asList(1,2,3,4,5,6,7);
final int chunkSize = 3;
final AtomicInteger counter = new AtomicInteger();

final Collection> result = numbers.stream()
    .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / chunkSize))
    .values();

System.out.println(result);

Less lines of code, but the same console output:

[[1, 2, 3], [4, 5, 6], [7]]

Which method to chose?

Now you may wonder - which method is the best? Should we use the Partition class, or is it better to use a good old imperative style that is quite easy to read and reason about? It depends. There is one significant difference between the first approach and the remaining two - the efficiency. Partitioning very small lists won’t make a big difference, but it starts making a huge (I mean huuuuge) difference when we start to play with large and larger lists. Let’s not speculate, but look at the numbers instead.

I created benchmark tests using JMH for the following scenarios:

  • partitioning 20 elements list into chunks of size 3,
  • partitioning 10,000 elements list into chunks of size 23,
  • and finally, partitioning 10,000,000 elements list into chunks of size 1024.

We will measure the efficiency of 4 different methods:

  • partitioning with the Partition class,
  • partitioning with the imperative for-loop,
  • partitioning with the Java 8 Stream API and Collectors.groupingBy(),
  • partitioning with the Java 8 Stream API with a custom collector that makes use of Partition class.

All measurements use microsecond unit of time. (1 μs is equal to 0.001 ms and 0.000001 s).

All benchmark tests can be found in the wololock/java-performance-benchmarks Github repository.Feel free to clone the repository and run benchmarks on your own computer with the following command:$ ./gradlew jmhI’ve run tests on a Lenovo ThinkPad T440p laptop with Intel® Core™ i7-4900MQ CPU @ 2.80GHz and 16 GBs RAM. I used JDK 1.8.0_201 (Java HotSpot™ 64-Bit Server VM, 25.201-b09).Below you can find JMH settings used for each benchmark test case:# JMH version: 1.21 # VM version: JDK 1.8.0_201, Java HotSpot(TM) 64-Bit Server VM, 25.201-b09 # VM invoker: /home/wololock/.sdkman/candidates/java/8.0.201-oracle/jre/bin/java # VM options: # Warmup: 1 iterations, 30 s each # Measurement: 42 iterations, 1 s each # Timeout: 10 min per iteration # Threads: 1 thread, will synchronize iterations # Benchmark mode: Average time, time/op

Here are the benchmark tests results.

Listing 5. Partitioning list benchmark results

Benchmark                                                 Mode  Cnt       Score      Error  Units
JavaListPartitionBenchmark.A1_smallListImperative         avgt   42       0,501 ±    0,002  us/op
JavaListPartitionBenchmark.A2_smallListStreamGroupingBy   avgt   42       0,637 ±    0,004  us/op
JavaListPartitionBenchmark.A3_smallListStreamPartitioned  avgt   42       0,311 ±    0,004  us/op
JavaListPartitionBenchmark.A4_smallListToPartition        avgt   42       0,132 ±    0,006  us/op
JavaListPartitionBenchmark.B1_largeListImperative         avgt   42     173,364 ±    2,818  us/op
JavaListPartitionBenchmark.B2_largeListStreamGroupingBy   avgt   42     255,984 ±    0,540  us/op
JavaListPartitionBenchmark.B3_largeListStreamPartitioned  avgt   42      65,313 ±    0,387  us/op
JavaListPartitionBenchmark.B4_largeListToPartition        avgt   42       4,686 ±    0,011  us/op
JavaListPartitionBenchmark.C1_hugeListImperative          avgt   42  154043,254 ± 2497,491  us/op
JavaListPartitionBenchmark.C2_hugeListStreamGroupingBy    avgt   42  249341,828 ±  842,087  us/op
JavaListPartitionBenchmark.C3_hugeListStreamPartitioned   avgt   42   91150,418 ± 1079,959  us/op
JavaListPartitionBenchmark.C4_hugeListToPartition         avgt   42    8737,578 ±  138,671  us/op

Let’s look at those results more closely and see what do they look like as graphs.

image-20200606154504348

The first benchmark tests a small list of 20 elements. As you can see, in this case, it doesn’t make a big difference which approach we choose. The difference between the fastest and the slowest approach is equal to 0.369 microseconds.

image-20200606154544083

However, things start to change when we have to deal with lists containing thousands of elements. The slowest method requires 255 microseconds to complete (0.255 ms), while the fastest method is almost 64 times more efficient and it needs only 4 microseconds to do its job (0.004 ms). At this point, you can start thinking about the efficiency of available solutions. The context still matters - if you perform partitioning operation rarely, you might not need the fastest method to do the job efficiently.

image-20200606154532470

The final benchmark has only a single winner - the Partition class method. In this case, the slowest method takes almost 249341 microseconds (~250 ms), while the fastest method does the same job in 8737 microseconds (8.737 ms). However, the gap between both methods decreased to the size of 28 times. It happens mostly because the Partition class creates a copy of the input list. If we use a reference to the existing list instead of creating its copy, we could reduce 8737 microseconds to less than 2 microseconds. It could be quite risky - using a reference instead of a copy means that whenever the input list changes outside the context of Partition class, it changes our partitioned list.

What if I want to use a 3rd party library?

That’s completely fine. There are at least two libraries that provide list partitioning operation, using a similar method to the Partition class one.

  • Google Guava has Lists.partition(List list, int size) method (docs)
  • Apache Commons Collections has ListUtils.partition(List list, int size) method ([docs](https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/ListUtils.html#partition(java.util.List, int)))

When you decide to use any of these two, be aware that they don’t create a copy of the input list. To avoid some nasty side effects, you may always want to pass a copy of a list instead.

Lists.partition(new ArrayList<>(numbers), 3);

Google 和Apache的 方式与建立Partition类的方法相似;

Google Guava Lists.partition(List list, int size) method 源码

public static  List> partition(List list, int size) {
    checkNotNull(list);
    checkArgument(size > 0);
    return (list instanceof RandomAccess)
        ? new RandomAccessPartition<>(list, size)
        : new Partition<>(list, size);
}

private static class RandomAccessPartition extends Partition implements RandomAccess {
    RandomAccessPartition(List list, int size) {
      super(list, size);
    }
  }

private static class Partition extends AbstractList> {
    final List list;
    final int size;

    Partition(List list, int size) {
      this.list = list;
      this.size = size;
    }

    @Override
    public List get(int index) {
      checkElementIndex(index, size());
      int start = index * size;
      int end = Math.min(start + size, list.size());
      return list.subList(start, end);
    }

    @Override
    public int size() {
      return IntMath.divide(list.size(), size, RoundingMode.CEILING);
    }

    @Override
    public boolean isEmpty() {
      return list.isEmpty();
    }
  }

你可能感兴趣的:(实用的Java8集合按指定大小分组的方法)