Reading Very Large Files in Java via Memory-Mapped Files

Q: Are there any problems with using a memory-mapped file (MappedByteBuffer) to read a very large file?

A: This approach has a fatal limitation: it still cannot read truly huge files (larger than Integer.MAX_VALUE bytes), because the size argument of FileChannel's map method is capped. The JDK source shows that when size exceeds Integer.MAX_VALUE, the method throws IllegalArgumentException("Size exceeds Integer.MAX_VALUE") outright, so a single mapping remains unsuitable for especially large files.
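The argument check in question, paraphrased from OpenJDK's sun.nio.ch.FileChannelImpl.map (exact wording may vary across JDK versions), looks roughly like this:

// Paraphrased from OpenJDK, sun.nio.ch.FileChannelImpl#map(MapMode, long, long)
if (position < 0L)
    throw new IllegalArgumentException("Negative position");
if (size < 0L)
    throw new IllegalArgumentException("Negative size");
if (position + size < 0)
    throw new IllegalArgumentException("Position + size overflow");
if (size > Integer.MAX_VALUE)
    throw new IllegalArgumentException("Size exceeds Integer.MAX_VALUE");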

The root cause is that java.nio.MappedByteBuffer extends java.nio.ByteBuffer, and ByteBuffer addresses its contents with int indices, so a MappedByteBuffer can never index past Integer.MAX_VALUE; that is exactly why FileChannel's map method validates its size argument.
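A minimal sketch of how the failure shows up in caller code; the file name huge.bin is hypothetical and assumed to be larger than Integer.MAX_VALUE bytes (about 2 GB):

import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

public class MapLimitDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("huge.bin", "r");
             FileChannel channel = raf.getChannel()) {
            // Mapping the whole file in one call fails once size > Integer.MAX_VALUE:
            // java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
            channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }
}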

We can work around this limit by mapping the file as several regions, as shown below.

import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class BigMappedByteBufferReader {
    private MappedByteBuffer[] mappedByteBuffers;
    private FileInputStream inputStream;
    private FileChannel fileChannel;

    private int bufferCountIndex = 0;  // index of the mapped region currently being read
    private int bufferCount;           // total number of mapped regions

    private int byteBufferSize;        // maximum number of bytes returned per read() call
    private byte[] byteBuffer;         // bytes produced by the most recent read()

    public BigMappedByteBufferReader(String fileName, int byteBufferSize) throws IOException {
        this.inputStream = new FileInputStream(fileName);
        this.fileChannel = inputStream.getChannel();
        long fileSize = fileChannel.size();
        // One mapped region per Integer.MAX_VALUE bytes (the final region may be smaller).
        this.bufferCount = (int) Math.ceil((double) fileSize / (double) Integer.MAX_VALUE);
        this.mappedByteBuffers = new MappedByteBuffer[bufferCount];
        this.byteBufferSize = byteBufferSize;

        // Map the file as consecutive read-only regions of at most Integer.MAX_VALUE
        // bytes each; the final region covers whatever remains.
        long preLength = 0;
        long regionSize = Integer.MAX_VALUE;
        for (int i = 0; i < bufferCount; i++) {
            if (fileSize - preLength < Integer.MAX_VALUE) {
                regionSize = fileSize - preLength;
            }
            mappedByteBuffers[i] = fileChannel.map(FileChannel.MapMode.READ_ONLY, preLength, regionSize);
            preLength += regionSize;
        }
    }

    public synchronized int read() {
        if (bufferCountIndex >= bufferCount) {
            return -1;
        }

        MappedByteBuffer current = mappedByteBuffers[bufferCountIndex];

        // Read at most byteBufferSize bytes from the current region.
        int realSize = Math.min(current.remaining(), byteBufferSize);
        byteBuffer = new byte[realSize];
        current.get(byteBuffer);

        // The current region is exhausted; continue from the start of the next one.
        // (Advancing here also avoids a useless zero-length read when a region's
        // size is an exact multiple of byteBufferSize.)
        if (!current.hasRemaining()) {
            bufferCountIndex++;
        }
        return realSize;
    }

    public void close() throws IOException {
        fileChannel.close();
        inputStream.close();
        // Note: clear() only resets each buffer's position and limit; the actual
        // memory mapping is released when the MappedByteBuffer is garbage collected.
        for (MappedByteBuffer mappedByteBuffer : mappedByteBuffers) {
            mappedByteBuffer.clear();
        }
        byteBuffer = null;
    }

    public synchronized byte[] getCurrentBytes() {
        return byteBuffer;
    }
}

public class Test {
    public static void main(String[] args) throws Exception {
        BigMappedByteBufferReader reader = new BigMappedByteBufferReader("superbig.txt", 1024);
        while (reader.read() != -1) {
            byte[] bytes = reader.getCurrentBytes();
            // Process this chunk of the very large file. Note: new String(bytes)
            // uses the platform default charset and may split a multi-byte
            // character across chunk boundaries.
            System.out.println(new String(bytes));
        }
        reader.close();
    }
}

The above is one workable solution; in essence, it still splits the file into segments.
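As a quick sanity check on the split arithmetic (the 5 GiB file size below is only an illustrative assumption):

// Illustrative only: how the constructor would split an assumed 5 GiB file.
long fileSize = 5L * 1024 * 1024 * 1024;  // 5,368,709,120 bytes
int bufferCount = (int) Math.ceil((double) fileSize / (double) Integer.MAX_VALUE);
// bufferCount == 3: regions of 2,147,483,647 + 2,147,483,647 + 1,073,741,826 bytes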

This article is based on 《内存文件映射方式读取超大文件踩坑题解析》.
