Some Practical Notes on File I/O

How FileChannel Works

The official definition of Channel

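(A nexus for I/O operations. A channel represents an open connection to an entity such as a hardware device, a file, a network socket, or a program component that is capable of performing one or more distinct I/O operations, for example reading or writing.)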

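A Channel is an abstraction over I/O operations.
FileChannel works together with ByteBuffer: data being read or written is staged in an in-memory buffer and then transferred in bulk, which avoids the repeated per-call overhead of unbuffered access and can noticeably improve throughput on large files. (How does this differ from a Stream that reads into a byte array? In my tests the two were almost equally fast.)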

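Applications running in containers also have to think about GC, though. A ByteBuffer can use direct memory (system memory) via allocateDirect. That memory lives outside the Java heap, so the buffer's contents are not managed or moved by the heap collector; the native memory is released once the buffer object itself becomes unreachable.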
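To make the heap versus direct distinction concrete, here is a minimal sketch. The class name DirectBufferSketch and its helper methods are made up for illustration; only ByteBuffer.allocate, ByteBuffer.allocateDirect, isDirect and hasArray come from the JDK.

```java
import java.nio.ByteBuffer;

final class DirectBufferSketch {

    /** Allocates a buffer on the Java heap; it is backed by an ordinary byte[]. */
    static ByteBuffer heapBuffer(int capacity) {
        return ByteBuffer.allocate(capacity);
    }

    /** Allocates a buffer in native memory, outside the Java heap. */
    static ByteBuffer directBuffer(int capacity) {
        return ByteBuffer.allocateDirect(capacity);
    }

    public static void main(String[] args) {
        ByteBuffer heap = heapBuffer(1024);
        ByteBuffer direct = directBuffer(1024);
        // A heap buffer exposes its backing array; a direct buffer reports isDirect() == true.
        System.out.println("heap:   isDirect=" + heap.isDirect() + ", hasArray=" + heap.hasArray());
        System.out.println("direct: isDirect=" + direct.isDirect() + ", capacity=" + direct.capacity());
    }
}
```

Both kinds expose the same ByteBuffer API, so code that reads or writes through a channel usually does not need to care which one it was given.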

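ByteBuffer also has a subclass, MappedByteBuffer, which maps a file directly into the operating system's virtual memory and can make file reads and writes faster still.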

Comparing FileChannel and Stream: usage and performance

import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.time.Duration;
import java.time.Instant;

public class FileChannelTest {

    public static void main(String[] args) {
        // About 4 GB of data
        File sourceFile = new File("d://dd.iso");
        File targetFile = new File("d://ee.iso");
        targetFile.deleteOnExit();
        try {
            targetFile.createNewFile();
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Stream-based copy
        FileChannelTest.copyFileByStream(sourceFile, targetFile);

        // Channel-based copy
//        FileChannelTest.copyFileByFileChannel(sourceFile, targetFile);
    }

    /**
     * Copies a file through FileChannel.
     *
     * @param sourceFile the file to read from
     * @param targetFile the file to write to
     */
    public static void copyFileByFileChannel(File sourceFile, File targetFile) {
        Instant begin = Instant.now();

        RandomAccessFile randomAccessSourceFile;
        RandomAccessFile randomAccessTargetFile;
        try {
            // Build the RandomAccessFile instances used to obtain the FileChannels
            randomAccessSourceFile = new RandomAccessFile(sourceFile, "r");
            randomAccessTargetFile = new RandomAccessFile(targetFile, "rw");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            return;
        }

        FileChannel sourceFileChannel = randomAccessSourceFile.getChannel();
        FileChannel targetFileChannel = randomAccessTargetFile.getChannel();

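        // Allocate a 1 MB buffer
        ByteBuffer byteBuffer = ByteBuffer.allocate(1024 * 1024);
        try {
            while (sourceFileChannel.read(byteBuffer) != -1) {
                byteBuffer.flip();
                // write() may consume only part of the buffer, so drain it fully
                while (byteBuffer.hasRemaining()) {
                    targetFileChannel.write(byteBuffer);
                }
                byteBuffer.clear();
            }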
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                sourceFileChannel.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
            try {
                targetFileChannel.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        System.out.println("total spent " + Duration.between(begin, Instant.now()).toMillis());
    }

    /**
     * Copies a file through FileInputStream/FileOutputStream.
     *
     * @param sourceFile the file to read from
     * @param targetFile the file to write to
     */
    public static void copyFileByStream(File sourceFile, File targetFile) {
        Instant begin = Instant.now();

        FileInputStream fis;
        FileOutputStream fos;
        try {
            fis = new FileInputStream(sourceFile);
            fos = new FileOutputStream(targetFile);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            return;
        }
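        // Read through a byte array that buffers 1 MB at a time
        byte[] readed = new byte[1024 * 1024];
        try {
            int len;
            // Write only the bytes actually read; the last chunk is usually shorter than the array
            while ((len = fis.read(readed)) != -1) {
                fos.write(readed, 0, len);
            }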
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                fos.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
            try {
                fis.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        System.out.println("total spent " + Duration.between(begin, Instant.now()).toMillis());
    }
}
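The channel copy loop can also be written with try-with-resources, which closes both channels on every path without the nested finally blocks. The following is only a sketch: the class name ChannelCopySketch, the copy helper and the temporary files are illustrative assumptions, not part of the benchmark above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChannelCopySketch {

    /** Copies source to target through one reusable buffer and returns the number of bytes written. */
    static long copy(Path source, Path target, int bufferSize) throws IOException {
        long total = 0;
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
            while (in.read(buffer) != -1) {
                buffer.flip();                  // switch from filling to draining
                while (buffer.hasRemaining()) { // write() may be partial
                    total += out.write(buffer);
                }
                buffer.clear();                 // ready for the next read
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("copy-src", ".bin");
        Path target = Files.createTempFile("copy-dst", ".bin");
        try {
            Files.write(source, new byte[3 * 1024 + 7]);
            System.out.println("copied " + copy(source, target, 1024) + " bytes");
        } finally {
            Files.deleteIfExists(source);
            Files.deleteIfExists(target);
        }
    }
}
```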

What NIO File Reading in Java Really Does

Typical FileInputStream code

    public static void main(String[] args) {
        System.out.println(System.getProperty("user.dir"));
        File file = new File(System.getProperty("user.dir") + "/src/oio/file.txt");
        System.out.println("file name: " + file.getName());

        InputStream inputStream = null;
        try {
            inputStream = new FileInputStream(file);
            byte[] bytes = new byte[(int) file.length()];
            int len = inputStream.read(bytes);
            System.out.println("bytes len :" + len + " detail: " + new String(bytes));
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close the stream on every path, not only when an exception was thrown
            if (inputStream != null) {
                try {
                    inputStream.close();
                } catch (IOException e1) {
                    e1.printStackTrace();
                }
            }
        }
    }
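A single read(bytes) call is not guaranteed to fill the array, and the stream should be closed whether or not the read succeeds. A minimal sketch of one way to handle both points follows; StreamReadSketch and readFully are hypothetical names chosen for this example.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class StreamReadSketch {

    /** Reads the whole file, looping because one read() call may return fewer bytes than requested. */
    static byte[] readFully(Path file) throws IOException {
        byte[] bytes = new byte[(int) Files.size(file)];
        int offset = 0;
        // try-with-resources closes the stream on success and on failure
        try (InputStream in = new FileInputStream(file.toFile())) {
            while (offset < bytes.length) {
                int n = in.read(bytes, offset, bytes.length - offset);
                if (n == -1) {
                    break; // the file turned out shorter than its reported size
                }
                offset += n;
            }
        }
        return Arrays.copyOf(bytes, offset);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("stream-read", ".txt");
        try {
            Files.write(file, "hello file io".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(readFully(file), StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```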

Typical FileChannel code

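import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class NIOTest {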

    public static void main(String[] args) throws IOException {
        ByteBuffer byteBuffer = ByteBuffer.allocate(4);//①
        Path path = Paths.get(System.getProperty("user.dir") + "/assets/file.txt");
        FileChannel fileChannel = FileChannel.open(path, StandardOpenOption.READ);//②
        int len = fileChannel.read(byteBuffer);//③
        while (len != -1) {
            byteBuffer.flip();//④
            while (byteBuffer.hasRemaining()){
                System.out.print((char) byteBuffer.get());//⑤
            }
            byteBuffer.clear();//⑥
            len = fileChannel.read(byteBuffer);//⑦
        }
    }
}
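The flip() and clear() calls in the loop above only move the buffer's position and limit cursors. The sketch below prints those cursors at each step; BufferStateSketch and its state helper are illustrative names, not JDK classes.

```java
import java.nio.ByteBuffer;

final class BufferStateSketch {

    /** Formats the two cursors that flip() and clear() manipulate. */
    static String state(ByteBuffer buffer) {
        return "position=" + buffer.position() + " limit=" + buffer.limit();
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(4);
        System.out.println("new:   " + state(buffer));

        // Filling the buffer (what channel.read does) advances position.
        buffer.put((byte) 'a').put((byte) 'b');
        System.out.println("put:   " + state(buffer));

        // flip() sets limit to the old position and rewinds position to 0,
        // so get() sees exactly the bytes that were just written.
        buffer.flip();
        System.out.println("flip:  " + state(buffer));

        // clear() resets both cursors so the buffer can be filled again;
        // it does not erase the bytes themselves.
        buffer.get();
        buffer.clear();
        System.out.println("clear: " + state(buffer));
    }
}
```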

An In-Depth Look at FileInputStream and FileChannel

FileInputStream's read method calls the native read0 method.

In http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/src/share/native/java/io/io_util.c, the function jint readSingle(JNIEnv *env, jobject this, jfieldID fid):

jint
readSingle(JNIEnv *env, jobject this, jfieldID fid) {
    jint nread;
    char ret;
    FD fd = GET_FD(this, fid);
    if (fd == -1) {
        JNU_ThrowIOException(env, "Stream Closed");
        return -1;
    }
    nread = IO_Read(fd, &ret, 1);
    if (nread == 0) { /* EOF */
        return -1;
    } else if (nread == -1) { /* error */
        JNU_ThrowIOExceptionWithLastError(env, "Read error");
    }
    return ret & 0xFF;
}

The core is the IO_Read call.

http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/src/windows/native/java/io/io_util_md.h defines the macro:

#define IO_Read handleRead

The handleRead function in http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/src/windows/native/java/io/io_util_md.c:

JNIEXPORT
jint
handleRead(FD fd, void *buf, jint len)
{
    DWORD read = 0;
    BOOL result = 0;
    HANDLE h = (HANDLE)fd;
    if (h == INVALID_HANDLE_VALUE) {
        return -1;
    }
    result = ReadFile(h,          /* File handle to read */
                      buf,        /* address to put data */
                      len,        /* number of bytes to read */
                      &read,      /* number of bytes read */
                      NULL);      /* no overlapped struct */
    if (result == 0) {
        int error = GetLastError();
        if (error == ERROR_BROKEN_PIPE) {
            return 0; /* EOF */
        }
        return -1;
    }
    return (jint)read;
}

The core call is ReadFile.

FileChannel's read method

FileChannelImpl is the implementation class of FileChannel. Its read method:

   public int read(ByteBuffer dst) throws IOException {
       ensureOpen();
       if (!readable)
           throw new NonReadableChannelException();
       synchronized (positionLock) {
           int n = 0;
           int ti = -1;
           try {
               begin();
               ti = threads.add();
               if (!isOpen())
                   return 0;
               do {
                   n = IOUtil.read(fd, dst, -1, nd);
               } while ((n == IOStatus.INTERRUPTED) && isOpen());
               return IOStatus.normalize(n);
           } finally {
               threads.remove(ti);
               end(n > 0);
               assert IOStatus.check(n);
           }
       }
   }

The core call is IOUtil.read.

Inside the IOUtil class:

    static int read(FileDescriptor fd, ByteBuffer dst, long position,
                    NativeDispatcher nd)
        throws IOException
    {
        if (dst.isReadOnly())
            throw new IllegalArgumentException("Read-only buffer");
        if (dst instanceof DirectBuffer)
            return readIntoNativeBuffer(fd, dst, position, nd);

        // Substitute a native buffer
        ByteBuffer bb = Util.getTemporaryDirectBuffer(dst.remaining());
        try {
            int n = readIntoNativeBuffer(fd, bb, position, nd);
            bb.flip();
            if (n > 0)
                dst.put(bb);
            return n;
        } finally {
            Util.offerFirstTemporaryDirectBuffer(bb);
        }
    }

The core call is readIntoNativeBuffer.

Inside readIntoNativeBuffer:

    private static int readIntoNativeBuffer(FileDescriptor fd, ByteBuffer bb,
                                            long position, NativeDispatcher nd)
        throws IOException
    {
        int pos = bb.position();
        int lim = bb.limit();
        assert (pos <= lim);
        int rem = (pos <= lim ? lim - pos : 0);

        if (rem == 0)
            return 0;
        int n = 0;
        if (position != -1) {
            n = nd.pread(fd, ((DirectBuffer)bb).address() + pos,
                         rem, position);
        } else {
            n = nd.read(fd, ((DirectBuffer)bb).address() + pos, rem);
        }
        if (n > 0)
            bb.position(pos + n);
        return n;
    }

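The core call is nd.pread (or nd.read, which is essentially the same). Here nd is declared as the abstract class sun.nio.ch.NativeDispatcher, and the concrete class is sun.nio.ch.FileDispatcherImpl.

Inside sun.nio.ch.FileDispatcherImpl: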

    int read(FileDescriptor var1, long var2, int var4) throws IOException {
        return read0(var1, var2, var4);
    }

This finally reaches a native read0 method.

Based on OpenJDK 1.7, the native method is in http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/src/windows/native/sun/nio/ch/FileDispatcherImpl.c (taking Windows as the example):

JNIEXPORT jint JNICALL
Java_sun_nio_ch_FileDispatcherImpl_read0(JNIEnv *env, jclass clazz, jobject fdo,
                                      jlong address, jint len)
{
    DWORD read = 0;
    BOOL result = 0;
    HANDLE h = (HANDLE)(handleval(env, fdo));

    if (h == INVALID_HANDLE_VALUE) {
        JNU_ThrowIOExceptionWithLastError(env, "Invalid handle");
        return IOS_THROWN;
    }
    result = ReadFile(h,          /* File handle to read */
                      (LPVOID)address,    /* address to put data */
                      len,        /* number of bytes to read */
                      &read,      /* number of bytes read */
                      NULL);      /* no overlapped struct */
    if (result == 0) {
        int error = GetLastError();
        if (error == ERROR_BROKEN_PIPE) {
            return IOS_EOF;
        }
        if (error == ERROR_NO_DATA) {
            return IOS_UNAVAILABLE;
        }
        JNU_ThrowIOExceptionWithLastError(env, "Read failed");
        return IOS_THROWN;
    }
    return convertReturnVal(env, (jint)read, JNI_TRUE);
}

The core call is ReadFile.

FileInputStream and FileChannel both end up calling the native ReadFile function (on Windows), so at bottom they are the same!
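Since both paths bottom out in the same system call, reading one file through each API should yield identical bytes. The sketch below checks that on a small temporary file; SameBytesSketch, viaStream and viaChannel are illustrative names, and the 8-byte chunk size is arbitrary. The channel side uses a direct buffer, which, per the IOUtil.read source above, skips the temporary native buffer.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class SameBytesSketch {

    /** Reads the whole file through FileInputStream. */
    static byte[] viaStream(Path file) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new FileInputStream(file.toFile())) {
            byte[] chunk = new byte[8];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }

    /** Reads the whole file through FileChannel into a direct buffer. */
    static byte[] viaChannel(Path file) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(8);
            while (channel.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    out.write(buffer.get());
                }
                buffer.clear();
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("same-bytes", ".txt");
        try {
            Files.write(file, "hello file io".getBytes(StandardCharsets.UTF_8));
            System.out.println("identical: " + Arrays.equals(viaStream(file), viaChannel(file)));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```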

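A First Look at FileChannel and MMAP

First, in a file I/O competition the most important decision is how to read and write files. So how many kinds of file I/O does Java offer? The native approaches fall roughly into three groups: ordinary I/O, FileChannel (file channels), and MMAP (memory mapping). Telling them apart is simple. FileWriter and FileReader, for example, live in the java.io package and are ordinary I/O. FileChannel lives in the java.nio package and is a form of NIO, but note that NIO does not necessarily mean non-blocking: FileChannel is blocking. MMAP is the special one: it is derived from FileChannel by calling the map method, and it is known as memory mapping.

Obtaining a FileChannel: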

FileChannel fileChannel = new RandomAccessFile(new File("db.data"), "rw").getChannel();

Obtaining an MMAP:

MappedByteBuffer mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, fileChannel.size());

MappedByteBuffer is the class Java uses to operate on an MMAP.

We set aside traditional byte-oriented I/O and focus on how the FileChannel and MMAP ways of reading and writing differ.

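FileChannel reads and writes

// Write
byte[] data = new byte[4096];
long position = 1024L;
// Write 4 KB of data at the given position
fileChannel.write(ByteBuffer.wrap(data), position);
// Write 4 KB of data at the current file pointer
fileChannel.write(ByteBuffer.wrap(data));

// Read
ByteBuffer buffer = ByteBuffer.allocate(4096);
long position = 1024L;
// Read 4 KB of data from the given position
fileChannel.read(buffer, position);
// The buffer is full after the first read; make room before reading again
buffer.clear();
// Read 4 KB of data from the current file pointer
fileChannel.read(buffer);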

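FileChannel works with ByteBuffer most of the time. You can think of ByteBuffer as a wrapper around byte[] that offers a rich API for manipulating bytes; readers unfamiliar with it should get to know that API. It is worth noting that both write and read are thread-safe: FileChannel guards concurrent access internally with the lock private final Object positionLock = new Object();.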
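One detail worth seeing in action is that the positional overloads, write(ByteBuffer, long) and read(ByteBuffer, long), leave the channel's own file pointer untouched. A minimal sketch follows; PositionalIoSketch, writeAt and readAt are hypothetical helper names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class PositionalIoSketch {

    /** Writes data at an absolute offset and returns the channel's own position afterwards. */
    static long writeAt(FileChannel channel, byte[] data, long offset) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        long target = offset;
        while (buffer.hasRemaining()) {
            target += channel.write(buffer, target);
        }
        return channel.position();
    }

    /** Reads up to length bytes starting at an absolute offset. */
    static byte[] readAt(FileChannel channel, int length, long offset) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        long source = offset;
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, source);
            if (n == -1) {
                break; // reached end of file
            }
            source += n;
        }
        return Arrays.copyOf(buffer.array(), buffer.position());
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("positional", ".bin");
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long positionAfter = writeAt(channel, new byte[] {1, 2, 3, 4}, 1024L);
            System.out.println("channel position after positional write: " + positionAfter);
            System.out.println("file size: " + channel.size());
            System.out.println("read back: " + Arrays.toString(readAt(channel, 4, 1024L)));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Because the shared file pointer is not involved, several threads can use the positional overloads on one channel without coordinating a seek.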

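Why is FileChannel faster than ordinary I/O? Put that way the claim is not rigorous, because you have to use it correctly: FileChannel only delivers its real performance when each write is a multiple of 4 KB. It can do so because it works on an in-memory buffer, ByteBuffer, which lets us control the size of each write to disk very precisely, something ordinary I/O cannot do. Is 4 KB always fast? That is not rigorous either. It depends mainly on the disk hardware and is also affected by the operating system, the file system, and the CPU. For example, the disks used in the middleware performance challenge needed writes of at least 64 KB to reach their peak IOPS.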
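One way to keep every write at a fixed block size is to accumulate small records in a ByteBuffer and hand it to the channel only when it is full. The sketch below assumes a 4 KB block; BlockWriterSketch and its methods are invented for illustration, and the right block size for a given disk has to be measured.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class BlockWriterSketch {

    private final FileChannel channel;
    private final ByteBuffer block;

    BlockWriterSketch(FileChannel channel, int blockSize) {
        this.channel = channel;
        this.block = ByteBuffer.allocateDirect(blockSize);
    }

    /** Buffers the record and writes to the channel only when a full block has accumulated. */
    void append(byte[] record) throws IOException {
        int offset = 0;
        while (offset < record.length) {
            int chunk = Math.min(block.remaining(), record.length - offset);
            block.put(record, offset, chunk);
            offset += chunk;
            if (!block.hasRemaining()) {
                flush();
            }
        }
    }

    /** Writes whatever is buffered, including a final partial block. */
    void flush() throws IOException {
        block.flip();
        while (block.hasRemaining()) {
            channel.write(block);
        }
        block.clear();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("blocks", ".bin");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            BlockWriterSketch writer = new BlockWriterSketch(channel, 4096);
            for (int i = 0; i < 100; i++) {
                writer.append(new byte[100]); // 100-byte records, 10,000 bytes in total
            }
            System.out.println("size before final flush: " + channel.size());
            writer.flush();
            System.out.println("size after final flush:  " + channel.size());
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```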

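One more thing makes FileChannel efficient. Before getting to it, a question: does FileChannel write the data in the ByteBuffer straight to disk? Think for a few seconds... The answer is no. There is a layer between the data in the ByteBuffer and the data on disk: the PageCache, a cache that sits between user memory and the disk. Disk I/O and memory I/O differ in speed by several orders of magnitude. From the application's point of view, fileChannel.write has completed once the data is in the PageCache, but it is the operating system that later writes the PageCache out to disk. Once you understand this, you can see why FileChannel provides a force() method: it asks the operating system to flush to disk promptly.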
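As a rough sketch of where force() fits, the helper below writes a byte array and then asks the operating system to flush it before the channel is closed. ForceSketch and writeDurably are hypothetical names. Passing false to force requests that file content be flushed without necessarily flushing metadata.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ForceSketch {

    /** Writes data to the file and flushes it from the page cache to the storage device. */
    static void writeDurably(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            while (buffer.hasRemaining()) {
                channel.write(buffer); // lands in the page cache first
            }
            // false: flush file content only; true would also flush metadata
            channel.force(false);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("force", ".txt");
        try {
            writeDurably(file, "flushed".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Calling force() after every small write would give up most of the benefit of the page cache, so it is usually reserved for points where durability actually matters.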

Likewise, a read through FileChannel passes through three stages: disk -> PageCache -> user memory. For everyday use you can ignore the PageCache.

MMAP reads and writes

// Write
byte[] data = new byte[4];
int position = 8;
// Write 4 bytes of data at the current mmap position
mappedByteBuffer.put(data);
// Write 4 bytes of data at the given position. duplicate() shares the content but
// keeps its own position, and unlike slice() its offsets always count from index 0
ByteBuffer subBuffer = mappedByteBuffer.duplicate();
subBuffer.position(position);
subBuffer.put(data);

// Read
byte[] data = new byte[4];
int position = 8;
// Read 4 bytes of data from the current mmap position
mappedByteBuffer.get(data);
// Read 4 bytes of data from the given position
ByteBuffer subBuffer = mappedByteBuffer.duplicate();
subBuffer.position(position);
subBuffer.get(data);
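Positional access to a mapped file can be wrapped in small helpers that work on a duplicate of the buffer, so the shared buffer's own position never moves. This is only a sketch: MappedAccessSketch, map, putAt and getAt are names invented here, and the 64-byte mapping is arbitrary.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

final class MappedAccessSketch {

    /** Maps the first size bytes of the file read-write; the mapping outlives the channel. */
    static MappedByteBuffer map(Path file, int size) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    /** Writes data at an absolute offset through a duplicate view. */
    static void putAt(ByteBuffer mapped, int offset, byte[] data) {
        ByteBuffer view = mapped.duplicate(); // same memory, independent position
        view.position(offset);
        view.put(data);
    }

    /** Reads length bytes from an absolute offset through a duplicate view. */
    static byte[] getAt(ByteBuffer mapped, int offset, int length) {
        ByteBuffer view = mapped.duplicate();
        view.position(offset);
        byte[] data = new byte[length];
        view.get(data);
        return data;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped", ".bin");
        file.toFile().deleteOnExit(); // an active mapping can block deletion on some platforms
        MappedByteBuffer mapped = map(file, 64);
        putAt(mapped, 8, new byte[] {1, 2, 3, 4});
        System.out.println("shared position: " + mapped.position());
        System.out.println("bytes at 8: " + Arrays.toString(getAt(mapped, 8, 4)));
    }
}
```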

FileChannel is already powerful, so what more can MappedByteBuffer offer? Let me hold that thought for a moment and first cover some caveats of using MappedByteBuffer.

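After executing fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, (long) (1.5 * 1024 * 1024 * 1024)); look at the disk: a 1.5 GB file appears immediately, and its content is all zeros (byte 0). This matches what MMAP stands for, a memory-mapped file: whatever we subsequently do to the MappedByteBuffer in memory is eventually reflected in the file.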
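The file-growing effect is easy to observe at a small scale. The sketch below maps 4 KB of an empty temporary file and reports the file size afterwards; it reflects the behavior described above as seen on OpenJDK, and MapExtendsFileSketch and mapAndMeasure are illustrative names.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MapExtendsFileSketch {

    /** Maps size bytes of the file read-write and returns the file size seen afterwards. */
    static long mapAndMeasure(Path file, long size) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping a region beyond the current end of file grows the file to cover it.
            channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            return channel.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("map-extends", ".bin");
        file.toFile().deleteOnExit(); // an active mapping can block deletion on some platforms
        System.out.println("size before map: " + Files.size(file));
        System.out.println("size after map:  " + mapAndMeasure(file, 4096));
    }
}
```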

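mmap maps a file into virtual memory in user space, eliminating the copy from the kernel buffer to user space. Positions in the file get corresponding addresses in virtual memory, so the file can be manipulated like memory, as if the whole file had already been loaded. Until the data is actually used, however, no physical memory is consumed and no disk reads or writes occur. Only when the data is actually used (in this description's example, when an image is about to be rendered on screen) does the virtual memory management system load the corresponding blocks from disk into physical memory through page faults. Because this way of reading and writing files skips the copy from the kernel cache to user space, it is very efficient.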

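After reading this slightly more official description, you may be curious about MMAP: with such black magic available, is there any reason for FileChannel to exist? Many articles online also claim that MMAP outperforms FileChannel by an order of magnitude on large files. From what I learned in the competition, however, MMAP is no silver bullet for file I/O: it only does slightly better than FileChannel when each write is very small. I also have some discouraging news: in Java at least, using MappedByteBuffer is cumbersome and painful, mainly in three respects: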

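An MMAP mapping must be given its size up front, and a single map call is limited to roughly 1.5 GB (FileChannel.map rejects any size above Integer.MAX_VALUE). Mapping repeatedly brings the cost of reclaiming and reallocating virtual memory, which is very unfriendly when the file size is not known in advance.
MMAP uses virtual memory and, like the PageCache, is flushed to disk under the operating system's control. You can call force() to flush manually, but the timing is hard to get right, which is a real headache when memory is tight.
Reclaiming an MMAP mapping: when a MappedByteBuffer is no longer needed, the virtual memory it occupies can be released manually, but the way to do it is very odd.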

public static void clean(MappedByteBuffer mappedByteBuffer) {
    ByteBuffer buffer = mappedByteBuffer;
    if (buffer == null || !buffer.isDirect() || buffer.capacity()== 0)
        return;
    invoke(invoke(viewed(buffer), "cleaner"), "clean");
}

private static Object invoke(final Object target, final String methodName, final Class... args) {
    return AccessController.doPrivileged(new PrivilegedAction
