Java (1.8) Advanced Features: Input and Output

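I/O is a topic that shows up on every exam and in every interview, yet it is also a very fiddly one. The many classes that are hard to tell apart, and the order in which they have to be constructed, always struck me as a pile of trivia. Only after reading through an overview of I/O from start to finish and building a mental model of the whole system did I realize that these pieces are not scattered at all: the classes are designed to be combined, and the combinations provide the many flavors of input and output.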

I/O

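I/O falls into two main categories: character I/O and byte I/O. Character I/O is the input and output of text, which involves character encodings; byte I/O is the input and output of the binary form of primitive values or of serialized objects. Many of the output classes use the synchronized keyword to make concurrent access safe (the Writer classes generally synchronize on an internal lock object), but not all of them do (FileOutputStream, for example, does not), so check the specific class before sharing a stream between threads.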

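Character I/O

Character I/O is the simpler of the two. Java provides two abstract classes, Writer and Reader, as the basis for character output and input.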

Writer

The subclasses of Writer are BufferedWriter, CharArrayWriter, FilterWriter, OutputStreamWriter (whose subclass is FileWriter), PipedWriter, PrintWriter, and StringWriter. The following sections describe how each one is used.

Writer
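The Writer class defines the basic methods for character output; methods that its subclasses share with it are not described again below. In the buffered output classes, characters are first stored in a buffer, and the whole buffer is written to the file in one go only when it fills up. To force the output earlier, call flush(); alternatively, PrintWriter offers an auto-flush mode in which every call to println writes the buffered data out.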

public void write(int c) throws IOException;
public void write(char cbuf[]) throws IOException;
abstract public void write(char cbuf[], int off, int len) throws IOException;
//append has the same effect as write
public Writer append(char c) throws IOException;
public Writer append(CharSequence csq) throws IOException;
public Writer append(CharSequence csq, int start, int end) throws IOException ;
abstract public void flush() throws IOException;
abstract public void close() throws IOException;

PrintWriter
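PrintWriter is the main class for text output. It can be constructed from a file name, an OutputStream, a Writer, or a File object, and the csn parameter (charset name) specifies the encoding of the file.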

public PrintWriter(String fileName) throws FileNotFoundException;
public PrintWriter(String fileName, String csn) throws FileNotFoundException, UnsupportedEncodingException;
public PrintWriter(OutputStream out);
public PrintWriter(OutputStream out, boolean autoFlush);
public PrintWriter(Writer out);
public PrintWriter(Writer out, boolean autoFlush);
public PrintWriter(File file) throws FileNotFoundException;
public PrintWriter(File file, String csn) throws FileNotFoundException, UnsupportedEncodingException;

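In addition to the methods defined by Writer, PrintWriter provides print() and println() methods that accept arguments of any type; most overloads convert the argument with String.valueOf() and end up in write(String). The class also supports printf(String format, Object... args), similar to printf in C.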
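As a rough illustration of print(), printf(), and the auto-flush flag, the sketch below writes through a PrintWriter into an in-memory StringWriter. The class name PrintWriterDemo and the method render are made up for this example, and the StringWriter merely stands in for a file.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class PrintWriterDemo {
    // Formats a few values through a PrintWriter and returns the resulting text.
    static String render() {
        StringWriter sink = new StringWriter(); // stand-in for a file (assumption)
        // The second argument enables auto-flush for println, printf, and format.
        try (PrintWriter out = new PrintWriter(sink, true)) {
            out.print(42);                    // int overload
            out.print(' ');                   // char overload
            out.print(3.5);                   // double overload
            out.printf(" %s=%d", "count", 7); // C-style formatting
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```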

BufferedWriter
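BufferedWriter adds almost nothing to the Writer interface (only newLine()). Its main purpose is to buffer characters so that output through other Writer subclasses becomes more efficient. For example, PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("foo.out"))); speeds up output through out, and the Javadoc recommends using BufferedWriter this way. In fact, apart from the constructors that take a Writer, the PrintWriter constructors already wrap their target in a BufferedWriter.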
```
public PrintWriter(File file) throws FileNotFoundException {
    this(new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file))),false);
}
//BufferedWriter's write method
public void write(int c) throws IOException {
    synchronized (lock) {
        ensureOpen();
        //if the buffer is full, flush it first
        if (nextChar >= nChars)
            flushBuffer();
        //cb is the char buffer
        cb[nextChar++] = (char) c;
    }
}
```
FileWriter
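FileWriter is a convenience class for writing characters to a file. It adds no methods of its own, and in Java 8 it cannot be given an encoding; if you need a specific encoding, use new OutputStreamWriter(new FileOutputStream(...), charsetName) instead. The class only offers output of int, char[], and String values; other types must be converted by hand or written through new PrintWriter(new FileWriter(...)). (So why not just use PrintWriter directly?)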
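When the encoding matters, the OutputStreamWriter route might look like the following sketch. To keep it self-contained, a ByteArrayOutputStream stands in for the FileOutputStream, and the names EncodingDemo and encodeUtf8 are invented for the example.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Encodes text with an explicit charset and returns the bytes that were written.
    static byte[] encodeUtf8(String text) {
        // Assumption: in real code this would be a FileOutputStream.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(bytes, StandardCharsets.UTF_8))) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer has been closed (and therefore flushed) at this point.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // U+00E9 takes two bytes in UTF-8, so the byte count exceeds the char count.
        System.out.println(encodeUtf8("caf\u00e9").length);
    }
}
```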
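CharArrayWriter and StringWriter
These two classes write to a character array and to a StringBuffer, respectively. Besides the basic write() and append() methods, CharArrayWriter provides toString() and toCharArray(), while StringWriter provides toString() and getBuffer().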

PipedWriter
This class writes characters to a pipe, from which a connected PipedReader can read them.

public PipedWriter();
//this constructor already calls connect
public PipedWriter(PipedReader snk)  throws IOException;
public synchronized void connect(PipedReader snk) throws IOException;

FilterWriter
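FilterWriter is an abstract class; judging from the source, it simply forwards each call to the wrapped Writer and adds nothing special of its own.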

Reader

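The subclasses of Reader are BufferedReader (whose subclass is LineNumberReader), CharArrayReader, FilterReader (whose subclass is PushbackReader), InputStreamReader (whose subclass is FileReader), PipedReader, and StringReader. Note that all read() methods block: if no input is available, the call waits until some arrives.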

Reader
As the parent of all readers, this class declares the common methods.

//read a single character
public int read() throws IOException;
public int read(java.nio.CharBuffer target) throws IOException;
abstract public int read(char cbuf[], int off, int len) throws IOException;
public long skip(long n) throws IOException;
//whether the stream can be read without blocking
public boolean ready() throws IOException;
//mark the current read position in the stream
public void mark(int readAheadLimit) throws IOException;
public boolean markSupported();
//return to the previously marked position
public void reset() throws IOException;
abstract public void close() throws IOException;

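BufferedReader and LineNumberReader
As the name suggests, BufferedReader exists to make reading characters more efficient; as with the writers, it wraps another reader, as in new BufferedReader(new FileReader(...)). Without buffering, every call to read() or readLine() has to read bytes from the file, convert them to characters, and return, which is very slow. The methods specific to this class are listed below.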
```
public String readLine() throws IOException;
String readLine(boolean ignoreLF) throws IOException;
public Stream<String> lines();
```
LineNumberReader extends BufferedReader with line numbering: it keeps track of how many lines have been read so far.
```
public void setLineNumber(int lineNumber);
public int getLineNumber();
```
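The line counter can be seen in action in the short sketch below, which prefixes every line of a string with its number. The names LineNumberDemo and numberLines are illustrative, and a StringReader is assumed in place of a FileReader so the example runs without a file.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineNumberDemo {
    // Returns each line of the text prefixed with its 1-based line number.
    static List<String> numberLines(String text) {
        List<String> result = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // The counter starts at 0 and is incremented for every line returned.
                result.add(reader.getLineNumber() + ": " + line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        numberLines("first\nsecond").forEach(System.out::println);
    }
}
```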
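FilterReader and PushbackReader
FilterReader is an abstract class with no functionality of its own. Its subclass PushbackReader can push characters that have already been read back into the input stream.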
```
public void unread(int c) throws IOException;
public void unread(char cbuf[]) throws IOException;
public void unread(char cbuf[], int off, int len) throws IOException;
```

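InputStreamReader and FileReader
InputStreamReader is the bridge from byte streams to character streams: it decodes bytes using either a specified encoding or the platform default. For best efficiency, wrap it in a BufferedReader, as in new BufferedReader(new InputStreamReader(...)).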

public InputStreamReader(InputStream in);
public InputStreamReader(InputStream in, String charsetName)
    throws UnsupportedEncodingException;
public InputStreamReader(InputStream in, Charset cs);
public InputStreamReader(InputStream in, CharsetDecoder dec) ;
public String getEncoding() ;

FileReader reads characters from a file. It is a subclass of InputStreamReader, and it too is best wrapped in a BufferedReader.

public FileReader(String fileName) throws FileNotFoundException;
public FileReader(File file) throws FileNotFoundException;
public FileReader(FileDescriptor fd);

CharArrayReader and StringReader
As their names suggest, these classes read characters from a char array and from a String.

//CharArrayReader
public CharArrayReader(char buf[]);
public CharArrayReader(char buf[], int offset, int length);
public long skip(long n) throws IOException;
public boolean ready() throws IOException;

//StringReader
public StringReader(String s);
...

PipedReader
This class reads characters from a pipe.

public PipedReader();
public PipedReader(int pipeSize);
public PipedReader(PipedWriter src, int pipeSize) throws IOException;
public void connect(PipedWriter src) throws IOException;

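Byte I/O

Byte I/O can be further divided into I/O of primitive types and object serialization. The figure below shows the class hierarchy of the input and output streams. The family is very large, and many of its classes resemble subclasses of Reader and Writer; only some of the streams are covered here.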

[Figure: I/O class hierarchy (高级特性_IO层次结构.png)]

I/O of primitive types

OutputStream

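Looking at OutputStream and its subclasses reveals a strong resemblance to the Writer family. ByteArrayOutputStream, FileOutputStream, PipedOutputStream, PrintStream, and BufferedOutputStream play the same roles as the corresponding Writer subclasses (the methods are similar, and some classes provide additional ones). ObjectOutputStream deals with object serialization and is left for the next part.
Next come the subclasses of FilterOutputStream.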

CheckedOutputStream
This class maintains a running checksum of the data written, which can be used to verify the integrity of the output.
```
public Checksum getChecksum();
```
CipherOutputStream
This class overrides write() to encrypt the data before writing it out. It is initialized with a Cipher instance.
```
protected CipherOutputStream(OutputStream os);
public CipherOutputStream(OutputStream os, Cipher c);
```

DigestOutputStream
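This class computes a message digest (MD5, SHA-1, SHA-256, and so on) of the bytes as they are written. The digest algorithm is chosen at construction time through the MessageDigest argument.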

public DigestOutputStream(OutputStream stream, MessageDigest digest);
//turns digest updating on or off; when off, the class behaves like an ordinary stream
public void on(boolean on);
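
A minimal sketch of computing a digest while writing is shown below. The names DigestDemo and sha256Hex are hypothetical, and the bytes are written to a throwaway ByteArrayOutputStream instead of a real destination.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestDemo {
    // Writes the text through a DigestOutputStream and returns the SHA-256 digest as hex.
    static String sha256Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // Assumption: the underlying stream would normally be a file or socket stream.
            try (DigestOutputStream out =
                    new DigestOutputStream(new ByteArrayOutputStream(), digest)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("abc"));
    }
}
```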

DataOutputStream
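This class implements the DataOutput interface, which declares methods for writing primitive values such as byte, short, and int. DataOutput also declares writeUTF(), which writes a string in modified UTF-8; use it only for strings intended for a Java virtual machine. DataOutputStream implements all methods of the interface.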

//DataOutput methods
void write(int b) throws IOException;
void write(byte b[]) throws IOException;
void write(byte b[], int off, int len) throws IOException;
void writeBoolean(boolean v) throws IOException;
void writeByte(int v) throws IOException;
void writeShort(int v) throws IOException;
void writeChar(int v) throws IOException;
void writeInt(int v) throws IOException;
void writeLong(long v) throws IOException;
void writeFloat(float v) throws IOException;
void writeDouble(double v) throws IOException;
void writeBytes(String s) throws IOException;
void writeChars(String s) throws IOException;
void writeUTF(String s) throws IOException;
//additional method implemented by DataOutputStream
//returns the number of bytes written so far
int size();
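
To make the binary layout concrete, here is a small round trip through DataOutputStream and DataInputStream. The record layout (an int, a double, and a string) and the names DataStreamDemo, encode, and decode are assumptions chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class DataStreamDemo {
    // Writes an int (4 bytes), a double (8 bytes), and a modified UTF-8 string
    // (2-byte length prefix followed by the encoded characters).
    static byte[] encode(int id, double price, String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);
            out.writeDouble(price);
            out.writeUTF(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the values back in exactly the order in which they were written.
    static String decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readInt();
            double price = in.readDouble();
            String name = in.readUTF();
            return id + "," + price + "," + name;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = encode(7, 2.5, "pen");
        System.out.println(data.length + " bytes: " + decode(data));
    }
}
```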

DeflaterOutputStream
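This class compresses the data written to it using the deflate algorithm. The Deflater passed to the constructor controls how the data is compressed.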
```
public DeflaterOutputStream(OutputStream out,
                            Deflater def,
                            int size,
                            boolean syncFlush)
```
Java builds on this class to provide output to GZIP, ZIP, and JAR files (GZIPOutputStream, ZipOutputStream, and JarOutputStream).
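As one example of these subclasses, the sketch below compresses a byte array with GZIPOutputStream and restores it with GZIPInputStream. The names GzipDemo, compress, and decompress are made up, and in-memory streams are assumed in place of files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipDemo {
    // Compresses the data into GZIP format.
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the stream has written the GZIP trailer.
        return bytes.toByteArray();
    }

    // Reads GZIP data back into its uncompressed form.
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] packed = compress("hello hello hello".getBytes());
        System.out.println(new String(decompress(packed)));
    }
}
```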

InputStream

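Most InputStream classes share their prefix with an OutputStream class and provide the same functionality, only for input instead of output. The classes below are the ones that differ.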

AudioInputStream
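This class reads audio data; the audio format is specified at construction time. Together with AudioSystem it supports more than plain reading, such as converting between audio formats and reading audio from the network.
SequenceInputStream
This class reads from several input streams in sequence: the first stream from beginning to end, then the second, then the third, and so on until all streams are exhausted. Its constructors are shown below.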
```
public SequenceInputStream(Enumeration<? extends InputStream> e);
public SequenceInputStream(InputStream s1,InputStream s2);
```
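The two-stream constructor can be tried out with in-memory streams, as in the following sketch. SequenceDemo and concat are illustrative names, and ByteArrayInputStream stands in for the file streams one would normally chain.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class SequenceDemo {
    // Reads two streams back to back as if they were a single stream.
    static String concat(String first, String second) {
        InputStream a = new ByteArrayInputStream(first.getBytes(StandardCharsets.US_ASCII));
        InputStream b = new ByteArrayInputStream(second.getBytes(StandardCharsets.US_ASCII));
        try (InputStream in = new SequenceInputStream(a, b)) {
            StringBuilder text = new StringBuilder();
            int c;
            // read() returns -1 only after the last stream is exhausted.
            while ((c = in.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(concat("abc", "def"));
    }
}
```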
ProgressMonitorInputStream
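This class monitors the progress of reading from an input stream. If reading takes longer than a certain time, a dialog pops up to inform the user; if the user cancels, the next call to read() throws an InterruptedIOException.
ZipInputStream
A ZIP archive usually stores one or more files in compressed form. Each archive carries header information such as the name of each file and the compression method used. Call getNextEntry() to obtain a ZipEntry object describing the next entry, read the data of that entry from the ZipInputStream itself, and call closeEntry() to close the entry before moving on to the next one.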
```
ZipInputStream zin = new ZipInputStream(new FileInputStream(zipname));
ZipEntry entry;
while ((entry = zin.getNextEntry()) != null) {
    //ZipInputStream has no getInputStream(entry) method;
    //read the contents of the current entry directly from zin
    zin.closeEntry();
}
zin.close();
```
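A self-contained variant of the loop above, which first builds a one-entry archive in memory with ZipOutputStream and then lists it, could look like this. The names ZipDemo, zip, and listEntries are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipDemo {
    // Creates an in-memory ZIP archive containing a single text entry.
    static byte[] zip(String name, String content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry(name));
            zout.write(content.getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns "name=content" for every entry in the archive.
    static List<String> listEntries(byte[] archive) {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                ByteArrayOutputStream body = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                // read() returns -1 at the end of the current entry, not of the archive.
                while ((n = zin.read(buffer)) != -1) {
                    body.write(buffer, 0, n);
                }
                result.add(entry.getName() + "="
                        + new String(body.toByteArray(), StandardCharsets.UTF_8));
                zin.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(listEntries(zip("a.txt", "hi")));
    }
}
```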

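Object serialization and deserialization

To store an object in a file, all of its state must be saved. Fields of primitive type can be converted directly to bytes. References are harder: if the memory address were saved, there would be no guarantee that the same address holds the target object the next time the data is read. Direct references therefore have to be replaced by symbolic ones, which is done by tagging each object with a number. When one object refers to another, the referring object stores the number that stands for the referenced object, and on reading the symbolic reference is turned back into a direct one. In the serialized output this number is called the serial number.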

ObjectInputStream and ObjectOutputStream
The following is an example of serializing and deserializing objects.

Employee harry=new Employee("Harry",50000);
Manager carl=new Manager("Carl",80000);
Employee[] staff=new Employee[2];
staff[0]=harry;
staff[1]=carl;
try(ObjectOutputStream out=new ObjectOutputStream(new FileOutputStream("employee.dat"))){
  out.writeObject(staff);
}
try(ObjectInputStream in=new ObjectInputStream(new FileInputStream("employee.dat"))){
  Employee[] newStaff=(Employee[])in.readObject();
}
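
The effect of serial numbers can be observed with a small experiment: if the same object is referenced twice, it is written once and both references point to a single object again after reading. The sketch below assumes a tiny Serializable class Node invented for the purpose, along with the helper names SerialDemo, serialize, and deserialize.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerialDemo {
    // Hypothetical class used only for this example.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node(String name) { this.name = name; }
    }

    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node shared = new Node("shared");
        Node[] pair = { shared, shared };       // two references, one object
        Node[] copy = (Node[]) deserialize(serialize(pair));
        System.out.println(copy[0] == copy[1]); // the sharing survives the round trip
    }
}
```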

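Serialization file format
A serialized file starts with the two-byte magic number AC ED, followed by the version number of the object serialization format, which is 00 05 in Java 1.8.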

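Class identifiers
When an object is serialized, the class it belongs to must be serialized as well. A class identifier is stored as follows: 72 <2-byte class name length> <class name> <8-byte fingerprint> <1-byte flag> <2-byte count of data field descriptors> <data field descriptors> 78 (end marker) <superclass type (70 if there is none)>.
Data field descriptors
The data field descriptors that appear in a class identifier have the following format: <1-byte type code> <2-byte field name length> <field name> <class name (if the field is an object)>. The type code is one of B (byte), C (char), D (double), F (float), I (int), J (long), L (object), S (short), Z (boolean), or [ (array).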

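Example

AC ED 00 05                       //file header
75                                //array staff
    72 00 0B [LEmployee;          //new class, string length, class name
    FC BF 36 11 C5 91 11 C7 02    //fingerprint and flag
    00 00                         //number of data fields
    78                            //end marker
    70                            //no superclass
    00 00 00 02                   //number of array entries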

File management

File management mainly involves the Path interface and the Files class.

Path
A Path represents a sequence of directory names, optionally followed by a file name. The following example uses a Path to access a file.

Path path = FileSystems.getDefault().getPath("logs", "access.log");
BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);

The Path interface provides operations for working with such a path.

FileSystem getFileSystem();
boolean isAbsolute();
Path getRoot();
Path getFileName();
Path getParent();//returns the parent directory, or null
int getNameCount();
Path getName(int index);
Path subpath(int beginIndex, int endIndex);
boolean startsWith(Path other);
boolean endsWith(Path other);
Path normalize();//removes redundant components such as . and ..
Path resolve(Path other);//joins paths: returns other if it is absolute, otherwise appends other to this path
Path resolveSibling(Path other);//produces a sibling path: calling it on /a/b/c with argument d yields /a/b/d
Path relativize(Path other);//returns other relative to this path: calling it on /a/b/c with argument /a/d yields ../../d
Path toAbsolutePath();
Path toRealPath(LinkOption... options) throws IOException;//returns the absolute, normalized path of an existing file, resolving symbolic links by default
File toFile();
int compareTo(Path other);//lexicographic comparison
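
The path-combining methods are easiest to understand by running them. The sketch below uses relative paths so that it behaves the same on every platform, and converts the separator to / for display; PathDemo, show, and examples are names invented for this illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class PathDemo {
    // Renders a path with '/' as the separator, regardless of the platform.
    static String show(Path p) {
        return p.toString().replace(p.getFileSystem().getSeparator(), "/");
    }

    // Applies resolve, resolveSibling, relativize, and normalize to sample paths.
    static String[] examples() {
        Path base = Paths.get("a", "b", "c");
        return new String[] {
            show(base.resolve("d")),                    // append to the path
            show(base.resolveSibling("d")),             // replace the last element
            show(base.relativize(Paths.get("a", "d"))), // how to get from base to a/d
            show(Paths.get("a/./b/../c").normalize()),  // drop . and resolve ..
        };
    }

    public static void main(String[] args) {
        for (String s : examples()) {
            System.out.println(s);
        }
    }
}
```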

Path works well together with Paths and FileSystems; the latter two build paths using the separator of the underlying file system.

//Paths
public static Path get(String first, String... more);
//FileSystems
public static FileSystem getDefault();//returns the default (local) file system
//FileSystem
public abstract Path getPath(String first, String... more);
//examples
Path p=Paths.get("/home","fred");
Path path = FileSystems.getDefault().getPath("logs", "access.log");

Files
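The Files class performs operations on files.

Simple reading and writing
Files offers simple ways to read and write files and to query information about them.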

public static byte[] readAllBytes(Path path) throws IOException;
public static List<String> readAllLines(Path path, Charset cs)
    throws IOException;
public static Path write(Path path, byte[] bytes, OpenOption... options)
    throws IOException;//OpenOption is the open mode; the StandardOpenOption enum defines READ, WRITE, APPEND, and others
//the methods above read or write a whole file at once and suit small to medium-sized files;
//the next two quickly obtain an input or output stream for a file, which also works for large files
public static InputStream newInputStream(Path path, OpenOption... options)
    throws IOException;
public static OutputStream newOutputStream(Path path, OpenOption... options)
    throws IOException;
///reading file information
//reads the file attributes of type A
public static <A extends BasicFileAttributes> A readAttributes(Path path, Class<A> type, LinkOption... options)
    throws IOException;
public static Path readSymbolicLink(Path link) throws IOException;
public static boolean isSameFile(Path path, Path path2) throws IOException;
public static boolean isHidden(Path path) throws IOException;
public static boolean isSymbolicLink(Path path);
public static boolean isDirectory(Path path, LinkOption... options);
public static boolean isRegularFile(Path path, LinkOption... options);
public static long size(Path path) throws IOException;
public static boolean exists(Path path, LinkOption... options);
public static boolean notExists(Path path, LinkOption... options);
public static boolean isReadable(Path path);
public static boolean isWritable(Path path);
public static boolean isExecutable(Path path) ;

File and directory operations

///creating
public static Path createFile(Path path, FileAttribute<?>... attrs)
    throws IOException;
//intermediate directories must already exist
public static Path createDirectory(Path dir, FileAttribute<?>... attrs)
    throws IOException;
//creates intermediate directories automatically
public static Path createDirectories(Path dir, FileAttribute<?>... attrs)
    throws IOException;
public static Path createTempFile(Path dir,
                                  String prefix,
                                  String suffix,
                                  FileAttribute<?>... attrs)
    throws IOException;
public static Path createTempDirectory(Path dir,
                                       String prefix,
                                       FileAttribute<?>... attrs)
    throws IOException;
///deleting
public static void delete(Path path) throws IOException;
public static boolean deleteIfExists(Path path) throws IOException;
///copying and moving
//CopyOption specifies what to do when the target file already exists
public static Path copy(Path source, Path target, CopyOption... options)
    throws IOException;
public static Path move(Path source, Path target, CopyOption... options)
    throws IOException;

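Traversal
Files provides list(pathToDirectory), which returns a Stream of the entries in a directory without descending into subdirectories; to descend into subdirectories, use walk(pathToRoot).
For finer-grained control over the traversal of a directory, use newDirectoryStream(dir).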
```
try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
    for (Path p : entries) {
        //do something
    }
}
```
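The difference between list and walk shows up as soon as the directory has a subdirectory. The following sketch builds a tiny tree in a temporary directory and lists its regular files both ways; WalkDemo and fileNames are illustrative names, and cleanup of the temporary files is left out for brevity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WalkDemo {
    // Builds the tree {a.txt, sub/b.txt} in a fresh temporary directory and returns
    // the sorted names of the regular files that the chosen traversal visits.
    static List<String> fileNames(boolean recursive) {
        try {
            Path root = Files.createTempDirectory("walk-demo");
            Path sub = Files.createDirectory(root.resolve("sub"));
            Files.write(root.resolve("a.txt"), "a".getBytes(StandardCharsets.UTF_8));
            Files.write(sub.resolve("b.txt"), "b".getBytes(StandardCharsets.UTF_8));
            // Both streams keep a directory handle open, hence try-with-resources.
            try (Stream<Path> entries = recursive ? Files.walk(root) : Files.list(root)) {
                return entries.filter(p -> Files.isRegularFile(p))
                        .map(p -> p.getFileName().toString())
                        .sorted()
                        .collect(Collectors.toList());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("list: " + fileNames(false));
        System.out.println("walk: " + fileNames(true));
    }
}
```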

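Memory-mapped files
A file can be mapped into memory, which can be much faster than ordinary file operations. Call FileChannel.open to obtain a channel for the file (a channel is an abstraction for a disk file that gives access to operating system features such as memory mapping, file locking, and fast data transfer between files), then call map on the channel to obtain a ByteBuffer to work with. When obtaining the buffer you choose one of three mapping modes:
⑴ FileChannel.MapMode.READ_ONLY: the resulting buffer is read-only.
⑵ FileChannel.MapMode.READ_WRITE: the buffer is writable, and any changes are written back to the file at some point, although other programs are not guaranteed to see them immediately.
⑶ FileChannel.MapMode.PRIVATE: the buffer is writable, but changes are private to the buffer and are not written back to the file.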
```
FileChannel channel = FileChannel.open(path);
MappedByteBuffer buffer=channel.map(FileChannel.MapMode.READ_ONLY,0,length);
```
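A complete, runnable version of the read-only case might look like the sketch below. It writes a temporary file first so that there is something to map; MappedFileDemo and readMapped are made-up names, and the temporary file is only marked for deletion on exit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileDemo {
    // Writes the content to a temporary file, maps the file read-only, and reads it back.
    static String readMapped(String content) {
        try {
            Path file = Files.createTempFile("mapped-demo", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                // Map the whole file: from position 0 for channel.size() bytes.
                MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                byte[] bytes = new byte[buffer.remaining()];
                buffer.get(bytes);
                return new String(bytes, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readMapped("mapped text"));
    }
}
```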
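Regular expressions
Regular expressions can come in handy when specifying files. Here are some of the rules.

Character classes: [Jj], [0-9], [^8], and so on. - denotes a range, and ^ denotes the complement, that is, every character except the ones listed.
To include - in a character class, make it the first or last item; to include ], make it the first item (or escape it as \]); to include ^, put it anywhere except at the beginning.
There are many predefined character classes, such as \d for digits; escapes such as \r (carriage return) and \n (newline) match control characters.
Most characters match themselves, such as ava in [Jj]ava.
The . symbol matches any character (by default, any character except a line terminator).
\ is the escape character; for example, \. matches a period.
^ and $ match the beginning and end of a line.
If X and Y are regular expressions, X|Y matches any string that matches X or Y.
X+ (one or more), X* (zero or more), X? (zero or one).
By default a quantifier matches as many repetitions as possible while still letting the overall match succeed. The suffix ? makes it reluctant (match the smallest number of repetitions), and the suffix + makes it possessive (match the largest number of repetitions, even if the overall match then fails). For example, cab matches [a-c]*ab but does not match [a-c]*+ab.
Use () to define subexpressions, as in ([a-b][0-9])|([c-e][0-7]).
In Java, regular expressions are used as follows.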

Pattern p=Pattern.compile(patternString);
Matcher m=p.matcher(input);
if(m.matches()) ....
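
To see how the quantifier variants differ in practice, the sketch below compares a greedy and a possessive quantifier on the same input. RegexDemo and its matches helper are hypothetical names; the patterns are chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexDemo {
    // Returns true if the whole input matches the regular expression.
    static boolean matches(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        return m.matches();
    }

    public static void main(String[] args) {
        // Greedy: [a-c]* first takes "cab", then backtracks so that "ab" can match.
        System.out.println(matches("[a-c]*ab", "cab"));
        // Possessive: [a-c]*+ takes "cab" and never gives anything back.
        System.out.println(matches("[a-c]*+ab", "cab"));
        // A character class followed by literal characters.
        System.out.println(matches("[Jj]ava", "java"));
    }
}
```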