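Because of the way java.io was designed and implemented in JDK 1.0.2, I/O performance has become a bottleneck for many Java applications. A key problem is buffering: most java.io classes are not really buffered. In fact, the only classes with buffers are BufferedInputStream and BufferedOutputStream, and they provide only a limited set of methods. For example, most applications that read and write files need to parse a file line by line, yet the only class that provides a readLine() method is DataInputStream, which has no internal buffer. Its readLine method reads the input stream one character at a time until it reaches '\n' or '\r\n', and every character read involves a file I/O operation. This becomes extremely slow for large files: without a buffer, a 5-megabyte file requires at least 5,000,000 single-character file I/O operations.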
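To make the cost of unbuffered reads concrete, the sketch below counts how many read calls actually reach the underlying stream, with and without a BufferedInputStream in between. CountingInputStream and ReadCallDemo are hypothetical names introduced only for this illustration, and an in-memory ByteArrayInputStream stands in for a real file, so the numbers show call counts rather than disk timings.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: counts the read calls that reach the wrapped stream. */
class CountingInputStream extends FilterInputStream {
    long calls = 0;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        calls++;
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return super.read(b, off, len);
    }
}

class ReadCallDemo {
    /** Consumes the stream one byte at a time until end of stream. */
    static long drain(InputStream in) throws IOException {
        long n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    /** Underlying read calls when every byte is requested directly. */
    static long unbufferedCalls(byte[] data) throws IOException {
        CountingInputStream counter =
                new CountingInputStream(new ByteArrayInputStream(data));
        drain(counter);
        return counter.calls;
    }

    /** Underlying read calls when a BufferedInputStream sits in between. */
    static long bufferedCalls(byte[] data, int bufSize) throws IOException {
        CountingInputStream counter =
                new CountingInputStream(new ByteArrayInputStream(data));
        drain(new BufferedInputStream(counter, bufSize));
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        System.out.println("unbuffered calls: " + unbufferedCalls(data));
        System.out.println("buffered calls:   " + bufferedCalls(data, 8192));
    }
}
```

The caller issues the same number of single-byte reads in both cases; only the number of calls that reach the underlying source differs, and that is the part that turns into file I/O when the source is a file.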
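JDK 1.1 improves I/O performance by adding a set of Reader and Writer classes. The readLine method in BufferedReader is at least 10 to 20 times faster than the one in DataInputStream when reading a file. Unfortunately, JDK 1.1 does not solve every performance problem. For example, RandomAccessFile is very useful for parsing a file without reading it all into memory, but it is still unbuffered in JDK 1.1, and no comparable Reader-style class is provided.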
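As a minimal sketch of the buffered approach, the following reads lines through BufferedReader.readLine. LineReaderDemo and readLines are hypothetical names used only here, the input is a StringReader so the example runs without a file, and the code uses modern Java collections rather than what JDK 1.1 offered.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LineReaderDemo {
    /** Reads all lines from the source through a BufferedReader. */
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        // readLine returns null at end of stream and strips the line terminator.
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // For a real file, a FileReader could be passed instead of a StringReader.
        for (String line : readLines(new StringReader("first\r\nsecond\nthird"))) {
            System.out.println(line);
        }
    }
}
```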
What is needed is a buffered RandomAccessFile class. A new class is derived from the RandomAccessFile class in order to reuse all the methods in it. The new class is named Braf (Bufferedrandomaccessfile).
public class Braf extends RandomAccessFile { }
The class uses a byte buffer instead of a char buffer. The variables buf_end, buf_pos, and real_pos record the effective positions in the buffer:
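    byte buffer[];       // bytes most recently read from the file
    int BUF_SIZE;        // buffer capacity, set by the constructor
    int buf_end = 0;     // number of valid bytes in the buffer
    int buf_pos = 0;     // index of the next unread byte in the buffer
    long real_pos = 0;   // position of the underlying file pointer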
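The constructor takes an additional parameter that specifies the buffer size:
    public Braf(String filename, String mode, int bufsize) throws IOException {
        super(filename, mode);
        invalidate();                  // start with an empty buffer
        BUF_SIZE = bufsize;
        buffer = new byte[BUF_SIZE];
    }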
The read method is written such that it always reads from the buffer first. It overrides the native read method in the original class, which is never engaged until the buffer has been exhausted. In that case, the fillBuffer method is called to refill the buffer. In fillBuffer, the original read is invoked. The private method invalidate is used to indicate that the buffer no longer contains valid contents. This is necessary when the seek method moves the file pointer out of the buffer.
    public final int read() throws IOException {
        if (buf_pos >= buf_end) {
            if (fillBuffer() < 0)
                return -1;
        }
        if (buf_end == 0) {
            return -1;
        } else {
            // Mask to 0..255 so that a 0xFF byte is not mistaken for end of file.
            return buffer[buf_pos++] & 0xFF;
        }
    }

    private int fillBuffer() throws IOException {
        int n = super.read(buffer, 0, BUF_SIZE);
        if (n >= 0) {
            real_pos += n;
            buf_end = n;
            buf_pos = 0;
        }
        return n;
    }

    private void invalidate() throws IOException {
        buf_end = 0;
        buf_pos = 0;
        real_pos = super.getFilePointer();
    }
The parameterized read method is also overridden. The code for the new read is listed below. If enough bytes remain in the buffer, it simply calls System.arraycopy to copy a portion of the buffer directly into the user-provided area. This yields the most significant performance gain because the read method is heavily used in the getNextLine method, which is our replacement for readLine.
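    public int read(byte b[], int off, int len) throws IOException {
        int leftover = buf_end - buf_pos;
        if (len <= leftover) {
            // Fast path: the request is served entirely from the buffer.
            System.arraycopy(buffer, buf_pos, b, off, len);
            buf_pos += len;
            return len;
        }
        // Slow path: read byte by byte, refilling the buffer as needed.
        for (int i = 0; i < len; i++) {
            int c = this.read();
            if (c != -1)
                b[off + i] = (byte) c;
            else
                return (i == 0) ? -1 : i;   // -1 at end of file, as RandomAccessFile.read does
        }
        return len;
    }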
The getFilePointer and seek methods need to be overridden as well in order to take advantage of the buffer. Most of the time, both methods simply operate inside the buffer.
    public long getFilePointer() throws IOException {
        long l = real_pos;
        return (l - buf_end + buf_pos);
    }

    public void seek(long pos) throws IOException {
        long n = real_pos - pos;   // kept as long so large offsets do not overflow
        if (n >= 0 && n <= buf_end) {
            buf_pos = (int) (buf_end - n);
        } else {
            super.seek(pos);
            invalidate();
        }
    }
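The position arithmetic can be checked in isolation. The sketch below models only the three counters, with no file involved; BufferWindow and its method names are hypothetical and exist only for this illustration. It assumes, as in the listing above, that real_pos is the file offset just past the last byte loaded into the buffer.

```java
/** Hypothetical model of the three counters; no file is involved. */
class BufferWindow {
    long realPos;   // file offset just past the last byte loaded into the buffer
    int bufEnd;     // number of valid bytes in the buffer
    int bufPos;     // index of the next unread byte in the buffer

    BufferWindow(long realPos, int bufEnd, int bufPos) {
        this.realPos = realPos;
        this.bufEnd = bufEnd;
        this.bufPos = bufPos;
    }

    /** Logical file position seen by the caller. */
    long logicalPosition() {
        return realPos - bufEnd + bufPos;
    }

    /** Moves inside the buffer if pos is covered by it; returns false otherwise. */
    boolean seekWithinBuffer(long pos) {
        long n = realPos - pos;
        if (n >= 0 && n <= bufEnd) {
            bufPos = (int) (bufEnd - n);
            return true;
        }
        return false;   // a real implementation would seek the file and invalidate
    }

    public static void main(String[] args) {
        // The buffer holds file bytes 100..107 and three of them have been consumed.
        BufferWindow w = new BufferWindow(108, 8, 3);
        System.out.println("position: " + w.logicalPosition());
        System.out.println("seek(105) in buffer: " + w.seekWithinBuffer(105));
        System.out.println("seek(500) in buffer: " + w.seekWithinBuffer(500));
    }
}
```

With the buffer covering offsets 100 to 107, a seek to 105 only moves bufPos, while a seek to 500 falls outside the window and would require a real seek on the file.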
Most important, a new method, getNextLine, is added to replace the readLine method. We cannot simply override the readLine method because it is defined as final in the original class. The getNextLine method first decides if the buffer still contains unread contents. If it doesn't, the buffer needs to be filled up. If the new line delimiter can be found in the buffer, then a new line is read from the buffer and converted into String. Otherwise, it will simply call the read method to read byte by byte. Although the code of the latter portion is similar to the original readLine, performance is better here because the read method is buffered in the new class.
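    /**
     * Returns the next line as a String, or null at end of file.
     */
    public final String getNextLine() throws IOException {
        String str = null;
        if (buf_end - buf_pos <= 0) {
            if (fillBuffer() < 0) {
                return null;   // end of file: nothing left to read
            }
        }
        // Look for the line delimiter in the unread part of the buffer.
        int lineend = -1;
        for (int i = buf_pos; i < buf_end; i++) {
            if (buffer[i] == '\n') {
                lineend = i;
                break;
            }
        }
        if (lineend < 0) {
            // No delimiter in the buffer: fall back to byte-by-byte reads.
            StringBuffer input = new StringBuffer(256);
            int c;
            while (((c = read()) != -1) && (c != '\n')) {
                input.append((char) c);
            }
            if ((c == -1) && (input.length() == 0)) {
                return null;
            }
            // Drop a trailing '\r' so both paths treat "\r\n" the same way.
            int len = input.length();
            if (len > 0 && input.charAt(len - 1) == '\r') {
                input.setLength(len - 1);
            }
            return input.toString();
        }
        // Drop a '\r' that precedes '\n', but only if it belongs to this line.
        if (lineend > buf_pos && buffer[lineend - 1] == '\r')
            str = new String(buffer, 0, buf_pos, lineend - buf_pos - 1);
        else
            str = new String(buffer, 0, buf_pos, lineend - buf_pos);
        buf_pos = lineend + 1;
        return str;
    }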
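The fast path of getNextLine, scanning the buffer for '\n' and dropping a preceding '\r', can be sketched as two standalone helpers. LineScanDemo, findLineEnd, and extractLine are hypothetical names, and decoding with ISO-8859-1 is an assumption chosen to mirror the one-byte-per-character conversion of the String constructor used in the listing.

```java
import java.nio.charset.StandardCharsets;

class LineScanDemo {
    /** Returns the index of the first '\n' in buf[from, to), or -1 if there is none. */
    static int findLineEnd(byte[] buf, int from, int to) {
        for (int i = from; i < to; i++) {
            if (buf[i] == '\n') {
                return i;
            }
        }
        return -1;
    }

    /** Decodes buf[from, lineEnd) as text, dropping a trailing '\r' if present. */
    static String extractLine(byte[] buf, int from, int lineEnd) {
        int len = lineEnd - from;
        if (len > 0 && buf[lineEnd - 1] == '\r') {
            len--;
        }
        // Assumption: one byte per character, hence ISO-8859-1.
        return new String(buf, from, len, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] buf = "alpha\r\nbeta\n".getBytes(StandardCharsets.ISO_8859_1);
        int pos = 0;
        int end;
        // Walk the buffer line by line, as getNextLine does across calls.
        while ((end = findLineEnd(buf, pos, buf.length)) >= 0) {
            System.out.println("[" + extractLine(buf, pos, end) + "]");
            pos = end + 1;
        }
    }
}
```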
With the new Braf class, we have seen at least a 25-fold performance improvement over RandomAccessFile when a large file needs to be parsed line by line. The method described here also applies to other places where intensive file I/O operations are involved.
Another factor responsible for slowing down Java's performance, besides the I/O problem discussed above, is the synchronized statement. Generally, the overhead of a synchronized method is about 6 times that of a conventional method. If you are writing an application without multithreading, or a part of an application in which you know for sure that only one thread is involved, you don't need anything to be synchronized. Currently, there is no mechanism in Java to turn off synchronization. A simple trick is to get the source code of a class, remove the synchronized statements, and generate a new class. For example, in BufferedInputStream, both read methods are synchronized, and all other I/O methods depend on them. You can simply copy the source code from BufferedInputStream.java provided by JavaSoft's JDK 1.1 into a new class named, for example, NewBIS, remove the synchronized statements from NewBIS.java, and recompile NewBIS.
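The idea can also be illustrated without copying JDK source. The sketch below is a simplified, hypothetical stand-in for NewBIS, not the JDK's BufferedInputStream: UnsyncBufferedInput is an invented name, only the two read methods are buffered, and skip, available, and mark are not adapted. It is not thread-safe and assumes a single reading thread.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical unsynchronized buffered stream; for single-threaded use only. */
class UnsyncBufferedInput extends FilterInputStream {
    private final byte[] buf;
    private int count = 0;   // number of valid bytes in buf
    private int pos = 0;     // index of the next unread byte

    UnsyncBufferedInput(InputStream in, int size) {
        super(in);
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        buf = new byte[size];
    }

    /** Refills the buffer; returns false when the underlying stream is exhausted. */
    private boolean fill() throws IOException {
        int n = in.read(buf, 0, buf.length);
        if (n <= 0) {
            return false;
        }
        count = n;
        pos = 0;
        return true;
    }

    @Override
    public int read() throws IOException {   // note: no synchronized keyword
        if (pos >= count && !fill()) {
            return -1;
        }
        return buf[pos++] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos >= count && !fill()) {
            return -1;
        }
        int n = Math.min(len, count - pos);
        System.arraycopy(buf, pos, b, off, n);
        pos += n;
        return n;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new UnsyncBufferedInput(
                new ByteArrayInputStream(new byte[] {10, 20, 30}), 2);
        int c;
        while ((c = in.read()) != -1) {
            System.out.println(c);
        }
    }
}
```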