Java Tip 26: How to improve Java's I/O performance


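Java's I/O performance has become a bottleneck for many Java applications because of the design and implementation of the java.io package in JDK 1.0.2. A key problem is buffering: most java.io classes are not really buffered. In fact, the only classes with buffers are BufferedInputStream and BufferedOutputStream, and they provide only a limited set of methods. For example, most file-related applications need to parse a file line by line, but the only class that provides a readLine() method is DataInputStream, and it has no internal buffer. Its readLine method reads from the input stream one character at a time until it reaches '\n' or '\r\n', and each character read involves a file I/O operation. This is extremely inefficient for large files: without buffering, reading a 5-megabyte file takes at least 5 million file I/O operations.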


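JDK 1.1 improved I/O performance by adding a collection of Reader and Writer classes. The readLine method of BufferedReader is at least 10 to 20 times faster than the one in DataInputStream when reading a large file. Unfortunately, JDK 1.1 does not solve every performance problem. For example, RandomAccessFile is very useful for parsing a large file without reading it all into memory, but it is still unbuffered in JDK 1.1, and no equivalent Reader class is provided.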
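
As a minimal sketch of the buffered, line-oriented reading that BufferedReader makes possible, the example below collects every line from a Reader. The class name LineReaderSketch and the helper readAllLines are hypothetical names chosen for illustration, the syntax is that of a current JDK rather than JDK 1.1, and a StringReader stands in for a FileReader so that the example runs without a file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original tip.
class LineReaderSketch {

    // Reads all lines from source through a BufferedReader and closes it.
    static List<String> readAllLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            // readLine returns null at end of input and strips the line terminator.
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = readAllLines(new StringReader("first\r\nsecond\nthird"));
        System.out.println(lines); // prints [first, second, third]
    }
}
```

For a real file, the StringReader would be replaced by a FileReader; the loop itself stays the same.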


How to tackle the I/O problem

To tackle the problem of inefficient file I/O, we need a buffered RandomAccessFile class. The new class is derived from the RandomAccessFile class so that it can reuse all of its methods. It is named Braf (buffered RandomAccessFile).
  public class Braf extends RandomAccessFile {
  }

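For efficiency reasons, we define a byte buffer instead of a char buffer. The variables buf_end, buf_pos, and real_pos record the effective positions on the buffer:
  byte buffer[];
  int BUF_SIZE;        // buffer capacity, set in the constructor
  int buf_end = 0;     // number of valid bytes in the buffer
  int buf_pos = 0;     // index of the next byte to read from the buffer
  long real_pos = 0;   // position of the underlying file pointer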


A new constructor is added with an additional parameter to specify the size of the buffer:
  public Braf(String filename, String mode, int bufsize)
      throws IOException {
    super(filename, mode);
    invalidate();
    BUF_SIZE = bufsize;
    buffer = new byte[BUF_SIZE];
  }

The new read method always reads from the buffer first. It overrides the native read method of the original class, which is not invoked until the buffer has been exhausted. In that case, the fillBuffer method is called to refill the buffer; fillBuffer in turn invokes the original read. The private method invalidate marks the buffer as no longer containing valid contents. This is necessary when the seek method moves the file pointer outside the buffer.

  public final int read() throws IOException {
    if (buf_pos >= buf_end) {
      // Buffer exhausted: refill it from the file.
      if (fillBuffer() < 0)
        return -1;
    }
    if (buf_end == 0) {
      return -1;
    } else {
      // Mask to 0..255 so that a 0xFF byte is not mistaken for end of file.
      return buffer[buf_pos++] & 0xff;
    }
  }

  private int fillBuffer() throws IOException {
    // Call the unbuffered read of RandomAccessFile.
    int n = super.read(buffer, 0, BUF_SIZE);
    if (n >= 0) {
      real_pos += n;
      buf_end = n;
      buf_pos = 0;
    }
    return n;
  }

  private void invalidate() throws IOException {
    buf_end = 0;
    buf_pos = 0;
    real_pos = super.getFilePointer();
  }
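
Because read() reports end of file as -1, a buffered implementation that hands back values from a byte[] has to return them in the range 0 to 255. Java bytes are signed, so a plain widening conversion turns 0xFF into -1. The sketch below shows the difference; the class name UnsignedByteSketch and the helper asUnsigned are hypothetical names used only for illustration.

```java
// Hypothetical illustration, not part of the original tip.
class UnsignedByteSketch {

    // Returns the byte the way read() is specified to: an int in 0..255.
    static int asUnsigned(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte[] buffer = { 0x41, (byte) 0xE9, (byte) 0xFF };
        for (byte b : buffer) {
            int widened = b;             // implicit widening keeps the sign
            int masked = asUnsigned(b);  // masking yields 0..255
            System.out.println(widened + " -> " + masked);
        }
        // prints:
        // 65 -> 65
        // -23 -> 233
        // -1 -> 255
    }
}
```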

The parameterized read method is overridden as well; its code is listed below. If the buffer holds enough data, it simply calls System.arraycopy to copy a portion of the buffer directly into the user-provided array. This yields the most significant performance gain because the read method is heavily used in the getNextLine method, which is our replacement for readLine.

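  public int read(byte b[], int off, int len) throws IOException {
    int leftover = buf_end - buf_pos;
    if (len <= leftover) {
      // Fast path: the request can be served entirely from the buffer.
      System.arraycopy(buffer, buf_pos, b, off, len);
      buf_pos += len;
      return len;
    }
    // Slow path: use the buffered single-byte read, which refills as needed.
    for (int i = 0; i < len; i++) {
      int c = this.read();
      if (c != -1)
        b[off + i] = (byte) c;
      else {
        // End of file: return -1 if nothing was read, otherwise the count so far.
        return (i == 0) ? -1 : i;
      }
    }
    return len;
  }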
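
When a request is larger than what is left in the buffer, the method above falls back to a loop of single-byte reads. One possible alternative, shown here only as a sketch and not as the author's method, is to keep using System.arraycopy and refill the buffer in bulk between copies. The class ChunkedReadSketch, its fields, and the helper drain are hypothetical, and an in-memory InputStream stands in for the file so that the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a buffered reader; an InputStream plays the file.
class ChunkedReadSketch {
    private final InputStream source;
    private final byte[] buffer;
    private int bufPos = 0; // next unread index in buffer
    private int bufEnd = 0; // number of valid bytes in buffer

    ChunkedReadSketch(InputStream source, int bufSize) {
        this.source = source;
        this.buffer = new byte[bufSize];
    }

    // Copies up to len bytes into b, refilling the buffer in bulk when it runs dry.
    int read(byte[] b, int off, int len) throws IOException {
        int copied = 0;
        while (copied < len) {
            if (bufPos >= bufEnd) {
                int n = source.read(buffer, 0, buffer.length);
                if (n <= 0) {
                    break; // end of input
                }
                bufPos = 0;
                bufEnd = n;
            }
            int chunk = Math.min(len - copied, bufEnd - bufPos);
            System.arraycopy(buffer, bufPos, b, off + copied, chunk);
            bufPos += chunk;
            copied += chunk;
        }
        return (copied == 0 && len > 0) ? -1 : copied;
    }

    // Drains data through the sketch using the given buffer and request sizes.
    static byte[] drain(byte[] data, int bufSize, int requestSize) {
        ChunkedReadSketch in = new ChunkedReadSketch(new ByteArrayInputStream(data), bufSize);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] request = new byte[requestSize];
        try {
            int n;
            while ((n = in.read(request, 0, request.length)) > 0) {
                out.write(request, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "0123456789abcdef".getBytes();
        // A 4-byte buffer with 6-byte requests makes each request span a refill.
        System.out.println(new String(drain(data, 4, 6))); // prints 0123456789abcdef
    }
}
```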

The original methods getFilePointer and seek need to be overridden as well in order to take advantage of the buffer. Most of the time, both methods simply operate inside the buffer.

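  public long getFilePointer() throws IOException {
    // real_pos is the file position just past the buffered data.
    long l = real_pos;
    return (l - buf_end + buf_pos);
  }

  public void seek(long pos) throws IOException {
    // Keep the distance as a long so that large files do not overflow an int.
    long n = real_pos - pos;
    if (n >= 0 && n <= buf_end) {
      // The target lies inside the buffer: just move the buffer position.
      buf_pos = (int) (buf_end - n);
    } else {
      super.seek(pos);
      invalidate();
    }
  }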
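
The bookkeeping behind these two methods can be checked with a little arithmetic. The sketch below restates it as two pure functions; the class BufferPositionSketch and its method names are hypothetical, and the numbers in main are made-up values chosen only to exercise the formulas. The distance is kept as a long here so that positions beyond 2 GB do not wrap around.

```java
// Hypothetical illustration of the position arithmetic, not part of the original tip.
class BufferPositionSketch {

    // Logical position seen by the caller: the file position just past the
    // buffered data, minus the bytes still unread in the buffer.
    static long logicalPosition(long realPos, int bufEnd, int bufPos) {
        return realPos - bufEnd + bufPos;
    }

    // Returns the new buffer index if target lies inside the buffered window,
    // or -1 if a real seek on the file would be needed.
    static int seekWithinBuffer(long realPos, int bufEnd, long target) {
        long distanceFromEnd = realPos - target;
        if (distanceFromEnd >= 0 && distanceFromEnd <= bufEnd) {
            return (int) (bufEnd - distanceFromEnd);
        }
        return -1;
    }

    public static void main(String[] args) {
        // The buffer holds file bytes 4096..8191; 100 of them have been consumed.
        System.out.println(logicalPosition(8192, 4096, 100)); // prints 4196
        System.out.println(seekWithinBuffer(8192, 4096, 5000)); // prints 904
        System.out.println(seekWithinBuffer(8192, 4096, 100));  // prints -1
    }
}
```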


Most important, a new method, getNextLine, is added to replace the readLine method. We cannot simply override the readLine method because it is defined as final in the original class. The getNextLine method first checks whether the buffer still contains unread contents. If it doesn't, the buffer is refilled. If the newline delimiter is found in the buffer, the line is read from the buffer and converted into a String. Otherwise, the method simply calls the read method to read byte by byte. Although the code of the latter portion is similar to the original readLine, performance is better here because the read method is buffered in the new class.

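  /**
   * Returns the next line as a String, or null at end of file.
   */
  public final String getNextLine() throws IOException {
    String str = null;
    if (buf_end - buf_pos <= 0) {
      if (fillBuffer() < 0) {
        // End of file: return null, as readLine does.
        return null;
      }
    }
    // Look for a line terminator in the unread part of the buffer.
    int lineend = -1;
    for (int i = buf_pos; i < buf_end; i++) {
      if (buffer[i] == '\n') {
        lineend = i;
        break;
      }
    }
    if (lineend < 0) {
      // The line runs past the buffer: fall back to the buffered single-byte read.
      StringBuffer input = new StringBuffer(256);
      int c;
      while (((c = read()) != -1) && (c != '\n')) {
        input.append((char) c);
      }
      if ((c == -1) && (input.length() == 0)) {
        return null;
      }
      // Drop the '\r' of a "\r\n" terminator, as the fast path does.
      int len = input.length();
      if (c == '\n' && len > 0 && input.charAt(len - 1) == '\r') {
        input.setLength(len - 1);
      }
      return input.toString();
    }
    // Fast path: the whole line is in the buffer (hibyte 0 maps each byte to one char).
    if (lineend > buf_pos && buffer[lineend - 1] == '\r')
      str = new String(buffer, 0, buf_pos, lineend - buf_pos - 1);
    else
      str = new String(buffer, 0, buf_pos, lineend - buf_pos);
    buf_pos = lineend + 1;
    return str;
  }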
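
The fast path of getNextLine builds its result with the String(byte[], int hibyte, int offset, int count) constructor, which has been deprecated since JDK 1.1. As a sketch of the same scan-and-decode step on a current JDK, the example below searches a byte range for '\n' and decodes the line with an explicit charset. ISO-8859-1 is an assumption made here because, like a hibyte of 0, it maps each byte to exactly one char; the class LineScanSketch and its helpers are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the in-buffer line scan, not part of the original tip.
class LineScanSketch {

    // Index of the first '\n' in buffer[from, to), or -1 if there is none.
    static int findLineEnd(byte[] buffer, int from, int to) {
        for (int i = from; i < to; i++) {
            if (buffer[i] == '\n') {
                return i;
            }
        }
        return -1;
    }

    // Decodes buffer[from, lineEnd) as one line, dropping a trailing '\r'.
    static String decodeLine(byte[] buffer, int from, int lineEnd) {
        int len = lineEnd - from;
        if (len > 0 && buffer[lineEnd - 1] == '\r') {
            len--;
        }
        return new String(buffer, from, len, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] buffer = "alpha\r\nbeta\ngamma".getBytes(StandardCharsets.ISO_8859_1);
        int pos = 0;
        int end;
        // Only complete lines are printed; "gamma" has no terminator in the buffer.
        while ((end = findLineEnd(buffer, pos, buffer.length)) >= 0) {
            System.out.println(decodeLine(buffer, pos, end));
            pos = end + 1;
        }
        // prints:
        // alpha
        // beta
    }
}
```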


With the new Braf class, we have seen a performance improvement of at least 25 times over RandomAccessFile when a large file needs to be parsed line by line. The method described here also applies to other situations that involve intensive file I/O.

Synchronization turn-off: An extra tip

Another factor responsible for slowing down Java's performance, besides the I/O problem discussed above, is the synchronized statement. Generally, the overhead of a synchronized method is about 6 times that of a conventional method. If you are writing an application without multithreading, or a part of an application in which you know for sure that only one thread is involved, you don't need anything to be synchronized. Currently, there is no mechanism in Java to turn off synchronization. A simple trick is to get the source code of a class, remove the synchronized statements, and generate a new class. For example, in BufferedInputStream, both read methods are synchronized, and all other I/O methods depend on them. You can simply rename the class to NewBIS, for example: copy the source code from BufferedInputStream.java provided by JavaSoft's JDK 1.1, remove the synchronized statements from NewBIS.java, and recompile NewBIS.
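
To give an idea of what such a class looks like, here is a minimal hand-written sketch of a buffered input stream without any synchronized methods. It is not a copy of the JDK's BufferedInputStream source and leaves out mark/reset support; the class name NewBISSketch and the helper sumBytes are hypothetical, and the buffer size is assumed to be greater than zero. It is only safe when a single thread uses the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hand-written stand-in for the NewBIS idea; not a copy of BufferedInputStream.
class NewBISSketch extends InputStream {
    private final InputStream in;
    private final byte[] buf;
    private int pos = 0;   // next unread index in buf
    private int count = 0; // number of valid bytes in buf

    NewBISSketch(InputStream in, int size) {
        this.in = in;
        this.buf = new byte[size];
    }

    // No synchronized keyword: correct only when one thread uses the stream.
    @Override
    public int read() throws IOException {
        if (pos >= count && !fill()) {
            return -1;
        }
        return buf[pos++] & 0xff;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos >= count && !fill()) {
            return -1;
        }
        int n = Math.min(len, count - pos);
        System.arraycopy(buf, pos, b, off, n);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    // Refills the buffer; returns false at end of input.
    private boolean fill() throws IOException {
        int n = in.read(buf, 0, buf.length);
        if (n <= 0) {
            return false;
        }
        pos = 0;
        count = n;
        return true;
    }

    // Sums all bytes of data, read one at a time through the sketch.
    static long sumBytes(byte[] data, int bufSize) {
        long sum = 0;
        try (NewBISSketch s = new NewBISSketch(new ByteArrayInputStream(data), bufSize)) {
            int c;
            while ((c = s.read()) != -1) {
                sum += c;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = { 1, 2, 3, (byte) 0xFF };
        System.out.println(sumBytes(data, 2)); // prints 261
    }
}
```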



