Reading large files in Java

 

Reading a large file with Java NIO:

1. Obtain the file's FileChannel;

2. Use the channel to memory-map the file into a ByteBuffer (a MappedByteBuffer).
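The two steps above can be sketched as follows. This is only a minimal illustration, not the listing from this post: the class `MappedReadSketch` and the method `sumBytes` are hypothetical names, and the sketch assumes the file is no larger than `Integer.MAX_VALUE` bytes (about 2 GB), since a single `FileChannel.map` call cannot map more than that.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class MappedReadSketch {

    // Returns the sum of all bytes in the file, treating each byte as unsigned.
    static long sumBytes(Path path) throws IOException {
        // Step 1: obtain the FileChannel (opened read-only).
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // Step 2: map the whole file; assumes size <= Integer.MAX_VALUE.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (buffer.hasRemaining()) {
                sum += buffer.get() & 0xFF;
            }
            return sum;
        }
    }
}
```

Summing the bytes is just a stand-in for whatever per-byte processing the file needs.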

 

 

Compared with an ordinary ByteBuffer, memory mapping can greatly speed up operations on large files.
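For comparison, the "ordinary ByteBuffer" approach reads the file through explicit `channel.read` calls, each of which copies data into a buffer owned by the program. The sketch below is a hedged illustration of that approach, not a benchmark; `PlainBufferReadSketch`, `countBytes` and the `bufferSize` parameter are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class PlainBufferReadSketch {

    // Reads the file chunk by chunk into a heap buffer and returns the number of bytes read.
    static long countBytes(Path path, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
            long total = 0;
            int n;
            // read() copies up to bufferSize bytes into the buffer and returns -1 at end of file
            while ((n = channel.read(buffer)) != -1) {
                total += n;
                buffer.clear(); // reuse the same buffer for the next chunk
            }
            return total;
        }
    }
}
```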

 

 

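Memory mapping relies on the operating system to read the file:

when the program touches a mapped region that is not yet in memory, execution passes from Java (user mode) into the operating system kernel, the operating system reads that part of the file, and control returns to Java with the data available.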
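One visible consequence is that a mapped buffer is a direct buffer: its content lives in memory managed by the operating system rather than in a `byte[]` on the Java heap. A minimal sketch to check this, using the hypothetical names `MappedBufferKindSketch` and `isDirectMapping`:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class MappedBufferKindSketch {

    // Maps the file read-only and reports whether the resulting buffer is direct.
    static boolean isDirectMapping(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Mapping does not read the file eagerly; the operating system
            // brings pages in when they are first accessed.
            return buffer.isDirect();
        }
    }
}
```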

 

package mengka.bigFile_02;

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import com.mengka.common.JvmUtil;

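/**
 * Reading a large file with Java NIO memory mapping:
 * <ul>
 * <li>obtain the file's FileChannel;</li>
 * <li>use the channel to memory-map the file into a ByteBuffer.</li>
 * </ul>
 * <br>
 * Compared with an ordinary ByteBuffer, memory mapping can greatly speed up operations on large files. <br>
 * <br>
 * Memory mapping does not load the file directly into the JVM's memory space. <br>
 * <br>
 * Memory mapping relies on the operating system to read the file: <br>
 * execution passes from Java (user mode) into the kernel, the operating system reads the file, and the data is returned to Java. <br>
 * <br>
 * 
 * @author mengka.hyy
 * 
 */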
public class mappedByteBuffer_01 {

	private static final Log log = LogFactory.getLog(mappedByteBuffer_01.class);

	private static int capacity = 1024*1024;// size of each chunk read, 1 MB
	
	private static MappedByteBuffer mappedByteBuffer = null;
	
	private static byte[] bytes = new byte[capacity];
	
	public static CountDownLatch latch = null;
	
	public static final ExecutorService executorService = Executors.newFixedThreadPool(700);// each thread reads 1 MB at a time, so up to 700 * 1 MB = 700 MB in use
	
	public static void main(String[] args)throws Exception {

		mappedByteBuffer_01 demo = new mappedByteBuffer_01();
		
		String path = "/Users/hyy044101331/work_hyy/mengka/src/main/java/mengka/bigFile_02/data.log";
		long start = System.currentTimeMillis();
		
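		// read the file through a memory mapping, which is fast
		demo.nio_mappedByteBuffer(path);
		
		// wait for every read task to finish before stopping the clock
		latch.await();
		long end = System.currentTimeMillis();
		System.out.println("mappedByteBuffer_01 time = "+(end - start)+"ms");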
	}

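	/**
	 * Reads a very large file through a memory mapping.
	 * 
	 * @throws IOException
	 */
	public void nio_mappedByteBuffer(String path) throws IOException {
		// "r" is enough for a READ_ONLY mapping; try-with-resources closes both
		// handles even if opening or mapping the file fails
		try (RandomAccessFile randomFile = new RandomAccessFile(new File(path), "r");
				FileChannel fileChannel = randomFile.getChannel()) {
			long size = fileChannel.size();

			// a single mapping is limited to Integer.MAX_VALUE bytes (about 2 GB);
			// the mapping stays valid after the channel is closed
			mappedByteBuffer = fileChannel.map(
					MapMode.READ_ONLY, 0, size);
			long count = size / capacity;
			int remain = (int) (size % capacity);
			System.out.println("-------------, size = "+JvmUtil.getMemorySize(size)+" , count = "+count+" , remain = "+remain);
			
			// one latch count per full 1 MB chunk (one task per chunk)
			latch = new CountDownLatch((int)count);
			
			for (int i = 0; i < count; i++) {
				executorService.execute(new ReadTask(i));
			}

			if (remain > 0) {
				// read the tail through a private view positioned after the last full
				// chunk, so the position of the shared buffer is never touched here
				byte[] tail = new byte[remain];
				ByteBuffer tailView = mappedByteBuffer.duplicate();
				tailView.position((int) (size - remain));
				tailView.get(tail);
				//System.out.println("remain = " + new String(tail));
			}
		} finally {
			// already submitted tasks still run to completion after shutdown()
			executorService.shutdown();
		}
	}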
	
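	public class ReadTask implements Runnable{
		
		private int index;
		
		public ReadTask(int index){
			this.index = index;
		}

		@Override
		public void run() {
			try {
				// work on a private view and a private array: the position of the shared
				// buffer and a shared byte[] are not safe to use from several threads
				ByteBuffer view = mappedByteBuffer.duplicate();
				view.position(index * capacity);
				byte[] chunk = new byte[capacity];
				view.get(chunk);
//				System.out.println("data["+index+"] = " + new String(chunk));
				System.out.println("data["+index+"]");
			} finally {
				// always count down so that latch.await() in main() cannot block forever
				latch.countDown();
			}
		}
		
	}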

}
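The listing above maps the whole file in one call, which works only while the file is at most `Integer.MAX_VALUE` bytes (about 2 GB); the 1.8 GB test file fits. For larger files, one option is to map the file window by window. The following is a minimal sketch under that assumption, not part of the original program: `WindowedMapSketch`, `countNewlines` and the `windowSize` parameter are hypothetical names, and counting newline bytes stands in for counting log records.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class WindowedMapSketch {

    // Counts '\n' bytes by mapping the file one window at a time.
    static long countNewlines(Path path, int windowSize) throws IOException {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        long lines = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long offset = 0; offset < size; offset += windowSize) {
                // the last window may be shorter than windowSize
                long length = Math.min(windowSize, size - offset);
                MappedByteBuffer window =
                        channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
                while (window.hasRemaining()) {
                    if (window.get() == '\n') {
                        lines++;
                    }
                }
            }
        }
        return lines;
    }
}
```

Because each window is scanned byte by byte, a record that straddles two windows is still counted once, at the window that contains its newline.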

  

 

>> Parameters:

Available memory: 1 GB

Chunk size: 1 MB

Thread pool size: 700

 

At any moment at most 700 * 1 MB = 700 MB < 1 GB is held in read buffers, so this does not cause an out-of-memory error.
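The arithmetic behind this check can be written down as a small helper. This is a sketch with hypothetical names (`BufferBudgetSketch`, `worstCaseBufferBytes`, `fits`); it counts only the per-thread read buffers and ignores everything else the JVM allocates.

```java
// Hypothetical helper, for illustration only.
final class BufferBudgetSketch {

    // Worst case: every pool thread holds one chunk-sized buffer at the same time.
    static long worstCaseBufferBytes(int threads, long chunkBytes) {
        return Math.multiplyExact((long) threads, chunkBytes);
    }

    // True if the worst-case buffer usage stays strictly below the available memory.
    static boolean fits(int threads, long chunkBytes, long availableBytes) {
        return worstCaseBufferBytes(threads, chunkBytes) < availableBytes;
    }
}
```

With the parameters above, `fits(700, 1L << 20, 1L << 30)` compares 734,003,200 bytes against 1,073,741,824 bytes.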

 

The log file contains 66,132,248 records and is 1.8 GB in size.

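Time taken:

 time = 153ms

This figure was recorded with the end timestamp taken before all read tasks had finished, so it mainly reflects mapping the file and submitting the tasks rather than reading all 1.8 GB.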
