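If you want to use off-heap memory, you can use DirectByteBuffer.
Main use case: products like Terracotta's BigMemory want data to be accessible inside the same process as the JVM without occupying heap memory (large, long-lived data on the heap triggers many pointless Full GCs and hurts performance). DirectByteBuffer serves exactly that purpose.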
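To make "does not occupy heap memory" a bit more concrete, here is a minimal sketch that reads the JVM's buffer pool statistics after allocating a direct buffer. The class name DirectPoolUsage is made up for illustration, and the pool name "direct" is what HotSpot/OpenJDK reports; other JVMs may name it differently, in which case the helper returns -1.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the original post.
final class DirectPoolUsage {

    /** Bytes currently held by the "direct" buffer pool, or -1 if no such pool is reported. */
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            // Assumption: the pool is named "direct", as on HotSpot/OpenJDK.
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        ByteBuffer buf = ByteBuffer.allocateDirect(16 * 1024 * 1024); // 16 MB off heap
        long after = directMemoryUsed();
        System.out.println("isDirect = " + buf.isDirect());
        System.out.println("direct pool: " + before + " -> " + after + " bytes");
    }
}
```

The heap size set with -Xmx does not cover this pool; on HotSpot it is bounded by -XX:MaxDirectMemorySize instead.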
As for how it works, it ultimately comes down to JNI.
Here is the source:
http://www.docjar.org/html/api/java/nio/DirectByteBuffer.java.html
DirectByteBuffer(int capacity) {
    this(PlatformAddressFactory.alloc(capacity, (byte) 0), capacity, 0);
    address.autoFree();
}
Following PlatformAddressFactory down layer by layer eventually leads to the OSMemory class:
http://www.docjar.org/html/api/org/apache/harmony/luni/platform/OSMemory.java.html
public native long malloc(long length) throws OutOfMemoryError;
This is what allocates the memory.
Once you have the buffer, you can call put to store bytes and get to read them back. Those calls map to:
public native byte getByte(long address);
public native void setByte(long address, byte value);
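From the caller's side, those native accessors sit behind the ordinary ByteBuffer API. A minimal sketch of the put/get round trip follows; the class and method names are invented for illustration. It uses a relative put, which advances the buffer's position, and an absolute get(index), which does not.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of the original post.
final class DirectBufferBasics {

    /** Writes the payload byte by byte into a fresh direct buffer, then reads it back. */
    static byte[] writeThenRead(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocateDirect(payload.length);
        for (byte b : payload) {
            buf.put(b);               // relative put: position advances by one
        }
        byte[] out = new byte[payload.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.get(i);      // absolute get: position is left untouched
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] in = "assfasf".getBytes(StandardCharsets.US_ASCII);
        byte[] out = writeThenRead(in);
        System.out.println(new String(out, StandardCharsets.US_ASCII));
    }
}
```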
import java.nio.ByteBuffer;

import org.junit.Test; // assuming JUnit 4, given the @Test annotation

public class DirectByteBufferTest {

    @Test
    public void test1() {
        int count = 100000;
        int cap = 1024 * 1024;
        testDirectBuf(count, cap);
        testNonDirectBuf(count, cap);
    }

    private void testDirectBuf(int count, int cap) {
        long st;
        long ed;
        ByteBuffer byteBuf = null;
        // Time `count` allocations of a direct buffer.
        st = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            byteBuf = allocDirectByteBuffer(cap);
        }
        ed = System.currentTimeMillis();
        System.out.println("alloc directByteBuffer for " + count + " times spends " + (ed - st) + "ms");
        // Time `count` put/get rounds against the last buffer allocated.
        st = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            processBuf(byteBuf);
        }
        ed = System.currentTimeMillis();
        System.out.println("directByteBuffer process " + count + " times spends " + (ed - st) + "ms");
    }

    private ByteBuffer testNonDirectBuf(int count, int cap) {
        // Same two measurements, this time with heap buffers.
        long st = System.currentTimeMillis();
        ByteBuffer byteBuf = null;
        for (int i = 0; i < count; i++) {
            byteBuf = allocNonDirectByteBuffer(cap);
        }
        long ed = System.currentTimeMillis();
        System.out.println("alloc nonDirectByteBuffer for " + count + " times spends " + (ed - st) + "ms");
        st = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            processBuf(byteBuf);
        }
        ed = System.currentTimeMillis();
        System.out.println("nonDirectByteBuffer process " + count + " times spends " + (ed - st) + "ms");
        return byteBuf;
    }

    private ByteBuffer allocNonDirectByteBuffer(int cap) {
        ByteBuffer byteBuf = ByteBuffer.allocate(cap);
        return byteBuf;
    }

    private ByteBuffer allocDirectByteBuffer(int cap) {
        ByteBuffer directBuf = ByteBuffer.allocateDirect(cap);
        return directBuf;
    }

    private void processBuf(ByteBuffer buf) {
        byte[] bytes = "assfasf".getBytes();
        // Relative put: the position keeps advancing across calls.
        buf.put(bytes);
        // Absolute get: always reads the first bytes.length bytes of the buffer.
        for (int i = 0; i < bytes.length; i++) {
            byte b = buf.get(i);
            byte[] bytes2 = new byte[] { b };
            // System.out.print(new String(bytes2));
        }
        // System.out.println();
        // System.out.println(buf.capacity());
    }
}
directByteBuffer process 100000 times spends 271ms
alloc nonDirectByteBuffer for 100000 times spends 35,283ms
nonDirectByteBuffer process 100000 times spends 71ms
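This test creates 100,000 ByteBuffers, and the gap in elapsed time is striking. When allocating fairly large buffers, direct memory is an order of magnitude slower than the heap.
The last ByteBuffer allocated is then used for the reads and writes. For access, the direct buffer is roughly 4 times slower (271 ms versus 71 ms), which I put down to the JNI overhead.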
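The access loop in the test reads one byte per call. If per-call overhead is the cost, one thing worth trying is a bulk get, which copies the whole range in a single call. The sketch below only shows that the two approaches return the same bytes; the names are invented, and whether the bulk version is actually faster on a given JVM would have to be measured.

```java
import java.nio.ByteBuffer;

// Hypothetical example class, not part of the original post.
final class BulkRead {

    /** Reads the first `length` bytes one call at a time, as the test above does. */
    static byte[] readPerByte(ByteBuffer buf, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = buf.get(i);          // one absolute get per byte
        }
        return out;
    }

    /** Reads the first `length` bytes with a single bulk transfer. */
    static byte[] readBulk(ByteBuffer buf, int length) {
        byte[] out = new byte[length];
        ByteBuffer view = buf.duplicate(); // shares content, has its own position and limit
        view.clear();                      // position = 0, limit = capacity (content untouched)
        view.limit(length);
        view.get(out);                     // one bulk get for the whole range
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(1024);
        buf.put("assfasf".getBytes());
        System.out.println(new String(readPerByte(buf, 7)));
        System.out.println(new String(readBulk(buf, 7)));
    }
}
```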
I then pinned the VM parameters. With -XX:MaxDirectMemorySize=1024m -Xmx1024m -Xms1024m:
directByteBuffer process 100000 times spends 240ms
alloc nonDirectByteBuffer for 100000 times spends 28,408ms
nonDirectByteBuffer process 100000 times spends 56ms
alloc directByteBuffer for 100000 times spends 1,722ms
directByteBuffer process 100000 times spends 415ms
alloc nonDirectByteBuffer for 100000 times spends 107,563ms
nonDirectByteBuffer process 100000 times spends 4,149ms
With -XX:MaxDirectMemorySize=1024m -Xmx1024m -Xms1024m:
alloc directByteBuffer for 100000 times spends 1,177ms
directByteBuffer process 100000 times spends 260ms
alloc nonDirectByteBuffer for 100000 times spends 63,807ms
nonDirectByteBuffer process 100000 times spends 4,523ms
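This test also creates 100,000 ByteBuffers, but with a small capacity. For small allocations, creating direct buffers is well over 10 times faster than creating them on the heap (1,177 ms versus 63,807 ms with the fixed settings).
In the first test, by contrast, direct was the slower of the two.
For the heap buffer, the allocation times under the two setups differ enormously. Possible causes are full GC, or the operating system running short of memory (because the direct buffers took too much of it) and starting to swap.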
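One way to check the GC hypothesis, rather than guess, is to read the collector counters before and after the allocation loop. This is a minimal sketch using the standard GarbageCollectorMXBean API; the class name GcProbe and the loop sizes are made up for illustration. Swap activity would need an OS tool such as vmstat and is not covered here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the original post.
final class GcProbe {

    /** Total collections so far across all collectors (collectors reporting -1 are skipped). */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds across all collectors. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalGcCount();
        long timeBefore = totalGcTimeMillis();
        ByteBuffer last = null;
        for (int i = 0; i < 200; i++) {
            last = ByteBuffer.allocate(1024 * 1024); // heap allocation, as in the test
        }
        System.out.println("last capacity = " + last.capacity());
        System.out.println("collections during loop: " + (totalGcCount() - countBefore));
        System.out.println("GC time during loop: " + (totalGcTimeMillis() - timeBefore) + "ms");
    }
}
```

Running the JVM with GC logging enabled gives the same information in more detail, including which collections were full ones.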
For test one, I ran the heap buffer part on its own, with the direct part commented out. The result:
alloc nonDirectByteBuffer for 100000 times spends 28,141ms
nonDirectByteBuffer process 100000 times spends 237ms
That is close to the earlier result.
For test two, however, the result became:
alloc nonDirectByteBuffer for 100000 times spends 2,033ms
nonDirectByteBuffer process 100000 times spends 800ms
Adding -server to test two:
alloc nonDirectByteBuffer for 100000 times spends 1849ms
nonDirectByteBuffer process 100000 times spends 771ms
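Even so, this is still well behind test two's direct figures of 1,177 ms (allocation) and 260 ms (access).
The main cause is probably GC.
This is where BigMemory earns its keep.
Next, a test with objects, which means serializing and deserializing them.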
// Members of the test class. Needs java.io.*, java.nio.ByteBuffer,
// and (assuming JUnit 4) org.junit.Test plus a static import of org.junit.Assert.assertEquals.

@Test
public void test() {
    for (int i = 0; i < 100; i++) {
        System.out.println("====" + i + "=====");
        testOutofHeapCache();
    }
}

private void testOutofHeapCache() {
    int cap = 1000000;
    Foo foo = new Foo();
    foo.setF1(String.valueOf(System.currentTimeMillis()));
    foo.setF2("f2");

    long st = System.currentTimeMillis();
    ByteBuffer directBuf = ByteBuffer.allocateDirect(cap);
    long ed = System.currentTimeMillis();
    System.out.println("allocate cache spends " + (ed - st) + "ms");

    st = System.currentTimeMillis();
    byte[] bytesFromObject = getBytesFromObject(foo);
    ed = System.currentTimeMillis();
    System.out.println("serialize spends " + (ed - st) + "ms");

    st = System.currentTimeMillis();
    directBuf.put(bytesFromObject);
    ed = System.currentTimeMillis();
    System.out.println("put cache spends " + (ed - st) + "ms");

    // Read the serialized bytes back out of the direct buffer, one byte per call.
    byte[] result = new byte[bytesFromObject.length];
    st = System.currentTimeMillis();
    for (int i = 0, size = bytesFromObject.length; i < size; i++) {
        result[i] = directBuf.get(i);
    }
    ed = System.currentTimeMillis();
    System.out.println("get cache spends " + (ed - st) + "ms");

    st = System.currentTimeMillis();
    Foo resultFoo = (Foo) getObjectFromBytes(result);
    ed = System.currentTimeMillis();
    System.out.println("deserialize spends " + (ed - st) + "ms");

    assertEquals(foo.getF1(), resultFoo.getF1());
    assertEquals(foo.getF2(), resultFoo.getF2());
    directBuf.clear();
}

/** Serializes obj with Java serialization; returns null for a null input. */
public static byte[] getBytesFromObject(Serializable obj) {
    if (obj == null) {
        return null;
    }
    ByteArrayOutputStream bo = new ByteArrayOutputStream();
    try (ObjectOutputStream oo = new ObjectOutputStream(bo)) {
        oo.writeObject(obj);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return bo.toByteArray();
}

/** Deserializes an object; returns null for empty input or on failure. */
public static Object getObjectFromBytes(byte[] objBytes) {
    if (objBytes == null || objBytes.length == 0) {
        return null;
    }
    try (ObjectInputStream oi = new ObjectInputStream(new ByteArrayInputStream(objBytes))) {
        return oi.readObject();
    } catch (IOException | ClassNotFoundException e) {
        e.printStackTrace();
        return null;
    }
}

static class Foo implements Serializable {
    private static final long serialVersionUID = 1L;
    private String f1;
    private String f2;

    public String getF1() { return f1; }
    public void setF1(String f1) { this.f1 = f1; }
    public String getF2() { return f2; }
    public void setF2(String f2) { this.f2 = f2; }
}
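Conclusions:
1 Serialization and deserialization account for most of the time.
2 On repeated runs, the measured times fall to roughly 0 ms.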
====0=====
allocate cache spends 2ms
serialize spends 11ms
put cache spends 0ms
get cache spends 0ms
deserialize spends 3ms
====1=====
allocate cache spends 1ms
serialize spends 0ms
put cache spends 0ms
get cache spends 0ms
deserialize spends 0ms
====2=====
allocate cache spends 1ms
serialize spends 0ms
put cache spends 0ms
get cache spends 0ms
deserialize spends 1ms
====3=====
allocate cache spends 1ms
serialize spends 0ms
put cache spends 0ms
get cache spends 0ms
deserialize spends 0ms
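The 0ms readings above mostly say that each step finishes in under a millisecond, which is finer than System.currentTimeMillis() can show. A minimal sketch of timing with System.nanoTime() after a few warm-up runs follows; the class and method names are invented, and for serious measurements a benchmark harness would be the better tool.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the original post.
final class NanoTimer {

    /** Runs the task once and returns the elapsed time in nanoseconds. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** Runs the task warmupRuns times untimed, then once more timed. */
    static long timeAfterWarmup(Runnable task, int warmupRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        return timeNanos(task);
    }

    public static void main(String[] args) {
        ByteBuffer directBuf = ByteBuffer.allocateDirect(1000000);
        byte[] payload = new byte[256];
        Runnable put = () -> {
            directBuf.clear();        // reset position so every run writes the same range
            directBuf.put(payload);
        };
        System.out.println("first put: " + timeNanos(put) + "ns");
        System.out.println("put after warm-up: " + timeAfterWarmup(put, 1000) + "ns");
    }
}
```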