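Since graduating I have been doing virtual machine development at my company. Much of what I know is fragmentary, and now that I am finally off the 996 schedule I have time to organize my thoughts and write a few summary blog posts.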
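finalize is the method that is invoked on a Java object before the heap finally reclaims it. The virtual machine specification, however, makes no promise about when it runs. A while ago I ran into a native out-of-memory bug: a leakmemory test in the Android CTS allocates large amounts of native memory and frees it in finalize. Because lemur did not run the finalize methods promptly, the process ran out of virtual address space and crashed. So my advice is: as far as possible, do not rely on finalize to release scarce resources.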
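One common way to follow that advice is to release the resource explicitly with AutoCloseable and try-with-resources instead of waiting for finalize. The following is only a minimal sketch: NativeBuffer and its LIVE counter are hypothetical stand-ins for a real native allocation and are not taken from the CTS test mentioned above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an object that owns native memory.
final class NativeBuffer implements AutoCloseable {
    // Number of buffers currently "allocated" (illustration only).
    static final AtomicInteger LIVE = new AtomicInteger();
    private boolean closed;

    NativeBuffer() {
        LIVE.incrementAndGet();      // pretend to allocate native memory
    }

    @Override
    public void close() {
        if (!closed) {               // keep close() idempotent
            closed = true;
            LIVE.decrementAndGet();  // pretend to free native memory
        }
    }
}

final class ExplicitReleaseDemo {
    // Allocates and releases `count` buffers; returns how many are still live.
    static int allocateAndRelease(int count) {
        for (int i = 0; i < count; i++) {
            try (NativeBuffer buffer = new NativeBuffer()) {
                // Use the buffer here; close() runs when this block exits.
            }
        }
        return NativeBuffer.LIVE.get();
    }

    public static void main(String[] args) {
        System.out.println("live buffers: " + allocateAndRelease(1000));
    }
}
```

With this pattern the release point no longer depends on when the GC or the finalizer thread happens to run.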
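Inside the virtual machine, an object with a finalize method is actually handled through a FinalizerReference. FinalizerReference is similar to SoftReference, WeakReference and so on: they all derive from the Reference class.
The difference is that a SoftReference or the like is created explicitly with new in Java code, for example:
    private static final WeakReference<String> weakTest = new WeakReference<String>(new String("weak"));
A FinalizerReference is different. We never explicitly new a FinalizerReference; in other words, it is transparent to programmers at the Java layer.
So how does the virtual machine handle this? When does it create the FinalizerReference, and how does it go about executing the finalize method?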
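It all starts with the virtual machine's class loader mechanism. Class loading is fairly complicated in detail and deserves a post of its own, but in outline it goes through these steps:
FindClass -> DefineClass -> LoadClassFromDex -> LoadMethod
In the LoadMethod step, the virtual machine checks whether the class's method table contains a finalize method. If it does, a flag is set on the class. In Dalvik this corresponds to the loadMethodFromDex function in dalvik/vm/oo/Class.cpp: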
    static void loadMethodFromDex(ClassObject* clazz, const DexMethod* pDexMethod,
        Method* meth)
    {
        DexFile* pDexFile = clazz->pDvmDex->pDexFile;
        const DexMethodId* pMethodId;
        const DexCode* pDexCode;

        pMethodId = dexGetMethodId(pDexFile, pDexMethod->methodIdx);

        meth->name = dexStringById(pDexFile, pMethodId->nameIdx);
        dexProtoSetFromMethodId(&meth->prototype, pDexFile, pMethodId);
        meth->shorty = dexProtoGetShorty(&meth->prototype);
        meth->accessFlags = pDexMethod->accessFlags;
        meth->clazz = clazz;
        meth->jniArgInfo = 0;

        if (dvmCompareNameDescriptorAndMethod("finalize", "()V", meth) == 0) {
            /*
             * The Enum class declares a "final" finalize() method to
             * prevent subclasses from introducing a finalizer.  We don't
             * want to set the finalizable flag for Enum or its subclasses,
             * so we check for it here.
             *
             * We also want to avoid setting it on Object, but it's easier
             * to just strip that out later.
             */
            if (clazz->classLoader != NULL ||
                strcmp(clazz->descriptor, "Ljava/lang/Enum;") != 0)
            {
                SET_CLASS_FLAG(clazz, CLASS_ISFINALIZABLE);
            }
        }
        /* (the rest of the function is not shown in this excerpt) */
    }
There is an apparent problem here. java/lang/Object declares a finalize method, and every other object, class objects included, inherits from java/lang/Object, so one would expect the check
    dvmCompareNameDescriptorAndMethod("finalize", "()V", meth)
to succeed for every class. Note, however, that in Dalvik loadMethodFromDex runs before LinkClass, so at this point the parent class's methods have not yet been copied into the method table of the class being loaded. Only methods declared by the class itself are examined, and the problem does not arise.
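The same distinction between declared and inherited methods can be observed from the Java side with reflection. This is only an analogy, not what Dalvik does internally: Class.getDeclaredMethods() returns the methods a class declares itself, which is roughly the set that loadMethodFromDex sees before linking. The classes Plain and WithFinalizer and the helper declaresFinalize are made up for the illustration.

```java
import java.lang.reflect.Method;

final class DeclaredFinalizeCheck {
    // Declares no finalize(); it only inherits the one from Object.
    static class Plain { }

    // Declares its own finalize().
    static class WithFinalizer {
        @Override
        @SuppressWarnings({"deprecation", "removal"})
        protected void finalize() { }
    }

    // True only if `type` itself declares a no-argument finalize() method.
    static boolean declaresFinalize(Class<?> type) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals("finalize") && m.getParameterCount() == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Plain: " + declaresFinalize(Plain.class));
        System.out.println("WithFinalizer: " + declaresFinalize(WithFinalizer.class));
    }
}
```

Plain inherits Object.finalize() but does not declare one, so a check limited to declared methods does not match it.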
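Once the flag is set, an "add finalizer reference" operation has to be performed for every object of that class after it is allocated from the heap. The exact point at which this happens is up to the virtual machine implementation.
1. Dalvik does it just before executing the init method of java/lang/Object. The code lives in the assembly interpreter, in vm/mterp/out/InterpAsm-armv7-a.S: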
    .L_OP_INVOKE_OBJECT_INIT_RANGE: /* 0xf0 */
    /* File: armv5te/OP_INVOKE_OBJECT_INIT_RANGE.S */
        /*
         * Invoke Object.<init> on an object.  In practice we know that
         * Object's nullary constructor doesn't do anything, so we just
         * skip it unless a debugger is active.
         */
        FETCH(r1, 2)                        @ r1<- CCCC
        GET_VREG(r0, r1)                    @ r0<- "this" ptr
        cmp     r0, #0                      @ check for NULL
        beq     common_errNullObject        @ export PC and throw NPE
        ldr     r1, [r0, #offObject_clazz]  @ r1<- obj->clazz
        ldr     r2, [r1, #offClassObject_accessFlags] @ r2<- clazz->accessFlags
        tst     r2, #CLASS_ISFINALIZABLE    @ is this class finalizable?
        bne     .LOP_INVOKE_OBJECT_INIT_RANGE_setFinal @ yes, go

    .LOP_INVOKE_OBJECT_INIT_RANGE_setFinal:
        EXPORT_PC()                         @ can throw
        bl      dvmSetFinalizable           @ call dvmSetFinalizable(obj)
        ldr     r0, [rSELF, #offThread_exception] @ r0<- self->exception
        cmp     r0, #0                      @ exception pending?
        bne     common_exceptionThrown      @ yes, handle it
        b       .LOP_INVOKE_OBJECT_INIT_RANGE_finish
    void dvmSetFinalizable(Object *obj)
    {
        assert(obj != NULL);
        Thread *self = dvmThreadSelf();
        assert(self != NULL);
        Method *meth = gDvm.methJavaLangRefFinalizerReferenceAdd;
        assert(meth != NULL);
        JValue unusedResult;
        dvmCallMethod(self, meth, NULL, &unusedResult, obj);
    }
The add function of FinalizerReference is implemented in libcore/luni/src/main/java/java/lang/ref/FinalizerReference.java:
    public static void add(Object referent) {
        FinalizerReference<?> reference = new FinalizerReference<Object>(referent, queue);
        synchronized (LIST_LOCK) {
            reference.prev = null;
            reference.next = head;
            if (head != null) {
                head.prev = reference;
            }
            head = reference;
        }
    }

    private static FinalizerReference<?> head = null;
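The static head list keeps every FinalizerReference strongly reachable until it is taken off the list again (doFinalize in the daemon calls FinalizerReference.remove for that). The sketch below models the head insertion shown above together with a matching unlink operation. It is a simplified stand-in written for this post, not the libcore source: Node, size() and the remove body are assumptions.

```java
final class FinalizerListSketch {
    // Simplified stand-in for FinalizerReference: just the links and the referent.
    static final class Node {
        final Object referent;
        Node prev;
        Node next;
        Node(Object referent) { this.referent = referent; }
    }

    private static final Object LIST_LOCK = new Object();
    private static Node head;

    // Inserts a new node at the head of the doubly linked list.
    static Node add(Object referent) {
        Node node = new Node(referent);
        synchronized (LIST_LOCK) {
            node.prev = null;
            node.next = head;
            if (head != null) {
                head.prev = node;
            }
            head = node;
        }
        return node;
    }

    // Unlinks a node that is currently on the list (call at most once per node).
    static void remove(Node node) {
        synchronized (LIST_LOCK) {
            Node next = node.next;
            Node prev = node.prev;
            node.next = null;
            node.prev = null;
            if (prev != null) {
                prev.next = next;
            } else {
                head = next;   // the node was the head
            }
            if (next != null) {
                next.prev = prev;
            }
        }
    }

    // Counts the nodes currently on the list.
    static int size() {
        synchronized (LIST_LOCK) {
            int count = 0;
            for (Node n = head; n != null; n = n.next) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) {
        Node a = add("a");
        Node b = add("b");
        remove(a);
        System.out.println("nodes left: " + size());
        remove(b);
    }
}
```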
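So far we have looked in detail at how an object with a finalize method gets wrapped in a FinalizerReference. Next we look at what happens once the heap GC decides to collect such an object, and how the finalize method actually gets executed.
Dalvik's garbage collector uses a CMS (concurrent mark-sweep) algorithm: starting from the root set it marks all live objects, then sweeps every object that was not marked. The GC is another very important module of the virtual machine and will get its own post. Here I only give a rough summary, enough to carry on with the discussion. A typical concurrent mark-sweep collection has five phases: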
1. initialize phase
2. mark root
3. scan object
4. remark
5. reclaim
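References are handled mainly in the scan object, remark and reclaim phases.
In the scan object phase, all references have to be identified and each kind linked into its own list. Note that what gets linked here is the FinalizerReference, not the object that has the finalize method; that object sits in the referent field of the FinalizerReference. The code is in vm/alloc/MarkSweep.cpp: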
    static void scanDataObject(const Object *obj, GcMarkContext *ctx)
    {
        assert(obj != NULL);
        assert(obj->clazz != NULL);
        assert(ctx != NULL);
        markObject((const Object *)obj->clazz, ctx);
        scanFields(obj, ctx);
        if (IS_CLASS_FLAG_SET(obj->clazz, CLASS_ISREFERENCE)) {
            delayReferenceReferent((Object *)obj, ctx);
        }
    }
Every reference ends up in delayReferenceReferent:
    /*
     * Process the "referent" field in a java.lang.ref.Reference.  If the
     * referent has not yet been marked, put it on the appropriate list in
     * the gcHeap for later processing.
     */
    static void delayReferenceReferent(Object *obj, GcMarkContext *ctx)
    {
        assert(obj != NULL);
        assert(obj->clazz != NULL);
        assert(IS_CLASS_FLAG_SET(obj->clazz, CLASS_ISREFERENCE));
        assert(ctx != NULL);
        GcHeap *gcHeap = gDvm.gcHeap;
        size_t pendingNextOffset = gDvm.offJavaLangRefReference_pendingNext;
        size_t referentOffset = gDvm.offJavaLangRefReference_referent;
        Object *pending = dvmGetFieldObject(obj, pendingNextOffset);
        Object *referent = dvmGetFieldObject(obj, referentOffset);
        if (pending == NULL && referent != NULL && !isMarked(referent, ctx)) {
            Object **list = NULL;
            if (isSoftReference(obj)) {
                list = &gcHeap->softReferences;
            } else if (isWeakReference(obj)) {
                list = &gcHeap->weakReferences;
            } else if (isFinalizerReference(obj)) {
                list = &gcHeap->finalizerReferences;
            } else if (isPhantomReference(obj)) {
                list = &gcHeap->phantomReferences;
            }
            assert(list != NULL);
            enqueuePendingReference(obj, list);
        }
    }
enqueuePendingReference simply links the reference into a list, so I will not paste its code here.
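For readers who want a picture of that linking step anyway, here is a minimal sketch in Java. It assumes the pending list is a circular singly linked list threaded through pendingNext, which is what the `list == list.pendingNext` test in ReferenceQueueDaemon.enqueue suggests. The Ref class and both method names are hypothetical; this is not the Dalvik implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class PendingListSketch {
    // Hypothetical stand-in for a Reference with only its pendingNext link.
    static final class Ref {
        final String name;
        Ref pendingNext;   // null while the reference is on no pending list
        Ref(String name) { this.name = name; }
    }

    // Links `ref` into the circular list entered at `list`; returns the entry point.
    static Ref enqueuePending(Ref list, Ref ref) {
        if (list == null) {
            ref.pendingNext = ref;           // a single element points at itself
            return ref;
        }
        ref.pendingNext = list.pendingNext;  // splice in right after the entry point
        list.pendingNext = ref;
        return list;
    }

    // Unlinks every element, using the same loop shape as ReferenceQueueDaemon.enqueue.
    static List<String> drain(Ref list) {
        List<String> names = new ArrayList<>();
        while (list != null) {
            Ref ref;
            if (list == list.pendingNext) {  // last remaining element
                ref = list;
                ref.pendingNext = null;
                list = null;
            } else {
                ref = list.pendingNext;
                list.pendingNext = ref.pendingNext;
                ref.pendingNext = null;
            }
            names.add(ref.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Ref list = null;
        for (String name : new String[] {"a", "b", "c"}) {
            list = enqueuePending(list, new Ref(name));
        }
        System.out.println(drain(list));
    }
}
```

A null pendingNext doubles as the "not on any pending list" marker, which matches the `pending == NULL` check in delayReferenceReferent.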
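Once the mark phase is done, all references have been linked into their lists. After remark, the objects held by the FinalizerReferences are actually processed: each referent is taken out, marked, moved into the zombie field, and detached from the FinalizerReference's referent field. The code is in vm/alloc/MarkSweep.cpp: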
    static void enqueueFinalizerReferences(Object **list)
    {
        assert(list != NULL);
        GcMarkContext *ctx = &gDvm.gcHeap->markContext;
        size_t referentOffset = gDvm.offJavaLangRefReference_referent;
        size_t zombieOffset = gDvm.offJavaLangRefFinalizerReference_zombie;
        bool hasEnqueued = false;
        while (*list != NULL) {
            Object *ref = dequeuePendingReference(list);
            Object *referent = dvmGetFieldObject(ref, referentOffset);
            if (referent != NULL && !isMarked(referent, ctx)) {
                markObject(referent, ctx);
                /* If the referent is non-null the reference must queuable. */
                assert(isEnqueuable(ref));
                dvmSetFieldObject(ref, zombieOffset, referent);
                clearReference(ref);
                enqueueReference(ref);
                hasEnqueued = true;
            }
        }
        if (hasEnqueued) {
            processMarkStack(ctx);
        }
        assert(*list == NULL);
    }
    /*
     * Schedules a reference to be appended to its reference queue.
     */
    static void enqueueReference(Object *ref)
    {
        assert(ref != NULL);
        assert(dvmGetFieldObject(ref, gDvm.offJavaLangRefReference_queue) != NULL);
        assert(dvmGetFieldObject(ref, gDvm.offJavaLangRefReference_queueNext) == NULL);
        enqueuePendingReference(ref, &gDvm.gcHeap->clearedReferences);
    }
What is this clearedReferences list for? The following function gives the answer:
    void dvmEnqueueClearedReferences(Object **cleared)
    {
        assert(cleared != NULL);
        if (*cleared != NULL) {
            Thread *self = dvmThreadSelf();
            assert(self != NULL);
            Method *meth = gDvm.methJavaLangRefReferenceQueueAdd;
            assert(meth != NULL);
            JValue unused;
            Object *reference = *cleared;
            dvmCallMethod(self, meth, NULL, &unused, reference);
            *cleared = NULL;
        }
    }
Yes: it is what hands all the References that need to be released back to the Java layer.
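At this point the virtual machine's own handling of finalize is complete. You may find that odd, because after all this work there still seems to be no place where the object's finalize method is actually executed.
That is correct. The finalize method is not executed by the virtual machine itself. The virtual machine only links the reference into the list of references to be processed; the actual execution is done by the FinalizerDaemon. The FinalizerDaemon is a daemon thread that watches the FinalizerReference queue for newly arrived references and, when one arrives, runs the finalize method of the corresponding object. The code is in libcore/libdvm/src/main/java/java/lang/Daemons.java: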
    @Override public void run() {
        while (isRunning()) {
            // Take a reference, blocking until one is ready or the thread should stop
            try {
                doFinalize((FinalizerReference<?>) queue.remove());
            } catch (InterruptedException ignored) {
            }
        }
    }

    @FindBugsSuppressWarnings("FI_EXPLICIT_INVOCATION")
    private void doFinalize(FinalizerReference<?> reference) {
        FinalizerReference.remove(reference);
        Object object = reference.get();
        reference.clear();
        try {
            finalizingStartedNanos = System.nanoTime();
            finalizingObject = object;
            synchronized (FinalizerWatchdogDaemon.INSTANCE) {
                FinalizerWatchdogDaemon.INSTANCE.notify();
            }
            object.finalize();
        } catch (Throwable ex) {
            // The RI silently swallows these, but Android has always logged.
            System.logE("Uncaught exception thrown by finalizer", ex);
        } finally {
            finalizingObject = null;
        }
    }
But who puts the reference onto the FinalizerReference queue in the first place? I found two paths in libcore. One starts from the cleared references list described above, which is processed by another daemon, ReferenceQueueDaemon. Here is how it works:
    @Override public void run() {
        while (isRunning()) {
            Reference<?> list;
            try {
                synchronized (ReferenceQueue.class) {
                    while (ReferenceQueue.unenqueued == null) {
                        ReferenceQueue.class.wait();
                    }
                    list = ReferenceQueue.unenqueued;
                    ReferenceQueue.unenqueued = null;
                }
            } catch (InterruptedException e) {
                continue;
            }
            enqueue(list);
        }
    }

    private void enqueue(Reference<?> list) {
        while (list != null) {
            Reference<?> reference;
            // pendingNext is owned by the GC so no synchronization is required
            if (list == list.pendingNext) {
                reference = list;
                reference.pendingNext = null;
                list = null;
            } else {
                reference = list.pendingNext;
                list.pendingNext = reference.pendingNext;
                reference.pendingNext = null;
            }
            reference.enqueueInternal();
        }
    }
    public final synchronized boolean enqueueInternal() {
        if (queue != null && queueNext == null) {
            queue.enqueue(this);
            queue = null;
            return true;
        }
        return false;
    }
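The enqueue-once behaviour of enqueueInternal can be seen from ordinary Java code through the public java.lang.ref API. The sketch below enqueues a WeakReference by hand with Reference.enqueue(), which is normally the job of the GC side, and then takes it back off the queue. ReferenceQueueDemo and roundTrip are names made up for this example.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

final class ReferenceQueueDemo {
    // Held in a static field so the referent stays strongly reachable and the
    // GC does not interfere with the demo.
    private static final Object REFERENT = new Object();

    // Enqueues a reference by hand and checks that the queue hands it back once.
    static boolean roundTrip() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> ref = new WeakReference<>(REFERENT, queue);

        boolean first = ref.enqueue();        // succeeds: the reference has a queue
        Reference<?> polled = queue.poll();   // non-blocking; the daemon uses remove()
        boolean second = ref.enqueue();       // fails: a reference is enqueued at most once

        return first && polled == ref && !second;
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + roundTrip());
    }
}
```

The second enqueue() returning false corresponds to the `queue = null` assignment in enqueueInternal: once a reference has been handed to its queue it cannot be enqueued again.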
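We have now traced the execution of the finalize method from both the virtual machine and the libcore side, and can draw the following conclusions:
1. The time at which a finalize method runs is not deterministic, so it must not be relied upon to release scarce resources. The reasons are:
a. the time at which the virtual machine runs a GC is not deterministic
b. the time at which the finalizer daemon thread gets scheduled is not deterministic
2. A finalize method is executed only once. Even if the object is resurrected, once its finalize method has run it will not run again when the object is collected a second time. The reason is:
For an object with a finalize method, the virtual machine creates a FinalizerReference that refers to it when the object is created with new. When the finalize method is executed, that FinalizerReference is released. Even if the object is resurrected at this moment (that is, a strong reference is made to hold it), the second time it is collected there is no longer a FinalizerReference associated with it, so finalize is not executed again.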
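3. An object with a finalize method needs at least two GC cycles before it can be freed (so where the speed of memory reclamation matters, finalize is best avoided).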
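Conclusions 2 and 3 can be checked on a toy model of the flow described in this post. The sketch below is not Dalvik code and every name in it is hypothetical. It tracks only a set of roots instead of a real object graph, but it follows the same order of events: an unmarked finalizable object is kept alive by the first GC and queued, the daemon runs its finalize, and only a later GC frees it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the finalizer flow; all names are hypothetical, not Dalvik code.
final class ToyFinalizerHeap {
    static final class Obj {
        int finalizeCalls;            // how many times "finalize" has run
        boolean resurrectOnFinalize;  // finalize() makes the object reachable again
    }

    final Set<Obj> heap = new HashSet<>();              // every object not yet freed
    final Set<Obj> roots = new HashSet<>();             // strongly reachable objects
    final List<Obj> finalizerRefs = new ArrayList<>();  // referents of live FinalizerReferences
    final ArrayDeque<Obj> queue = new ArrayDeque<>();   // referents waiting for the daemon

    // Models "new" on a finalizable class: allocate, then FinalizerReference.add(obj).
    Obj allocateFinalizable() {
        Obj obj = new Obj();
        heap.add(obj);
        finalizerRefs.add(obj);
        return obj;
    }

    // One GC cycle: mark, rescue unmarked finalizable referents once, sweep the rest.
    void gc() {
        Set<Obj> marked = new HashSet<>(roots);
        marked.addAll(queue);                 // queued referents stay reachable
        for (Obj obj : new ArrayList<>(finalizerRefs)) {
            if (!marked.contains(obj)) {
                marked.add(obj);              // kept alive so finalize() can still run
                finalizerRefs.remove(obj);    // the FinalizerReference is used up
                queue.add(obj);
            }
        }
        heap.retainAll(marked);               // sweep everything unmarked
    }

    // Models the finalizer daemon: run finalize() for everything queued.
    void runFinalizerDaemon() {
        Obj obj;
        while ((obj = queue.poll()) != null) {
            obj.finalizeCalls++;
            if (obj.resurrectOnFinalize) {
                roots.add(obj);               // finalize() stored `this` somewhere reachable
            }
        }
    }

    public static void main(String[] args) {
        ToyFinalizerHeap vm = new ToyFinalizerHeap();
        Obj a = vm.allocateFinalizable();     // unreachable right away
        vm.gc();
        System.out.println("after 1st GC, still on heap: " + vm.heap.contains(a));
        vm.runFinalizerDaemon();
        vm.gc();
        System.out.println("after 2nd GC, still on heap: " + vm.heap.contains(a));
    }
}
```

In this model a resurrected object (resurrectOnFinalize set to true) survives the GC that follows its finalize call, and once it becomes unreachable again it is swept without a second finalize call, because its entry in finalizerRefs is already gone.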