http://www.fasterj.com/articles/finalizer1.shtml
In this article, I'm going to look at exactly what the JVM does when you create a finalizable object, following its lifecycle through until it gets garbage collected. Note that this article only covers what happens in the Sun JVM; other JVMs may differ in procedure.
A Simple Example
I'll start with a small class that doesn't cause any finalization, so that we can clearly see the differences. My Test1 class (listing 1) is a simple loop that just creates and dereferences instances of Test1. It runs just as you would expect it to - creating lots of garbage instances, with the garbage collector kicking in periodically to clean up the garbage.
public class Test1 {
    static long NumberOfCreatedInstances = 0;
    static long NumberOfDeletedInstances = 0;

    public Test1() {
        NumberOfCreatedInstances++;
    }

    public static void main(String[] args) {
        System.out.println("starting....");
        for (int i = 0; ; i++) {
            // Create an instance and immediately drop the only reference to it
            Test1 obj = new Test1();
            obj = null;
            if (i % 10000000 == 0) {
                // Report how many instances are created but not yet finalized
                System.out.println(NumberOfCreatedInstances - NumberOfDeletedInstances);
            }
        }
    }
}
Listing 1: The Test1 class
Now I'll change Test1 very slightly by making it finalizable. I've called this class Test2 (listing 2). I've added just one tiny method, finalize(), at the end of the class. It doesn't seem like much; I don't even call that method from my code. But as I'm sure you all know, it's the special finalize() method that is called by the garbage collector when the object can be reclaimed.
class Test2 {
    static long NumberOfCreatedInstances = 0;
    static long NumberOfDeletedInstances = 0;

    public Test2() {
        NumberOfCreatedInstances++;
    }

    public static void main(String[] args) {
        System.out.println("starting....");
        for (int i = 0; ; i++) {
            // Create an instance and immediately drop the only reference to it
            Test2 obj = new Test2();
            obj = null;
            if (i % 10000000 == 0) {
                // Report how many instances are created but not yet finalized
                System.out.println(NumberOfCreatedInstances - NumberOfDeletedInstances);
            }
        }
    }

    // The only addition compared with Test1
    protected void finalize() {
        NumberOfDeletedInstances++;
    }
}
Listing 2: The Test2 class - the only difference from the Test1 class is the addition of a finalize() method
It seems like there is not much of a difference between Test1 and Test2. But if I run the two programs (the -verbose:gc option is particularly useful here), then I see very different activity between the two invocations. Test1 happily sails along, creating objects continuously, interrupted occasionally by very fast young generation GCs - exactly as you would expect from looking at the code.
Test2, on the other hand, slows down to a crawl in comparison (you may want to limit the heap to 64m with -Xmx64m, or even lower). If you run Test2 on a JVM earlier than a 1.6 JVM, then Test2 will very likely produce an OutOfMemoryError! (It might take a while.) On a 1.6+ JVM, it will probably limp along indefinitely, going quite slowly, or it might produce an OutOfMemoryError, depending on the system you run it on.
You might well be expecting this pattern if you have previously read about or encountered the cost of finalizers. Or maybe it's a surprise. It was certainly a surprise to me when I first saw it - a Java program that has very little code, and is definitely not holding on to more than one object according to the code, produces an OutOfMemoryError! Even those of you who were expecting that might still be surprised at the exact details of what is happening here.
A Simple Lifecycle
Let's start with Test1. This is a straightforward class, with instances that have a straightforward lifecycle. In the main() method, a Test1 object is created in the Eden space of the young generation at new Test1() (see figure 1). Then the explicit dereference in the very next line (obj = null;) eliminates any reference to that object (actually if that line wasn't there, the dereference would occur in the next loop iteration when obj is set to point at the next Test1 instance, but I wanted to be explicit in the example).
Figure 1: New objects getting created in Eden space
At some point, we have created enough Test1 instances that Eden gets full - and that triggers a young generation garbage collection (usually termed "minor GC"). As nothing points to any of the objects in Eden (or possibly one instance is still referenced, depending on when the GC occurs), Eden is simply set to empty very efficiently by the garbage collector (if one object is referenced, that single object will be copied to a survivor space first, see figure 2). And presto, we are very efficiently back to an empty Eden, and the main loop continues with further object creation.
Figure 2: A minor GC empties out Eden
Creation Of A Finalizer
Test2, on the other hand, looks rather different. First off, when a Test2 instance is created, the JVM detects that it has a finalize() method which is different from the one defined in the Object class. Yes, defining a non-trivial finalize() method in a class - or inheriting one for that matter - is on its own sufficient to change how objects are created in the JVM.
The JVM will ignore a trivial finalize() method (e.g. one which just returns without doing anything, like the one defined in the Object class). Otherwise, if an instance is being created, and that instance has a non-trivial finalize() method defined or inherited, then the JVM will do the following:
- The JVM will create the instance
- The JVM will also create an instance of the java.lang.ref.Finalizer class, pointing to that object instance just created (and pointing to a queue that it will be put on by the GC)
- The java.lang.ref.Finalizer class holds on to the java.lang.ref.Finalizer instance that was just created (so that it is kept alive, otherwise nothing would keep it alive and it would be GCed at the next GC).
So that is what happens here with Test2. With every instance of Test2 created, we get a separate instance of java.lang.ref.Finalizer pointing to the Test2 instance. (see figure 3)
Figure 3: Finalizable objects have a Finalizer created with them
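The first part of that check can be approximated from plain Java. The sketch below is only an illustration: FinalizeCheck and its nested classes are names I made up, and it uses reflection to ask whether a class declares or inherits a finalize() method other than the one in Object. It cannot tell whether the method body is trivial, which the JVM does take into account, so it will over-report for classes with an empty finalize().

```java
import java.lang.reflect.Method;

// Hypothetical helper for illustration; this is not how the JVM itself is implemented.
final class FinalizeCheck {

    // Returns true if cls declares or inherits a finalize() other than Object.finalize().
    static boolean overridesFinalize(Class<?> cls) {
        for (Class<?> c = cls; c != null && c != Object.class; c = c.getSuperclass()) {
            try {
                Method m = c.getDeclaredMethod("finalize");
                return m != null; // found an override somewhere below Object
            } catch (NoSuchMethodException e) {
                // Not declared at this level; keep walking up the hierarchy
            }
        }
        return false;
    }

    static long finalized = 0;

    static class Plain { }

    static class WithFinalizer {
        @SuppressWarnings({"deprecation", "removal"})
        @Override
        protected void finalize() {
            finalized++; // non-trivial body, like Test2.finalize()
        }
    }

    static class Inherits extends WithFinalizer { }

    public static void main(String[] args) {
        System.out.println(overridesFinalize(Plain.class));         // false
        System.out.println(overridesFinalize(WithFinalizer.class)); // true
        System.out.println(overridesFinalize(Inherits.class));      // true
    }
}
```

The Inherits case matters: a class pays the Finalizer cost even when the finalize() method comes from a superclass.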
Sound bad? Well, it isn't particularly. I mean, okay we create two objects instead of one each time, but honestly the modern JVM is fantastically efficient at creating objects, so this really isn't a big deal.
The First GC
Well that was the creation of an object with a non-trivial finalize() method, but what happens next? Just as before, the Test2 instances are dereferenced, so can be garbage collected. So at some point the young generation Eden gets full, and a minor GC happens. But this time round, the GC has extra work to do. Firstly, instead of a bunch of objects which aren't being referenced, we have lots of objects which are referenced from Finalizer objects, which in turn are referenced from the Finalizer class. So everything stays alive! The GC will copy everything into the survivor space. And if that isn't big enough to hold all of the objects, it will have to move some to the old generation (which has a much more expensive GC cycle). Straight away we have significantly more work and are also storing up significantly more work for later.
But that's not even the end of it. The GC recognizes that nothing else points to the Test2 instances apart from the Finalizers, so it says to itself "aha! I can process any finalizers that point to those Test2 instances". So the GC adds each of those Finalizer objects to the reference queue at java.lang.ref.Finalizer.ReferenceQueue. Now the GC is finally finished, having done quite a bit more work than when we didn't have Finalizers in Test1. And look at the state of our JVM: Test2 instances are hanging around, spread all over the place in survivor space and the old generation too; Finalizer instances are hanging around too, as they are still referenced from the Finalizer class (see figure 4). The GC is finished, but nothing seems to have been cleared up!
Figure 4: Finalizable objects don't get cleared on the first GC
The Finalizer Thread
Now that the minor GC is finished, the application threads start up again (they were suspended while the minor GC was running). Amongst the application threads are Test2's "main" thread, and several threads started by the JVM - and one in particular concerns us: the "Finalizer" daemon thread. In that same java.lang.ref.Finalizer class is an inner class called FinalizerThread, which starts the "Finalizer" daemon thread when the java.lang.ref.Finalizer class is loaded into the JVM.
The "Finalizer" daemon thread is quite a simple thread. It sits in a loop which is blocked waiting for something to become available to be popped from the java.lang.ref.Finalizer.ReferenceQueue queue. Conceptually, it looks like the loop in listing 3 (the actual loop is a bit more complicated to handle various error and access issues).
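for (;;) {
    // Block until the GC has put a Finalizer on the queue, then take it off
    Finalizer f = java.lang.ref.Finalizer.ReferenceQueue.remove();
    // Call finalize() on the object that the Finalizer points to
    f.get().finalize();
}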
Listing 3: A conceptual version of the Finalizer thread loop
So now we have a bit of a contention issue for the JVM heap resources. The main() loop has kicked back in and is producing lots more Test2 instances, with their associated Finalizer objects, while at the same time the "Finalizer" daemon thread is running through the Finalizer objects that the last GC pushed onto the Finalizer reference queue.
That would probably be fine - calling our Test2.finalize() method should be quicker than creating a new Test2 instance, so we'd expect the "Finalizer" daemon thread to be able to catch up and get ahead by the next GC. Only there is one little problem with that optimistic expectation. The "Finalizer" daemon thread is run at a lower priority by the JVM than the default thread priority. This means that the "main" thread gets lots more CPU time than the "Finalizer" daemon thread so, in Test2, the "Finalizer" daemon thread will never catch up.
In a normal application, this imbalance doesn't matter. Finalizable objects aren't usually created at such a high rate, and even where they might be created at a high rate in bursts as you can get in some applications, your average application will have some idle CPU time which will let the "Finalizer" daemon thread catch up.
But our Test2 class is a pathological Finalizable class. It does create Finalizable objects too fast for the "Finalizer" daemon thread to catch up, and so that Finalizer reference queue just keeps on building in size.
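You can watch that backlog from inside the program. Here is a minimal sketch, assuming the standard java.lang.management API is available in your runtime; the class name PendingFinalization is my own, and the figure the JVM reports is only approximate.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper for illustration: reports the finalization backlog.
final class PendingFinalization {

    // Approximate number of objects whose finalization is still pending.
    static int pendingCount() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getObjectPendingFinalizationCount();
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample the backlog a few times, one second apart
        for (int i = 0; i < 5; i++) {
            System.out.println("pending finalization (approx): " + pendingCount());
            Thread.sleep(1000);
        }
    }
}
```

Printing pendingCount() alongside the existing status line in Test2 should show whether the backlog grows between reports on your system.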
The Second GC
Nevertheless, some Finalizer objects will get off the queue, and the Test2 instances they point to will get their finalize() methods called. At this point, the "Finalizer" daemon thread also does one more job - it removes the reference from the Finalizer class to that Finalizer instance it just processed - remember, that is what was keeping the Finalizer instance alive. Now nothing points to the Finalizer instance, and it can be collected in the next GC - as can the Test2 instance since nothing else points to that.
So eventually, after all that, another GC will happen. And this time round, those Finalizer objects that have been processed will get cleared out by the GC. That's the end of the lifecycle for the ones that get to that stage. The others, still in the Finalizer reference queue, will get cycled through the Survivor spaces, where some will get processed and cleared, but most (in our Test2 application) will eventually end up in the old generation. Eventually, the old generation will also be full, and a major GC will occur. But the major GC, while capable of being a bit more thorough than the minor GCs, doesn't really help much here. Now you can see how the OutOfMemoryError occurs. That queue just keeps growing, and eventually there isn't sufficient space to create new objects, and pow, we hit the OOME.
(In 1.6, at least in some configurations, it looks like the garbage collector has enough smarts to be able to try and let the finalizer thread run a bit so that it can limp along, with the heap almost full indefinitely. It's better than an OOME crash, but not by much).
Fixing It
This article was primarily about seeing how finalizable objects are processed. The actual example we used was pathological, so that I could demonstrate the lifecycle of finalizable objects in a dramatic way - the vast majority of apps using finalizers will never see a significant queue build up. Nevertheless, there are some apps that have hit this finalizer queue build up problem in the past, so it is worth considering how to deal with it. One obvious way is to increase the priority of the "Finalizer" daemon thread - there is no API for this, so you have to run through all the threads to find it by name, then increase its priority.
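Finding the thread by name could look something like the following. Treat it as a sketch: FinalizerPriority is a name I made up, the lookup relies on the thread being called "Finalizer" as described above, and whether the operating system honours Java thread priorities at all depends on the platform and JVM settings.

```java
import java.util.Optional;

// Hypothetical helper for illustration; relies on the thread name "Finalizer".
final class FinalizerPriority {

    // Looks through all live threads for the one named "Finalizer".
    static Optional<Thread> findFinalizerThread() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> "Finalizer".equals(t.getName()))
                .findFirst();
    }

    // Sets the priority of the "Finalizer" thread if it exists; returns whether it was found.
    static boolean raisePriority(int newPriority) {
        Optional<Thread> finalizer = findFinalizerThread();
        finalizer.ifPresent(t -> t.setPriority(newPriority));
        return finalizer.isPresent();
    }

    public static void main(String[] args) {
        findFinalizerThread().ifPresent(t ->
                System.out.println("Finalizer priority before: " + t.getPriority()));
        boolean found = raisePriority(Thread.MAX_PRIORITY);
        System.out.println("Finalizer thread found: " + found);
    }
}
```

Printing the priority before changing it is worthwhile, since it tells you what your particular JVM actually uses.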
You could also take explicit control over finalization by removing the finalize() method and using your own explicit queue with your own Reference objects, in much the same way that the Finalizer class processes the objects and their finalize() methods (see for example Tony Printezis' article). That way you control your finalization processing thread's priority and schedule.
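To give a feel for the explicit-queue approach, here is a minimal sketch of my own (it is not taken from the article mentioned above). ExplicitCleanupQueue, CleanupRef and the method names are all hypothetical; the sketch assumes PhantomReference objects registered with a ReferenceQueue, plus a set that keeps those references alive, much as the Finalizer class keeps its own instances alive.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration; not part of the JDK.
final class ExplicitCleanupQueue {

    // A reference that carries the cleanup action for its referent.
    // The action must not hold a reference to the resource itself,
    // otherwise the resource never becomes unreachable.
    private static final class CleanupRef extends PhantomReference<Object> {
        final Runnable action;

        CleanupRef(Object referent, ReferenceQueue<Object> queue, Runnable action) {
            super(referent, queue);
            this.action = action;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the reference objects reachable until they have been processed
    private final Set<Reference<?>> pending = ConcurrentHashMap.newKeySet();

    // Registers a cleanup action to run after the resource becomes unreachable.
    Reference<Object> register(Object resource, Runnable action) {
        CleanupRef ref = new CleanupRef(resource, queue, action);
        pending.add(ref);
        return ref;
    }

    // Processes everything currently on the queue without blocking.
    int drain() {
        int processed = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            process(ref);
            processed++;
        }
        return processed;
    }

    // Starts a daemon thread, at a priority you choose, that blocks on the queue.
    Thread startThread(int priority) {
        Thread worker = new Thread(() -> {
            try {
                for (;;) {
                    process(queue.remove());
                }
            } catch (InterruptedException e) {
                // Interrupted: stop processing
            }
        }, "ExplicitCleanup");
        worker.setDaemon(true);
        worker.setPriority(priority);
        worker.start();
        return worker;
    }

    private void process(Reference<?> ref) {
        pending.remove(ref);              // let the reference itself be collected
        ((CleanupRef) ref).action.run();  // run the cleanup on our own thread
    }
}
```

You can either call drain() at points of your choosing or call startThread(...) once with whatever priority suits your application; that choice is the control over schedule and priority that the built-in mechanism does not give you.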
Note that neither of these techniques reduces the overheads of having finalizable objects; they just avoid the queue building up because of the lower priority thread. Bear in mind that the vast majority of apps using finalizers will never see a significant queue build up, but that the overheads of having finalizable objects are significant. It's a good idea to try to keep the number of finalizable objects to a minimum. A few finalizable objects normally don't matter; too many can seriously stress the GC.