http://www.iecc.com/gclist/GC-faq.html
This is a draft FAQ for the GC-LIST. Comments, editorial remarks, and especially additions are welcome. The file is currently broken up into four parts, corresponding roughly to general stuff, techniques and algorithms, language interfaces to GC, and more difficult topics. As sections grow, these files may be reorganized in an attempt to keep the individual files small enough to be quickly retrieved.
Text versions (converted with lynx) of these files are available as GC-faq.txt, GC-algorithms.txt, GC-lang.txt, and GC-harder.txt.
There's been some concern that the emphasis here ought to be a bit more evangelical, and a little less academic (perhaps that evangelism ought to be added, rather than technical content subtracted). Concise arguments for the wonderfulness of garbage collection are more than welcome, as are pointers to non-concise arguments.
Table of Contents
- Common questions
- Folk truths
- Folk myths
- GC, C, and C++
- Language interfaces to GC
  - Implicit versus explicit use of heap storage
  - Collector invocation and control
  - Weak pointers
  - Finalization
- Basic algorithms
  - Jargon
  - Reference counting
  - Mark-and-sweep
  - Copying
- Variations
  - Conservative collection
  - Generational collection
  - Deferred reference counting
  - Lazy freeing
  - One-bit reference counting
  - Mark-and-don't-sweep
  - Snapshot mark-and-sweep
  - Baker's real time copying collection
  - Appel-Ellis-Li "real time" concurrent collection
  - Bartlett's conservative-compacting collection
  - Treadmill collection
  - Goldberg's tagless collection
  - Blacklisting in a conservative collector
- Harder stuff
  - Thread support
  - Parallel collection
  - Concurrent collection
  - Distributed objects
  - Databases, persistent stores, and file systems
  - Uncooperative environments
  - Hardware support
  - OS support
  - Compiler support
- For more information
- Contributors
Common questions
What is garbage collection?
Garbage collection is a part of a language's runtime system, or an add-on library, perhaps assisted by the compiler, the hardware, the OS, or any combination of the three, that automatically determines what memory a program is no longer using, and recycles it for other use. It is also known as "automatic storage (or memory) reclamation".
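Here is a minimal Java sketch of the core idea behind a tracing collector: everything reachable from the roots is kept, and everything else is recycled. The ToyMarkSweep and Obj classes, the explicit roots list, and the allocation list are hypothetical stand-ins. A real collector works on raw memory and finds its roots in registers, stacks, and static data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy mark-and-sweep over an explicit object graph (hypothetical model, not a real runtime).
public class ToyMarkSweep {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> heap = new ArrayList<>();   // every allocated object
    final List<Obj> roots = new ArrayList<>();  // stands in for stacks, registers, globals

    Obj allocate(String name) {
        Obj o = new Obj(name);
        heap.add(o);
        return o;
    }

    // Mark everything reachable from the roots, then reclaim the rest.
    int collect() {
        Deque<Obj> work = new ArrayDeque<>();
        for (Obj r : roots) {
            if (!r.marked) { r.marked = true; work.push(r); }
        }
        while (!work.isEmpty()) {
            for (Obj child : work.pop().refs) {
                if (!child.marked) { child.marked = true; work.push(child); }
            }
        }
        int freed = 0;
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : heap) {
            if (o.marked) { o.marked = false; survivors.add(o); } // clear mark for next cycle
            else { freed++; }                                     // unreachable: reclaimed
        }
        heap.clear();
        heap.addAll(survivors);
        return freed;
    }

    public static void main(String[] args) {
        ToyMarkSweep gc = new ToyMarkSweep();
        Obj a = gc.allocate("a");
        Obj b = gc.allocate("b");
        Obj c = gc.allocate("c");
        a.refs.add(b);
        c.refs.add(c);   // a self-cycle that nothing else references
        gc.roots.add(a);
        System.out.println(gc.collect() + " freed, " + gc.heap.size() + " live");
    }
}
```

Running main prints "1 freed, 2 live". The self-referencing object c is reclaimed even though it still refers to itself, because tracing from the roots never reaches it.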
Why is it good?
Manual memory management is (programmer-)time consuming and error prone. Most programs still contain leaks. This is all doubly true with programs using exception handling and/or threads.
A second benefit of garbage collection, less obvious to people who haven't used it, is that relying on garbage collection to manage memory simplifies the interfaces between components (subroutines, libraries, modules, classes), which no longer need to expose memory management details ("who is responsible for recycling this memory").
You can read more at Object-Orientation FAQ -- 3.9) Why is Garbage Collection A Good Thing?
Is garbage collection slow?
Not necessarily. Modern garbage collectors appear to run as quickly as manual storage allocators (malloc/free or new/delete). Garbage collection probably will not run as quickly as a customized memory allocator designed for use in a specific program. On the other hand, the extra code required to make manual memory management work properly (for example, explicit reference counting) is often more expensive than a garbage collector would be. This is more likely to be true in a multithreaded program, if the specialized allocator is a shared resource (which it usually is).
Since this was first written, memory has become so cheap that garbage collectors have been applied to very large heaps, for example more than a gigabyte. For a sufficiently large live set, pause times are still an issue. On the other hand, for very many applications modern garbage collectors provide pause times that are completely compatible with human interaction. Pause times below 1/10th of a second are common, and applications with relatively small live sets (or slowly changing live sets, for generational collectors) can obtain pause times below 1/100th of a second.
Can I use garbage collection with C or C++?
Probably. Modern (well-tested, efficient, non-pausing) garbage collectors are available that work with all but the most pathological C and C++ programs, including legacy code. See GC, C, and C++ for more details.
Does garbage collection cause my program's execution to pause?
Not necessarily. A variety of algorithms allow garbage collection to proceed concurrently, incrementally, and (for some definitions of the term) in "real time". There are incremental garbage collectors that work with C and C++, for instance.
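The following Java sketch shows roughly how incremental collection can work. It uses tri-color marking that does a bounded amount of work per step, so the program (the "mutator") can run between steps. It assumes a Dijkstra-style insertion write barrier, which is only one of several barrier designs. The classes, the work budget, and the fixed root set are hypothetical simplifications, not how any particular C or C++ collector is written.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Incremental tri-color marking (hypothetical model, not any real collector's code).
// WHITE = not yet seen, GRAY = seen but children not scanned, BLACK = fully scanned.
public class IncrementalMarker {
    enum Color { WHITE, GRAY, BLACK }

    static final class Node {
        final List<Node> refs = new ArrayList<>();
        Color color = Color.WHITE;
    }

    private final Deque<Node> gray = new ArrayDeque<>();
    private boolean marking;

    // Begin a cycle by shading the roots. Assumes the root set does not change
    // during the cycle; real collectors also rescan or barrier the roots.
    void start(List<Node> roots) {
        marking = true;
        for (Node r : roots) shade(r);
    }

    private void shade(Node n) {
        if (n.color == Color.WHITE) {
            n.color = Color.GRAY;
            gray.push(n);
        }
    }

    // Scan at most `budget` gray nodes, then return to the mutator.
    // Returns true once marking is complete; WHITE nodes are then garbage.
    boolean step(int budget) {
        while (budget-- > 0 && !gray.isEmpty()) {
            Node n = gray.pop();
            for (Node child : n.refs) shade(child);
            n.color = Color.BLACK;
        }
        if (gray.isEmpty()) marking = false;
        return !marking;
    }

    // Insertion write barrier: storing a pointer shades its target, so a BLACK
    // node never ends up pointing at a WHITE node the collector would miss.
    void writeRef(Node from, Node to) {
        from.refs.add(to);
        if (marking) shade(to);
    }
}
```

Suppose the mutator stores a pointer to a fresh WHITE node into a node already scanned BLACK between steps. Without the barrier in writeRef, that node would never be scanned and would be reclaimed while still in use.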
Where can I get a C or C++ garbage collector?
- Boehm-Weiser collector
  http://www.hpl.hp.com/personal/Hans_Boehm/gc/ or
  ftp://parcftp.xerox.com/pub/gc/gc.html
- Great Circle from Geodesic Systems <[email protected]> or 800-360-8388 or http://www.geodesic.com
- Kevin Warne <[email protected]> or 800-707-7171
Folk myths
- GC is necessarily slower than manual memory management.
- GC will necessarily make my program pause.
- Manual memory management won't cause pauses.
- GC is incompatible with C and C++.
Folk truths
- Most allocated objects are dynamically referenced by a very small number of pointers. The most important small number is ONE.
- Most allocated objects have short lifetimes.
- Allocation patterns (size distributions, lifetime distributions) are bursty, not uniform.
- VM behavior matters.
- Cache behavior matters.
- "Optimal" strategies can fail miserably.
Tradeoffs
- precise vs. conservative
- moving/compacting vs. non-moving
- explicit vs. implicit reclamation phase
- stopping vs. incremental vs. concurrent
- generational vs. non-generational
GC, C, and C++
What do you mean, garbage collection and C?
Rather than using malloc and free to obtain and reclaim memory, it is possible to link in a garbage collector and allow it to reclaim unused memory automatically. This usually even works if malloc is replaced with the garbage collector's allocator and free is replaced with a do-nothing subroutine. This approach has worked with the X11 library, for instance.
It is also possible to program in a style where free still reclaims storage, but the garbage collector acts as a backstop, preventing leaks that might otherwise occur. This style has also been tested with many applications, and it works well. The advantage here is that where it is easy for the programmer to manage memory, the programmer manages the memory, but where it is not, the garbage collector does the job. This doesn't necessarily run any faster than free-does-nothing, but it may help keep the heap smaller.
How is this possible?
C-compatible garbage collectors know where pointers may generally be found (e.g., "bss", "data", and stack), and maintain heap data structures that allow them to quickly determine what bit patterns might be pointers. Pointers, of course, look like pointers, so this heuristic traces out all memory reachable through pointers. What isn't reached is reclaimed.
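Below is a minimal Java sketch of the "might this bit pattern be a pointer?" test, with addresses simulated as long values. A TreeMap from block start address to block size stands in for the collector's heap data structures. Real collectors such as the Boehm-Weiser collector use different, more compact structures. The root words stand in for the contents of the stack, data, and bss areas. All names here are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simulated conservative marking (hypothetical model): addresses are longs, and any
// word whose value falls inside an allocated block is treated as a pointer to it.
public class ConservativeScan {
    private final TreeMap<Long, Long> blockSize = new TreeMap<>(); // start -> size in bytes
    private final Map<Long, long[]> blockWords = new HashMap<>();  // start -> words stored in block

    void allocate(long start, long size, long... words) {
        blockSize.put(start, size);
        blockWords.put(start, words);
    }

    // Start of the block containing `word`, or null if the word cannot be a
    // pointer into the heap. Interior pointers count, since C code may use them.
    Long blockFor(long word) {
        Map.Entry<Long, Long> e = blockSize.floorEntry(word);
        if (e == null || word >= e.getKey() + e.getValue()) return null;
        return e.getKey();
    }

    // Trace from the root words, then through the words stored in each reached block.
    Set<Long> mark(long... roots) {
        Set<Long> marked = new HashSet<>();
        Deque<Long> work = new ArrayDeque<>();
        for (long w : roots) visit(w, marked, work);
        while (!work.isEmpty()) {
            for (long w : blockWords.get(work.pop())) visit(w, marked, work);
        }
        return marked;
    }

    private void visit(long word, Set<Long> marked, Deque<Long> work) {
        Long b = blockFor(word);
        if (b != null && marked.add(b)) work.push(b);
    }

    // Blocks not reached from any root are reclaimable.
    Set<Long> garbage(long... roots) {
        Set<Long> dead = new HashSet<>(blockSize.keySet());
        dead.removeAll(mark(roots));
        return dead;
    }
}
```

For example, after allocate(0x1000, 32, 0x2008, 42), allocate(0x2000, 16), and allocate(0x3000, 16), calling garbage(0x1000, 7) reports only 0x3000. An integer that happened to fall inside an allocated block would also be treated as a pointer and would keep that block alive. This false retention is the price of not knowing which words really are pointers.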
This doesn't sound very portable. What if I need to port my code and there's no garbage collector on the target platform?
Some of this code is necessarily system-dependent, but the features of most operating systems have been enumerated, so garbage collection for C is available almost everywhere. That is, portability isn't a problem if the code has already been ported, and it has. Speaking personally (this is David Chase), it's also not hard to port these garbage collectors to new platforms; I've ported the Boehm-Weiser collector twice myself, when the code had not yet been ported to terribly many platforms, and when I had much less experience with the low-level interfaces to various operating systems.
Won't this leave bugs in my program?
This depends on your point of view. Using a garbage collector solves a lot of problems for a programmer, which gives a programmer time to solve other problems, or lets the job be finished faster. It's similar in flavor to floating point arithmetic or virtual memory. Both of these solve a tedious problem (scaling arithmetic, or paging unused data to disk) that a programmer could, in principle, solve. Some specialized code is written without FP or VM support, but in practice, if these features are available, people use them. They're generally judged to be well worth the cost.
If a program is developed using garbage collection, and the collector is taken away, then yes, the result may contain bugs in the form of memory leaks. Similarly, if a program is developed using FP (or VM) and that is taken away, that program, too, may contain bugs.
Also in practice, many programs that use malloc and free already leak memory, so use of a garbage collector can actually reduce the number of bugs in a program, and do so much more quickly than if they had to be tracked down and fixed by hand. This is especially true if the memory leak is inherent in a library that cannot be repaired.
Can't a devious C programmer break the collector?
Certainly, but most people have better ways to spend their time than dreaming up ways to break their tools. The collector does rely on being able to locate copies of pointers somewhere in an address space, so certain things won't work. For instance, the XOR'd pointers trick for compactly encoding a bidirectional list cannot be used -- the pointers don't look like pointers. If a process writes pointers to a file, and reads them back again, the memory referenced by those pointers may have been recycled. Most programs don't do these things, so most programs work with a garbage collector. Ordinary (legal) pointer arithmetic is tolerated by garbage collectors for C.
One problem described by a team considering the use of GC is the use of pointer mangling to get "really opaque" pointers. That is, pointers handed out from a package to a client are XORed with a random number chosen at program start time, and thus the client cannot access package data structures without going through defined interfaces. This is simply incompatible with conservative GC. It is also incompatible with a strict interpretation of the ANSI C standard, and can confuse leak detection tools (which use conservative GC technology to detect leaks), but nonetheless people do it, and it generally does work.
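The effect of XOR mangling shows up in a small Java simulation. A handle that is a real heap address passes a conservative collector's range check. The same handle XORed with a start-up key usually does not, so the collector sees no reference to the object. The heap bounds, the key, and the class name are hypothetical values chosen for illustration.

```java
// Simulates why XOR-mangled pointers are invisible to a conservative collector.
public class MangledHandles {
    // Hypothetical heap bounds: [HEAP_LO, HEAP_HI).
    static final long HEAP_LO = 0x10000L;
    static final long HEAP_HI = 0x20000L;

    // Crude stand-in for the collector's "could this be a heap pointer?" test.
    static boolean looksLikeHeapPointer(long word) {
        return word >= HEAP_LO && word < HEAP_HI;
    }

    // The package hands clients (address ^ key); only the package can undo it.
    static long mangle(long address, long key) {
        return address ^ key;
    }

    public static void main(String[] args) {
        long key = 0x5A5A0000L;   // would be chosen randomly at program start
        long address = 0x10040L;  // an object inside the heap
        long handle = mangle(address, key);
        System.out.println(looksLikeHeapPointer(address));  // true
        System.out.println(looksLikeHeapPointer(handle));   // false
        System.out.println(mangle(handle, key) == address); // true
    }
}
```

Suppose the client's mangled handle is the only remaining reference. The collector may then reclaim the object while the client still expects to use it.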
Insert more questions here -- send them to <[email protected]>
What does a garbage collector do about destructors?
A destructor is some code that runs when an object is about to be freed. One of the main uses of destructors is to do manual memory management. For example, the destructor for an object may recursively free the objects it references. A garbage collector obviates the need for such uses: if an object is garbage, all the objects it references will also be garbage if they are not referenced elsewhere, and so they, too, will be freed automatically.
There remains the question of what to do with destructors that do something other than assist in memory management. There are a couple of typical uses.
One use is for objects that have state outside the program itself. The canonical example is an object that refers to a file. When a file object becomes eligible for reclamation, the garbage collector needs to ensure that buffers are flushed, the file is closed, and resources associated with the file are returned to the operating system.
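Java's java.lang.ref.Cleaner (Java 9 and later) is one finalization-style mechanism for the file case. The sketch below uses a hypothetical ManagedFile class. Its State object stands in for the operating system handle, and real flushing and closing code would go in State.run().

```java
import java.lang.ref.Cleaner;

// A file-like object whose outside state is released when it becomes
// unreachable, or earlier if the program calls close() explicitly.
public class ManagedFile implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleanup state must NOT refer back to the ManagedFile; otherwise the
    // file object would stay reachable and would never be cleaned.
    static final class State implements Runnable {
        final String name;          // hypothetical stand-in for an OS file handle
        volatile boolean released;
        State(String name) { this.name = name; }
        @Override public void run() {
            // Flush buffers and close the underlying handle here.
            released = true;
        }
    }

    private final State state;
    private final Cleaner.Cleanable cleanable;

    public ManagedFile(String name) {
        state = new State(name);
        cleanable = CLEANER.register(this, state);
    }

    // Explicit, deterministic release; Cleanable.clean() runs the action at most once.
    @Override public void close() {
        cleanable.clean();
    }

    boolean isReleased() { return state.released; }
}
```

The explicit close() is there because the cleaning action runs only after the collector notices the object is unreachable. That may be much later, or possibly never if the program exits first. Keeping the state in a separate object also avoids the question of an object being made "not garbage" again by its own cleanup code.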
Another use is where a program wants to keep a list of objects that are referenced elsewhere. The program may want to know what objects are in existence for, say, accounting purposes, but does not want the mechanism of accounting to prevent objects from otherwise being freed.
There are several ways of handling such situations:
- In systems where the garbage collector is "built in," it typically has special knowledge of all the cases where outside resources can be referenced and can deal with them appropriately.
- Many GC systems have a notion of a "weak pointer." A weak pointer is one that is not considered as a reference by the garbage collector. So if an object is referenced only by weak pointers, it is eligible for reclamation. Weak pointers can be used to implement the object list example.
For another example, in Java an external resource R might be protected like this:
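import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

interface ExternalResource { void release(); } // hypothetical placeholder type

// Client-visible handle; all methods delegate to wr.
class ClientR {
    final CRWeak wr;
    ClientR(ExternalResource r) {
        wr = new CRWeak(this, r);
    }
    // delegate all methods to wr
}

// Weak reference to a ClientR; owns the external resource.
class CRWeak extends WeakReference<ClientR> {
    static final ReferenceQueue<ClientR> rq = new ReferenceQueue<>();
    // A reference object that itself becomes unreachable is never enqueued,
    // so every CRWeak is kept strongly reachable here until it is cleaned up.
    static final Set<CRWeak> live = Collections.synchronizedSet(new HashSet<>());
    static {
        Thread th = new CRCleaner(rq);
        th.setDaemon(true);
        th.start();
    }
    final ExternalResource r;
    CRWeak(ClientR client, ExternalResource r) {
        super(client, rq);
        this.r = r;
        live.add(this);
    }
    // delegated methods from ClientR
}

class CRCleaner extends Thread {
    final ReferenceQueue<ClientR> rq;
    CRCleaner(ReferenceQueue<ClientR> rq) { this.rq = rq; }
    @Override
    public void run() {
        while (true) {
            try {
                CRWeak x = (CRWeak) rq.remove(); // blocks until a ClientR is collected
                CRWeak.live.remove(x);
                x.r.release();                   // release the external resource
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}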
When no clients have references to a ClientR, its memory is released, and the weak reference to it is placed on the respective reference queue. The cleaning thread can ensure that the external resource is reclaimed.
- Many GC systems have a notion of "finalization." An object may be registered with the GC system so that when it is about to reclaim the object, it runs a function on the object that can perform necessary cleanups. Finalization is tricky. Some of the issues are:
  - When can an object actually be finalized? This is trickier than it first appears in the presence of some normally-desirable optimizing transformations.
  - In what thread, resource, or security context does a finalization function run?
  - What happens when registered objects reference each other?
  - What happens if a finalization function makes an object not be garbage any more?
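The object-list example above tracks objects for accounting without keeping them alive. It might be sketched with Java's java.lang.ref.WeakReference and ReferenceQueue. The LiveObjectRegistry class and its method names are hypothetical.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Accounting list of objects that does not itself keep them alive (hypothetical class).
public class LiveObjectRegistry<T> {
    private final ReferenceQueue<T> queue = new ReferenceQueue<>();
    // WeakReference does not override equals/hashCode, so the set uses identity.
    private final Set<WeakReference<T>> entries = new HashSet<>();

    public synchronized void register(T obj) {
        expunge();
        entries.add(new WeakReference<>(obj, queue));
    }

    // Objects still alive right now; the returned list holds strong references.
    public synchronized List<T> liveObjects() {
        expunge();
        List<T> out = new ArrayList<>();
        for (WeakReference<T> ref : entries) {
            T obj = ref.get();
            if (obj != null) out.add(obj);
        }
        return out;
    }

    // Drop entries whose referents the collector has already reclaimed.
    private void expunge() {
        Reference<? extends T> r;
        while ((r = queue.poll()) != null) entries.remove(r);
    }
}
```

The registry holds only weak references, so an object that is otherwise garbage still becomes eligible for reclamation. Its entry disappears the next time the queue is drained.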
For more information
- A good book, recently published.
  Garbage Collection: Algorithms for Automatic Dynamic Memory Management by Richard Jones and Rafael Lins, published by John Wiley and Sons, 1996. ISBN 0-471-94148-4.
- http://www.memorymanagement.org/
  Ravenbrook's (formerly, Harlequin's) Memory Management Reference
- http://www.cs.utexas.edu/users/oops/papers.html
  This is a collection of various papers on garbage collection. Among them is Paul Wilson's survey paper, which should be required reading for anyone claiming to be a practical computer scientist.
- http://www.hpl.hp.com/personal/Hans_Boehm/gc/
  ftp://parcftp.xerox.com/pub/gc/gc.html
  The Boehm-Weiser collector has been in use since the mid-1980s. It is a widely ported, C and C++ compatible, conservative garbage collector.
- http://www.geodesic.com
  Geodesic Systems sells garbage collectors for C and C++, among other things. I think they sell support as well.
- Henry Baker's Archive of Research Papers
  Many interesting random things, including his paper on real-time garbage collection.
- L. Peter Deutsch and Daniel G. Bobrow. An efficient, incremental, automatic garbage collector. Communications of the ACM, 19(9):522-526, September 1976.
  Combines heap reference counting with a stack scan to get relatively low reference counting costs and incremental reclamation.
- Henry G. Baker, Jr. List processing in real time on a serial computer. Communications of the ACM, 21(4):280-294, April 1978.
  A clear exposition of how a real-time copying-compacting collector can work.
- W. R. Stoye, T. J. W. Clarke and A. C. Norman. Some Practical Methods for Rapid Combinator Reduction. In SIGPLAN Symposium on LISP and Functional Programming. 1984.
  Interesting issues and tricks, among them "one-bit reference counts".
- David Ungar. Generation Scavenging: A Non-disruptive High Performance Storage Reclamation Algorithm. In Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments. 1984.
  A good paper on generational collection.
- John Hughes. A Distributed Garbage Collection Algorithm. In Functional Programming Languages and Computer Architecture. 1985.
  See title.
- Hans Boehm and Mark Weiser. Garbage Collection in an Uncooperative Environment. Software Practice and Experience. September, 1988.
  Conservative garbage collection.
- Andrew W. Appel, John R. Ellis and Kai Li. Real-time concurrent garbage collection on stock multiprocessors. In SIGPLAN Symposium on Programming Language Design and Implementation, 1988.
  Not really "real time", but a nice adaptation of Baker's algorithm to typical hardware.
- Joel F. Bartlett. Compacting Garbage Collection with Ambiguous Roots. DEC WRL Research Report 88/2. February, 1988.
  Conservative-compacting garbage collection is in fact possible.
- John R. Ellis and David L. Detlefs. Safe, Efficient Garbage Collection for C++. DEC SRC Research Report 102. June, 1993.
  If you were dead-set on adding garbage collection to C++, this is what it might look like.
Contributors (so far)
- David Gadbois <[email protected]>
- Charles Fiterman <[email protected]>
- David Chase <[email protected]>
- Marc Shapiro <[email protected]>
- Kelvin Nilsen <[email protected]>
- Paul Haahr <[email protected]>
- Nick Barnes <[email protected]>
- Pekka P. Pirinen <[email protected]>