How Garbage Collection Really Works

Original article: http://www.dynatrace.com/en/javabook/how-garbage-collection-works.html

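Key points

  • The mark-and-sweep algorithm
  • Choosing the GC roots:
    • Local variables referenced from a thread's stack
    • Active threads
    • Static variables
    • Objects referenced through JNI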

Full text

Java Memory Management, with its built-in garbage collection, is one of the language’s finest achievements. It allows developers to create new objects without worrying explicitly about memory allocation and deallocation, because the garbage collector automatically reclaims memory for reuse. This enables faster development with less boilerplate code, while eliminating memory leaks and other memory-related problems. At least in theory.

Ironically, Java garbage collection seems to work too well, creating and removing too many objects. Most memory-management issues are solved, but often at the cost of creating serious performance problems. Making garbage collection adaptable to all kinds of situations has led to a complex and hard-to-optimize system. In order to wrap your head around garbage collection, you need first to understand how memory management works in a Java Virtual Machine (JVM).

How Garbage Collection Really Works

Many people think garbage collection collects and discards dead objects. In reality, Java garbage collection is doing the opposite! Live objects are tracked and everything else designated garbage. As you’ll see, this fundamental misunderstanding can lead to many performance problems.

Let’s start with the heap, which is the area of memory used for dynamic allocation. In most configurations the operating system allocates the heap in advance to be managed by the JVM while the program is running. This has a couple of important ramifications:

  • Object creation is faster because global synchronization with the operating system is not needed for every single object. An allocation simply claims some portion of a memory array and moves the offset pointer forward (see Figure 2.1). The next allocation starts at this offset and claims the next portion of the array.
  • When an object is no longer used, the garbage collector reclaims the underlying memory and reuses it for future object allocation. This means there is no explicit deletion and no memory is given back to the operating system.
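The allocation step described above can be sketched as a toy bump-pointer allocator. This is a simplified model, not the JVM's actual allocator. The class name `BumpAllocator`, the byte-array "heap", and returning `-1` when the region is full are illustrative assumptions.

```java
// Toy model of bump-pointer allocation (illustrative, not the JVM's real allocator).
class BumpAllocator {
    private final byte[] heap; // region reserved up front; contents unused in this sketch
    private int top = 0;       // offset of the next free byte

    BumpAllocator(int capacity) {
        this.heap = new byte[capacity];
    }

    // Returns the start offset of the new block, or -1 if the region is full.
    int allocate(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        if (top + size > heap.length) {
            return -1; // a real JVM would typically run a garbage collection here
        }
        int start = top;
        top += size;   // "move the offset pointer forward"
        return start;  // the next allocation starts where this one ends
    }

    int used() {
        return top;
    }

    public static void main(String[] args) {
        BumpAllocator a = new BumpAllocator(64);
        System.out.println(a.allocate(16)); // 0
        System.out.println(a.allocate(24)); // 16
        System.out.println(a.allocate(32)); // -1 (only 24 bytes left)
        System.out.println(a.used());       // 40
    }
}
```

Each allocation is only a bounds check plus a pointer increment, so no call into the operating system is needed per object. The cost of finding reusable space is deferred to the garbage collector.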

Figure 2.1: The JVM simply allocates each new object at the end of the already-allocated region of the heap.

All objects are allocated on the heap area managed by the JVM. Every item that the developer uses is treated this way, including class objects, static variables, and even the code itself. As long as an object is being referenced, the JVM considers it alive. Once an object is no longer referenced and therefore is not reachable by the application code, the garbage collector removes it and reclaims the unused memory. As simple as this sounds, it raises a question: what is the first reference in the tree?

Garbage-Collection Roots: The Source of All Object Trees

Every object tree must have one or more root objects. As long as the application can reach those roots, the whole tree is reachable. But when are those root objects considered reachable? Special objects called garbage-collection roots (GC roots; see Figure 2.2) are always reachable. So is any object that can be reached by following references from a GC root.

There are four kinds of GC roots in Java:

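  1. Local variables are kept alive by the stack of a thread. The reference is held in the local-variable table of a stack frame rather than in an object on the heap. It is therefore not a real object reference and is not visible as part of the object graph. For all intents and purposes, however, local variables are GC roots.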
  2. Active Java threads are always considered live objects and are therefore GC roots. This is especially important for thread local variables.
  3. Static variables are referenced by their classes. This fact makes them de facto GC roots. Classes themselves can be garbage-collected, which would remove all referenced static variables. This is of special importance when we use application servers, OSGi containers or class loaders in general. We will discuss the related problems in the Problem Patterns section.
  4. JNI references are Java objects that native code has created as part of a JNI call. Such objects are treated specially because the JVM does not know whether the native code still references them. They represent a very special form of GC root, which we will examine in more detail in the Problem Patterns section below.
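The remark about thread-local variables can be made concrete. An active thread is a GC root, so anything stored in one of its `ThreadLocal` slots stays reachable for as long as the thread lives. This holds even after the task that stored the value has finished. The sketch below assumes a single-thread pool whose worker is reused between tasks. `ThreadLocalRootDemo`, `BUFFER`, and `survivesAcrossTasks` are hypothetical names used only for illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: a ThreadLocal value stays reachable as long as its (pooled) thread is alive.
class ThreadLocalRootDemo {
    static final ThreadLocal<byte[]> BUFFER = new ThreadLocal<>();

    // Runs two tasks on the same worker thread and reports whether the value
    // stored by the first task is still reachable from the second one.
    static boolean survivesAcrossTasks(boolean cleanUp) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            worker.submit(() -> {
                BUFFER.set(new byte[1024 * 1024]); // 1 MiB held via the worker thread
                if (cleanUp) {
                    BUFFER.remove();               // drop the reference before the task ends
                }
            }).get();
            // Same worker thread, later task: is the value still there?
            return worker.submit(() -> BUFFER.get() != null).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(survivesAcrossTasks(false)); // true: reachable through the live thread
        System.out.println(survivesAcrossTasks(true));  // false: removed explicitly
    }
}
```

When threads are pooled and therefore long-lived, calling `remove()` once the work is done is the usual way to let such values be collected.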

Figure 2.2: Garbage-collection roots

Therefore, a simple Java application has the following GC roots:

  • Local variables in the main method
  • The main thread
  • Static variables of the main class
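A minimal, hypothetical `SimpleApp` shows where each of these three roots appears in source code. The class and field names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal application, annotated with its GC roots.
class SimpleApp {
    // Static variable of the main class: referenced by the class, so a de facto GC root.
    static final List<String> REGISTRY = new ArrayList<>();

    public static void main(String[] args) {
        // Local variable in main: kept alive by the main thread's stack.
        StringBuilder greeting = new StringBuilder("hello");
        REGISTRY.add(greeting.toString());

        // The main thread itself: an active thread, hence a GC root.
        Thread current = Thread.currentThread();
        System.out.println(current.getName() + ": " + greeting + " " + REGISTRY);
        // Once main returns, the StringBuilder is no longer referenced from any stack,
        // but the String stored in REGISTRY stays reachable through the class.
    }
}
```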

Marking and Sweeping Away Garbage

To determine which objects are no longer in use, the JVM intermittently runs what is very aptly called a mark-and-sweep algorithm. As you might intuit, it's a straightforward, two-step process:

  1. The algorithm traverses all object references, starting with the GC roots, and marks every object found as alive.
  2. All of the heap memory that is not occupied by marked objects is reclaimed. It is simply marked as free, essentially swept free of unused objects.
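The two steps above can be simulated on a toy object graph. This is an illustrative model, not how HotSpot's collectors are implemented. Objects are plain Java nodes, the `roots` list stands in for the GC roots listed earlier, and the hypothetical `mark()` and `sweep()` methods mirror steps 1 and 2.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy mark-and-sweep over a simulated heap (illustrative, not a real JVM collector).
class MarkSweepDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // outgoing references
        boolean marked;                           // set during the mark phase

        Obj(String name) { this.name = name; }
    }

    final List<Obj> heap = new ArrayList<>();  // every allocated object
    final List<Obj> roots = new ArrayList<>(); // stand-ins for locals, threads, statics, JNI refs

    Obj alloc(String name) {
        Obj o = new Obj(name);
        heap.add(o);
        return o;
    }

    // Step 1: traverse all references starting at the roots; mark what is found as alive.
    void mark() {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue; // already visited; this also terminates on cycles
            o.marked = true;
            for (Obj r : o.refs) pending.push(r);
        }
    }

    // Step 2: reclaim every unmarked object and reset marks for the next cycle.
    List<String> sweep() {
        List<String> freed = new ArrayList<>();
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                freed.add(o.name);
                it.remove();
            }
        }
        return freed;
    }

    public static void main(String[] args) {
        MarkSweepDemo gc = new MarkSweepDemo();
        Obj a = gc.alloc("a");
        Obj b = gc.alloc("b");
        Obj c = gc.alloc("c");
        Obj d = gc.alloc("d");
        gc.roots.add(a); // a is referenced from a GC root
        a.refs.add(b);   // b is reachable through a
        c.refs.add(d);   // c and d reference each other,
        d.refs.add(c);   // but no root reaches them
        gc.mark();
        System.out.println(gc.sweep()); // [c, d]
    }
}
```

The cycle between `c` and `d` is reclaimed even though each object is still referenced. What matters is reachability from a root, not whether any reference to an object exists.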

Garbage collection is intended to remove the cause of classic memory leaks: unreachable-but-not-deleted objects in memory. However, this works only for memory leaks in the original sense. An application can also hold unused objects that are still reachable, simply because the developer forgot to dereference them. Such objects cannot be garbage-collected. Even worse, such a logical memory leak cannot be detected by any software (see Figure 2.3); even the best analysis software can only highlight suspicious objects. We will examine memory leak analysis in the section "Analyzing the Performance Impact of Memory Utilization and Garbage Collection" below.
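A logical memory leak of this kind often looks like the hypothetical session cache below. The entries are reachable through a static field, which is a GC root, so the collector must keep them even though the application will never read them again. `SessionCache`, `login`, `logoutLeaky`, and `logoutFixed` are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example of a logical memory leak (names are illustrative).
class SessionCache {
    // Reachable through the class's static field, i.e. from a GC root.
    static final Map<String, byte[]> CACHE = new HashMap<>();

    static void login(String user) {
        CACHE.put(user, new byte[1024]); // per-session data
    }

    // Bug: the session is over, but the entry is never dereferenced,
    // so it stays reachable and can never be garbage-collected.
    static void logoutLeaky(String user) {
        // forgot CACHE.remove(user)
    }

    // Fix: drop the reference so a later collection can reclaim the data.
    static void logoutFixed(String user) {
        CACHE.remove(user);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            login("user" + i);
            logoutLeaky("user" + i);
        }
        System.out.println(CACHE.size()); // 1000: unused, yet still reachable
        for (int i = 0; i < 1000; i++) {
            logoutFixed("user" + i);
        }
        System.out.println(CACHE.size()); // 0
    }
}
```

No collector can tell that the 1000 leaked entries are useless, because they are reachable. Only the code that owns the reference knows when to let it go.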
