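This post is likewise excerpted from Evolving a language in and for the real world: C++ 1991-2006, the second of the three papers on the history of C++ written by Bjarne Stroustrup, the creator of the language. One passage explains why the C++ standard does not support garbage collection (GC); it is translated and shared here. You will see that the decision rested not only on technical considerations but also on compromises over feasibility and compatibility. Interested readers are strongly encouraged to read all three papers in the original.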
Sometime in 1995, it dawned on me that a majority of the committee was of the opinion that plugging a garbage collector into a C++ program was not standard-conforming because the collector would inevitably perform some action that violated a standard rule. Worse, they were obviously right about the broken rules. For example:
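In 1995 I suddenly realized that most of the committee held that plugging a garbage collector into a C++ program was not standard-conforming, because the collector would inevitably do something that violated a rule of the standard. Worse, they were clearly right that rules would be broken. For example: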
// The following program could break once a GC is introduced (note the pointer p)
void f()
{
    int* p = new int[100];
    // fill *p with valuable data
    file << p;              // write the pointer to a file
    p = 0;                  // remove the pointer to the ints
    // work on something else for a week
    file >> p;
    if (p[37] == 5) {       // now use the ints
        // ...
    }
}
My opinion — as expressed orally and in print — was roughly: “such programs deserve to be broken” and “it is perfectly good C++ to use a conservative garbage collector”. However, that wasn’t what the draft standard said. A garbage collector would undoubtedly have recycled that memory before we read the pointer back from the file and started using the integer array again. However, in standard C and standard C++, there is absolutely nothing that allows a piece of memory to be recycled without some explicit programmer action.
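My position, stated both orally and in print, was roughly: “such programs deserve to be broken” and “it is perfectly good C++ to use a conservative garbage collector”. That, however, was not what the draft standard said. In the example above, a collector would certainly have recycled the array before the pointer p was read back from the file and used as an array again. Yet in standard C and C++, not a single piece of memory may be recycled without an explicit action by the programmer.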
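The pointer-hiding trick in the example can be reproduced in a self-contained form. The sketch below is only an illustration under stated assumptions: the function name hide_and_recover is hypothetical, a std::stringstream stands in for the file, and the address is written as an integer via std::uintptr_t rather than with the original `file << p`. Between the write and the read, the only copy of the address is a run of decimal digits, which a collector scanning memory for pointer-like values would not be expected to recognize.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>

// Hides a pointer as text, drops every in-memory copy of it, then
// recovers it and returns the value read through the recovered pointer.
int hide_and_recover()
{
    int* p = new int[100]();   // zero-initialized array
    p[37] = 5;                 // the "valuable data"

    std::stringstream file;    // stand-in for the file in the original example
    file << reinterpret_cast<std::uintptr_t>(p);  // address stored as decimal text
    p = nullptr;               // no pointer to the ints remains in memory

    // A tracing collector running at this point would find no reference
    // to the array. Standard C++ never reclaims it implicitly.

    std::uintptr_t bits = 0;
    file >> bits;
    assert(bits != 0);                 // the address was read back
    p = reinterpret_cast<int*>(bits);  // recover the pointer from its text form

    int value = p[37];
    delete[] p;                // explicit programmer action releases the memory
    return value;
}
```

Calling hide_and_recover() returns 5 in an ordinary C++ environment without a collector, because nothing frees the array while its address is hidden.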
To fix this problem, I made a proposal to explicitly allow “optional automatic garbage collection” [123]. This would bring C++ back to what I had thought I had defined it to be and make the garbage collectors already in actual use [11, 47] standard conforming. Explicitly mentioning this in the standard would also encourage use of GC where appropriate. Unfortunately, I seriously underestimated the dislike of garbage collection in a large section of the committee and also mishandled the proposal.
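To fix this, I proposed explicitly allowing “optional automatic garbage collection”. That would bring C++ back to what I believed I had defined it to be and make the collectors already in real use standard-conforming. Stating this explicitly in the standard would also encourage the use of GC where appropriate. Unfortunately, I seriously underestimated how much a large part of the committee disliked GC, and I also mishandled the proposal.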
My fatal mistake with the GC proposal was to get thoroughly confused about the meaning of “optional”. Did “optional” mean that an implementation didn’t have to provide a garbage collector? Did it mean that the programmer could decide whether the garbage collector was turned on or not? Was the choice made at compile time or run time? What should happen if I required the garbage collector to be activated and the implementation didn’t supply one? Can I ask if the garbage collector is running? How? How can I make sure that the garbage collector isn’t running during a critical operation? By the time a confused discussion of such questions had broken out and different people had found conflicting answers attractive, the proposal was effectively dead.
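My fatal mistake with the GC proposal was to become thoroughly confused about what “optional” meant. Did it mean that an implementation need not provide a garbage collector? Did it mean that the programmer could decide whether to turn the collector on or off? Was that decision made at compile time or at run time? What should happen if I required the collector to be active and the implementation did not supply one? Could I ask whether the collector is running? How? How could I make sure it does not run during a critical operation? By the time a confused discussion of these questions had broken out, with different people favoring conflicting answers, the proposal was effectively dead.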
Realistically, garbage collection wouldn’t have passed in 1995, even if I hadn’t gotten confused. Parts of the committee
• strongly distrusted GC for performance reasons
• disliked GC because it was seen as a C incompatibility
• didn’t feel they understood the implications of accepting GC (we didn’t)
• didn’t want to build a garbage collector
• didn’t want to pay for a garbage collector (in terms of money, space, or time)
• wanted alternative styles of GC
• didn’t want to spend precious committee time on GC
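Realistically, GC could not have made it into the standard in 1995 even if I had not been confused. Parts of the committee held the objections listed above.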
Basically, it was too late in the standards process to introduce something that major. To get anything involving garbage collection accepted, I should have started a year earlier.
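Fundamentally, it was too late in the standardization process to introduce something that significant. To get anything GC-related accepted into the C++ standard, I should have started a year earlier.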
My proposal for garbage collection reflected the then major use of garbage collection in C++ — that is, conservative collectors that don’t make assumptions about which memory locations contain pointers and never move objects around in memory [11]. Alternative approaches included creating a type-safe subset of C++ so that it is possible to know exactly where every pointer is, using smart pointers [34] and providing a separate operator (gcnew or new(gc)) for allocating objects on a “garbage-collected heap”. All three approaches are feasible, provide distinct benefits, and have proponents. This further complicates any effort to standardize garbage collection for C++.
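My GC proposal reflected the main way GC was used in C++ at the time: conservative collectors that make no assumptions about which memory locations contain pointers and never move objects in memory. The alternatives included creating a type-safe subset of C++, so that the exact location of every pointer is known; using smart pointers; and providing a separate operator (gcnew or new(gc)) for creating objects on a “garbage-collected heap”. (Translator's note: for example, UE's C++, which I know fairly well, built its own non-conservative garbage collection mechanism, in fact a whole object system.) All three approaches are feasible, offer different benefits, and have their own proponents. This further complicates any effort to standardize GC for C++. (Translator's remark: this is also where I see the weakness of C++ standardization, which lacks central leadership and relies on majority decisions by a committee. Many features that are obviously necessary are postponed again and again because of disagreements over design and implementation. As a result, outside projects develop their own mechanisms early, and the standard that is finally agreed on has to avoid name clashes with them. That is the reason behind some slightly odd keywords and class names in C++, such as unordered_map (hash_map was taken) and decltype (typeof was taken).)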
A common question over the years has been: Why don’t you add GC to C++? Often, the implication (or follow-up comment) is that the C++ committee must be a bunch of ignorant dinosaurs not to have done so already. First, I observe that in my considered opinion, C++ would have been stillborn had it relied on garbage collection when it was first designed. The overheads of garbage collection at the time, on the hardware available, precluded the use of garbage collection in the hardware-near and performance-critical areas that were C++’s bread and butter. There were garbage-collected languages then, such as Lisp and Smalltalk, and people were reasonably happy with those for the applications for which they were suitable. It was not my aim to replace those languages in their established application areas. The aim of C++ was to make object-oriented and data-abstraction techniques affordable in areas where these techniques at the time were “known” to be impractical. The core areas of C++ usage involved tasks, such as device drivers, high-performance computation, and hard-real-time tasks, where garbage collection was (and is) either infeasible or not of much use.
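A common question over the years has been: why don't you add GC to C++? Often the implication (or the follow-up comment) is that the C++ committee must be a bunch of ignorant dinosaurs not to have done so already. First, in my considered opinion C++ would have been stillborn had it relied on GC when it was first designed. The overhead of GC on the hardware of the time ruled it out in the hardware-near and performance-critical areas that were C++'s bread and butter. There were garbage-collected languages then, such as Lisp and Smalltalk, and people were reasonably happy with them in the application areas they suited. My aim was not to replace them in those areas. The aim of C++ was to make object-oriented and data-abstraction techniques affordable in areas where they were then “known” to be impractical. The core areas of C++ use include device drivers, high-performance computation, and hard-real-time tasks, where GC was (and is) either infeasible or of little use.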
Once C++ was established without garbage collection and with a set of language features that made garbage collection difficult (pointers, casts, unions, etc.), it was hard to retrofit it without doing major damage. Also, C++ provides features that make garbage collection unnecessary in many areas (scoped objects, destructors, facilities for defining containers and smart pointers, etc.). That makes the case for garbage collection less compelling.
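Once C++ was established without GC and with a set of language features that make GC hard to implement (pointers, casts (especially conversions in both directions between pointer and non-pointer types), unions, and so on), it was hard to retrofit GC without doing major damage. In addition, C++ provides features that make GC unnecessary in many areas (scoped objects, destructors, facilities for defining containers and smart pointers, and so on). That makes the case for GC less compelling.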
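How scoped objects and destructors remove the need for a collector can be shown with a small resource handle. This is a minimal sketch, not code from the paper: the class IntBuffer, its live_count counter, and the function use_buffer are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical resource handle: owns a heap array and frees it in its destructor.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t n) : data_(new int[n]()), size_(n) { ++live_count; }
    ~IntBuffer() { delete[] data_; --live_count; }

    // Copying is disabled so that exactly one handle owns the array.
    IntBuffer(const IntBuffer&) = delete;
    IntBuffer& operator=(const IntBuffer&) = delete;

    int& operator[](std::size_t i) { assert(i < size_); return data_[i]; }

    static int live_count;  // number of buffers currently alive (demonstration only)

private:
    int* data_;
    std::size_t size_;
};

int IntBuffer::live_count = 0;

int use_buffer()
{
    IntBuffer buf(100);  // scoped object: acquired here
    buf[37] = 5;
    return buf[37];
}                        // buf's destructor runs here and frees the array
```

After use_buffer() returns, IntBuffer::live_count is back to 0: the memory was released at a point fixed by the scope, with no explicit delete at the call site and no collector involved.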
So, why would I like to see garbage collection supported in C++? The practical reason is that many people write software that uses the free store in an undisciplined manner. In a program with hundreds of thousands of lines of code with news and deletes all over the place, I see no hope for avoiding memory leaks and access through invalid pointers. My main advice to people who are starting a project is simply: “don’t do that!”. It is fairly easy to write correct and efficient C++ code that avoids those problems through the use of containers (STL or others; §4.1), resource handles (§5.3.1), and (if needed) smart pointers (§6.2). However, many of us have to deal with older code that does deal with memory in an undisciplined way, and for such code plugging in a conservative garbage collector is often the best option. I expect C++0x to require every C++ implementation to be shipped with a garbage collector that, somehow, can be either active or not.
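So why do I still say I would like to see GC supported in C++? The practical reason is that many people use the free store without discipline when writing software. In a program of hundreds of thousands of lines with new and delete scattered everywhere, I see no hope of avoiding memory leaks and accesses through invalid pointers. My advice to people starting a new project is simple: “don’t do that!” It is fairly easy to write correct and efficient C++ that avoids these problems by using containers (STL or others), resource handles (translator's note: a class that wraps a raw pointer, so that RAII guarantees the resource is released when the object is destroyed), and, if needed, smart pointers. However, many of us have to deal with older code that handles memory in an undisciplined way, and for such code plugging in a conservative GC is often the best option. I expect C++0x to require every C++ implementation to ship with a garbage collector that can be either active or inactive.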
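Translator's addendum: as things stand, although standard C++ provides no GC, RAII and the more complete smart pointers introduced in C++11 (unique_ptr and shared_ptr; see my earlier post) in practice ensure that an object is destroyed and its memory released once its last owner goes away. For a scoped object or a unique_ptr that is the end of the scope; for a shared_ptr it is the moment the last copy is destroyed or reset. Modern C++ therefore strongly favors these facilities over raw pointers.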
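The difference between the two smart pointers mentioned above can be observed directly. The following is a small sketch under stated assumptions: the type Tracked and the function observe_lifetimes are hypothetical names, and the counter exists only to make destruction visible.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical helper type: increments a counter while alive, decrements on destruction.
struct Tracked {
    explicit Tracked(int& counter) : alive(counter) { ++alive; }
    ~Tracked() { --alive; }
    int& alive;
};

// Records how many Tracked objects are alive at three points in time.
std::vector<int> observe_lifetimes()
{
    int alive = 0;
    std::vector<int> seen;
    std::shared_ptr<Tracked> keeper;
    {
        std::unique_ptr<Tracked> u(new Tracked(alive));                 // sole owner
        std::shared_ptr<Tracked> s = std::make_shared<Tracked>(alive);  // shared owner
        keeper = s;             // a second owner that outlives this scope
        seen.push_back(alive);  // both objects are alive
    }                           // u's object is destroyed; s's object survives via keeper
    seen.push_back(alive);
    keeper.reset();             // last owner gone: the shared object is destroyed
    seen.push_back(alive);
    return seen;
}
```

The recorded counts are 2, 1, 0: the unique_ptr releases its object at the end of the inner scope, while the shared_ptr releases its object only when the last owner is reset.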