Name mangling in C++

Source: Wikipedia, https://en.wikipedia.org/wiki/Name_mangling

C++ compilers are the most widespread users of name mangling. The first C++ compilers were implemented as translators to C source code, which would then be compiled by a C compiler to object code; because of this, symbol names had to conform to C identifier rules. Even later, with the emergence of compilers that produced machine code or assembly directly, the system's linker generally did not support C++ symbols, and mangling was still required.

The C++ language does not define a standard decoration scheme, so each compiler uses its own. C++ also has complex language features, such as classes, templates, namespaces, and operator overloading, that alter the meaning of specific symbols based on context or usage. Metadata about these features can be disambiguated by mangling (decorating) the name of a symbol. Because the name-mangling systems for such features are not standardized across compilers, few linkers can link object code that was produced by different compilers.

Simple example

Consider the following two definitions of f() in a C++ program:

int  f (void) { return 1; }
int  f (int)  { return 0; }
void g (void) { int i = f(), j = f(0); }

These are distinct functions, with no relation to each other apart from the name. If they were naively translated into C with no changes, the result would be an error, because C does not permit two functions with the same name. The C++ compiler therefore encodes the type information in the symbol name, the result being something resembling:

int  __f_v (void) { return 1; }
int  __f_i (int)  { return 0; }
void __g_v (void) { int i = __f_v(), j = __f_i(0); }

Notice that g() is mangled even though there is no conflict; name mangling applies to all symbols.

Complex example

For a more complex example, consider a real-world name mangling implementation, the one used by GNU GCC 3.x, and how it mangles the following example class. Each mangled symbol is shown below the respective identifier name.

namespace wikipedia 
{
   class article 
   {
   public:
      std::string format (void); 
         /* = _ZN9wikipedia7article6formatEv */

      bool print_to (std::ostream&); 
         /* = _ZN9wikipedia7article8print_toERSo */

      class wikilink 
      {
      public:
         wikilink (std::string const& name);
            /* = _ZN9wikipedia7article8wikilinkC1ERKSs */
      };
   };
}

All mangled symbols begin with _Z (note that an underscore followed by a capital letter is a reserved identifier in C, so conflict with user identifiers is avoided); for nested names (including both namespaces and classes), this is followed by N, then a series of <length, identifier> pairs (the length being the length of the next identifier), and finally E. For example, wikipedia::article::format becomes

_ZN9wikipedia7article6formatE

For functions, this is then followed by the type information; as format() takes no arguments (its parameter list is void), this is simply v; hence:

_ZN9wikipedia7article6formatEv

For print_to, a standard type std::ostream (or more properly std::basic_ostream<char, std::char_traits<char> >) is used, which has the special alias So; a reference to this type is therefore RSo, with the complete name for the function being:

_ZN9wikipedia7article8print_toERSo
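
The nested-name rule described above can be sketched in a few lines of C++. This is only an illustration under simplifying assumptions: mangle_nested is a hypothetical helper (not part of any compiler), it handles plain identifiers only, and the parameter encoding ("v", "RSo") is passed in already encoded rather than derived from real types.

```cpp
#include <string>
#include <vector>

// Simplified sketch of the GCC 3.x nested-name encoding described in the text.
// Real manglers also handle substitutions, templates, operators, and
// cv-qualifiers; none of that is attempted here.
// `mangle_nested` is a hypothetical name chosen for this illustration.
std::string mangle_nested(const std::vector<std::string>& scopes,
                          const std::string& encoded_params) {
    std::string out = "_ZN";               // "_Z" prefix, then "N" opens a nested name
    for (const std::string& id : scopes) {
        out += std::to_string(id.size());  // length of the next identifier
        out += id;                         // the identifier itself
    }
    out += 'E';                            // "E" closes the nested name
    out += encoded_params;                 // e.g. "v" for (void), "RSo" for (std::ostream&)
    return out;
}
```

Calling mangle_nested({"wikipedia", "article", "format"}, "v") reproduces the symbol _ZN9wikipedia7article6formatEv shown above.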

How different compilers mangle the same functions

There is no standard scheme by which even trivial C++ identifiers are mangled, and consequently different compiler vendors (or even different versions of the same compiler, or the same compiler on different platforms) mangle public symbols in radically different (and thus totally incompatible) ways. Consider how different C++ compilers mangle the same functions:

Compiler                      void h(int)       void h(int, char)     void h(void)
Intel C++ 8.0 for Linux       _Z1hi             _Z1hic                _Z1hv
HP aC++ A.05.55 IA-64         _Z1hi             _Z1hic                _Z1hv
IAR EWARM C++ 5.4 ARM         _Z1hi             _Z1hic                _Z1hv
GCC 3.x and 4.x               _Z1hi             _Z1hic                _Z1hv
IAR EWARM C++ 7.4 ARM         _Zhi              _Zhic                 _Zhv
GCC 2.9x                      h__Fi             h__Fic                h__Fv
HP aC++ A.03.45 PA-RISC       h__Fi             h__Fic                h__Fv
Microsoft Visual C++ v6-v10   ?h@@YAXH@Z        ?h@@YAXHD@Z           ?h@@YAXXZ
Digital Mars C++              ?h@@YAXH@Z        ?h@@YAXHD@Z           ?h@@YAXXZ
Borland C++ v3.1              @h$qi             @h$qizc               @h$qv
OpenVMS C++ V6.5 (ARM mode)   H__XI             H__XIC                H__XV
OpenVMS C++ V6.5 (ANSI mode)  (not given)       CXX$__7H__FIC26CDH77  CXX$__7H__FV2CB06E8
OpenVMS C++ X7.1 IA-64        CXX$_Z1HI2DSQ26A  CXX$_Z1HIC2NP3LI4     CXX$_Z1HV0BCA19V
SunPro CC                     __1cBh6Fi_v_      __1cBh6Fic_v_         __1cBh6F_v_
Tru64 C++ V6.5 (ARM mode)     h__Xi             h__Xic                h__Xv
Tru64 C++ V6.5 (ANSI mode)    __7h__Fi          __7h__Fic             __7h__Fv
Watcom C++ 10.6               W?h$n(i)v         W?h$n(ia)v            W?h$n()v

Notes:

  • The Compaq C++ compiler on OpenVMS VAX and Alpha (but not IA-64) and Tru64 has two name mangling schemes. The original, pre-standard scheme is known as ARM model, and is based on the name mangling described in the C++ Annotated Reference Manual (ARM). With the advent of new features in standard C++, particularly templates, the ARM scheme became more and more unsuitable — it could not encode certain function types, or produced identical mangled names for different functions. It was therefore replaced by the newer "ANSI" model, which supported all ANSI template features, but was not backwards compatible.
  • On IA-64, a standard ABI exists (see external links), which defines (among other things) a standard name-mangling scheme, and which is used by all the IA-64 compilers. GNU GCC 3.x, in addition, has adopted the name mangling scheme defined in this standard for use on other, non-Intel platforms.
  • The Visual Studio and Windows SDK include the program undname which prints the C-style function prototype for a given mangled name.
  • On Microsoft Windows the Intel compiler uses the Visual C++ name mangling for compatibility.[1]
  • For the IAR EWARM C++ 7.4 ARM compiler the best way to determine the name of a function is to compile with the assembler output turned on and to look at the output in the ".s" file thus generated.
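
For compilers that follow the IA-64 (Itanium) C++ ABI scheme, such as GCC 3.x and later, a mangled name can be decoded programmatically in a way similar to what undname does for Visual C++ names. The sketch below assumes GCC or Clang, whose runtime provides abi::__cxa_demangle in <cxxabi.h>; the wrapper name demangle is hypothetical, and the code is not expected to build with Visual C++.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang runtimes, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical convenience wrapper: returns the demangled form of an
// Itanium-ABI symbol, or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, the result is allocated with malloc
    // and must be released by the caller with free.
    char* text = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;   // not a valid Itanium-ABI mangled name
    }
    std::string result(text);
    std::free(text);
    return result;
}
```

With this wrapper, demangle("_Z1hi") yields h(int), while a Visual C++ name such as ?h@@YAXH@Z is returned unchanged because it belongs to a different scheme.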

Handling of C symbols when linking from C++

The job of the common C++ idiom:

#ifdef __cplusplus 
extern "C" {
#endif
    /* ... */
#ifdef __cplusplus
}
#endif

is to ensure that the symbols that follow are "unmangled": the compiler emits a binary file with their names undecorated, as a C compiler would do. Because C language definitions are unmangled, the C++ compiler needs to avoid mangling references to these identifiers.

For example, the standard C string header (<string.h>) usually contains something resembling:

#ifdef __cplusplus
extern "C" {
#endif

void *memset (void *, int, size_t);
char *strcat (char *, const char *);
int   strcmp (const char *, const char *);
char *strcpy (char *, const char *);

#ifdef __cplusplus
}
#endif

Thus, code such as:

if (strcmp(argv[1], "-x") == 0) 
    strcpy(a, argv[2]);
else 
    memset (a, 0, sizeof(a));

uses the correct, unmangled strcmp, strcpy, and memset. If extern "C" had not been used, the (SunPro) C++ compiler would produce code equivalent to:

if (__1cGstrcmp6Fpkc1_i_(argv[1], "-x") == 0) 
    __1cGstrcpy6Fpcpkc_0_(a, argv[2]);
else 
    __1cGmemset6FpviI_0_ (a, 0, sizeof(a));

Since those symbols do not exist in the C runtime library (e.g. libc), link errors would result.
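
As a minimal sketch of the same idea from the C++ side, the block below defines one function with C linkage next to two ordinary overloaded C++ functions. The names is_flag_x and describe are made up for this illustration; the point is only that the extern "C" function gets an undecorated symbol (and therefore cannot be overloaded), while the overloads rely on mangling to coexist.

```cpp
#include <cstring>   // strcmp; on typical implementations declared with C linkage

// Hypothetical function with C linkage: its symbol name is emitted
// unmangled, so C code can link against it, but it cannot be overloaded.
extern "C" int is_flag_x(const char* arg) {
    return std::strcmp(arg, "-x") == 0;
}

// Ordinary C++ functions keep C++ linkage. Both may share the name
// `describe` because the compiler gives each a distinct mangled symbol.
int describe(int)    { return 1; }
int describe(double) { return 2; }
```

Inspecting the resulting object file with a symbol-listing tool should show is_flag_x under its plain name and the two describe overloads under compiler-specific decorated names.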

Standardised name mangling in C++

Though it would seem that standardised name mangling in the C++ language would lead to greater interoperability between compiler implementations, such a standardisation by itself would not suffice to guarantee C++ compiler interoperability, and it might even create a false impression that interoperability is possible and safe when it is not. Name mangling is only one of several application binary interface (ABI) details that need to be decided and observed by a C++ implementation. Other ABI aspects such as exception handling, virtual table layout, and structure and stack frame padding also cause differing C++ implementations to be incompatible. Further, requiring a particular form of mangling would cause issues for systems where implementation limits (e.g., length of symbols) dictate a particular mangling scheme. A standardised requirement for name mangling would also prevent an implementation where mangling was not required at all, for example a linker that understood the C++ language.

The C++ standard therefore does not attempt to standardise name mangling. On the contrary, the Annotated C++ Reference Manual (also known as ARM, ISBN 0-201-51459-1, section 7.2.1c) actively encourages the use of different mangling schemes to prevent linking when other aspects of the ABI, such as exception handling and virtual table layout, are incompatible.

Nevertheless, as detailed in the section above, on some platforms, such as IA-64, the full C++ ABI has been standardised, including name mangling.

Real-world effects of C++ name mangling

Because C++ symbols are routinely exported from DLL and shared object files, the name mangling scheme is not merely a compiler-internal matter. Different compilers (or different versions of the same compiler, in many cases) produce such binaries under different name decoration schemes, meaning that symbols are frequently unresolved if the compilers used to create the library and the program using it employed different schemes. For example, if a system with multiple C++ compilers installed (e.g., GNU GCC and the OS vendor's compiler) wished to install the Boost C++ Libraries, the libraries would have to be compiled twice: once for the vendor compiler and once for GCC.

It is good for safety that compilers producing incompatible object code (code based on different ABIs regarding, e.g., classes and exceptions) use different name mangling schemes. This guarantees that these incompatibilities are detected at the linking phase, not when executing the software (which could lead to obscure bugs and serious stability issues).

For this reason, name decoration is an important aspect of any C++-related ABI.
