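Abstract: This article describes how name mangling works in C++ and how gcc implements it, and verifies these principles with sample programs and tools such as nm and c++filt. It should help readers who want a detailed understanding of how programs are linked.
Overview of Name Mangling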
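C++ has far richer language features than C, and function overloading is the most direct example of a feature that requires name mangling. Overloaded functions cannot be told apart by name alone, because C++ distinguishes them by their parameter lists, that is, by the number and types of their parameters. The compiler therefore has to encode this information into the symbol name so that the linker sees a distinct symbol for each overload.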
Of course, many other parts of C++ also require name mangling, such as namespaces, classes, and templates.
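As a small illustration of why the name alone is not enough, the sketch below defines several functions that all share the base name area. The function names and the namespace shape are made up for this example, and the symbol names in the comments are what the Itanium C++ ABI rules used by g++ would be expected to produce; other compilers may differ.

```cpp
// Three overloads of the hypothetical function "area" plus one in a
// namespace. Every one needs its own linker symbol; the expected symbols
// (assuming the Itanium C++ ABI used by g++) are shown in the comments.
int area(int side) { return side * side; }                      // _Z4areai
int area(int width, int height) { return width * height; }      // _Z4areaii
double area(double radius) { return 3.14 * radius * radius; }   // _Z4aread

namespace shape {
// The enclosing namespace becomes part of the symbol as a nested name.
int area(int side) { return 6 * side * side; }                  // _ZN5shape4areaEi
}
```

Compiling this with g++ -c and running nm on the object file should list four distinct symbols, whereas a C compiler would reject the file because of the repeated name.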
- /*
- * simple_test.c
- * a demo showing that C++ and C mangle names differently
- * Author: Chaos Lee
- */
- #include<stdio.h>
- int rect_area(int x1,int x2,int y1,int y2)
- {
- return (x2-x1) * (y2-y1);
- }
- int elipse_area(int a,int b)
- {
- return (int)(3.14 * a * b);
- }
- int main(int argc,char *argv[])
- {
- int x1 = 10, x2 = 20, y1 = 30, y2 = 40;
- int a = 3,b=4;
- int result1 = rect_area(x1,x2,y1,y2);
- int result2 = elipse_area(a,b);
- return 0;
- }
- [lichao@sg01 name_mangling]$ gcc -c simple_test.c
- [lichao@sg01 name_mangling]$ nm simple_test.o
- 0000000000000027 T elipse_area
- 0000000000000051 T main
- 0000000000000000 T rect_area
- [lichao@sg01 name_mangling]$ g++ -c simple_test.c
- [lichao@sg01 name_mangling]$ nm simple_test.o
- 0000000000000028 T _Z11elipse_areaii
- 0000000000000000 T _Z9rect_areaiiii
- U __gxx_personality_v0
- 0000000000000052 T main
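Both the C++ standard and C99 reserve, for the implementation, identifiers that begin with an underscore followed by an uppercase letter and identifiers that begin with two underscores. _Z9rect_areaiiii is therefore a reserved identifier, and the symbols in object files compiled by g++ start with _Z.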
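The two mangled symbols in the g++ output follow a simple pattern: the _Z prefix, the length of the function name, the name itself, and one code letter per parameter type. The sketch below reproduces that pattern for the simplest case only. The helper mangle_simple is a hypothetical name introduced here, and it deliberately ignores namespaces, classes, templates, pointers, and substitutions.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper, not part of any library: mangles a free function in
// the global namespace whose parameters are all builtin types, following
// the pattern _Z <name length> <name> <one code per parameter>.
std::string mangle_simple(const std::string& name,
                          const std::vector<std::string>& params) {
    static const std::map<std::string, std::string> codes = {
        {"void", "v"},  {"bool", "b"},         {"char", "c"},
        {"short", "s"}, {"int", "i"},          {"unsigned int", "j"},
        {"long", "l"},  {"unsigned long", "m"}, {"float", "f"},
        {"double", "d"}};
    std::string out = "_Z" + std::to_string(name.size()) + name;
    if (params.empty()) return out + "v";  // an empty parameter list is encoded as void
    for (const std::string& p : params) out += codes.at(p);  // throws on unknown types
    return out;
}
```

Calling mangle_simple("rect_area", {"int", "int", "int", "int"}) yields _Z9rect_areaiiii, which matches the symbol reported by nm above. Note that the return type does not appear in the symbol of an ordinary function.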
- /*
- * simple_test.c
- * a demo showing that C++ and C mangle names differently
- * Author: Chaos Lee
- */
- #include<stdio.h>
- #ifdef __cplusplus
- extern "C" {
- #endif
- int rect_area(int x1,int x2,int y1,int y2)
- {
- return (x2-x1) * (y2-y1);
- }
- int elipse_area(int a,int b)
- {
- return (int)(3.14 * a * b);
- }
- #ifdef __cplusplus
- }
- #endif
- int main(int argc,char *argv[])
- {
- int x1 = 10, x2 = 20, y1 = 30, y2 = 40;
- int a = 3,b=4;
- int result1 = rect_area(x1,x2,y1,y2);
- int result2 = elipse_area(a,b);
- return 0;
- }
- [lichao@sg01 name_mangling]$ gcc -c simple_test.c
- [lichao@sg01 name_mangling]$ nm simple_test.o
- 0000000000000027 T elipse_area
- 0000000000000051 T main
- 0000000000000000 T rect_area
- [lichao@sg01 name_mangling]$ g++ -c simple_test.c
- [lichao@sg01 name_mangling]$ nm simple_test.o
- U __gxx_personality_v0
- 0000000000000028 T elipse_area
- 0000000000000052 T main
- 0000000000000000 T rect_area
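In fact, the C standard library headers make heavy use of extern "C" linkage specifications. The headers can also be compiled by a C++ compiler, but the functions they declare must keep their C interface rather than a C++ one (this is, after all, the C standard library), so extern "C" is required.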
- /*
- * libc_test.c
- * a demo program showing how the standard C library
- * headers are handled by a C++ compiler
- */
- #include<stdio.h>
- int main(int argc,char * argv[])
- {
- puts("hello world.\n");
- return 0;
- }
Searching the preprocessed output for puts, we do not see extern "C" on its declaration. Strange, isn't it?
- [lichao@sg01 name_mangling]$ g++ -E libc_test.c | grep 'puts'
- extern int fputs (__const char *__restrict __s, FILE *__restrict __stream);
- extern int puts (__const char *__s);
- extern int fputs_unlocked (__const char *__restrict __s,
- puts("hello world.\n");
- [lichao@sg01 name_mangling]$ g++ -E libc_test.c | grep 'extern "C"'
- extern "C" {
- extern "C" {
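The grep results above are consistent with the block form of the linkage specification: a single extern "C" { ... } wraps many declarations, so none of them carries extern "C" on its own line. The sketch below shows the pattern with two hypothetical functions, c_add and c_scale, which are not taken from any real header. glibc opens and closes such blocks through the macros __BEGIN_DECLS and __END_DECLS, which is why the literal text only appears after preprocessing.

```cpp
// Hypothetical header-style declarations. One linkage block covers every
// declaration inside it, so no single line needs its own extern "C".
#ifdef __cplusplus
extern "C" {
#endif

int c_add(int a, int b);     /* symbol: c_add, for both C and C++ callers */
int c_scale(int v, int k);   /* symbol: c_scale */

#ifdef __cplusplus
}
#endif

// A definition that follows an extern "C" declaration keeps C linkage,
// so these are emitted under their plain, unmangled names.
int c_add(int a, int b) { return a + b; }
int c_scale(int v, int k) { return v * k; }
```

One consequence of C linkage is that these names cannot be overloaded, because there is no parameter information left in the symbol to tell two versions apart.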
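Different compilers mangle names in different ways. You may ask why C++ name mangling is not standardized, so that object code from different compilers could interoperate. In fact, the C++ FAQ answers this question: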
"Compilers differ as to how objects are laid out, how multiple inheritance is implemented, how virtual function calls are handled, and so on, so if the name mangling were made the same, your programs would link against libraries provided from other compilers but then crash when run. For this reason, the ARM (Annotated C++ Reference Manual) encourages compiler writers to make their name mangling different from that of other compilers for the same platform. Incompatible libraries are then detected at link time, rather than at run time."
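GCC uses the name mangling scheme of the IA-64 (Itanium) C++ ABI, which is defined in the Intel IA-64 standard ABI. The g++ FAQ contains the following passage:
"GNU C++ does not do name mangling in the same way as other C++ compilers.
This means that object files compiled with one compiler cannot be used with
another."
In other words, because GNU C++ mangles names differently from other C++ compilers, object files produced by one compiler cannot be used together with object files produced by another.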
Builtin types encoding
- <builtin-type> ::= v # void
- ::= w # wchar_t
- ::= b # bool
- ::= c # char
- ::= a # signed char
- ::= h # unsigned char
- ::= s # short
- ::= t # unsigned short
- ::= i # int
- ::= j # unsigned int
- ::= l # long
- ::= m # unsigned long
- ::= x # long long, __int64
- ::= y # unsigned long long, __int64
- ::= n # __int128
- ::= o # unsigned __int128
- ::= f # float
- ::= d # double
- ::= e # long double, __float80
- ::= g # __float128
- ::= z # ellipsis
- ::= u <source-name> # vendor extended type
Operator encoding
- <operator-name> ::= nw # new
- ::= na # new[]
- ::= dl # delete
- ::= da # delete[]
- ::= ps # + (unary)
- ::= ng # - (unary)
- ::= ad # & (unary)
- ::= de # * (unary)
- ::= co # ~
- ::= pl # +
- ::= mi # -
- ::= ml # *
- ::= dv # /
- ::= rm # %
- ::= an # &
- ::= or # |
- ::= eo # ^
- ::= aS # =
- ::= pL # +=
- ::= mI # -=
- ::= mL # *=
- ::= dV # /=
- ::= rM # %=
- ::= aN # &=
- ::= oR # |=
- ::= eO # ^=
- ::= ls # <<
- ::= rs # >>
- ::= lS # <<=
- ::= rS # >>=
- ::= eq # ==
- ::= ne # !=
- ::= lt # <
- ::= gt # >
- ::= le # <=
- ::= ge # >=
- ::= nt # !
- ::= aa # &&
- ::= oo # ||
- ::= pp # ++
- ::= mm # --
- ::= cm # ,
- ::= pm # ->*
- ::= pt # ->
- ::= cl # ()
- ::= ix # []
- ::= qu # ?
- ::= st # sizeof (a type)
- ::= sz # sizeof (an expression)
- ::= cv <type> # (cast)
- ::= v <digit> <source-name> # vendor extended operator
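To connect the operator table to real code, the sketch below defines a few overloaded operators on a hypothetical struct Vec. The symbol names in the comments are what the rules above are expected to give under the Itanium C++ ABI: the two-letter operator code takes the place of the function name, N...E wraps a nested (member) name, K marks a const member function, and S_ is a substitution that refers back to the already mentioned type Vec. They are worth confirming with nm on your own toolchain.

```cpp
// Hypothetical type used only to show how operator names are encoded.
struct Vec {
    int x;
    int y;
    Vec& operator+=(const Vec& other);
    int operator[](int index) const;
};

// Member operator +=  uses code pL inside the nested name of Vec.
Vec& Vec::operator+=(const Vec& other) {     // expected: _ZN3VecpLERKS_
    x += other.x;
    y += other.y;
    return *this;
}

// Const member operator [] uses code ix; K marks the const qualifier.
int Vec::operator[](int index) const {       // expected: _ZNK3VecixEi
    return index == 0 ? x : y;
}

// Free operators: the operator code directly follows _Z.
Vec operator+(Vec a, Vec b) {                // expected: _Zpl3VecS_
    a += b;
    return a;
}

bool operator==(Vec a, Vec b) {              // expected: _Zeq3VecS_
    return a.x == b.x && a.y == b.y;
}
```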
- <type> ::= <CV-qualifiers> <type>
- ::= P <type> # pointer-to
- ::= R <type> # reference-to
- ::= O <type> # rvalue reference-to (C++0x)
- ::= C <type> # complex pair (C 2000)
- ::= G <type> # imaginary (C 2000)
- ::= U <source-name> <type> # vendor extended type qualifier
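The type grammar is recursive: a P or R is followed by another complete type. The sketch below decodes a small subset of it, printing qualifiers in the same order as c++filt. The function names decode_type and decode_params are hypothetical, only a few builtin codes are handled, and K (the CV-qualifier for const in the ABI) is the only qualifier supported.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Decodes one <type> starting at s[pos] and advances pos past it.
// Illustrative subset only: P (pointer), R (reference), K (const) and a
// handful of builtin type codes.
std::string decode_type(const std::string& s, std::size_t& pos) {
    if (pos >= s.size()) throw std::invalid_argument("truncated type encoding");
    const char c = s[pos++];
    switch (c) {
        case 'P': return decode_type(s, pos) + "*";       // pointer to the next type
        case 'R': return decode_type(s, pos) + "&";       // reference to the next type
        case 'K': return decode_type(s, pos) + " const";  // const-qualified next type
        case 'v': return "void";
        case 'b': return "bool";
        case 'c': return "char";
        case 'i': return "int";
        case 'j': return "unsigned int";
        case 'l': return "long";
        case 'f': return "float";
        case 'd': return "double";
        default:
            throw std::invalid_argument(std::string("unsupported code: ") + c);
    }
}

// Decodes a whole parameter encoding such as "RiPKcd" into one string per type.
std::vector<std::string> decode_params(const std::string& encoding) {
    std::vector<std::string> out;
    std::size_t pos = 0;
    while (pos < encoding.size()) out.push_back(decode_type(encoding, pos));
    return out;
}
```

For example, decode_params("RiPKcd") gives int&, char const*, and double, in that order.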
- /*
- * Author: Chaos Lee
- * Description: A simple demo to show how the rules used to mangle functions' names work
- * Date:2012/05/06
- */
- #include<iostream>
- #include<string>
- using namespace std;
- int test_func(int & tmpInt,const char * ptr,double dou,string str,float f)
- {
- return 0;
- }
- int main(int argc,char * argv[])
- {
- const char * test="test";
- int intNum = 10;
- double dou = 10.012;
- string str="str";
- float f = 1.2;
- test_func(intNum,test,dou,str,f);
- return 0;
- }
- [lichao@sg01 name_mangling]$ g++ -c func.cpp
- [lichao@sg01 name_mangling]$ nm func.cpp
- nm: func.cpp: File format not recognized
- [lichao@sg01 name_mangling]$ nm func.o
- 0000000000000060 t _GLOBAL__I__Z9test_funcRiPKcdSsf
- U _Unwind_Resume
- 0000000000000022 t _Z41__static_initialization_and_destruction_0ii
- 0000000000000000 T _Z9test_funcRiPKcdSsf
- U _ZNSaIcEC1Ev
- U _ZNSaIcED1Ev
- U _ZNSsC1EPKcRKSaIcE
- U _ZNSsC1ERKSs
- U _ZNSsD1Ev
- U _ZNSt8ios_base4InitC1Ev
- U _ZNSt8ios_base4InitD1Ev
- 0000000000000000 b _ZSt8__ioinit
- U __cxa_atexit
- U __dso_handle
- U __gxx_personality_v0
- 0000000000000076 t __tcf_0
- 000000000000008e T main
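The line containing _Z9test_funcRiPKcdSsf is the name of test_func after mangling, where _Z is the prefix of every mangled symbol, 9test_func is the length of the function name followed by the name itself, Ri is a reference to int, PKc is a pointer to const char, d is double, Ss is the abbreviation for std::string, and f is float.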
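The first step in reading such a symbol by hand is to separate the function name from the parameter encoding, which the length prefix makes unambiguous. The sketch below does this for simple global functions only. The name split_simple_symbol is hypothetical, and nested names (which start with N after _Z) are rejected rather than parsed.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Splits a symbol of the form _Z<length><name><parameter codes> into
// {name, parameter codes}. Hypothetical helper for global functions only.
std::pair<std::string, std::string> split_simple_symbol(const std::string& sym) {
    if (sym.compare(0, 2, "_Z") != 0) throw std::invalid_argument("not a _Z symbol");
    std::size_t pos = 2;
    std::size_t len = 0;
    // Read the decimal length that precedes the function name.
    while (pos < sym.size() && std::isdigit(static_cast<unsigned char>(sym[pos])))
        len = len * 10 + static_cast<std::size_t>(sym[pos++] - '0');
    if (len == 0 || pos + len > sym.size())
        throw std::invalid_argument("missing or invalid length prefix");
    return {sym.substr(pos, len), sym.substr(pos + len)};
}
```

Applied to _Z9test_funcRiPKcdSsf, this returns the name test_func and the parameter encoding RiPKcdSsf.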
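Name mangling usually makes function names unrecognizable. In many cases, when inspecting symbols, we do not care about the mangled form; we only want to know whether a certain function is defined or referenced, which is very helpful when debugging.
We therefore need a way to convert a mangled symbol back to its original form; this process is called name demangling. Many tools provide this, and the most commonly used is the c++filt command, which takes a mangled symbol as input and outputs the demangled symbol. For example: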
- [lichao@sg01 name_mangling]$ c++filt _Z9test_funcRiPKcdSsf
- test_func(int&, char const*, double, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, float)
- [lichao@sg01 name_mangling]$ nm func.o | c++filt
- 0000000000000060 t global constructors keyed to _Z9test_funcRiPKcdSsf
- U _Unwind_Resume
- 0000000000000022 t __static_initialization_and_destruction_0(int, int)
- 0000000000000000 T test_func(int&, char const*, double, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, float)
- U std::allocator<char>::allocator()
- U std::allocator<char>::~allocator()
- U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)
- U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
- U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()
- U std::ios_base::Init::Init()
- U std::ios_base::Init::~Init()
- 0000000000000000 b std::__ioinit
- U __cxa_atexit
- U __dso_handle
- U __gxx_personality_v0
- 0000000000000076 t __tcf_0
- 000000000000008e T main
- [lichao@sg01 name_mangling]$ nm -C func.o
- 0000000000000060 t global constructors keyed to _Z9test_funcRiPKcdSsf
- U _Unwind_Resume
- 0000000000000022 t __static_initialization_and_destruction_0(int, int)
- 0000000000000000 T test_func(int&, char const*, double, std::string, float)
- U std::allocator<char>::allocator()
- U std::allocator<char>::~allocator()
- U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)
- U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
- U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()
- U std::ios_base::Init::Init()
- U std::ios_base::Init::~Init()
- 0000000000000000 b std::__ioinit
- U __cxa_atexit
- U __dso_handle
- U __gxx_personality_v0
- 0000000000000076 t __tcf_0
- 000000000000008e T main
Last but not least, there is one more particularly important interface function, __cxa_demangle(), whose prototype is:
- namespace abi {
- extern "C" char* __cxa_demangle (const char* mangled_name,
- char* buf,
- size_t* n,
- int* status);
- }
- /*
- * Author: Chaos Lee
- * Description: Use __cxa_demangle to demangle a mangled function name.
- * Date:2012/05/06
- *
- */
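- #include<iostream>
- #include<cstdlib>
- #include<cxxabi.h>
- using namespace std;
- using namespace abi;
- int main(int argc,char *argv[])
- {
- const char * mangled_string = "_Z9test_funcRiPKcdSsf";
- int status = 0;
- /* Pass NULL as the buffer so that __cxa_demangle malloc()s the result.
- * A stack array must not be passed: the function may realloc() it. */
- char * demangled = __cxa_demangle(mangled_string,NULL,NULL,&status);
- if(status == 0)
- cout<<demangled<<endl;
- cout<<status<<endl;
- free(demangled);
- return 0;
- }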
- [lichao@sg01 name_mangling]$ g++ cxa_demangle.cpp -o cxa_demangle
- [lichao@sg01 name_mangling]$ ./cxa_demangle
- test_func(int&, char const*, double, std::string, float)
- 0
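Building on __cxa_demangle, a very small imitation of nm piped through c++filt can be sketched as follows. The function names demangle_token and demangle_line are hypothetical. One design choice matters here: only tokens that start with _Z are passed to the demangler, because __cxa_demangle also accepts bare type encodings and would otherwise turn an nm type letter such as t or b into unsigned short or bool.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangles one token if it looks like a mangled symbol; otherwise, or on
// failure, returns the token unchanged.
std::string demangle_token(const std::string& token) {
    if (token.compare(0, 2, "_Z") != 0) return token;
    int status = 0;
    // With a null buffer, the result is allocated with malloc() and must be freed.
    char* out = abi::__cxa_demangle(token.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return token;
    std::string result(out);
    std::free(out);
    return result;
}

// Demangles every whitespace-separated token of a line, keeping the
// original spacing so that nm's column layout is preserved.
std::string demangle_line(const std::string& line) {
    std::string result;
    std::size_t i = 0;
    while (i < line.size()) {
        std::size_t j = i;
        if (line[i] == ' ' || line[i] == '\t') {
            while (j < line.size() && (line[j] == ' ' || line[j] == '\t')) ++j;
            result.append(line, i, j - i);                    // copy the whitespace run
        } else {
            while (j < line.size() && line[j] != ' ' && line[j] != '\t') ++j;
            result += demangle_token(line.substr(i, j - i));  // translate the token
        }
        i = j;
    }
    return result;
}
```

Feeding each line of nm output through demangle_line should give results comparable to nm -C for ordinary function symbols, although the real c++filt handles more cases.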
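By writing an interface function whose name is a mangled symbol and enabling the option that allows duplicate symbols, one can change which function the original code is linked against, and thereby change the program's behavior.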