C++ Function Semantics (virtual)

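This post follows Chapter 4 ("The Semantics of Function") of Inside the C++ Object Model (《深度探索C++对象模型》) and briefly describes how member functions are called, including under multiple inheritance. I am rereading the book, so I am noting some implementation details along the way and disassembling them to see what is actually generated.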

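I have interviewed candidates whose understanding of virtual functions stopped at "there is a virtual table"; as soon as I asked something slightly deeper, they were lost. Memorized answers may get you through an interview or a written test, but to go further you need to understand the underlying mechanism, and ideally look at it yourself in gdb. Of course, that is a matter of personal choice and interest.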

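Because I did not read The Annotated STL Sources (STL源码剖析) very carefully before, I have nearly forgotten the full names and usage of some interfaces. Here are some related posts I wrote earlier:
Construction of class instances
Why are the default arguments of a base-class virtual function inherited?
Some problems with base-class pointers pointing to arrays of derived objects
Notes on the C++ object model and performance optimization, part 1
Notes on the C++ object model and performance optimization, part 2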

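Inline and static member functions are not covered here. I only look at how (non-)virtual member functions behave when called through an object or through a reference/pointer, and what happens under multiple inheritance.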

Case 1: a non-virtual member function

  1 #include <iostream>
  2 using std::cout;
  3 using std::endl;
  4 
  5 class base {
  6 public:
  7     void show() { cout << "base" << endl;}
  8 private:
  9     int mvalue;
 10 };
 11 
 12 int main() {
 13     base obj;
 14     obj.show();
 15     base* pbase = &obj;
 16     pbase->show();
 17     pbase = nullptr;
 18     return 0;
 19 }

Below is the disassembly. Disassembling on macOS produces output that looks somewhat different, so I stepped into the assembly directly in gdb:

   0x0000000100000e30 <+0>: push   %rbp
   0x0000000100000e31 <+1>: mov    %rsp,%rbp
   0x0000000100000e34 <+4>: sub    $0x10,%rsp //rsp=rsp-16
   0x0000000100000e38 <+8>: lea    -0xc(%rbp),%rax
=> 0x0000000100000e3c <+12>:    mov    %rax,%rdi //address of obj (this)
   0x0000000100000e3f <+15>:    callq  0x100000ebe
   0x0000000100000e44 <+20>:    lea    -0xc(%rbp),%rax
   0x0000000100000e48 <+24>:    mov    %rax,-0x8(%rbp)
   0x0000000100000e4c <+28>:    mov    -0x8(%rbp),%rax
   0x0000000100000e50 <+32>:    mov    %rax,%rdi //address of obj (this)
   0x0000000100000e53 <+35>:    callq  0x100000ebe

(gdb) si
base::show (this=0x0) at struct.cpp:7

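The disassembly shows that no constructor is called: the compiler does not need to synthesize one for base here. Both calls jump to the same target, callq 0x100000ebe, which is show after name mangling, __ZN4base4showEv (macOS adds an extra leading underscore to the Itanium-ABI name _ZN4base4showEv). Before each call, the address of obj is loaded into %rdi as the hidden this argument. The this=0x0 that gdb prints right after si does not mean this is null: at the very first instruction of base::show the prologue (push %rbp; mov %rsp,%rbp) has not run yet, so gdb most likely evaluates this from a stack location that has not been set up; the real value is in %rdi. So calling a nonstatic, non-virtual member function through an object or through a pointer ends up as something like __ZN4base4showEv(&obj), with the object's address passed in, and both forms cost the same.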
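That lowering can be written out by hand. In this sketch, base_show is a hypothetical free function standing in for the mangled __ZN4base4showEv, with the hidden this turned into an explicit self parameter. For testing, show returns a string and mvalue is public with an initializer, which differs from the listing above.

```cpp
#include <cassert>
#include <string>

class base {
public:
    // Nonstatic, non-virtual member function: &obj is passed as the hidden `this`.
    std::string show() const { return "base:" + std::to_string(mvalue); }
    int mvalue = 7;
};

// Hypothetical hand-lowered form of base::show: `this` becomes an explicit parameter.
std::string base_show(const base* self) { return "base:" + std::to_string(self->mvalue); }

// Calls through the object, through a pointer, and through the lowered function.
bool same_result() {
    base obj;
    const base* pbase = &obj;
    return obj.show() == pbase->show() && pbase->show() == base_show(&obj);
}
```

All three spellings read the same object, so same_result() should return true.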

Single inheritance with virtual functions:

  1 #include <iostream>
  2 using std::cout;
  3 using std::endl;
  4 
  5 class base {
  6 public:
  7     virtual void show() { cout << "base" << endl;}
  8 private:
  9     int mvalue;
 10 };
 11 
 12 class derived : public base {
 13 public:
 14     void show() {
 15         cout << "derived" << endl;
 16     }
 17 };
 18 
 19 int main() {
 20     base obj;
 21     obj.show();
 22     base* pbase = new derived;//new (std::nothrow) derived; check pbase
 23     pbase->show();
 24     delete pbase;
 25     pbase = nullptr;
 26     return 0;
 27 }

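Here base, although used as a base class, does not declare a virtual ~base(). I left it out to keep the test small, but real code should add it: deleting a derived object through a base* whose class has no virtual destructor is undefined behavior.
At the start of main: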
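To see what the virtual destructor buys, here is a minimal sketch that assumes the same base/derived shape but declares ~base() virtual, with a hypothetical global g_trace recording destructor calls. Deleting through base* then runs ~derived first and ~base afterwards.

```cpp
#include <cassert>
#include <string>

std::string g_trace;  // records destructor calls in order

class base {
public:
    virtual ~base() { g_trace += "~base"; }
    virtual void show() {}
};

class derived : public base {
public:
    ~derived() override { g_trace += "~derived "; }
    void show() override {}
};

std::string delete_through_base() {
    g_trace.clear();
    base* pbase = new derived;
    pbase->show();
    delete pbase;     // dispatches to ~derived, which then runs ~base
    pbase = nullptr;
    return g_trace;
}
```

delete_through_base() should return "~derived ~base".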

   0x0000000100000cef <+0>: push   %rbp
   0x0000000100000cf0 <+1>: mov    %rsp,%rbp
   0x0000000100000cf3 <+4>: push   %rbx
   0x0000000100000cf4 <+5>: sub    $0x28,%rsp
=> 0x0000000100000cf8 <+9>: lea    -0x30(%rbp),%rax
   0x0000000100000cfc <+13>:    mov    %rax,%rdi
   0x0000000100000cff <+16>:    callq  0x100000dac

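Because base has a virtual function, the compiler synthesizes a default constructor for it. That constructor does not initialize mvalue (its value is indeterminate and merely happens to read as 0 here); it only sets the vptr of the base instance. Before the vptr is written, gdb shows: $2 = {_vptr.base = 0x0, mvalue = 0}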

Dump of assembler code for function base::base():
   0x0000000100000ca4 <+0>: push   %rbp
   0x0000000100000ca5 <+1>: mov    %rsp,%rbp
   0x0000000100000ca8 <+4>: mov    %rdi,-0x8(%rbp)
   0x0000000100000cac <+8>: mov    0x365(%rip),%rax        # 0x100001018
   0x0000000100000cb3 <+15>:    lea    0x10(%rax),%rax
   0x0000000100000cb7 <+19>:    mov    -0x8(%rbp),%rdx
=> 0x0000000100000cbb <+23>:    mov    %rax,(%rdx)
   0x0000000100000cbe <+26>:    nop
   0x0000000100000cbf <+27>:    pop    %rbp
   0x0000000100000cc0 <+28>:    retq   
End of assembler dump.

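The constructor loads the address of base's vtable from a pointer slot at 0x100001018 and adds 0x10 to it: under the Itanium C++ ABI that gcc and clang use on Linux and macOS, a vtable begins with two 8-byte entries (offset-to-top and the type_info pointer), so the vptr points at the first virtual-function slot. When the constructor returns: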

(gdb) p *this
$21 = (base) {_vptr.base = 0x100001060 , mvalue = 0}
(gdb) p /a *(void**)0x100001060@1
$22 = {0x100000c12 }
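The gdb session reads the vptr straight out of the object. The sketch below does the same in code through a hypothetical helper read_vptr. It assumes the Itanium C++ ABI (gcc/clang on Linux/macOS), where a polymorphic object begins with its vptr, so it is an inspection trick rather than portable C++. Objects of the same class should share one vtable, while a derived object carries a different vptr.

```cpp
#include <cassert>
#include <cstring>

class base {
public:
    virtual void show() {}
    int mvalue = 0;
};

class derived : public base {
public:
    void show() override {}
};

// ASSUMPTION: Itanium C++ ABI, where a polymorphic object starts with its vptr.
// Copies that first pointer-sized word out of the object; inspection only.
const void* read_vptr(const void* obj) {
    const void* vptr = nullptr;
    std::memcpy(&vptr, obj, sizeof vptr);
    return vptr;
}
```

On a 64-bit target, sizeof(base) is typically 16: the 8-byte vptr plus the 4-byte int, padded to 8-byte alignment.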

Next, obj.show() is called directly, with &obj passed in %rdi:

   0x0000000100000d04 <+21>:    lea    -0x30(%rbp),%rax
   0x0000000100000d08 <+25>:    mov    %rax,%rdi
   0x0000000100000d0b <+28>:    callq  0x100000da6

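Then new derived allocates memory. The size argument 0x10 is sizeof(derived): an 8-byte vptr plus the 4-byte mvalue, padded to 8-byte alignment: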

   0x0000000100000d10 <+33>:    mov    $0x10,%edi
   0x0000000100000d15 <+38>:    callq  0x100000dd0
   0x0000000100000d1a <+43>:    mov    %rax,%rbx
=> 0x0000000100000d1d <+46>:    mov    %rbx,%rdi

At this point pbase has not been assigned yet:

0x0000000100000d1d  22      base* pbase = new derived;
(gdb) p pbase
$26 = (base *) 0x0

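new derived proceeds in three steps: allocate memory, run the constructor, then assign the result to pbase. On other platforms the last two steps may be ordered the other way around: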

Dump of assembler code for function derived::derived():
   0x0000000100000cc2 <+0>: push   %rbp
   0x0000000100000cc3 <+1>: mov    %rsp,%rbp
   0x0000000100000cc6 <+4>: sub    $0x10,%rsp
   0x0000000100000cca <+8>: mov    %rdi,-0x8(%rbp)
   0x0000000100000cce <+12>:    mov    -0x8(%rbp),%rax
   0x0000000100000cd2 <+16>:    mov    %rax,%rdi
=> 0x0000000100000cd5 <+19>:    callq  0x100000db2
   0x0000000100000cda <+24>:    mov    0x33f(%rip),%rax        # 0x100001020
   0x0000000100000ce1 <+31>:    lea    0x10(%rax),%rax
   0x0000000100000ce5 <+35>:    mov    -0x8(%rbp),%rdx
   0x0000000100000ce9 <+39>:    mov    %rax,(%rdx)
   0x0000000100000cec <+42>:    nop
   0x0000000100000ced <+43>:    leaveq 
   0x0000000100000cee <+44>:    retq   
End of assembler dump.

Dump of assembler code for function base::base():
=> 0x0000000100000c86 <+0>: push   %rbp
   0x0000000100000c87 <+1>: mov    %rsp,%rbp
   0x0000000100000c8a <+4>: mov    %rdi,-0x8(%rbp)
   0x0000000100000c8e <+8>: mov    0x383(%rip),%rax        # 0x100001018
   0x0000000100000c95 <+15>:    lea    0x10(%rax),%rax
   0x0000000100000c99 <+19>:    mov    -0x8(%rbp),%rdx
   0x0000000100000c9d <+23>:    mov    %rax,(%rdx)
   0x0000000100000ca0 <+26>:    nop
   0x0000000100000ca1 <+27>:    pop    %rbp
   0x0000000100000ca2 <+28>:    retq   
End of assembler dump.

(gdb) p *this
$3 = {_vptr.base = 0x100001060 , mvalue = 0}
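The three steps of new derived can be spelled out by hand with operator new and placement new. This is a sketch of the semantics, not the compiler's actual output; it assumes a virtual destructor and a string-returning show for testing, and it omits one thing a real new expression also does: if the constructor throws, the memory is released again.

```cpp
#include <cassert>
#include <new>
#include <string>

class base {
public:
    virtual std::string show() { return "base"; }
    virtual ~base() = default;
};

class derived : public base {
public:
    std::string show() override { return "derived"; }
};

// Hand-written version of `base* pbase = new derived;`.
std::string three_step_new() {
    void* mem = ::operator new(sizeof(derived));  // step 1: raw allocation only
    derived* obj = ::new (mem) derived;           // step 2: construction (sets the vptr)
    base* pbase = obj;                            // step 3: assign to pbase
    std::string result = pbase->show();           // dispatches through derived's vtable
    obj->~derived();                              // undo step 2
    ::operator delete(mem);                       // undo step 1
    return result;
}
```

three_step_new() should return "derived", since the vptr was set during step 2.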

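After base's constructor has run, control returns to derived's constructor, which overwrites the vptr with derived's own vtable (instructions +24 to +39 in its dump). Meanwhile pbase is still 0: the address returned by operator new is held in %rbx and is stored into pbase only after the constructor returns: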

$5 = { = {_vptr.base = 0x100001048 , mvalue = 0}, }
(gdb) p pbase //before the derived constructor is called
$7 = (base *) 0x0
(gdb) i r rbx
rbx            0x100600440         4301259840

(gdb) p pbase//after the derived constructor is called
$8 = (base *) 0x100600440
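One visible consequence of each constructor setting its own vptr: a virtual call made inside base's constructor dispatches to base's version, because at that moment the vptr still points at base's vtable. A minimal sketch, with the hypothetical names g_log and show_name:

```cpp
#include <cassert>
#include <string>

std::string g_log;  // records which show_name() ran

class base {
public:
    base() { g_log += show_name(); g_log += ' '; }  // vptr points at base's vtable here
    virtual ~base() = default;
    virtual std::string show_name() const { return "base"; }
};

class derived : public base {
public:
    derived() { g_log += show_name(); }             // vptr now points at derived's vtable
    std::string show_name() const override { return "derived"; }
};

std::string construct_and_log() {
    g_log.clear();
    derived d;
    return g_log;
}
```

construct_and_log() should return "base derived".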

The last few lines store the pointer into pbase and call show through the vtable:

   0x0000000100000d25 <+54>:    mov    %rbx,-0x18(%rbp)
   0x0000000100000d29 <+58>:    mov    -0x18(%rbp),%rax//load pbase
   0x0000000100000d2d <+62>:    mov    (%rax),%rax//load the vptr (vtable address)
   0x0000000100000d30 <+65>:    mov    (%rax),%rdx//load slot 0: address of show
=> 0x0000000100000d33 <+68>:    mov    -0x18(%rbp),%rax
   0x0000000100000d37 <+72>:    mov    %rax,%rdi//pass this in %rdi
   0x0000000100000d3a <+75>:    callq  *%rdx//indirect call to show
(gdb) p /a *(void**)0x100001060@1
$15 = {0x100000c12 }
(gdb) p /a *(void**)0x100001048@1
$16 = {0x100000c4c }

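The debugging session shows that a virtual function called through an object does not go through the virtual mechanism: it is called like an ordinary function, __ZN4base4showEv(&obj). A call through a base pointer that points to a derived object becomes something like (* pbase->vptr[1])(pbase) in the book's notation, where slot 0 is reserved for type_info; in the Itanium layout seen here the vptr already points past the type_info entry, so show sits at offset 0 (mov (%rax),%rdx). Either way the assembly contains a few extra instructions: loading the vptr, loading the slot, and an indirect call.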
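The difference between static and dynamic binding can also be observed without a debugger. In this sketch (show returns a string instead of printing, an assumption for testing; call_results and compare_calls are hypothetical names), obj.show() and the qualified call pbase->base::show() are bound at compile time, while pbase->show() goes through the vptr:

```cpp
#include <cassert>
#include <string>

class base {
public:
    virtual ~base() = default;
    virtual std::string show() const { return "base"; }
};

class derived : public base {
public:
    std::string show() const override { return "derived"; }
};

struct call_results {
    std::string by_object;   // obj.show()
    std::string by_pointer;  // pbase->show()
    std::string qualified;   // pbase->base::show()
};

call_results compare_calls() {
    base obj;
    derived d;
    const base* pbase = &d;
    return {
        obj.show(),          // static type is the dynamic type: bound at compile time
        pbase->show(),       // looked up through the vptr at run time
        pbase->base::show()  // qualified name suppresses virtual dispatch
    };
}
```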

If I add another virtual function hello after show in base and override it in derived, the tail of the disassembly becomes:

   0x0000000100000cb9 <+62>:    mov    (%rax),%rax
=> 0x0000000100000cbc <+65>:    add    $0x8,%rax//offset of hello's slot
   0x0000000100000cc0 <+69>:    mov    (%rax),%rdx //address of hello
   0x0000000100000cc3 <+72>:    mov    -0x18(%rbp),%rax
   0x0000000100000cc7 <+76>:    mov    %rax,%rdi //pass this in %rdi
   0x0000000100000cca <+79>:    callq  *%rdx
$2 = {_vptr.base = 0x100001048 , mvalue = 0}
(gdb) p /a *(void**)0x100001048@1
$3 = {0x100000b9e }
(gdb) p /a *(void**)0x100001048@2
$4 = {0x100000b9e , 0x100000bd8 }

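So to reach hello in the vtable, the code adds its slot offset with add $0x8,%rax. In the book's notation this is (* pbase->vptr[2])(pbase); in the layout observed here hello is simply the second slot, 8 bytes after show. Because base does not declare and define a virtual ~base, no destructor appears in the vtable. The vtable also holds other information needed at run time, such as the type_info for base.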
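To make the slot arithmetic explicit, here is a hand-rolled vtable in C-style C++. The types vtable and object and all functions below are hypothetical stand-ins for what the compiler generates; call_hello mirrors the add $0x8 / mov (%rax),%rdx / callq *%rdx sequence by indexing slot 1.

```cpp
#include <cassert>
#include <string>

struct object;                                   // forward declaration
using method = std::string (*)(const object*);   // every slot takes `this` explicitly

struct vtable { method slots[2]; };              // slot 0: show, slot 1: hello

struct object {
    const vtable* vptr;                          // plays the role of the hidden vptr
    int mvalue;
};

std::string base_show(const object*)     { return "base::show"; }
std::string base_hello(const object*)    { return "base::hello"; }
std::string derived_show(const object*)  { return "derived::show"; }
std::string derived_hello(const object*) { return "derived::hello"; }

const vtable base_vtbl    {{base_show, base_hello}};
const vtable derived_vtbl {{derived_show, derived_hello}};

// (*pbase->vptr[slot])(pbase): load vptr, index the slot, call with `this`.
std::string call_show(const object* pbase)  { return (*pbase->vptr->slots[0])(pbase); }
std::string call_hello(const object* pbase) { return (*pbase->vptr->slots[1])(pbase); }
```

On a 64-bit target, slot 1 lives 8 bytes after slot 0, which is the add $0x8 in the listing.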

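Virtual functions under multiple inheritance involve pointer adjustment, so using a pointer to the first base class differs from using a pointer to the second or a later base class: the latter requires adjusting this by some offset. This time I declare virtual destructors so that they show up in the vtables: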

  5 class base1 {
  6 public:
  7     virtual void show() {}
  8     virtual ~base1() {}
  9 private:
 10     int mvalue;
 11 };
 12 
 13 class base2 {
 14 public:
 15     virtual void show() {}
 16     virtual ~base2() {}
 17 private:
 18     int mvalue;
 19 };
 20 
 21 class derived : public base1, public base2 {
 22 public:
 23     void show() {}
 24     virtual ~derived() {}
 25 };
 26 
 27 int main() {
 28     base1* pbase1 = new derived;
 29     pbase1->show();
 30     delete pbase1;
 31     pbase1 = nullptr;
 32 
 33     base2* pbase2 = new derived;
 34     pbase2->show();
 35     delete pbase2;
 36     pbase2 = nullptr;
 37     return 0;
 38 }

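Since I have analyzed the layout under multiple inheritance before, here I focus only on the vtable contents of class derived and on the pointer adjustments in main: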

(gdb) p *(class derived*)pbase1
$25 = { = {_vptr.base1 = 0x100001040 , mvalue = 0},  = {
    _vptr.base2 = 0x100001068 , mvalue = 0}, }
(gdb) p /a *(void**)0x100001040@2
$26 = {0x1000008ee , 0x100000902 }
(gdb) p /a *(void**)0x100001068@2
$27 = {0x1000008f9 <_ZThn16_N7derived4showEv>, 0x100000952 <_ZThn16_N7derivedD1Ev>}

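After base1* pbase1 = new derived has executed, the derived object looks like the output above. The entries in the base2 part of the vtable are not derived::show and ~derived themselves but non-virtual thunks: _ZThn16_N7derived4showEv demangles to "non-virtual thunk to derived::show()", a stub that subtracts 16 from this and then jumps to derived::show (the n16 in the name encodes that -16 adjustment). Printing pbase1, however, shows only the class base1 part: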

(gdb) p pbase1
$28 = (base1 *) 0x100700020
(gdb) p *pbase1
$29 = {_vptr.base1 = 0x100001040 , mvalue = 0}
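A thunk is just a small adjust-and-forward function. The sketch below imitates _ZThn16_N7derived4showEv using a hypothetical flat struct derived_layout that places a base1-like part and a base2-like part back to back; thunk_show receives a pointer to the base2 part, subtracts its offset, and calls the real function. Recovering the enclosing object this way is the container_of idiom, which works on mainstream compilers but goes beyond what the C++ standard strictly guarantees.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical flat layout mimicking `class derived : public base1, public base2`.
struct base1_part { const void* vptr; int mvalue; };
struct base2_part { const void* vptr; int mvalue; };
struct derived_layout {
    base1_part b1;  // offset 0: shares the object's start address
    base2_part b2;  // offset 16 on a typical 64-bit target
    int tag;
};

// The "real" override: expects a pointer to the whole object.
int derived_show(derived_layout* self) { return self->tag; }

// What a non-virtual thunk does: `self` points at the base2 part, so step back
// by that part's offset and forward to the real function.
int thunk_show(base2_part* self) {
    char* raw = reinterpret_cast<char*>(self) - offsetof(derived_layout, b2);
    return derived_show(reinterpret_cast<derived_layout*>(raw));
}
```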

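Because the static type of pbase1 is class base1, the compiler treats the memory it addresses as a region of sizeof(class base1) bytes. pbase1->show() then calls derived::show through slot 0 of the vtable, as in the single-inheritance case; no adjustment of this is needed because the base1 subobject starts at the beginning of the object. The instructions after that call are: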

=> 0x0000000100000a55 <+57>:    test   %rax,%rax
   0x0000000100000a58 <+60>:    je     0x100000a69 
   0x0000000100000a5a <+62>:    mov    (%rax),%rdx
   0x0000000100000a5d <+65>:    add    $0x10,%rdx
   0x0000000100000a61 <+69>:    mov    (%rdx),%rdx
   0x0000000100000a64 <+72>:    mov    %rax,%rdi
   0x0000000100000a67 <+75>:    callq  *%rdx
   0x0000000100000a69 <+77>:    movq   $0x0,-0x18(%rbp)

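These lines correspond to delete pbase1; pbase1 = nullptr;. The code first checks whether pbase1 is null; if it is not, it loads the entry at offset 0x10 of the vtable (the third slot, derived's deleting destructor), passes pbase1 as this, and calls it; finally pbase1 is set to 0. The destructor reached this way, dumped below, first calls the complete-object destructor and then, presumably, the sized operator delete, with 0x20 (32 bytes, sizeof(derived)) as the size argument: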

Dump of assembler code for function derived::~derived():
=> 0x000000010000095c <+0>: push   %rbp
   0x000000010000095d <+1>: mov    %rsp,%rbp
   0x0000000100000960 <+4>: sub    $0x10,%rsp
   0x0000000100000964 <+8>: mov    %rdi,-0x8(%rbp)
   0x0000000100000968 <+12>:    mov    -0x8(%rbp),%rax
   0x000000010000096c <+16>:    mov    %rax,%rdi
   0x000000010000096f <+19>:    callq  0x100000b6e
   0x0000000100000974 <+24>:    mov    -0x8(%rbp),%rax
   0x0000000100000978 <+28>:    mov    $0x20,%esi
   0x000000010000097d <+33>:    mov    %rax,%rdi
   0x0000000100000980 <+36>:    callq  0x100000b7a
   0x0000000100000985 <+41>:    leaveq 
   0x0000000100000986 <+42>:    retq   
End of assembler dump.

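I noticed something here: pbase1 holds 0x100700020, and if pbase1 = nullptr is skipped after delete pbase1, the next base2* pbase2 = new derived gets memory at the same address 0x100700020. If pbase1 is then deleted again by mistake, it frees the object that pbase2 is now using, which leads to corrupted memory or a double delete.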
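A smart pointer removes this hazard, because destroying the object and clearing the pointer happen in one step. A sketch with std::unique_ptr, assuming the base1/base2/derived classes from the listing above with their virtual destructors (mvalue dropped for brevity) and a hypothetical counter g_destroyed:

```cpp
#include <cassert>
#include <memory>

class base1 {
public:
    virtual ~base1() = default;
    virtual void show() {}
};

class base2 {
public:
    virtual ~base2() = default;
    virtual void show() {}
};

int g_destroyed = 0;  // how many derived objects have been destroyed

class derived : public base1, public base2 {
public:
    void show() override {}
    ~derived() override { ++g_destroyed; }
};

// reset() destroys the object and nulls the pointer together,
// so no stale address is left behind to be deleted a second time.
bool reset_leaves_null() {
    std::unique_ptr<base1> pbase1 = std::make_unique<derived>();
    pbase1->show();
    pbase1.reset();
    return pbase1 == nullptr;
}
```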

   0x0000000100000a81 <+101>:   callq  0x100000b62
   0x0000000100000a86 <+106>:   test   %rbx,%rbx
   0x0000000100000a89 <+109>:   je     0x100000a91 
   0x0000000100000a8b <+111>:   lea    0x10(%rbx),%rax
   0x0000000100000a8f <+115>:   jmp    0x100000a96 
   0x0000000100000a91 <+117>:   mov    $0x0,%eax
=> 0x0000000100000a96 <+122>:   mov    %rax,-0x20(%rbp)
   0x0000000100000a9a <+126>:   mov    -0x20(%rbp),%rax
   0x0000000100000a9e <+130>:   mov    (%rax),%rax
   0x0000000100000aa1 <+133>:   mov    (%rax),%rdx
   0x0000000100000aa4 <+136>:   mov    -0x20(%rbp),%rax
   0x0000000100000aa8 <+140>:   mov    %rax,%rdi
   0x0000000100000aab <+143>:   callq  *%rdx
(gdb) p pbase2
$35 = (base2 *) 0x100700030
(gdb) p *pbase2
$36 = {_vptr.base2 = 0x100001068 , mvalue = 0}

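The assembly above is for base2* pbase2 = new derived. After new returns the memory, the code checks the returned address: if it is not 0, it adds the offset sizeof(class base1) (0x10) and stores the result in pbase2; otherwise pbase2 gets 0. It then prepares the argument and calls the virtual function. Slot 0 of base2's vtable holds the thunk _ZThn16_N7derived4showEv, so this is moved back by 16 bytes before derived::show runs.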

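The statement base2* pbase2 = new derived is compiled into roughly the following:

derived* temp = new derived;
// the offset is in bytes, so do the arithmetic on char*, not on derived*
base2* pbase2 = temp ? reinterpret_cast<base2*>(reinterpret_cast<char*>(temp) + sizeof(base1)) : nullptr;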
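The adjustment can be checked from C++ itself. This sketch (same three classes, with mvalue made public to keep it short; base2_offset and null_stays_null are hypothetical names) measures the byte distance that the implicit derived* to base2* conversion adds, and confirms that a null pointer stays null, which is why the compiler emits the test/je pair before adding the offset. The distance equaling sizeof(base1) holds for this simple layout on common ABIs, not as a language rule.

```cpp
#include <cassert>
#include <cstddef>

class base1 { public: virtual ~base1() = default; virtual void show() {} int mvalue = 0; };
class base2 { public: virtual ~base2() = default; virtual void show() {} int mvalue = 0; };
class derived : public base1, public base2 { public: void show() override {} };

// Byte distance added by the implicit derived* -> base2* conversion.
std::ptrdiff_t base2_offset(derived* d) {
    base2* pbase2 = d;  // the compiler inserts the pointer adjustment here
    return reinterpret_cast<char*>(pbase2) - reinterpret_cast<char*>(d);
}

// A null derived* must convert to a null base2*, not to null + offset.
bool null_stays_null() {
    derived* temp = nullptr;
    base2* pbase2 = temp;
    return pbase2 == nullptr;
}
```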

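Finally, delete pbase2 also has to get back to the start of the derived object. The call site passes pbase2 unchanged and calls the entry at offset 0x10 of base2's vtable, which is presumably a thunk to derived's deleting destructor that subtracts 16 from this first: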

   0x0000000100000aad <+145>:   mov    -0x20(%rbp),%rax
   0x0000000100000ab1 <+149>:   test   %rax,%rax
   0x0000000100000ab4 <+152>:   je     0x100000ac5 
   0x0000000100000ab6 <+154>:   mov    (%rax),%rdx
   0x0000000100000ab9 <+157>:   add    $0x10,%rdx
   0x0000000100000abd <+161>:   mov    (%rdx),%rdx
   0x0000000100000ac0 <+164>:   mov    %rax,%rdi
   0x0000000100000ac3 <+167>:   callq  *%rdx
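The end result of delete pbase2 can be observed with a class-specific sized operator delete. In this sketch, derived::operator delete(void*, std::size_t) and the globals g_freed_addr and g_freed_size are hypothetical instrumentation. Because the destructor is virtual, the deallocation function receives the address of the complete object (not the value of pbase2) and sizeof(derived).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

class base1 { public: virtual ~base1() = default; virtual void show() {} int mvalue = 0; };
class base2 { public: virtual ~base2() = default; virtual void show() {} int mvalue = 0; };

std::uintptr_t g_freed_addr = 0;  // address handed to operator delete
std::size_t    g_freed_size = 0;  // size handed to operator delete

class derived : public base1, public base2 {
public:
    void show() override {}
    // Class-specific sized deallocation function, used only to record its arguments.
    static void operator delete(void* p, std::size_t size) {
        g_freed_addr = reinterpret_cast<std::uintptr_t>(p);
        g_freed_size = size;
        ::operator delete(p);
    }
};

// Deletes a derived object through a base2* and returns the object's start address,
// captured before the delete so no freed pointer is used afterwards.
std::uintptr_t delete_through_base2() {
    derived* temp = new derived;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(temp);
    base2* pbase2 = temp;  // points at the base2 subobject, not at `start`
    delete pbase2;         // virtual ~base2 leads to derived's deleting destructor
    return start;
}
```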

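The programs above are for testing only. Code in real projects should be more rigorous and follow good habits: a class used as a base class should have a virtual destructor, a pointer should be set to null after delete, and new should be written with std::nothrow, followed by a null check and proper error handling (plain new reports failure by throwing std::bad_alloc rather than returning null).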
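A minimal sketch of those habits, assuming a base class with a virtual destructor; create_and_show is a hypothetical helper that returns -1 when allocation fails:

```cpp
#include <cassert>
#include <new>

class base {
public:
    virtual ~base() = default;
    virtual int show() const { return 0; }
};

class derived : public base {
public:
    int show() const override { return 1; }
};

int create_and_show() {
    base* pbase = new (std::nothrow) derived;  // returns nullptr instead of throwing
    if (pbase == nullptr) {
        return -1;                             // handle allocation failure here
    }
    int result = pbase->show();
    delete pbase;                              // safe: base has a virtual destructor
    pbase = nullptr;                           // do not leave a dangling pointer behind
    return result;
}
```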

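Finally, virtual inheritance is considerably more complicated, and in my experience its use cases are rare; if you are interested, refer to Inside the C++ Object Model.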
