non-virtual thunk for Virtual Function in multiple inheritance

Reposted from

http://thomas-sanchez.net/computer-sciences/2011/08/15/what-every-c-programmer-should-know-the-hard-part/

What every C++ programmer should know, The hard part

Previously, I explained how C++ handles classes and the inheritance between them. However, I did not cover how virtual is handled.

It adds a lot of complexity. C++ is compiled, and when a binary is linked against a library they have to speak the same language: they have to share the same ABI. The C++ creators had to find a way to carry, throughout the program's lifetime, metadata about the classes being manipulated.

They chose the Virtual Tables.

The Virtual Table

When a C++ program is compiled, the binary embeds some information about the classes the program manipulates. When a class inherits from an interface, the actual implementation of each method must always be reachable. The Virtual Tables (VTables) are generated during compilation; they can be seen as arrays of method pointers.

Let’s take an example:

#include <iostream>

struct Interface
{
        Interface() : i(0x424242) {}
        virtual void test_method() = 0;
        virtual ~Interface(){}
        int i;
};

struct Daughter : public Interface
{
        void test_method()
        {
            std::cout << "This is a call to the method" << std::endl;
            std::cout << "This: " << this << std::endl;
        }
};

int main()
{
    Daughter* d = new Daughter;
    Interface* i = d;

    i->test_method();

    // Size of the object, then its first two pointer-sized words.
    std::cout << sizeof(Daughter) << std::endl;
    std::cout << *((void**)i) << std::endl;
    std::cout << ((void**)i)[1] << std::endl;

    delete d;
}

As a reminder, all the tests were run on 64-bit Linux.

The size of a Daughter instance is not 4 bytes, as its single int field might suggest, but 16 bytes. The memory dump shows that the first field of the object is not the value of i but a strange value, and our field comes right after it. This ‘strange’ value is actually a pointer to a location inside our binary.

nm -C test | grep 400d
0000000000400de0 V vtable for Daughter

I will explain later why the two addresses differ by a few bytes. This pointer gives the location of the Daughter VTable, so we can now check its content.

As I said, a VTable is a kind of array of method pointers.

Getting a pointer to it is simply:
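
As a side check on the 16-byte figure, a minimal sketch can compare the same struct with and without a virtual function. The names Plain, WithVirtual and vptr_overhead are invented for this illustration, and the numbers in the comments assume 64-bit Linux with GCC or Clang; other platforms may give different values.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration: same data, with and without virtual.
struct Plain
{
    int i;
};

struct WithVirtual
{
    virtual ~WithVirtual() {}
    int i;
};

// Extra bytes paid for making the type polymorphic.
// On 64-bit Linux this is typically 16 - 4 = 12:
// an 8-byte VTable pointer plus 4 bytes of alignment padding.
std::size_t vptr_overhead()
{
    // A polymorphic object is not expected to be smaller than its plain twin.
    assert(sizeof(WithVirtual) >= sizeof(Plain));
    return sizeof(WithVirtual) - sizeof(Plain);
}
```

Printing vptr_overhead() from main is enough to see how much the hidden pointer costs on a given platform.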

size_t** vtable = *(size_t***)i;
std::cout << vtable[0] << std::endl;

If we look up the new address printed on the output, we can see that it is indeed the address of our method.

nm -C test | grep -E 400c6a
0000000000400c6a W Daughter::test_method()

We can play a little bit more to dig deeper:

typedef void (*VtablePtr) (Daughter*);
VtablePtr ptr = (VtablePtr)vtable[0];
ptr(d);

VTables are built at compile time. When the compiler sees a virtual method in a class, it starts constructing a VTable associated with that class. When another class inherits from it, the VTable is duplicated and the derived class receives a pointer to its own copy. Each entry of the VTable is filled in when the actual definition of the method is encountered, and it is always the last definition (the most derived override) that is kept.

The index of a method in the VTable follows its declaration order in the class, which is why it is very important that every part of a project is compiled against consistent headers. It is always embarrassing when the wrong method gets called in a project and nobody knows why…

Here is the complete code:

#include <cstddef>
#include <iostream>

struct Interface
{
        Interface() : i(0x424242) {}
        virtual void test_method() = 0;
        virtual ~Interface(){}
        int i;
};

struct Daughter : public Interface
{
        void test_method()
        {
            std::cout << "This is a call to the method" << std::endl;
            std::cout << "This: " << this << std::endl;
        }
};

int main()
{
    Daughter* d = new Daughter;
    Interface* i = d;

    i->test_method();

    // Size of the object, then its first two pointer-sized words.
    std::cout << sizeof(Daughter) << std::endl;
    std::cout << *((void**)i) << std::endl;
    std::cout << ((void**)i)[1] << std::endl;

    // The first word of the object is the VTable pointer.
    size_t** vtable = *(size_t***)i;
    std::cout << vtable[0] << std::endl;

    // Call the first VTable entry by hand, passing 'this' explicitly.
    typedef void (*VtablePtr) (Daughter*);
    VtablePtr ptr = (VtablePtr)vtable[0];
    ptr(d);

    delete d;
}

In conclusion, once virtual is involved, an instance should be seen like this:

VPTR
Base1
Daughter

And the instance is heavier by sizeof(void*) * nb_of_vptr bytes (plus any alignment padding).
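
To make the "VPTR first, data after" picture concrete, here is a hand-written emulation of the mechanism using plain structs and function pointers. It is only a sketch of the idea, not what any particular compiler emits: Object, VTable, daughter_vtable, make_daughter and call_test_method are all hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <string>

// Hand-written emulation of a VTable; every name here is hypothetical.
struct Object;

// One slot per virtual method, in declaration order.
struct VTable
{
    std::string (*test_method)(Object*);
};

struct Object
{
    const VTable* vptr; // plays the role of the hidden first field
    int i;              // the user-visible data comes after it
};

// The "override" provided by the derived class; 'this' is passed explicitly.
std::string daughter_test_method(Object* self)
{
    return "Daughter::test_method, i = " + std::to_string(self->i);
}

// One table per class, shared by all instances of that class.
const VTable daughter_vtable = { &daughter_test_method };

// Plays the role of the constructor: it installs the VTable pointer.
Object make_daughter()
{
    Object o = { &daughter_vtable, 0x424242 };
    return o;
}

// Roughly what obj->test_method() turns into:
// load the vptr, load the slot, make an indirect call.
std::string call_test_method(Object* obj)
{
    assert(obj->vptr != nullptr);
    return obj->vptr->test_method(obj);
}
```

The emulation also shows why every instance pays for one pointer while the table itself exists only once per class.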

Virtual in multiple inheritance

As usual, we are going to start with a trivial piece of code:

#include <iostream>

struct Mother
{
        virtual void mother()=0;
        virtual ~Mother() {}
        int i;
};

struct Father
{
        virtual void father()=0;
        virtual ~Father() {}
        int j;
};

struct Daughter : public Mother, public Father
{
        void mother()
        { std::cout << "Mother: " << this << std::endl; }

        void father()
        { std::cout << "Father: " << this << std::endl; }

        int k;
};

int main()
{
    Daughter* d = new Daughter;
    Mother* m = d;
    Father* f = d;

    // The Father pointer does not hold the same address as the Daughter pointer.
    std::cout << "Daughter: " << (void*)d << std::endl;
    std::cout << "Father  : " << (void*)f << std::endl;
    std::cout << sizeof(*d) << std::endl;

    // First word of each subobject: its VTable pointer.
    std::cout << *((void**)d) << std::endl;
    std::cout << *((void**)f) << std::endl;

    delete d;
}

As you can see, the two tables used are different. Code rarely (never?) manipulates the concrete type; it usually goes through one of the abstract ones. With multiple inheritance the object may be handled as a Mother or as a Father, so when a Father is used and the actual implementation is in Daughter, the method must still be reachable. That's why there is a second VTable pointer.

However, when a Daughter instance is used through a Father pointer, a Daughter method cannot be called directly: the instance pointer first needs to be adjusted so that it points to the start of the Daughter object. Thunk functions solve this problem.

If we print the first entry of this second VTable and disassemble the code at that location, we get this:

0000000000400cf4 <non-virtual thunk to Daughter::father()>:
  400cf4:       48 83 ef 10             sub    $0x10,%rdi
  400cf8:       eb 00                   jmp    400cfa

These two instructions adjust the pointer by subtracting the size of the Mother class (so that it matches the Daughter instance) and then jump to the real method. Therefore, with multiple inheritance a virtual call can easily pick up an extra indirection:

  • Get the VTable;
  • Move to the wanted method (apply an offset to the VTable pointer, for example 8 to get the second method);
  • Call the entry found there, which is the thunk;
  • Adjust the this pointer;
  • Jump to the actual method definition.
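
The pointer adjustment performed by the thunk can also be observed from plain C++, since the compiler applies the same offset whenever it converts between Daughter* and Father*. The sketch below reuses the Mother/Father/Daughter hierarchy with silent method bodies; father_offset and adjust_this are hypothetical helper names, and the value 0x10 is what one would expect on 64-bit Linux, not something the language guarantees.

```cpp
#include <cassert>
#include <cstddef>

struct Mother
{
    virtual void mother() = 0;
    virtual ~Mother() {}
    int i;
};

struct Father
{
    virtual void father() = 0;
    virtual ~Father() {}
    int j;
};

struct Daughter : public Mother, public Father
{
    void mother() {}
    void father() {}
    int k;
};

// Distance in bytes between the start of the Daughter object and its
// Father subobject. This is the amount a thunk has to subtract.
std::ptrdiff_t father_offset(Daughter* d)
{
    Father* f = d; // implicit upcast: the compiler adds the offset
    return reinterpret_cast<char*>(f) - reinterpret_cast<char*>(d);
}

// Hand-written equivalent of the thunk's adjustment: Father* back to Daughter*.
Daughter* adjust_this(Father* f)
{
    Daughter* d = static_cast<Daughter*>(f); // the compiler subtracts the offset
    assert(static_cast<Father*>(d) == f);    // converting back gives the same pointer
    return d;
}
```

The difference is that static_cast needs to know the concrete type at the call site, whereas the thunk hides the adjustment behind the VTable entry so that callers holding only a Father* never have to know about Daughter.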

 

Method Pointer

Yes, method pointers have a cost. Unlike C, where function pointers have no overhead, C++ has to deal with two questions:

  • From which instance is the method accessed?
  • Is the method virtual?

The first point requires a pointer adjustment. The second point, well, a lot of things.

First, the size of a method pointer is 16 bytes (against 8 for a function pointer in C). A method pointer is made of three parts:

  1. Offset
  2. Address/index
  3. virtual?

The first one takes 8 bytes, and so does the second. The third part is a single bit merged into the second one: its lowest bit. If that bit is set, the second part is to be read as an index (the position of the method in the VTable, stored with 1 added); otherwise it is the address of the method.

Therefore, calling through a method pointer requires about 20 asm instructions (in the worst case):

  1. Get the offset to apply on the instance pointer;
  2. Apply it;
  3. Check if we call a virtual member function;
  4. If yes, subtract 1;
  5. Get the VTable;
  6. Get the method address;
  7. Call the method.
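
A small sketch can show both the size difference and the fact that the same call syntax covers virtual and non-virtual methods, which is why the test in step 3 is needed. Shape, Triangle, call_through and method_pointer_extra_bytes are hypothetical names for this example; the 16-versus-8 figure is what GCC and Clang give on 64-bit Linux, and other ABIs may differ.

```cpp
#include <cassert>
#include <cstddef>

struct Shape
{
    virtual ~Shape() {}
    virtual int sides() const { return 0; } // virtual: resolved through the VTable
    int id() const { return 42; }           // non-virtual: resolved to a fixed address
};

struct Triangle : public Shape
{
    int sides() const { return 3; }
};

typedef int (Shape::*ShapeMethod)() const;

// Calls through a method pointer. At this call site the compiler cannot know
// whether m is virtual, so it has to emit the adjust/test/dispatch sequence.
int call_through(const Shape& s, ShapeMethod m)
{
    return (s.*m)();
}

// How many bytes a method pointer takes on top of a plain function pointer.
std::size_t method_pointer_extra_bytes()
{
    assert(sizeof(ShapeMethod) >= sizeof(void (*)()));
    return sizeof(ShapeMethod) - sizeof(void (*)());
}
```

Calling call_through(t, &Shape::sides) on a Triangle t dispatches to Triangle::sides, while call_through(t, &Shape::id) goes straight to Shape::id, both through the same code path.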

Conclusion

In a future article I'll cover the VTable prefix and virtual inheritance, but they are less common in C++ code. In these two articles I tried to shed some light on C++'s internal mechanisms. C++ is a fast language, but it can become much less efficient because of complex class relationships. I'm not saying "don't use virtual and method pointers"; I think programmers should simply be aware of these costs.

I think readability is more important than performance. Yes, you can have a lot of overhead in C++, but it will still be more efficient than a lot of languages. Still, sometimes you can avoid virtual dispatch. For example, the common way for a beginner (and sometimes for more experienced C++ programmers) to build an abstraction is to define an interface and then, for each implementation, define a new class which inherits from this interface.

Sometimes that is the right thing to do, sometimes not. If you are asked to write a filesystem abstraction for Linux and Windows and you follow the approach just described, you'll write an iFS interface, a WindowsFS and a LinuxFS. It'll work well, but you can do even better: write a WindowsFS and a LinuxFS, and define a type FS according to the platform the code is compiled for. On Linux we could imagine something like this:

typedef LinuxFS FS;

With code like this, you avoid some of the overhead due to the interface. It works well for abstracting platform-specific features, but it does not work for data abstraction, where you'll still need an interface.
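
A slightly fuller sketch of that idea might look like the following. The method names separator and join, the helper config_path and the use of the _WIN32 macro to pick the platform are assumptions made for this example; only the LinuxFS, WindowsFS and FS names come from the text above.

```cpp
#include <cassert>
#include <string>

// Two unrelated classes exposing the same method names: no common base
// class, no virtual, so no VTable and no indirect call.
struct LinuxFS
{
    char separator() const { return '/'; }
    std::string join(const std::string& a, const std::string& b) const
    { return a + separator() + b; }
};

struct WindowsFS
{
    char separator() const { return '\\'; }
    std::string join(const std::string& a, const std::string& b) const
    { return a + separator() + b; }
};

// The platform is chosen once, at compile time.
#if defined(_WIN32)
typedef WindowsFS FS;
#else
typedef LinuxFS FS;
#endif

// Client code only ever mentions FS; the calls are resolved statically.
std::string config_path(const FS& fs)
{
    std::string path = fs.join("etc", "app.conf");
    assert(path.find(fs.separator()) != std::string::npos);
    return path;
}
```

The price of this approach is that both classes must be kept in sync by hand, since nothing forces them to expose the same methods until the client code fails to compile on one platform.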

