The Pimpl technique is a useful way to minimize coupling, and separate interface and implementation. Here's a way to simplify Pimpl deployment.
By Vladimir Batov
January 25, 2008
URL:http://drdobbs.com/cpp/205918714
The Pimpl idiom is a simple yet robust technique to minimize coupling via the separation of interface and implementation and then implementation hiding. Successfully popularized by Sutter ([1,2]) under the cheeky Pimpl name (as in "pointer to implementation"), the technique had been known long before under various names (the Cheshire Cat in [3], the Handle/Body in [4], the Bridge in [5]) and conceptually deployed in C and C++ at least as far back as the early eighties ([6]).
In the domain of industrial software development, where portability, implementation hiding and minimization of compilation dependencies come to the fore, the Pimpl technique can help a great deal in achieving those goals. Still, it does not seem to be very widely deployed. Sometimes that is due to efficiency considerations (valid or misguided) and probably more often due to the additional implementation scaffolding and coding overhead associated with the technique.
I have been using the technique quite extensively lately and I found myself increasingly frustrated having to waste time cutting and pasting the same Pimpl-enabling set. I Googled around to see if there were easier deployment alternatives available. Not surprisingly, I found numerous Pimpl-related discussions and quite a few attempts to generalize the Pimpl idiom (see [7,8]). However, I was not able to find anything that I thought would fit my objective.
The conventional Pimpl deployment packs quite a punch by tackling two issues at once. The first is the separation of interface and implementation. The second is implementation hiding. The former is the obvious prerequisite for the latter. However, it is the latter that brings in all the benefits like portability and minimization of compilation dependencies.
All the implementations that I looked at managed the separation of interface and implementation quite well. However, the implementation hiding — the main quality and attraction of the Pimpl idiom — was routinely sacrificed in the name of generalization.
In addition, I was looking for a simple and convenient yet flexible and generic deployment technique. The conventional Pimpl deployment technique is uninspiring but already fairly straightforward. The complexity of any alternative technique is to be measured against that original, and simplifying something so straightforward might not be as simple as it seems.
In the end I rolled up my sleeves and came up with my own Pimpl generalization technique which (not surprisingly) I find quite satisfying. It's seemingly complete and broadly applicable, yet minimal, simple and pleasant to use. But I am admittedly biased. You decide for yourself. Read on. Try the code (located here). Let me know if there is anything that I missed or got wrong that could be improved.
Let's say we need to write a Pimpl-based Book class with pointer semantics. That is:
a) We want to separate interface and implementation and to hide implementation through the deployment of the Pimpl idiom.
b) Our Book class needs to have pointer semantics, i.e. with regard to the underlying data it will behave in the smart-pointer/proxy fashion by allowing shared access to the underlying implementation data.
In that setting the Book class public declaration is quite likely to look like the following:
class Book
{
    public:

    Book(string const& title, string const& author);
    string const& title() const;
    string const& author() const;

    bool operator==(Book const& that) const { return impl_ == that.impl_; }
    bool operator!=(Book const& that) const { return !operator==(that); }
    operator bool() const { return impl_; }

    private:

    struct Implementation;
    boost::shared_ptr<Implementation> impl_;
};
Thanks to boost::shared_ptr, applying the Pimpl idiom is fairly straightforward as boost::shared_ptr takes care of much of the scaffolding hassle. As a result, the auto-generated destructor, copy constructor and assignment operator suffice and writing the comparison operators is child's play. What more could we wish for? For one thing, lumping the application interface with the infrastructure scaffolding is messy. Moreover, in our (admittedly simple) Book class more than half of the code is the Pimpl-related scaffolding. For one class in isolation that might not look like that big a deal. On a larger scale, analyzing and maintaining twice as much code, mentally separating the application interface from the infrastructure scaffolding, and making sure nothing is forgotten, misused, or misplaced is a tiring exercise and not exactly fun. The following, therefore, seems like a worthwhile improvement:
struct Book : public pimpl<Book>::pointer_semantics
{
    Book(string const& title, string const& author);
    string const& title() const;
    string const& author() const;
};
It is considerably shorter, application-focused and reasonably self-explanatory. It is lean and mean, consisting of nothing but pure application-specific public interface. It does not even need to be a class as there is nothing to restrict access to.
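To make "pointer semantics" concrete, here is a minimal single-file sketch of the conventional version. It is an illustration under assumptions, not the library code: it uses C++11 std::shared_ptr as a stand-in for boost::shared_ptr, and the price() accessor and the copies_share_data() helper are hypothetical names added only to make the sharing visible.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Interface part (would normally live in a header). Users see only a
// forward declaration of Implementation.
class Book {
public:
    Book(std::string const& title, std::string const& author);
    std::string const& title() const;
    int price() const;              // hypothetical accessor for the demo
    void set_price(int new_price);

    bool operator==(Book const& that) const { return impl_ == that.impl_; }
    bool operator!=(Book const& that) const { return !operator==(that); }
    explicit operator bool() const { return static_cast<bool>(impl_); }

private:
    struct Implementation;
    std::shared_ptr<Implementation> impl_;  // stand-in for boost::shared_ptr
};

// Implementation part (would normally live in a .cpp file).
struct Book::Implementation {
    Implementation(std::string const& t, std::string const& a)
        : title(t), author(a), price(0) {}
    std::string title;
    std::string author;
    int price;
};

Book::Book(std::string const& title, std::string const& author)
    : impl_(std::make_shared<Implementation>(title, author)) {}

std::string const& Book::title() const { return impl_->title; }
int Book::price() const { return impl_->price; }
void Book::set_price(int new_price) { impl_->price = new_price; }

// A copy refers to the same underlying data as the original, so a
// change made through the copy is visible through the original.
bool copies_share_data() {
    Book original("Dune", "Frank Herbert");
    Book copy = original;
    copy.set_price(10);
    return original == copy && original.price() == 10;
}
```

The auto-generated copy constructor is enough here because copying the shared pointer is exactly the behavior wanted.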
Probably due to the specificity of my task (and/or my programming style) I have been using Pimpl-based classes with pointer semantics (as in the example above) almost exclusively. However, it is certainly not always the right solution. Let's say we needed a Pimpl-based Book class but with value semantics instead. That is, we still want to have interface and implementation properly separated to hide implementation, but our Book class needs to be a class with every Book instance having and managing its own internal data (rather than sharing).
If boost::shared_ptr is an indisputable favorite for Pimpls with pointer semantics, its deployment for Pimpls with value semantics is certain to cause a debate about efficiency, associated overhead, etc. However, writing and getting right a "raw" Pimpl-based class is certainly more involved and possibly more challenging with all things considered. The corresponding declaration might look as follows:
class Book
{
    public:

    Book(string const& title, string const& author);
    string const& title() const;
    string const& author() const;

    bool operator==(Book const&) const;
    bool operator!=(Book const&) const;

    Book(Book const&);
    Book& operator=(Book const&);
    ~Book();

    private:

    struct Implementation;
    Implementation* impl_;
};
Again, the interface can be improved upon and shrunk to:
struct Book : public pimpl<Book>::value_semantics
{
    Book(string const& title, string const& author);
    string const& title() const;
    string const& author() const;
    bool operator==(Book const&) const;
};
It is still almost three times shorter and it consists of pure application-related public interface. Clean, minimal, elegant.
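For comparison, the "raw" value-semantics version has to supply the copy constructor, assignment operator and destructor by hand. The following is a minimal sketch of that scaffolding, assuming a deep copy of the implementation on every copy; the price() accessor and the copies_are_independent() helper are hypothetical and exist only for the demonstration.

```cpp
#include <cassert>
#include <string>

class Book {
public:
    Book(std::string const& title, std::string const& author);
    Book(Book const& that);
    Book& operator=(Book const& that);
    ~Book();

    int price() const;              // hypothetical accessor for the demo
    void set_price(int new_price);
    bool operator==(Book const& that) const;
    bool operator!=(Book const& that) const { return !operator==(that); }

private:
    struct Implementation;
    Implementation* impl_;
};

struct Book::Implementation {
    Implementation(std::string const& t, std::string const& a)
        : title(t), author(a), price(0) {}
    std::string title;
    std::string author;
    int price;
};

Book::Book(std::string const& title, std::string const& author)
    : impl_(new Implementation(title, author)) {}

// Every Book owns its data, so copying means copying the data too.
Book::Book(Book const& that) : impl_(new Implementation(*that.impl_)) {}

Book& Book::operator=(Book const& that) {
    if (this != &that) {
        Implementation* fresh = new Implementation(*that.impl_);
        delete impl_;   // release the old data only after the copy succeeded
        impl_ = fresh;
    }
    return *this;
}

Book::~Book() { delete impl_; }

int Book::price() const { return impl_->price; }
void Book::set_price(int new_price) { impl_->price = new_price; }

// Equality is defined by the application: here, title and author.
bool Book::operator==(Book const& that) const {
    return impl_->title == that.impl_->title &&
           impl_->author == that.impl_->author;
}

// A copy gets its own data; changing it leaves the original untouched.
bool copies_are_independent() {
    Book original("Dune", "Frank Herbert");
    Book copy = original;
    copy.set_price(10);
    return original == copy && original.price() == 0 && copy.price() == 10;
}
```

Everything between the constructor and the destructor above is the repeated scaffolding that the value_semantics base is meant to absorb.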
Both presented pimpl-based declarations (with pointer and value semantics) look almost identical and internal implementations (as we'll see later) are as close. A notable difference is that for value-semantics classes the comparison operators are not freebies as with pointer-semantics classes. Well, the comparison operators are never freebies (they are never auto-generated). However, due to the specificity of classes with pointer semantics those comparison operators can be reduced to pointer comparisons and generalized. Clearly, that's not true with value-semantics classes. If such a class needs to be comparable, we have to write those comparison operators ourselves. That is, for value-semantics classes the comparison operators become part of the user-provided interface. Still, pimpl<> tries to help and adds bool operator!=(T const& that) const when bool operator==(T const& that) const is supplied.
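One common way a base class can supply operator!= in terms of a user-written operator== is the curiously recurring template pattern. The sketch below shows that general idea only; adds_not_equal and Isbn are hypothetical names, and nothing here is claimed to match how pimpl<> does it internally.

```cpp
#include <cassert>

// Hypothetical helper: derives operator!= from the operator== that the
// derived class T supplies.
template <typename T>
struct adds_not_equal {
    bool operator!=(T const& that) const {
        return !(static_cast<T const&>(*this) == that);
    }
};

// A user class writes only operator==.
struct Isbn : public adds_not_equal<Isbn> {
    explicit Isbn(long v) : value(v) {}
    bool operator==(Isbn const& that) const { return value == that.value; }
    long value;
};

bool not_equal_is_derived() {
    Isbn a(1), b(1), c(2);
    return !(a != b) && (a != c);
}
```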
So far our public for-all-to-see interface looks clean, minimal, and application-related. Our Book class users and those who are to maintain the code later are likely to appreciate that. Let's have a look at the implementation.
Safely tucked away, the implementation is hidden. Not merely separated from the interface into another class or a header file, but truly internal. It is all ours to implement as we wish, to optimize as we like, to modify as often as we need. Still, the deployment pattern is easy to remember and fairly straightforward to follow. Something like the following will be in some for-our-eyes-only book_implementation.cpp file:
template<> struct pimpl<Book>::implementation
{
    implementation(string const& the_title, string const& the_author)
    :
        title(the_title), author(the_author)
    {}

    string title;
    string author;
    int price;
};

Book::Book(string const& title, string const& author)
:
    base(title, author)
{}

string const& Book::author() const
{
    implementation const& impl = **this;
    return impl.author;
}

void Book::set_price(int new_price)
{
    (*this)->price = new_price;
}
In addition, if comparison functionality is required, as mentioned earlier, a class with value semantics will have to implement something like the following:
bool Book::operator==(Book const& that) const
{
    implementation const& self = **this;
    implementation const& other = *that;
    return self.title == other.title && self.author == other.author;
}
Notably, pimpl<Book>::implementation is again a struct rather than a class. As long as the declaration is local to one file, there is generally little value in making it a class (your mileage may vary).
Another liberating and unifying feature is that we do not need to follow (and fight over) a particular naming convention to draw attention to member variables (like the trailing underscore, the 'm_' prefix or the myriad others). Member variables are accessed and clearly identified as impl.title or (*this)->title or something of that sort.
An important design-related point to note is that the external Book class describes and implements the behavior, while the internal pimpl<Book>::implementation is all about data. I consider that clean separation of data and behavior to be a good code-management technique and good programming style. Data and behavior are different views of a system. They serve different purposes and are easier to manage when kept separate. At this point OO fans should not be getting up in arms about that perceived attempt to pull data and behavior apart. Indeed, the association of data with behavior is the cornerstone of the OO programming paradigm. However, in all the languages that I know of, that association is done in the most direct and economical way: by tying data and behavior together in a class. Straightforward and good for many applications, that kind of data-behavior association is not exactly ideal for implementation hiding purposes. The Pimpl idiom creates data-behavior association in a different way that better suits our implementation-hiding purpose.
Book::null()
Most often Pimpl implementations ultimately boil down to an opaque pointer to the internal implementation data. That data is allocated on the heap and has to be managed. The family of smart-pointer classes (like std::auto_ptr, boost::shared_ptr and the like) takes good care of objects after they are created. Our technique takes it one step further by fully automating memory management with better encapsulated internal-data management and less room for user error. For our Book class, instead of the more conventional:
Book::Book(string const& title, string const& author) : base(new pimpl<Book>::implementation(title, author)) {}
we simply write:
Book::Book(string const& title, string const& author) : base(title, author) {}
All arguments passed to the base will be diligently forwarded to the matching pimpl<Book>::implementation constructor, or the code will fail to compile if a suitable constructor is not found. The base is an actual ready-to-go convenience typedef to simplify references to the base class. That forwarding mechanism works for the constructor with no parameters as well. That is,
Book::Book() : base() {}
or the same but not as explicit
Book::Book() {}
will try to call pimpl<Book>::implementation::implementation() and fail to compile if there is no such constructor.
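The forwarding idea can be sketched with a C++11 variadic constructor template. This is an assumption-laden illustration: pimpl_sketch is a hypothetical stand-in, the original library predates variadic templates and is built differently, and std::shared_ptr replaces boost::shared_ptr. Only the shape (a base typedef, a forwarding constructor, a protected operator*) follows the text above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for pimpl<>; not the article's implementation.
template <typename T>
struct pimpl_sketch {
    struct implementation;  // defined separately for each T

    class pointer_semantics {
    protected:
        typedef pointer_semantics base;  // convenience name for derived classes

        // Forward whatever the derived class passes on to the matching
        // implementation constructor. If none matches, compilation fails.
        template <typename... Args>
        explicit pointer_semantics(Args&&... args)
            : impl_(std::make_shared<implementation>(
                  std::forward<Args>(args)...)) {}

        // Protected: only the derived interface class can reach the data.
        implementation& operator*() const { return *impl_; }

    private:
        std::shared_ptr<implementation> impl_;
    };
};

// Interface (would live in a header).
struct Book : public pimpl_sketch<Book>::pointer_semantics {
    Book(std::string const& title, std::string const& author);
    std::string const& title() const;
    std::string const& author() const;
};

// Implementation (would live in a .cpp file).
template <>
struct pimpl_sketch<Book>::implementation {
    implementation(std::string const& the_title, std::string const& the_author)
        : title(the_title), author(the_author) {}
    std::string title;
    std::string author;
};

Book::Book(std::string const& title, std::string const& author)
    : base(title, author) {}  // no explicit "new" in user code

std::string const& Book::title() const {
    auto const& impl = **this;
    return impl.title;
}

std::string const& Book::author() const {
    auto const& impl = **this;
    return impl.author;
}

void demo() {
    Book book("Dune", "Frank Herbert");
    assert(book.title() == "Dune");
    assert(book.author() == "Frank Herbert");
}
```

The design point is that the interface class never names new or a smart pointer; it only lists constructor arguments, and a mismatch is caught at compile time.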
Here it distinctly differs from the conventional approach (deployed by the smart-pointer family) where an implementation object is created manually and explicitly and then again manually associated with the interface object. The pimpl's approach demonstrates a considerably stronger (and automatically managed) association between the public pimpl-derived class (the interface) and its internal implementation. Hence, the default behavior is that there is always implementation data behind every interface object. To override this default behavior we might write something like:
Book::Book() : base(null())
{
    // an invalid Book object is created
    // that does not have data behind it
}

void Book::do_something()
{
    if (!*this)
    {
        // implementation is created only when needed.
        implementation* impl = new implementation(...);
        this->reset(impl);
    }
    // do actual processing
    ...
}
What happens here is that we explicitly (via null()) instruct the underlying pimpl base to be created empty/invalid (like the NULL pointer or an empty boost::shared_ptr()). Later we create an implementation object explicitly and assign the base to manage it. That technique is useful for lazy instantiation optimization (as in the example above) or to support dynamic polymorphism that is discussed later.
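Lazy instantiation itself does not depend on the pimpl<> base. As a minimal sketch of the same pattern with a plain C++11 std::shared_ptr, the hypothetical Report class below starts with no data behind it and creates its implementation on first use; all names in it are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

class Report {
public:
    Report() {}                     // starts with no data behind it
    void add_line(std::string const& line);
    std::size_t line_count() const;
    bool has_data() const { return static_cast<bool>(impl_); }

private:
    struct Implementation;
    std::shared_ptr<Implementation> impl_;
};

struct Report::Implementation {
    std::vector<std::string> lines;
};

void Report::add_line(std::string const& line) {
    if (!impl_) {
        // The implementation is created only when it is first needed.
        impl_.reset(new Implementation());
    }
    impl_->lines.push_back(line);
}

std::size_t Report::line_count() const {
    return impl_ ? impl_->lines.size() : 0;
}

void demo() {
    Report report;
    assert(!report.has_data());     // nothing allocated yet
    assert(report.line_count() == 0);
    report.add_line("first");
    assert(report.has_data());
    assert(report.line_count() == 1);
}
```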
Above we used null() to create an invalid Book object with no internal data:
Book::Book() : base(null()) {}
We might use such an invalid Book object to indicate a no-book condition in the same fashion as the NULL pointer is used:
Book find_book()
{
    ...
    // found nothing, return an invalid Book
    return Book();
}
...
Book book = find_book();
if (!book) report book-not-found;
Well, there is no need to write code constructing such an invalid object. All pimpl-based classes already have this: the null() mentioned earlier. Fully qualified for our Book example, it is Book::null(). Consequently, the code above is most likely to look as follows:
Book find_book()
{
    ...
    // found nothing, return an invalid Book
    return Book::null();
}
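The calling side of that pattern can be sketched without the pimpl<> base. The version below is a simplified illustration under assumptions: it builds null() from a private tag constructor (a hypothetical mechanism, not the library's), uses C++11 std::shared_ptr, and gives find_book() a hypothetical shelf parameter so that the example is self-contained.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

class Book {
public:
    Book(std::string const& title, std::string const& author);

    // An invalid Book with no data behind it, analogous to a null pointer.
    static Book null() { return Book(no_data()); }

    explicit operator bool() const { return static_cast<bool>(impl_); }
    std::string const& author() const;

private:
    struct no_data {};              // hypothetical tag type
    explicit Book(no_data) {}       // leaves impl_ empty

    struct Implementation;
    std::shared_ptr<Implementation> impl_;
};

struct Book::Implementation {
    Implementation(std::string const& t, std::string const& a)
        : title(t), author(a) {}
    std::string title;
    std::string author;
};

Book::Book(std::string const& title, std::string const& author)
    : impl_(std::make_shared<Implementation>(title, author)) {}

std::string const& Book::author() const { return impl_->author; }

// Hypothetical lookup: returns Book::null() when nothing is found.
Book find_book(std::map<std::string, Book> const& shelf,
               std::string const& title) {
    std::map<std::string, Book>::const_iterator it = shelf.find(title);
    if (it == shelf.end()) {
        return Book::null();
    }
    return it->second;
}

void demo() {
    std::map<std::string, Book> shelf;
    shelf.emplace("Dune", Book("Dune", "Frank Herbert"));

    Book found = find_book(shelf, "Dune");
    Book missing = find_book(shelf, "No Such Title");

    assert(found && found.author() == "Frank Herbert");
    assert(!missing);               // the no-book condition
}
```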
Pimpl (especially its variation with pointer semantics) might well be classified as yet another deployment of the smart-pointer idiom. However, the similarity with boost::shared_ptr and the like does not go far. Pimpl's primary goal is implementation hiding. For Pimpl the smart-pointer behavior is secondary and somewhat incidental rather than the primary design objective (as for boost::shared_ptr, std::auto_ptr, etc.). Due to a different design goal, Pimpl possesses a far stronger association (and deliberate coupling) between the external interface and internal implementation classes. Moreover, it does not provide the dereferencing functionality (something expected from purpose-built smart pointers). Whatever there might be in the internal implementation of a Pimpl-based class, it is unreasonable to expect (and incorrect to provide) public access to that implementation via operator->(). In fact, it is outright impossible for a properly implemented Pimpl-based class. After all, the Pimpl idiom is about implementation hiding and it is not called hiding for nothing.
The application of the Pimpl idiom to polymorphic class hierarchies is well described in the "Bridge" section of [5]. In a nutshell, as Pimpl splits one class into two distinct classes (interface and implementation), the same goes for hierarchies of classes. With the Pimpl idiom applied a class hierarchy is split into two class hierarchies, with one hierarchy for interfaces and the other separate hierarchy for implementations. For example,
struct Widget : public pimpl<Widget>::pointer_semantics
{
    Widget(parameters);
    virtual ~Widget();
    ...
};

struct Button : public Widget
{
    Button(parameters);
    ...
};

struct PushButton : public Button
{
    PushButton(parameters);
    ...
};
And the implementation hierarchy might look as follows:
typedef pimpl<Widget>::implementation WidgetImpl;

template<> struct pimpl<Widget>::implementation
{
    implementation(parameters) {...}
    virtual ~implementation() {...}
    ...
};

struct ButtonImpl : public WidgetImpl
{
    ButtonImpl(parameters) : WidgetImpl(parameters) {...}
    ...
};

struct PushButtonImpl : public ButtonImpl
{
    PushButtonImpl(parameters) : ButtonImpl(parameters) {...}
    ...
};
So far, building two separate class hierarchies (one for interfaces, one for implementations) looks like nothing out of the ordinary. However, it gets more interesting when we need to establish correct interface-implementation associations. The standard behavior is that the Pimpl-based infrastructure automatically creates those associations between a Foo interface class and a pimpl<Foo>::implementation implementation class. That works well for the Widget class in our example above, as an instance of pimpl<Widget>::implementation is automatically created and associated with an instance of the Widget interface class. That is what is needed. Therefore, Widget constructors still look familiar:
Widget::Widget(parameters) : base(parameters)
{
    ...
}
For the derived classes, though, the situation gets somewhat more involved as doing something like:
Button::Button(parameters) : Widget(parameters) { ... }
will result in pimpl<Widget>::implementation internally created and associated with an instance of Button when we actually needed ButtonImpl. Given that the automatically created interface-implementation association is not good for run-time polymorphic classes, we have to manage those associations ourselves:
Button::Button(parameters) : Widget(null<Widget>())
{
    reset(new ButtonImpl(parameters));
    ...
}

PushButton::PushButton(parameters) : Button(null<Button>())
{
    reset(new PushButtonImpl(parameters));
    ...
}
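The same three steps can be tried out in a self-contained form. The sketch below is a simplification under stated assumptions: it does not use the pimpl<> base, the null_tag constructor is a hypothetical substitute for null<>(), std::shared_ptr stands in for the library's internal pointer, and the name() member is invented so that the dispatch is observable.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Implementation hierarchy (would be hidden in implementation files).
class WidgetImpl {
public:
    virtual ~WidgetImpl() {}
    virtual std::string name() const { return "widget"; }
};

class ButtonImpl : public WidgetImpl {
public:
    std::string name() const override { return "button"; }
};

// Interface hierarchy.
class Widget {
public:
    // By default a Widget gets a WidgetImpl behind it.
    Widget() : impl_(std::make_shared<WidgetImpl>()) {}
    virtual ~Widget() {}
    std::string name() const { return impl_->name(); }

protected:
    struct null_tag {};             // hypothetical substitute for null<>()
    explicit Widget(null_tag) {}    // step 1: no implementation yet
    void reset(WidgetImpl* impl) { impl_.reset(impl); }  // step 3

private:
    std::shared_ptr<WidgetImpl> impl_;
};

class Button : public Widget {
public:
    Button() : Widget(null_tag()) {
        reset(new ButtonImpl());    // step 2: create the right implementation
    }
};

void demo() {
    Widget widget;
    Button button;
    Widget& as_widget = button;
    assert(widget.name() == "widget");
    assert(button.name() == "button");
    assert(as_widget.name() == "button");  // dispatch goes through ButtonImpl
}
```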
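Above, we create the base with no implementation behind it via null<>(), explicitly create the correct implementation object with new, and hand that object over to the base to manage with reset().
Having been using the described Pimpl quite extensively lately, I could not help noticing that deploying other programming techniques was easy with the Pimpl. We have already talked about lazy data instantiation. Other examples might include deploying the Singleton: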
// declaration
struct Foo : public pimpl<Foo>::pointer_semantics
{
    // The public "constructor".
    // Does not create new data but returns
    // a reference to the singleton instance.
    Foo();

    private:

    // Actual constructor.
    Foo(parameters);
};

// implementation
Foo::Foo() : base(null())
{
    static Foo single_foo(parameters);
    *this = single_foo;
}
or managing/accessing a dictionary:
// In the implementation file
typedef std::map<string, Book> AllBooks;
static AllBooks books;

Book::Book(string const& title) : base(null())
{
    AllBooks::iterator it = books.find(title);
    // If the title found, return it.
    // Otherwise, return an invalid book.
    if (it != books.end()) *this = it->second;
}
or easy integration with boost::serialization and many other applications of the described Pimpl.
Writing a conventional Pimpl-based class is not hard. However, repeating the same scaffolding over and over again is tedious and error-prone. Why do that if we do not have to? The suggested Pimpl generalization technique seems flexible, minimal, elegant, and helpful. It is yet another small gadget in your programming toolbox to make your work fun. Grab it, use it, tell me if anything is missing or wrong, and together we will make it even better.
Many thanks to the people on the Boost developers mailing list for their constructive suggestions and especially to Peter Dimov for his incomplete-type management technique and the implementation of boost::impl_ptr ([10]), whose ideas I used for pimpl::impl_ptr.
1. Guru of the Week #24. http://www.gotw.ca/gotw/024.htm
2. Herb Sutter. Exceptional C++ (Addison-Wesley, 1999)
3. J. Carolan. Constructing bullet-proof classes. In Proceedings C++ at Work'89 (SIGS Publications, 1989)
4. James O. Coplien. Advanced C++ Programming Styles and Idioms (Addison-Wesley, 1992)
5. Erich Gamma et al. Design Patterns (Addison-Wesley, 1995)
6. Paul J. Asente & Ralph R. Swick. X Window System Toolkit (Butterworth-Heinemann, 1985)
7. Peter Kümmel. The Loki library. http://loki-lib.sourceforge.net/index.php?n=Idioms.Pimpl
8. Asger Mangaard. http://article.gmane.org/gmane.comp.lib.boost.devel/132547
9. Boost File Vault. http://www.boost-consulting.com/vault/index.php
10. Peter Dimov. The boost::impl_ptr source code. http://tech.groups.yahoo.com/group/boost/files/impl_ptr/