Problem with pimpls

That's "pimpl", not "pimple" - I'm talking about opaque pointers, these beauties help protect your public interfaces from changing implementation. This useful technique has a few drawbacks that aren't so well publicised. In order to prevent others making the same mistakes I have, I thought I'd outline the general use of the pimpl pointer, and some of it's drawbacks:

Here's an example of a first-pass class to encapsulate a user account (I'm making this up on the fly, so bear with me):


class
userAccount

{
public:
// public methods go here
private:
// private data members:
unsigned int accountId;
std::string username;
std::string realName;
};

This will work just fine, but there's a problem. If you need to change the implementation (say you want to store the user's real name in two fields instead of one), unless you're very careful you will end up changing the size and / or the declaration of the class.

Changing the size of the class is a big problem if you're trying to maintain binary compatibility. Changing the declaration of the class is a problem because (some) compilers will now recompile every file that includes your changed header file, even if the changes make no difference to the binary output.

The solution comes in the form of an opaque pointer, or "pointer to implementation" (which is where we get the charming "pimpl" name from). The idea is that the implementation details are put in a separate class that is forward declared in the header file, and fully declared in the cpp file. Your external interface now only contains a single pointer - you can change the size of your implementation class to your hearts content, and you will never change the size or declaration of your external interface. The class above refactored to use a pimpl pointer looks like this:


Header File:

// forward declare implementation class:
class
userCountImpl;


class
userAccount

{
public:
userAccount();
// public methods go here
private:
// private data members:
userAccountImpl *pimpl;
};

Source File:

class
userAccountImpl

{
public:
unsigned int accountId;
std::string username;
std::string realName;
};


userAccount::userAccount()
: pimpl(new userAccountImpl)
{
}


Now any data members can be accesed via the private implementation class. There are several things you can do to extend the example above (using a shard_ptr is a start), but I want to keep things simple for the sake of the example.

Until recently, I took this method for granted and used it as often as I could. As so often happens when I learn something new I rush to use it in every possible situation, including ones where it doesn't make sense. The pimpl idiom has a few problems associated with it, which I will outline here:

  1. Your object's memory footprint is now split into more than one place in memory. This may not be a huge problem for 90% of classes, but consider a small utility class that contains only standard data types:

    class colour
    {
    public:
    // public methods go here
    private:
    unsigned char red;
    unsigned char green;
    unsigned char blue;
    };

    In order to serialise this into a buffer (like a file), you can get away with using a memcpy or similar technique. Since your object's memory footprint is contiguos, copying the entire object into a file is simple (Yes yes, I know: there are many reasons why this isn't a good idea, but let's face it - this happens all the time). Once you start using pimpl pointers it gets a bit more difficult. Since your implementation class is private to the .cpp file, the code to do the copying needs to be in the same cpp file (otherwise it doesn't have access to the definition of the implementation class). This is relatively easy to work around, but the trouble doesn't stop there - consider what you need to do to un-serialise an object from a buffer. You can no longer be cheeky and use a reinterpret-cast like so:
    colour *pCol = (colour*) pBuffer;
    Again - I realise that this isn't the best idea in the world, but when you're programming with constraints sometimes this is the best way to do things.


  2. The default new and delete operators are expensive. I never realized just how expensive they can be. In a recent bout of performance testing on some real-time software I saw the default new operator take 55ms to allocate a block of 4B of memory. That's way too slow for the real-time application i was working on, and may be too slow for other applications as well. What's more, the times get worse the more memory you allocate - so using the pimpl pointer may not be a good idea at all, since it adds the overhead of a new call to each constructor, and a delete to each destructor. ouch!


If you're looking for more info on the pimpl idiom, Sutter wrote a good article on the pimpl pointer, and a more recent article that talks about some of the performance issues associated with pimpl pointers. This technique is worth using - it can save many attacks of "code cheese" in the future, just be careful where you us it, or you may end up with some nasty performance issues you didn't expect!