WNDPROC Thunks

Abstract

If you have ever programmed classes for controlling Windows™ windows, you will have met the task of associating window messages with the correct instance of the class. There are a few alternative ways of doing so. This document outlines one method, using self-modifying code, which is fast, efficient, reliable and unlikely to interfere with other code (including code from other users of the same window). Most importantly, in my opinion, it’s a technique that largely stays out of your way once it is in place.

WNDPROC Recap

Windows messages are delivered to a user-defined callback function that matches the type ::WNDPROC.

::WNDPROC is defined as:

typedef LRESULT (CALLBACK* WNDPROC)(HWND, UINT, WPARAM, LPARAM);

From now on I shall prefix the types and functions defined by Windows™ with the global scope resolution operator ::, to distinguish them from those created for the sake of this document. (This assumes that you haven’t wrapped your Windows header includes in a namespace.)

Examination of ::WNDPROC’s declaration

CALLBACK is a macro defined thus:

#define CALLBACK __stdcall

So functions matching the ::WNDPROC type must use the __stdcall calling convention. We’ll examine the consequences of this later.

I shall now summarise the types used by the function. The declarations I give here are not the same as those found in the headers: to ease discussion, I have done some of the work the compiler does when it sees a macro or typedef, collapsing several levels of indirection into a single typedef. The actual definitions can easily be found in the Windows headers.

Types used in ::WNDPROC

Type        Conventional name   Definition*                     Purpose                    Size
::LRESULT   (n/a)               typedef long LRESULT;           Return value               32 bits
::HWND      hWnd                Handle†                         Handle to the window       32 bits
::UINT      uMsg                typedef unsigned int UINT;      Identifies the message     32 bits
::WPARAM    wParam              typedef unsigned int WPARAM;    First message parameter    32 bits
::LPARAM    lParam              typedef long LPARAM;            Second message parameter   32 bits

Notes

* These definitions match a particular compiler’s choices for various implementation-defined features (most notably the sizes of the built-in C++ types) to the binary model on which Windows™ runs. Your headers may spell them differently; the important thing is that, on the 32-bit Windows this article targets, they are all 32 bits in size.

† The definition of a handle depends on whether or not you compile with STRICT defined. If you don’t, the definition is:

typedef void* HWND;

If you do, it is:

struct HWND__ { int unused; }; typedef HWND__* HWND;

As you can probably guess, this isn’t really what a window handle points to. The struct HWND__ exists purely so that a window handle cannot be passed where a different handle type is expected (or vice versa) without a cast (more specifically, without a C-style cast or a reinterpret_cast). As such, STRICT makes certain misuses of window handles (and other handles) more likely to be caught at compile time.
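To see the effect of STRICT without pulling in <windows.h>, here is a minimal portable sketch that mimics the pattern: each handle type becomes a pointer to its own dummy struct. MyHWND, MyHMENU, the *_tag structs and forceToWindow are hypothetical stand-ins, not the real Windows declarations (the real headers generate theirs with the DECLARE_HANDLE macro).

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins following the STRICT pattern: every handle
// type is a pointer to its own otherwise-unused struct.
struct MyHWND_tag  { int unused; };
struct MyHMENU_tag { int unused; };
typedef MyHWND_tag*  MyHWND;
typedef MyHMENU_tag* MyHMENU;

// Without STRICT, every handle is simply void*.
typedef void* LooseHWND;
typedef void* LooseHMENU;

// With the STRICT-style definitions a menu handle will not silently
// convert into a window handle, or vice versa...
static_assert(!std::is_convertible<MyHMENU, MyHWND>::value,
              "distinct handle types must not convert implicitly");
static_assert(!std::is_convertible<MyHWND, MyHMENU>::value,
              "distinct handle types must not convert implicitly");
// ...whereas the non-STRICT versions are literally the same type.
static_assert(std::is_same<LooseHWND, LooseHMENU>::value,
              "non-STRICT handles are interchangeable");

// An explicit cast still gets through: STRICT catches accidents,
// it does not forbid deliberate conversions.
inline MyHWND forceToWindow(MyHMENU m){ return reinterpret_cast<MyHWND>(m); }
```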

Use of WNDPROC

Every window has an associated window procedure (a function that matches the declaration of ::WNDPROC) defined in the ::WNDCLASS or ::WNDCLASSEX structure that defines its features. The address of this function can be obtained or changed using ::GetWindowLong or ::SetWindowLong with a value of GWL_WNDPROC for the second parameter.

A window procedure will essentially be a large switch statement (all right-thinking C++ hackers will shudder at the phrase “large switch statement”) which determines what to do based on the value of the uMsg parameter. It may or may not use the wParam and lParam parameters, in a variety of ways (they get cast and converted all over the place), in deciding how to fulfil the request. It will often need to use the hWnd parameter, directly or indirectly, in doing so.

As such ::WNDPROC is a simple interface which allows one function to act as the gateway between the OS and your code. While it is often wasteful (the wParam and lParam parameters are often unused, or used to pass 16-bit or smaller values), it is easier to implement one function than two hundred. This is especially true since the function can delegate either to another ::WNDPROC (in particular, if you are subclassing a window you will delegate to the window’s original procedure, normally via ::CallWindowProc) or to the system-provided ::DefWindowProc, which carries out default actions for many messages. (Such delegation is the normal job of your default: label, catching any messages you didn’t implement, or even messages that didn’t exist when your code was written.)
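Here is a portable sketch of that shape: a switch on uMsg, a couple of handled cases, and a default: that delegates. So that it compiles anywhere, it uses hypothetical Demo* stand-ins for the Windows types and for ::DefWindowProc; the two message values happen to match WM_CLOSE and WM_USER in the real headers, but nothing else about the stubs is meant to mirror Windows behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable stand-ins for the Win32 types.
typedef void*          DemoHWND;
typedef unsigned int   DemoUINT;
typedef std::uintptr_t DemoWPARAM;
typedef std::intptr_t  DemoLPARAM;
typedef std::intptr_t  DemoLRESULT;

// Hypothetical message identifiers (same values as WM_CLOSE and WM_USER).
const DemoUINT DEMO_CLOSE = 0x0010;
const DemoUINT DEMO_USER  = 0x0400;

// Stand-in for ::DefWindowProc: simply reports that it was reached.
DemoLRESULT DemoDefWindowProc(DemoHWND, DemoUINT, DemoWPARAM, DemoLPARAM){
    return -1;
}

static int g_closeCount = 0;

DemoLRESULT DemoWindowProc(DemoHWND hWnd, DemoUINT uMsg, DemoWPARAM wParam, DemoLPARAM lParam){
    switch (uMsg){
    case DEMO_CLOSE:
        ++g_closeCount;                 // handled entirely by us
        return 0;
    case DEMO_USER:
        // wParam is reinterpreted per message; here it carries a small number.
        return static_cast<DemoLRESULT>(wParam) * 2;
    default:
        // Everything else, including messages invented after this code
        // was written, goes to the default procedure.
        return DemoDefWindowProc(hWnd, uMsg, wParam, lParam);
    }
}
```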

WNDPROC and classes

The natural way of programming a window in an object-orientated language like C++ is to have the ::WNDPROC call a member function of the object that represents the window. This member function would then act on the message, most likely by calling further functions that more naturally represent what the message requests.

The problem with this is how to locate the object. ::WNDPROC is not a member-function type: a function matching it can be a global or namespace-scoped function, or a static member function, but not an ordinary (non-static) member function. There is no direct way to get from the function to the object.

Our best clue is the hWnd parameter, which gives us the handle of the window in question. This gives us a few options:

  1. Only have one window of that type, and one object. The object can be stored in a global or namespace-scoped variable, or indeed you can avoid object-orientated techniques altogether. This method works really well when it suits the application, but is simply inapplicable much of the time.
  2. Make use of the extra memory that Windows lets you set aside for custom data with each window, and then use ::GetWindowLong to obtain a pointer to the object. This is rather slow, more fiddly than one might like, and runs the risk of someone else using the same memory for something else.
  3. Add a property to the window which is a pointer to the object, and then use ::GetProp to obtain it. This is slower than the ::GetWindowLong method, especially if you take the precaution of using a longish property name to avoid conflicts. (BTW this is probably the most common method used when subclassing with Visual Basic.)
  4. Maintain a lookup table of some sort which maps window handles to object pointers. This has the advantage that you can keep it as a private static member of the class, so there is little risk of other code interfering, but it does tend to become slow and is again rather fiddly.
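Option 4 is the most self-contained of these, so here is a minimal portable sketch of it: a private static std::map from handle to object, filled in by the constructor and emptied by the destructor. DemoHWND, DemoWindow and its message handling are hypothetical stand-ins; a real version would also have to cope with messages that arrive before the handle is known (::CreateWindow sends WM_NCCREATE and WM_CREATE before it returns), and with thread safety.

```cpp
#include <cassert>
#include <map>

typedef void* DemoHWND;   // hypothetical stand-in for ::HWND

class DemoWindow{
public:
    DemoWindow(DemoHWND h, long id) : m_hWnd(h), m_id(id){ registry()[h] = this; }
    ~DemoWindow(){ registry().erase(m_hWnd); }
    DemoWindow(const DemoWindow&) = delete;
    DemoWindow& operator=(const DemoWindow&) = delete;

    // What the single static window procedure would do: find the object
    // for this handle, then hand the message to it.
    static long StaticProc(DemoHWND h, unsigned msg){
        std::map<DemoHWND, DemoWindow*>::iterator it = registry().find(h);
        if (it == registry().end())
            return -1;                    // unknown window: would go to ::DefWindowProc
        return it->second->OnMessage(msg);
    }

private:
    long OnMessage(unsigned msg) const{ return m_id * 100 + static_cast<long>(msg); }

    // Function-local static: private to the class, constructed on first use.
    static std::map<DemoHWND, DemoWindow*>& registry(){
        static std::map<DemoHWND, DemoWindow*> table;
        return table;
    }

    DemoHWND m_hWnd;
    long m_id;
};
```

Every message pays for a map lookup, which is the slowness the list above warns about.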

Really, we’re going to have some fiddliness whatever technique we use. We are, after all, marrying two completely different systems, and the only link between them is a handle defined by the other partner in that marriage, not by us.

Me, I’d rather get all my fiddly-ness done at once if I can. Preferably early on in the proceedings. And that’s exactly what we are going to do here.

Transubstantiation

Let’s look again at what we have:

::LRESULT CALLBACK WindowProc(::HWND hWnd, ::UINT uMsg, ::WPARAM wParam, ::LPARAM lParam);

Now consider what we want:

class myWindowClass{
    ::HWND m_hWnd;
    /* other code elided */
    ::LRESULT CALLBACK WindowProc(::UINT uMsg, ::WPARAM wParam, ::LPARAM lParam);
    /* other code elided */
};

We don’t need the ::HWND parameter in the second case, because we have that stored as a member of the class (it would be very unusual not to do this in such classes).

Consider now, how such a C++ member function looks to a piece of C code:

::LRESULT WindowProc(myWindowClass* this, ::UINT uMsg, ::WPARAM wParam, ::LPARAM lParam);

As you probably know, the this pointer in C++ is passed as a normal parameter (give or take a few optimisations) but is hidden from us. (If the member function were a const function, this would have been of type const myWindowClass*.)
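A small portable sketch makes the point: a member function and a free function that takes the object pointer explicitly do the same work. Counter and Counter_Add are hypothetical names; where the hidden pointer actually travels (a register or the stack) depends on the calling convention, which is exactly the detail the rest of this article has to care about.

```cpp
#include <cassert>

// Hypothetical class for illustration.
class Counter{
public:
    explicit Counter(int start) : m_value(start){}
    // Member function: the object pointer is passed implicitly as 'this'.
    int Add(int n){ m_value += n; return m_value; }
    int m_value;
};

// Roughly how Counter::Add looks underneath: the object pointer is just
// another argument, here made explicit and named 'self'.
int Counter_Add(Counter* self, int n){
    self->m_value += n;
    return self->m_value;
}
```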

When we remember about the this pointer then the member-function version doesn’t look all that different from the original. We just need to replace one pointer with another. But how do we do this without introducing the same (or maybe even worse) problems that we have in the beginning?

Self-Modifying Code

When I coded in Z80 assembler on a Spectrum there were a few important differences between the majority of programming I did then and what I do now. One of these is that I am now quite happy to be largely ignorant of Pentium assembly; in those days that would have stuck me with a single version of old (non-Visual) Basic. Another is that while I now complain about how slow the 233MHz machine I have at home is and how paltry its 128MB of memory, in those days I had 4MHz to make use of 128K of memory (80K of which needed some awkward paging code to get at).

A lot of things were different then, right enough. And it was around that time that I came across a technique whose sheer hackiness I used to enjoy: self-modifying code.

Consider the following piece of code:

if (globalBoolean)
    func1(x, y);
else
    func2(x, y);

Most likely none of the above seems terribly foreign to you. You may (should) be suspicious of the global variable, and you might wonder if a function pointer would make the logic cleaner and more extensible.

Now let’s say we want to make that piece of code blisteringly fast. Our crappy 4MHz processor reaches that piece of code a lot as it loops through some code that appreciably affects the responsiveness of the user experience (and a lot of programming on Spectrums was games). Let’s also say that globalBoolean is very rarely changed.

One thing we can do with the assembly version of this code is to replace the whole if statement with a single call to func1. Then, wherever we would have changed the value of globalBoolean, we instead overwrite the target of that call instruction so that it calls func2 (or back to func1).

There is now zero overhead in determining which function to call. If we go further and actually copy in a block of code, rather than a call to a block of code, we can increase the speed further.
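C++ won’t let us patch a call instruction portably, but the closest portable analogue of the trick is to keep a function pointer and retarget it only when the rarely-changed setting changes, so the hot path no longer tests the flag. This is only an analogy: an indirect call through a pointer still costs something, whereas the patched direct call cost nothing extra. The names g_callTarget, setMode and hotLoopStep are hypothetical.

```cpp
#include <cassert>

int func1(int x, int y){ return x + y; }
int func2(int x, int y){ return x * y; }

// Plays the role of the patched call instruction: the hot path always
// calls through this pointer instead of testing a flag.
static int (*g_callTarget)(int, int) = func1;

// Called only when the (rarely changing) setting changes.
void setMode(bool useFunc1){ g_callTarget = useFunc1 ? func1 : func2; }

// The frequently executed code: no if/else on a global Boolean here.
int hotLoopStep(int x, int y){ return g_callTarget(x, y); }
```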

Yuck! This is a horrible mind-hurting way to code. However there were times when it seemed worth it. It’s also kind of l33t though to have a program re-write itself during its own execution.

This technique was never that commonplace then, and there is less call for it now. Games programs will always look to squeeze out every cycle, debuggers may need to rewrite code while debugging it, and buffer-overflow and heap-overflow spl0its of course rewrite code in order to compromise the system. As an indicator of how rare this technique now is on Windows™, we can search the web for FlushInstructionCache (an API call we’ll use, and examine, later). Most of the documents returned will be documentation of the call itself, games or debugger code, or the very technique we are examining here.

Self-Modifying Windows™ Code

If we were to use a technique like this here, what would it gain us?

  1. We could create a small piece of executable code on the heap for each window. We would then no longer need to look up the object for each window, because its address could be hard-coded into that window’s own piece of code.
  2. It will be something that we can forget about for most of the coding we’ll be doing. We’ll have the object pointer from the very start of our function and won’t have to think about it much after that.
  3. It should be pretty fast.

To my mind the second of these is the winner. The less we have to think about what’s going on, and the more we can put into one well-tested piece of code, the easier and more reliable our code will be. The speed gain isn’t a biggie to my mind, at least not most of the time: like most hackers I’ve learnt to mistrust speed and efficiency as an overriding concern when programming, though, also like most hackers, I still like my code to be as fast as possible.

Design Considerations

There are a few properties we’ll want our self-modifying code to have:

  1. Relatively small size. Since this code will be created dynamically, and may be created at a time-sensitive point in execution, we will want it to be small.
  2. Consistency. You may not know much ASM. I know next to none. I don’t want to have to re-write this code, but rather to be able to reuse it as is whenever I use this technique.
  3. Depend on C++ code. Again because I don’t know ASM (and even if you know lots you are presumably using C++ for a reason). I want this code to quickly get back to using C++ to do stuff, so that I’m back on familiar ground.

The way we fulfil these three requirements is to have a small piece of code that gives us the object pointer we need, and then lets some C++ code do all the work.

The way we do this is to replace the ::HWND parameter we don’t need with the myWindowClass* parameter we do need, and then jump to the start of a function that expects the parameter list we now have. Because we’ve jumped rather than called, the return address on the stack is still the one Windows™ pushed, so the return from that function will be treated as a return from the first function.

That is, Windows™ thinks it has called a function with this signature:

::LRESULT CALLBACK WindowProc(::HWND, ::UINT, ::WPARAM, ::LPARAM);

Our C++ code thinks that a function with this signature was called:

::LRESULT CALLBACK WindowProc(myWindowClass*, ::UINT, ::WPARAM, ::LPARAM);

And our dynamically-created machine code does the necessary work to bridge those two.

Of course the second function isn’t exactly what we want, but it’s close. It’s also a start, in that it is easy to write a static class member function that matches that signature and calls a more appropriate member of the actual object:

::LRESULT CALLBACK WindowProc(myWindowClass* pThis, ::UINT uMsg, ::WPARAM wParam, ::LPARAM lParam){
    return pThis->InternalWndProc(uMsg, wParam, lParam);
}

That will be enough to be getting on with.

The Machine Code

There is another reason for starting with this slightly more indirect method. I said before that I don’t know ASM. Luckily ATL uses the same technique we are going to use here (see [Rector & Sells]), and it uses the indirect version, so we’ll begin by copying its ASM and using that.

The Intel version of the assembly is:

mov dword ptr [esp+0x4], pThis
jmp WndProc

If you want to make your code work on Alpha as well then when you’ve read this you should be able to adapt the ATL code in the same way that we are going to do.

How the Machine Code Works

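The first instruction copies the constant pThis into the stack slot occupied by the window handle. On entry to the procedure [esp] holds the return address and the first argument, hWnd, sits directly above it, so that slot is always at offset 0x4 from the esp register. The second instruction jumps to our C++ function.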

Note that we use CALLBACK on our window function. This means that the rules for how the parameters are placed on the stack, how the value will be returned, and how the stack will be cleaned up when the function returns are the same for the function Windows™ thinks it’s calling and the function we have written.
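To make that offset concrete, here is a portable sketch that models the top of the stack on entry to a 32-bit __stdcall window procedure as an array of 4-byte slots, and then applies what the mov instruction does. EntryFrame, simulateCall and applyThunk are hypothetical names and the addresses are arbitrary numbers; nothing here runs real machine code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Top of a 32-bit stack on entry to a __stdcall WNDPROC, one entry per
// 4-byte slot: slot[0] is at [esp], slot[1] at [esp+4], and so on.
struct EntryFrame{
    std::array<std::uint32_t, 5> slot;
};

// Arguments are pushed right to left and the call then pushes the return
// address, so hWnd ends up immediately above the return address.
EntryFrame simulateCall(std::uint32_t retAddr, std::uint32_t hWnd, std::uint32_t uMsg,
                        std::uint32_t wParam, std::uint32_t lParam){
    EntryFrame f = {{{retAddr, hWnd, uMsg, wParam, lParam}}};
    return f;
}

// The effect of "mov dword ptr [esp+0x4], pThis": overwrite the slot at
// byte offset 4, leaving everything else (including the return address) alone.
void applyThunk(EntryFrame& f, std::uint32_t pThis){
    const std::size_t byteOffset = 4;
    f.slot[byteOffset / sizeof(std::uint32_t)] = pThis;
}
```

Because the return address and the other three arguments are untouched, the jumped-to function sees an ordinary four-argument __stdcall frame and cleans it up on return.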

Our Thunk Object

The first thing our thunk object needs to do is hold the actual bytes of the machine code. We need to set the structure packing to 1 so that there are no alignment gaps between them. It’s probably also a good idea to make sure that we don’t compile this for anything other than x86:

#if defined(_M_IX86)

#pragma pack(push,1)

class winThunk{
    const DWORD m_mov;              // mov dword ptr [esp+0x4], m_this
    myWindowClass* const m_this;    //   (the instruction's 32-bit immediate)
    const BYTE m_jmp;               // jmp WndProc
    const ptrdiff_t m_relproc;      // relative jmp target
};
#pragma pack(pop)

#else
#error Only x86 supported
#endif

We’ve made all of the member variables const because they won’t be changed once we’ve created the object, and hence we want any code that tries to change them to error at compile-time.
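Because the whole scheme depends on those four members sitting back to back with no padding, it is worth having the compiler check the layout. The sketch below mirrors winThunk’s members with fixed-width integer types (ThunkLayout32 is a hypothetical name) so that the check compiles on any compiler honouring #pragma pack, including 64-bit hosts; in the real class you could simply assert that sizeof(winThunk) == 13, given a compiler with static_assert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
// Portable mirror of winThunk's data members for a 32-bit target.
struct ThunkLayout32{
    std::uint32_t mov;      // C7 44 24 04
    std::uint32_t obj;      // object pointer (32-bit immediate)
    std::uint8_t  jmp;      // E9
    std::int32_t  relproc;  // rel32 displacement
};
#pragma pack(pop)

static_assert(sizeof(ThunkLayout32) == 13, "no padding allowed in the thunk");
static_assert(offsetof(ThunkLayout32, obj) == 4, "imm32 follows the 4-byte opcode");
static_assert(offsetof(ThunkLayout32, jmp) == 8, "jmp opcode follows imm32");
static_assert(offsetof(ThunkLayout32, relproc) == 9, "rel32 follows the jmp opcode");
```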

Next we need a constructor to set the values of these variables. It will necessarily take two parameters: a pointer to the procedure to jump to, and a pointer to the window object.

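#if defined(_M_IX86)

#pragma pack(push,1)

class winThunk{
    // The static function the thunk jumps to: the object pointer takes hWnd's place.
    typedef ::LRESULT (CALLBACK* WndProc)(myWindowClass*, ::UINT, ::WPARAM, ::LPARAM);
    const DWORD m_mov;              // mov dword ptr [esp+0x4], m_this
    myWindowClass* const m_this;    //   (the instruction's 32-bit immediate)
    const BYTE m_jmp;               // jmp WndProc
    const ptrdiff_t m_relproc;      // relative jmp target

public:
    winThunk(WndProc proc, myWindowClass* obj)
        :m_mov(0x042444C7),         // C7 44 24 04 in memory (little-endian)
        m_this(obj),
        m_jmp(0xE9),
        // rel32 counts from the end of the jmp, which is the end of the thunk
        m_relproc(reinterpret_cast<char*>(proc) - reinterpret_cast<char*>(this) - static_cast<ptrdiff_t>(sizeof(winThunk)))
    {
        // Make sure the CPU sees the freshly written instructions.
        ::FlushInstructionCache(::GetCurrentProcess(), this, sizeof(winThunk));
    }
};
#pragma pack(pop)

#else
#error Only x86 supported
#endif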

The values for m_mov and m_jmp are constant and hard-coded. (Note that we don’t make them static const members, as we normally would for such constants: the bytes have to be physically present in every object, in the right place.) m_this is given the object pointer as its value. m_relproc is a bit more complicated. The jmp instruction takes a displacement relative to the address of the next instruction, which here is the end of the thunk, so we need the distance in bytes from there to the start of the target function. Hence we cast the two pointers to char*, take the difference, subtract the size of the thunk, and store the result in m_relproc.

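Finally we call ::FlushInstructionCache. This ensures that no processor has a stale copy of the memory where we have just written instructions. One caveat on more recent systems: with Data Execution Prevention (DEP) enabled, ordinary heap memory is not executable, so the thunk needs to live in memory allocated with execute permission (for example via ::VirtualAlloc with PAGE_EXECUTE_READWRITE) rather than in an ordinary heap object.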

A final addition to this version of the thunk is to simplify the task of using it with API calls (in particular ::SetWindowLong) that expect a ::WNDPROC value. We do this by adding a cast operator so that the object can then be handed straight to such a function.

#if defined(_M_IX86)

#pragma pack(push,1)

class winThunk{
    // The static function the thunk jumps to: the object pointer takes hWnd's place.
    typedef ::LRESULT (CALLBACK* WndProc)(myWindowClass*, ::UINT, ::WPARAM, ::LPARAM);
    const DWORD m_mov;              // mov dword ptr [esp+0x4], m_this
    myWindowClass* const m_this;    //   (the instruction's 32-bit immediate)
    const BYTE m_jmp;               // jmp WndProc
    const ptrdiff_t m_relproc;      // relative jmp target

public:
    winThunk(WndProc proc, myWindowClass* obj)
        :m_mov(0x042444C7),
        m_this(obj),
        m_jmp(0xE9),
        m_relproc(reinterpret_cast<char*>(proc) - reinterpret_cast<char*>(this) - static_cast<ptrdiff_t>(sizeof(winThunk)))
    {
        ::FlushInstructionCache(::GetCurrentProcess(), this, sizeof(winThunk));
    }

    // The thunk's own bytes are the code Windows will call.
    // (The space in "< ::" avoids the "<:" digraph on older compilers.)
    operator ::WNDPROC() const{
        return reinterpret_cast< ::WNDPROC>(const_cast<winThunk*>(this));
    }
};
#pragma pack(pop)

#else
#error Only x86 supported
#endif

If you try to compile this code you will receive a warning:

warning C4355: 'this' : used in base member initializer list

Yep, we’ve been bold. However, we can get away with it: the initializer only uses the address held in this, never the contents of a member that hasn’t been initialized yet. Because we can get away with it (and we test thoroughly to make sure of that), we may as well disable that warning. We’ll re-enable it after this class has been compiled, because using this in an initializer isn’t something you want to do blithely; indeed, the main reason for disabling it only here is so that we don’t get used to seeing it and start mentally dismissing it when code elsewhere triggers it.

#pragma warning( push )
#pragma warning( disable : 4355 )

#if defined(_M_IX86)

#pragma pack(push,1)

class winThunk{
    // The static function the thunk jumps to: the object pointer takes hWnd's place.
    typedef ::LRESULT (CALLBACK* WndProc)(myWindowClass*, ::UINT, ::WPARAM, ::LPARAM);
    const DWORD m_mov;              // mov dword ptr [esp+0x4], m_this
    myWindowClass* const m_this;    //   (the instruction's 32-bit immediate)
    const BYTE m_jmp;               // jmp WndProc
    const ptrdiff_t m_relproc;      // relative jmp target

public:
    winThunk(WndProc proc, myWindowClass* obj)
        :m_mov(0x042444C7),
        m_this(obj),
        m_jmp(0xE9),
        m_relproc(reinterpret_cast<char*>(proc) - reinterpret_cast<char*>(this) - static_cast<ptrdiff_t>(sizeof(winThunk)))
    {
        ::FlushInstructionCache(::GetCurrentProcess(), this, sizeof(winThunk));
    }

    operator ::WNDPROC() const{
        return reinterpret_cast< ::WNDPROC>(const_cast<winThunk*>(this));
    }
};
#pragma pack(pop)

#else
#error Only x86 supported
#endif

#pragma warning( pop )

Cookie-Cutter Time

One obvious problem with the above code is that it only works for myWindowClass objects. We have to write a new one for each window class we write.

One way around this is to use void* instead of myWindowClass*. This is the approach ATL took. Personally I prefer the safety of as much static typing and compile-time checking as I can get, especially when I’m doing something as hairy as dynamically creating pieces of machine code I only barely understand!

The obvious solution is to rewrite the code as a template which takes the window class as a template parameter. Since the only two member functions are both inline this adds little or nothing to the size of the generated code.

#pragma warning( push )
#pragma warning( disable : 4355 )

#if defined(_M_IX86)

#pragma pack(push,1)

template<typename W> class winThunk{
    // The static function the thunk jumps to: the object pointer takes hWnd's place.
    typedef ::LRESULT (CALLBACK* WndProc)(W*, ::UINT, ::WPARAM, ::LPARAM);
    const DWORD m_mov;          // mov dword ptr [esp+0x4], m_this
    W* const m_this;            //   (the instruction's 32-bit immediate)
    const BYTE m_jmp;           // jmp WndProc
    const ptrdiff_t m_relproc;  // relative jmp target

public:
    winThunk(WndProc proc, W* obj)
        :m_mov(0x042444C7),
        m_this(obj),
        m_jmp(0xE9),
        m_relproc(reinterpret_cast<char*>(proc) - reinterpret_cast<char*>(this) - static_cast<ptrdiff_t>(sizeof(winThunk)))
    {
        ::FlushInstructionCache(::GetCurrentProcess(), this, sizeof(winThunk));
    }

    operator ::WNDPROC() const{
        return reinterpret_cast< ::WNDPROC>(const_cast<winThunk*>(this));
    }
};
#pragma pack(pop)

#else
#error Only x86 supported
#endif

#pragma warning( pop )

Skipping the Static Function

To my mind the static member function that we use here is an unnecessary step. You may disagree, and there are advantages in doing it this way: it doesn’t matter whether the function it calls is virtual or not, and if that next function can be inlined then it probably will be, so there isn’t even the overhead of another function call. Still, to me it seems like one more thing to remember.

As I said before, at a level lower than C++ that static member function looks much the same as a normal member function. In theory we should be able to use a member function in the same way. There are two main problems with this.

  1. A member function may be virtual. This means that getting to it is more complicated than getting to a static member function.
  2. You can’t cast a pointer to a member function to an ordinary pointer (such as char* or a plain function pointer); the language only lets you convert it to another pointer-to-member-function type.

I’m willing to live with the first point. If I’m concerned about speed then I’m likely going to avoid virtual functions. If I’m not concerned about speed then I can have a non-virtual function call a virtual one. I can also get a compromise between the flexibility of virtual functions and the static binding of non-virtual functions if I use a base class which is parameterised by its derived class (a useful technique, but not one I’ll go into here).

The second point will require a bit of a kludge:

template<typename To, typename From> inline To union_cast(From fr) throw(){
    union{
        From f;
        To t;
    } uc;
    uc.f = fr;
    return uc.t;
}

Ugly! Ugly! Ugly! In fact another good name for this function is ugly_cast, to remind yourself of what an ugly thing you are doing, and how you should avoid it.

Indeed Bjarne Stroustrup actually gives code like the above as an example of how unions can be abused in [Stroustrup]. He does say “can be used”, a quote I’ve taken completely out of context to justify this dreadful kludge!

Anyway on Windows™, with VC++, it works. It only works if the function isn’t virtual, and if it is virtual it will probably GPF you pretty soon. Ugly! Ugly! Ugly! Still on with the show…

As well as using this “cast” we will also need to change the typedef of the first parameter to the constructor. This leaves us with:

#pragma warning( push )
#pragma warning( disable : 4355 )

#if defined(_M_IX86)

#pragma pack(push,1)

template<typename W> class winThunk{
    // A non-virtual member function of W. Declared CALLBACK (__stdcall), VC++
    // passes 'this' on the stack as a hidden first argument, in hWnd's slot.
    typedef ::LRESULT (CALLBACK W::* WndProc)(::UINT, ::WPARAM, ::LPARAM);
    const DWORD m_mov;          // mov dword ptr [esp+0x4], m_this
    W* const m_this;            //   (the instruction's 32-bit immediate)
    const BYTE m_jmp;           // jmp WndProc
    const ptrdiff_t m_relproc;  // relative jmp target

public:
    winThunk(WndProc proc, W* obj)
        :m_mov(0x042444C7),
        m_this(obj),
        m_jmp(0xE9),
        // union_cast: a pointer to member can't be converted to char* any other way
        m_relproc(union_cast<char*>(proc) - reinterpret_cast<char*>(this) - static_cast<ptrdiff_t>(sizeof(winThunk)))
    {
        ::FlushInstructionCache(::GetCurrentProcess(), this, sizeof(winThunk));
    }

    operator ::WNDPROC() const{
        return reinterpret_cast< ::WNDPROC>(const_cast<winThunk*>(this));
    }
};
#pragma pack(pop)

#else
#error Only x86 supported
#endif

#pragma warning( pop )

And that is our final version of the thunk class.
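If union_cast makes you as queasy as it should, one slightly less ugly variant is to copy the bits with std::memcpy, which avoids reading an inactive union member (something the C++ standard leaves undefined), and to refuse to compile when the sizes differ. The size check matters here because a pointer to member function can be bigger than a plain pointer; in VC++, for instance, pointers to members of classes with multiple or virtual inheritance carry extra data. memcpy_cast is a hypothetical name, and it only removes the undefined union read: which bits hold the code address is still up to the compiler, so the caveats above about virtual functions still apply. (C++20 offers std::bit_cast for the same job.)

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Copies the object representation of 'from' into a 'To'.
// Refuses to compile if the sizes differ or either type cannot be bit-copied.
template<typename To, typename From>
To memcpy_cast(const From& from){
    static_assert(sizeof(To) == sizeof(From),
                  "memcpy_cast: source and destination sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "memcpy_cast: types must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}
```

In the thunk constructor you would write memcpy_cast<char*>(proc) in place of union_cast<char*>(proc); where the member pointer is not pointer-sized, you find out at compile time rather than at run time.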

Lifetime Management

It is essential that the thunk isn’t destroyed while window messages are still being sent to it, and wasteful (a leak, in effect) to keep it in existence after the window object is destroyed. The obvious approach is to make the thunk a member of the window class. All will then be well provided that either the window-class destructor is only called once the window itself has been safely destroyed, so that no more messages will arrive, or the destructor explicitly sets the window’s procedure to something else before its members (including the thunk) are destroyed.
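The second of those conditions is easy to get right with a small RAII helper that installs a procedure and restores the previous one in its destructor. The sketch below is portable, so a global variable stands in for the window’s procedure slot and demoSetProc stands in for ::SetWindowLong with GWL_WNDPROC; all of these names are hypothetical. In a real window class the destructor body runs before any members are destroyed, so restoring the procedure there (or declaring a helper like this after the thunk, so that it is destroyed first) keeps the thunk alive for as long as Windows™ can still call it.

```cpp
#include <cassert>

typedef long (*DemoProc)(unsigned msg);   // hypothetical stand-in for ::WNDPROC

// Hypothetical stand-in for the window's procedure slot.
static DemoProc g_procSlot = nullptr;

// Stand-in for ::SetWindowLong(hWnd, GWL_WNDPROC, ...): installs the new
// procedure and returns the previous one.
DemoProc demoSetProc(DemoProc p){
    DemoProc old = g_procSlot;
    g_procSlot = p;
    return old;
}

long originalProc(unsigned msg){ return static_cast<long>(msg); }
long thunkedProc(unsigned msg){ return static_cast<long>(msg) + 1000; }

// Installs a procedure for its own lifetime and puts the previous one
// back when it is destroyed.
class ScopedWindowProc{
public:
    explicit ScopedWindowProc(DemoProc p) : m_previous(demoSetProc(p)){}
    ~ScopedWindowProc(){ demoSetProc(m_previous); }
    ScopedWindowProc(const ScopedWindowProc&) = delete;
    ScopedWindowProc& operator=(const ScopedWindowProc&) = delete;
private:
    DemoProc m_previous;
};
```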

Possible Extensions

There are two things I’d like to try with this; I’d be interested in hearing from anyone who has done so.

The first is to allow virtual functions to be used directly. This should be possible by altering the constructor to dereference the v-table to get the function pointer and then using it as before.

The second is to allow for member functions which use the thiscall calling convention (the default for member functions). In this calling convention the this pointer is passed in the ECX register (note that this goes hand-in-hand with the fact that you can’t take the address of the this pointer). If we could do this and leave the window handle where it is, it would be particularly convenient for messages where the only piece of member data we need is the window handle (quite a few messages can often be dealt with completely in this way, without needing the advantages C++ classes give us).
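For the curious, the machine code for such a thunk would be even shorter than before. The following portable sketch is only an illustration of the encoding, not a tested thunk: encodeThiscallThunk is a hypothetical helper that produces the 10 bytes of mov ecx, imm32 followed by jmp rel32. It assumes a VC++ member function, using thiscall, that takes (::HWND, ::UINT, ::WPARAM, ::LPARAM); like __stdcall, thiscall has the callee pop its stack arguments, so the stack Windows™ built is already exactly what that function expects.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical encoding of a thiscall-style thunk for 32-bit x86:
//   B9 <imm32>   mov ecx, imm32   ; load the object pointer into ECX
//   E9 <rel32>   jmp rel32        ; continue in the member function
// The stack (return address, hWnd, uMsg, wParam, lParam) is left untouched.
std::array<std::uint8_t, 10> encodeThiscallThunk(std::uint32_t thunkAddr,
                                                 std::uint32_t objAddr,
                                                 std::uint32_t procAddr){
    std::array<std::uint8_t, 10> b{};
    auto put32 = [&b](std::size_t at, std::uint32_t v){
        for (std::size_t i = 0; i < 4; ++i)         // little-endian
            b[at + i] = static_cast<std::uint8_t>(v >> (8 * i));
    };
    b[0] = 0xB9;                                // mov ecx, imm32
    put32(1, objAddr);
    b[5] = 0xE9;                                // jmp rel32
    put32(6, procAddr - (thunkAddr + 10u));     // counted from the end of the jmp
    return b;
}
```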

Update. I’ve just coded a thiscall thunk such as I describe above. I’ll write it up after some testing.
