If you ever program classes for controlling Windows™ windows you will have met the task of associating window messages with the correct instance of the class. There are a few alternative ways of doing so. This document outlines one method, using self-modifying code, which is fast, efficient, reliable, and unlikely to interfere with other code (including code from other users of the same window). Most importantly, in my opinion, it’s a technique that largely stays out of your way once it is in place.
Windows messages are communicated to a user-defined callback function which matches the type ::WNDPROC. ::WNDPROC is defined as:
typedef LRESULT (CALLBACK* WNDPROC)(HWND, UINT, WPARAM, LPARAM);
From now on I shall use the global scope resolution operator :: to distinguish those types and functions defined by Windows™ from those created for the sake of this document. (This assumes that you haven’t wrapped your Windows header includes in a namespace).
CALLBACK is a macro defined thus:
#define CALLBACK __stdcall
So the functions matching the ::WNDPROC type must use the __stdcall calling convention. We’ll examine the consequences of this later.
I shall now summarise the types used by the function. The declarations I give here are not the same as those found in the headers. Rather, to ease discussion of them, I have done some of the work the compiler does when it sees a macro or typedef and boiled a few levels of indirection into a single typedef. The actual definitions can be easily found in the Windows headers.
Type | Conventional Parameter Name | Definition* | Purpose | Size |
---|---|---|---|---|
::LRESULT | (n/a) | typedef long LRESULT; | Return value | 32 bits |
::HWND | hWnd | Handle† | Handle to window | 32 bits |
::UINT | uMsg | typedef unsigned int UINT; | Specifies the message | 32 bits |
::WPARAM | wParam | typedef unsigned int WPARAM; | First message parameter | 32 bits |
::LPARAM | lParam | typedef long LPARAM; | Second message parameter | 32 bits |
The definitions marked * are used to match a particular compiler’s choices for various implementation-defined features (most notably the size of the built-in C++ types) with the binary model on which Windows™ runs. You may have different results here; the important thing is that they are currently all 32 bits in size.
† The definition of a handle depends on whether you compile with STRICT defined or not. If you don’t then the definition is:
typedef void* HWND;
If you do then it is:
struct HWND__ { int unused; }; typedef HWND__* HWND;
As you can probably guess this isn’t really what a window handle points to. The struct HWND__ exists purely to make it impossible to pass a window handle where another type is expected without a cast (more specifically, without a C-style cast or a reinterpret_cast). As such STRICT makes certain misuses of window handles (and other handles) more likely to be caught at compile-time.
Every window has an associated window procedure (a function that matches the declaration of ::WNDPROC) defined in the ::WNDCLASS or ::WNDCLASSEX structure that defines its features. The address of this function can be obtained or changed using ::GetWindowLong or ::SetWindowLong with a value of GWL_WNDPROC for the second parameter.
A WindowProc will essentially be a large switch statement (all right-thinking C++ hackers will shudder at the phrase “large switch statement”), which will determine the action to take based on the value of the uMsg parameter. It may or may not use the values of the wParam and lParam parameters in a variety of ways (they get cast and converted all over the place) in deciding how to fulfil the request. It will often need to use the hWnd parameter directly or indirectly in doing so.
As such ::WNDPROC is a simple interface which allows one function to act as the gateway between the OS and your code. While it is often wasteful (the wParam and lParam parameters are often unused, or used to pass 16-bit or smaller values), it is easier to implement one function than two hundred. This is especially true since the function can delegate either to another ::WNDPROC (in particular if you are subclassing a window you will delegate to the window’s initial procedure) or to the system-provided ::DefWindowProc, which carries out default actions for many messages. (In particular such delegation would be the norm for your default: label, to catch any messages you didn’t implement, or even messages that didn’t exist when your code was written).
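As a rough illustration of the shape described above, here is a minimal sketch of a switch-based procedure that falls through to a default handler. So that it builds without the Windows headers, every name in it (Hwnd, Msg, WParam, LParam, Result, MSG_PAINT, MSG_SIZE, default_proc) is a hypothetical stand-in, and the message numbers are made up for the example.

```cpp
#include <cstdint>

// Hypothetical stand-ins so the sketch builds without the Windows headers.
typedef void*          Hwnd;
typedef unsigned int   Msg;
typedef std::uintptr_t WParam;
typedef std::intptr_t  LParam;
typedef std::intptr_t  Result;

// Illustrative message numbers only; real code would use the WM_* constants.
const Msg MSG_PAINT = 1;
const Msg MSG_SIZE  = 2;

// Stands in for ::DefWindowProc (or the original procedure of a subclassed window).
Result default_proc(Hwnd, Msg, WParam, LParam) { return -1; }

Result window_proc(Hwnd hWnd, Msg uMsg, WParam wParam, LParam lParam) {
    switch (uMsg) {
    case MSG_PAINT:
        return 0;                  // handled; neither parameter is needed
    case MSG_SIZE:
        return lParam & 0xFFFF;    // a 16-bit value unpacked from lParam
    default:
        // Anything we did not implement is delegated.
        return default_proc(hWnd, uMsg, wParam, lParam);
    }
}
```

Note how the unknown message takes the default: path without the procedure needing to know anything about it.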
The natural way of programming a window in an object-orientated language like C++ is to have the ::WNDPROC call a member function of the object that represents the window. This member function would then act on the message, most likely by calling further functions that more naturally represent what the message requests.
The problem with this is how to locate the object. ::WNDPROC is not defined as a member function. It can be implemented as a global or namespace-scoped function, or as a static member function. There is no direct way to get from the function to the object.
Our best clue is the hWnd parameter, which gives us the handle of the window in question. This gives us a few options:
- Use ::GetWindowLong to obtain a pointer to the object. This is rather slow, more fiddly than one might like, and runs the risk of someone else using the same memory to do something else.
- Use ::GetProp to obtain it. This is slower than the ::GetWindowLong method, especially if you take the precaution of using a longish property name to avoid conflicts. (BTW this is probably the most common method used when subclassing with Visual Basic).
Really we’re going to have some fiddly-ness with this whatever technique we use. We are, after all, marrying two completely different systems through a pointer to something that was defined by the other partner in that marriage rather than by us.
Me, I’d rather get all my fiddly-ness done at once if I can. Preferably early on in the proceedings. And that’s exactly what we are going to do here.
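For comparison, the per-message lookup that the ::GetWindowLong option implies might look something like the sketch below. It is only a model: FakeWindow, Widget, attach and static_window_proc are hypothetical names, and the user_data field stands in for whatever per-window storage the real API would give us.

```cpp
#include <cstdint>

// Hypothetical stand-in for a window that carries one pointer-sized slot of user data.
struct FakeWindow { std::intptr_t user_data; };

class Widget {
    int m_clicks;
public:
    Widget() : m_clicks(0) {}
    long on_message(unsigned msg) {
        if (msg == 1) ++m_clicks;   // message 1 is an arbitrary "click" for this example
        return m_clicks;
    }
};

// Done once, when the object takes charge of the window.
void attach(FakeWindow* w, Widget* obj) {
    w->user_data = reinterpret_cast<std::intptr_t>(obj);
}

// Done on every single message: fetch the slot, cast it back, then forward.
long static_window_proc(FakeWindow* w, unsigned msg) {
    Widget* self = reinterpret_cast<Widget*>(w->user_data);
    if (!self) return 0;            // nothing attached yet
    return self->on_message(msg);
}

long demo_lookup() {
    FakeWindow w = { 0 };
    Widget widget;
    attach(&w, &widget);
    static_window_proc(&w, 1);
    return static_window_proc(&w, 1);
}
```

The fetch-and-cast in static_window_proc is the part that runs for every message, and it is that repeated step the thunk is meant to remove.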
Let’s look again at what we have:
::LRESULT CALLBACK WindowProc(::HWND hWnd, ::UINT uMsg, ::WPARAM wParam, ::LPARAM lParam);
Now consider what we want:
class myWindowClass{
::HWND m_hWnd;
/* other code elided */
::LRESULT CALLBACK WindowProc(::UINT uMsg, ::WPARAM wParam, ::LPARAM lParam);
/* other code elided */
};
We don’t need the ::HWND parameter in the second case, because we have that stored as a member of the class (it would be very unusual not to do this in such classes).
Consider now, how such a C++ member function looks to a piece of C code:
::LRESULT WindowProc(myWindowClass* this, ::UINT uMsg, ::WPARAM wParam, ::LPARAM lParam);
As you probably know the this pointer in C++ is passed as a normal parameter (give or take a few optimisations) but is hidden from us. (If the member function was a const function then this would have been of type const myWindowClass*).
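To make the hidden parameter concrete, here is a small sketch in which a member function and a free function with an explicit object pointer do the same job. Counter, Counter_add and demo_explicit_this are hypothetical names for the example, and nothing here says how a particular compiler actually passes this.

```cpp
struct Counter {
    int total;
    // The object arrives through the hidden this pointer.
    int add(int n) { total += n; return total; }
};

// The same function with the hidden parameter written out.
int Counter_add(Counter* self, int n) {
    self->total += n;
    return self->total;
}

int demo_explicit_this() {
    Counter c = { 10 };
    int viaMember = c.add(5);            // total becomes 15
    int viaFree   = Counter_add(&c, 5);  // total becomes 20
    return viaMember * 100 + viaFree;
}
```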
When we remember the this pointer, the member-function version doesn’t look all that different from the original. We just need to replace one pointer with another. But how do we do this without introducing the same (or maybe even worse) problems that we had at the beginning?
When I coded in Z80 assembler on a Spectrum there were a few important differences between the majority of programming I did then and what I do now. One of these is that I am quite happy to be largely ignorant of Pentium assembly. In those days that would have stuck me with a single version of old (non-Visual) Basic. Another is that while I now complain about how slow the 233MHz machine I have at home is and how paltry its 128MB of memory is, in those days I had 4MHz to make use of the 128K of memory (of which 80K needed some awkward paging code to gain access to).
A lot of things were different then right enough. And it was some time then before I came across a technique I used to enjoy the sheer hackiness of then — self-modifying code.
Consider the following piece of code:
if (globalBoolean)
func1(x, y);
else
func2(x, y);
Most likely none of the above seems terribly foreign to you. You may (should) be suspicious of the global variable, and you might wonder if a function pointer would make the logic cleaner and more extensible.
Now let’s say we want to make that piece of code blisteringly fast. Our crappy 4MHz processor comes to that piece of code a lot as it loops through some code that will appreciably affect the responsiveness of the user experience (and a lot of programming on Spectrums was games). Let’s also say that globalBoolean is very rarely changed.
One thing we can do with the assembly version of this code is to replace the if statement with a call to func1. Whenever we would alter the value of globalBoolean with the above code we instead change the machine code instruction so that it calls the other function.
There is now zero overhead in determining which function to call. If we go further and actually copy in a block of code, rather than a call to a block of code, we can increase the speed further.
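Patching a call instruction can’t be written in portable C++, but the nearest everyday analogue is a function pointer that is rewritten only when the flag would have changed. The sketch below assumes that substitution; Op, g_current, set_mode and hot_path are hypothetical names, and an indirect call is of course not quite the zero-overhead patched call described above.

```cpp
typedef int (*Op)(int, int);

int func1(int x, int y) { return x + y; }
int func2(int x, int y) { return x * y; }

// Plays the part of the patched call instruction.
static Op g_current = func1;

// The rare operation: what used to be "change globalBoolean".
void set_mode(bool useFirst) { g_current = useFirst ? func1 : func2; }

// The frequent operation: no test is made on the way through.
int hot_path(int x, int y) { return g_current(x, y); }
```

The decision is paid for once in set_mode rather than on every pass through hot_path, which is the same trade the self-modifying version makes.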
Yuck! This is a horrible mind-hurting way to code. However there were times when it seemed worth it. It’s also kind of l33t though to have a program re-write itself during its own execution.
This technique was never that commonplace then, and there is less call for it now. Games programs will always look to squeeze every cycle, debuggers may need to re-write code while debugging it, and buffer-overflow and heap-overflow spl0its of course re-write code in order to compromise the system. As an indicator of how rare this technique now is on Windows™ we can search for FlushInstructionCache on the web (an API call we’ll use, and examine later). Most of the documents returned will either be documentation of the call itself, games or debugger code, or the very technique we are examining here.
If we were to take a technique like this what gains would it give us here? Chiefly two: speed, and having all the awkward work done in one place.
To my mind the second of these is the winner. The less we have to think about what’s going on, and the more we can put into one well-tested piece of code, the easier and more reliable our code will be. The speed point isn’t a biggy to my mind, at least not most of the time. Like most hackers I’ve learnt to mistrust speed and efficiency as an over-riding concern when programming, though like most hackers I still like my code to be as fast as possible.
There are a few properties we’ll want our self-modifying code to have:
The way we fulfil these three requirements is to have a small piece of code that gives us the object pointer we need, and then lets some C++ code do all the work.
The way we do this is to replace the ::HWND parameter we don’t need with the myWindowClass* parameter we do need, and then jump to the start of a function that expects the parameter list we now have. Because we’ve jumped rather than called, the return from that function will be treated as a return from the first function.
That is, Windows™ thinks it’s called a function with this signature:
::LRESULT CALLBACK WindowProc(::HWND, ::UINT, ::WPARAM, ::LPARAM);
Our C++ code thinks that a function that looks like the following was called:
::LRESULT CALLBACK WindowProc(myWindowClass*, ::UINT, ::WPARAM, ::LPARAM);
And our dynamically-created machine code does the necessary work to bridge those two.
Of course the second function isn’t exactly what we want, but it’s close. It’s also a start, in that it is easy to write a static class member function that matches that signature and calls a more appropriate member of the actual object:
::LRESULT CALLBACK WindowProc(myWindowClass* pThis, ::UINT uMsg, ::WPARAM wParam, ::LPARAM lParam){
return pThis->InternalWndProc(uMsg, wParam, lParam);
}
That will be enough to be getting on with.
There is another reason for starting with this slightly more indirect method. I said before that I don’t know ASM. Luckily ATL uses the same technique (see [Rector & Sells]) that we are going to use here. It uses the indirect version, so we’ll begin with copying their ASM and using that.
The Intel version of the assembly is:
mov dword ptr [esp+0x4], pThis
jmp WndProc
If you want to make your code work on Alpha as well then when you’ve read this you should be able to adapt the ATL code in the same way that we are going to do.
The first instruction copies the constant pThis to the position occupied by the window handle (which will be at a known offset, 0x4, from the esp register, just above the return address). The second jumps to our C++ function.
Note that we use CALLBACK on our window function. This means that the rules for how the parameters are placed on the stack, how the value will be returned, and how the stack will be cleaned up when the function returns are the same for the function Windows™ thinks it’s calling and the function we have written.
The first thing our thunk object will need to do is contain the actual bytes of the machine code. We’ll need to have set the packing to 1 so that there are no alignment gaps. It’ll probably also be a good idea to make sure that we don’t compile this for anything other than Intel:
#if defined(_M_IX86)
#pragma pack(push,1)
class winThunk{
const DWORD m_mov; // mov dword ptr [esp+0x4], m_this
myWindowClass* const m_this; // immediate operand: the object pointer that replaces hWnd
const BYTE m_jmp; // jmp WndProc
const ptrdiff_t m_relproc; // relative jmp
};
#pragma pack(pop)
#else
#error Only X86 supported
#endif
We’ve made all of the member variables const because they won’t be changed once we’ve created the object, and hence we want any code that tries to change them to fail at compile-time.
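The packing can be checked without running anything. The sketch below is a stand-in for the class above rather than the class itself: ThunkLayout and little_endian_bytes are hypothetical names, and fixed-width integers replace DWORD, BYTE and the 32-bit pointer so that it builds on any compiler that accepts #pragma pack.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct ThunkLayout {
    std::uint32_t mov;      // C7 44 24 04 : mov dword ptr [esp+0x4], imm32
    std::uint32_t self;     // imm32: a 32-bit object pointer on x86
    std::uint8_t  jmp;      // E9 : jmp rel32
    std::int32_t  relproc;  // rel32
};
#pragma pack(pop)

// With packing of 1 there are no gaps: 4 + 4 + 1 + 4 bytes.
static_assert(sizeof(ThunkLayout) == 13, "thunk must be exactly 13 bytes");
static_assert(offsetof(ThunkLayout, self) == 4, "operand follows the mov bytes");
static_assert(offsetof(ThunkLayout, jmp) == 8, "jmp opcode follows the operand");
static_assert(offsetof(ThunkLayout, relproc) == 9, "displacement follows the opcode");

// The bytes of a 32-bit value in the order a little-endian machine stores them.
std::array<std::uint8_t, 4> little_endian_bytes(std::uint32_t v) {
    return {{ static_cast<std::uint8_t>(v & 0xFF),
              static_cast<std::uint8_t>((v >> 8) & 0xFF),
              static_cast<std::uint8_t>((v >> 16) & 0xFF),
              static_cast<std::uint8_t>((v >> 24) & 0xFF) }};
}
```

Feeding 0x042444C7 to little_endian_bytes gives C7 44 24 04, which is why the constant for the mov looks back to front when written as a DWORD.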
Next we need a constructor to set values of these variables. This constructor will necessarily take two parameters, a pointer to the window class, and a pointer to the procedure.
#if defined(_M_IX86)
#pragma pack(push,1)
class winThunk{
typedef ::LRESULT (CALLBACK* WndProc) (myWindowClass*, ::UINT, ::WPARAM, ::LPARAM);
const DWORD m_mov; // mov dword ptr [esp+0x4], m_this
myWindowClass* const m_this; // immediate operand: the object pointer that replaces hWnd
const BYTE m_jmp; // jmp WndProc
const ptrdiff_t m_relproc; // relative jmp
public:
winThunk(WndProc proc, myWindowClass* obj)
:m_mov(0x042444C7),
m_this(obj),
m_jmp(0xE9),
m_relproc(reinterpret_cast<char*>(proc) - reinterpret_cast<char*>(this) - sizeof(winThunk))
{
::FlushInstructionCache(::GetCurrentProcess(), this, sizeof(winThunk));
}
};
#pragma pack(pop)
#else
#error Only X86 supported
#endif
The values for m_mov and m_jmp are constant and hard-coded (note that we don’t use static const members as we normally would in such a case; they still have to be the right bytes in the right place). m_this is given the object pointer as its value. m_relproc is a bit more complicated. The jmp instruction needs a relative value, so we need to work out the distance, in bytes, from where the instruction pointer will be to where it needs to be. By then the instruction pointer has moved past the jmp, which is the last thing in the thunk, so the starting point is the address of the thunk plus sizeof(winThunk). Hence we cast the two pointers to char*, work out the distance, and put that in m_relproc.
Finally we call ::FlushInstructionCache. This ensures that no processor has a stale copy of the instructions we have just created.
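The m_relproc arithmetic is easy to get wrong by one instruction length, so here is the same calculation on plain numbers. This is a sketch with made-up addresses; kThunkSize and jmp_displacement are hypothetical names, and 32-bit addresses are assumed as elsewhere in this document.

```cpp
#include <cstdint>

// Size of the packed thunk: 4 (mov bytes) + 4 (operand) + 1 (jmp opcode) + 4 (displacement).
const std::uint32_t kThunkSize = 13;

// Displacement for the jmp, measured from the byte just past the thunk,
// which is where the instruction pointer sits once the jmp has been read.
std::int32_t jmp_displacement(std::uint32_t thunkAddress, std::uint32_t targetAddress) {
    std::uint32_t nextInstruction = thunkAddress + kThunkSize;
    // Unsigned subtraction wraps; the result is then reinterpreted as signed,
    // so a target below the thunk gives a negative displacement.
    return static_cast<std::int32_t>(targetAddress - nextInstruction);
}
```

With a thunk at 0x1000 and a target at 0x2000 the displacement is 0x2000 - 0x100D = 0xFF3. Swap the two addresses and it becomes -0x100D, a backwards jump.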
A final addition to this version of the thunk is to simplify the task of using it with API calls (in particular ::SetWindowLong) that expect a ::WNDPROC value. We do this by adding a cast operator so that the object can then be handed straight to such a function.
#if defined(_M_IX86)
#pragma pack(push,1)
class winThunk{
typedef ::LRESULT (CALLBACK* WndProc) (myWindowClass*, ::UINT, ::WPARAM, ::LPARAM);
const DWORD m_mov; // mov dword ptr [esp+0x4], m_this
myWindowClass* const m_this; // immediate operand: the object pointer that replaces hWnd
const BYTE m_jmp; // jmp WndProc
const ptrdiff_t m_relproc; // relative jmp
public:
winThunk(WndProc proc, myWindowClass* obj)
:m_mov(0x042444C7),
m_this(obj),
m_jmp(0xE9),
m_relproc(reinterpret_cast<char*>(proc) - reinterpret_cast<char*>(this) - sizeof(winThunk))
{
::FlushInstructionCache(::GetCurrentProcess(), this, sizeof(winThunk));
}
operator ::WNDPROC() const{
return reinterpret_cast< ::WNDPROC>(this); // the space stops "<:" being read as a digraph by pre-C++11 compilers
}
};
#pragma pack(pop)
#else
#error Only X86 supported
#endif
If you try to compile this code you will receive warning C4355, which tells you that this has been used in a member initializer list.
Yep, we’ve been bold. However we can get away with it (we only use the address, never the half-built object), and because we can get away with it (and we test thoroughly to ensure that) we may as well just disable that warning. We’ll re-enable the warning after we’ve compiled this class because using this in an initializer isn’t something you want to blithely do. Indeed the main reason to disable it here is so we don’t get into the habit of mentally dismissing it when we see it because of code elsewhere.
#pragma warning( push )
#pragma warning( disable : 4355 )
#if defined(_M_IX86)
#pragma pack(push,1)
class winThunk{
typedef ::LRESULT (CALLBACK* WndProc) (myWindowClass*, ::UINT, ::WPARAM, ::LPARAM);
const DWORD m_mov; // mov dword ptr [esp+0x4], m_this
myWindowClass* const m_this; // immediate operand: the object pointer that replaces hWnd
const BYTE m_jmp; // jmp WndProc
const ptrdiff_t m_relproc; // relative jmp
public:
winThunk(WndProc proc, myWindowClass* obj)
:m_mov(0x042444C7),
m_this(obj),
m_jmp(0xE9),
m_relproc(reinterpret_cast<char*>(proc) - reinterpret_cast<char*>(this) - sizeof(winThunk))
{
::FlushInstructionCache(::GetCurrentProcess(), this, sizeof(winThunk));
}
operator ::WNDPROC() const{
return reinterpret_cast< ::WNDPROC>(this); // the space stops "<:" being read as a digraph by pre-C++11 compilers
}
};
#pragma pack(pop)
#else
#error Only X86 supported
#endif
#pragma warning( pop )
One obvious problem with the above code is that it only works for myWindowClass objects. We have to write a new one for each window class we write.
One way around this is to use void* instead of myWindowClass*. This is the approach ATL took. Personally I prefer the safety of as much static typing and compile-time checking as I can get, especially when I’m doing something as hairy as dynamically creating pieces of machine code I only barely understand!
The obvious solution is to rewrite the code as a template which takes the window class as a template parameter. Since the only two member functions are both inline this adds little or nothing to the size of the generated code.
#pragma warning( push )
#pragma warning( disable : 4355 )
#if defined(_M_IX86)
#pragma pack(push,1)
template<typename W> class winThunk{
typedef ::LRESULT (CALLBACK* WndProc) (W*, ::UINT, ::WPARAM, ::LPARAM);
const DWORD m_mov; // mov dword ptr [esp+0x4], m_this
W* const m_this; // immediate operand: the object pointer that replaces hWnd
const BYTE m_jmp; // jmp WndProc
const ptrdiff_t m_relproc; // relative jmp
public:
winThunk(WndProc proc, W* obj)
:m_mov(0x042444C7),
m_this(obj),
m_jmp(0xE9),
m_relproc(reinterpret_cast<char*>(proc) - reinterpret_cast<char*>(this) - sizeof(winThunk))
{
::FlushInstructionCache(::GetCurrentProcess(), this, sizeof(winThunk));
}
operator ::WNDPROC() const{
return reinterpret_cast< ::WNDPROC>(this); // the space stops "<:" being read as a digraph by pre-C++11 compilers
}
};
#pragma pack(pop)
#else
#error Only X86 supported
#endif
#pragma warning( pop )
To my mind the static member function that we use here is an unnecessary step. You may disagree, and there are advantages in doing it this way: it doesn’t matter whether the function it calls is virtual or not, and if that next function can be inlined then it probably will be, so there isn’t even the overhead of another function call. Still, to me it seems like one more thing to remember.
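As I said before, at a level lower than C++ that static member function looks much the same as a normal member function. In theory we should be able to use a member function in the same way. There are two main problems with this. The first is that it won’t work if the member function is virtual. The second is that C++ gives us no cast that will turn a pointer to a member function into an ordinary address.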
I’m willing to live with the first point. If I’m concerned about speed then I’m likely going to avoid virtual functions. If I’m not concerned about speed then I can have a non-virtual function call a virtual one. I can also get a compromise between the flexibility of virtual functions and the static binding of non-virtual functions if I use a base class which is parameterised by its derived class (a useful technique, but not one I’ll go into here).
The second point will require a bit of a kludge:
template<typename To, typename From> inline To union_cast(From fr) throw(){
union{
From f;
To t;
} uc;
uc.f = fr;
return uc.t;
}
Ugly! Ugly! Ugly! In fact another good name for this function is ugly_cast, to remind yourself of what an ugly thing you are doing, and how you should avoid it.
Indeed Bjarne Stroustrup actually gives code like the above as an example of how unions can be abused in [Stroustrup]. He does say “can be used”, a quote I’ve taken completely out of context to justify this dreadful kludge!
Anyway on Windows™, with VC++, it works. It only works if the function isn’t virtual, and if it is virtual it will probably GPF you pretty soon. Ugly! Ugly! Ugly! Still on with the show…
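If the kludge has to stay, it can at least be made to fail loudly. The sketch below is one possible variant, not the union_cast above: bit_copy_cast is a hypothetical name, it assumes a C++11 compiler for static_assert, and it copies the bytes with memcpy because reading the member of a union that wasn’t the last one written is not something the C++ standard promises will work.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Copies the bytes of one object into another of a different type.
// Refuses to compile when the sizes differ, where a union would quietly
// truncate or leave part of the result uninitialised.
template<typename To, typename From>
inline To bit_copy_cast(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "bit_copy_cast needs types of equal size");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// The example below relies on IEEE 754 single-precision floats.
static_assert(std::numeric_limits<float>::is_iec559, "example assumes IEEE 754 floats");

// A portable demonstration: the bit pattern of 1.0f.
std::uint32_t bits_of_one() {
    return bit_copy_cast<std::uint32_t>(1.0f);
}
```

The size check earns its keep with member function pointers in particular, since compilers are free to make them larger than an ordinary pointer for some classes, and in that case this version stops the build instead of producing a bad address.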
As well as using this “cast” we will also need to change the typedef of the first parameter to the constructor. This leaves us with:
#pragma warning( push )
#pragma warning( disable : 4355 )
#if defined(_M_IX86)
#pragma pack(push,1)
template<typename W> class winThunk{
typedef ::LRESULT (CALLBACK W::* WndProc) (::UINT, ::WPARAM, ::LPARAM);
const DWORD m_mov; // mov dword ptr [esp+0x4], m_this
W* const m_this; // immediate operand: the object pointer that replaces hWnd
const BYTE m_jmp; // jmp WndProc
const ptrdiff_t m_relproc; // relative jmp
public:
winThunk(WndProc proc, W* obj)
:m_mov(0x042444C7),
m_this(obj),
m_jmp(0xE9),
m_relproc(union_cast<char*>(proc) - reinterpret_cast<char*>(this) - sizeof(winThunk))
{
::FlushInstructionCache(::GetCurrentProcess(), this, sizeof(winThunk));
}
operator ::WNDPROC() const{
return reinterpret_cast< ::WNDPROC>(this); // the space stops "<:" being read as a digraph by pre-C++11 compilers
}
};
#pragma pack(pop)
#else
#error Only X86 supported
#endif
#pragma warning( pop )
And that is our final version of the thunk class.
It is essential that the thunk isn’t destroyed while window messages are still being sent to it. It is leaky to have it in existence after the window object is destroyed. The obvious approach is to have a thunk member of the window class. If the destructor for the window class is called only once the window itself is safely destroyed and no more messages will result, or if the destructor explicitly sets the window’s procedure to something else before the CRT object clean-up kicks in, then all will be well.
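The second of those arrangements, putting the old procedure back before the thunk goes away, can be sketched as a small guard object. Everything here is a hypothetical stand-in (StubWindow, Proc, SubclassGuard and the two procedures); in real code the assignments to proc would be calls to ::SetWindowLong with GWL_WNDPROC.

```cpp
// Hypothetical stand-ins: a window reduced to the one slot that matters here.
typedef long (*Proc)(void* hWnd, unsigned uMsg);
struct StubWindow { Proc proc; };

long original_proc(void*, unsigned) { return 0; }
long subclass_proc(void*, unsigned) { return 1; }

// Installs a procedure on construction and restores the previous one on
// destruction, so the window never points at something that has gone away.
class SubclassGuard {
    StubWindow& m_window;
    Proc m_previous;
public:
    SubclassGuard(StubWindow& window, Proc replacement)
        : m_window(window), m_previous(window.proc) {
        m_window.proc = replacement;
    }
    ~SubclassGuard() { m_window.proc = m_previous; }
    SubclassGuard(const SubclassGuard&) = delete;
    SubclassGuard& operator=(const SubclassGuard&) = delete;
};

long demo_guard() {
    StubWindow w = { original_proc };
    long during;
    {
        SubclassGuard guard(w, subclass_proc);
        during = w.proc(nullptr, 0);    // handled by the replacement
    }
    long after = w.proc(nullptr, 0);    // back with the original
    return during * 10 + after;
}
```

A window class that keeps such a guard as a member declared after its thunk would get the restore first and the thunk’s destruction second, since members are destroyed in reverse order of declaration.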
There are two things I’d like to try with this, and I’d be interested in hearing from anyone who has done so.
The first is to allow virtual functions to be used directly. This should be possible by altering the constructor to dereference the v-table to get the function pointer and then using it as before.
The second is to allow for member functions which use the thiscall calling convention (the default for member functions). In this calling convention the this pointer is stored in the ECX register (note that this goes hand-in-hand with the fact that you can’t take the address of the this pointer). If we could do this and leave the window handle where it is, it would be of particular convenience for messages where the only piece of member data we’ll look for is the window handle (often quite a few messages can be completely dealt with in this way, without needing the advantages C++ classes give to us).
Update. I’ve just coded a thiscall thunk such as I describe above. I’ll write it up after some testing.