OK, your program works. You've tested everything in sight. It's time to ship it. So you make a release version.
And the world crumbles to dust.
You get memory access failures, dialogs don't come up, controls don't work, results come out incorrectly, or any or all of the above. Plus a few more problems that are specific to your application.
Now what?
That's what this essay is all about.
A bit of background: I have been working with optimizing compilers since 1969. My PhD dissertation (1975) was on the automatic generation of sophisticated optimizations for an optimizing compiler. My post-doctoral work involved the use of a highly optimizing compiler (Bliss-11) in the construction of a large (500K line source) operating system for a multiprocessor. After that, I was one of the architects of the PQCC (Production Quality Compiler-Compiler) effort at CMU, which was a research project to simplify the creation of sophisticated optimizing compilers. In 1981 I left the University to join Tartan Laboratories, a company that developed highly optimizing compilers, where I was one of the major participants in the tooling development for the compilers. I've lived with, worked with, built, debugged, and survived optimizing compilers for over 30 years.
The usual first response is "the optimizer has bugs". While this can be true, it is actually a cause-of-last-resort. It is more likely that there is something else wrong with your program. We'll come back to the "compiler bugs" question a bit later. But the first assumption is that the compiler is correct, and you have a different problem. So we'll discuss those problems first.
The debug version of the MFC runtime allocates storage differently from the release version. In particular, the debug version allocates some space at the beginning and end of each block of storage, so its allocation patterns are somewhat different. These changes in storage allocation can cause problems to show up in the release version that never appeared in the debug version--but almost always these are genuine problems, as in bugs in your program, which somehow managed to escape detection in the debug version. Such bugs are usually rare.
Why are they rare? Because the debug version of the MFC allocator initializes all storage to really bogus values, so an attempt to use a chunk of storage that you have failed to allocate will give you an immediate access fault in the debug version. Furthermore, when a block of storage is freed, it is initialized to another pattern, so that if you have retained any pointers to the storage and try to use the block after it is freed you will also see some immediately bogus behavior.
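As an illustration of what those bogus values look like in practice, here is a minimal sketch; the fill bytes 0xCD (newly allocated) and 0xDD (freed) are the values used by the Microsoft debug C runtime heap, on which the debug MFC allocator is built, and the variable names are mine, not from any particular program.

int * p = new int[4];  // in a debug build the new block arrives filled with 0xCD bytes
int k = p[0];          // k is 0xCDCDCDCD, an obviously bogus value; used as a pointer,
                       // it faults immediately instead of limping along
delete [] p;           // the freed block is refilled with 0xDD bytes, so a stale pointer
                       // into it reads 0xDDDDDDDD and also fails loudly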
The debug allocator also checks the storage at the start and end of the block it allocated to see if it has been damaged in any way. The typical problem is that you have allocated a block of n values as an array and then accessed elements 0 through n, instead of 0 through n-1, thus overwriting the area at the end of the array. This condition will cause an assertion failure most of the time. But not all of the time. And this leads to a potential for failure.
Storage is allocated in quantized chunks, where the quantum is unspecified but is something like 16 or 32 bytes. Thus, if you allocate a DWORD array of six elements (size = 6 * sizeof(DWORD) bytes = 24 bytes), the allocator will actually deliver 32 bytes (one 32-byte quantum or two 16-byte quanta). So if you write element [6] (the seventh element) you overwrite some of the "dead space" and the error is not detected. But in the release version, the quantum might be 8 bytes, and three 8-byte quanta would be allocated, and writing the [6] element of the array would overwrite a part of the storage allocator's data structure that belongs to the next chunk. After that it is all downhill. The error might not even show up until the program exits! You can construct similar "boundary condition" situations for any size quantum. Even when the quantum size is the same for both versions of the allocator, the debug version adds hidden space for its own purposes, so you will get different storage allocation patterns in debug and release mode.
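A minimal sketch of the failure mode just described; the six-element DWORD array comes from the example above, and the loop is illustrative.

DWORD * values = new DWORD[6]; // 24 bytes requested; the allocator rounds up to its quantum
for(int i = 0; i <= 6; i++)    // BUG: <= writes seven elements, [0] through [6]
    values[i] = 0;             // the write to [6] lands in slack space, guard bytes, or the
                               // next block's header, depending on the allocator
delete [] values;              // the debug heap may assert here; the release heap may corrupt
                               // itself silently and fail much later, or at program exit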
Perhaps the greatest single cause of release-vs-debug failures is the occurrence of uninitialized local variables. Consider a simple example:
thing * search(thing * something)
   {
    BOOL found;
    for(int i = 0; i < whatever.GetSize(); i++)
       {
        if(whatever[i]->field == something->field)
           { /* found it */
            found = TRUE;
            break;
           } /* found it */
       }
    if(found)
       return whatever[i];
    else
       return NULL;
   }
Looks pretty straightforward, except for the failure to initialize the found variable to FALSE. But this bug was never seen in the debug version! What happens in the release version is that the whatever array, which holds n elements, has whatever[n] returned, a clearly invalid value, which later causes some other part of the program to fail horribly. Why didn't this show up in the debug version? Because in the debug version, due entirely to a fortuitous accident, the value of found was always initially 0 (FALSE), so when the loop exited without finding anything, it was correctly reporting that nothing was found, and NULL was returned.
Why is the stack different? In the debug version, the frame pointer is always pushed onto the stack at routine entry, and variables are almost always assigned locations on the stack. But in the release version, the optimizer may detect that the frame pointer is not needed, or that variable locations can be computed from the stack pointer (a technique we called frame pointer simulation in compilers I worked on), so the frame pointer is not pushed onto the stack. Furthermore, the compiler may detect that it is far more efficient to assign a variable, such as i in the above example, to a register rather than use a slot on the stack. So the initial contents of a local variable depend on many factors (the variable i is clearly initially assigned, but what if found were the variable?).
Other than careful reading of the code, and turning on high levels of compiler diagnostics, there is absolutely no way to detect uninitialized local variables without the aid of a static analysis tool. I am particularly fond of Gimpel Lint (see http://www.gimpel.com/), which is an excellent tool, and one I highly recommend.
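Once the bug is located, the repair itself is a one-line change; here is a corrected sketch of the search example above (whatever is the same undeclared collection used in the original fragment).

thing * search(thing * something)
   {
    BOOL found = FALSE; // explicit initialization: no dependence on whatever garbage
                        // happens to be in the stack slot or register chosen for it
    int i;              // declared outside the loop so it is still in scope for the return
    for(i = 0; i < whatever.GetSize(); i++)
       {
        if(whatever[i]->field == something->field)
           { /* found it */
            found = TRUE;
            break;
           } /* found it */
       }
    if(found)
       return whatever[i];
    else
       return NULL;
   }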
There are many valid optimizations which uncover bugs that are masked in the debug version. Yes, sometimes it is a compiler bug, but 99% of the time it is a genuine logic error that just happens to be harmless in the absence of optimization, but fatal when optimization is turned on. For example, if you have an off-by-one array access, consider code of the following general form:
void func()
   {
    char buffer[10];
    int counter;

    lstrcpy(buffer, "abcdefghik"); // 11-byte copy, including the terminating NUL
    ...
In the debug version, the NUL byte at the end of the string overwrites the high-order byte of counter, but unless counter gets > 16M, this is harmless even if counter is active. But in the release version, the optimizing compiler moves counter to a register, and it never appears on the stack; no space is allocated for it. The NUL byte overwrites the data which follows buffer, which may be the return address from the function, causing an access error when the function returns.
Of course, this is sensitive to all sorts of incidental features of the layout. If instead the program had been
void func()
   {
    char buffer[10];
    int counter;
    char result[20];

    wsprintf(result, _T("Result = %d"), counter);
    lstrcpy(buffer, _T("abcdefghik")); // 11-byte copy, including the terminating NUL
then the NUL byte, which used to overlap the high-order byte of counter (which doesn't matter in this example, because counter is obviously no longer needed after the line that prints it), now overwrites the first byte of result, with the consequence that the string result now appears to be an empty string, with no explanation of why it is so. If result had been a char * variable or some other pointer, you would be getting an access fault trying to access through it. Yet the program "worked in the debug version"! Well, it didn't; it was wrong, but the error was masked.
In such cases you will need to create a version of the executable with debug information, then use the break-on-value-changed feature to look for the bogus overwrite. Sometimes you have to get very creative to trap these errors.
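With the Microsoft tool chain this usually does not require a special hand-built executable, just different build settings; as a sketch (the exact switches vary by compiler version), a release-style compile that still produces symbols might look like

cl /O2 /Zi program.cpp /link /DEBUG /OPT:REF /OPT:ICF

where /Zi writes the debug information to a .pdb file, /O2 keeps the release optimizations, /DEBUG tells the linker to retain the symbols, and /OPT:REF /OPT:ICF restore the linker optimizations that /DEBUG would otherwise turn off. The optimized code is still awkward to follow in the debugger, but the symbols make it far easier to put a break-on-value-changed (data) breakpoint on the right address.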
Been there, done that. I once got a company award at the monthly company meeting for finding a fatal memory overwrite error that was a "seventh-level bug", that is, the pointer that was clobbered by overwriting it with another valid (but incorrect) pointer caused another pointer to be clobbered which caused an index to be computed incorrectly which caused...and seven levels of damage later it finally blew up with a fatal access error. In that system, it was impossible to generate a release version with symbols, so I spent 17 straight hours single-stepping instructions, working backward through the link map, and gradually tracking it down. I had two terminals, one running the debug version and one running the release version. It was obvious in the debug version what had gone wrong, after I found the error, but in the unoptimized code the phenomenon shown above masked the actual error.
Certain functions require a specific linkage type, such as __stdcall, and others require correct parameter matching; perhaps the most common errors are in using incorrect linkage types. When a function is specified with __stdcall linkage, you must declare the function you supply with __stdcall as well; if it is not specified as __stdcall, you must not use the __stdcall linkage. Note that you rarely if ever see a "bare" __stdcall declared as such. Instead, there are many linkage-type macros, such as WINAPI, CALLBACK, IMAGEAPI, and even the hoary old (and distinctly obsolete) PASCAL, all of which are defined as __stdcall. For example, the top-level thread function you pass to AfxBeginThread must match the AFX_THREADPROC type:
typedef UINT (AFX_CDECL * AFX_THREADPROC)(LPVOID);
which you might guess as being a CDECL (that is, non-__stdcall) linkage. If you declared your thread function as
UINT CALLBACK MyThreadFunc(LPVOID value)
and started the thread as
AfxBeginThread((AFX_THREADPROC)MyThreadFunc, this);
then the explicit cast (often added just to make a compiler warning go away!) would fool the compiler into accepting the call and generating code for it. This often results in the query "My thread function crashes the app when the thread completes, but only in release mode". Exactly why it doesn't do this in debug mode escapes me, but most of the time when we look at the problem it was a bad linkage type on the thread function. So when you see a crash like this, make sure that you have all the right linkages in place. Beware of using casts of function types; instead, write the call as
AfxBeginThread(MyThreadFunc, (LPVOID)this);
which will allow the compiler to check the linkage types and parameter counts.
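For completeness, here is a sketch of a thread function that matches AFX_THREADPROC with no cast at all; the class name CMyClass and the body are illustrative, and the point is that a plain function defaults to __cdecl (which is what AFX_CDECL expands to), unless you have changed the default calling convention with a compiler switch.

UINT MyThreadFunc(LPVOID value)
   {
    CMyClass * self = (CMyClass *)value; // the 'this' passed as the second parameter
                                         // of AfxBeginThread
    // ... do the work of the thread using self ...
    return 0;                            // the thread exit code
   }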
Using casts will also result in problems with parameter counts. Most of these should be fatal in debug mode, but for some reason some of them don't show up until the release build. In particular, any function with a __stdcall linkage, in any of its guises, must have the correct number of arguments. Usually this shows up instantly at compile time, unless you have used a function-prototype cast (like the (AFX_THREADPROC) cast in the previous section) to override the compiler's judgment. This almost always results in a fatal error when the function returns.
The most common place this shows up is when user-defined messages are used. You have a message which doesn't use the WPARAM and LPARAM values, so you write
wnd->PostMessage(UWM_MY_MESSAGE);
to simply send the message. You then write a handler that looks like
afx_msg void OnMyMessage(); // incorrect!
and the program crashes in release mode. Again, I've not investigated why this doesn't cause a problem in debug mode, but we've seen it happen all the time when the release build is created. The correct signature for a user-defined message is always, without exception,
afx_msg LRESULT OnMyMessage(WPARAM, LPARAM);
You must return a value, and you must have the parameters as specified (and you must use the types WPARAM and LPARAM if you want compatibility with the 64-bit world; the number of people who "knew" that WPARAM meant WORD and simply wrote (WORD, LONG) in their Win16 code paid the penalty when they went to Win32, where it is actually (UINT, LONG), and it will be different again in Win64, so why do it wrong by trying to be cute?).
Note that if you don't use the parameter values, you don't provide names for the parameters. So your handler for OnMyMessage is coded as
LRESULT CMyClass::OnMyMessage(WPARAM, LPARAM)
   {
    ...do something here...
    return 0; // logically void, 0, always
   }
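Here is a sketch of how the handler is wired up, assuming the user-defined message UWM_MY_MESSAGE from the PostMessage example above (the definition of the message ID itself is not shown).

// in the class declaration
afx_msg LRESULT OnMyMessage(WPARAM, LPARAM);

// in the message map, between BEGIN_MESSAGE_MAP and END_MESSAGE_MAP
ON_MESSAGE(UWM_MY_MESSAGE, OnMyMessage)

Later versions of MFC type-check the handler passed to ON_MESSAGE, so the wrong signature shown earlier is caught at compile time; older versions performed a cast inside the macro that hid the mismatch, which is exactly why it only blew up at run time.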
An optimizing compiler makes several assumptions about the reality it is dealing with. The problem is that the compiler's view of reality is based entirely on a set of assumptions which a C programmer can all too readily violate. The result of these misrepresentations of reality is that you can fool the compiler into generating "bad code". It isn't, really; it is perfectly valid code, provided the assumptions the compiler made were correct. If you have lied to your compiler, either implicitly or explicitly, all bets are off.
An alias to a location is another address through which that location can be reached. Generally, a compiler assumes that, unless otherwise instructed, aliasing exists (it is typical of C programs). You can get tighter code if you tell the compiler that it can assume no aliasing, and therefore that values it has computed will remain constant across function calls. Consider the following example:
int n;
int array[100];

int main(int argc, char * argv[])
   {
    n = somefunction();
    array[0] = n;
    for(int i = 1; i < 100; i++)
        array[i] = f(i) + array[0];
   }
This looks pretty easy; it computes a function of i, f(i), which at the moment we won't bother to define, and adds the value of array[0] to it. So a clever compiler says, "Look, array[0] isn't modified at all in the loop body, so we can store the value in a register and rearrange the code", and transforms the function into something like:
register int compiler_generated_temp_001 = somefunction();
n = compiler_generated_temp_001;
array[0] = compiler_generated_temp_001;
for(int i = 1; i < 100; i++)
    array[i] = f(i) + compiler_generated_temp_001;
This optimization, which is a combination of loop invariant optimization and value propagation, works only if the assumption holds that array[0] is not modified by f(i). But if we later define
int f(int i)
   {
    array[0]++;
    return i;
   }
Note that we have now violated the assumption that array[0] is constant; there is an alias to the value. Now this alias is fairly easy to see, but when you have complex structures with complex pointers you can get exactly the same thing, but it is not detectable ...