C++ Debugging (by null_pointer)

C++ has several powerful features available for debugging no matter which platform you use, whether or not you have access to a debugger. The purpose of this article is to enumerate the methods you can use to debug your code, and discuss circumstances for their use.

When finding out about a new feature in a programming language, one's first inclination is often to ignore its drawbacks and try to substitute it for all other features. Since no design model is perfect for every problem, this inclination is wasteful and merely leads to poorly designed code, since everything must be made to fit into the "better" design model.

You cannot choose the most suitable model without first considering your circumstances and the relative strengths and weaknesses of several different methods. Assertions, exceptions, logging, return values, etc. all have specific strengths and weaknesses.

I'll list some of my observations on these methods.

Method 0 - Nothing

Pros: Easy to write tons of code, imposes no execution burden in debug or release builds

Cons: Better skip the country if it doesn't work

This method is more of a non-method - that's why it is called method 0. I thought I'd include it for completeness. If you use this method often, do your clients a favor and seek professional help.

It also gives me a chance to explain the theory of debugging as I see it and some conventions we'll use throughout this article. There are basically two versions of code in C++ - debug and release. The code must be functionally equivalent in both modes. The difference is that in debug mode we favor useful debugging aids over speed, and in release mode we often value speed over debugging. Of course, you can define different levels if need be, but for this article we'll only use two.

Note that debugging is different from cleanup.

Bugs are typically poorly designed code that fails under certain conditions. Debugging is the process of finding and eliminating bugs. There are many causes of bugs, but here is a short list:

  1. Poor understanding of the language, API, other code, and/or platform
  2. Bad code design/organization
  3. Operating System bugs
  4. Miscommunication between members of a team
  5. Hastily written code
  6. Absolutely no reason at all :)

Note that when you write code, your code is rarely completely independent. Viz., your code is typically dependent on the standard library. Further, you will rarely code alone (if you intend to be a commercial success), which means interdependencies will exist between your code and that of your team members, and possibly that of your customers (if you code libraries).

Two terms are often used to describe the roles of people or source code as they relate to dependencies: server and client. People who use and depend on your code are said to be clients. When you depend on their code, you are their client. Server is rarely used here, but it means "the person who wrote the code."

Method 1 - Assertions

Pros: Relatively fast, imposes no overhead in a release build, extremely simple to code.

Cons: Slows the code down a little bit in a debug build, provides no safety in a release build, requires clients to read source code when they debug.

Explanation

An assertion is a boolean expression that must hold true in order for the program to continue to execute properly. You state an assertion in C++ by using the assert macro (declared in <cassert>), and passing it the expression that must be true:

assert( this );

If this is zero, then assert stops program execution, displays a message informing you that "assert( this )" failed on such and such a line in your source file, and lets you go from there. If this wasn't zero, assert does nothing further and your program will continue to execute normally.

Note that assert is a macro that expands to nothing when NDEBUG is defined (as it normally is in a release build), so its argument is never evaluated there and it imposes no overhead. Do NOT use it like this:

FILE* p = 0;

assert( p = fopen("myfile.txt", "r+") );

...because fopen will not be called in the release version! This is the correct way to do it:

FILE* p = fopen("myfile.txt", "r+");

assert( p );
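
The same pitfall can be reproduced in a few lines. This is only a sketch: it defines NDEBUG by hand to imitate a release build, and counted_true(), side_effect_inside_assert() and side_effect_outside_assert() are hypothetical names invented for the illustration.

```cpp
// Imitate a release build for this sketch only: with NDEBUG defined before
// <cassert> is included, assert(expr) expands to nothing and expr never runs.
#define NDEBUG
#include <cassert>

// Hypothetical helper that records how often it was called.
static int calls = 0;
static bool counted_true() { ++calls; return true; }

// Wrong: the call lives inside the assertion, so it vanishes under NDEBUG.
int side_effect_inside_assert()
{
  calls = 0;
  assert( counted_true() );
  return calls;   // number of times counted_true() actually ran
}

// Right: do the work first, then assert on the stored result.
int side_effect_outside_assert()
{
  calls = 0;
  const bool ok = counted_true();
  assert( ok );
  (void)ok;       // avoids an "unused variable" warning when assert is disabled
  return calls;
}

// Switch assertions back on for any code that follows this sketch;
// <cassert> may be included again after changing NDEBUG.
#undef NDEBUG
#include <cassert>
```

With NDEBUG defined, the first function reports that the helper never ran, while the second reports one call in either kind of build.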

Examples

These are best used in writing new code, where assumptions must almost always be made. Consider the following function:

void sort(int* const myarray) // an overly simple example

{

  for( unsigned int x = 0; x < sizeof(myarray)-1; x++ )

    if( myarray[x] < myarray[x + 1] ) swap(myarray[x], myarray[x + 1]);

}

Count the number of assumptions this function makes. Now take a look at the better version, which makes debugging a bit easier:

void sort_array(int* const myarray)

{

  assert( myarray );

  assert( sizeof(myarray) > sizeof(int*) );



  for( unsigned int x = 0; x < sizeof(myarray)-1; x++ )

    if( myarray[x] < myarray[x + 1] ) swap(myarray[x], myarray[x + 1]);

}

You see, that innocent-looking algorithm won't work if:

  1. The pointer is null, or
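  2. sizeof(myarray) cannot be used to determine the number of elements in the array. Because myarray is declared as a pointer, sizeof(myarray) is always the size of the pointer itself, whether the caller passed a stack array, a heap array, or the address of a single (non-array) object, so the second assertion above can never pass as written; the function really needs the element count from its caller.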
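
One way around the second problem is to let the caller supply the element count. The following is a sketch, not the original function: sort_pass and its count parameter are names I made up for the illustration, and like the example above it performs only a single pass over the data.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Single swap pass over the array; the caller passes the element count,
// so both assumptions can be checked for real.
void sort_pass(int* const myarray, const std::size_t count)
{
  assert( myarray );     // the pointer must not be null
  assert( count > 0 );   // there must be at least one element

  for( std::size_t x = 0; x + 1 < count; x++ )
    if( myarray[x] < myarray[x + 1] ) std::swap(myarray[x], myarray[x + 1]);
}

// Convenience overload for true arrays: the size is deduced by the compiler,
// which is the only situation in which it is actually known.
template <std::size_t N>
void sort_pass(int (&myarray)[N])
{
  sort_pass(myarray, N);
}
```

A caller holding a real array can write sort_pass(values); a caller holding only a pointer must state the size explicitly, which makes the assumption visible at the call site.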

Although that is a simple algorithm, many functions you will write and/or encounter will be much larger and more complex than that. It is surprising when you see the number of conditions which can cause a piece of code to fail. Let's take a look at a portion of an alpha-blending routine that I was playing with a while back:

void blend(const video::memory& source,

           video::memory& destination,

           const float colors[3])

{

  // The algorithm used is: B = A * alpha



  const unsigned int width = source.width();

  const unsigned int height = source.height();

  const unsigned int depth = source.depth();

  const unsigned int pitch = source.pitch();



  switch( depth )

  {

  case 15:

    // ...

    break;



  case 16:

    // ...

    break;



  case 24:

  {

    unsigned int offset = 0;

    unsigned int index = 0;



    for( unsigned int y = 0; y < height; y++ )

    {

      offset = y * pitch;



      for( unsigned int x = destination.width(); x > 0; x-- )

      {

        index = ((x - 1) * 3) + offset; // x runs from width down to 1



        destination[index + 0] = source[index + 0] * colors[0];

        destination[index + 1] = source[index + 1] * colors[1];

        destination[index + 2] = source[index + 2] * colors[2];

      }

    }

  } break;



  case 32:

    // ...

    break;

  }

}

Do you realize the number of assumptions that function makes in the name of optimization? Let's try listing them:

assert( source.locked() and destination.locked() );

assert( source.width() == destination.width() );

assert( source.height() == destination.height() );

assert( source.depth() == destination.depth() );

assert( source.pitch() == destination.pitch() );

assert( source.depth() == 15 or source.depth() == 16

    or source.depth() == 24 or source.depth() == 32 );

Typically, the more you optimize in low-level code such as that, the more assumptions you make. My function requires that the source and destination video memory be locked (so multiple blend functions can be called with a single lock()/unlock()), and that they both have the same width, height, depth, and pitch. Placing these assertions at the top of the function will prevent some programmer in the future from wondering why my function doesn't work or causes access violations - all requirements are now stated clearly at the top of the function.

However, you have to be careful which version of assert you use. If you use the ANSI C assert (defined in <assert.h>), then you may wind up with a very large debug build because of all the string constants it creates. If you find this to be a problem, override assert to trigger an exception or something else instead of building the string constants.
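
As a rough sketch of that idea, here is a replacement macro that throws instead of building a message string. LIGHT_ASSERT, assertion_failed and checked_half are hypothetical names chosen for this example; the macro stores only the line number, so no expression text or file name ends up in the binary.

```cpp
#include <cassert>

// Hypothetical exception type: carries only the line number of the failure.
struct assertion_failed
{
  int line;
  explicit assertion_failed(const int l) : line(l) {}
};

#ifdef NDEBUG
  // Release build: the check disappears entirely, like the standard assert.
  #define LIGHT_ASSERT(expr) ((void)0)
#else
  // Debug build: throw instead of printing the stringized expression.
  #define LIGHT_ASSERT(expr) \
    do { if( !(expr) ) throw assertion_failed(__LINE__); } while( false )
#endif

// Example use: this function assumes it is only given even numbers.
int checked_half(const int value)
{
  LIGHT_ASSERT( value % 2 == 0 );
  return value / 2;
}
```

The trade-off is that a failure no longer tells you which expression failed, only where, so you have to look at the source to find out what went wrong.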

Also, you don't have to check every parameter and condition - some things are painfully obvious to a good programmer. Sometimes comments might be better because they do not increase the size of the final build.

Good code should not require a plethora of assertions at the top of each function. If you find that you are writing a class and you have to place assertions in each member function to test the state, etc. then it is probably better to split the class up into other classes.

Conclusion

Getting into the habit of sprinkling assertions throughout your code has the following benefits:

  1. It requires that you think consciously about the assumptions your code makes on its environment and on other code/data, and thus it gives you the opportunity to develop better techniques and prevent bugs.
  2. By placing assertions near the top of the function and/or right before and after their conditions are used, other programmers will be able to more easily prevent bugs in the way they use your code.
  3. It helps to communicate the intent of your code, its effects on related functions/data, and any design limitations it might have.
  4. Bugs are easier to track down when using assertions to check key parameters, because they are found when the arguments are passed into the function, not in some obscure algorithm or function down the road, or even worse, going undetected until they cause other errors.
  5. It makes it easier to track incorrect return values (see assert( SUCCEEDED(hr) ) with DirectX) ;)
  6. You don't have to think up exception descriptions or return value error codes, and this is helpful when you are simply writing new code and wish to test something quickly, and for painfully obvious stuff like assert( this ).

Method 2 - Exceptions

Pros: Automatic cleanup and elegant shutdown, opportunity to continue if handled, works in both debug and release builds

Cons: Relatively slow

Explanation

Basically, you use the throw keyword to throw data up to some unknown function caller and the stack continues to unwind (think of it like the program is reversing itself) until someone catches the data. You use the try keyword to enclose code that you'd like to catch exceptions from. See:

void some_function() { throw 5; } // some function that throws an exception



int main(int argc, char* argv[])

{

  try // just letting the compiler know we want to catch any exceptions from this code

  {

    some_function();

  }



  catch(const int x) // if the type matches the data thrown, we execute this code

  {

    // do something about the exception...

  }

}

If you don't place try blocks in your code, then no handler will ever match the exception: the program calls std::terminate and exits (whether the stack is unwound first is implementation-defined). You don't have to place them everywhere - only where you can catch an exception and can recover from it. If you can only recover from it partially, you can rethrow the original exception (by the empty throw statement "throw;") and the stack will continue unwinding until it hits the next matching catch block.
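
Partial recovery followed by a rethrow might look like this. It is a minimal sketch; parse_error, parse(), load(), run() and the trace vector are all hypothetical names invented for the illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical exception type used only in this sketch.
struct parse_error {};

// Records which handlers ran, and in what order.
std::vector<std::string> trace;

void parse() { throw parse_error(); }

// Can only recover partially: it tidies up, then passes the exception on.
void load()
{
  try
  {
    parse();
  }
  catch( const parse_error& )
  {
    trace.push_back("load: partial cleanup");
    throw;   // rethrows the original exception object
  }
}

// Can recover fully: the exception stops here.
bool run()
{
  try
  {
    load();
  }
  catch( const parse_error& )
  {
    trace.push_back("run: handled");
    return false;
  }

  return true;
}
```

Calling run() leaves two entries in trace, first the one from load() and then the one from run(), which shows the exception passing through both handlers on its way up.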

Examples

Exceptions are best used in key places in debug and release builds to track exceptional conditions. If used properly, they provide automatic cleanup and then either force the application to quit or put itself back into a valid state. Thus exceptions are perfect for release code, because they provide everything the end user wants in a well-behaved program that encounters unexpected errors. Correctly used, they provide the following benefits:

  1. Automatic cleanup of all resources from any point in execution.
  2. They force the app to either quit or put itself back into a valid state.
  3. They force the recipient of the exception to handle it, instead of merely being optional (as with return values and assertions).
  4. They allow the deallocation code to be written by the person who wrote the allocation code and be handled implicitly (destructors, of course!).
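
The cleanup benefits can be seen in a small sketch. The handle type, the failure type and the open_handles counter are hypothetical stand-ins for a real resource such as a file or a lock.

```cpp
#include <cassert>

// Counts how many hypothetical resources are currently held.
static int open_handles = 0;

// The constructor acquires and the destructor releases, so whoever wrote the
// allocation code also wrote the deallocation code.
struct handle
{
  handle()  { ++open_handles; }
  ~handle() { --open_handles; }
};

// Hypothetical exception type.
struct failure {};

void work()
{
  handle a;
  handle b;
  assert( open_handles == 2 );

  throw failure();   // b and then a are destroyed while the stack unwinds
}

bool try_work()
{
  try
  {
    work();
  }
  catch( const failure& )
  {
    return false;    // by the time we get here, both handles are released
  }

  return true;
}
```

Note that work() contains no cleanup code at all; the destructors run no matter where the exception is thrown from.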

Because of the overhead, it is generally a bad idea to use them for normal flow control because other control structures will be faster and more efficient.

Method 3 - Return Values

Pros: Fast when used with built-in types and/or constants, allow a change in the client's logic and possible cleanup

Cons: Error-handling isn't mandatory, values could be confusing

Explanation

Basically, we either return valid data back to the caller, or a value to indicate an error:

const int divide(const int dividend, const int divisor)

{

  if( divisor == 0 ) return( 0 ); // avoid "integer divide-by-zero" error



  return( dividend/divisor );

}

A value of zero indicates that the function failed. Unfortunately, in this example you can also get a return value of zero by passing in zero for the dividend (which is perfectly valid), so the caller has no idea whether this function returned an error. This function is nonsensical, but it illustrates the problem of using return values for error handling. It's hard or impossible to choose error values for all functions.
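
One common workaround is to keep the error status and the result in separate channels. This is a sketch of that alternative rather than a fix to the function above; try_divide and its parameter names are my own invention.

```cpp
#include <cassert>

// The return value reports only success or failure; the result travels
// through a reference parameter, so zero is no longer ambiguous.
bool try_divide(const int numerator, const int denominator, int& quotient)
{
  if( denominator == 0 ) return false;   // quotient is left untouched

  quotient = numerator / denominator;
  return true;
}
```

A caller would write something like "int q; if( try_divide(a, b, q) ) ..." and can now tell a genuine zero result from a failure, although nothing forces the caller to look at the returned flag.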

Conclusion

Return values are best used in conditions when there is a grey area between an error and a simple change in logic. For example, a function might return a set of bit flags, some of which might be considered erroneous by one client, and not by the other. Return values are great for conditional logic.

A function trusting a function caller to notice an error condition is like a lifeguard trusting other swimmers to notice someone who is drowning.

Method 4 - Logging

Sometimes you do not have access to a debugger, and logging errors to a file can be quite helpful in debugging. Declare a global log file (or use std::clog) and output text to it when an error occurs. It might also help to output the file name and line number so you can tell where the error occurred. __FILE__ and __LINE__ are predefined macros that expand to the current file name and line number.
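
A minimal sketch of such a logging helper follows. The names log_message, LOG_TO and LOG are hypothetical; the macros exist only so that __FILE__ and __LINE__ are expanded at the place where the message is written rather than inside the helper.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Writes one line of the form "file(line): text" to the given stream.
void log_message(std::ostream& out, const char* const file,
                 const int line, const std::string& text)
{
  out << file << "(" << line << "): " << text << std::endl;
}

// Log to any stream, e.g. a std::ofstream opened on a log file.
#define LOG_TO(out, text) log_message((out), __FILE__, __LINE__, (text))

// Log to the standard log stream.
#define LOG(text) LOG_TO(std::clog, text)
```

For example, LOG("texture not found") writes the message to std::clog together with the file and line of the call, and LOG_TO(mylog, "texture not found") does the same for a stream of your own.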

You can also use log files to record messages other than errors, such as the maximum number of matrices in use or some other such data that you can't access with a debugger. Or you could output the data that caused your function to fail, etc. std::fstream is great for this purpose. If you are really clever, you could figure out some way to make your assertions and exceptions log their messages to a file. :)

This provides the following benefits:

  1. It's easy to integrate with your existing code and highly portable.
  2. It can provide human-readable output in lieu of a debugger.
  3. It can provide detailed information without interrupting the program.

Of course, it does have some overhead so you'll have to decide whether that is offset by the benefits in your situation.

One More Thing...

I had intended to finish this article here, but I wish to show how valuable a mixed approach to debugging can be. The easiest way to do this is by creating a simple class, preferably one that must work with non-C++ code. A file class will do nicely. We'll use C's fopen() and related functions for simplicity and portability.

We need to meet the following requirements:

  1. Constructor and destructor that match the lifetime of the file pointer.
  2. Assertions to describe assumptions made by each function.
  3. Exception types used to force the client to handle exceptional conditions.
  4. Member functions for reading and writing data.
  5. Templated member functions as shortcuts for stack-based data.
  6. Exception safety, including a fail-safe destructor and responsible member functions.
  7. Portability.

Here it is:

#include <cstdio>

#include <cassert>

#include <ciso646>

#include <string>



class file

{

public:



  // Exceptions

  struct exception {};

  struct not_found : public exception {};

  struct end : public exception {};



  // Constants

  enum modes { relative, absolute };



  file(const std::string& filename, const std::string& parameters);

  ~file();



  void seek(const unsigned int position, const enum modes = relative);

  void read(void* const data, const unsigned int size);

  void write(const void* const data, const unsigned int size);

  void flush();



  // Stack only!

  template <typename T> void read(T& data) { read(&data, sizeof(data)); }

  template <typename T> void write(const T& data) { write(&data, sizeof(data)); }



private:

  FILE* pointer;



  file(const file& other) {}

  file& operator = (const file& other) { return( *this ); }

};





file::file(const std::string& filename, const std::string& parameters)

  : pointer(0)

{

  assert( not filename.empty() );

  assert( not parameters.empty() );



  pointer = fopen(filename.c_str(), parameters.c_str());



  if( not pointer ) throw not_found();

}





file::~file()

{

  int n = fclose(pointer);

  assert( not n );

}





void file::seek(const unsigned int position, const enum file::modes mode)

{

  int n = fseek(pointer, position, (mode == relative) ? SEEK_CUR : SEEK_SET);

  assert( not n );

}





void file::read(void* const data, const unsigned int size)

{

  size_t s = fread(data, size, 1, pointer);



  if( s != 1 and feof(pointer) ) throw end();

  assert( s == 1 );

}





void file::write(const void* const data, const unsigned int size)

{

  size_t s = fwrite(data, size, 1, pointer);

  assert( s == 1 );

}





void file::flush()

{

  int n = fflush(pointer);

  assert( not n );

}





int main(int argc, char* argv[])

{

  file myfile("myfile.txt", "w+");



  int x = 5, y = 10, z = 20;

  float f = 1.5f, g = 29.4f, h = 0.0129f;

  char c = 'I';



  myfile.write(x);

  myfile.write(y);

  myfile.write(z);

  myfile.write(f);

  myfile.write(g);

  myfile.write(h);

  myfile.write(c);



  return 0;

}

If you compile this under Windows, make sure the project type is set to "Win32 Console App." What benefits does this class provide to its clients?

  1. It allows the user to manually debug in order to see which function and thus which parameters caused the member function to fail.
  2. It checks all return values and most parameters in a debug build.
  3. It throws two types of exceptions if the condition is exceptional and the client must decide what to do (e.g., run an installation repair wizard and restore some data files from the CD, prompt the user, or whatever).
  4. It provides shortcuts for the most common operations.
  5. It disallows copy construction and copy assignment, which it is not prepared to handle.

Note that the exceptions are thrown at two points which are vital to the file pointer's state:

  1. When the file is created, we make sure it succeeded.
  2. When the file is read from, we make sure the end is not reached prematurely.
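
From the client's side, handling the first of these exceptions might look like the sketch below. To keep it self-contained it uses mini_file, a hypothetical cut-down stand-in for the file class that keeps only the constructor, the destructor and the not_found exception; config_exists is likewise an invented name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Cut-down stand-in for the file class: open in the constructor, close in
// the destructor, throw if the file cannot be opened.
class mini_file
{
public:
  struct exception {};
  struct not_found : public exception {};

  mini_file(const std::string& filename, const std::string& parameters)
    : pointer(std::fopen(filename.c_str(), parameters.c_str()))
  {
    if( !pointer ) throw not_found();
  }

  ~mini_file() { std::fclose(pointer); }

private:
  std::FILE* pointer;

  // Declared but not defined: copying is not supported.
  mini_file(const mini_file& other);
  mini_file& operator = (const mini_file& other);
};

// The client decides what a missing file means; here it is simply reported.
bool config_exists(const std::string& filename)
{
  try
  {
    mini_file f(filename, "r");
    return true;    // f is closed automatically when it goes out of scope
  }
  catch( const mini_file::not_found& )
  {
    return false;   // a real client might prompt the user or restore the file
  }
}
```

The point is that the client cannot end up holding a half-constructed object: either the constructor succeeds and the file is open, or the exception forces a decision.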

Overhead

Note that all variables used to hold the return values in preparation for evaluation by assertions should be taken out by any optimizing compiler in a release build. The assertions are not evaluated in a release build. This ensures that the program runs with very little overhead. If you look at the source code for the member functions, they evaluate to basically the function call in a release build.

The exceptions are, however, present in a release build, and this is especially good because they may affect the program's ability to function and recover properly.

We don't need to use assert( this ) or assert( pointer ) in the member functions because file objects can only ever be created in a valid state. If operator new doesn't allocate a file object correctly, the constructor/destructor is never called and operator new throws an exception. If fopen doesn't return a valid file pointer, we throw a file::not_found exception and the file destructor is never called, nor are any of the member functions used. So we never have to worry about having an invalid this pointer or invalid file pointer in our member functions.

(If the user tries to call a member function through a null pointer, the behavior is undefined; in practice this will probably be null and accessing the members will probably cause an access violation. Programmers should have experience with that though.)

If we were to put the member functions into the header files and use the inline keyword, the compiler should be able to inline those functions in the release build, eliminating the function call overhead and associated temporaries, and making the class almost as fast in a release build as if we had simply used straight C code. :)

Judicious use of assertions can make your code easier to debug without decreasing the speed or size of the final (release) build.

Conclusion

The techniques illustrated with the file class can be used for most legacy code that exposes only handles or pointers to its internal objects.

Extension of the file class's functionality is an educational exercise left to the reader. Try adding a copy constructor and assignment operator, and test the class under different conditions. Note which assertions become invalid when you modify the class, and which existing assertions help you to catch errors with new code. With your changes, is it possible to put the file object in an invalid state?

Conclusion

It is best to use assertions when debugging conditions that will almost certainly cause the code to fail anyway down the function, to use exceptions as panic buttons for release code, and return values for what they were intended - passing valid data back to the calling function. Use logs for data that you can't or don't want to check when the program is running.

Fortunately, all of these suggestions are good ol' portable C++. Unfortunately, not everyone writes in C++ :) and there are also times when clients should not need to browse your source code. You must choose which methods to use based on your circumstances, and more importantly, those of your client(s).

The sad part is that return values are often the only communication of errors between C++ code and non-C++ code. It is quite a pain to check every return value (and you should). Assertions serve to alleviate some of that pain.
