When you run two programs on an Operating System that offers memory protection, as Windows and UNIX/Linux do, the two programs are executed as separate processes, which means they are given separate address spaces. So when program #1 modifies the address 0x800A1234 in its memory space, program #2 does not see any change in the contents of its memory at address 0x800A1234. With simpler Operating Systems that cannot accomplish this separation of processes, a faulty program can bring down not only itself but also the other programs running on that computer (including the Operating System itself).
The ability to execute more than one process at a time is known as multi-processing. A process consists of a program (usually called the application) whose statements are performed in an independent memory area. There is a program counter that remembers which statement should be executed next, there is a stack which holds the arguments passed to functions as well as the variables local to functions, and there is a heap which holds the remaining memory requirements of the program. The heap is used for the memory allocations that must persist longer than the lifetime of a single function. In the C language, you use malloc to acquire memory from the heap, and in C++, you use the new keyword.
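For instance, here is a tiny sketch of both styles of heap allocation (illustrative only, not part of any listing in this article):

#include <stdlib.h>

char * MakeBufferC()      // C style: the caller must eventually free() this buffer
{
    return (char*)malloc( 256 );
}

char * MakeBufferCpp()    // C++ style: the caller must eventually delete[] this buffer
{
    return new char[256];
}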
Sometimes, it is useful to arrange for two or more processes to work together to accomplish one goal. One situation where this is beneficial is where the computer's hardware offers multiple processors. In the old days this meant two sockets on the motherboard, each populated with an expensive Xeon chip. Thanks to advances in VLSI integration, these two processor chips can now fit in a single package. Examples are Intel's "Core Duo" and AMD's "Athlon 64 X2". If you want to keep two microprocessors busy working on a single goal, you basically have two choices: split the work between two (or more) cooperating processes, or split it between two (or more) threads running within a single process.
So, what's a thread? A thread is another mechanism for splitting the workload into separate execution streams. A thread is lighter weight than a process. This means it offers less flexibility than a full blown process, but it can be initiated faster because there is less for the Operating System to set up. What's missing? The separate address space is what is missing. When a program consists of two or more threads, all the threads share a single memory space. If one thread modifies the contents of the address 0x800A1234, then all the other threads immediately see a change in the contents of their address 0x800A1234. Furthermore, all the threads share a single heap. If one thread allocates (via malloc or new) all of the memory available in the heap, then attempts at additional allocations by the other threads will fail.
But each thread is given its own stack. This means thread #1 can be calling FunctionWhichComputesALot() at the same time that thread #2 is calling FunctionWhichDrawsOnTheScreen(). Both of these functions were written in the same program. There is only one program. But there are independent threads of execution running through that program.
What's the advantage? Well, if your computer's hardware offers two processors, then two threads can run simultaneously. And even on a uni-processor, multi-threading can offer an advantage. Most programs can't perform very many statements before they need to access the hard disk. This is a very slow operation, and hence the Operating System puts the program to sleep during the wait. In fact, the Operating System assigns the computer's hardware resources to somebody else's program during the wait. But, if you have written a multi-threaded program, then when one of your threads stalls, your other threads can continue.
One good way to learn any new programming concept is to study other people's code. You can find source code in magazine articles, and posted on the Internet at sites such as CodeProject. I came across some good examples of multi-threaded programs in two articles written for the C/C++ Users Journal, by Rex Jaeschke. In the October 2005 issue, Jaeschke wrote an article entitled "C++/CLI Threading: Part 1", and in the November 2005 issue, he wrote his follow-up article entitled "C++/CLI Threading: Part 2". Unfortunately, the C/C++ Users Journal magazine folded shortly after these articles appeared. But, the original articles and Jaeschke's source code are still available at the following websites:
You'll notice that the content from the defunct C/C++ Users Journal has been integrated into the Dr. Dobb's Portal website, which is associated with Dr. Dobb's Journal, another excellent programming magazine.
You might not be familiar with the notation C++/CLI. This stands for "C++ Common Language Infrastructure" and is a Microsoft invention. You're probably familiar with Java and C#, which are two languages that offer managed code, where the language runtime (its garbage collector) rather than the programmer is responsible for deallocating all memory allocations made from the heap. C++/CLI is Microsoft's proposal to add managed code to the C++ language.
I am not a fan of this approach, so I wasn't very interested in Jaeschke's original source code. I am sure Java and C# are going to hang around, but C++/CLI attempts to add so many new notations (and concepts) on top of C++, which is already a very complicated language, that I think this language will disappear.
But, I still read the original C/C++ Users Journal article and thought Jaeschke had selected good examples of multi-threading. I especially liked how his example programs were short and yet displayed data corruption when run without the synchronization methods that are required for successful communication between threads. So, I sat down and rewrote his programs in standard C++. This is what I am sharing with you now. The source code I present could also be written in standard C. In fact, that's easier than accomplishing it in C++ for a reason we will get to in just a minute.
This is probably the right time to read Jaeschke's original articles, since I don't plan to repeat his great explanations of multitasking, reentrancy, atomicity, etc. For example, I don't plan to explain how a program is given its first thread automatically and all additional threads must be created by explicit actions by the program (oops). The URLs where you can find Jaeschke's two articles are given above.
It is unfortunate that the C++ language didn't standardize the method for creating threads. Therefore, various compiler vendors invented their own solutions. If you are writing a program to run under Windows, then you will want to use the Win32 API to create your threads. This is what I will demonstrate. Microsoft's C runtime library, working on top of the Win32 API, offers the following function (declared in <process.h>) to create a new thread:
uintptr_t _beginthread( void( __cdecl *start_address )( void * ), unsigned stack_size, void *arglist );
This function signature might look intimidating, but using it is easy. The _beginthread() function takes three parameters. The first is the name of the function which you want the new thread to begin executing. This is called the thread's entry-point-function. You get to write this function, and the only requirements are that it take a single parameter (of type void*) and that it return nothing. That is what is meant by this part of the signature:

void( __cdecl *start_address )( void * ),

The second parameter to the _beginthread() function is a requested stack size for the new thread (remember, each thread gets its own stack). However, I always set this parameter to 0, which lets the Windows Operating System select the stack size for me, and I haven't had any problems with this approach. The final parameter to the _beginthread() function is the single argument you want passed to the entry-point-function. This will be made clear by the following example program:
#include <stdio.h>
#include <windows.h>
#include <process.h>            // needed for _beginthread()

void silly( void * );           // function prototype

int main()
{
    // Our program's first thread starts in the main() function.
    printf( "Now in the main() function.\n" );

    // Let's now create our second thread and ask it to start
    // in the silly() function.
    _beginthread( silly, 0, (void*)12 );

    // From here on there are two separate threads executing
    // our one program.

    // This main thread can call the silly() function if it wants to.
    silly( (void*)-5 );

    Sleep( 100 );
}

void silly( void *arg )
{
    printf( "The silly() function was passed %d\n", (INT_PTR)arg );
}
Go ahead and compile this program. Simply request a Win32 Console Program from Visual C++ .NET 2003's New Project Wizard and then "Add a New Item" which is a C++ source file (.CPP file) in which you place the statements I have shown. I am providing Visual C++ .NET 2003 workspaces for Jaeschke's (modified) programs, but you need to know the key to starting a multi-threaded program from scratch: you must remember to make one change to the default project properties that the New Project Wizard gives you.

Namely, open the Project Properties dialog (select "Project" from the main Visual C++ menu and then select "Properties"). In the left hand column of this dialog, you will see a tree view control named "Configuration Properties", with the main sub-nodes labeled "C/C++", "Linker", etc. Double-click on the "C/C++" node to open this entry up. Then, click on "Code Generation". In the right hand area of the Project Properties dialog, you will now see "Runtime Library" listed. This defaults to "Single Threaded Debug (/MLd)". [The notation /MLd indicates that this choice can also be made from the compiler command line using the /MLd switch.] Click on this entry to reveal a drop-down list control, where you must select "Multi-threaded Debug (/MTd)". If you forget to do this, your program won't compile, and the error message will complain about the _beginthread() identifier.
A very interesting thing happens if you comment out the call to the Sleep() function seen in this example program. Without the Sleep() statement, the program's output will probably only show a single call to the silly() function, with the passed argument -5. This is because the program's process terminates as soon as the main thread reaches the end of the main() function, and this may occur before the Operating System has had the opportunity to create the other thread for this process. This is one of the discrepancies between straight C++ and what Jaeschke says concerning C++/CLI. Evidently, in C++/CLI, each thread has an independent lifetime, and the overall process (which is the container for all the threads) persists until the last thread has decided to die. Not so for straight C++ Win32 programs: the process dies when the primary thread (the one that started in the main() function) dies. The death of this thread means the death of all the other threads.
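If you would rather not guess at a Sleep() duration, here is a sketch of the alternative. It uses _beginthreadex(), which is introduced a little later in this article, because the handle returned by plain _beginthread() is closed automatically when its thread exits. The function sillyEx() is a hypothetical variant of silly() with the signature _beginthreadex() requires:

#include <stdio.h>
#include <windows.h>
#include <process.h>

unsigned __stdcall sillyEx( void *arg )          // hypothetical entry-point-function
{
    printf( "The sillyEx() function was passed %d\n", (int)(INT_PTR)arg );
    return 0;
}

int main()
{
    HANDLE h = (HANDLE)_beginthreadex( NULL, 0, sillyEx, (void*)12, 0, NULL );

    WaitForSingleObject( h, INFINITE );   // block until the second thread has finished
    CloseHandle( h );                     // handles from _beginthreadex() must be closed
    return 0;
}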
The example program I just listed really isn't a C++ program because it doesn't use any classes. It is just a C language program. The Win32 API was really designed for the C language, and when you employ it with C++ programs, you sometimes run into difficulties. Such as this difficulty: "How can I employ a class member function (a.k.a. an instance function) as the thread's entry-point-function?"
If you are rusty on your C++, let me remind you of the problem. Every C++ member function has a hidden first parameter known as the this parameter. Via the this parameter, the function knows which instance of the class to operate upon. Because you never see the this parameter written out, it is easy to forget it exists.
Now, let's again consider the _beginthread() function, which allows us to specify an arbitrary entry-point-function for our new thread. This entry-point-function must accept a single void* parameter. Aye, there's the rub. The function signature required by _beginthread() does not allow for the hidden this parameter, and hence a C++ member function cannot be directly activated by _beginthread().
We would be in a bind were it not for the fact that C and C++ are incredibly expressive languages (famously allowing you the freedom to shoot yourself in the foot) and the additional fact that _beginthread() does allow us to specify an arbitrary parameter to pass to the entry-point-function. So, we use a two-step procedure to accomplish our goal: we ask _beginthread() to employ a static class member function (which, unlike an instance function, lacks the hidden this parameter) as the entry-point-function, and we pass that static function the this pointer of a class instance, disguised as a void*. The static function converts the void* parameter back into a pointer to the class instance. Voila! We now know which instance of the class should call the real entry-point-function, and making this call completes the two-step process. The relevant code (from Jaeschke's modified Part 1 Listing 1 program) is shown below:
class ThreadX
{
public:
    // In C++ you must employ a free (C) function or a static
    // class member function as the thread entry-point-function.
    static unsigned __stdcall ThreadStaticEntryPoint( void * pThis )
    {
        ThreadX * pthX = (ThreadX*)pThis;   // the tricky cast

        pthX->ThreadEntryPoint();           // now call the true entry-point-function

        // A thread terminates automatically if it completes execution,
        // or it can terminate itself with a call to _endthread().

        return 1;                           // the thread exit code
    }

    void ThreadEntryPoint()
    {
        // This is the desired entry-point-function but to get
        // here we have to use a 2 step procedure involving
        // the ThreadStaticEntryPoint() function.
    }
};
Then, in the main() function, we get the two step process started as shown below:
hth1 = (HANDLE)_beginthreadex( NULL,                            // security
                               0,                               // stack size
                               ThreadX::ThreadStaticEntryPoint, // entry-point-function
                               o1,                              // arg list holding the "this" pointer
                               CREATE_SUSPENDED,                // so we can later call ResumeThread()
                               &uiThread1ID );
Notice that I am using _beginthreadex() rather than _beginthread() to create my thread. The "ex" stands for "extended", which means this version offers additional capability not available with _beginthread(). This is typical of Microsoft's Win32 API: when shortcomings were identified, more powerful augmented techniques were introduced. One of these extended capabilities is that the _beginthreadex() function allows me to create but not actually start my thread. I elect this choice merely so that my program better matches Jaeschke's C++/CLI code. Furthermore, _beginthreadex() allows the entry-point-function to return an unsigned value, and this is handy for reporting status back to the thread creator. The thread's creator can access this status by calling GetExitCodeThread(). This is all demonstrated in the "Part 1 Listing 1" program I provide (the name comes from Jaeschke's magazine article).
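As a rough sketch (reusing the hth1 handle from the snippet above), the creator might retrieve that status like this:

ResumeThread( hth1 );                      // the thread was created CREATE_SUSPENDED
WaitForSingleObject( hth1, INFINITE );     // wait for the thread to finish

DWORD exitCode = 0;
GetExitCodeThread( hth1, &exitCode );      // exitCode now holds the value returned by
                                           // ThreadStaticEntryPoint() (1 in the listing above)
CloseHandle( hth1 );                       // handles from _beginthreadex() must be closed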
At the end of the main() function, you will see some statements which have no counterpart in Jaeschke's original program. This is because in C++/CLI, the process continues until the last thread exits. That is, the threads have independent lifetimes. Hence, Jaeschke's original code was designed to show that the primary thread could exit and not influence the other threads. However, in C++, the process terminates when the primary thread exits, and when the process terminates, all its threads are then terminated. We force the primary thread (the thread that starts in the main() function) to wait upon the other two threads, via the following statements:
WaitForSingleObject( hth1, INFINITE );
WaitForSingleObject( hth2, INFINITE );
If you comment out these waits, the non-primary threads will never get a chance to run because the process will die when the primary thread reaches the end of the main() function.
In the Part 1 Listing 1 program, the multiple threads don't interact with one another, and hence they cannot corrupt each other's data. The point of the Part 1 Listing 2 program is to demonstrate how this corruption comes about. This type of corruption is very difficult to debug, and that makes multi-threaded programs very time consuming to develop if you don't design them correctly. The key is to provide synchronization whenever shared data is accessed (whether written or read).
A synchronization object is an object whose handle can be specified in one of the Win32 wait functions such as WaitForSingleObject(). The synchronization objects provided by Win32 are:
An event notifies one or more waiting threads that an event has occurred.
A mutex can be owned by only one thread at a time, enabling threads to coordinate mutually exclusive access to a shared resource; that is where its name comes from. The state of a mutex object is set to signaled when it is not owned by any thread, and to nonsignaled when it is owned by a thread.
Critical section objects provide synchronization similar to that provided by mutex objects, except that critical section objects can be used only by the threads of a single process (hence they are lighter weight than a mutex). Like a mutex object, a critical section object can be owned by only one thread at a time, which makes it useful for protecting a shared resource from simultaneous access. There is no guarantee about the order in which threads will obtain ownership of the critical section; however, the Operating System will be fair to all threads. Another difference between a mutex and a critical section is that if the critical section object is currently owned by another thread, EnterCriticalSection() waits indefinitely for ownership, whereas WaitForSingleObject(), which is used with a mutex, allows you to specify a timeout. (A minimal usage sketch of a critical section appears just after this list.)
A semaphore maintains a count between zero and some maximum value, limiting the number of threads that are simultaneously accessing a shared resource.
A waitable timer notifies one or more waiting threads that a specified time has arrived.
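Since the example programs lean on critical sections, here is a minimal usage sketch (the names are hypothetical, not taken from any of the listings):

#include <windows.h>

CRITICAL_SECTION g_cs;        // shared by all the threads of this process
long             g_counter;   // the shared resource we want to protect

void Setup()                  // call once, before any worker thread starts
{
    InitializeCriticalSection( &g_cs );
}

void BumpCounter()            // may be called from any thread
{
    EnterCriticalSection( &g_cs );   // blocks until no other thread owns g_cs
    ++g_counter;                     // only one thread at a time executes this
    LeaveCriticalSection( &g_cs );
}

void Teardown()               // call once, after all worker threads have exited
{
    DeleteCriticalSection( &g_cs );
}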
This Part 1 Listing 2 program demonstrates the Critical Section synchronization object. Take a look at the source code now. Note that in the main() function, we create two threads and ask them both to employ the same entry-point-function, namely the function called StartUp(). However, because the two object instances (o1 and o2) have different values for their isMover data member, the two threads behave completely differently from each other. Because in one case isMover = true and in the other case isMover = false, one of the threads continually changes the Point object's x and y values while the other thread merely displays those values. But this is enough interaction that the program will display a bug if run without synchronization.
Compile and run the program as I provide it to see the problem. Occasionally, the printout will show a discrepancy between the x and y values: when this happens, the x value will be 1 larger than the y value. This happens because the thread that updates x and y was interrupted by the thread that displays the values, between the moment when the x value was incremented and the moment when the y value was incremented.
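In outline (hypothetical names, not the actual Part 1 Listing 2 code), the race looks like this:

#include <stdio.h>

struct Point { int x; int y; };
Point g_pt = { 0, 0 };            // shared by both threads

void MoverLoop()                  // run by the mover thread
{
    for ( ;; )
    {
        g_pt.x++;                 // <-- if the display thread runs right here,
        g_pt.y++;                 //     it prints an x that is one larger than y
    }
}

void DisplayLoop()                // run by the display thread
{
    for ( ;; )
        printf( "x = %d, y = %d\n", g_pt.x, g_pt.y );
}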
Now, go to the top of the Main.cpp file and find the following statement:
//#define WITH_SYNCHRONIZATION
Uncomment this statement (that is, remove the double slashes). Then, re-compile and re-run the program. It now works perfectly. This one change activates all of the critical section statements in the program. I could have just as well used a mutex or a semaphore, but the critical section is the most light-weight (hence fastest) synchronization object offered by Windows.
One of the most common uses for a multi-threaded architecture is the familiar producer/consumer situation where there is one activity to create packets of stuff and another activity to receive and process those packets. The next example program comes from Jaeschke's Part 2 Listing 1 program. An instance of the CreateMessages class acts as the producer, and an instance of the ProcessMessages class acts as the consumer. The producer creates exactly five messages and then commits suicide. The consumer is designed to live indefinitely, until commanded to die. The primary thread waits for the producer thread to die, and then commands the consumer thread to die.
The program has a single instance of the MessageBuffer class, and this one instance is shared by both the producer and the consumer threads. Via synchronization statements, this program guarantees that the consumer thread can't process the contents of the message buffer until the producer thread has put something there, and that the producer thread can't put another message there until the previous one has been consumed.
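In skeleton form (hypothetical names, not the actual MessageBuffer code), guarding the shared buffer with a Win32 mutex looks like this:

#include <windows.h>

HANDLE g_hMutex;                  // created once, then shared by producer and consumer

void Setup()
{
    g_hMutex = CreateMutex( NULL, FALSE, NULL );   // initially unowned (signaled)
}

void TouchSharedBuffer()          // called from either thread
{
    WaitForSingleObject( g_hMutex, INFINITE );     // acquire ownership of the mutex
    // ... read or write the shared message buffer here ...
    ReleaseMutex( g_hMutex );                      // relinquish ownership
}

Note that a mutex by itself only enforces mutual exclusion; the ready/consumed hand-off described above needs additional signaling, which the full listing provides.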
Since my Part 1 Listing 2 program demonstrates a critical section, I elected to employ a mutex in this Part 2 Listing 1 program. As with the Part 1 Listing 2 example program, if you simply compile and run the Part 2 Listing 1 program as I provide it, you will see that it has a bug. Whereas the producer creates the five following messages:
1111111111
2222222222
3333333333
4444444444
5555555555
the consumer receives the five following messages:
1
2111111111
3222222222
4333333333
5444444444
There is clearly a synchronization problem: the consumer is getting access to the message buffer as soon as the producer has updated the first character of the new message. But the rest of the message buffer has not yet been updated.
Now, go to the top of the Main.cpp file and find the following statement:
//#define WITH_SYNCHRONIZATION
Uncomment this statement (that is, remove the double slashes). Then, re-compile and re-run the program. It now works perfectly.
Between the English explanation in Jaeschke's original magazine article and all the comments I have put in my C++ source code, you should be able to follow the flow. The final comment I will make is that the GetExitCodeThread() function returns the special value 259 when the thread is still alive (and hence hasn't really exited). You can find the definition for this value in the WinBase.h header file:
#define STILL_ACTIVE STATUS_PENDING
where STATUS_PENDING is, in turn, defined in the WinNT.h header file:
#define STATUS_PENDING ((DWORD )0x00000103L)
Note that 0x00000103 = 259.
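A quick sketch of the check (hth2 here is assumed to be a thread handle such as the ones created earlier):

DWORD code = 0;
GetExitCodeThread( hth2, &code );
if ( code == STILL_ACTIVE )        // 259: the thread has not exited yet
    printf( "The thread is still running.\n" );
else
    printf( "The thread exited with code %lu.\n", code );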
Jaeschke's Part 2 Listing 3 program demonstrates thread local storage. Thread local storage is memory that is accessible only to a single thread. At the start of this article, I said that an Operating System can initiate a new thread faster than it can initiate a new process because all threads share the same memory space (including the heap) and hence there is less for the Operating System to set up when creating a new thread. But here is the exception to that rule. When you request thread local storage, you are asking the Operating System to set aside a separate copy of certain variables for each thread, so that each thread can access only its own copy.
The Microsoft-specific storage-class modifier which declares that a variable should employ thread local storage is __declspec(thread).
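A minimal sketch (the variable name is hypothetical, not from the listing):

// Each thread that touches tls_counter gets its own private copy; increments
// made by one thread are invisible to all the other threads.
__declspec(thread) int tls_counter = 0;

void CountSomethingLocally()
{
    ++tls_counter;    // no synchronization needed: only the calling thread's copy changes
}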
As with my other example programs, this one will display an obvious synchronization problem if you compile and run it unchanged. After you have seen the problem, go to the top of the Main.cpp file and find the following statement:
//#define WITH_SYNCHRONIZATION
Uncomment this statement (that is, remove the double slashes). Then, re-compile and re-run the program. It now works perfectly.
Jaeschke's Part 2 Listing 4 program demonstrates the problem of atomicity, which arises when an operation can fail because it is interrupted mid-way through. This usage of the word "atomic" relates back to the time when an atom was believed to be the smallest particle of matter and hence something that couldn't be further split. A single assembly language instruction is atomic on a uniprocessor: it cannot be interrupted half-way through. This is not true of high-level C or C++ statements, which typically compile to several instructions. For example, you might consider an update to a 64 bit variable to be an atomic operation, but it actually isn't on 32 bit hardware. Microsoft's Win32 API offers the InterlockedIncrement() function as the solution for this type of atomicity problem.
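Here is a minimal sketch (a hypothetical counter, not Jaeschke's listing) contrasting a plain increment with the interlocked version:

#include <windows.h>

static LONG g_sharedCount = 0;      // incremented concurrently by several threads

void UnsafeBump()
{
    g_sharedCount++;                         // read-modify-write: another thread can run
                                             // between the read and the write
}

void SafeBump()
{
    InterlockedIncrement( &g_sharedCount );  // the whole increment is indivisible
}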
This example program could be rewritten to employ 64 bit integers (the LONGLONG data type) and the InterlockedIncrement64() function if it only needed to run under Windows Server 2003. But, alas, Windows XP does not support InterlockedIncrement64(). Hence, I was originally worried that I wouldn't be able to demonstrate an atomicity bug in a Windows XP program that dealt only with 32 bit integers. But, curiously, such a bug can be demonstrated as long as we employ the Debug mode settings in the Visual C++ .NET 2003 compiler rather than the Release mode settings. Therefore, you will notice that, unlike the other example programs inside the .ZIP file that I distribute, this one is set for a Debug configuration.
As with my other example programs, this one will display an obvious synchronization problem if you compile and run it unchanged. After you have seen the problem, go to the top of the Main.cpp file and find the following statement:
static bool interlocked = false; // change this to fix the problem
Change false to true, and then re-compile and re-run the program. It now works perfectly because it is now employing InterlockedIncrement().
In order that other C++ programmers can experiment with these multithreaded examples, I make available a .ZIP file holding five Visual C++ .NET 2003 workspaces for the Part 1 Listing 1, Part 1 Listing 2, Part 2 Listing 1, Part 2 Listing 3, and Part 2 Listing 4 programs from Jaeschke's original article (now translated to C++). Enjoy!
This is my second submission to CodeProject. The first demonstrated how to use Direct3D 8 to model the Munsell color solid so that you could then fly through this color cube as in a video game. I also have a website where I offer a complete introduction to programming, including assembly language programming. My home page is www.computersciencelab.com.