September 1, 2009
Posted by Bartosz Milewski under Programming
Spawning a thread in non-functional languages is considered a very low-level primitive. Often spawn or CreateThread takes a function pointer and an untyped (void) pointer to “data”. The newly created thread will execute the function, passing it the untyped pointer, and it’s up to the function to cast the data into something more palatable. This is indeed the lowest of the lowest. It’s the stinky gutters of programming.
Isn’t it much nicer to create a Thread or a Runnable object and let the ugly casting be done under the covers? But, as I argued before, the Thread object doesn’t really buy you much in terms of the most important safety issue: the avoidance of data races. So we can have a Thread object instead of a void pointer, and a run method that understands the format of the Thread object (or Runnable, take your pick). But because the Thread/Runnable object has reference semantics, we still end up inadvertently sharing data between threads. Unless the programmer consciously avoids or synchronizes shared access, he or she is left exposed to the most vile concurrency bugs–by default!
As they say, Cooks cover their mistakes with sauces; doctors, with six feet of dirt; language designers, with objects.
Requirements
But enough ranting! I have the opportunity to design the spawn function for D and I don’t want to do any more cover-ups beyond hiding the ugly systems’ APIs. Here are my design requirements:
spawn should take an arbitrary function as the main argument. It should refuse (at compile time) delegates or closures, which would introduce back-door sharing. (This might be relaxed later as we gain experience in controlling the sharing.)
It should take a variable number of arguments of the types compatible with those of the function parameters. It should detect type mismatches at compile time.
It should refuse the types of arguments that are prone to introducing data races. For now, I’ll allow only value types, immutable types, and explicitly shared types (shared is a type modifier in D).
I wish I could use the more precise race-free type system that I’ve been describing in my previous posts, but since I can’t get it into D2, there’s still a little bit of “programmer beware” in this implementation.
These requirements seem like a tall order for any language other than D. I wouldn’t say it’s a piece of cake in D, but it’s well within the reach of a moderately experienced programmer.
Unit Tests
Let me start by writing a little use case for my design (Oh, the joys of extreme programming!):
S s = { 3.14 };
Tid tid = spawn(&thrFun, 2, s, "hello");
tid.join;
Here’s the definition of the function, thrFun:
void thrFun(int i, S s, string str) {
    writeln("thread function called with: ", i, ", ", s.fl, " and ", str);
}
Its parameter types fulfill the restrictions I listed above. The int is a value and so is S (structs are value types in D, although a struct that contains references would still share whatever those references point to):
struct S {
    float fl;
}
Interestingly, the string is okay too, because its reference part is immutable. In D, a string is defined as an array of immutable characters, immutable(char)[].
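For readers who want to run the positive test today, here is a minimal sketch. It assumes the spawn that ships in the std.concurrency module as a stand-in for the function designed in this post, and it uses thread_joinAll from core.thread in place of tid.join, since the released Tid has no join method. It is an approximation of the use case, not the implementation described here.

```d
import core.thread : thread_joinAll;
import std.concurrency : spawn;
import std.stdio : writeln;

struct S
{
    float fl;
}

void thrFun(int i, S s, string str)
{
    writeln("thread function called with: ", i, ", ", s.fl, " and ", str);
}

void main()
{
    S s = { 3.14 };
    // Assumption: std.concurrency.spawn stands in for the spawn designed in this post.
    spawn(&thrFun, 2, s, "hello");
    // Wait for all spawned threads to finish before main returns.
    thread_joinAll();
}
```

All three arguments are accepted: the int and the struct are copied, and the string only shares immutable data.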
Besides positive tests, the even more important cases are negative. For instance, I don’t want spawn to accept a function that takes an Object as argument. Objects are reference types and (if not declared shared) can sneak in unprotected sharing.
How do you build unit tests whose compilation should fail? Well, D has a trick for that (ignore the ugly syntax):
void fo(Object o) {}
assert (!__traits(compiles,
(Object o) { return spawn(&fo, o); }));
This code asserts that the function literal (a lambda),
(Object o){ return spawn(&fo, o); }
does not compile with the thread function fo. Now that’s one useful construct worth remembering!
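The same trick can be tried in isolation. The sketch below uses a hypothetical function, takesInt, which is not part of the spawn design, and checks at compile time (with static assert rather than a runtime assert) that a mistyped call is rejected. Note that the function literal spells out its parameter type; a literal with an untyped parameter would be a template and would not be checked until instantiated.

```d
import std.stdio : writeln;

// Hypothetical function used only for this illustration.
void takesInt(int x) {}

void main()
{
    // Positive case: the call type-checks.
    static assert( __traits(compiles, takesInt(42)));

    // Negative case: a string does not convert to int, so the call is rejected.
    static assert(!__traits(compiles, takesInt("forty-two")));

    // Same check, with the offending call wrapped in a function literal.
    static assert(!__traits(compiles, (string s) { takesInt(s); }));

    writeln("all compile-time checks passed");
}
```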
Implementation
Without further ado, I present you with the implementation of spawn that passes all the above tests (and more):
Tid spawn(T...)(void function(T) fp, T args)
    if (isLocalMsgTypes!(T))
{
    return core.thread.spawn( (){ fp(args); });
}
This attractively terse code uses quite a handful of D features, so let me first read it out loud for kicks:
spawn is a function template returning the Tid (Thread ID) structure. Tid is a reference-counted handle, see my previous blog.
It is parameterized by a type tuple T….
It takes the following parameters:
a pointer to a function, fp, taking arguments of the types specified by the tuple T…
a variable number of parameters, args, of types T….
The type tuple T… must obey the predicate isLocalMsgTypes, which is defined elsewhere.
The implementation of spawn calls the (in general, unsafe) function core.thread.spawn (defined in the module core.thread) with the following closure (nested function):
(){ fp(args); }
which captures local variables, args.
As you may guess, the newly spawned thread runs the closure, so it has access to captured args from the original thread. In general, that’s a recipe for a data race. What saves the day is the predicate isLocalMsgTypes, which defines what types are safe to pass as inter-thread messages.
Note the important point: there should be no difference between the constraints imposed on the types of parameters passed to spawn and the types of messages that can be sent to a thread. You can think of spawn parameters as initial messages sent to a nascent thread. As I said before, message types include value types, immutable types and shared types (no support for unique types yet).
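The post leaves isLocalMsgTypes undefined. As a rough sketch of what such a predicate might check, the code below builds a hypothetical isMessageSafe on top of hasUnsharedAliasing from std.traits, which reports whether a type contains a reference to data that is neither immutable nor shared. This is an assumption about the spirit of the predicate, not its actual definition; the struct Leaky is likewise made up for the example.

```d
import std.traits : hasUnsharedAliasing;

// Hypothetical stand-in for isLocalMsgTypes; the real predicate is defined elsewhere.
enum isMessageSafe(T...) = !hasUnsharedAliasing!T;

struct S
{
    float fl;
}

// Hypothetical struct that smuggles in a mutable, unshared array.
struct Leaky
{
    int[] data;
}

// Value types and strings (immutable(char)[]) pass.
static assert( isMessageSafe!(int, S, string));
// Immutable and shared data pass.
static assert( isMessageSafe!(immutable(int)[]));
static assert( isMessageSafe!(shared(Object)));
// Plain object references and mutable arrays are rejected.
static assert(!isMessageSafe!Object);
static assert(!isMessageSafe!(int[]));
static assert(!isMessageSafe!Leaky);

void main() {}
```

A struct that contains a mutable array is rejected even though the struct itself is a value type, which is the back-door sharing the predicate is meant to catch.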
Useful D features
Let me explain some of the D novelties I used in the definition of spawn.
A function with two sets of parenthesized parameters is automatically a template–the first set are template parameters, the second, runtime parameters.
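A small illustration of the two parameter lists, using a hypothetical function twice that has nothing to do with spawn itself:

```d
import std.stdio : writeln;

// T is a template (compile-time) parameter; x is a runtime parameter.
T twice(T)(T x)
{
    return x + x;
}

void main()
{
    writeln(twice!int(21)); // template argument given explicitly
    writeln(twice(1.5));    // T deduced as double from the runtime argument
}
```

In most calls the template arguments are deduced, which is why spawn can be called like an ordinary function.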
-Tuples
Type tuples, like T…, represent arbitrary lists of types. Similar constructs have also been introduced in C++0x, presumably under pressure from Boost, to replace the unmanageably complex type lists.
What are the things that you can do with a type tuple in D? You can retrieve its length (T.length), access its elements by index, or slice it; all at compile time. You can also define a variable-argument-list function, like spawn, and use one symbol for a whole list of arguments, as in T args:
Tid spawn(T...)(void function(T) fp, T args)
Now let’s go back to my test:
Tid tid = spawn(&thrFun, 2, s, "hello");
I spawn a thread to execute a function of three arguments, void thrFun(int i, S s, string str). The spawn template is instantiated with a type tuple (int, S, string). At compile time, this tuple is successfully tested by the predicate isLocalMsgTypes. The actual arguments to spawn, besides the pointer to function, are (2, s, "hello"), which indeed are of correct types. They appear inside spawn under the collective name, args. They are then used as a collective argument to fp inside the closure, (){ fp(args); }.
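The tuple operations mentioned above (length, indexing, slicing) can be tried with a small sketch. The function describe is hypothetical and exists only for this illustration; it receives the same three arguments as the test.

```d
import std.stdio : writeln;

struct S
{
    float fl;
}

// Hypothetical helper that exercises compile-time operations on a type tuple.
size_t describe(T...)(T args)
{
    static if (T.length >= 2)
    {
        // Indexing yields a type; slicing yields a shorter type tuple.
        alias First = T[0];
        alias Rest = T[1 .. $];
        static assert(Rest.length == T.length - 1);
        writeln("first type: ", First.stringof);
    }

    // foreach over a tuple is unrolled at compile time, once per element.
    foreach (i, arg; args)
        writeln(i, ": ", typeof(arg).stringof, " = ", arg);

    // The length is a compile-time constant.
    return T.length;
}

void main()
{
    S s = { 3.14 };
    writeln("number of arguments: ", describe(2, s, "hello"));
}
```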
-Closures
The closure captures the arguments to spawn. It is then passed to the internal function (not a template anymore),
core.thread.spawn(void delegate() dlg)
When the new thread is created, it calls the closure dlg, which calls fp with the captured arguments. At that point, the value arguments, i and s, are copied, along with the shallow part of the string, str. The deep part of the string, the buffer, is not copied, and for a good reason: it is immutable, so it can safely be read concurrently. From then on, the thread function is free to use those arguments without worrying about races.
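The capture itself can be observed without any threads. In the sketch below, a hypothetical helper called bind packages a function pointer and its arguments into a void delegate(), the same shape of closure that spawn hands to the runtime. The names bind, addTo, and total are made up for the example, and everything runs on a single thread.

```d
import std.stdio : writeln;

// Hypothetical helper: packages a function pointer and its arguments into a delegate.
void delegate() bind(T...)(void function(T) fp, T args)
{
    // The literal refers to fp and args, so the compiler allocates a closure
    // that keeps copies of them alive after bind returns.
    return () { fp(args); };
}

int total; // module-level variables are thread-local by default in D

void addTo(int a, int b)
{
    total = a + b;
}

void main()
{
    auto dlg = bind(&addTo, 2, 3);
    assert(total == 0); // nothing has run yet
    dlg();              // calls addTo(2, 3) with the captured arguments
    assert(total == 5);
    writeln("total = ", total);
}
```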
-Restricted Templates
The if statement before the body of a template is D’s response to C++0x DOA concepts (yes, after years of design discussions, concepts were finally killed with extreme prejudice).
if (isLocalMsgTypes!(T))
The if is used to create “restricted templates”. It contains a logical compile-time expression that is checked before the template is instantiated. If the expression is false, the template doesn’t match and, unless another overload does, you get a compile error. Notice that template restrictions not only produce better error messages, but can also impose restrictions that are otherwise impossible or very hard to enforce. Without the restriction, spawn could be called with an unsuitable type, e.g., an Object not declared as shared, and the compiler wouldn’t even blink.
(I will talk about template restrictions and templates in general in a future blog.)
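Here is a minimal restricted template, unrelated to spawn, for readers who have not seen the syntax. The function half is hypothetical; isIntegral comes from std.traits.

```d
import std.stdio : writeln;
import std.traits : isIntegral;

// Hypothetical function: the restriction admits only integral types.
T half(T)(T x)
if (isIntegral!T)
{
    return cast(T)(x / 2);
}

void main()
{
    writeln(half(10));

    // The restriction is checked before instantiation, so these calls do not match.
    static assert( __traits(compiles, half(10)));
    static assert(!__traits(compiles, half(2.5)));
    static assert(!__traits(compiles, half("ten")));
}
```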
-Message Types
Besides values, we may also pass to spawn objects that are declared as immutable or shared (in fact, we may pass them inside values as well). In D, shared objects are supposed to provide their own synchronization: their methods must either be synchronized or lock-free. An example of a shared object that you’d want to pass to spawn is a message queue, to be shared between the parent thread and the spawned thread.
You might remember that my race-free type system proposal included unique types, which would be great for message passing, and consequently as arguments to spawn (there is a uniqueness proposal for Scala, and there’s the Kilim message-passing system for Java based on unique types). Unfortunately, unique types won’t be available in D2. Instead some kind of specialized Unique library classes might be defined for that purpose.
Conclusion
The D programming language has two faces. On the one hand, it’s easy to use even for a beginner. On the other hand, it provides enough expressive power to allow for the creation of sophisticated and safe libraries. What I tried to accomplish in this post is to give a peek at D from the perspective of a library writer. In particular I described mechanisms that help make the concurrency library safer to use.
This is still work in progress, so don’t expect to see it in the current releases of D2.