Threads and fork(): think twice before mixing them

While debugging a program I came across a bug caused by using fork(2) in a multi-threaded program. I thought it was worth writing a few words about mixing POSIX threads with fork(2), because there are non-obvious problems in doing so.

What happens after fork() in a multi-threaded program


The fork(2) function creates a copy of the process: all memory pages are copied, open file descriptors are duplicated, and so on. All of this is intuitive to a UNIX programmer. One important way the child process differs from the parent is that the child has only one thread. Cloning the whole process with all of its threads would be problematic and, in most cases, not what the programmer wants. Just think about it: what should happen to threads that are suspended in the middle of a system call? So fork(2) clones only the thread that called it.
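
A minimal sketch (assuming a POSIX system with pthreads) to make this concrete - the worker thread exists in the parent, but after fork(2) the child contains only the thread that called it:

    #include <pthread.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void *worker(void *arg)
    {
        (void)arg;
        for (;;)
            pause();                /* idle thread, lives only in the parent */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, worker, NULL);

        pid_t pid = fork();
        if (pid == 0) {
            /* Only one thread exists here: the one that called fork().
               The worker thread was not cloned into the child. */
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        return 0;
    }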

What are the problems


Critical sections, mutexes


The non-obvious problem with this approach is that at the moment of the fork(2) call some threads may be inside critical sections of code, performing non-atomic operations protected by mutexes. In the child process those threads simply disappear, leaving data half-modified and with no way to "fix" it: there is no way to tell what the other threads were doing or what would make the data consistent again. Moreover, the state of the mutexes themselves is undefined; they may be unusable, and the only way to use them in the child is to call pthread_mutex_init() to reset them to a usable state. How mutexes behave after fork(2) is implementation dependent. On my Linux machine, mutexes that were locked at the time of the fork stay locked in the child.
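
Here is a sketch of how this turns into a deadlock. It relies on the behaviour I observed on Linux (a locked mutex stays locked in the child) and is only an illustration:

    #include <pthread.h>
    #include <unistd.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *holder(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock);      /* take the lock and keep it */
        sleep(3600);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, holder, NULL);
        sleep(1);                       /* crude way to make sure the lock is held */

        if (fork() == 0) {
            /* The holder thread does not exist in the child, but the mutex
               is still locked - this call never returns. */
            pthread_mutex_lock(&lock);
            _exit(0);
        }
        return 0;
    }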

Library functions


The problem with mutexes and critical sections implies another non-obvious issue. In theory you could write the code that runs in your threads so that you are sure it is safe to call fork(2) while they run, but in practice there is one big problem: library functions. You can never be sure that a library function you use does not touch global data, and even if it is documented as thread-safe, that safety may be achieved with mutexes internally. Even system library functions that are thread-safe may take locks internally. One non-obvious example is malloc(), which, at least in multi-threaded programs on my system, uses locks. So it is not safe to call fork(2) while some other thread is in the middle of malloc()! What does the standard say about it? After fork(2) in a multi-threaded program you may only call async-signal-safe functions (listed in signal(7)). It is a similar limitation to the list of functions you are allowed to call in a signal handler, and the reason is similar: in both cases a thread may be "interrupted" in the middle of a function.

Here is a short list of functions that use locks internally on my system, just to show that almost nothing is really safe; a sketch of how this can go wrong follows the list:

  • malloc()
  • stdio functions like printf() - here the locking is actually required by the standard.
  • syslog()
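
Here is a sketch of how the malloc() case can bite. It is timing-dependent and assumes a glibc-like allocator that takes an internal lock, so treat it as an illustration of the race rather than a reliable reproducer:

    #include <pthread.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void *allocator(void *arg)
    {
        (void)arg;
        for (;;)
            free(malloc(64));           /* keep the allocator lock busy */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, allocator, NULL);

        for (int i = 0; i < 100000; i++) {
            pid_t pid = fork();
            if (pid == 0) {
                malloc(32);             /* may block forever on the inherited lock */
                _exit(0);
            }
            waitpid(pid, NULL, 0);      /* hangs once a child deadlocks */
        }
        return 0;
    }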

execve() and open file descriptors


It seems that calling execve(2) to start another program is the only sane reason to call fork(2) in a multi-threaded program. But even that has at least one problem. When calling execve(2) you must remember that open file descriptors stay open, and the executed program can read from and write to them. This is a problem if a file descriptor that was never intended to be visible to the executed program is still open at the time of the execve(2) call; in some cases it is even a security issue. There is a solution: set the FD_CLOEXEC flag on all such descriptors using fcntl(2), so they are closed automatically when a new program is executed. Unfortunately it is not that simple in a multi-threaded program. When using fcntl(2) to set the FD_CLOEXEC flag there is a race:

    fd = open("file", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        perror("open()");
        return 0;
    }

    fcntl(fd, F_SETFD, FD_CLOEXEC);

If another thread calls fork(2) and execve(2) after this thread has done the open(2) but before the fcntl(2), the new program is started with this file descriptor duplicated into it. This is not what we want. A solution arrived with newer standards (POSIX.1-2008) and newer Linux kernels (2.6.23 and later): the O_CLOEXEC flag to open(2), which makes opening the file and setting the FD_CLOEXEC flag a single atomic operation.
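
A sketch of the race-free variant (assuming Linux 2.6.23 or later, or another POSIX.1-2008 system; the helper name is made up for the example):

    #include <fcntl.h>
    #include <stdio.h>

    int open_private_file(const char *path)
    {
        /* The descriptor is created with FD_CLOEXEC already set, so a
           concurrent fork()+execve() in another thread cannot leak it. */
        int fd = open(path, O_RDWR | O_CREAT | O_TRUNC | O_CLOEXEC, 0600);
        if (fd < 0)
            perror("open()");
        return fd;                      /* no separate fcntl() call needed */
    }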

There are other ways to create file descriptors besides open(2): duplicating them with dup(2), creating sockets with socket(2), and so on. All of these functions now either accept a flag similar to O_CLOEXEC or have a newer variant that does (some of them, like dup2(2), have no flags argument, so dup3(2) was created).
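
A sketch of those variants (dup3(2) is Linux-specific, SOCK_CLOEXEC is in POSIX.1-2008 and in Linux since 2.6.27; the function name is made up for the example):

    #define _GNU_SOURCE                 /* for dup3() */
    #include <fcntl.h>
    #include <sys/socket.h>
    #include <unistd.h>

    void cloexec_examples(int oldfd, int newfd)
    {
        /* Duplicate a descriptor with FD_CLOEXEC set atomically. */
        dup3(oldfd, newfd, O_CLOEXEC);

        /* Create a socket with FD_CLOEXEC set atomically. */
        int s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
        if (s >= 0)
            close(s);
    }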

It is worth mentioning that a similar thing can happen in a single-threaded program that calls fork(2) and execve(2) from a signal handler. That is perfectly legal, because both functions are async-signal-safe and may be called from a signal handler, but the program can still be interrupted between the open(2) and the fcntl(2).

For more information about the new APIs for setting the FD_CLOEXEC flag see Ulrich Drepper's blog: Secure File Descriptor Handling.

Useful system functions: pthread_atfork()


One useful function that tries to solve the problem with fork(2) in multi-threaded programs is pthread_atfork(). It has the following prototype:

    int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void));

It lets you register handler functions that are executed automatically when fork(2) is called:

  • prepare - Called just before a new process is created.
  • parent - Called after a new process is created in the parent.
  • child - Called after a new process is created in the child.

The purpose of this call is to deal with critical sections of a multi-threaded program at the time fork(2) is called. A typical scenario is that mutexes are locked in the prepare handler, unlocked in the parent handler, and reinitialized in the child handler.
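
A minimal sketch of that pattern for a single mutex (the registration function name is made up for the example):

    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void prepare(void) { pthread_mutex_lock(&lock); }       /* before fork() */
    static void parent(void)  { pthread_mutex_unlock(&lock); }     /* in the parent */
    static void child(void)   { pthread_mutex_init(&lock, NULL); } /* in the child  */

    void install_fork_handlers(void)
    {
        /* Register the handlers once, for example at program start-up. */
        pthread_atfork(prepare, parent, child);
    }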

Summary


In my opinion there are so many problems with fork(2) in multi-threaded programs that it is almost impossible to get it right. The only clear case is calling execve(2) in the child process right after fork(2). If you want to do anything more, really, do it some other way. From my experience it is not worth trying to make the fork(2) call safe, even with pthread_atfork(). I truly hope you read this article before hitting the problems described here.

Resources


  • fork() description from The Open Group Single UNIX Specification
  • Ulrich Drepper's blog: Secure File Descriptor Handling
  • pthread_atfork()
  • daper's blog

Comments

Saved my life

Submitted by Anonymous on Fri, 02/01/2013 - 12:34.
It took me some weeks of debugging and searching the web to find your page helping me to remove those random coredumps (which of course only contained a corrupt stack which did not help a lot) In my case I just forked and called a child with exec. But had a "Starting application %s" debug output (with our internal wrapper using syslog) before exec :( Thanks a lot for those hints!

Fork, exec, stdout redirection, and multithreading

Submitted by Anonymous on Thu, 03/08/2012 - 21:02.
Great article! As said, the one sane reason to use fork with multithreading is to call exec shortly after fork. This seems ok, but in my case I want the child to have stdout redirected to a designated file, something like this:
pid=fork();
if (pid == 0) {
    freopen("stdout.txt", "w", stdout);
    execvp(procPath, procArgv);
}
Is this safe enough, or I have to do a workaround and redirect stdout in the parent before fork and restore it after?

I wouldn't use any libc

Submitted by daper on Thu, 03/08/2012 - 22:30.
I wouldn't use any libc function, they may use memory allocation internally which may deadlock. I would use plain open() and then dup2() which are just wrappers for system calls.

I see. However I think I'm

Submitted by Anonymous on Fri, 03/09/2012 - 20:02.
I see. However I think I'm bound to fork-exec: I need to control pid with waitpid, and also be able to kill the process.

Maybe I was not clear. You

Submitted by daper on Fri, 03/09/2012 - 23:21.
Maybe I was not clear. You can use fork(), then open() and dup2() in the child to redirect the output, and then execve(). None of these are libc functions: they are just wrappers for system calls and are safe. Creating the redirection before fork() and restoring it afterwards would create a race in your program.

Now it's clear, thanks!

Submitted by Anonymous on Fri, 03/09/2012 - 23:24.
Now it's clear, thanks!

all memory pages are copied . . .

Submitted by Anonymous on Wed, 10/27/2010 - 09:13.
Nope. Copy on Write only duplicates pages that the child uses. Why waste time and memory copying things when most of the time programs fork() just to immediately call exec()?

You're right, there is the

Submitted by daper on Wed, 10/27/2010 - 09:18.
You're right, there is the COW mechanism that avoids copying memory, but it's unrelated to how fork() behaves in a multi-threaded program.

Could I translate this article into Chinese in my blog?

Submitted by Anonymous on Tue, 10/26/2010 - 18:44.
Hi Daper, I am a programmer in China. I am kind of loving this article. Could I translate it into Chinese and put it in my blog? Logan ( xorcererzc @gmail.com )

Yes, I'm glad you like it and

Submitted by daper on Tue, 10/26/2010 - 20:39.
Yes, I'm glad you like it and want to translate, remember to keep the information about the source :)

We are doing it and run into trouble

Submitted by Anonymous on Tue, 08/17/2010 - 09:01.
We are doing exact the same (using fork in a multithreaded application) and run into trouble. After 2 or 3 days our application crashes, because a stack pointer of one thread gets corrupted. We use fork, because we have to do "system" calls as "hwclock -u -w" every hour. You say "If you want to do something more, just do it some other way, really." Can you explain what other ways we have to achieve the same as the system call ? Thanks a million Bjoern

RTC direct access

Submitted by Anonymous on Tue, 02/28/2012 - 14:56.
You can use ioctl() function with /dev/rtc if of course driver of your RTC supports it.

There is no simple

Submitted by daper on Tue, 08/17/2010 - 12:06.
There is no simple replacement - you must change the way your program works. In your case I would spawn a dedicated process just after program starts and before creating any threads to which you can communicate using a pipe or a socket. This process should do the system() call on request and send back the result. In the place in your program where you need to run hwclock, just send a request to do that to the dedicated process. This way you will use fork()/system() in the process that is single-threaded.

Very interesting read.

Submitted by Anonymous on Sat, 11/07/2009 - 10:41.
Very interesting read. Thanks. Also, you have a few typos (save instead of safe, hear instead of here), I'd suggest correcting them, as they cast an image of you, that somehow weakens your authority on such precise matters.

Thanks, typos fixed :)

Submitted by daper on Sat, 11/07/2009 - 11:16.
Thanks, typos fixed :)

Alex

Submitted by Anonymous on Wed, 09/23/2009 - 21:07.
Thanks for article! Has helped to solve a problem with tcmalloc!!!

Another useful article

Submitted by mattismyname (not verified) on Sat, 08/08/2009 - 00:47.
I had this exact problem myself. For me the issue was tcmalloc locks which caused any dynamic allocation after fork() to hang. As you mention, the only solution I found was to execve() ASAP after fork(). Wish I had read your article earlier though...would have saved me some debugging!
