We can view Linux processes with ps or top. The major difference between them is that
ps shows a static snapshot taken at the moment it runs, while
top shows process status dynamically and keeps refreshing it.
The output of ps and top has a field called "S" or "STAT".
Both fields indicate the process state.
Examples:
In Linux, a process can be in one of the following states:
D uninterruptible sleep (usually IO) (struct: task_uninterruptible)
R running or runnable (on run queue) (struct: task_running)
S interruptible sleep (waiting for an event to complete) (struct: task_interruptible)
T stopped, either by a job control signal or because it is being traced (struct: task_stopped or task_traced)
W paging (not valid since the 2.6.xx kernel) (struct: )
X dead (should never be seen) (struct: task_dead)
Z defunct ("zombie") process, terminated but not reaped by its parent
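The state letter that ps prints is also exposed by the kernel as the third field of /proc/<pid>/stat. Here is a minimal sketch that reads it directly, assuming a Linux system with /proc mounted. The helper name read_proc_state is my own invention, not part of any library.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the one-letter state of `pid` (R, S, D, T, Z, ...) as reported in
 * /proc/<pid>/stat, or '?' if the entry cannot be read or parsed.
 * read_proc_state is a hypothetical helper name. */
char read_proc_state(pid_t pid)
{
    char path[64];
    char buf[1024];
    char *close_paren;
    size_t n;
    FILE *fp;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return '?';
    n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';

    /* Layout: "pid (comm) state ...". comm may itself contain ')' or
     * spaces, so look for the last ')' instead of splitting on spaces. */
    close_paren = strrchr(buf, ')');
    if (close_paren == NULL || close_paren[1] != ' ' || close_paren[2] == '\0')
        return '?';
    return close_paren[2];
}
```

Calling read_proc_state(getpid()) from a running program gives 'R', because a process that is busy reading its own state is on the CPU at that moment.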
In Linux programming, fork() creates a child process that is almost identical to the parent process.
For example:
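    #include <stdio.h>
    #include <stdlib.h>   /* exit() */
    #include <unistd.h>   /* fork(), getpid(), getppid(), etc. */

    int main(void)
    {
        if (!fork()) {
            /* fork() returned 0: this is the child */
            printf("Child pid=%d\n", getpid());
            exit(0);
        }
        /* fork() returned the child's pid: this is the parent */
        printf("parent pid=%d\n", getpid());
        exit(0);
    }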
fork() creates a child process. It returns 0 in the child, so the child enters the if block, prints its pid and exits. In the parent it returns the child's pid, which is nonzero, so the parent skips the if block.
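To make the two return values visible, here is a small sketch in which the child sends its own getpid() back through a pipe and the parent compares it with what fork() returned. It also handles the third case, a negative return value when fork() fails. The function name fork_returns_child_pid is made up for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the value fork() gave the parent equals the pid the child
 * got from getpid(), 0 if not, -1 on error. Hypothetical helper name. */
int fork_returns_child_pid(void)
{
    int fds[2];
    pid_t fork_result, reported = 0;
    ssize_t got;

    if (pipe(fds) < 0)
        return -1;

    fork_result = fork();
    if (fork_result < 0) {            /* fork failed: no child exists */
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (fork_result == 0) {           /* child: fork() returned 0 */
        pid_t self = getpid();
        close(fds[0]);
        if (write(fds[1], &self, sizeof self) != (ssize_t)sizeof self)
            _exit(1);
        _exit(0);
    }

    /* parent: fork() returned the child's pid */
    close(fds[1]);
    got = read(fds[0], &reported, sizeof reported);
    close(fds[0]);
    waitpid(fork_result, NULL, 0);    /* reap the child */
    if (got != (ssize_t)sizeof reported)
        return -1;
    return reported == fork_result;
}
```

The child leaves with _exit() rather than exit() so that it does not flush stdio buffers it inherited from the parent.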
The parent process and the child process share the same code segment, data segment, stack and heap until one of them modifies that memory.
(The "until" part is explained in the Copy-on-write section below.)
Copy-on-write (sometimes referred to as "COW") is an optimization strategy used in computer programming. It stems from the understanding that when multiple separate tasks use identical copies of the same information (e.g., code segment, data segment, stack and heap), it is not necessary to create a separate copy of that information for each task; instead they can all be given pointers to the same resource. When many separate processes use the same resource, sharing it this way can save significant resources.
However, if one or more processes need to change some part of the shared information, steps must be taken to ensure that those changes do not affect any other user of the shared resource. When a task attempts to change the shared information, the kernel creates a separate (private) copy of that information for the task and redirects the task to make its changes to the private copy, so that they do not become visible to the other tasks. All of this happens inside the operating system kernel, which makes it transparent both to the task requesting the change and to the other tasks using the shared copy.
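A program cannot watch the kernel copy a page, but it can observe the result: a write in the child never shows up in the parent. The sketch below checks exactly that and nothing more. The names shared_value and parent_value_after_child_write are hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 1;   /* lives in the data segment */

/* Returns the parent's view of shared_value after the child overwrote
 * its own copy, or -1 on error. Hypothetical helper name. */
int parent_value_after_child_write(void)
{
    int status;
    pid_t pid;

    shared_value = 1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        shared_value = 2;      /* this write lands in the child's private copy */
        _exit(shared_value == 2 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return shared_value;       /* unchanged in the parent */
}
```

The child sees its own write (it exits with 0 only if it reads back 2), while the function returns 1 in the parent.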
Here is how a zombie process comes to be:
When a process terminates with exit(), it is not destroyed completely. The kernel frees its memory and closes its files, but keeps its process-table entry, which holds the pid, the exit status and resource-usage statistics.
Usually the parent process collects this information. The kernel notifies the parent with SIGCHLD, the parent reads the zombie's exit status with wait() or waitpid(), and the kernel then removes the zombie for good.
This happens very quickly, so we seldom see a process in this state.
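As a rough sketch of that normal path, the function below forks a child, lets it exit with a chosen code, and has the parent collect the status with waitpid(). The name child_exit_status is an assumption of mine.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`, reap it, and return the exit
 * status the parent collected, or -1 on error. Hypothetical helper name. */
int child_exit_status(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);               /* child: terminate at once */

    /* parent: block until the child has exited, then collect its status;
     * this is the step that removes the zombie */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Only the low 8 bits of the exit code survive, so this works for codes from 0 to 255.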
But...if the parent neither handles the SIGCHLD signal nor calls wait() or waitpid(), the zombie stays a zombie for as long as the parent lives.
Let's change the code above by adding a sleep(30) line.
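    #include <stdio.h>
    #include <stdlib.h>   /* exit() */
    #include <unistd.h>   /* fork(), getpid(), getppid(), sleep(), etc. */

    int main(void)
    {
        if (!fork()) {
            /* child: print and exit right away */
            printf("Child pid=%d\n", getpid());
            exit(0);
        }
        /* parent: sleep without reaping the child */
        sleep(30);
        printf("parent pid=%d\n", getpid());
        exit(0);
    }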
The sleep makes the parent process sleep for 30 seconds after forking the child.
The child process exits right after it prints its message, while the parent is...still sleeping.
So, we check the process state with
# ps -el
in another terminal tab.
Here, PID 2684 is the sleeping parent and 2685 is the <defunct> zombie child.
You can try
# kill 2685
or
# kill -9 2685
or
# kill -15 2685
What I can guarantee is that none of the kill commands above makes any difference: the zombie has already terminated, so there is nothing left for a signal to act on, and it stays a zombie.
Here comes the tip:
# kill 2684
Yes, kill the irresponsible parent process instead of the child, if things have really gotten that bad.
After that, the defunct child is adopted by process No. 1, init, and init cleans it up.
(It's good to learn more on init process.)
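The adoption itself can be watched from inside a program. In this sketch, B forks C and exits at once; C polls getppid() until it no longer returns B's pid and reports the new value through a pipe. On a classic setup the answer is 1. On systems that use a "child subreaper" (set with prctl()), it may be another process, so the sketch returns the pid rather than assuming 1. The name orphan_new_parent is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns the pid of the process that adopted the orphan, or -1 on error.
 * Hypothetical helper name. */
pid_t orphan_new_parent(void)
{
    int fds[2];
    pid_t middle, adopter = -1;
    ssize_t got;

    if (pipe(fds) < 0)
        return -1;

    middle = fork();
    if (middle < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (middle == 0) {                       /* B: the parent that will die */
        pid_t b = getpid();
        pid_t leaf = fork();
        if (leaf == 0) {                     /* C: about to be orphaned */
            struct timespec pause_1ms = { 0, 1000000L };
            pid_t now;
            close(fds[0]);
            while ((now = getppid()) == b)   /* wait until B is gone */
                nanosleep(&pause_1ms, NULL);
            if (write(fds[1], &now, sizeof now) != (ssize_t)sizeof now)
                _exit(1);
            _exit(0);
        }
        _exit(leaf < 0 ? 1 : 0);             /* B exits at once, orphaning C */
    }

    close(fds[1]);
    waitpid(middle, NULL, 0);                /* reap B */
    got = read(fds[0], &adopter, sizeof adopter);
    close(fds[0]);
    return got == (ssize_t)sizeof adopter ? adopter : -1;
}
```

Whoever adopts C is also responsible for reaping it when it exits, which is why the caller does not wait for C.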
It's much better to prevent the mess above than to clean it up after the fact.
There are several ways to do so:
1. Explicitly call wait() or waitpid() in the parent process. But this blocks the parent until the child exits.
2. If the parent process can't afford to block, it can register a SIGCHLD handler with signal(). The parent receives SIGCHLD when the child exits, and it can reap the child in the handler.
3. If the parent doesn't care when the child exits, it can tell the kernel so with signal(SIGCHLD, SIG_IGN). The kernel then reaps the defunct child itself and the parent does not receive the SIGCHLD signal.
4. fork() twice before doing the real work in the child. A forks a child B, B forks a child C, and the real work goes into C. B exits right after forking C, so C is adopted by init. But A still has to reap B.