Why doesn't kill work?

Mon Feb 21 18:12:35 UTC 2005

On Mon, 21 Feb 2005 11:21:36 -0600, William Wilson
<williamwilson at uwmail.com> wrote:
> Zombie processes are spawned by a parent process that terminates before
> receiving the childs signal that they have completed.  The Zombie is
> waiting on acknowledgement from the parent process, thus they run
> forever, or until the next reboot or someone kills them.

This is absolutely incorrect.  I suggest you read any of Stevens'
UNIX programming books.

  "...[the child] terminates first, when it calls exit ....  It then
   becomes a zombie: a process that has terminated, but whose
   parent is still running but has not yet waited for the child."

You might also want to read this from the Unix FAQ:
http://www.faqs.org/faqs/unix-faq/faq/part3/section-13.html

What you are talking about is called an "orphan" process: one whose
parent has died first, but that is still alive.  All orphans are adopted by
process 1 (init).  An orphan and a zombie are two different things.

A zombie process is already dead and not running.  It cannot run,
be spawned, or be killed.  It cannot possibly wait on anything, including
its parent, or do anything.   A zombie has no memory, no code, no
program counter, no CPU time, nothing.  The *kernel* on the other
hand will remember the zombie until its parent either 1) dies, or
2) calls one of the wait() system calls, or 3) the parent had previously
told the kernel it doesn't care about dead children.
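
To make that concrete, here is a minimal sketch in Python (whose os
module is a thin wrapper over the same system calls).  It assumes Linux,
because it reads the process state from /proc/<pid>/stat.  The helper
names proc_state() and demo_zombie() are mine, purely for illustration.
The child exits at once, the parent deliberately does not wait, and
even a SIGKILL changes nothing until waitpid() is called.

```python
import os
import signal
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the
    process table entry is gone.  Linux-specific (assumption)."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    # The state field is the first field after the ")" that closes the
    # command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie():
    pid = os.fork()
    if pid == 0:
        os._exit(0)              # child: terminate immediately

    # Parent: do NOT wait yet.  Poll until the kernel shows the child
    # as a zombie (state "Z").
    deadline = time.monotonic() + 5
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)

    # kill() is accepted, but there is nothing left to kill.
    os.kill(pid, signal.SIGKILL)
    state_after_kill = proc_state(pid)

    # Only wait() makes the kernel forget the zombie.
    _, status = os.waitpid(pid, 0)
    state_after_wait = proc_state(pid)

    return state_after_kill, os.WIFEXITED(status), state_after_wait
```

Calling demo_zombie() should show the entry still in state "Z" after the
SIGKILL, a status that reports a normal exit rather than death by
signal, and no entry at all once waitpid() has returned.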

The signaling you talk about is done by the kernel, not the zombie process.

> They [zombies] WILL NOT go away after the parent process dies.
>  This was incorrectly stated in a previous email.

Try to find one that doesn't go away!  I've never seen one in 15 years
of Unix programming.  Yes, technically, what happens is that when
the parent process dies the zombie is adopted by process number 1
(init).  But init always reaps its dead children immediately.  So
the end effect is that whenever a parent dies, all of its zombie children
are immediately reaped and disappear.

> If you are programming this, you can turn off the signaling of the
> parent/child process to make the child not wait until the parent
> responds and the child will do it's thing, then cleanly exit.  This also
> frees up the parent to go about something else, rather than waiting on
> the child, which may not complete.

Regardless of any signalling (SIGCHLD), the child *never* waits
on anything before it is allowed to die.  It just dies.  The *only* things
that can delay or keep a process from dying when it wants to are
1) it is being traced, 2) it is in the stopped state.  (And technically
a process in the stopped state by definition cannot be dying).  Neither
of these conditions has anything to do with what a parent process
may or may not be doing.
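
As a rough illustration of case 3 from earlier (the parent telling the
kernel it doesn't care about dead children), the sketch below sets
SIGCHLD to SIG_IGN before forking.  It assumes a system that follows
the POSIX.1-2001 behaviour for an ignored SIGCHLD (Linux does), and the
names pid_exists() and demo_ignored_sigchld() are made up for the
example.  The child still just dies; the only difference is that the
kernel drops its entry without anyone calling wait().

```python
import os
import signal
import time


def pid_exists(pid):
    """True while the kernel still has a process table entry for pid."""
    try:
        os.kill(pid, 0)          # signal 0: existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def demo_ignored_sigchld():
    # Tell the kernel we do not care about dead children.
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)          # child: exits without waiting on anyone

        # The entry should vanish on its own, with no wait() call.
        deadline = time.monotonic() + 5
        while pid_exists(pid) and time.monotonic() < deadline:
            time.sleep(0.01)
        gone = not pid_exists(pid)

        # There is nothing left to reap, so waitpid() fails with ECHILD.
        try:
            os.waitpid(pid, os.WNOHANG)
            wait_failed = False
        except ChildProcessError:
            wait_failed = True
        return gone, wait_failed
    finally:
        # Restore whatever disposition was in place before the demo.
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)
```

The trade-off is that the parent can no longer collect the child's exit
status, which is why this only suits parents that truly don't need it.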

The parent of any non-lazily coded application can certainly do
useful work at all times without being blocked while monitoring its
children.  This is why the SIGCHLD asynchronous signal is sent,
and also why there are the wait3() and wait4() system calls with
the WNOHANG flag.  (Or, if you're coding directly to the low-level
clone() system call, any other signal can be set up instead).
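
Here is one way the non-blocking approach might look, sketched in
Python with os.wait4() and WNOHANG.  The helpers spawn() and
run_without_blocking(), the exit codes, and the delays are all invented
for the example, and the "useful work" is just a counter.  A SIGCHLD
handler could trigger the same reaping loop instead of polling.

```python
import os
import time


def spawn(exit_code, delay):
    """Fork a child that sleeps for `delay` seconds, then exits."""
    pid = os.fork()
    if pid == 0:
        time.sleep(delay)
        os._exit(exit_code)
    return pid


def run_without_blocking():
    pending = {spawn(3, 0.05), spawn(7, 0.10)}
    exit_codes = []
    work_done = 0

    while pending:
        for pid in list(pending):
            # WNOHANG: return at once, with pid 0, if this child has
            # not terminated yet.
            reaped, status, _rusage = os.wait4(pid, os.WNOHANG)
            if reaped == pid:
                exit_codes.append(os.WEXITSTATUS(status))
                pending.discard(pid)

        work_done += 1           # stand-in for the parent's real work
        time.sleep(0.01)

    return sorted(exit_codes), work_done
```

The parent never blocks in wait(), yet every child is reaped shortly
after it exits, so no zombie lingers for longer than one pass of the
loop.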

The cause of almost all zombies (that stay around more than a
millisecond) is either incorrectly or lazily coded parent processes.
But if you kill the parent, the zombies will disappear.  If this is
not the case then file a kernel bug immediately.

And the only bad effect of a zombie process is that it occupies one
of the process "slots" in the kernel table.  It otherwise cannot consume
any resources (memory, CPU, etc), nor can it hold locks, keep files
or sockets open, or block any other process in any way.
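
The open-files claim can be checked with a pipe.  In this sketch (the
name demo_zombie_holds_no_fds() is hypothetical) the child exits while
still holding the write end.  The parent sees end-of-file on the read
end even though it has not reaped the child yet, which means the kernel
already closed the child's descriptors.  It assumes SIGCHLD is at its
default disposition, so the child stays a zombie until waitpid().

```python
import os


def demo_zombie_holds_no_fds():
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)
        os._exit(0)              # child: exits with the write end open

    os.close(w)                  # parent keeps only the read end

    # read() returns b"" (EOF) only once every write end is closed,
    # i.e. once the kernel has closed the dead child's descriptors.
    data = os.read(r, 1)
    os.close(r)

    # The child has not been reaped, so its slot is still present.
    try:
        os.kill(pid, 0)
        still_in_table = True
    except ProcessLookupError:
        still_in_table = False

    os.waitpid(pid, 0)           # now free the slot
    return data, still_in_table
```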
-- 
Deron Meranda