RE: Options to stop processes that can't be killed -9 other than reboot

> Mike Burger wrote:
>> If you have a process that is stuck in a zombie mode and kill -9
>> isn't getting rid of it, you may need to do something with the parent
>> that spawned it in the first place.
> Yeah, but too often, the parent process has gone, and the zombie's now
> a child of PID 1.

That would stink, yeah. :-(

[Jack Allen] I thought I would add a few comments about this type of
problem. If a process will not exit after a "kill -9 PID" has been done,
then it is stuck waiting on the kernel to complete something on its
behalf. When you "send a signal" to a PID, you are not really sending
anything; you are only setting a bit in the process that indicates a
signal has been posted for that process. When the kernel schedules the
process to run again, the bits are looked at and each signal is handled
the way the process set it up, i.e. by its signal catchers. But -9
(SIGKILL) cannot be caught and processed by the process; the kernel
itself causes the process to exit. Either way, nothing happens until
the process gets to run again, so a process that is asleep in the kernel
at a point signals cannot interrupt never acts on the bit.
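Here is a minimal Python sketch of the two points above, assuming a
Unix-like system and that it runs in the main thread. The function names
are made up for illustration. The first shows that a catcher can be set
up for an ordinary signal but not for signal 9; the second blocks a
signal, sends it to the current thread, and checks that it is merely
recorded as pending until the process deals with it.

```python
import signal
import threading

def can_install_handler(signum):
    """Return True if this process is allowed to catch signum."""
    def handler(received, frame):
        pass
    try:
        previous = signal.signal(signum, handler)
    except (OSError, ValueError):
        # The kernel refuses a catcher for SIGKILL (and SIGSTOP).
        return False
    if previous is not None:
        signal.signal(signum, previous)  # put the old disposition back
    return True

def is_pending_after_send(signum=signal.SIGUSR1):
    """Block signum, post it to this thread, report whether it is pending."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        # "Sending" only marks the signal as posted; nothing runs yet.
        signal.pthread_kill(threading.get_ident(), signum)
        was_pending = signum in signal.sigpending()
        # Consume the pending signal so it is not delivered on unblock.
        signal.sigwait({signum})
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return was_pending
```

For example, can_install_handler(signal.SIGTERM) should succeed while
can_install_handler(signal.SIGKILL) should not.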

Now how can a process get stuck waiting on the kernel? Here is an
example that used to happen quite often when 9trk tape drives were used.
Many of you may have never seen one. Anyway, say some type of backup was
being written to a 9trk tape drive with a tape that is 2400 feet long.
When the backup completed it might display a message to that effect and
then close the file descriptor associated with the tape drive, causing
it to rewind the tape. Well, it takes maybe 20 to 30 seconds or more to
rewind the tape, and the operator would push the online button during
that time, to take the drive offline, and push the unload button. The
process is waiting for the kernel to let it know the tape has rewound,
is back at load point and is considered closed. This will never happen,
because the tape drive is now offline and will not generate an interrupt
when the tape completes the rewind and is at load point. Therefore the
operator does not get their prompt back or whatever should have happened
next. You can do "kill -9 PID" on the process but it is not going to
terminate. All they had to do was thread the tape and put the drive
online again; the kernel would then receive an interrupt from the
device, determine that a process was waiting to be woken up, and wake it
up. If a "kill -9 PID" had been done in the meantime, the process would
terminate at that point; if not, it might display something else for the
operator to do, like mount another tape.

Now about a Zombie process. A Zombie is a process that has exited,
whether that be because it called exit() or received some signal that
caused it to end. It is in the Zombie state because its parent has not
done a wait() to pick up its exit status. If the parent has exited, then
the Zombie is inherited by PID 1 (init). This is by design. When this
happens, PID 1 is woken up and does a wait(), which returns the PID and
the exit status. It determines that this was not a PID it started and
just ignores it. But because it did the wait(), the PID is removed from
the process table.
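The Zombie life cycle can be seen with a short Python sketch. This
assumes a Unix-like system where os.waitid() and WNOWAIT are available
(Linux, for example); zombie_demo is just an illustrative name. The
child exits at once, the parent confirms the child's entry is still
there even after a kill -9, and only the wait() makes the entry go away.

```python
import os
import signal

def zombie_demo(exit_code=7):
    """Create a zombie, show kill -9 does not remove it, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)          # child: exit immediately

    # Wait until the child has exited, but leave it unreaped (WNOWAIT),
    # so from here on it is a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

    os.kill(pid, signal.SIGKILL)     # accepted, but nothing left to kill
    try:
        os.kill(pid, 0)              # signal 0 only checks the PID exists
        present_before_wait = True
    except ProcessLookupError:
        present_before_wait = False

    _, status = os.waitpid(pid, 0)   # the wait() that picks up the status

    try:
        os.kill(pid, 0)
        present_after_wait = True
    except ProcessLookupError:
        present_after_wait = False

    return present_before_wait, os.WEXITSTATUS(status), present_after_wait
```

The exit status survives the kill -9 untouched, since the process had
already exited before the signal was posted.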

So if you do "kill -9 PID" and the process does not become a Zombie,
then it is stuck waiting on the kernel. It will do no good to kill the
parent. If it does become a Zombie, and the parent does not do a wait(),
then the parent has a bug or it is waiting on something and just has not
gotten around to doing a wait().
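
To tell the two cases apart after a "kill -9 PID", something like the
following Python sketch could be used. It is Linux-specific and rests on
assumptions that are not from the discussion above: that /proc/PID/stat
is available, and that the state letters are "Z" for a Zombie and "D"
for a process in uninterruptible sleep inside the kernel. The function
names are hypothetical.

```python
def proc_state(pid):
    """Return the one-letter state of pid from /proc (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is in parentheses and may itself contain spaces
    # or ")", so the state is the character after the LAST ") ".
    return data[data.rindex(")") + 2]

def diagnose(state):
    """Map a state letter to the two cases discussed above."""
    if state == "Z":
        return "zombie: the parent has not done a wait() yet"
    if state == "D":
        return "stuck waiting on the kernel: killing the parent will not help"
    return "neither: the process is still able to act on signals"
```

Usage would be along the lines of diagnose(proc_state(PID)) for the PID
that refused to go away.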

Jack Allen