Process Watchdogs
Romain Kang
romain at kzsu.stanford.edu
Mon Jan 24 05:43:31 UTC 2005
Short answer: Take a look at Dan Bernstein's daemontools
(http://cr.yp.to/daemontools.html).
Long answer: Here are a few approaches to keeping a process running.
- Have a parent process spawn the watched process (WP). If the WP
dies, the parent knows immediately, provided it is blocked in a
wait() for the WP or has arranged to handle the SIGCHLD raised by
the WP's exit. This may be the easiest way to do it, since you
don't have to do anything to the WP itself. Running the WP out of
inittab is basically taking this approach.
- Use some kind of pipe with one end open in the WP, the other in
the monitor. If the WP exits, its end is closed and the monitor
can detect this on its own end (for example, as end-of-file on a
read). Some systems favor this strategy, since you can check for
lots of WP exits when convenient; e.g., with a select() on the
pipes of N WPs. Can't think of a good example off the top of my
head, though...
- Provide the WP with some kind of status harness, like a
control/status socket, or some specific response to a UNIX
signal (e.g., dump current program status with SIGUSR1,
restart if the status request doesn't get a response in
reasonable time). It looks to me as if the monit tools
provide a framework for this (http://www.tildeslash.com/monit/).
- Periodically poll the operating system to see if the WP is still
alive. This can be done by sending signal 0 (i.e., kill(NNNNN, 0))
or, if you want, through all kinds of other interfaces, like
checking for /proc/NNNNN/status (Linux specific) or grepping
through ps output (or using pgrep). However, you should do
some checking to be sure that it's really the process you
think it is, not just some unrelated process that was spawned and
reused pid NNNNN after your WP silently died.
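
As an illustration of the first approach (a parent blocked in wait()),
here is a minimal sketch, assuming Python 3 on a UNIX-like system. The
function name supervise and its max_restarts and delay parameters are
made up for this example; a real supervisor such as daemontools does
considerably more (logging, rate limiting, signal forwarding).

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, delay=0.1):
    """Run argv as the WP and restart it whenever it exits abnormally.

    Returns the list of exit codes observed, one per run. The names and
    defaults here are illustrative assumptions, not a standard interface.
    """
    codes = []
    for _ in range(max_restarts + 1):
        child = subprocess.Popen(argv)
        # wait() blocks until the WP exits, so the parent learns of the
        # death right away, with no polling. A negative value means the
        # WP was killed by a signal.
        code = child.wait()
        codes.append(code)
        if code == 0:
            break  # clean exit: stop supervising
        print(f"WP exited with {code}", file=sys.stderr)
        time.sleep(delay)  # brief pause so a crash loop doesn't spin
    return codes
```

Calling supervise(["/path/to/wp", "arg"]) with a hypothetical WP path
would keep restarting it until it exits cleanly or the restart budget
runs out. Giving up after a fixed number of restarts is a design choice
of this sketch; a long-lived supervisor would more likely loop forever
with some back-off.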
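
The pipe approach can be sketched along the following lines, again
assuming Python 3 on a UNIX-like system (pass_fds is POSIX only). The
helper names spawn_with_pipe and wait_for_exits are invented for the
example. It also assumes the WP never writes to the pipe and does not
hand the descriptor on to children of its own; either would change what
the monitor sees.

```python
import os
import select
import subprocess


def spawn_with_pipe(argv):
    """Start argv holding the write end of a pipe; return (child, read_fd)."""
    read_fd, write_fd = os.pipe()
    # pass_fds keeps write_fd open in the WP; it is closed when the WP exits.
    child = subprocess.Popen(argv, pass_fds=(write_fd,))
    # The monitor must drop its own copy, or end-of-file never arrives.
    os.close(write_fd)
    return child, read_fd


def wait_for_exits(read_fds, timeout):
    """Return the read ends whose WP has exited, waiting up to timeout seconds."""
    ready, _, _ = select.select(read_fds, [], [], timeout)
    # A readable pipe that yields no data is at end-of-file: every
    # writer (here, only the WP) has closed its end.
    return [fd for fd in ready if os.read(fd, 1) == b""]
```

One select() call covers any number of WPs, which is the convenience
mentioned in the list above. The monitor still has to call wait() on
each exited child afterwards to collect its status and avoid leaving a
zombie behind.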
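
For the polling approach, a minimal sketch of the signal 0 probe plus
one possible identity check, assuming Python 3. The check compares the
process start time from /proc/NNNNN/stat, which is Linux specific; the
function names are invented for the example, and other identity checks
(command line, owner, a pid file plus lock) would serve equally well.

```python
import os


def is_alive(pid):
    """Probe pid with signal 0: nothing is delivered, only existence is checked."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: it exists, but belongs to another user
    return True


def start_time(pid):
    """Linux only: start time in clock ticks since boot, or None if pid is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The command name (field 2) may itself contain spaces or ')',
    # so split after the last ')' before counting fields.
    fields = data.rsplit(")", 1)[1].split()
    return int(fields[19])  # field 22 of the stat line, starttime


def is_same_process(pid, recorded_start):
    """True if pid exists and started at the time recorded earlier."""
    return is_alive(pid) and start_time(pid) == recorded_start
```

The monitor would record start_time(pid) once, right after the WP
starts, and pass that value to is_same_process() on every poll. Note
that a WP that has exited but has not yet been reaped by its parent (a
zombie) still answers the signal 0 probe as alive.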
Romain Kang
romain at kzsu.stanford.edu
Disclaimer: I speak for myself alone, except when indicated otherwise.