[Linux-cluster] CS4 Update 2 & Patch watchdog on
Lon Hohberger
lhh at redhat.com
Wed Sep 13 14:18:10 UTC 2006
On Wed, 2006-09-13 at 09:51 +0200, Alain Moulle wrote:
> >> The self-watchdog patch adds a process which monitors the "real"
> >> clurgmgrd. The monitoring process should be the lower-numbered PID
> >> (it's the parent of the one doing the work).
>
> >> The monitoring process watches for crash signals (SIGBUS, SIGSEGV,
> >> etc.), and will simply exit if you kill the child with SIGKILL.
>
> >> So, basically, killing the higher-numbered PID with something like
> >> SIGSEGV should cause the node to reboot.
>
> >> -- Lon
>
> Thanks Lon, I understand.
> And if I kill -9 (SIGKILL) the higher-numbered PID for testing purposes,
> is it expected to reboot or not?
>
> I see in the code:
>         case SIGCHLD:
>         case SIGILL:
>         case SIGFPE:
>         case SIGSEGV:
>         case SIGBUS:
>                 setup_signal(i, SIG_DFL);
>                 break;
>         default:
>                 setup_signal(i, signal_handler);
> but I can't tell what happens for a SIGKILL on the higher-numbered PID ...
No, SIGKILL will just cause the watchdog to kill itself as well:
        if (waitpid(child, &status, 0) <= 0)
                continue;
        if (WIFEXITED(status))
                exit(WEXITSTATUS(status));
        if (WIFSIGNALED(status)) {
                if (WTERMSIG(status) == SIGKILL) {
                        clulog(LOG_CRIT, "Watchdog: Daemon killed, exiting\n");
                        raise(SIGKILL);
Use something like SIGSEGV (e.g. to simulate a crash) and the
nanny/watchdog process should reboot the node.
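
For illustration only, here is a small standalone sketch of the decision described above. It is not the rgmanager source: the names wd_action, classify_status and run_child_and_classify are made up for this example, and it returns a value where the real watchdog would exit or reboot the node. It assumes that any fatal signal other than SIGKILL is treated as a crash.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical names for this sketch; none of them come from rgmanager. */
enum wd_action {
    WD_EXIT_QUIETLY,  /* child exited normally or was SIGKILLed: no reboot */
    WD_REBOOT,        /* child died from another signal, e.g. SIGSEGV */
    WD_ERROR          /* fork() or waitpid() failed, or status not understood */
};

/* Map a waitpid() status to the action a parent watchdog would take. */
enum wd_action classify_status(int status)
{
    if (WIFEXITED(status))
        return WD_EXIT_QUIETLY;
    if (WIFSIGNALED(status))
        return WTERMSIG(status) == SIGKILL ? WD_EXIT_QUIETLY : WD_REBOOT;
    return WD_ERROR;
}

/* Fork a child that dies from `sig` (or exits cleanly when sig == 0),
 * then classify how it ended, as the monitoring parent would. */
enum wd_action run_child_and_classify(int sig)
{
    int status;
    pid_t child = fork();

    if (child < 0)
        return WD_ERROR;

    if (child == 0) {
        if (sig == 0)
            _exit(0);
        signal(sig, SIG_DFL);  /* default action; has no effect for SIGKILL */
        raise(sig);
        _exit(127);            /* only reached if the signal was not fatal */
    }

    /* Retry waitpid() if it is interrupted by a signal in the parent. */
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR)
            return WD_ERROR;
    }
    return classify_status(status);
}
```

Called with SIGSEGV or SIGBUS, run_child_and_classify() should return WD_REBOOT; called with SIGKILL or 0 (clean exit), it should return WD_EXIT_QUIETLY.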
-- Lon