[Linux-cluster] node fenced by dlm_controld on a clean shutdown

Jacek Konieczny jajcus at jajcus.net
Wed Nov 21 14:48:59 UTC 2012


On Wed, Nov 21, 2012 at 11:19:02AM +0100, Jan Friesse wrote:
> Hi,
> we've discussed this problem with Dave, but I would like to get some more
> information:
> - What distro are you using?

PLD Linux

> - Are the packages self-compiled or distro packages?

I build the packages for the distro myself, as part of my job.

> - What do you mean by "clean shutdown"? Is it something like "service
> dlm_control stop", or your own script?

systemd, using the corosync.service unit file shipped with the corosync
sources (it is far from systemd-native) and the dlm.service unit as it
comes with the dlm sources (which include my patches).

Shutdown is started by '/sbin/halt' or '/sbin/reboot' using the standard
systemd procedure. I have added some rules to make sure Pacemaker is
stopped before everything else, but the dlm/corosync ordering is not
affected.

systemd stops dlm_controld first and, as soon as it exits, initiates the
stop of corosync. Adding an artificial delay between those two steps
fixes my problem.
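For illustration only, one way to add such a delay under systemd is a
drop-in for dlm.service with an ExecStopPost= sleep. This is a
hypothetical sketch, not the change I actually used: it assumes
dlm.service is ordered After=corosync.service (as the observed stop
order suggests), that sleep lives at /bin/sleep, and it picks an
arbitrary 5-second delay. The drop-in name "shutdown-delay.conf" and the
install() helper are made up for this example.

```python
"""Hypothetical helper: write a systemd drop-in that delays the end of
dlm.service's stop job, so that corosync.service (assumed to be ordered
before dlm.service at startup, hence after it at shutdown) is not stopped
the moment dlm_controld exits."""
from pathlib import Path

# Assumed value; the actual delay needed was not measured here.
DELAY_SECONDS = 5

DROP_IN = f"""[Service]
# Runs after dlm_controld has exited. systemd keeps dlm.service in the
# deactivating state until this command finishes, so units that must be
# stopped after it (corosync, under the assumed ordering) have to wait.
ExecStopPost=/bin/sleep {DELAY_SECONDS}
"""


def install(root: Path = Path("/etc/systemd/system")) -> Path:
    """Write <root>/dlm.service.d/shutdown-delay.conf and return its path."""
    target_dir = root / "dlm.service.d"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "shutdown-delay.conf"
    target.write_text(DROP_IN)
    return target
```

Calling install() as root writes the drop-in; 'systemctl daemon-reload'
is then needed before systemd picks it up. The delay must stay well
below the unit's stop timeout (TimeoutStopSec=), or systemd will kill the
sleep.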

When shutdown scripts are called by hand, or the old SysVinit way
(through other shell scripts), the delay between the two jobs is
probably 'naturally' longer.

Unfortunately, I have recently been distracted by other, higher-priority
work, so I could not investigate this matter further (it is still on my
TODO list, though).

Greets,
        Jacek
