[Linux-cluster] Re: Re: Any idea on a stop problem with CS4 ?

Alain Moulle Alain.Moulle at bull.net
Thu Feb 23 14:44:26 UTC 2006

>>>>We use a two-node cluster to manage failover services via dedicated scripts.
>>>>When we use clusvcadm -r <service_name> to migrate a service from one node
>>>>to the other, CS4 occasionally gets stuck with a
>>>>"service_name stopping" diagnostic.

>>Could you let us know:
>>- architecture

Two nodes, each connected to the backbone through eth0 and to each other by a
direct link on eth1. The hostname is set on eth0, which is also used as the
fencing interface. The heartbeat is also configured on eth0.

>>- dlm-kernel package version

dlm-kernel.2.6.9-37.7.b.3

>>- rgmanager version

rgmanager.1.9.38-0.b.5

>>- service XML structure

What do you mean? The cluster.conf file?

>>- if possible, the service script itself (though this is the least
>>likely problem)
>>If you can, install the corresponding -debuginfo packages so we can get
>>a backtrace of the rgmanager daemon.
Will do that. At present the deadlock does not occur systematically, although
it is frequent, so it may take us a while to reproduce the problem with the
debug packages installed.
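
Because the hang is intermittent, a small watcher could flag the moment a
service stays in "stopping" for too long, so that a backtrace can be taken
while the daemon is still in that state. Below is a minimal sketch in Python.
It is an illustration only: the helper names (service_state, stuck_in_stopping,
read_clustat) are hypothetical, and it assumes the status output contains one
line per service that begins with the service name and ends with its state.
The real clustat layout should be checked before relying on this.

```python
import subprocess
import time
from typing import Callable, Optional


def service_state(status_text: str, service: str) -> Optional[str]:
    """Return the lower-cased last field of the first line whose first
    field equals `service`, or None if no such line exists.

    Assumption: one line per service, name first, state last.
    """
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == service:
            return fields[-1].lower()
    return None


def stuck_in_stopping(
    read_status: Callable[[], str],
    service: str,
    timeout_s: float = 60.0,
    poll_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if `service` reports "stopping" on every poll until
    `timeout_s` has elapsed, False as soon as any other state is seen."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if service_state(read_status(), service) != "stopping":
            return False
        sleep(poll_s)
    return True


def read_clustat() -> str:
    """Run clustat and return its standard output as text.

    Assumption: clustat is on PATH and prints service states as plain text.
    """
    return subprocess.run(
        ["clustat"], capture_output=True, text=True, check=False
    ).stdout
```

It could be started right after the clusvcadm -r call, for example with
stuck_in_stopping(read_clustat, "service_name", timeout_s=120). A True result
would be the cue to collect the rgmanager backtrace. The clock and sleep
parameters are only there so the loop can be exercised without waiting.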

>>>>The stop target of the script associated with the service is not called.


>>>>Calls to clusvcadm -d <service_name> return a success diagnostic but in
>>>>effect do nothing: the service script is not called.

>>There's a segfault (which is fixed in RHCS4U3 beta and CVS) which might
>>explain the behavior.

>>-- Lon


