[Linux-cluster] Re: Re: Any idea on a stop problem with CS4 ?
Alain Moulle
Alain.Moulle at bull.net
Thu Feb 23 14:44:26 UTC 2006
>>>>We use a two-node cluster to manage failover services via dedicated scripts.
>>>>When we use clusvcadm -r <service_name> to migrate a service from one node
>>>>to the other, CS4 occasionally gets stuck with the
>>>>"service_name stopping" status.
>>Could you let us know:
>>
>>- architecture
Two nodes, connected to the backbone through eth0 and directly to each other
through eth1. The hostname is set on eth0, which is also used as the fencing
interface. Heartbeat is also configured on eth0.
>>- dlm-kernel package version
dlm-kernel.2.6.9-37.7.b.3
>>- rgmanager version
rgmanager.1.9.38-0.b.5
>>- service XML structure
What do you mean? The cluster.conf file?
>>- if possible, the service script itself (though this is the least
>>likely problem)
>>If you can, install the corresponding -debuginfo packages so we can get
>>a backtrace of the rgmanager daemon.
>>
>>
Will do. The deadlock does not occur systematically, but it is frequent.
Still, it may take us a while to reproduce the problem with the debug
packages installed.
>>>>The stop target of the script associated with the service is not called.
>>>>Subsequent clusvcadm -d <service_name> calls return success but
>>>>effectively do nothing: the service script is not called.
>>There's a segfault (which is fixed in RHCS4U3 beta and CVS) which might
>>explain the behavior.
>>-- Lon
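In the meantime, one way to confirm whether rgmanager ever calls the stop target is to point the service's script resource at a small logging wrapper that records each invocation before running the real script. This is a hypothetical sketch, not something we run today: it assumes rgmanager invokes the configured file as `<file> start|stop|status`, it assumes a Python 3 interpreter is available on the nodes, and REAL_SCRIPT, LOG_PATH and /etc/init.d/my_service are placeholder names, not part of our configuration.

```python
#!/usr/bin/env python3
"""Hypothetical logging wrapper for an rgmanager <script> resource.

Assumption: rgmanager runs the configured file as `<file> start|stop|status`.
Point the resource's file= attribute at this wrapper; it records every call,
then runs the real service script with the same action.
"""
import os
import subprocess
import sys
import time

# Hypothetical placeholders: adjust to the real service script and a writable log.
REAL_SCRIPT = ["/etc/init.d/my_service"]
LOG_PATH = "/var/log/my_service_wrapper.log"


def log_line(log_path, message):
    """Append one timestamped line. The file is closed after each write,
    so a call is recorded even if the real script later hangs."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(f"{stamp} pid={os.getpid()} {message}\n")


def run(action, command, log_path):
    """Log the action, run command + [action], then log and return its exit code."""
    log_line(log_path, f"called with action={action!r}")
    rc = subprocess.call(list(command) + [action])
    log_line(log_path, f"action={action!r} exit={rc}")
    return rc


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.stderr.write(f"usage: {sys.argv[0]} start|stop|status\n")
        sys.exit(2)
    sys.exit(run(sys.argv[1], REAL_SCRIPT, LOG_PATH))
```

After a clusvcadm -r or clusvcadm -d that hangs, a log with no action='stop' line would indicate that rgmanager never invoked the script at all, which matches the symptom above. Periodic status lines are expected from rgmanager's regular status checks.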
--
mailto:Alain.Moulle at bull.net
+------------------------------+--------------------------------+
| Alain Moullé | from France : 04 76 29 75 99 |
| | FAX number : 04 76 29 72 49 |
| Bull SA | |
| 1, Rue de Provence | Adr : FREC B1-041 |
| B.P. 208 | |
| 38432 Echirolles - CEDEX | Email: Alain.Moulle at bull.net |
| France | BCOM : 229 7599 |
+-------------------------------+-------------------------------+