[Linux-cluster] failover hangs on open file
danwest at comcast.net
Wed Nov 30 15:39:44 UTC 2005
There seems to be a bug that affects service groups when a process outside the cluster's control has open files on a file system that is managed by the cluster. I am running the RHEL4U1 code release. An example follows.
The setup is a simple 2-node cluster (nodeA and nodeB) with a virtual IP resource and an ext3 filesystem resource managed via CLVMD. I have removed a script resource for simplicity. My service is started on nodeA, which has the VIP and the ext3 mount (/mnt/cluster). I can relocate the service to nodeB with no problem using clusvcadm -r service -m nodeB, and I can relocate it back without a problem.
But if I open a file on the cluster-managed ext3 mount (vi /mnt/cluster/test) and try to relocate the service, it fails every time.
The behavior of the RHEL3 codebase was to kill all processes associated with the mount on failure and/or relocation.
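For anyone reproducing this, here is a minimal sketch for seeing which processes are keeping the mount busy before a relocation. The helper name holders is hypothetical and not part of the cluster suite. It only scans /proc on Linux, and it only sees processes whose /proc entries are readable by the caller, so it should be run as root.

```shell
#!/bin/sh
# Hypothetical helper, not part of rgmanager: print the PIDs that have an
# open file or a working directory under the given directory.
# Assumes Linux /proc.
holders() {
    mnt=$1
    for link in /proc/[0-9]*/fd/* /proc/[0-9]*/cwd; do
        # Processes may exit while we scan; skip links we cannot read.
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            "$mnt"|"$mnt"/*)
                pid=${link#/proc/}      # e.g. "1234/fd/5" or "1234/cwd"
                echo "${pid%%/*}"       # keep only the PID part
                ;;
        esac
    done | sort -un
}
```

Running holders /mnt/cluster on the node that owns the service should list the vi process from the example above. fuser -vm /mnt/cluster should report similar information where fuser is installed.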
Here is the output from /var/log/messages during the relocation error:
Nov 29 12:22:12 nodeA clurgmgrd[8445]: <notice> Stopping service SERVICE1
Nov 29 12:22:16 nodeA clurgmgrd[8445]: <notice> stop on fs "testfs" returned 2 (invalid argument(s))
Nov 29 12:22:16 nodeA clurgmgrd[8445]: <crit> #12: RG SERVICE1 failed to stop; intervention required
Nov 29 12:22:16 nodeA clurgmgrd[8445]: <notice> Service SERVICE1 is failed
Nov 29 12:22:16 nodeA clurgmgrd[8445]: <warning> #70: Attempting to restart service SERVICE1 locally.
Nov 29 12:22:16 nodeA clurgmgrd[8445]: <err> #43: Service SERVICE1 has failed; can not start.
Nov 29 12:22:16 nodeA clurgmgrd[8445]: <alert> #2: Service SERVICE1 returned failure code. Last Owner: nodeA
Nov 29 12:22:16 nodeA clurgmgrd[8445]: <alert> #4: Administrator intervention required.
Output of clustat after the relocation attempt with the file open:
Member Status: Quorate, Group Member

  Member Name        State      ID
  ------ ----        -----      --
  NodeB              Online     0x0000000000000002

  Service Name       Owner (Last)       State
  ------- ----       ----- ------       -----
  SERVICE1           (null)             failed
Any ideas?
Thanks,
Dan