[Linux-cluster] Question on maxrestarts and maxfalserestarts

Anu Matthew anu.matthew at bms.com
Mon Jun 20 20:48:32 UTC 2005


Hi all,


We run a 4-node cluster on RHEL AS 3.0 with these versions of clumanager
and redhat-config-cluster:

: redhat-config-cluster-1.0.3-1 clumanager-1.2.22-2

Everything had been working fine for a while, but today the cluster
started logging messages like:

Jun 20 15:13:24 node4 clulockd[3763]: <warning> Denied A.B.C.30: Broken pipe
Jun 20 15:13:24 node4 clulockd[3763]: <err> select error: Broken pipe
Jun 20 15:13:34 node4 clulockd[3763]: <warning> Denied A.B.C.29: Broken pipe
Jun 20 15:13:34 node4 clulockd[3763]: <err> select error: Broken pipe
Jun 20 15:13:48 node4 cluquorumd[3723]: <notice> IPv4 TB @ A.B.C.254 Offline
Jun 20 15:13:49 node4 clulockd[3763]: <warning> Denied A.B.C.30: Connection reset by peer
Jun 20 15:13:49 node4 clulockd[3763]: <err> select error: Connection reset by peer

It then ended up restarting the local service, logging:

Jun 20 15:17:06 node4 clusvcmgrd[22077]: <err> Unable to obtain cluster lock: Connection timed out
Jun 20 15:17:06 node4 clusvcmgrd[22077]: <warning> Restarting locally failed service ploracm3
Jun 20 15:17:06 node4 cluquorumd[3723]: <notice> IPv4 TB @ A.B.C.254 Online
Jun 20 15:17:09 node4 clulockd[3763]: <warning> Denied A.B.C.29: Connection reset by peer
Jun 20 15:17:09 node4 clulockd[3763]: <err> select error: Connection reset by peer
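
In case it helps anyone look at the same pattern, here is a rough sketch of how these messages could be tallied per daemon and per denied peer. It assumes only the syslog line layout visible in the excerpts above (with wrapped lines rejoined); the names LOG, LINE_RE, parse and summarize are made up for illustration and are not part of clumanager.

```python
import re
from collections import Counter

# Log lines copied from the excerpts above, wrapped lines rejoined.
LOG = """\
Jun 20 15:13:24 node4 clulockd[3763]: <warning> Denied A.B.C.30: Broken pipe
Jun 20 15:13:24 node4 clulockd[3763]: <err> select error: Broken pipe
Jun 20 15:13:34 node4 clulockd[3763]: <warning> Denied A.B.C.29: Broken pipe
Jun 20 15:13:34 node4 clulockd[3763]: <err> select error: Broken pipe
Jun 20 15:13:48 node4 cluquorumd[3723]: <notice> IPv4 TB @ A.B.C.254 Offline
Jun 20 15:13:49 node4 clulockd[3763]: <warning> Denied A.B.C.30: Connection reset by peer
Jun 20 15:13:49 node4 clulockd[3763]: <err> select error: Connection reset by peer
Jun 20 15:17:06 node4 clusvcmgrd[22077]: <err> Unable to obtain cluster lock: Connection timed out
Jun 20 15:17:06 node4 clusvcmgrd[22077]: <warning> Restarting locally failed service ploracm3
Jun 20 15:17:06 node4 cluquorumd[3723]: <notice> IPv4 TB @ A.B.C.254 Online
Jun 20 15:17:09 node4 clulockd[3763]: <warning> Denied A.B.C.29: Connection reset by peer
Jun 20 15:17:09 node4 clulockd[3763]: <err> select error: Connection reset by peer
"""

# Assumed layout: "Mon DD HH:MM:SS host daemon[pid]: <level> message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) "
    r"(?P<daemon>\w+)\[(?P<pid>\d+)\]: <(?P<level>\w+)> (?P<msg>.*)$"
)


def parse(text):
    """Yield one dict per line that matches the assumed syslog layout."""
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            yield match.groupdict()


def summarize(text):
    """Count messages per (daemon, level) and lock denials per peer address."""
    by_daemon_level = Counter()
    denied_peers = Counter()
    for rec in parse(text):
        by_daemon_level[(rec["daemon"], rec["level"])] += 1
        denied = re.match(r"Denied (\S+): (.*)", rec["msg"])
        if denied:
            denied_peers[denied.group(1)] += 1
    return by_daemon_level, denied_peers


if __name__ == "__main__":
    levels, peers = summarize(LOG)
    for key, count in sorted(levels.items()):
        print(key, count)
    for peer, count in sorted(peers.items()):
        print("denied", peer, count)
```

Run against a fuller /var/log/messages, a tally like this would show whether the lock denials cluster around the times the tiebreaker IP goes Offline.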

My question is about the significance of maxrestarts and
maxfalserestarts (listed as maxfalsestarts in the output below). Would
setting maxfalserestarts to, say, 1 have averted this situation?

[root@node4 root]# redhat-config-cluster-cmd --service=ploracm1

service:
  name = ploracm1
  checkinterval = 10
  failoverdomain = ploracm1
  userscript = /etc/cluster/scripts/ploracm1
  maxrestarts = 0
  maxfalsestarts = 0

service_ipaddress:
  ipaddress = A.B.C.D
  netmask = 255.255.255.0
  broadcast = A.B.C.255

Thanks in advance,


--AM