[Linux-cluster] Clurgmgrd error

Fri May 19 15:00:12 UTC 2006

On Fri, 2006-05-19 at 10:24 +0200, Mels Kooijman wrote:

> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <notice> Service infoserver
> is recovering
> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #55: Failed changing
> RG status
> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #44: Cannot start RG
> infoserver: Invalid State 117
> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <crit> #13: Service
> infoserver failed to stop cleanly
> May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #57: Failed changing
> RG status

What caused unxams4082 to die?  Was there anything in dmesg, like
dlm_emergency_shutdown?

> May 17 23:21:51 unxaal4082 clurgmgrd[19476]: <info> Magma Event:
> Membership Change
> May 17 23:21:51 unxaal4082 clurgmgrd[19476]: <info> State change:
> unxams4082 DOWN
> May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <err> #44: Cannot start RG
> infoserver: Invalid State 117

What release of rgmanager?  This might be a bug.

> May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <crit> #13: Service
> infoserver failed to stop cleanly
> May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <notice> Taking over
> service clustat from down member (null)
> May 17 23:21:52 unxaal4082 clurgmgrd[19476]: <notice> Service clustat
> started
> 
> Where can I find a description of the error numbers (55,44,13,57)?

/usr/share/doc/rgmanager-*/errors.txt
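
To know which numbers to look up in that file, it can help to pull them out of syslog first. What follows is only a rough sketch, not part of rgmanager: the function name error_ids is made up for illustration, and the sample lines are the ones quoted above with the mail line-wrapping undone. It assumes the "<severity> #NN:" layout seen in those lines and makes no assumption about the layout of errors.txt itself.

```python
import re
from collections import Counter

# Matches e.g. "clurgmgrd[23408]: <err> #55: Failed changing RG status".
# The layout is inferred from the log lines quoted in this thread.
PATTERN = re.compile(r"clurgmgrd\[\d+\]: <(\w+)> #(\d+):")

def error_ids(lines):
    """Count (severity, error number) pairs in clurgmgrd syslog lines."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts

# Log lines from this thread, re-joined onto single lines.
sample = [
    "May 17 23:21:51 unxams4082 clurgmgrd[23408]: <notice> Service infoserver is recovering",
    "May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #55: Failed changing RG status",
    "May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #44: Cannot start RG infoserver: Invalid State 117",
    "May 17 23:21:51 unxams4082 clurgmgrd[23408]: <crit> #13: Service infoserver failed to stop cleanly",
    "May 17 23:21:51 unxams4082 clurgmgrd[23408]: <err> #57: Failed changing RG status",
]

# Print one line per distinct error number, lowest number first.
for (severity, number), count in sorted(error_ids(sample).items(),
                                        key=lambda kv: kv[0][1]):
    print(f"#{number} <{severity}> seen {count} time(s)")
```

Lines without a "#NN:" tag, such as the <notice> and <info> messages, are skipped on purpose, since they carry no error number to look up.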

> What can be the cause that we often get the message:
> clurgmgrd[23408]: <notice> status on ip "192.168.50.43" returned 1
> (generic error)

Several things can cause this: the link died, or (before U3) the router
ping failed.  The router-ping code was removed in U3 because it caused
more problems than it solved.  If the message shows up every two
minutes, the router ping is definitely the cause.

-- Lon