[Linux-cluster] Cluster node without access to all resources-trouble

Janne Peltonen janne.peltonen at helsinki.fi
Thu Jun 28 20:51:19 UTC 2007


On Thu, Jun 28, 2007 at 02:39:44PM -0400, Lon Hohberger wrote:
> 
> > *if all the nodes with SAN access are restarted (while the fifth node is
> > up), the nodes with SAN access first stop the services locally - and
> > then, apparently, ask the fifth node about the service status. Result:
> > a line like the following, for each service:
> > 
> > --cut--
> > Jun 28 17:56:20 pcn2.mappi.helsinki.fi clurgmgrd[5895]: <err> #34: Cannot get status for service service:im  
> > --cut--
> 
> What do you mean here? (Sorry, being daft.)
> 
> Restart all nodes = "just rgmanager on all nodes", or "reboot all
> nodes"?

Reboot all nodes.

> > *after that, the nodes with SAN access do nothing about any services
> > until after the fifth node has left the cluster and has been fenced.
> 
> If you're rebooting the other 4 nodes, it sounds like the 5th is holding
> some sort of a lock which it shouldn't be across quorum transitions
> (which would be a bug).
> 
> If this is the case, could you:
> 
> * install rgmanager-debuginfo
> * get me a backtrace:
> 
>     gdb clurgmgrd `pidof clurgmgrd`
>     thr a a bt

I'll try to find the time for this tomorrow or so. (This behaviour
doesn't really make the cluster unusable in production, so I'm trying
to solve the other problems first ;)
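
For reference, the backtrace requested above could probably also be
captured non-interactively. A minimal sketch, assuming a gdb that
supports -batch, -ex and -p; "thread apply all bt" is the long form of
"thr a a bt". The helper name bt_command is hypothetical, and it only
prints the command rather than running it:

```shell
# Hypothetical helper: print a gdb command line that would dump the
# backtraces of all threads of the process with the given PID.
bt_command() {
    case "$1" in
        # Reject an empty argument or anything that is not all digits.
        ''|*[!0-9]*) echo "usage: bt_command PID" >&2; return 1 ;;
    esac
    printf 'gdb -batch -ex "thread apply all bt" -p %s\n' "$1"
}
```

Running the printed command as root on the fifth node, with output
redirected to a file, should give something that can be attached to a
reply. rgmanager-debuginfo still needs to be installed for the frames
to be readable.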


--Janne
-- 
Janne Peltonen <janne.peltonen at helsinki.fi>



