[Linux-cluster] Cluster node without access to all resources-trouble
Janne Peltonen
janne.peltonen at helsinki.fi
Thu Jun 28 20:51:19 UTC 2007
On Thu, Jun 28, 2007 at 02:39:44PM -0400, Lon Hohberger wrote:
>
> > *if all the nodes with SAN access are restarted (while the fifth node is
> > up), the nodes with SAN access first stop the services locally - and
> > then, apparently, ask the fifth node about the service status. Result:
> > a line like the following, for each service:
> >
> > --cut--
> > Jun 28 17:56:20 pcn2.mappi.helsinki.fi clurgmgrd[5895]: <err> #34: Cannot get status for service service:im
> > --cut--
>
> What do you mean here? (Sorry, being daft.)
>
> Restart all nodes = "just rgmanager on all nodes", or "reboot all
> nodes"?

Reboot all nodes.

> > *after that, the nodes with SAN access do nothing about any services
> > until after the fifth node has left the cluster and has been fenced.
>
> If you're rebooting the other 4 nodes, it sounds like the 5th is holding
> some sort of a lock which it shouldn't be across quorum transitions
> (which would be a bug).
>
> If this is the case, could you:
>
> * install rgmanager-debuginfo
> * get me a backtrace:
>
> gdb clurgmgrd `pidof clurgmgrd`
> thr a a bt

I'll try to find the time for this tomorrow or something. (This
behaviour doesn't really make the cluster unusable in production, so I'm
trying to solve the other problems first. ;)
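For reference, the requested backtrace can probably also be collected without an interactive gdb session. This is only a minimal sketch, assuming gdb (with its -p, -batch and -ex options) and pidof are installed on the node and that it is run as root. "thr a a bt" is gdb shorthand for "thread apply all bt". The function names and the output file naming are my own, not part of rgmanager.

```shell
#!/bin/sh
# Sketch: dump backtraces of all threads of every running instance of a
# daemon. Helper names and the file layout are hypothetical.

# Build the output file name for one process: <dir>/<name>.<pid>.bt
bt_outfile() {
    printf '%s/%s.%s.bt' "$3" "$1" "$2"
}

# Attach gdb to each PID of the named process and save
# "thread apply all bt" to one file per PID.
dump_backtraces() {
    name=$1
    outdir=$2
    # pidof prints all matching PIDs and fails if there are none.
    pids=$(pidof "$name") || {
        echo "$name is not running" >&2
        return 1
    }
    for pid in $pids; do
        gdb -p "$pid" -batch -ex "thread apply all bt" \
            > "$(bt_outfile "$name" "$pid" "$outdir")" 2>&1
    done
}

# Example (assumption: run as root on the node that stays up):
#   dump_backtraces clurgmgrd /tmp
```

The backtraces are only readable if rgmanager-debuginfo is installed first, as noted in the quoted request.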
--Janne
--
Janne Peltonen <janne.peltonen at helsinki.fi>