[Linux-cluster] Possible cman init script race condition

David Teigland teigland at redhat.com
Tue Oct 2 19:03:05 UTC 2007


On Tue, Oct 02, 2007 at 05:51:40PM +0200, Borgström Jonas wrote:
> > > And as I mentioned before, the really scary part is that I am able to
> > > mount gfs filesystems during this kind of cluster split. And if one
> > > node is shot, the other node replays the gfs journal and makes the
> > > filesystem writable again without first fencing the shot/missing node.
> >
> > I would need to see the logs from the exact scenario you're talking about
> > here to determine if this is a new problem or an effect of the other one.
> 
> Ok, here's some log output:
> 
> Scenario: A gfs filesystem is mounted on two nodes in a "split cluster"

...

> So gfs is still mounted and writable on prod-db2 even though prod-db1 was
> never fenced.

Yes, you're correct.  I've looked at the logs, and it's a side effect of
the other bug, in which cman fails to disallow the merger of the two
clusters.  So, in summary, you've identified three different problems,
each an effect of the one before it:

1. unidentified openais bug(s) in RHEL5.0 cause the two nodes to initially
   form independent clusters -- fixed in 5.1

2. bz 251966 is triggered by (1) -- fixed in 5.2 (maybe earlier)

3. groupd/fenced don't fence the failed node; this is triggered by (2).
   Once (2) is fixed, this won't happen.

Dave



