[Linux-cluster] CS5/ Question about behavior with a corrupted Quorum disk

Mon Feb 4 17:49:23 UTC 2008

On Mon, 2008-02-04 at 12:18 -0500, Lon Hohberger wrote:
> On Mon, 2008-02-04 at 08:33 +0100, Alain Moulle wrote:
> > Hi
> > 
> > Just for information, I wonder whether this behavior is normal:
> > I have a two-node cluster with a quorum disk, and
> > CS5 is started on both nodes with a service on each one.
> > Quorum keeps working fine when I break the quorum disk format
> > (with a mkfs on the device!) so that mkqdisk -L returns
> > none.
> 
> It will keep *trying* to operate.
> 
> > The behavior is: CS5 keeps working fine as if nothing
> > had happened. I wonder whether this is due only to the heuristics
> > or whether it is simply the standard behavior of CS5 with
> > regard to the quorum disk?
> 
> It /should/ throw warnings in the log for all the blocks that are
> corrupt (and it will probably annoy you ;) ).  After 1 cycle, the blocks
> corresponding to active cluster nodes will have correct/current data on
> them, and life should continue, but reading the rest of the 16 node
> blocks should continue throwing warnings:
> 
> [1533] warning: Error reading node ID block 3
> [1533] warning: Error reading node ID block 4
> [1533] warning: Error reading node ID block 5
> [1533] warning: Error reading node ID block 6
> [1533] warning: Error reading node ID block 7
> ...
> [1533] warning: Error reading node ID block 16
> 
> (Granted, I used 'dd if=/dev/zero ...' instead of mkfs.)
> 
> Qdiskd will not function if you restart it, however, and nodes will be
> unable to find the quorum disk after a reboot.  The header of the quorum
> disk is not rewritten while qdiskd is running.  You'll have to run
> mkqdisk to fix it - which should also work (but certainly isn't
> recommended!).

Whoops - "should also work while the cluster is running (but certainly
isn't recommended)"
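For illustration, here is a toy Python model of the behavior described above. It assumes a simplified layout of one header plus 16 node blocks, each either valid or corrupt. The names ToyQuorumDisk, run_cycle, can_start and reinitialize are hypothetical and do not correspond to qdiskd's real code or on-disk format. The model shows why a running cluster carries on (active nodes rewrite their own blocks every cycle) while a restart fails (nothing rewrites the header until mkqdisk is run again).

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Toy model only: this layout is hypothetical, not qdiskd's real on-disk format.
MAX_NODES = 16  # the post mentions 16 node blocks


@dataclass
class ToyQuorumDisk:
    header_valid: bool = True
    # True means the node block holds readable, well-formed data.
    blocks: Dict[int, bool] = field(
        default_factory=lambda: {n: True for n in range(1, MAX_NODES + 1)}
    )

    def wipe(self) -> None:
        """Simulate 'dd if=/dev/zero' (or mkfs) over the device."""
        self.header_valid = False
        self.blocks = {n: False for n in range(1, MAX_NODES + 1)}


def run_cycle(disk: ToyQuorumDisk, active_nodes: List[int]) -> List[str]:
    """One polling cycle of the already-running daemons.

    Each active node rewrites only its own block, then every block is read.
    The header is never rewritten here.
    """
    for node_id in active_nodes:
        disk.blocks[node_id] = True
    return [
        f"warning: Error reading node ID block {node_id}"
        for node_id in range(1, MAX_NODES + 1)
        if not disk.blocks[node_id]
    ]


def can_start(disk: ToyQuorumDisk) -> bool:
    """A freshly (re)started daemon must find a valid header first."""
    return disk.header_valid


def reinitialize(disk: ToyQuorumDisk) -> None:
    """Hypothetical stand-in for re-running mkqdisk on the device."""
    disk.header_valid = True
    disk.blocks = {n: True for n in range(1, MAX_NODES + 1)}


if __name__ == "__main__":
    disk = ToyQuorumDisk()
    disk.wipe()
    for line in run_cycle(disk, active_nodes=[1, 2]):
        print(line)  # blocks 3 through 16 keep warning
    print("restart ok:", can_start(disk))  # False: header still corrupt
    reinitialize(disk)
    print("restart ok:", can_start(disk))  # True after re-initializing
```

In a two-node cluster, blocks 1 and 2 recover after one cycle, and blocks 3 to 16 warn on every cycle, as in the log excerpt above.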

-- Lon