[Linux-cluster] CS5/ Question about behavior with a corrupted Quorum disk
Lon Hohberger
lhh at redhat.com
Mon Feb 4 17:18:12 UTC 2008
On Mon, 2008-02-04 at 08:33 +0100, Alain Moulle wrote:
> Hi
>
> Just for information, I wonder whether this behavior is normal:
> I have a two-node cluster with a quorum disk, and
> CS5 is started on both nodes with a service on each one.
> Quorum is working fine when I break the quorum disk format
> (with a mkfs on the device!) so that mkqdisk -L finds
> none.
It will keep *trying* to operate.
> The behavior is: CS5 keeps working fine as if nothing
> had happened. I wonder if this is only due to the heuristics, or
> if this is simply the standard behavior of CS5 with
> regard to the quorum disk?
It /should/ throw warnings in the log for all the blocks that are
corrupt (and it will probably annoy you ;) ). After 1 cycle, the blocks
corresponding to active cluster nodes will have correct/current data on
them, and life should continue, but reading the rest of the 16 node
blocks should continue throwing warnings:
[1533] warning: Error reading node ID block 3
[1533] warning: Error reading node ID block 4
[1533] warning: Error reading node ID block 5
[1533] warning: Error reading node ID block 6
[1533] warning: Error reading node ID block 7
...
[1533] warning: Error reading node ID block 16
(Granted, I used 'dd if=/dev/zero ...' instead of mkfs.)
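A toy model may make the per-cycle behavior easier to picture. The Python sketch below is hypothetical and is not taken from the qdiskd source: a zeroed block is represented as None, each live node rewrites only its own block, and then every block is read back. With nodes 1 and 2 alive, it reproduces the pattern of warnings above for blocks 3 through 16.

```python
# Hypothetical toy model, NOT qdiskd code: after the quorum disk is zeroed,
# every node block is unreadable; each live node rewrites only its own
# block, and reads of the remaining blocks keep failing.

NUM_BLOCKS = 16  # one status block per possible node ID

def run_cycle(blocks, live_nodes, cycle):
    """Live nodes write their own blocks, then all blocks are read back.

    Returns the warnings produced by the read pass.
    """
    for node_id in live_nodes:
        blocks[node_id] = {"node_id": node_id, "timestamp": cycle}
    warnings = []
    for node_id in range(1, NUM_BLOCKS + 1):
        if blocks[node_id] is None:  # None stands in for a zeroed/corrupt block
            warnings.append(f"warning: Error reading node ID block {node_id}")
    return warnings

# Simulate 'dd if=/dev/zero ...': every block is now unreadable.
blocks = {node_id: None for node_id in range(1, NUM_BLOCKS + 1)}
warnings = run_cycle(blocks, live_nodes=[1, 2], cycle=1)
print(len(warnings))  # 14
print(warnings[0])    # warning: Error reading node ID block 3
```

Only the blocks of nodes that are actually running get repaired; everything else stays corrupt until mkqdisk rewrites the disk.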
Qdiskd will not function if you restart it, however, and nodes will be
unable to find the quorum disk after a reboot. The header of the quorum
disk is not rewritten while qdiskd is running. You'll have to run
mkqdisk to fix it; doing that while qdiskd is running should also work
(but certainly isn't recommended!).
This produced the following on the non-master node, but nothing
significant on the master node:
[1533] info: Node 1 shutdown
[1533] debug: Making bid for master
[1533] debug: Node 1 is marked master, but is dead.
[1533] debug: Node 1 is marked master, but is dead.
[1533] debug: Node 1 is marked master, but is dead.
[1533] debug: Node 1 is UP
[1533] info: Node 1 is the master
Looking at the code, if a node dies between the time you clobber
the quorum disk and the time qdiskd on that node writes a new block,
qdiskd won't evict that node. Solution: Don't rub salt in cuts.
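The eviction gap can be sketched the same way. Again this is a hypothetical toy, not qdiskd's actual logic, and MISSED_CYCLES_BEFORE_EVICT is an invented threshold: a node is evicted only if its block was readable at some point and its timestamp then stopped advancing. A node that died before rewriting its zeroed block never produced a valid read, so it is never considered for eviction.

```python
# Hypothetical toy model, NOT qdiskd code: only nodes that were seen online
# (a readable block) and then stopped updating are evicted.

MISSED_CYCLES_BEFORE_EVICT = 3  # invented threshold, not a real qdiskd setting

def nodes_to_evict(history, threshold=MISSED_CYCLES_BEFORE_EVICT):
    """history maps node_id -> per-cycle reads (timestamp, or None if unreadable)."""
    evict = []
    for node_id, reads in history.items():
        valid = [i for i, ts in enumerate(reads) if ts is not None]
        if not valid:
            continue  # block never readable: node was never seen online
        last_ts = reads[valid[-1]]
        # Index of the read where the latest timestamp first appeared.
        first_seen = next(i for i in valid if reads[i] == last_ts)
        # Number of reads taken since the timestamp last advanced.
        stale_reads = len(reads) - 1 - first_seen
        if stale_reads >= threshold:
            evict.append(node_id)
    return sorted(evict)

history = {
    1: [1, 2, 3, 4, 5],                # healthy: timestamp advances every cycle
    2: [1, 2, 2, 2, 2],                # rewrote its block, then died
    3: [None, None, None, None, None], # died before rewriting its zeroed block
}
print(nodes_to_evict(history))  # [2]
```

Node 3 is just as dead as node 2, but because its block was never readable after the clobber, this model (like the behavior described above) never evicts it.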
Also, intentionally corrupting your quorum disk could result in the
following:
https://bugzilla.redhat.com/show_bug.cgi?id=430264
-- Lon