[Linux-cluster] DRBD + GNBD + GFS Race Conditions?

cluster at ew.nsci.us
Sat Mar 17 03:54:47 UTC 2007


Dear listmates,

Floating around the Internet are many howtos and references to backing GNBD 
with DRBD in order to get failover GNBD, with GFS mounted atop the GNBD 
device.  Does anyone know how the following possible race condition is 
handled?

1. GFS writes to its GNBD device.
    The GNBD client node writes to the GNBD server node.
    The GNBD server writes to DRBD-primary.
    DRBD begins to write to itself and to DRBD-secondary.
    Before DRBD completes the write to DRBD-secondary (and thus before it
    returns, since writes are synchronous; see the drbd.conf sketch after
    this list), the DRBD-primary node loses power.
    The GNBD server dies with the power loss.
    The GNBD client node drops its connection to the GNBD server.

2. Heartbeat notices the death of DRBD-primary, switches the 
DRBD-secondary to DRBD-primary, re-exports /dev/drbd0 via GNBD, and 
re-creates the virtual IP which the GNBD client was connecting to.

3. The GNBD client writing on behalf of GFS reconnects.
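
For reference, the DRBD resource we have in mind runs protocol C (fully 
synchronous), so DRBD should not complete a write until the secondary has 
it on disk.  A minimal drbd.conf sketch; the resource name, hostnames, 
backing disks, and addresses below are only placeholders:

    resource r0 {
      protocol C;                  # write returns only once the peer has it on disk
      on gnbd-server1 {
        device    /dev/drbd0;
        disk      /dev/sda5;
        address   192.168.1.1:7788;
        meta-disk internal;
      }
      on gnbd-server2 {
        device    /dev/drbd0;
        disk      /dev/sda5;
        address   192.168.1.2:7788;
        meta-disk internal;
      }
    }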

Now, what happens to the write that was originally going to the DRBD volume? 
Will the GNBD client retry the write?  Are there situations where the write 
could be dropped altogether?

Are there other kinds of race conditions which could take place?  Other 
concerns outside of this scenario?

We are thinking about implementing DRBD+GNBD+GFS+Xen to support failover 
and domain migration.  In the event of a failure like power loss, I would 
like to be certain that when the GNBD server node we failed over to comes 
online, any GNBD clients that were halfway through a write will re-commit 
that write.
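
For the GNBD/Heartbeat side, the rough shape we are considering is below. 
All of it is a sketch: the export name, IPs, hostnames, and mount point are 
placeholders, and the gnbd-export resource script is something we would 
have to write ourselves to wrap gnbd_serv/gnbd_export:

    # /etc/ha.d/haresources on both GNBD server nodes (heartbeat v1 style):
    # promote DRBD, bring up the service IP, then export the device.
    gnbd-server1 drbddisk::r0 IPaddr::192.168.1.100/24/eth0 gnbd-export

    # What our gnbd-export script would do on the active server:
    gnbd_serv                            # start the GNBD server daemon
    gnbd_export -e gfs0 -d /dev/drbd0    # uncached export, as GFS requires

    # On each GNBD client / GFS node (cman and fencing already running):
    modprobe gnbd
    gnbd_import -i 192.168.1.100         # import via the virtual IP
    mount -t gfs /dev/gnbd/gfs0 /mnt/gfs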

Thoughts?

-Eric



