[Linux-cluster] HA cluster cache coherency

Wed Nov 12 18:43:52 UTC 2008

Hello all,

I have been wondering what most people do about cache coherency issues
when doing high availability failover between two or more Linux
servers (RHEL, if it matters) with shared storage.

Consider a typical master-slave arrangement managed by heartbeat, with a
Fibre Channel HBA in each node. Each HBA is connected to a switch, and a
bunch of storage is connected to the same switch. The master and the slave
share this disk with other clients on the network via NFS.

If the master fails during heavy writes with a gig of data in its
cache, all of that data will be lost.

What do most people do about this? Is there any way to tell the kernel
to only do write-through and no caching? This might be feasible
if one has a lot of cache in the disk storage connected to the
Fibre Channel switch, which is the case for me.
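
To make concrete what I mean by write-through: a single application can
already ask for it per file by opening with O_SYNC, so that each write()
returns only after the kernel has handed the data to the storage device.
Below is a minimal sketch in Python. The function name and the path are
made up for illustration, and this is per file descriptor only, so it is
not the kernel-wide setting I am asking about.

```python
import os


def write_through(path, data):
    """Write bytes to path with O_SYNC and return the number of bytes written.

    With O_SYNC, each write() blocks until the kernel has pushed the data
    to the underlying device instead of leaving it dirty in the page cache.
    The device's own cache may still hold the data afterwards.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        written = 0
        # os.write() may write fewer bytes than requested, so loop.
        while written < len(data):
            written += os.write(fd, data[written:])
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(write_through("/tmp/write_through_demo.bin", b"important data\n"))
```

I would expect this to cost a lot of throughput on small writes, which is
why a large cache on the storage side matters to me.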

I have read http://www.westnet.com/~gsmith/content/linux-pdflush.htm
which seems to be an excellent treatment of pdflush-related issues.
However, it does not seem to address this specific issue. It mentions
four tunables in /proc/sys/vm which, when set to zero, seem like they
might accomplish what I'm looking for:

dirty_background_ratio
dirty_ratio
dirty_expire_centisecs
dirty_writeback_centisecs

but I set them all to zero on a test system and the Dirty field of
/proc/meminfo still routinely shows dirty pages.
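
For anyone who wants to reproduce the observation, here is a rough sketch
of how the four tunables and the Dirty field can be read together. It is
in Python; the function names are my own invention, and it assumes only
the standard /proc/sys/vm and /proc/meminfo paths mentioned above.

```python
import os

VM_DIR = "/proc/sys/vm"
TUNABLES = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def parse_meminfo_kb(text, field="Dirty"):
    """Return the kB value of `field` from /proc/meminfo-style text, or None."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and name.strip() == field and parts:
            return int(parts[0])
    return None


def read_tunables(vm_dir=VM_DIR):
    """Return {tunable name: int value}, with None where a file is unreadable."""
    values = {}
    for name in TUNABLES:
        try:
            with open(os.path.join(vm_dir, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def snapshot(meminfo_path="/proc/meminfo"):
    """Return (tunables dict, Dirty in kB or None) at this moment."""
    try:
        with open(meminfo_path) as f:
            dirty_kb = parse_meminfo_kb(f.read())
    except OSError:
        dirty_kb = None
    return read_tunables(), dirty_kb


if __name__ == "__main__":
    tunables, dirty_kb = snapshot()
    for name, value in tunables.items():
        print(name, "=", value)
    print("Dirty kB =", dirty_kb)
```

Running this in a loop during a large write is how one would watch
whether Dirty ever stays at zero with the tunables set as described.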

Your comments are appreciated.

-- 
Tracy Reed
http://tracyreed.org