[Linux-cluster] About GFS1 and I/O barriers.

Mon Mar 31 13:16:22 UTC 2008

On Mon, 31 Mar 2008 11:54:20 +0100,
Steven Whitehouse <swhiteho at redhat.com> wrote:

> Hi,
> 

Hi,

> Both GFS1 and GFS2 are safe from this problem since neither of them
> uses barriers. Instead we do a flush at the critical points to ensure
> that all data is on disk before proceeding with the next stage.
> 

I don't think this solves the problem.

Consider a cheap iSCSI disk (no NVRAM, no UPS) shared by all my GFS
nodes. This disk has its write cache enabled, which means it reports
write requests as completed even when they have not actually reached
the platters. Like most disks nowadays, it also has logic that optimizes
writes by re-scheduling them. So it is possible that all writes are
ACK'd before the power failure, but only a fraction of them were really
performed: some issued before the flush, some issued after it.
--Not all block writes issued before the flush were performed, but some
blocks issued after the flush were written -> the FS is corrupted.--
So, after the power failure, all data in the disk's write cache is
lost. If the journal blocks were still in the disk cache, the journal
never reached the disk while other metadata did, so the metadata is
left inconsistent.
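A toy model of this scenario may make the argument concrete. It is a minimal sketch, not GFS or block-layer code: the `WriteCacheDisk` class, its methods, and the block labels are hypothetical. The disk ACKs each write as soon as it lands in its volatile cache and destages cached blocks in an order of its own choosing (fixed explicitly here so the run is deterministic). It assumes, as the argument above does, that the "flush" only waits for the writes to be acknowledged and never asks the disk to empty its cache.

```python
class WriteCacheDisk:
    """Hypothetical disk with a volatile write-back cache and no NVRAM."""

    def __init__(self):
        self.cache = []      # blocks ACK'd but not yet on the platters
        self.platters = {}   # blocks that actually reached stable storage

    def write(self, block, data):
        # The write is acknowledged immediately, before it is persistent.
        self.cache.append((block, data))
        return "ACK"

    def destage(self, block):
        # The disk's internal scheduler picks which cached block to persist next.
        for i, (b, data) in enumerate(self.cache):
            if b == block:
                self.platters[b] = data
                del self.cache[i]
                return

    def power_failure(self):
        # Everything still in the volatile cache is lost.
        self.cache.clear()


def flush_by_waiting_for_acks(acks):
    # Assumption: this "flush" only checks that every write was acknowledged;
    # it does not ask the disk to empty its cache.
    return all(a == "ACK" for a in acks)


disk = WriteCacheDisk()
acks = [disk.write("journal", "txn 42"), disk.write("journal-commit", "txn 42 done")]
assert flush_by_waiting_for_acks(acks)  # the filesystem believes the journal is safe
acks.append(disk.write("inode-table", "inode updated by txn 42"))

# The disk reorders: the in-place metadata reaches the platters first...
disk.destage("inode-table")
# ...and the power fails before the journal blocks are destaged.
disk.power_failure()

print(sorted(disk.platters))  # ['inode-table']
```

After the crash the in-place metadata is on the platters but the journal describing it is not, so recovery has nothing to replay.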

This is the problem that I/O barriers try to solve: they force the
block device (and the block layer) to write every block issued before
the barrier before any block issued after the barrier starts being
written.
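The ordering guarantee of a barrier can be checked in the same kind of toy model. This is a hypothetical sketch, not the Linux block layer: writes are grouped into "epochs" separated by barriers, the disk may reorder blocks freely inside an epoch, but may not write any block of a later epoch until the earlier epochs are fully on the platters. The block names are illustrative. The check enumerates every allowed destage order and every possible power-failure point.

```python
from itertools import permutations


def persistable_orders(epochs):
    """Yield every destage order the disk may use.

    Blocks inside one epoch may be reordered freely, but a barrier between
    epochs means no block of epoch k+1 is written before all of epoch k.
    """
    if not epochs:
        yield []
        return
    first, rest = epochs[0], epochs[1:]
    for head in permutations(first):
        for tail in persistable_orders(rest):
            yield list(head) + tail


def crash_can_corrupt(order, epochs):
    """True if a power failure at some point of `order` leaves a block of a
    later epoch on the platters while an earlier epoch is incomplete."""
    for crash_point in range(len(order) + 1):
        on_platters = set(order[:crash_point])
        for k in range(1, len(epochs)):
            later_written = any(b in on_platters for b in epochs[k])
            earlier_complete = all(b in on_platters for e in epochs[:k] for b in e)
            if later_written and not earlier_complete:
                return True
    return False


# Journal blocks | barrier | commit record | barrier | in-place metadata.
epochs = [["journal-1", "journal-2"], ["journal-commit"], ["inode-table", "bitmap"]]
all_blocks = [b for e in epochs for b in e]

with_barriers = any(crash_can_corrupt(o, epochs) for o in persistable_orders(epochs))
# Without barriers the disk may destage the five blocks in any order at all.
without_barriers = any(crash_can_corrupt(list(o), epochs) for o in permutations(all_blocks))
print(with_barriers, without_barriers)  # False True
```

With barriers no crash point can expose metadata ahead of its journal; without them, a single reordering (for example `inode-table` first) is enough.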

The other solution is to disable the disks' write cache completely,
but this leads to dramatically worse performance.

> Using barriers can improve performance in certain cases, but we've not
> yet implemented them in GFS2,
> 
> Steve.
> 
> On Mon, 2008-03-31 at 12:46 +0200, Mathieu Avila wrote:
> > Hello all again,
> > 
> > More information on this topic:
> > http://lkml.org/lkml/2007/5/25/71
> > 
> > I guess the problem also applies to GFS2.
> > 
> > --
> > Mathieu
> > 
> > On Fri, 28 Mar 2008 15:34:58 +0100,
> > Mathieu Avila <mathieu.avila at seanodes.com> wrote:
> > 
> > > Hello GFS team,
> > > 
> > > Some recent kernel developments have brought I/O barriers into the
> > > kernel to prevent corruption that could happen when blocks are
> > > reordered before being written, by the kernel or by the block
> > > device itself, just before an electrical power failure.
> > > (On high-end block devices with a UPS or NVRAM, these problems
> > > cannot happen.)
> > > Some file systems implement them, notably ext3 and XFS. It seems
> > > to me that GFS1 has no such thing.
> > > 
> > > Do you plan to implement them? If so, could the attached patch do
> > > the work? It's incomplete: it would need a global tunable, like the
> > > one for fast statfs, and a mount option, as is done for ext3. The
> > > code is mostly copy-pasted from JBD, and issues a barrier only for
> > > journal metadata. (Should I do it for other metadata as well?)
> > > 
> > > Thanks,
> > > 
> > > --
> > > Mathieu
> > > 
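Since the patch is described as mostly a copy of JBD, it may help to recall what JBD of that era did: when barriers were enabled for the journal, it wrote the commit record as an ordered (barrier) write, and if the device rejected it with -EOPNOTSUPP, it logged a warning, cleared the journal's barrier flag, and rewrote the commit record without a barrier. The Python below only mimics that control flow as a sketch; `FakeDevice`, `Journal`, `sync_write`, and `write_commit_record` are hypothetical names, not the JBD or GFS API, and nothing here reflects the actual contents of the attached patch.

```python
import errno


class FakeDevice:
    """Hypothetical block device; `supports_barriers` models whether it
    accepts ordered (barrier) writes."""

    def __init__(self, supports_barriers):
        self.supports_barriers = supports_barriers
        self.log = []  # (block, used_barrier) for each write that succeeded

    def sync_write(self, block, barrier):
        if barrier and not self.supports_barriers:
            return -errno.EOPNOTSUPP
        self.log.append((block, barrier))
        return 0


class Journal:
    def __init__(self, device, barrier_mount_option):
        self.device = device
        # Models a per-journal flag set from a mount option (ext3 uses barrier=1).
        self.use_barrier = barrier_mount_option

    def write_commit_record(self, block):
        # Remember whether *this* write asked for a barrier, as JBD does.
        barrier_done = self.use_barrier
        ret = self.device.sync_write(block, barrier=barrier_done)
        if ret == -errno.EOPNOTSUPP and barrier_done:
            # JBD-style fallback: warn, stop using barriers, retry without one.
            print("barrier-based sync failed - disabling barriers")
            self.use_barrier = False
            ret = self.device.sync_write(block, barrier=False)
        return ret


good = Journal(FakeDevice(supports_barriers=True), barrier_mount_option=True)
cheap = Journal(FakeDevice(supports_barriers=False), barrier_mount_option=True)
good.write_commit_record("commit-1")
cheap.write_commit_record("commit-1")
print(good.device.log, cheap.device.log, cheap.use_barrier)
```

On the device that rejects barriers, the commit record is still written, but without the ordering guarantee, and the journal stops requesting barriers afterwards; in JBD that downgrade is reported only by a kernel log warning.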
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster