[Linux-cluster] About GFS1 and I/O barriers.

Tue Apr 8 08:47:53 UTC 2008

On Wed, 02 Apr 2008 16:17:08 +0100,
Steven Whitehouse <swhiteho at redhat.com> wrote:

> Hi,
> 
> 
> If the data is not physically on disk when the ACK is sent back, then
> there is no way for the fs to know whether the data has (at a later
> date) not been written due to some error or other. Even ignoring that
> for the moment and assuming that such errors never occur, I don't
> think it's too unreasonable to expect at a minimum that all
> acknowledged I/O will never be reordered with unacknowledged I/O.
> That is all that is required for correct operation of gfs1/2 provided
> that no media errors occur on write.

If I understand your statement correctly, I think you misinterpret what
an ACK on a write means.
For the SCSI protocol, ACKing a write doesn't mean the data has reached
the platters.
From the SBC-3 draft at
http://t10.org/ftp/t10/drafts/sbc3/sbc3r14.pdf
section 4.11 (Caches), 5th paragraph:
"
During write operations, the device server uses the cache to store data
that is to be written to the medium at a later time. This is called
write-back caching. The command may complete prior to logical blocks
being written to the medium. As a result of using a write-back caching
there is a period of time when the data may be lost if power to the
SCSI target device is lost and a volatile cache is being used or a
hardware failure occurs. There is also the possibility of an error
occurring during the subsequent write operation. If an error occurred
during the write operation, it may be reported as a deferred error on a
later command. 
"

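To make this concrete, here is a toy model of such a device, written in Python purely for illustration (the class and function names are made up for this sketch; they are not from any kernel or SCSI API, and media errors and deferred errors are ignored). Every WRITE is ACKed as soon as it is in the volatile cache, the cache destages in an arbitrary order, and a power loss drops whatever is still cached. A journal that writes its log body and then its commit record can end up with the commit record on the medium and part of the body missing (about 30% of runs in this model), unless the cache is flushed in between.

```python
import random


class WriteBackCache:
    """Toy SCSI target with a volatile write-back cache.

    A WRITE is ACKed as soon as the block is in the cache. Destaging to
    the medium happens later, in an order the host does not control, and
    a power loss drops whatever is still cached.
    """

    def __init__(self, rng):
        self.rng = rng
        self.cache = {}    # lba -> data: ACKed, but volatile
        self.medium = {}   # lba -> data: persistent

    def write(self, lba, data):
        self.cache[lba] = data
        return "ACK"       # completion says nothing about the medium

    def destage_some(self):
        # Write back a random subset of cached blocks, in random order.
        k = self.rng.randint(0, len(self.cache))
        for lba in self.rng.sample(sorted(self.cache), k):
            self.medium[lba] = self.cache.pop(lba)

    def synchronize_cache(self):
        # Like SYNCHRONIZE CACHE: all cached blocks reach the medium.
        self.medium.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        self.cache.clear()


def journal_commit(dev, use_flush):
    """Write a log body (LBAs 0-2), then a commit record (LBA 3), then crash.

    Returns True if the journal is corrupt afterwards: the commit record
    is on the medium but part of the body it describes is not.
    """
    for lba in (0, 1, 2):
        dev.write(lba, "log body")
    if use_flush:
        dev.synchronize_cache()   # body is persistent before the commit
    dev.write(3, "commit")
    dev.destage_some()
    dev.power_loss()
    body_ok = all(lba in dev.medium for lba in (0, 1, 2))
    return 3 in dev.medium and not body_ok


def count_corruptions(use_flush, trials=1000, seed=0):
    rng = random.Random(seed)
    return sum(journal_commit(WriteBackCache(rng), use_flush) for _ in range(trials))


print("corrupt journals without flush:", count_corruptions(False))
print("corrupt journals with flush:   ", count_corruptions(True))
```

The flush here only orders the body before the commit record; in this model it is enough, because the commit record is useless without its body.
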
If you want some WRITEs to hit the persistent media, you must issue
special commands, such as SYNCHRONIZE CACHE, or a WRITE with the
FUA (Force Unit Access) bit set. All of this is handled correctly (or
at least, it should be) by the kernel's barriers, if the device
supports them. When no barriers are used, there is no guarantee that
WRITEs won't be reordered, so log corruption can occur.
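
For reference, both commands are just small command descriptor blocks (CDBs). The Python sketch below only builds them as bytes, following my reading of the SBC-3 layout of WRITE(10) and SYNCHRONIZE CACHE(10) (opcode, flags byte, 4-byte LBA, group number, 2-byte length, control byte); it sends nothing. The function names are illustrative. In practice it is the kernel's barrier code (or a tool using the SG_IO ioctl) that issues these, not the filesystem.

```python
import struct

WRITE_10 = 0x2A
SYNCHRONIZE_CACHE_10 = 0x35
FUA_BIT = 0x08  # byte 1, bit 3 of WRITE(10): Force Unit Access

# opcode, flags, LBA (4 bytes), group number, length (2 bytes), control
CDB10 = ">BBIBHB"


def write10_cdb(lba, nblocks, fua=False):
    """WRITE(10) CDB; with fua=True the write must reach the medium."""
    flags = FUA_BIT if fua else 0
    return struct.pack(CDB10, WRITE_10, flags, lba, 0, nblocks, 0)


def sync_cache10_cdb(lba=0, nblocks=0):
    """SYNCHRONIZE CACHE(10) CDB; nblocks=0 means from lba to the last block."""
    return struct.pack(CDB10, SYNCHRONIZE_CACHE_10, 0, lba, 0, nblocks, 0)


print(write10_cdb(0x1234, 8, fua=True).hex())  # -> 2a080000123400000800
print(sync_cache10_cdb().hex())                # -> 35000000000000000000
```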

From what I understand of the code, ext3 lets you enable barriers with
a mount option (barrier=1), so when the device does not support them,
it is still possible to disable the option by remounting the device.
For XFS, barriers are automatically disabled when the device doesn't
support them.
(Well, this is also what I've observed, but take these statements with
caution.)

> 
> The message on lkml which Mathieu referred to suggested that there
> were three kinds of devices, but it seems to me that type 2
> (flushable) doesn't exist so far as the fs is concerned since
> blkdev_issue_flush() just issues a BIO with only a barrier in it. A
> device driver might support the barrier request by either waiting for
> all outstanding I/O and issuing a flush command (if required) or by
> passing the barrier down to the device, assuming that it supports
> such a thing directly.
> 
> Further down the message (the url is http://lkml.org/lkml/2007/5/25/71
> btw) there is a list of dm/md implementation status and it seems that
> for a good number of the common targets there is little or no support
> for barriers anyway at the moment.
> 
> Now I agree that it would be nice to support barriers in GFS2, but it
> won't solve any problems relating to ordering of I/O unless all of the
> underlying devices support them too. See also Alasdair's response to
> the thread: http://lkml.org/lkml/2007/5/28/81
> 
> So although I'd like to see barrier support in GFS2, it won't solve
> any problems for most people and really it's a device/block layer
> issue at the moment.
> 
> Steve.
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
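
Steve's description of how a driver can honour a barrier request, by waiting for all outstanding I/O and then issuing a flush, can be modelled in a few lines. This is a toy written in Python for illustration only (class and method names are made up; it is not the kernel's block-layer code), the flush is treated as synchronous, and it covers only the drain-and-flush option, not passing the barrier down to a device that handles ordering itself.

```python
from collections import deque


class DrainAndFlushQueue:
    """Toy model of a driver that emulates barriers by drain + flush.

    Requests are ("write", lba) or ("barrier",). Writes may complete in
    any order; a barrier is held until every earlier write has been
    ACKed, then turned into a cache flush. Later requests are not
    issued until the barrier has been handled.
    """

    def __init__(self):
        self.pending = deque()   # requests not yet issued to the device
        self.in_flight = set()   # issued writes not yet ACKed
        self.cache = set()       # ACKed writes, still in the volatile cache
        self.medium = set()      # writes known to be on persistent media
        self.trace = []          # device-level commands, in issue order

    def submit(self, req):
        self.pending.append(req)

    def complete(self, lba):
        # Device ACKs a write: the data is in its cache, not on the medium.
        self.in_flight.remove(lba)
        self.cache.add(lba)

    def dispatch(self):
        # Issue queued requests in order until a barrier has to wait.
        while self.pending:
            req = self.pending[0]
            if req[0] == "barrier":
                if self.in_flight:
                    return  # drain: earlier writes are still outstanding
                self.trace.append("flush")  # e.g. SYNCHRONIZE CACHE
                self.medium |= self.cache
                self.cache.clear()
            else:
                self.in_flight.add(req[1])
                self.trace.append(f"write {req[1]}")
            self.pending.popleft()


q = DrainAndFlushQueue()
for req in [("write", 0), ("write", 1), ("barrier",), ("write", 2)]:
    q.submit(req)
q.dispatch()      # issues writes 0 and 1, then stalls at the barrier
q.complete(1)     # completions may arrive out of order
q.complete(0)
q.dispatch()      # barrier passes as a flush, then write 2 is issued
print(q.trace)    # ['write 0', 'write 1', 'flush', 'write 2']
print(sorted(q.medium), sorted(q.in_flight))  # [0, 1] [2]
```

Note that this only gives ordering if every layer below (dm/md targets included) forwards or emulates the barrier, which is exactly the gap described above.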