[rhelv6-beta-list] Anyone can explain more details of that?

Tom Coughlan coughlan at redhat.com
Fri Jul 2 19:59:32 UTC 2010


On Fri, 2010-07-02 at 14:21 -0400, Kirby Zhou wrote:
> Anyone can explain more details of that?
> 
> When using the DIF/DIX hardware checksum features of a storage path behind a
> block device, errors will occur if the block device is used as a general
> purpose block device. 
> Buffered I/O or mmap(2) based IO will not work reliably as there are no
> interlocks in the buffered write path to prevent overwriting cached data
> while the hardware is performing DMA operations. An overwrite during a DMA
> operation will cause a torn write and the write will fail checksums in the
> hardware storage path. This problem is common to all block device or file
> system based buffered or mmap(2) I/O, so the problem of I/O errors during
> overwrites cannot be worked around. 
> DIF/DIX enabled block devices should only be used with applications that use
> O_DIRECT I/O. Applications should use the raw block device, though it should
> be safe to use the XFS file system on a DIF/DIX enabled block device if only
> O_DIRECT I/O is issued through the file system. In both cases the
> responsibility for preventing torn writes lies with the application, so only
> applications designed for use with O_DIRECT I/O and DIF/DIX hardware should
> enable this feature.
> 
> What is a DIF/DIX hardware?

DIF is a relatively new data integrity feature in the SCSI Standard. It
allows the HBA to add a checksum to each block of data (increasing the
size of a logical block from 512 to 520 bytes). That checksum is sent to
the storage device along with the data, verified, stored, and verified
again on later reads. DIF requires support in the HBA driver/firmware, and in
the storage device. This type of hardware is just barely making its way
to the market at this time. 
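
To make the 512 to 520 byte change concrete, here is a minimal Python
sketch of how the extra 8 bytes can be laid out. It assumes the T10 Type 1
format: a 2-byte guard tag (CRC-16 with polynomial 0x8BB7), a 2-byte
application tag, and a 4-byte reference tag holding the low 32 bits of the
LBA. The function names are made up for illustration; real HBAs do this in
hardware or firmware.

```python
import struct

T10_DIF_POLY = 0x8BB7  # CRC-16 polynomial used for the DIF guard tag


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16/T10-DIF: init 0, no bit reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect_sector(data: bytes, lba: int, app_tag: int = 0) -> bytes:
    """Return a 520-byte block: 512 data bytes plus 8 bytes of protection info."""
    if len(data) != 512:
        raise ValueError("expected exactly one 512-byte sector")
    guard = crc16_t10dif(data)      # checksum over the data only
    ref_tag = lba & 0xFFFFFFFF      # assumed Type 1: low 32 bits of the LBA
    # All three fields are stored big-endian.
    return data + struct.pack(">HHI", guard, app_tag, ref_tag)
```

Calling protect_sector(sector, lba) on each 512-byte sector gives the
520-byte form that travels down the storage path; any later change to the
data without recomputing the guard tag makes verification fail.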

> And why a DMA operation will conflict with cached data?

The problem is that the Linux memory management scheme allows a page to
be changed after it has been submitted to the lower layers for a write.
Rather than try to cancel the I/O (a time consuming and error prone
operation), the system just lets it complete, but remembers that the
page is still dirty after the I/O completes. 

When DIF is enabled, the driver/firmware may compute the checksum when
the I/O is submitted. If the data then changes before the transfer
completes, the checksum will be wrong, and the I/O will fail. 
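
The failure can be modelled in a few lines of Python. This is a toy
simulation, not driver code: FakeDifPath and write_page are hypothetical
names, and zlib.crc32 stands in for the real DIF guard checksum.

```python
import zlib


class FakeDifPath:
    """Toy model of a checksumming storage path (hypothetical)."""

    def submit(self, page: bytearray) -> int:
        # The checksum is computed once, over the page as it looks at submit time.
        return zlib.crc32(page)

    def complete(self, page: bytearray, guard: int) -> bool:
        # Later the "DMA" reads the page and the device re-checks the checksum.
        return zlib.crc32(page) == guard


def write_page(page: bytearray, modify_in_flight: bool) -> bool:
    """Return True if the simulated write passes verification."""
    path = FakeDifPath()
    guard = path.submit(page)
    if modify_in_flight:
        # A buffered write or mmap store dirties the page while the I/O is pending.
        page[0] ^= 0xFF
    return path.complete(page, guard)
```

With modify_in_flight=False the write verifies; with True the page no
longer matches the checksum taken at submit time, which is the torn write
described above.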

The solution is to use O_DIRECT, and ensure that the application does
not change the data buffer after the I/O is issued. 
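
A minimal sketch of such a write in Python, assuming Linux (os.O_DIRECT),
a file system or device that accepts O_DIRECT, and that 4096-byte
alignment is sufficient for the target. The helper names are hypothetical.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; real code should query the device's block size


def aligned_buffer(payload: bytes, block: int = BLOCK) -> mmap.mmap:
    """Copy payload into page-aligned memory, zero-padded to a multiple of block."""
    size = max(block, -(-len(payload) // block) * block)  # round up to block
    buf = mmap.mmap(-1, size)  # anonymous mappings start on a page boundary
    buf[:len(payload)] = payload
    return buf


def write_direct(path: str, payload: bytes) -> int:
    """Write payload to path with O_DIRECT, bypassing the page cache."""
    buf = aligned_buffer(payload)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o600)
    try:
        # buf is private to this function and is not modified again until
        # os.write() returns, so the data stays stable while the I/O is in flight.
        return os.write(fd, buf)
    finally:
        os.close(fd)
        buf.close()
```

The important part is not the flag itself but the discipline around the
buffer: nothing else may write to it between issuing the I/O and its
completion. Note that the padded length, not the payload length, is what
gets written.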

Finding a longer-term solution, one that allows DIF to be used with memory
mapped I/O and with buffered I/O through a filesystem (that is, without
O_DIRECT), will be a difficult problem. 

Tom



