[dm-devel] [PATCH 0/2] dm thin: Flush data device before committing metadata to avoid data corruption

Nikos Tsironis ntsironis at arrikto.com
Thu Dec 5 15:31:37 UTC 2019


On 12/4/19 10:17 PM, Mike Snitzer wrote:
> On Wed, Dec 04 2019 at  2:58pm -0500,
> Eric Wheeler <dm-devel at lists.ewheeler.net> wrote:
> 
>> On Wed, 4 Dec 2019, Nikos Tsironis wrote:
>>
>>> The thin provisioning target maintains per thin device mappings that map
>>> virtual blocks to data blocks in the data device.
>>>
>>> When we write to a shared block, in case of internal snapshots, or
>>> provision a new block, in case of external snapshots, we copy the shared
>>> block to a new data block (COW), update the mapping for the relevant
>>> virtual block and then issue the write to the new data block.
>>>
>>> Suppose the data device has a volatile write-back cache and the
>>> following sequence of events occur:
>>
>> For those with NV caches, can the data disk flush be optional (maybe as a
>> table flag)?
> 
> IIRC block core should avoid issuing the flush if not needed.  I'll have
> a closer look to verify as much.
> 

For devices without a volatile write-back cache, block core strips the
REQ_PREFLUSH and REQ_FUA bits from requests that carry a payload and
completes empty REQ_PREFLUSH requests before they reach the driver.

This happens in generic_make_request_checks():

	/*
	 * Filter flush bio's early so that make_request based
	 * drivers without flush support don't have to worry
	 * about them.
	 */
	if (op_is_flush(bio->bi_opf) &&
	    !test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
		bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA);
		if (!nr_sectors) {
			status = BLK_STS_OK;
			goto end_io;
		}
	}

If I am not mistaken, it all depends on whether the underlying device
reports a write-back cache or not.
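
To make the three possible outcomes explicit, here is a minimal
userspace model of that filter. It is only a sketch: the MODEL_* names
and bit values are made up for illustration (the real flags are defined
in include/linux/blk_types.h), and a plain bool stands in for the
QUEUE_FLAG_WC test.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for REQ_PREFLUSH / REQ_FUA (not the kernel values). */
#define MODEL_REQ_PREFLUSH (1u << 0)
#define MODEL_REQ_FUA      (1u << 1)

enum model_action {
	MODEL_SUBMIT,       /* request continues on to the driver */
	MODEL_COMPLETE_OK,  /* empty flush completed early with success */
};

/*
 * Model of the flush filter: when the queue reports no write-back cache,
 * clear the flush bits and complete payload-less flushes immediately.
 */
static enum model_action model_filter_flush(unsigned int *opf,
					    bool queue_has_wc,
					    unsigned int nr_sectors)
{
	bool is_flush = (*opf & (MODEL_REQ_PREFLUSH | MODEL_REQ_FUA)) != 0;

	if (is_flush && !queue_has_wc) {
		*opf &= ~(MODEL_REQ_PREFLUSH | MODEL_REQ_FUA);
		if (nr_sectors == 0)
			return MODEL_COMPLETE_OK;
	}
	return MODEL_SUBMIT;
}
```

With queue_has_wc false, an empty flush never reaches the driver and a
write carrying flush bits is submitted as an ordinary write. With
queue_has_wc true, the request is passed through unchanged.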

You can check this by looking at /sys/block/<device>/queue/write_cache.
If it says "write back", then flushes will be issued.
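
If you would rather check this from a program than by hand, something
along these lines should work. It is a rough sketch: the enum and the
function names are my own (hypothetical), and it assumes the attribute
contains either "write back" or "write through" as described in
Documentation/block/queue-sysfs.rst.

```c
#include <stdio.h>
#include <string.h>

enum wc_mode { WC_UNKNOWN, WC_WRITE_BACK, WC_WRITE_THROUGH };

/* Map the attribute text to a mode (prefix match, so a trailing newline is fine). */
static enum wc_mode parse_write_cache(const char *text)
{
	if (strncmp(text, "write back", 10) == 0)
		return WC_WRITE_BACK;
	if (strncmp(text, "write through", 13) == 0)
		return WC_WRITE_THROUGH;
	return WC_UNKNOWN;
}

/* Read /sys/block/<dev>/queue/write_cache; returns WC_UNKNOWN on any error. */
static enum wc_mode read_write_cache(const char *dev)
{
	char path[256];
	char buf[32] = "";
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path),
		     "/sys/block/%s/queue/write_cache", dev);
	if (n < 0 || n >= (int)sizeof(path))
		return WC_UNKNOWN;

	f = fopen(path, "r");
	if (!f)
		return WC_UNKNOWN;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);

	return parse_write_cache(buf);
}
```

For example, read_write_cache("sda") returning WC_WRITE_BACK would mean
the kernel will send flushes to that device ("sda" is just a placeholder
device name).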

In case the sysfs entry reports a "write back" cache for a device with a
non-volatile write cache, I think you can change the kernel's view of
the device by writing to this entry (you could also create a udev rule
for this).

This way you can set the write cache to "write through". This will
eliminate the cache flushes issued by the kernel without altering the
device state (see Documentation/block/queue-sysfs.rst).
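
For completeness, a sketch of doing that write from C. The function name
and its error convention are hypothetical; the only thing taken from the
documentation is that the attribute accepts the strings "write back" and
"write through". Keep in mind that this only changes the kernel's view,
so switching to "write through" is safe only if the cache really is
non-volatile.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a mode string to a write_cache attribute file, for example
 * /sys/block/<device>/queue/write_cache (needs root).
 * Returns 0 on success or a negative errno value.
 */
static int set_write_cache(const char *attr_path, const char *mode)
{
	FILE *f;
	int ret = 0;

	/* Reject anything other than the two documented values. */
	if (strcmp(mode, "write back") != 0 &&
	    strcmp(mode, "write through") != 0)
		return -EINVAL;

	f = fopen(attr_path, "w");
	if (!f)
		return -errno;

	if (fputs(mode, f) == EOF)
		ret = -EIO;
	/* The data may only be pushed to the kernel at close time. */
	if (fclose(f) == EOF && ret == 0)
		ret = -EIO;

	return ret;
}
```

The setting does not persist across reboots, which is why a udev rule
may be the more practical option.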

Nikos

> Mike
> 



