[dm-devel] [PATCH 0/2] dm thin: Flush data device before committing metadata to avoid data corruption
Eric Wheeler
dm-devel at lists.ewheeler.net
Thu Dec 5 22:34:01 UTC 2019
On Thu, 5 Dec 2019, Nikos Tsironis wrote:
> On 12/4/19 10:17 PM, Mike Snitzer wrote:
> > On Wed, Dec 04 2019 at 2:58pm -0500,
> > Eric Wheeler <dm-devel at lists.ewheeler.net> wrote:
> >
> > > On Wed, 4 Dec 2019, Nikos Tsironis wrote:
> > >
> > > > The thin provisioning target maintains per thin device mappings that map
> > > > virtual blocks to data blocks in the data device.
> > > >
> > > > When we write to a shared block, in case of internal snapshots, or
> > > > provision a new block, in case of external snapshots, we copy the shared
> > > > block to a new data block (COW), update the mapping for the relevant
> > > > virtual block and then issue the write to the new data block.
> > > >
> > > > Suppose the data device has a volatile write-back cache and the
> > > > following sequence of events occurs:
> > >
> > > For those with NV caches, can the data disk flush be optional (maybe as a
> > > table flag)?
> >
> > IIRC block core should avoid issuing the flush if not needed. I'll have
> > a closer look to verify as much.
> >
>
> For devices without a volatile write-back cache, block core strips off
> the REQ_PREFLUSH and REQ_FUA bits from requests with a payload and
> completes empty REQ_PREFLUSH requests before entering the driver.
>
> This happens in generic_make_request_checks():
>
> 	/*
> 	 * Filter flush bio's early so that make_request based
> 	 * drivers without flush support don't have to worry
> 	 * about them.
> 	 */
> 	if (op_is_flush(bio->bi_opf) &&
> 	    !test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
> 		bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA);
> 		if (!nr_sectors) {
> 			status = BLK_STS_OK;
> 			goto end_io;
> 		}
> 	}
>
> If I am not mistaken, it all depends on whether the underlying device
> reports the existence of a write-back cache or not.
>
> You could check this by looking at /sys/block/<device>/queue/write_cache
> If it says "write back" then flushes will be issued.
>
> In case the sysfs entry reports a "write back" cache for a device with a
> non-volatile write cache, I think you can change the kernel's view of
> the device by writing to this entry (you could also create a udev rule
> for this).
>
> This way you can set the write cache as write through. This will
> eliminate the cache flushes issued by the kernel, without altering the
> device state (Documentation/block/queue-sysfs.rst).
Interesting, I'll remember that. I think this is a documentation bug; isn't this backwards?
'This means that it might not be safe to toggle the setting from
"write back" to "write through", since that will also eliminate
cache flushes issued by the kernel.'
[https://www.kernel.org/doc/Documentation/block/queue-sysfs.rst]
How does this work with stacked block devices? Does an upper-level device
inherit the setting from the lower-level device? If an upper level is
misconfigured, would a writeback setting at a higher level clear the flush
for lower levels?
--
Eric Wheeler
> Nikos
>
> > Mike
> >
>