[dm-devel] dm-integrity: Fix flush with external metadata device

Mikulas Patocka mpatocka at redhat.com
Fri Jan 8 19:00:48 UTC 2021



On Fri, 8 Jan 2021, Mike Snitzer wrote:

> > > Seems like a pretty bad oversight... but shouldn't you also make sure to
> > > flush the data device _before_ the metadata is flushed?
> > > 
> > > Mike
> > 
> > I think, ordering is not a problem.
> > 
> > A disk may flush its cache spontaneously anytime, so it doesn't matter in 
> > which order do we flush them. Similarly a dm-bufio buffer may be flushed 
> > anytime - if the machine is running out of memory and a dm-bufio shrinker 
> > is called.
> > 
> > I'll send another patch for this - I've created a patch that flushes the 
> > metadata device cache and data device cache in parallel, so that 
> > performance degradation is reduced.
> > 
> > My patch also doesn't use GFP_NOIO allocation - which can in theory 
> > deadlock if we are swapping on dm-integrity device.
> 
> OK, I see your patch, but my concern about ordering was more to do with
> crash consistency.  What if metadata is updated but data isn't?

In journal mode: do_journal_write will copy data from the journal to 
appropriate places and then flushes both data and metadata device. If a 
crash happens during flush, the journal will be replayed on next reboot - 
the replay operation will overwrite both data and metadata from the 
journal.

In bitmap mode: bitmap_flush_work will first issue flush on both data and 
metadata device, and then clear the dirty bits in a bitmap. If a crash 
happens during the flush, the bits in the bitmap are still set and the 
checksums will be recalculated on next reboot.

> On the surface, your approach of issuing the flushes in parallel seems
> to expose us to inconsistency upon recovery from a crash.
> If that isn't the case please explain why not.

The disk may flush its internal cache anytime. So, if you do 
"blkdev_issue_flush(disk1); blkdev_issue_flush(disk2);" there is no 
ordering guarantee at all. The firmware in disk2 may flush the cache 
spontaneously, before you called blkdev_issue_flush(disk1).

> Thanks,
> Mike

Mikulas




More information about the dm-devel mailing list