[dm-devel] dm-integrity: Fix flush with external metadata device
Mikulas Patocka
mpatocka at redhat.com
Fri Jan 8 19:00:48 UTC 2021
On Fri, 8 Jan 2021, Mike Snitzer wrote:
> > > Seems like a pretty bad oversight... but shouldn't you also make sure to
> > > flush the data device _before_ the metadata is flushed?
> > >
> > > Mike
> >
> > I think, ordering is not a problem.
> >
> > A disk may flush its cache spontaneously anytime, so it doesn't matter in
> > which order do we flush them. Similarly a dm-bufio buffer may be flushed
> > anytime - if the machine is running out of memory and a dm-bufio shrinker
> > is called.
> >
> > I'll send another patch for this - I've created a patch that flushes the
> > metadata device cache and data device cache in parallel, so that
> > performance degradation is reduced.
> >
> > My patch also doesn't use GFP_NOIO allocation - which can in theory
> > deadlock if we are swapping on dm-integrity device.
>
> OK, I see your patch, but my concern about ordering was more to do with
> crash consistency. What if metadata is updated but data isn't?
In journal mode: do_journal_write will copy data from the journal to
appropriate places and then flushes both data and metadata device. If a
crash happens during flush, the journal will be replayed on next reboot -
the replay operation will overwrite both data and metadata from the
journal.
In bitmap mode: bitmap_flush_work will first issue flush on both data and
metadata device, and then clear the dirty bits in a bitmap. If a crash
happens during the flush, the bits in the bitmap are still set and the
checksums will be recalculated on next reboot.
> On the surface, your approach of issuing the flushes in parallel seems
> to expose us to inconsistency upon recovery from a crash.
> If that isn't the case please explain why not.
The disk may flush its internal cache anytime. So, if you do
"blkdev_issue_flush(disk1); blkdev_issue_flush(disk2);" there is no
ordering guarantee at all. The firmware in disk2 may flush the cache
spontaneously, before you called blkdev_issue_flush(disk1).
> Thanks,
> Mike
Mikulas
More information about the dm-devel
mailing list