[dm-devel] dm writecache: fix data corruption when reloading the target
Mikulas Patocka
mpatocka at redhat.com
Wed Apr 15 15:01:05 UTC 2020
On Wed, 15 Apr 2020, Mikulas Patocka wrote:
>
>
> On Wed, 15 Apr 2020, Mike Snitzer wrote:
>
> > > > > + r = writecache_read_metadata(wc,
> > > > > + min((sector_t)bdev_logical_block_size(wc->ssd_dev->bdev) >> SECTOR_SHIFT,
> > > > > + (sector_t)wc->metadata_sectors));
> > > >
> > > > Can you explain why this is needed? Why isn't wc->metadata_sectors
> > > > already compatible with wc->ssd_dev->bdev ?
> > >
> > > bdev_logical_block_size is the minimum size accepted by the device. If we
> > > used just bdev_logical_block_size(wc->ssd_dev->bdev), someone could (by
> > > using extremely small device with large logical_block_size) trigger
> > > writing out of the allocated memory.
> >
> > OK...
> >
> > > > Yet you just use wc->metadata_sectors in the new call to
> > > > writecache_read_metadata() in writecache_resume()...
> > >
> > > This was my mistake. Change it to "region.count = n_sectors";
> >
> > sure, that addresses one aspect. But I'm also asking:
> > given what yoou said above about reading past end of smaller device, why
> > is it safe to do this in writecache_resume ?
> >
> > r = writecache_read_metadata(wc, wc->metadata_sectors);
> >
> > Shouldn't ctr do extra validation and then all calls to
> > writecache_read_metadata() use wc->metadata_sectors? Which would remove
> > need to pass extra 'n_sectors' arg to writecache_read_metadata()?
> >
> > Mike
>
> wc->memory_map = vmalloc(n_metadata_blocks << wc->block_size_bits);
> ...
> wc->metadata_sectors = n_metadata_blocks << (wc->block_size_bits - SECTOR_SHIFT);
>
> So we are always sure that we can read/write wc->metadata_sectors safely.
>
> The problem is - what if bdev_logical_block_size is larger than
> wc->metadata_sectors? Then, we would overread past the end of allocated
> memory. The device wouldn't work anyway in this case, so perhaps a better
> solution would be to reject this as an error in the constructor.
>
> Mikulas
... or, we can use wc->block_size >> SECTOR_SHIFT. It is guaranteed that
n_metadata_blocks has at least one block, so it won't over-read pass the
end of the device.
The problem with bdev_logical_block_size is that it may change if the
device under us is reloaded, so it is not safe to rely on it being stable.
Mikulas
More information about the dm-devel
mailing list