[dm-devel] dm writecache: fix data corruption when reloading the target

Mikulas Patocka mpatocka at redhat.com
Wed Apr 15 15:01:05 UTC 2020



On Wed, 15 Apr 2020, Mikulas Patocka wrote:

> 
> 
> On Wed, 15 Apr 2020, Mike Snitzer wrote:
> 
> > > > > +		r = writecache_read_metadata(wc,
> > > > > +			min((sector_t)bdev_logical_block_size(wc->ssd_dev->bdev) >> SECTOR_SHIFT,
> > > > > +			    (sector_t)wc->metadata_sectors));
> > > > 
> > > > Can you explain why this is needed?  Why isn't wc->metadata_sectors
> > > > already compatible with wc->ssd_dev->bdev ?
> > > 
> > > bdev_logical_block_size is the minimum size accepted by the device. If we 
> > > used just bdev_logical_block_size(wc->ssd_dev->bdev), someone could (by 
> > > using extremely small device with large logical_block_size) trigger 
> > > writing out of the allocated memory.
> > 
> > OK...
> >  
> > > > Yet you just use wc->metadata_sectors in the new call to
> > > > writecache_read_metadata() in writecache_resume()...
> > > 
> > > This was my mistake. Change it to "region.count = n_sectors";
> > 
> > sure, that addresses one aspect.  But I'm also asking:
> > given what yoou said above about reading past end of smaller device, why
> > is it safe to do this in writecache_resume ?
> > 
> > r = writecache_read_metadata(wc, wc->metadata_sectors);
> > 
> > Shouldn't ctr do extra validation and then all calls to
> > writecache_read_metadata() use wc->metadata_sectors?  Which would remove
> > need to pass extra 'n_sectors' arg to writecache_read_metadata()?
> > 
> > Mike
> 
> wc->memory_map = vmalloc(n_metadata_blocks << wc->block_size_bits);
> ...
> wc->metadata_sectors = n_metadata_blocks << (wc->block_size_bits - SECTOR_SHIFT);
> 
> So we are always sure that we can read/write wc->metadata_sectors safely. 
> 
> The problem is - what if bdev_logical_block_size is larger than 
> wc->metadata_sectors? Then, we would overread past the end of allocated 
> memory. The device wouldn't work anyway in this case, so perhaps a better 
> solution would be to reject this as an error in the constructor.
> 
> Mikulas

... or, we can use wc->block_size >> SECTOR_SHIFT. It is guaranteed that 
n_metadata_blocks has at least one block, so it won't over-read pass the 
end of the device.

The problem with bdev_logical_block_size is that it may change if the 
device under us is reloaded, so it is not safe to rely on it being stable.

Mikulas




More information about the dm-devel mailing list